Tools like Copilot and ChatGPT can increase developer productivity. It is remarkable that they work at all, and they do, but companies need to be mindful of the quality of the code they produce.
General concerns about AI-generated code are the same as those for human-written code: licensing, quality, and application security. Those concerns are heightened given that Generative AI (GenAI) is still "learning." Regarding open source license compliance specifically, because GenAI can copy and paste small chunks of third-party code, the need to identify snippets, not just the libraries pulled in by the build system, is greater.
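To make the snippet point concrete, here is a minimal, hypothetical sketch of snippet-level matching using hashed k-grams of normalized source. The function names and parameters are illustrative assumptions, and commercial SCA tools use far more sophisticated matching against large reference corpora; this only shows why matching at the snippet level differs from scanning a dependency manifest.

```python
# Toy illustration only: compare hashed k-grams of normalized source against
# a reference file to estimate whether a snippet may have been copied.
import hashlib
import re

def normalize(source: str) -> str:
    """Strip line comments and collapse whitespace so trivial edits don't hide a match."""
    source = re.sub(r"#.*", "", source)          # simplification: Python-style comments only
    return re.sub(r"\s+", " ", source).strip()

def kgram_hashes(source: str, k: int = 8) -> set[str]:
    """Hash every k-token window of the normalized source."""
    tokens = normalize(source).split(" ")
    return {
        hashlib.sha1(" ".join(tokens[i:i + k]).encode()).hexdigest()
        for i in range(max(len(tokens) - k + 1, 1))
    }

def snippet_overlap(candidate: str, reference: str, k: int = 8) -> float:
    """Fraction of the candidate's k-grams that also appear in the reference."""
    cand, ref = kgram_hashes(candidate, k), kgram_hashes(reference, k)
    return len(cand & ref) / len(cand) if cand else 0.0

# Usage: a high overlap score suggests the candidate may have been copied from
# the reference, so the reference's license terms deserve a closer look.
# print(snippet_overlap(open("new_code.py").read(), open("oss_file.py").read()))
```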
Companies may also want to track the code that comes from GenAI. Development managers may want to understand adoption in their organizations, and private companies should anticipate being asked about it when they are acquired.
The same risk management approaches that apply to human-written code are suited to AI-generated code. Similarly, acquirers in merger and acquisition (M&A) transactions will ask the same questions regardless of who or what wrote the code. GenAI writes code faster, which puts scaling pressure on tools and approaches.
Big, commercial GenAI tools recognize the legal concern and have started to offer terms that provide some protection for users. However, they are not without caveats and carveouts, so while these terms relieve some of the risk, companies must still pay heed.
There are two schools of thought on tagging AI-generated code. Some lawyers feel there could be a legal advantage to not tracking it, in anticipation of knowledge qualifiers in future agreements. An in-between approach is to ban the use of AI-generated code in a product's most critical areas, so the company can represent that it owns the copyright to what matters most.
Software tool vendors rightly claim that their solutions help manage risk in AI-generated code, meaning they help in the same way they do for human-written code. Tools like Black Duck® SCA can identify open source snippets, which is critical for full license compliance. For M&A due diligence, Black Duck audits dig into all areas of code risk at an unmatched level of completeness and accuracy. Such capabilities have become even more important with AI.
These approaches identify flaws in code without regard to who or what wrote it. Identifying how code was written is a different problem, and one for which there is currently no reliable solution in the marketplace. A legitimate question is how important it really is to detect AI-generated code in light of the difficulties in doing so accurately.
Because code is highly structured and limited in grammar and vocabulary, merely looking at it provides few clues about how it was written and by whom or what (unlike, say, prose, where style may help distinguish authors). Black Duck experts are skeptical that there will ever be a way to accurately determine code's origin solely on the basis of inspection (automated or human). For example, we know that GenAI sometimes copies human-written code verbatim; such code would be impossible to classify by inspection alone.
If, rather than relying solely on code inspection after the fact, the development process were instrumented for the purpose of distinguishing between human- and AI-generated code, guesses based on style could be triangulated with behavioral data pulled from the software development life cycle (SDLC) to increase resolution. Say a developer checks in code that deviates from their typical coding style on a day when their productivity is unusually high; a tool might infer that they had started experimenting with Copilot. Or maybe the developer just copied and pasted a chunk of open source. It's still inherently difficult to tell.
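As a purely illustrative sketch of that kind of triangulation, the following combines a hypothetical style-deviation score with a commit-size anomaly to flag commits for human review. The signal names, thresholds, and data fields are assumptions, not a proven detector, and as noted above, a flagged commit could just as easily be pasted open source as GenAI output.

```python
# Hypothetical heuristic: flag a commit when the code looks unlike the author's
# usual style AND their output volume is unusually high. All values illustrative.
from dataclasses import dataclass

@dataclass
class CommitSignals:
    author: str
    style_deviation: float       # 0.0-1.0, distance from the author's usual style profile
    lines_added: int             # size of this change
    author_daily_avg_lines: int  # the author's historical daily average

def flag_for_review(c: CommitSignals,
                    style_threshold: float = 0.6,
                    volume_multiplier: float = 3.0) -> bool:
    """Return True when both the style and volume signals are anomalous."""
    unusual_style = c.style_deviation >= style_threshold
    unusual_volume = c.lines_added >= volume_multiplier * c.author_daily_avg_lines
    return unusual_style and unusual_volume

# Example: a large, stylistically atypical commit gets flagged for human follow-up.
print(flag_for_review(CommitSignals("dev42", 0.75, 900, 150)))  # True
```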
How about enlisting developers? At the risk of overgeneralizing, developers recoil from any overhead that would slow their development process. But if development tools made it easy for them to tag AI-generated code, and (importantly) if they were educated on and bought into the need, this could be viable. Humans can make mistakes or lie, so this approach doesn't ensure 100% accuracy, but it may be more accurate than inspection and inference can ever be.
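As one possible shape for that workflow, the sketch below assumes a hypothetical "AI-Assisted: yes" trailer that developers add to commit messages containing generated code, then reports what share of a repository's history carries the tag. The trailer name and convention are illustrative assumptions, not an established standard.

```python
# Report the fraction of commits whose message carries a hypothetical
# "AI-Assisted: yes" trailer. Assumes git is installed and the script is run
# inside (or pointed at) a git repository.
import subprocess

def ai_assisted_share(repo_path: str = ".") -> float:
    """Fraction of commits tagged with the AI-Assisted trailer."""
    # %B is the raw commit message; %x00 is a NUL separator between commits.
    raw = subprocess.run(
        ["git", "-C", repo_path, "log", "--format=%B%x00"],
        capture_output=True, text=True, check=True,
    ).stdout
    messages = [m for m in raw.split("\x00") if m.strip()]
    tagged = sum(1 for m in messages if "AI-Assisted: yes" in m)
    return tagged / len(messages) if messages else 0.0

if __name__ == "__main__":
    print(f"{ai_assisted_share():.0%} of commits tagged as AI-assisted")
```

A tag like this is cheap to add at commit time and travels with the history, which is what makes the developer-driven approach plausible despite its reliance on honesty.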
Black Duck is already helping customers protect their SDLC while they adopt Generative AI and will continue to be on the cutting edge of AI supply chain management. So far, we have found customers putting the priority on identifying the general software risks, regardless of how the code was written, which our tools handle beautifully today. In addition, our Innovation Lab is working on novel solutions to enable organizations to identify, track, and manage AI-generated code, just as we have for open source code for nearly two decades.
GenAI increases developer productivity and will be an important development tool for years to come. At least today, AI-generated code needs to be scrutinized like any code written by inexperienced, unproven humans, and acquirers in M&A transactions need to account for this. Tools and techniques exist to vet code, and those provided by Black Duck are particularly well suited.
Separately, there are reasons, beyond general software risk, why companies may want to be able to distinguish code written by GenAI from code written by humans. Instrumenting the SDLC is likely to be the only viable way to accurately assess this. That is, trying to identify AI-written code after the fact with any accuracy may be just a pipe dream.
- This blog post was reviewed by Mike McGuire.