Analyze AI-Generated Code with the Black Duck Snippet API

Mike McGuire

Feb 03, 2025 / 4 min read

Open source is free in the monetary sense, but consumers are not free to use it however they please. Open source software is intellectual property just as much as a novel, a painting, or an essay is. It’s up to the owner of the project to specify how their property can be used, and it’s up to the consumer, or the licensee, to adhere to those requirements. Software composition analysis (SCA) tools can help open source users stay in compliance.

Basic SCA tools can identify open source code in an application and the licenses it’s made available under. Some tools can go further by identifying deep or embedded license data and offering simplified breakdowns of specific license terms and restrictions. Black Duck SCA can do all that, as well as identify snippets of open source that developers have introduced into applications, since even a small amount of copied code still carries license obligations.

Most Black Duck customers choose to run this scan during nightly builds, or even later in the life cycle, right before a new version is released; it is often treated as more of an “audit” than an automated scan. Since most development teams have standardized practices in place that specify which projects may be used and how to include them, this approach serves as a final compliance scan before shipping. But with the introduction of AI coding assistants, there is a brand new use case.


AI and open source license compliance concerns

The large language models (LLMs) that power AI coding assistants like GitHub Copilot and ChatGPT are trained on open source projects. Although these LLMs were created to generate original source code, there are times when their output is similar enough to open source projects that it can be matched back to their source by Black Duck snippet analysis technology. Naturally, this creates a significant license compliance concern for the consumers of these tools, especially those distributing the software they create.

Black Duck snippet analysis can mitigate these concerns, but the widespread use of AI coding assistants means that open source snippets will be used in source code at a scale never seen before. Conducting a snippet analysis on entire projects right before release may be too late, and scanning after every code commit is not feasible. For this reason, we’ve introduced the Black Duck Open Source Snippet API.

Open source license compliance with Black Duck Open Source Snippet API

You’ve most likely heard of shifting security left; Black Duck is now helping teams shift compliance left. This API gives teams access to Black Duck snippet analysis without needing to scan entire projects, or even entire files. Instead, teams can analyze snippets of open source code as they’re being introduced to a project. In the context of AI code generation, this means that teams can test the blocks of code provided by their commercial or in-house LLM as they’re created and before they’re merged into a main or release branch. And since the code an AI assistant generates from a single prompt is usually in the range of 20 to 50 lines, this analysis can be completed in around two seconds. This makes it possible to move a task that used to be done as late as possible in the software development life cycle (SDLC) into the coding/prebuild phase, without bogging down developers or hindering velocity.

How Black Duck Open Source Snippet API is used

Since it’s a simple API endpoint, development and compliance teams can build this snippet analysis into any part of the development life cycle. Usually, the most valuable workflow for teams is building the analysis into a source code management (SCM) tool like GitHub. Using GitHub Actions, teams can trigger a variety of events when a pull request is submitted, including calling the snippet API. This will automatically detect the new source code included in the pull request, send it to Black Duck via API, and return the results of the analysis in the form of a pull request comment.
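The pull request flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the `analyze` callable (a stand-in for whatever client actually calls the snippet API), and the `component` and `license` keys in its results are all hypothetical and not part of any documented Black Duck interface. The sketch pulls the added lines out of a `git diff`, hands each file's new code to the analyzer, and formats the findings as comment text.

```python
from typing import Callable, Dict, List

Match = Dict[str, str]  # hypothetical result shape: {"component": ..., "license": ...}


def added_lines_by_file(diff_text: str) -> Dict[str, List[str]]:
    """Collect the lines a `git diff` adds, grouped by target file path."""
    added: Dict[str, List[str]] = {}
    current = None
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            # A new file section begins; its headers come before the first hunk.
            current, in_hunk = None, False
        elif not in_hunk and line.startswith("+++ "):
            path = line[4:].strip()
            if path != "/dev/null":  # "/dev/null" marks a deleted file
                current = path[2:] if path.startswith("b/") else path
                added.setdefault(current, [])
        elif line.startswith("@@"):
            in_hunk = True
        elif in_hunk and current is not None and line.startswith("+"):
            added[current].append(line[1:])
    return added


def format_pr_comment(matches_by_file: Dict[str, List[Match]]) -> str:
    """Turn analyzer results into plain text suitable for a pull request comment."""
    if not any(matches_by_file.values()):
        return "Snippet analysis: no open source matches found."
    lines = ["Snippet analysis found possible open source matches:"]
    for path in sorted(matches_by_file):
        for match in matches_by_file[path]:
            lines.append(f"- {path}: {match['component']} ({match['license']})")
    return "\n".join(lines)


def review_pull_request(diff_text: str, analyze: Callable[[str], List[Match]]) -> str:
    """Send each file's newly added code to `analyze` and build the comment."""
    results = {
        path: analyze("\n".join(lines))
        for path, lines in added_lines_by_file(diff_text).items()
        if lines
    }
    return format_pr_comment(results)
```

In a real workflow, a GitHub Actions job triggered by the pull request event would supply the diff and post the returned text as a comment. Passing the analyzer in as a callable keeps the sketch independent of how the API is actually reached.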

Some organizations—mainly those building their own LLMs—will want to move this analysis even further left. Organizations building their own internal LLM are doing so to have complete control over the software it generates. Some of those LLMs will still be trained with proprietary code, but many will be trained using open source projects. Regardless, it’s important to know that 96% of commercial applications contain open source code. The snippet API can help organizations quickly analyze all code generated as a result of any prompt, and use the results to further train their LLMs to avoid generating license-protected code. Moving snippet analysis further left will intercept any snippet generated before it reaches the IDE and analyze it for license issues.
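One way to picture this interception step is as a small gate that sits between the LLM and the IDE. The sketch below is an assumption-laden illustration, not a Black Duck component: `gate_generated_code`, the `analyze` callable standing in for the snippet API client, the result keys, and the `"reciprocal"` license type used as the blocking policy are all hypothetical.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Match = Dict[str, str]  # hypothetical keys: component, license_name, license_type

# Hypothetical policy: license types that should never reach a developer's editor.
BLOCKED_LICENSE_TYPES: FrozenSet[str] = frozenset({"reciprocal"})


def gate_generated_code(
    code: str,
    analyze: Callable[[str], List[Match]],
    blocked: FrozenSet[str] = BLOCKED_LICENSE_TYPES,
) -> Tuple[bool, List[str]]:
    """Return (allowed, reasons) for a block of LLM-generated code.

    `analyze` stands in for a call to the snippet API. The code is allowed
    only when none of the reported matches has a blocked license type.
    """
    reasons = [
        f"{m['component']}: {m['license_name']} ({m['license_type']})"
        for m in analyze(code)
        if m["license_type"].lower() in blocked
    ]
    return (not reasons, reasons)
```

The returned reasons could be logged alongside the prompt and the rejected output, giving the organization a record of flagged generations to draw on when retraining its model.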

How Black Duck Open Source Snippet API works

We offer our customers flexibility in where they would like the Black Duck engine to be deployed. Customers can deploy a Black Duck instance on premises, or opt for the hosted option and let us handle the infrastructure and maintenance.

Regardless of how customers deploy Black Duck, the API works the same way. The only data that needs to be sent to the API is the raw source code to be analyzed. The source is hashed and compared against the Black Duck KnowledgeBase™, which contains petabytes of open source code and metadata. If there are any matches, Black Duck identifies the license associated with the source project. The results are returned as JSON and include the matching component and version, license name, license type, and location of the matched snippet in the source file. If using the GitHub workflow, this information is provided as a pull request comment, as well as in SARIF format so it can be added to the scanning dashboard along with other AppSec test results.
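To make the shape of those results concrete, here is a minimal Python sketch of reading them. The article names the kinds of data returned but not the schema, so the JSON layout, the key names (`matches`, `licenseName`, `startLine`, and so on), and the sample values are all assumptions made for illustration. The real response format may differ.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SnippetMatch:
    component: str
    version: str
    license_name: str
    license_type: str
    start_line: int  # location of the matched snippet in the submitted source
    end_line: int


def parse_matches(response_body: str) -> List[SnippetMatch]:
    """Parse a JSON response body into match records (key names are assumed)."""
    payload = json.loads(response_body)
    return [
        SnippetMatch(
            component=item["component"],
            version=item["version"],
            license_name=item["licenseName"],
            license_type=item["licenseType"],
            start_line=int(item["startLine"]),
            end_line=int(item["endLine"]),
        )
        for item in payload.get("matches", [])
    ]


# Made-up response used only to exercise the parser.
SAMPLE_RESPONSE = """
{
  "matches": [
    {
      "component": "example-lib",
      "version": "1.2.3",
      "licenseName": "GPL-3.0",
      "licenseType": "Reciprocal",
      "startLine": 4,
      "endLine": 27
    }
  ]
}
"""

for match in parse_matches(SAMPLE_RESPONSE):
    print(f"{match.component} {match.version}: {match.license_name}, "
          f"lines {match.start_line}-{match.end_line}")
```

Typed records like these make it straightforward to apply a license policy or to convert the findings into another reporting format later in the pipeline.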

Figure: Black Duck Snippet API results

Getting started

We’re excited to offer this Black Duck Open Source Snippet API to any team, regardless of whether they currently use Black Duck SCA, or plan to use it in the future. This offering can also fill any license compliance gaps left by SCA tools from other vendors, since it can be leveraged alongside them. Either way, getting started is as simple as obtaining a license to use the endpoint.

Click the link below if you’re interested in learning more about the snippet analysis API, would like to see a demo, or are curious about pricing.
