| name | update-codeql-query-dataflow-java |
|---|---|
| description | Update CodeQL queries for Java and Kotlin from legacy v1 dataflow API to modern v2 shared dataflow API. Use this skill when migrating Java/Kotlin queries to use DataFlow::ConfigSig modules, ensuring query results remain equivalent through TDD. |
This skill guides you through migrating Java and Kotlin CodeQL queries from the legacy v1 (language-specific) dataflow API to the modern v2 (shared) dataflow API while ensuring query results remain equivalent.
- Migrating Java/Kotlin queries from `DataFlow::Configuration` to `DataFlow::ConfigSig` modules
- Ensuring query result equivalence during dataflow API migration
- Existing Java/Kotlin query with v1 dataflow API and unit tests
- Access to CodeQL Development MCP Server tools
v1 (Legacy):
// isSanitizer and isAdditionalTaintStep exist only on TaintTracking::Configuration
class MyConfig extends TaintTracking::Configuration {
  MyConfig() { this = "MyConfig" }
  override predicate isSource(DataFlow::Node source) { ... }
  override predicate isSink(DataFlow::Node sink) { ... }
  override predicate isSanitizer(DataFlow::Node node) { ... }
  override predicate isAdditionalTaintStep(DataFlow::Node n1, DataFlow::Node n2) { ... }
}

v2 (Modern):
module MyConfig implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) { ... }
predicate isSink(DataFlow::Node sink) { ... }
predicate isBarrier(DataFlow::Node node) { ... }
predicate isAdditionalFlowStep(DataFlow::Node n1, DataFlow::Node n2) { ... }
}
module MyFlow = TaintTracking::Global<MyConfig>;

| v1 API | v2 API | Purpose |
|---|---|---|
| `DataFlow::Configuration` | `DataFlow::ConfigSig` | Configuration signature |
| `isSanitizer` | `isBarrier` | Stop data flow propagation |
| `isAdditionalTaintStep` | `isAdditionalFlowStep` | Custom flow steps |
| `this.hasFlow(source, sink)` | `MyFlow::flow(source, sink)` | Query flow paths |
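
Which flow module to instantiate depends on the v1 base class. The renames in the mapping apply to classes that extended `TaintTracking::Configuration`. A v1 class that extended `DataFlow::Configuration` (value-preserving flow only) already used `isBarrier` and `isAdditionalFlowStep`, so its predicates keep their names and the module is passed to `DataFlow::Global` instead of `TaintTracking::Global`. The following is a minimal sketch of that second case; the config name, the `setPassword` sink, and the alert message are hypothetical and not taken from any real query.

```ql
import java
import semmle.code.java.dataflow.DataFlow

// Hypothetical config: tracks string literals into calls named "setPassword".
module HardcodedValueConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) { source.asExpr() instanceof StringLiteral }

  predicate isSink(DataFlow::Node sink) {
    exists(MethodCall mc |
      mc.getMethod().hasName("setPassword") and
      sink.asExpr() = mc.getAnArgument()
    )
  }
}

// Value-preserving flow only: DataFlow::Global, not TaintTracking::Global.
module HardcodedValueFlow = DataFlow::Global<HardcodedValueConfig>;

from DataFlow::Node source, DataFlow::Node sink
where HardcodedValueFlow::flow(source, sink)
select sink, "Hard-coded value from $@ reaches this call.", source, "this literal"
```

Switching a pure data flow configuration to `TaintTracking::Global` during migration would usually add results, which the baseline diff should reveal.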
- `ExprNode`: AST expressions (method calls, field access)
- `ParameterNode`: Method parameters
- `RemoteFlowSource`: User-controllable input sources
- `InstanceParameterNode`: Implicit `this` parameter
Critical: Before any code changes, capture current query behavior.
Use codeql_test_run to establish baseline:
{
"testPath": "<query-pack>/test/{QueryName}",
"searchPath": ["<query-pack>"]
}

Save the output - this is your reference for query result equivalence.
Create a reference file with current results:
cp <query-pack>/test/{QueryName}/{QueryName}.expected \
<query-pack>/test/{QueryName}/{QueryName}.expected.v1-baseline

This ensures you can verify equivalence after migration.
Review the query for v1 API usage:
- `class X extends DataFlow::Configuration` or `class X extends TaintTracking::Configuration`
- `isSanitizer` predicates
- `isAdditionalTaintStep` predicates
- `this.hasFlow(source, sink)` or `this.hasFlowPath(source, sink)` queries
Review dataflow constructs: sources (RemoteFlowSource, servlet/Spring params), sinks (SQL/command execution), barriers (sanitization), custom flow (collections, streams, lambdas).
Before:
class SqlInjectionConfig extends TaintTracking::Configuration {
SqlInjectionConfig() { this = "SqlInjectionConfig" }
override predicate isSource(DataFlow::Node source) {
source instanceof RemoteFlowSource
}
override predicate isSink(DataFlow::Node sink) {
exists(MethodCall mc |
mc.getMethod().hasName("executeQuery") and
sink.asExpr() = mc.getAnArgument()
)
}
override predicate isSanitizer(DataFlow::Node node) {
node.asExpr().(MethodCall).getMethod().hasName("sanitize")
}
}
from SqlInjectionConfig cfg, DataFlow::PathNode source, DataFlow::PathNode sink
where cfg.hasFlowPath(source, sink)
select sink.getNode(), source, sink, "SQL injection from $@", source.getNode(), "user input"

After:
module SqlInjectionConfig implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) {
source instanceof RemoteFlowSource
}
predicate isSink(DataFlow::Node sink) {
exists(MethodCall mc |
mc.getMethod().hasName("executeQuery") and
sink.asExpr() = mc.getAnArgument()
)
}
predicate isBarrier(DataFlow::Node node) {
node.asExpr().(MethodCall).getMethod().hasName("sanitize")
}
}
module SqlInjectionFlow = TaintTracking::Global<SqlInjectionConfig>;
from SqlInjectionFlow::PathNode source, SqlInjectionFlow::PathNode sink
where SqlInjectionFlow::flowPath(source, sink)
select sink.getNode(), source, sink, "SQL injection from $@", source.getNode(), "user input"

- `isSanitizer` → `isBarrier`: Change the predicate name and drop `override`; logic unchanged
- `isAdditionalTaintStep` → `isAdditionalFlowStep`: Change the predicate name and drop `override`; logic unchanged
Replace `cfg.hasFlow(source, sink)` with `MyFlow::flow(source, sink)`:
- Remove configuration variable from `from` clause
- Use module flow predicate directly
- For path queries, use `MyFlow::PathNode` and `MyFlow::flowPath(source, sink)`
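
For a query that is not a path query, the replacement described above might look like the following sketch. It reuses the `MyConfig` and `MyFlow` names from the earlier comparison; the `executeQuery` sink and the alert message are illustrative assumptions. Note that no configuration variable appears in the `from` clause, and the nodes are plain `DataFlow::Node` values rather than `PathNode`s.

```ql
/**
 * @kind problem
 */

import java
import semmle.code.java.dataflow.TaintTracking
import semmle.code.java.dataflow.FlowSources

module MyConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) { source instanceof RemoteFlowSource }

  // Illustrative sink: any argument of a call to a method named "executeQuery".
  predicate isSink(DataFlow::Node sink) {
    exists(MethodCall mc |
      mc.getMethod().hasName("executeQuery") and
      sink.asExpr() = mc.getAnArgument()
    )
  }
}

module MyFlow = TaintTracking::Global<MyConfig>;

// v1: from MyConfig cfg, DataFlow::Node source, DataFlow::Node sink
//     where cfg.hasFlow(source, sink)
from DataFlow::Node source, DataFlow::Node sink
where MyFlow::flow(source, sink)
select sink, "Tainted by $@.", source, "user input"
```

A real query would also carry the usual `@name`, `@id`, and related metadata, omitted here for brevity.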
RemoteFlowSource works identically in v1 and v2:
predicate isSource(DataFlow::Node source) {
source instanceof RemoteFlowSource or
// Spring framework
exists(Parameter p | p.getAnAnnotation().getType().hasQualifiedName("org.springframework.web.bind.annotation", "RequestParam") |
source.asParameter() = p
) or
// Servlet API: getParameter is declared on ServletRequest, which HttpServletRequest extends
exists(MethodCall mc | mc.getMethod().hasQualifiedName("javax.servlet", "ServletRequest", "getParameter") |
source.asExpr() = mc
)
}

Track flows through collections and streams:
predicate isAdditionalFlowStep(DataFlow::Node n1, DataFlow::Node n2) {
// Flow through collection add/get
exists(MethodCall add, MethodCall get |
add.getMethod().hasName("add") and
get.getMethod().hasName("get") and
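// Compare the underlying variable: the two qualifier expressions are distinct AST nodes
add.getQualifier().(VarAccess).getVariable() = get.getQualifier().(VarAccess).getVariable() and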
n1.asExpr() = add.getAnArgument() and
n2.asExpr() = get
) or
// Flow through Stream operations
exists(MethodCall stream |
stream.getMethod().hasName(["map", "flatMap", "filter"]) and
n1.asExpr() = stream.getQualifier() and
n2.asExpr() = stream
)
}

For functional programming patterns:
predicate isAdditionalFlowStep(DataFlow::Node n1, DataFlow::Node n2) {
// Flow through lambda parameters to body
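// A lambda's parameters and body belong to its implicit method (asMethod())
exists(LambdaExpr lambda, Method m |
m = lambda.asMethod() and
n1.asParameter() = m.getAParameter() and
DataFlow::localFlow(n1, n2) and
n2.asExpr().getEnclosingCallable() = m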
) or
// Flow through method references
exists(MemberRefExpr ref |
n1.asExpr() = ref.getQualifier() and
n2.asExpr() = ref
)
}

Handle Java type casts and Kotlin smart casts:
predicate isAdditionalFlowStep(DataFlow::Node n1, DataFlow::Node n2) {
// Flow through type casts
exists(CastExpr cast |
n1.asExpr() = cast.getExpr() and
n2.asExpr() = cast
) or
// Kotlin NotNullExpr
exists(NotNullExpr notNull |
n1.asExpr() = notNull.getExpr() and
n2.asExpr() = notNull
)
}

Java/Kotlin have implicit conversions to track:
predicate isAdditionalFlowStep(DataFlow::Node n1, DataFlow::Node n2) {
// String concatenation
exists(AddExpr concat |
concat.getType() instanceof TypeString and
n1.asExpr() = concat.getAnOperand() and
n2.asExpr() = concat
) or
// Boxing/unboxing
exists(Expr e |
(e instanceof BoxExpr or e instanceof UnboxExpr) and
n1.asExpr() = e.(ConversionExpr).getExpr() and
n2.asExpr() = e
)
}

Use codeql_query_compile to check for errors:
{
"queryPath": "<query-pack>/src/{QueryName}/{QueryName}.ql",
"searchPath": ["<query-pack>"]
}

Fix any compilation errors before testing.
Use codeql_test_run on migrated query:
{
"testPath": "<query-pack>/test/{QueryName}",
"searchPath": ["<query-pack>"]
}

Critical: Results MUST match baseline from Phase 1.
Compare results line-by-line:
diff <query-pack>/test/{QueryName}/{QueryName}.expected.v1-baseline \
<query-pack>/test/{QueryName}/{QueryName}.expected

- Success: Empty diff (identical results)
- Failure: Any differences require investigation and fixes
Add tests for lambdas, streams, generics, Kotlin features, and frameworks. For each: add test code, update the .expected file, re-extract with codeql_test_extract, and rerun the tests.
Run query on realistic database. If performance degrades: cache expensive predicates, use local flow where possible, limit scope.
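
Two of these mitigations, caching an expensive predicate and limiting scope, can be sketched as follows. This is an illustration under stated assumptions rather than a tuning recipe: `isQueryArgument`, `ScopedConfig`, and `ScopedFlow` are hypothetical names, and whether `cached` helps at all depends on how often the predicate is reused.

```ql
import java
import semmle.code.java.dataflow.TaintTracking
import semmle.code.java.dataflow.FlowSources

/**
 * Holds if `arg` is an argument of a call to a method named `executeQuery`.
 * The `cached` annotation asks the evaluator to store this relation so it
 * can be reused instead of recomputed.
 */
cached
predicate isQueryArgument(Expr arg) {
  exists(MethodCall mc |
    mc.getMethod().hasName("executeQuery") and
    arg = mc.getAnArgument()
  )
}

module ScopedConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) { source instanceof RemoteFlowSource }

  predicate isSink(DataFlow::Node sink) {
    isQueryArgument(sink.asExpr()) and
    // Limit scope: only consider sinks located in analyzed source files.
    sink.asExpr().getCompilationUnit().fromSource()
  }
}

module ScopedFlow = TaintTracking::Global<ScopedConfig>;

from DataFlow::Node source, DataFlow::Node sink
where ScopedFlow::flow(source, sink)
select sink, "Tainted by $@.", source, "user input"
```

Narrowing scope can change the result set, so any such change should be made after the mechanical migration and checked again against the baseline diff.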
Ensure query metadata reflects v2 API usage:
/**
* @name SQL Injection
* @description Executing SQL queries with untrusted user input
* @kind path-problem
* @id java/sql-injection
* @tags security external/cwe/cwe-089
* @precision high
*/
import java
import semmle.code.java.dataflow.TaintTracking
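// v2 path queries import PathGraph from the flow module (here SqlInjectionFlow), not from DataFlow
import SqlInjectionFlow::PathGraph

Remove v1 baseline files, add migration notes if needed, format with codeql_query_format.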
// Spring MVC request parameters
predicate isSource(DataFlow::Node source) {
exists(Parameter p |
p.getAnAnnotation().getType().hasQualifiedName("org.springframework.web.bind.annotation", ["RequestParam", "PathVariable", "RequestBody"]) and
source.asParameter() = p
)
}

// Servlet request sources
predicate isSource(DataFlow::Node source) {
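  exists(MethodCall mc, Method m |
    m = mc.getMethod() and
    (
      // getParameter is declared on ServletRequest, the supertype of HttpServletRequest
      m.hasQualifiedName("jakarta.servlet", "ServletRequest", "getParameter")
      or
      // HttpServletRequest declares getCookies (plural); there is no getCookie method
      m.hasQualifiedName("jakarta.servlet.http", "HttpServletRequest", ["getHeader", "getCookies"])
    ) and
    source.asExpr() = mc
  )
}

// Kotlin when expression flow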
predicate isAdditionalFlowStep(DataFlow::Node n1, DataFlow::Node n2) {
exists(WhenExpr when, WhenBranch branch |
when.getBranch(_) = branch and
n1.asExpr() = when.getExpr() and
n2.asExpr() = branch.getResult()
)
}
// Kotlin extension function receiver flow
predicate isAdditionalFlowStep(DataFlow::Node n1, DataFlow::Node n2) {
exists(ExtensionMethod ext, MethodCall call |
call.getMethod() = ext and
n1.asExpr() = call.getQualifier() and
n2.(InstanceParameterNode).getCallable() = ext
)
}

// Sink: Dynamic JPQL/HQL query
predicate isSink(DataFlow::Node sink) {
exists(MethodCall mc |
mc.getMethod().getDeclaringType().hasQualifiedName("javax.persistence", "EntityManager") and
mc.getMethod().hasName("createQuery") and
sink.asExpr() = mc.getArgument(0)
)
}

- `codeql_test_run`: Run tests and compare with expected results
- `codeql_test_extract`: Extract test databases from Java/Kotlin source code
- `codeql_query_compile`: Compile queries and check for errors
- `codeql_query_run`: Run queries for analysis
- `codeql_bqrs_decode`: Decode binary query results
- `codeql_query_format`: Format query files for consistency
- `codeql_pack_install`: Install query pack dependencies
❌ Don't:
- Skip baseline test establishment before migration
- Change query logic alongside API migration (separate concerns)
- Accept test results without verifying equivalence
- Remove v1 baseline until migration is confirmed successful
- Ignore performance regressions
- Forget to update imports if needed
- Overlook Kotlin-specific patterns when migrating mixed Java/Kotlin queries
✅ Do:
- Establish test baseline BEFORE any changes
- Make purely mechanical API changes first
- Verify exact result equivalence after migration
- Keep v1 baseline for comparison during migration
- Test edge cases specific to Java (generics, lambdas, streams) and Kotlin (when, extensions, null safety)
- Document any intentional behavior changes separately
If results differ after migration:
- Check node type conversions: Ensure `asExpr()`, `asParameter()` usage is correct
- Verify predicate renames: Confirm `isBarrier` vs `isSanitizer` logic is identical
- Review flow predicates: Check `isAdditionalFlowStep` mirrors `isAdditionalTaintStep`
- Inspect missing results: Use `MyFlow::flow(source, sink)` for debugging partial flows
- Debug with PrintAST: Use `codeql_query_run` to understand AST structure
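
When a result is missing and it is unclear where flow stops, the partial flow exploration offered by the v2 flow modules can help. The sketch below assumes the `FlowExplorationFwd` module and its `partialFlow` predicate as described in the CodeQL guide on debugging data flow queries with partial flow; check both against the library version in use. `MyConfig`, `MyPartialFlow`, and the exploration limit of 5 are illustrative choices.

```ql
/**
 * @kind problem
 */

import java
import semmle.code.java.dataflow.TaintTracking
import semmle.code.java.dataflow.FlowSources

module MyConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) { source instanceof RemoteFlowSource }

  // Partial flow ignores sinks, so an empty sink set is fine while debugging.
  predicate isSink(DataFlow::Node sink) { none() }
}

module MyFlow = TaintTracking::Global<MyConfig>;

// Assumed bound on how many interprocedural steps are explored from each source.
int explorationLimit() { result = 5 }

module MyPartialFlow = MyFlow::FlowExplorationFwd<explorationLimit/0>;

from MyPartialFlow::PartialPathNode source, MyPartialFlow::PartialPathNode node, int dist
where MyPartialFlow::partialFlow(source, node, dist)
select node.getNode(), "Reached from $@ at distance " + dist.toString() + ".",
  source.getNode(), "this source"
```

Restricting `isSource` to a single known source keeps the output manageable; the last nodes reached before the expected sink usually point at the missing step or an over-eager barrier.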
- New dataflow API for writing custom CodeQL queries - Official v2 API announcement
- Analyzing data flow in Java and Kotlin - Java/Kotlin dataflow guide
- CodeQL Java Library Reference - Standard library documentation
- Create CodeQL Query TDD Generic - TDD workflow for queries
- Create CodeQL Query Unit Test for Java - Java unit testing guide
- QSpec Reference for Java - Java-specific QSpec patterns
- Java Query Development Prompts - Java query guidance
Your dataflow migration is successful when:
- ✅ Test baseline established before migration
- ✅ Query compiles without errors using v2 API
- ✅ All configuration classes converted to modules
- ✅ All `isSanitizer` renamed to `isBarrier`
- ✅ All `isAdditionalTaintStep` renamed to `isAdditionalFlowStep`
- ✅ All `cfg.hasFlow()` calls replaced with module flow predicates
- ✅ Test results EXACTLY match v1 baseline (zero diff)
- ✅ No performance regressions
- ✅ Query metadata updated appropriately
- ✅ Java/Kotlin-specific patterns (lambdas, streams, Kotlin features) handled correctly