| name | update-codeql-query-dataflow-javascript |
|---|---|
| description | Update CodeQL queries for JavaScript and TypeScript from legacy v1 dataflow API to modern v2 shared dataflow API. Use this skill when migrating JavaScript/TypeScript queries to use DataFlow::ConfigSig modules, ensuring query results remain equivalent through TDD. |
This skill guides you through migrating JavaScript/TypeScript CodeQL queries from the legacy v1 (language-specific) dataflow API to the modern v2 (shared) dataflow API while ensuring query results remain equivalent.
- Migrating JavaScript/TypeScript queries from `DataFlow::Configuration` to `DataFlow::ConfigSig` modules
- Updating queries to use the shared dataflow library (v2 API)
- Ensuring query result equivalence during dataflow API migration
Migrated queries must produce exactly the same results as the original queries. Result changes cause alert flapping, trust issues, and CI/CD disruption.
JavaScript-specific behavioral changes in v2:
- Taint steps propagate all flow states (not just the `taint` label)
- Jump steps across function boundaries behave differently
- Barriers block all flows (even when tracked value is inside content)
Use TDD with comprehensive unit tests to ensure equivalence.
- Existing JavaScript/TypeScript query with v1 dataflow API and unit tests
- Access to CodeQL Development MCP Server tools
v1 (Legacy):
class MyConfig extends DataFlow::Configuration {
MyConfig() { this = "MyConfig" }
override predicate isSource(DataFlow::Node source, FlowLabel label) { ... }
override predicate isSink(DataFlow::Node sink, FlowLabel label) { ... }
override predicate isSanitizer(DataFlow::Node node) { ... }
override predicate isAdditionalTaintStep(DataFlow::Node n1, DataFlow::Node n2) { ... }
}

v2 (Modern):
module MyConfig implements DataFlow::StateConfigSig {
class FlowState = string;
predicate isSource(DataFlow::Node source, FlowState state) { ... }
predicate isSink(DataFlow::Node sink, FlowState state) { ... }
predicate isBarrier(DataFlow::Node node) { ... }
predicate isAdditionalFlowStep(DataFlow::Node n1, FlowState state1, DataFlow::Node n2, FlowState state2) { ... }
}
module MyFlow = TaintTracking::GlobalWithState<MyConfig>;

| v1 API | v2 API |
|---|---|
| `DataFlow::Configuration` | `DataFlow::ConfigSig` |
| `FlowLabel` | `FlowState` |
| `isSanitizer` | `isBarrier` |
| `isAdditionalTaintStep` | `isAdditionalFlowStep` |
| `this.hasFlow(source, sink)` | `MyFlow::flow(source, sink)` |
| `isSanitizerGuard` | `isBarrierGuard` |
Critical: Before any code changes, capture current query behavior.
Use codeql_test_run to establish baseline:
{
"testPath": "<query-pack>/test/{QueryName}",
"searchPath": ["<query-pack>"]
}

Save the output: this is your reference for query result equivalence.
Create a reference file with current results:
cp <query-pack>/test/{QueryName}/{QueryName}.expected \
<query-pack>/test/{QueryName}/{QueryName}.expected.v1-baseline

This ensures you can verify equivalence after migration.
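The placeholder paths used throughout this workflow follow a single layout. The sketch below spells that layout out as a small function. The name `migrationPaths` and its return shape are hypothetical and exist only for illustration; only the path patterns themselves come from the steps in this guide.

```typescript
// Hypothetical helper: derives the paths this guide refers to from a
// query pack root and a query name. Only the path layout comes from the guide.
interface MigrationPaths {
  queryPath: string;    // <query-pack>/src/{QueryName}/{QueryName}.ql
  testPath: string;     // <query-pack>/test/{QueryName}
  expectedPath: string; // current expected results
  baselinePath: string; // copy of the v1 results kept for comparison
}

function migrationPaths(queryPack: string, queryName: string): MigrationPaths {
  // Drop trailing slashes so joined paths never contain "//".
  const root = queryPack.replace(/\/+$/, "");
  const testPath = root + "/test/" + queryName;
  const expectedPath = testPath + "/" + queryName + ".expected";
  return {
    queryPath: root + "/src/" + queryName + "/" + queryName + ".ql",
    testPath: testPath,
    expectedPath: expectedPath,
    baselinePath: expectedPath + ".v1-baseline",
  };
}
```

Keeping the baseline next to the `.expected` file, under a distinct suffix, means the test runner keeps using the regular expected file while the v1 results stay available for comparison.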
Review the query for v1 API usage: `DataFlow::Configuration` classes, `FlowLabel`, `isSanitizer`, `isAdditionalTaintStep`, `isSanitizerGuard`, and `this.hasFlow()`.
Identify JavaScript constructs: HTTP/DOM sources, eval/innerHTML sinks, flow labels, Promise/async flows, prototype chains.
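The review for v1 API usage can be partly mechanized. The following is a minimal sketch that scans a query's source text for the v1 identifiers named in this guide. The function name `findLegacyApiUses` and the regex-based approach are assumptions made for illustration, not part of any CodeQL tooling, and a plain text scan will also match identifiers that appear only in comments.

```typescript
// Sketch: report which v1 dataflow API markers appear in a query's source text.
// The marker list mirrors the identifiers named in this guide; everything else
// here (names, regex approach) is an illustrative assumption.
const LEGACY_MARKERS: { name: string; pattern: RegExp }[] = [
  { name: "DataFlow::Configuration", pattern: /\bDataFlow::Configuration\b/ },
  { name: "TaintTracking::Configuration", pattern: /\bTaintTracking::Configuration\b/ },
  { name: "FlowLabel", pattern: /\bFlowLabel\b/ },
  { name: "isSanitizer", pattern: /\bisSanitizer\b/ },
  { name: "isSanitizerGuard", pattern: /\bisSanitizerGuard\b/ },
  { name: "isAdditionalTaintStep", pattern: /\bisAdditionalTaintStep\b/ },
  { name: "hasFlow", pattern: /\.hasFlow(Path)?\(/ },
];

function findLegacyApiUses(qlSource: string): string[] {
  const found: string[] = [];
  for (let i = 0; i < LEGACY_MARKERS.length; i++) {
    // Patterns have no "g" flag, so test() keeps no state between calls.
    if (LEGACY_MARKERS[i].pattern.test(qlSource)) {
      found.push(LEGACY_MARKERS[i].name);
    }
  }
  return found;
}
```

An empty result after migration is a quick sanity check that no v1 construct was left behind; it says nothing about whether the results are equivalent.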
Before (v1 with FlowLabel):
class XssConfig extends TaintTracking::Configuration {
XssConfig() { this = "XssConfig" }
override predicate isSource(DataFlow::Node source, FlowLabel label) {
label = "taint" and
source instanceof RemoteFlowSource
}
override predicate isSink(DataFlow::Node sink, FlowLabel label) {
label = "taint" and
exists(DOM::DomMethodCallExpr call |
call.getMethodName() = "write" and
sink = call.getAnArgument()
)
}
override predicate isSanitizer(DataFlow::Node node) {
node = any(SanitizationCall c).getResult()
}
}
from XssConfig cfg, DataFlow::PathNode source, DataFlow::PathNode sink
where cfg.hasFlowPath(source, sink)
select sink.getNode(), source, sink, "XSS from $@", source.getNode(), "user input"

After (v2 with FlowState):
module XssConfig implements DataFlow::StateConfigSig {
class FlowState = string;
predicate isSource(DataFlow::Node source, FlowState state) {
state = "taint" and
source instanceof RemoteFlowSource
}
predicate isSink(DataFlow::Node sink, FlowState state) {
state = "taint" and
exists(DOM::DomMethodCallExpr call |
call.getMethodName() = "write" and
sink = call.getAnArgument()
)
}
predicate isBarrier(DataFlow::Node node) {
node = any(SanitizationCall c).getResult()
}
}
module XssFlow = TaintTracking::GlobalWithState<XssConfig>;
from XssFlow::PathNode source, XssFlow::PathNode sink
where XssFlow::flowPath(source, sink)
select sink.getNode(), source, sink, "XSS from $@", source.getNode(), "user input"

If the query doesn't use flow labels, use the simpler `DataFlow::ConfigSig`:
module SimpleConfig implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) { ... }
predicate isSink(DataFlow::Node sink) { ... }
predicate isBarrier(DataFlow::Node node) { ... }
}
module SimpleFlow = TaintTracking::Global<SimpleConfig>;

- `isSanitizer` → `isBarrier` (logic unchanged)
- `isAdditionalTaintStep` → `isAdditionalFlowStep` (logic unchanged)
- `isSanitizerGuard` → `isBarrierGuard` (update signature)

v1:
override predicate isSanitizerGuard(TaintTracking::SanitizerGuardNode guard) {
guard instanceof WhitelistGuard
}

v2:
predicate isBarrierGuard(DataFlow::BarrierGuard guard) {
guard instanceof WhitelistGuard
}

Use `DataFlow::MakeBarrierGuard` to create barrier guards:
class WhitelistGuard extends DataFlow::BarrierGuard {
WhitelistGuard() {
this = DataFlow::MakeBarrierGuard::equalityTest(_, _, _, true)
}
override predicate checks(Expr e, boolean branch) {
// Define guard logic
}
}

Track flows through asynchronous operations:
predicate isAdditionalFlowStep(DataFlow::Node n1, DataFlow::Node n2) {
// Promise.then() flow
exists(DataFlow::MethodCallNode then |
then.getMethodName() = "then" and
n1 = then.getReceiver() and
n2 = then.getCallback(0).getParameter(0)
)
or
// async/await implicit flow
exists(AwaitExpr await |
n1.asExpr() = await.getOperand() and
n2.asExpr() = await
)
}

Track flows through DOM operations:
predicate isSource(DataFlow::Node source) {
// URL parameters
source = DOM::locationSource("search").getALocalSource() or
source = DOM::locationSource("hash").getALocalSource() or
// localStorage/sessionStorage
exists(DataFlow::CallNode storage |
storage = DataFlow::globalVarRef(["localStorage", "sessionStorage"]).getAMethodCall("getItem") and
source = storage
)
}
predicate isSink(DataFlow::Node sink) {
// innerHTML assignment
exists(DataFlow::PropWrite write |
write.getPropertyName() = "innerHTML" and
sink = write.getRhs()
)
or
// document.write
exists(DataFlow::CallNode write |
write = DataFlow::globalVarRef("document").getAMethodCall("write") and
sink = write.getAnArgument()
)
}

Express.js Sources:
predicate isSource(DataFlow::Node source) {
exists(Express::RouteHandler handler |
source = handler.getARequestExpr(["params", "query", "body"])
)
}

React Sinks:
predicate isSink(DataFlow::Node sink) {
exists(DataFlow::PropWrite write |
write.getPropertyName() = "dangerouslySetInnerHTML" and
sink = write.getRhs().getAPropertyWrite("__html").getRhs()
)
}

Track flows through prototype modifications:
predicate isAdditionalFlowStep(DataFlow::Node n1, DataFlow::Node n2) {
// Object.assign flow
exists(DataFlow::CallNode assign |
assign = DataFlow::globalVarRef("Object").getAMethodCall("assign") and
n1 = assign.getAnArgument() and
n2 = assign
)
or
// Spread operator flow
exists(SpreadElement spread |
n1.asExpr() = spread.getOperand() and
n2.asExpr() = spread.getParent()
)
}

Track flows through module boundaries:
predicate isAdditionalFlowStep(DataFlow::Node n1, DataFlow::Node n2) {
// CommonJS require/exports
exists(Module m |
n1 = m.getAnExportedValue(_) and
n2 = m.getAnImportedValue(_)
)
or
// ES6 import/export
exists(ImportDeclaration imp |
n1 = imp.getImportedModule().getAnExportedValue(_) and
n2 = imp.getASpecifier().getLocal().getAnAssignedValue()
)
}

Use `codeql_query_compile` to check for errors:
{
"queryPath": "<query-pack>/src/{QueryName}/{QueryName}.ql",
"searchPath": ["<query-pack>"]
}

Fix any compilation errors before testing.
Use codeql_test_run on migrated query:
{
"testPath": "<query-pack>/test/{QueryName}",
"searchPath": ["<query-pack>"]
}

Critical: Results MUST match the baseline captured before migration.
Compare results line-by-line:
diff <query-pack>/test/{QueryName}/{QueryName}.expected.v1-baseline \
<query-pack>/test/{QueryName}/{QueryName}.expected

- Success: Empty diff (identical results)
- Failure: Any differences require investigation

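When the comparison is not empty, it helps to know whether results disappeared or appeared, because the two cases point at different causes. The sketch below separates them. The names `compareExpected` and `ResultDiff` are hypothetical, and the inputs are assumed to be the text contents of the two `.expected` files. Unlike `diff`, this comparison ignores line order and duplicate lines, so an empty `diff` remains the actual pass criterion.

```typescript
// Sketch: split the difference between two .expected files into results that
// disappeared after migration and results that are new. Names are illustrative.
interface ResultDiff {
  lost: string[];  // present in the v1 baseline only
  added: string[]; // present in the migrated output only
}

function nonEmptyLines(text: string): string[] {
  return text.split(/\r?\n/).filter(function (line) {
    return line.trim() !== "";
  });
}

function compareExpected(baseline: string, migrated: string): ResultDiff {
  const before = nonEmptyLines(baseline);
  const after = nonEmptyLines(migrated);
  return {
    lost: before.filter(function (line) { return after.indexOf(line) === -1; }),
    added: after.filter(function (line) { return before.indexOf(line) === -1; }),
  };
}
```

Entries in `added` are candidates for the all-state taint propagation change, while entries in `lost` are worth checking against barriers and interprocedural (jump) steps first.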
Taint propagation: v2 propagates all flow states (not just taint). If new results: add state-specific steps, use barriers, update sources/sinks.
Jump steps: v2 handles cross-function boundaries differently. Review interprocedural flows, add flow steps, verify barriers if results differ.
Cover: Promise chains, async/await, DOM manipulation, frameworks, prototype pollution, modules, TypeScript patterns.
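A test-case file covering several of these patterns might look like the sketch below. `source` and `sink` are hypothetical stand-ins so the file runs on its own; in a real test case, replace them with the sources and sinks that the query under migration actually models. Keeping one flow per function makes it easier to trace a changed result back to a single pattern.

```typescript
// Hypothetical stand-ins for a taint source and sink. In a real test case, use
// the sources and sinks the query under migration actually models.
const reached: string[] = [];
function source(): string { return "tainted"; }
function sink(value: string): void { reached.push(value); }

// Promise chain: the value crosses a .then() callback.
function viaPromise(): Promise<void> {
  return Promise.resolve(source()).then(function (v) { sink(v); });
}

// async/await: the value is produced by awaiting a promise.
async function viaAwait(): Promise<void> {
  const v = await Promise.resolve(source());
  sink(v);
}

// Callback across a function boundary.
function withInput(cb: (input: string) => void): void { cb(source()); }
function viaCallback(): void {
  withInput(function (input) { sink(input); });
}

// Object spread: the value is copied into a new object before use.
function viaSpread(): void {
  const original = { payload: source() };
  const copy = { ...original, extra: 1 };
  sink(copy.payload);
}

// TypeScript pattern: the value passes through a type assertion.
function viaAssertion(): void {
  const raw: unknown = source();
  sink(raw as string);
}
```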
- Run on realistic database; optimize if needed (cache predicates, use local flow)
- Update query metadata to reflect v2 API
- Remove v1 baseline files and format with `codeql_query_format`

// Command injection sinks
predicate isSink(DataFlow::Node sink) {
exists(SystemCommandExecution cmd |
sink = cmd.getACommandArgument()
)
}
// File system operations
predicate isSink(DataFlow::Node sink) {
exists(FileSystemAccess fs |
sink = fs.getAPathArgument()
)
}

predicate isBarrierGuard(DataFlow::BarrierGuard guard) {
guard instanceof TypeGuard
}
class TypeGuard extends DataFlow::BarrierGuard {
TypeGuard() {
this = any(TypeAssertion ta)
}
override predicate checks(Expr e, boolean branch) {
// Define type guard logic
}
}

predicate isSource(DataFlow::Node source) {
// Cookies
source = DOM::documentRef().getAPropertySource("cookie") or
// Storage APIs
exists(DataFlow::CallNode storage |
storage.getReceiver() = DataFlow::globalVarRef(["localStorage", "sessionStorage"]) and
storage.getCalleeName() = "getItem" and
source = storage
)
}

- `codeql_test_run`: Run tests and compare with expected results
- `codeql_test_extract`: Extract test databases from JavaScript/TypeScript source code
- `codeql_query_compile`: Compile queries and check for errors
- `codeql_query_run`: Run queries for analysis
- `codeql_bqrs_decode`: Decode binary query results
- `codeql_query_format`: Format query files for consistency
- `codeql_pack_install`: Install query pack dependencies

❌ Don't:
- Skip baseline test establishment
- Change query logic alongside API migration
- Accept results without verifying equivalence
- Remove v1 baseline prematurely
- Ignore behavioral changes in taint propagation or flow labels/states
✅ Do:
- Establish test baseline BEFORE changes
- Make mechanical API changes first
- Verify exact result equivalence
- Test JavaScript patterns (async, promises, DOM, frameworks)
- Document intentional behavior changes
If results differ after migration:
- Check flow state usage: Ensure flow states match v1 flow labels
- Verify taint propagation: Review if new results are due to all-state propagation
- Inspect barrier guards: Confirm the `isBarrierGuard` implementation matches v1 semantics
- Review jump steps: Check interprocedural flows (callbacks, promises)
- Debug with partial flow: Use flow exploration to find missing/new edges
- Check barrier behavior: Verify barriers block appropriately (v2 blocks all flows including content)
- New dataflow API for writing custom CodeQL queries - Official v2 API announcement
- Migrating JavaScript Dataflow Queries - JavaScript-specific migration guide
- Analyzing data flow in JavaScript and TypeScript - JavaScript dataflow guide
- CodeQL JavaScript Library Reference - Standard library documentation
- Create CodeQL Query TDD Generic - TDD workflow for queries
Your dataflow migration is successful when:
- ✅ Test baseline established before migration
- ✅ Query compiles without errors using v2 API
- ✅ All configuration classes converted to modules
- ✅ All flow labels migrated to flow states (if applicable)
- ✅ All `isSanitizer` renamed to `isBarrier`
- ✅ All `isAdditionalTaintStep` renamed to `isAdditionalFlowStep`
- ✅ All `isSanitizerGuard` renamed to `isBarrierGuard` with updated signature
- ✅ All `cfg.hasFlow()` calls replaced with module flow predicates
- ✅ Test results EXACTLY match v1 baseline (zero diff) OR documented behavioral differences are understood and accepted
- ✅ No performance regressions
- ✅ Query metadata updated appropriately
- ✅ JavaScript/TypeScript-specific patterns (async/await, promises, DOM, frameworks) handled correctly