| name | update-codeql-query-dataflow-csharp |
|---|---|
| description | Upgrade C# CodeQL queries from legacy (v1) language-specific dataflow API to modern (v2) shared dataflow API while ensuring query result equivalence through test-driven development. Use this skill when modernizing C# dataflow queries to use the unified dataflow library. |
This skill guides you through migrating C# CodeQL queries from the legacy v1 (language-specific) dataflow API to the modern v2 (shared) dataflow API while ensuring query result equivalence.
- Migrating legacy C# queries using
DataFlow::Configurationto modernDataFlow::ConfigSig - Updating queries to use the shared dataflow library introduced in 2023
- Modernizing C# queries that use deprecated dataflow patterns
- Ensuring migrated queries produce identical results as the original
The most important outcome is ensuring the migrated query produces exactly the same results as the original query. Result changes due to query migration alone can cause:
- Alert flapping: Issues appearing and disappearing without code changes
- False confidence: Developers may lose trust in CodeQL analysis
- Deployment issues: CI/CD pipelines may fail due to new/changed alerts
Test-Driven Development (TDD) is mandatory for dataflow migration to guarantee result equivalence.
- Configuration: Uses
DataFlow::Configurationclasses - Predicates:
isSource(),isSink(),isSanitizer(),isAdditionalTaintStep() - Library: Language-specific dataflow library (e.g.,
semmle.code.csharp.dataflow.DataFlow) - Nodes: Language-specific node types (e.g.,
ExprNode,ParameterNode)
- Configuration: Uses
DataFlow::ConfigSigsignature modules withDataFlow::Global<ConfigSig> - Predicates:
isSource(),isSink(),isBarrier(),isAdditionalFlowStep() - Library: Shared dataflow library (e.g.,
semmle.code.csharp.dataflow.DataFlow2) - Nodes: Unified
DataFlow::Nodetype with consistent semantics
| Aspect | v1 API | v2 API |
|---|---|---|
| Configuration | class Config extends DataFlow::Configuration |
module ConfigSig implements DataFlow::ConfigSig + module Config = DataFlow::Global<ConfigSig> |
| Sanitizer | isSanitizer(DataFlow::Node node) |
isBarrier(DataFlow::Node node) |
| Custom flow | isAdditionalTaintStep(DataFlow::Node n1, DataFlow::Node n2) |
isAdditionalFlowStep(DataFlow::Node n1, DataFlow::Node n2) |
| Flow check | config.hasFlow(source, sink) |
Config::flow(source, sink) |
| Flow path | config.hasFlowPath(source, sink) |
Config::flowPath(source, sink) |
Review the existing v1 query: metadata (@name, @kind, @id), sources, sinks, barriers, and custom flow steps.
Run existing tests with codeql_test_run and capture baseline results - the migrated query must match these exactly:
{
"testPath": "<query-pack>/test/{QueryName}",
"searchPath": ["<query-pack>"]
}Add test cases for positive (vulnerable), negative (safe), and edge cases (sanitization, async/await, LINQ, properties). Update .expected file, then extract and run tests:
{ "testPath": "<query-pack>/test/{QueryName}", "searchPath": ["<query-pack>"] }Create new query file (or backup original and modify in place):
Import v2 dataflow library:
import csharp
import semmle.code.csharp.dataflow.TaintTracking
import semmle.code.csharp.security.dataflow.SqlInjectionQueryDefine configuration signature:
/**
* Configuration for detecting [vulnerability description]
*/
module MyConfigSig implements DataFlow::ConfigSig {
predicate isSource(DataFlow::Node source) {
// Define sources - user-controllable input
source instanceof RemoteFlowSource or
exists(Parameter p |
p.fromSource() and
source.asParameter() = p
)
}
predicate isSink(DataFlow::Node sink) {
// Define sinks - dangerous operations
exists(MethodCall call |
call.getTarget().hasName("FromSqlRaw") and
call.getAnArgument() = sink.asExpr()
)
}
predicate isBarrier(DataFlow::Node node) {
// Define barriers - sanitization/validation
// Renamed from isSanitizer in v1
node instanceof SanitizerGuard or
exists(MethodCall sanitize |
sanitize.getTarget().hasName(["Sanitize", "Validate", "Encode"]) and
DataFlow::localFlow(sanitize, node)
)
}
predicate isAdditionalFlowStep(DataFlow::Node node1, DataFlow::Node node2) {
// Define custom flow steps
// Renamed from isAdditionalTaintStep in v1
// Example: flow through string formatting
exists(MethodCall format |
format.getTarget().hasName("Format") and
format.getAnArgument() = node1.asExpr() and
format = node2.asExpr()
)
}
}Instantiate global flow module:
module MyConfig = TaintTracking::Global<MyConfigSig>;Update query select:
from DataFlow::Node source, DataFlow::Node sink
where MyConfig::flow(source, sink)
select sink, "Dataflow from $@ to sink", source, "user input"For path queries (@kind path-problem), use PathNode from the instantiated module:
import DataFlow::PathGraph
from MyConfig::PathNode source, MyConfig::PathNode sink
where MyConfig::flowPath(source, sink)
select sink.getNode(), source, sink, "Dataflow from $@ to sink", source.getNode(), "user input"RemoteFlowSource (v2 pattern):
// v1 pattern
source.asExpr().(Parameter).fromSource()
// v2 pattern
source instanceof RemoteFlowSourceBarrier Guards (renamed from sanitizers):
// v1 pattern
predicate isSanitizer(DataFlow::Node node) {
// guard logic
}
// v2 pattern
predicate isBarrier(DataFlow::Node node) {
// same guard logic
}Library Extensions (C#-specific):
// For custom library dataflow in C#
import semmle.code.csharp.dataflow.LibraryTypeDataFlow
class MyLibraryFlow extends LibraryTypeDataFlow {
override predicate callableFlow(
CallableFlowSource source,
CallableFlowSink sink,
SourceDeclarationCallable c,
boolean preservesValue
) {
// Define flow through library methods
}
}ASP.NET Patterns:
// Web input sources
source.asExpr() instanceof WebInput or
exists(Parameter p |
p.getAnAttribute().getType().hasName("FromBodyAttribute") and
source.asParameter() = p
)
// Controller action sinks
exists(MethodCall redirect |
redirect.getTarget().hasName(["Redirect", "RedirectToAction"]) and
sink.asExpr() = redirect.getAnArgument()
)Compile with codeql_query_compile, then run tests with codeql_test_run to validate result equivalence.
If tests fail, analyze differences: missing sources (v2 RemoteFlowSource coverage), barrier semantics, flow steps, or node conversions. Use codeql_query_run with PrintAST for debugging. Iterate until tests pass.
Once tests pass: extract helper predicates, update QLDoc comments, format with codeql_query_format, install pack dependencies with codeql_pack_install, and run full test suite to ensure no regressions.
// Controller actions as sources
exists(Method m, Parameter p |
m.getDeclaringType().getABaseType*().hasName("Controller") and
p = m.getAParameter() and
source.asParameter() = p
)// Database sinks
exists(MethodCall call |
call.getTarget().hasName(["FromSqlRaw", "ExecuteSqlRaw", "ExecuteSqlCommand"]) and
sink.asExpr() = call.getAnArgument()
)// Flow through LINQ
exists(LambdaExpr lambda |
node1.asExpr() = lambda.getAParameter() and
node2.asExpr() = lambda.getExpressionBody()
)// Async method flow
exists(AwaitExpr await |
node1.asExpr() = await.getExpr() and
DataFlow::localFlow(await, node2)
)codeql_test_extract: Extract test databases from C# codecodeql_test_run: Run query tests and validate result equivalencecodeql_query_compile: Compile queries and check for v2 API syntax errorscodeql_query_format: Format migrated query codecodeql_query_run: Run PrintAST for debugging AST structurecodeql_pack_install: Install pack dependenciescodeql_bqrs_decode: Decode query results for analysis
❌ Don't:
- Migrate without establishing comprehensive baseline tests
- Accept test differences without understanding root cause
- Skip testing edge cases (async, LINQ, properties)
- Ignore custom flow steps that may need adjustment
- Assume v1 and v2 semantics are identical
- Forget to update query metadata and documentation
✅ Do:
- Create extensive test coverage before migration
- Verify exact result equivalence after migration
- Test all C# language features (async, LINQ, properties, etc.)
- Review official migration guides and documentation
- Use TDD to catch regressions early
- Document migration decisions in code comments
- Format code consistently with
codeql_query_format
Before considering migration complete:
- Baseline test results captured from original v1 query
- Comprehensive test coverage (positive, negative, edge cases)
- All v1 imports replaced with v2 imports
-
DataFlow::Configurationreplaced withDataFlow::ConfigSig -
isSanitizerrenamed toisBarrier -
isAdditionalTaintSteprenamed toisAdditionalFlowStep -
hasFlow/hasFlowPathreplaced withflow/flowPath - Query compiles without errors or warnings
- All tests pass with exact result match to baseline
- No regressions in other queries (full test suite passes)
- Query properly formatted with
codeql_query_format - QLDoc comments updated to reflect v2 API
- C#-specific patterns properly migrated
- Performance is acceptable (similar to v1 query)
For detailed v2 API information, consult:
- New dataflow API for writing custom CodeQL queries - Official announcement and migration guide
- Analyzing data flow in C# - C#-specific dataflow documentation
- Create CodeQL Query Unit Test for C# - For creating comprehensive tests
- Test-Driven CodeQL Query Development - TDD methodology for queries
Your dataflow migration is successful when:
- ✅ All tests pass with exact result equivalence to v1 baseline
- ✅ Query uses v2 API consistently (
ConfigSig,isBarrier,isAdditionalFlowStep) - ✅ No deprecated v1 patterns remain
- ✅ All C#-specific patterns properly migrated
- ✅ Query compiles without warnings
- ✅ Full test suite passes (no regressions)
- ✅ Code is clean, formatted, and well-documented
- ✅ Performance is acceptable for production use