| name | update-codeql-query-dataflow-ruby |
|---|---|
| description | Update CodeQL queries for Ruby from legacy v1 dataflow API to modern v2 shared dataflow API. Use this skill when migrating Ruby queries to use DataFlow::ConfigSig modules, ensuring query results remain equivalent through TDD. |
This skill guides you through migrating Ruby CodeQL queries from the legacy v1 (language-specific) dataflow API to the modern v2 (shared) dataflow API while ensuring query result equivalence.
- Migrating Ruby queries using deprecated `DataFlow::Configuration` classes
- Updating queries to use `DataFlow::ConfigSig` modules
- Modernizing Ruby queries to use the shared dataflow library
- Ensuring query result equivalence during dataflow API migration
- Existing Ruby CodeQL query using v1 dataflow API that you want to migrate
- Existing unit tests for the query
- Understanding of the query's detection purpose
- Access to CodeQL Development MCP Server tools
v1 (Legacy):

// isSanitizer and isAdditionalTaintStep are members of TaintTracking::Configuration
class MyConfig extends TaintTracking::Configuration {
  MyConfig() { this = "MyConfig" }

  override predicate isSource(DataFlow::Node source) { ... }

  override predicate isSink(DataFlow::Node sink) { ... }

  override predicate isSanitizer(DataFlow::Node node) { ... }

  override predicate isAdditionalTaintStep(DataFlow::Node n1, DataFlow::Node n2) { ... }
}

v2 (Modern):

module MyConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) { ... }

  predicate isSink(DataFlow::Node sink) { ... }

  predicate isBarrier(DataFlow::Node node) { ... }

  predicate isAdditionalFlowStep(DataFlow::Node n1, DataFlow::Node n2) { ... }
}
module MyFlow = TaintTracking::Global<MyConfig>;

| v1 API | v2 API | Purpose |
|---|---|---|
| `DataFlow::Configuration` | `DataFlow::ConfigSig` | Configuration signature |
| `isSanitizer` | `isBarrier` | Stop data flow propagation |
| `isAdditionalTaintStep` | `isAdditionalFlowStep` | Custom flow steps |
| `this.hasFlow(source, sink)` | `MyFlow::flow(source, sink)` | Query flow paths |

Ruby dataflow uses multiple node representations:
- `ExprNode`: AST expression nodes (method calls, literals)
- `ParameterNode`: Method parameter nodes
- `CfgNodes::ExprCfgNode`: Control-flow graph nodes (returned by `asExpr()`)
- `LocalSourceNode`: Local sources for API graph analysis
- `RemoteFlowSource`: Predefined sources for user-controllable input
Critical: Before any code changes, capture current query behavior.
Use codeql_test_run to establish baseline:
{
"testPath": "<query-pack>/test/{QueryName}",
"searchPath": ["<query-pack>"]
}

Save the output - this is your reference for query result equivalence.
Create a reference file with current results:
cp <query-pack>/test/{QueryName}/{QueryName}.expected \
   <query-pack>/test/{QueryName}/{QueryName}.expected.v1-baseline

This ensures you can verify equivalence after migration.
Review the query for v1 API usage:
- `class X extends DataFlow::Configuration`
- `isSanitizer` predicates
- `isAdditionalTaintStep` predicates
- `this.hasFlow(source, sink)` queries
Identify how the query uses Ruby dataflow constructs:
- CFG node conversions (e.g., `asExpr()` returns `CfgNodes::ExprCfgNode`)
- `RemoteFlowSource` for user input (Rails `params`, HTTP requests)
- API graphs for tracking gem/framework usage (`codeql.ruby.ApiGraphs`)
- Ruby-specific sources: `ARGV`, `ENV`, Rails parameters, HTTP requests
- Ruby-specific sinks: `eval`, `send`, `system`, ActiveRecord queries
Before:

class CommandInjectionConfig extends TaintTracking::Configuration {
  CommandInjectionConfig() { this = "CommandInjectionConfig" }

  override predicate isSource(DataFlow::Node source) {
    source instanceof RemoteFlowSource
  }

  override predicate isSink(DataFlow::Node sink) {
    exists(DataFlow::CallNode call |
      call.getMethodName() = "system" and
      sink = call.getAnArgument()
    )
  }

  override predicate isSanitizer(DataFlow::Node node) {
    node = any(ShellquoteCall c).getResult()
  }
}

from CommandInjectionConfig cfg, DataFlow::Node source, DataFlow::Node sink
where cfg.hasFlow(source, sink)
select sink, "Untrusted data flows to command execution"

After:

module CommandInjectionConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) {
    source instanceof RemoteFlowSource
  }

  predicate isSink(DataFlow::Node sink) {
    exists(DataFlow::CallNode call |
      call.getMethodName() = "system" and
      sink = call.getAnArgument()
    )
  }

  predicate isBarrier(DataFlow::Node node) {
    node = any(ShellquoteCall c).getResult()
  }
}

module CommandInjectionFlow = TaintTracking::Global<CommandInjectionConfig>;

from DataFlow::Node source, DataFlow::Node sink
where CommandInjectionFlow::flow(source, sink)
select sink, "Untrusted data flows to command execution"

- `isSanitizer` → `isBarrier`: Change method name only, logic unchanged
- `isAdditionalTaintStep` → `isAdditionalFlowStep`: Change method name only
Replace cfg.hasFlow(source, sink) with MyFlow::flow(source, sink):
- Remove the configuration variable from the `from` clause
- Use the module flow predicate directly
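Queries that already report full paths (v1 `hasFlowPath` with `import DataFlow::PathGraph`) follow the same pattern with `flowPath` and the flow module's own `PathGraph`. The following is a minimal sketch, not the definitive form of any shipped query: the configuration body is abbreviated, and the metadata values (`@name`, `@id`) are placeholders chosen for illustration.

```ql
/**
 * @name Command injection (illustrative sketch)
 * @kind path-problem
 * @id rb/example/command-injection-sketch
 * @problem.severity error
 */

import codeql.ruby.DataFlow
import codeql.ruby.TaintTracking
import codeql.ruby.dataflow.RemoteFlowSources

module CommandInjectionConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) { source instanceof RemoteFlowSource }

  predicate isSink(DataFlow::Node sink) {
    exists(DataFlow::CallNode call |
      call.getMethodName() = "system" and
      // any positional argument of the call
      sink = call.getArgument(_)
    )
  }
}

module CommandInjectionFlow = TaintTracking::Global<CommandInjectionConfig>;

// Takes the place of the v1 `import DataFlow::PathGraph`
import CommandInjectionFlow::PathGraph

from CommandInjectionFlow::PathNode source, CommandInjectionFlow::PathNode sink
where CommandInjectionFlow::flowPath(source, sink)
select sink.getNode(), source, sink, "Untrusted data flows to command execution"
```

Only apply this form to queries that were already `@kind path-problem` in v1; switching a `@kind problem` query to path output changes the result columns and breaks equivalence with the baseline.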
Ruby's asExpr() returns CfgNodes::ExprCfgNode, not AST nodes:
// Pseudocode: v1 and v2 both support these conversions
DataFlow::Node n;
CfgNodes::ExprCfgNode cfgExpr = n.asExpr();  // CFG expression, not AST
Parameter param = n.asParameter();           // AST parameter, if n is a parameter node

To get AST nodes from CFG nodes:

Expr astExpr = cfgExpr.getExpr();  // Get underlying AST expression

RemoteFlowSource works identically in v1 and v2:

predicate isSource(DataFlow::Node source) {
  source instanceof RemoteFlowSource
  or
  // Rails parameters
  source.asExpr().getExpr().(MethodCall).getMethodName() = "params"
  or
  // Environment variables: a method call on ENV, such as ENV["HOME"] or ENV.fetch("HOME")
  source.asExpr().getExpr().(MethodCall).getReceiver().(ConstantReadAccess).getName() = "ENV"
  or
  // Command line arguments
  source.asExpr().getExpr().(ConstantReadAccess).getName() = "ARGV"
}

Use API graphs to track framework and gem usage. Example: Rails controller params via API::getTopLevelMember("ActionController").getMember("Base")...getReturn("params").
Track flows through Rails: ActiveRecord mass assignment (create, update), ActionView render (render with inline), hash access ([], fetch, dig).
Track flows through dynamic features: send/public_send, define_method, const_get/const_set.
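If the v1 query modelled one of these flows with `isAdditionalTaintStep`, the predicate body carries over unchanged under the new name. The sketch below is illustrative only: it assumes a hypothetical v1 step that propagated taint from a hash receiver to the result of `fetch` or `dig`, and the module name `HashAccessConfig` and the `system` sink are placeholders. The standard library may already model these methods, so keep such a step only if the v1 query had it.

```ql
import codeql.ruby.DataFlow
import codeql.ruby.TaintTracking
import codeql.ruby.dataflow.RemoteFlowSources

module HashAccessConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) { source instanceof RemoteFlowSource }

  predicate isSink(DataFlow::Node sink) {
    exists(DataFlow::CallNode call |
      call.getMethodName() = "system" and
      sink = call.getArgument(_)
    )
  }

  // Was `override predicate isAdditionalTaintStep(...)` in the v1 class
  predicate isAdditionalFlowStep(DataFlow::Node pred, DataFlow::Node succ) {
    exists(DataFlow::CallNode access |
      access.getMethodName() = ["fetch", "dig"] and
      // step from the hash being read to the value returned by the call
      pred = access.getReceiver() and
      succ = access
    )
  }
}

// With TaintTracking::Global, the extra step is treated as a taint step
module HashAccessFlow = TaintTracking::Global<HashAccessConfig>;

from DataFlow::Node source, DataFlow::Node sink
where HashAccessFlow::flow(source, sink)
select sink, "Untrusted data flows to command execution"
```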
Use codeql_query_compile to check for errors:
{
"queryPath": "<query-pack>/src/{QueryName}/{QueryName}.ql",
"searchPath": ["<query-pack>"]
}

Fix any compilation errors before testing.
Use codeql_test_run on migrated query:
{
"testPath": "<query-pack>/test/{QueryName}",
"searchPath": ["<query-pack>"]
}

Critical: Results MUST match baseline from Phase 1.
Compare results line-by-line:
diff <query-pack>/test/{QueryName}/{QueryName}.expected.v1-baseline \
     <query-pack>/test/{QueryName}/{QueryName}.expected

- Success: Empty diff (identical results)
- Failure: Any differences require investigation and fixes
Add test cases for Rails features, gems (Sinatra, Grape), metaprogramming, string interpolation, blocks/lambdas, and hash/array flows. For each: add test code, update .expected, extract with codeql_test_extract, run tests.
Run query on realistic database. If performance degrades: cache expensive predicates, use local flow where possible, limit scope, optimize API graph queries.
Update query metadata, remove v1 baseline files, add migration notes if needed, format with codeql_query_format.
Track user input in Rails applications:
predicate isSource(DataFlow::Node source) {
  // Controller parameters
  exists(DataFlow::CallNode params |
    params.getMethodName() = "params" and
    source = params.getAMethodCall()
  )
  or
  // Request headers
  exists(DataFlow::CallNode request |
    request.getMethodName() = "request" and
    source = request.getAMethodCall("headers")
  )
  or
  // Cookies
  exists(DataFlow::CallNode cookies |
    cookies.getMethodName() = "cookies" and
    source = cookies.getAMethodCall()
  )
}

Track dangerous database operations:
predicate isSink(DataFlow::Node sink) {
  // Raw SQL execution
  exists(DataFlow::CallNode query |
    query.getMethodName() in ["find_by_sql", "execute", "exec_query"] and
    sink = query.getAnArgument()
  )
  or
  // String interpolation in where clauses
  // (`where` is a reserved word in QL, so the variable is named whereCall)
  exists(DataFlow::CallNode whereCall |
    whereCall.getMethodName() = "where" and
    exists(StringInterpolation interp |
      sink.asExpr().getExpr() = interp and
      interp = whereCall.getArgument(0).asExpr().getExpr()
    )
  )
}

Track dynamic code execution:
predicate isSink(DataFlow::Node sink) {
  // eval family
  exists(DataFlow::CallNode evalCall |
    evalCall.getMethodName() in ["eval", "instance_eval", "class_eval", "module_eval"] and
    sink = evalCall.getArgument(0)
  )
  or
  // send with dynamic method names
  exists(DataFlow::CallNode send |
    send.getMethodName() in ["send", "public_send"] and
    sink = send.getArgument(0)
  )
  or
  // define_method with dynamic names
  exists(DataFlow::CallNode define |
    define.getMethodName() = "define_method" and
    sink = define.getArgument(0)
  )
}

Track flows through string interpolation components and concatenation (AddExpr).
Track flows through block parameters and lambda/proc creation with DataFlow::localFlow.
Sinatra: Route parameters via regexpMatch("^(get|post|put|delete|patch)$"), request object.
Rack: Middleware call method with env hash.
- `codeql_test_run`: Run tests and compare with expected results
- `codeql_test_extract`: Extract test databases from Ruby source code
- `codeql_query_compile`: Compile queries and check for errors
- `codeql_query_run`: Run queries for analysis
- `codeql_bqrs_decode`: Decode binary query results
- `codeql_query_format`: Format query files for consistency
- `codeql_pack_install`: Install query pack dependencies
❌ Don't:
- Skip baseline test establishment before migration
- Change query logic alongside API migration (separate concerns)
- Accept test results without verifying equivalence
- Remove v1 baseline until migration is confirmed successful
- Ignore performance regressions
- Forget to update imports if needed
- Overlook Ruby-specific CFG node semantics (`asExpr()` returns CFG nodes)
✅ Do:
- Establish test baseline BEFORE any changes
- Make purely mechanical API changes first
- Verify exact result equivalence after migration
- Keep v1 baseline for comparison during migration
- Test edge cases specific to Ruby (metaprogramming, Rails, gems)
- Document any intentional behavior changes separately
- Understand difference between CFG nodes and AST nodes
If results differ after migration:
- Check node type conversions: Ensure `asExpr()` CFG semantics are correct
- Verify predicate renames: Confirm `isBarrier` vs `isSanitizer` logic is identical
- Review flow predicates: Check `isAdditionalFlowStep` mirrors `isAdditionalTaintStep`
- Inspect CFG vs AST confusion: Use `.getExpr()` on CFG nodes to get AST nodes
- Debug with partial flow: Use flow exploration to find missing edges
- Check API graph usage: Ensure API graph predicates are correctly structured
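For the partial-flow step in the list above, a throwaway debugging query can show how far flow reaches from each source. This is a rough sketch under stated assumptions: the names `DebugConfig`, `DebugFlow`, and `DebugPartialFlow` are placeholders, the exploration limit of 5 is arbitrary, and the `FlowExplorationFwd` module name reflects recent versions of the shared dataflow library (older releases may expose it under a different name, so check the library version in your pack).

```ql
import codeql.ruby.DataFlow
import codeql.ruby.TaintTracking
import codeql.ruby.dataflow.RemoteFlowSources

module DebugConfig implements DataFlow::ConfigSig {
  // Copy the isSource (and any barriers or extra steps) from the migrated query
  predicate isSource(DataFlow::Node source) { source instanceof RemoteFlowSource }

  // Forward exploration does not need sinks, so none() keeps the sketch small
  predicate isSink(DataFlow::Node sink) { none() }
}

module DebugFlow = TaintTracking::Global<DebugConfig>;

// Maximum exploration distance from each source (arbitrary value)
int explorationLimit() { result = 5 }

module DebugPartialFlow = DebugFlow::FlowExplorationFwd<explorationLimit/0>;

from DebugPartialFlow::PartialPathNode source, DebugPartialFlow::PartialPathNode node, int dist
where DebugPartialFlow::partialFlow(source, node, dist)
select node.getNode(), "Reached from $@ at distance " + dist.toString(), source.getNode(),
  "this source"
```

Comparing the nodes reached here against the path expected from the v1 baseline usually narrows a missing result down to a single step, such as a barrier that now matches too broadly or an additional flow step that was not carried over.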
- New dataflow API for writing custom CodeQL queries - Official v2 API announcement
- Analyzing data flow in Ruby - Ruby dataflow guide
- CodeQL Ruby Library Reference - Standard library documentation
- Create CodeQL Query TDD Generic - TDD workflow for queries
Your dataflow migration is successful when:
- ✅ Test baseline established before migration
- ✅ Query compiles without errors using v2 API
- ✅ All configuration classes converted to modules
- ✅ All `isSanitizer` renamed to `isBarrier`
- ✅ All `isAdditionalTaintStep` renamed to `isAdditionalFlowStep`
- ✅ All `cfg.hasFlow()` calls replaced with module flow predicates
- ✅ Test results EXACTLY match v1 baseline (zero diff)
- ✅ No performance regressions
- ✅ Query metadata updated appropriately
- ✅ Ruby-specific patterns (metaprogramming, Rails, CFG nodes) handled correctly