name	update-codeql-query-dataflow-ruby
description	Update CodeQL queries for Ruby from legacy v1 dataflow API to modern v2 shared dataflow API. Use this skill when migrating Ruby queries to use DataFlow::ConfigSig modules, ensuring query results remain equivalent through TDD.

Update CodeQL Query Dataflow for Ruby

This skill guides you through migrating Ruby CodeQL queries from the legacy v1 (language-specific) dataflow API to the modern v2 (shared) dataflow API while ensuring query result equivalence.

When to Use This Skill

Migrating Ruby queries using deprecated DataFlow::Configuration classes
Updating queries to use DataFlow::ConfigSig modules
Modernizing Ruby queries to use the shared dataflow library
Ensuring query result equivalence during dataflow API migration

Prerequisites

Existing Ruby CodeQL query using v1 dataflow API that you want to migrate
Existing unit tests for the query
Understanding of the query's detection purpose
Access to CodeQL Development MCP Server tools

Key Dataflow API Changes (v1 → v2)

Configuration Class → Configuration Module

v1 (Legacy):

class MyConfig extends DataFlow::Configuration {
  MyConfig() { this = "MyConfig" }
  override predicate isSource(DataFlow::Node source) { ... }
  override predicate isSink(DataFlow::Node sink) { ... }
  override predicate isSanitizer(DataFlow::Node node) { ... }
  override predicate isAdditionalTaintStep(DataFlow::Node n1, DataFlow::Node n2) { ... }
}

v2 (Modern):

module MyConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) { ... }
  predicate isSink(DataFlow::Node sink) { ... }
  predicate isBarrier(DataFlow::Node node) { ... }
  predicate isAdditionalFlowStep(DataFlow::Node n1, DataFlow::Node n2) { ... }
}

module MyFlow = TaintTracking::Global<MyConfig>;

Key Terminology Changes

v1 API	v2 API	Purpose
`DataFlow::Configuration`	`DataFlow::ConfigSig`	Configuration signature
`isSanitizer`	`isBarrier`	Stop data flow propagation
`isAdditionalTaintStep`	`isAdditionalFlowStep`	Custom flow steps
`this.hasFlow(source, sink)`	`MyFlow::flow(source, sink)`	Query flow paths

Ruby-Specific Node Types

Ruby dataflow uses multiple node representations:

ExprNode: AST expression nodes (method calls, literals)
ParameterNode: Method parameter nodes
CfgNodes::ExprCfgNode: Control-flow graph nodes (returned by asExpr())
LocalSourceNode: Local sources for API graph analysis
RemoteFlowSource: Predefined sources for user-controllable input

Migration Workflow

Phase 1: Establish Test Baseline (TDD Foundation)

Critical: Before any code changes, capture current query behavior.

Step 1: Run Existing Tests

Use codeql_test_run to establish baseline:

{
  "testPath": "<query-pack>/test/{QueryName}",
  "searchPath": ["<query-pack>"]
}

Save the output - this is your reference for query result equivalence.

Step 2: Document Current Results

Create a reference file with current results:

cp <query-pack>/test/{QueryName}/{QueryName}.expected \
   <query-pack>/test/{QueryName}/{QueryName}.expected.v1-baseline

This ensures you can verify equivalence after migration.

Phase 2: Analyze Current Query

Step 3: Identify v1 Patterns

Review the query for v1 API usage:

class X extends DataFlow::Configuration
isSanitizer predicates
isAdditionalTaintStep predicates
this.hasFlow(source, sink) queries

Step 4: Understand Ruby-Specific Flow

Identify how the query uses Ruby dataflow constructs:

CFG node conversions (e.g., asExpr() returns CfgNodes::ExprCfgNode)
RemoteFlowSource for user input (Rails params, HTTP requests)
API graphs for tracking gem/framework usage (codeql.ruby.ApiGraphs)
Ruby-specific sources: ARGV, ENV, Rails parameters, HTTP requests
Ruby-specific sinks: eval, send, system, ActiveRecord queries

Phase 3: Migrate to v2 API

Step 5: Convert Configuration Class to Module

Before:

class CommandInjectionConfig extends TaintTracking::Configuration {
  CommandInjectionConfig() { this = "CommandInjectionConfig" }

  override predicate isSource(DataFlow::Node source) {
    source instanceof RemoteFlowSource
  }

  override predicate isSink(DataFlow::Node sink) {
    exists(DataFlow::CallNode call |
      call.getMethodName() = "system" and
      sink = call.getAnArgument()
    )
  }

  override predicate isSanitizer(DataFlow::Node node) {
    node = any(ShellquoteCall c).getResult()
  }
}

from CommandInjectionConfig cfg, DataFlow::Node source, DataFlow::Node sink
where cfg.hasFlow(source, sink)
select sink, "Untrusted data flows to command execution"

After:

module CommandInjectionConfig implements DataFlow::ConfigSig {
  predicate isSource(DataFlow::Node source) {
    source instanceof RemoteFlowSource
  }

  predicate isSink(DataFlow::Node sink) {
    exists(DataFlow::CallNode call |
      call.getMethodName() = "system" and
      sink = call.getAnArgument()
    )
  }

  predicate isBarrier(DataFlow::Node node) {
    node = any(ShellquoteCall c).getResult()
  }
}

module CommandInjectionFlow = TaintTracking::Global<CommandInjectionConfig>;

from DataFlow::Node source, DataFlow::Node sink
where CommandInjectionFlow::flow(source, sink)
select sink, "Untrusted data flows to command execution"

Step 6: Rename Predicates

isSanitizer → isBarrier: Change method name only, logic unchanged
isAdditionalTaintStep → isAdditionalFlowStep: Change method name only

Step 7: Update Flow Queries

Replace cfg.hasFlow(source, sink) with MyFlow::flow(source, sink):

Remove configuration variable from from clause
Use module flow predicate directly

Phase 4: Handle Ruby-Specific Migration Patterns

Step 8: Control-Flow Node Conversions

Ruby's asExpr() returns CfgNodes::ExprCfgNode, not AST nodes:

// v1 and v2 both support these conversions
DataFlow::Node n;
CfgNodes::ExprCfgNode cfgExpr = n.asExpr();  // CFG expression, not AST
DataFlow::ParameterNode param = n.asParameter();  // Method parameter

To get AST nodes from CFG nodes:

Expr astExpr = cfgExpr.getExpr();  // Get underlying AST expression

Step 9: RemoteFlowSource Usage

RemoteFlowSource works identically in v1 and v2:

predicate isSource(DataFlow::Node source) {
  source instanceof RemoteFlowSource or
  // Rails parameters
  source.asExpr().getExpr().(MethodCall).getMethodName() = "params" or
  // Environment variables
  exists(ConstantReadAccess env |
    env.getExpr().(ConstRef).getName() = "ENV" and
    source.asExpr().getExpr() = env.getAMethodCall()
  ) or
  // Command line arguments
  exists(ConstantReadAccess argv |
    argv.getExpr().(ConstRef).getName() = "ARGV" and
    source.asExpr().getExpr() = argv
  )
}

Step 10: API Graph Integration

Use API graphs to track framework and gem usage. Example: Rails controller params via API::getTopLevelMember("ActionController").getReturn("Base")...getReturn("params").

Step 11: Rails-Specific Patterns

Track flows through Rails: ActiveRecord mass assignment (create, update), ActionView render (render with inline), hash access ([], fetch, dig).

Step 12: Metaprogramming Patterns

Track flows through dynamic features: send/public_send, define_method, const_get/const_set.

Phase 5: Validate Equivalence Through Testing

Step 13: Compile Migrated Query

Use codeql_query_compile to check for errors:

{
  "queryPath": "<query-pack>/src/{QueryName}/{QueryName}.ql",
  "searchPath": ["<query-pack>"]
}

Fix any compilation errors before testing.

Step 14: Run Tests and Compare Results

Use codeql_test_run on migrated query:

{
  "testPath": "<query-pack>/test/{QueryName}",
  "searchPath": ["<query-pack>"]
}

Critical: Results MUST match baseline from Phase 1.

Step 15: Verify Result Equivalence

Compare results line-by-line:

diff <query-pack>/test/{QueryName}/{QueryName}.expected.v1-baseline \
     <query-pack>/test/{QueryName}/{QueryName}.expected

Success: Empty diff (identical results) Failure: Any differences require investigation and fixes

Phase 6: Expand Test Coverage (Optional)

Add test cases for Rails features, gems (Sinatra, Grape), metaprogramming, string interpolation, blocks/lambdas, and hash/array flows. For each: add test code, update .expected, extract with codeql_test_extract, run tests.

Phase 7: Performance Validation

Run query on realistic database. If performance degrades: cache expensive predicates, use local flow where possible, limit scope, optimize API graph queries.

Phase 8: Finalize Migration

Update query metadata, remove v1 baseline files, add migration notes if needed, format with codeql_query_format.

Ruby-Specific Dataflow Considerations

Rails Framework Sources

Track user input in Rails applications:

predicate isSource(DataFlow::Node source) {
  // Controller parameters
  exists(DataFlow::CallNode params |
    params.getMethodName() = "params" and
    source = params.getAMethodCall()
  ) or
  // Request headers
  exists(DataFlow::CallNode request |
    request.getMethodName() = "request" and
    source = request.getAMethodCall("headers")
  ) or
  // Cookies
  exists(DataFlow::CallNode cookies |
    cookies.getMethodName() = "cookies" and
    source = cookies.getAMethodCall()
  )
}

ActiveRecord Sinks

Track dangerous database operations:

predicate isSink(DataFlow::Node sink) {
  // Raw SQL execution
  exists(DataFlow::CallNode query |
    query.getMethodName() in ["find_by_sql", "execute", "exec_query"] and
    sink = query.getAnArgument()
  ) or
  // String interpolation in where clauses
  exists(DataFlow::CallNode where |
    where.getMethodName() = "where" and
    exists(StringInterpolation interp |
      sink.asExpr().getExpr() = interp and
      interp = where.getArgument(0).asExpr().getExpr()
    )
  )
}

Code Execution via Metaprogramming

Track dynamic code execution:

predicate isSink(DataFlow::Node sink) {
  // eval family
  exists(DataFlow::CallNode evalCall |
    evalCall.getMethodName() in ["eval", "instance_eval", "class_eval", "module_eval"] and
    sink = evalCall.getArgument(0)
  ) or
  // send with dynamic method names
  exists(DataFlow::CallNode send |
    send.getMethodName() in ["send", "public_send"] and
    sink = send.getArgument(0)
  ) or
  // define_method with dynamic names
  exists(DataFlow::CallNode define |
    define.getMethodName() = "define_method" and
    sink = define.getArgument(0)
  )
}

String Interpolation Flows

Track flows through string interpolation components and concatenation (AddExpr).

Block and Lambda Flows

Track flows through block parameters and lambda/proc creation with DataFlow::localFlow.

Gem-Specific Patterns

Sinatra: Route parameters via regexpMatch("^(get|post|put|delete|patch)$"), request object. Rack: Middleware call method with env hash.

MCP Tools Reference

codeql_test_run: Run tests and compare with expected results
codeql_test_extract: Extract test databases from Ruby source code
codeql_query_compile: Compile queries and check for errors
codeql_query_run: Run queries for analysis
codeql_bqrs_decode: Decode binary query results
codeql_query_format: Format query files for consistency
codeql_pack_install: Install query pack dependencies

Common Migration Pitfalls

❌ Don't:

Skip baseline test establishment before migration
Change query logic alongside API migration (separate concerns)
Accept test results without verifying equivalence
Remove v1 baseline until migration is confirmed successful
Ignore performance regressions
Forget to update imports if needed
Overlook Ruby-specific CFG node semantics (asExpr() returns CFG nodes)

✅ Do:

Establish test baseline BEFORE any changes
Make purely mechanical API changes first
Verify exact result equivalence after migration
Keep v1 baseline for comparison during migration
Test edge cases specific to Ruby (metaprogramming, Rails, gems)
Document any intentional behavior changes separately
Understand difference between CFG nodes and AST nodes

Troubleshooting Non-Equivalent Results

If results differ after migration:

Check node type conversions: Ensure asExpr() CFG semantics are correct
Verify predicate renames: Confirm isBarrier vs isSanitizer logic is identical
Review flow predicates: Check isAdditionalFlowStep mirrors isAdditionalTaintStep
Inspect CFG vs AST confusion: Use .getExpr() on CFG nodes to get AST nodes
Debug with partial flow: Use flow exploration to find missing edges
Check API graph usage: Ensure API graph predicates are correctly structured

Documentation References

New dataflow API for writing custom CodeQL queries - Official v2 API announcement
Analyzing data flow in Ruby - Ruby dataflow guide
CodeQL Ruby Library Reference - Standard library documentation

Related Resources

Create CodeQL Query TDD Generic - TDD workflow for queries

Success Criteria

Your dataflow migration is successful when:

✅ Test baseline established before migration
✅ Query compiles without errors using v2 API
✅ All configuration classes converted to modules
✅ All isSanitizer renamed to isBarrier
✅ All isAdditionalTaintStep renamed to isAdditionalFlowStep
✅ All cfg.hasFlow() calls replaced with module flow predicates
✅ Test results EXACTLY match v1 baseline (zero diff)
✅ No performance regressions
✅ Query metadata updated appropriately
✅ Ruby-specific patterns (metaprogramming, Rails, CFG nodes) handled correctly

FilesExpand file tree

SKILL.md

Latest commit

History