Improve query results caching with rule-level SARIF extraction, Mermaid dataflow visualization, and alert overlap analysis (v2.25.1-next.3) #201

@data-douser


Summary

The current query_results_cache in the ql-mcp server uses query_name derived from file basenames or pack names, which is ambiguous and prevents reliable cross-run comparison. This issue proposes improving the cache model and adding SARIF-aware tools for rule-level extraction, visualization, and alert overlap analysis.

Target release: v2.25.1-next.3
Base: merge of PR #199 or the copilot/fix-bqrs-interpret-issues branch


Problem Statement

1. Cache key identity is ambiguous

The current query_name field is populated from:

  • query_run: the user-supplied queryName parameter, or basename(queryPath, '.ql') — e.g., "UI5Clickjacking" or "PrintAST"
  • database_analyze: basename(queries, '.qls') or the literal "database-analyze" — e.g., "javascript-sap-ui5-queries"

Neither of these uses the CodeQL query's @id metadata (e.g., js/ui5-clickjacking), which is the only value with uniqueness requirements. This causes:

  • Name collisions: different queries in different directories can share the same filename (e.g., ListRemoteFlowSources.ql exists in both UI5 and XSJS packs) — the cache cannot distinguish them
  • Pack-level vs query-level mismatch: database_analyze caches the entire SARIF under the pack name, so cache_compare(queryName="UI5Clickjacking") doesn't find pack-level results, while cache_compare(queryName="javascript-sap-ui5-queries") doesn't find individual query_run results
  • No run differentiation: running the same query multiple times (common during development) overwrites the previous cache entry because the cache key is deterministic over {databasePath, queryPath, format, codeqlVersion}

2. No rule-level SARIF extraction

When a database_analyze run produces SARIF with results from multiple queries (e.g., 9 rules and 122 results in a single run), there is no tool to extract just the results for a specific rule ID. The only option is cache_retrieve, which returns the entire SARIF blob. For an LLM working on a specific query, retrieving all 122 results and the full help text for all 9 rules is wasteful and noisy.

3. No structured visualization of dataflow paths

SARIF codeFlows[].threadFlows[].locations[] encode dataflow paths as ordered location sequences. There are currently no tools to convert these into structured visual representations that an LLM or developer can reason about. Mermaid diagrams are a natural fit since they render in GitHub markdown, PR comments, and VS Code.

4. No alert overlap analysis across rules

During CodeQL query development, it is common for multiple queries to flag the same or overlapping code locations — for example, a generic js/sql-injection query and a framework-specific js/cap-sql-injection query may both flag the same sink. There are currently no tools to:

  • Compare code locations across two SARIF alerts (potentially from different rule IDs)
  • Determine whether two or more CodeQL query implementations produce overlapping results for the same underlying problem
  • Assist an LLM in reasoning about whether query results are redundant, complementary, or false positive/true positive relative to each other

This analysis requires comparing physical locations (file, line, column ranges), dataflow path endpoints (sources and sinks), and the semantic context from query help text to understand whether the overlap is intentional (e.g., defense-in-depth) or indicates redundancy.


Proposed Changes

Part 1: Cache model improvements

1a. Store the CodeQL query @id as the primary query identifier

Add a rule_id column (nullable for backward compatibility) to the query_result_cache table:

```sql
ALTER TABLE query_result_cache ADD COLUMN rule_id TEXT;
CREATE INDEX idx_qrc_rule_id ON query_result_cache (rule_id);
```

Population strategy:

  • query_run: After interpretation, extract runs[0].tool.driver.rules[0].id from the SARIF output (single-query runs always have exactly one rule)
  • database_analyze: Multi-rule — store NULL for the pack-level entry. Additionally (see 1c), create per-rule entries

The rule_id corresponds to the CodeQL query's @id metadata and becomes the SARIF ruleId in results. This is the only identifier with uniqueness guarantees across the CodeQL ecosystem.
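A minimal sketch of the query_run population step, assuming a hand-rolled MinimalSarifLog type rather than the server's actual SARIF definitions:

```typescript
// Sketch only: MinimalSarifLog is a hypothetical stand-in for the server's
// real SARIF types; only the fields needed for rule_id extraction are modeled.
interface MinimalSarifLog {
  runs: Array<{
    tool: { driver: { rules?: Array<{ id: string }> } };
  }>;
}

/** Return the single rule's @id from a query_run SARIF, or null. */
function extractSingleRuleId(sarif: MinimalSarifLog): string | null {
  const rules = sarif.runs[0]?.tool?.driver?.rules;
  // Single-query runs are expected to carry exactly one rule definition;
  // anything else (e.g. a pack-level SARIF) yields null so the caller can
  // fall back to storing NULL in the rule_id column.
  return rules && rules.length === 1 ? rules[0].id : null;
}

const sarif: MinimalSarifLog = {
  runs: [{ tool: { driver: { rules: [{ id: "js/ui5-clickjacking" }] } } }],
};
const ruleId = extractSingleRuleId(sarif);
```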

1b. Add a run_id column for multi-run differentiation

```sql
ALTER TABLE query_result_cache ADD COLUMN run_id TEXT NOT NULL DEFAULT '';
```

The run_id is a unique identifier per analysis invocation (e.g., a timestamp-based UUID or the evaluator log directory name). This enables:

  • Storing multiple runs of the same query against the same database
  • Comparing results across development iterations
  • Maintaining a history window (configurable max entries per query+database pair)

Update computeQueryCacheKey to include run_id in the hash input, or use run_id as a separate discriminator alongside the existing cache key.
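The second option (separate discriminator) could look like the following sketch; the real computeQueryCacheKey lives in the server and its exact inputs may differ, so the CacheKeyInput shape here is an assumption:

```typescript
import { createHash } from "crypto";

// Hypothetical input shape mirroring the existing cache key fields
// {databasePath, queryPath, format, codeqlVersion} described above.
interface CacheKeyInput {
  databasePath: string;
  queryPath: string;
  format: string;
  codeqlVersion: string;
}

function computeQueryCacheKey(input: CacheKeyInput, runId: string): string {
  const base = createHash("sha256")
    .update([input.databasePath, input.queryPath, input.format, input.codeqlVersion].join("\0"))
    .digest("hex");
  // Keeping run_id outside the hash keeps the base key stable across runs,
  // so lookups can match on the base key and then filter or sort by run_id.
  return `${base}:${runId}`;
}
```

Hashing run_id into the key would also work, but the suffix form preserves the "all runs of this query+database pair" grouping with a simple prefix match.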

1c. Decompose database_analyze SARIF into per-rule cache entries

After a database_analyze run with SARIF output:

  1. Parse the SARIF and iterate over runs[0].tool.driver.rules[]
  2. For each rule, filter runs[0].results[] by ruleId and create a per-rule cache entry with:
    • rule_id = the rule's id field (e.g., js/cap-sql-injection)
    • query_name = the rule's id field (not the filename)
    • result_count = count of results for this rule
    • result_content = a SARIF subset containing only the relevant results, the matching rule definition, and the run metadata
  3. Additionally, store the full pack-level entry as today (for bulk retrieval)

This shared decomposition logic should be a library function (see Part 3).
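Step 2 above might be sketched as follows; the MinimalResult and PerRuleEntry shapes are hypothetical simplifications (the real result_content would be a full SARIF subset, not a results-only object):

```typescript
// Sketch of grouping results by ruleId and building per-rule cache entry
// records. Field names mirror the columns discussed above.
interface MinimalResult { ruleId: string; message: { text: string } }
interface PerRuleEntry {
  rule_id: string;
  query_name: string;
  result_count: number;
  result_content: string;
}

function buildPerRuleEntries(results: MinimalResult[]): PerRuleEntry[] {
  const byRule = new Map<string, MinimalResult[]>();
  for (const r of results) {
    const bucket = byRule.get(r.ruleId) ?? [];
    bucket.push(r);
    byRule.set(r.ruleId, bucket);
  }
  return Array.from(byRule.entries()).map(([ruleId, rs]) => ({
    rule_id: ruleId,
    query_name: ruleId, // the @id, not the filename
    result_count: rs.length,
    result_content: JSON.stringify({ results: rs }), // simplified; see Part 2 for the full subset
  }));
}
```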

1d. Update cache_lookup, cache_compare, cache_retrieve, cache_clear

  • Add ruleId as an optional filter parameter to all four tools
  • cache_compare should support comparison by ruleId across databases, which is the most natural comparison axis for query developers
  • cache_lookup should return ruleId in the entry metadata
  • cache_retrieve should accept ruleId to extract a per-rule SARIF subset from a pack-level cache entry (using the shared extraction logic)

Part 2: SARIF rule extraction tool

sarif_extract_rule

A new tool that extracts all data for a specific rule from multi-rule SARIF:

Parameters:

  • sarifPath (string) or cacheKey (string) — path to the SARIF file, or a cache key to read from the cache (exactly one required)
  • ruleId (string, required) — the SARIF rule ID to extract (e.g., js/cap-sql-injection)

Response: A valid SARIF JSON subset containing:

  • runs[0].tool.driver.rules[] — only the matching rule definition (with help.markdown, properties.tags, properties.kind, etc.)
  • runs[0].results[] — only results with matching ruleId, each preserving:
    • locations[] with physical location URIs and regions
    • codeFlows[].threadFlows[].locations[] with full dataflow path steps
    • relatedLocations[] with source/sink annotations
    • message, partialFingerprints
  • runs[0].tool.extensions[] — preserved for pack attribution

This extraction logic should be the same library function used by Part 1c for cache decomposition.

Part 3: Shared SARIF decomposition library

Create server/src/lib/sarif-utils.ts with:

```typescript
/** Extract results and rule definition for a specific ruleId from SARIF */
function extractRuleFromSarif(sarif: SarifLog, ruleId: string): SarifLog;

/** Decompose multi-rule SARIF into per-rule SARIF subsets */
function decomposeSarifByRule(sarif: SarifLog): Map<string, SarifLog>;

/** Convert a SARIF result with codeFlows to a Mermaid dataflow diagram */
function sarifResultToMermaid(result: SarifResult, rule: SarifRule): string;

/** Convert all results for a rule to a markdown report with Mermaid diagrams */
function sarifRuleToMarkdown(sarif: SarifLog, ruleId: string): string;

/** Compute location overlap between two SARIF results */
function computeLocationOverlap(resultA: SarifResult, resultB: SarifResult): OverlapResult;

/** Find all overlapping alerts between two sets of results (potentially different rules) */
function findOverlappingAlerts(
  resultsA: SarifResult[], ruleA: SarifRule,
  resultsB: SarifResult[], ruleB: SarifRule
): AlertOverlap[];
```

These are pure functions with no side effects, making them easy to test and reuse.
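A minimal sketch of the first two functions, using hand-rolled stand-in types rather than full SARIF type definitions (the real versions would also carry results over with codeFlows, relatedLocations, and extensions intact):

```typescript
// Simplified stand-ins for SarifLog/SarifRule/SarifResult; assumptions only.
interface Rule { id: string }
interface Result { ruleId: string }
interface Run { tool: { driver: { rules: Rule[] } }; results: Result[] }
interface Log { version: string; runs: Run[] }

/** Keep only the matching rule definition and its results (pure function). */
function extractRuleFromSarif(sarif: Log, ruleId: string): Log {
  const run = sarif.runs[0];
  return {
    version: sarif.version,
    runs: [{
      tool: { driver: { rules: run.tool.driver.rules.filter(r => r.id === ruleId) } },
      results: run.results.filter(r => r.ruleId === ruleId),
    }],
  };
}

/** One SARIF subset per declared rule, reusing the extraction above. */
function decomposeSarifByRule(sarif: Log): Map<string, Log> {
  const out = new Map<string, Log>();
  for (const rule of sarif.runs[0].tool.driver.rules) {
    out.set(rule.id, extractRuleFromSarif(sarif, rule.id));
  }
  return out;
}
```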

Part 4: SARIF-to-Markdown with Mermaid dataflow diagrams

sarif_rule_to_markdown

A new tool that converts per-rule SARIF data into a structured markdown report:

Parameters:

  • sarifPath (string) or cacheKey (string) — SARIF source
  • ruleId (string, required) — the rule to render

Response: Markdown containing:

  1. Rule summary header — rule ID, name, severity, security-severity, precision, CWE tags
  2. Query help — the rule's help.markdown content
  3. Results table — location, message, file, line for each result
  4. Dataflow diagrams — for each result with codeFlows, a Mermaid flowchart LR diagram showing:
    • Source node (first step) → intermediate steps → sink node (last step)
    • Each node labeled with the step message and file:line
    • Edges representing dataflow propagation
    • Distinct diagrams for distinct dataflow paths (deduplicated by step sequence)

Example Mermaid output for a 4-step dataflow path:

```mermaid
flowchart LR
    A["msg.data<br/>service.js:7"] --> B["{ messageToPass }<br/>service.js:7"]
    B --> C["messageToPass<br/>service.js:7"]
    C --> D["messageToPass<br/>service.js:9"]
    style A fill:#d4edda
    style D fill:#f8d7da
```

This makes dataflow paths immediately visible in GitHub issues, PR comments, and VS Code markdown preview — without requiring the user to manually parse SARIF JSON.
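The diagram generation could be sketched as below; the Step shape is a simplified stand-in for SARIF threadFlow locations, and the single-letter node IDs (limiting a path to 26 steps) are an illustration choice, not the proposed implementation:

```typescript
// Sketch: render one dataflow path as a Mermaid flowchart LR diagram.
interface Step { message: string; file: string; line: number }

function stepsToMermaid(steps: Step[]): string {
  // Sanitize labels: escape quotes with Mermaid's #quot; entity, cap length.
  const esc = (s: string) => s.replace(/"/g, "#quot;").slice(0, 40);
  const ids = steps.map((_, i) => String.fromCharCode(65 + i)); // A, B, C, ...
  const nodes = steps.map((s, i) => `    ${ids[i]}["${esc(s.message)}<br/>${s.file}:${s.line}"]`);
  const edges = ids.slice(0, -1).map((id, i) => `    ${id} --> ${ids[i + 1]}`);
  return [
    "flowchart LR",
    ...nodes,
    ...edges,
    `    style ${ids[0]} fill:#d4edda`,              // source: green
    `    style ${ids[ids.length - 1]} fill:#f8d7da`, // sink: red
  ].join("\n");
}
```

A single-step path degenerates gracefully here (no edges, one node styled twice), which matches the edge cases called out in the Implementation Notes.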

Part 5: Alert overlap analysis tool and prompt

sarif_compare_alerts tool

A new tool to detect overlapping code locations across SARIF alerts:

Parameters:

  • alertA — an alert specifier: {sarifPath, ruleId, resultIndex} or {cacheKey, ruleId, resultIndex}
  • alertB — same structure as alertA; may reference a different rule, different SARIF file, or different cache entry
  • overlapMode (enum: "sink", "source", "any-location", "full-path", default "sink") — what to compare:
    • "sink": compare primary locations (the final alert location) — same file + overlapping line/column range
    • "source": compare the first step of each dataflow path (the source node)
    • "any-location": check if any location in one alert overlaps with any location in the other (including intermediate dataflow steps and related locations)
    • "full-path": compare complete dataflow paths for structural similarity (same source and sink files, same number of steps, overlapping intermediate locations)

Response:

```json
{
  "overlaps": true,
  "overlapMode": "sink",
  "alertA": { "ruleId": "js/sql-injection", "location": "service.js:42:5-42:38", "message": "..." },
  "alertB": { "ruleId": "js/cap-sql-injection", "location": "service.js:42:5-42:38", "message": "..." },
  "sharedLocations": [{ "uri": "service.js", "startLine": 42, "startColumn": 5, "endLine": 42, "endColumn": 38 }],
  "pathSimilarity": 0.85,
  "analysis": "Both alerts flag the same sink location. AlertA uses a generic SQL injection source model while AlertB uses a CAP-specific request handler source model."
}
```
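The core of "sink" mode is a file-plus-range intersection test; a sketch, with Region as a simplified stand-in for a SARIF physicalLocation region and the tolerance parameter matching the near-miss handling in the Implementation Notes:

```typescript
// Sketch: two regions overlap when they name the same file and their line
// ranges intersect, optionally widened by a near-miss tolerance.
interface Region { uri: string; startLine: number; endLine: number }

function regionsOverlap(a: Region, b: Region, toleranceLines = 0): boolean {
  if (a.uri !== b.uri) return false;
  // [a.startLine, a.endLine] and [b.startLine, b.endLine] intersect when
  // neither range ends before the other starts (± tolerance).
  return a.startLine <= b.endLine + toleranceLines &&
         b.startLine <= a.endLine + toleranceLines;
}
```

The real computeLocationOverlap would extend this with column ranges for same-line comparisons, as described under Implementation Notes.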

compare_overlapping_alerts prompt

A new MCP prompt that guides an LLM through alert overlap analysis:

Prompt inputs:

  • sarifPath or cacheKey — the SARIF data to analyze
  • ruleIdA, ruleIdB — the two rules to compare (may be the same rule for cross-database comparison)
  • databasePath — the CodeQL database for source context

Prompt workflow:

  1. Extract results for both rules using sarif_extract_rule
  2. Find all overlapping alerts using sarif_compare_alerts in "sink" mode
  3. For each overlap, read the relevant source code from the database using read_database_source
  4. Present the query help text for both rules side-by-side
  5. Guide the LLM to determine for each overlapping pair:
    • Redundant: same problem, same code path → one query subsumes the other
    • Complementary: same sink but different source models → defense-in-depth, both valuable
    • False overlap: same file/line but semantically different issues (e.g., different arguments to the same call)
  6. Produce a summary with:
    • Total alerts per rule, number of overlaps
    • For each overlap: classification (redundant/complementary/false), source code snippet, both alert messages
    • Recommendations: which query to prioritize, whether to add not clauses to reduce redundancy

This prompt is essential for:

  • Query developers maintaining multiple query packs that may have overlapping coverage (e.g., standard codeql/javascript-queries vs custom advanced-security/javascript-sap-cap-queries)
  • Triaging GHAS Code Scanning alerts where the same code location appears under multiple rules
  • Evaluating whether a new custom query is adding unique value or duplicating existing standard query coverage

Implementation Notes

  • The language field currently stored as 'unknown' for database_analyze entries should be resolved from the database metadata (codeql resolve database already reports the database's languages)
  • The SARIF decomposition by ruleId is straightforward: filter results[] by ruleId, then include only rule definitions referenced by the filtered results
  • Mermaid diagram generation should sanitize node labels (escape quotes, limit length) and handle edge cases like single-step paths or missing step messages
  • The rule_id values follow a hierarchical namespace pattern (e.g., js/cap-sql-injection, js/ui5-clickjacking) that could support prefix-based filtering in future
  • Location overlap computation should handle both exact matches (same file+line+column) and range overlaps (regions that partially intersect), with configurable tolerance for near-misses (e.g., ±1 line)
  • All SARIF parsing should handle both sarifv2.1.0 and sarif-latest format strings
  • The sarif_compare_alerts tool's pathSimilarity score should be a normalized value (0.0–1.0) based on the proportion of shared dataflow steps relative to total unique steps, computed using location-based matching rather than string matching on step messages
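The last note amounts to a Jaccard index over location-keyed steps; a sketch, with StepLoc as a hypothetical simplified step location:

```typescript
// Sketch of pathSimilarity: shared dataflow steps / total unique steps,
// keyed by physical location rather than step message text.
type StepLoc = { uri: string; line: number; column: number };

function pathSimilarity(pathA: StepLoc[], pathB: StepLoc[]): number {
  const key = (s: StepLoc) => `${s.uri}:${s.line}:${s.column}`;
  const setA = new Set(pathA.map(key));
  const setB = new Set(pathB.map(key));
  const shared = Array.from(setA).filter(k => setB.has(k)).length;
  const union = new Set(Array.from(setA).concat(Array.from(setB))).size;
  return union === 0 ? 0 : shared / union; // normalized to 0.0–1.0
}
```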

Checklist

Cache model (Part 1)

  • Add rule_id column to query_result_cache table with migration
  • Add run_id column for multi-run differentiation
  • Populate rule_id from SARIF for query_run cache entries
  • Decompose database_analyze SARIF into per-rule cache entries
  • Resolve and store language from database metadata instead of 'unknown'
  • Add ruleId filter to cache_lookup, cache_compare, cache_retrieve, cache_clear
  • Update cache_compare to support comparison by ruleId

SARIF extraction and visualization (Parts 2–4)

  • Create server/src/lib/sarif-utils.ts with shared extraction, decomposition, overlap, and rendering functions
  • Implement sarif_extract_rule tool
  • Implement sarif_rule_to_markdown tool with Mermaid dataflow diagrams
  • Register new tools in the tool registry

Alert overlap analysis (Part 5)

  • Implement computeLocationOverlap and findOverlappingAlerts in sarif-utils.ts
  • Implement sarif_compare_alerts tool with overlapMode parameter
  • Implement compare_overlapping_alerts MCP prompt
  • Register prompt in the prompt registry

Testing

  • Unit tests for SARIF decomposition (multi-rule → per-rule)
  • Unit tests for Mermaid diagram generation (path-problem results with various step counts)
  • Unit tests for location overlap computation (exact match, range overlap, near-miss, no overlap)
  • Unit tests for path similarity scoring
  • Unit tests for cache model changes (rule_id population, run_id differentiation)
  • Integration tests for end-to-end: database_analyze → cache_lookup(ruleId=...) → cache_retrieve → sarif_rule_to_markdown
  • Integration test for alert overlap: two different rules flagging same sink

Documentation

  • Update server/src/resources/server-tools.md with new tools
  • Add "Analyze and Compare Results" workflow to the Common Tool Workflows section
  • Document the ruleId vs queryName distinction for users
  • Document the compare_overlapping_alerts prompt in server/src/resources/server-prompts.md
