Improve query results caching with rule-level SARIF extraction, Mermaid dataflow visualization, and alert overlap analysis (v2.25.1-next.3) #201

@data-douser


Summary

The current query_results_cache in the ql-mcp server uses query_name derived from file basenames or pack names, which is ambiguous and prevents reliable cross-run comparison. This issue proposes improving the cache model and adding SARIF-aware tools for rule-level extraction, visualization, and alert overlap analysis.

Target release: v2.25.1-next.3
Base: merge of PR #199 or the copilot/fix-bqrs-interpret-issues branch


Problem Statement

1. Cache key identity is ambiguous

The current query_name field is populated from:

  • query_run: the user-supplied queryName parameter, or basename(queryPath, '.ql') — e.g., "UI5Clickjacking" or "PrintAST"
  • database_analyze: basename(queries, '.qls') or the literal "database-analyze" — e.g., "javascript-sap-ui5-queries"

Neither of these uses the CodeQL query's @id metadata (e.g., js/ui5-clickjacking), which is the only value with uniqueness requirements. This causes:

  • Name collisions: different queries in different directories can share the same filename (e.g., ListRemoteFlowSources.ql exists in both UI5 and XSJS packs) — the cache cannot distinguish them
  • Pack-level vs query-level mismatch: database_analyze caches the entire SARIF under the pack name, so cache_compare(queryName="UI5Clickjacking") doesn't find pack-level results, while cache_compare(queryName="javascript-sap-ui5-queries") doesn't find individual query_run results
  • No run differentiation: running the same query multiple times (common during development) overwrites the previous cache entry because the cache key is deterministic over {databasePath, queryPath, format, codeqlVersion}

2. No rule-level SARIF extraction

When a database_analyze run produces SARIF with results from multiple queries (e.g., 9 rules and 122 results in a single run), there is no tool to extract just the results for a specific rule ID. The only option is cache_retrieve, which returns the entire SARIF blob. For an LLM working on a specific query, retrieving all 122 results and the full help text for all 9 rules is wasteful and noisy.

3. No structured visualization of dataflow paths

SARIF codeFlows[].threadFlows[].locations[] encode dataflow paths as ordered location sequences. There are currently no tools to convert these into structured visual representations that an LLM or developer can reason about. Mermaid diagrams are a natural fit since they render in GitHub markdown, PR comments, and VS Code.

4. No alert overlap analysis across rules

During CodeQL query development, it is common for multiple queries to flag the same or overlapping code locations — for example, a generic js/sql-injection query and a framework-specific js/cap-sql-injection query may both flag the same sink. There are currently no tools to:

  • Compare code locations across two SARIF alerts (potentially from different rule IDs)
  • Determine whether two or more CodeQL query implementations produce overlapping results for the same underlying problem
  • Assist an LLM in reasoning about whether query results are redundant, complementary, or false positive/true positive relative to each other

This analysis requires comparing physical locations (file, line, column ranges), dataflow path endpoints (sources and sinks), and the semantic context from query help text to understand whether the overlap is intentional (e.g., defense-in-depth) or indicates redundancy.


Proposed Changes

Part 1: Cache model improvements

1a. Store the CodeQL query @id as the primary query identifier

Add a rule_id column (nullable for backward compatibility) to the query_result_cache table:

```sql
ALTER TABLE query_result_cache ADD COLUMN rule_id TEXT;
CREATE INDEX idx_qrc_rule_id ON query_result_cache (rule_id);
```

Population strategy:

  • query_run: After interpretation, extract runs[0].tool.driver.rules[0].id from the SARIF output (single-query runs always have exactly one rule)
  • database_analyze: Multi-rule — store NULL for the pack-level entry. Additionally (see 1c), create per-rule entries

The rule_id corresponds to the CodeQL query's @id metadata and becomes the SARIF ruleId in results. This is the only identifier with uniqueness guarantees across the CodeQL ecosystem.
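A minimal sketch of the query_run population step, assuming a hand-rolled MinimalSarifLog type rather than the server's actual SARIF definitions:

```typescript
// Sketch only: MinimalSarifLog is a hypothetical stand-in for the server's
// real SARIF types; only the fields needed for rule_id extraction are modeled.
interface MinimalSarifLog {
  runs: Array<{
    tool: { driver: { rules?: Array<{ id: string }> } };
  }>;
}

/** Return the single rule's @id from a query_run SARIF, or null. */
function extractSingleRuleId(sarif: MinimalSarifLog): string | null {
  const rules = sarif.runs[0]?.tool?.driver?.rules;
  // Single-query runs are expected to carry exactly one rule definition;
  // anything else (e.g. a pack-level SARIF) yields null so the caller can
  // fall back to storing NULL in the rule_id column.
  return rules && rules.length === 1 ? rules[0].id : null;
}

const sarif: MinimalSarifLog = {
  runs: [{ tool: { driver: { rules: [{ id: "js/ui5-clickjacking" }] } } }],
};
const ruleId = extractSingleRuleId(sarif);
```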

1b. Add a run_id column for multi-run differentiation

```sql
ALTER TABLE query_result_cache ADD COLUMN run_id TEXT NOT NULL DEFAULT '';
```

The run_id is a unique identifier per analysis invocation (e.g., a timestamp-based UUID or the evaluator log directory name). This enables:

  • Storing multiple runs of the same query against the same database
  • Comparing results across development iterations
  • Maintaining a history window (configurable max entries per query+database pair)

Update computeQueryCacheKey to include run_id in the hash input, or use run_id as a separate discriminator alongside the existing cache key.
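The second option (separate discriminator) could look like the following sketch; the real computeQueryCacheKey lives in the server and its exact inputs may differ, so the CacheKeyInput shape here is an assumption:

```typescript
import { createHash } from "crypto";

// Hypothetical input shape mirroring the existing cache key fields
// {databasePath, queryPath, format, codeqlVersion} described above.
interface CacheKeyInput {
  databasePath: string;
  queryPath: string;
  format: string;
  codeqlVersion: string;
}

function computeQueryCacheKey(input: CacheKeyInput, runId: string): string {
  const base = createHash("sha256")
    .update([input.databasePath, input.queryPath, input.format, input.codeqlVersion].join("\0"))
    .digest("hex");
  // Keeping run_id outside the hash keeps the base key stable across runs,
  // so lookups can match on the base key and then filter or sort by run_id.
  return `${base}:${runId}`;
}
```

Hashing run_id into the key would also work, but the suffix form preserves the "all runs of this query+database pair" grouping with a simple prefix match.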

1c. Decompose database_analyze SARIF into per-rule cache entries

After a database_analyze run with SARIF output:

  1. Parse the SARIF and iterate over runs[0].tool.driver.rules[]
  2. For each rule, filter runs[0].results[] by ruleId and create a per-rule cache entry with:
    • rule_id = the rule's id field (e.g., js/cap-sql-injection)
    • query_name = the rule's id field (not the filename)
    • result_count = count of results for this rule
    • result_content = a SARIF subset containing only the relevant results, the matching rule definition, and the run metadata
  3. Additionally, store the full pack-level entry as today (for bulk retrieval)

This shared decomposition logic should be a library function (see Part 3).
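Step 2 above might be sketched as follows; the MinimalResult and PerRuleEntry shapes are hypothetical simplifications (the real result_content would be a full SARIF subset, not a results-only object):

```typescript
// Sketch of grouping results by ruleId and building per-rule cache entry
// records. Field names mirror the columns discussed above.
interface MinimalResult { ruleId: string; message: { text: string } }
interface PerRuleEntry {
  rule_id: string;
  query_name: string;
  result_count: number;
  result_content: string;
}

function buildPerRuleEntries(results: MinimalResult[]): PerRuleEntry[] {
  const byRule = new Map<string, MinimalResult[]>();
  for (const r of results) {
    const bucket = byRule.get(r.ruleId) ?? [];
    bucket.push(r);
    byRule.set(r.ruleId, bucket);
  }
  return Array.from(byRule.entries()).map(([ruleId, rs]) => ({
    rule_id: ruleId,
    query_name: ruleId, // the @id, not the filename
    result_count: rs.length,
    result_content: JSON.stringify({ results: rs }), // simplified; see Part 2 for the full subset
  }));
}
```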

1d. Update cache_lookup, cache_compare, cache_retrieve, cache_clear

  • Add ruleId as an optional filter parameter to all four tools
  • cache_compare should support comparison by ruleId across databases, which is the most natural comparison axis for query developers
  • cache_lookup should return ruleId in the entry metadata
  • cache_retrieve should accept ruleId to extract a per-rule SARIF subset from a pack-level cache entry (using the shared extraction logic)

Part 2: SARIF rule extraction tool

sarif_extract_rule

A new tool that extracts all data for a specific rule from multi-rule SARIF:

Parameters:

  • sarifPath (string) or cacheKey (string) — path to the SARIF file, or a cache key to read from the cache (exactly one required)
  • ruleId (string, required) — the SARIF rule ID to extract (e.g., js/cap-sql-injection)

Response: A valid SARIF JSON subset containing:

  • runs[0].tool.driver.rules[] — only the matching rule definition (with help.markdown, properties.tags, properties.kind, etc.)
  • runs[0].results[] — only results with matching ruleId, each preserving:
    • locations[] with physical location URIs and regions
    • codeFlows[].threadFlows[].locations[] with full dataflow path steps
    • relatedLocations[] with source/sink annotations
    • message, partialFingerprints
  • runs[0].tool.extensions[] — preserved for pack attribution

This extraction logic should be the same library function used by Part 1c for cache decomposition.

Part 3: Shared SARIF decomposition library

Create server/src/lib/sarif-utils.ts with:

```typescript
/** Extract results and rule definition for a specific ruleId from SARIF */
function extractRuleFromSarif(sarif: SarifLog, ruleId: string): SarifLog;

/** Decompose multi-rule SARIF into per-rule SARIF subsets */
function decomposeSarifByRule(sarif: SarifLog): Map<string, SarifLog>;

/** Convert a SARIF result with codeFlows to a Mermaid dataflow diagram */
function sarifResultToMermaid(result: SarifResult, rule: SarifRule): string;

/** Convert all results for a rule to a markdown report with Mermaid diagrams */
function sarifRuleToMarkdown(sarif: SarifLog, ruleId: string): string;

/** Compute location overlap between two SARIF results */
function computeLocationOverlap(resultA: SarifResult, resultB: SarifResult): OverlapResult;

/** Find all overlapping alerts between two sets of results (potentially different rules) */
function findOverlappingAlerts(
  resultsA: SarifResult[], ruleA: SarifRule,
  resultsB: SarifResult[], ruleB: SarifRule
): AlertOverlap[];
```

These are pure functions with no side effects, making them easy to test and reuse.
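A minimal sketch of the first two functions, using hand-rolled stand-in types rather than full SARIF type definitions (the real versions would also carry results over with codeFlows, relatedLocations, and extensions intact):

```typescript
// Simplified stand-ins for SarifLog/SarifRule/SarifResult; assumptions only.
interface Rule { id: string }
interface Result { ruleId: string }
interface Run { tool: { driver: { rules: Rule[] } }; results: Result[] }
interface Log { version: string; runs: Run[] }

/** Keep only the matching rule definition and its results (pure function). */
function extractRuleFromSarif(sarif: Log, ruleId: string): Log {
  const run = sarif.runs[0];
  return {
    version: sarif.version,
    runs: [{
      tool: { driver: { rules: run.tool.driver.rules.filter(r => r.id === ruleId) } },
      results: run.results.filter(r => r.ruleId === ruleId),
    }],
  };
}

/** One SARIF subset per declared rule, reusing the extraction above. */
function decomposeSarifByRule(sarif: Log): Map<string, Log> {
  const out = new Map<string, Log>();
  for (const rule of sarif.runs[0].tool.driver.rules) {
    out.set(rule.id, extractRuleFromSarif(sarif, rule.id));
  }
  return out;
}
```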

Part 4: SARIF-to-Markdown with Mermaid dataflow diagrams

sarif_rule_to_markdown

A new tool that converts per-rule SARIF data into a structured markdown report:

Parameters:

  • sarifPath (string) or cacheKey (string) — SARIF source
  • ruleId (string, required) — the rule to render

Response: Markdown containing:

  1. Rule summary header — rule ID, name, severity, security-severity, precision, CWE tags
  2. Query help — the rule's help.markdown content
  3. Results table — location, message, file, line for each result
  4. Dataflow diagrams — for each result with codeFlows, a Mermaid flowchart LR diagram showing:
    • Source node (first step) → intermediate steps → sink node (last step)
    • Each node labeled with the step message and file:line
    • Edges representing dataflow propagation
    • Distinct diagrams for distinct dataflow paths (deduplicated by step sequence)

Example Mermaid output for a 4-step dataflow path:

```mermaid
flowchart LR
    A["msg.data<br/>service.js:7"] --> B["{ messageToPass }<br/>service.js:7"]
    B --> C["messageToPass<br/>service.js:7"]
    C --> D["messageToPass<br/>service.js:9"]
    style A fill:#d4edda
    style D fill:#f8d7da
```

This makes dataflow paths immediately visible in GitHub issues, PR comments, and VS Code markdown preview — without requiring the user to manually parse SARIF JSON.
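The diagram generation could be sketched as below; the Step shape is a simplified stand-in for SARIF threadFlow locations, and the single-letter node IDs (limiting a path to 26 steps) are an illustration choice, not the proposed implementation:

```typescript
// Sketch: render one dataflow path as a Mermaid flowchart LR diagram.
interface Step { message: string; file: string; line: number }

function stepsToMermaid(steps: Step[]): string {
  // Sanitize labels: escape quotes with Mermaid's #quot; entity, cap length.
  const esc = (s: string) => s.replace(/"/g, "#quot;").slice(0, 40);
  const ids = steps.map((_, i) => String.fromCharCode(65 + i)); // A, B, C, ...
  const nodes = steps.map((s, i) => `    ${ids[i]}["${esc(s.message)}<br/>${s.file}:${s.line}"]`);
  const edges = ids.slice(0, -1).map((id, i) => `    ${id} --> ${ids[i + 1]}`);
  return [
    "flowchart LR",
    ...nodes,
    ...edges,
    `    style ${ids[0]} fill:#d4edda`,              // source: green
    `    style ${ids[ids.length - 1]} fill:#f8d7da`, // sink: red
  ].join("\n");
}
```

A single-step path degenerates gracefully here (no edges, one node styled twice), which matches the edge cases called out in the Implementation Notes.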

Part 5: Alert overlap analysis tool and prompt

sarif_compare_alerts tool

A new tool to detect overlapping code locations across SARIF alerts:

Parameters:

  • alertA — an alert specifier: {sarifPath, ruleId, resultIndex} or {cacheKey, ruleId, resultIndex}
  • alertB — same structure as alertA; may reference a different rule, different SARIF file, or different cache entry
  • overlapMode (enum: "sink", "source", "any-location", "full-path", default "sink") — what to compare:
    • "sink": compare primary locations (the final alert location) — same file + overlapping line/column range
    • "source": compare the first step of each dataflow path (the source node)
    • "any-location": check if any location in one alert overlaps with any location in the other (including intermediate dataflow steps and related locations)
    • "full-path": compare complete dataflow paths for structural similarity (same source and sink files, same number of steps, overlapping intermediate locations)

Response:

```json
{
  "overlaps": true,
  "overlapMode": "sink",
  "alertA": { "ruleId": "js/sql-injection", "location": "service.js:42:5-42:38", "message": "..." },
  "alertB": { "ruleId": "js/cap-sql-injection", "location": "service.js:42:5-42:38", "message": "..." },
  "sharedLocations": [{ "uri": "service.js", "startLine": 42, "startColumn": 5, "endLine": 42, "endColumn": 38 }],
  "pathSimilarity": 0.85,
  "analysis": "Both alerts flag the same sink location. AlertA uses a generic SQL injection source model while AlertB uses a CAP-specific request handler source model."
}
```
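The core of "sink" mode is a file-plus-range intersection test; a sketch, with Region as a simplified stand-in for a SARIF physicalLocation region and the tolerance parameter matching the near-miss handling in the Implementation Notes:

```typescript
// Sketch: two regions overlap when they name the same file and their line
// ranges intersect, optionally widened by a near-miss tolerance.
interface Region { uri: string; startLine: number; endLine: number }

function regionsOverlap(a: Region, b: Region, toleranceLines = 0): boolean {
  if (a.uri !== b.uri) return false;
  // [a.startLine, a.endLine] and [b.startLine, b.endLine] intersect when
  // neither range ends before the other starts (± tolerance).
  return a.startLine <= b.endLine + toleranceLines &&
         b.startLine <= a.endLine + toleranceLines;
}
```

The real computeLocationOverlap would extend this with column ranges for same-line comparisons, as described under Implementation Notes.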

compare_overlapping_alerts prompt

A new MCP prompt that guides an LLM through alert overlap analysis:

Prompt inputs:

  • sarifPath or cacheKey — the SARIF data to analyze
  • ruleIdA, ruleIdB — the two rules to compare (may be the same rule for cross-database comparison)
  • databasePath — the CodeQL database for source context

Prompt workflow:

  1. Extract results for both rules using sarif_extract_rule
  2. Find all overlapping alerts using sarif_compare_alerts in "sink" mode
  3. For each overlap, read the relevant source code from the database using read_database_source
  4. Present the query help text for both rules side-by-side
  5. Guide the LLM to determine for each overlapping pair:
    • Redundant: same problem, same code path → one query subsumes the other
    • Complementary: same sink but different source models → defense-in-depth, both valuable
    • False overlap: same file/line but semantically different issues (e.g., different arguments to the same call)
  6. Produce a summary with:
    • Total alerts per rule, number of overlaps
    • For each overlap: classification (redundant/complementary/false), source code snippet, both alert messages
    • Recommendations: which query to prioritize, whether to add not clauses to reduce redundancy

This prompt is essential for:

  • Query developers maintaining multiple query packs that may have overlapping coverage (e.g., standard codeql/javascript-queries vs custom advanced-security/javascript-sap-cap-queries)
  • Triaging GHAS Code Scanning alerts where the same code location appears under multiple rules
  • Evaluating whether a new custom query is adding unique value or duplicating existing standard query coverage

Implementation Notes

  • The language field currently stored as 'unknown' for database_analyze entries should be resolved from the database metadata (codeql resolve database already reports the database's languages)
  • The SARIF decomposition by ruleId is straightforward: filter results[] by ruleId, then include only rule definitions referenced by the filtered results
  • Mermaid diagram generation should sanitize node labels (escape quotes, limit length) and handle edge cases like single-step paths or missing step messages
  • The rule_id values follow a hierarchical namespace pattern (e.g., js/cap-sql-injection, js/ui5-clickjacking) that could support prefix-based filtering in future
  • Location overlap computation should handle both exact matches (same file+line+column) and range overlaps (regions that partially intersect), with configurable tolerance for near-misses (e.g., ±1 line)
  • All SARIF parsing should handle both sarifv2.1.0 and sarif-latest format strings
  • The sarif_compare_alerts tool's pathSimilarity score should be a normalized value (0.0–1.0) based on the proportion of shared dataflow steps relative to total unique steps, computed using location-based matching rather than string matching on step messages
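The last note amounts to a Jaccard index over location-keyed steps; a sketch, with StepLoc as a hypothetical simplified step location:

```typescript
// Sketch of pathSimilarity: shared dataflow steps / total unique steps,
// keyed by physical location rather than step message text.
type StepLoc = { uri: string; line: number; column: number };

function pathSimilarity(pathA: StepLoc[], pathB: StepLoc[]): number {
  const key = (s: StepLoc) => `${s.uri}:${s.line}:${s.column}`;
  const setA = new Set(pathA.map(key));
  const setB = new Set(pathB.map(key));
  const shared = Array.from(setA).filter(k => setB.has(k)).length;
  const union = new Set(Array.from(setA).concat(Array.from(setB))).size;
  return union === 0 ? 0 : shared / union; // normalized to 0.0–1.0
}
```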

Checklist

Cache model (Part 1)

  • Add rule_id column to query_result_cache table with migration
  • Add run_id column for multi-run differentiation
  • Populate rule_id from SARIF for query_run cache entries
  • Decompose database_analyze SARIF into per-rule cache entries
  • Resolve and store language from database metadata instead of 'unknown'
  • Add ruleId filter to cache_lookup, cache_compare, cache_retrieve, cache_clear
  • Update cache_compare to support comparison by ruleId

SARIF extraction and visualization (Parts 2–4)

  • Create server/src/lib/sarif-utils.ts with shared extraction, decomposition, overlap, and rendering functions
  • Implement sarif_extract_rule tool
  • Implement sarif_rule_to_markdown tool with Mermaid dataflow diagrams
  • Register new tools in the tool registry

Alert overlap analysis (Part 5)

  • Implement computeLocationOverlap and findOverlappingAlerts in sarif-utils.ts
  • Implement sarif_compare_alerts tool with overlapMode parameter
  • Implement compare_overlapping_alerts MCP prompt
  • Register prompt in the prompt registry

Testing

  • Unit tests for SARIF decomposition (multi-rule → per-rule)
  • Unit tests for Mermaid diagram generation (path-problem results with various step counts)
  • Unit tests for location overlap computation (exact match, range overlap, near-miss, no overlap)
  • Unit tests for path similarity scoring
  • Unit tests for cache model changes (rule_id population, run_id differentiation)
  • Integration tests for end-to-end: database_analyze → cache_lookup(ruleId=...) → cache_retrieve → sarif_rule_to_markdown
  • Integration test for alert overlap: two different rules flagging same sink

Documentation

  • Update server/src/resources/server-tools.md with new tools
  • Add "Analyze and Compare Results" workflow to the Common Tool Workflows section
  • Document the ruleId vs queryName distinction for users
  • Document the compare_overlapping_alerts prompt in server/src/resources/server-prompts.md
