harden: CRAP consumer surfaces, docs contract, roadmap

SutuSebastian · SutuSebastian · commit 0cf366739b5c · 2026-06-10T12:37:53.000+03:00
Add coverage_source one-liners to the served rule and skill templates and
to the golden-queries and architecture contracts; tick the roadmap
checkbox; add wave slice 2.4; and note recipe precedence (measured 0%
beats graph tiers).
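
The precedence rule and the CRAP formula quoted in the docs can be sketched as follows. This is a minimal illustration only: the shipped recipe is SQL, and the names `resolveCoverage`, `crapScore`, and `EffectiveCoverage` are hypothetical. Only `coverage_source`, `effective_coverage_pct`, the 85/40/0 tiers, and the formula come from the docs.

```typescript
// Hypothetical row shape; only the two column names are taken from the docs.
type CoverageSource = "measured" | "estimated";

interface EffectiveCoverage {
  coverage_source: CoverageSource;
  effective_coverage_pct: number;
}

// Measured coverage wins whenever a coverage row exists, including a
// measured 0%. `measuredPct` is null when nothing was ingested for the symbol.
function resolveCoverage(
  measuredPct: number | null,
  estimatedTierPct: 85 | 40 | 0,
): EffectiveCoverage {
  if (measuredPct !== null) {
    return { coverage_source: "measured", effective_coverage_pct: measuredPct };
  }
  return { coverage_source: "estimated", effective_coverage_pct: estimatedTierPct };
}

// CRAP = CC^2 * (1 - coverage/100)^3 + CC
function crapScore(cc: number, coveragePct: number): number {
  const uncovered = 1 - coveragePct / 100;
  return cc * cc * uncovered ** 3 + cc;
}
```

With CC = 10, a measured 0% gives a score of 110, whereas the 85% graph tier alone would give roughly 10.34, which is why the measured value must take precedence even when tests reach the symbol.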
diff --git a/docs/architecture.md b/docs/architecture.md
@@ -192,6 +192,8 @@ Three **mutually exclusive** CLI entry shapes; all converge on `applyDiffPayload
 
 **Evidence columns (high-judgment recipes):** Some bundled recipes add optional **`reason`** and **`evidence_json`** TEXT columns on each result row — factual detection path for agents, not pass/fail verdicts. Contract: [golden-queries.md § Evidence columns](./golden-queries.md#evidence-columns-high-judgment-recipes).
 
+**Coverage columns (CRAP recipes):** `high-crap-score` adds **`coverage_source`** and **`effective_coverage_pct`** — they distinguish measured from graph-estimated coverage when ranking undertested code. Contract: [golden-queries.md § Coverage columns](./golden-queries.md#coverage-columns-crap--enrichment-recipes).
+
 **Recipes wiring:** **`src/application/recipes-loader.ts`** (pure transport-agnostic loader) + **`src/application/query-recipes.ts`** (cache + public API — `getQueryRecipeSql` / `getQueryRecipeActions` / `getQueryRecipeParams` / `listQueryRecipeIds` / `listQueryRecipeCatalog` / `getQueryRecipeCatalogEntry`, shared by CLI + MCP). Recipes live as file pairs: **`<id>.sql`** + optional **`<id>.md`**. The loader reads `templates/recipes/` (bundled, ships in npm package next to `templates/agents/`) and `<state-dir>/recipes/` (project-local — default `.codemap/recipes/`; honors `--state-dir` / `CODEMAP_STATE_DIR`; root-only resolution per the registry plan, no walk-up). Project recipes win on id collision; entries that override a bundled id carry **`shadows: true`** in the catalog so agents reading `codemap://recipes` at session start see when a recipe behaves differently from the documented bundled version. Per-row **`actions`** templates and recipe **`params`** declarations live in YAML frontmatter on each `<id>.md` — uniform shape across bundled + project. Param types are `string | number | boolean`; CLI passes values via repeatable `--params key=value[,key=value]`, MCP / HTTP pass nested `params: {key: value}` to `query_recipe`. Validation runs before SQL binding; missing / unknown / malformed params return the same `{error}` envelope as query failures. Hand-rolled YAML parser is scoped to block-list `actions:` and `params:` only (no `js-yaml` dep). Load-time validation rejects empty SQL and DML / DDL keywords (`INSERT` / `UPDATE` / `DELETE` / `DROP` / `CREATE` / `ALTER` / `ATTACH` / `DETACH` / `REPLACE` / `TRUNCATE` / `VACUUM` / `PRAGMA`) with recipe-aware error messages — defence in depth alongside the runtime `PRAGMA query_only=1` backstop in `query-engine.ts` (PR #35). `<state-dir>/index.db` is gitignored; `<state-dir>/recipes/` is NOT (verified via `git check-ignore`) — recipes are git-tracked source code authored for human review.
 
 **Tool / resource handlers (transport-agnostic):** **`src/application/tool-handlers.ts`** + **`src/application/resource-handlers.ts`** — pure functions that take the args object an MCP tool / resource URI accepts and return a discriminated **`ToolResult`** (`{ok: true, format: 'json'|'sarif'|'annotations'|'mermaid'|'diff'|'diff-json'|'codeclimate'|'badge', payload}` — badge arm also carries `badgeStyle`; `{ok: false, error}`) or a **`ResourcePayload`** (`{mimeType, text}`). MCP and HTTP both wrap the same handlers — MCP translates to `{content: [{type: "text", text}]}`, HTTP translates to `(status, body)` with the right `Content-Type`. Engine layer untouched; transport changes don't ripple into the SQL.
diff --git a/docs/golden-queries.md b/docs/golden-queries.md
@@ -70,6 +70,10 @@ Scenarios live in **`fixtures/golden/scenarios.json`** (Tier A) or optional **`s
 
 Some bundled recipes add optional **`reason`** (TEXT) and **`evidence_json`** (TEXT, JSON array) columns on each row — factual detection path for agents, not engine verdicts. See [plans/evidence-chains-on-recipe-rows.md](./plans/evidence-chains-on-recipe-rows.md). Goldens assert these columns when the recipe ships evidence (`boundary-violations`, `deprecated-symbols`, `unimported-exports`).
 
+### Coverage columns (CRAP / enrichment recipes)
+
+`high-crap-score` adds **`coverage_source`** (`measured` \| `estimated`) and **`effective_coverage_pct`** on each row — measured when `coverage` has a matching symbol row after `ingest-coverage`; otherwise graph-estimated tiers from test reachability. Goldens assert `coverage_source` when the recipe ships coverage semantics (`high-crap-score`); measured override is covered by `scripts/high-crap-score-measured.test.mjs`.
+
 ---
 
 ## Status
diff --git a/docs/plans/agent-enrichment-wave.md b/docs/plans/agent-enrichment-wave.md
@@ -39,14 +39,15 @@
 
 ## Plan 2 — Graph-estimated CRAP (`graph-estimated-crap.md`)
 
-| Slice                     | Deliverable                                                     | Verify            |
-| ------------------------- | --------------------------------------------------------------- | ----------------- |
-| **2.0 spike**             | Reachability CTE on `fixtures/minimal` (script or ad-hoc query) | manual row counts |
-| **2.1 recipe**            | `high-crap-score.sql` + `.md`; `scenarios.json`                 | `test:golden`     |
-| **2.2 measured override** | golden with `ingest-coverage` setup                             | golden matrix     |
-| **2.3 cross-link**        | `high-complexity-untested.md` points at CRAP when no ingest     | doc               |
+| Slice                     | Deliverable                                                     | Verify                 |
+| ------------------------- | --------------------------------------------------------------- | ---------------------- |
+| **2.0 spike**             | Reachability CTE on `fixtures/minimal` (script or ad-hoc query) | manual row counts      |
+| **2.1 recipe**            | `high-crap-score.sql` + `.md`; `scenarios.json`                 | `test:golden`          |
+| **2.2 measured override** | `scripts/high-crap-score-measured.test.mjs`                     | `bun run test:scripts` |
+| **2.3 cross-link**        | `high-complexity-untested.md` points at CRAP when no ingest     | doc                    |
+| **2.4 agent surface**     | `rule/00-full.md` + `skill/10-recipes-context.md` one-liners    | consumer check         |
 
-**Grill before 2.1 if spike ambiguous:** Q1 type-only imports in walk (default: value edges only); Q2 recipe id `high-crap-score`.
+**Locked:** Q1 value edges only (`dependencies` — type-only omitted at index); Q2 recipe id `high-crap-score`.
 
 ---
 
@@ -91,4 +92,4 @@ Each PR: `harden-pr full` → merge. Do not batch plans 1–4 into one PR.
 
 ## Current slice
 
-**Active:** Plan 2 **in flight** on `feat/high-crap-score` — slices **2.0–2.3** (`graph-estimated-crap.md`); PR **#C** when complete.
+**Active:** Plan 2 complete on `feat/high-crap-score` — open **PR #C**, then Plan 3 slice **3.1** (`coverage-deletion-confidence.md`).
diff --git a/docs/plans/graph-estimated-crap.md b/docs/plans/graph-estimated-crap.md
@@ -1,6 +1,6 @@
 # Graph-estimated CRAP score — plan
 
-> **Status:** open · **Priority:** P2 · **Effort:** M (~2 weeks)
+> **Status:** shipped (PR #C) · **Priority:** P2 · **Effort:** M (~2 weeks)
 >
 > **Motivator:** CRAP ranks **complex and undertested** functions. Codemap has `symbols.complexity` + ingested `coverage`, but `high-complexity-untested` is **misleading without ingest** (`COALESCE(coverage_pct, 0)` treats missing as 0%). Graph-estimated tiers (85/40/0%) from test reachability when measured coverage is absent.
 >
@@ -42,7 +42,7 @@ recipe high-crap-score (SQL only)
 | 40%  | 4     | `deeplyNested`, `relay`, … — `complexity-fixture.ts` reachable from test |
 | 0%   | 39    | `createClient`, `get`, … — not dependency-reachable from tests           |
 
-Reachability walk: `test_suites` + `*.test.*` / `*.spec.*` globs → recursive `dependencies` fan-out (value edges only).
+Reachability walk: `test_suites` + `*.test.*` / `*.spec.*` globs → recursive `dependencies` fan-out (value edges only — type-only imports never enter `dependencies` at index time).
 
 ### Tracer bullet (slice 2.1)
 
@@ -126,11 +126,11 @@ bun test scripts/query-golden-coverage-matrix.test.mjs   # after golden scenario
 
 ## Open decisions (impl PR)
 
-| #   | Question                                                                                         |
-| --- | ------------------------------------------------------------------------------------------------ |
-| Q1  | Include type-only imports in reachability walk? (default: value edges only, mirror import graph) |
-| Q2  | Recipe id: `high-crap-score` vs `crap-score`?                                                    |
-| Q3  | Materialised column at index time vs recipe-only — measure CTE cost on self-index first.         |
+| #   | Question                                                                                         | Lock (wave 2026-06)                   |
+| --- | ------------------------------------------------------------------------------------------------ | ------------------------------------- |
+| Q1  | Include type-only imports in reachability walk? (default: value edges only, mirror import graph) | **Value edges** — `dependencies` only |
+| Q2  | Recipe id: `high-crap-score` vs `crap-score`?                                                    | **`high-crap-score`**                 |
+| Q3  | Materialised column at index time vs recipe-only — measure CTE cost on self-index first.         | **Recipe-only** (defer v2)            |
 
 ---
 
diff --git a/docs/roadmap.md b/docs/roadmap.md
@@ -89,7 +89,7 @@ Predicate-as-API only — enrich row shape and audit deltas; no standalone pass/
 - [ ] **Audit delta attribution** — on `audit --base <ref>` (and matching MCP/HTTP audit), tag each `added` row with **`attribution: introduced | inherited`** via stable finding keys (`requiredColumns` → deterministic key) diffed against the sha-keyed audit-cache index at the merge base. Per-delta `summary` counts (`added_introduced`, `added_inherited`) optional when `summary: true`. Reuses shipped `audit-worktree` / `git archive` cache — no new verdict primitive ([Moat A](./roadmap.md#moats-load-bearing)). Complements deferred **`codemap audit` verdict + thresholds** (consumer filters `introduced` via `jq`). Plan: [`plans/audit-delta-attribution.md`](./plans/audit-delta-attribution.md). Effort: M.
 - [ ] **Evidence chains on recipe rows** — extend high-judgment recipe SQL with standard columns `reason` (short detection code + clause) and optional `evidence_json` (bounded hop array): e.g. `unimported-exports` → `re_export_chains` summary / unresolved-import blind-spot hint; `boundary-violations` → matched deny rule; `deprecated-symbols` → top caller sites from `calls` / `references`. Phased v1 on three recipes; complements frontmatter `actions[]` — agents cite evidence before `apply` / manual edits ([Moat A](./roadmap.md#moats-load-bearing)). Plan: [`plans/evidence-chains-on-recipe-rows.md`](./plans/evidence-chains-on-recipe-rows.md). Effort: M–L.
 - [ ] **Tiered lookup fast paths** — `show` / exact-name recipe paths hit covering indexes first; document latency expectations in MCP tool descriptions. FTS and broad scans remain explicit fallbacks. Effort: S–M.
-- [ ] **Graph-estimated CRAP recipe** — bundled `high-crap-score`: CRAP = `CC² × (1 - coverage/100)³ + CC` using `symbols.complexity`; **measured** `coverage` when ingested, else **graph-estimated** tiers (85% / 40% / 0% from test-file reachability over `dependencies` / `calls` / `test_suites`). Rows expose `coverage_source: measured | estimated`. Complements `high-complexity-untested` when no coverage file exists. Plan: [`plans/graph-estimated-crap.md`](./plans/graph-estimated-crap.md). Effort: M.
+- [x] **Graph-estimated CRAP recipe** — bundled `high-crap-score`: CRAP = `CC² × (1 - coverage/100)³ + CC` using `symbols.complexity`; **measured** `coverage` when ingested, else **graph-estimated** tiers (85% / 40% / 0% from test-file reachability over `dependencies` / `calls` / `test_suites`). Rows expose `coverage_source: measured | estimated`. Complements `high-complexity-untested` when no coverage file exists. Plan: [`plans/graph-estimated-crap.md`](./plans/graph-estimated-crap.md). Effort: M.
 - [ ] **Coverage-confirmed dead recipe** — bundled `coverage-confirmed-dead`: JOIN static dead-code predicate (uncalled exports, suppression-aware) with ingested `coverage` — rows carry `confidence: high` when callers = 0 and `coverage_pct = 0`, `medium` when coverage not ingested. Predicate columns only, no verdict primitive ([Moat A](./roadmap.md#moats-load-bearing)). Plan: [`plans/coverage-deletion-confidence.md`](./plans/coverage-deletion-confidence.md). Effort: L–M.
 
 ### Distribution & evaluation depth
diff --git a/templates/agent-content/rule/00-full.md b/templates/agent-content/rule/00-full.md
@@ -22,6 +22,8 @@ codemap query --recipes-json               # canonical list of every bundled + p
 
 **Evidence columns:** Some recipe rows (e.g. `boundary-violations`, `deprecated-symbols`, `unimported-exports`) add **`reason`** and **`evidence_json`** — factual detection path for agents, not pass/fail verdicts.
 
+**Coverage columns:** `high-crap-score` rows add **`coverage_source`** (`measured` \| `estimated`) and **`effective_coverage_pct`** — measured when `coverage` has a matching symbol row after `ingest-coverage`; else graph tiers 85/40/0% from test reachability (heuristic, not execution).
+
 ## Trigger patterns
 
 If the question matches any of these, use the index instead of grepping:
@@ -60,7 +62,8 @@ If the question matches any of these, use the index instead of grepping:
 | "Which exports has nobody imported?"                         | `--recipe unimported-exports`                                                                                                                        |
 | "Which components touch deprecated APIs?"                    | `--recipe components-touching-deprecated`                                                                                                            |
 | "What's risky to refactor right now?"                        | `--recipe refactor-risk-ranking`                                                                                                                     |
-| "What's high-complexity AND undertested?"                    | `--recipe high-complexity-untested`                                                                                                                  |
+| "What's high-complexity AND undertested?"                    | `--recipe high-complexity-untested` (needs `ingest-coverage`; without ingest prefer `high-crap-score`)                                               |
+| "Complex + undertested without coverage ingest?"             | `--recipe high-crap-score` (graph-estimated tiers; `coverage_source: estimated`)                                                                     |
 | "What's cognitively complex (nesting-heavy)?"                | `--recipe high-cognitive-complexity` (default `min_score=15`; `--params min_score=20` to tighten)                                                    |
 
 ## Quick reference queries
diff --git a/templates/agent-content/skill/10-recipes-context.md b/templates/agent-content/skill/10-recipes-context.md
@@ -17,6 +17,7 @@ Replace placeholders (`'...'`) with your module path, file glob, or symbol name.
 - **`--baseline[=<name>]`** — diff the current result against the saved baseline. Output `{baseline:{...}, current_row_count, added: [...], removed: [...]}` (with `--json`) or a two-section terminal dump. Identity = per-row multiset equality (canonical `JSON.stringify` keyed frequency map; duplicates preserved). Pair with `--summary` for `{baseline:{...}, current_row_count, added: N, removed: N}`. **Mutually exclusive with `--group-by`.**
 - **`--baselines`** lists saved baselines (no `rows_json` payload); **`--drop-baseline <name>`** deletes one. Both reject every other flag — they're list-only / drop-only operations.
 - **Evidence columns** — high-judgment recipes (`boundary-violations`, `deprecated-symbols`, `unimported-exports`, …) may add **`reason`** and **`evidence_json`** on each row — factual detection path; parse before `apply` or deletion.
+- **Coverage columns** — `high-crap-score` adds **`coverage_source`** (`measured` \| `estimated`) and **`effective_coverage_pct`**; `estimated` is graph reachability, not execution — prefer `ingest-coverage` before CI gates.
 - **Per-row recipe `actions`** — recipes that define an **`actions: [{type, auto_fixable?, description?, command?}]`** template append it to every row in **`--json`** output (recipe-only; ad-hoc SQL never carries actions). Rendered **`command`** lines substitute `{{param}}` from bound recipe params — param **names vary by recipe** (`old`/`new` on `rename-preview`; `old_source`/`new_source` on `migrate-import-source`; `symbol`/`replacement` on `migrate-deprecated`; see each `<id>.md` frontmatter). Under `--baseline`, actions attach to the **`added`** rows only (the rows the agent should act on). Inspect via **`--recipes-json`**.
 - **Boundary violations (config-driven)** — declare `boundaries: [{name, from_glob, to_glob, action?}]` in `.codemap/config.ts` and run `codemap query --recipe boundary-violations [--format sarif|codeclimate|badge]`. GitLab CI: `--format codeclimate`; README/CI summary: `--format badge`. The `action` field defaults to `"deny"` (the only shape v1 surfaces); rules are reconciled into the `boundary_rules` table on every index pass and joined against `dependencies` via SQLite `GLOB`.
 - **Project-local recipes** — drop **`<id>.sql`** (and optional **`<id>.md`** for description body, params, and actions) into **`<state-dir>/recipes/`** (default `.codemap/recipes/`; honors `--state-dir` / `CODEMAP_STATE_DIR`) to make team-internal SQL a first-class CLI verb. `--recipes-json` and the `codemap://recipes` MCP resource list project recipes alongside bundled ones with **`source: "bundled" | "project"`** discriminating them. Project recipes win on id collision; entries that override a bundled id carry **`shadows: true`** so agents reading the catalog at session start know when a recipe behaves differently from the documented bundled version. `<id>.md` supports YAML frontmatter for `params:` and per-row `actions:` — **block-list shape only** (loader's hand-rolled parser; no inline-flow `[{...}]`). Param types: `string | number | boolean`; pass values with `--params key=value[,key=value]` (repeatable; last value wins). Example: `codemap query --json --recipe find-symbol-by-kind --params kind=function,name_pattern=%Query%`. Validation: SQL is rejected at load time if it starts with DML/DDL (DELETE/DROP/UPDATE/etc.); params validate before SQL binding; runtime `PRAGMA query_only=1` is the parser-proof backstop. `<state-dir>/index.db` is gitignored; **`<state-dir>/recipes/` is NOT** — recipes are git-tracked source code authored for human review.
diff --git a/templates/recipes/high-crap-score.md b/templates/recipes/high-crap-score.md
@@ -15,7 +15,7 @@ actions:
 
 Ranks symbols by **CRAP score** — `CC² × (1 - effective_coverage/100)³ + CC` where `CC = symbols.complexity`.
 
-**Coverage precedence:** ingested `coverage` rows win (`coverage_source: measured`). Otherwise graph-estimated tiers (`coverage_source: estimated`):
+**Coverage precedence:** ingested `coverage` rows win (`coverage_source: measured`) — including **0% measured**, which overrides graph tiers even when tests reference the symbol. Otherwise graph-estimated tiers (`coverage_source: estimated`) via value-only `dependencies` fan-out (type-only imports are excluded at index time):
 
 | Tier    | When                                                                                          |
 | ------- | --------------------------------------------------------------------------------------------- |