Commit af86d10

feat(recipes): high-crap-score with graph-estimated coverage (#175)
* feat(recipes): high-crap-score with graph-estimated coverage (plan 2)
  Spike locks 85/40/0% reachability tiers on fixtures/minimal; ships high-crap-score recipe (measured override when coverage ingested), golden + script tests, and high-complexity-untested cross-link.

* harden: CRAP consumer surfaces, docs contract, roadmap
  Add coverage_source one-liners to served rule/skill, golden-queries and architecture contracts, roadmap checkbox, wave slice 2.4, and recipe precedence note (measured 0% beats graph tiers).

* docs: retire shipped plans on merge (evidence + CRAP)
  Delete evidence-chains and graph-estimated-crap plan files; lift contract to golden-queries/roadmap; slim wave doc to plans 3–4; encode retire-on-merge in wave conventions and harden-pr skill.

* feat(recipes): unresolved_import_blind_spot; ship deferred complements
  unimported-exports third reason + evidence; alias-blind-spot fixture; MCP/glossary/rule parity for CRAP and evidence; no-defer convention in wave + harden-pr.

* harden: fix blind-spot SQL, docs, and measured-override test for PR #175
  Tighten unresolved_import_blind_spot to name-only matching, extend changeset/CAPABILITIES/fixture README/agent skill, add labyrinth measured-vs-estimated test, and refresh README goldens.
1 parent a11242e commit af86d10

33 files changed

Lines changed: 625 additions & 419 deletions

.agents/skills/harden-pr/SKILL.md

Lines changed: 15 additions & 13 deletions
@@ -53,16 +53,16 @@ Default to **lite** when invoked immediately after a slice commit. Default to **

Reviewers optimize for this bar on in-scope files. **Full** mode applies it to the entire `origin/main...HEAD` diff; **lite** to the slice diff.

-| Area | Pristine = |
-| --------------- | ------------------------------------------------------------------------------------------------------------------ |
-| **Correctness** | No known bugs or unhandled edge cases in changed paths; behavior matches intent anchor |
-| **Tests** | Changed behavior covered; affected tests pass |
-| **Checks** | Format, lint, typecheck clean on touched files ([`verify-after-each-step`](../../rules/verify-after-each-step.md)) |
-| **Docs** | User-visible changes reflected in docs, changesets, help text — no drift |
-| **Surfaces** | No maintainer leaks into consumer surfaces ([`consumer-surfaces`](../../rules/consumer-surfaces.md)) |
-| **Structure** | No boundary violations or barrel bypasses in the diff |
-| **Hygiene** | No dead code, TODO slop, or sloppy naming in touched files; errors actionable |
-| **Ship shape** | A reviewer could merge without "fix before ship" notes (except deferred out-of-scope nits) |
+| Area | Pristine = |
+| --------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| **Correctness** | No known bugs or unhandled edge cases in changed paths; behavior matches intent anchor |
+| **Tests** | Changed behavior covered; affected tests pass |
+| **Checks** | Format, lint, typecheck clean on touched files ([`verify-after-each-step`](../../rules/verify-after-each-step.md)) |
+| **Docs** | User-visible changes reflected in docs, changesets, help text — no drift; **shipped `docs/plans/<topic>.md` deleted + lifted** in the same PR ([`docs-governance`](../docs-governance/SKILL.md) § Closing a plan) |
+| **Surfaces** | No maintainer leaks into consumer surfaces ([`consumer-surfaces`](../../rules/consumer-surfaces.md)) |
+| **Structure** | No boundary violations or barrel bypasses in the diff |
+| **Hygiene** | No dead code, TODO slop, or sloppy naming in touched files; errors actionable |
+| **Ship shape** | A reviewer could merge without "fix before ship" notes (except deferred out-of-scope nits) |

If a finding moves the bar toward pristine and stays in-bounds → **fix it**, including nits in touched files.

@@ -80,7 +80,9 @@ Reviewers treat the anchor as contract. Findings that would violate it → **rep

**Fix:** bugs, missing tests, docs/changeset drift, lint/type/format, error-handling gaps, edge cases, **behavior-preserving refactors in touched files**, in-scope nits (naming, comment hygiene, cheap lint fixes).

-**Report only:** redesign, new capabilities, semantic API changes, nits outside the diff, refactors unrelated to a flagged issue.
+**Report only:** redesign, semantic API changes, nits outside the diff, refactors unrelated to a flagged issue.
+
+**Do not defer complements:** agent-surface parity (rule/skill/MCP), glossary/architecture/golden-queries contracts, script/golden tests for acceptance criteria, and cross-links named in the plan ship in the **same PR** — not "optional v2" or post-merge unless the plan **Out of scope** section explicitly excludes them.

## Reviewer roster

@@ -89,7 +91,7 @@ Spawn applicable reviewers **in parallel** via subagents in **one batch per pass
### Core (always)

1. **Correctness** — gaps vs production bar; bugs, edge cases, missing tests in changed paths
-2. **Ship-readiness** — gaps vs production bar; docs, changesets, consumer-surface leaks, error messages; run [`verify-after-each-step`](../../rules/verify-after-each-step.md) checks on touched files
+2. **Ship-readiness** — gaps vs production bar; docs, changesets, consumer-surface leaks, error messages; **grep inbound refs → delete shipped plan file → lift to `golden-queries.md` / `architecture.md` / `roadmap.md`**; run [`verify-after-each-step`](../../rules/verify-after-each-step.md) checks on touched files
3. **Structure (lite)** — gaps vs production bar; boundary smells on the diff (imports across declared layers, barrel bypasses); query codemap per [`codemap`](../codemap/SKILL.md)

### Extended (adaptive — spawn when diff triggers match)
@@ -122,7 +124,7 @@ loop:
pass += 1
goto loop
capped:
-emit deferred-nits list
+emit deferred-nits list (each nit must cite plan Out of scope or cross-PR blocker — not "optional")
done:
if uncommitted fixes → git commit -m "harden: …"
emit final report (include babysit one-liner if full mode)

.changeset/high-crap-score.md

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+---
+"@stainless-code/codemap": patch
+---
+
+Add `high-crap-score` recipe: CRAP ranking with measured coverage when ingested, or graph-estimated 85/40/0% tiers from test reachability otherwise.
+
+Extend `unimported-exports` with `unresolved_import_blind_spot` reason and `evidence_json` (unresolved import hop) so dead-export / high-CRAP triage does not over-trust the graph past alias blind spots.

docs/architecture.md

Lines changed: 2 additions & 0 deletions
@@ -192,6 +192,8 @@ Three **mutually exclusive** CLI entry shapes; all converge on `applyDiffPayload

**Evidence columns (high-judgment recipes):** Some bundled recipes add optional **`reason`** and **`evidence_json`** TEXT columns on each result row — factual detection path for agents, not pass/fail verdicts. Contract: [golden-queries.md § Evidence columns](./golden-queries.md#evidence-columns-high-judgment-recipes).

+**Coverage columns (CRAP recipes):** `high-crap-score` adds **`coverage_source`** and **`effective_coverage_pct`** — measured vs graph-estimated undertest signal. Contract: [golden-queries.md § Coverage columns](./golden-queries.md#coverage-columns-crap--enrichment-recipes).
+
**Recipes wiring:** **`src/application/recipes-loader.ts`** (pure transport-agnostic loader) + **`src/application/query-recipes.ts`** (cache + public API — `getQueryRecipeSql` / `getQueryRecipeActions` / `getQueryRecipeParams` / `listQueryRecipeIds` / `listQueryRecipeCatalog` / `getQueryRecipeCatalogEntry`, shared by CLI + MCP). Recipes live as file pairs: **`<id>.sql`** + optional **`<id>.md`**. The loader reads `templates/recipes/` (bundled, ships in npm package next to `templates/agents/`) and `<state-dir>/recipes/` (project-local — default `.codemap/recipes/`; honors `--state-dir` / `CODEMAP_STATE_DIR`; root-only resolution per the registry plan, no walk-up). Project recipes win on id collision; entries that override a bundled id carry **`shadows: true`** in the catalog so agents reading `codemap://recipes` at session start see when a recipe behaves differently from the documented bundled version. Per-row **`actions`** templates and recipe **`params`** declarations live in YAML frontmatter on each `<id>.md` — uniform shape across bundled + project. Param types are `string | number | boolean`; CLI passes values via repeatable `--params key=value[,key=value]`, MCP / HTTP pass nested `params: {key: value}` to `query_recipe`. Validation runs before SQL binding; missing / unknown / malformed params return the same `{error}` envelope as query failures. Hand-rolled YAML parser is scoped to block-list `actions:` and `params:` only (no `js-yaml` dep). Load-time validation rejects empty SQL and DML / DDL keywords (`INSERT` / `UPDATE` / `DELETE` / `DROP` / `CREATE` / `ALTER` / `ATTACH` / `DETACH` / `REPLACE` / `TRUNCATE` / `VACUUM` / `PRAGMA`) with recipe-aware error messages — defence in depth alongside the runtime `PRAGMA query_only=1` backstop in `query-engine.ts` (PR #35). `<state-dir>/index.db` is gitignored; `<state-dir>/recipes/` is NOT (verified via `git check-ignore`) — recipes are git-tracked source code authored for human review.
**Tool / resource handlers (transport-agnostic):** **`src/application/tool-handlers.ts`** + **`src/application/resource-handlers.ts`** — pure functions that take the args object an MCP tool / resource URI accepts and return a discriminated **`ToolResult`** (`{ok: true, format: 'json'|'sarif'|'annotations'|'mermaid'|'diff'|'diff-json'|'codeclimate'|'badge', payload}` — badge arm also carries `badgeStyle`; `{ok: false, error}`) or a **`ResourcePayload`** (`{mimeType, text}`). MCP and HTTP both wrap the same handlers — MCP translates to `{content: [{type: "text", text}]}`, HTTP translates to `(status, body)` with the right `Content-Type`. Engine layer untouched; transport changes don't ripple into the SQL.

docs/glossary.md

Lines changed: 4 additions & 0 deletions
@@ -151,6 +151,10 @@ Opt-in FTS5 virtual table over file content (`tokenize='porter unicode61'`). Alw

Output mode rendering `{from, to, label?, kind?}` rows as a Mermaid `flowchart LR` diagram. Sibling of `--format sarif` / `--format annotations` in `application/output-formatters.ts`. **Bounded-input contract** (50-edge ceiling; `MERMAID_MAX_EDGES`) — unbounded inputs reject with a scope-suggestion error naming the recipe + count + scoping knobs (`LIMIT` / `--via` / `WHERE`). Auto-truncation explicitly out of scope (would be a verdict masquerading as output mode, violating the predicate-as-API moat). Recipes / ad-hoc SQL must alias columns to the `{from, to}` shape (e.g. `SELECT from_path AS "from", to_path AS "to" FROM dependencies LIMIT 50`).

+### CRAP score / `high-crap-score` / `coverage_source` / `effective_coverage_pct`
+
+Change Risk Anti-Patterns score per published formula: `CC² × (1 - effective_coverage/100)³ + CC` where `CC = symbols.complexity`. Bundled recipe **`high-crap-score`** ranks symbols at or above `min_crap` (default 30). **`effective_coverage_pct`** uses ingested **`coverage.coverage_pct`** when a row exists (**`coverage_source: measured`**), else graph-estimated tiers from test reachability (**`coverage_source: estimated`**: 85% direct test reference, 40% file reachable from tests via value-only **`dependencies`**, 0% otherwise). Heuristic only — not execution coverage; prefer **`codemap ingest-coverage`** before CI gates. Complements **`high-complexity-untested`** when coverage is not ingested.
+
### `coverage` (table)

Statement coverage ingested from Istanbul JSON, LCOV, or V8 runtime (`NODE_V8_COVERAGE=...` directory via `--runtime`) via `codemap ingest-coverage <path>`. Natural-key PK `(file_path, name, line_start)` — intentionally **not** a FK to `symbols.id` because `symbols` re-creates with fresh AUTOINCREMENT ids on every `--full` reindex; the natural-key approach lets coverage rows survive that churn (`coverage` is also intentionally absent from `dropAll()`, joins the `query_baselines` precedent). Columns: `coverage_pct REAL` (`NULL` when `total_statements = 0` — "untested" and "no testable code" are different signals), `hit_statements`, `total_statements`. Orphan rows (file deleted from project) are cleaned by an explicit `DELETE FROM coverage WHERE file_path NOT IN (SELECT path FROM files)` at the end of every ingest. Three meta keys (`coverage_last_ingested_at` / `_path` / `_format`) record freshness — single ingest at a time, so format is meta-level not per-row.
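
The CRAP formula and coverage precedence from the glossary entry above can be sketched in TypeScript. This is a minimal illustration under stated assumptions, not the recipe's SQL: the names `Reachability`, `effectiveCoverage`, and `crapScore` are hypothetical, and modelling a missing `coverage` row as `null` is an assumption (the text does not say how a `NULL` `coverage_pct` is treated).

```typescript
// Hypothetical names throughout; only the formula, the 85/40/0 tiers, and the
// measured-over-estimated precedence come from the glossary text.
type CoverageSource = "measured" | "estimated";
type Reachability =
  | "direct_test_reference" // symbol referenced directly by a test
  | "file_reachable_from_tests" // file reachable from tests via dependencies
  | "unreachable";

interface EffectiveCoverage {
  coverageSource: CoverageSource;
  effectiveCoveragePct: number;
}

// Graph-estimated tiers used when no measured coverage exists.
const ESTIMATED_TIERS: Record<Reachability, number> = {
  direct_test_reference: 85,
  file_reachable_from_tests: 40,
  unreachable: 0,
};

// Assumption: `null` stands for "no coverage row for this symbol".
function effectiveCoverage(
  measuredPct: number | null,
  reachability: Reachability,
): EffectiveCoverage {
  if (measuredPct !== null) {
    // Measured coverage always wins, even a measured 0%.
    return { coverageSource: "measured", effectiveCoveragePct: measuredPct };
  }
  return {
    coverageSource: "estimated",
    effectiveCoveragePct: ESTIMATED_TIERS[reachability],
  };
}

// CRAP = CC^2 * (1 - coverage/100)^3 + CC
function crapScore(complexity: number, effectiveCoveragePct: number): number {
  const uncovered = 1 - effectiveCoveragePct / 100;
  return complexity ** 2 * uncovered ** 3 + complexity;
}
```

For a symbol with complexity 10 and no ingested coverage, the 40% tier gives roughly 31.6, just above the default `min_crap` of 30, while the 85% tier gives roughly 10.34. A measured 0% on the same symbol yields 110 regardless of its reachability tier.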

docs/golden-queries.md

Lines changed: 5 additions & 1 deletion
@@ -68,7 +68,11 @@ Scenarios live in **`fixtures/golden/scenarios.json`** (Tier A) or optional **`s

### Evidence columns (high-judgment recipes)

-Some bundled recipes add optional **`reason`** (TEXT) and **`evidence_json`** (TEXT, JSON array) columns on each row — factual detection path for agents, not engine verdicts. See [plans/evidence-chains-on-recipe-rows.md](./plans/evidence-chains-on-recipe-rows.md). Goldens assert these columns when the recipe ships evidence (`boundary-violations`, `deprecated-symbols`, `unimported-exports`).
+Some bundled recipes add optional **`reason`** (TEXT) and **`evidence_json`** (TEXT, JSON array) columns on each row — factual detection path for agents, not engine verdicts (Moat A — not pass/fail verdicts). Bounded subqueries cap evidence at three hops; list caps append `{"truncated":true}`. `unimported-exports` reasons: `no_direct_import`, `reexport_chain_possible`, `unresolved_import_blind_spot`. Goldens assert these columns when the recipe ships evidence (`boundary-violations`, `deprecated-symbols`, `unimported-exports`).
+
+### Coverage columns (CRAP / enrichment recipes)
+
+`high-crap-score` adds **`coverage_source`** (`measured` \| `estimated`) and **`effective_coverage_pct`** on each row — measured when `coverage` has a matching symbol row after `ingest-coverage`; otherwise graph-estimated tiers from test reachability. Goldens assert `coverage_source` when the recipe ships coverage semantics (`high-crap-score`); measured override is covered by `scripts/high-crap-score-measured.test.mjs`.

---
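
A consumer reading the evidence columns described in this file might parse `evidence_json` along these lines. This is a hedged sketch only: `parseEvidence` and `ParsedEvidence` are hypothetical names, the shape of individual evidence entries is not specified here and is left as an open record type, and treating any `{"truncated":true}` element as the cap marker is an assumption.

```typescript
// Hypothetical consumer-side helper; not part of the codemap API.
interface ParsedEvidence {
  entries: Record<string, unknown>[]; // evidence entries, shape unspecified
  truncated: boolean; // true when a {"truncated":true} marker was present
}

// Assumption: the marker is an array element whose `truncated` field is true.
function isTruncationMarker(entry: unknown): boolean {
  return (
    typeof entry === "object" &&
    entry !== null &&
    (entry as Record<string, unknown>).truncated === true
  );
}

function parseEvidence(evidenceJson: string | null): ParsedEvidence {
  // The column is optional, so a row may carry no evidence at all.
  if (evidenceJson === null) {
    return { entries: [], truncated: false };
  }
  const parsed: unknown = JSON.parse(evidenceJson);
  if (!Array.isArray(parsed)) {
    throw new TypeError("evidence_json must be a JSON array");
  }
  const truncated = parsed.some(isTruncationMarker);
  const entries = parsed.filter(
    (entry) => !isTruncationMarker(entry),
  ) as Record<string, unknown>[];
  return { entries, truncated };
}
```

Separating the marker from the entries lets an agent report "at least N hops" when the list was capped instead of treating the cap as the full detection path.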