Skip to content

Commit ec91bdf

Browse files
feat: codemap-richer-index — substrate extraction (Tiers 1-12) (#79)
* feat(apply): slice 1 — apply-engine phase-1 validation + dry-run Pure transport-agnostic engine implementing phase-1 validation and the dry-run output shape from the merged plan (Q1–Q10 in docs/plans/codemap-apply.md). No CLI, no MCP/HTTP wiring, no write branch yet (Slice 2 lands the latter). Re-locks Q8 to substring-match (a) — the original "exact byte-match" draft contradicted the existing buildDiffJson formatter contract and would have made every shipped rename-preview row conflict (the recipe emits before_pattern = old_name as the bare identifier, not the full line). New phase-1 mirrors application/output-formatters.ts buildDiffJson verbatim: actual.includes(before) for the match check, first-occurrence substring replace for the transformation (Slice 2), $-pre-escape per GetSubstitution. Slice scope: - src/application/apply-engine.ts — applyDiffPayload({rows, projectRoot, dryRun}) returning Q5's ApplyJsonPayload envelope. dryRun=false with a clean phase 1 throws NotImplemented (Slice 2 fills in the write). - src/application/apply-engine.test.ts — 14 unit tests covering happy paths, all three conflict reasons, row-shape validation, deterministic files[] sort, and the Slice-2 guard semantics. - docs/plans/codemap-apply.md — Q8 re-lock + edge-case table refresh. Tests: 14/14 pass. Typecheck / lint / format clean. * feat(apply): slice 2 — phase 2 writes via temp + rename Phase 2 lands behind the `!dryRun && conflicts.length === 0` gate per Q2 (c). Each modified file is written to a sibling temp path then `rename`d into place — POSIX-atomic per file, so concurrent readers see either the pre- or post-rename content, never a torn write. Implementation: - Phase-1 caches each source's text; phase-2 reuses the cache (one read per file across both phases). TOCTOU window collapses to the gap between phase-1 read and phase-2 rename — accepted per Q2. - Phase-2 splits on raw "\n" (not /\r?\n/) so CRLF lines retain their trailing \r and round-trip when joined back with "\n". Phase-1 conflict reporting still strips the \r so `actual_at_line` is clean. - Edits applied per-file in descending line order — defensive default for when multi-line transforms land (today's single-line rows are order-independent). - `$`-pre-escape on `after_pattern` per GetSubstitution rule (mirrors buildDiffJson) so identifiers like `$inject` round-trip safely. - Temp paths use `crypto.randomBytes(6)` so concurrent applies don't collide; cleanup on success is implicit (rename atomically removes the source name). Tests: 20/20 pass. Failure-mode coverage: chmod 0o555 on the project dir to force the temp-write to fail; dry-run no-op-on-disk; no temp siblings left behind on success; conflict short-circuits before any writes (good.ts untouched when bad.ts is missing). * feat(apply): slice 3 — CLI verb + recipe execution + TTY/--yes gate Adds `codemap apply <recipe-id>` as a positional verb (per Q4) wired through the same dispatch as every other CLI command. Recipe execution reuses `queryRows` + the existing `--params` plumbing (`parseParamsCli` + `resolveRecipeParams`); rows feed straight into `applyDiffPayload`. Q6 gating matrix implemented: - TTY no `--yes` → phase-1 dry-run preview, prompt `Proceed? [y/N]` on stderr, default-N, phase-2 only on `y` (uses node:readline). - TTY `--yes` → no prompt; proceed if validation clean. - Non-TTY no `--yes` (no `--dry-run`) → reject with stderr message ("Pass --yes for non-interactive runs, or --dry-run for preview."). - `--dry-run` + `--yes` → mutually exclusive, parse-time error. - `--json` everywhere routes errors as `{"error":"..."}` envelopes. Files: - src/cli/cmd-apply.ts — argv parser + run loop. Mirrors cmd-impact's shape (positional + flags + JSON envelope). - src/cli/cmd-apply.test.ts — 10 subprocess integration tests: dry-run no-op, --yes happy path (with cross-file import rename via rename-preview), Q7 (a) idempotent re-run after reindex, Q6 non-TTY rejection (text + JSON), unknown recipe id, missing positional, mut- ex check, --help prints without bootstrap. - src/cli/main.ts + bootstrap.ts — register the verb. realpath note: tests `realpathSync` the temp project root so oxc- resolver's symlink-dereferenced `resolved_path` aligns with the indexed file paths (without it the import-rename rows in rename- preview return empty on macOS where /tmp → /private/tmp). Tests: 10/10 integration + 20/20 engine. Typecheck / lint / format clean. * feat(apply): slice 4 — MCP/HTTP `apply` tool Registers `apply` as the 13th tool over both MCP (stdio) and HTTP transports. Dispatches the same `applyDiffPayload` engine the CLI uses; output envelope is identical to the CLI's --json output (Q5). - src/application/tool-handlers.ts — `handleApply(args, root)` + Zod schema (`applyArgsSchema`). Q6 gate enforced: non-TTY transports always require `yes: true` (no prompt to fall back on). dry_run + yes rejected as mutually exclusive. Unknown recipe returns 404. - src/application/mcp-server.ts — `registerApplyTool` mirrors the impact tool's shape; description encodes the Q5 envelope + Q2 (c) all-or-nothing semantics so agents can reason about the tool without reading docs. - src/application/http-server.ts — adds `apply` to TOOL_NAMES + the POST /tool/{name} dispatcher case. - src/application/tool-handlers.test.ts — 4 handleApply tests (404, yes-required, mutex, dry-run envelope shape). 104 mcp/http server tests still green; tool catalogs are inferred from TOOL_NAMES so the new tool surfaces automatically in /tools listings. Per the plan's Slice 4 lock: `query_batch` does NOT get an apply analogue (Moat-A: batched writes are verdict-shaped; consumers compose multiple apply calls if they need cross-recipe writes). * docs(apply): slice 5 — lockstep + plan retire Final slice — lifts the durable design from the plan into reference docs and retires the plan file per docs/README.md Rule 3. - docs/architecture.md — new "Apply wiring" section (engine + phase-1 algorithm + phase-2 atomic temp-rename + Q6 gate + Q5 envelope + Q7 idempotency) plus "Boundary verification — apply write path" SQL kit. Layering table mentions `apply-engine.ts`. - docs/glossary.md — `codemap apply` / apply tool entry. - docs/roadmap.md — backlog entry removed (shipped). - docs/plans/codemap-apply.md — DELETED (closing-state lifecycle per docs-governance skill: delete + lift, never "Slim & keep in plans/"). - .agents/rules/codemap.md + .agents/skills/codemap/SKILL.md — Apply row in CLI table, "Apply (`bun src/index.ts apply <recipe-id>`)" paragraph, MCP `apply` tool listed alongside `impact`. - templates/agents/rules/codemap.md + templates/agents/skills/codemap/ SKILL.md — same updates in the published-package mirror (uses `codemap` instead of `bun src/index.ts`). - .changeset/codemap-apply.md — minor bump; summarises Q1–Q10 locks + boundary discipline anchor. Boundary kit verified empty after a fresh reindex of the apply files; 140/140 tests pass across apply-engine + tool-handlers + cmd-apply + mcp/http-server suites. * fix(apply): path-containment + overlap detection (triangulated review) Lands four fixes from a triangulated review of three independent agent audits (Composer, GPT-5.5, Codex). Two HIGH-severity correctness bugs were each reproducible against the prior `apply-engine.ts` in 30 seconds: F1 (HIGH) — Path traversal. Pre-fix: applyDiffPayload({ rows: [{ file_path: "../outside.ts", ... }], projectRoot: "/tmp/proj/", dryRun: false }) returned `applied: true` and mutated a sibling-of-root file. Now phase 1 resolves the project root once and rejects (a) absolute `file_path` inputs and (b) any candidate whose `path.resolve(resolvedRoot, file_path)` lands outside it. New conflict reason: `path escapes project root`. F2 (HIGH) — Phase-2 partial cross-file write. Pre-fix: two rows on the same `(file_path, line_start)` both passed phase-1 (substring check against original source); phase-2 applied the first replace, the second's substring assertion failed, the function threw — AFTER earlier files in alphabetical order had already been `renameSync`d. The "Q2 (c) all-or-nothing" guarantee was demonstrably broken. Now phase 1 maintains a per-file Set<line_start>; the second hit at the same line emits a `duplicate edit on same line` conflict before any write. F3 (MEDIUM, doc-first) — Same-line `before_pattern` ambiguity. The formatter precedent (`buildDiffJson`) uses `actual.replace(before, after)`, which rewrites only the leftmost occurrence. `const foo = foo();` with `before = "foo"` becomes `const bar = foo();` — variable renamed, recursive call broken, `applied: true` reported. This mirrors the formatter exactly and the `--format diff` preview shows the same shape, so the audit's recommendation of an engine-level fix would diverge preview from execution. Documented as a deliberate limitation in the engine docstring + `architecture.md § Apply wiring` caveat instead; test pins the current behaviour so a future engine change lands as a deliberate breaking change rather than silent drift. F4 + F6 (LOW) — `apply-engine.ts` docstring no longer points at the deleted plan (now links to `docs/architecture.md` for durable design); `apply-engine` added to the `application/` row of the Key Files table in architecture.md (it was meant to be in that enumeration alongside the other 14 engines). Tests: 25 unit tests (8 new — three F1 paths, one F2 repro, one F3 limitation pin, plus existing happy-paths / failure-modes); 41 pass across the apply path. Boundary kit returns []. Changeset entry amended with the path-containment + overlap-detection bullets so the release notes carry the security-relevant fixes. Triangulated audit doc + the three source agent reviews are NOT checked in — they served their purpose for this fix-up commit and removing them avoids stale "review backlog" cruft per docs-governance. Follow-up (separate PR): the audit also surfaced that `DEFAULT_EXCLUDE_DIR_NAMES` in src/config.ts doesn't include `.codemap`, so `audit --base` followed by `--full` walks the audit-cache subtree. Tracked separately because the gap predates this PR. * chore(apply): slim comments + sync docs to five conflict reasons Concise-comments sweep on the apply surface — module docstring goes from a six-section narrative to three named call-outs (same-line ambiguity / TOCTOU / EOL); inline comments drop redundant prose where the next line of code already says it. Net 65 lines removed across src/ with no behavioural change. Docs sync: post-fix the engine collects FIVE conflict reasons (added `path escapes project root` + `duplicate edit on same line` in commit bdf7ef3), but the agent rule, the published-package agent rule, and the glossary all still said "three." Updated all three to enumerate the full set + briefly describe what each new guard rejects. Touched: - src/application/apply-engine.ts — slim docstring + 6 inline blocks. - src/application/apply-engine.test.ts — slim test rationale where the assertion already conveyed it. - src/cli/cmd-apply.ts — collapse two same-branch returns into one union; slim Q6/path-derivation comments. - src/application/tool-handlers.ts — slim handleApply schema/header doc to one sentence each. - .agents/rules/codemap.md + templates/agents/rules/codemap.md + docs/glossary.md — three → five conflict reasons + new-guard one-liners. Tests + typecheck + format + boundary kit all green. * fix(apply): address CodeRabbit review (7 of 9; 2 already fixed) Triaged 9 actionable comments via pr-comment-fact-check. Each finding verified against the source on aaabc13; 2 were already addressed by the prior commit (CodeRabbit auto-tagged with "✅ Addressed in aaabc13"); 7 are new fixes here: F1 (paragraph merge in .agents/rules/codemap.md, partial earlier-fix): CodeRabbit's auto-tag was optimistic — the conflict-reason count was synced in aaabc13 but the Impact section's tail (`... --summary trims …`jq '.summary.nodes'``) was still stitched onto the END of the Apply paragraph. Restored the section break. F3 (after_pattern: "" silently dropped): `readString` rejected empty strings, so a deletion-shaped row got silently skipped by phase-1's required-keys check. New `readStringAllowEmpty` helper for `after_pattern` only — empty `before_pattern` still rejected (would match anywhere on the line). Regression test deletes a `// FIXME(team): ` prefix. F4 (cache-key dedup `a.ts` vs `./a.ts`): Pre-fix, the cache + pending + seenLines maps used the raw `file_path` as their key. Two rows naming the same disk file via different spellings created two cache entries → second write clobbered the first edit. New `canonicalizeFilePath` collapses every spelling to a project-relative form. Symlink-realpath defense remains documented as a separate (heavier) follow-up. F5 (Q2 (c) over-promised on I/O failures): CodeRabbit's "🔴 Heavy lift" — a writeFileSync/renameSync mid-loop failure leaves files 1..N-1 already renamed with no rollback. Full transactional rollback (per-file backups + restore-on-throw) is deferred. Honest fix: weakened the Q2 (c) claim in `architecture.md § Apply wiring` to "all-or-nothing (semantic) — phase-1 conflicts abort phase 2 entirely; phase-2 I/O failures are NOT transactional across files." Engine docstring carries the same caveat as a fourth call-out. F6 (TTY check used wrong stream): Gate checked `process.stdout.isTTY` but `promptYesNo()` reads from `process.stdin` and writes to `process.stderr`. So `codemap apply foo | tee log.txt` (interactive stdin, piped stdout) was rejected as non-TTY. Now gates on `stdin.isTTY && stderr.isTTY`. F7 (user-cancel rendered "no rows applicable"): Abort path called `emitResult(preview, opts)` with `opts.dryRun === false`, so `renderTerminal` fell through to "no rows applicable" — contradicting the user's explicit cancel. Terminal mode now prints `apply <id>: aborted by user; no files written.`; JSON consumers still get the full preview envelope. F9 (skill files missed two conflict reasons): `.agents/skills/codemap/SKILL.md` + `templates/agents/skills/ codemap/SKILL.md` apply tool description didn't enumerate the 5 conflict reasons. Synced. Tests: 44/44 (3 new — `./a.ts` dedup, deletion via empty after_pattern, empty-before-still-rejected). Typecheck / lint / format clean. * refactor(mergeParams): simplify parameter merging logic * feat(tier1): R.17 extractor architecture + position-precision substrate R.17 (`docs/plans/substrate-extraction.md`) per-tier extractor architecture + Tier 1 substrate landed together so the substrate-extraction plan's shared-state patterns (ScopeTracker, ComplexityTracker, ComponentDetector) ship validated by real consumers, not in isolation. Architecture (R.17): `src/extractors/` hosts a `TierExtractor` registry called from a thin `parser.ts` orchestrator via a multiplexed visitor — 10 modules (types + 3 shared trackers + 6 extractors + offsets/jsdoc/ type-stringify helpers) replace the 968-line `parser.ts` monolith; parser.ts shrinks to 247 lines. 6 collaborating extractors on 3 shared trackers, chaining handlers on `CallExpression` / `FunctionDeclaration` / `VariableDeclaration` + their `:exit` pairs. Tier 1 substrate: column-precise positions on `calls`, `exports`, `symbols.name_*`, `markers.column_*`, plus a new `import_specifiers` child table that splits the `imports.specifiers` JSON blob into typed rows. SCHEMA_VERSION 10→14. 4 flagship recipes + 4 golden fixtures (`find-call-sites`, `find-export-sites`, `find-symbol-definitions`, `find-import-sites`) form a complete identifier-locator family — foundation for Tier 6's app-wide rename recipe extension. Empirical cost (clean rebuild, median of 3): codemap-self ~924 files: 11.4→14.3 MB (+25%); ~280→300 ms (+7%) merchant-dash ~2120 files: 37.5→50.1 MB (+33%); ~740→900 ms (+22%) Targeted reindex flat (~15 ms). Full reindex worst case ~900 ms — 66x under R.10's 1-min pain threshold. DB growth used ~25-33% of R.9's "~5-10x total budget across 13 tiers." 930/930 tests pass; 19 golden scenarios pass (4 new). Test fixtures updated in impact-engine.test.ts / mcp-server.test.ts / resource-handlers.test.ts to match new schema. R.17 architecture validated end-to-end by Tier 1 consumers — `symbolsExtractor` populates name_column_*; `callsExtractor` populates line+column; `markers.ts` populates column_*; orchestrator wires `staticImportSpecifierRows`. * feat(tier2): scopes + references substrate per R.11/R.13 What landed (SCHEMA_VERSION 14 → 16): - **`scopes` table** — composite PK `(file_path, local_id)`, WITHOUT ROWID. `local_id` is a per-file 0-based counter assigned at parse time so refs encode their scope without round-tripping SQLite autoincrement. Kinds shipped: module / function / arrow / class / method. Block / for / catch deferred (R.11 conservative escape valve covers it). - **`references` table** — per-identifier-use rows with column-precise position, kind (value/type/jsx), enclosing scope, is_write flag. - **`is_write` per R.13** — writePositions / suppressedReads sets keyed by node.start. Pre-marker handlers for AssignmentExpression (simple `=` suppresses read), UpdateExpression / UnaryExpression(delete) (dual-emit), VariableDeclarator with initializer (write-only), ForOf/In LHS, AssignmentPattern. - **Declaration suppression** — Function/Class/Interface/Type/Enum/Module declarations NOT duplicated in references (they live in symbols). - **Shorthand dedup** — oxc walker visits the SAME Identifier twice for `import {foo}` / `export {foo}` / `{foo}` shorthand; dedup by (node.start, is_write). - **`referencesExtractor`** module per R.17 (132 lines); ScopeTracker extended with pushKind / currentLocalId / getRecorded / finaliseModule. - **Recipes:** find-references --params name=X, find-write-sites --params name=X + golden fixtures. Empirical (codemap-self, 925 files): | Metric | Tier 1 | Tier 2 | Delta | | --------------- | --------- | -------- | ------ | | Full reindex | ~300 ms | 767 ms | +2.5× | | Targeted (1 f) | 8 ms | 9 ms | +12% | | Rows | n/a | 127k refs / 2k scopes | new | All within R.9 / R.10 thresholds (<1 min full, <100 ms targeted). 930 tests pass; all golden scenarios pass. Deferred to Tier 2.1: - bindings table + pass-2 cross-file resolution (R.12) - Reference kinds: decorator / shorthand-prop / member-access / spread / rest / as-cast / typeof / keyof - Block / for / catch scope kinds * feat(tier2.1): bindings substrate + find-symbol-references recipe What landed (SCHEMA_VERSION 16 → 17): - `bindings` table — (reference_id, resolved_symbol_id, resolution_kind, is_external). PK on reference_id; CASCADE on reference deletion. resolution_kind enum: same-file / imported / global / unresolved. - `symbols.scope_local_id` column — captures the declaring scope (parent of the symbol's own body scope). Class members anchor to their class's pushed scope. Captured BEFORE any new scope is pushed. - Pass-2 binding resolver (`src/application/bindings-engine.ts`) — two phases: one SELECT per table into in-memory Maps, then per-reference resolution via scope-walk → imports → globals → unresolved. ~300ms for 127k refs on codemap-self. - Cross-file resolution uses `imports.resolved_path` (dependencies lacks the module specifier). Module-scope target symbol picked when the target file's exports list matches. - Full-rebuild only — targeted reindex skips bindings refresh per R.10's <100ms contract. Orphan rows CASCADE-cleared on incremental edits. - find-symbol-references recipe — bindings-precise (filters same-name shadows + different-source imports). Golden fixture added. Also: concise-comments sweep — stripped vintage `Tier 2.1` prefixes from source; kept forward-deferral notes (`defer to Tier 6`) and R.NN cross-refs. Empirical (codemap-self, 932 files): - Full reindex: 767 ms → 1175 ms (+53%) - Targeted (1 file): 9 ms → 9 ms (no regression) - Bindings distribution: 33% same-file / 17% imported / 4% global / 45% unresolved (mostly TS type params + function params, future tiers) 930 tests pass; all golden scenarios pass. Deferred to Tier 2.2: - Re-export chain walking - Function-parameter symbols - Type-parameter symbols * feat(tier2.2): function/type params + re-export chain walking What landed: - Function/method/arrow parameter symbols (kind='param') with scope_local_id = function's own scope. TSParameterProperty (constructor `public foo: T`) emits at class scope. - Type parameter symbols (kind='type-param') for FunctionDeclaration, ClassDeclaration, arrow vars, and class methods. Interfaces and type aliases deferred — they don't push their own scope. - Re-export chain walking in bindings-engine — bounded at 10 hops with cycle detection. `export { foo } from './bar'` now resolves to the original definition. Path resolution is relative-only against the indexed-paths set. - pushParams / pushTypeParams helpers in src/extractors/params.ts. Empirical (codemap-self, 933 files): | Metric | Pre | Post | Delta | | ----- | ---- | ---- | ---------------- | | Symbols | ~11.8k | 14k | +2.2k | | Same-file refs | 42257 | 51299 | +9042 (+21%) | | Unresolved refs | 58073 | 49534 | -8539 (-15%) | | Unresolved % | 45% | 39% | down | | Full reindex | 1175ms | 1513ms | +29% | | Targeted (1 file) | 9ms | 9ms | no regression | 930 tests pass; all golden scenarios pass (index-summary rebaselined to reflect new param/type-param rows). Deferred to Tier 2.3: - Destructuring pattern params ({a,b}, [a,b]) - Interface/type-alias type-param scoping - Callback arrow scoping - External-module bindings via .d.ts * feat(tier2.3): member-kind refs + destructuring + type globals (Tier 2 close) What landed (SCHEMA_VERSION 17 → 18): - kind='member' for non-computed property access (obj.foo). Bindings resolver skips these. Single biggest unresolved-bucket cut (~50%). - Object-literal / class-member key suppression (long-hand Property, MethodDefinition, PropertyDefinition, TSPropertySignature, TSMethodSignature). Shorthand and computed still emit normally. - Destructuring pattern bindings — walkPattern generator handles Identifier / AssignmentPattern / RestElement / ObjectPattern / ArrayPattern / TSParameterProperty recursively. Same helper for function params and variable destructuring (`const { a, b } = obj`). - TYPE_GLOBALS set in bindings-engine — TS built-ins (Record, Partial, ReadonlyArray, Map, etc.) resolve to global instead of unresolved. - Extra value globals: performance, import, require, module, exports, __dirname, __filename, self. - `as const` skip: TSTypeReference name=const no longer emitted. Empirical (codemap-self, 933 files): | Metric | Pre | Post | Delta | | ----- | ----- | ----- | ------------- | | `kind='member'` | 0 | 26701 | new | | Bindings rows | 127k | 84k | -34% | | Unresolved | 49534 | 4634 | -90% | | Unresolved % | 39% | 5.5% | -34 pts | | Full reindex | 1513ms | 1025ms | -32% | | Targeted (1 file) | 9ms | 9ms | no regression | Tier 2 closed. Remaining 5.5% is dominated by callback arrow params (s, r, e, etc.) which need structural arrow scoping — deferred as separate post-Tier-2 work. 930 tests pass; all goldens pass. * feat(tier2.4): arrow + catch scoping (Tier 2 truly closed at 1.3%) What landed: - claimedScopeNodes WeakSet<object> on ExtractContext. Every extractor that pushes scope for a specific AST node marks the node here so downstream extractors don't double-push. - ArrowFunctionExpression handler in scopesExtractor — for callback arrows (not claimed by VariableDeclaration), pushes anonymous arrow scope + emits params. Named arrows stay claimed and don't double-push. - CatchClause handler — try/catch param scoped to catch body scope. Bindingless catch (TS 4.4+) handled. - ScopeTracker.currentParent walks past anonymous scopes (empty-name) so parent_name of nested symbols anchors to the nearest named owner. - Extra globals: Bun, Deno. Empirical (codemap-self, 933 files): | Metric | Pre | Post | Delta | | ----- | ----- | ----- | ------------- | | Same-file | 51972 | 55480 | +6.7% | | Unresolved | 4634 | 1102 | -76% | | Unresolved % | 5.5% | 1.3% | -4.2 pts | | Full reindex | 1025ms | 1224ms | +19% | | Targeted (1 file) | 9ms | 9ms | no regression | Tier 2 closed at 1.3% unresolved. Remaining is unindexable (infer T, audit-cache re-indexes, edge cases). 930 tests pass; all goldens pass. * feat(tier2.5): interface/type-alias + for-of/in body scoping What landed (SCHEMA_VERSION 18 → 19): - scopes.kind enum extended: interface, type-alias, for, catch. - TSInterfaceDeclaration / TSTypeAliasDeclaration push their own scope so type-params resolve via the standard walk. Type-param symbols emitted at the new scope (was: emitted at parent scope, causing same-letter collisions across interfaces). - ForOfStatement / ForInStatement push a 'for' scope; VariableDeclaration / pattern in `left` emits bindings at the for-scope so body refs resolve to the loop variable, not the enclosing function. - CatchClause kind correctly tagged as 'catch' (was 'function'). - Added value globals: RegExp, Iterator, AsyncIterator. Empirical (codemap-self, 933 files): | Metric | Pre | Post | Delta | | ----- | ----- | ----- | ------------- | | Same-file | 55480 | 55524 | +44 | | Global | 6019 | 6034 | +15 | | Unresolved | 1102 | 1119 | +17 (noise) | | Unresolved % | 1.30% | 1.32% | flat | | Full reindex | 1224ms | 1249ms | +2% | | Targeted (1 file) | 9ms | 9ms | no regression | Net flat on the unresolved bucket — the value is structural (interface type-params + for-loop body bindings now have correct scope graphs), which unblocks future precision wins. 930 tests pass; all goldens pass. * feat(tier11): per-symbol metrics + file_metrics aggregate What landed (SCHEMA_VERSION 19 → 20): - symbols: body_line_count, param_count, nesting_depth columns (nesting_depth deferred; needs a separate tracker — pushed as NULL). - file_metrics table: one row per indexed TS/JS file with total_lines, code_lines, blank_lines, comment_lines, function_count, class_count, interface_count, export_count. let/const/var/arrow distinguished deferred (parser doesn't track keyword variant on VariableDeclaration). - Per-file metrics computed in parser.ts orchestrator from existing ctx data + lineMap (no extra walk). - Recipe: `large-functions` — body_line_count ≥ 50 ranked by size. Golden fixture added. Empirical (codemap-self, 933 files): - 446 file_metrics rows (TS/JS files only; CSS/markdown indexed separately don't go through extractFileData). - Top function: extractFileData at 511 body lines, 3 params. - Top file: src/cli/cmd-query.ts at 1411 lines, 19 functions, 10 exports. 930 tests pass; all goldens pass. * feat(tier12): module_cycles table via Tarjan's SCC What landed (SCHEMA_VERSION 20 → 21): - module_cycles table — (file_path PK, cycle_id, cycle_size). Only cyclic files appear; non-cyclic files have no row. - src/application/cycles-engine.ts — iterative Tarjan's SCC over the dependencies graph. O(V+E). Runs once per full rebuild after bindings resolution. - Recipe: `circular-imports` — every file in a cycle, grouped by cycle_id. Golden fixture detects the fixture's store ↔ cache cycle. Empirical (codemap-self, 936 files): - Full reindex: 1224ms → 1171ms (essentially flat — Tarjan is fast). - 3 cycles in the indexed DB (all the fixture cycle, appearing once in current source + twice in audit-cache copies of the same fixture). 930 tests pass; all goldens pass. * feat(tier6): re_export_chains materialised table What landed (SCHEMA_VERSION 21 → 22): - re_export_chains table — (from_file, from_name) PK, (to_file, to_name, hops, truncated). Only re-export entries are materialised; direct exports don't appear. - resolveReExportChains + persistReExportChains in bindings-engine. Reuses the same chain-walker bindings-engine uses for resolution. - Recipe: barrel-chains — every chain ordered by hops DESC. Golden fixture covers the minimal fixture's shop barrel. Empirical (codemap-self, 938 files): - 106 chains materialised (mostly internal barrels + audit-cache copies). - 0 truncated. - Full reindex: 1171ms → 1104ms (no measurable cost — same loops bindings already runs). 930 tests pass; all goldens pass. * feat(tier4): function_params first-class table What landed (SCHEMA_VERSION 22 → 23): - function_params table — one row per leaf parameter, ordered by position. Keyed by (file_path, owner_name, owner_kind) to disambiguate same-name functions vs methods. - Columns: position, name, type_text (stringified annotation), default_text (raw source of default expr), is_rest, is_optional, + column-precise position. - pushParams in src/extractors/params.ts extended to emit function_params rows alongside the existing kind='param' symbol rows. ownerKind passed by caller (function/method/etc). - Recipe: find-by-param-type --params type_text=X — every fn taking a param with exact type annotation match. Golden fixture covers createClient(config: ClientConfig). Empirical (codemap-self, 940 files): - 2257 function_params rows. - Full reindex: 1104ms → 1201ms (+9%). - Symbols.kind='param' count unchanged — parallel emission. 930 tests pass; all goldens pass. * feat(tier11.5): nesting_depth tracker (Tier 11 close) What landed: - ComplexityTracker extended with enterNest/exitNest. Frame tracks currentDepth + maxDepth alongside cyclomatic count; popTop writes maxDepth to symbol.nesting_depth. - complexityExtractor: IfStatement/While/DoWhile/For/ForIn/ForOf/ ConditionalExpression/CatchClause now have enter+exit handlers that bump nesting. SwitchCase + LogicalExpression remain cyclomatic-only (flat decision points, not depth). - Recipe: deeply-nested-functions — depth >= 4 ranked by depth then complexity. Golden fixture added. - large-functions recipe SELECT extended to include nesting_depth. Empirical: - codemap-self (943 files): 622 fns at depth 0, 441 at 1, 331 at 2, ... 3 fns at depth 9 (parseArgs in arg-handling scripts). - merchant-dashboard-v2 (2120 files): top finds include `createStream` (depth 9, gen'd), `main` in provision-vars.ts (depth 6, complexity 84, 488 lines — real refactor target). 930 tests pass; all goldens pass. Tier 11 fully closed. * feat(refs): JSX intrinsics + DOM/React globals + TS qualified names What landed: - JSX intrinsics suppressed: lowercase tags (div/span/h1/etc.) + every JSXAttribute name (className/onClick/value/etc.) + JSXMemberExpression .property (`<Foo.Bar />` Bar). - TSQualifiedName handler — `React.ReactNode` / `A.B.C` in type position now emits the namespace head as kind='type' and the member tail as kind='member'. - TYPE_GLOBALS extended: DOM elements (HTMLDivElement, SVGSVGElement, etc.), DOM events (MouseEvent, KeyboardEvent, PointerEvent, …), Web APIs (RequestInit, IntersectionObserver, …), React ambient types (ReactNode, ComponentProps, Dispatch, SetStateAction, FC, CSSProperties, HTMLAttributes, etc.). - Value GLOBALS extended: React namespace + constructor-callable Web APIs (new IntersectionObserver, new FormData, new URL, etc.). Empirical: - codemap-self: 1.32% → 1.25% unresolved (essentially flat — already near floor; small wins from DOM event types). - merchant-dashboard-v2 (2120 files): **17.1% → 1.75%** unresolved. Same-file +18% (more refs resolve precisely). - Full reindex on dashboard: 3997ms → 3706ms (slight gain from less unresolved binding work). 930 tests + all goldens pass. * fix(indexer): exclude .codemap/audit-cache from default glob Audit worktrees under .codemap/audit-cache/<sha>/ are full project snapshots used by `codemap audit --base <ref>`. They were being indexed alongside live source, multiplying every row by however many audits had been run. On codemap-self this dropped files indexed from 944 → 347 and full reindex from 1273ms → 595ms (~54% faster). Cycle / re-export-chain queries no longer duplicate findings across snapshot copies. Surgical fix: added 'audit-cache' as a dir-name to DEFAULT_EXCLUDE_DIR_NAMES. Recipes / config / index.db itself stay indexable (project-local recipes ARE part of the workflow). 930 tests pass; all goldens pass. * feat(tier5): runtime_markers + find-leftover-console + env-var-audit What landed (SCHEMA_VERSION 23 → 24): - runtime_markers table — (kind, detail, line, column, scope) for every console.* / debugger / throw / process.env access. kind enum: 'console' / 'debugger' / 'throw' / 'process-env'. detail: method name for console, env-var name for process-env, truncated thrown expression text for throw, NULL for debugger. - runtime-markers extractor — matches MemberExpression on console.X and process.env.X, DebuggerStatement, ThrowStatement. Throw expr text capped at 200 chars to keep rows scannable. - Recipes: find-leftover-console (all console calls), env-var-audit (env vars ranked by use + file fan-out). - Goldens for both. Real-world (merchant-dashboard-v2, 2120 files): - 223 console calls (155 .log, 51 .error, 11 .warn) - 114 throw statements - 17 process.env reads dominated by NODE_ENV (10), plus single-use config (Sentry, AI chat, API_URL) — audit candidates. 930 tests pass; all goldens pass. * feat(tier9): test_suites + find-skipped-tests + tests-by-file What landed (SCHEMA_VERSION 24 → 25): - test_suites table — (file_path, name, kind, line_start, line_end, parent_suite_id, is_skipped, is_only, is_todo, framework). Captures every describe / it / test / suite / context block. - Framework detection per file from imports: vitest / jest / bun-test / node-test / mocha / unknown (mocha-style globals). - src/extractors/tests.ts — parses callee shape (Identifier or `.skip`/`.only`/`.todo` MemberExpression), extracts name from StringLiteral / TemplateLiteral first arg. Tracks parent stack to resolve parent_suite_id. `.each` collapses to one row per template (parametrised expansion is a runtime concern). - Recipes: find-skipped-tests (flags .skip / .only / .todo, with status column) + tests-by-file (per-file roll-up). Goldens added. Real-world (merchant-dashboard-v2, 2120 files): - 1404 it + 246 test + 414 describe (bun-test) + 132 it / 45 describe (vitest). Mixed-framework codebase detected per-file from imports. - 0 skipped/only/todo blocks — disciplined test suite. 930 tests pass; all goldens pass. * fix(golden): add ORDER BY line_number to markers-notes-todo query Local macOS and CI Linux returned the same 4 markers in different implementation-defined order without ORDER BY. Fix is to make the query deterministic. Also bumps markers-all-kinds golden (counts reflect the audit-cache exclusion from 9b5f9a2). * fix(refs+calls+scopes+symbols): address CodeRabbit review on #79 10 of 12 unresolved threads applied (1 deferred to feat/codemap-apply, 1 pushed back as deliberate semantics — see PR replies). Applied: - index-engine.ts: add missing `if (parsed.fileMetrics)` guard on the incremental path to mirror the full-rebuild guard at line 259. - calls.ts: recursive member-expression flatten. `obj.foo.bar()` now emits `obj.foo.bar` instead of being dropped. Computed segments (`a[i].b()`) still abort flattening — recipe queries filter on dot-joined identifier shape that computed breaks. Empirically: 79 new chain rows on codemap-self. - components.ts: scope tracker → stack so nested PascalCase functions (`function Outer() { function Inner() {…} }`) preserve Outer's attribution after Inner exits. - references.ts: AssignmentExpression LHS walker handles ObjectPattern / ArrayPattern / AssignmentPattern / RestElement, marking every leaf Identifier as write (and suppressing read for simple `=`). - scopes.ts: anonymous scopes use `$anon_<localId>` segment in `scopeStr` so two sibling callbacks don't share `caller_scope` and dedup as one edge in `calls`. - symbols.ts: per-declarator spans (decl.start/.end) instead of the whole VariableDeclaration so `body_line_count` doesn't inflate on `const a = (…) => long, b = (…) => longer`. - type-stringify.ts: recursive `qualifiedNameOf` walks TSQualifiedName chains so `A.B.C` doesn't truncate to `undefined.C`. - find-export-sites.md: rename `callee-token-precise` → `export-name-token-precise` (copy-paste from find-call-sites). - find-skipped-tests.sql: prioritise `.only` over `.skip` in the status CASE (a row with both flags hides the higher-risk one). - large-functions.md: mention `LIMIT 50` in the description. 930 tests pass; all goldens pass. * fix(bindings+stats+markers): address CodeRabbit nitpicks on #79 5 of 6 nitpicks applied (1 declined — see PR reply). Applied: - `IndexTableStats` + `fetchTableStats` extended to surface the 10 new tables in the post-index summary (scopes, references, bindings, import_specifiers, function_params, runtime_markers, test_suites, re_export_chains, module_cycles, file_metrics). Empty-stats initialiser bumped to match. - Extracted `parseReExportSource(raw, fallbackName)` helper — resolveBindings + resolveReExportChains shared a 16-line block; any future `.default` encoding tweak now lives in one place. - JSDoc one-liners on the two side-effecting bindings-engine exports (`persistBindings` orphan-clear, `persistReExportChains` truncate + rewrite). resolveX functions left bare — names are self-evident per concise-comments. - Deduped TYPE_GLOBALS (4 dupes: Window, Document, HTMLElement, Event) and GLOBALS (12 dupes: Number, FileReader, Image, FormData, AbortController, Headers, Request, Response, Blob, File, URL, URLSearchParams). Kept reserved keywords (`undefined`, `null`, `this`, etc.) — `undefined` IS a valid Identifier in JS; the others are cheap defensive entries with no false-positive cost. - runtime_markers gained `column_end` column (was unused emit param). SCHEMA_VERSION 25 → 26. Editor-highlight recipes can now span the whole `console.X` / `process.env.X` token. - Research note `codemap-richer-index-synthesis-2026-05.md` Path B sketch gained an "Unvalidated illustration" disclaimer. Declined: none — all nitpicks were genuine cheap wins. See PR for the one applied selectively (JSDoc on side-effecting persist*; resolve* left bare). 930 tests pass; all goldens pass. * docs: sync architecture / glossary / roadmap / golden-queries with substrate The Tier 2-12 substrate landed (commits a1a17ba…aa18b13) but reference docs still listed the pre-substrate 12 tables. This catches them up. - architecture.md § Schema — added 10 new table sections (scopes, references, bindings, import_specifiers, function_params, file_metrics, re_export_chains, module_cycles, runtime_markers, test_suites) plus new columns on existing tables (symbols: scope_local_id / name_column_* / body_line_count / param_count / nesting_depth; calls: line_start / column_*; exports: line_* / column_* / is_re_export; markers: column_*). - glossary.md — entries for bindings, scopes, references, function_params, file_metrics, re_export_chains, module_cycles, runtime_markers, test_suites, import_specifiers (alphabetic slots). - roadmap.md Moat B example list extended with the new table names so the "what recipe does dropping this kill?" reviewer test has the full surface to defend. - golden-queries.md status table — scenario-coverage description lists all current indexed tables instead of just the original 12. 930 tests pass; all goldens pass. * templates(agents): sync user-facing rule + skill with substrate `codemap agents init` ships these files; they're what end-user agents read. Before this commit, neither mentioned a single new table — agents installed in downstream projects would have no idea the 10 new substrate tables existed. - templates/agents/rules/codemap.md - Indexing-summary sentence lists the new tables. - Trigger patterns gained 12 new rows: find-references, find-symbol-references, find-write-sites, find-by-param-type, circular-imports, barrel-chains, find-leftover-console, env-var-audit, find-skipped-tests, tests-by-file, large-functions, deeply-nested-functions, nesting_depth lookup. - Quick reference queries gained 9 substrate-table SQL examples. - templates/agents/skills/codemap/SKILL.md - Added schema sections (matching the .agents/skills/codemap/SKILL.md + docs/architecture.md format) for: import_specifiers, scopes, references, bindings, function_params, file_metrics, re_export_chains, module_cycles, runtime_markers, test_suites. 930 tests pass; all goldens pass. * chore(deps): bump deps to latest + dedupe zod across MCP SDK zod 4.4 reorganized internals; @modelcontextprotocol/sdk pulled its own 4.3 copy via dep range, so `zod/v4/core` resolved to two different $ZodType identities and TS rejected our registerTool inputSchemas. Pin both via a `"zod": "$zod"` override so a single 4.4.3 is shared. * chore(deps): drop ip-address override (no longer load-bearing) express-rate-limit@8.5.2 bumped its ip-address constraint to ^10.2.0, so a fresh resolve naturally picks the patched version without help. Same fresh resolve also retires the unrelated fast-uri / hono CI audit findings (ajv 8.20 + @hono/node-server 1.19.14 → patched transitives). bun audit: "No vulnerabilities found". * fix(build): declare unrun as devDependency for tsdown 0.22 tsdown@0.22.0 moved unrun from `dependencies` to `peerDependencies` (constraint `*`). Locally `bun run build` works because Bun's run intercepts the `#!/usr/bin/env node` shebang and executes tsdown under Bun (autoLoader → "native", unrun path skipped). In CI the same script ends up executing the binstub under Node, where `process.versions.bun` is undefined → autoLoader resolves to "unrun" → import fails. Declaring unrun (^0.3.0) as a devDependency satisfies the peer for the Node code path. Verified `node node_modules/tsdown/dist/run.mjs` succeeds locally. * chore: add changeset for substrate extraction (minor) PR #79 ships 10 new substrate tables + column additions across 4 existing tables + SCHEMA_VERSION 10 → 26; that fits the .agents/lessons.md bump policy for "minor" cleanly. Without this changeset, the next Changesets release would land the substrate silently because .changeset/codemap-apply.md (the only file on this branch) is consumed by PR #78's merge.
1 parent 84f9b97 commit ec91bdf

90 files changed

Lines changed: 8218 additions & 1094 deletions

File tree

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

.changeset/codemap-richer-index.md

Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,19 @@
1+
---
2+
"@stainless-code/codemap": minor
3+
---
4+
5+
`codemap-richer-index` — substrate extraction across 12 tiers. **Schema bump** (`SCHEMA_VERSION` 10 → 26) — first run after upgrade rebuilds `.codemap/index.db` from source.
6+
7+
**10 new substrate tables**: `import_specifiers`, `scopes`, `references`, `bindings`, `function_params`, `file_metrics`, `re_export_chains`, `module_cycles`, `runtime_markers`, `test_suites`.
8+
9+
**Column additions**: `symbols.{name_column_start, name_column_end, scope_local_id, body_line_count, param_count, nesting_depth}` · `calls.{line_start, column_start, column_end}` · `exports.{is_re_export, line_start, line_end, column_start, column_end}` · `markers.{column_start, column_end}`.
10+
11+
**12 new recipes**: `find-references` · `find-symbol-references` · `find-write-sites` · `find-by-param-type` · `large-functions` · `deeply-nested-functions` · `circular-imports` · `barrel-chains` · `find-leftover-console` · `env-var-audit` · `find-skipped-tests` · `tests-by-file`.
12+
13+
**Architecture**: modular extractor pattern (R.17) splits `parser.ts` into per-tier extractors under `src/extractors/` with a shared `ExtractContext`. Targeted reindex stays sub-100ms; full reindex includes bindings resolution + Tarjan SCC + re-export chain materialisation.
14+
15+
**Reference precision**: `references` table emits every identifier USE with column-precise positions; `kind='member'` rows distinguish non-computed property access from bindings. Native JSX tags + JSXAttribute names + long-hand object-literal keys are suppressed. `TSQualifiedName` (e.g. `React.ReactNode`) splits into namespace head (`kind='type'`) + member tail (`kind='member'`). Bindings resolver (full-rebuild only) walks same-file scope → imports → globals → unresolved with deduped TypeScript / DOM / Node / ES global sets. Re-export chains followed up to 10 hops with cycle detection.
16+
17+
**Dependency bumps**: `oxc-parser` 0.127 → 0.130 · `zod` 4.3 → 4.4 (dedupe override added so the MCP SDK keeps a single `$ZodType` identity) · `tsdown` 0.21 → 0.22 (declared `unrun` as devDep to unblock CI build under Node's tsdown binstub).
18+
19+
**Docs sync**: `docs/architecture.md` § Schema reflects every new table + column; `docs/glossary.md` gains 10 new entries; `docs/golden-queries.md` + `fixtures/golden/` regenerated. Templates (`templates/agents/`) updated with the new schema overview + trigger patterns.

bun.lock

Lines changed: 149 additions & 158 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

docs/architecture.md

Lines changed: 190 additions & 22 deletions
Large diffs are not rendered by default.

docs/glossary.md

Lines changed: 40 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -55,6 +55,10 @@ In Codemap usage: a file with a high number of `exports` rows — typically a pu
5555

5656
The shared `batchInsert<T>()` helper in `src/db.ts`. Splits inserts into multi-row `INSERT … VALUES (…),(…)` statements of `BATCH_SIZE` (500) rows each, with pre-computed placeholder strings. Used by every `insertX` function.
5757

58+
### `bindings` (table) / bindings resolver
59+
60+
Per-reference resolution to the originating symbol per [R.12]. One row per non-`member`-kind `references` row, with `resolution_kind` in `{same-file, imported, global, unresolved}` and a nullable `resolved_symbol_id` joining `symbols(id)`. Resolved in a single in-memory pass (`src/application/bindings-engine.ts`) after files/symbols/imports settle. Full-rebuild only — targeted reindex skips per [R.10]. Powers `find-symbol-references` (bindings-precise rename substrate).
61+
5862
### `boundaries` (config) / `boundary_rules` (table) / `boundary-violations` (recipe)
5963

6064
Architecture-boundary substrate. Users declare `boundaries: [{name, from_glob, to_glob, action?}]` in `.codemap/config.ts`; the resolver fills `action` to `"deny"` when omitted. Every index pass calls `reconcileBoundaryRules` (in `src/db.ts`) which clears `boundary_rules` and re-inserts from the resolved config — config is the single source of truth, the table is a denormalised lookup. Bundled `boundary-violations` recipe joins `dependencies` × `boundary_rules` via SQLite `GLOB` and surfaces forbidden import edges; `--format sarif` lights up automatically because the recipe row aliases `dependencies.from_path` to `file_path`. CHECK constraint pins `action ∈ {'deny','allow'}`. v1 only honours `'deny'`; `'allow'` reserves the slot for future whitelist semantics. See [architecture.md § `boundary_rules`](./architecture.md#boundary_rules--architecture-boundary-rules-config-derived-strict-without-rowid).
@@ -107,6 +111,14 @@ CI-aggregate flag on `codemap query` and `codemap audit`. Aliases `--format sari
107111

108112
CLI subcommand comparing on-disk SHA-256 against `files.content_hash`. Statuses: `stale | missing | unindexed`. Exits `1` on any drift.
109113

114+
### `module_cycles` (table) / circular imports
115+
116+
Strongly-connected components of the import dependency graph computed via Tarjan after the full index pass. Only cyclic files appear (SCC size ≥ 2, or size-1 with a self-edge). Powers `circular-imports` recipe.
117+
118+
### `re_export_chains` (table)
119+
120+
Materialised re-export resolution. One row per `(from_file, from_name)` walked through barrel files to the terminal definition site. Bounded at 10 hops with cycle detection. `truncated = 1` flags chains that hit the cap or an unindexed file mid-walk. Powers `barrel-chains` recipe.
121+
110122
### `components` (table)
111123

112124
React components (PascalCase + JSX return or hook usage). PascalCase functions that neither return JSX nor call hooks stay in `symbols` only — never `components`. `hooks_used` is JSON-encoded. See `ComponentRow`.
@@ -275,6 +287,10 @@ Index mode that diffs against `last_indexed_commit` (git) and only re-indexes ch
275287

276288
Symbol or file blast-radius walker. CLI: `codemap impact <target> [--direction up|down|both] [--depth N] [--via dependencies|calls|imports|all] [--limit N] [--summary] [--json]`. MCP: `impact` tool. HTTP: `POST /tool/impact`. Replaces hand-composed `WITH RECURSIVE` queries that agents struggle to write reliably. Walks compatible graphs based on resolved target kind: **symbol** targets walk `calls` (callers / callees by name); **file** targets walk `dependencies` + `imports` (`resolved_path` only). Mismatched explicit `--via` choices land in `skipped_backends` instead of failing. Cycle-detected via path-string `instr` check inside the recursive CTE; bounded by `--depth` (default 3, 0 = unbounded but still cycle-detected and limit-capped) and `--limit` (default 500). Result envelope: `{target, direction, via, depth_limit, matches: [{depth, direction, edge, kind, name?, file_path}], summary: {nodes, max_depth_reached, by_kind, terminated_by: 'depth'|'limit'|'exhausted'}}`. `--summary` trims `matches` for cheap CI gate consumption (`jq '.summary.nodes'`) but preserves the count. Pure transport-agnostic engine in `application/impact-engine.ts`; CLI / MCP / HTTP all dispatch the same `findImpact` function. `sarif` / `annotations` formats not supported (impact rows are graph traversals, not findings).
277289

290+
### `import_specifiers` (table)
291+
292+
Per-specifier breakdown of the `imports.specifiers` JSON blob. One row per imported binding — `imported_name` (original) and `local_name` (renamed via `as`), `kind` in `{named, default, namespace}`, `is_type_only`, column-precise position. Powers specifier-precise rewrites and the `find-import-sites` recipe.
293+
278294
### `imports` (table)
279295

280296
Raw `import` statements. `specifiers` is JSON-encoded; `resolved_path` is non-null only when the resolver could map `source` to an indexed file. See `ImportRow` and the resolved view `dependencies`.
@@ -319,6 +335,14 @@ Stdio MCP (Model Context Protocol) server exposing codemap's structural-query su
319335

320336
MCP tool with no CLI counterpart — runs N read-only SQL statements in one round-trip. Items are `string | {sql, summary?, changed_since?, group_by?}`: bare strings inherit batch-wide flag defaults; object form overrides on a per-key basis. Output is an N-element array; per-element shape mirrors single-`query`'s output for that statement's effective flag set. Per-statement errors are isolated (failed statement returns `{error}` in its slot; siblings still execute). Distinct from making `query` accept `;`-delimited batches (rejected — would need a SQL tokenizer and would diverge `query`'s output shape from its CLI counterpart). SQL-only (no `recipe` polymorphism); `query_recipe_batch` is an additive future change if a real consumer asks.
321337

338+
### `file_metrics` (table)
339+
340+
Per-file aggregate metrics (one row per indexed TS/JS file): `total_lines`, `code_lines`, `blank_lines`, `comment_lines`, plus symbol-kind counts (`function_count`, `class_count`, `interface_count`, `export_count`). Line classification is regex-light per [Tier 11 ship report](./plans/substrate-extraction.md#tier-11--metrics-expansion-per-symbol--per-file).
341+
342+
### `function_params` (table)
343+
344+
First-class function parameters — one row per leaf parameter binding, ordered by `position`. Keyed by `(file_path, owner_name, owner_kind)` to disambiguate same-name function vs method. `type_text` is the stringified annotation; `default_text` is the raw default-expression source. Powers `find-by-param-type` recipe.
345+
322346
### markers
323347

324348
`TODO` / `FIXME` / `HACK` / `NOTE` comments extracted from any indexed file (TS, CSS, Markdown, JSON, YAML, …). Stored in the `markers` table; surfaced by the `markers-by-kind` recipe. See `MarkerRow`.
@@ -371,6 +395,14 @@ Code that turns source bytes into structured rows. Three implementations: `parse
371395

372396
A `docs/plans/<feature-name>.md` file tracking in-flight work. Created on commit; deleted when the feature ships per [README § Rule 3](./README.md#rules-for-agents).
373397

398+
### `references` (table)
399+
400+
Every identifier USE per [R.11], column-precise per [R.6]. `kind` in `{value, type, jsx, member}`; `is_write` distinguishes reads from writes per [R.13]; `scope_local_id` joins `scopes` in the same file. Native HTML JSX tags, attribute names, and long-hand object-literal keys are NOT emitted (they're not bindings). `kind='member'` rows DO emit for non-computed property access — they're skipped by the bindings resolver but available for consumers that want member-name positions.
401+
402+
### `runtime_markers` (table)
403+
404+
Operational signals worth auditing: `console.*` calls, `debugger` statements, `throw` statements, `process.env.X` accesses. `kind` in `{console, debugger, throw, process-env}`; `detail` carries the qualifier (method name, env-var name, truncated throw expression). Powers `find-leftover-console` + `env-var-audit` recipes.
405+
374406
### `pr-comment` (CLI verb)
375407

376408
Markdown PR-summary renderer. `codemap pr-comment <input>` (or `-` for stdin) reads a `codemap audit --json` envelope or a `codemap query --format sarif` doc and emits a markdown comment suitable for `gh pr comment <PR> -F -`. Auto-detects shape via `runs[]` (SARIF) vs `deltas` (audit); `--shape audit|sarif` overrides. Audit-mode groups by delta with collapsed `<details>` for added + removed rows; SARIF-mode groups by `ruleId`. Lists >50 entries collapse to `… and N more`. `--json` envelope `{markdown, findings_count, kind}` is the structured form action.yml consumers read. Targets the surfaces SARIF → Code Scanning doesn't cover (private repos without GHAS, aggregate audit deltas without `file:line` anchors, bot-context seeding). v1.0 ships the (b) summary-comment shape; (c) inline-review comments deferred per Q4 of [`plans/github-marketplace-action.md`](./plans/github-marketplace-action.md). Engine: `application/pr-comment-engine.ts` (pure transport-agnostic).
@@ -440,6 +472,10 @@ One record in a SQLite table. Each table has a corresponding TS interface (`File
440472

441473
## S
442474

475+
### `scopes` (table) / scope tracker
476+
477+
Lexical scope graph per [R.11]. Composite PK `(file_path, local_id)` with module scope at `local_id = 0` and nested scopes incrementing. Kinds: `module / function / arrow / class / method / interface / type-alias / for / catch`. `parent_local_id` walks the chain. Anonymous scopes (callbacks, catch, for) get owner*symbol_name=NULL; the tracker (`createScopeTracker`) emits `$anon*<localId>`segments in`scopeStr`so sibling anons don't collide on`calls.caller_scope`. Foundation for bindings resolution.
478+
443479
### schema
444480

445481
Conceptually, the structure of the SQLite database — every table, column, constraint, and index. Defined by **DDL** in `src/db.ts`. Versioned by **`SCHEMA_VERSION`**. Documented in [architecture § Schema](./architecture.md#schema).
@@ -498,6 +534,10 @@ TS shape for one row of the `symbols` table.
498534

499535
## T
500536

537+
### `test_suites` (table)
538+
539+
Test metadata — describe / it / test / suite / context blocks with skip/only/todo flags and detected `framework` in `{vitest, jest, bun-test, node-test, mocha, unknown}`. Framework detection is per-file from imports (mixed-framework codebases handled automatically). `parent_suite_id` resolves nested describes. Powers `find-skipped-tests` + `tests-by-file` recipes.
540+
501541
### targeted reindex
502542

503543
Index mode that re-parses only the explicit file paths passed to `--files`. Skips git diff and the full glob. See `targetedReindex` in `src/application/index-engine.ts`.

docs/golden-queries.md

Lines changed: 7 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -69,13 +69,13 @@ Scenarios live in **`fixtures/golden/scenarios.json`** (Tier A) or optional **`s
6969

7070
## Status
7171

72-
| Area | State |
73-
| ----------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
74-
| Tier A runner + CI | **`bun run test:golden`** in `check` |
75-
| Tier A scenario coverage | Scenarios across every indexed table — files, symbols, imports, exports, components, dependencies, markers, type_members, calls, CSS vars/classes/keyframes — plus bundled-recipe smoke tests. Current count: `bun src/index.ts query --json "SELECT COUNT(*) AS n FROM markers"` against the fixture, or count rows in [scenarios.json](../fixtures/golden/scenarios.json) |
76-
| Tier B external + schema | **`test:golden:external`**, Zod in **`scripts/query-golden/schema.ts`** |
77-
| Subset matchers + budgets | **`match`**, **`budgetMs`**, **`--strict-budget`** |
78-
| Optional CI for public corpus | Deferred — [roadmap § Backlog](./roadmap.md#backlog) |
72+
| Area | State |
73+
| ----------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
74+
| Tier A runner + CI | **`bun run test:golden`** in `check` |
75+
| Tier A scenario coverage | Scenarios across every indexed table — files, symbols, imports, import_specifiers, exports, components, dependencies, markers, type_members, calls, scopes, references, bindings, function_params, runtime_markers, test_suites, re_export_chains, module_cycles, file_metrics, CSS vars/classes/keyframes — plus bundled-recipe smoke tests. Current count: `bun src/index.ts query --json "SELECT COUNT(*) AS n FROM markers"` against the fixture, or count rows in [scenarios.json](../fixtures/golden/scenarios.json) |
76+
| Tier B external + schema | **`test:golden:external`**, Zod in **`scripts/query-golden/schema.ts`** |
77+
| Subset matchers + budgets | **`match`**, **`budgetMs`**, **`--strict-budget`** |
78+
| Optional CI for public corpus | Deferred — [roadmap § Backlog](./roadmap.md#backlog) |
7979

8080
---
8181

0 commit comments

Comments
 (0)