Status: open (tiers 7–13) · tiers 1–6 shipped — live tables and `SCHEMA_VERSION` in architecture.md § Schema / `src/db.ts`. Apply executor + eight diff-shape recipes shipped — architecture.md § Apply.

Per-tier ship status (fact-checked 2026-05-19): Tiers 1–6 remainder shipped. Tier headings carry the PR landing date for that slice; the remainder wave closed 2026-05-19 (tiers 1–6 foundation landed 2026-05-14–15).

- Tier 1: call-shape columns, side-effect `import_specifiers` + `import_id`.
- Tier 2: `bindings.resolution_kind='re-exported'`.
- Tier 3: `jsx_elements` / `jsx_attributes`.
- Tier 5: `async_calls`, `try_catch`, `decorators`, `jsdoc_tags`.
- Tier 4 partial: `symbols.{return_type,is_async,is_generator}`; `generic_params` / `type_predicates` deferred.
- Tier 6 partial: `dynamic_imports`, `files.{is_barrel,has_side_effects}`; `files.is_entry` deferred to c9-plugin-layer.md.
- Tiers 7–13 open.

Motivator: Codemap's distinctive value is the SQL-against-structural-index substrate. Per Moat B — "Extracted structure ≥ verdicts. Schema breadth is the substrate every recipe layers on." — the load-bearing growth axis is what oxc / Lightning CSS / config loaders give us that the index doesn't yet expose. Tiers 1–6 shipped: position-precise calls/imports/exports, references/scopes/bindings, JSX, behavioral facts, module-graph flags, and more — see architecture § Schema. Tiers 7–13 below still enumerate CSS rule depth, project meta, ORM/SQL tracking, and other AST surfaces we discard at parse time today. Each remaining tier ships as an independent tracer-bullet PR that compounds into a maximal substrate.

Tier: XL effort (~3-4 months) spread across ~13 sequential tracer-bullet PRs. No single PR is large; the value compounds. Each tier ships as its own vertical slice (parser → schema → migration → recipes → tests → docs) per tracer-bullets.

Goal stated by the user: "extract as much as possible from the AST and enrich the sqlite db tables, that then unlocks the capabilities we are discussing in this topic AND MORE." The "AND MORE" is the explicit invitation to think past the synthesis doc's write-engine focus — tiers 9–13 below land that.
- Pre-locked decisions
- Open decisions
- Architecture
- The 13 tiers
- Sequencing (DAG)
- Capability matrix — what unlocks post-extraction
- Operational considerations
- What's NOT in scope
- Lifecycle
- Primitive sources + internal cross-references
These commit before any PR opens. Questions opened against them must justify against the linked sources.
| # | Decision | Source |
|---|---|---|
| R.1 | Single-pass extraction. All tier extractors run in one oxc walk per file. No multi-pass over the same AST. Visitor-mode extractors register callbacks per node type; the walk is shared. Performance and correctness — one tree-walk per file is the cheapest contract. | oxc Visitor API; existing extractFileData in parser.ts |
| R.2 | Additive schema. All new substrate is new columns on existing tables OR new tables linked via foreign key. Existing recipes don't break. Schema version bumps trigger one-shot reindex on consumer upgrade (per current SCHEMA_VERSION pattern). | Existing SCHEMA_VERSION reconciliation in db.ts + architecture.md schema § |
| R.3 | Tier-independent extractors. Proposed capability: each tier's extractor can be enabled / disabled via .codemap/config.{ts,js,json} extraction.<tier>: false. Status 2026-05-18: not implemented; current config has fts5, recipeRecency, and boundaries, but no extraction object. | Defensive — keeps the substrate growth path opt-out-friendly per the existing fts5: true / boundaries: … config patterns |
| R.4 | Bindings cascade on file change. Incremental reindex of file X invalidates references + bindings + scopes rows for X; recomputes them. Other files' bindings to symbols defined in X don't auto-invalidate — they're recomputed lazily on the next access OR on full rebuild. Acceptable staleness for the common case (consumer edits implementation; consumers' references still resolve correctly until name change). | SQLite cascade semantics; ON DELETE CASCADE already used pervasively |
| R.5 | Position convention. Lines 1-indexed, columns 0-indexed (byte offsets within line). Matches existing line_number / line_start convention and oxc's native offset format. Mismatched conventions inside one row are a silent foot-gun. | oxc emits byte offsets; existing offsetToLine already converts |
| R.6 | Column-precise = identifier-token-precise. column_start / column_end are the byte offsets of the actual name / element token, NOT the containing expression's offsets. So foo() records column_start = position of foo, column_end = position after o, not after ). Matches what a rename engine wants. | LSP Location convention; same as tsserver's reference response |
| R.7 | Recipes own visibility. New extracted facts are queryable substrate; recipes decide what to surface as findings / fixes / actions. No bare verdicts at extraction time. Same discipline as audit verdict defer per roadmap backlog. | Moat A — verdicts are output mode |
| R.8 | No JS execution at extract time. oxc parses; we walk; we record. Same floor as today's index. No eval, no dynamic resolution, no LLM in the box. | Floors "No JS execution at index time" |
| R.9 | No hard size ceiling; soft warn at >5× DB growth. Empirical measurement on four real fixtures with a minimal references-only probe (one of the heaviest single tiers in isolation) showed consistent ~3.6-4.5× DB growth at one tier. Projecting all 13 tiers conservatively: ~5-10× growth. SQLite handles 200-500 MB DBs trivially. Users hitting pain on large monorepos opt out of expensive tiers via R.3 — that's the safety valve, not a global ceiling. | Measured 2026-05-14, four fixtures spanning ~900-2,100 files (see § Operational considerations § Index size growth) |
| R.10 | Latency budget tied to user-visible operations, not DB size. Soft warn when full reindex > 30s OR targeted reindex > 500ms. Measured worst-case (one tier, largest fixture ~2,100 files / 28k symbols): full ~1.9s, targeted ~15ms. Both ~10-60× under the user-stated bottleneck threshold (1 min full / sub-second targeted). Full 13-tier projection still well under budget. | Measured 2026-05-14 (see § Operational considerations § Reindex performance) |
| R.11 | Hand-rolled scope walker in the existing oxc visitor. No library dep. oxc-parser explicitly doesn't construct scopes; no NAPI binding for oxc-semantic yet. Existing scopeStack in parser.ts (used for cyclomatic complexity + call-site scope) extends to a full scope graph. Edge cases (TS namespace merge, declaration hoisting, TDZ) handled conservatively. Status 2026-05-19: the shipped bindings.resolution_kind enum is same-file / imported / re-exported / global / unresolved; the originally proposed ambiguous escape valve did not ship. | oxc-parser's showSemanticErrors doc explicitly says "the parser does not construct symbols and scopes"; existing scopeStack infrastructure in parser.ts |
| R.12 | Pre-resolve bindings at index time (two-pass). Pass 1 (per file, in worker): extract refs, scopes, local declarations. Pass 2 (main thread, after all files parsed): walk references rows; resolve via same-file scope-walk → imports → exports → re-export chain; populate bindings. Same architecture as today's resolver.ts two-pass for dependencies. Cost: ~25-50% on top of refs-only reindex (projected worst case ~3-4s full on the largest fixture; well under R.10 budget). Recipes get a single-JOIN bindings → symbols instead of recursive-CTE-per-recipe. R.4 cascade extends: single-file reindex deletes that file's bindings rows AND any binding referencing symbols in that file. | Existing resolver.ts two-pass pattern; dependencies table as precedent |
| R.13 | references.is_write distinguishes reads from writes. Boolean column populated by parent-node-shape check during the visitor pass (AssignmentExpression.left, UpdateExpression, delete, AssignmentPattern, VariableDeclarator.id with initializer, ForOfStatement.left, ForInStatement.left). Compound assignment (x += 1) emits TWO references rows — one with is_write = 0 (the read) and one with is_write = 1 (the write) — at the same (file_path, line_start, column_start). Substrate honesty: recipes that want a single-row-per-position can SELECT DISTINCT. Unlocks immutability audits, side-effect detection, cross-file mutation tracking. | Cost trivial (one column + ~10 lines of visitor logic); recipe-unlock substantial (no other way to express "find writes to X" without external AST walk) |
| R.14 | FTS5 stays file-content-only. New substrate tables (references, jsx_elements, function_params, decorators, test_suites, …) are NOT indexed via FTS5 by default. Every name / identifier column gets a regular B-tree index, which covers exact match + anchored prefix (LIKE 'use%' / GLOB 'use*') at O(log N). FTS5 only helps unanchored substring search; the row counts at every tier remain small enough (~10-500k) that an unanchored LIKE '%foo%' scan still completes in tens of milliseconds. Cost saved: ~25-90 MB of FTS5 storage per project across all 13 tiers. Per-tier opt-in path: a tier PR can add FTS5 on its own table when a concrete recipe requires unanchored search — schema-additive, no breaking change. | Existing source_fts keeps its current shape (file-content full-text); empirical row-count + B-tree-index-perf argument; substrate stays lean |
| R.15 | Tier-level opt-out via .codemap/config extraction: { … }; human-readable feature names; Tier 1 always on; orm default-off, others default-on. Config keys are capability-shaped (references, jsx, types, behavioral, moduleGraph, css, projectMeta, tests, runtimeMarkers, metrics, moduleTopology, orm) — never tier numbers or table names. Status 2026-05-18: proposed only; codemapUserConfigSchema has no extraction object today. | Matches user's mental model (capabilities, not tables); existing config patterns are single-flag-per-feature (fts5: true, boundaries: […], recipeRecency: false) |
| R.16 | Every rebuild-forcing tier bumps SCHEMA_VERSION; full rebuild on mismatch; no in-place migrations. Existing schema-mismatch logic (createSchema() wrapping dropAll() + createTables() + createIndexes()) handles rebuild-forcing upgrades transparently. User-data tables (coverage, query_baselines, recipe_recency) stay protected via the existing dropAll() exclusion list; config-derived boundary_rules is intentionally rebuilt, not preserved. Empirical worst case across measured fixtures: full rebuild ~2s on a 28k-symbol enterprise app. Reject in-place ALTER TABLE migration scripts until concrete demand emerges. | Empirical rebuild cost (R.10); existing dropAll() exclusion list protects user data while derivable/config data rebuilds |
| R.17 | Extractor modules (src/extractors/*.ts) are partially shipped, but not the proposed per-tier registry. Current source has dedicated extractor helpers and parser.ts is smaller than the original monolith, but extractors still run through the existing parser orchestration and are not filtered through cfg.extraction[tierId]. Future tier PRs should extend the existing extractor module layout instead of assuming the proposed register(visitor, ctx) API exists. | Existing extractor modules + parser.ts; LanguageAdapter pattern in src/adapters/builtin.ts remains the precedent for first-class pluggable units |
| R.18 | Every tier PR ships ≥1 flagship recipe + golden fixture. Definition-of-Done for every tier PR: (a) substrate (schema + extractor + extractor tests); (b) one bundled recipe under templates/recipes/<id>.{sql,md} exercising the new substrate via real JOIN paths; (c) one golden fixture under fixtures/golden/<recipe-id>.json so the recipe is regression-tested in CI. Flagship recipe designated in the tier's plan section (currently lists 3-7 "Recipes unlocked" candidates — one gets marked "flagship" per tier). Additional candidate recipes bundle in same PR if cheap, or ship as follow-ups at author discretion. Extension recipes (e.g. Tier 5's calls.{line_start, column_start} letting rename-preview grow a call_rows CTE) ship in the same PR as their substrate. Validates substrate at ship time; catches schema-shape mistakes via real query exercise; honors Moat A reverse-test ("if we remove this column, what recipe dies?"). Avoid hardcoding recipe counts; derive the current catalog from templates/recipes/*.sql or codemap query --recipes-json. | Moat A's reviewer test demands substrate be queryable |
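The position convention in R.5 and R.6 can be illustrated with a small conversion sketch. This is not the project's `offsetToLine`; `buildLineStarts` and `offsetToPosition` are hypothetical names, and the example assumes ASCII source so that string indices and byte offsets coincide.

```typescript
// Hypothetical helpers for illustration; not the project's actual offsetToLine.
// Assumes ASCII source, so string indices and byte offsets coincide.
interface LineColumn {
  line: number;   // 1-indexed (R.5)
  column: number; // 0-indexed offset within the line (R.5)
}

// Record the offset at which every line begins.
function buildLineStarts(source: string): number[] {
  const starts: number[] = [0];
  for (let i = 0; i < source.length; i++) {
    if (source[i] === "\n") starts.push(i + 1);
  }
  return starts;
}

// Binary-search for the last line start that is <= offset.
function offsetToPosition(lineStarts: number[], offset: number): LineColumn {
  let lo = 0;
  let hi = lineStarts.length - 1;
  while (lo < hi) {
    const mid = (lo + hi + 1) >> 1;
    if ((lineStarts[mid] ?? 0) <= offset) lo = mid;
    else hi = mid - 1;
  }
  return { line: lo + 1, column: offset - (lineStarts[lo] ?? 0) };
}

const sampleSource = "const a = 1;\nfoo(a);\n";
const sampleLineStarts = buildLineStarts(sampleSource);
const fooOffset = sampleSource.indexOf("foo");

// R.6: record the identifier token `foo`, not the whole call `foo(a)`.
const tokenStart = offsetToPosition(sampleLineStarts, fooOffset);
const tokenEnd = offsetToPosition(sampleLineStarts, fooOffset + "foo".length);
console.log(tokenStart, tokenEnd);
```

Here the call on the second line yields `line = 2`, `column_start = 0`, `column_end = 3`: the end column stops after the `o` of `foo`, not after the closing parenthesis.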
Each gets a "Resolution" subsection below as it crystallises (mirrors lsp-diagnostic-push.md pattern). Numbered for stable citation from future plan PRs.
- Q1 — `references` resolution strategy. RESOLVED 2026-05-14 — promoted to R.11. Hand-rolled scope walker in existing oxc visitor; no library dep; reuses single-pass extraction. The shipped conservative fallback is `resolution_kind = 'unresolved'`; the originally proposed `ambiguous` enum value did not ship.
- Q2 — Multi-file binding resolution. RESOLVED 2026-05-14 — promoted to R.12. Pre-resolve at index time (two-pass), same architecture as today's `resolver.ts`. Pays the cost once at index time; recipes get cheap single-JOIN access.
- Q3 — Type-text stringification fidelity. Today `symbols.signature` stringifies types via `stringifyTypeNode`. Tier 4 extends to per-param + per-generic + return-type + predicate-target. Same stringification approach? Or shift to a richer normalized form (canonicalize whitespace; sort union members; etc.)? Plan PR for Tier 4 settles.
- Q4 — JSX element parent linking. `jsx_elements.parent_element_id` requires either second pass (after the entire tree is parsed) or order-of-emit guarantee (parent visited before children with stable IDs). oxc walks top-down by default; record IDs eagerly and link in a post-emit pass within the same parser invocation.
- Q5 — Loop / try / scope context tracking. Walking the AST top-down — how does `async_calls.in_loop` know it's inside a loop? Maintain a context stack (push on enter ForStatement/WhileStatement/etc., pop on exit). Same for `in_try` / `in_async_fn`. Visitor state shape settles in Tier 5 PR.
- Q6 — Decorator target resolution. Decorators in source appear BEFORE the symbol they decorate. Resolution requires post-pass linking — record decorator nodes with their position, then link to the following ClassDeclaration / MethodDefinition / PropertyDefinition once visited. Same pattern as Q4.
- Q7 — JSDoc tag schema. Free-form `description` text per tag, OR structured per-tag-shape (each `@param` parsed into `name` + `type_text` + `description`)? Bias toward structured — query power is the point. Settle in Tier 5 PR.
- Q8 — Test-framework detection. `describe` / `it` / `test` are global functions in test files. Detect by: (a) config glob (`test: ['**/*.test.ts', '**/*.spec.ts']`); (b) file extension match (`.test.`, `.spec.`); (c) import-presence check (`from 'vitest'` / `'@jest/globals'` / `'node:test'`). Bias toward (b) + (c) — file extension as cheap default; import-presence as strong signal.
- Q9 — Index size budget. RESOLVED empirically 2026-05-14 — promoted to R.9. Four-fixture probe (one tier, references-only). DB grows ~4× at one tier; projected ~5-10× at full 13 tiers. No hard ceiling; per-tier opt-out (R.3) is the safety valve. Full table in § Operational considerations.
- Q10 — Reindex performance regression. RESOLVED empirically 2026-05-14 — promoted to R.10. Full reindex ~2-2.6× slower at one tier; targeted reindex stays flat (~10-30ms regardless of project size). Largest fixture measured: ~1.9s full / 15ms targeted. Full table in § Operational considerations.
- Q11 — Per-tier opt-out shape. RESOLVED 2026-05-14 — promoted to R.15. Tier-level opt-out with capability-shaped names; Tier 1 always on; `orm` default-off; others default-on.
- Q12 — FTS5 integration. RESOLVED 2026-05-14 — promoted to R.14. FTS5 stays file-content-only; new substrate columns get regular B-tree indexes; per-tier opt-in path stays open for concrete recipe demand.
- Q13 — Worker-thread message shape. Today `parse-worker.ts` emits one `ParsedFile` message per file. With many tiers, that message becomes large (~10-20KB per file → ~100-200KB). Worker IPC handles this fine; no architectural change needed but plan PR confirms.
- Q14 — In-place schema migration. RESOLVED 2026-05-14 — promoted to R.16. Every tier bumps `SCHEMA_VERSION`; full rebuild on mismatch; reject in-place migrations. Empirical rebuild cost (~2s worst case) makes optimisation unjustified.
- Q16 — Extractor-registration architecture. RESOLVED 2026-05-14 — decision locked in R.17. Target shape is per-tier modules under `src/extractors/<tier>.ts`; not implemented as the `TierExtractor { register(visitor, ctx) }` registry — extend today's extractor layout per R.17. (Question added during the grill — not in the original Q1-Q15 numbering.)
- Q15 — Indexing strategy on new tables. SQLite indexes for the new tables — which columns get B-tree indexes? `references(file_path, name)`, `references(resolved_symbol_id)`, `jsx_elements(component_name)`, `bindings(resolved_symbol_id)` are the obvious ones. Plan PR for each tier settles its indexing strategy.
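The context tracking raised in Q5 can be sketched as follows. This is a minimal illustration over a hypothetical `AstNode` shape, not oxc's real node types. It uses depth counters, which behave like the push/pop context stack when only a boolean answer is needed; the eventual visitor state shape is still open per Q5.

```typescript
// Hypothetical minimal AST shape for illustration; real oxc nodes carry far more fields.
interface AstNode {
  type: string;
  start: number;
  children?: AstNode[];
}

interface AwaitFact {
  start: number;
  inLoop: boolean;
  inTry: boolean;
}

const LOOP_TYPES = new Set([
  "ForStatement",
  "ForOfStatement",
  "ForInStatement",
  "WhileStatement",
  "DoWhileStatement",
]);

function collectAwaitFacts(root: AstNode): AwaitFact[] {
  const facts: AwaitFact[] = [];
  let loopDepth = 0; // incremented on enter, decremented on exit
  let tryDepth = 0;

  const visit = (node: AstNode): void => {
    const isLoop = LOOP_TYPES.has(node.type);
    const isTry = node.type === "TryStatement";
    if (isLoop) loopDepth++;
    if (isTry) tryDepth++;

    if (node.type === "AwaitExpression") {
      facts.push({ start: node.start, inLoop: loopDepth > 0, inTry: tryDepth > 0 });
    }
    for (const child of node.children ?? []) visit(child);

    if (isLoop) loopDepth--;
    if (isTry) tryDepth--;
  };

  visit(root);
  return facts;
}

const sampleTree: AstNode = {
  type: "Program",
  start: 0,
  children: [
    { type: "ForOfStatement", start: 0, children: [{ type: "AwaitExpression", start: 10 }] },
    { type: "TryStatement", start: 30, children: [{ type: "AwaitExpression", start: 40 }] },
    { type: "AwaitExpression", start: 70 },
  ],
};
const awaitFacts = collectAwaitFacts(sampleTree);
console.log(awaitFacts);
```

The sketch simplifies in two ways a real extractor would need to handle: it does not reset the counters at nested function boundaries, and it treats the whole `TryStatement` (handler and finalizer included) as "in try".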
┌─────────────────────────────────────────────────────────────┐
│ parse-worker.ts (one Worker thread per file) │
│ │
│ ┌────────────────────────────────────────────────────┐ │
│ │ oxc-parser.parseSync(filePath, source, lang) │ │
│ │ ↳ returns ASTRoot │ │
│ └─────────────────────┬──────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────┐ │
│ │ Visitor (single tree-walk) │ │
│ │ │ │
│ │ on each node, dispatch to registered extractors:│ │
│ │ • Tier 1 extractor (positions on existing) │ │
│ │ • Tier 2 extractor (references + scopes) │ │
│ │ • Tier 3 extractor (JSX) │ │
│ │ • Tier 4 extractor (type depth) │ │
│ │ • Tier 5 extractor (behavioral) │ │
│ │ • Tier 9 extractor (test suites) │ │
│ │ • Tier 10 extractor (suppressions/markers) │ │
│ │ • Tier 11 extractor (metrics) │ │
│ │ │ │
│ │ each extractor maintains its own per-file state │ │
│ │ (scope stack, loop context, decorator pending) │ │
│ └─────────────────────┬──────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌────────────────────────────────────────────────────┐ │
│ │ ParsedFile message (rich) │ │
│ │ ↳ symbols, imports, exports, calls, │ │
│ │ references, scopes, jsx_elements, … │ │
│ └─────────────────────┬──────────────────────────────┘ │
└─────────────────────────┼────────────────────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────────┐
│ index-engine.ts (main thread) │
│ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Pass 1 — file-local inserts (transactional per file)│ │
│ │ ↳ symbols, imports, exports, calls, … │ │
│ │ ↳ references (file-local resolution) │ │
│ │ ↳ scopes │ │
│ │ ↳ jsx_elements + attributes │ │
│ │ ↳ … │ │
│ └──────────────────────────────────────────────────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────────────────────────────────────────┐ │
│ │ Pass 2 — cross-file binding resolution │ │
│ │ ↳ resolve `references` to `symbols` via │ │
│ │ imports + exports JOIN │ │
│ │ ↳ populate `bindings` │ │
│ │ ↳ resolve decorator targets │ │
│ │ ↳ resolve JSX element parent links │ │
│ │ ↳ flatten re_export_chains │ │
│ │ ↳ compute module_graph_facts (Tier 12) │ │
│ └──────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────┘
Pass 2 runs after all files complete pass 1. Incremental reindex of N files runs pass 1 per file + a scoped pass 2 over the changed files' binding closure.
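The single shared walk from R.1 can be sketched as below. `SharedWalk`, `AstNode`, and the two toy extractors are hypothetical names for illustration only; per R.17 the proposed `register(visitor, ctx)` registry is not implemented, so this shows the one-walk idea rather than the current code layout.

```typescript
// Hypothetical shapes; the real visitor is oxc's, and R.17 notes no registry API exists today.
interface AstNode {
  type: string;
  children?: AstNode[];
}
type Handler = (node: AstNode) => void;

class SharedWalk {
  private handlers = new Map<string, Handler[]>();

  // Extractors subscribe to node types; none of them walks the tree itself.
  on(type: string, handler: Handler): void {
    const list = this.handlers.get(type) ?? [];
    list.push(handler);
    this.handlers.set(type, list);
  }

  // Visit every node exactly once and dispatch to all interested handlers.
  walk(root: AstNode): number {
    let visited = 0;
    const visit = (node: AstNode): void => {
      visited++;
      for (const handler of this.handlers.get(node.type) ?? []) handler(node);
      for (const child of node.children ?? []) visit(child);
    };
    visit(root);
    return visited;
  }
}

// Two toy extractors, each keeping its own per-file state.
const callState = { count: 0 };
const jsxState = { count: 0 };

const sharedWalk = new SharedWalk();
sharedWalk.on("CallExpression", () => { callState.count++; });
sharedWalk.on("JSXElement", () => { jsxState.count++; });

const sampleProgram: AstNode = {
  type: "Program",
  children: [
    { type: "CallExpression", children: [{ type: "CallExpression" }] },
    { type: "JSXElement" },
  ],
};
const visitedCount = sharedWalk.walk(sampleProgram);
console.log(visitedCount, callState.count, jsxState.count);
```

Adding a tier in this model means adding handlers, not adding another traversal, which is the cost property R.1 is after.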
Per R.2 + Q14: bump SCHEMA_VERSION only when a tier's DDL forces a rebuild; additive tables / columns can land through CREATE ... IF NOT EXISTS. The first time a user hits a rebuild-forcing schema change, the index drops and rebuilds. ~30s on a 100k-symbol project; one-time cost.
Alternative for advanced users: a future codemap migrate --in-place command runs additive ALTER TABLE for new columns + extracts new tables from existing files without re-parsing. Defer until cheap-migration demand surfaces.
Per Q13: today's parse-worker.ts emits ParsedFile. Extend the message shape additively — new fields per tier; existing fields unchanged. Workers don't need new IPC infrastructure; only the message-shape contract grows.
One-tier projection (extrapolated from the references-only probe measured below in § Operational considerations) holds steady at ~4× DB growth. Multi-tier projection assumes additive cost across tiers — most other tiers extract substantially less data than references (positions on existing tables, scope graph, JSX attributes, etc. each add far fewer rows). Conservative multi-tier estimate: ~5-10× growth across all 13 tiers.
| Project size (measured) | Pre-extraction DB | All-13-tier projected DB | Pre-extraction reindex | All-13-tier projected reindex |
|---|---|---|---|---|
| Small (~900 files, 11k symbols) | ~11 MB | ~60-110 MB | ~280 ms | ~1-2 s |
| Medium-docs (~1.8k files, 8k symbols) | ~10 MB | ~50-100 MB | ~310 ms | ~1-2 s |
| Medium-code (~1.8k files, 27k symbols) | ~18 MB | ~90-180 MB | ~570 ms | ~3-5 s |
| Large-app (~2.1k files, 28k symbols) | ~38 MB | ~190-380 MB | ~740 ms | ~4-6 s |
All four projections sit well under the Floors-relevant "codemap becomes a bottleneck" thresholds the user set (full > 1 min, targeted > 1 s). Accept the growth — the database is the product. The proposed extraction config would let monorepo users opt out of expensive tiers, but that config surface is not implemented today.
Each tier is one tracer-bullet PR: parser visitor change + schema migration + 1-2 example recipes + tests + docs entry. Sections below capture: Goal (one sentence), Schema delta (DDL), Visitor strategy (key extraction logic), Recipes unlocked (example queries + new recipe candidates), Effort (S/M/L with week estimate), Dependencies (other tiers that must ship first), Tier-specific open questions.
Goal: Make calls / exports / symbols / markers column-precise; split imports.specifiers JSON blob into a typed child table.
Ship status (fact-checked 2026-05-19): Tier 1 remainder shipped — calls.{args_count,is_method_call,is_constructor_call,is_optional_chain}, side-effect import_specifiers rows, and import_id FK. Position columns from 2026-05-14 remain.
| Slice | Substrate | Flagship recipe | Schema bump |
|---|---|---|---|
| 1.A | calls.{line_start, column_start, column_end} + idx_calls_position; call-shape flags {args_count,is_method_call,is_constructor_call,is_optional_chain} shipped 2026-05-19 | find-call-sites (--params callee=…) | 10 → 11 |
| 1.B | exports.{line_start, line_end, column_start, column_end, is_re_export} + 2 indexes | find-export-sites (--params name=…) | 11 → 12 |
| 1.C | symbols.{name_column_start, name_column_end} + markers.{column_start, column_end} | find-symbol-definitions (--params name=…) | 12 → 13 |
| 1.D | import_specifiers child table (file_path, source, line, column_start/end, imported_name, local_name, kind, is_type_only) + 4 indexes | find-import-sites (--params imported_name=…) | 13 → 14 |
Empirical post-Tier-1 cost (clean rebuild, median of 3 runs):
| Fixture | Pre-Tier-1 DB | Post-Tier-1 DB | Δ DB | Pre-Tier-1 full | Post-Tier-1 full | Δ full | Targeted (post) |
|---|---|---|---|---|---|---|---|
| codemap-self (924 files, 11.7k symbols) | 11.4 MB | 14.3 MB | +25% | ~280 ms | ~300 ms | +7% | ~15 ms |
| large-app (2,120 files, 28.5k symbols) | 37.5 MB | 50.1 MB | +33% | ~740 ms | ~900 ms | +22% | ~16 ms |
Targeted reindex stays flat (~15 ms regardless of project size — Tier 1's adds are per-row not whole-table). Full reindex worst case: 0.9 s on a 2.1k-file enterprise React app — 66× under R.10's 1-min pain threshold. DB growth (+25-33%) is well under R.9's "~5-10× total across 13 tiers" projection: measured against that +400-900% headroom, Tier 1 used roughly 3-8% of the budget.
Validation: 930/930 tests pass · 19 golden scenarios pass (including 4 new) · format clean · lint 0/0 · row counts preserved for unchanged tables.
Schema delta:
```sql
-- Delta notation only: per R.16 these columns land via a full rebuild, not an in-place ALTER.
-- (SQLite rejects ADD COLUMN ... NOT NULL unless a non-NULL DEFAULT is supplied.)
ALTER TABLE calls ADD COLUMN line_start INTEGER NOT NULL;
ALTER TABLE calls ADD COLUMN column_start INTEGER NOT NULL;
ALTER TABLE calls ADD COLUMN column_end INTEGER NOT NULL;
ALTER TABLE calls ADD COLUMN args_count INTEGER NOT NULL;
ALTER TABLE calls ADD COLUMN is_method_call INTEGER NOT NULL DEFAULT 0;
ALTER TABLE calls ADD COLUMN is_constructor_call INTEGER NOT NULL DEFAULT 0;
ALTER TABLE calls ADD COLUMN is_optional_chain INTEGER NOT NULL DEFAULT 0;

ALTER TABLE exports ADD COLUMN line_start INTEGER NOT NULL;
ALTER TABLE exports ADD COLUMN line_end INTEGER NOT NULL;
ALTER TABLE exports ADD COLUMN column_start INTEGER NOT NULL;
ALTER TABLE exports ADD COLUMN column_end INTEGER NOT NULL;
ALTER TABLE exports ADD COLUMN is_re_export INTEGER NOT NULL DEFAULT 0;

ALTER TABLE symbols ADD COLUMN name_column_start INTEGER NOT NULL;
ALTER TABLE symbols ADD COLUMN name_column_end INTEGER NOT NULL;

ALTER TABLE markers ADD COLUMN column_start INTEGER NOT NULL;
ALTER TABLE markers ADD COLUMN column_end INTEGER NOT NULL;

CREATE TABLE import_specifiers (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  file_path TEXT NOT NULL REFERENCES files(path) ON DELETE CASCADE,
  import_id INTEGER NOT NULL REFERENCES imports(id) ON DELETE CASCADE,
  line INTEGER NOT NULL,
  column_start INTEGER NOT NULL,
  column_end INTEGER NOT NULL,
  imported_name TEXT NOT NULL,
  local_name TEXT NOT NULL,
  kind TEXT NOT NULL CHECK (kind IN ('named','default','namespace','side-effect')),
  is_type_only INTEGER NOT NULL DEFAULT 0
) STRICT;
CREATE INDEX idx_import_specifiers_name ON import_specifiers(imported_name);
CREATE INDEX idx_import_specifiers_filepath ON import_specifiers(file_path);
```

Visitor strategy: Every existing extractor that records a name records `node.name.start` + `node.name.end` (the identifier token, not the parent expression). For `CallExpression`, distinguish `Identifier` callee from `MemberExpression` callee (sets `is_method_call`); flag `NewExpression` (`is_constructor_call`); flag `ChainExpression` parent (`is_optional_chain`); count `arguments.length` (`args_count`). For imports, walk each `ImportSpecifier` / `ImportDefaultSpecifier` / `ImportNamespaceSpecifier` and emit `import_specifiers` rows.
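A minimal sketch of the call-shape classification described in the visitor strategy, assuming heavily trimmed ESTree-style node shapes. `toCallRow` and the type names are hypothetical, and offsets double as columns because the sample is a single-line source.

```typescript
// Hypothetical, heavily trimmed ESTree-style shapes; real oxc nodes carry more fields.
type Ident = { type: "Identifier"; name: string; start: number; end: number };
type Member = { type: "MemberExpression"; object: Expr; property: Ident };
type Call = { type: "CallExpression" | "NewExpression"; callee: Expr; arguments: unknown[] };
type Expr = Ident | Member | Call;

interface CallRow {
  callee_name: string;
  column_start: number; // start of the name token (single-line source assumed)
  column_end: number;   // end of the name token, not of the closing paren
  args_count: number;
  is_method_call: 0 | 1;
  is_constructor_call: 0 | 1;
  is_optional_chain: 0 | 1;
}

// parentIsChain: whether the call sits directly under a ChainExpression (a?.b()).
function toCallRow(node: Call, parentIsChain: boolean): CallRow | null {
  const callee = node.callee;
  const token =
    callee.type === "Identifier" ? callee
    : callee.type === "MemberExpression" ? callee.property
    : null; // e.g. an immediately invoked call result; skipped in this sketch
  if (token === null) return null;
  return {
    callee_name: token.name,
    column_start: token.start,
    column_end: token.end,
    args_count: node.arguments.length,
    is_method_call: callee.type === "MemberExpression" ? 1 : 0,
    is_constructor_call: node.type === "NewExpression" ? 1 : 0,
    is_optional_chain: parentIsChain ? 1 : 0,
  };
}

// obj.foo(1, 2)  ->  `obj` spans 0..3, `foo` spans 4..7
const methodCall: Call = {
  type: "CallExpression",
  callee: {
    type: "MemberExpression",
    object: { type: "Identifier", name: "obj", start: 0, end: 3 },
    property: { type: "Identifier", name: "foo", start: 4, end: 7 },
  },
  arguments: [1, 2],
};
// new Bar()  ->  `Bar` spans 4..7
const constructorCall: Call = {
  type: "NewExpression",
  callee: { type: "Identifier", name: "Bar", start: 4, end: 7 },
  arguments: [],
};

const methodRow = toCallRow(methodCall, false);
const constructorRow = toCallRow(constructorCall, false);
console.log(methodRow, constructorRow);
```

The sketch counts spread arguments as one each; whether `args_count` should treat them differently is the open question (a) below.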
Recipes unlocked:
```sql
-- Direct (non-method) calls to foo.
SELECT * FROM calls WHERE callee_name = 'foo' AND is_method_call = 0;

-- Every import of a given specifier name.
SELECT * FROM import_specifiers WHERE imported_name = 'oldName';
```

New recipe candidates: `dedupe-imports`, `consolidate-type-only-imports`, `stale-imports` (column-precise specifier delete).
Effort: S (~1 week). All visitor logic already runs; just record positions.
Dependencies: None.
Tier-specific open questions:
- (a) Should `args_count` distinguish spread args (`foo(...args)` → unknown count)? Bias toward `NULL` for spread-containing calls.
- (b) `import_specifiers.local_name` vs `imported_name` — `import { foo as bar }` → `imported_name='foo'`, `local_name='bar'`. Both columns required.
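The `imported_name` / `local_name` split in (b) can be sketched as a mapping from one import declaration to its child rows. The node shapes and `toSpecifierRows` are hypothetical; storing `'default'` and `'*'` as `imported_name`, and empty strings for a side-effect import, are assumptions of this sketch rather than the shipped encoding.

```typescript
// Hypothetical, trimmed specifier shapes; names are plain strings here for brevity.
type Specifier =
  | { type: "ImportSpecifier"; imported: string; local: string }
  | { type: "ImportDefaultSpecifier"; local: string }
  | { type: "ImportNamespaceSpecifier"; local: string };

interface ImportDecl {
  source: string;
  specifiers: Specifier[];
}

interface SpecifierRow {
  imported_name: string;
  local_name: string;
  kind: "named" | "default" | "namespace" | "side-effect";
}

function toSpecifierRows(decl: ImportDecl): SpecifierRow[] {
  if (decl.specifiers.length === 0) {
    // import "./polyfill": one row so the import stays queryable (empty names are an assumption).
    return [{ imported_name: "", local_name: "", kind: "side-effect" }];
  }
  return decl.specifiers.map((spec): SpecifierRow => {
    switch (spec.type) {
      case "ImportSpecifier":
        return { imported_name: spec.imported, local_name: spec.local, kind: "named" };
      case "ImportDefaultSpecifier":
        return { imported_name: "default", local_name: spec.local, kind: "default" };
      case "ImportNamespaceSpecifier":
        return { imported_name: "*", local_name: spec.local, kind: "namespace" };
    }
  });
}

// import React, { foo as bar } from "x";
const mixedRows = toSpecifierRows({
  source: "x",
  specifiers: [
    { type: "ImportDefaultSpecifier", local: "React" },
    { type: "ImportSpecifier", imported: "foo", local: "bar" },
  ],
});
// import "./polyfill";
const sideEffectRows = toSpecifierRows({ source: "./polyfill", specifiers: [] });
console.log(mixedRows, sideEffectRows);
```

Keeping both columns is what lets a rename recipe target the exported name (`foo`) without touching the local alias (`bar`).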
Status (fact-checked 2026-05-19): Tier 2 shipped including bindings.resolution_kind='re-exported' (2026-05-19). references, scopes, and bindings populated. Richer proposed kind taxonomy (decorator, shorthand-*, …) and bindings.namespace remain deferred.
Goal: Every identifier use — call, type position, JSX, decorator, shorthand, member access, spread — becomes a queryable row. Plus a lexical scope graph and per-reference binding resolution to the originating symbol.
Schema delta:
```sql
-- "references" is quoted because REFERENCES is an SQL keyword.
CREATE TABLE "references" (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  file_path TEXT NOT NULL REFERENCES files(path) ON DELETE CASCADE,
  name TEXT NOT NULL,
  line_start INTEGER NOT NULL,
  column_start INTEGER NOT NULL,
  column_end INTEGER NOT NULL,
  kind TEXT NOT NULL CHECK (kind IN (
    'value','type','jsx','decorator','shorthand-prop','shorthand-import',
    'member-access','computed-member','spread','rest','as-cast','typeof','keyof'
  )),
  scope_id INTEGER REFERENCES scopes(id),
  is_write INTEGER NOT NULL DEFAULT 0
) STRICT;
CREATE INDEX idx_references_name ON "references"(name);
CREATE INDEX idx_references_filepath ON "references"(file_path);
CREATE INDEX idx_references_scope ON "references"(scope_id);

CREATE TABLE scopes (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  file_path TEXT NOT NULL REFERENCES files(path) ON DELETE CASCADE,
  kind TEXT NOT NULL CHECK (kind IN (
    'module','function','arrow','block','class','for','catch','with','case'
  )),
  parent_scope_id INTEGER REFERENCES scopes(id),
  line_start INTEGER NOT NULL,
  line_end INTEGER NOT NULL,
  owner_symbol_id INTEGER REFERENCES symbols(id)
) STRICT;
CREATE INDEX idx_scopes_parent ON scopes(parent_scope_id);
CREATE INDEX idx_scopes_filepath ON scopes(file_path);

CREATE TABLE bindings (
  reference_id INTEGER PRIMARY KEY REFERENCES "references"(id) ON DELETE CASCADE,
  resolved_symbol_id INTEGER REFERENCES symbols(id),
  namespace TEXT NOT NULL CHECK (namespace IN ('value','type','member','default')),
  is_external INTEGER NOT NULL DEFAULT 0,
  resolution_kind TEXT NOT NULL CHECK (resolution_kind IN (
    'same-file','imported','re-exported','global','unresolved'
  ))
) STRICT;
CREATE INDEX idx_bindings_resolved ON bindings(resolved_symbol_id);
```

Visitor strategy: Two-pass per Q1 + Q2.
Pass 1 (per file, in worker): maintain a scope stack; push on entering Function*Declaration / ArrowFunction / BlockStatement / ClassDeclaration / ForStatement / CatchClause. Pop on exit. Every Identifier node visit records a references row with the current scope_id. Distinguish kinds per Q1 of Tier 2: value for normal reads, type for TSTypeReference containers, jsx for JSXIdentifier, decorator for Decorator children, shorthand-prop for Property.shorthand, etc.
Pass 2 (main thread): for each references row, resolve bindings:
- Same-file lookup: scope-walk upward from `scope_id` looking for a `symbols` row with matching name + compatible namespace.
- If no same-file match: check `imports` for matching `local_name` in same file; if found, follow to `import_specifiers` → `exports` → `symbols` in the resolved-path file.
- If still unresolved: re-export chain walk via Tier 6's `re_export_chains`.
- If still unresolved: mark `resolution_kind = 'global'` (e.g. `console`, `window`) or `'unresolved'`.
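The resolution order above can be sketched in memory as follows. `FileIndex`, `resolveBinding`, and the tiny `KNOWN_GLOBALS` set are hypothetical stand-ins for the SQLite tables and the real pass 2; the sketch only illustrates the fallthrough order and the resulting `resolution_kind` values.

```typescript
type ResolutionKind = "same-file" | "imported" | "re-exported" | "global" | "unresolved";

// Hypothetical in-memory stand-ins for the scopes / imports / exports tables.
interface ScopeNode { parentId: number | null; declared: Map<string, number> } // name -> symbol id
interface ImportTarget { fromFile: string; importedName: string }
interface FileIndex {
  scopes: ScopeNode[];                    // indexed by scope id
  imports: Map<string, ImportTarget>;     // local name -> where it comes from
  exports: Map<string, number>;           // exported name -> symbol id
  reExports: Map<string, ImportTarget>;   // exported name -> forwarded source
}
interface Binding { kind: ResolutionKind; symbolId: number | null }

const KNOWN_GLOBALS = new Set(["console", "window", "globalThis"]); // illustrative only

function resolveBinding(files: Map<string, FileIndex>, file: string, refName: string, scopeId: number): Binding {
  const here = files.get(file);
  if (!here) return { kind: "unresolved", symbolId: null };

  // 1. Same-file lookup: walk the scope chain upward.
  let current: number | null = scopeId;
  while (current !== null) {
    const scope: ScopeNode | undefined = here.scopes[current];
    if (!scope) break;
    const hit = scope.declared.get(refName);
    if (hit !== undefined) return { kind: "same-file", symbolId: hit };
    current = scope.parentId;
  }

  // 2 + 3. Follow the import, then any re-export chain (cycle-guarded).
  const imported = here.imports.get(refName);
  if (imported) {
    let target = imported;
    let hops = 0;
    const seen = new Set<string>();
    while (!seen.has(`${target.fromFile}#${target.importedName}`)) {
      seen.add(`${target.fromFile}#${target.importedName}`);
      const mod = files.get(target.fromFile);
      if (!mod) break;
      const symbolId = mod.exports.get(target.importedName);
      if (symbolId !== undefined) return { kind: hops === 0 ? "imported" : "re-exported", symbolId };
      const next = mod.reExports.get(target.importedName);
      if (!next) break;
      target = next;
      hops++;
    }
    return { kind: "unresolved", symbolId: null };
  }

  // 4. Global or unresolved.
  return { kind: KNOWN_GLOBALS.has(refName) ? "global" : "unresolved", symbolId: null };
}

const sampleFiles = new Map<string, FileIndex>([
  ["a.ts", {
    scopes: [
      { parentId: null, declared: new Map([["local", 1]]) },
      { parentId: 0, declared: new Map() },
    ],
    imports: new Map([
      ["helper", { fromFile: "barrel.ts", importedName: "helper" }],
      ["direct", { fromFile: "impl.ts", importedName: "direct" }],
    ]),
    exports: new Map(),
    reExports: new Map(),
  }],
  ["barrel.ts", {
    scopes: [], imports: new Map(), exports: new Map(),
    reExports: new Map([["helper", { fromFile: "impl.ts", importedName: "helper" }]]),
  }],
  ["impl.ts", {
    scopes: [], imports: new Map(),
    exports: new Map([["helper", 7], ["direct", 8]]),
    reExports: new Map(),
  }],
]);

for (const refName of ["local", "direct", "helper", "console", "nope"]) {
  console.log(refName, resolveBinding(sampleFiles, "a.ts", refName, 1));
}
```

Doing this once at index time is what lets recipes use a single `bindings` JOIN instead of repeating the chain walk in SQL (R.12).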
Recipes unlocked:
```sql
-- Every value-namespace use of a symbol, resolved through bindings.
-- ("references" is quoted because REFERENCES is an SQL keyword.)
SELECT r.* FROM "references" r
JOIN bindings b ON b.reference_id = r.id
JOIN symbols s ON s.id = b.resolved_symbol_id
WHERE s.name = 'usePermissions' AND b.namespace = 'value';

-- Shadowing candidates: the same name used from different scopes in one file.
SELECT r1.*, r2.scope_id AS shadowed_by_scope
FROM "references" r1
JOIN "references" r2 ON r1.name = r2.name AND r1.file_path = r2.file_path
WHERE r1.scope_id != r2.scope_id AND r1.kind = 'value';

-- Read / write counts per symbol (reads exclude the is_write = 1 rows).
SELECT s.name,
       COALESCE(SUM(r.is_write = 0), 0) AS reads,
       COALESCE(SUM(r.is_write), 0) AS writes
FROM symbols s
LEFT JOIN bindings b ON b.resolved_symbol_id = s.id
LEFT JOIN "references" r ON r.id = b.reference_id
GROUP BY s.id;

-- Exported symbols with no reference from any other file.
SELECT * FROM symbols s
WHERE s.is_exported = 1
  AND NOT EXISTS (
    SELECT 1 FROM bindings b
    JOIN "references" r ON r.id = b.reference_id
    WHERE b.resolved_symbol_id = s.id
      AND r.file_path != s.file_path
  );
```

New recipe candidates: `rename-app-wide` (extends `rename-preview` to JOIN `references`); `unused-export` (precise; subsumes the heuristic `unimported-exports`); `shadowed-names`; `unused-locals`; `find-typeof-uses`.
Effort: L (~3 weeks). Biggest single oxc-visitor expansion in the entire plan. Scope-stack discipline + cross-file pass 2 + benchmark validation against medium-size project.
Dependencies: Tier 1 (need column positions to populate references).
Tier-specific open questions:
- (a) `references.is_write` — RESOLVED 2026-05-14 → R.13. Boolean column; compound assignment emits two rows (one read, one write).
- (b) Per Q2 / R.12: pre-resolution settled.
- (c) Per Q12 / R.14: FTS5 stays file-content-only; B-tree index on `references.name` is the strategy.
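The row-emission rule from R.13 can be sketched for two of the write shapes. `WriteSite`, `rowsForWriteSite`, and the trimmed target shape are hypothetical; the sketch assumes a simple `=` is write-only while compound assignment and `++` / `--` emit a read row plus a write row at the same position.

```typescript
// Hypothetical, trimmed shapes covering only two of the parent-node cases in R.13.
type WriteTarget = { name: string; start: number };
type WriteSite =
  | { type: "AssignmentExpression"; operator: string; left: WriteTarget }
  | { type: "UpdateExpression"; argument: WriteTarget };

interface RefRow {
  name: string;
  column_start: number; // single-line source assumed, so offset == column
  is_write: 0 | 1;
}

function rowsForWriteSite(site: WriteSite): RefRow[] {
  const target = site.type === "AssignmentExpression" ? site.left : site.argument;
  const writeRow: RefRow = { name: target.name, column_start: target.start, is_write: 1 };

  // Plain `x = 1` never reads x, so only the write row is emitted.
  if (site.type === "AssignmentExpression" && site.operator === "=") {
    return [writeRow];
  }
  // `x += 1`, `x++`, `x--` read the old value and store a new one: two rows, same position.
  const readRow: RefRow = { name: target.name, column_start: target.start, is_write: 0 };
  return [readRow, writeRow];
}

const compoundRows = rowsForWriteSite({
  type: "AssignmentExpression",
  operator: "+=",
  left: { name: "x", start: 0 },
});
const plainRows = rowsForWriteSite({
  type: "AssignmentExpression",
  operator: "=",
  left: { name: "x", start: 0 },
});
const updateRows = rowsForWriteSite({
  type: "UpdateExpression",
  argument: { name: "x", start: 0 },
});
console.log(compoundRows, plainRows, updateRows);
```

Because both rows share a position, a recipe that wants one row per site can collapse them with `SELECT DISTINCT` on the position columns, as R.13 notes.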
Tier 2 ship report (2026-05-15):
What landed (commit <tier2> against SCHEMA_VERSION 16):
- `scopes` table — composite PK `(file_path, local_id)`, `WITHOUT ROWID`. `local_id` is a per-file 0-based counter assigned at parse time so refs encode their scope without round-tripping SQLite autoincrement. Kinds: `module` / `function` / `arrow` / `class` / `method`. Block / for / catch deferred — R.11's conservative-on-ambiguity escape valve covers it (body refs resolve to enclosing function scope).
- `references` table — `(id, file_path, name, line_start, column_start, column_end, kind, scope_local_id, is_write)`. Kinds shipped: `value` / `type` / `jsx`. Reserved kinds in the original CHECK enum (`decorator` / `shorthand-prop` / `member-access` / etc.) deferred — see "what didn't ship" below.
- `is_write` per R.13 — handled via `writePositions` / `suppressedReads` sets keyed by node.start. Pre-marker handlers for `AssignmentExpression` (simple `=` suppresses the read), `UpdateExpression` (`++` / `--` dual-emits), `UnaryExpression(delete)` (dual-emit), `VariableDeclarator` with initializer (write-only), `ForOfStatement` / `ForInStatement` LHS, `AssignmentPattern`.
- Declaration suppression — `FunctionDeclaration` / `ClassDeclaration` / `TSInterfaceDeclaration` / `TSTypeAliasDeclaration` / `TSEnumDeclaration` / `TSModuleDeclaration` `.id` Identifiers are NOT emitted as references. They live in `symbols` (Tier 1's `name_column_start/end`). App-wide rename consumers query both tables; this avoids duplication.
- Shorthand dedup — oxc walker visits the SAME Identifier twice when `import {foo}` / `export {foo}` / `{foo}` (Property shorthand) share the imported/local / exported/local / key/value nodes. Dedup by `(node.start, is_write)` in the Identifier handler.
- `referencesExtractor` — new module per R.17 (`src/extractors/references.ts`, 132 lines). Registers Identifier / JSXIdentifier / TSTypeReference + the parent pre-markers above. Wires through `ExtractContext` / `ParsedFile` / `index-engine`.
- `ScopeTracker` extension — `pushKind(name, kind, lineStart, lineEnd)` + `currentLocalId()` + `getRecorded()`. Factory accepts `filePath` (needed for scope rows). Module scope (`local_id = 0`) eagerly inserted; `finaliseModule(lineEnd)` updates its `line_end` after the walk.
- Recipes (Slice 2.D): `find-references --params name=X` (refs JOIN scopes, every kind/is_write), `find-write-sites --params name=X` (`is_write=1` filter). Both have golden fixtures.
- Schema bumps: 14 → 15 (scopes) → 16 (references). Full rebuild on bump per R.16.
What didn't ship (deferred to Tier 2.1, a focused follow-up slice):
- bindings table + pass-2 resolution — same-file scope walk → imports → exports → re-export chains. The most complex piece of R.12; isolating it as Tier 2.1 keeps Tier 2 reviewable.
- Reference kinds: decorator / shorthand-prop / shorthand-import / member-access / computed-member / spread / rest / as-cast / typeof / keyof. The current CHECK enum is narrowed to value / type / jsx. Tier 2.1 expands the enum and adds handlers.
- Block / for / catch scope kinds — body refs currently resolve to the enclosing function/class scope (acceptable per R.11's conservative-on-ambiguity escape valve). True block scoping arrives with a Tier 2.2.
Empirical perf (codemap-self, 925 files):
| Metric | Pre-Tier 2 (Tier 1 shipped) | Post-Tier 2 | Delta |
|---|---|---|---|
| Full reindex | ~300 ms | 767 ms | ~2.6× |
| Targeted reindex (1 file) | 8 ms | 9 ms | +12% |
| DB rows | n/a | 127,313 references / 2,062 scopes | new |
| Write refs | n/a | ~10k (~8% of all refs) | new |
Both numbers are well within the plan's thresholds (full < 1 min, targeted < 100 ms per R.9 / R.10).
Tier 2.1 ship report (2026-05-15):
What landed (SCHEMA_VERSION 16 → 17):
- bindings table — (reference_id, resolved_symbol_id, resolution_kind, is_external). PK on reference_id. resolution_kind enum: same-file / imported / global / unresolved at initial ship; re-exported shipped 2026-05-19 (Tier 6 rollout).
- symbols.scope_local_id column — captured BEFORE the symbol's own scope is pushed (so it points at the declaring scope, not the body). Class members anchor to their class's pushed scope.
- resolveBindings engine (src/application/bindings-engine.ts) — two-phase: one SELECT per table into in-memory Maps, then per-reference resolution via scope-walk → imports → globals → unresolved. ~300ms for ~127k refs.
- Cross-file resolution uses imports.resolved_path (not dependencies, which lacks the module specifier). When imp.imported_name matches an export and the target file has a module-scope symbol of the same name → is_external=0 with a real symbol id. Non-indexed module → is_external=1.
- Run-cadence: pass-2 runs only on full rebuild to honor R.10's <100ms targeted contract. Orphan bindings rows are CASCADE-cleared on incremental edits; the next --full re-resolves.
- Recipe: find-symbol-references --params name=X,file_path=Y — bindings-precise (filters same-name shadows + different-source imports).
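
The per-reference resolution order described above (scope-walk → imports → globals → unresolved) might look roughly like this. It is a sketch only: the `Scope` and `Binding` shapes and the `resolveRef` function are hypothetical stand-ins for the in-memory Maps the engine builds, not the contents of bindings-engine.ts.

```typescript
type ResolutionKind = "same-file" | "imported" | "global" | "unresolved";

// Hypothetical in-memory shapes, keyed by scope local_id.
interface Scope {
  parent: number | null; // parent scope local_id, null for module scope
  symbols: Map<string, number>; // name -> symbol id declared in this scope
}

interface Binding {
  kind: ResolutionKind;
  symbolId: number | null;
}

function resolveRef(
  name: string,
  scopeLocalId: number,
  scopes: Map<number, Scope>,
  imports: Map<string, number | null>, // local name -> symbol id (null if target not indexed)
  globals: Set<string>,
): Binding {
  // 1. Walk from the reference's scope up to module scope.
  let current: number | null = scopeLocalId;
  while (current !== null) {
    const scope: Scope | undefined = scopes.get(current);
    if (scope === undefined) break;
    const hit = scope.symbols.get(name);
    if (hit !== undefined) return { kind: "same-file", symbolId: hit };
    current = scope.parent;
  }
  // 2. Imports, 3. known globals, 4. give up.
  if (imports.has(name)) return { kind: "imported", symbolId: imports.get(name) ?? null };
  if (globals.has(name)) return { kind: "global", symbolId: null };
  return { kind: "unresolved", symbolId: null };
}
```

Because the scope walk runs first, a local declaration shadows an import or global of the same name, which is the shadow filtering the recipe relies on.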
Empirical (codemap-self, 932 files):
| Metric | Pre-Tier 2.1 | Post-Tier 2.1 | Delta |
|---|---|---|---|
| Full reindex | 767 ms | 1175 ms | +53% |
| Targeted (1 file) | 9 ms | 9 ms | 0 |
| Binding distribution | n/a | 33% same-file / 17% imported / 4% global / 45% unresolved | new |
The 45% unresolved bucket is mostly TypeScript type parameters (T/K/V), function parameters (not yet in symbols), object-pattern keys, and JSX prop names — expected v1 shape; future tiers (function-params + type-params extraction) shrink it.
Deferred to Tier 2.2 (shipped 2026-05-15 — see report below):
- Re-export chain walking (currently import { x } from './barrel' where the barrel does export { x } from './x' resolves to the barrel's export row, not the original symbol).
- Function-parameter symbols (would shrink the unresolved bucket significantly).
- Type-parameter symbols (<T>, <K, V>).
Tier 2.2 ship report (2026-05-15):
What landed:
- Function/method/arrow params as symbols rows with kind='param', scope_local_id = the function's own scope. Constructor params with public / private / readonly (TSParameterProperty) emit at class scope, not method scope, matching their TS semantics. Destructuring patterns (Array/Object) deferred; v1 covers Identifier / AssignmentPattern / RestElement / TSParameterProperty.
- Type params (<T>, <K, V>) as kind='type-param' for FunctionDeclaration, ClassDeclaration, arrow vars, and class methods. Interfaces / type aliases skipped — they don't push their own scope, so same-letter type params across multiple interfaces would collide at module scope (deferred).
- Re-export chain walking in bindings-engine. Bounded at 10 hops with cycle detection. Path resolution is relative-only (./foo, ../bar) against the indexed-paths set — bare specifiers (react) stay external. find-symbol-references now resolves through barrel files to the original definition.
- pushParams / pushTypeParams helpers in src/extractors/params.ts. Called from symbolsExtractor (FunctionDeclaration, VariableDeclaration arrow) + scopesExtractor (MethodDefinition) so the just-pushed scope is the param's scope_local_id.
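
A bounded chain walk with cycle detection, as described for the bindings engine above, could be sketched like this. The `ReExports` map, its `file#name` key format, and `resolveOrigin` are hypothetical; only the 10-hop bound comes from the report.

```typescript
// Hypothetical shape: key is "file#name"; value is where that export is re-exported from.
type ExportRef = { file: string; name: string };
type ReExports = Map<string, ExportRef>;

const MAX_HOPS = 10; // bound quoted in the ship report

// Follows re-export edges until a non-re-export is reached.
// Returns null on a cycle or when the chain needs more than MAX_HOPS hops.
function resolveOrigin(
  file: string,
  name: string,
  reExports: ReExports,
): (ExportRef & { hops: number }) | null {
  const seen = new Set<string>();
  let current: ExportRef = { file, name };
  for (let hops = 0; hops <= MAX_HOPS; hops++) {
    const key = `${current.file}#${current.name}`;
    if (seen.has(key)) return null; // cycle
    seen.add(key);
    const next: ExportRef | undefined = reExports.get(key);
    if (next === undefined) return { ...current, hops };
    current = next;
  }
  return null; // chain longer than MAX_HOPS
}
```

Returning null rather than a partial result keeps the caller's fallback simple: an unresolvable chain can be left pointing at the barrel's own export row.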
Empirical (codemap-self, 933 files):
| Metric | Pre-Tier 2.2 | Post-Tier 2.2 | Delta |
|---|---|---|---|
| Symbols | ~11.8k | 14k | +2.2k (params + type-params) |
| Same-file refs | 42,257 | 51,299 | +9,042 (+21%) |
| Unresolved refs | 58,073 | 49,534 | -8,539 (-15%) |
| Unresolved % | 45% | 39% | down |
| Full reindex | 1175 ms | 1513 ms | +29% |
| Targeted (1 file) | 9 ms | 9 ms | no regression |
Deferred to Tier 2.3 (shipped 2026-05-15 — see report below):
- Destructuring pattern params (function f({ a, b }) {...} — emits 0 param symbols today).
- Interface / type-alias type-param scoping (would unlock generic-type refs inside type definitions).
- Callback arrow scoping (arr.map((x) => …) — x is currently scoped to the enclosing function, not the arrow).
- External-module bindings (react, etc.) via package.json types lookup → indexed .d.ts parsing.
Tier 2.3 ship report (2026-05-15):
What landed (SCHEMA_VERSION 17 → 18):
- kind='member' references for non-computed property access (obj.foo). Replaces the misleading value-kind emission of property names. Bindings resolver skips these — they're property names, not bindings. Cut unresolved by ~50% on its own.
- Object-literal / class-member key suppression. Long-hand Property keys ({ name: "x" }), MethodDefinition keys, PropertyDefinition keys, TSPropertySignature / TSMethodSignature keys are NOT emitted as references. Shorthand and computed keys still emit normally.
- Destructuring pattern bindings for const { a, b } = obj and const [x, y] = arr. walkPattern generator handles Identifier / AssignmentPattern / RestElement / ObjectPattern / ArrayPattern / TSParameterProperty recursively; same helper used by pushParams (which now handles function f({a, b}) correctly) and pushDestructuredVars (variable destructuring at the parent scope).
- TYPE_GLOBALS set in bindings-engine — TypeScript built-ins (Record, Partial, ReadonlyArray, Map, etc.) resolve to resolution_kind='global' instead of unresolved.
- Extra value globals added: performance, import, require, module, exports, __dirname, __filename, self.
- as const skip: TSTypeReference with name const is no longer emitted (it's a TS keyword in type position).
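
The recursive pattern walk named above can be illustrated with a small generator. This sketch assumes a simplified ESTree-style `Pattern` type defined here for the example and omits TSParameterProperty; it is not the shipped walkPattern.

```typescript
// Simplified pattern nodes (a hypothetical subset of the ESTree-style AST).
type Rest = { type: "RestElement"; argument: Pattern };
type Pattern =
  | { type: "Identifier"; name: string }
  | { type: "AssignmentPattern"; left: Pattern }
  | Rest
  | { type: "ObjectPattern"; properties: Array<{ type: "Property"; value: Pattern } | Rest> }
  | { type: "ArrayPattern"; elements: Array<Pattern | null> };

// Yields every identifier name bound by the pattern, in source order.
function* walkPattern(node: Pattern): Generator<string> {
  switch (node.type) {
    case "Identifier":
      yield node.name;
      break;
    case "AssignmentPattern": // `a = 1`: the binding is the left side
      yield* walkPattern(node.left);
      break;
    case "RestElement": // `...rest`
      yield* walkPattern(node.argument);
      break;
    case "ObjectPattern":
      for (const prop of node.properties) {
        yield* walkPattern(prop.type === "Property" ? prop.value : prop);
      }
      break;
    case "ArrayPattern":
      for (const el of node.elements) {
        if (el !== null) yield* walkPattern(el); // skip holes such as `[, b]`
      }
      break;
  }
}
```

A generator lets one helper serve several callers: each caller decides what row to emit per yielded name.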
Empirical (codemap-self, 933 files):
| Metric | Pre-Tier 2.3 | Post-Tier 2.3 | Delta |
|---|---|---|---|
| Total refs | 127,313 | 128,387 (+1k member kind) | small |
| references.kind = 'member' | 0 | 26,701 | new |
| Bindings rows | 127k | 84k | -34% (members skipped) |
| Same-file | 51,299 | 51,972 | +1.3% |
| Imported | 21,889 | 21,777 | -0.5% |
| Global | 5,554 | 5,954 | +7% (type globals) |
| Unresolved | 49,534 (39%) | 4,634 (5.5%) | -90% / -34 pts |
| Full reindex | 1513 ms | 1025 ms | -32% (less binding work) |
| Targeted (1 file) | 9 ms | 9 ms | no regression |
Tier 2.4 ship report (2026-05-15): the final close.
What landed:
- claimedScopeNodes: WeakSet<object> on ExtractContext. Every extractor that pushes a scope for a specific AST node marks the node here so downstream extractors don't double-push.
- ArrowFunctionExpression handler in scopesExtractor. For arrows NOT already claimed (i.e. callback arrows like arr.map((s) => …)), pushes an anonymous "arrow" scope and emits params. Named arrows (const foo = () => …) stay claimed by symbolsExtractor.VariableDeclaration and don't double-push.
- CatchClause handler — try { … } catch (err) { … } gets its own anonymous scope, with the param emitted as a kind='param' symbol. Bindingless catch { … } (ES2019 optional catch binding, TS 2.5+) is handled.
- ScopeTracker.currentParent walks past anonymous scopes (empty-name) so parent_name of nested symbols still anchors to the nearest named owner — preserving pre-Tier-2.4 semantics for const foo = () => { const bar = … }.
- Extra globals: Bun, Deno runtime globals.
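
The "walk past anonymous scopes" behaviour of currentParent can be sketched as below. The `ScopeEntry` shape is hypothetical, and the sketch assumes that the module scope, like callback arrows and catch clauses, carries an empty name.

```typescript
// Hypothetical scope-stack entry; anonymous scopes carry an empty name.
interface ScopeEntry {
  name: string;
  kind: string;
}

// Nearest named owner on the stack, skipping anonymous (empty-name) scopes.
// Returns null when no enclosing scope is named.
function currentParent(stack: readonly ScopeEntry[]): string | null {
  for (let i = stack.length - 1; i >= 0; i--) {
    const entry = stack[i];
    if (entry !== undefined && entry.name !== "") return entry.name;
  }
  return null;
}
```

With this rule, a symbol declared inside a callback arrow nested in `foo` still reports `foo` as its parent_name, even though its scope_local_id points at the arrow.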
Empirical (codemap-self, 933 files):
| Metric | Pre-Tier 2.4 | Post-Tier 2.4 | Delta |
|---|---|---|---|
| Same-file | 51,972 | 55,480 | +6.7% |
| Global | 5,954 | 6,019 | +1.1% |
| Unresolved | 4,634 (5.5%) | 1,102 (1.3%) | -76% / -4.2 pts |
| Full reindex | 1025 ms | 1224 ms | +19% |
| Targeted (1 file) | 9 ms | 9 ms | no regression |
Deferred to a future slice (out of Tier 2 scope):
- Interface / type-alias type-param scoping (currently same-letter type params across interfaces collide at module scope).
- External-module bindings via .d.ts parsing (react's useState, etc. resolve to is_external=1, resolved_symbol_id=NULL).
- for (const x of …) / for (const x in …) body-scoping (today x shares parent function scope).
Tier 2 closed at 1.3% unresolved. The remaining bucket is dominated by truly unindexable refs (infer T in conditional types, .codemap/audit-cache/ re-indexed extract trees) and a thin tail of complex AST patterns.
Ship status (2026-05-19): Shipped. jsx_elements / jsx_attributes in src/db.ts; extractor src/extractors/jsx.ts; recipe find-jsx-usages.
Goal: Every JSX element + every JSX attribute becomes a queryable row with column-precise positions.
Schema delta:
CREATE TABLE jsx_elements (
id INTEGER PRIMARY KEY AUTOINCREMENT,
file_path TEXT NOT NULL REFERENCES files(path) ON DELETE CASCADE,
component_name TEXT NOT NULL,
line_start INTEGER NOT NULL,
line_end INTEGER NOT NULL,
column_start INTEGER NOT NULL,
column_end INTEGER NOT NULL,
is_self_closing INTEGER NOT NULL DEFAULT 0,
is_fragment INTEGER NOT NULL DEFAULT 0,
namespace_prefix TEXT,
parent_element_id INTEGER REFERENCES jsx_elements(id),
children_count INTEGER NOT NULL DEFAULT 0,
is_lowercase INTEGER NOT NULL DEFAULT 0
) STRICT;
CREATE INDEX idx_jsx_elements_name ON jsx_elements(component_name);
CREATE INDEX idx_jsx_elements_filepath ON jsx_elements(file_path);
CREATE TABLE jsx_attributes (
id INTEGER PRIMARY KEY AUTOINCREMENT,
element_id INTEGER NOT NULL REFERENCES jsx_elements(id) ON DELETE CASCADE,
name TEXT NOT NULL,
line INTEGER NOT NULL,
column_start INTEGER NOT NULL,
column_end INTEGER NOT NULL,
value_kind TEXT NOT NULL CHECK (value_kind IN ('string','expression','boolean','spread','element')),
value_text TEXT
) STRICT;
CREATE INDEX idx_jsx_attrs_name ON jsx_attributes(name);
CREATE INDEX idx_jsx_attrs_element ON jsx_attributes(element_id);

Visitor strategy: Visit JSXElement / JSXFragment nodes. Record element name from JSXOpeningElement.name. For each JSXAttribute child, record name + value (string literal text, expression source text, or 'spread' for JSXSpreadAttribute). Track parent linkage in a post-emit pass (Q4) — element IDs assigned eagerly; parent_element_id filled after the full file's tree is collected. is_lowercase distinguishes HTML elements (<div>) from React components (<Div>).
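
The post-emit parent-linking pass might be sketched as follows. This assumes elements are collected in document order with source offsets; the `JsxEl` shape, `linkParents`, and `isLowercaseTag` are hypothetical names, and `childrenCount` here counts only JSX element children.

```typescript
// Hypothetical collected element; ids are assigned eagerly during the walk.
interface JsxEl {
  id: number;
  name: string;
  start: number; // source offset of the element start
  end: number; // source offset just past the element end
  parentId: number | null;
  childrenCount: number;
}

// Lowercase first letter marks an HTML element (<div>) rather than a component (<Div>).
function isLowercaseTag(name: string): 0 | 1 {
  return /^[a-z]/.test(name) ? 1 : 0;
}

// Post-pass over elements in document order: a stack of still-open elements
// gives each element its nearest enclosing parent.
function linkParents(elements: JsxEl[]): void {
  const open: JsxEl[] = [];
  for (const el of elements) {
    let top = open[open.length - 1];
    while (top !== undefined && top.end <= el.start) {
      open.pop(); // that element closed before this one started
      top = open[open.length - 1];
    }
    if (top !== undefined) {
      el.parentId = top.id;
      top.childrenCount += 1;
    }
    open.push(el);
  }
}
```

Linking after collection means the extractor never needs a database round-trip to learn a parent's id mid-walk.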
Recipes unlocked:
SELECT e.* FROM jsx_elements e
WHERE e.component_name = 'Link';
SELECT e.file_path, e.line_start, a.value_text
FROM jsx_elements e
JOIN jsx_attributes a ON a.element_id = e.id
WHERE e.component_name = 'Button' AND a.name = 'onClick';
SELECT a.* FROM jsx_attributes a
WHERE a.value_kind = 'spread';
SELECT * FROM jsx_elements WHERE component_name = 'Foo' AND children_count = 0;

New recipe candidates: rename-component (alongside rename-app-wide); migrate-jsx-prop; find-spread-props; unused-jsx-components.
Effort: M (~1-2 weeks). oxc parser exposes JSX nodes natively; straightforward visitor pass. Parent-linking post-pass adds slight complexity.
Dependencies: Tier 1 (column positions), Tier 2 (references row per JSX element name → enables JOIN to bindings).
Tier-specific open questions:
- (a) Should value_text capture the literal source text or normalised form? Source text — recipes that rewrite need to know what's actually there.
- (b) Fragment shorthand <>...</> — emit a jsx_elements row with is_fragment = 1 and component_name = ''? Yes.
- (c) Children — emit as rows linked via parent_element_id, or only count? Both — children get rows AND parent has children_count.
Ship status (2026-05-19): function_params table shipped via Tier 2.2 (different keying — (file_path, owner_name, owner_kind, position) instead of symbol_id-FK; columns name / type_text / default_text / is_rest / is_optional + position triplet). Params also emit as symbols rows with kind='param' so cross-file binding resolution works. symbols.{return_type,is_async,is_generator} shipped 2026-05-19. Deferred: generic_params table (type-params currently emit as symbols.kind='type-param' instead — adequate for binding resolution; structured constraint/default columns deferred); type_predicates table; symbols.throws_clauses. Recipes that need per-param JOINs work today against the shipped function_params; recipes needing predicates / generics-with-constraints stay open.
Goal: Function parameters + generic parameters + type predicates + return types become structured queryable facts, not just stringified into symbols.signature.
Schema delta:
CREATE TABLE function_params (
id INTEGER PRIMARY KEY AUTOINCREMENT,
symbol_id INTEGER NOT NULL REFERENCES symbols(id) ON DELETE CASCADE,
position INTEGER NOT NULL,
name TEXT NOT NULL,
type_text TEXT,
default_value TEXT,
is_optional INTEGER NOT NULL DEFAULT 0,
is_rest INTEGER NOT NULL DEFAULT 0,
is_destructured INTEGER NOT NULL DEFAULT 0,
destructured_names TEXT
) STRICT;
CREATE INDEX idx_function_params_symbol ON function_params(symbol_id);
CREATE INDEX idx_function_params_type ON function_params(type_text);
CREATE TABLE generic_params (
id INTEGER PRIMARY KEY AUTOINCREMENT,
symbol_id INTEGER NOT NULL REFERENCES symbols(id) ON DELETE CASCADE,
position INTEGER NOT NULL,
name TEXT NOT NULL,
constraint_text TEXT,
default_text TEXT
) STRICT;
CREATE INDEX idx_generic_params_symbol ON generic_params(symbol_id);
CREATE TABLE type_predicates (
symbol_id INTEGER PRIMARY KEY REFERENCES symbols(id) ON DELETE CASCADE,
param_name TEXT NOT NULL,
target_type TEXT NOT NULL,
is_asserts INTEGER NOT NULL DEFAULT 0
) STRICT;
ALTER TABLE symbols ADD COLUMN return_type TEXT;
ALTER TABLE symbols ADD COLUMN is_async INTEGER NOT NULL DEFAULT 0;
ALTER TABLE symbols ADD COLUMN is_generator INTEGER NOT NULL DEFAULT 0;
ALTER TABLE symbols ADD COLUMN throws_clauses TEXT;

Visitor strategy: When visiting FunctionDeclaration / MethodDefinition / ArrowFunctionExpression / FunctionExpression, after the existing symbol-row insert, also extract params via node.params and generics via node.typeParameters. For each params[i]: name from Identifier / ObjectPattern / ArrayPattern / RestElement; type from TSTypeAnnotation; default from AssignmentPattern.right. For RestElement set is_rest. For ObjectPattern / ArrayPattern set is_destructured + capture destructured names. Return type from node.returnType. is_async / is_generator from node.async / node.generator. Type predicates from TSTypePredicate return-type-annotation node.
Recipes unlocked:
SELECT s.name, s.file_path
FROM symbols s
JOIN function_params p ON p.symbol_id = s.id
WHERE p.type_text LIKE '%Date%';
SELECT s.name FROM symbols s
JOIN generic_params g ON g.symbol_id = s.id
WHERE g.constraint_text IS NULL;
SELECT * FROM type_predicates;
SELECT * FROM symbols WHERE is_async = 1 AND return_type LIKE '%Promise<void>%';

New recipe candidates: swap-positional-to-named-args (extends rename-preview); find-untyped-params; find-unused-generics; migrate-callbacks-to-async.
Effort: M (~2 weeks). Type stringification logic exists for signature; extend to per-param + per-generic.
Dependencies: Tier 1 (column positions used by recipes that rewrite parameters).
Tier-specific open questions:
- (a) Per Q3: type-text normalization shape? Source text by default; normalized form deferred.
- (b) throws_clauses — TS doesn't have a throws syntax (it's JSDoc); should this column derive from JSDoc @throws tags (Tier 5) instead?
- (c) Overloads — a function with multiple signatures has multiple symbols rows or one? One symbols row + multiple function_params clusters distinguished by overload position. Plan PR settles.
Ship status (2026-05-19): Shipped. Tables + src/extractors/behavioral.ts; recipes find-await-in-loop, find-swallowed-errors, find-decorator-usage, find-throws-jsdoc.
Goal: Capture runtime-shape behavioral facts the AST encodes but today's index discards.
Schema delta:
CREATE TABLE async_calls (
id INTEGER PRIMARY KEY AUTOINCREMENT,
file_path TEXT NOT NULL REFERENCES files(path) ON DELETE CASCADE,
caller_scope TEXT NOT NULL,
awaited_expression TEXT,
awaited_callee_name TEXT,
line_start INTEGER NOT NULL,
column_start INTEGER NOT NULL,
in_loop INTEGER NOT NULL DEFAULT 0,
in_try INTEGER NOT NULL DEFAULT 0,
scope_id INTEGER REFERENCES scopes(id)
) STRICT;
CREATE INDEX idx_async_calls_callee ON async_calls(awaited_callee_name);
CREATE INDEX idx_async_calls_filepath ON async_calls(file_path);
CREATE TABLE try_catch (
id INTEGER PRIMARY KEY AUTOINCREMENT,
file_path TEXT NOT NULL REFERENCES files(path) ON DELETE CASCADE,
containing_scope_id INTEGER REFERENCES scopes(id),
try_line_start INTEGER NOT NULL,
try_line_end INTEGER NOT NULL,
has_catch INTEGER NOT NULL DEFAULT 0,
catch_param TEXT,
catch_rethrows INTEGER NOT NULL DEFAULT 0,
catch_logs_only INTEGER NOT NULL DEFAULT 0,
has_finally INTEGER NOT NULL DEFAULT 0
) STRICT;
CREATE TABLE decorators (
id INTEGER PRIMARY KEY AUTOINCREMENT,
file_path TEXT NOT NULL REFERENCES files(path) ON DELETE CASCADE,
target_symbol_id INTEGER REFERENCES symbols(id),
target_kind TEXT NOT NULL CHECK (target_kind IN ('class','method','property','parameter','accessor')),
name TEXT NOT NULL,
line INTEGER NOT NULL,
column_start INTEGER NOT NULL,
args_text TEXT
) STRICT;
CREATE INDEX idx_decorators_name ON decorators(name);
CREATE INDEX idx_decorators_target ON decorators(target_symbol_id);
CREATE TABLE jsdoc_tags (
id INTEGER PRIMARY KEY AUTOINCREMENT,
symbol_id INTEGER NOT NULL REFERENCES symbols(id) ON DELETE CASCADE,
tag TEXT NOT NULL,
name TEXT,
type_text TEXT,
description TEXT
) STRICT;
CREATE INDEX idx_jsdoc_tags_symbol ON jsdoc_tags(symbol_id);
CREATE INDEX idx_jsdoc_tags_tag ON jsdoc_tags(tag);

Visitor strategy:
- Async calls: every AwaitExpression node; awaited_callee_name = callee name if the argument is a CallExpression. Track in_loop / in_try via context stack per Q5.
- Try/catch: every TryStatement node. catch_rethrows flag: scan catch body for ThrowStatement whose argument references the catch param. catch_logs_only flag: catch body only contains console.* calls and no throw.
- Decorators: every Decorator node; post-pass link to following ClassDeclaration / MethodDefinition / PropertyDefinition / Parameter per Q6.
- JSDoc tags: existing JSDoc parsing already extracts @deprecated / visibility tags; extend to ALL recognised tags (@param, @returns, @throws, @see, @link, @example, @since, @template, @typedef, custom). One row per tag per symbol.
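
The context stack for in_loop / in_try could be sketched like this. The `AwaitContext` class is hypothetical, and two choices in it are assumptions rather than decisions recorded in the plan: each function gets a fresh frame, so a loop outside a nested callback does not mark awaits inside that callback, and the whole TryStatement (including catch and finally) counts as in_try.

```typescript
const LOOP_NODES = new Set([
  "ForStatement", "ForInStatement", "ForOfStatement", "WhileStatement", "DoWhileStatement",
]);
const FUNCTION_NODES = new Set([
  "FunctionDeclaration", "FunctionExpression", "ArrowFunctionExpression",
]);

interface Frame {
  loops: number;
  tries: number;
}

class AwaitContext {
  // One frame per enclosing function; the first frame is the module level.
  private frames: Frame[] = [{ loops: 0, tries: 0 }];

  private top(): Frame {
    const frame = this.frames[this.frames.length - 1];
    if (frame === undefined) throw new Error("unbalanced enter/leave");
    return frame;
  }

  // Call when the walker enters a node.
  enter(nodeType: string): void {
    if (FUNCTION_NODES.has(nodeType)) this.frames.push({ loops: 0, tries: 0 });
    else if (LOOP_NODES.has(nodeType)) this.top().loops += 1;
    else if (nodeType === "TryStatement") this.top().tries += 1;
  }

  // Call when the walker leaves the same node.
  leave(nodeType: string): void {
    if (FUNCTION_NODES.has(nodeType)) this.frames.pop();
    else if (LOOP_NODES.has(nodeType)) this.top().loops -= 1;
    else if (nodeType === "TryStatement") this.top().tries -= 1;
  }

  // Flags to store on an async_calls row at the current position.
  flags(): { inLoop: 0 | 1; inTry: 0 | 1 } {
    const { loops, tries } = this.top();
    return { inLoop: loops > 0 ? 1 : 0, inTry: tries > 0 ? 1 : 0 };
  }
}
```

An AwaitExpression handler would call `flags()` at visit time and copy the two values onto the row.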
Recipes unlocked:
SELECT * FROM async_calls WHERE in_loop = 1;
SELECT * FROM try_catch WHERE catch_logs_only = 1 AND has_catch = 1;
SELECT d.name, COUNT(*) AS classes
FROM decorators d
WHERE d.target_kind = 'class'
GROUP BY d.name
HAVING classes > 1;
SELECT s.name, t.tag, t.description
FROM symbols s
JOIN jsdoc_tags t ON t.symbol_id = s.id
WHERE t.tag = '@deprecated' AND s.is_exported = 1;

New recipe candidates: find-awaits-in-loops; find-empty-catches; find-deprecated-with-replacement; decorator-audit.
Effort: M (~2 weeks). JSDoc parsing already partially happens; promote to structured tags. Decorator post-pass + try/catch flag detection are new.
Dependencies: Tier 2 (scope_id).
Tier-specific open questions:
- (a) catch_rethrows — naive detection misses chained calls. Plan PR for Tier 5 settles heuristic: any throw statement in catch body whose argument is a MemberExpression of the catch param or just the catch-param identifier.
- (b) Per Q7: JSDoc tag schema — structured wins; @param {Foo} x description → name='x', type_text='Foo', description='description'.
- (c) async_calls.awaited_expression — full source text of the awaited expression, or just the callee name? Source text; cheap to capture; recipes that rewrite need it.
Ship status (2026-05-19): re_export_chains, dynamic_imports, files.{is_barrel,has_side_effects} shipped. Deferred: files.is_entry (c9-plugin-layer.md).
Goal: Flatten re-export chains; record dynamic imports; mark barrel files.
Schema delta:
CREATE TABLE re_export_chains (
id INTEGER PRIMARY KEY AUTOINCREMENT,
origin_file_path TEXT NOT NULL REFERENCES files(path) ON DELETE CASCADE,
origin_name TEXT NOT NULL,
final_file_path TEXT NOT NULL REFERENCES files(path) ON DELETE CASCADE,
final_name TEXT NOT NULL,
hops INTEGER NOT NULL,
chain_path TEXT NOT NULL
) STRICT;
CREATE INDEX idx_re_export_origin ON re_export_chains(origin_file_path, origin_name);
CREATE INDEX idx_re_export_final ON re_export_chains(final_file_path, final_name);
CREATE TABLE dynamic_imports (
id INTEGER PRIMARY KEY AUTOINCREMENT,
file_path TEXT NOT NULL REFERENCES files(path) ON DELETE CASCADE,
line_start INTEGER NOT NULL,
column_start INTEGER NOT NULL,
source_kind TEXT NOT NULL CHECK (source_kind IN ('literal','template','expression')),
source_text TEXT,
resolved_path TEXT,
in_async_fn INTEGER NOT NULL DEFAULT 0,
scope_id INTEGER REFERENCES scopes(id)
) STRICT;
ALTER TABLE files ADD COLUMN is_barrel INTEGER NOT NULL DEFAULT 0;
ALTER TABLE files ADD COLUMN is_entry INTEGER NOT NULL DEFAULT 0;
ALTER TABLE files ADD COLUMN has_side_effects INTEGER NOT NULL DEFAULT 0;

Visitor strategy:
- Re-export chains: post-pass over all exports rows where re_export_source IS NOT NULL. Walk the chain: start at origin, follow re_export_source until reaching a non-re-export (the final defining file). Record each hop count + chain path.
- Dynamic imports: visit ImportExpression (the import('./foo') form). source_kind='literal' for string-literal arg, 'template' for template-literal arg, 'expression' for runtime-computed arg. Resolved path via existing oxc-resolver for literals only.
- files.is_barrel: post-pass per file — is_barrel = 1 if the file's symbols are 100% re-exports (exports.re_export_source IS NOT NULL) AND no value-symbol definitions.
- files.is_entry: stub for C.9 plugin layer; default 0 today; populated via config when C.9 ships.
- files.has_side_effects: derived from package.json sideEffects field (Tier 8) OR presence of top-level CallExpression / AssignmentExpression.
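
The is_barrel rule above reduces to a small predicate. The sketch below uses a hypothetical `ExportRow` shape mirroring exports.re_export_source, and it assumes a file with no exports at all is not a barrel, which the plan does not state explicitly.

```typescript
// Hypothetical per-file input mirroring rows of the exports table.
interface ExportRow {
  name: string;
  reExportSource: string | null; // null when the export is defined locally
}

// 1 when every export is a re-export and the file defines no value symbols.
function isBarrel(exportRows: readonly ExportRow[], localValueSymbols: number): 0 | 1 {
  if (exportRows.length === 0) return 0; // assumption: nothing exported means not a barrel
  const allReExports = exportRows.every((row) => row.reExportSource !== null);
  return allReExports && localValueSymbols === 0 ? 1 : 0;
}
```

The same predicate could also be expressed as a single aggregate query over exports and symbols; the function form just makes the two conditions explicit.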
Recipes unlocked:
SELECT * FROM re_export_chains
WHERE origin_file_path = 'src/index.ts' AND origin_name = 'foo';
SELECT * FROM dynamic_imports WHERE source_kind = 'literal' AND in_async_fn = 0;
SELECT * FROM files WHERE is_barrel = 1;
SELECT s.name, c.final_file_path
FROM symbols s
JOIN re_export_chains c
ON c.origin_file_path = s.file_path AND c.origin_name = s.name
WHERE c.hops > 2;

New recipe candidates: barrel-cleanup; flatten-re-export-chain; find-dynamic-import-leaks.
Effort: M (~1 week). Mostly post-pass derivation; relies on existing exports.re_export_source.
Dependencies: Tier 1 (column positions for dynamic_imports).
Tier-specific open questions:
- (a) Re-export chains across packages (npm-installed deps) — track or stop at first non-resolvable hop? Stop; codemap doesn't index node_modules symbols.
- (b) dynamic_imports source_kind = 'expression' — record the source text for recipe filtering, or leave NULL? Record — 'expression' shouldn't lose information.
Ship status (2026-05-15): Not shipped. css_rules / css_at_rules / css_declarations absent from src/db.ts (existing css_classes / css_variables / css_keyframes unchanged). Open; parallel-safe per § Sequencing.
Goal: Structural CSS — every rule, every at-rule, every declaration with position.
Schema delta:
CREATE TABLE css_rules (
id INTEGER PRIMARY KEY AUTOINCREMENT,
file_path TEXT NOT NULL REFERENCES files(path) ON DELETE CASCADE,
selector TEXT NOT NULL,
specificity INTEGER NOT NULL,
line_start INTEGER NOT NULL,
line_end INTEGER NOT NULL,
has_important INTEGER NOT NULL DEFAULT 0,
declarations_count INTEGER NOT NULL,
parent_at_rule_id INTEGER REFERENCES css_at_rules(id)
) STRICT;
CREATE INDEX idx_css_rules_filepath ON css_rules(file_path);
CREATE TABLE css_at_rules (
id INTEGER PRIMARY KEY AUTOINCREMENT,
file_path TEXT NOT NULL REFERENCES files(path) ON DELETE CASCADE,
kind TEXT NOT NULL,
condition_text TEXT,
line_start INTEGER NOT NULL,
line_end INTEGER NOT NULL,
parent_at_rule_id INTEGER REFERENCES css_at_rules(id)
) STRICT;
CREATE INDEX idx_css_at_rules_kind ON css_at_rules(kind);
CREATE INDEX idx_css_at_rules_filepath ON css_at_rules(file_path);
CREATE TABLE css_declarations (
id INTEGER PRIMARY KEY AUTOINCREMENT,
rule_id INTEGER NOT NULL REFERENCES css_rules(id) ON DELETE CASCADE,
property TEXT NOT NULL,
value TEXT NOT NULL,
is_important INTEGER NOT NULL DEFAULT 0,
line INTEGER NOT NULL,
column_start INTEGER NOT NULL
) STRICT;
CREATE INDEX idx_css_decls_property ON css_declarations(property);

Visitor strategy: Lightning CSS visitor already walks Rule / MediaQuery / Declaration nodes. Extend the existing css-parser.ts to emit the new row shapes alongside existing css_classes / css_variables / css_keyframes. Specificity computed inline per CSS spec rules (id + class + element counts).
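
The specificity count (id + class + element) can be approximated with a few regular expressions. This is a deliberately simplified sketch: the `specificity` function is hypothetical, works on the selector string rather than the Lightning CSS AST, and does not special-case :is(), :not(), :where() or :has(), whose specificity depends on their arguments. How the triple is packed into the INTEGER column is left open.

```typescript
// Returns [ids, classes + attributes + pseudo-classes, types + pseudo-elements].
function specificity(selector: string): [number, number, number] {
  let rest = selector;
  // Count matches of `re`, then blank them out so later patterns cannot re-match them.
  const take = (re: RegExp): number => {
    const count = (rest.match(re) ?? []).length;
    rest = rest.replace(re, " ");
    return count;
  };
  const attrs = take(/\[[^\]]*\]/g); // [href], [type="text"]
  const ids = take(/#[\w-]+/g); // #nav
  const pseudoElements = take(/::[\w-]+/g); // ::before
  const pseudoClasses = take(/:[\w-]+(\([^)]*\))?/g); // :hover, :nth-child(2n+1)
  const classes = take(/\.[\w-]+/g); // .item
  const types = take(/[a-zA-Z][\w-]*/g); // ul, li (the universal `*` counts as zero)
  return [ids, classes + attrs + pseudoClasses, types + pseudoElements];
}
```

For example, `ul li.item:hover` yields two class-level and two type-level components, and `#nav .link[href]::before` yields one id, two class-level, and one type-level component.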
Recipes unlocked:
SELECT * FROM css_rules WHERE has_important = 1;
SELECT a.* FROM css_at_rules a
WHERE a.kind = '@media' AND a.condition_text LIKE '%max-width: 768px%';
SELECT property, COUNT(*) AS uses
FROM css_declarations
GROUP BY property
ORDER BY uses DESC LIMIT 20;
SELECT r.selector, r.file_path
FROM css_rules r
LEFT JOIN jsx_attributes a
ON a.name = 'className' AND a.value_text LIKE '%' || REPLACE(r.selector, '.', '') || '%'
WHERE a.id IS NULL AND r.selector LIKE '.%';

New recipe candidates: dead-css-rules; important-overrides-audit; responsive-breakpoint-audit.
Effort: M (~1-2 weeks). Lightning CSS visitor extension; structurally similar to existing css_classes extraction.
Dependencies: None (parallel-safe to Tier 1-6).
Tier-specific open questions:
- (a) css_declarations could be enormous (every prop: value line). Index size impact for design-system-heavy projects. Worth measuring before commit.
- (b) Nested at-rules (@media inside @supports inside @layer) — parent_at_rule_id handles N-deep nesting fine.
- (c) Sass / Less / SCSS — out of scope (existing roadmap backlog item).
Ship status (2026-05-15): Not shipped. tsconfig_options / package_json_meta absent from src/db.ts. Open; parallel-safe per § Sequencing.
Goal: Resolved per-file tsconfig + package.json facts queryable.
Schema delta:
CREATE TABLE tsconfig_options (
file_path TEXT PRIMARY KEY REFERENCES files(path) ON DELETE CASCADE,
strict INTEGER NOT NULL DEFAULT 0,
no_implicit_any INTEGER NOT NULL DEFAULT 0,
strict_null_checks INTEGER NOT NULL DEFAULT 0,
target TEXT,
module TEXT,
module_resolution TEXT,
jsx TEXT,
lib TEXT,
resolved_paths TEXT,
base_url TEXT,
experimental_decorators INTEGER NOT NULL DEFAULT 0
) STRICT;
CREATE TABLE package_json_meta (
file_path TEXT PRIMARY KEY REFERENCES files(path) ON DELETE CASCADE,
package_path TEXT NOT NULL,
package_name TEXT,
package_version TEXT,
type TEXT,
main TEXT,
module_path TEXT,
exports_map TEXT,
types_path TEXT,
side_effects TEXT
) STRICT;
CREATE INDEX idx_pkg_json_name ON package_json_meta(package_name);

Visitor strategy: Existing config loader already reads tsconfig; extend to record resolved options per file. Walk up the directory tree from each files.path to find the nearest tsconfig.json (or extends-chain final form) + nearest package.json. Single resolution per file, cached.
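
The walk-up with caching might be sketched as follows. To stay self-contained, the sketch takes a set of known config paths instead of touching the file system and assumes project-relative POSIX paths; `makeNearestConfigFinder` and `parentDir` are hypothetical helpers, and extends-chain resolution is out of scope here.

```typescript
// Parent directory of a project-relative POSIX path ("" is the project root).
function parentDir(p: string): string {
  const i = p.lastIndexOf("/");
  return i === -1 ? "" : p.slice(0, i);
}

// `existing` stands in for the file system: the set of config files known to exist.
function makeNearestConfigFinder(
  existing: ReadonlySet<string>,
  configName: string,
): (filePath: string) => string | null {
  const cache = new Map<string, string | null>(); // directory -> nearest config path

  function findFromDir(dir: string): string | null {
    const cached = cache.get(dir);
    if (cached !== undefined) return cached;
    const candidate = dir === "" ? configName : `${dir}/${configName}`;
    let result: string | null;
    if (existing.has(candidate)) result = candidate;
    else if (dir === "") result = null; // reached the project root
    else result = findFromDir(parentDir(dir));
    cache.set(dir, result);
    return result;
  }

  return (filePath) => findFromDir(parentDir(filePath));
}
```

Caching per directory rather than per file means sibling files resolve with one map lookup after the first file in that directory.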
Recipes unlocked:
SELECT file_path FROM tsconfig_options WHERE strict = 0;
SELECT package_name, COUNT(*) AS files
FROM package_json_meta
GROUP BY package_name;
SELECT file_path FROM package_json_meta WHERE types_path IS NULL AND package_name IS NOT NULL;

New recipe candidates: strict-mode-audit; missing-types-fields; monorepo-package-boundaries.
Effort: S (~3-5 days). Config files already loaded; just persist resolved view.
Dependencies: None.
Tier-specific open questions:
- (a) package.json exports field — store as JSON text or parse into a child table? JSON text — too many shapes (conditional / wildcard / nested).
- (b) tsconfig extends chain — store final resolved options or each layer? Final resolved.
Ship status (2026-05-15): test_suites shipped — full hierarchy (parent_suite_id), kinds (describe / it / test / suite / context), is_skipped / is_only / is_todo flags, framework detection (vitest / jest / bun-test / node-test / mocha / unknown). Q8's framework detection landed via import-presence + extension hint. Deferred: it.each / test.each parametrised kinds (current CHECK enum doesn't include them); test_fixtures (beforeAll / beforeEach / etc.); test_assertions (per-expect() matcher rows). Recipes that need fixture / assertion-level JOINs stay open.
Goal: Test files become structurally queryable — describe / it / test hierarchy, fixtures, skipped tests, assertion counts.
Schema delta:
CREATE TABLE test_suites (
id INTEGER PRIMARY KEY AUTOINCREMENT,
file_path TEXT NOT NULL REFERENCES files(path) ON DELETE CASCADE,
name TEXT NOT NULL,
kind TEXT NOT NULL CHECK (kind IN ('describe','it','test','suite','it.each','test.each')),
line_start INTEGER NOT NULL,
line_end INTEGER NOT NULL,
parent_suite_id INTEGER REFERENCES test_suites(id),
is_skipped INTEGER NOT NULL DEFAULT 0,
is_only INTEGER NOT NULL DEFAULT 0,
is_todo INTEGER NOT NULL DEFAULT 0,
framework TEXT NOT NULL CHECK (framework IN ('vitest','jest','node-test','mocha','bun-test','unknown'))
) STRICT;
CREATE INDEX idx_test_suites_filepath ON test_suites(file_path);
CREATE INDEX idx_test_suites_parent ON test_suites(parent_suite_id);
CREATE TABLE test_fixtures (
id INTEGER PRIMARY KEY AUTOINCREMENT,
suite_id INTEGER REFERENCES test_suites(id) ON DELETE CASCADE,
file_path TEXT NOT NULL REFERENCES files(path) ON DELETE CASCADE,
kind TEXT NOT NULL CHECK (kind IN ('beforeAll','beforeEach','afterEach','afterAll')),
line_start INTEGER NOT NULL,
line_end INTEGER NOT NULL
) STRICT;
CREATE TABLE test_assertions (
id INTEGER PRIMARY KEY AUTOINCREMENT,
suite_id INTEGER REFERENCES test_suites(id) ON DELETE CASCADE,
file_path TEXT NOT NULL REFERENCES files(path) ON DELETE CASCADE,
matcher_name TEXT NOT NULL,
line INTEGER NOT NULL,
column_start INTEGER NOT NULL
) STRICT;
CREATE INDEX idx_test_assertions_suite ON test_assertions(suite_id);
CREATE INDEX idx_test_assertions_matcher ON test_assertions(matcher_name);

Visitor strategy: Per Q8 — detect test files by .test. / .spec. extension OR presence of vitest / jest / node:test / mocha / bun:test import. Inside test files, walk CallExpression nodes whose callee is describe / it / test / suite (or .skip / .only / .todo / .each variants). Track parent suite via call-stack-shape context. Walk expect(...) calls and capture the matcher name (chained method).
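
Classifying the callee of a suite or test call could look like the sketch below. It assumes the callee has already been flattened to a dotted string such as `it.skip`; the `SuiteCall` shape and `parseSuiteCallee` are hypothetical, and prefix aliases such as `xit` or `fdescribe` are not covered.

```typescript
const SUITE_KINDS = ["describe", "it", "test", "suite"] as const;
type SuiteKind = (typeof SUITE_KINDS)[number];

interface SuiteCall {
  kind: SuiteKind;
  isSkipped: 0 | 1;
  isOnly: 0 | 1;
  isTodo: 0 | 1;
  isEach: 0 | 1;
}

// `calleePath` is the dotted callee, e.g. "describe", "it.skip", "test.each".
// Returns null when the call is not a suite/test call.
function parseSuiteCallee(calleePath: string): SuiteCall | null {
  const [head, ...modifiers] = calleePath.split(".");
  const kind = SUITE_KINDS.find((k) => k === head);
  if (kind === undefined) return null;
  const has = (modifier: string): 0 | 1 => (modifiers.includes(modifier) ? 1 : 0);
  return {
    kind,
    isSkipped: has("skip"),
    isOnly: has("only"),
    isTodo: has("todo"),
    isEach: has("each"),
  };
}
```

Keeping the modifiers as flags rather than distinct kinds lets chained forms such as `describe.only.each` map onto a single row.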
Recipes unlocked:
SELECT * FROM test_suites WHERE is_skipped = 1 OR is_only = 1;
SELECT s.* FROM test_suites s
LEFT JOIN test_assertions a ON a.suite_id = s.id
WHERE s.kind IN ('it','test') AND a.id IS NULL;
SELECT s.name, COUNT(t.id) AS tests
FROM symbols s
LEFT JOIN test_suites t
ON t.name LIKE '%' || s.name || '%' AND t.file_path LIKE 'src/%test%'
WHERE s.is_exported = 1
GROUP BY s.id
HAVING tests = 0;
SELECT * FROM test_fixtures WHERE kind = 'beforeEach';
New recipe candidates: `find-skipped-tests`; `find-tests-without-assertions`; `untested-exports`; `test-fan-out-audit`.
Effort: M (~2 weeks). Test-framework detection per Q8 + AST walking + assertion-matcher capture.
Dependencies: Tier 1 (positions), Tier 2 (scope_id for nested describes).
Tier-specific open questions:
- (a) `it.each([...])` parametrised tests: one row or N rows? One row with an `is_parametrised: 1` flag; row count multiplication is a runtime concern.
- (b) Cross-framework matcher mapping: vitest's `.toEqual` ≈ jest's `.toStrictEqual`. Don't normalise; record as-written.
- (c) `bun:test` and `node:test`: newer frameworks; the visitor must recognise their import shape.
Ship status (2026-05-15): runtime_markers shipped with kind IN ('console','debugger','throw','process-env') + detail qualifier + scope_local_id. The shipped CHECK enum is narrower than the proposed ('console','debugger','throw','assert','process-env','globalThis') — assert / globalThis deferred. Deferred: suppressions.{tool, rule_name, reason} columns (table today carries only (file_path, line_number, recipe_id) for codemap-native ignore directives — extending to ESLint / TS-specific suppressions still open); dev_only_branches table.
Goal: Extend existing markers + suppressions tables to cover eslint-disable-*, ts-expect-error, ts-ignore, // @ts-nocheck, console.*, debugger, dev-only branches.
Schema delta:
ALTER TABLE suppressions ADD COLUMN tool TEXT NOT NULL DEFAULT 'codemap';
ALTER TABLE suppressions ADD COLUMN rule_name TEXT;
ALTER TABLE suppressions ADD COLUMN reason TEXT;
CREATE TABLE runtime_markers (
id INTEGER PRIMARY KEY AUTOINCREMENT,
file_path TEXT NOT NULL REFERENCES files(path) ON DELETE CASCADE,
kind TEXT NOT NULL CHECK (kind IN ('console','debugger','throw','assert','process-env','globalThis')),
line_start INTEGER NOT NULL,
column_start INTEGER NOT NULL,
detail TEXT,
scope_id INTEGER REFERENCES scopes(id)
) STRICT;
CREATE INDEX idx_runtime_markers_kind ON runtime_markers(kind);
CREATE INDEX idx_runtime_markers_file ON runtime_markers(file_path);
CREATE TABLE dev_only_branches (
id INTEGER PRIMARY KEY AUTOINCREMENT,
file_path TEXT NOT NULL REFERENCES files(path) ON DELETE CASCADE,
condition_text TEXT NOT NULL,
line_start INTEGER NOT NULL,
line_end INTEGER NOT NULL,
body_token_count INTEGER NOT NULL
) STRICT;
Visitor strategy:
- Extend marker extraction to recognise `eslint-disable-next-line`, `eslint-disable-line`, `eslint-disable`, `ts-expect-error`, `ts-ignore`, `@ts-nocheck`; record in `suppressions` with `tool` + `rule_name` + `reason`.
- Visit `CallExpression` with callee like `console.<method>`: emit a `runtime_markers` row.
- Visit `DebuggerStatement`: emit a `runtime_markers` row.
- Visit `ThrowStatement`: emit a `runtime_markers` row with the thrown-expression text.
- Visit `MemberExpression` like `process.env.X`: emit a `runtime_markers` row with the env-var name.
- Visit `IfStatement` whose test matches `process.env.NODE_ENV === 'development'` / `'DEBUG'` patterns: emit a `dev_only_branches` row.
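As a rough illustration of the first bullet, the sketch below turns one comment's text into suppression rows, one row per rule. The names `SuppressionRow` and `parseSuppression` are hypothetical, and `directive` is an extra illustrative field that is not part of the proposed schema. The sketch assumes ESLint's documented comment syntax (comma-separated rule list, optional description after ` -- `).

```typescript
// Hypothetical row shape; tool / ruleName / reason mirror the proposed columns.
interface SuppressionRow {
  tool: "eslint" | "typescript";
  directive: string; // illustrative only; not a proposed column
  ruleName: string | null;
  reason: string | null;
}

// Parses the text of a single comment, without its // or /* */ delimiters.
function parseSuppression(commentText: string): SuppressionRow[] {
  const text = commentText.trim();

  const ts = /^@?(ts-expect-error|ts-ignore|ts-nocheck)\b\s*(.*)$/.exec(text);
  if (ts) {
    return [{ tool: "typescript", directive: ts[1], ruleName: null, reason: ts[2] || null }];
  }

  const eslint = /^(eslint-disable(?:-next-line|-line)?)(?:\s+(.*))?$/.exec(text);
  if (!eslint) return [];

  // ESLint separates the rule list from a free-text description with " -- ".
  const [ruleList, ...rest] = (eslint[2] ?? "").split(/\s--\s/);
  const reason = rest.length > 0 ? rest.join(" -- ").trim() : null;
  const rules = ruleList.split(",").map((r) => r.trim()).filter((r) => r.length > 0);

  // A directive with no rule list disables everything: one row, ruleName null.
  if (rules.length === 0) {
    return [{ tool: "eslint", directive: eslint[1], ruleName: null, reason }];
  }
  return rules.map((ruleName) => ({ tool: "eslint", directive: eslint[1], ruleName, reason }));
}
```

For example, `parseSuppression("eslint-disable-next-line no-console, no-alert -- legacy logging")` yields two rows that share the same reason.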
Recipes unlocked:
SELECT * FROM suppressions WHERE tool = 'eslint' AND rule_name = 'no-console';
SELECT * FROM runtime_markers WHERE kind = 'console' AND file_path LIKE 'src/%';
SELECT * FROM runtime_markers WHERE kind = 'process-env' AND detail = 'NODE_ENV';
SELECT * FROM dev_only_branches WHERE body_token_count > 50;
New recipe candidates: `find-leftover-console`; `find-debugger-statements`; `env-var-audit`; `dev-only-code-shipped`.
Effort: S (~1 week). Existing markers / suppressions machinery extends naturally.
Dependencies: Tier 2 (scope_id).
Tier-specific open questions:
- (a) `dev_only_branches` detection: what patterns count? `NODE_ENV === 'development'`, `DEBUG`, `process.env.<X>` truthiness checks, `__DEV__` global. List in plan PR.
- (b) `eslint-disable` with multiple rules (`eslint-disable no-console, no-alert`): N rows or one with a comma-separated `rule_name`? N rows; one rule per row.
Ship status (2026-05-15): file_metrics table shipped with line counters (total_lines / code_lines / blank_lines / comment_lines) + variable-form counters (let_count / const_count / var_count) + symbol-form counters (function_count / arrow_count / class_count / interface_count / export_count). On symbols: body_line_count / param_count / nesting_depth shipped. Deferred: symbols.{body_token_count, local_var_count, early_return_count}; file_metrics.{total_tokens, optional_chaining_count, nullish_coalescing_count, default_export_count, named_export_count, top_level_await_count, template_literal_count, tagged_template_count}. Token-based metrics await Q11.a's tokeniser-source decision.
Goal: Cheap-to-compute facts that enable refactor / size / style recipes.
Schema delta:
ALTER TABLE symbols ADD COLUMN body_token_count INTEGER;
ALTER TABLE symbols ADD COLUMN body_line_count INTEGER;
ALTER TABLE symbols ADD COLUMN nesting_depth INTEGER;
ALTER TABLE symbols ADD COLUMN param_count INTEGER;
ALTER TABLE symbols ADD COLUMN local_var_count INTEGER;
ALTER TABLE symbols ADD COLUMN early_return_count INTEGER;
CREATE TABLE file_metrics (
file_path TEXT PRIMARY KEY REFERENCES files(path) ON DELETE CASCADE,
total_lines INTEGER NOT NULL,
code_lines INTEGER NOT NULL,
comment_lines INTEGER NOT NULL,
blank_lines INTEGER NOT NULL,
total_tokens INTEGER NOT NULL,
arrow_function_count INTEGER NOT NULL DEFAULT 0,
function_declaration_count INTEGER NOT NULL DEFAULT 0,
optional_chaining_count INTEGER NOT NULL DEFAULT 0,
nullish_coalescing_count INTEGER NOT NULL DEFAULT 0,
let_count INTEGER NOT NULL DEFAULT 0,
const_count INTEGER NOT NULL DEFAULT 0,
var_count INTEGER NOT NULL DEFAULT 0,
default_export_count INTEGER NOT NULL DEFAULT 0,
named_export_count INTEGER NOT NULL DEFAULT 0,
top_level_await_count INTEGER NOT NULL DEFAULT 0,
template_literal_count INTEGER NOT NULL DEFAULT 0,
tagged_template_count INTEGER NOT NULL DEFAULT 0
) STRICT;
Visitor strategy:
- Per-symbol metrics: during the visitor pass over function/method bodies, increment counters: `body_token_count` (cumulative tokens in body), `nesting_depth` (max conditional/loop nesting), `local_var_count` (`VariableDeclarator` count in own scope), `early_return_count` (return statements before the final one).
- Per-file metrics: single counter pass per file; cheap (one walk; increment counters per node kind).
- Code / comment / blank lines: existing line counter logic; tokenize comments separately.
Recipes unlocked:
SELECT name, file_path, body_token_count
FROM symbols
WHERE body_token_count > 500
ORDER BY body_token_count DESC LIMIT 20;
SELECT file_path
FROM file_metrics
WHERE var_count > 0
ORDER BY var_count DESC;
SELECT s.name, s.complexity, s.nesting_depth, s.param_count
FROM symbols s
WHERE s.kind = 'function' AND s.complexity > 10 AND s.nesting_depth > 4;
SELECT file_path, optional_chaining_count
FROM file_metrics
WHERE optional_chaining_count > 20;
New recipe candidates: `refactor-large-functions` (extends `refactor-risk-ranking`); `var-to-const-migration`; `style-audit`.
Effort: S (~1 week). Pure counting; no AST shape analysis beyond visitor presence.
Dependencies: None (parallel-safe to all tiers).
Tier-specific open questions:
- (a) "Token count": exactly what counts? oxc's tokeniser output, or simple split-on-whitespace? oxc tokeniser; fewer surprises.
- (b) `nesting_depth`: does a ternary count as +1 nesting? Yes.
- (c) `comment_lines`: does a block comment spanning N lines count as N, or 1? N.
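The answer to open question (c) can be made concrete with a small line classifier. This is a deliberately naive sketch, not the existing line-counter logic: `LineCounts` and `countLines` are hypothetical names, and the classifier is not string-aware (a `/*` inside a string literal would be misread), so a real pass would more likely take comment spans from the parser.

```typescript
interface LineCounts {
  totalLines: number;
  codeLines: number;
  commentLines: number;
  blankLines: number;
}

// Classifies each physical line as code, comment, or blank.
// Every line inside a block comment counts as a comment line (N, not 1).
function countLines(source: string): LineCounts {
  const lines = source.split("\n");
  const counts: LineCounts = {
    totalLines: lines.length,
    codeLines: 0,
    commentLines: 0,
    blankLines: 0,
  };
  let inBlock = false;

  for (const raw of lines) {
    const line = raw.trim();
    if (inBlock) {
      counts.commentLines++;
      if (line.includes("*/")) inBlock = false;
    } else if (line === "") {
      counts.blankLines++;
    } else if (line.startsWith("//")) {
      counts.commentLines++;
    } else if (line.startsWith("/*")) {
      counts.commentLines++;
      if (!line.includes("*/")) inBlock = true;
    } else {
      // Lines with trailing comments are counted as code in this sketch.
      counts.codeLines++;
    }
  }
  return counts;
}
```

By construction the three category counters always sum to `totalLines`, which gives a cheap invariant to assert in tests.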
Ship status (2026-05-15): Tarjan SCC pass shipped as module_cycles table — (file_path PK, cycle_id, cycle_size). Only cyclic files appear; non-cyclic files have no row. Deferred (the proposed module_graph_facts super-table): topological_index / depth_from_entry / fan_in / fan_out / is_terminal / is_reachable / centrality. Reachability stays gated on Tier 6's files.is_entry (which gates on the c9-plugin-layer.md plan); fan-in / fan-out are derivable today via COUNT(*) over dependencies until materialised. dead-files-by-reachability recipe stays open.
Goal: Strongly-connected components, cycle detection, depth-from-entry, topological order — pre-computed at index time.
Schema delta:
CREATE TABLE module_graph_facts (
file_path TEXT PRIMARY KEY REFERENCES files(path) ON DELETE CASCADE,
topological_index INTEGER,
cycle_id INTEGER,
cycle_size INTEGER,
depth_from_entry INTEGER,
fan_in INTEGER NOT NULL DEFAULT 0,
fan_out INTEGER NOT NULL DEFAULT 0,
is_terminal INTEGER NOT NULL DEFAULT 0,
is_reachable INTEGER NOT NULL DEFAULT 0,
centrality REAL
) STRICT;
CREATE INDEX idx_module_graph_cycle ON module_graph_facts(cycle_id);
CREATE INDEX idx_module_graph_topology ON module_graph_facts(topological_index);
Visitor strategy: Pure post-pass. After all `dependencies` rows are populated, run:
- Tarjan's strongly-connected-components over the `dependencies` graph → assigns `cycle_id` and `cycle_size` (rows in non-trivial SCCs).
- Topological sort (over the cycle-collapsed DAG) → assigns `topological_index`.
- BFS from entry points (post-C.9 from `files.is_entry`; today from heuristic entries: `src/index.ts`, `src/main.ts`, package.json `main` / `module`) → assigns `depth_from_entry` and `is_reachable`.
- Fan-in / fan-out: COUNT of incoming / outgoing edges in `dependencies`.
- `is_terminal`: `fan_out = 0`.
- `centrality`: eigenvector centrality (or PageRank) over the dependency graph. Optional; defer if perf-prohibitive.
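For reference, the first step of the post-pass can be sketched as follows. This is a textbook recursive Tarjan over an in-memory adjacency map, written for illustration and not taken from the shipped `module_cycles` pass; `CycleRow` and `findCycles` are hypothetical names, and a production version over deep graphs would likely need an explicit stack instead of recursion.

```typescript
interface CycleRow {
  filePath: string;
  cycleId: number;
  cycleSize: number;
}

// edges: importer -> list of imported files.
// Emits rows only for files in non-trivial SCCs (size > 1).
function findCycles(edges: Map<string, string[]>): CycleRow[] {
  const index = new Map<string, number>();
  const lowlink = new Map<string, number>();
  const onStack = new Set<string>();
  const stack: string[] = [];
  const rows: CycleRow[] = [];
  let nextIndex = 0;
  let nextCycleId = 1;

  const visit = (v: string): void => {
    index.set(v, nextIndex);
    lowlink.set(v, nextIndex);
    nextIndex++;
    stack.push(v);
    onStack.add(v);

    for (const w of edges.get(v) ?? []) {
      if (!index.has(w)) {
        visit(w);
        lowlink.set(v, Math.min(lowlink.get(v)!, lowlink.get(w)!));
      } else if (onStack.has(w)) {
        lowlink.set(v, Math.min(lowlink.get(v)!, index.get(w)!));
      }
    }

    // v is the root of an SCC: pop the component off the stack.
    if (lowlink.get(v) === index.get(v)) {
      const component: string[] = [];
      let w: string;
      do {
        w = stack.pop()!;
        onStack.delete(w);
        component.push(w);
      } while (w !== v);

      if (component.length > 1) {
        const cycleId = nextCycleId++;
        for (const filePath of component) {
          rows.push({ filePath, cycleId, cycleSize: component.length });
        }
      }
    }
  };

  for (const v of edges.keys()) {
    if (!index.has(v)) visit(v);
  }
  return rows;
}
```

Files outside any cycle produce no row, which matches the "only cyclic files appear" behaviour described in the ship status.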
Recipes unlocked:
SELECT * FROM module_graph_facts WHERE cycle_id IS NOT NULL;
SELECT * FROM module_graph_facts WHERE depth_from_entry > 10;
SELECT * FROM files WHERE path NOT IN (SELECT file_path FROM module_graph_facts WHERE is_reachable = 1);
SELECT * FROM module_graph_facts ORDER BY centrality DESC LIMIT 10;
New recipe candidates: `find-import-cycles`; `dead-files-by-reachability` (subsumes C.9 plan's main goal); `module-centrality-audit`; `dependency-depth-distribution`.
Effort: M (~1-2 weeks). Standard graph algorithms; SQLite supports recursive CTE for fallback if needed.
Dependencies: Tier 6 (files.is_entry for proper reachability; without it, use heuristic entry detection).
Tier-specific open questions:
- (a) Centrality: eigenvector vs PageRank vs betweenness? Eigenvector for a cheap approximation (O(V+E) per power-iteration step); defer betweenness (O(V·E), too slow on large graphs).
- (b) Heuristic entry detection until C.9: list of entry candidates: `src/index.ts`, `src/main.ts`, `src/cli/main.ts`, package.json `main` / `module` / `bin`, Next.js `app/**/page.tsx`, etc.
- (c) Recursive CTE vs JS-side graph algorithm? JS-side; SQLite's recursive CTE on cycle-containing graphs is slow.
Ship status (2026-05-15): Not shipped. orm_models / sql_strings / db_migrations absent from src/db.ts. Open; orm extraction stays default-off per R.15.
Goal: Database-schema-aware recipes — find ORM model definitions, SQL template literals, migration files.
Schema delta:
CREATE TABLE orm_models (
id INTEGER PRIMARY KEY AUTOINCREMENT,
file_path TEXT NOT NULL REFERENCES files(path) ON DELETE CASCADE,
model_name TEXT NOT NULL,
framework TEXT NOT NULL CHECK (framework IN ('prisma','drizzle','typeorm','mongoose','sequelize','kysely','knex','sqlx','unknown')),
table_name TEXT,
line_start INTEGER NOT NULL,
line_end INTEGER NOT NULL,
fields_json TEXT NOT NULL
) STRICT;
CREATE INDEX idx_orm_models_name ON orm_models(model_name);
CREATE INDEX idx_orm_models_framework ON orm_models(framework);
CREATE TABLE sql_strings (
id INTEGER PRIMARY KEY AUTOINCREMENT,
file_path TEXT NOT NULL REFERENCES files(path) ON DELETE CASCADE,
line_start INTEGER NOT NULL,
column_start INTEGER NOT NULL,
query_text TEXT NOT NULL,
framework TEXT,
uses_template INTEGER NOT NULL DEFAULT 0,
has_concat INTEGER NOT NULL DEFAULT 0,
is_parameterised INTEGER NOT NULL DEFAULT 0
) STRICT;
CREATE TABLE db_migrations (
id INTEGER PRIMARY KEY AUTOINCREMENT,
file_path TEXT NOT NULL REFERENCES files(path) ON DELETE CASCADE,
migration_name TEXT NOT NULL,
framework TEXT,
up_sql TEXT,
down_sql TEXT,
applied_at TEXT
) STRICT;
Visitor strategy:
- ORM detection: look for known patterns: Prisma model file (`schema.prisma`, separate parser); Drizzle `sqliteTable('foo', {...})` / `pgTable('foo', {...})` calls; TypeORM `@Entity` decorator (links to Tier 5 decorators); Mongoose `mongoose.Schema(...)` calls.
- SQL strings: tagged template literals like ``sql`SELECT ...` ``; raw string literals containing SQL-keyword sequences (`SELECT`, `INSERT`, `UPDATE`, `DELETE` followed by known SQL constructs). Heuristic; false positives are acceptable because recipes can filter.
- Migration files: filename patterns (`migrations/<n>-<name>.{sql,ts}`); known frameworks (Knex, Drizzle Kit, Prisma Migrate).
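The raw-string heuristic in the second bullet might look roughly like the sketch below. The regular expressions and the function names `looksLikeSql` / `isParameterised` are illustrative assumptions rather than the planned detector; the placeholder styles (`?`, `$1`, `:name`) are common conventions, and real drivers vary.

```typescript
// Leading-keyword shapes that usually indicate SQL. Deliberately loose:
// the plan accepts false positives and lets recipes filter them out.
const SQL_SHAPE =
  /^\s*(?:SELECT\b[\s\S]+?\bFROM\b|INSERT\s+INTO\b|UPDATE\b[\s\S]+?\bSET\b|DELETE\s+FROM\b)/i;

// Placeholder styles assumed here: `?`, `$1`, and `:name`.
// The `:name` branch requires a preceding delimiter so `a::text` casts do not match.
const PLACEHOLDER = /\?|\$\d+|(?:^|[\s(=,]):[A-Za-z_]\w*/;

// True when a string literal starts like a SQL statement.
function looksLikeSql(text: string): boolean {
  return SQL_SHAPE.test(text);
}

// True when the query text contains at least one bind placeholder.
function isParameterised(text: string): boolean {
  return PLACEHOLDER.test(text);
}
```

Because the match is case-insensitive, prose such as "Select one item from the list" would be flagged too; that is the kind of false positive the plan leaves to recipe-level filtering.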
Recipes unlocked:
SELECT * FROM orm_models WHERE framework = 'drizzle';
SELECT * FROM sql_strings WHERE has_concat = 1 AND is_parameterised = 0;
SELECT m.model_name, m.table_name
FROM orm_models m
WHERE m.framework = 'prisma';
SELECT * FROM db_migrations ORDER BY migration_name;
New recipe candidates: `sql-injection-audit`; `orm-model-coverage` (which models lack tests?); `unused-db-columns` (column declared in ORM model but not referenced anywhere).
Effort: L (~2 weeks). Multiple framework-specific detectors; SQL parsing for safety analysis is non-trivial.
Dependencies: Tier 5 (decorators for TypeORM); Tier 1 (positions).
Tier-specific open questions:
- (a) Prisma `schema.prisma` requires a separate parser (Prisma DSL, not TS). Worth a `LanguageAdapter` per the existing adapter registry? Probably; it is a separate file kind anyway.
- (b) SQL parsing: full parser (e.g. `node-sql-parser`) or pattern-match? Pattern-match for v1; full parser if the `sql-injection-audit` recipe demands it.
- (c) ORM framework coverage: start with Drizzle + Prisma + TypeORM (most common in TS/JS)? Yes. Mongoose / Sequelize / Kysely / Knex as follow-ups.
Tier 1 (positions)
│
▼
Tier 2 (references + scopes + bindings)
├─────────┬──────────┬──────────┬──────────┬──────────┐
▼         ▼          ▼          ▼          ▼          ▼
Tier 3    Tier 4     Tier 5     Tier 9     Tier 10    Tier 11
(JSX)     (Types)    (Behav)    (Tests)    (Markers)  (Metrics)
                     │
                     ▼
                     Tier 13
                     (ORM/SQL)
Tier 6 (module-graph enrichment)
│
▼
Tier 12 (module-graph topology)
Tier 7 (CSS richness) — parallel-safe to everything; ship anytime.
Tier 8 (project meta) — parallel-safe; cheapest; ship first or last.
Hard dependencies:
- Tier 2 depends on Tier 1 (positions to populate `references`).
- Tier 3 depends on Tier 2 (`references` rows for JSX element names).
- Tier 4 depends on Tier 1 (positions for param rewrites).
- Tier 5 depends on Tier 2 (`scope_id`).
- Tier 9 depends on Tier 1 + Tier 2.
- Tier 10 depends on Tier 2 (`scope_id`).
- Tier 11 has no hard deps but Tier 2's `scope_id` enriches some metrics.
- Tier 12 depends on Tier 6 (`files.is_entry`).
- Tier 13 depends on Tier 1 + Tier 5 (decorators for TypeORM).
Parallel-safe: Tier 7 (CSS), Tier 8 (project meta) can ship anytime.
Recommended ship order:
- Tier 1 — foundation
- Tier 2 — foundation (3 weeks)
- Tier 8 — cheap; ships in parallel with Tier 2
- Tier 6 — module graph enrichment
- Tier 11 — metrics; parallel with Tier 6
- Tier 12 — module-graph topology
- Tier 3 — JSX
- Tier 4 — types
- Tier 5 — behavioral
- Tier 9 — tests
- Tier 10 — markers/suppressions
- Tier 13 — ORM/SQL
- Tier 7 — CSS (ship anywhere)
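The recommended order can be checked mechanically against the hard dependencies. The sketch below encodes both lists as data and reports any tier that would ship before something it depends on; `findOrderViolations` is a hypothetical helper written for this illustration only.

```typescript
// Hard dependencies as listed above: tier -> tiers it depends on.
const hardDeps: Record<number, number[]> = {
  2: [1],
  3: [2],
  4: [1],
  5: [2],
  9: [1, 2],
  10: [2],
  12: [6],
  13: [1, 5],
};

// Recommended ship order, first to last.
const shipOrder = [1, 2, 8, 6, 11, 12, 3, 4, 5, 9, 10, 13, 7];

// Returns one message per dependency that is missing from the order
// or scheduled after the tier that needs it.
function findOrderViolations(order: number[], deps: Record<number, number[]>): string[] {
  const position = new Map(order.map((tier, i) => [tier, i] as const));
  const violations: string[] = [];
  for (const [tier, required] of Object.entries(deps)) {
    for (const dep of required) {
      const tierPos = position.get(Number(tier));
      const depPos = position.get(dep);
      if (tierPos === undefined || depPos === undefined || depPos > tierPos) {
        violations.push(`Tier ${tier} is not ordered after its dependency Tier ${dep}`);
      }
    }
  }
  return violations;
}

const violations = findOrderViolations(shipOrder, hardDeps);
```

With the lists as written, `violations` is empty; reordering the array (for example moving Tier 13 ahead of Tier 5) makes the check report the broken edge.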
Every capability discussed in research/codemap-richer-index-synthesis-2026-05.md — plus everything the "AND MORE" framing extends to:
| Capability | Tier(s) | Status post-extraction |
|---|---|---|
| App-wide function rename | 1 + 2 | ✅ Full (definition + import + call + re-export + type + decorator + shorthand) |
| Component rename across JSX | 1 + 2 + 3 | ✅ Full |
| Prop migration on JSX | 3 | ✅ Full |
| `migrate-deprecated` with replacement map | 1 + 2 + 5 | ✅ Full |
| `fix-boundary-violation` | existing + 2 | ✅ Full |
| `stale-imports` (specifier-precise delete) | 1 | ✅ Full |
| `dedupe-imports` | 1 | ✅ Full |
| `swap-positional-to-named-args` | 1 + 4 | ✅ Full |
| `add-await-to-async-call` | 5 | ✅ Full |
| "Find shadowed names" | 2 | ✅ Full |
| "Find awaits in loops" | 5 | ✅ Full |
| "Find empty catches" | 5 | ✅ Full |
| "Find unconstrained generics" | 4 | ✅ Full |
| "Find untested public APIs" | 2 + 9 | ✅ Full |
| "Find tests without assertions" | 9 | ✅ Full |
| "Find skipped tests" | 9 | ✅ Full |
| "Find leftover console.log" | 10 | ✅ Full |
| "Find debugger statements" | 10 | ✅ Full |
| "Find import cycles" | 12 | ✅ Full |
| "Find dead files by reachability" | 12 (+ C.9) | ✅ Full |
| "Find rules with !important" | 7 | ✅ Full |
| "Find @media queries narrower than 768px" | 7 | ✅ Full |
| "Find unused CSS rules" | 7 + 3 (JOIN className) | ✅ Full |
| "Find SQL injection risks" | 13 | ✅ Full |
| "Find ORM model coverage gaps" | 9 + 13 | ✅ Full |
| "Files using var" | 11 | ✅ Full |
| "Refactor risk ranked by complexity + size + coverage" | 11 + existing | ✅ Full |
| "Strict-mode disabled files" | 8 | ✅ Full |
| "Files missing types field" | 8 | ✅ Full |
| Cross-file binding-grade rename | 2 (bindings) | ✅ Full |
| Path A AST-shape transforms (class→function, JSX rewriting with attribute migration) | Still requires Path B adapter | Path B handles via ts-morph |
The substrate makes nearly every refactor / audit / migration expressible as query → diff-shape rows → apply. The engine doesn't grow — recipes consume the richer index. This is Moat A + Moat B working in lockstep: every new capability is a recipe (Moat A); every recipe is JOINs over the richer schema (Moat B).
Four real-world fixtures probed with a minimal Tier-2 implementation (references table only — every Identifier / JSXIdentifier / TSTypeReference emits a row with line + column + kind). No scope graph, no binding resolution; just the raw extraction-shape cost. The probe is references because it's empirically the heaviest single tier (~12-18 rows per symbol; ~3-4× larger than any other planned table). Whatever it costs is the worst-case substrate cost per tier.
| Fixture | Files | Symbols | Calls | References | DB baseline | DB w/ refs | Δ DB | Reindex baseline (median 3 runs) | Reindex w/ refs | Δ reindex | Targeted reindex w/ refs |
|---|---|---|---|---|---|---|---|---|---|---|---|
| A — small CLI/library | 906 | 11.6k | 6.6k | 137k | 11.4 MB | 47.6 MB | +4.2× | ~280 ms | ~580 ms | +2.1× | ~30 ms |
| B — docs-heavy framework | 1,832 | 8.7k | 7.2k | 137k | 9.7 MB | 35.2 MB | +3.6× | ~310 ms | ~600 ms | +2.0× | ~11 ms |
| C — medium TS library monorepo | 1,805 | 26.9k | 13.6k | 324k | 17.7 MB | 80.5 MB | +4.5× | ~570 ms | ~1.30 s | +2.3× | not measured |
| D — enterprise React app | 2,120 | 28.5k | 15.6k | 490k | 37.5 MB | 158.8 MB | +4.2× | ~740 ms | ~1.94 s | +2.6× | ~15 ms |
Observed pattern (4 fixtures, consistent):
- DB grows ~3.6-4.5× per tier of this shape (~12-18 identifier rows per symbol; ranges from React-heavy to docs-heavy with no outlier breaking the pattern)
- Full reindex grows ~2-2.6× per tier — slower than linear in DB size because parse cost dominates the wall-clock; insert cost is amortised via the existing batch-insert helpers
- Targeted reindex stays flat (~10-30 ms regardless of project size) — single-file reindex only walks one file's AST + writes its rows; the references-table size doesn't affect single-file insert cost
- Distance from user's "bottleneck" thresholds — largest fixture (D, ~2,100 files / 28k symbols): full reindex at 1.9 s is 31× under the 1-minute pain threshold; targeted at 15 ms is 66× under the 1-second pain threshold
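The multipliers and headroom figures in these bullets follow directly from the fixture table. The sketch below recomputes them from the table's rounded values; the variable names are illustrative, and because the inputs are approximate ("~") the recomputed reindex slowdown for fixture B comes out at 1.9 where the table shows +2.0×.

```typescript
// Figures copied from the fixture table above (DB sizes in MB, reindex times in ms).
const fixtures = [
  { name: "A", dbBase: 11.4, dbRefs: 47.6, fullBase: 280, fullRefs: 580 },
  { name: "B", dbBase: 9.7, dbRefs: 35.2, fullBase: 310, fullRefs: 600 },
  { name: "C", dbBase: 17.7, dbRefs: 80.5, fullBase: 570, fullRefs: 1300 },
  { name: "D", dbBase: 37.5, dbRefs: 158.8, fullBase: 740, fullRefs: 1940 },
];

// Growth ratios, rounded to one decimal place like the table.
const ratios = fixtures.map((f) => ({
  name: f.name,
  dbGrowth: (f.dbRefs / f.dbBase).toFixed(1),
  reindexSlowdown: (f.fullRefs / f.fullBase).toFixed(1),
}));

// Headroom for fixture D against the stated pain thresholds.
const fullHeadroom = 60_000 / 1940; // full reindex vs the 1-minute threshold
const targetedHeadroom = 1_000 / 15; // targeted reindex vs the 1-second threshold

console.log(ratios, fullHeadroom, targetedHeadroom);
```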
| Fixture | Baseline DB | One-tier DB | One-tier growth | All-13-tier projection | All-13-tier growth |
|---|---|---|---|---|---|
| A | 11.4 MB | 47.6 MB | +4.2× | ~60-110 MB | ~5-10× |
| B | 9.7 MB | 35.2 MB | +3.6× | ~50-100 MB | ~5-10× |
| C | 17.7 MB | 80.5 MB | +4.5× | ~90-180 MB | ~5-10× |
| D | 37.5 MB | 158.8 MB | +4.2× | ~190-380 MB | ~5-10× |
Multi-tier projection conservative: assumes each subsequent tier adds ~50% of the references-tier cost. Most other tiers add substantially less (positions on existing tables = column additions, not new rows; scope graph + bindings = a few rows per function; JSX / decorators / async-calls / try-catch = far fewer rows per file than identifier references). Real all-13-tier growth likely lands at the lower end (~5×) for most projects.
Per R.9: no hard size ceiling. Per R.3: tier-level opt-out via .codemap/config.{ts,js,json} extraction: { references: false } is proposed and would recover ~75% of growth on monorepos that don't need cross-file identifier rename, but it is not a shipped config key.
| Fixture | Baseline full | One-tier full | One-tier slowdown | All-13-tier projected full | Targeted (w/ refs) |
|---|---|---|---|---|---|
| A | ~280 ms | ~580 ms | +2.1× | ~1-2 s | ~30 ms |
| B | ~310 ms | ~600 ms | +2.0× | ~1-2 s | ~11 ms |
| C | ~570 ms | ~1.30 s | +2.3× | ~3-5 s | (not measured) |
| D | ~740 ms | ~1.94 s | +2.6× | ~4-6 s | ~15 ms |
Targeted reindex stays sub-50 ms across the entire fixture range — only the touched file + its binding closure recompute. Full reindex is bounded by parse cost (oxc walks every file's AST); the per-tier multiplier comes from incremental SQLite insert work + larger in-flight ParsedFile messages between worker → main thread. All-13-tier full reindex stays well under the user's 1-minute threshold for projects up to ~5,000 files / 50k symbols.
Per R.10: soft warn when full > 30 s OR targeted > 500 ms. Both far above the measured worst case.
import { defineConfig } from "@stainless-code/codemap";
export default defineConfig({
extraction: {
references: true,
jsx: true,
types: true,
behavioral: true,
moduleGraph: true,
css: true,
projectMeta: true,
tests: true,
runtimeMarkers: true,
metrics: true,
moduleTopology: true,
orm: false,
},
});
Status (fact-checked 2026-05-18): proposed, not implemented. Current config has feature toggles such as `fts5`, `recipeRecency`, and `boundaries`, but not the `extraction` object shown here; extractors run unconditionally today through the existing parser / extractor-module layout.
Defaults to all-on once implemented, except `orm`, which stays default-off per R.15. Each tier's extractor checks its flag at parse-worker startup and no-ops if disabled.
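One way the flag check could work is sketched below. Since the `extraction` key is proposed and not shipped, everything here is hypothetical: the `ExtractionConfig` type, the `DEFAULT_EXTRACTION` values (copied from the example config), and the `resolveExtraction` / `makeExtractor` helpers.

```typescript
// Hypothetical types for the proposed (not shipped) `extraction` config key.
type ExtractionTier =
  | "references" | "jsx" | "types" | "behavioral" | "moduleGraph" | "css"
  | "projectMeta" | "tests" | "runtimeMarkers" | "metrics" | "moduleTopology" | "orm";

type ExtractionConfig = Record<ExtractionTier, boolean>;

// Defaults copied from the example config: everything on except `orm`.
const DEFAULT_EXTRACTION: ExtractionConfig = {
  references: true, jsx: true, types: true, behavioral: true,
  moduleGraph: true, css: true, projectMeta: true, tests: true,
  runtimeMarkers: true, metrics: true, moduleTopology: true, orm: false,
};

// Merges user overrides over the defaults.
function resolveExtraction(user: Partial<ExtractionConfig> = {}): ExtractionConfig {
  return { ...DEFAULT_EXTRACTION, ...user };
}

// Resolved once at worker startup: a disabled tier gets a no-op extractor,
// so the per-file hot path never re-checks the flag.
function makeExtractor<T>(
  enabled: boolean,
  extract: (source: string) => T[],
): (source: string) => T[] {
  return enabled ? extract : () => [];
}
```

Swapping in a no-op function at startup keeps the disabled path free of per-file branching, which fits the "checks its flag at parse-worker startup" wording.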
ParsedFile message grows ~10× in size. Worker IPC handles this fine. Worker concurrency unchanged (one worker per CPU core, file-parallel).
Pass 2 (main thread) walks `references` rows and resolves bindings. Implementation:
- For each file: load same-file `symbols` + cross-file `imports` into in-memory maps once.
- For each `references` row: scope-walk for same-file; then check `imports`; then re-export chain.
- Batch-insert resolved `bindings` rows.
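The three-step lookup order can be illustrated with a simplified resolver. All shapes below (`Reference`, `Scope`, `FileContext`, `Resolution`) are hypothetical stand-ins for the real row types, and the sketch ignores import aliasing (it assumes the local name equals the exported name).

```typescript
// Hypothetical in-memory shapes; the real row types live in the indexer.
interface Reference { name: string; scopeId: number }
interface Scope { parentId: number | null; declared: Set<string> }
interface FileContext {
  scopes: Map<number, Scope>;
  imports: Map<string, string>; // local name -> source file
  reExports: Map<string, Map<string, string>>; // file -> (name -> file it re-exports from)
}
type Resolution =
  | { kind: "local"; scopeId: number }
  | { kind: "imported"; fromFile: string }
  | { kind: "re-exported"; fromFile: string }
  | { kind: "unresolved" };

function resolveReference(ref: Reference, ctx: FileContext): Resolution {
  // 1. Scope walk: innermost scope outward within the same file.
  let scopeId: number | null = ref.scopeId;
  while (scopeId !== null) {
    const scope = ctx.scopes.get(scopeId);
    if (!scope) break;
    if (scope.declared.has(ref.name)) return { kind: "local", scopeId };
    scopeId = scope.parentId;
  }

  // 2. Imports: is the name brought in from another file?
  const source = ctx.imports.get(ref.name);
  if (source === undefined) return { kind: "unresolved" };

  // 3. Follow the re-export chain, guarding against cycles.
  let file = source;
  let hops = 0;
  const seen = new Set<string>([file]);
  for (;;) {
    const next = ctx.reExports.get(file)?.get(ref.name);
    if (next === undefined || seen.has(next)) break;
    seen.add(next);
    file = next;
    hops++;
  }
  return hops > 0
    ? { kind: "re-exported", fromFile: file }
    : { kind: "imported", fromFile: file };
}
```

Loading `scopes`, `imports`, and `reExports` once per file, as the first bullet describes, keeps each per-reference lookup to a handful of map reads.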
Expected pass-2 cost: ~30% of total reindex time on large projects.
Two genuinely-unindexable categories. Worth naming so the strategy is explicit.
- Runtime / dynamic behavior. `obj[computedName]` member access; `Function` constructor; `eval`; runtime-computed import paths (``import(`./modules/${name}`)``); macros / build-time codegen output. The index captures the AST shape; resolution happens at runtime. Recipes touching these stay conservative, with the same caveat as `rename-preview`'s "What v1 does not cover" section.
- Cross-tree type resolution. `tsserver`-grade type evaluation: what does this type resolve to after all conditional/mapped/inferred type operators? We extract type-text as written, not as resolved. For type-level queries ("is this generic instantiated with X?"; "does this satisfy that interface?"), the Path B adapter via `ts-morph` is the answer, same as for AST-shape rewrites. The substrate gives recipes the structural facts; type-level semantics belong to the language service.
Everything else the user could plausibly query at AST-shape granularity is in the 13 tiers above.
Per docs-governance § Closing a plan:
- When all 13 tiers ship: lift the durable bits. `architecture.md § Schema` documents the full table catalog; `glossary.md` gains entries for `references` / `bindings` / `scopes` / `jsx_elements` / `function_params` / `decorators` / `jsdoc_tags` / `test_suites` / `orm_models` / etc. Delete this plan file.
- If a tier is rejected mid-plan: `Status: Rejected (date) — <reason>` on the tier's section; rest of the plan continues.
- If a tier is deferred: stays in-plan with no status; not on the active sequence.
Plan-PR-shape per plan-pr-inspiration-discipline: every schema delta cites the primitive source it draws from (oxc, Lightning CSS, SQLite docs, LSP) — see § 10.
| Source | Tiers that draw from it |
|---|---|
| oxc parser — AST node reference, visitor API | 1, 2, 3, 4, 5, 9, 10, 11, 13 |
| Lightning CSS — visitor API, selector parsing, specificity | 7 |
| SQLite docs § STRICT tables | All tiers (schema discipline) |
| SQLite docs § FTS5 | 2 (Q12 — references.name FTS integration) |
| SQLite docs § Recursive CTE | 6 (re-export chains), 12 (graph traversal) |
| TC39 ECMA-262 — language-level shapes | 2, 4, 5 (await, try/catch, decorators) |
| LSP `Location` | 1 (column-precise positions); 5 (decorator positions) |
| Tarjan's strongly-connected components algorithm | 12 |
| Test framework specs (Vitest, Jest, `node:test`) | 9 |
| Prisma schema reference, Drizzle ORM, TypeORM | 13 |
- `research/codemap-richer-index-synthesis-2026-05.md`: the write-engine direction this substrate unlocks
- `roadmap.md § Moats`: Moat B is the load-bearing axis ("Extracted structure ≥ verdicts")
- `roadmap.md § Floors`: preserved (no JS execution at index time; no opinionated rule engine)
- `architecture.md § Schema`: current schema documentation; grows substantially with each tier's PR
- `architecture.md § Full rebuild (parallel)`: worker-pool architecture this plan extends
- `glossary.md`: every new schema concept gets a glossary entry on tier ship
- `docs/plans/c9-plugin-layer.md`: Tier 12 collaborates with C.9 (`files.is_entry`); Tier 12 ships its own reachability via heuristic entry detection if C.9 lands later
- `docs/plans/lsp-diagnostic-push.md`: every tier sharpens LSP diagnostic precision; the substrate is shared

- `docs-governance` skill: plan-PR lifecycle
- `tracer-bullets`: each tier is one tracer-bullet PR; never build all 13 in isolation
- `plan-pr-inspiration-discipline`: primitive-source citations (§ 10 above)
- `audit-pr-architecture`: every tier PR should pass moat / boundary checks
- `codemap.mdc`: Moat A reviewer test (every new table / column must be queryable via SQL, never wrapped in a CLI verb)
- `agents-tier-system`: plan respects durability rules (no source-line citations; symbol references and design intent only)