Forward-looking plans only — not a mirror of src/. Doc index: README.md. Design / ship: architecture.md, packaging.md. Shipped features (adapters, fixtures, codemap agents init — agents.md) live in src/ and linked docs — not enumerated here.
- Community language adapters — optional packages (e.g. Tree-sitter) with a peerDependency on `@stainless-code/codemap` and a public registration API beyond built-ins in `src/adapters/`.
- Agent tooling — evaluate TanStack Intent for versioned skills in `node_modules` (optional; `codemap agents init` remains the default).
| Layer | Role |
|---|---|
| Core | Schema, incremental indexing, git invalidation, dependencies, CLI, query |
| Community adapters | Future optional packages; peerDependency on @stainless-code/codemap |
Codemap stays a structural-index primitive that other tools can consume. Two layers below: Moats are load-bearing — eroding either turns codemap into yet-another-tool-in-the-cohort instead of the predicate-shaped specialist. Floors are real shape constraints but not differentiators; soft v1 product-shape preferences. Consumer-facing framing of when to reach for codemap vs alternatives lives in why-codemap.md § When to reach for something else.
Every PR reviewer defends these. The reviewer tests embedded below are the canonical filters for any new verb / column / engine.
- A. SQL is the API. Every capability is a recipe (saved query) or a primitive recipes can compose — never a pre-baked verdict. SQL is a durable, well-known query language; agents compose any predicate without us deciding which questions are important. The moment a CLI verb returns `pass` / `fail` without a recipe form behind it, the moat erodes — the tool becomes "yet another linter with opinions baked in" instead of "the database your agent queries." Verdicts are an OUTPUT mode (e.g. `--format sarif`, `audit --base <ref>` deltas), never a primitive. Reviewer test for any new verb: "is this also expressible as `query --recipe <id>`?"
- B. Extracted structure ≥ verdicts. Schema breadth is the substrate every recipe layers on. CSS (`css_variables` / `css_classes` / `css_keyframes`), `markers`, `type_members`, `type_heritage`, `calls.caller_scope`, `components.hooks_used`, the substrate-extraction tier (`scopes` / `references` / `bindings` / `function_params` / `runtime_markers` / `test_suites` / `re_export_chains` / `module_cycles` / `file_metrics` / `import_specifiers` / `jsx_elements` / `jsx_attributes` / `async_calls` / `try_catch` / `decorators` / `jsdoc_tags` / `dynamic_imports`) — these are codemap-specific extractions; their richness directly determines what JOINs are expressible and which agent questions get clean answers. Slimming the schema for theoretical perf / simplicity is a regression unless the column is empirically unread. Reviewer test for any "drop column X" PR: "what recipe (bundled or hypothetical) does this kill?"
Soft constraints — describe shipped reality. Decided-but-unshipped flips live in § Backlog, not here.
- Full-text search default-on — opt-in FTS5 ships per the `--with-fts` CLI flag / `fts5: true` config field (default OFF; populates `source_fts` virtual table at index time). Default-on decision gated on measurement — plan: `plans/fts-default-on-evaluation.md`.
- No LSP engine — no rename / go-to-definition / hover types. Read-side LSP-adjacent primitives (`show` / `snippet` / `impact`) ship as CLI / MCP / HTTP verbs (see README § CLI). LSP diagnostic-push server (recipes-as-`Diagnostic[]`) is a separate roadmap item tracked at `plans/lsp-diagnostic-push.md`.
- No opinionated rule engine / fix engine / severity levels — verdict-shaped lints (`knip`, `jscpd`, `eslint`) are a different product class. Predicate-as-API recipes (`untested-and-dead`, `worst-covered-exports`, `visibility-tags`, `barrel-files`, `deprecated-symbols`, …) are in scope and shipping; they're upstream of Moat A. Suppression comments ship as opt-in substrate (`// codemap-ignore-{next-line,file} <recipe-id>` → `suppressions` table; recipes JOIN to honor) — no severity, no suppression-by-default, no universal-honor; consumer-chosen, not policy.
- No renderer runtime — skyline / ASCII art / animated diagrams; the index emits structured rows. Shape-only output formatters (`--format mermaid` shipped; `--format sarif` / `annotations` for CI; D2 / Graphviz on demand) are in scope.
- No daemon for one-shot CLI — sub-100ms cold-start floor preserved for `query` / `show` / `snippet` / etc.; they spawn no watcher. The inherently long-running modes default-ON since 2026-05: `mcp` / `serve` boot the chokidar watcher in-process so every tool reads a live index. Pass `--no-watch` or set `CODEMAP_WATCH=0` to opt out for ephemeral / fire-and-forget invocations. Standalone `codemap watch` decouples the watcher from a transport.
- Embedded intent classification beyond the thin keyword classifier in `codemap context --for "<intent>"` — deeper routing belongs in the agent host (Cursor / Claude Code / MCP client).
- No LLM in the box — embedded intent classification, semantic search over symbol names, embedding-driven recipe routing — the agent host owns this. We supply structure; they supply meaning.
- No opinionated autofix engine — codemap does not decide fixes like ESLint / codemod tools do. The shipped `codemap apply` path is a substrate-shaped executor: recipes or agents provide explicit diff rows, and codemap validates / applies those rows with confirmation gates.
- No runtime tracing — production beacons / live execution telemetry are a different product class (live process data, not static analysis). Post-mortem coverage ingestion (`codemap ingest-coverage` reading Istanbul / LCOV / V8 protocol dumps from `NODE_V8_COVERAGE=...`) is the static-side adjacent capability — local-only, no SaaS aggregation.
- No JS execution at index time — config files via `import()` is the only exception; recipe SQL is parsed but never `eval`'d. Plugin layer (tracked at `plans/c9-plugin-layer.md`) must respect this — plugins describe rules in static config, not by running arbitrary code. Safety floor — protects supply-chain attack surface.
- No telemetry upload — codemap never sends usage data anywhere. Local recipe-recency tracking is opt-out and stays in `.codemap/index.db`. Floor exists to resist accumulation pressure.
- No remote-repo cloning — `codemap github.com/x/y` (clone-and-index a remote URL) is demoware, not a real workflow; the user's local checkout is always the source of truth. Indexing another tree is `--root <path>` / `CODEMAP_ROOT`, never a network fetch. Rejected in PR #23.
- No split-brain incremental index — incremental / `--files` paths must update every table recipes and MCP tools query (core graph, FTS, heritage, bindings, …) for changed files in the same pass. Never ship lazy secondary surfaces where symbol lookup is fresh but body search (or any other recipe path) is stale. Incremental perf work belongs in § Perf-triangulation deferrals (trigger-gated), not deferred consistency.
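The suppression substrate named in the floors above (`// codemap-ignore-{next-line,file} <recipe-id>` rows landing in a `suppressions` table) can be illustrated with a small parser. This is a minimal sketch, not codemap's extractor: the `SuppressionRow` shape, the `parseSuppressions` name, and the rule that `next-line` targets the line directly after the comment are all assumptions made for illustration.

```typescript
// Hypothetical row shape; the real `suppressions` table columns may differ.
interface SuppressionRow {
  scope: "next-line" | "file";
  recipeId: string;
  // 1-based line the suppression applies to; null for file scope.
  line: number | null;
}

// Matches `// codemap-ignore-next-line <recipe-id>` and `// codemap-ignore-file <recipe-id>`.
const SUPPRESSION_RE = /\/\/\s*codemap-ignore-(next-line|file)\s+([A-Za-z0-9_-]+)/;

function parseSuppressions(source: string): SuppressionRow[] {
  const rows: SuppressionRow[] = [];
  const lines = source.split(/\r?\n/);
  lines.forEach((text, index) => {
    const match = SUPPRESSION_RE.exec(text);
    if (!match) return;
    const scope = match[1] as SuppressionRow["scope"];
    rows.push({
      scope,
      recipeId: match[2],
      // index is 0-based: the comment sits on line index + 1,
      // so a next-line suppression targets line index + 2.
      line: scope === "next-line" ? index + 2 : null,
    });
  });
  return rows;
}
```

Nothing here hides a finding on its own. In keeping with the "consumer-chosen, not policy" floor, a recipe would only honor these rows if its SQL explicitly JOINs against them.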
Prioritized agent & indexing ops queue (2026-05). Reference: agents.md, benchmark § Agent eval harness.
Wave 1–2 shipped in #126–#138 (MCP instructions, allowlist, WSL watch, git hooks, trace/explore/node, agents init --mcp, affected tests, index lock/unlock, parse-worker hardening, field-qualified search). Agent eval (PR 9) shipped in #139 (probe) + #144 (live MCP arms + log comparison).
P2 — strategic (trigger-gated where noted)
- Framework route extraction — Express / React Router / NestJS `http_routes` substrate. Plan: `plans/framework-route-extraction.md`. Blocked on C.9 contract. Effort: L.
- Callback dispatch synthesis (JSX tracer) — `calls.provenance`, opt-in `synthesis.heuristicCalls`, Moat-A filters, `calls-including-heuristic`. Shipped #164. See architecture § `calls`. EventEmitter / class `setState` → render heuristics skipped for now (not oxc-limited — separate design).
- Cross-project MCP root — optional `root` on tools + DB cache. Plan: `plans/cross-project-mcp-root.md`. Effort: M.
- FTS default-on evaluation — measure DB size tax; maybe flip default. Plan: `plans/fts-default-on-evaluation.md`. Effort: S–M.
Long-running MCP / HTTP sessions dominate agent workflows; one-shot CLI keeps the sub-100ms cold-start floor (§ Floors — No daemon for one-shot CLI). Items here apply to mcp / serve / watch only unless noted.
- MCP shared daemon per project — one watcher + one SQLite writer per indexed root; Unix socket / named pipe so concurrent agent sessions share a live index instead of each spawning watchers and contending on WAL. Complements perf item 6.1 (read pool) but is a separate write-side + lifecycle concern. Effort: L.
- Rich `context` composer — `start_here` on non-compact `context`: intent-ranked recipe cards, inline index summary, hub leaders with signatures (adaptive caps), debug-biased markers, optional MCP/HTTP `include_snippets`. Shipped #151.
- Codebase map in bootstrap responses — hash-stable structural summary (top hubs, CLI entry hints, schema version, index freshness) auto-included in `context` / MCP initialize payload. Partial: hubs + `start_here.index_summary` + `index_freshness` ship on `context`; CLI entry hints + hash-stable map id still open. Opt-out via flag. Effort: S–M.
- Index staleness surfacing — `index_freshness.pending_sync` on `context`, MCP tool metadata, and HTTP headers when the watcher debounce queue or in-flight reindex is active. Shipped #149.
- Adaptive output budgets — scale trace/explore/node snippet char caps (and explore row limits) from indexed file counts via `resolveOutputBudget(file_count)` in `output-budget.ts`. Shipped #152. `context` hub/signature caps remain in `resolveContextBudget()`.
- MCP session lifecycle hygiene — stdio disconnect detection (stdin EOF, stdout EPIPE, parent-PID poll, SIGINT/SIGTERM) and refcount-gated watcher stop on MCP client exit; HTTP `serve --watch` starts/stops the watcher per client (5s release grace between stateless requests; `/health` excluded). Explicitly no MCP idle timeout — process stays up while the stdio pipe is open even without tool calls (IDE hosts do not respawn mid-session). See architecture.md § Session lifecycle wiring. Effort: S–M.
- PM-aware MCP spawn (`agents init --mcp`) — resolve PM `execute-local` vs dlx for MCP JSON `command` / `args` when codemap is a devDependency. Shipped #154.
- `--mcp-invocation global|auto` flag — explicit override to force global `codemap` on PATH vs PM-aware auto-resolve. Effort: S.
- `agents init --targets` (non-interactive IDE wiring) — `--targets` + `--link-mode` for CI/sandboxes; MCP subset when combined with `--mcp`. Shipped #158; see agents.md.
- `agents init` uninstall (teardown) — symmetric inverse of init for failed pilots, template mistakes, or leaving a repo: remove codemap-managed MCP entries, pointer sections, and IDE symlinks only (same scoped paths as init; never delete user-authored `.agents/` siblings). `--target` filter, `--yes` non-interactive. Not the happy-path docs story — adoption stays `init --mcp --git-hooks` + committed `.agents/`. Effort: S.
- HEAD / index freshness warning — `index_freshness.commit_drift` + `warning` on `context` / tool metadata; boot stderr on `codemap mcp` / `serve` when concerns remain after prime. Shipped #149.
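The "refcount-gated watcher stop" with a release grace period, as described for the session lifecycle item above, can be sketched as a small state machine. This is an illustrative sketch under stated assumptions, not the shipped wiring: the `WatcherGate` class, its method names, and the explicit `nowMs` clock parameter (used here in place of real timers so the logic stays deterministic) are all hypothetical.

```typescript
// Hypothetical sketch: the watcher runs while at least one client holds it,
// and stops only after the last release plus a grace period.
class WatcherGate {
  private clients = 0;
  private running = false;
  private releaseDeadline: number | null = null;
  private graceMs: number;

  constructor(graceMs: number) {
    this.graceMs = graceMs;
  }

  // A client (stdio session or HTTP request) starts using the live index.
  acquire(): void {
    this.clients += 1;
    this.releaseDeadline = null; // cancel any pending stop
    this.running = true;
  }

  // A client finished; when the last one leaves, schedule a stop after the grace period.
  release(nowMs: number): void {
    if (this.clients === 0) return;
    this.clients -= 1;
    if (this.clients === 0) this.releaseDeadline = nowMs + this.graceMs;
  }

  // Called periodically; stops the watcher once the grace period has elapsed with no clients.
  tick(nowMs: number): void {
    if (this.releaseDeadline !== null && nowMs >= this.releaseDeadline) {
      this.running = false;
      this.releaseDeadline = null;
    }
  }

  isRunning(): boolean {
    return this.running;
  }
}
```

With a 5000 ms grace, back-to-back stateless requests that arrive within the window keep reusing the same running watcher instead of restarting it on every request.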
Predicate-as-API only — enrich row shape and audit deltas; no standalone pass/fail verdict primitive (Moat A).
- Audit delta attribution — on `audit --base <ref>` (and matching MCP/HTTP audit), tag each `added` row with `attribution: introduced | inherited` via stable finding keys (`requiredColumns` → deterministic key) diffed against the sha-keyed audit-cache index at the merge base. Per-delta `summary` counts (`added_introduced`, `added_inherited`) optional when `summary: true`. Reuses shipped `audit-worktree` / `git archive` cache — no new verdict primitive (Moat A). Complements deferred `codemap audit` verdict + thresholds (consumer filters `introduced` via `jq`). Plan: `plans/audit-delta-attribution.md`. Effort: M.
- Evidence chains on recipe rows — extend high-judgment recipe SQL with standard columns `reason` (short detection code + clause) and optional `evidence_json` (bounded hop array): e.g. `unimported-exports` → `re_export_chains` summary / unresolved-import blind-spot hint; `boundary-violations` → matched deny rule; `deprecated-symbols` → top caller sites from `calls` / `references`. Phased v1 on three recipes; complements frontmatter `actions[]` — agents cite evidence before `apply` / manual edits (Moat A). Plan: `plans/evidence-chains-on-recipe-rows.md`. Effort: M–L.
- Tiered lookup fast paths — `show` / exact-name recipe paths hit covering indexes first; document latency expectations in MCP tool descriptions. FTS and broad scans remain explicit fallbacks. Effort: S–M.
- Graph-estimated CRAP recipe — bundled `high-crap-score`: CRAP = `CC² × (1 - coverage/100)³ + CC` using `symbols.complexity`; measured `coverage` when ingested, else graph-estimated tiers (85% / 40% / 0% from test-file reachability over `dependencies` / `calls` / `test_suites`). Rows expose `coverage_source: measured | estimated`. Complements `high-complexity-untested` when no coverage file exists. Plan: `plans/graph-estimated-crap.md`. Effort: M.
- Coverage-confirmed dead recipe — bundled `coverage-confirmed-dead`: JOIN static dead-code predicate (uncalled exports, suppression-aware) with ingested `coverage` — rows carry `confidence: high` when callers = 0 and `coverage_pct = 0`, `medium` when coverage not ingested. Predicate columns only, no verdict primitive (Moat A). Plan: `plans/coverage-deletion-confidence.md`. Effort: L–M.
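To make the CRAP formula from the graph-estimated recipe item concrete, here is a small worked sketch. The formula and the 85% / 40% / 0% tiers come from the item above; everything else is an assumption for illustration, including the function names, the `ReachabilityTier` labels, and the mapping from tier label to percentage (the plan, not this sketch, defines how reachability assigns a tier). The bundled recipe itself would be SQL, so this only mirrors the arithmetic.

```typescript
type CoverageSource = "measured" | "estimated";

// Hypothetical tier labels; the roadmap names the percentages but not the tier names.
type ReachabilityTier = "direct" | "indirect" | "none";

const ESTIMATED_COVERAGE: Record<ReachabilityTier, number> = {
  direct: 85,
  indirect: 40,
  none: 0,
};

interface CrapRow {
  crap: number;
  coveragePct: number;
  coverageSource: CoverageSource;
}

// CRAP = CC^2 * (1 - coverage/100)^3 + CC
function crapScore(complexity: number, coveragePct: number): number {
  const uncovered = 1 - coveragePct / 100;
  return complexity ** 2 * uncovered ** 3 + complexity;
}

// Prefer measured coverage when it was ingested; otherwise fall back to the estimated tier.
function scoreSymbol(
  complexity: number,
  measuredCoveragePct: number | null,
  tier: ReachabilityTier,
): CrapRow {
  const coveragePct = measuredCoveragePct ?? ESTIMATED_COVERAGE[tier];
  return {
    crap: crapScore(complexity, coveragePct),
    coveragePct,
    coverageSource: measuredCoveragePct === null ? "estimated" : "measured",
  };
}
```

For a symbol with complexity 10, full coverage yields a score of 10 and zero coverage yields 110, so the cubic term dominates only when coverage is low. The score stays a row column that consumers rank or filter on, consistent with Moat A.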
- Zero-deps shell installer — curl|sh (and PowerShell) platform binary fetch alongside npm; optional bundled Node/Bun runtime for consumers without a JS toolchain. Plan: `plans/zero-deps-shell-installer.md`. Effort: L. See also packaging.md.
- Scripted dual-agent harness — task JSON + golden expected answers; spawn MCP-on / MCP-off LLM agents on the same structural tasks; score tool count + answer diff against index ground truth. Extends `scripts/agent-eval/` (dev-only; not npm). Complements probe/live/log arms in benchmark § Agent eval harness. Effort: M.
- Agent eval: quality × tokens × wall — extend benchmark § Agent eval harness with scored task rubrics (file/function/snippet correctness) on named public corpora, reported alongside structural tool cost. Complements Falsifiable benchmark CI below; same external fixture policy (public repos only). Effort: M.
- C.9 framework plugin layer — static entry-point hints on `files` (`is_entry` or `entry_annotations`) to sharpen reachability-predicate recipes (`untested-and-dead`, `unimported-exports`, future `dead-files-by-reachability`). Plugins: declarative globs + optional AST-parse of tool config files (Vite/Next/Vitest, …); activate via config `plugins:` list and/or package `enablers` — no JS eval at index time. Plan: `plans/c9-plugin-layer.md`. Effort: XL; ships last in the impact-vs-cadence sequence (see plan § Shipping cadence).
- LSP diagnostic-push + VSCode extension — recipes-as-`Diagnostic[]` server + paired extension; explicitly not a go-to-def / references shim (`tsserver` covers those). Plan: `plans/lsp-diagnostic-push.md`. Effort: XL; soft ordering after C.9 for cleaner squigglies on framework files.
- Apply-engine direction — diff-shape recipes (8 bundled ids), `actions[].command` on apply + read→apply pairs, `auto_fixable` / `--force`, `rename-preview` (calls, re-exports, barrel, JSX; homonym `define_in` #165; CLI `codemap rename` alias #166), `apply --rows` / `apply_rows`, `--diff-input`, `--commit`, `--until-empty`, `apply.autoApplyRecipes`. Shipped #165 + #166. Executor + transport: `architecture.md` § Apply, `glossary.md` § codemap apply.
- `organize-imports` diff-shape recipe — deterministic single-file import sort/group; `imports.line_number` + `source` substrate sufficient. Review-first (`auto_fixable: false`). Effort: S.
- `codemap-to-tsmorph` Path B adapter — separate package experiment: `query_recipe` discovery → `ts-morph` / `jscodeshift` transforms for AST-shape edits codemap's substring executor defers (see architecture § Rejected apply-path alternatives). Not an in-tree AST writer (Path A rejected). Effort: M.
- Apply write-safety hardening — close apply TOCTOU: SHA-256 `hashContent` at phase-1 read, recheck disk hash immediately before phase-2 write (`file content changed` conflict); `fsync` temp file before `rename`; skip files with mixed CRLF/LF (`mixed line endings`). Preserves all-or-nothing on any conflict. Plan: `plans/apply-write-safety.md`. Effort: L.
- `history` table (deferred — revisit-triggered) — temporal queries: "when did symbol X get `@deprecated`?", "coverage trend over last 50 commits", "files that became dead this week". `audit --base <ref>` covers the most-common temporal question (PR-scoped diff) without schema growth, so the table earns its place only when bigger questions emerge. Two shapes (per-commit snapshots ~N × DB size; append-only event log heavier CTE walks); both pay an N-reindexes backfill cost (~30s per reindex). Revisit triggers: two consumers ship `jq`-based "audit-runs-over-time" workflows, OR `query_baselines` evolution becomes a recurring agent need.
- `codemap audit` verdict + thresholds (v1.x) — `verdict: "pass" | "warn" | "fail"` driven by an `audit.deltas[<key>].{added_max, action}` field on the config object (`.codemap/config.{ts,js,json}`). Triggers: two consumers ship `jq`-based threshold scripts with similar shapes, OR one consumer asks with a concrete config sketch. Until then, raw deltas + consumer-side `jq` is the CI exit-code idiom. Likely accelerant: the Marketplace Action (next item) shipping is the most plausible path to firing the trigger — once `- uses: stainless-code/codemap@v1` is the dominant CI path, real `jq` threshold scripts will surface.
- GitHub Marketplace Action — publish + listing finish — core Action implementation is in-tree: root `action.yml`, `query --ci`, `audit --format sarif` / `--ci`, package-manager detection, dogfood smoke, and opt-in `pr-comment` summary renderer have shipped. Remaining work is the release/listing slice: `MARKETPLACE.md`, `v1.0.0` / floating `v1` tags, Marketplace setup, sacrificial-repo smoke, and making `action-smoke` blocking once the Action tag exists. Action version stream is independent of CLI version (`package.json` currently drives CLI/npm version; Action publishes at its own `v1.0.0`). Plan: `plans/github-marketplace-action.md`. Effort: S.
- Cognitive complexity column — `symbols.cognitive_complexity` (SonarSource rules) alongside cyclomatic `complexity` in the same oxc walk; recipe `high-cognitive-complexity`. Improves refactor-priority JOINs with coverage/churn recipes. Plan: `plans/cognitive-complexity.md`. Effort: M.
- Churn × complexity hotspots — `file_churn` table (`git log --numstat` over indexed paths, recency-weighted commits, optional trend) + bundled recipe `churn-complexity-hotspots` JOINing `symbols.complexity` for ranked refactor targets. Distinct from outcome alias `hotspots` → `fan-in`. Score is a recipe column, not a verdict (Moat A). Plan: `plans/churn-complexity-hotspots.md`. Effort: L–M.
- AST-hash duplication — `symbols.body_hash` column (normalized AST hash via oxc, computed at parse time — Rust-native, fast) + bundled `duplicates` recipe joining on `body_hash` (`GROUP BY body_hash HAVING COUNT(*) > 1`). Different shape from token-level suffix-array dupes (catches structurally-identical functions, not copy-paste with renamed variables). Substrate addition — consumer writes the JOIN that decides "this is a problem"; no severity, no suppression-by-default. Plan: `plans/ast-hash-duplication.md`. Effort: M.
- Falsifiable benchmark CI on named external fixtures — structural-cost A/B (indexed queries vs `find` + `grep` + `Read`-loop discovery) on zod, fastify, vue-core, next.js. Numbers land in `docs/benchmark.md`; headline figures surface in `MARKETPLACE.md` only after external runs land. Harness: benchmark § Agent eval harness + external fixture extension; pair with Agent eval: quality × tokens × wall for scored completion metrics. Partial: manual `.github/workflows/agent-eval-external.yml` for in-repo fixture paths (not zod/fastify/nightly). Effort: M. Self-index regression guardrail shipped (#96): `bun run check:perf-baseline` + weekly scheduled workflow (demoted from PR hard gate — GHA runner variance).
- In-repo test bench scale (optional) — if `fixtures/minimal` outgrows one corpus: add committed `fixtures/bench/` or rename `minimal` → `bench`. Harness map: `testing-coverage.md`, `fixtures/README.md`.
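The grouping behind the AST-hash duplication item (`GROUP BY body_hash HAVING COUNT(*) > 1`) can be mirrored in a few lines for readers who think in application code. This is a sketch of the grouping step only, under stated assumptions: the `SymbolRow` shape and `findDuplicateGroups` name are hypothetical, and the hashes are taken as given since the normalized AST hashing itself is out of scope here.

```typescript
// Hypothetical projection of a `symbols` row; only the columns the grouping needs.
interface SymbolRow {
  name: string;
  filePath: string;
  bodyHash: string;
}

// Equivalent in spirit to: GROUP BY body_hash HAVING COUNT(*) > 1
function findDuplicateGroups(symbols: SymbolRow[]): SymbolRow[][] {
  const byHash: Record<string, SymbolRow[]> = {};
  for (const symbol of symbols) {
    const group = byHash[symbol.bodyHash];
    if (group) {
      group.push(symbol);
    } else {
      byHash[symbol.bodyHash] = [symbol];
    }
  }
  // Keep only hashes shared by more than one symbol.
  return Object.keys(byHash)
    .map((hash) => byHash[hash])
    .filter((group) => group.length > 1);
}
```

The output is just groups of rows. Whether a given group is a problem remains the consumer's call, matching the "no severity, no suppression-by-default" framing of the item.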
- Perf-triangulation deferrals — Tier 5.2 / 5.4 / 5.6 / 5.7 / 6.1 / 6.2 from `plans/perf-triangulation-rollout.md` (Phases 0-2 + Phase 5 shipped; per-model audit + triangulation source content consolidated into the rollout plan 2026-05-18). Each ships when its trigger fires:
  - 5.2 IPC encoding (CBOR / transferables) — after a `parse_ms_pure_worker` instrumentation split shows IPC > ~30% of `parse_ms`.
  - 5.4 `extractMarkers` lineMap reuse on TS/JS — if marker extraction becomes hot on >10k-file trees (~1ms on this repo today).
  - 5.6 group-by bucketizer cache per root — when a `mcp` / `serve` user reports slow repeated `query --group-by owner|package`.
  - 5.7 sync git subprocess collapse — if git-subprocess time becomes measurable in incremental wall (Tier 2.3 mostly killed it).
  - 6.1 persistent read-only connection pool — when `mcp` / `serve` indexing 10k+ trees reports contention. Scoped to long-running transports only, NOT one-shot CLI (GPT-5.5 caveat).
  - 6.2 CI dep install / `package-manager-detector` vendoring — after timing existing CI install steps confirms meaningful savings.

  Architectural follow-ups (plan-PR-first): parse → insert pipeline overlap and AST cache — see `plans/perf-triangulation-rollout.md` Phase 3.
- Repo-structure conversion (codemap itself: flat → monorepo) — tracked decision, not a backlog item to ship. Default bias: stay flat until a trigger fires (C.9 community plugins ship as separate packages, OR a user asks for `codemap-core` library export, OR a second distro emerges). Full analysis + three options + reference layouts (oxc / knip / biome / vitest) + revisit triggers in `plans/lsp-diagnostic-push.md` § Repo-structure tradeoffs. Don't convert preemptively.
- Monorepo / workspace awareness — discover workspaces from `pnpm-workspace.yaml` / `package.json` and index per-workspace dependency graphs (separate from the codemap-itself repo-structure decision above; this is about indexing user repos).
- Cross-agent handoff artifact — speculative; layered prefix/delta JSON written on session-stop, read on session-start. Complementary to indexing rather than core to it; revisit if user demand emerges.
- Adapter scaffolding — `codemap create-adapter --name [name]` generates adapter + test + fixture boilerplate; blocked on community adapter registration API (could land with manual registration).
- Config loader — two candidates: (a) `c12` — battle-tested (Nuxt/Nitro), adds `extends`, env overrides, RC files, watching; still executes config via `jiti`. (b) AST-based extraction with `oxc-parser` — faster, no side effects, safer in untrusted repos; can't handle async/dynamic configs, needs `import()` fallback. Current: native `import()` in `config.ts`.
- Optional GitHub Actions `workflow_dispatch` — run golden/benchmark against a public corpus only (never private app code). Distinct from the shipped agent-eval external workflow (in-repo fixtures only).
- Sass / Less / SCSS: Lightning CSS is CSS-only; preprocessors need a compile step before CSS parsing — see architecture.md § CSS.
- UnJS adoption — candidates: `citty` (CLI builder), `pathe` (cross-platform paths), `consola` (structured logging), `pkg-types` (typed `package.json` / `tsconfig.json`), `c12` (config loader — see config loader item above).