Roadmap

Forward-looking plans only — not a mirror of src/. Doc index: README.md. Design / ship: architecture.md, packaging.md. Shipped features (adapters, fixtures, codemap agents init — agents.md) live in src/ and linked docs — not enumerated here.

Community language adapters — optional packages (e.g. Tree-sitter) with a peerDependency on @stainless-code/codemap and a public registration API beyond built-ins in src/adapters/.
Agent tooling — evaluate TanStack Intent for versioned skills in node_modules (optional; codemap agents init remains the default).

Strategy

Layer	Role
Core	Schema, incremental indexing, git invalidation, `dependencies`, CLI, `query`
Community adapters	Future optional packages; peerDependency on `@stainless-code/codemap`

Non-goals (v1)

Codemap stays a structural-index primitive that other tools can consume. Two layers below: Moats are load-bearing — eroding either turns codemap into yet-another-tool-in-the-cohort instead of the predicate-shaped specialist. Floors are real shape constraints but not differentiators; soft v1 product-shape preferences. Consumer-facing framing of when to reach for codemap vs alternatives lives in why-codemap.md § When to reach for something else.

Moats (load-bearing)

Every PR reviewer defends these. The reviewer tests embedded below are the canonical filters for any new verb / column / engine.

A. SQL is the API. Every capability is a recipe (saved query) or a primitive recipes can compose — never a pre-baked verdict. SQL is a durable, well-known query language; agents compose any predicate without us deciding which questions are important. The moment a CLI verb returns pass/fail without a recipe form behind it, the moat erodes — the tool becomes "yet another linter with opinions baked in" instead of "the database your agent queries." Verdicts are an OUTPUT mode (e.g. --format sarif, audit --base <ref> deltas), never a primitive. Reviewer test for any new verb: "is this also expressible as query --recipe <id>?"
B. Extracted structure ≥ verdicts. Schema breadth is the substrate every recipe layers on. CSS (css_variables / css_classes / css_keyframes), markers, type_members, type_heritage, calls.caller_scope, components.hooks_used, the substrate-extraction tier (scopes / references / bindings / function_params / runtime_markers / test_suites / re_export_chains / module_cycles / file_metrics / import_specifiers / jsx_elements / jsx_attributes / async_calls / try_catch / decorators / jsdoc_tags / dynamic_imports) — these are codemap-specific extractions; their richness directly determines what JOINs are expressible and which agent questions get clean answers. Slimming the schema for theoretical perf / simplicity is a regression unless the column is empirically unread. Reviewer test for any "drop column X" PR: "what recipe (bundled or hypothetical) does this kill?"

Floors (v1 product-shape)

Soft constraints — describe shipped reality. Decided-but-unshipped flips live in § Backlog, not here.

Full-text search default-on — opt-in FTS5 ships per the --with-fts CLI flag / fts5: true config field (default OFF; populates source_fts virtual table at index time). Default-on decision gated on measurement — plan: plans/fts-default-on-evaluation.md.
No LSP engine — no rename / go-to-definition / hover types. Read-side LSP-adjacent primitives (show / snippet / impact) ship as CLI / MCP / HTTP verbs (see README § CLI). LSP diagnostic-push server (recipes-as-Diagnostic[]) is a separate roadmap item tracked at plans/lsp-diagnostic-push.md.
No opinionated rule engine / fix engine / severity levels — verdict-shaped lints (knip, jscpd, eslint) are a different product class. Predicate-as-API recipes (untested-and-dead, worst-covered-exports, visibility-tags, barrel-files, deprecated-symbols, …) are in scope and shipping; they're upstream of Moat A. Suppression comments ship as opt-in substrate (// codemap-ignore-{next-line,file} <recipe-id> → suppressions table; recipes JOIN to honor) — no severity, no suppression-by-default, no universal-honor; consumer-chosen, not policy.
No renderer runtime — skyline / ASCII art / animated diagrams; the index emits structured rows. Shape-only output formatters (--format mermaid shipped; --format sarif / annotations for CI; D2 / Graphviz on demand) are in scope.
No daemon for one-shot CLI — sub-100ms cold-start floor preserved for query / show / snippet / etc.; they spawn no watcher. The inherently long-running modes default-ON since 2026-05: mcp / serve boot the chokidar watcher in-process so every tool reads a live index. Pass --no-watch or set CODEMAP_WATCH=0 to opt out for ephemeral / fire-and-forget invocations. Standalone codemap watch decouples the watcher from a transport.
Embedded intent classification beyond the thin keyword classifier in codemap context --for "<intent>" — deeper routing belongs in the agent host (Cursor / Claude Code / MCP client).
No LLM in the box — embedded intent classification, semantic search over symbol names, embedding-driven recipe routing — the agent host owns this. We supply structure; they supply meaning.
No opinionated autofix engine — codemap does not decide fixes like ESLint / codemod tools do. The shipped codemap apply path is a substrate-shaped executor: recipes or agents provide explicit diff rows, and codemap validates / applies those rows with confirmation gates.
No runtime tracing — production beacons / live execution telemetry are a different product class (live process data, not static analysis). Post-mortem coverage ingestion (codemap ingest-coverage reading Istanbul / LCOV / V8 protocol dumps from NODE_V8_COVERAGE=...) is the static-side adjacent capability — local-only, no SaaS aggregation.
No JS execution at index time — config files via import() is the only exception; recipe SQL is parsed but never eval'd. Plugin layer (tracked at plans/c9-plugin-layer.md) must respect this — plugins describe rules in static config, not by running arbitrary code. Safety floor — protects supply-chain attack surface.
No telemetry upload — codemap never sends usage data anywhere. Local recipe-recency tracking is opt-out and stays in .codemap/index.db. Floor exists to resist accumulation pressure.
No remote-repo cloning — codemap github.com/x/y (clone-and-index a remote URL) is demoware, not a real workflow; the user's local checkout is always the source of truth. Indexing another tree is --root <path> / CODEMAP_ROOT, never a network fetch. Rejected in PR #23.
No split-brain incremental index — incremental / --files paths must update every table recipes and MCP tools query (core graph, FTS, heritage, bindings, …) for changed files in the same pass. Never ship lazy secondary surfaces where symbol lookup is fresh but body search (or any other recipe path) is stale. Incremental perf work belongs in § Perf-triangulation deferrals (trigger-gated), not deferred consistency.

Backlog

Agent & indexing ops

Prioritized agent & indexing ops queue (2026-05). Reference: agents.md, benchmark § Agent eval harness.

Wave 1–2 shipped in #126–#138 (MCP instructions, allowlist, WSL watch, git hooks, trace/explore/node, agents init --mcp, affected tests, index lock/unlock, parse-worker hardening, field-qualified search). Agent eval (PR 9) shipped in #139 (probe) + #144 (live MCP arms + log comparison).

P2 — strategic (trigger-gated where noted)

Framework route extraction — Express / React Router / NestJS http_routes substrate. Plan: plans/framework-route-extraction.md. Blocked on C.9 contract. Effort: L.
Callback dispatch synthesis (JSX tracer) — calls.provenance, opt-in synthesis.heuristicCalls, Moat-A filters, calls-including-heuristic. Shipped #164. See architecture § calls. EventEmitter / class setState→render heuristics skipped for now (not oxc-limited — separate design).
Cross-project MCP root — optional root on tools + DB cache. Plan: plans/cross-project-mcp-root.md. Effort: M.
FTS default-on evaluation — measure DB size tax; maybe flip default. Plan: plans/fts-default-on-evaluation.md. Effort: S–M.

Agent session & warm-path economics

Long-running MCP / HTTP sessions dominate agent workflows; one-shot CLI keeps the sub-100ms cold-start floor (§ Floors — No daemon for one-shot CLI). Items here apply to mcp / serve / watch only unless noted.

Recipe & audit enrichment

Predicate-as-API only — enrich row shape and audit deltas; no standalone pass/fail verdict primitive (Moat A).

Audit delta attribution — on audit --base <ref> (and matching MCP/HTTP audit), tag each added row with attribution: introduced | inherited via stable finding keys (requiredColumns → deterministic key) diffed against the sha-keyed audit-cache index at the merge base. Per-delta summary counts (added_introduced, added_inherited) optional when summary: true. Reuses shipped audit-worktree / git archive cache — no new verdict primitive (Moat A). Complements deferred codemap audit verdict + thresholds (consumer filters introduced via jq). Plan: plans/audit-delta-attribution.md. Effort: M.
Evidence chains on recipe rows — extend high-judgment recipe SQL with standard columns reason (short detection code + clause) and optional evidence_json (bounded hop array): e.g. unimported-exports → re_export_chains summary / unresolved-import blind-spot hint; boundary-violations → matched deny rule; deprecated-symbols → top caller sites from calls / references. Phased v1 on three recipes; complements frontmatter actions[] — agents cite evidence before apply / manual edits (Moat A). Plan: plans/evidence-chains-on-recipe-rows.md. Effort: M–L.
Tiered lookup fast paths — show / exact-name recipe paths hit covering indexes first; document latency expectations in MCP tool descriptions. FTS and broad scans remain explicit fallbacks. Effort: S–M.
Graph-estimated CRAP recipe — bundled high-crap-score: CRAP = CC² × (1 - coverage/100)³ + CC using symbols.complexity; measured coverage when ingested, else graph-estimated tiers (85% / 40% / 0% from test-file reachability over dependencies / calls / test_suites). Rows expose coverage_source: measured | estimated. Complements high-complexity-untested when no coverage file exists. Plan: plans/graph-estimated-crap.md. Effort: M.
Coverage-confirmed dead recipe — bundled coverage-confirmed-dead: JOIN static dead-code predicate (uncalled exports, suppression-aware) with ingested coverage — rows carry confidence: high when callers = 0 and coverage_pct = 0, medium when coverage not ingested. Predicate columns only, no verdict primitive (Moat A). Plan: plans/coverage-deletion-confidence.md. Effort: L–M.

Distribution & evaluation depth

Zero-deps shell installer — curl|sh (and PowerShell) platform binary fetch alongside npm; optional bundled Node/Bun runtime for consumers without a JS toolchain. Plan: plans/zero-deps-shell-installer.md. Effort: L. See also packaging.md.
Scripted dual-agent harness — task JSON + golden expected answers; spawn MCP-on / MCP-off LLM agents on the same structural tasks; score tool count + answer diff against index ground truth. Extends scripts/agent-eval/ (dev-only; not npm). Complements probe/live/log arms in benchmark § Agent eval harness. Effort: M.
Agent eval: quality × tokens × wall — extend benchmark § Agent eval harness with scored task rubrics (file/function/snippet correctness) on named public corpora, reported alongside structural tool cost. Complements Falsifiable benchmark CI below; same external fixture policy (public repos only). Effort: M.

Core substrate & platform

Perf-triangulation deferrals (trigger-gated)

Perf-triangulation deferrals — Tier 5.2 / 5.4 / 5.6 / 5.7 / 6.1 / 6.2 from plans/perf-triangulation-rollout.md (Phases 0-2 + Phase 5 shipped; per-model audit + triangulation source content consolidated into the rollout plan 2026-05-18). Each ships when its trigger fires:
- 5.2 IPC encoding (CBOR / transferables) — after a parse_ms_pure_worker instrumentation split shows IPC > ~30% of parse_ms.
- 5.4 extractMarkers lineMap reuse on TS/JS — if marker extraction becomes hot on >10k-file trees (~1ms on this repo today).
- 5.6 group-by bucketizer cache per root — when a mcp / serve user reports slow repeated query --group-by owner|package.
- 5.7 sync git subprocess collapse — if git-subprocess time becomes measurable in incremental wall (Tier 2.3 mostly killed it).
- 6.1 persistent read-only connection pool — when mcp / serve indexing 10k+ trees reports contention. Scoped to long-running transports only, NOT one-shot CLI (GPT-5.5 caveat).
- 6.2 CI dep install / package-manager-detector vendoring — after timing existing CI install steps confirms meaningful savings.
Architectural follow-ups (plan-PR-first): parse → insert pipeline overlap and AST cache — see plans/perf-triangulation-rollout.md Phase 3.
Repo-structure conversion (codemap itself: flat → monorepo) — tracked decision, not a backlog item to ship. Default bias: stay flat until a trigger fires (C.9 community plugins ship as separate packages, OR a user asks for codemap-core library export, OR a second distro emerges). Full analysis + three options + reference layouts (oxc / knip / biome / vitest) + revisit triggers in plans/lsp-diagnostic-push.md § Repo-structure tradeoffs. Don't convert preemptively.
Monorepo / workspace awareness — discover workspaces from pnpm-workspace.yaml / package.json and index per-workspace dependency graphs (separate from the codemap-itself repo-structure decision above; this is about indexing user repos)
Cross-agent handoff artifact — speculative; layered prefix/delta JSON written on session-stop, read on session-start. Complementary to indexing rather than core to it; revisit if user demand emerges
Adapter scaffolding — codemap create-adapter --name [name] generates adapter + test + fixture boilerplate; blocked on community adapter registration API (could land with manual registration)
Config loader — two candidates: (a) c12 — battle-tested (Nuxt/Nitro), adds extends, env overrides, RC files, watching; still executes config via jiti. (b) AST-based extraction with oxc-parser — faster, no side effects, safer in untrusted repos; can't handle async/dynamic configs, needs import() fallback. Current: native import() in config.ts
Optional GitHub Actions workflow_dispatch — run golden/benchmark against a public corpus only (never private app code). Distinct from the shipped agent-eval external workflow (in-repo fixtures only).
Sass / Less / SCSS: Lightning CSS is CSS-only; preprocessors need a compile step before CSS parsing — see architecture.md § CSS
UnJS adoption — candidates: citty (CLI builder), pathe (cross-platform paths), consola (structured logging), pkg-types (typed package.json/tsconfig.json), c12 (config loader — see config loader item above)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Roadmap

Next

Strategy

Non-goals (v1)

Moats (load-bearing)

Floors (v1 product-shape)

Backlog

Agent & indexing ops

Agent session & warm-path economics

Recipe & audit enrichment

Distribution & evaluation depth

Core substrate & platform

Perf-triangulation deferrals (trigger-gated)

Uh oh!

Uh oh!

FilesExpand file tree

roadmap.md

Latest commit

History

roadmap.md

File metadata and controls

Roadmap

Next

Strategy

Non-goals (v1)

Moats (load-bearing)

Floors (v1 product-shape)

Backlog

Agent & indexing ops

Agent session & warm-path economics

Recipe & audit enrichment

Distribution & evaluation depth

Core substrate & platform

Perf-triangulation deferrals (trigger-gated)