Commit 8595173

feat(churn): git churn ingest and churn-complexity-hotspots recipe (#179)
* feat(churn): git churn ingest and churn-complexity-hotspots recipe
  Add file_churn substrate with full/incremental/idle refresh on every index pass, codemap ingest-churn for golden seeds, and a refactor-priority recipe ranked by churn × complexity with perf-baseline churn_ms gating.
* harden: churn idle-skip correctness, docs lift, consumer surfaces
  Require populated file_churn and config fingerprint before idle skip; deletions-only index skips git log; ingest-churn needs index + inline schema help; delete shipped plan and fix inbound refs.
* docs(agents): churn-hotspot AX parity across MCP, skill, and recipe actions
  Wire churn-complexity-hotspots into MCP playbook and recipe chains, add churn column guidance to rule/skill shards, per-row review actions on the recipe, and cross-links from refactor-risk recipes.
* harden: ingest_churn MCP parity, doc fixes, churn_idle_ms gate
  Ship post-merge items in PR #179: MCP/HTTP ingest_churn, churn.file git skip, context churn_hint, path_prefix param, churn_idle_ms perf baseline. Fix architecture/recipe doc drift (CodeRabbit), reject empty churn JSON without wiping file_churn, and sweep 21-tool consumer surfaces.
* test(docs): close churn ROI nits — path_prefix golden, idle gate, leaks
  Add path_prefix golden + scope test, incremental churn merge test, churn_idle_ms per-phase noise floor and idle sanity cap, and genericize served skill shard comments.
* harden: churn ingest correctness and index-table-stats golden order
  Do not wipe file_churn on git log failure; fall back to full churn refresh when config fingerprint drifts during incremental index. Run index-table-stats golden before churn seed scenarios (file_churn: 46).
* test: close CodeRabbit churn nits — config path, help, golden guards
  Cover churn.file absolute resolution, mark computed_at optional in ingest-churn help, guard file_churn in index-table-stats matrix, and reuse parseChurnJsonPayload for golden seed validation.
* harden: churn ingest transactions, trend validation, consumer parity
  Wrap replaceFileChurn/mergeFileChurnForPaths in transactions; validate churn_trend on JSON ingest; align README, recipe, agent-content, and CLI help for churn.file override and hotspots alias distinction.
1 parent 36106ff commit 8595173

70 files changed

Lines changed: 3002 additions & 407 deletions

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+---
+"@stainless-code/codemap": minor
+---
+
+Add churn × complexity hotspot ranking: `file_churn` refreshed on every index from git history, with `codemap ingest-churn`, MCP/HTTP `ingest_churn`, and config `churn.file` for non-git repos. New `churn-complexity-hotspots` recipe ranks files or symbols (`by_symbol`) by change frequency × complexity with normalized 0–100 scores and `churn_trend`. Outcome alias `hotspots` still maps to fan-in.

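The changeset describes a ranking by change frequency × complexity with normalized 0–100 scores. A minimal sketch of that idea follows. The row shape, the choice of `weightedCommits` as the churn term, and max-based normalization are assumptions for illustration; the recipe's actual SQL is not shown in this diff.

```typescript
// Hypothetical input row: one file with a churn figure and a complexity figure.
interface FileRow {
  filePath: string;
  weightedCommits: number; // assumed churn term (could also be a raw commit count)
  complexity: number;      // assumed per-file complexity aggregate
}

interface RankedRow extends FileRow {
  hotspotScore: number;           // churn × complexity
  hotspotScoreNormalized: number; // 0–100, relative to the highest score in the set
}

// Rank files by churn × complexity and scale scores so the top file gets 100.
function rankHotspots(rows: FileRow[]): RankedRow[] {
  const scored = rows.map((r) => ({
    ...r,
    hotspotScore: r.weightedCommits * r.complexity,
  }));
  const max = scored.reduce((m, r) => Math.max(m, r.hotspotScore), 0);
  return scored
    .map((r) => ({
      ...r,
      // Guard against division by zero when every score is 0.
      hotspotScoreNormalized: max > 0 ? (r.hotspotScore / max) * 100 : 0,
    }))
    .sort((a, b) => b.hotspotScore - a.hotspotScore);
}

const ranked = rankHotspots([
  { filePath: "src/b.ts", weightedCommits: 2, complexity: 5 },
  { filePath: "src/a.ts", weightedCommits: 4, complexity: 10 },
  { filePath: "src/c.ts", weightedCommits: 0, complexity: 100 },
]);
console.log(ranked.map((r) => `${r.filePath} ${r.hotspotScoreNormalized}`));
```

A file that is complex but never changes (`src/c.ts` here) scores 0, which is the point of multiplying the two signals rather than adding them.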
README.md

Lines changed: 7 additions & 5 deletions
@@ -54,6 +54,8 @@ codemap validate --json # detect stale / mi
 codemap context --compact --for "refactor auth" # JSON envelope + intent-matched recipes
 codemap ingest-coverage coverage/coverage-final.json --json # Istanbul / LCOV (auto-detected) → coverage table; joins with symbols
 NODE_V8_COVERAGE=.cov bun test && codemap ingest-coverage .cov --runtime --json # V8 protocol (per-process dumps); local-only
+codemap ingest-churn metrics/churn.json --json # precomputed file_churn → churn-complexity-hotspots (non-git / CI)
+codemap query --json --recipe churn-complexity-hotspots # change-frequency × complexity (not the hotspots alias)
 codemap agents init # scaffold .agents/ rules + skills
 codemap agents init --mcp # PM-aware project MCP config (see docs/agents.md)
 codemap apply rename-preview --params old=foo,new=bar --dry-run # preview recipe-driven edits (substrate executor)
@@ -86,7 +88,7 @@ codemap query --json --recipe fan-out-sample
 codemap dead-code --json # → query --recipe untested-and-dead
 codemap deprecated --ci # → query --recipe deprecated-symbols --ci
 codemap boundaries --format sarif > boundary-findings.sarif # → query --recipe boundary-violations --format sarif
-codemap hotspots --json --group-by directory # → query --recipe fan-in --json --group-by directory
+codemap hotspots --json --group-by directory # → query --recipe fan-in (import hubs — not churn×complexity)
 codemap coverage-gaps --json --summary # → query --recipe worst-covered-exports --json --summary
 # Parametrised recipes validate params from <id>.md frontmatter before SQL binding.
 codemap query --json --recipe find-symbol-by-kind --params kind=function,name_pattern=%Query%
@@ -238,12 +240,12 @@ codemap skill # full codemap S
 codemap rule # full codemap rule markdown to stdout

 # MCP server (Model Context Protocol) — for agent hosts (Claude Code, Cursor, Codex, generic MCP clients)
-codemap mcp # JSON-RPC on stdio (20 tools; watcher default-ON)
-# Tools (20): query, query_batch, query_recipe, audit, save_baseline,
+codemap mcp # JSON-RPC on stdio (21 tools; watcher default-ON)
+# Tools (21): query, query_batch, query_recipe, audit, save_baseline,
 # list_baselines, drop_baseline, context, validate, show, snippet, impact,
 # affected, trace, explore, node, apply, apply_rows, apply_diff_input,
-# ingest_coverage
-# CLI twins: query batch, trace, explore, node, file, schema, symbols, context --include-snippets, ingest-coverage (same JSON as MCP/HTTP).
+# ingest_coverage, ingest_churn
+# CLI twins: query batch, trace, explore, node, file, schema, symbols, context --include-snippets, ingest-coverage, ingest-churn (same JSON as MCP/HTTP).
 # query / query_recipe also accept baseline (same diff envelope as codemap query --baseline).
 # Resources: codemap://schema, codemap://skill, codemap://rule, codemap://mcp-instructions (lazy-cached);
 # codemap://recipes, codemap://recipes/{id} (live read-per-call — recency fields stay fresh);

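The commit message mentions that JSON ingest validates `churn_trend`, rejects empty churn JSON, and treats `computed_at` as optional. The sketch below shows what such a pre-ingest check could look like. The payload shape (a top-level array whose keys mirror the `file_churn` columns documented in docs/architecture.md) and the function name `parseChurnRows` are assumptions; this is not the project's `parseChurnJsonPayload`.

```typescript
type ChurnTrend = "accelerating" | "stable" | "cooling";

// Assumed row shape: keys mirror the file_churn columns.
interface ChurnRow {
  file_path: string;
  commit_count: number;
  weighted_commits: number;
  lines_added: number;
  lines_removed: number;
  last_commit_at: string;
  churn_trend: ChurnTrend | null;
  computed_at?: string; // optional per the commit message
}

const TRENDS: readonly string[] = ["accelerating", "stable", "cooling"];

// Read a non-negative finite number from a row or throw with the row index.
function readCount(row: Record<string, unknown>, key: string, i: number): number {
  const v = row[key];
  if (typeof v !== "number" || !Number.isFinite(v) || v < 0) {
    throw new Error(`row ${i}: ${key} must be a non-negative number`);
  }
  return v;
}

// Hypothetical validator: parse and check a churn payload before any table write.
function parseChurnRows(json: string): ChurnRow[] {
  const data: unknown = JSON.parse(json);
  if (!Array.isArray(data) || data.length === 0) {
    throw new Error("churn payload must be a non-empty array");
  }
  return data.map((raw: unknown, i: number): ChurnRow => {
    if (typeof raw !== "object" || raw === null) throw new Error(`row ${i}: expected an object`);
    const row = raw as Record<string, unknown>;
    const filePath = row["file_path"];
    if (typeof filePath !== "string" || filePath === "") throw new Error(`row ${i}: file_path required`);
    const lastCommitAt = row["last_commit_at"];
    if (typeof lastCommitAt !== "string") throw new Error(`row ${i}: last_commit_at must be a string`);
    const trend = row["churn_trend"] ?? null;
    if (trend !== null && (typeof trend !== "string" || !TRENDS.includes(trend))) {
      throw new Error(`row ${i}: churn_trend must be accelerating, stable, cooling, or null`);
    }
    const computedAt = row["computed_at"];
    if (computedAt !== undefined && typeof computedAt !== "string") {
      throw new Error(`row ${i}: computed_at must be a string when present`);
    }
    return {
      file_path: filePath,
      commit_count: readCount(row, "commit_count", i),
      weighted_commits: readCount(row, "weighted_commits", i),
      lines_added: readCount(row, "lines_added", i),
      lines_removed: readCount(row, "lines_removed", i),
      last_commit_at: lastCommitAt,
      churn_trend: trend as ChurnTrend | null,
      computed_at: computedAt,
    };
  });
}
```

Validating the whole payload before touching the table matches the stated goal of rejecting bad input without wiping existing `file_churn` rows.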
docs/agents.md

Lines changed: 1 addition & 1 deletion
@@ -133,7 +133,7 @@ See [architecture.md § Session lifecycle wiring](./architecture.md#session-life

 **`context.index_freshness`** — session bootstrap includes index-level freshness metadata: `commit_drift` (HEAD ≠ `last_indexed_commit`), `pending_sync` (watcher debounce queue or in-flight reindex), optional disk-drift counts when watch is off, and a single `warning` string when agents should pause or re-index. **`context.start_here`** (non-compact) adds inline index summary, intent-ranked `query_recipe` cards, and top hub files with export signatures (adaptive caps by file count; optional MCP/HTTP `include_snippets` for one-line previews). Debug intent biases `sample_markers` toward FIXME/TODO. **MCP:** array-shaped JSON tools (`query`, …) keep row payloads verbatim and append a second `content` block prefixed `@codemap/index_freshness`; object-shaped tools merge `index_freshness` inline. **HTTP:** `POST /tool/*` adds `X-Codemap-Pending-Sync`, `X-Codemap-Commit-Drift`, and `X-Codemap-Warning` headers without changing JSON bodies; **`GET /health`** includes full cheap `index_freshness` when the DB is readable. Complements per-file `validate` / snippet `stale`. See [architecture.md § Context wiring](./architecture.md#context-wiring).

-**MCP ToolAnnotations** — `tools/list` (and HTTP `GET /tools`) expose advisory `readOnlyHint` / `destructiveHint` / `idempotentHint` per tool so clients can gate auto-approval. Read paths (`query`, `show`, `audit`, …) → `readOnlyHint: true`; disk-write apply tools → `destructiveHint: true` (writes still require `yes: true`); index mutators (`save_baseline`, `drop_baseline`, `ingest_coverage`) → `readOnlyHint: false` without `destructiveHint`.
+**MCP ToolAnnotations** — `tools/list` (and HTTP `GET /tools`) expose advisory `readOnlyHint` / `destructiveHint` / `idempotentHint` per tool so clients can gate auto-approval. Read paths (`query`, `show`, `audit`, …) → `readOnlyHint: true`; disk-write apply tools → `destructiveHint: true` (writes still require `yes: true`); index mutators (`save_baseline`, `drop_baseline`, `ingest_coverage`, `ingest_churn`) → `readOnlyHint: false` without `destructiveHint`.

 **`CODEMAP_MCP_TOOLS`** — comma-separated snake_case MCP tool names. When set, only listed tools register (stderr lists the active set). Unknown names are ignored with a warning. Unset = all tools (default). **`query_batch`** registers only when listed or when unset (eval ablation).

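The ToolAnnotations paragraph in the hunk above says clients can use the advisory hints to gate auto-approval. The sketch below shows one way a client might do that. The annotation values mirror the text; the `approvalFor` policy and its three outcomes are hypothetical and not part of codemap or the MCP SDK.

```typescript
// Advisory hints as named in the docs; all fields are optional.
interface ToolAnnotations {
  readOnlyHint?: boolean;
  destructiveHint?: boolean;
  idempotentHint?: boolean;
}

// Example annotations following the three groups described in docs/agents.md.
const annotations: Record<string, ToolAnnotations> = {
  query: { readOnlyHint: true },          // read path
  apply: { destructiveHint: true },       // disk-write apply tool
  ingest_churn: { readOnlyHint: false },  // index mutator, no destructiveHint
};

type Approval = "auto" | "confirm" | "confirm-destructive";

// Hypothetical client policy: auto-approve only tools that declare themselves read-only.
function approvalFor(a: ToolAnnotations | undefined): Approval {
  if (a === undefined) return "confirm";            // unknown tool: always ask
  if (a.destructiveHint === true) return "confirm-destructive";
  if (a.readOnlyHint === true) return "auto";
  return "confirm";                                 // mutates the index but not the working tree
}

for (const name of Object.keys(annotations)) {
  console.log(name, approvalFor(annotations[name]));
}
```

Because the hints are advisory, a cautious client would treat a missing annotation as "ask the user" rather than as permission.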
docs/architecture.md

Lines changed: 18 additions & 1 deletion
@@ -212,7 +212,7 @@ Three **mutually exclusive** CLI entry shapes; all converge on `applyDiffPayload

 **`src/application/session-lifecycle.ts`** — transport-specific start/stop rules for long-running `mcp` / `serve` processes (one-shot CLI unchanged). **`createStdioDisconnectMonitor`** (MCP only) exits the process when the agent host is actually gone: stdin EOF, stdout `EPIPE`, boot parent PID no longer alive (2s poll), or SIGINT/SIGTERM. The MCP SDK's stdio `transport.onclose` alone is insufficient — it fires only after an explicit `transport.close()`, not when the parent crashes without tearing down the pipe. **`createManagedWatchSession`** refcount-gates chokidar: MCP acquires one client before `connect` and **`forceStop`** drains the watcher on disconnect; HTTP acquires per authenticated request (after auth; **`GET /health`** excluded) and **`releaseClient`** stops the watcher when the count hits zero. **No MCP idle timeout:** `codemap mcp` does **not** exit after N minutes without tool calls while the stdio pipe stays open. IDE hosts spawn MCP once per session and do not reliably respawn it mid-conversation — an idle shutdown would break long pauses (human think time, reading, multi-step plans) with no recovery path. Orphan cleanup is handled by **disconnect detection**, not inactivity timers. **HTTP watch release grace (`HTTP_WATCH_RELEASE_GRACE_MS` = 5000):** distinct from idle timeout — only stops chokidar between stateless requests so the watcher is not started/stopped on every POST; the HTTP listener keeps running. **`GET /health`** liveness probes do not acquire a watch client (probes must not keep chokidar hot). Future **`MCP shared daemon per project`** could revisit opt-in idle policies with explicit client reconnect; not planned for stdio MCP today.

-**Performance wiring:** **`--performance`** plumbs through **`RunIndexOptions.performance`** → **`indexFiles({ performance, collectMs })`**. `parse-worker-core.ts` records per-file **`parseMs`** on each `ParsedFile`; main thread times the eight phases (`collect`, `parse`, `insert`, `index_create`, `bindings`, `module_cycles`, `re_export_chains`, `heritage`) and assembles **`IndexPerformanceReport`** under `IndexRunStats.performance`. Note: `total_ms` is `indexFiles` wall-clock (parse + insert + DDL + bindings + cycles + re_exports + heritage), **not** end-to-end run wall — `collect_ms` happens before `indexFiles` and is reported separately. Env var **`CODEMAP_PERFORMANCE_JSON=<path>`** dumps the report as JSON post-run (consumed by [`bun run check:perf-baseline`](./benchmark.md#perf-baseline-regression-guardrail) for local + weekly scheduled drift checks — not a PR merge gate).
+**Performance wiring:** **`--performance`** plumbs through **`RunIndexOptions.performance`** → **`indexFiles({ performance, collectMs })`**. `parse-worker-core.ts` records per-file **`parseMs`** on each `ParsedFile`; main thread times the eight phases (`collect`, `parse`, `insert`, `index_create`, `bindings`, `module_cycles`, `re_export_chains`, `heritage`) and assembles **`IndexPerformanceReport`** under `IndexRunStats.performance`. Post-index **`refreshFileChurn`** records **`churn_ms`** separately (patched into the performance JSON when `CODEMAP_PERFORMANCE_JSON` is set). Note: `total_ms` is `indexFiles` wall-clock (parse + insert + DDL + bindings + cycles + re_exports + heritage), **not** end-to-end run wall — `collect_ms` and `churn_ms` happen outside `indexFiles` and are reported separately. Env var **`CODEMAP_PERFORMANCE_JSON=<path>`** dumps the report as JSON post-run (consumed by [`bun run check:perf-baseline`](./benchmark.md#perf-baseline-regression-guardrail) for local + weekly scheduled drift checks — not a PR merge gate).

 **Agent templates:** `codemap agents init` writes thin pointer files (~18-line SKILL + ~25-line rule) to consumer disk; full content is served live by `codemap skill` / `codemap rule` (CLI) and `codemap://skill` / `codemap://rule` (MCP / HTTP) from `templates/agent-content/<kind>/*.md`. Section files concatenate in lexical order; `*.gen.md` sections dispatch to renderers in `application/agent-content.ts` so recipe catalog + schema DDL auto-register. Pointer-version stamp (`<!-- codemap-pointer-version: N -->`) + once-per-process stderr nag (`maybeWarnStalePointers`) flag stale consumer templates; cure is `codemap agents init --force`. Full matrix: [agents.md](./agents.md).

@@ -496,6 +496,23 @@ One row per leaf parameter binding, ordered by `position`. Pattern params (`func
 | column_start | INTEGER | 0-based column of the binding token |
 | column_end | INTEGER | One-past-last column |

+### `file_churn` — Git churn metrics per indexed file (`STRICT`)
+
+One row per indexed file with git history in scope. Populated on **every index pass** by `refreshFileChurn` — git repos via `ingestFileChurnFromGit` (`git log --numstat` scoped to the project root pathspec); when config **`churn.file`** is set, JSON ingest runs instead and **skips** git log. Tunable via `churn.halfLifeDays` (default 90) and optional `churn.since` / CLI `--churn-since <ref>`. Non-git repos skip automatic git ingest (table empty until seeded). `churn_trend` is `accelerating` \| `stable` \| `cooling` when enough history exists, else NULL.
+
+| Column | Type | Description |
+| ---------------- | ------- | -------------------------------------------------------------------------- |
+| file_path | TEXT PK | FK → `files(path)` CASCADE |
+| commit_count | INTEGER | Distinct commits touching the file in scope |
+| weighted_commits | REAL | Recency-weighted commit count (default 90-day half-life exponential decay) |
+| lines_added | INTEGER | Sum of added lines from numstat |
+| lines_removed | INTEGER | Sum of removed lines from numstat |
+| last_commit_at | TEXT | ISO timestamp of most recent commit touching the file |
+| churn_trend | TEXT | `"accelerating"` \| `"stable"` \| `"cooling"` — nullable in v1 |
+| computed_at | TEXT | ISO timestamp when ingest last ran |
+
+Powers **`churn-complexity-hotspots`** recipe (`hotspot_score`, `hotspot_score_normalized`; file or symbol grain via `by_symbol`). Non-git / fixtures: **`codemap ingest-churn`**, MCP/HTTP **`ingest_churn`**, or config **`churn.file`**. Distinct from outcome alias **`hotspots`** → `fan-in`.
+
 ### `file_metrics` — Per-file aggregate metrics (`STRICT`)

 One row per indexed TS/JS file. Line classification is regex-light (blank if `/^\s*$/`; comment if line starts with `//`, `/*`, `*`, `*/`).

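The `weighted_commits` column in the `file_churn` hunk above is described as a recency-weighted commit count with a 90-day half-life exponential decay. A minimal sketch of that weighting, assuming each commit contributes `0.5^(age / halfLife)` and that age is measured from a caller-supplied reference time; the function name and signature are illustrative, not codemap's.

```typescript
const MS_PER_DAY = 86_400_000;

// Sum of per-commit weights, where a commit's weight halves every `halfLifeDays`.
function weightedCommits(
  commitTimesMs: number[],
  nowMs: number,
  halfLifeDays = 90, // matches the documented churn.halfLifeDays default
): number {
  if (halfLifeDays <= 0) throw new Error("halfLifeDays must be positive");
  let total = 0;
  for (const t of commitTimesMs) {
    // Clamp so commits timestamped in the future count as "just now".
    const ageDays = Math.max(0, (nowMs - t) / MS_PER_DAY);
    total += Math.pow(0.5, ageDays / halfLifeDays);
  }
  return total;
}

const now = Date.UTC(2025, 5, 30);
const commits = [now, now - 90 * MS_PER_DAY, now - 180 * MS_PER_DAY];
// Ages of 0, 90, and 180 days give weights 1, 0.5, and 0.25.
console.log(weightedCommits(commits, now)); // 1.75
```

Under this weighting a file touched three times last week outranks one touched three times two years ago, even though both have `commit_count` 3.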
docs/benchmark.md

Lines changed: 2 additions & 2 deletions
@@ -207,9 +207,9 @@ Independent of the consumer-facing scenarios above, the repo carries a **per-pha

 ### Mechanism

-1. `bun src/index.ts --full --performance` populates [`IndexPerformanceReport`](../src/application/types.ts) with `collect_ms` / `parse_ms` / `insert_ms` / `index_create_ms` / `bindings_ms` / `module_cycles_ms` / `re_export_chains_ms` / `heritage_ms` / `total_ms`.
+1. `bun src/index.ts --full --performance` populates [`IndexPerformanceReport`](../src/application/types.ts) with `collect_ms` / `parse_ms` / `insert_ms` / `index_create_ms` / `bindings_ms` / `module_cycles_ms` / `re_export_chains_ms` / `heritage_ms` / `total_ms`, plus post-index **`churn_ms`** (git churn ingest; patched after `indexFiles` completes).
 2. Setting `CODEMAP_PERFORMANCE_JSON=<path>` dumps that report as JSON to `<path>` after the run (no CLI flag added; env-var only).
-3. [`scripts/check-perf-baseline.ts`](../scripts/check-perf-baseline.ts) (alias `bun run check:perf-baseline`) runs the indexer 3× on this repo, takes per-phase **medians**, and compares **`collect_ms`**, **`parse_ms`**, **`insert_ms`**, **`index_create_ms`**, **`bindings_ms`**, and **`total_ms`** to `fixtures/benchmark/perf-baseline.json`. Other `IndexPerformanceReport` fields (`module_cycles_ms`, `re_export_chains_ms`, `heritage_ms`, …) appear in `--performance` JSON only — not baseline-gated.
+3. [`scripts/check-perf-baseline.ts`](../scripts/check-perf-baseline.ts) (alias `bun run check:perf-baseline`) runs the indexer `CODEMAP_PERF_RUNS`× on this repo (`--full --performance`), then the same count idle incremental (`codemap --performance`, no `--full`), takes per-phase **medians**, and compares **`collect_ms`**, **`parse_ms`**, **`insert_ms`**, **`index_create_ms`**, **`bindings_ms`**, **`churn_ms`**, **`churn_idle_ms`** (idle incremental `churn_ms` when HEAD unchanged), and **`total_ms`** to `fixtures/benchmark/perf-baseline.json`. Phases under their noise floor skip gating (`noise_floor_ms` default 10ms; `churn_idle_ms` uses a 5ms floor in the checker). Idle runs also fail when `churn_ms` exceeds `CODEMAP_PERF_IDLE_CHURN_MAX_MS` (default 50ms) — catches accidental full git churn on the idle path. Other `IndexPerformanceReport` fields (`module_cycles_ms`, `re_export_chains_ms`, `heritage_ms`, …) appear in `--performance` JSON only — not baseline-gated.
 4. **Local / scheduled only** — run before perf-sensitive PRs; [`.github/workflows/perf-baseline.yml`](../.github/workflows/perf-baseline.yml) fires weekly + `workflow_dispatch` for drift visibility. **Not** on the PR CI path (6 min × 3 runs + bimodal GHA runners → flaky merge gate).

 ### Why this is separate from `src/benchmark.ts`

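The Mechanism steps in the hunk above describe taking per-phase medians over several runs, skipping phases under a noise floor, and comparing the rest to a stored baseline. The sketch below illustrates that flow. The 10ms default floor and the 5ms `churn_idle_ms` floor come from the text; the `tolerance` ratio, the flat `Record<string, number>` report shape, and the rule that a phase is skipped when its median is under the floor are assumptions, since the checker's real threshold and baseline format are not shown here.

```typescript
type PhaseTimes = Record<string, number>;
type Status = "ok" | "regressed" | "skipped";

interface PhaseResult {
  phase: string;
  medianMs: number;
  baselineMs: number;
  status: Status;
}

// Median of a non-empty list (mean of the two middle values for even lengths).
function median(xs: number[]): number {
  if (xs.length === 0) throw new Error("median of empty list");
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 === 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
}

// Compare per-phase medians against a baseline, skipping phases under their noise floor.
function checkPhases(
  runs: PhaseTimes[],
  baseline: PhaseTimes,
  tolerance = 0.15,                                  // hypothetical allowed slowdown ratio
  floors: PhaseTimes = { churn_idle_ms: 5 },         // per-phase overrides
  defaultFloorMs = 10,
): PhaseResult[] {
  return Object.keys(baseline).map((phase) => {
    const samples = runs.map((r) => r[phase]).filter((v): v is number => typeof v === "number");
    const medianMs = median(samples);
    const baselineMs = baseline[phase];
    const floor = floors[phase] ?? defaultFloorMs;
    let status: Status;
    if (medianMs < floor) status = "skipped";        // too small to measure reliably
    else if (medianMs > baselineMs * (1 + tolerance)) status = "regressed";
    else status = "ok";
    return { phase, medianMs, baselineMs, status };
  });
}

const results = checkPhases(
  [
    { parse_ms: 100, insert_ms: 40, churn_idle_ms: 2 },
    { parse_ms: 140, insert_ms: 42, churn_idle_ms: 3 },
    { parse_ms: 120, insert_ms: 41, churn_idle_ms: 4 },
  ],
  { parse_ms: 100, insert_ms: 40, churn_idle_ms: 1 },
);
console.log(results.map((r) => `${r.phase}: ${r.status}`));
```

Using the median rather than the mean keeps a single slow run (for example, a cold cache) from failing the check on its own.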