Skip to content

Commit e74410c

Browse files
feat(mcp): ToolAnnotations hints on tools/list and GET /tools (#170)
* feat(mcp): add ToolAnnotations hints on tools/list and GET /tools MCP clients can distinguish read-only query tools from destructive apply tools via readOnlyHint/destructiveHint; HTTP catalog exposes the same fields. * fix(mcp): address PR review — SDK guard, tests, docs lift - M.6: probe ToolAnnotationsSchema before attaching MCP hints - Full-catalog annotation tests on tools/list and GET /tools - Lift to architecture.md; delete plan; remove roadmap item - Document index-mutator readOnlyHint: false in consumer surfaces * chore: add changeset for MCP ToolAnnotations release notes * test(mcp): cover M.6 guard false path for ToolAnnotations
1 parent 2bc86e1 commit e74410c

13 files changed

Lines changed: 349 additions & 181 deletions

.changeset/mcp-tool-annotations.md

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
---
2+
"@stainless-code/codemap": patch
3+
---
4+
5+
MCP `tools/list` and HTTP `GET /tools` expose advisory `readOnlyHint`, `destructiveHint`, and `idempotentHint` per tool so clients can gate auto-approval. Apply tools carry `destructiveHint`; read-only query tools carry `readOnlyHint`.

docs/agents.md

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -133,6 +133,8 @@ See [architecture.md § Session lifecycle wiring](./architecture.md#session-life
133133

134134
**`context.index_freshness`** — session bootstrap includes index-level freshness metadata: `commit_drift` (HEAD ≠ `last_indexed_commit`), `pending_sync` (watcher debounce queue or in-flight reindex), optional disk-drift counts when watch is off, and a single `warning` string when agents should pause or re-index. **`context.start_here`** (non-compact) adds inline index summary, intent-ranked `query_recipe` cards, and top hub files with export signatures (adaptive caps by file count; optional MCP/HTTP `include_snippets` for one-line previews). Debug intent biases `sample_markers` toward FIXME/TODO. **MCP:** array-shaped JSON tools (`query`, …) keep row payloads verbatim and append a second `content` block prefixed `@codemap/index_freshness`; object-shaped tools merge `index_freshness` inline. **HTTP:** `POST /tool/*` adds `X-Codemap-Pending-Sync`, `X-Codemap-Commit-Drift`, and `X-Codemap-Warning` headers without changing JSON bodies; **`GET /health`** includes full cheap `index_freshness` when the DB is readable. Complements per-file `validate` / snippet `stale`. See [architecture.md § Context wiring](./architecture.md#context-wiring).
135135

136+
**MCP ToolAnnotations**`tools/list` (and HTTP `GET /tools`) expose advisory `readOnlyHint` / `destructiveHint` / `idempotentHint` per tool so clients can gate auto-approval. Read paths (`query`, `show`, `audit`, …) → `readOnlyHint: true`; disk-write apply tools → `destructiveHint: true` (writes still require `yes: true`); index mutators (`save_baseline`, `drop_baseline`, `ingest_coverage`) → `readOnlyHint: false` without `destructiveHint`.
137+
136138
**`CODEMAP_MCP_TOOLS`** — comma-separated snake_case MCP tool names. When set, only listed tools register (stderr lists the active set). Unknown names are ignored with a warning. Unset = all tools (default). **`query_batch`** registers only when listed or when unset (eval ablation).
137139

138140
Example: `CODEMAP_MCP_TOOLS=query,context,show codemap mcp --no-watch`

docs/architecture.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -194,9 +194,9 @@ Three **mutually exclusive** CLI entry shapes; all converge on `applyDiffPayload
194194

195195
**Tool / resource handlers (transport-agnostic):** **`src/application/tool-handlers.ts`** + **`src/application/resource-handlers.ts`** — pure functions that take the args object an MCP tool / resource URI accepts and return a discriminated **`ToolResult`** (`{ok: true, format: 'json'|'sarif'|'annotations'|'mermaid'|'diff'|'diff-json', payload}` / `{ok: false, error}`) or a **`ResourcePayload`** (`{mimeType, text}`). MCP and HTTP both wrap the same handlers — MCP translates to `{content: [{type: "text", text}]}`, HTTP translates to `(status, body)` with the right `Content-Type`. Engine layer untouched; transport changes don't ripple into the SQL.
196196

197-
**MCP wiring:** **`src/cli/cmd-mcp.ts`** (argv — `--watch` / `--no-watch` / `--debounce` + `--help`; bootstrap absorbs `--root`/`--config`) + **`src/application/mcp-server.ts`** (transport — tool / resource registry, SDK glue). Mirrors the `cmd-audit.ts ↔ audit-engine.ts` seam — CLI parses + lifecycle; engine owns the SDK. **`runMcpServer`** bootstraps codemap once at server boot (config + resolver + DB access become module-level state), instantiates `McpServer` from **`@modelcontextprotocol/sdk`**, attaches a **`StdioServerTransport`**, and resolves on client disconnect via **`src/application/session-lifecycle.ts`** (`createStdioDisconnectMonitor` — stdin EOF, stdout EPIPE, parent-PID poll — plus SDK `transport.onclose` and SIGINT/SIGTERM). With `--watch`, **`createManagedWatchSession`** holds one client for the stdio session and **`forceStop`** drains the watcher on exit. Tool handlers reuse the existing engine entry-points: **`query`** / **`query_recipe`** call **`executeQuery`** in **`src/application/query-engine.ts`** (same `[...rows]` / `{count}` / `{group_by, groups}` envelope `--json` would print) unless **`baseline`** is set — then **`compareQueryBaseline`** in **`src/application/query-baseline.ts`** (incompatible with non-`json` **`format`** / **`group_by`**); **`ingest_coverage`** calls **`runIngestCoverageOnDb`** in **`src/application/ingest-coverage-run.ts`** (CLI twin: `codemap ingest-coverage --json`); **`query_batch`** loops per statement via **`handleQueryBatch`** → **`executeQuery`** (batch-wide defaults + per-item overrides; items are `string | {sql, summary?, changed_since?, group_by?}`); **`audit`** runs `resolveAuditBaselines` + `runAudit` from PR #33 unchanged; **`context`** / **`validate`** call `buildContextEnvelope` / `computeValidateRows` from **`src/application/context-engine.ts`** + **`src/application/validate-engine.ts`** (lifted out of `src/cli/cmd-*.ts` in PR #41 — see § Tool / resource handlers above). **`save_baseline`** is one polymorphic tool (`{name, sql? | recipe?}`) with a runtime exclusivity check — mirrors the CLI's single `--save-baseline=<name>` verb. **Tool naming**: snake_case throughout — Codemap convention matching the patterns in MCP spec examples and reference servers (GitHub MCP, Cursor built-ins); the spec itself doesn't mandate it. CLI stays kebab — translation lives at the MCP-arg layer. **Resources** split by freshness contract: `codemap://schema`, `codemap://skill`, `codemap://rule`, and `codemap://mcp-instructions` use **lazy memoisation** — first `read_resource` populates a per-server-instance cache; constant for the server-process lifetime so eager-vs-lazy produce identical observable behavior. `codemap://recipes`, `codemap://recipes/{id}`, `codemap://files/{+path}`, and `codemap://symbols/{name}` are **live read-per-call** (no cache) so inline recency fields and index mutations under `--watch` don't freeze at first-read. `codemap://schema` queries `sqlite_schema` live (on first read, then cached); `codemap://skill` / `codemap://rule` / `codemap://mcp-instructions` call `assembleAgentContent(kind)` from `application/agent-content.ts`, which concatenates section files under `templates/agent-content/<kind>/` and dispatches `*.gen.md` files through `RENDERERS` (live recipe catalog, live `createTables()` DDL) — see [agents.md § Section assembler](./agents.md#section-assembler-and-genmd). Output shape: each tool returns the JSON payload its CLI counterpart would print (`query batch`, `trace`, `explore`, `node`, `file`, `schema`, `context --include-snippets`, `ingest-coverage`); MCP wraps via `content: [{type: "text", text: JSON.stringify(payload)}]`. `--changed-since` git lookups are memoised per `(root, ref)` pair across batch items so a `query_batch` of N items sharing the same ref does one git invocation, not N. Per-statement errors in `query_batch` are isolated — failed statements return `{error}` in their slot while siblings still execute.
197+
**MCP wiring:** **`src/cli/cmd-mcp.ts`** (argv — `--watch` / `--no-watch` / `--debounce` + `--help`; bootstrap absorbs `--root`/`--config`) + **`src/application/mcp-server.ts`** (transport — tool / resource registry, SDK glue). Mirrors the `cmd-audit.ts ↔ audit-engine.ts` seam — CLI parses + lifecycle; engine owns the SDK. **`runMcpServer`** bootstraps codemap once at server boot (config + resolver + DB access become module-level state), instantiates `McpServer` from **`@modelcontextprotocol/sdk`**, attaches a **`StdioServerTransport`**, and resolves on client disconnect via **`src/application/session-lifecycle.ts`** (`createStdioDisconnectMonitor` — stdin EOF, stdout EPIPE, parent-PID poll — plus SDK `transport.onclose` and SIGINT/SIGTERM). With `--watch`, **`createManagedWatchSession`** holds one client for the stdio session and **`forceStop`** drains the watcher on exit. Tool handlers reuse the existing engine entry-points: **`query`** / **`query_recipe`** call **`executeQuery`** in **`src/application/query-engine.ts`** (same `[...rows]` / `{count}` / `{group_by, groups}` envelope `--json` would print) unless **`baseline`** is set — then **`compareQueryBaseline`** in **`src/application/query-baseline.ts`** (incompatible with non-`json` **`format`** / **`group_by`**); **`ingest_coverage`** calls **`runIngestCoverageOnDb`** in **`src/application/ingest-coverage-run.ts`** (CLI twin: `codemap ingest-coverage --json`); **`query_batch`** loops per statement via **`handleQueryBatch`** → **`executeQuery`** (batch-wide defaults + per-item overrides; items are `string | {sql, summary?, changed_since?, group_by?}`); **`audit`** runs `resolveAuditBaselines` + `runAudit` from PR #33 unchanged; **`context`** / **`validate`** call `buildContextEnvelope` / `computeValidateRows` from **`src/application/context-engine.ts`** + **`src/application/validate-engine.ts`** (lifted out of `src/cli/cmd-*.ts` in PR #41 — see § Tool / resource handlers above). **`save_baseline`** is one polymorphic tool (`{name, sql? | recipe?}`) with a runtime exclusivity check — mirrors the CLI's single `--save-baseline=<name>` verb. **Tool naming**: snake_case throughout — Codemap convention matching the patterns in MCP spec examples and reference servers (GitHub MCP, Cursor built-ins); the spec itself doesn't mandate it. CLI stays kebab — translation lives at the MCP-arg layer. **Resources** split by freshness contract: `codemap://schema`, `codemap://skill`, `codemap://rule`, and `codemap://mcp-instructions` use **lazy memoisation** — first `read_resource` populates a per-server-instance cache; constant for the server-process lifetime so eager-vs-lazy produce identical observable behavior. `codemap://recipes`, `codemap://recipes/{id}`, `codemap://files/{+path}`, and `codemap://symbols/{name}` are **live read-per-call** (no cache) so inline recency fields and index mutations under `--watch` don't freeze at first-read. `codemap://schema` queries `sqlite_schema` live (on first read, then cached); `codemap://skill` / `codemap://rule` / `codemap://mcp-instructions` call `assembleAgentContent(kind)` from `application/agent-content.ts`, which concatenates section files under `templates/agent-content/<kind>/` and dispatches `*.gen.md` files through `RENDERERS` (live recipe catalog, live `createTables()` DDL) — see [agents.md § Section assembler](./agents.md#section-assembler-and-genmd). Output shape: each tool returns the JSON payload its CLI counterpart would print (`query batch`, `trace`, `explore`, `node`, `file`, `schema`, `context --include-snippets`, `ingest-coverage`); MCP wraps via `content: [{type: "text", text: JSON.stringify(payload)}]`. **`tools/list` ToolAnnotations** — advisory `readOnlyHint` / `destructiveHint` / `idempotentHint` per tool from **`src/application/mcp-tool-annotations.ts`** (central map beside **`mcp-tool-allowlist.ts`**); read paths (`query`, `show`, `audit`, …) → `readOnlyHint: true`; disk-write apply tools → `destructiveHint: true` (writes still require `yes: true`); index user-data mutators (`save_baseline`, `drop_baseline`, `ingest_coverage`) → `readOnlyHint: false` without `destructiveHint`. Omitted when an older `@modelcontextprotocol/sdk` lacks annotation fields (M.6 guard). `--changed-since` git lookups are memoised per `(root, ref)` pair across batch items so a `query_batch` of N items sharing the same ref does one git invocation, not N. Per-statement errors in `query_batch` are isolated — failed statements return `{error}` in their slot while siblings still execute.
198198

199-
**HTTP wiring:** **`src/cli/cmd-serve.ts`** (argv — `--host` / `--port` / `--token`; bootstrap absorbs `--root`/`--config`) + **`src/application/http-server.ts`** (transport — bare `node:http`; routes `POST /tool/{name}` to `tool-handlers`, `GET /resources/{encoded-uri}` to `resource-handlers`, plus `GET /health` / `GET /tools` / `GET /resources`). Default bind **`127.0.0.1:7878`** (loopback only — refuse `0.0.0.0` unless explicitly opted in via `--host 0.0.0.0`). Optional **`--token <secret>`** requires `Authorization: Bearer <secret>` on every request; `GET /health` is auth-exempt so liveness probes work without leaking the token. **CSRF + DNS-rebinding guard** (`csrfCheck`) runs before every route — rejects `Sec-Fetch-Site: cross-site` / `same-site` (modern-browser CSRF), any present `Origin` header (including the opaque string `null`; older-browser CSRF fallback), and `Host` header mismatch on loopback bind (DNS rebinding). Non-browser clients (curl, fetch from Node, MCP hosts, CI scripts) don't send those headers and pass through. The guard runs even on `/health` so a malicious local webpage can't probe for liveness. Output shape: HTTP returns each tool's native JSON payload directly (NOT MCP's `{content: [...]}` wrapper — HTTP doesn't need that transport artifact); `query` / `query_recipe` match `codemap query --json` row arrays (or `{count}` / `{group_by,groups}` when `summary` / `group_by` is set, or baseline diff when `baseline` is set — incompatible with non-`json` `format` / `group_by`; save/list/drop remain separate tools); other tools match their CLI `--json` envelopes; `format: "sarif"` payloads ship as `application/sarif+json`, `format: "annotations"` / `"mermaid"` / `"diff"` as `text/plain; charset=utf-8`, `format: "diff-json"` as `application/json; charset=utf-8`, JSON otherwise. Per-request DB lifecycle: open / `PRAGMA query_only = 1` / close per call (SQLite reader concurrency); 1 MiB request-body cap rejects trivial DoS. SIGINT / SIGTERM → graceful drain via `server.close()`. Every response carries **`X-Codemap-Version: <semver>`** so consumers can pin / detect upgrades.
199+
**HTTP wiring:** **`src/cli/cmd-serve.ts`** (argv — `--host` / `--port` / `--token`; bootstrap absorbs `--root`/`--config`) + **`src/application/http-server.ts`** (transport — bare `node:http`; routes `POST /tool/{name}` to `tool-handlers`, `GET /resources/{encoded-uri}` to `resource-handlers`, plus `GET /health` / `GET /tools` / `GET /resources`). Default bind **`127.0.0.1:7878`** (loopback only — refuse `0.0.0.0` unless explicitly opted in via `--host 0.0.0.0`). Optional **`--token <secret>`** requires `Authorization: Bearer <secret>` on every request; `GET /health` is auth-exempt so liveness probes work without leaking the token. **CSRF + DNS-rebinding guard** (`csrfCheck`) runs before every route — rejects `Sec-Fetch-Site: cross-site` / `same-site` (modern-browser CSRF), any present `Origin` header (including the opaque string `null`; older-browser CSRF fallback), and `Host` header mismatch on loopback bind (DNS rebinding). Non-browser clients (curl, fetch from Node, MCP hosts, CI scripts) don't send those headers and pass through. The guard runs even on `/health` so a malicious local webpage can't probe for liveness. Output shape: HTTP returns each tool's native JSON payload directly (NOT MCP's `{content: [...]}` wrapper — HTTP doesn't need that transport artifact); `query` / `query_recipe` match `codemap query --json` row arrays (or `{count}` / `{group_by,groups}` when `summary` / `group_by` is set, or baseline diff when `baseline` is set — incompatible with non-`json` `format` / `group_by`; save/list/drop remain separate tools); other tools match their CLI `--json` envelopes; `format: "sarif"` payloads ship as `application/sarif+json`, `format: "annotations"` / `"mermaid"` / `"diff"` as `text/plain; charset=utf-8`, `format: "diff-json"` as `application/json; charset=utf-8`, JSON otherwise. Per-request DB lifecycle: open / `PRAGMA query_only = 1` / close per call (SQLite reader concurrency); 1 MiB request-body cap rejects trivial DoS. **`GET /tools`** returns the same advisory hint fields as MCP `tools/list` (`readOnlyHint` / `destructiveHint` / `idempotentHint` per entry via **`buildHttpToolCatalogEntry`**). SIGINT / SIGTERM → graceful drain via `server.close()`. Every response carries **`X-Codemap-Version: <semver>`** so consumers can pin / detect upgrades.
200200

201201
**Watch wiring:** **`src/cli/cmd-watch.ts`** (argv — `--debounce <ms>` / `--quiet`; bootstrap absorbs `--root`/`--config`) + **`src/application/watcher.ts`** (engine — pure debouncer + glob filter + injectable backend; production wires [chokidar v5](https://github.com/paulmillr/chokidar) selected via the 6-watcher audit in PR #46 — pure JS, runs identically on Bun + Node, ~30M repos use it). On every change/add/unlink event chokidar emits, the engine filters via `shouldIndexPath` (same indexed extensions as the indexer + project-local recipes; skips `node_modules` / `.git` / `dist`), debounces with a sliding window (default 250 ms), then calls `createReindexOnChange` which opens a DB, runs `runCodemapIndex({mode: 'files', files: [...changed]})`, closes the DB, and logs `reindex N file(s) in Mms` to stderr unless `--quiet`. SIGINT / SIGTERM drains pending edits via `flushNow()` before the watcher closes. **Default-ON for `mcp` / `serve` since 2026-05:** both transports embed the watcher via **`createManagedWatchSession`** in **`session-lifecycle.ts`** — MCP holds one client for the stdio session; HTTP acquires per request (excluding `/health`) and stops the watcher after the last client plus a 5s release grace (not an MCP idle shutdown). Opt out with `--no-watch`, `CODEMAP_WATCH=0`, or `CODEMAP_NO_WATCH=1`. **`src/application/watch-policy.ts`** disables the watcher on WSL2 Windows drive mounts (`/mnt/*`) unless `CODEMAP_FORCE_WATCH=1`; stderr points at `codemap agents init --git-hooks` for git-triggered freshness. Standalone `codemap watch` runs the watcher decoupled from a transport for users wiring it next to a separate MCP / HTTP process. **Audit prelude optimization:** module-level `watchActive` flag; `handleAudit` skips its incremental-index prelude when active (and marks the close as readonly to avoid a wasted checkpoint). Explicit `no_index: false` still forces the prelude.
202202

docs/plans/apply-write-safety.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -89,7 +89,7 @@ bun src/index.ts apply <recipe> --dry-run
8989
- [ ] Edit file on disk after dry-run passes but before `--yes` apply → `file content changed`, zero files modified
9090
- [ ] Mixed-EOL fixture file → `mixed line endings`, no write
9191
- [ ] Happy path unchanged: valid apply still returns `applied: true`
92-
- [ ] `destructiveHint` apply tools document recheck behavior in tool description (synergy with [mcp-tool-annotations](./mcp-tool-annotations.md))
92+
- [ ] `destructiveHint` apply tools document recheck behavior in tool description (synergy with [architecture.md § MCP wiring](../architecture.md) ToolAnnotations)
9393

9494
---
9595

0 commit comments

Comments
 (0)