
Commit a5caca8

feat(security): harden query, serve, and validate read surfaces (#180)
* feat(security): harden query, serve, and validate read surfaces

  Close the formatted-query query_only gap, require --token on non-loopback serve binds, and reject validate paths that escape the project root or symlink outside it.

* harden: read-surface PR parity, bind policy, and ship docs

  Add changeset and serve-bind-policy with runHttpServer enforcement; align validate/serve consumer surfaces (CLI help, MCP, glossary, README, agent-content); extend symlink containment tests.

* harden: validate containment depth, loopback range, and test coverage

  Add realpath-safe validate reads, broken-symlink rejection, 127.0.0.0/8 loopback token policy, expanded DML/format tests, and consumer-surface parity.

* feat(agents): harden-pr probes, ledger, and recall benchmark harness

  Adds missing-test eval fixture, LEDGER.md, vet/reconcile/quick modes, score-probe recall scorer, and test:harden-probes smoke. Live probe run: recall 1.0 on golden finding.

* chore(agents): drop harden-pr eval probes after validation

  Keep harden-pr skill, LEDGER.md, and tracer-bullets wiring; remove fixtures/harden-probes, score-probe harness, and related docs.

* fix(serve): normalize bracketed IPv6 bind host for Node listen

  Strip [::1] to ::1 in parseServeRest and runHttpServer; tighten test hygiene and formatted-query test header per review.

* harden: read-surface PR parity, tests, and wave1 plan retirement

  Align validate reason docs across consumer surfaces, extend formatted-query DML coverage to diff formats, retire security-hardening-wave1 plan with orchestrator/roadmap updates.
1 parent 8595173 commit a5caca8

28 files changed

Lines changed: 1164 additions & 70 deletions
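The `fix(serve)` bullet in the commit message strips `[::1]` to `::1` before the host reaches Node's `listen`. The actual helper in `parseServeRest`/`runHttpServer` is not shown on this page. Below is a minimal TypeScript sketch of that normalization. It assumes the host value carries no port suffix, and `normalizeBindHost` is a hypothetical name.

```typescript
/**
 * Hypothetical helper: unwrap a URL-style bracketed IPv6 host ("[::1]")
 * so it can be passed to Node's server.listen(), which expects "::1".
 * Assumes the value carries no port suffix.
 */
function normalizeBindHost(host: string): string {
  const trimmed = host.trim();
  // Only unwrap a fully bracketed value; IPv4 and hostnames pass through.
  if (trimmed.startsWith("[") && trimmed.endsWith("]")) {
    return trimmed.slice(1, -1);
  }
  return trimmed;
}

console.log(normalizeBindHost("[::1]")); // ::1
console.log(normalizeBindHost("127.0.0.1")); // 127.0.0.1
```

The helper deliberately requires both brackets, so a value like `[::1` is passed through unchanged and fails later at `listen` rather than being silently rewritten.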

.agents/skills/harden-pr/LEDGER.md

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
+# Harden-pr ledger
+
+Single durable backlog for [`harden-pr`](./SKILL.md). Parent reads **§ Rejections** at vet step; **§ Deferred** on cap and on `/harden-pr reconcile`.
+
+## Rejections
+
+By-design or false-positive findings — do not re-raise.
+
+```markdown
+- **[category]** `file:line` — label: reason
+```
+
+<!-- Example:
+- **[security]** `src/cli/proxy.ts:42` — https_proxy env: by-design — standard CLI proxy convention.
+-->
+
+## Deferred
+
+Capped or out-of-scope-for-now — reconcile re-vets; remove lines when fixed.
+
+```markdown
+- **[severity]** `file:line` — finding (deferred: out of scope | cap | blocked)
+```
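In the vet step, the parent drops any finding that matches a § Rejections bullet. Below is a minimal TypeScript sketch of parsing that bullet format. The skill does not define what counts as a match, so the sketch assumes a match means the same file and the same line. `parseRejection` and `isRejected` are hypothetical names. The dash separator is written as the escape `\u2014`.

```typescript
// Hypothetical shape of one parsed LEDGER.md § Rejections bullet.
interface Rejection {
  category: string;
  file: string;
  line: number | null; // null when the entry has no numeric line
  label: string;
  reason: string;
}

// Bullet format: - **[category]** `file:line` \u2014 label: reason
const REJECTION_RE = /^- \*\*\[([^\]]+)\]\*\* `([^`]+)` \u2014 ([^:]+): (.+)$/;

function parseRejection(bullet: string): Rejection | null {
  const m = REJECTION_RE.exec(bullet.trim());
  if (!m) return null;
  const [, category, location, label, reason] = m;
  // Split "path:line" on the last colon; a non-numeric tail means file-level.
  const idx = location.lastIndexOf(":");
  const tail = idx >= 0 ? location.slice(idx + 1) : "";
  const line = /^\d+$/.test(tail) ? Number(tail) : null;
  const file = line === null ? location : location.slice(0, idx);
  return { category, file, line, label: label.trim(), reason: reason.trim() };
}

// Assumed match rule: same file, and same line unless the entry is file-level.
function isRejected(
  finding: { file: string; line: number | null },
  rejections: Rejection[],
): boolean {
  return rejections.some(
    (r) => r.file === finding.file && (r.line === null || r.line === finding.line),
  );
}
```

The label is captured up to the first colon, so a reason that itself contains colons or dashes (like the commented example entry) is kept intact.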

.agents/skills/harden-pr/SKILL.md

Lines changed: 69 additions & 23 deletions
@@ -4,8 +4,8 @@ description: >-
 Bring a branch to pristine, maximum production readiness without changing PR intent —
 spawn parallel Task subagents (never inline review), fix in-bounds findings, loop autonomously until
 clean or pass cap, then report once. Use after a tracer-bullet commit (lite), before PR
-is done (full), or on "harden", "harden-pr", "pristine", "review until clean",
-"production-ready pass". Invoking this skill authorizes one harden commit at cycle end.
+is done (full), on "harden", "harden-pr", "pristine", "review until clean",
+"production-ready pass", or "harden-pr reconcile". Invoking this skill authorizes one harden commit at cycle end.
 NEVER stop mid-loop to ask about commits, babysit, or the next pass. NEVER redesign the
 feature or change observable runtime behavior.
 ---
@@ -16,9 +16,9 @@ description: >-
 
 Local loop: parallel reviewer subagents → merge findings → fix in-bounds → re-verify → repeat until clean or cap → **one final report**.
 
-**Invoking this skill (`/harden-pr`, `harden-pr lite`, `harden-pr full`) is a run-to-completion command.** The agent executes the full loop before ending the turn.
+**Invoking this skill (`/harden-pr`, `harden-pr lite`, `harden-pr full`, `harden-pr quick`, `harden-pr reconcile`) is a run-to-completion command.** The agent executes the full loop before ending the turn.
 
-Sister skills: [`audit-pr-architecture`](../audit-pr-architecture/SKILL.md) (extended structural reviewer). Mention **`babysit`** only in the final report (full mode) — never mid-loop.
+Sister skills: [`audit-pr-architecture`](../audit-pr-architecture/SKILL.md) (extended structural reviewer). **Ledger:** [LEDGER.md](./LEDGER.md) (rejections + deferred — one file). Mention **`babysit`** only in the final report (full mode) — never mid-loop.
 
 ## Run-to-completion (read first)
 
@@ -42,12 +42,14 @@ Otherwise: resolve anchor → run all passes → fix → verify → next pass
 
 ## Modes
 
-| Mode | When | Scope | Max passes |
-| -------- | -------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------- | ---------- |
-| **Lite** | After each tracer-bullet slice commit ([`tracer-bullets`](../../rules/tracer-bullets.md) cadence) | Files in the slice diff | 2 |
-| **Full** | User intent ("full harden", "PR done", "production-ready pass") **or** offer when an in-flight `docs/plans/<topic>.md` checklist is complete | `origin/main...HEAD` | 3 |
+| Mode | When | Scope | Max passes |
+| ------------- | -------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------- | ---------- |
+| **Lite** | After each tracer-bullet slice commit ([`tracer-bullets`](../../rules/tracer-bullets.md) cadence) | Files in the slice diff | 2 |
+| **Quick** | Cheap uncertainty pass ("quick harden") | Last commit or slice diff | 1 |
+| **Full** | User intent ("full harden", "PR done", "production-ready pass") **or** offer when an in-flight `docs/plans/<topic>.md` checklist is complete | `origin/main...HEAD` | 3 |
+| **Reconcile** | `/harden-pr reconcile` — process [LEDGER.md § Deferred](./LEDGER.md#deferred), then run **full** if branch still open | `origin/main...HEAD` | 3 |
 
-Default to **lite** when invoked immediately after a slice commit. Default to **full** when the user signals branch completion.
+Default to **lite** when invoked immediately after a slice commit. Default to **full** when the user signals branch completion. **Quick** = core 3 reviewers only (no extended roster).
 
 ## Production bar (what "pristine" means)
 
@@ -76,6 +78,27 @@ Resolve in order; stop at the first hit:
 
 Reviewers treat the anchor as contract. Findings that would violate it → **report, do not apply**.
 
+Record `HEAD` at loop start (`git rev-parse HEAD`) in the final report. If `HEAD` changes mid-loop from unrelated work, re-resolve the anchor before the next pass.
+
+## Vet step (parent, after merge — before fix)
+
+Subagents over-report. After merge + dedupe:
+
+1. Read [LEDGER.md § Rejections](./LEDGER.md#rejections) — drop findings matching a rejection entry.
+2. For each remaining finding: **re-read** `file` at `line` (or the cited region). Drop if the claim is false or by-design.
+3. New by-design drops → append one bullet to **§ Rejections** in [LEDGER.md](./LEDGER.md).
+4. Sort survivors by leverage: `severity` first, then `confidence` desc, then `effort` asc (`S` before `L`).
+
+**Anti-pattern:** applying a fix without re-reading the cited location.
+
+## Reconcile mode
+
+Run-to-completion like other modes:
+
+1. Read [LEDGER.md § Deferred](./LEDGER.md#deferred). Re-vet each row (same vet step). Fix in-bounds items; remove fixed lines.
+2. Run **full** harden on `origin/main...HEAD` (same loop as full mode).
+3. On cap: append still-deferred items to **§ Deferred** in [LEDGER.md](./LEDGER.md). Report what was reconciled vs still open.
+
 ## In-bounds vs out-of-bounds
 
 **Fix:** bugs, missing tests, docs/changeset drift, lint/type/format, error-handling gaps, edge cases, **behavior-preserving refactors in touched files**, in-scope nits (naming, comment hygiene, cheap lint fixes).
@@ -106,10 +129,16 @@ Each reviewer returns **only** a JSON array (no prose wrapper). Parent parses ar
 "finding": "One-sentence claim about a gap vs production bar",
 "severity": "blocker | major | minor | nit | info",
 "file": "repo-relative/path or \"multiple\"",
-"fixable_in_bounds": true
+"line": 42,
+"confidence": "high | medium | low",
+"effort": "S | M | L",
+"fixable_in_bounds": true,
+"production_bar": "Tests | Docs | Structure | …"
 }
 ```
 
+Use `line: null` when the gap is file-level (e.g. missing test file).
+
 **Severity → action**
 
 | Severity | Parent action |
@@ -124,8 +153,9 @@ Each reviewer returns **only** a JSON array (no prose wrapper). Parent parses ar
 1. Concatenate all reviewer arrays.
 2. Drop `info` unless it blocks ship shape.
 3. Dedupe: same `file` + same root cause → keep highest severity, merge `finding` text.
-4. Sort actionable: `blocker``major``minor``nit`.
-5. If merged list is empty → pass succeeds; skip fix phase.
+4. Sort actionable: `blocker``major``minor``nit`; within tier → `confidence` desc → `effort` asc.
+5. **Vet** (§ Vet step).
+6. If vetted list is empty → pass succeeds; skip fix phase.
 
 **Example merged queue (pass 1)**
 
@@ -135,19 +165,31 @@ Each reviewer returns **only** a JSON array (no prose wrapper). Parent parses ar
 "finding": "CLI --help documents summary counts but not per-row attribution on --base JSON rows.",
 "severity": "major",
 "file": "src/cli/cmd-audit.ts",
-"fixable_in_bounds": true
+"line": 120,
+"confidence": "high",
+"effort": "S",
+"fixable_in_bounds": true,
+"production_bar": "Docs"
 },
 {
 "finding": "Skill shard leaks requiredColumns when describing attribution.",
 "severity": "major",
 "file": "templates/agent-content/skill/10-recipes-context.md",
-"fixable_in_bounds": true
+"line": null,
+"confidence": "high",
+"effort": "M",
+"fixable_in_bounds": true,
+"production_bar": "Surfaces"
 },
 {
 "finding": "No e2e test for attribution: inherited on deprecated delta.",
 "severity": "nit",
 "file": "src/application/audit-worktree.test.ts",
-"fixable_in_bounds": true
+"line": null,
+"confidence": "medium",
+"effort": "S",
+"fixable_in_bounds": true,
+"production_bar": "Tests"
 }
 ]
 ```
@@ -170,7 +212,7 @@ You are the **{ROLE}** reviewer for `/harden-pr` on `{REPO}`.
 **Task:** {EXTRA}
 
 **Return ONLY** a JSON array of findings:
-[{ "finding": "...", "severity": "blocker|major|minor|nit|info", "file": "...", "fixable_in_bounds": true|false }]
+[{ "finding": "...", "severity": "blocker|major|minor|nit|info", "file": "...", "line": N|null, "confidence": "high|medium|low", "effort": "S|M|L", "fixable_in_bounds": true|false, "production_bar": "..." }]
 If clean: []
 
 Readonly — do not edit files.
@@ -216,23 +258,25 @@ Re-derive layer globs from `docs/architecture.md` § Layering — don't hardcode
 Execute **without pausing for user input** until exit condition:
 
 ```
-resolve intent anchor
+resolve intent anchor; stamp HEAD
 pass = 1
 loop:
 Task-batch all applicable reviewers (parallel, readonly)
 parent: merge + dedupe JSON findings (§ Finding schema)
+parent: vet findings (§ Vet step)
 if none actionable → goto done
 fix in-bounds (pass 1: all; passes 2+: blockers first, then in-scope nits)
-run project checks on touched files
+per fix: run verification gate from verify-after-each-step on touched files
 if clean and no new findings → goto done
 if pass >= max_passes → goto capped
 pass += 1
 goto loop
 capped:
+append deferred rows to LEDGER.md § Deferred
 emit deferred-nits list (each nit must cite plan Out of scope or cross-PR blocker — not "optional")
 done:
 if uncommitted fixes → git commit -m "harden: …"
-emit final report (include babysit one-liner if full mode)
+emit final report (include babysit one-liner if full mode; include anchor HEAD stamp)
 ```
 
 **Pass cap behavior:** after cap, stop auto-fixing; list deferred nits. Do not block the next tracer slice.
@@ -243,9 +287,11 @@ Skill invocation **is** the commit authorization. After the loop: if fixes exist
 
 ## Quick invoke
 
-| Intent | Say |
-| ----------- | ------------------------------------------------------ |
-| Post-slice | `/harden-pr lite` or `/harden-pr` after a slice commit |
-| Branch done | `/harden-pr full` or "production-ready pass" |
+| Intent | Say |
+| ---------------- | ------------------------------------------------------ |
+| Post-slice | `/harden-pr lite` or `/harden-pr` after a slice commit |
+| Cheap pass | `/harden-pr quick` |
+| Branch done | `/harden-pr full` or "production-ready pass" |
+| Deferred backlog | `/harden-pr reconcile` |
 
 Replaces the old copy-paste: _"spawn subagents → fix → loop until clean"_ — this skill **is** that loop.
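Merge steps 2 and 4, together with the vet-step ordering, reduce to a filter plus a three-key comparator over the new finding fields. Below is a TypeScript sketch, not code from this commit. It drops every `info` finding, leaving the "unless it blocks ship shape" exception to the parent's judgment. It skips dedupe, because "same root cause" is not a schema field. `queueFindings` and the rank tables are hypothetical names.

```typescript
// Field types follow the updated § Finding schema.
type Severity = "blocker" | "major" | "minor" | "nit" | "info";
type Confidence = "high" | "medium" | "low";
type Effort = "S" | "M" | "L";

interface Finding {
  finding: string;
  severity: Severity;
  file: string;
  line: number | null;
  confidence: Confidence;
  effort: Effort;
  fixable_in_bounds: boolean;
  production_bar: string;
}

// Lower rank sorts first.
const SEVERITY_RANK: Record<Severity, number> = { blocker: 0, major: 1, minor: 2, nit: 3, info: 4 };
const CONFIDENCE_RANK: Record<Confidence, number> = { high: 0, medium: 1, low: 2 };
const EFFORT_RANK: Record<Effort, number> = { S: 0, M: 1, L: 2 };

// Drop `info`, then order by severity tier, confidence desc, effort asc.
function queueFindings(findings: readonly Finding[]): Finding[] {
  return findings
    .filter((f) => f.severity !== "info")
    .sort(
      (a, b) =>
        SEVERITY_RANK[a.severity] - SEVERITY_RANK[b.severity] ||
        CONFIDENCE_RANK[a.confidence] - CONFIDENCE_RANK[b.confidence] ||
        EFFORT_RANK[a.effort] - EFFORT_RANK[b.effort],
    );
}
```

Applied to the example merged queue, this keeps the `cmd-audit.ts` finding (major, high, S) ahead of the skill-shard finding (major, high, M), with the nit last.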
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+---
+"@stainless-code/codemap": patch
+---
+
+Harden read surfaces: `codemap query --format …` blocks index mutations via the same read-only guard as `--json`; `codemap serve` requires `--token` when `--host` is not loopback (any `127.0.0.0/8` address counts as loopback, so `--token` stays optional on `127.0.0.2` and similar); `codemap validate` (and MCP/HTTP `validate`) can return `rejected` rows with optional `reason` (`path escapes project root` | `path escapes via symlink` | `path resolves outside project root`) — output `path` keys are always project-relative POSIX paths.
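The three `reason` values suggest a layered containment check. First comes a lexical check, then a check on the leaf symlink, then a check on the fully resolved path. Below is a minimal sketch using Node's `fs.lstatSync` and `fs.realpathSync`. Several details are assumptions of this sketch rather than the commit's code: which check maps to which reason, rejecting broken symlinks with `path escapes via symlink`, and treating missing files as "not rejected". `checkContainment` and `toProjectPath` are hypothetical names, and inputs are assumed to be project-relative.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Reason strings from the changeset; the check-to-reason mapping is assumed.
type RejectReason =
  | "path escapes project root"
  | "path escapes via symlink"
  | "path resolves outside project root";

function isInside(root: string, target: string): boolean {
  const rel = path.relative(root, target);
  return rel === "" || (rel !== ".." && !rel.startsWith(`..${path.sep}`) && !path.isAbsolute(rel));
}

function lstatOrNull(p: string): fs.Stats | null {
  try {
    return fs.lstatSync(p);
  } catch {
    return null;
  }
}

function realpathOrNull(p: string): string | null {
  try {
    return fs.realpathSync(p);
  } catch {
    return null; // e.g. a broken symlink whose target does not exist
  }
}

/** Returns a reject reason, or null when the path is safe to read (or simply missing). */
function checkContainment(root: string, input: string): RejectReason | null {
  const realRoot = fs.realpathSync(root);
  const abs = path.resolve(realRoot, input);
  // 1. Lexical escape: "../x" or an absolute path outside the root.
  if (!isInside(realRoot, abs)) return "path escapes project root";

  const stat = lstatOrNull(abs); // does not follow a symlink at the leaf
  if (stat === null) return null; // missing: left to the stale/missing checks
  const real = realpathOrNull(abs);
  // 2. Leaf is a symlink that is broken or points outside the root.
  if (stat.isSymbolicLink() && (real === null || !isInside(realRoot, real))) {
    return "path escapes via symlink";
  }
  // 3. A symlinked parent directory carries the path outside the root.
  if (real !== null && !isInside(realRoot, real)) return "path resolves outside project root";
  return null;
}

// Output paths are project-relative POSIX, even where path.sep is "\\".
function toProjectPath(root: string, abs: string): string {
  return path.relative(root, abs).split(path.sep).join("/");
}
```

Resolving the root with `realpathSync` first keeps the comparison consistent when the project itself lives under a symlinked directory.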

README.md

Lines changed: 3 additions & 3 deletions
@@ -50,7 +50,7 @@ codemap dead-code --json # outcome alias →
 codemap query --json --recipe fan-out # recipe SQL by id (alias: -r)
 codemap query --json "SELECT name, file_path FROM symbols WHERE name = 'foo'" # ad-hoc SQL
 codemap --files src/a.ts src/b.tsx # targeted re-index after edits
-codemap validate --json # detect stale / missing / unindexed files
+codemap validate --json # detect stale / missing / unindexed / rejected files
 codemap context --compact --for "refactor auth" # JSON envelope + intent-matched recipes
 codemap ingest-coverage coverage/coverage-final.json --json # Istanbul / LCOV (auto-detected) → coverage table; joins with symbols
 NODE_V8_COVERAGE=.cov bun test && codemap ingest-coverage .cov --runtime --json # V8 protocol (per-process dumps); local-only
@@ -162,9 +162,9 @@ codemap query --format diff-json 'SELECT "README.md" AS file_path, 1 AS line_sta
 codemap --with-fts --full
 codemap query --recipe text-in-deprecated-functions # demonstrates FTS5 ⨯ symbols ⨯ coverage JOIN
 # HTTP API — same tool taxonomy as `codemap mcp`, exposed over POST /tool/{name} for
-# non-MCP consumers (CI scripts, curl, IDE plugins). Loopback default; optional --token.
+# non-MCP consumers (CI scripts, curl, IDE plugins). Loopback default; --token required on non-loopback.
 TOKEN=$(openssl rand -hex 32)
-codemap serve --port 7878 --token "$TOKEN" &
+codemap serve --port 7878 --token "$TOKEN" & # --token required when --host is not loopback
 curl -s -X POST http://127.0.0.1:7878/tool/query \
 -H 'Content-Type: application/json' \
 -H "Authorization: Bearer $TOKEN" \
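The non-loopback `--token` rule can be sketched with `isIPv4` and `isIPv6` from `node:net`. The commit enforces its bind policy in `runHttpServer`, but that code is not shown here. In the sketch below, the function names and error text are hypothetical. IPv6 `::1` is assumed to count as loopback. Hostnames such as `localhost` and non-canonical spellings of `::1` are conservatively treated as non-loopback, because the changeset only names `127.0.0.0/8`. The host is assumed to arrive with any brackets already stripped.

```typescript
import { isIPv4, isIPv6 } from "node:net";

// Loopback per the changeset: any 127.0.0.0/8 IPv4 address; IPv6 "::1" is
// assumed loopback too. Hostnames such as "localhost" and other spellings
// of ::1 are treated as non-loopback here, so they require a token.
function isLoopbackHost(host: string): boolean {
  if (isIPv4(host)) return host.split(".")[0] === "127";
  if (isIPv6(host)) return host === "::1";
  return false;
}

// Hypothetical guard: refuse to start a non-loopback server without a token.
function assertServeBindAllowed(host: string, token: string | undefined): void {
  if (!isLoopbackHost(host) && !token) {
    throw new Error(`--token is required when --host (${host}) is not loopback`);
  }
}
```

Because `isIPv4` has already validated the dotted-quad shape, comparing the first octet to `"127"` is enough to cover the whole `/8` range, including the `127.0.0.2` case the changeset calls out.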
