Skip to content

Commit 6e18b97

Browse files
authored
fix(ce-code-review): right-size code review and route it through one portable path (#1014)
1 parent ac66926 commit 6e18b97

7 files changed

Lines changed: 156 additions & 71 deletions

File tree

skills/ce-code-review/SKILL.md

Lines changed: 93 additions & 9 deletions
Large diffs are not rendered by default.

skills/ce-code-review/references/findings-schema.json

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -76,7 +76,7 @@
7676
},
7777
"evidence": {
7878
"type": "array",
79-
"description": "Code-grounded evidence: snippets, line references, or pattern descriptions. At least 1 item.",
79+
"description": "Code-grounded evidence: snippets, line references, or pattern descriptions. At least 1 item. For any finding at confidence anchor 75 or 100, the first item MUST be the verbatim motivating line(s) with file:line -- the exact code text that makes the finding true (the quote-the-line gate). A finding whose triggering line cannot be quoted must step down to anchor 50.",
8080
"items": { "type": "string" },
8181
"minItems": 1
8282
},
@@ -130,8 +130,8 @@
130130
},
131131
"return_tiers": {
132132
"description": "Finding fields are split into two tiers. The full schema (with all required fields) applies to the artifact file on disk. The compact return to the orchestrator omits detail-tier fields. Both are valid uses of this schema in different contexts.",
133-
"merge_tier": "Returned to orchestrator: title, severity, file, line, confidence, autofix_class, owner, requires_verification, pre_existing, suggested_fix (optional). Plus top-level reviewer, residual_risks, testing_gaps.",
134-
"detail_tier": "Required in artifact file, omitted from compact return: why_it_matters, evidence. The artifact file must pass full schema validation including all required fields. Headless output depends on why_it_matters and evidence being present in the artifact."
133+
"merge_tier": "Returned to orchestrator: title, severity, file, line, confidence, autofix_class, owner, requires_verification, pre_existing, suggested_fix (optional), first_evidence (required for anchor 75/100; the verbatim first evidence line, used to enforce the quote-the-line gate in-band). Plus top-level reviewer, residual_risks, testing_gaps.",
134+
"detail_tier": "Required in artifact file, omitted from compact return: why_it_matters, and the full evidence array (the compact return carries only first_evidence). The artifact file must pass full schema validation including all required fields. Headless output depends on why_it_matters and evidence being present in the artifact."
135135
}
136136
}
137137
}

skills/ce-code-review/references/subagent-template.md

Lines changed: 14 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -27,14 +27,15 @@ You produce up to two outputs depending on whether a run ID was provided:
2727
If no Run ID is provided (the field is empty or absent), skip this step entirely -- do not attempt any file write.
2828
2929
2. **Compact return (always).** RETURN compact JSON to the parent with ONLY merge-tier fields per finding:
30-
title, severity, file, line, confidence, autofix_class, owner, requires_verification, pre_existing, suggested_fix.
31-
Do NOT include why_it_matters or evidence in the returned JSON.
30+
title, severity, file, line, confidence, autofix_class, owner, requires_verification, pre_existing, suggested_fix, first_evidence.
31+
Do NOT include why_it_matters or the full evidence array in the returned JSON.
32+
`first_evidence` is the ONE exception to "no evidence in the compact return": it is the verbatim motivating line with `file:line` (the same string you put first in the `evidence` array). It is **REQUIRED for every finding at anchor 75 or 100** — the orchestrator enforces the quote-the-line gate from this field, and a 75/100 finding without it is demoted to anchor 50 at merge. Omit it only for anchor-50 findings. Keep it to that single line; the rest of `evidence` stays in the artifact file.
3233
Include reviewer, residual_risks, and testing_gaps at the top level.
3334
3435
The full file preserves detail for downstream consumers (agent-mode output, debugging).
3536
The compact return keeps the orchestrator's context lean for merge and synthesis.
3637
37-
The schema below describes the **full artifact file format** (all fields required). For the compact return, follow the field list above -- omit why_it_matters and evidence even though the schema marks them as required.
38+
The schema below describes the **full artifact file format** (all fields required). For the compact return, follow the field list above -- omit why_it_matters and the full evidence array (but include `first_evidence`) even though the schema marks evidence as required.
3839
3940
{schema}
4041
@@ -43,7 +44,7 @@ The schema below describes the **full artifact file format** (all fields require
4344
- `severity`: one of `"P0"`, `"P1"`, `"P2"`, `"P3"` — use these exact strings. Do NOT use `"high"`, `"medium"`, `"low"`, `"critical"`, or any other vocabulary, even if your persona's prose discusses priorities in those terms conceptually.
4445
- `autofix_class`: one of `"gated_auto"`, `"manual"`, `"advisory"`.
4546
- `owner`: one of `"downstream-resolver"`, `"human"`, `"release"`.
46-
- `evidence`: an ARRAY of strings with at least one element. A single string value is a validation failure — wrap every quote in `["..."]` even when there is only one.
47+
- `evidence`: an ARRAY of strings with at least one element. A single string value is a validation failure — wrap every quote in `["..."]` even when there is only one. **For any finding at anchor `75` or `100`, the first evidence item MUST be the verbatim motivating line(s) with `file:line`** — the exact code text that makes the finding true (see "Quote-the-line gate" below).
4748
- `pre_existing`: boolean, never null.
4849
- `requires_verification`: boolean, never null.
4950
- `confidence`: one of exactly `0`, `25`, `50`, `75`, or `100` — a discrete anchor, NOT a continuous number. Any other value (e.g., `72`, `0.85`, `"high"`) is a validation failure. Pick the anchor whose behavioral criterion you can honestly self-apply to this finding (see "Confidence rubric" below).
@@ -62,6 +63,15 @@ If your persona description uses severity vocabulary like "high-priority" or "cr
6263
6364
Anchor and severity are independent axes. A P2 finding can be anchor `100` if the evidence is airtight; a P0 finding can be anchor `50` if it is an important concern you could not fully verify. Anchor gates where the finding surfaces (drop / soft bucket / actionable); severity orders it within the actionable surface.
6465
66+
**Quote-the-line gate (kills the "field/symbol doesn't exist" false-positive class).** Before you anchor a finding at `75` or `100`, quote the verbatim line(s) that make it true, with `file:line`, as the first `evidence` item:
67+
68+
- "field X doesn't exist on model Y" → quote the class/`Meta`/migration where X would be defined.
69+
- "`dict.get()` may return None" → quote the dict's initialization.
70+
- "race between A and B" → quote both A and B.
71+
- "swapped argument / wrong return" → quote the call site and the signature.
72+
73+
**If you cannot quote the motivating line, you cannot claim `75`+ — step down to `50` (suppressed from primary findings).** When the symbol is generated by a framework metaclass, ORM `Meta`, decorator, or migration history (Rails `has_many`/`scope`, Django `Meta`, SQLAlchemy `Column`/`relationship`, Prisma client, TypeORM/Sequelize decorators), quote the meta-construct that creates it — reading the source that generates the symbol satisfies the gate; a failed `grep` for the literal name does not.
74+
6575
Synthesis suppresses anchors `0` and `25` silently. Anchor `50` is dropped from primary findings unless the severity is P0 (P0+50 survives) or synthesis routes it to a soft bucket (testing_gaps, residual_risks, advisory) per mode-aware demotion. Anchors `75` and `100` enter the actionable tier.
6676
6777
Example of a schema-valid finding (all required fields, correct enum values, correct array shape):

skills/ce-work/SKILL.md

Lines changed: 6 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -333,17 +333,13 @@ Determine how to proceed based on what was provided in `<input_document>`.
333333

334334
When all Phase 2 tasks are complete and execution transitions to quality check, you must read `references/shipping-workflow.md` for the full shipping workflow. Do not skip this.
335335

336-
**Code review tiers:** Tier 1 when the harness has built-in review. Tier 2 only when escalation criteria in `shipping-workflow.md` match — not because Tier 1 is missing.
336+
**Code review: one portable path.** Review with `ce-code-review`, which self-sizes (lite roster for small low-risk code-only diffs, full roster otherwise). No harness-native review detection and no escalation tiers — the size/sensitive-surface judgment lives inside `ce-code-review`. Skip dedicated review only for a purely mechanical diff (formatting, dep-bumps, lint-only, generated). Full rules (autonomous Residual Gate, infra fallback) in `shipping-workflow.md`.
337337

338-
**Tier 2 is two steps — review, then fix.** `ce-code-review` is review-only. It returns findings (markdown or `mode:agent` JSON); it never edits the checkout, commits, or applies fixes.
338+
**Review is two steps — review, then fix.** `ce-code-review` is review-only. It returns findings (markdown or `mode:agent` JSON); it never edits the checkout, commits, or applies fixes.
339339

340-
When Tier 2 applies:
341-
342-
1. **Review** — Invoke the `ce-code-review` skill (invocation command in `references/review-findings-followup.md` § Fallback). Use `mode:agent` in orchestrated workflows; pass `plan:<path>` when you have a plan and `base:<ref>` when the merge base is already known.
340+
1. **Review** — Invoke the `ce-code-review` skill (invocation command in `references/review-findings-followup.md` § Fallback). Use `mode:agent` in orchestrated workflows; pass `plan:<path>` when you have a plan, `base:<ref>` when the merge base is known, and `depth:full` when a deep/thorough review was explicitly requested.
343341
2. **Apply fixes** — Load `references/review-findings-followup.md`. Filter eligibility on JSON only, **batch applicable findings by file**, dispatch fix subagents (parallel when file sets are disjoint). The orchestrator merges diffs, runs tests, and commits — it does not pre-investigate findings.
344-
3. **Residual Work Gate** — Only after followup; unresolved actionable findings go through the gate in `shipping-workflow.md`.
345-
346-
Tier 1 harness-native review may still fix inline; Tier 2 always separates review from apply.
342+
3. **Residual Work Gate** — Only after followup; unresolved actionable findings go through the gate in `shipping-workflow.md` (autonomous sessions auto-accept + record residuals; interactive sessions ask).
347343

348344
## Key Principles
349345

@@ -367,7 +363,7 @@ Tier 1 harness-native review may still fix inline; Tier 2 always separates revie
367363

368364
### Quality is Built In
369365

370-
- Review when Tier 1 is available or Tier 2 criteria match (see `shipping-workflow.md`)
366+
- Review every non-mechanical diff with `ce-code-review` (it self-sizes; see `shipping-workflow.md`)
371367

372368
### Ship Complete Features
373369

@@ -383,5 +379,5 @@ Tier 1 harness-native review may still fix inline; Tier 2 always separates revie
383379
- **Testing at the end** - Test continuously or suffer later
384380
- **Forgetting to track progress** - Update task status as you go or lose track of what's done
385381
- **80% done syndrome** - Finish the feature, don't move on early
386-
- **Skipping review without reason**Use Tier 1 when available; escalate to Tier 2 only on criteria in `shipping-workflow.md`; document when both are skipped
382+
- **Skipping review without reason**review every non-mechanical diff with `ce-code-review`; skip only for a purely mechanical diff or when it is genuinely unavailable, and document the skip reason
387383
- **Re-scoping the plan into human-time phases** - The plan's Implementation Units define the scope of execution. Do not estimate human-hours per unit, propose multi-day breakdowns, or ask the user to pick a subset of units for "this session". Agents execute at agent speed, and context-window pressure is addressed by subagent dispatch (Phase 1 Step 4), not by phased sessions. If a plan-file input is genuinely too large for a single execution, say so plainly and suggest the user return to `/ce-plan` to reduce scope — don't invent session phases as a workaround. For bare-prompt input, Phase 0's Large routing already handles oversized work

skills/ce-work/references/review-findings-followup.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -1,12 +1,12 @@
11
# Apply Code Review Findings (after `ce-code-review`)
22

3-
Load this reference when Tier 2 `ce-code-review` has finished and **ce-work** (or another caller) should apply fixes before the Residual Work Gate.
3+
Load this reference when `ce-code-review` has finished and **ce-work** (or another caller) should apply fixes before the Residual Work Gate.
44

55
`ce-code-review` is invoked here with `mode:agent`, so it is **review-only** in this context — it reports findings and writes artifacts and does not mutate the checkout, commit, push, or file tickets. **The caller owns apply/fix policy.** (In its own default/interactive mode the review applies safe fixes itself; that path does not apply here.)
66

77
## Consume the completed review (do not re-run it)
88

9-
This reference loads **after** review has run. In the ce-work Tier 2 path, step 2a already invoked `ce-code-review`; this apply step **consumes that output** — do not start a second review, which would waste reviewer dispatches and risk overwriting the artifact the Residual Work Gate reconciles.
9+
This reference loads **after** review has run. In the ce-work shipping flow, step 3a already invoked `ce-code-review`; this apply step **consumes that output** — do not start a second review, which would waste reviewer dispatches and risk overwriting the artifact the Residual Work Gate reconciles.
1010

1111
Reuse the review output already in hand:
1212

@@ -17,7 +17,7 @@ If `status` is `failed`, stop shipping and surface `reason`. If `degraded`, note
1717

1818
### Fallback — invoke review only for cold callers
1919

20-
Only when the caller reached this file **without** already running review (no review output in hand): invoke `ce-code-review` once, then proceed to apply. Do not invoke when the caller already ran review (e.g., ce-work Tier 2 step 2a).
20+
Only when the caller reached this file **without** already running review (no review output in hand): invoke `ce-code-review` once, then proceed to apply. Do not invoke when the caller already ran review (e.g., ce-work shipping step 3a).
2121

2222
Invoke the skill explicitly — do not treat a casual "review my changes" prompt as a substitute unless the harness routed it to `ce-code-review`.
2323

0 commit comments

Comments
 (0)