fix: tighten review skill early-exit criteria and add blind spots section (#184)

SimplyLiz · claude · web-flow · commit bf28787aff32 · 2026-03-24T14:34:13.000+01:00
Syncs local skill refinements to repo and embedded constant:

- Early exit now requires score>=90 + zero warns + <100 lines + no new
  files (score>=80 was unsafe due to per-check caps hiding warnings)
- Added "CKB's blind spots" section listing what the LLM must catch
  (logic errors, business logic, race conditions, etc.)
- Expanded Phase 2 checklist: race conditions, incomplete refactoring,
  secrets beyond CKB's 26 patterns
- Added anti-patterns: trusting score>=80, skipping new files
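
For illustration, here is a minimal Go sketch of the tightened gate and of the capping effect it guards against. This is not CKB's implementation. `Finding`, `ReviewSummary`, `Score`, `CanSkipLLM`, the per-finding `Penalty` values and the "start at 100 and subtract" model are all hypothetical. Only the cap sizes (-10 per rule, -20 per check) and the four early-exit thresholds come from this change.

```go
package main

import "fmt"

// Hypothetical model of CKB scoring; only the cap sizes come from this commit.
const (
	perRuleCap  = 10 // per-rule caps (max -10)
	perCheckCap = 20 // per-check caps (max -20)
)

// Finding is a hypothetical stand-in for one warning reported by a check.
type Finding struct {
	Check   string // check that produced the finding
	Rule    string // rule within that check
	Penalty int    // assumed raw deduction before capping
}

// ReviewSummary is a hypothetical summary of one structural scan.
type ReviewSummary struct {
	Score      int // 0-100
	WarnOrFail int // number of checks with status warn or fail
	DiffLines  int // changed lines in the diff
	NewFiles   int // files added by the change
}

// capAt limits a deduction to at most limit points.
func capAt(v, limit int) int {
	if v > limit {
		return limit
	}
	return v
}

// Score subtracts deductions from 100, capping each rule's total at
// perRuleCap and each check's total at perCheckCap.
func Score(findings []Finding) int {
	raw := map[string]map[string]int{} // check -> rule -> summed penalty
	for _, f := range findings {
		if raw[f.Check] == nil {
			raw[f.Check] = map[string]int{}
		}
		raw[f.Check][f.Rule] += f.Penalty
	}
	score := 100
	for _, rules := range raw {
		checkTotal := 0
		for _, p := range rules {
			checkTotal += capAt(p, perRuleCap)
		}
		score -= capAt(checkTotal, perCheckCap)
	}
	if score < 0 {
		score = 0
	}
	return score
}

// CanSkipLLM applies the tightened early exit: ALL four conditions must hold.
func CanSkipLLM(s ReviewSummary) bool {
	return s.Score >= 90 && // not 80: caps hide warnings
		s.WarnOrFail == 0 &&
		s.DiffLines < 100 &&
		s.NewFiles == 0 // no SCIP history for new files
}

func main() {
	// Ten findings from one rule: 10 x 5 = 50 raw points, capped to 10.
	var findings []Finding
	for i := 0; i < 10; i++ {
		findings = append(findings, Finding{Check: "example-check", Rule: "example-rule", Penalty: 5})
	}
	s := ReviewSummary{Score: Score(findings), WarnOrFail: 1, DiffLines: 40}
	fmt.Println(s.Score, CanSkipLLM(s)) // prints "90 false"
}
```

Under these assumptions, ten warnings from a single rule cost only 10 points, so the run still scores 90. The score alone would pass both the old `score>=80` gate and the new `>=90` threshold. It is the zero warn/fail condition that blocks the early exit.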

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
diff --git a/.claude/commands/review.md b/.claude/commands/review.md
@@ -9,6 +9,23 @@ CKB already answered the structural questions (secrets? breaking? dead code? tes
 The LLM's job is ONLY what CKB can't do: semantic reasoning about correctness, design,
 and intent. Every source line you read costs tokens — read only what CKB says is risky.
 
+### CKB's blind spots (what the LLM must catch)
+
+CKB runs 15 deterministic checks with AST rules, SCIP index, and git history.
+It is structurally sound but semantically blind:
+
+- **Logic errors**: wrong conditions (`>` vs `>=`), off-by-one, incorrect algorithm
+- **Business logic**: domain-specific mistakes CKB has no context for
+- **Design fitness**: wrong abstraction, leaky interface, coupling that metrics miss
+- **Input validation**: missing bounds checks, nil guards outside AST patterns
+- **Race conditions**: concurrency issues, mutex ordering, shared state
+- **Incomplete refactoring**: callers missed across module boundaries
+- **Domain edge cases**: error paths, boundary conditions tests don't cover
+
+CKB's scoring uses per-check caps (max -20) and per-rule caps (max -10), so a score
+of 85 can still hide multiple capped warnings. HoldTheLine only flags changed lines,
+so pre-existing issues interacting with new code won't surface.
+
 ## Phase 1: Structural scan (~1k tokens into context)
 
 ```bash
@@ -26,7 +43,14 @@ From the output, build three lists:
 - **INVESTIGATE**: warned/failed checks — these are your review scope
 - **READ**: hotspot files + files with warn/fail findings — the only files you'll read
 
-**Early exit**: If verdict=pass and score≥80, write a one-line approval and stop. No source reading needed.
+**Early exit**: Skip LLM ONLY when ALL conditions are met:
+1. Score ≥ 90 (not 80 — per-check caps hide warnings at 80)
+2. Zero warn/fail checks
+3. Small change (< 100 lines of diff)
+4. No new files (CKB has no SCIP history for them)
+
+If ANY condition fails, proceed to Phase 2 — CKB's structural pass does NOT mean
+the code is semantically correct.
 
 ## Phase 2: Targeted source reading (the only token-expensive step)
 
@@ -38,10 +62,11 @@ Read ONLY:
 3. Skip generated files, test files for existing tests, and config/CI files
 
 For each file you read, look for exactly:
-- Logic errors (wrong condition, off-by-one, nil deref)
-- Security issues (injection, auth bypass, secrets)
-- Design problems (wrong abstraction, leaky interface)
+- Logic errors (wrong condition, off-by-one, nil deref, race condition)
+- Security issues (injection, auth bypass, secrets CKB's 26 patterns missed)
+- Design problems (wrong abstraction, leaky interface, coupling metrics don't catch)
 - Missing edge cases the tests don't cover
+- Incomplete refactoring (callers that should have changed but didn't)
 
 Do NOT look for: style, naming, formatting, documentation, test coverage —
 CKB already checked these structurally.
@@ -75,3 +100,5 @@ If no issues found: just the header line + CKB passed list. Nothing else.
 - Running MCP drill-down tools when CLI already gave enough signal → waste
 - Reading test files to "verify test quality" → waste unless CKB flagged test-gaps
 - Reading hotspot-only files with no findings → high churn ≠ needs review right now
+- Trusting score >= 80 as "safe to skip" → dangerous (per-check caps hide warnings)
+- Skipping new files because CKB didn't flag them → CKB has no SCIP data for new files
diff --git a/cmd/ckb/setup.go b/cmd/ckb/setup.go
@@ -832,6 +832,23 @@ CKB already answered the structural questions (secrets? breaking? dead code? tes
 The LLM's job is ONLY what CKB can't do: semantic reasoning about correctness, design,
 and intent. Every source line you read costs tokens — read only what CKB says is risky.
 
+### CKB's blind spots (what the LLM must catch)
+
+CKB runs 15 deterministic checks with AST rules, SCIP index, and git history.
+It is structurally sound but semantically blind:
+
+- **Logic errors**: wrong conditions (` + "`" + `>` + "`" + ` vs ` + "`" + `>=` + "`" + `), off-by-one, incorrect algorithm
+- **Business logic**: domain-specific mistakes CKB has no context for
+- **Design fitness**: wrong abstraction, leaky interface, coupling that metrics miss
+- **Input validation**: missing bounds checks, nil guards outside AST patterns
+- **Race conditions**: concurrency issues, mutex ordering, shared state
+- **Incomplete refactoring**: callers missed across module boundaries
+- **Domain edge cases**: error paths, boundary conditions tests don't cover
+
+CKB's scoring uses per-check caps (max -20) and per-rule caps (max -10), so a score
+of 85 can still hide multiple capped warnings. HoldTheLine only flags changed lines,
+so pre-existing issues interacting with new code won't surface.
+
 ## Phase 1: Structural scan (~1k tokens into context)
 
 ` + "```" + `bash
@@ -849,7 +866,14 @@ From the output, build three lists:
 - **INVESTIGATE**: warned/failed checks — these are your review scope
 - **READ**: hotspot files + files with warn/fail findings — the only files you'll read
 
-**Early exit**: If verdict=pass and score>=80, write a one-line approval and stop. No source reading needed.
+**Early exit**: Skip LLM ONLY when ALL conditions are met:
+1. Score >= 90 (not 80 — per-check caps hide warnings at 80)
+2. Zero warn/fail checks
+3. Small change (< 100 lines of diff)
+4. No new files (CKB has no SCIP history for them)
+
+If ANY condition fails, proceed to Phase 2 — CKB's structural pass does NOT mean
+the code is semantically correct.
 
 ## Phase 2: Targeted source reading (the only token-expensive step)
 
@@ -861,10 +885,11 @@ Read ONLY:
 3. Skip generated files, test files for existing tests, and config/CI files
 
 For each file you read, look for exactly:
-- Logic errors (wrong condition, off-by-one, nil deref)
-- Security issues (injection, auth bypass, secrets)
-- Design problems (wrong abstraction, leaky interface)
+- Logic errors (wrong condition, off-by-one, nil deref, race condition)
+- Security issues (injection, auth bypass, secrets CKB's 26 patterns missed)
+- Design problems (wrong abstraction, leaky interface, coupling metrics don't catch)
 - Missing edge cases the tests don't cover
+- Incomplete refactoring (callers that should have changed but didn't)
 
 Do NOT look for: style, naming, formatting, documentation, test coverage —
 CKB already checked these structurally.
@@ -898,6 +923,8 @@ If no issues found: just the header line + CKB passed list. Nothing else.
 - Running MCP drill-down tools when CLI already gave enough signal — waste
 - Reading test files to "verify test quality" — waste unless CKB flagged test-gaps
 - Reading hotspot-only files with no findings — high churn does not mean needs review right now
+- Trusting score >= 80 as "safe to skip" — dangerous (per-check caps hide warnings)
+- Skipping new files because CKB did not flag them — CKB has no SCIP data for new files
 `
 
 func configureVSCodeGlobal(ckbCommand string, ckbArgs []string) error {