Run a CKB-augmented code review optimized for minimal token usage.

## Input
$ARGUMENTS - Optional: base branch (default: main), or "staged" for staged changes, or a PR number

## Philosophy

CKB already answered the structural questions (secrets? breaking? dead code? test gaps?).
The LLM's job is ONLY what CKB can't do: semantic reasoning about correctness, design,
and intent. Every source line you read costs tokens — read only what CKB says is risky.

### CKB's blind spots (what the LLM must catch)

CKB runs 15 deterministic checks with AST rules, SCIP index, and git history.
It is structurally sound but semantically blind:

- **Logic errors**: wrong conditions (`>` vs `>=`), off-by-one, incorrect algorithm
- **Business logic**: domain-specific mistakes CKB has no context for
- **Design fitness**: wrong abstraction, leaky interface, coupling that metrics miss
- **Input validation**: missing bounds checks, nil guards outside AST patterns
- **Race conditions**: concurrency issues, mutex ordering, shared state
- **Resource leaks**: file handles, goroutines, connections not closed on all paths
- **Incomplete refactoring**: callers missed across module boundaries
- **Domain edge cases**: error paths, boundary conditions tests don't cover

CKB's scoring uses per-check caps (max -20) and per-rule caps (max -10), so a score
of 85 can still hide multiple capped warnings. HoldTheLine only flags changed lines,
so pre-existing issues interacting with new code won't surface.

## Phase 1: Structural scan (~1k tokens into context)

```bash
ckb review --base=main --format=json 2>/dev/null
```

If a PR number was given:
```bash
BASE=$(gh pr view $ARGUMENTS --json baseRefName -q .baseRefName)
ckb review --base=$BASE --format=json 2>/dev/null
```

If "staged" was given:
```bash
ckb review --staged --format=json 2>/dev/null
```

Parse the JSON output to extract:
- `score`, `verdict` — overall quality
- `checks[]` — status + summary per check (15 checks: breaking, secrets, tests, complexity,
  coupling, hotspots, risk, health, dead-code, test-gaps, blast-radius, comment-drift,
  format-consistency, bug-patterns, split)
- `findings[]` — severity + file + message + ruleId (top-level, separate from check details)
- `narrative` — CKB's AI-generated summary (if available)
- `prTier` — small/medium/large
- `reviewEffort` — estimated hours + complexity
- `reviewers[]` — suggested reviewers with expertise areas
- `healthReport` — degraded/improved file counts

From the checks, build three lists:
- **SKIP**: passed checks — don't touch these files or topics
- **INVESTIGATE**: warned/failed checks — these are your review scope
- **READ**: files with warn/fail findings — the only files you'll read

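The extraction can be sketched with `jq`. The field names (`checks[].status`, `findings[].severity`, `findings[].file`) are assumptions taken from the list above; verify them against CKB's real output. The inline fixture stands in for `ckb review` JSON:

```bash
# Fixture standing in for real `ckb review --format=json` output.
# Field names here are assumptions; adjust to CKB's actual schema.
REVIEW='{"score":85,"verdict":"warn","checks":[{"name":"secrets","status":"pass","summary":"clean"},{"name":"test-gaps","status":"warn","summary":"3 untested functions"}],"findings":[{"severity":"warn","file":"pkg/auth/session.go","message":"untested exported function","ruleId":"test-gaps"}]}'

# SKIP: passed checks, as one comma-separated line
echo "$REVIEW" | jq -r '[.checks[] | select(.status == "pass") | .name] | join(", ")'
# INVESTIGATE: warned/failed checks with their summaries
echo "$REVIEW" | jq -r '.checks[] | select(.status != "pass") | "\(.name): \(.summary)"'
# READ: unique files carrying warn/fail findings
echo "$REVIEW" | jq -r '[.findings[] | select(.severity != "info") | .file] | unique | .[]'
```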
**Early exit**: skip the LLM review ONLY when ALL of these conditions are met:
1. Score ≥ 90 (not 80 — per-check caps hide warnings at 80)
2. Zero warn/fail checks
3. Small change (< 100 lines of diff)
4. No new files (CKB has no SCIP history for them)

If ANY condition fails, proceed to Phase 2 — CKB's structural pass does NOT mean the code is semantically correct.

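A sketch of the gate under the same assumed field names; the `git` invocations are standard, and the inline fixture again replaces real CKB output:

```bash
# Fixture for illustration; in practice REVIEW holds the Phase 1 JSON.
REVIEW='{"score":92,"checks":[{"name":"secrets","status":"pass"}]}'

SCORE=$(echo "$REVIEW" | jq '.score')
NONPASS=$(echo "$REVIEW" | jq '[.checks[] | select(.status != "pass")] | length')
# Total added + removed lines across the diff (0 outside a repo)
DIFF_LINES=$(git diff --numstat main...HEAD 2>/dev/null | awk '{n += $1 + $2} END {print n + 0}')
# Count of added files; --diff-filter=A restricts to additions
NEW_FILES=$(git diff --name-only --diff-filter=A main...HEAD 2>/dev/null | wc -l)

if [ "$SCORE" -ge 90 ] && [ "$NONPASS" -eq 0 ] && [ "$DIFF_LINES" -lt 100 ] && [ "$NEW_FILES" -eq 0 ]; then
  echo "early-exit: structural pass, skip semantic review"
fi
```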
## Phase 2: Targeted source reading (the only token-expensive step)

Do NOT read the full diff. Do NOT read every changed file.

**For files CKB flagged (INVESTIGATE list):**
Read only the changed hunks via `git diff main...HEAD -- <file>`.

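A sketch of the hunk-only read loop, where `READ_FILES` is a hypothetical variable holding the Phase 1 READ list:

```bash
# READ_FILES is a placeholder; populate it from the Phase 1 findings extraction.
READ_FILES=${READ_FILES:-}
for f in $READ_FILES; do
  # -U3 keeps three context lines per hunk; -U0 is the absolute token minimum,
  # at the cost of less surrounding code to reason about
  git diff -U3 main...HEAD -- "$f"
done
```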
**For new files** (CKB has no history — these are your biggest blind spot):
- If it's a new package/module: read the entry point and types/interfaces first,
  then follow references to understand the architecture before reading individual files
- If < 500 lines: read the whole file
- If > 500 lines: read the first 100 lines (types/imports) + the functions CKB flagged
- Skip generated files, test files for existing tests, and config/CI/docs files

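A sketch for enumerating new files and picking a read strategy by size (`--diff-filter=A` restricts the diff to added files):

```bash
# New files have no SCIP history, so list them explicitly (empty outside a repo).
NEW_FILES=$(git diff --name-only --diff-filter=A main...HEAD 2>/dev/null)
for f in $NEW_FILES; do
  lines=$(wc -l < "$f")
  if [ "$lines" -lt 500 ]; then
    echo "$f: read whole file ($lines lines)"
  else
    echo "$f: read first 100 lines + flagged functions ($lines lines)"
  fi
done
```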
**For each file you read, look for exactly:**
- Logic errors (wrong condition, off-by-one, nil deref, race condition)
- Resource leaks (file handles, connections, goroutines not closed on error paths)
- Security issues (injection, auth bypass, secrets CKB's patterns missed)
- Design problems (wrong abstraction, leaky interface, coupling that metrics don't catch)
- Missing edge cases the tests don't cover
- Incomplete refactoring (callers that should have changed but didn't)

Do NOT look for: style, naming, formatting, documentation, test coverage — CKB already checked these structurally.

## Phase 3: Write the review (be terse)

60 | | -For each file, look for things CKB CANNOT detect: |
61 | | -- Logic bugs (wrong conditions, off-by-one, race conditions) |
62 | | -- Security issues (injection, auth bypass, data exposure) |
63 | | -- Design problems (wrong abstraction, unclear naming, leaky interfaces) |
64 | | -- Edge cases (nil inputs, empty collections, concurrent access) |
65 | | -- Error handling quality (not just missing — wrong strategy) |
| 100 | +```markdown |
| 101 | +## [APPROVE|REQUEST CHANGES|DISCUSS] — CKB score: [N]/100 |
66 | 102 |
|
67 | | -### Phase 4: Write the review |
| 103 | +[One sentence: what the PR does] |
68 | 104 |
|
69 | | -Format: |
| 105 | +[If CKB provided narrative, include it here] |
70 | 106 |
|
71 | | -```markdown |
72 | | -## Summary |
73 | | -One paragraph: what the PR does, overall assessment. |
| 107 | +**PR tier:** [small/medium/large] | **Review effort:** [N]h ([complexity]) |
| 108 | +**Health:** [N] degraded, [N] improved |
74 | 109 |
|
75 | | -## Must Fix |
76 | | -Findings that should block merge. File:line references. |
| 110 | +### Issues |
| 111 | +1. **[must-fix|should-fix]** `file:line` — [issue in one sentence] |
| 112 | +2. ... |
77 | 113 |
|
78 | | -## Should Fix |
79 | | -Issues worth addressing but not blocking. |
| 114 | +### CKB passed (no review needed) |
| 115 | +[comma-separated list of passed checks] |
80 | 116 |
|
81 | | -## CKB Analysis |
82 | | -- Verdict: [pass/warn/fail], Score: [0-100] |
83 | | -- [N] checks passed, [N] warned |
84 | | -- Key findings: [top 3] |
85 | | -- False positives identified: [any CKB findings you disproved] |
86 | | -- Test gaps: [N] untested functions — [your assessment of which matter] |
| 117 | +### CKB flagged (verified above) |
| 118 | +[for each warn/fail finding: confirmed/false-positive + one-line reason] |
87 | 119 |
|
88 | | -## Recommendation |
89 | | -Approve / Request changes / Needs discussion |
| 120 | +### Suggested reviewers |
| 121 | +[reviewer — expertise area] |
90 | 122 | ``` |
91 | 123 |
If no issues were found: just the header line + the CKB passed list. Nothing else.

## Anti-patterns (token waste)

- Reading files CKB marked as pass → waste
- Reading generated files → waste
- Summarizing what the PR does in detail → waste (git log exists, CKB has the narrative)
- Explaining why passed checks passed → waste
- Running MCP drill-down tools when the CLI already gave enough signal → waste
- Reading test files to "verify test quality" → waste unless CKB flagged test-gaps
- Reading hotspot-only files with no findings → high churn ≠ needs review right now
- Trusting score ≥ 80 as "safe to skip" → dangerous (per-check caps hide warnings)
- Skipping new files because CKB didn't flag them → CKB has no SCIP data for new files
- Reading every new file in a large new package → read the entry point + types first, then follow refs
- Ignoring reviewEffort/prTier → these tell you how thorough to be