Commit 1cae8fc

feat: v8.3 — compliance audit, MCP tools, bug-pattern FP fixes (#183)
## v8.3.0 Release

### Compliance Audit
- 20 regulatory frameworks, 131 checks, cross-framework mapping
- `--recommend` flag for auto-detecting applicable frameworks
- `auditCompliance` MCP tool
- Score: 48→97/100 after dogfood (FP reduction from 11,356→1 finding)

### MCP Tools
- `listSymbols` — bulk symbol listing with complexity metrics
- `getSymbolGraph` — batch call graph with complexity per node
- `searchSymbols` — now returns lines/cyclomatic/cognitive + server-side filtering (minLines, minComplexity, excludePatterns)
- `batchGet` — `includeCounts` for reference/caller/callee counts
- FTS empty-query bug fix + MCP warmup cache poisoning fix

### Bug-Pattern FP Fixes (42→0)
- FindNodesSkipping for closure-aware AST analysis
- singleReturnNew/noErrorMethods allowlists
- Scope-aware shadowed-err detection

### Other
- 46 new tests, comprehensive generated file detection
- Daemon API endpoints implemented (7 stubs→real)
- Query engine stubs implemented (ownership, hotspot, responsibility refresh)
- Wiki documentation updated across 14 pages
- Flutter l10n coupling FP fix (#185)
1 parent ef273f4 commit 1cae8fc

File tree

166 files changed

+42246
-18620
lines changed


.claude/commands/audit.md

Lines changed: 117 additions & 0 deletions
Run a CKB-augmented compliance audit optimized for minimal token usage.

## Input

$ARGUMENTS - Optional: framework(s) to audit (default: auto-detect from repo context). Examples: "gdpr", "gdpr,pci-dss,hipaa", "all"

## Philosophy

CKB already ran deterministic checks across 20 regulatory frameworks, mapped every finding to a specific regulation article, and assigned confidence scores. The LLM's job is ONLY what CKB can't do: assess whether findings are real compliance risks or false positives given the repo's actual purpose, and prioritize remediation by business impact.

### Available frameworks (20 total)

**Privacy:** gdpr, ccpa, iso27701
**AI:** eu-ai-act
**Security:** iso27001, nist-800-53, owasp-asvs, soc2, hipaa
**Industry:** pci-dss, dora, nis2, fda-21cfr11, eu-cra
**Supply chain:** sbom-slsa
**Safety:** iec61508, iso26262, do-178c
**Coding:** misra, iec62443

### CKB's blind spots (what the LLM must catch)

CKB maps code patterns to regulation articles using AST + regex + tree-sitter. It is structurally correct but contextually blind:

- **Business context**: CKB flags PII patterns in a healthcare app and a game engine equally
- **Architecture awareness**: a finding in dead/test code vs production code has different weight
- **Compensating controls**: CKB can't see infrastructure-level encryption, WAFs, or IAM policies
- **Regulatory applicability**: CKB flags HIPAA in a repo that doesn't handle PHI
- **Risk prioritization**: 50 findings need ordering by actual business/legal exposure
- **Cross-reference noise**: the same hardcoded credential maps to 6 frameworks — that's 1 fix, not 6

## Phase 1: Structural scan (~2k tokens into context)

```bash
ckb audit compliance --framework=$ARGUMENTS --format=json --min-confidence=0.7 2>/dev/null
```

For large repos, scope to a specific path to reduce noise:

```bash
ckb audit compliance --framework=$ARGUMENTS --scope=src/api --format=json --min-confidence=0.7 2>/dev/null
```

If no framework is specified, pick based on repo context:

- Has health/patient/medical code → `hipaa,gdpr`
- Has payment/billing/card code → `pci-dss,soc2`
- EU company or processes EU data → `gdpr,dora,nis2`
- AI/ML code → `eu-ai-act`
- Safety-critical/embedded → `iec61508,iso26262,misra`
- General SaaS → `iso27001,soc2,owasp-asvs`
- If unsure → `iso27001,owasp-asvs` (broadest applicability)
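The context-based pick above can be sketched as a small shell helper. This is an illustrative sketch only: the `detect_frameworks` function and its keyword lists are hypothetical, not CKB's actual `--recommend` logic, and real applicability still needs human judgment.

```shell
# Hypothetical helper; keyword lists are illustrative, not CKB's real heuristics.
detect_frameworks() {
  repo="$1"
  if grep -rqiE 'patient|medical|\bphi\b' "$repo" 2>/dev/null; then
    echo "hipaa,gdpr"            # health-related code detected
  elif grep -rqiE 'payment|billing|card' "$repo" 2>/dev/null; then
    echo "pci-dss,soc2"          # payment-related code detected
  else
    echo "iso27001,owasp-asvs"   # broadest-applicability fallback
  fi
}

mkdir -p /tmp/demo-repo
printf 'func LoadPatientRecord() {}\n' > /tmp/demo-repo/records.go
detect_frameworks /tmp/demo-repo   # prints: hipaa,gdpr
```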
From the JSON output, extract:

- `score`, `verdict` (pass/warn/fail)
- `coverage[]` — per-framework scores with passed/warned/failed/skipped check counts
- `findings[]` — with check, severity, file, startLine, message, suggestion, confidence, CWE
- `checks[]` — per-check status and summary
- `summary` — total findings by severity, files scanned

Note:

- **Per-framework scores**: which frameworks are clean vs problematic
- **Finding count by severity**: errors are your priority
- **CWE references**: cross-reference with known vulnerability databases
- **Confidence scores**: low-confidence (< 0.7) findings are likely false positives

**Early exit**: If verdict=pass and all framework scores are ≥ 90, write a one-line summary and stop.
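The extraction and the early-exit gate can be scripted with `jq`. The JSON below is a hypothetical sample that matches the field names listed above; the real `ckb audit compliance` output may differ in shape.

```shell
# Hypothetical sample of the audit JSON (field names assumed from the doc above).
cat > /tmp/audit.json <<'EOF'
{"score": 94, "verdict": "pass",
 "coverage": [{"framework": "gdpr", "score": 96}, {"framework": "soc2", "score": 91}],
 "findings": []}
EOF

verdict=$(jq -r '.verdict' /tmp/audit.json)
score=$(jq -r '.score' /tmp/audit.json)
min_fw=$(jq '[.coverage[].score] | min' /tmp/audit.json)

# Early exit: verdict=pass and every framework score >= 90
if [ "$verdict" = "pass" ] && [ "$min_fw" -ge 90 ]; then
  echo "COMPLIANT: score $score/100, all framework scores >= 90"
fi
```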
## Phase 2: Triage findings (targeted reads only)

Do NOT read every flagged file. Group findings by root cause first:

1. **Deduplicate cross-framework findings** — a hardcoded secret flagged by GDPR, PCI DSS, HIPAA, and ISO 27001 is one fix
2. **Check for dominant category** — if > 50% of findings are one category (e.g., "sql-injection"), investigate that category systemically (is the pattern matching too broad?) rather than checking each file individually
3. **Check applicability** — does this repo actually fall under the flagged framework? (e.g., HIPAA findings in a non-healthcare repo)
4. **Read only error-severity files** — warnings and info can wait
5. **For each error finding**, read just the flagged lines (not the whole file) and assess:
   - Is this a real compliance risk or a pattern false positive?
   - Are there compensating controls elsewhere? (check imports, config, middleware)
   - What's the remediation effort: one-liner fix vs architectural change?
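Step 1, the cross-framework deduplication, can be done mechanically before reading any files. A sketch with `jq`, against hypothetical findings in the shape described in Phase 1 (one file:line location counts as one root cause, however many frameworks flagged it):

```shell
# Hypothetical findings sample; field names assumed from the Phase 1 description.
cat > /tmp/findings.json <<'EOF'
{"findings": [
  {"framework": "gdpr",    "check": "hardcoded-secret", "file": "config.go", "startLine": 12},
  {"framework": "pci-dss", "check": "hardcoded-secret", "file": "config.go", "startLine": 12},
  {"framework": "hipaa",   "check": "hardcoded-secret", "file": "config.go", "startLine": 12},
  {"framework": "gdpr",    "check": "missing-consent",  "file": "api.go",    "startLine": 40}
]}
EOF

# Group by file:line so duplicate cross-framework hits collapse to one item
jq -r '.findings
  | group_by(.file + ":" + (.startLine | tostring))
  | .[]
  | "\(.[0].file):\(.[0].startLine)  \(length) framework hit(s): \([.[].framework] | join(","))"' \
  /tmp/findings.json
```

Here 4 findings collapse to 2 root causes; the hardcoded secret shows 3 framework hits but is one fix.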
## Phase 3: Write the audit summary (be terse)

```markdown
## [COMPLIANT|NEEDS REMEDIATION|NON-COMPLIANT] — CKB score: [N]/100

[One sentence: what frameworks were audited and overall posture]

### Critical findings (must remediate)
1. **[framework]** `file:line` Art. [X] — [issue + remediation in one sentence]
2. ...

### Not applicable (false positives from context)
[List findings CKB flagged but that don't apply to this repo, with a one-line reason]

### Cross-framework deduplication
[N findings deduplicated to M root causes]

### Framework scores
| Framework | Score | Status | Checks |
|-----------|-------|--------|--------|
| [name] | [N] | [pass/warn/fail] | [passed]/[total] |
```

If fully compliant: just the header + framework scores. Nothing else.

## Anti-patterns (token waste)

- Reading every flagged file → waste (group by root cause, read only errors)
- Treating cross-framework duplicates as separate issues → waste (1 code fix = 1 issue)
- Explaining what each regulation requires → waste (CKB already mapped articles)
- Re-checking frameworks CKB scored at 100 → waste
- Auditing frameworks that don't apply to this repo → waste
- Reading low-confidence findings (< 0.7) → waste (likely false positives)
- Suggesting infrastructure controls for code-level findings → out of scope
- Using wrong framework IDs (use pci-dss not pcidss, owasp-asvs not owaspasvs) → CKB error

.claude/commands/review.md

Lines changed: 113 additions & 73 deletions
````diff
@@ -1,98 +1,138 @@
-Run a comprehensive code review using CKB's deterministic analysis + your semantic review.
+Run a CKB-augmented code review optimized for minimal token usage.
 
 ## Input
 $ARGUMENTS - Optional: base branch (default: main), or "staged" for staged changes, or a PR number
 
-## MCP vs CLI
+## Philosophy
 
-CKB runs as an MCP server in this environment. MCP mode is strongly preferred for interactive review because the SCIP index stays loaded between calls — drill-down tools like `findReferences`, `analyzeImpact`, and `explainSymbol` execute instantly against the in-memory index. CLI mode reloads the index on every invocation.
+CKB already answered the structural questions (secrets? breaking? dead code? test gaps?).
+The LLM's job is ONLY what CKB can't do: semantic reasoning about correctness, design,
+and intent. Every source line you read costs tokens — read only what CKB says is risky.
 
-## The Three Phases
+### CKB's blind spots (what the LLM must catch)
 
-### Phase 1: CKB structural scan (5 seconds, 0 tokens)
+CKB runs 15 deterministic checks with AST rules, SCIP index, and git history.
+It is structurally sound but semantically blind:
 
-Call the `reviewPR` MCP tool with compact mode:
-```
-reviewPR(baseBranch: "main", compact: true)
-```
+- **Logic errors**: wrong conditions (`>` vs `>=`), off-by-one, incorrect algorithm
+- **Business logic**: domain-specific mistakes CKB has no context for
+- **Design fitness**: wrong abstraction, leaky interface, coupling that metrics miss
+- **Input validation**: missing bounds checks, nil guards outside AST patterns
+- **Race conditions**: concurrency issues, mutex ordering, shared state
+- **Resource leaks**: file handles, goroutines, connections not closed on all paths
+- **Incomplete refactoring**: callers missed across module boundaries
+- **Domain edge cases**: error paths, boundary conditions tests don't cover
 
-This returns ~1k tokens instead of ~30k — just the verdict, non-pass checks, top 10 findings, and action items. Use `compact: false` only if you need the full raw data.
+CKB's scoring uses per-check caps (max -20) and per-rule caps (max -10), so a score
+of 85 can still hide multiple capped warnings. HoldTheLine only flags changed lines,
+so pre-existing issues interacting with new code won't surface.
+
+## Phase 1: Structural scan (~1k tokens into context)
 
-If a PR number was given, get the base branch first:
 ```bash
-BASE=$(gh pr view $ARGUMENTS --json baseRefName -q .baseRefName)
+ckb review --base=main --format=json 2>/dev/null
 ```
-Then pass it: `reviewPR(baseBranch: BASE, compact: true)`
-
-> **If CKB is not running as an MCP server** (last resort), use the CLI instead:
-> ```bash
-> ./ckb review --base=main --format=json
-> ```
-> Note: CLI mode reloads the SCIP index on every call, so drill-down steps will be slower.
-
-From CKB's output, immediately note:
-- **Passed checks** → skip these categories. Don't waste tokens re-checking secrets, breaking changes, test coverage, etc.
-- **Warned checks** → your review targets
-- **Top hotspot files** → read these first
-- **Test gaps** → functions to evaluate
-
-### Phase 2: Drill down on CKB findings (0 tokens via MCP)
-
-Before reading source code, use CKB's MCP tools to investigate specific findings. These calls are instant because the SCIP index is already loaded from Phase 1.
 
-| CKB finding | Drill-down tool | What to check |
-|---|---|---|
-| Dead code | `findReferences(symbolId: "...")` or `searchSymbols` → `findReferences` | Does it actually have references? CKB's SCIP index can miss cross-package refs |
-| Blast radius | `analyzeImpact(symbolId: "...")` | Are the "callers" real logic or just framework registrations? |
-| Coupling gap | `explainSymbol(name: "...")` on the missing file | What does the co-change partner do? Does it actually need updates? |
-| Bug patterns | Already verified by differential analysis | Just check the specific line CKB flagged |
-| Complexity | `explainFile(path: "...")` | What functions are driving the increase? |
-| Test gaps | `getAffectedTests(baseBranch: "main")` | Which tests exist? Which functions are actually untested? |
-| Hotspots | `getHotspots(limit: 10)` | Full churn history for the flagged files |
+If a PR number was given:
+```bash
+BASE=$(gh pr view $ARGUMENTS --json baseRefName -q .baseRefName)
+ckb review --base=$BASE --format=json 2>/dev/null
+```
 
-### Phase 3: Semantic review of high-risk files
+If "staged" was given:
+```bash
+ckb review --staged --format=json 2>/dev/null
+```
 
-Now read the actual source — but only for:
-1. Files CKB ranked as top hotspots
-2. Files with warned findings that survived drill-down
-3. New files (CKB can't assess design quality of new code)
+Parse the JSON output to extract:
+- `score`, `verdict` — overall quality
+- `checks[]` — status + summary per check (15 checks: breaking, secrets, tests, complexity,
+  coupling, hotspots, risk, health, dead-code, test-gaps, blast-radius, comment-drift,
+  format-consistency, bug-patterns, split)
+- `findings[]` — severity + file + message + ruleId (top-level, separate from check details)
+- `narrative` — CKB AI-generated summary (if available)
+- `prTier` — small/medium/large
+- `reviewEffort` — estimated hours + complexity
+- `reviewers[]` — suggested reviewers with expertise areas
+- `healthReport` — degraded/improved file counts
+
+From checks, build three lists:
+- **SKIP**: passed checks — don't touch these files or topics
+- **INVESTIGATE**: warned/failed checks — these are your review scope
+- **READ**: files with warn/fail findings — the only files you'll read
+
+**Early exit**: Skip LLM ONLY when ALL conditions are met:
+1. Score ≥ 90 (not 80 — per-check caps hide warnings at 80)
+2. Zero warn/fail checks
+3. Small change (< 100 lines of diff)
+4. No new files (CKB has no SCIP history for them)
+
+If ANY condition fails, proceed to Phase 2 — CKB's structural pass does NOT mean
+the code is semantically correct.
+
+## Phase 2: Targeted source reading (the only token-expensive step)
+
+Do NOT read the full diff. Do NOT read every changed file.
+
+**For files CKB flagged (INVESTIGATE list):**
+Read only the changed hunks via `git diff main...HEAD -- <file>`.
+
+**For new files** (CKB has no history — these are your biggest blind spot):
+- If it's a new package/module: read the entry point and types/interfaces first,
+  then follow references to understand the architecture before reading individual files
+- If < 500 lines: read the file
+- If > 500 lines: read the first 100 lines (types/imports) + functions CKB flagged
+- Skip generated files, test files for existing tests, and config/CI/docs files
+
+**For each file you read, look for exactly:**
+- Logic errors (wrong condition, off-by-one, nil deref, race condition)
+- Resource leaks (file handles, connections, goroutines not closed on error paths)
+- Security issues (injection, auth bypass, secrets CKB's patterns missed)
+- Design problems (wrong abstraction, leaky interface, coupling metrics don't catch)
+- Missing edge cases the tests don't cover
+- Incomplete refactoring (callers that should have changed but didn't)
+
+Do NOT look for: style, naming, formatting, documentation, test coverage —
+CKB already checked these structurally.
+
+## Phase 3: Write the review (be terse)
 
-For each file, look for things CKB CANNOT detect:
-- Logic bugs (wrong conditions, off-by-one, race conditions)
-- Security issues (injection, auth bypass, data exposure)
-- Design problems (wrong abstraction, unclear naming, leaky interfaces)
-- Edge cases (nil inputs, empty collections, concurrent access)
-- Error handling quality (not just missing — wrong strategy)
+```markdown
+## [APPROVE|REQUEST CHANGES|DISCUSS] — CKB score: [N]/100
 
-### Phase 4: Write the review
+[One sentence: what the PR does]
 
-Format:
+[If CKB provided narrative, include it here]
 
-```markdown
-## Summary
-One paragraph: what the PR does, overall assessment.
+**PR tier:** [small/medium/large] | **Review effort:** [N]h ([complexity])
+**Health:** [N] degraded, [N] improved
 
-## Must Fix
-Findings that should block merge. File:line references.
+### Issues
+1. **[must-fix|should-fix]** `file:line` — [issue in one sentence]
+2. ...
 
-## Should Fix
-Issues worth addressing but not blocking.
+### CKB passed (no review needed)
+[comma-separated list of passed checks]
 
-## CKB Analysis
-- Verdict: [pass/warn/fail], Score: [0-100]
-- [N] checks passed, [N] warned
-- Key findings: [top 3]
-- False positives identified: [any CKB findings you disproved]
-- Test gaps: [N] untested functions — [your assessment of which matter]
+### CKB flagged (verified above)
+[for each warn/fail finding: confirmed/false-positive + one-line reason]
 
-## Recommendation
-Approve / Request changes / Needs discussion
+### Suggested reviewers
+[reviewer — expertise area]
 ```
 
-## Tips
-
-- If CKB says "secrets: pass" — trust it, don't re-scan 100+ files
-- If CKB says "breaking: pass" — trust it, SCIP-verified API comparison
-- If CKB says "dead-code: FormatSARIF" — DON'T trust blindly, verify with `findReferences` or grep
-- CKB's hotspot scores are based on git churn history — higher score = more volatile file = review more carefully
-- CKB's complexity delta shows WHERE cognitive load increased — read those functions
+If no issues found: just the header line + CKB passed list. Nothing else.
+
+## Anti-patterns (token waste)
+
+- Reading files CKB marked as pass → waste
+- Reading generated files → waste
+- Summarizing what the PR does in detail → waste (git log exists, CKB has narrative)
+- Explaining why passed checks passed → waste
+- Running MCP drill-down tools when CLI already gave enough signal → waste
+- Reading test files to "verify test quality" → waste unless CKB flagged test-gaps
+- Reading hotspot-only files with no findings → high churn ≠ needs review right now
+- Trusting score >= 80 as "safe to skip" → dangerous (per-check caps hide warnings)
+- Skipping new files because CKB didn't flag them → CKB has no SCIP data for new files
+- Reading every new file in a large new package → read entry point + types first, then follow refs
+- Ignoring reviewEffort/prTier → these tell you how thorough to be
````
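The Phase 1 bookkeeping in the new review.md (building the SKIP/INVESTIGATE/READ lists and checking the early-exit gate) can be scripted. This is a hedged sketch against a hypothetical sample of the review JSON, using the field names the document lists; the real `ckb review` output shape may differ.

```shell
# Hypothetical sample of `ckb review --format=json` output (field names assumed).
cat > /tmp/review.json <<'EOF'
{"score": 84, "verdict": "warn",
 "checks": [
   {"name": "secrets",      "status": "pass"},
   {"name": "breaking",     "status": "pass"},
   {"name": "bug-patterns", "status": "warn"}
 ],
 "findings": [
   {"severity": "warn", "file": "pkg/db/conn.go", "message": "rows handle may leak"}
 ]}
EOF

# The three triage lists
jq -r '"SKIP: " + ([.checks[] | select(.status == "pass") | .name] | join(", "))' /tmp/review.json
jq -r '"INVESTIGATE: " + ([.checks[] | select(.status != "pass") | .name] | join(", "))' /tmp/review.json
jq -r '"READ: " + ([.findings[] | select(.severity == "warn" or .severity == "fail") | .file] | unique | join(", "))' /tmp/review.json

# Early-exit gate: ALL four conditions must hold
score=$(jq '.score' /tmp/review.json)
non_pass=$(jq '[.checks[] | select(.status != "pass")] | length' /tmp/review.json)
diff_lines=42   # would come from `git diff --shortstat`; hardcoded for this sketch
new_files=0     # likewise hypothetical
if [ "$score" -ge 90 ] && [ "$non_pass" -eq 0 ] && [ "$diff_lines" -lt 100 ] && [ "$new_files" -eq 0 ]; then
  echo "early exit: skip LLM review"
else
  echo "proceed to Phase 2"
fi
```

With this sample (score 84, one warned check), the gate correctly prints "proceed to Phase 2" even though the diff is small.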
