Commit 1cae8fc

feat: v8.3 — compliance audit, MCP tools, bug-pattern FP fixes (#183)
## v8.3.0 Release

### Compliance Audit
- 20 regulatory frameworks, 131 checks, cross-framework mapping
- `--recommend` flag for auto-detecting applicable frameworks
- `auditCompliance` MCP tool
- Score: 48→97/100 after dogfood (FP reduction from 11,356→1 finding)

### MCP Tools
- `listSymbols` — bulk symbol listing with complexity metrics
- `getSymbolGraph` — batch call graph with complexity per node
- `searchSymbols` — now returns lines/cyclomatic/cognitive + server-side filtering (minLines, minComplexity, excludePatterns)
- `batchGet` — `includeCounts` for reference/caller/callee counts
- FTS empty-query bug fix + MCP warmup cache poisoning fix

### Bug-Pattern FP Fixes (42→0)
- FindNodesSkipping for closure-aware AST analysis
- singleReturnNew/noErrorMethods allowlists
- Scope-aware shadowed-err detection

### Other
- 46 new tests, comprehensive generated file detection
- Daemon API endpoints implemented (7 stubs→real)
- Query engine stubs implemented (ownership, hotspot, responsibility refresh)
- Wiki documentation updated across 14 pages
- Flutter l10n coupling FP fix (#185)
1 parent ef273f4 commit 1cae8fc

File tree

166 files changed

+42246
-18620
lines changed


.claude/commands/audit.md

Lines changed: 117 additions & 0 deletions
Run a CKB-augmented compliance audit optimized for minimal token usage.

## Input

$ARGUMENTS - Optional: framework(s) to audit (default: auto-detect from repo context). Examples: "gdpr", "gdpr,pci-dss,hipaa", "all"

## Philosophy

CKB already ran deterministic checks across 20 regulatory frameworks, mapped every finding to a specific regulation article, and assigned confidence scores. The LLM's job is ONLY what CKB can't do: assess whether findings are real compliance risks or false positives given the repo's actual purpose, and prioritize remediation by business impact.

### Available frameworks (20 total)

**Privacy:** gdpr, ccpa, iso27701
**AI:** eu-ai-act
**Security:** iso27001, nist-800-53, owasp-asvs, soc2, hipaa
**Industry:** pci-dss, dora, nis2, fda-21cfr11, eu-cra
**Supply chain:** sbom-slsa
**Safety:** iec61508, iso26262, do-178c
**Coding:** misra, iec62443

### CKB's blind spots (what the LLM must catch)

CKB maps code patterns to regulation articles using AST + regex + tree-sitter. It is structurally correct but contextually blind:

- **Business context**: CKB flags PII patterns in a healthcare app and a game engine equally
- **Architecture awareness**: a finding in dead/test code vs production code has different weight
- **Compensating controls**: CKB can't see infrastructure-level encryption, WAFs, or IAM policies
- **Regulatory applicability**: CKB flags HIPAA in a repo that doesn't handle PHI
- **Risk prioritization**: 50 findings need ordering by actual business/legal exposure
- **Cross-reference noise**: the same hardcoded credential maps to 6 frameworks — that's 1 fix, not 6

## Phase 1: Structural scan (~2k tokens into context)

```bash
ckb audit compliance --framework=$ARGUMENTS --format=json --min-confidence=0.7 2>/dev/null
```

For large repos, scope to a specific path to reduce noise:

```bash
ckb audit compliance --framework=$ARGUMENTS --scope=src/api --format=json --min-confidence=0.7 2>/dev/null
```

If no framework is specified, pick based on repo context:

- Has health/patient/medical code → `hipaa,gdpr`
- Has payment/billing/card code → `pci-dss,soc2`
- EU company or processes EU data → `gdpr,dora,nis2`
- AI/ML code → `eu-ai-act`
- Safety-critical/embedded → `iec61508,iso26262,misra`
- General SaaS → `iso27001,soc2,owasp-asvs`
- If unsure → `iso27001,owasp-asvs` (broadest applicability)
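The context-based pick above can be sketched as a small shell helper. This is an illustrative sketch only: the `detect_frameworks` function and its keyword lists are hypothetical, not CKB's actual `--recommend` logic, and real applicability still needs human judgment.

```shell
# Hypothetical helper; keyword lists are illustrative, not CKB's real heuristics.
detect_frameworks() {
  repo="$1"
  if grep -rqiE 'patient|medical|\bphi\b' "$repo" 2>/dev/null; then
    echo "hipaa,gdpr"            # health-related code detected
  elif grep -rqiE 'payment|billing|card' "$repo" 2>/dev/null; then
    echo "pci-dss,soc2"          # payment-related code detected
  else
    echo "iso27001,owasp-asvs"   # broadest-applicability fallback
  fi
}

mkdir -p /tmp/demo-repo
printf 'func LoadPatientRecord() {}\n' > /tmp/demo-repo/records.go
detect_frameworks /tmp/demo-repo   # prints: hipaa,gdpr
```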
From the JSON output, extract:

- `score`, `verdict` (pass/warn/fail)
- `coverage[]` — per-framework scores with passed/warned/failed/skipped check counts
- `findings[]` — with check, severity, file, startLine, message, suggestion, confidence, CWE
- `checks[]` — per-check status and summary
- `summary` — total findings by severity, files scanned

Note:

- **Per-framework scores**: which frameworks are clean vs problematic
- **Finding count by severity**: errors are your priority
- **CWE references**: cross-reference with known vulnerability databases
- **Confidence scores**: low-confidence (< 0.7) findings are likely false positives

**Early exit**: If verdict=pass and all framework scores are ≥ 90, write a one-line summary and stop.
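The extraction and the early-exit gate can be scripted with `jq`. The JSON below is a hypothetical sample that matches the field names listed above; the real `ckb audit compliance` output may differ in shape.

```shell
# Hypothetical sample of the audit JSON (field names assumed from the doc above).
cat > /tmp/audit.json <<'EOF'
{"score": 94, "verdict": "pass",
 "coverage": [{"framework": "gdpr", "score": 96}, {"framework": "soc2", "score": 91}],
 "findings": []}
EOF

verdict=$(jq -r '.verdict' /tmp/audit.json)
score=$(jq -r '.score' /tmp/audit.json)
min_fw=$(jq '[.coverage[].score] | min' /tmp/audit.json)

# Early exit: verdict=pass and every framework score >= 90
if [ "$verdict" = "pass" ] && [ "$min_fw" -ge 90 ]; then
  echo "COMPLIANT: score $score/100, all framework scores >= 90"
fi
```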
## Phase 2: Triage findings (targeted reads only)

Do NOT read every flagged file. Group findings by root cause first:

1. **Deduplicate cross-framework findings** — a hardcoded secret flagged by GDPR, PCI DSS, HIPAA, and ISO 27001 is one fix
2. **Check for dominant category** — if > 50% of findings are one category (e.g., "sql-injection"), investigate that category systemically (is the pattern matching too broad?) rather than checking each file individually
3. **Check applicability** — does this repo actually fall under the flagged framework? (e.g., HIPAA findings in a non-healthcare repo)
4. **Read only error-severity files** — warnings and info can wait
5. **For each error finding**, read just the flagged lines (not the whole file) and assess:
   - Is this a real compliance risk or a pattern false positive?
   - Are there compensating controls elsewhere? (check imports, config, middleware)
   - What's the remediation effort: one-liner fix vs architectural change?
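Step 1, the cross-framework deduplication, can be done mechanically before reading any files. A sketch with `jq`, against hypothetical findings in the shape described in Phase 1 (one file:line location counts as one root cause, however many frameworks flagged it):

```shell
# Hypothetical findings sample; field names assumed from the Phase 1 description.
cat > /tmp/findings.json <<'EOF'
{"findings": [
  {"framework": "gdpr",    "check": "hardcoded-secret", "file": "config.go", "startLine": 12},
  {"framework": "pci-dss", "check": "hardcoded-secret", "file": "config.go", "startLine": 12},
  {"framework": "hipaa",   "check": "hardcoded-secret", "file": "config.go", "startLine": 12},
  {"framework": "gdpr",    "check": "missing-consent",  "file": "api.go",    "startLine": 40}
]}
EOF

# Group by file:line so duplicate cross-framework hits collapse to one item
jq -r '.findings
  | group_by(.file + ":" + (.startLine | tostring))
  | .[]
  | "\(.[0].file):\(.[0].startLine)  \(length) framework hit(s): \([.[].framework] | join(","))"' \
  /tmp/findings.json
```

Here 4 findings collapse to 2 root causes; the hardcoded secret shows 3 framework hits but is one fix.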
## Phase 3: Write the audit summary (be terse)

```markdown
## [COMPLIANT|NEEDS REMEDIATION|NON-COMPLIANT] — CKB score: [N]/100

[One sentence: what frameworks were audited and overall posture]

### Critical findings (must remediate)
1. **[framework]** `file:line` Art. [X] — [issue + remediation in one sentence]
2. ...

### Not applicable (false positives from context)
[List findings CKB flagged but that don't apply to this repo, with a one-line reason]

### Cross-framework deduplication
[N findings deduplicated to M root causes]

### Framework scores
| Framework | Score | Status | Checks |
|-----------|-------|--------|--------|
| [name] | [N] | [pass/warn/fail] | [passed]/[total] |
```

If fully compliant: just the header + framework scores. Nothing else.

## Anti-patterns (token waste)

- Reading every flagged file → waste (group by root cause, read only errors)
- Treating cross-framework duplicates as separate issues → waste (1 code fix = 1 issue)
- Explaining what each regulation requires → waste (CKB already mapped articles)
- Re-checking frameworks CKB scored at 100 → waste
- Auditing frameworks that don't apply to this repo → waste
- Reading low-confidence findings (< 0.7) → waste (likely false positives)
- Suggesting infrastructure controls for code-level findings → out of scope
- Using wrong framework IDs (use pci-dss not pcidss, owasp-asvs not owaspasvs) → CKB error

.claude/commands/review.md

Lines changed: 113 additions & 73 deletions
````diff
@@ -1,98 +1,138 @@
-Run a comprehensive code review using CKB's deterministic analysis + your semantic review.
+Run a CKB-augmented code review optimized for minimal token usage.
 
 ## Input
 $ARGUMENTS - Optional: base branch (default: main), or "staged" for staged changes, or a PR number
 
-## MCP vs CLI
+## Philosophy
 
-CKB runs as an MCP server in this environment. MCP mode is strongly preferred for interactive review because the SCIP index stays loaded between calls — drill-down tools like `findReferences`, `analyzeImpact`, and `explainSymbol` execute instantly against the in-memory index. CLI mode reloads the index on every invocation.
+CKB already answered the structural questions (secrets? breaking? dead code? test gaps?).
+The LLM's job is ONLY what CKB can't do: semantic reasoning about correctness, design,
+and intent. Every source line you read costs tokens — read only what CKB says is risky.
 
-## The Three Phases
+### CKB's blind spots (what the LLM must catch)
 
-### Phase 1: CKB structural scan (5 seconds, 0 tokens)
+CKB runs 15 deterministic checks with AST rules, SCIP index, and git history.
+It is structurally sound but semantically blind:
 
-Call the `reviewPR` MCP tool with compact mode:
-```
-reviewPR(baseBranch: "main", compact: true)
-```
+- **Logic errors**: wrong conditions (`>` vs `>=`), off-by-one, incorrect algorithm
+- **Business logic**: domain-specific mistakes CKB has no context for
+- **Design fitness**: wrong abstraction, leaky interface, coupling that metrics miss
+- **Input validation**: missing bounds checks, nil guards outside AST patterns
+- **Race conditions**: concurrency issues, mutex ordering, shared state
+- **Resource leaks**: file handles, goroutines, connections not closed on all paths
+- **Incomplete refactoring**: callers missed across module boundaries
+- **Domain edge cases**: error paths, boundary conditions tests don't cover
 
-This returns ~1k tokens instead of ~30k — just the verdict, non-pass checks, top 10 findings, and action items. Use `compact: false` only if you need the full raw data.
+CKB's scoring uses per-check caps (max -20) and per-rule caps (max -10), so a score
+of 85 can still hide multiple capped warnings. HoldTheLine only flags changed lines,
+so pre-existing issues interacting with new code won't surface.
+
+## Phase 1: Structural scan (~1k tokens into context)
 
-If a PR number was given, get the base branch first:
 ```bash
-BASE=$(gh pr view $ARGUMENTS --json baseRefName -q .baseRefName)
+ckb review --base=main --format=json 2>/dev/null
 ```
-Then pass it: `reviewPR(baseBranch: BASE, compact: true)`
-
-> **If CKB is not running as an MCP server** (last resort), use the CLI instead:
-> ```bash
-> ./ckb review --base=main --format=json
-> ```
-> Note: CLI mode reloads the SCIP index on every call, so drill-down steps will be slower.
-
-From CKB's output, immediately note:
-- **Passed checks** → skip these categories. Don't waste tokens re-checking secrets, breaking changes, test coverage, etc.
-- **Warned checks** → your review targets
-- **Top hotspot files** → read these first
-- **Test gaps** → functions to evaluate
-
-### Phase 2: Drill down on CKB findings (0 tokens via MCP)
-
-Before reading source code, use CKB's MCP tools to investigate specific findings. These calls are instant because the SCIP index is already loaded from Phase 1.
 
-| CKB finding | Drill-down tool | What to check |
-|---|---|---|
-| Dead code | `findReferences(symbolId: "...")` or `searchSymbols` → `findReferences` | Does it actually have references? CKB's SCIP index can miss cross-package refs |
-| Blast radius | `analyzeImpact(symbolId: "...")` | Are the "callers" real logic or just framework registrations? |
-| Coupling gap | `explainSymbol(name: "...")` on the missing file | What does the co-change partner do? Does it actually need updates? |
-| Bug patterns | Already verified by differential analysis | Just check the specific line CKB flagged |
-| Complexity | `explainFile(path: "...")` | What functions are driving the increase? |
-| Test gaps | `getAffectedTests(baseBranch: "main")` | Which tests exist? Which functions are actually untested? |
-| Hotspots | `getHotspots(limit: 10)` | Full churn history for the flagged files |
+If a PR number was given:
+```bash
+BASE=$(gh pr view $ARGUMENTS --json baseRefName -q .baseRefName)
+ckb review --base=$BASE --format=json 2>/dev/null
+```
 
-### Phase 3: Semantic review of high-risk files
+If "staged" was given:
+```bash
+ckb review --staged --format=json 2>/dev/null
+```
 
-Now read the actual source — but only for:
-1. Files CKB ranked as top hotspots
-2. Files with warned findings that survived drill-down
-3. New files (CKB can't assess design quality of new code)
+Parse the JSON output to extract:
+- `score`, `verdict` — overall quality
+- `checks[]` — status + summary per check (15 checks: breaking, secrets, tests, complexity,
+  coupling, hotspots, risk, health, dead-code, test-gaps, blast-radius, comment-drift,
+  format-consistency, bug-patterns, split)
+- `findings[]` — severity + file + message + ruleId (top-level, separate from check details)
+- `narrative` — CKB AI-generated summary (if available)
+- `prTier` — small/medium/large
+- `reviewEffort` — estimated hours + complexity
+- `reviewers[]` — suggested reviewers with expertise areas
+- `healthReport` — degraded/improved file counts
+
+From checks, build three lists:
+- **SKIP**: passed checks — don't touch these files or topics
+- **INVESTIGATE**: warned/failed checks — these are your review scope
+- **READ**: files with warn/fail findings — the only files you'll read
+
+**Early exit**: Skip LLM ONLY when ALL conditions are met:
+1. Score ≥ 90 (not 80 — per-check caps hide warnings at 80)
+2. Zero warn/fail checks
+3. Small change (< 100 lines of diff)
+4. No new files (CKB has no SCIP history for them)
+
+If ANY condition fails, proceed to Phase 2 — CKB's structural pass does NOT mean
+the code is semantically correct.
+
+## Phase 2: Targeted source reading (the only token-expensive step)
+
+Do NOT read the full diff. Do NOT read every changed file.
+
+**For files CKB flagged (INVESTIGATE list):**
+Read only the changed hunks via `git diff main...HEAD -- <file>`.
+
+**For new files** (CKB has no history — these are your biggest blind spot):
+- If it's a new package/module: read the entry point and types/interfaces first,
+  then follow references to understand the architecture before reading individual files
+- If < 500 lines: read the file
+- If > 500 lines: read the first 100 lines (types/imports) + functions CKB flagged
+- Skip generated files, test files for existing tests, and config/CI/docs files
+
+**For each file you read, look for exactly:**
+- Logic errors (wrong condition, off-by-one, nil deref, race condition)
+- Resource leaks (file handles, connections, goroutines not closed on error paths)
+- Security issues (injection, auth bypass, secrets CKB's patterns missed)
+- Design problems (wrong abstraction, leaky interface, coupling metrics don't catch)
+- Missing edge cases the tests don't cover
+- Incomplete refactoring (callers that should have changed but didn't)
+
+Do NOT look for: style, naming, formatting, documentation, test coverage —
+CKB already checked these structurally.
+
+## Phase 3: Write the review (be terse)
 
-For each file, look for things CKB CANNOT detect:
-- Logic bugs (wrong conditions, off-by-one, race conditions)
-- Security issues (injection, auth bypass, data exposure)
-- Design problems (wrong abstraction, unclear naming, leaky interfaces)
-- Edge cases (nil inputs, empty collections, concurrent access)
-- Error handling quality (not just missing — wrong strategy)
+```markdown
+## [APPROVE|REQUEST CHANGES|DISCUSS] — CKB score: [N]/100
 
-### Phase 4: Write the review
+[One sentence: what the PR does]
 
-Format:
+[If CKB provided narrative, include it here]
 
-```markdown
-## Summary
-One paragraph: what the PR does, overall assessment.
+**PR tier:** [small/medium/large] | **Review effort:** [N]h ([complexity])
+**Health:** [N] degraded, [N] improved
 
-## Must Fix
-Findings that should block merge. File:line references.
+### Issues
+1. **[must-fix|should-fix]** `file:line` — [issue in one sentence]
+2. ...
 
-## Should Fix
-Issues worth addressing but not blocking.
+### CKB passed (no review needed)
+[comma-separated list of passed checks]
 
-## CKB Analysis
-- Verdict: [pass/warn/fail], Score: [0-100]
-- [N] checks passed, [N] warned
-- Key findings: [top 3]
-- False positives identified: [any CKB findings you disproved]
-- Test gaps: [N] untested functions — [your assessment of which matter]
+### CKB flagged (verified above)
+[for each warn/fail finding: confirmed/false-positive + one-line reason]
 
-## Recommendation
-Approve / Request changes / Needs discussion
+### Suggested reviewers
+[reviewer — expertise area]
 ```
 
-## Tips
-
-- If CKB says "secrets: pass" — trust it, don't re-scan 100+ files
-- If CKB says "breaking: pass" — trust it, SCIP-verified API comparison
-- If CKB says "dead-code: FormatSARIF" — DON'T trust blindly, verify with `findReferences` or grep
-- CKB's hotspot scores are based on git churn history — higher score = more volatile file = review more carefully
-- CKB's complexity delta shows WHERE cognitive load increased — read those functions
+If no issues found: just the header line + CKB passed list. Nothing else.
+
+## Anti-patterns (token waste)
+
+- Reading files CKB marked as pass → waste
+- Reading generated files → waste
+- Summarizing what the PR does in detail → waste (git log exists, CKB has narrative)
+- Explaining why passed checks passed → waste
+- Running MCP drill-down tools when CLI already gave enough signal → waste
+- Reading test files to "verify test quality" → waste unless CKB flagged test-gaps
+- Reading hotspot-only files with no findings → high churn ≠ needs review right now
+- Trusting score >= 80 as "safe to skip" → dangerous (per-check caps hide warnings)
+- Skipping new files because CKB didn't flag them → CKB has no SCIP data for new files
+- Reading every new file in a large new package → read entry point + types first, then follow refs
+- Ignoring reviewEffort/prTier → these tell you how thorough to be
````
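The Phase 1 bookkeeping in the new review.md (building the SKIP/INVESTIGATE/READ lists and checking the early-exit gate) can be scripted. This is a hedged sketch against a hypothetical sample of the review JSON, using the field names the document lists; the real `ckb review` output shape may differ.

```shell
# Hypothetical sample of `ckb review --format=json` output (field names assumed).
cat > /tmp/review.json <<'EOF'
{"score": 84, "verdict": "warn",
 "checks": [
   {"name": "secrets",      "status": "pass"},
   {"name": "breaking",     "status": "pass"},
   {"name": "bug-patterns", "status": "warn"}
 ],
 "findings": [
   {"severity": "warn", "file": "pkg/db/conn.go", "message": "rows handle may leak"}
 ]}
EOF

# The three triage lists
jq -r '"SKIP: " + ([.checks[] | select(.status == "pass") | .name] | join(", "))' /tmp/review.json
jq -r '"INVESTIGATE: " + ([.checks[] | select(.status != "pass") | .name] | join(", "))' /tmp/review.json
jq -r '"READ: " + ([.findings[] | select(.severity == "warn" or .severity == "fail") | .file] | unique | join(", "))' /tmp/review.json

# Early-exit gate: ALL four conditions must hold
score=$(jq '.score' /tmp/review.json)
non_pass=$(jq '[.checks[] | select(.status != "pass")] | length' /tmp/review.json)
diff_lines=42   # would come from `git diff --shortstat`; hardcoded for this sketch
new_files=0     # likewise hypothetical
if [ "$score" -ge 90 ] && [ "$non_pass" -eq 0 ] && [ "$diff_lines" -lt 100 ] && [ "$new_files" -eq 0 ]; then
  echo "early exit: skip LLM review"
else
  echo "proceed to Phase 2"
fi
```

With this sample (score 84, one warned check), the gate correctly prints "proceed to Phase 2" even though the diff is small.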
