File: `cli-tool/components/skills/development/ralph-review-trio/SKILL.md`

---
name: ralph-review-trio
description: Run a sequential three-tier code review on a finished implementation branch — Haiku (surface) → Sonnet (logic) → Opus (deep). Restarts from Tier 1 on any tier failure. Use when a solo branch or PR is code-complete and you want structured pre-merge verification before human review.
---

# Ralph Review Trio

This skill triggers `/ralph-review`, which runs three sequential reviewer subagents at increasing depth. If any tier flags a failure, the loop restarts from Tier 1 after fixes.

## Install

The full plugin (this skill + the `/ralph-review` command + 3 reviewer agents at Haiku / Sonnet / Opus tiers) installs from the upstream marketplace:

```
/plugin marketplace add hoiung/sst3-skills
/plugin install ralph-review-trio@sst3-skills
/reload-plugins
```

Source: https://github.com/hoiung/sst3-skills (MIT). Writeup: https://hoiboy.uk/posts/shipping-ralph-review-trio/

This skill in `claude-code-templates` is the discovery entry — it documents the trigger conditions, expected output schema, and per-tier semantics. The actual command + agents ship via the marketplace install above.

## When to trigger

- An implementation is code-complete on a feature / solo branch.
- All acceptance criteria for the underlying issue are believed satisfied.
- Pre-merge verification is needed before human review or merge-to-main.
- You want structured evidence that each tier's checklist was walked.

## When NOT to trigger

- Work-in-progress branches mid-implementation — Ralph assumes the change is complete.
- Documentation-only diffs with no code — Tiers 2 and 3 still run, but most checks short-circuit to the "doc-only PR" exemption; overkill for a single README edit.
- Hotfix branches where speed dominates verification — use the project's normal PR review.

## How it works

```
/ralph-review
Tier 1 — Haiku (surface checks) ─── fail ──> restart
│ pass
Tier 2 — Sonnet (logic checks) ─── fail ──> restart
│ pass
Tier 3 — Opus (deep analysis) ─── fail ──> restart
│ pass
RALPH_PASS → merge OK
```
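The loop above can be sketched as a small controller (illustrative only; `run_tier` and `apply_fixes` are stand-ins for the real subagent dispatch and fix step):

```python
# Minimal sketch of the restart-on-fail controller. run_tier returns the
# tier's verdict; on any failure the whole sequence restarts from Tier 1.
TIERS = ["haiku", "sonnet", "opus"]

def ralph_review(run_tier, apply_fixes, max_cycles=10):
    """Run tiers in order; any failure restarts from Tier 1 after fixes."""
    for _ in range(max_cycles):
        for tier in TIERS:
            verdict = run_tier(tier)       # "pass" or "fail"
            if verdict != "pass":
                apply_fixes(tier)          # fix findings, then restart
                break                      # back to Tier 1
        else:
            return "RALPH_PASS"            # all three tiers passed in one cycle
    return "RALPH_ABORT"                   # cycle budget exhausted
```

The `for...else` falls through to `RALPH_PASS` only when no tier broke out of the inner loop, which is exactly the restart-on-fail semantics.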

Each tier emits a `<promise>HAIKU_PASS</promise>` / `<promise>SONNET_PASS</promise>` / `<promise>OPUS_PASS</promise>` token on pass. All three are required for an overall pass.

## Entry point

`/ralph-review` — installed by the upstream plugin (see Install above).

## Per-tier checklists

The full per-tier checklists ship in the upstream plugin's `agents/` directory: `haiku-reviewer.md` (Tier 1 surface), `sonnet-reviewer.md` (Tier 2 logic), `opus-reviewer.md` (Tier 3 deep-analysis).

Extended reference content under `references/` (same dir as this SKILL.md) is loaded on demand by each tier when a specific check requires more context.

## Outputs

Each tier writes a fenced `## RESULT` block with:

```
## RESULT
mcp_graph_available: yes|no # first line when discussing graph queries
verdict: pass|fail|unknown
files_touched: [paths]
findings: [{path, line, claim, evidence}]
scope_gaps: [list or "none"]
```

The main agent reads the RESULT block and decides next action (restart, next tier, or PASS).
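A minimal sketch of reading the scalar fields out of a RESULT block, plus the decision step (field names follow the schema above; list fields are left to a fuller parser):

```python
def parse_result_block(text):
    """Extract top-level scalar fields (e.g. verdict) from a RESULT block."""
    fields = {}
    in_block = False
    for line in text.splitlines():
        if line.strip() == "## RESULT":
            in_block = True
            continue
        # top-level "key: value" lines only; skip indented and list lines
        if in_block and ":" in line and not line.startswith((" ", "-", "\t")):
            key, _, value = line.partition(":")
            fields[key.strip()] = value.split("#")[0].strip()  # drop comments
    return fields

def next_action(verdict, tier_index, last_tier=2):
    """Restart on fail, advance on pass, PASS only after the last tier."""
    if verdict != "pass":
        return "restart"
    return "PASS" if tier_index == last_tier else "next_tier"
```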
# Example Ralph Review transcript

A scrubbed example of a Ralph Review pass over a small Python feature — adding a retry wrapper around an HTTP client. Branch: `solo/issue-42-http-retry`. Issue: "Add configurable retries to the OrderClient HTTP calls".

The three tiers ran in sequence with the controller `/ralph-review` command. Output below is condensed — real transcripts are longer.
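For orientation, the feature under review is roughly this shape (a reconstruction for illustration; the actual `OrderClient` wiring is not shown in the transcript):

```python
import time

def retry_call(fn, max_retries=3, base_delay=0.1, retryable=(ConnectionError,)):
    """Call fn, retrying on retryable exceptions with exponential backoff."""
    for attempt in range(1, max_retries + 1):
        try:
            return fn()
        except retryable:
            if attempt == max_retries:
                raise  # fail fast: exhausted retries propagate to the caller
            time.sleep(base_delay * 2 ** (attempt - 1))
```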

---

## User invocation

```
> /ralph-review
```

## Controller — Tier 1 dispatch

Main agent dispatches `haiku-reviewer` against the current branch diff.

## Tier 1 — Haiku RESULT

```
## RESULT
mcp_graph_available: yes
verdict: fail
files_touched:
- src/clients/order_client.py
- src/clients/retry.py
- tests/test_retry.py
findings:
- path: src/clients/retry.py
line: 34
claim: debug print statement left in
evidence: "print(f'retrying {attempt} of {max_retries}')"
- path: src/clients/order_client.py
line: 78
claim: magic number in code (max_retries=3)
evidence: "self._retry_wrapper(max_retries=3) — should be in config"
scope_gaps: none
```

Main agent applies fixes:
- Remove the `print()` (commit `a3f12e9`)
- Move `max_retries=3` to `config/clients.yaml` under `order_client.max_retries`, read via `config.get('order_client.max_retries')` (commit `b84c5d1`)
- Run tests, all pass.

**Loop restarts from Tier 1** (restart-on-fail, not continue-with-flag).

## Tier 1 — Haiku RESULT (2nd iteration)

```
## RESULT
mcp_graph_available: yes
verdict: pass
files_touched:
- src/clients/order_client.py
- src/clients/retry.py
- tests/test_retry.py
- config/clients.yaml
findings: []
scope_gaps: none
```

`<promise>HAIKU_PASS</promise>`

## Tier 2 — Sonnet RESULT

```
## RESULT
mcp_graph_available: yes
verdict: fail
files_touched:
- src/clients/order_client.py
- src/clients/retry.py
findings:
- path: src/clients/retry.py
line: 22
claim: exception swallowed without logging
evidence: "except requests.RequestException: pass # silent fallback — no log, no metric"
- path: tests/test_retry.py
line: 51
claim: mock accepts **kwargs — does not prove propagation
evidence: "mock_call.assert_called() # should be assert_called_with(max_retries=3)"
scope_gaps: none
```

Main agent applies fixes:
- Replace the silent `except` with a structured log + a `retries_exhausted` counter metric (commit `c19f7a2`)
- Update test to `assert mock_call.call_args.kwargs['max_retries'] == 3` (commit `d55b908`)
- Run full test suite.
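The difference between the weak and the strengthened assertion, sketched with `unittest.mock` (names are illustrative):

```python
from unittest.mock import Mock

mock_call = Mock()
mock_call(url="/orders", max_retries=3)

# Weak: passes no matter what arguments were used; proves only that a
# call happened, not that the configured value propagated.
mock_call.assert_called()

# Strong: pins the exact kwarg the config layer is supposed to supply.
assert mock_call.call_args.kwargs["max_retries"] == 3
```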

**Loop restarts from Tier 1**.

## Tier 1 — Haiku RESULT (3rd iteration)

```
verdict: pass
findings: []
```

`<promise>HAIKU_PASS</promise>`

## Tier 2 — Sonnet RESULT (2nd iteration)

```
verdict: pass
findings: []
```

`<promise>SONNET_PASS</promise>`

## Tier 3 — Opus RESULT

```
## RESULT
mcp_graph_available: yes
verdict: pass
files_touched:
- src/clients/order_client.py
- src/clients/retry.py
- tests/test_retry.py
- config/clients.yaml
findings: []
scope_gaps: none
governance_drift: none_detected
- Tier A box "Add retry wrapper" closed at commit b84c5d1 — interleaves with phase 1 commits
- Tier A box "Move max_retries to config" closed at commit b84c5d1 — same phase window
- Tier B box "Fix Everything applied" closed at end — acceptable
spot_check:
- graph_result: callers_of(_retry_wrapper) returned 1 caller
- source_read: src/clients/order_client.py:78 — confirmed single call site
factual_claims: none added in this diff
```

`<promise>OPUS_PASS</promise>`

## Controller — final output

```
<promise>RALPH_PASS</promise>

Three tiers passed. Two restart cycles (Tier 1 × 2 iterations, Tier 2 × 1 iteration).
Total findings fixed: 4.
Branch ready for merge.
```

---

## What this transcript demonstrates

- **Restart-on-fail**: Tier 2 found issues after Tier 1 already passed. Loop went back to Tier 1, not to "Tier 3 with a flag".
- **Cross-boundary contract**: the `**kwargs`-swallowing mock was caught at Tier 2 — surface tier doesn't trace call-args assertions.
- **Governance cadence**: Opus tier classified the three checkbox closures correctly (two Tier A interleaved, one Tier B batched).
- **Graph spot-check**: Opus used graph to list callers, then read the source to confirm — never trusted the graph alone.
- **Silent fallback caught late**: the swallowed exception was invisible at surface; Tier 2 traced the exception path to find it.
# Ralph Review — Shared 5-Culprits Reference

> **Architectural design**: Each Ralph tier scans for the same five patterns at *increasing depth*. The depth-layering is intentional. This file holds the shared framing so the category numbers and names stay in sync; each tier file keeps its own depth-appropriate bullets.

## The 5 Categories

Each tier checks these patterns. Bullets in the per-tier files differ by depth, not by category.

| # | Category | Surface (Haiku) | Logic (Sonnet) | Architectural (Opus) |
|---|---|---|---|---|
| 1 | Duplicate Code (DRY / Modularity) | Visible copy-paste | Same logic in N files | Same pattern implemented differently |
| 2 | On-the-fly Calculations (Hardcoded Settings) | Magic numbers in calculations | Inline business formulas | Calculation constants varying by env |
| 3 | Hardcoded Settings | Embedded URLs / paths / credentials | Multipliers, percentages, timeouts | User-configurable values in code |
| 4 | Obsolete / Dead Code (LMCE) | Commented-out blocks, dead TODOs | Never-called functions, unused imports | Modules never instantiated, dead endpoints |
| 5 | Silent Fallbacks (Fail Fast) | `catch{}` swallow, `\|\| default` | `.get(k, {})` chains, `try / except: pass` | Cascading defaults masking root cause |
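As a concrete instance of category 5 at the surface and logic depths, compare a silent fallback with its fail-fast rewrite (illustrative code, not from the plugin):

```python
import logging

logger = logging.getLogger("clients")

# Silent fallback (flagged): the error disappears and the caller cannot
# distinguish "looked up" from "defaulted".
def get_timeout_silent(cfg):
    try:
        return cfg["order_client"]["timeout"]
    except KeyError:
        return 30  # masks a missing required setting

# Fail fast (preferred): a missing required setting raises loudly.
def get_timeout(cfg):
    try:
        return cfg["order_client"]["timeout"]
    except KeyError:
        logger.error("missing required config: order_client.timeout")
        raise
```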

## Tier Files

- `haiku-checklist.md` — surface checks
- `sonnet-checklist.md` — logic-trace checks
- `opus-checklist.md` — architectural checks

## Rule

Tier-specific bullet content lives in the tier files. The category numbers / names live here — change them once, propagate by re-reading.
# Haiku Tier — Surface Checklist (extended reference)

This is the extended reference the Tier 1 (Haiku) agent loads when a surface check needs more context than the inline agent definition provides. For the canonical flow + RESULT block schema, see `../../../agents/haiku-reviewer.md`.

## Scope

Surface review sits at the first line of defence. It catches:

- Missing / mis-named files
- Narrative-only "evidence"
- Debug code left in the diff
- Obvious copy-paste, obvious magic numbers, obvious silent-fallback patterns
- Cross-boundary issues that are visible at the diff level (wrong column name in a WHERE, `.toFixed()` on a nullable)

It does NOT trace call graphs, does NOT cross-verify config bidirectionally, does NOT audit architectural fit. Those belong in later tiers.

## What counts as "evidence"

For a `[x]` box to pass at the surface tier, the evidence must be at least one of:

- **file:line** — a concrete pointer a reviewer can open and read
- **commit hash** — a hash visible in `git log` of the branch
- **command + output** — a reproducible command with captured output (tee log path is fine)
- **subagent RESULT comment-id** — a comment number referencing a fenced `## RESULT` block

Narrative-only evidence ("implemented the feature") is a fail.

## Common false positives

- **Intentional duplication for defence-in-depth**: a value validated at the API layer AND at the DB layer is not a DRY violation — it's a designed belt-and-braces. Flag for the subagent to confirm intent before marking as a finding.
- **Argparse defaults for optional tunables**: `parser.add_argument("--retries", default=3)` is not a "silent fallback" — the argument is optional by contract. Only flag `default=` patterns that hide REQUIRED config.
- **Sentinel values that are the contract**: `return None` from a "not found" function is not a silent failure — it's the advertised return contract. Flag only when the caller can't distinguish the sentinel from a valid value.
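The third false positive in practice (a sketch; function names are hypothetical):

```python
# Advertised contract: returns the user dict, or None when not found.
# The caller CAN distinguish the sentinel from any valid value: not a finding.
def find_user(users, user_id):
    return users.get(user_id)

# Ambiguous sentinel: 0 is both "no balance recorded" and a valid balance.
# The caller cannot tell which; THIS is the flaggable pattern.
def get_balance(balances, user_id):
    return balances.get(user_id, 0)
```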

## Bash output discipline

- Wrap any command expected to produce > 200 lines in a tee-style capture. Report only the path + verdict in RESULT.
- Do NOT paste `pytest -v` output, `git diff` of 50+ files, or unfiltered log tails back to the main agent.
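One way to honour this discipline (a hypothetical helper, not shipped with the plugin):

```python
import subprocess
import tempfile

def run_captured(cmd):
    """Run cmd, tee its full output to a temp file, return (path, returncode).

    Only the path and the verdict should be reported back in RESULT."""
    out = tempfile.NamedTemporaryFile(
        mode="w", suffix=".log", prefix="ralph-", delete=False
    )
    with out:
        proc = subprocess.run(
            cmd, shell=True, stdout=out, stderr=subprocess.STDOUT
        )
    return out.name, proc.returncode
```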

## Graph fallback

When the code-review-graph MCP is unavailable or the project uses unsupported languages (Markdown, YAML, JSON, SQL, TOML, shell), do NOT silently skip the graph-backed checks. Instead:

1. Retry once after a short delay.
2. If still unavailable, fall back to an Explore-style subagent that performs the equivalent manual audit (callers, callees, large functions).
3. Include the subagent's RESULT block in your own RESULT as `[graph unavailable: <reason>] [subagent fallback: <id>]`.

A bare `[graph unavailable]` with no subagent evidence = silent skip = FAIL.
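The fallback procedure can be sketched as follows (illustrative; `query_graph` and `run_explore_subagent` are stand-ins for the MCP call and the Explore-style subagent):

```python
import time

def graph_or_fallback(query_graph, run_explore_subagent, retry_delay=1.0):
    """Try the graph MCP, retry once, then fall back to a manual audit."""
    reason = "unknown"
    for attempt in range(2):
        try:
            return {"source": "graph", "result": query_graph()}
        except ConnectionError as exc:
            reason = str(exc)
            if attempt == 0:
                time.sleep(retry_delay)  # step 1: retry once after a delay
    # Step 2 + 3: never a bare "[graph unavailable]"; attach subagent evidence.
    return {
        "source": f"[graph unavailable: {reason}] [subagent fallback]",
        "result": run_explore_subagent(),
    }
```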
# Opus Tier — Deep Analysis Checklist (extended reference)

This is the extended reference the Tier 3 (Opus) agent loads when a deep check needs more context than the inline agent definition. For the canonical flow + RESULT block schema, see `../../../agents/opus-reviewer.md`.

## Scope

Opus is the architectural pass. It catches:

- Architectural misfit (new module violates the layering pattern in use)
- Governance drift (Tier A checkbox batching against commit cadence)
- Null propagation across the full codebase (not just the diff)
- Factual claims without provenance in documentation
- Overengineering (premature abstractions, unnecessary complexity)
- Dead code at the module / endpoint / migration level

## Governance drift — two-tier cadence rule

When auditing checkbox close-out cadence, classify every `[x]` box into one of two tiers BEFORE judging:

**Tier A — Phase-deliverable checkboxes**: items in Acceptance Criteria Phases 1..N that describe a concrete deliverable (a file edit, commit, function, section, table, example). STRICT interleaving required — Proof of Work entry order MUST interleave with git-log commit order within the same phase's commit window.

**Tier B — Cross-cutting meta-checkboxes**: Triple-Check Gate items, Engineering Requirements meta-items like "Fix Everything", Cleanup Requirements, Verification Loop self-gates, PREREQUISITE CHECKPOINT, Expected Behavior post-condition items. Batched-at-end is acceptable because these describe conditions observable ONLY after all phases complete. Closing them mid-phase would be dishonest.

**Classification heuristic**: if the checkbox text names a specific file / commit / section / function / table to build or change, it is **Tier A**. If it describes a cross-cutting condition requiring the WHOLE implementation to evaluate, it is **Tier B**. When in doubt, inspect the checkbox's section in the issue template: Acceptance Criteria sub-phases = Tier A; Triple-Check Gate / Engineering Requirements meta / Cleanup Requirements / Verification Loop self-gates / PREREQUISITE CHECKPOINT / Expected Behavior = Tier B.

**Rule**:

- Flag Tier A batching as governance drift.
- Do NOT flag Tier B batching as drift.

A Tier-B-only batch is acceptable and expected.
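The classification heuristic lends itself to a simple sketch (section names follow the issue-template sections listed above; treat the exact strings as assumptions):

```python
# Tier B sections describe cross-cutting conditions evaluable only after
# all phases complete; everything else (Acceptance Criteria phases) is Tier A.
TIER_B_SECTIONS = {
    "Triple-Check Gate",
    "Engineering Requirements",
    "Cleanup Requirements",
    "Verification Loop",
    "PREREQUISITE CHECKPOINT",
    "Expected Behavior",
}

def classify_checkbox(section):
    """Return 'A' (strict interleaving required) or 'B' (batched-at-end OK)."""
    return "B" if section in TIER_B_SECTIONS else "A"
```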

## Governance evidence signal (canonical)

The canonical audit signal for verifying that checkbox-close invocations actually happened is the `## Proof of Work` section in the issue body — NOT the GitHub timeline `edited` events.

Why: GitHub's timeline API does not emit `edited` events for an issue author's own body edits on their own issue. Since solo-workflow agents ARE the issue author in almost all cases, timeline-based audits produce a false negative for every honored invocation. The body content itself, however, is always externally readable.

Verification procedure:

1. Fetch the issue body via the project's governance MCP tool, or a plain GitHub API call.
2. Parse the `## Proof of Work` section. Each entry starts with `- **<checkbox text>**: <evidence>`.
3. For every `[x]` box in the body, there MUST be a matching entry in Proof of Work. Missing entry = drift.
4. For each entry, spot-check the cited evidence:
- file:line claims → open the file on the branch
- commit hashes → `git show <hash>`
- subagent RESULT blocks → fetch the cited comment
- command output → reproducible via the same command

Ordering of Proof of Work entries is authoritative — the section appends in invocation order. Cross-reference the entry order against `git log --oneline` to confirm Tier A items closed within the same phase's commit window.

## Factual claims audit

Every numeric assertion added by the implementation (counts, ratios, durations, capacities, percentages) must have a verifiable source:

- **Reproducible command** (e.g. `git log --oneline | wc -l` → "10,385 commits")
- **API query** (e.g. `gh issue list --state all` → "1,309 issues")
- **Code reference** (e.g. `grep -c "def test_" tests/` → "N test functions")
- **Calculation** (e.g. `(total - open) / total = 99.4% close rate`)

"Seems reasonable" is not a source. If the number cannot be reproduced, it must be sourced or removed before OPUS_PASS.

## Bash output discipline

- Any command producing > 200 lines → capture to file, report path + verdict in RESULT.
- Do NOT paste full pytest output, unfiltered git diffs, or log tails back to the main agent.

## When graph is unavailable

Opus tier has the deepest graph requirements — dead-code detection via `large_functions` + orphan scan, impact scope validation via `impact(max_depth=2)`, large-function audit. When graph is unavailable or unsupported-language, an Explore-style subagent MUST perform the equivalent manual architectural audit and the RESULT must include the subagent's RESULT block. Documenting only `[graph unavailable]` = silent skip = FAIL.