File: `cli-tool/components/skills/development/ralph-review-trio/SKILL.md`

---
name: ralph-review-trio
description: Run a sequential three-tier code review on a finished implementation branch — Haiku (surface) → Sonnet (logic) → Opus (deep). Restarts from Tier 1 on any tier failure. Use when a solo branch or PR is code-complete and you want structured pre-merge verification before human review.
---

# Ralph Review Trio

This skill triggers `/ralph-review`, which runs three sequential reviewer subagents at increasing depth. If any tier flags a failure, the loop restarts from Tier 1 after fixes.

## Install

The full plugin (this skill + the `/ralph-review` command + 3 reviewer agents at Haiku / Sonnet / Opus tiers) installs from the upstream marketplace:

```
/plugin marketplace add hoiung/sst3-skills
/plugin install ralph-review-trio@sst3-skills
/reload-plugins
```

Source: https://github.com/hoiung/sst3-skills (MIT). Writeup: https://hoiboy.uk/posts/shipping-ralph-review-trio/

This skill in `claude-code-templates` is the discovery entry — it documents the trigger conditions, expected output schema, and per-tier semantics. The actual command + agents ship via the marketplace install above.

## When to trigger

- An implementation is code-complete on a feature / solo branch.
- All acceptance criteria for the underlying issue are believed satisfied.
- Pre-merge verification is needed before human review or merge-to-main.
- You want structured evidence that each tier's checklist was walked.

## When NOT to trigger

- Work-in-progress branches mid-implementation — Ralph assumes the change is complete.
- Documentation-only diffs with no code — Tiers 2 and 3 still run, but most checks short-circuit to the "doc-only PR" exemption; overkill for a single README edit.
- Hotfix branches where speed dominates verification — use the project's normal PR review.

## How it works

```
/ralph-review
Tier 1 — Haiku (surface checks) ─── fail ──> restart
│ pass
Tier 2 — Sonnet (logic checks) ─── fail ──> restart
│ pass
Tier 3 — Opus (deep analysis) ─── fail ──> restart
│ pass
RALPH_PASS → merge OK
```
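The loop above can be sketched as a small controller (illustrative only; `run_tier` and `apply_fixes` are stand-ins for the real subagent dispatch and fix step):

```python
# Minimal sketch of the restart-on-fail controller. run_tier returns the
# tier's verdict; on any failure the whole sequence restarts from Tier 1.
TIERS = ["haiku", "sonnet", "opus"]

def ralph_review(run_tier, apply_fixes, max_cycles=10):
    """Run tiers in order; any failure restarts from Tier 1 after fixes."""
    for _ in range(max_cycles):
        for tier in TIERS:
            verdict = run_tier(tier)       # "pass" or "fail"
            if verdict != "pass":
                apply_fixes(tier)          # fix findings, then restart
                break                      # back to Tier 1
        else:
            return "RALPH_PASS"            # all three tiers passed in one cycle
    return "RALPH_ABORT"                   # cycle budget exhausted
```

The `for...else` falls through to `RALPH_PASS` only when no tier broke out of the inner loop, which is exactly the restart-on-fail semantics.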

Each tier emits a `<promise>HAIKU_PASS</promise>` / `<promise>SONNET_PASS</promise>` / `<promise>OPUS_PASS</promise>` token on pass. All three are required for an overall pass.

## Entry point

`/ralph-review` — installed by the upstream plugin (see Install above).

## Per-tier checklists

The full per-tier checklists ship in the upstream plugin's `agents/` directory: `haiku-reviewer.md` (Tier 1 surface), `sonnet-reviewer.md` (Tier 2 logic), `opus-reviewer.md` (Tier 3 deep-analysis).

Extended reference content under `references/` (same dir as this SKILL.md) is loaded on demand by each tier when a specific check requires more context.

## Outputs

Each tier writes a fenced `## RESULT` block with:

```
## RESULT
mcp_graph_available: yes|no # first line when discussing graph queries
verdict: pass|fail|unknown
files_touched: [paths]
findings: [{path, line, claim, evidence}]
scope_gaps: [list or "none"]
```

The main agent reads the RESULT block and decides next action (restart, next tier, or PASS).
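A minimal sketch of reading the scalar fields out of a RESULT block, plus the decision step (field names follow the schema above; list fields are left to a fuller parser):

```python
def parse_result_block(text):
    """Extract top-level scalar fields (e.g. verdict) from a RESULT block."""
    fields = {}
    in_block = False
    for line in text.splitlines():
        if line.strip() == "## RESULT":
            in_block = True
            continue
        # top-level "key: value" lines only; skip indented and list lines
        if in_block and ":" in line and not line.startswith((" ", "-", "\t")):
            key, _, value = line.partition(":")
            fields[key.strip()] = value.split("#")[0].strip()  # drop comments
    return fields

def next_action(verdict, tier_index, last_tier=2):
    """Restart on fail, advance on pass, PASS only after the last tier."""
    if verdict != "pass":
        return "restart"
    return "PASS" if tier_index == last_tier else "next_tier"
```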
# Example Ralph Review transcript

A scrubbed example of a Ralph Review pass over a small Python feature — adding a retry wrapper around an HTTP client. Branch: `solo/issue-42-http-retry`. Issue: "Add configurable retries to the OrderClient HTTP calls".

The three tiers ran in sequence with the controller `/ralph-review` command. Output below is condensed — real transcripts are longer.
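For orientation, the feature under review is roughly this shape (a reconstruction for illustration; the actual `OrderClient` wiring is not shown in the transcript):

```python
import time

def retry_call(fn, max_retries=3, base_delay=0.1, retryable=(ConnectionError,)):
    """Call fn, retrying on retryable exceptions with exponential backoff."""
    for attempt in range(1, max_retries + 1):
        try:
            return fn()
        except retryable:
            if attempt == max_retries:
                raise  # fail fast: exhausted retries propagate to the caller
            time.sleep(base_delay * 2 ** (attempt - 1))
```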

---

## User invocation

```
> /ralph-review
```

## Controller — Tier 1 dispatch

Main agent dispatches `haiku-reviewer` against the current branch diff.

## Tier 1 — Haiku RESULT

```
## RESULT
mcp_graph_available: yes
verdict: fail
files_touched:
- src/clients/order_client.py
- src/clients/retry.py
- tests/test_retry.py
findings:
- path: src/clients/retry.py
line: 34
claim: debug print statement left in
evidence: "print(f'retrying {attempt} of {max_retries}')"
- path: src/clients/order_client.py
line: 78
claim: magic number in code (max_retries=3)
evidence: "self._retry_wrapper(max_retries=3) — should be in config"
scope_gaps: none
```

Main agent applies fixes:
- Remove the `print()` (commit `a3f12e9`)
- Move `max_retries=3` to `config/clients.yaml` under `order_client.max_retries`, read via `config.get('order_client.max_retries')` (commit `b84c5d1`)
- Run tests, all pass.

**Loop restarts from Tier 1** (restart-on-fail, not continue-with-flag).

## Tier 1 — Haiku RESULT (2nd iteration)

```
## RESULT
mcp_graph_available: yes
verdict: pass
files_touched:
- src/clients/order_client.py
- src/clients/retry.py
- tests/test_retry.py
- config/clients.yaml
findings: []
scope_gaps: none
```

`<promise>HAIKU_PASS</promise>`

## Tier 2 — Sonnet RESULT

```
## RESULT
mcp_graph_available: yes
verdict: fail
files_touched:
- src/clients/order_client.py
- src/clients/retry.py
findings:
- path: src/clients/retry.py
line: 22
claim: exception swallowed without logging
evidence: "except requests.RequestException: pass # silent fallback — no log, no metric"
- path: tests/test_retry.py
line: 51
claim: mock accepts **kwargs — does not prove propagation
evidence: "mock_call.assert_called() # should be assert_called_with(max_retries=3)"
scope_gaps: none
```

Main agent applies fixes:
- Replace the silent `except` with a structured log + a `retries_exhausted` counter metric (commit `c19f7a2`)
- Update test to `assert mock_call.call_args.kwargs['max_retries'] == 3` (commit `d55b908`)
- Run full test suite.
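The difference between the weak and the strengthened assertion, sketched with `unittest.mock` (names are illustrative):

```python
from unittest.mock import Mock

mock_call = Mock()
mock_call(url="/orders", max_retries=3)

# Weak: passes no matter what arguments were used; proves only that a
# call happened, not that the configured value propagated.
mock_call.assert_called()

# Strong: pins the exact kwarg the config layer is supposed to supply.
assert mock_call.call_args.kwargs["max_retries"] == 3
```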

**Loop restarts from Tier 1**.

## Tier 1 — Haiku RESULT (3rd iteration)

```
verdict: pass
findings: []
```

`<promise>HAIKU_PASS</promise>`

## Tier 2 — Sonnet RESULT (2nd iteration)

```
verdict: pass
findings: []
```

`<promise>SONNET_PASS</promise>`

## Tier 3 — Opus RESULT

```
## RESULT
mcp_graph_available: yes
verdict: pass
files_touched:
- src/clients/order_client.py
- src/clients/retry.py
- tests/test_retry.py
- config/clients.yaml
findings: []
scope_gaps: none
governance_drift: none_detected
- Tier A box "Add retry wrapper" closed at commit b84c5d1 — interleaves with phase 1 commits
- Tier A box "Move max_retries to config" closed at commit b84c5d1 — same phase window
- Tier B box "Fix Everything applied" closed at end — acceptable
spot_check:
- graph_result: callers_of(_retry_wrapper) returned 1 caller
- source_read: src/clients/order_client.py:78 — confirmed single call site
factual_claims: none added in this diff
```

`<promise>OPUS_PASS</promise>`

## Controller — final output

```
<promise>RALPH_PASS</promise>

Three tiers passed. Two restart cycles (Tier 1 × 2 iterations, Tier 2 × 1 iteration).
Total findings fixed: 4.
Branch ready for merge.
```

---

## What this transcript demonstrates

- **Restart-on-fail**: Tier 2 found issues after Tier 1 already passed. Loop went back to Tier 1, not to "Tier 3 with a flag".
- **Cross-boundary contract**: the `**kwargs`-swallowing mock was caught at Tier 2 — surface tier doesn't trace call-args assertions.
- **Governance cadence**: Opus tier classified the three checkbox closures correctly (two Tier A interleaved, one Tier B batched).
- **Graph spot-check**: Opus used graph to list callers, then read the source to confirm — never trusted the graph alone.
- **Silent fallback caught late**: the swallowed exception was invisible at surface; Tier 2 traced the exception path to find it.
# Ralph Review — Shared 5-Culprits Reference

> **Architectural design**: Each Ralph tier scans for the same five patterns at *increasing depth*. The depth-layering is intentional. This file holds the shared framing so the category numbers and names stay in sync; each tier file keeps its own depth-appropriate bullets.

## The 5 Categories

Each tier checks these patterns. Bullets in the per-tier files differ by depth, not by category.

| # | Category | Surface (Haiku) | Logic (Sonnet) | Architectural (Opus) |
|---|---|---|---|---|
| 1 | Duplicate Code (DRY / Modularity) | Visible copy-paste | Same logic in N files | Same pattern implemented differently |
| 2 | On-the-fly Calculations (Hardcoded Settings) | Magic numbers in calculations | Inline business formulas | Calculation constants varying by env |
| 3 | Hardcoded Settings | Embedded URLs / paths / credentials | Multipliers, percentages, timeouts | User-configurable values in code |
| 4 | Obsolete / Dead Code (LMCE) | Commented-out blocks, dead TODOs | Never-called functions, unused imports | Modules never instantiated, dead endpoints |
| 5 | Silent Fallbacks (Fail Fast) | `catch{}` swallow, `\|\| default` | `.get(k, {})` chains, `try / except: pass` | Cascading defaults masking root cause |
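As a concrete instance of category 5 at the surface and logic depths, compare a silent fallback with its fail-fast rewrite (illustrative code, not from the plugin):

```python
import logging

logger = logging.getLogger("clients")

# Silent fallback (flagged): the error disappears and the caller cannot
# distinguish "looked up" from "defaulted".
def get_timeout_silent(cfg):
    try:
        return cfg["order_client"]["timeout"]
    except KeyError:
        return 30  # masks a missing required setting

# Fail fast (preferred): a missing required setting raises loudly.
def get_timeout(cfg):
    try:
        return cfg["order_client"]["timeout"]
    except KeyError:
        logger.error("missing required config: order_client.timeout")
        raise
```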

## Tier Files

- `haiku-checklist.md` — surface checks
- `sonnet-checklist.md` — logic-trace checks
- `opus-checklist.md` — architectural checks

## Rule

Tier-specific bullet content lives in the tier files. The category numbers / names live here — change them once, propagate by re-reading.
# Haiku Tier — Surface Checklist (extended reference)

This is the extended reference the Tier 1 (Haiku) agent loads when a surface check needs more context than the inline agent definition provides. For the canonical flow + RESULT block schema, see `../../../agents/haiku-reviewer.md`.

## Scope

Surface review sits at the first line of defence. It catches:

- Missing / mis-named files
- Narrative-only "evidence"
- Debug code left in the diff
- Obvious copy-paste, obvious magic numbers, obvious silent-fallback patterns
- Cross-boundary issues that are visible at the diff level (wrong column name in a WHERE, `.toFixed()` on a nullable)

It does NOT trace call graphs, does NOT cross-verify config bidirectionally, does NOT audit architectural fit. Those belong in later tiers.

## What counts as "evidence"

For a `[x]` box to pass at the surface tier, the evidence must be at least one of:

- **file:line** — a concrete pointer a reviewer can open and read
- **commit hash** — a hash visible in `git log` of the branch
- **command + output** — a reproducible command with captured output (tee log path is fine)
- **subagent RESULT comment-id** — a comment number referencing a fenced `## RESULT` block

Narrative-only evidence ("implemented the feature") is a fail.

## Common false positives

- **Intentional duplication for defence-in-depth**: a value validated at the API layer AND at the DB layer is not a DRY violation — it's a designed belt-and-braces. Flag for the subagent to confirm intent before marking as a finding.
- **Argparse defaults for optional tunables**: `parser.add_argument("--retries", default=3)` is not a "silent fallback" — the argument is optional by contract. Only flag `default=` patterns that hide REQUIRED config.
- **Sentinel values that are the contract**: `return None` from a "not found" function is not a silent failure — it's the advertised return contract. Flag only when the caller can't distinguish the sentinel from a valid value.
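The third false positive in practice (a sketch; function names are hypothetical):

```python
# Advertised contract: returns the user dict, or None when not found.
# The caller CAN distinguish the sentinel from any valid value: not a finding.
def find_user(users, user_id):
    return users.get(user_id)

# Ambiguous sentinel: 0 is both "no balance recorded" and a valid balance.
# The caller cannot tell which; THIS is the flaggable pattern.
def get_balance(balances, user_id):
    return balances.get(user_id, 0)
```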

## Bash output discipline

- Wrap any command expected to produce > 200 lines in a tee-style capture. Report only the path + verdict in RESULT.
- Do NOT paste `pytest -v` output, `git diff` of 50+ files, or unfiltered log tails back to the main agent.
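One way to honour this discipline (a hypothetical helper, not shipped with the plugin):

```python
import subprocess
import tempfile

def run_captured(cmd):
    """Run cmd, tee its full output to a temp file, return (path, returncode).

    Only the path and the verdict should be reported back in RESULT."""
    out = tempfile.NamedTemporaryFile(
        mode="w", suffix=".log", prefix="ralph-", delete=False
    )
    with out:
        proc = subprocess.run(
            cmd, shell=True, stdout=out, stderr=subprocess.STDOUT
        )
    return out.name, proc.returncode
```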

## Graph fallback

When the code-review-graph MCP is unavailable or the project uses unsupported languages (Markdown, YAML, JSON, SQL, TOML, shell), do NOT silently skip the graph-backed checks. Instead:

1. Retry once after a short delay.
2. If still unavailable, fall back to an Explore-style subagent that performs the equivalent manual audit (callers, callees, large functions).
3. Include the subagent's RESULT block in your own RESULT as `[graph unavailable: <reason>] [subagent fallback: <id>]`.

A bare `[graph unavailable]` with no subagent evidence = silent skip = FAIL.
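The fallback procedure can be sketched as follows (illustrative; `query_graph` and `run_explore_subagent` are stand-ins for the MCP call and the Explore-style subagent):

```python
import time

def graph_or_fallback(query_graph, run_explore_subagent, retry_delay=1.0):
    """Try the graph MCP, retry once, then fall back to a manual audit."""
    reason = "unknown"
    for attempt in range(2):
        try:
            return {"source": "graph", "result": query_graph()}
        except ConnectionError as exc:
            reason = str(exc)
            if attempt == 0:
                time.sleep(retry_delay)  # step 1: retry once after a delay
    # Step 2 + 3: never a bare "[graph unavailable]"; attach subagent evidence.
    return {
        "source": f"[graph unavailable: {reason}] [subagent fallback]",
        "result": run_explore_subagent(),
    }
```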
# Opus Tier — Deep Analysis Checklist (extended reference)

This is the extended reference the Tier 3 (Opus) agent loads when a deep check needs more context than the inline agent definition. For the canonical flow + RESULT block schema, see `../../../agents/opus-reviewer.md`.

## Scope

Opus is the architectural pass. It catches:

- Architectural misfit (new module violates the layering pattern in use)
- Governance drift (Tier A checkbox batching against commit cadence)
- Null propagation across the full codebase (not just the diff)
- Factual claims without provenance in documentation
- Overengineering (premature abstractions, unnecessary complexity)
- Dead code at the module / endpoint / migration level

## Governance drift — two-tier cadence rule

When auditing checkbox close-out cadence, classify every `[x]` box into one of two tiers BEFORE judging:

**Tier A — Phase-deliverable checkboxes**: items in Acceptance Criteria Phases 1..N that describe a concrete deliverable (a file edit, commit, function, section, table, example). STRICT interleaving required — Proof of Work entry order MUST interleave with git-log commit order within the same phase's commit window.

**Tier B — Cross-cutting meta-checkboxes**: Triple-Check Gate items, Engineering Requirements meta-items like "Fix Everything", Cleanup Requirements, Verification Loop self-gates, PREREQUISITE CHECKPOINT, Expected Behavior post-condition items. Batched-at-end is acceptable because these describe conditions observable ONLY after all phases complete. Closing them mid-phase would be dishonest.

**Classification heuristic**: if the checkbox text names a specific file / commit / section / function / table to build or change, it is **Tier A**. If it describes a cross-cutting condition requiring the WHOLE implementation to evaluate, it is **Tier B**. When in doubt, inspect the checkbox's section in the issue template: Acceptance Criteria sub-phases = Tier A; Triple-Check Gate / Engineering Requirements meta / Cleanup Requirements / Verification Loop self-gates / PREREQUISITE CHECKPOINT / Expected Behavior = Tier B.

**Rule**:

- Flag Tier A batching as governance drift.
- Do NOT flag Tier B batching as drift.

A Tier-B-only batch is acceptable and expected.
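The classification heuristic lends itself to a simple sketch (section names follow the issue-template sections listed above; treat the exact strings as assumptions):

```python
# Tier B sections describe cross-cutting conditions evaluable only after
# all phases complete; everything else (Acceptance Criteria phases) is Tier A.
TIER_B_SECTIONS = {
    "Triple-Check Gate",
    "Engineering Requirements",
    "Cleanup Requirements",
    "Verification Loop",
    "PREREQUISITE CHECKPOINT",
    "Expected Behavior",
}

def classify_checkbox(section):
    """Return 'A' (strict interleaving required) or 'B' (batched-at-end OK)."""
    return "B" if section in TIER_B_SECTIONS else "A"
```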

## Governance evidence signal (canonical)

The canonical audit signal for verifying that checkbox-close invocations actually happened is the `## Proof of Work` section in the issue body — NOT the GitHub timeline `edited` events.

Why: GitHub's timeline API does not emit `edited` events for an issue author's own body edits on their own issue. Since solo-workflow agents ARE the issue author in almost all cases, timeline-based audits produce a false negative for every honored invocation. The body content itself, however, is always externally readable.

Verification procedure:

1. Fetch the issue body via the project's governance MCP tool, or a plain GitHub API call.
2. Parse the `## Proof of Work` section. Each entry starts with `- **<checkbox text>**: <evidence>`.
3. For every `[x]` box in the body, there MUST be a matching entry in Proof of Work. Missing entry = drift.
4. For each entry, spot-check the cited evidence:
- file:line claims → open the file on the branch
- commit hashes → `git show <hash>`
- subagent RESULT blocks → fetch the cited comment
- command output → reproducible via the same command

Ordering of Proof of Work entries is authoritative — the section appends in invocation order. Cross-reference the entry order against `git log --oneline` to confirm Tier A items closed within the same phase's commit window.

## Factual claims audit

Every numeric assertion added by the implementation (counts, ratios, durations, capacities, percentages) must have a verifiable source:

- **Reproducible command** (e.g. `git log --oneline | wc -l` → "10,385 commits")
- **API query** (e.g. `gh issue list --state all` → "1,309 issues")
- **Code reference** (e.g. `grep -c "def test_" tests/` → "N test functions")
- **Calculation** (e.g. `(total - open) / total = 99.4% close rate`)

"Seems reasonable" is not a source. If the number cannot be reproduced, it must be sourced or removed before OPUS_PASS.

## Bash output discipline

- Any command producing > 200 lines → capture to file, report path + verdict in RESULT.
- Do NOT paste full pytest output, unfiltered git diffs, or log tails back to the main agent.

## When graph is unavailable

Opus tier has the deepest graph requirements — dead-code detection via `large_functions` + orphan scan, impact scope validation via `impact(max_depth=2)`, large-function audit. When graph is unavailable or unsupported-language, an Explore-style subagent MUST perform the equivalent manual architectural audit and the RESULT must include the subagent's RESULT block. Documenting only `[graph unavailable]` = silent skip = FAIL.