Commit 6893416

Kasper Junge and claude authored and committed
docs: refine ralph standard blog post based on feedback
Rewrite skill comparison to emphasize inner/outer loop distinction and intentional format familiarity. Remove harness engineering and autoresearch as inspirations. Focus on controlling the outer loop and injecting context into the inner loop. Add ralph add install instructions and cookbook link.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 2126496 commit 6893416

1 file changed

Lines changed: 33 additions & 31 deletions

docs/blog/posts/the-ralph-standard.md

@@ -10,15 +10,15 @@ keywords: RALPH.md format, agent loop standard, autonomous coding format, ralph
 
 # An agent skill-like standard for autonomous agent loops
 
-I've spent the last few weeks messing around with [ralph loops](https://ghuntley.com/ralph/) running an agent against a prompt in a while loop. The more I used them, the more I wanted a reusable format: deterministic scripts between iterations, their output optionally injected into the prompt, and a way to parametrize the whole thing so one loop definition works across projects.
+I've spent the last few weeks messing around with [ralph loops](https://ghuntley.com/ralph/) - running an agent against a prompt in a while loop. The more I used them, the more I wanted a reusable format: deterministic scripts between iterations, their output optionally injected into the prompt, and a way to parametrize the whole thing so one loop definition works across projects.
 
 So I designed one.
 
 <!-- more -->
 
 ## The format
 
-A ralph is a self-contained directory. The only required file is `RALPH.md` everything else is optional context:
+A ralph is a self-contained directory. The only required file is `RALPH.md` - everything else is optional context:
 
 ```
 bug-hunter/
@@ -78,18 +78,18 @@ Find and fix a real bug in this codebase.
 
 Each iteration:
 
-1. **Read code** pick a module and read it carefully. Look for
+1. **Read code** - pick a module and read it carefully. Look for
 edge cases, off-by-one errors, missing validation, incorrect
 error handling, race conditions, or logic errors.
-2. **Write a failing test** prove the bug exists with a test
+2. **Write a failing test** - prove the bug exists with a test
 that fails on the current code.
-3. **Fix the bug** make the test pass with a minimal fix.
-4. **Verify** all existing tests must still pass.
+3. **Fix the bug** - make the test pass with a minimal fix.
+4. **Verify** - all existing tests must still pass.
 
 ## Rules
 
 - One bug per iteration
-- The bug must be real do not invent hypothetical issues
+- The bug must be real - do not invent hypothetical issues
 - Always write a regression test before fixing
 - Do not change unrelated code
 - Commit with `fix: resolve <description>`
@@ -99,28 +99,18 @@ Each iteration:
 
 The whole format is four things:
 
-1. **`agent`** the command to run (anything that reads stdin)
-2. **`commands`** deterministic feedback commands that run between iterations
-3. **`args`** declared arguments to parametrize the ralph from the command line
-4. **A prompt body** with `{{ placeholders }}` for command output and arguments
+1. **`agent`** - the command to run (anything that reads stdin)
+2. **`commands`** - deterministic feedback commands that run between iterations
+3. **`args`** - declared arguments to parametrize the ralph from the command line
+4. **A prompt body** - with `{{ placeholders }}` for command output and arguments
 
-Each iteration: run the commands, optionally inject their output into the prompt via `{{ commands.<name> }}`, resolve `{{ args.<name> }}` placeholders for ad-hoc steering, pipe the assembled prompt to the agent, agent does its thing, repeat. Fresh context every cycle. State, progress, strategy — it all lives in the project's filesystem. Git history, markdown docs, plan files, whatever makes sense. The format doesn't prescribe where state goes.
+Each iteration: run the commands, optionally inject their output into the prompt via `{{ commands.<name> }}`, resolve `{{ args.<name> }}` placeholders for ad-hoc steering, pipe the assembled prompt to the agent, agent does its thing, repeat. Fresh context every cycle.
 
 ## Design decisions
 
-**Why a directory, not just a file?** Same reason the [Agent Skills](https://agentskills.io/) format uses a directory. A `RALPH.md` on its own is enough for simple loops, but real-world loops often need a shell script for a custom check (`./check-coverage.sh`), reference docs for progressive disclosure (`coding-guidelines.md`, `architecture.md`), data files, templates. Commands starting with `./` run relative to the ralph directory, so bundled scripts just work. The directory is the unit of sharing — copy it, check it into a repo, and the whole loop travels together.
+**Why a directory, not just a file?** Same reason the [Agent Skills](https://agentskills.io/) format uses a directory. A `RALPH.md` on its own is enough for simple loops, but ralph loops often benefit from being bundled with shell scripts for custom checks and context injection and reference docs for progressive disclosure (`coding-guidelines.md`, `architecture.md`). Commands starting with `./` run relative to the ralph directory, so bundled scripts just work. The directory then is the unit of sharing.
 
-**Why not just make it a skill?** They look similar on the surface — both are directories with a markdown file and optional bundled resources. But a skill is loaded once when an agent decides it's relevant. It adds knowledge to a single session. A ralph is executed repeatedly — it's the outer loop that launches the agent, feeds it deterministic feedback, and kicks off the next iteration. Skills live inside an agent's context. Ralphs live outside, orchestrating from the outside in. You could use both together — a ralph that runs an agent which has skills installed. Complementary layers, not competing ones.
-
-## What shaped the format
-
-Two things influenced the design more than anything else. OpenAI's [harness engineering](https://openai.com/index/harness-engineering/) post — build deterministic infrastructure around the agent, keep progress as markdown in the codebase, don't try to make the agent smarter. And Karpathy's [autoresearch](https://github.com/karpathy/autoresearch) — one hard metric, ~700 experiments in two days, changes that don't improve the number get reverted.
-
-`commands` in a ralph are the mechanism for all of this. They're not just ground truth — they're the control structure around the loop. Enforce file and directory conventions. Run checks. Inject dynamic context. Gate progress. The deterministic scaffolding that lets you trust the agent to operate autonomously.
-
-The easy wins are tasks with hard metrics — test coverage, validation loss, a reference implementation to compare against. But I think ralph loops have the potential to take on much fuzzier, higher-level work. A PRD. A loose description of a desired outcome. A strategic goal. For that kind of work, how you frame the outcome matters more than how specifically you instruct the agent. Too specific and the agent overfits to your instructions. Too vague and it drifts. There's a weird golden balance, and I've been reaching for Jobs-to-be-Done as a prompting technique — express the outcome to optimize for, not the steps to take.
-
-That's what I want to build towards — a format and a tool that enable increasingly ambitious and fuzzy things to be achieved with ralph loops. Because AI is truly powerful when it surprises you with solutions to problems you didn't anticipate when you kicked off the loop. True discovery. The iterative, fresh-context way of working makes that possible in a way that single-shot prompting doesn't. A good ralph engineer figures out how to get results with agents that are as autonomous as possible — because that means the strategy and outcome definition are good enough for the agent to make decisions you couldn't have predicted. That's the power of it. And honestly why I keep rabbit-holing on all of this.
+**Why not just make it a skill?** They look similar on the surface - both are directories with a markdown file and optional bundled resources. That similarity is intentional - the skill format has become familiar to a lot of people, and borrowing its shape makes ralphs easy to understand at a glance. But they serve different layers. A skill provides knowledge about reusable processes in the inner loop - the agent's session. A ralph steers the outer loop by running code between iterations to deterministically control the environment and optionally inject context into the inner loop before kicking off the next iteration.
 
 ## Try it
 
@@ -129,23 +119,35 @@ I'm building a tool called [Ralphify](https://github.com/computerlovetech/ralphi
 ```bash
 uv tool install ralphify
 
-# point the bug hunter at a specific area
-ralph run bug-hunter --focus "authentication and session handling"
+# point it at a directory containing a RALPH.md
+ralph run ./ralphs/bug-hunter --focus "authentication and session handling"
 
 # same ralph, different focus
-ralph run bug-hunter --focus "edge cases in the payment flow"
+ralph run ./ralphs/bug-hunter --focus "edge cases in the payment flow"
 
-# or run it without args unmatched placeholders just resolve to empty
-ralph run bug-hunter
+# or run it without args - unmatched placeholders just resolve to empty
+ralph run ./ralphs/bug-hunter
 ```
 
 Declare `args: [focus]` and you get `--focus` on the CLI. The value fills `{{ args.focus }}` in the prompt. One ralph, many use cases.
 
-Ralphify is just one implementation though. The format itself is what I care about most — it's just YAML frontmatter and markdown. Any tool could read it and run the loop. I can't predict what will end up being useful here. But I built this, and maybe someone else finds the format interesting enough to build on or take in a direction I haven't thought of. The Agent Skills format started as one team's idea and ended up adopted by dozens of agents. I don't know if the same thing happens here, but the format is simple enough that it could.
+Because ralphs are just directories in a git repo, anyone can share them. If a repo contains a directory with a `RALPH.md`, you can install it with `ralph add`:
+
+```bash
+# install a specific ralph from any GitHub repo
+ralph add owner/repo/ralph-name
+
+# install all ralphs in a repo
+ralph add owner/repo
+```
+
+The [ralphify examples](https://github.com/computerlovetech/ralphify/tree/main/examples) are a good place to start — and the [cookbook](https://ralphify.co/docs/cookbook/) has more.
+
+Ralphify is just one implementation though. The format itself is what I care about most - it's just YAML frontmatter and markdown. Any tool could read it and run the loop. I can't predict what will end up being useful here. But I built this, and maybe someone else finds the format interesting enough to build on or take in a direction I haven't thought of.
 
 ## I'd love feedback
 
-This is where my thinking landed, but I'm sure there are blind spots. If you're running agent loops for coding, research, testing, or something I haven't thought of I'd genuinely like to hear what you think.
+This is where my thinking landed, but I'm sure there are blind spots. If you're running agent loops - for coding, research, testing, or something I haven't thought of - I'd genuinely like to hear what you think.
 
 - **Share a use case**: [open an issue](https://github.com/computerlovetech/ralphify/issues) describing how you'd use this, or how you already run agent loops. The weird, unexpected ones are the most useful.
 - **Poke holes in the format**: if something feels wrong or missing, I want to know.

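The post in this commit describes each iteration as: run the commands, inject their output via `{{ commands.<name> }}`, resolve `{{ args.<name> }}`, pipe the assembled prompt to the agent, repeat with fresh context. Below is a minimal Python sketch of that cycle under stated assumptions. It is not Ralphify's implementation. It assumes the YAML frontmatter has already been parsed into `agent`, `commands` (a name-to-command mapping) and declared `args`. The function names (`parse_ralph_args`, `render_prompt`, `run_command`, `run_loop`), the placeholder regex, the `max_iterations` cap, and running `./` commands with the ralph directory as working directory are all hypothetical choices made for illustration.

```python
from __future__ import annotations

import argparse
import re
import subprocess
from pathlib import Path

# Hypothetical pattern: {{ commands.<name> }} or {{ args.<name> }}, optional spaces.
PLACEHOLDER = re.compile(r"\{\{\s*(commands|args)\.([A-Za-z0-9_-]+)\s*\}\}")


def parse_ralph_args(declared: list[str], argv: list[str]) -> dict[str, str]:
    """Turn declared args (e.g. ["focus"]) into --flags and parse argv."""
    parser = argparse.ArgumentParser(prog="ralph")
    for name in declared:
        parser.add_argument(f"--{name}")
    namespace = parser.parse_args(argv)
    values: dict[str, str] = {}
    for name in declared:
        # argparse stores --some-name under the attribute some_name
        value = getattr(namespace, name.replace("-", "_"))
        if value is not None:  # flags that were not passed are left out
            values[name] = value
    return values


def render_prompt(body: str, outputs: dict[str, str], args: dict[str, str]) -> str:
    """Fill placeholders; any placeholder without a value becomes ''."""
    sources = {"commands": outputs, "args": args}

    def substitute(match: re.Match[str]) -> str:
        kind, name = match.group(1), match.group(2)
        return sources[kind].get(name, "")

    return PLACEHOLDER.sub(substitute, body)


def run_command(cmd: str, ralph_dir: Path, project_dir: Path) -> str:
    """Run one feedback command and return its stdout followed by stderr."""
    # Assumption: "./" commands are bundled scripts, so they run with the
    # ralph directory as working directory; everything else runs in the project.
    cwd = ralph_dir if cmd.startswith("./") else project_dir
    result = subprocess.run(cmd, shell=True, cwd=cwd, capture_output=True, text=True)
    return result.stdout + result.stderr


def run_loop(agent: str, commands: dict[str, str], body: str, args: dict[str, str],
             ralph_dir: Path, project_dir: Path, max_iterations: int) -> None:
    for _ in range(max_iterations):
        outputs = {name: run_command(cmd, ralph_dir, project_dir)
                   for name, cmd in commands.items()}
        prompt = render_prompt(body, outputs, args)
        # A new agent process per iteration gives a fresh context every cycle;
        # the assembled prompt is piped to the agent on stdin.
        subprocess.run(agent, shell=True, input=prompt, text=True, cwd=project_dir)
```

With a single declared arg, `parse_ralph_args(["focus"], ["--focus", "payments"])` returns `{"focus": "payments"}`, and `render_prompt("Focus: {{ args.focus }}", {}, {})` returns `"Focus: "`, matching the post's note that unmatched placeholders resolve to empty. Plain string substitution is used here rather than a template engine to keep the sketch dependency-free.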