
Commit c4d9870

Kasper Junge and Ralphify authored and committed
docs: add blog post on the ralph standard and format design
Publishes the "An agent skill-like standard for autonomous agent loops" post covering the ralph format, design decisions, and influences. Also updates author description in .authors.yml.

Co-authored-by: Ralphify <noreply@ralphify.co>
1 parent 7208f3a

2 files changed

Lines changed: 154 additions & 1 deletion


docs/blog/.authors.yml

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 authors:
   kasper:
     name: Kasper Junge
-    description: Creator of Ralphify
+    description: Founder of computerlove.tech
     avatar: https://github.com/kasperjunge.png
Lines changed: 153 additions & 0 deletions
@@ -0,0 +1,153 @@
---
date: 2026-03-24
categories:
  - Standards
authors:
  - kasper
description: I've been obsessing over how to define autonomous agent loops in a single directory. Here's what I landed on.
---

# An agent skill-like standard for autonomous agent loops

I've spent the last few weeks messing around with [ralph loops](https://ghuntley.com/ralph/) — running an agent against a prompt in a while loop. The more I used them, the more I wanted a reusable format: deterministic scripts between iterations, their output optionally injected into the prompt, and a way to parametrize the whole thing so one loop definition works across projects.

So I designed one.

<!-- more -->

## The format

A ralph is a self-contained directory. The only required file is `RALPH.md` — everything else is optional context:

```
bug-hunter/
├── RALPH.md              # the loop definition (required)
├── check-coverage.sh     # script used by a command (optional)
├── coding-guidelines.md  # context the agent loads on demand (optional)
└── test-data.json        # whatever else the loop needs (optional)
```

The `RALPH.md` itself has YAML frontmatter that steers the loop, and a prompt body that gets assembled and piped to the agent each iteration:

```markdown
---
agent: claude -p --dangerously-skip-permissions
commands:
  - name: tests
    run: uv run pytest -x
  - name: types
    run: uv run ty check
  - name: lint
    run: uv run ruff check .
  - name: git-log
    run: git log --oneline -10
args:
  - focus
---

# Bug Hunter

You are an autonomous bug-hunting agent running in a loop.
Each iteration starts with fresh context.
Your progress lives in the code and git.

## Test results

{{ commands.tests }}

## Type checking

{{ commands.types }}

## Lint

{{ commands.lint }}

## Recent commits

{{ commands.git-log }}

If tests, types, or lint are failing, fix that before hunting
for new bugs.

## Task

Find and fix a real bug in this codebase.
{{ args.focus }}

Each iteration:

1. **Read code** — pick a module and read it carefully. Look for
   edge cases, off-by-one errors, missing validation, incorrect
   error handling, race conditions, or logic errors.
2. **Write a failing test** — prove the bug exists with a test
   that fails on the current code.
3. **Fix the bug** — make the test pass with a minimal fix.
4. **Verify** — all existing tests must still pass.

## Rules

- One bug per iteration
- The bug must be real — do not invent hypothetical issues
- Always write a regression test before fixing
- Do not change unrelated code
- Commit with `fix: resolve <description>`
```
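
A runner for this layout first has to find `RALPH.md` and separate the frontmatter from the prompt body. Below is a minimal Python sketch of that step. It assumes the frontmatter is bounded by the first two `---` lines, as in the example above. `split_frontmatter` and `load_ralph` are hypothetical names, not Ralphify's API, and the sketch leaves parsing of the YAML itself to a library.

```python
from pathlib import Path


def split_frontmatter(text: str) -> tuple[str, str]:
    """Split RALPH.md text into (frontmatter YAML, prompt body).

    Assumes the file opens with a '---' line and the frontmatter ends at
    the next '---' line; a file without frontmatter is all body.
    """
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return "", text
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            frontmatter = "".join(lines[1:index])
            body = "".join(lines[index + 1:])
            return frontmatter, body
    raise ValueError("frontmatter is missing its closing '---' line")


def load_ralph(ralph_dir: Path) -> tuple[str, str]:
    """Read a ralph directory; RALPH.md is the only file it requires."""
    ralph_file = ralph_dir / "RALPH.md"
    if not ralph_file.is_file():
        raise FileNotFoundError(f"{ralph_dir} has no RALPH.md")
    return split_frontmatter(ralph_file.read_text(encoding="utf-8"))


text = "---\nagent: cat\nargs:\n  - focus\n---\n\n# Bug Hunter\n"
frontmatter, body = split_frontmatter(text)
print(repr(frontmatter))  # 'agent: cat\nargs:\n  - focus\n'
print(repr(body))         # '\n# Bug Hunter\n'
```

With PyYAML installed, `yaml.safe_load(frontmatter)` would turn the first part into a dict with `agent`, `commands`, and `args` keys. The body is kept verbatim so its placeholders can be filled later.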

## Four things

The whole format is four things:

1. **`agent`** — the command to run (anything that reads its prompt from stdin)
2. **`commands`** — deterministic feedback commands that run between iterations
3. **`args`** — declared arguments to parametrize the ralph from the command line
4. **A prompt body** — with `{{ placeholders }}` for command output and arguments

Each iteration: run the commands, optionally inject their output into the prompt via `{{ commands.<name> }}`, resolve `{{ args.<name> }}` placeholders for ad-hoc steering, pipe the assembled prompt to the agent, let the agent do its thing, repeat. Fresh context every cycle. State, progress, strategy — it all lives in the project's filesystem. Git history, markdown docs, plan files, whatever makes sense. The format doesn't prescribe where state goes.
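
The cycle described above fits in a few dozen lines. This is a hedged Python sketch, not Ralphify's implementation, and it rests on several assumptions:

- `commands` arrive as a list of `{"name": ..., "run": ...}` dicts, mirroring the example frontmatter.
- Each command runs through the shell from the project directory.
- stdout and stderr are concatenated as the injected output.
- Exit codes are ignored.
- Unmatched placeholders become empty strings.

`run_commands`, `assemble_prompt`, `run_loop`, and the `max_iterations` cap are hypothetical names.

```python
import re
import subprocess

# {{ commands.<name> }} or {{ args.<name> }}; names may contain hyphens (git-log).
PLACEHOLDER = re.compile(r"\{\{\s*(commands|args)\.([\w-]+)\s*\}\}")


def run_commands(commands: list[dict[str, str]], cwd: str) -> dict[str, str]:
    """Run every feedback command and capture its output by name."""
    outputs = {}
    for command in commands:
        result = subprocess.run(
            command["run"], shell=True, cwd=cwd, capture_output=True, text=True
        )
        # Failing tests or lint errors are still useful context, so keep stderr too.
        outputs[command["name"]] = result.stdout + result.stderr
    return outputs


def assemble_prompt(body: str, outputs: dict[str, str], args: dict[str, str]) -> str:
    """Fill placeholders; anything without a value becomes an empty string."""
    sources = {"commands": outputs, "args": args}
    return PLACEHOLDER.sub(lambda m: sources[m.group(1)].get(m.group(2), ""), body)


def run_loop(agent: str, commands: list[dict[str, str]], body: str,
             args: dict[str, str], cwd: str = ".", max_iterations: int = 3) -> None:
    for _ in range(max_iterations):
        outputs = run_commands(commands, cwd)  # fresh, deterministic feedback
        prompt = assemble_prompt(body, outputs, args)
        # A new agent process per iteration: no conversation carries over,
        # so any state has to live in the project's files and git history.
        subprocess.run(agent, shell=True, cwd=cwd, input=prompt, text=True)
```

Passing the `agent` string from the example frontmatter would pipe each assembled bug-hunter prompt to that command on stdin. Because every iteration spawns a new process, fresh context comes for free; the only memory is whatever the agent leaves on disk.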

## Design decisions

**Why a directory, not just a file?** Same reason the [Agent Skills](https://agentskills.io/) format uses a directory. A `RALPH.md` on its own is enough for simple loops, but real-world loops often need a shell script for a custom check (`./check-coverage.sh`), reference docs for progressive disclosure (`coding-guidelines.md`, `architecture.md`), data files, templates. Commands starting with `./` run relative to the ralph directory, so bundled scripts just work. The directory is the unit of sharing — copy it, check it into a repo, and the whole loop travels together.
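
One way a runner could honor the `./` rule is to rewrite the leading script path against the ralph directory before handing the command to the shell. The sketch below assumes that reading: the script path resolves relative to the ralph directory, while the command still runs from the project, so `./check-coverage.sh` can inspect the project's code. `resolve_command` is a hypothetical helper, and the `--min 80` flag is invented for illustration. Ralphify may implement the rule differently, for example by changing the working directory instead.

```python
import shlex
from pathlib import Path


def resolve_command(run: str, ralph_dir: Path) -> str:
    """Point './script' commands at the ralph directory; leave others alone."""
    if not run.startswith("./"):
        return run  # e.g. "uv run pytest -x" runs as written
    script, _, rest = run.partition(" ")
    # pathlib drops the "./" segment when joining, giving <ralph_dir>/<script>.
    resolved = shlex.quote(str(ralph_dir / script))
    return f"{resolved} {rest}".rstrip()


ralph_dir = Path("/home/me/ralphs/bug-hunter")
print(resolve_command("./check-coverage.sh --min 80", ralph_dir))
# /home/me/ralphs/bug-hunter/check-coverage.sh --min 80
print(resolve_command("uv run pytest -x", ralph_dir))
# uv run pytest -x
```

Rewriting the path rather than changing the working directory keeps commands such as `git log` pointed at the project. `shlex.quote` keeps the command safe if the ralph directory's path contains spaces.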

**Why not just make it a skill?** They look similar on the surface — both are directories with a markdown file and optional bundled resources. But a skill is loaded once, when an agent decides it's relevant, and it adds knowledge to a single session. A ralph is executed repeatedly — it's the outer loop that launches the agent, feeds it deterministic feedback, and kicks off the next iteration. Skills live inside an agent's context. Ralphs live outside it, orchestrating from the outside in. You could use both together: a ralph that runs an agent which has skills installed. Complementary layers, not competing ones.

## What shaped the format

Two things influenced the design more than anything else. OpenAI's [harness engineering](https://openai.com/index/harness-engineering/) post — build deterministic infrastructure around the agent, keep progress as markdown in the codebase, don't try to make the agent smarter. And Karpathy's [autoresearch](https://github.com/karpathy/autoresearch) — one hard metric, ~700 experiments in two days, changes that don't improve the number get reverted.

`commands` in a ralph are the mechanism for all of this. They're not just ground truth — they're the control structure around the loop. They enforce file and directory conventions, run checks, inject dynamic context, and gate progress. They're the deterministic scaffolding that lets you trust the agent to operate autonomously.

The easy wins are tasks with hard metrics — test coverage, validation loss, a reference implementation to compare against. But I think ralph loops have the potential to take on much fuzzier, higher-level work: a PRD, a loose description of a desired outcome, a strategic goal. For that kind of work, how you frame the outcome matters more than how specifically you instruct the agent. Too specific and the agent overfits to your instructions. Too vague and it drifts. There's a delicate balance to strike, and I've been reaching for Jobs-to-be-Done as a prompting technique: express the outcome to optimize for, not the steps to take.

That's what I want to build towards: a format and a tool that let ralph loops achieve increasingly ambitious and fuzzy things. AI is at its most powerful when it surprises you with solutions to problems you didn't anticipate when you kicked off the loop. True discovery. The iterative, fresh-context way of working makes that possible in a way that single-shot prompting doesn't. A good ralph engineer figures out how to get results with agents that are as autonomous as possible, because that means the strategy and outcome definition are good enough for the agent to make decisions you couldn't have predicted. That's the power of it, and honestly why I keep rabbit-holing on all of this.

## Try it

I'm building a tool called [Ralphify](https://github.com/computerlovetech/ralphify) to run ralphs in this format. Arguments declared in the frontmatter become flags on the command line, so a single ralph works across different contexts:

```bash
uv tool install ralphify

# point the bug hunter at a specific area
ralph run bug-hunter --focus "authentication and session handling"

# same ralph, different focus
ralph run bug-hunter --focus "edge cases in the payment flow"

# or run it without args; unmatched placeholders just resolve to an empty string
ralph run bug-hunter
```

Declare `args: [focus]` and you get `--focus` on the CLI. The value fills `{{ args.focus }}` in the prompt. One ralph, many use cases.
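
The mapping from declared `args` to CLI flags can be sketched with the standard library's `argparse`. This is an illustration under stated assumptions, not Ralphify's CLI code:

- `parse_ralph_args` is a hypothetical name.
- The runner has already read the declared names from the frontmatter.
- An omitted flag defaults to an empty string, matching the "run it without args" case above.

```python
import argparse


def parse_ralph_args(declared: list[str], argv: list[str]) -> dict[str, str]:
    """Turn each declared arg into a --flag and return the values by name."""
    parser = argparse.ArgumentParser(prog="ralph run")
    parser.add_argument("ralph", help="name or path of the ralph directory")
    for name in declared:
        # args: [focus] -> --focus; an omitted flag resolves to "".
        parser.add_argument(f"--{name}", dest=name, default="")
    namespace = parser.parse_args(argv)
    return {name: getattr(namespace, name) for name in declared}


print(parse_ralph_args(["focus"], ["bug-hunter", "--focus", "edge cases in the payment flow"]))
# {'focus': 'edge cases in the payment flow'}
print(parse_ralph_args(["focus"], ["bug-hunter"]))
# {'focus': ''}
```

The returned dict is what would fill the `{{ args.<name> }}` placeholders. Passing `dest=name` explicitly keeps hyphenated names such as a hypothetical `max-files` intact instead of letting `argparse` convert them to `max_files`.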

Ralphify is just one implementation, though. The format itself is what I care about most — it's just YAML frontmatter and markdown. Any tool could read it and run the loop. I can't predict what will end up being useful here. But I built this, and maybe someone else will find the format interesting enough to build on, or to take in a direction I haven't thought of. The Agent Skills format started as one team's idea and ended up being adopted by dozens of agents. I don't know if the same thing will happen here, but the format is simple enough that it could.

## I'd love feedback

This is where my thinking landed, but I'm sure there are blind spots. If you're running agent loops — for coding, research, testing, or something I haven't thought of — I'd genuinely like to hear what you think.

- **Share a use case**: [open an issue](https://github.com/computerlovetech/ralphify/issues) describing how you'd use this, or how you already run agent loops. The weird, unexpected ones are the most useful.
- **Poke holes in the format**: if something feels wrong or missing, I want to know.
- **Write a ralph and share it**: if you try the format and build something interesting, I'd love to see it.

[GitHub](https://github.com/computerlovetech/ralphify) | [Docs](https://ralphify.co/docs/) | [PyPI](https://pypi.org/project/ralphify/)
