Commit 1f2f338

v0.1.0 — The Forge: autoregressive codebase improvement for Claude Code
KPI-driven loop that tracks coverage/speed/quality, rotates strategies on stagnation, uses fresh-context subagents for unbiased evaluation, and exits only when all targets are simultaneously met. Built on Geoff Huntley's Ralph Wiggum loop pattern, informed by Karpathy's autoregressive philosophy and SICA's compounding iterations.
0 parents  commit 1f2f338

12 files changed

Lines changed: 1007 additions & 0 deletions

.gitignore

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
1+
.claude/
2+
.DS_Store

CHANGELOG.md

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
1+
# Changelog
2+
3+
All notable changes to this project will be documented in this file.
4+
5+
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
6+
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
7+
8+
## [0.1.0] - 2026-03-20
9+
10+
### Added
11+
- The Forge Protocol — eight-phase iteration cycle (Orient, Measure, Evaluate, Decide, Execute, Verify, Record, Complete)
12+
- 7 named strategies with automatic selection based on normalized KPI gaps
13+
- Stagnation detection and automatic strategy rotation after 3 low-delta iterations
14+
- Fresh-context evaluation via subagents every 3rd iteration (prevents anchoring bias)
15+
- Autoregressive state file (`.claude/forge-state.SESSION.md`) that persists KPIs, strategies, and lessons across iterations
16+
- Stop hook for iteration engine (compatible with Ralph Wiggum loops)
17+
- `/forge` command with `--coverage`, `--speed`, `--quality`, and `--max-iterations` options
18+
- Forge agent for spawning as a subagent on subsystems
19+
- Installer script with symlink-based setup
20+
- Multi-language support in MEASURE phase (Elixir, Python, JavaScript, Ruby, Go)
21+
- Simultaneous multi-KPI completion gate
22+
23+
[0.1.0]: https://github.com/DjinnFoundry/forge-loop/releases/tag/v0.1.0

CONTRIBUTING.md

Lines changed: 37 additions & 0 deletions
@@ -0,0 +1,37 @@
1+
# Contributing to forge-loop
2+
3+
Thanks for your interest in contributing.
4+
5+
## How to contribute
6+
7+
1. Fork the repo
8+
2. Create a feature branch (`git checkout -b my-feature`)
9+
3. Make your changes
10+
4. Test by running `./install.sh` and using `/forge` in a real project
11+
5. Commit with a clear message
12+
6. Push and open a PR
13+
14+
## What we're looking for
15+
16+
- **New strategies** — if you've found an effective codebase improvement pattern, add it to the strategy table in `SKILL.md`
17+
- **Language adapters** — test runner configurations for languages beyond the current set
18+
- **Stop hook improvements** — better completion detection, error handling
19+
- **Bug fixes** — especially around state file parsing and edge cases
20+
- **Documentation** — clearer examples, better onboarding
21+
22+
## Guidelines
23+
24+
- Keep the protocol simple. Forge's power comes from disciplined iteration, not complexity.
25+
- One change per PR. Small PRs get reviewed faster.
26+
- If adding a strategy, include the "when to use" criteria and expected impact.
27+
- Test your changes with a real forge loop before submitting.
28+
29+
## Architecture decisions
30+
31+
The skill file (`skills/forge/SKILL.md`) is the source of truth. The command and agent files reference it. If you're changing behavior, the skill file is where it lives.
32+
33+
The stop hook is designed to be compatible with Ralph Wiggum loops. Don't break that compatibility.
34+
35+
## Code of conduct
36+
37+
Be kind. Be constructive. We're all here to build better tools.

LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
1+
MIT License
2+
3+
Copyright (c) 2026 David Gil
4+
5+
Permission is hereby granted, free of charge, to any person obtaining a copy
6+
of this software and associated documentation files (the "Software"), to deal
7+
in the Software without restriction, including without limitation the rights
8+
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9+
copies of the Software, and to permit persons to whom the Software is
10+
furnished to do so, subject to the following conditions:
11+
12+
The above copyright notice and this permission notice shall be included in all
13+
copies or substantial portions of the Software.
14+
15+
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16+
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17+
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18+
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19+
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20+
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21+
SOFTWARE.

README.md

Lines changed: 240 additions & 0 deletions
@@ -0,0 +1,240 @@
1+
```
2+
· ✦ · ✦ ·
3+
✦ · ⚡ · ✦
4+
░░▒▓████▓▒░░
5+
▒▓█▀ ▀█▓▒
6+
▓█ ◆ ◆ █▓
7+
██ ╲ ╱ ██
8+
▓█ ═══⚒═══ █▓
9+
▒▓█▄ ▄█▓▒
10+
░░▒▓████▓▒░░
11+
▓██▓
12+
╔═══╧══╧═══╗
13+
║ THE FORGE ║
14+
╚══════════╝
15+
▄▄████████████▄▄
16+
▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀▀
17+
```
18+
19+
# forge-loop
20+
21+
**Autoregressive codebase improvement for [Claude Code](https://docs.anthropic.com/en/docs/claude-code).**
22+
23+
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)
24+
[![Version](https://img.shields.io/badge/version-0.1.0-green.svg)](CHANGELOG.md)
25+
26+
A structured, KPI-driven, self-correcting loop that tracks metrics (coverage, speed, quality), evaluates with fresh-context subagents, rotates strategies when stagnating, and knows when it's done.
27+
28+
```
29+
You: /forge "API controllers" --coverage 90 --speed -30%
30+
31+
Forge: Measuring baseline... 85.2% coverage, 120s
32+
Strategy: coverage-push → 15 tests for edge cases
33+
85.8% (+0.6%), 118s (-2s) ✓
34+
...iterates until all targets met simultaneously...
35+
```
36+
37+
---
38+
39+
## Standing on the shoulders of
40+
41+
- **Ralph Wiggum**: [Geoff Huntley's](https://ghuntley.com/ralph/) foundational work on autonomous AI development loops. "Deterministically bad in an undeterministic world, but eventually consistent." Forge is our implementation of the Ralph loop pattern with structured KPI tracking and strategy rotation.
42+
- **Andrej Karpathy** — The autoregressive mindset: each output becomes the next input. Karpathy's work on autoregressive models and his advocacy for [vibe coding](https://x.com/karpathy/status/1886192184808149383) informed forge's core loop design — each iteration's KPIs, findings, and lessons become the next iteration's decision context.
43+
- **Tobi Lutke** — His emphasis on tight feedback loops, continuous iteration, and measuring everything resonated deeply with our approach to autonomous improvement.
44+
- **SICA** (Self-Improving Coding Agent, [ICLR 2025 SSI-FM Workshop](https://openreview.net/forum?id=gXVQdNXqoc)) — Demonstrated that compounding iterations (17% to 53% SWE-Bench) work when the agent can select strategies based on accumulated evidence.
45+
46+
---
47+
48+
## How it works
49+
50+
### The Iteration Cycle
51+
52+
Each iteration executes one complete eight-phase cycle:
53+
54+
| Phase | What happens |
55+
|-------|-------------|
56+
| **A. Orient** | Read forge-state file, check position + trends + stagnation count |
57+
| **B. Measure** | Run tests with coverage, capture KPIs |
58+
| **C. Evaluate** | Every 3rd iteration: spawn fresh-context subagent for unbiased audit |
59+
| **D. Decide** | Pick strategy from KPI gaps + findings + lessons |
60+
| **E. Execute** | Apply ONE focused transformation |
61+
| **F. Verify** | Tests must be green, re-measure KPIs |
62+
| **G. Record** | Update forge-state with deltas + lessons (the autoregressive step) |
63+
| **H. Complete** | All targets met simultaneously? Done. Otherwise, next iteration. |
64+
65+
### Strategies
66+
67+
Forge selects from named strategies based on which KPI gap is largest:
68+
69+
| Strategy | When | Impact |
70+
|----------|------|--------|
71+
| `coverage-push` | Clear coverage gaps | Coverage |
72+
| `refactor-for-testability` | Code hard to test | Coverage |
73+
| `component-extraction` | DRY violations, repeated patterns | Coverage + Quality |
74+
| `speed-optimization` | Slow tests, sync overuse | Speed |
75+
| `dead-code-removal` | Unused code flagged by evaluation | Quality + Coverage |
76+
| `quality-polish` | Naming, complexity, clarity | Quality |
77+
| `design-system` | Duplicated UI patterns | Quality + Coverage |
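The changelog describes selection as based on "normalized KPI gaps" but does not give a formula. The sketch below is one plausible reading, not the rule defined in `SKILL.md`: each gap is the fraction of the baseline-to-target distance still remaining, and the KPI with the largest gap picks a default strategy. The `Kpi` class, `normalized_gap`, `pick_strategy`, and the KPI-to-strategy mapping are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Kpi:
    baseline: float
    current: float
    target: float


# Hypothetical default per KPI, loosely following the Impact column above.
DEFAULT_STRATEGY = {
    "coverage": "coverage-push",
    "speed": "speed-optimization",
    "quality": "quality-polish",
}


def normalized_gap(kpi: Kpi) -> float:
    """Fraction of the baseline-to-target distance still open (0 = met, 1 = no progress)."""
    span = kpi.target - kpi.baseline
    if span == 0:
        return 0.0
    remaining = (kpi.target - kpi.current) / span
    # Clamp so overshooting the target or regressing past baseline stays in [0, 1].
    return max(0.0, min(1.0, remaining))


def pick_strategy(kpis: Dict[str, Kpi]) -> Optional[str]:
    """Return the default strategy for the KPI with the largest gap, or None if all are met."""
    name, gap = max(
        ((n, normalized_gap(k)) for n, k in kpis.items()),
        key=lambda pair: pair[1],
    )
    return DEFAULT_STRATEGY[name] if gap > 0 else None
```

Because the gap is a ratio of signed distances, the same function handles KPIs that should rise (coverage) and KPIs that should fall (test time).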
78+
79+
### Stagnation Detection
80+
81+
When coverage improves by less than 0.1% for two consecutive iterations, forge increments a stagnation counter. Once the counter reaches 3, forge automatically rotates to a different strategy — the historically most effective one, or an untried one. No manual intervention needed.
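A minimal sketch of the counter and rotation described above, under stated assumptions: the counter increments on every iteration whose coverage delta is below 0.1 and resets on any larger gain, and rotation prefers the best previously tried strategy over an untried one. The README does not fix either detail, and the function names are hypothetical.

```python
from typing import Dict, List

LOW_DELTA = 0.1    # coverage gain below this counts as a low-delta iteration
ROTATE_AFTER = 3   # counter value that triggers a strategy rotation


def update_stagnation(count: int, coverage_delta: float) -> int:
    """Increment the counter on a low-delta iteration, reset it otherwise (assumed reset rule)."""
    return count + 1 if coverage_delta < LOW_DELTA else 0


def next_strategy(current: str, history: Dict[str, float], all_strategies: List[str]) -> str:
    """Pick a replacement for `current`.

    `history` maps each tried strategy to its cumulative coverage delta.
    Assumed preference: best other tried strategy with a positive delta,
    then the first untried strategy, then fall back to `current`.
    """
    tried = {name: delta for name, delta in history.items() if name != current and delta > 0}
    if tried:
        return max(tried, key=tried.get)
    for name in all_strategies:
        if name != current and name not in history:
            return name
    return current


STRATEGIES = [
    "coverage-push", "refactor-for-testability", "component-extraction",
    "speed-optimization", "dead-code-removal", "quality-polish", "design-system",
]
```

In a loop, the caller would check `count >= ROTATE_AFTER` after each Record phase and, if true, call `next_strategy` and reset the counter.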
82+
83+
### Fresh-Context Evaluation
84+
85+
Every 3rd iteration, forge spawns a subagent that audits the scope with zero knowledge of KPI targets or iteration history. This prevents anchoring bias — the agent evaluating the code has no stake in the numbers looking good.
86+
87+
---
88+
89+
## Installation
90+
91+
```bash
92+
git clone https://github.com/DjinnFoundry/forge-loop.git
93+
cd forge-loop
94+
./install.sh
95+
```
96+
97+
The installer symlinks the skill, command, and agent files into your `~/.claude/` directory.
98+
99+
**Important**: You also need to configure the stop hook that drives iteration. See [hooks/README.md](hooks/README.md) for setup instructions. If you already have the Ralph Wiggum stop hook configured, forge works with it automatically.
100+
101+
### Manual installation
102+
103+
```bash
104+
mkdir -p ~/.claude/skills/forge ~/.claude/commands ~/.claude/agents
105+
106+
cp skills/forge/SKILL.md ~/.claude/skills/forge/SKILL.md
107+
cp commands/forge.md ~/.claude/commands/forge.md
108+
cp agents/forge.md ~/.claude/agents/forge.md
109+
110+
# Stop hook — see hooks/README.md for settings.json setup
111+
```
112+
113+
---
114+
115+
## Usage
116+
117+
### Basic
118+
119+
```
120+
/forge "LiveView components" --coverage 95 --speed -20%
121+
```
122+
123+
### All options
124+
125+
```
126+
/forge "SCOPE" --coverage N --speed -N% --quality strict|moderate|lax --max-iterations N
127+
```
128+
129+
| Option | Default | Description |
130+
|--------|---------|-------------|
131+
| `SCOPE` | (required) | What to improve — quoted string |
132+
| `--coverage N` | baseline + 2 | Minimum coverage % target |
133+
| `--speed -N%` | -20% | Speed reduction from baseline |
134+
| `--quality` | moderate | strict (0 high, 0 med) / moderate (0 high, ≤3 med) / lax (0 high, ≤5 med) |
135+
| `--max-iterations` | 20 | Safety limit |
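To make the defaults in the table concrete, here is a hedged sketch of how the options could resolve into absolute targets and how a simultaneous completion gate could be checked. The names `Targets`, `resolve_targets`, and `all_targets_met` are hypothetical; only the default values and quality thresholds come from the table.

```python
from dataclasses import dataclass
from typing import Optional

# (max high findings, max medium findings) per quality level, from the table above.
QUALITY_LIMITS = {"strict": (0, 0), "moderate": (0, 3), "lax": (0, 5)}


@dataclass
class Targets:
    min_coverage: float
    max_speed_seconds: float
    quality: str


def resolve_targets(
    baseline_coverage: float,
    baseline_seconds: float,
    coverage: Optional[float] = None,
    speed_pct: float = -20.0,
    quality: str = "moderate",
) -> Targets:
    """Turn CLI-style options into absolute targets relative to the measured baseline."""
    min_coverage = coverage if coverage is not None else baseline_coverage + 2
    # speed_pct is negative for a reduction, e.g. -30 keeps 70% of the baseline time.
    max_seconds = baseline_seconds * (100 + speed_pct) / 100
    return Targets(min_coverage, max_seconds, quality)


def all_targets_met(t: Targets, coverage: float, seconds: float, high: int, medium: int) -> bool:
    """Completion gate: every target must hold in the same measurement."""
    max_high, max_medium = QUALITY_LIMITS[t.quality]
    return (
        coverage >= t.min_coverage
        and seconds <= t.max_speed_seconds
        and high <= max_high
        and medium <= max_medium
    )
```

With a 120 s baseline, `--speed -30%` resolves to a ceiling of 84 s.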
136+
137+
### Control
138+
139+
- **Pause**: Forge outputs `RALPH_PAUSE` when it needs your input
140+
- **Cancel**: `/cancel-ralph` stops the loop
141+
- **Resume**: Start a new session — it picks up the forge-state file
142+
143+
---
144+
145+
## State File
146+
147+
Forge persists its state in `.claude/forge-state.SESSION.md` — a YAML frontmatter + markdown log that survives context compaction. Each iteration appends its KPIs, strategy, actions, and lessons. This is the autoregressive memory.
148+
149+
```yaml
150+
---
151+
session_id: "0320-1430-a3b2"
152+
scope: "API controllers"
153+
baseline:
154+
  coverage: 85.2
155+
  speed_seconds: 120
156+
  tests: 1250
157+
  failures: 0
158+
  measured_at: "2026-03-20T14:30:00Z"
159+
targets:
160+
  min_coverage: 90.0
161+
  max_speed_seconds: 84
162+
  quality: "moderate"
163+
  max_iterations: 20
164+
current_strategy: "component-extraction"
165+
stagnation_count: 0
166+
strategies_tried:
167+
  - name: "coverage-push"
168+
    iterations: [1, 2]
169+
    coverage_delta: 0.8
170+
    speed_delta: -5
171+
lessons:
172+
  - "async:true on controller tests saves ~3s per file"
173+
---
174+
175+
## Iteration 1 — coverage-push
176+
- Coverage: 85.2 → 85.8 (+0.6%)
177+
- Speed: 120s → 118s (-2s)
178+
- Tests: 1250 → 1265 (+15)
179+
- Actions: Added 15 tests for data_loaders edge cases
180+
- Reality-check: 2 high, 3 medium findings
181+
- Lesson: "7 identical try-rescue blocks — extract, don't test each"
182+
```
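For tooling that needs to read this file, a small standard-library sketch can separate the frontmatter from the iteration log and list the logged iterations. It assumes the layout shown above (a `---` delimited header followed by `## Iteration N` headings). `split_state` and `iterations` are hypothetical helpers, and a real reader would hand the frontmatter to a YAML parser.

```python
import re
from typing import List, Tuple

FRONTMATTER = re.compile(r"\A---\n(.*?)\n---\n?(.*)\Z", re.DOTALL)
# Heading shape: "## Iteration <n> <separator> <strategy>"
HEADING = re.compile(r"^## Iteration (\d+)\s+\S+\s+(\S+)[ \t]*$", re.MULTILINE)


def split_state(text: str) -> Tuple[str, str]:
    """Return (frontmatter, markdown_log) from a forge-state document."""
    match = FRONTMATTER.match(text)
    if match is None:
        raise ValueError("state file has no '---' delimited frontmatter")
    return match.group(1), match.group(2)


def iterations(log: str) -> List[Tuple[int, str]]:
    """List (iteration number, strategy name) pairs found in the markdown log."""
    return [(int(number), strategy) for number, strategy in HEADING.findall(log)]
```

Keeping the split separate from YAML parsing means the log can still be inspected when the header is malformed.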
183+
184+
---
185+
186+
## Architecture
187+
188+
```
189+
forge-loop/
190+
├── skills/forge/SKILL.md ← The protocol (source of truth)
191+
├── commands/forge.md ← Claude Code /forge command
192+
├── agents/forge.md ← Subagent for spawning forge on subsystems
193+
├── hooks/ ← Iteration engine
194+
│ ├── README.md ← Hook setup instructions
195+
│ └── stop-hook.sh ← Stop hook script
196+
├── install.sh ← Installer script
197+
├── CHANGELOG.md
198+
├── CONTRIBUTING.md
199+
└── README.md
200+
```
201+
202+
The iteration engine uses the Ralph loop pattern: each time the Claude Code session tries to exit, the stop hook re-injects the forge prompt. The forge state file provides continuity across iterations and context compactions.
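The re-injection step can be sketched as follows. This is a Python illustration, not the project's `stop-hook.sh`. It assumes the Claude Code Stop-hook convention that printing a JSON object with `"decision": "block"` and a `reason` keeps the session running, with the reason fed back to the model. The completion marker, the prompt text, and the choice of the last-sorted state file are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional

COMPLETE_MARKER = "status: complete"  # hypothetical marker, not taken from the forge docs
FORGE_PROMPT = "Continue the forge loop: read the forge-state file and run the next iteration."


def stop_decision(state_text: Optional[str], prompt: str) -> Optional[dict]:
    """Return a 'block' decision while a forge session is unfinished, else None (allow exit)."""
    if state_text is None:                # no state file: not a forge session
        return None
    if COMPLETE_MARKER in state_text:     # assumed way of signalling completion
        return None
    return {"decision": "block", "reason": prompt}


def main() -> None:
    # Use the last forge-state file in sorted order, if any (assumed lookup rule).
    states = sorted(Path(".claude").glob("forge-state.*.md"))
    state_text = states[-1].read_text(encoding="utf-8") if states else None
    decision = stop_decision(state_text, FORGE_PROMPT)
    if decision is not None:
        print(json.dumps(decision))


if __name__ == "__main__":
    main()
```

Keeping the decision in a pure function makes the hook easy to test without a running session.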
203+
204+
---
205+
206+
## Why not just raw loops?
207+
208+
| Aspect | Raw loop | Forge |
209+
|--------|----------|-------|
210+
| KPI tracking | Ad-hoc | Structured state file with deltas + trends |
211+
| Strategy | Single prompt | 7 named strategies, auto-rotation on stagnation |
212+
| Evaluation | Self-evaluation (anchoring bias) | Fresh-context subagents every 3 iterations |
213+
| Memory | Context window only | Persistent state file survives compaction |
214+
| Completion | Manual / hope | Simultaneous multi-KPI gate |
215+
| Lessons | Lost between iterations | Accumulated, inform strategy selection |
216+
| Stagnation | Repeats same approach | Detects + rotates after low-delta iterations |
217+
218+
---
219+
220+
## Requirements
221+
222+
- [Claude Code](https://docs.anthropic.com/en/docs/claude-code) CLI
223+
- `jq` (for the stop hook)
224+
- A project with a test suite that reports coverage
225+
226+
## Adapting for other languages
227+
228+
The skill includes test runner examples for multiple languages (Elixir, Python, JavaScript, Ruby, Go). To adapt:
229+
230+
1. Edit `skills/forge/SKILL.md` — update the MEASURE phase for your test runner
231+
2. Update the coverage/speed parsing for your output format
232+
3. Everything else (strategies, stagnation, state format) is language-agnostic
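As an illustration of step 2, the sketch below pulls coverage and duration out of captured test output. It assumes pytest with a coverage.py-style terminal report (a `TOTAL ... NN%` line) and pytest's `... in 12.34s` summary. Other runners need different patterns, and `parse_pytest_output` is a hypothetical helper, not part of the skill.

```python
import re
from typing import Optional, Tuple

# Last percentage on the coverage report's TOTAL line.
TOTAL_LINE = re.compile(r"^TOTAL\s+.*?(\d+(?:\.\d+)?)%[ \t]*$", re.MULTILINE)
# Wall-clock time from the pytest summary line, e.g. "1265 passed in 118.42s".
DURATION = re.compile(r"\bin (\d+(?:\.\d+)?)s\b")


def parse_pytest_output(output: str) -> Tuple[Optional[float], Optional[float]]:
    """Return (coverage_percent, duration_seconds); either is None when not found."""
    coverage = TOTAL_LINE.search(output)
    duration = DURATION.search(output)
    return (
        float(coverage.group(1)) if coverage else None,
        float(duration.group(1)) if duration else None,
    )
```

Returning `None` for a missing value lets the Measure phase fail loudly instead of recording a bogus baseline.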
233+
234+
## Contributing
235+
236+
See [CONTRIBUTING.md](CONTRIBUTING.md).
237+
238+
## License
239+
240+
[MIT](LICENSE)

VERSION

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
0.1.0

agents/forge.md

Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
1+
---
2+
name: forge
3+
description: KPI-driven autoregressive codebase improvement. Tracks coverage/speed/quality, rotates strategies on stagnation, uses fresh-context evaluation.
4+
tools: Read, Write, Edit, Grep, Glob, Bash
5+
model: opus
6+
---
7+
8+
# Forge Agent
9+
10+
You are an expert in systematic codebase improvement — coverage maximization, performance optimization, and quality enforcement through structured, KPI-driven iteration.
11+
12+
## Core Knowledge
13+
14+
Reference the forge skill for the full protocol:
15+
@skills/forge/SKILL.md
16+
17+
## Agent-Specific Capabilities
18+
19+
As a subagent with isolated context, you can:
20+
21+
1. **Deep KPI Analysis**: Parse test output, compute deltas, detect stagnation without polluting main context
22+
2. **Multi-file Transformation**: Track and apply changes across many files in one focused strategy
23+
3. **Strategy Evaluation**: Analyze historical effectiveness of strategies and recommend rotations
24+
4. **Fresh-Context Audit**: Evaluate code quality without anchoring bias from previous iterations
25+
26+
## Extended Expertise
27+
28+
### Coverage Analysis
29+
- Identify uncovered modules from test coverage output
30+
- Prioritize by: lines uncovered * module importance
31+
- Detect testability barriers (tight coupling, side effects, private functions)
32+
33+
### Speed Optimization
34+
- Identify sync tests that could be async
35+
- Detect redundant DB operations in test setup
36+
- Find slow fixtures that could be consolidated
37+
- Measure individual test file times
38+
39+
### Quality Assessment
40+
- Code complexity (nested conditionals, long functions)
41+
- DRY violations (duplicated patterns across modules)
42+
- Design pattern opportunities (extraction, composition)
43+
- Dead code detection (unused functions, unreachable branches)
44+
45+
## Workflow
46+
47+
1. Receive forge state + scope
48+
2. Follow the Forge iteration cycle (Orient -> Measure -> Evaluate -> Decide -> Execute -> Verify -> Record)
49+
3. Return updated state + summary of changes
50+
51+
## Principles
52+
53+
- **Measure everything** — decisions based on data, not intuition
54+
- **One change at a time** — compound small improvements
55+
- **Fresh eyes** — spawn subagents for unbiased evaluation
56+
- **Learn from history** — never repeat documented failures
57+
- **Green is non-negotiable** — never record with red tests
