ralph/pytest harness pr#1
elasticdotventures commented on Jan 31, 2026
- feat: add FastMCP 3.0 server capabilities to Ralph
- feat: add pytest harness
- Add FastMCP 3.0.0b1 integration with dual CLI/MCP modes
- Expose MCP tools: run_ralph_iteration, get_ralph_status, get_prd_status
- Expose MCP resources: ralph://prd, ralph://progress
- Fix ralph.sh symlink resolution (cd to script dir before uv run)
- Update default model to gpt-5.2-codex
- Add README-MCP.md with usage examples

Ralph can now be used as both:
1. CLI tool: ./ralph.sh --agent codex 3
2. MCP server: uv run --script ralphython.py --mcp --transport http

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Pull request overview
This pull request adds FastMCP 3.0 server capabilities to Ralph and introduces a pytest testing harness. The implementation migrates the core logic from a bash script to a Python module (ralphython.py) that can run as both a CLI tool and an MCP server.
Changes:
- Rewrote Ralph's core logic in Python with FastMCP 3.0 integration, exposing MCP tools for running iterations and checking status, plus resources for accessing PRD and progress files
- Added pytest test suite covering argument parsing, deprecated flag handling, and PRD ingestion
- Simplified ralph.sh to a thin wrapper that delegates to the Python implementation via uv
Reviewed changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 18 comments.
| File | Description |
|---|---|
| ralphython.py | Main Python implementation with CLI and MCP server modes, including tools and resources |
| tests/test_ralphython.py | Pytest test suite for CLI functionality |
| tests/conftest.py | Test configuration for module imports |
| test_ralph_mcp.py | Integration test for MCP server capabilities |
| ralph.sh | Simplified bash wrapper delegating to Python |
| requirements-dev.txt | Development dependencies (pytest, mypy) |
| pytest.ini | Pytest configuration |
| README.md | Updated usage instructions and added testing section |
| README-MCP.md | New documentation for MCP features |
| AGENTS.md | Updated agent selection examples |
| `tests/__pycache__/*` | Compiled bytecode (should not be committed) |
| `__pycache__/*` | Compiled bytecode (should not be committed) |
| ralph_cli.py | Unclear purpose file containing only "ralphython.py" |
| =3.0.0b1 | Malformed or incomplete file |
| ralph.sh~ | Editor backup file (should not be committed) |
Comments suppressed due to low confidence (1)
README.md:258
- The main README.md does not mention the new MCP functionality at all, despite it being a significant feature addition mentioned in the PR description. Consider adding a reference to README-MCP.md in the main README so users are aware of the MCP capabilities.
# Ralph

Ralph is an autonomous AI agent loop that runs AI coding tools ([Amp](https://ampcode.com) or [Claude Code](https://docs.anthropic.com/en/docs/claude-code)) repeatedly until all PRD items are complete. Each iteration is a fresh instance with clean context. Memory persists via git history, `progress.txt`, and `prd.json`.
Based on [Geoffrey Huntley's Ralph pattern](https://ghuntley.com/ralph/).
[Read my in-depth article on how I use Ralph](https://x.com/ryancarson/status/2008548371712135632)
## Prerequisites
- One of the following AI coding tools installed and authenticated:
- [Amp CLI](https://ampcode.com)
- [Claude Code](https://docs.anthropic.com/en/docs/claude-code) (`npm install -g @anthropic-ai/claude-code`)
- `jq` installed (`brew install jq` on macOS)
- A git repository for your project
## Setup
### Option 1: Copy to your project
Copy the ralph files into your project:
```bash
# From your project root
mkdir -p scripts/ralph
cp /path/to/ralph/ralph.sh scripts/ralph/
# Copy the prompt template for your AI tool of choice:
cp /path/to/ralph/prompt.md scripts/ralph/prompt.md # For Amp
# OR
cp /path/to/ralph/CLAUDE.md scripts/ralph/CLAUDE.md # For Claude Code
chmod +x scripts/ralph/ralph.sh
```
### Option 2: Install skills globally (Amp)
Copy the skills to your Amp or Claude config for use across all projects:
For AMP
```bash
cp -r skills/prd ~/.config/amp/skills/
cp -r skills/ralph ~/.config/amp/skills/
```
For Claude Code (manual)
```bash
cp -r skills/prd ~/.claude/skills/
cp -r skills/ralph ~/.claude/skills/
```
### Option 3: Use as Claude Code Marketplace
Add the Ralph marketplace to Claude Code:
```bash
/plugin marketplace add snarktank/ralph
```
Then install the skills:
```bash
/plugin install ralph-skills@ralph-marketplace
```
Available skills after installation:
- `/prd` - Generate Product Requirements Documents
- `/ralph` - Convert PRDs to prd.json format
Skills are automatically invoked when you ask Claude to:
- "create a prd", "write prd for", "plan this feature"
- "convert this prd", "turn into ralph format", "create prd.json"
### Configure Amp auto-handoff (recommended)
Add to `~/.config/amp/settings.json`:
```json
{
  "amp.experimental.autoHandoff": { "context": 90 }
}
```
This enables automatic handoff when context fills up, allowing Ralph to handle large stories that exceed a single context window.
## Workflow
### 1. Create a PRD
Use the PRD skill to generate a detailed requirements document:
```
Load the prd skill and create a PRD for [your feature description]
```
Answer the clarifying questions. The skill saves output to `tasks/prd-[feature-name].md`.
### 2. Convert PRD to Ralph format
Use the Ralph skill to convert the markdown PRD to JSON:
```
Load the ralph skill and convert tasks/prd-[feature-name].md to prd.json
```
This creates `prd.json` with user stories structured for autonomous execution.
### 3. Run Ralph
```bash
# Using Amp
./scripts/ralph/ralph.sh [max_iterations]
# Using Claude Code
./scripts/ralph/ralph.sh --agent claude [max_iterations]
```
Default is 10 iterations. Use `--agent amp`, `--agent claude`, or `--agent codex` to select your AI coding tool.
Ralph will:
- Create a feature branch (from PRD `branchName`)
- Pick the highest priority story where `passes: false`
- Implement that single story
- Run quality checks (typecheck, tests)
- Commit if checks pass
- Update `prd.json` to mark story as `passes: true`
- Append learnings to `progress.txt`
- Repeat until all stories pass or max iterations reached
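The story-selection step in the list above can be sketched as follows. This is a minimal illustration, not Ralph's actual code: `userStories` and `passes` appear elsewhere in this README, but the numeric `priority` field, the rule that a lower number means higher priority, and the helper name `pick_next_story` are all assumptions.

```python
from typing import Any, Optional


def pick_next_story(prd: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Return the highest-priority story that has not passed yet, or None."""
    pending = [s for s in prd.get("userStories", []) if not s.get("passes", False)]
    if not pending:
        return None
    # Assumption: a lower "priority" number is more urgent; a missing value sorts last.
    return min(pending, key=lambda s: s.get("priority", float("inf")))


# Hypothetical PRD contents, for illustration only.
example_prd = {
    "userStories": [
        {"id": "US-001", "priority": 1, "passes": True},
        {"id": "US-002", "priority": 3, "passes": False},
        {"id": "US-003", "priority": 2, "passes": False},
    ]
}
next_story = pick_next_story(example_prd)
print(next_story["id"] if next_story else "all stories pass")
```

Returning `None` when nothing is pending gives the caller an explicit signal that the loop can stop.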
## Testing
Install dev dependencies:
```bash
uv pip install -r requirements-dev.txt
```
Run tests:
```bash
uv run pytest
```
The pytest harness exercises the ralphython CLI end-to-end (argument parsing, deprecated `--tool` handling, and PRD ingestion) without invoking Amp, Claude, or Codex so you can validate behavior locally before handing off to agents.
## Key Files
| File | Purpose |
|---|---|
| `ralph.sh` | The bash loop that spawns fresh AI instances (supports `--agent amp`, `--agent claude`, or `--agent codex`) |
| `prompt.md` | Prompt template for Amp |
| `CLAUDE.md` | Prompt template for Claude Code |
| `prd.json` | User stories with `passes` status (the task list) |
| `prd.json.example` | Example PRD format for reference |
| `progress.txt` | Append-only learnings for future iterations |
| `skills/prd/` | Skill for generating PRDs (works with Amp and Claude Code) |
| `skills/ralph/` | Skill for converting PRDs to JSON (works with Amp and Claude Code) |
| `.claude-plugin/` | Plugin manifest for Claude Code marketplace discovery |
| `flowchart/` | Interactive visualization of how Ralph works |
## Flowchart
View Interactive Flowchart - Click through to see each step with animations.
The `flowchart/` directory contains the source code. To run locally:
```bash
cd flowchart
npm install
npm run dev
```
## Critical Concepts
### Each Iteration = Fresh Context
Each iteration spawns a new AI instance (Amp or Claude Code) with clean context. The only memory between iterations is:
- Git history (commits from previous iterations)
- `progress.txt` (learnings and context)
- `prd.json` (which stories are done)
### Small Tasks
Each PRD item should be small enough to complete in one context window. If a task is too big, the LLM runs out of context before finishing and produces poor code.
Right-sized stories:
- Add a database column and migration
- Add a UI component to an existing page
- Update a server action with new logic
- Add a filter dropdown to a list
Too big (split these):
- "Build the entire dashboard"
- "Add authentication"
- "Refactor the API"
### AGENTS.md Updates Are Critical
After each iteration, Ralph updates the relevant AGENTS.md files with learnings. This is key because AI coding tools automatically read these files, so future iterations (and future human developers) benefit from discovered patterns, gotchas, and conventions.
Examples of what to add to AGENTS.md:
- Patterns discovered ("this codebase uses X for Y")
- Gotchas ("do not forget to update Z when changing W")
- Useful context ("the settings panel is in component X")
### Feedback Loops
Ralph only works if there are feedback loops:
- Typecheck catches type errors
- Tests verify behavior
- CI must stay green (broken code compounds across iterations)
### Browser Verification for UI Stories
Frontend stories must include "Verify in browser using dev-browser skill" in acceptance criteria. Ralph will use the dev-browser skill to navigate to the page, interact with the UI, and confirm changes work.
### Stop Condition
When all stories have `passes: true`, Ralph outputs `<promise>COMPLETE</promise>` and the loop exits.
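A loop that honors this stop condition might look roughly like the sketch below. The marker string comes from the sentence above and the default of 10 iterations from the Workflow section; `run_loop`, `is_complete`, and the `run_iteration` callable (a stand-in for spawning an agent and capturing its output) are hypothetical names, not Ralph's real functions.

```python
from typing import Callable

COMPLETE_MARKER = "<promise>COMPLETE</promise>"


def is_complete(agent_output: str) -> bool:
    """True when the agent's output contains the completion marker."""
    return COMPLETE_MARKER in agent_output


def run_loop(run_iteration: Callable[[], str], max_iterations: int = 10) -> int:
    """Call run_iteration() until it reports completion; return iterations used."""
    for i in range(1, max_iterations + 1):
        if is_complete(run_iteration()):
            return i
    # Marker never seen: the loop stopped because it hit the iteration cap.
    return max_iterations


# Stand-in for real agent runs: canned outputs, the third one signals completion.
fake_outputs = iter(["working", "still working", f"done {COMPLETE_MARKER}"])
iterations_used = run_loop(lambda: next(fake_outputs))
print(iterations_used)
```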
## Debugging
Check current state:
```bash
# See which stories are done
cat prd.json | jq '.userStories[] | {id, title, passes}'
# See learnings from previous iterations
cat progress.txt
# Check git history
git log --oneline -10
```
## Customizing the Prompt
After copying `prompt.md` (for Amp) or `CLAUDE.md` (for Claude Code) to your project, customize it:
- Add project-specific quality check commands
- Include codebase conventions
- Add common gotchas for your stack
## Archiving
Ralph automatically archives previous runs when you start a new feature (different `branchName`). Archives are saved to `archive/YYYY-MM-DD-feature-name/`.
References
</details>
```python
try:
    last_branch = last_branch_file.read_text().strip()
except Exception:
    last_branch = ""
```
The broad exception handler catches all exceptions without logging them, making it difficult to debug issues when reading the last branch name fails. Consider catching specific exceptions like FileNotFoundError or IOError, and logging unexpected errors.
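One way to act on this, sketched under assumptions: treat a missing file as the normal first-run case and log anything else before falling back. The function name `read_last_branch` and the `"ralph"` logger name are illustrative and not taken from ralphython.py.

```python
import logging
from pathlib import Path

logger = logging.getLogger("ralph")  # logger name is an assumption


def read_last_branch(last_branch_file: Path) -> str:
    """Return the stored branch name, or "" if it cannot be read."""
    try:
        return last_branch_file.read_text().strip()
    except FileNotFoundError:
        # Expected on a first run: no branch has been recorded yet.
        return ""
    except (OSError, UnicodeDecodeError) as exc:
        # Unexpected (permissions, I/O failure, bad bytes): log, then fall back.
        logger.warning("Could not read %s: %s", last_branch_file, exc)
        return ""
```

`FileNotFoundError` is listed first because it is a subclass of `OSError`; in Python 3 `IOError` is simply an alias of `OSError`.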
@copilot open a new pull request to apply changes based on this feedback
```python
_codex_prompt_file = Path(
    os.environ.get("CODEX_PROMPT_FILE", str(script_dir / "prompt.md"))
)
codex_model = os.environ.get("CODEX_MODEL", "gpt-5.2-codex")
```
The environment variable default "gpt-5.2-codex" appears to reference a model that may not exist. As of my knowledge cutoff (January 2025), there is no publicly documented GPT-5.2 model from OpenAI. Verify that this model name is correct for your use case, or update it to a valid model identifier.
Suggested change:
```diff
- codex_model = os.environ.get("CODEX_MODEL", "gpt-5.2-codex")
+ codex_model = os.environ.get("CODEX_MODEL", "gpt-4.1")
```
```python
def _read_branch_name(prd_file: Path) -> str:
    try:
        data = json.loads(prd_file.read_text())
    except Exception:
        return ""
```
The broad exception handler catches all exceptions, which can hide bugs and make debugging difficult. Consider catching specific exceptions like JSONDecodeError, FileNotFoundError, or at minimum logging the exception before returning an empty string so errors are not silently ignored.
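A narrower version could look like the sketch below. The rest of `_read_branch_name` is not shown in this thread, so the lookup of a top-level `branchName` key is an assumption based on the README's wording, as are the logger name and the choice to return an empty string for malformed data.

```python
import json
import logging
from pathlib import Path

logger = logging.getLogger("ralph")  # logger name is an assumption


def read_branch_name(prd_file: Path) -> str:
    """Return the PRD's branch name, or "" when it is unavailable."""
    try:
        data = json.loads(prd_file.read_text())
    except FileNotFoundError:
        # No PRD yet: a normal state, so stay quiet.
        return ""
    except (OSError, UnicodeDecodeError, json.JSONDecodeError) as exc:
        # Unreadable or malformed PRD: surface it instead of hiding it.
        logger.warning("Could not parse %s: %s", prd_file, exc)
        return ""
    # Assumption: the branch lives under a top-level "branchName" key.
    branch = data.get("branchName", "") if isinstance(data, dict) else ""
    return branch if isinstance(branch, str) else ""
```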
```python
try:
    async with Client("http://localhost:8766/mcp") as client:
        print("✅ Connected to Ralph MCP server\n")

        # Test get_prd_status tool
        print("📋 Testing get_prd_status tool...")
        result = await client.call_tool("get_prd_status", {})
        print(f" PRD Status: {result['project']}")
        print(f" Completed: {result['completed_stories']}/{result['total_stories']} ({result['completion_percentage']}%)\n")

        # Test get_ralph_status tool
        print("📊 Testing get_ralph_status tool...")
        result = await client.call_tool("get_ralph_status", {})
        print(f" Status: {result['status']}")
        print(f" Total lines: {result.get('total_lines', 0)}\n")

        # Test resources
        print("📂 Testing ralph://prd resource...")
        resources = await client.list_resources()
        prd_resources = [r for r in resources if "prd" in r.uri]
        if prd_resources:
            print(f" Found resource: {prd_resources[0].uri}\n")

        print("✅ All tests passed!")

finally:
    proc.terminate()
    proc.wait(timeout=2)
    print("\n🛑 Server stopped")
```
The test_ralph_mcp.py file lacks error handling for scenarios where the server fails to start, the connection fails, or the MCP tools return error responses. The test assumes happy-path execution and will fail with unclear errors if the server doesn't start properly or if any tool call fails. Add try-except blocks and assertions to handle and report these failure cases clearly.
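For the "server fails to start" case, one option is to poll the port before connecting and fail with a clear message if it never opens. The sketch below uses only the standard library; `wait_for_port` is a hypothetical helper, and the port 8766 in the usage comment is taken from the snippet above.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 10.0, interval: float = 0.2) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening; close it right away.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: the server is not up yet, so retry shortly.
            time.sleep(interval)
    return False


# Possible use in the test, before opening the MCP client:
#     if not wait_for_port("localhost", 8766):
#         proc.terminate()
#         raise RuntimeError("Ralph MCP server did not start listening on port 8766")
```

This only proves that the port accepts connections. Tool-level failures would still need explicit assertions on each `call_tool` result.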
@copilot open a new pull request to apply changes based on this feedback
```python
total = len(prd_data["userStories"])
completed = sum(1 for story in prd_data["userStories"] if story.get("passes", False))
incomplete = [
    {"id": s["id"], "title": s["title"]}
    for s in prd_data["userStories"]
    if not s.get("passes", False)
]

return {
    "status": "loaded",
    "project": prd_data.get("project", "Unknown"),
    "total_stories": total,
    "completed_stories": completed,
    "completion_percentage": round((completed / total) * 100, 1) if total > 0 else 0,
    "incomplete_stories": incomplete[:5],  # First 5 incomplete
```
The function assumes the PRD data structure has a "userStories" key. If this key is missing, the function will raise a KeyError. Add validation to check if required keys exist and return an appropriate error status if the PRD structure is invalid.
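A validated variant might look like the sketch below. The success payload mirrors the keys in the snippet above; the `{"status": "error", "error": ...}` shape, the function name `summarize_prd`, and the fallback values for a missing `id` or `title` are assumptions, not the format the MCP tool actually returns.

```python
from typing import Any


def summarize_prd(prd_data: Any) -> dict[str, Any]:
    """Summarize story completion, returning an error status for malformed input."""
    if not isinstance(prd_data, dict):
        return {"status": "error", "error": "PRD root must be a JSON object"}
    stories = prd_data.get("userStories")
    if not isinstance(stories, list) or not all(isinstance(s, dict) for s in stories):
        return {"status": "error", "error": "PRD needs a 'userStories' list of objects"}

    total = len(stories)
    completed = sum(1 for s in stories if s.get("passes", False))
    incomplete = [
        # .get() avoids a KeyError when a story lacks "id" or "title".
        {"id": s.get("id", "?"), "title": s.get("title", "")}
        for s in stories
        if not s.get("passes", False)
    ]
    return {
        "status": "loaded",
        "project": prd_data.get("project", "Unknown"),
        "total_stories": total,
        "completed_stories": completed,
        "completion_percentage": round((completed / total) * 100, 1) if total > 0 else 0,
        "incomplete_stories": incomplete[:5],  # First 5 incomplete
    }
```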
@copilot open a new pull request to apply changes based on this feedback
```python
_codex_prompt_file = Path(
    os.environ.get("CODEX_PROMPT_FILE", str(script_dir / "prompt.md"))
)
```
Variable _codex_prompt_file is not used.
Suggested change:
```diff
- _codex_prompt_file = Path(
-     os.environ.get("CODEX_PROMPT_FILE", str(script_dir / "prompt.md"))
- )
```
where/how should it be used?
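One possible answer, offered only as a sketch: if the Codex path is meant to honor `CODEX_PROMPT_FILE`, the variable could feed a small loader whose result is passed to the agent invocation. How ralphython.py actually hands the prompt to Codex is not shown in this thread, so `load_codex_prompt` and its `env` parameter are hypothetical.

```python
import os
from pathlib import Path
from typing import Mapping


def load_codex_prompt(script_dir: Path, env: Mapping[str, str] = os.environ) -> str:
    """Read the Codex prompt, preferring CODEX_PROMPT_FILE over the bundled prompt.md."""
    prompt_file = Path(env.get("CODEX_PROMPT_FILE", str(script_dir / "prompt.md")))
    # Raises FileNotFoundError if the file is missing, so a bad override is visible.
    return prompt_file.read_text(encoding="utf-8")
```

If no such override is intended, removing the variable as suggested is the simpler fix.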
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Brian Horakh <35611074+elasticdotventures@users.noreply.github.com>
@elasticdotventures I've opened a new pull request, #2, to work on those changes. Once the pull request is ready, I'll request review from you.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Brian Horakh <35611074+elasticdotventures@users.noreply.github.com>
@elasticdotventures I've opened a new pull request, #3, to work on those changes. Once the pull request is ready, I'll request review from you.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Brian Horakh <35611074+elasticdotventures@users.noreply.github.com>
@elasticdotventures I've opened a new pull request, #4, to work on those changes. Once the pull request is ready, I'll request review from you.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Brian Horakh <35611074+elasticdotventures@users.noreply.github.com>
Co-authored-by: elasticdotventures <35611074+elasticdotventures@users.noreply.github.com>
[WIP] Fix issues based on feedback from review on ralph/pytest harness PR
Co-authored-by: elasticdotventures <35611074+elasticdotventures@users.noreply.github.com>
Add error handling to MCP server test harness
Add PRD structure validation to prevent KeyError in get_prd_status
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: Brian Horakh <35611074+elasticdotventures@users.noreply.github.com>
