ralph/pytest harness pr by elasticdotventures · Pull Request #1 · PromptExecution/b00t-wiggums

elasticdotventures · 2026-01-31T10:59:57Z

feat: add FastMCP 3.0 server capabilities to Ralph
feat: add pytest harness

- Add FastMCP 3.0.0b1 integration with dual CLI/MCP modes
- Expose MCP tools: run_ralph_iteration, get_ralph_status, get_prd_status
- Expose MCP resources: ralph://prd, ralph://progress
- Fix ralph.sh symlink resolution (cd to script dir before uv run)
- Update default model to gpt-5.2-codex
- Add README-MCP.md with usage examples

Ralph can now be used as both:

1. CLI tool: ./ralph.sh --agent codex 3
2. MCP server: uv run --script ralphython.py --mcp --transport http

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Copilot

Pull request overview

This pull request adds FastMCP 3.0 server capabilities to Ralph and introduces a pytest testing harness. The implementation migrates the core logic from a bash script to a Python module (ralphython.py) that can run as both a CLI tool and an MCP server.

Changes:

Rewrote Ralph's core logic in Python with FastMCP 3.0 integration, exposing MCP tools for running iterations and checking status, plus resources for accessing PRD and progress files
Added pytest test suite covering argument parsing, deprecated flag handling, and PRD ingestion
Simplified ralph.sh to a thin wrapper that delegates to the Python implementation via uv

Reviewed changes

Copilot reviewed 11 out of 12 changed files in this pull request and generated 18 comments.

File	Description
ralphython.py	Main Python implementation with CLI and MCP server modes, including tools and resources
tests/test_ralphython.py	Pytest test suite for CLI functionality
tests/conftest.py	Test configuration for module imports
test_ralph_mcp.py	Integration test for MCP server capabilities
ralph.sh	Simplified bash wrapper delegating to Python
requirements-dev.txt	Development dependencies (pytest, mypy)
pytest.ini	Pytest configuration
README.md	Updated usage instructions and added testing section
README-MCP.md	New documentation for MCP features
AGENTS.md	Updated agent selection examples
tests/__pycache__/*	Compiled bytecode (should not be committed)
__pycache__/*	Compiled bytecode (should not be committed)
ralph_cli.py	Unclear purpose file containing only "ralphython.py"
=3.0.0b1	Malformed or incomplete file
ralph.sh~	Editor backup file (should not be committed)

Comments suppressed due to low confidence (1)

README.md:258

The main README.md does not mention the new MCP functionality at all, despite it being a significant feature addition mentioned in the PR description. Consider adding a reference to README-MCP.md in the main README so users are aware of the MCP capabilities.

# Ralph

![Ralph](ralph.webp)

Ralph is an autonomous AI agent loop that runs AI coding tools ([Amp](https://ampcode.com) or [Claude Code](https://docs.anthropic.com/en/docs/claude-code)) repeatedly until all PRD items are complete. Each iteration is a fresh instance with clean context. Memory persists via git history, `progress.txt`, and `prd.json`.

Based on [Geoffrey Huntley's Ralph pattern](https://ghuntley.com/ralph/).

[Read my in-depth article on how I use Ralph](https://x.com/ryancarson/status/2008548371712135632)

## Prerequisites

- One of the following AI coding tools installed and authenticated:
  - [Amp CLI](https://ampcode.com)
  - [Claude Code](https://docs.anthropic.com/en/docs/claude-code) (`npm install -g @anthropic-ai/claude-code`)
- `jq` installed (`brew install jq` on macOS)
- A git repository for your project

## Setup

### Option 1: Copy to your project

Copy the ralph files into your project:

```bash
# From your project root
mkdir -p scripts/ralph
cp /path/to/ralph/ralph.sh scripts/ralph/

# Copy the prompt template for your AI tool of choice:
cp /path/to/ralph/prompt.md scripts/ralph/prompt.md    # For Amp
# OR
cp /path/to/ralph/CLAUDE.md scripts/ralph/CLAUDE.md    # For Claude Code

chmod +x scripts/ralph/ralph.sh
```

Option 2: Install skills globally (Amp)

Copy the skills to your Amp or Claude config for use across all projects:

For AMP

cp -r skills/prd ~/.config/amp/skills/
cp -r skills/ralph ~/.config/amp/skills/

For Claude Code (manual)

cp -r skills/prd ~/.claude/skills/
cp -r skills/ralph ~/.claude/skills/

Option 3: Use as Claude Code Marketplace

Add the Ralph marketplace to Claude Code:

/plugin marketplace add snarktank/ralph

Then install the skills:

/plugin install ralph-skills@ralph-marketplace

Available skills after installation:

/prd - Generate Product Requirements Documents
/ralph - Convert PRDs to prd.json format

Skills are automatically invoked when you ask Claude to:

"create a prd", "write prd for", "plan this feature"
"convert this prd", "turn into ralph format", "create prd.json"

Configure Amp auto-handoff (recommended)

Add to ~/.config/amp/settings.json:

{
  "amp.experimental.autoHandoff": { "context": 90 }
}

This enables automatic handoff when context fills up, allowing Ralph to handle large stories that exceed a single context window.

Workflow

1. Create a PRD

Use the PRD skill to generate a detailed requirements document:

Load the prd skill and create a PRD for [your feature description]

Answer the clarifying questions. The skill saves output to tasks/prd-[feature-name].md.

2. Convert PRD to Ralph format

Use the Ralph skill to convert the markdown PRD to JSON:

Load the ralph skill and convert tasks/prd-[feature-name].md to prd.json

This creates prd.json with user stories structured for autonomous execution.

3. Run Ralph

# Using Amp
./scripts/ralph/ralph.sh [max_iterations]

# Using Claude Code
./scripts/ralph/ralph.sh --agent claude [max_iterations]

Default is 10 iterations. Use --agent amp, --agent claude, or --agent codex to select your AI coding tool.

Ralph will:

Create a feature branch (from PRD branchName)
Pick the highest priority story where passes: false
Implement that single story
Run quality checks (typecheck, tests)
Commit if checks pass
Update prd.json to mark story as passes: true
Append learnings to progress.txt
Repeat until all stories pass or max iterations reached

Testing

Install dev dependencies:

uv pip install -r requirements-dev.txt

Run tests:

uv run pytest

The pytest harness exercises the ralphython CLI end-to-end (argument parsing,
deprecated --tool handling, and PRD ingestion) without invoking Amp, Claude,
or Codex so you can validate behavior locally before handing off to agents.

Key Files

File	Purpose
`ralph.sh`	The bash loop that spawns fresh AI instances (supports `--agent amp`, `--agent claude`, or `--agent codex`)
`prompt.md`	Prompt template for Amp
`CLAUDE.md`	Prompt template for Claude Code
`prd.json`	User stories with `passes` status (the task list)
`prd.json.example`	Example PRD format for reference
`progress.txt`	Append-only learnings for future iterations
`skills/prd/`	Skill for generating PRDs (works with Amp and Claude Code)
`skills/ralph/`	Skill for converting PRDs to JSON (works with Amp and Claude Code)
`.claude-plugin/`	Plugin manifest for Claude Code marketplace discovery
`flowchart/`	Interactive visualization of how Ralph works

Flowchart

View Interactive Flowchart - Click through to see each step with animations.

The flowchart/ directory contains the source code. To run locally:

cd flowchart
npm install
npm run dev

Critical Concepts

Each Iteration = Fresh Context

Each iteration spawns a new AI instance (Amp or Claude Code) with clean context. The only memory between iterations is:

Git history (commits from previous iterations)
progress.txt (learnings and context)
prd.json (which stories are done)

Small Tasks

Each PRD item should be small enough to complete in one context window. If a task is too big, the LLM runs out of context before finishing and produces poor code.

Right-sized stories:

Add a database column and migration
Add a UI component to an existing page
Update a server action with new logic
Add a filter dropdown to a list

Too big (split these):

"Build the entire dashboard"
"Add authentication"
"Refactor the API"

AGENTS.md Updates Are Critical

After each iteration, Ralph updates the relevant AGENTS.md files with learnings. This is key because AI coding tools automatically read these files, so future iterations (and future human developers) benefit from discovered patterns, gotchas, and conventions.

Examples of what to add to AGENTS.md:

Patterns discovered ("this codebase uses X for Y")
Gotchas ("do not forget to update Z when changing W")
Useful context ("the settings panel is in component X")

Feedback Loops

Ralph only works if there are feedback loops:

Typecheck catches type errors
Tests verify behavior
CI must stay green (broken code compounds across iterations)

Browser Verification for UI Stories

Frontend stories must include "Verify in browser using dev-browser skill" in acceptance criteria. Ralph will use the dev-browser skill to navigate to the page, interact with the UI, and confirm changes work.

Stop Condition

When all stories have passes: true, Ralph outputs <promise>COMPLETE</promise> and the loop exits.

Debugging

Check current state:

# See which stories are done
cat prd.json | jq '.userStories[] | {id, title, passes}'

# See learnings from previous iterations
cat progress.txt

# Check git history
git log --oneline -10

Customizing the Prompt

After copying prompt.md (for Amp) or CLAUDE.md (for Claude Code) to your project, customize it for your project:

Add project-specific quality check commands
Include codebase conventions
Add common gotchas for your stack

Archiving

Ralph automatically archives previous runs when you start a new feature (different branchName). Archives are saved to archive/YYYY-MM-DD-feature-name/.

References

</details>




Copilot · 2026-01-31T11:10:44Z

+    try:
+        last_branch = last_branch_file.read_text().strip()
+    except Exception:
+        last_branch = ""


The broad exception handler catches all exceptions without logging them, making it difficult to debug issues when reading the last branch name fails. Consider catching specific exceptions like FileNotFoundError or IOError, and logging unexpected errors.
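
A minimal sketch of what that narrowing could look like. Only `last_branch_file` comes from the diff; the function name `read_last_branch` and the logger name `"ralph"` are hypothetical, and the choice to treat a missing file as a normal first-run case is an assumption about the intended behavior.

```python
import logging
from pathlib import Path

logger = logging.getLogger("ralph")  # hypothetical logger name


def read_last_branch(last_branch_file: Path) -> str:
    """Return the stored branch name, or "" if it cannot be read."""
    try:
        return last_branch_file.read_text().strip()
    except FileNotFoundError:
        # Assumed to be the normal first-run case: nothing recorded yet.
        return ""
    except OSError as exc:
        # Unexpected I/O problem (permissions, path is a directory, ...).
        logger.warning("Could not read %s: %s", last_branch_file, exc)
        return ""
```

Anything that is not an `OSError` (for example a decoding error) still propagates, so genuine bugs are not hidden.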

@copilot open a new pull request to apply changes based on this feedback

Copilot · 2026-01-31T11:10:45Z

+    _codex_prompt_file = Path(
+        os.environ.get("CODEX_PROMPT_FILE", str(script_dir / "prompt.md"))
+    )
+    codex_model = os.environ.get("CODEX_MODEL", "gpt-5.2-codex")


The environment variable default "gpt-5.2-codex" appears to reference a model that may not exist. As of my knowledge cutoff (January 2025), there is no publicly documented GPT-5.2 model from OpenAI. Verify that this model name is correct for your use case, or update it to a valid model identifier.

Suggested change

-    codex_model = os.environ.get("CODEX_MODEL", "gpt-5.2-codex")
+    codex_model = os.environ.get("CODEX_MODEL", "gpt-4.1")

Copilot · 2026-01-31T11:10:45Z

+def _read_branch_name(prd_file: Path) -> str:
+    try:
+        data = json.loads(prd_file.read_text())
+    except Exception:
+        return ""


The broad exception handler catches all exceptions, which can hide bugs and make debugging difficult. Consider catching specific exceptions like JSONDecodeError, FileNotFoundError, or at minimum logging the exception before returning an empty string so errors are not silently ignored.

Copilot · 2026-01-31T11:10:47Z

+    try:
+        async with Client("http://localhost:8766/mcp") as client:
+            print("✅ Connected to Ralph MCP server\n")
+
+            # Test get_prd_status tool
+            print("📋 Testing get_prd_status tool...")
+            result = await client.call_tool("get_prd_status", {})
+            print(f"   PRD Status: {result['project']}")
+            print(f"   Completed: {result['completed_stories']}/{result['total_stories']} ({result['completion_percentage']}%)\n")
+
+            # Test get_ralph_status tool
+            print("📊 Testing get_ralph_status tool...")
+            result = await client.call_tool("get_ralph_status", {})
+            print(f"   Status: {result['status']}")
+            print(f"   Total lines: {result.get('total_lines', 0)}\n")
+
+            # Test resources
+            print("📂 Testing ralph://prd resource...")
+            resources = await client.list_resources()
+            prd_resources = [r for r in resources if "prd" in r.uri]
+            if prd_resources:
+                print(f"   Found resource: {prd_resources[0].uri}\n")
+
+            print("✅ All tests passed!")
+
+    finally:
+        proc.terminate()
+        proc.wait(timeout=2)
+        print("\n🛑 Server stopped")


The test_ralph_mcp.py file lacks error handling for scenarios where the server fails to start, the connection fails, or the MCP tools return error responses. The test assumes happy-path execution and will fail with unclear errors if the server doesn't start properly or if any tool call fails. Add try-except blocks and assertions to handle and report these failure cases clearly.
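
The server-startup part of this could be handled with a small readiness check before the client connects. The sketch below uses only the standard library; `wait_for_port` is a hypothetical helper, and it assumes `proc` is the `subprocess.Popen` handle that the test later terminates.

```python
import socket
import subprocess
import time


def wait_for_port(
    proc: subprocess.Popen, host: str, port: int, timeout: float = 10.0
) -> None:
    """Block until host:port accepts connections, or raise with a clear reason."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        # If the server process already died, say so instead of timing out.
        if proc.poll() is not None:
            raise RuntimeError(f"server exited early with code {proc.returncode}")
        try:
            with socket.create_connection((host, port), timeout=0.5):
                return
        except OSError:
            time.sleep(0.1)  # not listening yet; retry shortly
    raise TimeoutError(f"server did not listen on {host}:{port} within {timeout}s")
```

In the test this would be called right after spawning the server, for example `wait_for_port(proc, "localhost", 8766)`, so a failed start is reported as such rather than as an opaque connection error. Tool results could then be checked with plain `assert` statements instead of only being printed.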

@copilot open a new pull request to apply changes based on this feedback

Copilot · 2026-01-31T11:10:48Z

+    total = len(prd_data["userStories"])
+    completed = sum(1 for story in prd_data["userStories"] if story.get("passes", False))
+    incomplete = [
+        {"id": s["id"], "title": s["title"]}
+        for s in prd_data["userStories"]
+        if not s.get("passes", False)
+    ]
+
+    return {
+        "status": "loaded",
+        "project": prd_data.get("project", "Unknown"),
+        "total_stories": total,
+        "completed_stories": completed,
+        "completion_percentage": round((completed / total) * 100, 1) if total > 0 else 0,
+        "incomplete_stories": incomplete[:5],  # First 5 incomplete


The function assumes the PRD data structure has a "userStories" key. If this key is missing, the function will raise a KeyError. Add validation to check if required keys exist and return an appropriate error status if the PRD structure is invalid.
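
A sketch of the validation, keeping the return shape from the diff. The helper name `summarize_prd` and the `"status": "error"` payload are assumptions for illustration; the PR may prefer a different error convention.

```python
from typing import Any


def summarize_prd(prd_data: Any) -> dict[str, Any]:
    """Hypothetical helper mirroring the return shape in the diff."""
    stories = prd_data.get("userStories") if isinstance(prd_data, dict) else None
    if not isinstance(stories, list) or not all(isinstance(s, dict) for s in stories):
        # Assumed error shape: the caller can check "status" before using counts.
        return {"status": "error", "error": "PRD has no valid 'userStories' list"}

    incomplete = [
        {"id": s.get("id"), "title": s.get("title")}
        for s in stories
        if not s.get("passes", False)
    ]
    total = len(stories)
    completed = total - len(incomplete)
    return {
        "status": "loaded",
        "project": prd_data.get("project", "Unknown"),
        "total_stories": total,
        "completed_stories": completed,
        "completion_percentage": round(completed / total * 100, 1) if total else 0,
        "incomplete_stories": incomplete[:5],  # first 5 incomplete
    }
```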

@copilot open a new pull request to apply changes based on this feedback

Copilot · 2026-01-31T11:10:48Z

+    _codex_prompt_file = Path(
+        os.environ.get("CODEX_PROMPT_FILE", str(script_dir / "prompt.md"))
+    )


Variable _codex_prompt_file is not used.

Suggested change

-    _codex_prompt_file = Path(
-        os.environ.get("CODEX_PROMPT_FILE", str(script_dir / "prompt.md"))
-    )

where/how should it be used?