iterate V2 Implementation Plan

Based on Research of Top 20 AI Coding Agents

Executive Summary

Research of top coding agents (Aider, Claude Code, Gemini CLI, Devin, OpenCode) reveals the key differences:

Agent	Key Innovation	Success Rate
Aider	Unified diffs + flexible patching	88% self-written
Claude Code	MCP + tool calling	Best instruction following
Gemini CLI	ReAct loop + 1M context	Free, fast
Devin	Sandboxed execution + desktop use	13.86% SWE-bench
OpenCode	Provider-agnostic	Multi-model support

Core Insight: Aider's unified diff format works 3X better than custom SEARCH/REPLACE because:

LLMs are already trained on git diffs
No escaping/JSON overhead
Flexible error recovery when diffs fail

Problem Analysis

Current iterate Issues:

❌ LLM ignores SEARCH/REPLACE format - Custom format unfamiliar to LLMs
❌ No error recovery - Single diff failure = complete failure
❌ Rate limiting - Single API key, no rotation at tool level
❌ No repo mapping - LLM lacks full codebase context
❌ Test verification is post-hoc - Tests should drive development

Implementation Plan

Phase 1: Fix Core Editing (Week 1)

1.1 Replace SEARCH/REPLACE with Unified Diffs

Why: Aider showed 3X improvement using unified diffs over custom formats

Implementation:

// Current (bad):
FILE: path/to/file.go
<<<<<<< SEARCH
old code
=======
new code
>>>>>>>

// New (good) - use standard unified diff:
--- a/path/to/file.go
+++ b/path/to/file.go
@@ -1,5 +1,7 @@
 func example() {
-    // old
+    // new
+    // added
     return
 }

Files to modify:

internal/evolution/prompts_aider.go - Change format
internal/evolution/search_replace.go - Rename to diff_parser.go

1.2 Add Flexible Patching (Critical)

Why: Aider's 9X improvement with flexible error recovery

Implementation:

func ApplyUnifiedDiff(content string, diff string) (string, error) {
    // Strategy 1: Standard patch
    // Strategy 2: Normalize whitespace
    // Strategy 3: Relative indentation matching
    // Strategy 4: Split into smaller hunks
    // Strategy 5: Expand context window
}

Key techniques:

If hunk fails → try without line numbers
If whitespace fails → normalize and retry
If location fails → try fuzzy matching on content

1.3 Multi-Model Support

Why: Different models follow instructions differently

Implementation:

// Detect model and use optimal format
func GetEditFormatForModel(model string) string {
    switch {
    case strings.Contains(model, "claude"):
        return "json"  // Claude follows JSON well
    case strings.Contains(model, "gpt"):
        return "unified-diff"  // Aider's approach
    case strings.Contains(model, "gemini"):
        return "unified-diff"
    default:
        return "unified-diff"
    }
}

Phase 2: Enhanced Context (Week 1-2)

2.1 Repo Mapping (Like Aider)

Why: Aider's repo map dramatically improves LLM context

Implementation:

type RepoMap struct {
    Files     []FileNode
    Functions []FunctionNode
    Classes   []ClassNode
}

type FileNode struct {
    Path     string
    Language string
    Size     int
    Functions []string  // function names
    Classes   []string  // class names
}

Usage in prompts:

## Repository Map
- internal/evolution/
  - engine.go (500 lines)
    - func: Run(), Execute(), Verify()
    - type: Engine, Config
  - phases.go (400 lines)
    - func: Plan(), Implement(), Review()

2.2 Incremental Context Window

Why: Full context exceeds token limits

Implementation:

// Send only relevant files based on task
func SelectRelevantFiles(task string, repoMap RepoMap) []string {
    // Use embeddings or keyword matching
    // Prioritize files mentioned in task
    // Add files that import/are imported by those files
}

Phase 3: Tool-Level Rate Limiting (Week 2)

3.1 Implement Retry with Backoff

Why: Current rotation is script-level, not tool-level

Implementation:

func (e *Engine) callProviderWithRetry(ctx context.Context, req Request) (*Response, error) {
    maxRetries := 3
    baseDelay := 2 * time.Second
    
    for attempt := 0; attempt < maxRetries; attempt++ {
        resp, err := e.provider.Call(ctx, req)
        
        if isRateLimit(err) {
            delay := baseDelay * math.Pow(2, float64(attempt))
            time.Sleep(delay)
            continue
        }
        
        return resp, err
    }
    return nil, ErrMaxRetriesExceeded
}

3.2 Multi-Provider Fallback

Why: Swap providers when one fails

Implementation:

type ProviderPool struct {
    providers []Provider
    current   int
}

func (p *ProviderPool) Call(ctx context.Context, req Request) (*Response, error) {
    for i := range p.providers {
        idx := (p.current + i) % len(p.providers)
        resp, err := p.providers[idx].Call(ctx, req)
        
        if !isProviderError(err) {
            p.current = idx
            return resp, err
        }
    }
    return nil, ErrAllProvidersFailed
}

Phase 4: Test-First Workflow (Week 2)

4.1 TDD Enforcement

Why: Tests should drive development, not be added after

Implementation:

// In prompts, require test FIRST:
## Task: Fix bug in findTodos

### Step 1: Write Test (REQUIRED)
Create a test that FAILS with current code.
The test should demonstrate the bug.

### Step 2: Run Test
Verify it fails: `go test -v -run TestFindTodos`

### Step 3: Fix Code
Now fix the bug to make test pass.

### Step 4: Verify
Run: `go test -v -run TestFindTodos`
Must pass.

4.2 Test Coverage Gates

Why: Prevent partial fixes

Implementation:

func verifyTestDriven(ctx context.Context, changes []string) error {
    // 1. Verify test file was created/modified BEFORE code file
    testFiles := filterTestFiles(changes)
    codeFiles := filterNonTestFiles(changes)
    
    if len(testFiles) == 0 {
        return ErrNoTestChanges
    }
    
    // 2. Run tests - should pass
    if !runTests().Success {
        return ErrTestsFailed
    }
    
    // 3. Verify coverage didn't decrease
    if getCoverage() < getPreviousCoverage() {
        return ErrCoverageDecreased
    }
    
    return nil
}

Phase 5: Safety & Verification (Week 2-3)

5.1 Sandboxed Execution (Like Devin)

Why: Prevent destructive commands

Implementation:

type SandboxConfig struct {
    AllowedCommands []string
    DisallowedPatterns []string
    Timeout time.Duration
}

var SafeConfig = SandboxConfig{
    AllowedCommands: []string{
        "go test", "go build", "go vet", "go fmt",
        "git status", "git diff", "git add",
        "ls", "cat", "grep", "find",
    },
    DisallowedPatterns: []string{
        "rm -rf", "curl | sh", "wget | sh",
        "git push --force", "dd of=/dev/",
    },
}

func (e *Engine) validateCommand(cmd string) error {
    for _, pattern := range SafeConfig.DisallowedPatterns {
        if strings.Contains(cmd, pattern) {
            return fmt.Errorf("command blocked: %s", pattern)
        }
    }
    return nil
}

5.2 Pre-Merge Verification

Why: Catch issues before merging

Implementation:

func preMergeVerification(ctx context.Context, prNumber int) error {
    // 1. Run full test suite
    if !runTests(ctx).Success {
        return ErrTestsFailed
    }
    
    // 2. Run linting
    if !runLinters(ctx).Success {
        return ErrLintingFailed
    }
    
    // 3. Verify no protected files changed
    if hasProtectedChanges(ctx) {
        return ErrProtectedFilesModified
    }
    
    // 4. Verify code coverage maintained
    if coverageDecreased(ctx) {
        return ErrCoverageDecreased
    }
    
    return nil
}

Phase 6: Learning & Adaptation (Week 3)

6.1 Failure Pattern Memory

Why: Don't repeat mistakes

Implementation:

type FailurePattern struct {
    ErrorType     string
    Model         string
    FixSuggestion string
}

func (e *Engine) learnFromFailure(err error, model string) {
    pattern := FailurePattern{
        ErrorType: categorizeError(err),
        Model:     model,
        FixSuggestion: generateFixSuggestion(err),
    }
    
    // Save to memory
    saveToJSONL("memory/failure_patterns.jsonl", pattern)
    
    // Update next prompt with learned patterns
    e.addLearnedPatternsToPrompt()
}

6.2 Model-Specific Tuning

Why: Different models need different prompts

Implementation:

func GetSystemPromptForModel(model string) string {
    basePrompt := loadBasePrompt()
    
    switch {
    case strings.Contains(model, "claude"):
        // Claude follows instructions well, use concise format
        return basePrompt + "\n\nUse JSON for file edits."
        
    case strings.Contains(model, "gpt-4"):
        // GPT-4 works best with unified diffs
        return basePrompt + "\n\nUse unified diff format for edits."
        
    case strings.Contains(model, "gemini"):
        // Gemini benefits from step-by-step
        return basePrompt + "\n\nThink step by step."
        
    default:
        return basePrompt
    }
}

Detailed File Changes

Files to Create:

internal/evolution/diff_parser.go - Unified diff parsing
internal/evolution/flexible_patch.go - Error recovery
internal/evolution/repo_map.go - Repository mapping
internal/evolution/provider_pool.go - Multi-provider support
internal/evolution/sandbox.go - Command sandboxing
scripts/evolution/test_first_prompt.go - TDD prompts

Files to Modify:

internal/evolution/prompts_aider.go - Use unified diffs
internal/evolution/phases.go - Add verification gates
scripts/evolution/evolve.sh - Already has rotation

Success Metrics

Metric	Current	Target
Code changes per evolution	~3	5+
Test inclusion rate	50%	100%
Build/test pass rate	70%	95%
SEARCH/REPLACE success	0%	80%+
Rate limit recovery	Manual	Auto

Priority Order

Must Fix (This Week):

✅ Unified diff format (replaces SEARCH/REPLACE)
✅ Flexible patching (error recovery)
✅ Repo mapping (better context)

Should Fix (Next Week):

Multi-provider fallback
Test-first prompts
Pre-merge verification

Nice to Have (Week 3):

Sandboxed execution
Failure pattern learning
Model-specific tuning

Key Takeaways from Research

Use familiar formats - Unified diffs are 3X better than custom formats
Be flexible - 9X improvement with error recovery
Provide context - Repo mapping improves LLM understanding
Multi-provider - Rate limits happen, have backups
Test-first - TDD prevents bugs, not post-hoc testing
Verify everything - Pre-merge gates catch failures

Based on research of Aider (42K stars), Claude Code, Gemini CLI (99K stars), Devin, and OpenCode. Key insight: Format familiarity matters more than format sophistication.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

iterate V2 Implementation Plan

Based on Research of Top 20 AI Coding Agents

Executive Summary

Problem Analysis

Current iterate Issues:

Implementation Plan

Phase 1: Fix Core Editing (Week 1)

1.1 Replace SEARCH/REPLACE with Unified Diffs

1.2 Add Flexible Patching (Critical)

1.3 Multi-Model Support

Phase 2: Enhanced Context (Week 1-2)

2.1 Repo Mapping (Like Aider)

2.2 Incremental Context Window

Phase 3: Tool-Level Rate Limiting (Week 2)

3.1 Implement Retry with Backoff

3.2 Multi-Provider Fallback

Phase 4: Test-First Workflow (Week 2)

4.1 TDD Enforcement

4.2 Test Coverage Gates

Phase 5: Safety & Verification (Week 2-3)

5.1 Sandboxed Execution (Like Devin)

5.2 Pre-Merge Verification

Phase 6: Learning & Adaptation (Week 3)

6.1 Failure Pattern Memory

6.2 Model-Specific Tuning

Detailed File Changes

Files to Create:

Files to Modify:

Success Metrics

Priority Order

Must Fix (This Week):

Should Fix (Next Week):

Nice to Have (Week 3):

Key Takeaways from Research

FilesExpand file tree

ROADMAP_V2.md

Latest commit

History

ROADMAP_V2.md

File metadata and controls

iterate V2 Implementation Plan

Based on Research of Top 20 AI Coding Agents

Executive Summary

Problem Analysis

Current iterate Issues:

Implementation Plan

Phase 1: Fix Core Editing (Week 1)

1.1 Replace SEARCH/REPLACE with Unified Diffs

1.2 Add Flexible Patching (Critical)

1.3 Multi-Model Support

Phase 2: Enhanced Context (Week 1-2)

2.1 Repo Mapping (Like Aider)

2.2 Incremental Context Window

Phase 3: Tool-Level Rate Limiting (Week 2)

3.1 Implement Retry with Backoff

3.2 Multi-Provider Fallback

Phase 4: Test-First Workflow (Week 2)

4.1 TDD Enforcement

4.2 Test Coverage Gates

Phase 5: Safety & Verification (Week 2-3)

5.1 Sandboxed Execution (Like Devin)

5.2 Pre-Merge Verification

Phase 6: Learning & Adaptation (Week 3)

6.1 Failure Pattern Memory

6.2 Model-Specific Tuning

Detailed File Changes

Files to Create:

Files to Modify:

Success Metrics

Priority Order

Must Fix (This Week):

Should Fix (Next Week):

Nice to Have (Week 3):

Key Takeaways from Research