Research of top coding agents (Aider, Claude Code, Gemini CLI, Devin, OpenCode) reveals the key differences:
| Agent | Key Innovation | Success Rate |
|---|---|---|
| Aider | Unified diffs + flexible patching | 88% self-written |
| Claude Code | MCP + tool calling | Best instruction following |
| Gemini CLI | ReAct loop + 1M context | Free, fast |
| Devin | Sandboxed execution + desktop use | 13.86% SWE-bench |
| OpenCode | Provider-agnostic | Multi-model support |
Core Insight: Aider's unified diff format works 3X better than custom SEARCH/REPLACE because:
- LLMs are already trained on git diffs
- No escaping/JSON overhead
- Flexible error recovery when diffs fail
- ❌ LLM ignores SEARCH/REPLACE format - Custom format unfamiliar to LLMs
- ❌ No error recovery - Single diff failure = complete failure
- ❌ Rate limiting - Single API key, no rotation at tool level
- ❌ No repo mapping - LLM lacks full codebase context
- ❌ Test verification is post-hoc - Tests should drive development
Why: Aider showed 3X improvement using unified diffs over custom formats
Implementation:
// Current (bad):
FILE: path/to/file.go
<<<<<<< SEARCH
old code
=======
new code
>>>>>>>
// New (good) - use standard unified diff:
--- a/path/to/file.go
+++ b/path/to/file.go
@@ -1,5 +1,7 @@
func example() {
- // old
+ // new
+ // added
return
}Files to modify:
internal/evolution/prompts_aider.go- Change formatinternal/evolution/search_replace.go- Rename todiff_parser.go
Why: Aider's 9X improvement with flexible error recovery
Implementation:
func ApplyUnifiedDiff(content string, diff string) (string, error) {
// Strategy 1: Standard patch
// Strategy 2: Normalize whitespace
// Strategy 3: Relative indentation matching
// Strategy 4: Split into smaller hunks
// Strategy 5: Expand context window
}Key techniques:
- If hunk fails → try without line numbers
- If whitespace fails → normalize and retry
- If location fails → try fuzzy matching on content
Why: Different models follow instructions differently
Implementation:
// Detect model and use optimal format
func GetEditFormatForModel(model string) string {
switch {
case strings.Contains(model, "claude"):
return "json" // Claude follows JSON well
case strings.Contains(model, "gpt"):
return "unified-diff" // Aider's approach
case strings.Contains(model, "gemini"):
return "unified-diff"
default:
return "unified-diff"
}
}Why: Aider's repo map dramatically improves LLM context
Implementation:
type RepoMap struct {
Files []FileNode
Functions []FunctionNode
Classes []ClassNode
}
type FileNode struct {
Path string
Language string
Size int
Functions []string // function names
Classes []string // class names
}Usage in prompts:
## Repository Map
- internal/evolution/
- engine.go (500 lines)
- func: Run(), Execute(), Verify()
- type: Engine, Config
- phases.go (400 lines)
- func: Plan(), Implement(), Review()
Why: Full context exceeds token limits
Implementation:
// Send only relevant files based on task
func SelectRelevantFiles(task string, repoMap RepoMap) []string {
// Use embeddings or keyword matching
// Prioritize files mentioned in task
// Add files that import/are imported by those files
}Why: Current rotation is script-level, not tool-level
Implementation:
func (e *Engine) callProviderWithRetry(ctx context.Context, req Request) (*Response, error) {
maxRetries := 3
baseDelay := 2 * time.Second
for attempt := 0; attempt < maxRetries; attempt++ {
resp, err := e.provider.Call(ctx, req)
if isRateLimit(err) {
delay := baseDelay * math.Pow(2, float64(attempt))
time.Sleep(delay)
continue
}
return resp, err
}
return nil, ErrMaxRetriesExceeded
}Why: Swap providers when one fails
Implementation:
type ProviderPool struct {
providers []Provider
current int
}
func (p *ProviderPool) Call(ctx context.Context, req Request) (*Response, error) {
for i := range p.providers {
idx := (p.current + i) % len(p.providers)
resp, err := p.providers[idx].Call(ctx, req)
if !isProviderError(err) {
p.current = idx
return resp, err
}
}
return nil, ErrAllProvidersFailed
}Why: Tests should drive development, not be added after
Implementation:
// In prompts, require test FIRST:
## Task: Fix bug in findTodos
### Step 1: Write Test (REQUIRED)
Create a test that FAILS with current code.
The test should demonstrate the bug.
### Step 2: Run Test
Verify it fails: `go test -v -run TestFindTodos`
### Step 3: Fix Code
Now fix the bug to make test pass.
### Step 4: Verify
Run: `go test -v -run TestFindTodos`
Must pass.Why: Prevent partial fixes
Implementation:
func verifyTestDriven(ctx context.Context, changes []string) error {
// 1. Verify test file was created/modified BEFORE code file
testFiles := filterTestFiles(changes)
codeFiles := filterNonTestFiles(changes)
if len(testFiles) == 0 {
return ErrNoTestChanges
}
// 2. Run tests - should pass
if !runTests().Success {
return ErrTestsFailed
}
// 3. Verify coverage didn't decrease
if getCoverage() < getPreviousCoverage() {
return ErrCoverageDecreased
}
return nil
}Why: Prevent destructive commands
Implementation:
type SandboxConfig struct {
AllowedCommands []string
DisallowedPatterns []string
Timeout time.Duration
}
var SafeConfig = SandboxConfig{
AllowedCommands: []string{
"go test", "go build", "go vet", "go fmt",
"git status", "git diff", "git add",
"ls", "cat", "grep", "find",
},
DisallowedPatterns: []string{
"rm -rf", "curl | sh", "wget | sh",
"git push --force", "dd of=/dev/",
},
}
func (e *Engine) validateCommand(cmd string) error {
for _, pattern := range SafeConfig.DisallowedPatterns {
if strings.Contains(cmd, pattern) {
return fmt.Errorf("command blocked: %s", pattern)
}
}
return nil
}Why: Catch issues before merging
Implementation:
func preMergeVerification(ctx context.Context, prNumber int) error {
// 1. Run full test suite
if !runTests(ctx).Success {
return ErrTestsFailed
}
// 2. Run linting
if !runLinters(ctx).Success {
return ErrLintingFailed
}
// 3. Verify no protected files changed
if hasProtectedChanges(ctx) {
return ErrProtectedFilesModified
}
// 4. Verify code coverage maintained
if coverageDecreased(ctx) {
return ErrCoverageDecreased
}
return nil
}Why: Don't repeat mistakes
Implementation:
type FailurePattern struct {
ErrorType string
Model string
FixSuggestion string
}
func (e *Engine) learnFromFailure(err error, model string) {
pattern := FailurePattern{
ErrorType: categorizeError(err),
Model: model,
FixSuggestion: generateFixSuggestion(err),
}
// Save to memory
saveToJSONL("memory/failure_patterns.jsonl", pattern)
// Update next prompt with learned patterns
e.addLearnedPatternsToPrompt()
}Why: Different models need different prompts
Implementation:
func GetSystemPromptForModel(model string) string {
basePrompt := loadBasePrompt()
switch {
case strings.Contains(model, "claude"):
// Claude follows instructions well, use concise format
return basePrompt + "\n\nUse JSON for file edits."
case strings.Contains(model, "gpt-4"):
// GPT-4 works best with unified diffs
return basePrompt + "\n\nUse unified diff format for edits."
case strings.Contains(model, "gemini"):
// Gemini benefits from step-by-step
return basePrompt + "\n\nThink step by step."
default:
return basePrompt
}
}internal/evolution/diff_parser.go- Unified diff parsinginternal/evolution/flexible_patch.go- Error recoveryinternal/evolution/repo_map.go- Repository mappinginternal/evolution/provider_pool.go- Multi-provider supportinternal/evolution/sandbox.go- Command sandboxingscripts/evolution/test_first_prompt.go- TDD prompts
internal/evolution/prompts_aider.go- Use unified diffsinternal/evolution/phases.go- Add verification gatesscripts/evolution/evolve.sh- Already has rotation
| Metric | Current | Target |
|---|---|---|
| Code changes per evolution | ~3 | 5+ |
| Test inclusion rate | 50% | 100% |
| Build/test pass rate | 70% | 95% |
| SEARCH/REPLACE success | 0% | 80%+ |
| Rate limit recovery | Manual | Auto |
- ✅ Unified diff format (replaces SEARCH/REPLACE)
- ✅ Flexible patching (error recovery)
- ✅ Repo mapping (better context)
- Multi-provider fallback
- Test-first prompts
- Pre-merge verification
- Sandboxed execution
- Failure pattern learning
- Model-specific tuning
- Use familiar formats - Unified diffs are 3X better than custom formats
- Be flexible - 9X improvement with error recovery
- Provide context - Repo mapping improves LLM understanding
- Multi-provider - Rate limits happen, have backups
- Test-first - TDD prevents bugs, not post-hoc testing
- Verify everything - Pre-merge gates catch failures
Based on research of Aider (42K stars), Claude Code, Gemini CLI (99K stars), Devin, and OpenCode. Key insight: Format familiarity matters more than format sophistication.