| # | Agent | Stars | Developer | Key Feature | Price |
|---|---|---|---|---|---|
| 1 | Claude Code | 98K ⭐ | Anthropic | Best autonomous agent, MCP support | $20/mo |
| 2 | Gemini CLI | 99K ⭐ | Google | 1M token context, free | Free |
| 3 | OpenCode | 103K ⭐ | OpenCode AI | Provider-agnostic, LSP support | Free (BYOK) |
| 4 | Aider | 42K ⭐ | Community | Git-native, unified diffs | Free (BYOK) |
| 5 | Goose | 33K ⭐ | Block | MCP extensions, recipes | Free |
| 6 | Codex CLI | 45K ⭐ | OpenAI | Rust-based, permission levels | $20/mo |
| 7 | Amp | - | Sourcegraph | Deep reasoning, Oracle agent | Free tier |
| 8 | Warp AI | - | Warp Inc | Terminal replacement, Oz agents | $20/mo |
| 9 | NovaKit CLI | - | NovaKit | Gemini-powered terminal | Free |
| 10 | AI Magicx CLI | - | AI Magicx | Multi-provider | Free |

| # | Agent | Rating | Developer | Key Feature | Price |
|---|---|---|---|---|---|
| 11 | Cursor | 96/100 | Anysphere | Best IDE integration | $20/mo |
| 12 | Windsurf | 91/100 | Codeium | Cascade agent | Free tier |
| 13 | GitHub Copilot | - | Microsoft | IDE integration | $10/mo |
| 14 | Cline | 59K ⭐ | Cline | Autonomous IDE coding | Free |
| 15 | Kiro | - | Kiro | AI-first IDE | Free |

| # | Agent | SWE-bench | Developer | Key Feature | Price |
|---|---|---|---|---|---|
| 16 | Devin | 13.86% | Cognition | Fully autonomous, sandboxed | $20/mo |
| 17 | Claude Devin | - | Anthropic | Claude-powered Devin | $20/mo |
| 18 | DeepWiki | - | Cognition | Codebase documentation | Free |
| 19 | Devin Search | - | Cognition | Q&A about codebases | Free |

| # | Agent | Focus | Developer | Key Feature | Price |
|---|---|---|---|---|---|
| 20 | CodeRabbit | Code review | CodeRabbit | PR review automation | Free tier |
- Unified diff format - 3X better than custom SEARCH/REPLACE
- Flexible patching - 9X improvement with error recovery
- Model-agnostic - Works with any LLM
- Auto git commits - Descriptive commit messages
- Lesson: Use familiar formats (git diffs), be flexible with errors
- MCP (Model Context Protocol) - Standardized tool integration
- Repo mapping - Maps entire codebase before changes
- Context compaction - Handles long sessions without crashing
- Multi-step execution - Plans, executes, verifies
- Subagents - Specialized agents for different tasks
- Lesson: Build robust scaffolding, use MCP for extensibility
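
Context compaction, as listed above, can be approximated by keeping the newest messages that fit a token budget and folding everything older into one summary message. The sketch below is illustrative only: `Message`, `Compact`, the four-characters-per-token estimate, and the injected `summarize` callback are all assumptions, not Claude Code's actual mechanism.

```go
package main

import "fmt"

// Message is a hypothetical chat turn.
type Message struct {
	Role, Text string
}

// estimateTokens is a crude assumption: roughly four characters per token.
func estimateTokens(m Message) int { return len(m.Text)/4 + 1 }

// Compact keeps the newest messages that fit in budget and folds everything
// older into a single summary message produced by summarize.
func Compact(history []Message, budget int, summarize func([]Message) string) []Message {
	used, cut := 0, len(history)
	// Walk backwards from the newest message until the budget is exceeded.
	for i := len(history) - 1; i >= 0; i-- {
		used += estimateTokens(history[i])
		if used > budget {
			break
		}
		cut = i
	}
	if cut == 0 {
		return history // everything fits; nothing to compact
	}
	summary := Message{Role: "system", Text: summarize(history[:cut])}
	return append([]Message{summary}, history[cut:]...)
}

func main() {
	history := []Message{
		{"user", "Please refactor the provider pool to support fallback."},
		{"assistant", "Done. I changed CallWithFallback and added tests."},
		{"user", "Now add a timeout to the sandbox."},
	}
	compacted := Compact(history, 12, func(old []Message) string {
		return fmt.Sprintf("(summary of %d earlier messages)", len(old))
	})
	for _, m := range compacted {
		fmt.Printf("%s: %s\n", m.Role, m.Text)
	}
}
```

In a real agent the `summarize` callback would itself be an LLM call, and the token estimate would come from the provider's tokenizer.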
- Client/server architecture - Bun runtime + Go TUI
- AI SDK - Provider-agnostic LLM access
- Plan/Build agents - Separate planning from execution
- LSP integration - Real-time diagnostics
- Snapshot/restore - Git-based state management
- Lesson: Separate concerns, build for flexibility
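
Git-based snapshot/restore can be sketched with Git plumbing commands: `git write-tree` records the staged working tree as a tree object without creating a commit, and `git read-tree` plus `git checkout-index` bring it back. The code below is a minimal sketch under that assumption, not OpenCode's actual implementation; `GitRunner` and `ExecGit` are hypothetical names, and the runner is injected so the logic can be exercised without a repository.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// GitRunner runs one git subcommand and returns its trimmed stdout.
type GitRunner func(args ...string) (string, error)

// ExecGit returns a GitRunner that shells out to git in dir.
func ExecGit(dir string) GitRunner {
	return func(args ...string) (string, error) {
		cmd := exec.Command("git", args...)
		cmd.Dir = dir
		out, err := cmd.Output()
		return strings.TrimSpace(string(out)), err
	}
}

// CreateSnapshot stages the working tree and records it as a tree object.
// No commit is created, so branch history is untouched.
func CreateSnapshot(git GitRunner) (string, error) {
	if _, err := git("add", "-A"); err != nil {
		return "", fmt.Errorf("git add: %w", err)
	}
	hash, err := git("write-tree")
	if err != nil {
		return "", fmt.Errorf("git write-tree: %w", err)
	}
	return hash, nil
}

// RestoreSnapshot loads the tree into the index and rewrites tracked files.
func RestoreSnapshot(git GitRunner, hash string) error {
	if _, err := git("read-tree", hash); err != nil {
		return fmt.Errorf("git read-tree: %w", err)
	}
	if _, err := git("checkout-index", "-a", "-f"); err != nil {
		return fmt.Errorf("git checkout-index: %w", err)
	}
	return nil
}

func main() {
	// A recording fake keeps the demo from touching a real repository.
	// Swap in ExecGit(".") to run against the current directory.
	fake := func(args ...string) (string, error) {
		fmt.Println("git", strings.Join(args, " "))
		if args[0] == "write-tree" {
			return "4b825dc", nil // placeholder tree ID
		}
		return "", nil
	}
	hash, _ := CreateSnapshot(fake)
	_ = RestoreSnapshot(fake, hash)
}
```

Two caveats: `git add -A` changes the user's staging area, and `checkout-index` does not delete files created after the snapshot, so a fuller version would need to handle both.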
- 1M token context - Entire codebase in one shot
- Free tier - 60 req/min, 1000 req/day
- ReAct loop - Reason + Act pattern
- Lesson: Leverage context window, minimize cost
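
The ReAct pattern mentioned above alternates a reasoning step with a tool call and feeds each observation back into the next prompt. The loop below is a minimal sketch with a stubbed model; `Step`, `Model`, `Tool`, and `ReAct` are hypothetical names, and nothing here reflects Gemini CLI's real internals.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Step is one model turn: either a tool call or a final answer.
type Step struct {
	Thought string
	Tool    string // empty when Answer is final
	Input   string
	Answer  string
}

// Model and Tool are hypothetical stand-ins for an LLM client and a tool.
type Model func(transcript string) Step
type Tool func(input string) string

// ReAct alternates reasoning and acting until the model returns a final
// answer or maxSteps is exhausted.
func ReAct(model Model, tools map[string]Tool, task string, maxSteps int) (string, error) {
	var transcript strings.Builder
	transcript.WriteString("Task: " + task + "\n")
	for i := 0; i < maxSteps; i++ {
		step := model(transcript.String())
		if step.Tool == "" {
			return step.Answer, nil
		}
		observation := "unknown tool: " + step.Tool
		if tool, ok := tools[step.Tool]; ok {
			observation = tool(step.Input)
		}
		// Record the turn so the next model call can see the observation.
		fmt.Fprintf(&transcript, "Thought: %s\nAction: %s(%s)\nObservation: %s\n",
			step.Thought, step.Tool, step.Input, observation)
	}
	return "", errors.New("step limit reached without an answer")
}

func main() {
	// A scripted model: search once, then answer using the observation.
	model := func(transcript string) Step {
		if !strings.Contains(transcript, "Observation:") {
			return Step{Thought: "find the function first", Tool: "grep", Input: "func Retry"}
		}
		return Step{Answer: "Retry is defined in retry.go"}
	}
	tools := map[string]Tool{
		"grep": func(pattern string) string { return "retry.go:12: " + pattern },
	}
	fmt.Println(ReAct(model, tools, "Where is Retry defined?", 5))
}
```

The step limit matters in practice: without it, a model that never produces a final answer would loop until the quota runs out.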
- Sandboxed execution - Isolated VM per session
- Interactive planning - Human reviews before execution
- Desktop use - Can interact with GUI apps
- DeepWiki - Auto-generates documentation
- Lesson: Get safety from sandboxing plus a human in the loop
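
A per-session VM is the strongest form of sandboxing. A much lighter approximation, sketched below, is to gate every shell command through an allowlist, a blocked-pattern check, and a timeout. This is an assumption-laden sketch rather than how Devin works: the `Check` and `Run` methods are hypothetical, and substring matching on blocked patterns is easy to evade, so it should not be treated as a security boundary on its own.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"os/exec"
	"strings"
	"time"
)

// SandboxConfig holds the command policy (memory limits are not covered here).
type SandboxConfig struct {
	AllowedCmds     []string
	BlockedPatterns []string
	Timeout         time.Duration
}

// Check rejects commands whose program is not allowlisted or whose text
// contains a blocked pattern.
func (c SandboxConfig) Check(cmd string) error {
	fields := strings.Fields(cmd)
	if len(fields) == 0 {
		return errors.New("empty command")
	}
	allowed := false
	for _, a := range c.AllowedCmds {
		if fields[0] == a {
			allowed = true
			break
		}
	}
	if !allowed {
		return fmt.Errorf("command %q is not allowlisted", fields[0])
	}
	for _, p := range c.BlockedPatterns {
		if strings.Contains(cmd, p) {
			return fmt.Errorf("command contains blocked pattern %q", p)
		}
	}
	return nil
}

// Run executes an approved command directly (no shell) under the timeout.
// A zero Timeout expires immediately, so callers should always set one.
func (c SandboxConfig) Run(cmd string) ([]byte, error) {
	if err := c.Check(cmd); err != nil {
		return nil, err
	}
	ctx, cancel := context.WithTimeout(context.Background(), c.Timeout)
	defer cancel()
	fields := strings.Fields(cmd)
	return exec.CommandContext(ctx, fields[0], fields[1:]...).CombinedOutput()
}

func main() {
	cfg := SandboxConfig{
		AllowedCmds:     []string{"go", "git"},
		BlockedPatterns: []string{"--force", "reset --hard"},
		Timeout:         30 * time.Second,
	}
	for _, cmd := range []string{"go version", "rm -rf /", "git push --force"} {
		fmt.Printf("%q -> %v\n", cmd, cfg.Check(cmd))
	}
}
```

Running the command without a shell avoids pipes, redirects, and `;` chaining, which is the main reason `Run` splits the string itself.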
- MCP-first - Everything via MCP
- Recipes - Reusable agent patterns
- Headless mode - CI/CD integration
- Sandboxed execution - Secure by default
- Lesson: Build extension points, think headless
- Rust-based - Fast, small binary
- 3-tier permissions - Read-only, Auto, Full
- ChatGPT integration - Already paid for by users
- Lesson: Fast execution, trust levels
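
Trust levels like these reduce to a small authorization check before each tool call. The sketch below assumes three ordered levels and a hypothetical mapping from tool name to minimum level; the tool names `edit` and `shell` are illustrative and not taken from Codex CLI.

```go
package main

import "fmt"

// Permission is the trust level granted to a session.
type Permission string

const (
	PermissionReadOnly Permission = "read"
	PermissionAuto     Permission = "auto"
	PermissionFull     Permission = "full"
)

// rank orders the levels so they can be compared.
var rank = map[Permission]int{
	PermissionReadOnly: 0,
	PermissionAuto:     1,
	PermissionFull:     2,
}

// required is a hypothetical mapping from tool name to minimum level.
var required = map[string]Permission{
	"read":  PermissionReadOnly,
	"grep":  PermissionReadOnly,
	"glob":  PermissionReadOnly,
	"edit":  PermissionAuto,
	"shell": PermissionFull,
}

// Authorize returns nil when a session at level may call tool.
// Unknown tools and unknown levels are denied.
func Authorize(level Permission, tool string) error {
	need, ok := required[tool]
	if !ok {
		return fmt.Errorf("unknown tool %q", tool)
	}
	have, ok := rank[level]
	if !ok {
		return fmt.Errorf("unknown permission level %q", level)
	}
	if have < rank[need] {
		return fmt.Errorf("tool %q needs %q, session has %q", tool, need, level)
	}
	return nil
}

func main() {
	fmt.Println(Authorize(PermissionReadOnly, "grep"))
	fmt.Println(Authorize(PermissionReadOnly, "edit"))
	fmt.Println(Authorize(PermissionFull, "shell"))
}
```

Denying unknown tools by default keeps a newly added tool from silently running at the lowest trust level.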
- Deep mode - Extended reasoning for complex problems
- Oracle agent - Codebase analysis
- Librarian agent - Documentation Q&A
- Sub-agents - Team of specialized agents
- Lesson: Multi-agent architecture
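
One simple way to picture a multi-agent architecture is a pipeline in which each specialized agent has its own tool set and hands its output to the next. The sketch below is hypothetical (`Agent`, `CanUse`, and `Pipeline` are illustrative names, and the agents are stubbed functions rather than LLM calls); real systems such as Amp may coordinate agents quite differently.

```go
package main

import (
	"errors"
	"fmt"
)

// Agent is a hypothetical specialized worker with a restricted tool set.
type Agent struct {
	Name  string
	Tools []string
	Run   func(input string) (string, error)
}

// CanUse reports whether the agent is allowed to call tool.
func (a Agent) CanUse(tool string) bool {
	for _, t := range a.Tools {
		if t == tool {
			return true
		}
	}
	return false
}

// Pipeline runs the agents in order, passing each output to the next agent
// and stopping at the first error.
func Pipeline(task string, agents ...Agent) (string, error) {
	out := task
	for _, a := range agents {
		next, err := a.Run(out)
		if err != nil {
			return "", fmt.Errorf("%s agent: %w", a.Name, err)
		}
		out = next
	}
	return out, nil
}

func main() {
	plan := Agent{Name: "plan", Tools: []string{"read", "grep", "glob"},
		Run: func(in string) (string, error) { return "plan for: " + in, nil }}
	build := Agent{Name: "build", Tools: []string{"read", "edit", "shell"},
		Run: func(in string) (string, error) { return "diff implementing " + in, nil }}
	review := Agent{Name: "review", Tools: []string{"read"},
		Run: func(in string) (string, error) {
			if in == "" {
				return "", errors.New("nothing to review")
			}
			return "approved: " + in, nil
		}}

	fmt.Println("plan may edit:", plan.CanUse("edit"))
	fmt.Println(Pipeline("add retry logic", plan, build, review))
}
```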
| Format | Success Rate | Notes |
|---|---|---|
| Custom JSON | 20% | LLMs unfamiliar |
| SEARCH/REPLACE | 20% | Our current issue |
| Unified Diff | 61% | 3X better - familiar to LLMs |
| Tool Calling (JSON) | Varies | Model-dependent |
Key insight: Use formats LLMs have seen millions of times in training (git diffs)
- Without flexible patching: 9X more failures
- Strategies that work:
  - Normalize whitespace
  - Try fuzzy matching
  - Split into smaller hunks
  - Expand context window
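
The first of these strategies can be sketched as a chain of progressively more lenient matchers: try an exact match of the hunk's old lines, then retry with whitespace normalized. This is a simplified illustration, not Aider's implementation. It assumes a hypothetical `Hunk` that already holds old and new lines (not a raw unified diff) and covers only two strategies.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Hunk is a simplified edit: replace the Old lines with the New lines.
type Hunk struct {
	Old, New []string
}

// normalize trims a line and collapses runs of whitespace to one space.
func normalize(s string) string {
	return strings.Join(strings.Fields(s), " ")
}

// find returns the first index in lines where old matches under eq, or -1.
func find(lines, old []string, eq func(a, b string) bool) int {
	for i := 0; i+len(old) <= len(lines); i++ {
		matched := true
		for j := range old {
			if !eq(lines[i+j], old[j]) {
				matched = false
				break
			}
		}
		if matched {
			return i
		}
	}
	return -1
}

// ApplyFlexible tries an exact match first, then a whitespace-insensitive one.
func ApplyFlexible(content string, h Hunk) (string, error) {
	if len(h.Old) == 0 {
		return "", errors.New("hunk has no lines to match")
	}
	lines := strings.Split(content, "\n")
	strategies := []func(a, b string) bool{
		func(a, b string) bool { return a == b },
		func(a, b string) bool { return normalize(a) == normalize(b) },
	}
	for _, eq := range strategies {
		if i := find(lines, h.Old, eq); i >= 0 {
			out := append([]string{}, lines[:i]...)
			out = append(out, h.New...)
			out = append(out, lines[i+len(h.Old):]...)
			return strings.Join(out, "\n"), nil
		}
	}
	return "", errors.New("hunk did not match under any strategy")
}

func main() {
	src := "func add(a, b int) int {\n\treturn a+b\n}"
	// The model emitted spaces where the file uses a tab.
	h := Hunk{Old: []string{"    return a+b"}, New: []string{"\treturn a + b"}}
	out, err := ApplyFlexible(src, h)
	fmt.Println(out, err)
}
```

Fuzzy matching, hunk splitting, and context expansion would slot in as further entries in the same strategy list.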
- Repo mapping (Claude Code)
- Incremental context
- Session summarization
- Snapshot/restore (OpenCode)
- Sandboxed execution (Devin, Goose)
- Permission levels (Codex)
- Pre-commit verification
- Post-change testing
- Plan vs Build separation (OpenCode)
- Sub-agents for specialized tasks (Amp)
- Task agents (Claude Code)
- Review agents (CodeRabbit)
- Replace SEARCH/REPLACE with unified diffs
- Implement 3 fallback strategies
- Update prompts
Implementation:
```go
// PlanAgent analyzes the codebase but never edits it.
type PlanAgent struct {
	Name   string
	Tools  []string // read, grep, glob only
	Prompt string
}

// BuildAgent can edit files.
type BuildAgent struct {
	Name   string
	Tools  []string // all tools
	Prompt string
}

// ReviewAgent verifies changes.
type ReviewAgent struct {
	Name   string
	Tools  []string
	Prompt string
}
```

Implementation:
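```go
import "errors"

// ApplyDiffWithRetry applies diff to content, falling back through
// progressively more lenient strategies until one succeeds.
func ApplyDiffWithRetry(content, diff string) (string, error) {
	// Strategy 1: Direct patch
	// Strategy 2: Normalize whitespace
	// Strategy 3: Fuzzy matching
	// Strategy 4: Split into smaller hunks
	// Strategy 5: Expand context window
	return "", errors.New("ApplyDiffWithRetry: not implemented")
}
```

Implementation: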
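```go
// RepoMap indexes the repository so prompts can include only relevant code.
// FileInfo and Function are assumed to be defined elsewhere in the package.
type RepoMap struct {
	Files     map[string]FileInfo
	Functions map[string][]Function
	Imports   map[string][]string
}

func BuildRepoMap(root string) *RepoMap {
	m := &RepoMap{
		Files:     map[string]FileInfo{},
		Functions: map[string][]Function{},
		Imports:   map[string][]string{},
	}
	// TODO: Use AST parsing
	// TODO: Extract functions, classes, imports
	// TODO: Build dependency graph
	return m
}
```

Implementation: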
```go
import (
	"errors"
	"time"
)

type SandboxConfig struct {
	AllowedCmds     []string
	BlockedPatterns []string
	Timeout         time.Duration
	MemoryLimit     int64
}

func (e *Engine) ExecuteInSandbox(cmd string) error {
	// Allow only allowlisted commands
	// Block destructive patterns
	// Set timeout
	// Monitor resources
	return errors.New("ExecuteInSandbox: not implemented")
}
```

Implementation:
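```go
import "errors"

func (e *Engine) CreateSnapshot() (string, error) {
	// git add + git write-tree; the tree ID it prints is the snapshot hash
	return "", errors.New("CreateSnapshot: not implemented")
}

func (e *Engine) RestoreSnapshot(hash string) error {
	// git read-tree + git checkout-index
	return errors.New("RestoreSnapshot: not implemented")
}
```

Implementation: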
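```go
import "errors"

var ErrAllProvidersFailed = errors.New("all providers failed")

type ProviderPool struct {
	providers      map[string]Provider
	current        Provider
	rateLimitCount int
}

// CallWithFallback tries each provider until one succeeds. Rate-limit errors
// move on to the next provider; any other error is returned immediately.
// Response is assumed to be the type returned by Provider.Call. Go randomizes
// map iteration order, so providers are tried in no fixed order.
func (p *ProviderPool) CallWithFallback(req Request) (Response, error) {
	var zero Response
	for _, provider := range p.providers {
		resp, err := provider.Call(req)
		if err == nil {
			return resp, nil
		}
		if isRateLimit(err) {
			continue // Try next
		}
		return zero, err // Real error
	}
	return zero, ErrAllProvidersFailed
}
```

Implementation: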
```go
type MCPClient struct {
	serverPath string
	transport  string // stdio, http
}

// Method signatures only; bodies are omitted in this outline.
func (m *MCPClient) ListTools() ([]Tool, error)
func (m *MCPClient) CallTool(name string, args map[string]interface{}) (string, error)
```

Implementation:
```go
// Permission is the trust level granted to a session.
type Permission string

const (
	PermissionReadOnly Permission = "read" // glob, grep, read
	PermissionAuto     Permission = "auto" // can edit, needs approval
	PermissionFull     Permission = "full" // can do anything
)
```

Implementation:
```go
import "errors"

func PreChangeCheck(changes []string) error {
	// 1. Check protected files
	// 2. Check for destructive commands
	// 3. Check test coverage impact
	return errors.New("PreChangeCheck: not implemented")
}
```

Implementation:
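```go
// PostChangeCheck runs go build, go vet, and go test (steps 1-3) in order
// and stops at the first failure. Needs the "fmt" and "os/exec" imports.
func PostChangeCheck() error {
	for _, step := range []string{"build", "vet", "test"} {
		out, err := exec.Command("go", step, "./...").CombinedOutput()
		if err != nil {
			return fmt.Errorf("go %s failed: %w\n%s", step, err, out)
		}
	}
	// 4. Check for regressions (not implemented in this sketch)
	return nil
}
```

Implementation: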
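```go
import "errors"

// EnforceTDD requires a failing test before the change and a passing one after.
func EnforceTDD(task Task) error {
	// 1. Require test file in diff
	// 2. Run test - should FAIL initially
	// 3. Make code change
	// 4. Run test - should PASS
	return errors.New("EnforceTDD: not implemented")
}
```

Implementation: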
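```go
// FailurePattern records a recurring failure so later runs can avoid it.
type FailurePattern struct {
	ErrorType     string
	Model         string
	Frequency     int
	FixSuggestion string
}

// LearnFromFailure categorizes err and saves the pattern to a JSONL log.
// categorizeError and saveToJSONL are helpers assumed to exist elsewhere.
func (e *Engine) LearnFromFailure(err error) {
	pattern := categorizeError(err)
	saveToJSONL("memory/failures.jsonl", pattern)
}
```

Implementation: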
```go
import "strings"

// GetModelSpecificPrompt picks a prompt by matching on the model name.
func GetModelSpecificPrompt(model string) string {
	switch {
	case strings.Contains(model, "claude"):
		return claudePrompt // Use JSON tool calls
	case strings.Contains(model, "gpt"):
		return gptPrompt // Use unified diffs
	case strings.Contains(model, "gemini"):
		return geminiPrompt // Use step-by-step
	default:
		return "" // unknown model: no model-specific prompt
	}
}
```

| File | Purpose |
|---|---|
| `internal/evolution/agents.go` | Multi-agent architecture |
| `internal/evolution/repo_map.go` | Repository mapping |
| `internal/evolution/sandbox.go` | Sandboxed execution |
| `internal/evolution/snapshot.go` | Git-based snapshots |
| `internal/evolution/provider_pool.go` | Multi-provider fallback |
| `internal/evolution/mcp.go` | MCP client support |
| `internal/evolution/permissions.go` | Tool permission levels |
| `internal/evolution/tdd.go` | TDD enforcement |

| File | Changes |
|---|---|
| `internal/evolution/prompts_aider.go` | Already updated with unified diffs |
| `internal/evolution/phases.go` | Already updated for unified diffs |
| `internal/evolution/engine.go` | Add agent routing |
| `scripts/evolution/evolve.sh` | Already has API rotation |

| Metric | Current | Target |
|---|---|---|
| Code changes/evolution | ~3 | 10+ |
| Test inclusion | 50% | 100% |
| Build pass rate | 70% | 95% |
| Unified diff success | 0% | 70%+ |
| Rate limit recovery | Manual | Auto |
| Context relevance | 40% | 80% |
- Task Success Rate - % of tasks completed successfully
- Test Coverage - Coverage maintained/increased
- Error Recovery - % of errors recovered with retries
- Code Quality - Lint/vet pass rate
- API Cost - Cost per successful task
- ✅ Unified diff format (done)
- ⏳ Flexible diff application (in progress)
- ⏳ Multi-agent architecture (plan/build separation)
- ⏳ Better error messages
- Repo mapping for better context
- Sandboxed command execution
- Multi-provider fallback
- Pre/post verification gates
- MCP integration
- Failure pattern learning
- Model-specific prompts
- Snapshot/restore
- Familiar formats - Unified diffs > custom formats (3X better)
- Flexible error handling - 9X improvement with retries
- Plan/Build separation - OpenCode, Claude Code
- Sandboxing - Devin, Goose for safety
- MCP - Standardized tool integration
- Git-native - Aider's commit workflow
- Context management - Claude Code compaction
- Custom edit formats - LLMs don't follow
- Single agent - Need specialized agents
- No error recovery - Fail fast = fail often
- Unbounded execution - Need sandbox + timeout
- Single provider - Rate limits happen
- Autonomous - Self-evolving (unique)
- Go-native - Built in Go, for Go projects
- Evolution - Can modify itself (unique)
- GitHub-native - Built-in CI/CD integration
Research compiled from: Aider docs, Claude Code docs, OpenCode source, Gemini CLI docs, Devin technical deep-dives, Goose architecture, Codex CLI, Amp, and 15+ comparison articles.