Commit 81fdc9d (commit message: "f", 1 parent: 31877f0)

3 files changed: +190 −20 lines

examples/k_module_problem/README.md (19 additions, 18 deletions)
@@ -140,31 +140,31 @@ This establishes the "no learning" baseline. Any method that beats this is demon
 
 **Key observation**: The iterative agent repeatedly finds configurations with 3/4 correct modules (`csv_reader`, `quicksort`, `json`) but cannot identify that `preprocess` is the wrong module. It keeps cycling through variations without escaping this local optimum.
 
-### OpenEvolve (Evolutionary) Results
+### OpenEvolve (Evolutionary) Results (3 trials, 100 iterations max)
 
-| Trial | Iterations | Result | Best Score | Notes |
-|-------|------------|--------|------------|-------|
-| 1 | 21 | SUCCESS | 100% (4/4) | Solution found through population diversity |
+| Trial | Iterations | Result | Best Score |
+|-------|------------|--------|------------|
+| 1 | 18 | SUCCESS | 100% (4/4) |
+| 2 | 50 | SUCCESS | 100% (4/4) |
+| 3 | 89 | SUCCESS | 100% (4/4) |
 
 **Summary:**
-- **Success rate**: 100% (1/1 trial found solution)
-- **Solution found at**: Iteration 21
-- **Key observation**: OpenEvolve's population-based approach explores multiple configurations in parallel. By iteration 9, the population already had diverse configurations, and by iteration 21, the correct combination was discovered.
+- **Success rate**: 100% (3/3 trials found solution)
+- **Avg iterations to solution**: 52.3
+- **Min iterations**: 18
+- **Max iterations**: 89
 
-**Progression:**
-- Iteration 3: 25% (1/4) - Initial exploration
-- Iteration 9: 50% (2/4) - Multiple 50% configs in population
-- Iteration 21: 100% (4/4) - csv_reader, normalize, quicksort, json - PERFECT!
-
-**Key advantage**: OpenEvolve's prompt encourages systematic exploration ("try DIFFERENT options for EACH module") rather than following potentially misleading hints. Combined with higher temperature (0.9), larger population (25), and more frequent migration, this leads to faster discovery.
+**Key advantage**: OpenEvolve's population-based approach maintains diverse configurations that explore different module combinations in parallel. Even when some individuals get stuck at local optima (75% with wrong preprocessing), others explore alternatives and eventually discover the correct solution.
 
 ### Comparison Summary
 
-| Method | Success Rate | Evaluations to Solution | Key Limitation |
-|--------|-------------|------------------------|----------------|
-| **Random Baseline** | 16% | 43.3 avg (when found) | No learning |
-| **Iterative Refinement** | 33% | 13 (when found) | Gets stuck at 75%, can't escape local optima |
-| **OpenEvolve** | 100% | 21 | Population diversity + systematic exploration |
+| Method | Success Rate | Avg Iterations | Key Finding |
+|--------|-------------|----------------|-------------|
+| **Random Baseline** | 16% | 43.3 (when found) | No learning baseline |
+| **Iterative Refinement** | 33% (1/3) | 13 (when found) | Gets stuck at 75% local optimum |
+| **OpenEvolve** | **100% (3/3)** | 52.3 | Always finds solution |
+
+**Key insight**: While OpenEvolve takes more iterations on average (52.3 vs 13), it has a **100% success rate** compared to iterative refinement's 33%. The evolutionary approach's population diversity ensures it eventually escapes local optima that trap single-trajectory methods.
 
 ## Why This Matters
 
@@ -190,6 +190,7 @@ Real-world examples:
 | `config.yaml` | OpenEvolve configuration |
 | `iterative_agent.py` | Iterative refinement agent using OpenRouter API |
 | `run_iterative_trials.py` | Run multiple trials of iterative agent |
+| `run_openevolve_trials.py` | Run multiple trials of OpenEvolve |
 | `run_random_baseline.py` | Random search baseline with pass@k analysis |
 | `compare_results.py` | Analysis and visualization |
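The 75% plateau described in the README's key observation can be made concrete with a toy scorer. The correct combination (`csv_reader`, `normalize`, `quicksort`, `json`) comes from the README; the wrong `preprocess` option `standardize` and the scoring rule (fraction of correct modules) are assumptions for illustration:

```python
# Toy version of the k-module search space (illustrative only).
CORRECT = {
    "reader": "csv_reader",
    "preprocess": "normalize",
    "sort": "quicksort",
    "output": "json",
}

def score(config: dict) -> float:
    """Fraction of modules matching the correct choice."""
    return sum(config[k] == v for k, v in CORRECT.items()) / len(CORRECT)

# A configuration with the hypothetical wrong preprocess module scores 3/4,
# and no single-module tweak other than fixing `preprocess` improves it.
stuck = dict(CORRECT, preprocess="standardize")
print(score(stuck))  # 0.75
```

Any hill-climber that only accepts strictly improving single-module edits sits at 0.75 until it happens to try the right `preprocess` option, which is the trap the iterative agent falls into.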

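The trade-off stated in the key insight above (more iterations, but 100% reliability) can be quantified with a small sketch. Assume a restart-until-success policy in which each failed run burns the full 100-iteration budget; that budget assumption is ours, not from the benchmark:

```python
def expected_cost(n_success: int, n_trials: int,
                  avg_iters_on_success: float,
                  iters_per_failed_run: float) -> float:
    """Expected total iterations under restart-until-success.

    A method succeeding in k of n trials needs n/k runs on average;
    the (n/k - 1) failed runs each cost `iters_per_failed_run`.
    """
    failed_runs = n_trials / n_success - 1
    return failed_runs * iters_per_failed_run + avg_iters_on_success

# Iterative refinement: 1/3 success, 13 iterations when it succeeds.
print(expected_cost(1, 3, 13, 100))    # 213.0
# OpenEvolve: 3/3 success, 52.3 iterations on average.
print(expected_cost(3, 3, 52.3, 100))  # 52.3
```

Under this (assumed) policy, OpenEvolve's higher per-run iteration count is cheaper in expectation than retrying the faster-but-unreliable iterative agent.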
examples/k_module_problem/config.yaml (2 additions, 2 deletions)

@@ -81,7 +81,7 @@ evaluator:
 use_llm_feedback: false
 enable_artifacts: true
 
-# Early stopping - stop when we find the solution
-early_stopping_patience: 30 # Reduced - expect faster convergence
+# Early stopping - disabled to allow full exploration
+early_stopping_patience: 100 # Allow full run
 convergence_threshold: 0.001
 early_stopping_metric: "combined_score"
examples/k_module_problem/run_openevolve_trials.py (new file, 169 additions)
#!/usr/bin/env python3
"""Run multiple trials of OpenEvolve to get statistics."""

import argparse
import json
import os
import re
import shutil
import subprocess
from pathlib import Path

# Run from the example directory
os.chdir(Path(__file__).parent)


def run_trial(trial_num: int, max_iterations: int = 100, seed: int | None = None):
    """Run a single OpenEvolve trial."""
    output_dir = f"openevolve_output_trial_{trial_num}"

    # Clean output directory
    if os.path.exists(output_dir):
        shutil.rmtree(output_dir)

    # Update config with the new seed if provided
    if seed is not None:
        with open("config.yaml", "r") as f:
            config_content = f.read()

        config_content = re.sub(r'random_seed:\s*\d+', f'random_seed: {seed}', config_content)

        # Write temp config
        temp_config = f"config_trial_{trial_num}.yaml"
        with open(temp_config, "w") as f:
            f.write(config_content)
    else:
        temp_config = "config.yaml"

    # Run OpenEvolve
    cmd = [
        "openevolve-run",
        "initial_program.py",
        "evaluator.py",
        "--config", temp_config,
        "--iterations", str(max_iterations),
        "--output", output_dir,
    ]

    print(f"\n{'='*60}")
    print(f"TRIAL {trial_num + 1}: Running OpenEvolve with seed {seed}")
    print('='*60)

    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        print(f"openevolve-run exited with code {result.returncode}")

    # Clean up temp config
    if seed is not None and os.path.exists(temp_config):
        os.remove(temp_config)

    # Parse results from the run's log file
    solution_found_at = None
    best_score = 0.0

    log_dir = Path(output_dir) / "logs"
    if log_dir.exists():
        log_files = list(log_dir.glob("*.log"))
        if log_files:
            with open(log_files[0], "r") as f:
                log_content = f.read()

            # Find best score seen anywhere in the log
            score_matches = re.findall(r'combined_score[=:]\s*([\d.]+)', log_content)
            if score_matches:
                best_score = max(float(s) for s in score_matches)

            # Look for the first 100% solution: an "Iteration N" line with score 1.0000
            new_best_matches = re.findall(r'New best solution found at iteration (\d+):', log_content)
            perfect_matches = re.findall(r'Iteration (\d+):.*?combined_score=1\.0000', log_content)

            if perfect_matches:
                solution_found_at = int(perfect_matches[0])
            elif best_score >= 1.0 and new_best_matches:
                # Fallback: use the last "new best" iteration if the score reached 100%
                solution_found_at = int(new_best_matches[-1])

    return {
        "trial": trial_num,
        "seed": seed,
        "solution_found_at": solution_found_at,
        "best_score": best_score,
        "max_iterations": max_iterations,
    }


def run_trials(num_trials: int = 3, max_iterations: int = 100, base_seed: int = 100):
    """Run multiple trials and collect statistics."""
    results = []
    solutions_found = []

    for trial in range(num_trials):
        seed = base_seed + trial * 111  # Different seed for each trial
        result = run_trial(trial, max_iterations, seed)
        results.append(result)

        if result["solution_found_at"] is not None:
            solutions_found.append(result["solution_found_at"])
            print(f"Trial {trial + 1}: SUCCESS at iteration {result['solution_found_at']}")
        else:
            print(f"Trial {trial + 1}: FAILED (best score: {result['best_score']:.2%})")

    # Calculate statistics
    success_rate = len(solutions_found) / num_trials
    avg_iterations = sum(solutions_found) / len(solutions_found) if solutions_found else float('inf')
    min_iterations = min(solutions_found) if solutions_found else None
    max_iterations_found = max(solutions_found) if solutions_found else None

    print(f"\n{'='*60}")
    print("OPENEVOLVE TRIAL RESULTS")
    print('='*60)
    print(f"Trials: {num_trials}")
    print(f"Max iterations per trial: {max_iterations}")
    print(f"Success rate: {success_rate:.0%} ({len(solutions_found)}/{num_trials})")
    if solutions_found:
        print(f"Avg iterations to solution: {avg_iterations:.1f}")
        print(f"Min iterations: {min_iterations}")
        print(f"Max iterations: {max_iterations_found}")
    print('='*60)

    # Save summary
    summary = {
        "config": {
            "num_trials": num_trials,
            "max_iterations": max_iterations,
        },
        "summary": {
            "success_rate": success_rate,
            "avg_iterations_to_solution": avg_iterations if solutions_found else None,
            "min_iterations": min_iterations,
            "max_iterations": max_iterations_found,
            "solutions_found": len(solutions_found),
        },
        "trials": results,
    }

    with open("openevolve_trials_results.json", "w") as f:
        json.dump(summary, f, indent=2)

    print("\nResults saved to: openevolve_trials_results.json")

    # Clean up trial output directories
    for trial in range(num_trials):
        output_dir = f"openevolve_output_trial_{trial}"
        if os.path.exists(output_dir):
            shutil.rmtree(output_dir)

    return summary


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--trials", type=int, default=3, help="Number of trials")
    parser.add_argument("--iterations", type=int, default=100, help="Max iterations per trial")
    parser.add_argument("--seed", type=int, default=100, help="Base random seed")
    args = parser.parse_args()

    run_trials(num_trials=args.trials, max_iterations=args.iterations, base_seed=args.seed)
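The log-parsing heuristics in the script above can be exercised on a synthetic log excerpt. Note the line format here is invented for illustration; real OpenEvolve log lines may differ:

```python
import re

# Hypothetical log excerpt, not verbatim OpenEvolve output.
log_content = (
    "Iteration 17: combined_score=0.7500\n"
    "New best solution found at iteration 18: combined_score=1.0000\n"
    "Iteration 18: combined_score=1.0000\n"
)

# Best score seen anywhere in the log
scores = [float(s) for s in re.findall(r'combined_score[=:]\s*([\d.]+)', log_content)]
best_score = max(scores)

# First iteration that reached a perfect score
perfect = re.findall(r'Iteration (\d+):.*?combined_score=1\.0000', log_content)
solution_found_at = int(perfect[0])

print(best_score, solution_found_at)  # 1.0 18
```

The script itself would be invoked with the flags it defines, e.g. `python run_openevolve_trials.py --trials 3 --iterations 100 --seed 100`.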
