Commit 8fbe582

add logging and test scripts
1 parent 5edde31 commit 8fbe582

164 files changed

Lines changed: 20890 additions & 258 deletions

.gitignore

Lines changed: 4 additions & 0 deletions
@@ -59,3 +59,7 @@ problems
 # Local debug logs
 prompt_builder_logs.jsonl
 try.txt
+
+# Claude-generated documentation
+CLAUDE.md
+CONFIGURATION_GUIDE.md

CLAUDE.md

Lines changed: 0 additions & 118 deletions
This file was deleted.

SEARCH_STRATEGIES.md

Lines changed: 32 additions & 16 deletions
@@ -278,7 +278,7 @@ All strategies share the following OpenEvolve components:
 To compare all strategies:

 ```bash
-# Run all strategies with same parameters
+# Run all strategies with same parameters (each saves to separate directory)
 for strategy in "" "--best-of-n" "--beam-search" "--mcts"; do
 python openevolve-run.py \
 examples/math_mas/initial_program.py \
@@ -287,8 +287,11 @@ for strategy in "" "--best-of-n" "--beam-search" "--mcts"; do
 --iterations 50
 done

-# Compare results
-# Each run saves to: openevolve_output/best/best_program_info.json
+# Compare results - each strategy has its own directory:
+# openevolve_output/best_of_n/best/best_program_info.json
+# openevolve_output/beam_search/best/best_program_info.json
+# openevolve_output/mcts/best/best_program_info.json
+# openevolve_output/best/best_program_info.json (default MAP-Elites)
 ```

 ## Configuration
@@ -372,24 +375,37 @@ database:

 ## Output

-All strategies produce the same output structure:
+Each strategy saves results to its own subdirectory to prevent overwriting:

 ```
 openevolve_output/
-├── best/
-│   ├── best_program.py           # Best program code
-│   └── best_program_info.json    # Metrics and metadata
-├── checkpoints/
-│   ├── checkpoint_5/
-│   │   ├── strategy.json         # Strategy state
-│   │   ├── best_program.py
-│   │   └── best_program_info.json
-│   └── checkpoint_10/
-│       └── ...
-└── logs/
-    └── openevolve_YYYYMMDD_HHMMSS.log
+├── best_of_n/                      # Best-of-N results
+│   ├── best/
+│   │   ├── best_program.py         # Best program code
+│   │   └── best_program_info.json  # Metrics and metadata
+│   ├── checkpoints/
+│   │   ├── checkpoint_5/
+│   │   │   ├── strategy.json       # Strategy state
+│   │   │   ├── best_program.py
+│   │   │   └── best_program_info.json
+│   │   └── checkpoint_10/
+│   │       └── ...
+│   └── logs/
+│       └── openevolve_YYYYMMDD_HHMMSS.log
+├── beam_search/                    # Beam Search results
+│   ├── best/
+│   ├── checkpoints/
+│   └── logs/
+├── mcts/                           # MCTS results
+│   ├── best/
+│   ├── checkpoints/
+│   └── logs/
+└── [default MAP-Elites at root if no strategy specified]
 ```

+**Note**: Each strategy uses `openevolve_output/<strategy_name>/` to keep results separate.
+You can override with `--output custom_dir/`.
+
 ## Extending with New Strategies

 To add a new search strategy:
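
With the per-strategy layout introduced in the SEARCH_STRATEGIES.md diff above, comparing runs means reading one `best_program_info.json` per directory. The following is a minimal sketch of that step, not part of this commit: the function name `collect_best_info` and the `STRATEGY_DIRS` mapping are hypothetical, and the JSON contents are treated as opaque because the commit does not show the file's schema.

```python
import json
from pathlib import Path

# Subdirectory names taken from the updated SEARCH_STRATEGIES.md.
# The empty string stands for the default MAP-Elites run, which the
# docs say writes directly under the output root.
STRATEGY_DIRS = {
    "map_elites": "",
    "best_of_n": "best_of_n",
    "beam_search": "beam_search",
    "mcts": "mcts",
}


def collect_best_info(output_root):
    """Return {strategy: parsed best_program_info.json} for runs that exist."""
    root = Path(output_root)
    results = {}
    for name, subdir in STRATEGY_DIRS.items():
        info_path = root / subdir / "best" / "best_program_info.json"
        if info_path.is_file():  # skip strategies that were never run
            with info_path.open() as f:
                results[name] = json.load(f)
    return results


if __name__ == "__main__":
    for name, info in collect_best_info("openevolve_output").items():
        print(name, json.dumps(info, indent=2))
```

Strategies without an output directory are skipped rather than treated as errors, so the same call works after a partial run. If `--output custom_dir/` was used, pass that directory instead of `openevolve_output`.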

examples/math_mas/README.md

Lines changed: 45 additions & 112 deletions
@@ -1,150 +1,83 @@
-# Multi-Agent Math Solver Evolution
+# Multi-Agent Math Solving System Evolution

-This example uses OpenEvolve to evolve a multi-agent system for solving mathematical problems from the Math500 dataset.
+This directory contains scripts for evolving and testing multi-agent systems that solve mathematical problems from the OlympiadBench dataset.

-## Overview
+## Quick Start

-The system evolves a collaborative multi-agent architecture with up to 4 agents:
-- **Solver**: Initial problem-solving
-- **Verifier**: Solution verification
-- **Reviser**: Error correction based on feedback
-- **Refiner**: Final answer polishing
-
-OpenEvolve optimizes:
-- Agent system prompts (roles and expertise)
-- Communication protocols (interaction patterns)
-- Workflow structure (agent coordination)
-
-## Setup
-
-### 1. Install Dependencies
+### Run Evolution (Single Strategy)

 ```bash
-# Be in the main code
-pip install -e ".[dev]"
-
-# Install additional dependencies for this example
-pip install langchain langchain-openai datasets word2number sympy latex2sympy2
-```
-
-### 2. Set Environment Variables
+# Run MAP-Elites for 100 iterations with 100 problems
+./run_map_elites.sh 100 100

-```bash
-# OpenAI API key (used for both evolution and multi-agent system)
-export OPENAI_API_KEY="your-openai-api-key"
+# Run Best-of-N
+./run_best_of_n.sh 100 100

-# Optional: Configure the model used by agents (inside the multi-agent system)
-export OPENEVOLVE_MODEL="gpt-4o-mini" # Default model for agents in the multi-agent system
+# Run Beam Search
+./run_beam_search.sh 100 100

-# Optional: Number of test problems per evaluation
-export MATH_EVAL_PROBLEMS="10" # Default: 10 problems
+# Run MCTS
+./run_mcts.sh 100 100
 ```

-### 3. Test the Initial System
+### Run All Strategies in Parallel

 ```bash
-cd examples/math_mas
-
-# Test the initial multi-agent system
-python initial_program.py
-
-# Test the evaluator
-python evaluator.py
+# Run all 4 strategies simultaneously
+./run_all_strategies.sh 100 100
 ```

-## Running Evolution
-
-### Basic Evolution Run
+### Test a Program

 ```bash
-# From the repository root
-python openevolve-run.py \
-examples/math_mas/initial_program.py \
-examples/math_mas/evaluator.py \
---config examples/math_mas/config.yaml \
---iterations 50
-```
+# Test initial program with 100 problems (seed=42)
+python test_program.py initial_program.py

-### Resume from Checkpoint
+# Test evolved program with different seed for test set
+python test_program.py openevolve_output/best/best_program.py --seed 99

-```bash
-python openevolve-run.py \
-examples/math_mas/initial_program.py \
-examples/math_mas/evaluator.py \
---config examples/math_mas/config.yaml \
---checkpoint examples/math_mas/openevolve_output/checkpoints/checkpoint_40 \
---iterations 20
+# Test with more problems
+python test_program.py path/to/program.py --num-problems 200 --seed 1234
 ```

-## Configuration
+---

-Key configuration options in `config.yaml`:
+## Testing Script: test_program.py

-### Evolution Settings
-- `max_iterations: 50` - Number of evolution iterations
-- `diff_based_evolution: false` - Use full rewrites instead of diffs
-- `early_stopping_patience: 20` - Stop if no improvement for 20 iterations
+Standalone script to evaluate any program on math problems with configurable random seed.

-### Database (MAP-Elites)
-- `population_size: 100` - Maximum programs in population
-- `num_islands: 4` - Isolated populations for diversity
-- `feature_dimensions: [accuracy, completion_rate]` - Quality-diversity space
-
-### Evaluator
-- `cascade_evaluation: true` - Fast-fail for bad programs
-- `parallel_evaluations: 4` - Run 4 evaluations concurrently
-- `timeout: 600` - 10 minute timeout per evaluation
-
-
-
-
-### Adjust Problem Difficulty
+**Usage:**
 ```bash
-# Use fewer problems for faster iterations
-export MATH_EVAL_PROBLEMS="5"
+python test_program.py <program_path> [options]

-# Use more problems for better evaluation
-export MATH_EVAL_PROBLEMS="20"
+Options:
+  -n, --num-problems N   Number of problems (default: 100, use -1 for all 675)
+  -s, --seed N           Random seed for sampling (default: 42)
+  -o, --output FILE      Output JSON file
 ```

-### Customize Agent Model
+**Examples:**
 ```bash
-# Use a more powerful model for agents
-export OPENEVOLVE_MODEL="gpt-4o"
+# Test with default settings (100 problems, seed=42)
+python test_program.py initial_program.py

-# Or use GPT-5 for agents too (expensive but powerful)
-export OPENEVOLVE_MODEL="gpt-5"
-```
+# Test on DIFFERENT problems (seed=99 instead of 42)
+python test_program.py openevolve_output/best/best_program.py --seed 99

-### Visualize Evolution
-```bash
-# After evolution completes
-python scripts/visualizer.py \
---path examples/math_mas/openevolve_output/checkpoints/checkpoint_50/
+# Test on full dataset
+python test_program.py path/to/program.py --num-problems -1
 ```

-## Troubleshooting
+## Train/Test Split with Seeds

-### "Import langchain_openai could not be resolved"
-```bash
-pip install langchain langchain-openai
-```
+Use different random seeds to create train/test splits:

-### "No problems loaded"
 ```bash
-# Install datasets library
-pip install datasets
+# Evolution uses seed=42 (from config.yaml)
+./run_map_elites.sh 100 100

-# Or problems will fall back to synthetic test problems
+# Test on different problems (seed=99)
+python test_program.py openevolve_output/best/best_program.py --seed 99
 ```

-### API Rate Limits
-- Reduce `parallel_evaluations` in config.yaml
-- Increase `timeout` if models are slow
-- Use faster models (gpt-4o-mini instead of gpt-5)
-
-### Out of Memory
-- Reduce `population_size` in config.yaml
-- Reduce `MATH_EVAL_PROBLEMS` environment variable
-- Enable cascade evaluation to fail fast on bad programs
-
+See full README for more details.
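
The new README treats a different seed as a different test set. How `test_program.py` samples problems is not shown in this commit, so the sketch below only assumes a seeded draw of distinct indices from the 675 problems quoted in the help text; `sample_indices` and `disjoint_split` are hypothetical names. It illustrates one point worth checking: two independent seeded samples can share problems, whereas shuffling once and slicing gives a split with no overlap.

```python
import random

TOTAL_PROBLEMS = 675  # dataset size quoted in the test_program.py help text


def sample_indices(seed, num_problems, total=TOTAL_PROBLEMS):
    """Assumed sampler: draw num_problems distinct indices with a seeded RNG."""
    if num_problems == -1:  # mirrors the "-1 for all" convention in the help text
        return list(range(total))
    return random.Random(seed).sample(range(total), num_problems)


def disjoint_split(seed, train_size, test_size, total=TOTAL_PROBLEMS):
    """Shuffle once, then slice, so train and test cannot share a problem."""
    order = list(range(total))
    random.Random(seed).shuffle(order)
    return order[:train_size], order[train_size:train_size + test_size]


# Two independent samples: overlap is possible and should be measured.
train = set(sample_indices(42, 100))
test = set(sample_indices(99, 100))
print("shared problems between seed 42 and seed 99:", len(train & test))

# One shuffle, two slices: overlap is zero by construction.
train_ids, test_ids = disjoint_split(42, 100, 100)
print("shared problems after disjoint split:", len(set(train_ids) & set(test_ids)))
```

If the first count is nonzero for the real sampler, the seed=99 evaluation partly re-tests problems seen during evolution, and a slice-based split may be the safer choice for reporting held-out accuracy.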
