
Commit cf0046c

DavyMorgan and claude committed

docs: address all PR review feedback on SKILL.md

- Fix core concepts: config.yaml is optional, not required
- Clarify EVOLVE-BLOCK markers are LLM guidance, not enforced by the tool
- Fix fitness description: excludes MAP-Elites feature dimensions
- Remove non-numeric error field from evaluator example metrics dict
- Fix checkpoint structure: programs are .json, add best_program.py and best_program_info.json, add artifacts/ directory
- Fix output paths to be consistent (my_experiment/output throughout)
- Point to best_program.py instead of programs/<id>.py for result retrieval
- Remove plotly from visualizer install (Flask only, no plotly dependency)
- Clarify artifact config: evaluator.enable_artifacts + prompt.include_artifacts

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

1 parent 52d6314 commit cf0046c

1 file changed: SKILL.md (31 additions & 26 deletions)
@@ -36,29 +36,29 @@ Verify: `openevolve-run --help`
 ## Core Concepts

-Every OpenEvolve experiment requires exactly **3 files**:
+Every OpenEvolve experiment requires **2 files** (with an optional third):

-| File | Purpose |
-|------|---------|
-| `initial_program.py` | Starting code with `EVOLVE-BLOCK` markers around the section to evolve |
-| `evaluator.py` | Defines `evaluate(program_path) -> dict` that scores each variant |
-| `config.yaml` | LLM provider, iterations, population size, system message |
+| File | Required | Purpose |
+|------|----------|---------|
+| `initial_program.py` | Yes | Starting code with `EVOLVE-BLOCK` markers around the section to evolve |
+| `evaluator.py` | Yes | Defines `evaluate(program_path) -> dict` that scores each variant |
+| `config.yaml` | No | LLM provider, iterations, population size, system message (sensible defaults are used when omitted) |

 ---

 ## Workflow

 ### Step 1: Scaffold the experiment

-Create a project directory with the three required files.
+Create a project directory with the required files.

 ```bash
 mkdir -p my_experiment
 ```

 #### initial_program.py

-Wrap the code to evolve in `EVOLVE-BLOCK` markers. Code outside the markers stays fixed.
+Wrap the code to evolve in `EVOLVE-BLOCK` markers. These markers serve as guidance for the LLM, signaling which sections should be modified. Note: the markers are not enforced by the tool — the diff engine can technically change any part of the file — but they strongly steer the LLM's attention.

 ```python
 import math
@@ -79,10 +79,11 @@ if __name__ == "__main__":
     print(solve([3, 1, 4, 1, 5]))
 ```

-**Rules for EVOLVE-BLOCK markers:**
+**Guidelines for EVOLVE-BLOCK markers:**
 - Both `# EVOLVE-BLOCK-START` and `# EVOLVE-BLOCK-END` are required as a pair
 - Multiple blocks are supported for evolving different sections
-- If omitted, OpenEvolve wraps the entire file (less control)
+- If omitted, the LLM may modify any part of the file (less control)
+- These markers are conventions that guide the LLM prompt — they are not enforced at the tool level
 - Keep the code between markers self-contained — imports and helpers go outside

 #### evaluator.py
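The concepts table specifies the evaluator contract as `evaluate(program_path) -> dict`, and the commit message notes that non-numeric fields were removed from the example metrics dict. A minimal sketch of an evaluator consistent with both points follows; the module-loading scheme, the `solve` entry point, the scoring rule, and the `combined_score` key are illustrative assumptions, not the repo's actual evaluator:

```python
import importlib.util


def evaluate(program_path):
    """Score one evolved candidate. Every value in the returned dict stays numeric."""
    # Load the candidate program from its file path (hypothetical loading scheme)
    spec = importlib.util.spec_from_file_location("candidate", program_path)
    module = importlib.util.module_from_spec(spec)
    try:
        spec.loader.exec_module(module)
        data = [3, 1, 4, 1, 5]
        # Illustrative scoring: full marks if the candidate sorts correctly
        score = 1.0 if module.solve(data) == sorted(data) else 0.0
    except Exception:
        # Broken candidates get the minimum score; error text belongs in
        # artifacts rather than here, since metrics must stay numeric
        score = 0.0
    return {"combined_score": score}
```

Keeping the dict numeric-only matches the commit's fix: error strings go through the artifacts channel, not the metrics.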
@@ -202,7 +203,8 @@ evaluator:
 | `database.num_islands` | 5 | Parallel evolving populations |
 | `database.migration_interval` | 50 | Generations between island migration |
 | `evaluator.cascade_evaluation` | true | Multi-stage filtering of bad programs |
-| `evaluator.enable_artifacts` | true | Feed errors/warnings back to LLM |
+| `evaluator.enable_artifacts` | true | Capture and store evaluation artifacts |
+| `prompt.include_artifacts` | true | Include artifacts in LLM prompts (feedback loop) |

 **LLM provider examples:**
@@ -312,28 +314,31 @@ result = evolve_function(
 **Output directory structure:**

 ```
-output/
+my_experiment/output/
 ├── checkpoints/
 │   ├── checkpoint_100/
-│   │   ├── metadata.json            # iteration info, best program ID
-│   │   └── programs/
-│   │       ├── <program_id>.py      # evolved program variants
-│   │       └── ...
+│   │   ├── metadata.json            # database state and iteration info
+│   │   ├── best_program.py          # best evolved code (extension matches language)
+│   │   ├── best_program_info.json   # best program metadata (id, metrics, generation)
+│   │   ├── programs/                # all population programs as JSON
+│   │   │   ├── <program_id>.json
+│   │   │   └── ...
+│   │   └── artifacts/               # evaluation artifacts (if any)
 │   └── checkpoint_200/
-└── evolution_trace.jsonl            # per-iteration log (if enabled)
+└── evolution_trace.jsonl            # per-iteration log (if enabled)
 ```

 **Get the best program from the latest checkpoint:**

 ```bash
 # Find the latest checkpoint
-ls -d output/checkpoints/checkpoint_* | sort -t_ -k2 -n | tail -1
+ls -d my_experiment/output/checkpoints/checkpoint_* | sort -t_ -k2 -n | tail -1

-# Read metadata to find best program ID
-cat output/checkpoints/checkpoint_200/metadata.json | python -m json.tool
+# View the best evolved code directly
+cat my_experiment/output/checkpoints/checkpoint_200/best_program.py

-# View the best evolved code
-cat output/checkpoints/checkpoint_200/programs/<best_program_id>.py
+# Read best program metadata (id, metrics, generation, etc.)
+cat my_experiment/output/checkpoints/checkpoint_200/best_program_info.json | python -m json.tool
 ```

 **Check score progression (if evolution_trace is enabled):**
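The `ls | sort -t_ -k2 -n | tail -1` pipeline above has a straightforward Python equivalent; a sketch assuming the directory layout shown in the tree (the helper name is mine, not part of OpenEvolve):

```python
import json
from pathlib import Path


def latest_checkpoint(output_dir):
    """Return the checkpoint_<N> directory with the highest N (numeric sort,
    mirroring the shell pipeline)."""
    ckpts = Path(output_dir, "checkpoints").glob("checkpoint_*")
    return max(ckpts, key=lambda p: int(p.name.split("_")[-1]))


# Usage, assuming the layout above:
#   best = latest_checkpoint("my_experiment/output")
#   info = json.loads((best / "best_program_info.json").read_text())
#   print(info["metrics"])
```

The numeric key matters: a plain lexicographic sort would rank `checkpoint_900` above `checkpoint_1000`, which is why the shell version also uses `sort -n`.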
@@ -342,7 +347,7 @@ cat output/checkpoints/checkpoint_200/programs/<best_program_id>.py
 # Extract scores from JSONL trace
 python -c "
 import json
-with open('output/evolution_trace.jsonl') as f:
+with open('my_experiment/output/evolution_trace.jsonl') as f:
     for line in f:
         entry = json.loads(line)
         if 'metrics' in entry:
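The hunk shows only part of the `python -c` one-liner; a self-contained version of the same trace-reading idea (the `metrics` key comes from the fragment, the function name and return shape are assumptions) could be:

```python
import json


def read_trace(path):
    """Collect the metrics dict from each entry in an evolution_trace.jsonl file,
    skipping entries that carry no metrics."""
    results = []
    with open(path) as f:
        for line in f:
            entry = json.loads(line)
            if "metrics" in entry:
                results.append(entry["metrics"])
    return results
```

Reading line by line keeps memory flat even for long runs, since JSONL traces grow by one record per iteration.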
@@ -356,7 +361,7 @@ with open('output/evolution_trace.jsonl') as f:
 openevolve-run my_experiment/initial_program.py \
   my_experiment/evaluator.py \
   --config my_experiment/config.yaml \
-  --checkpoint output/checkpoints/checkpoint_200 \
+  --checkpoint my_experiment/output/checkpoints/checkpoint_200 \
   --iterations 100
 ```
@@ -366,7 +371,7 @@ This loads the MAP-Elites population from the checkpoint and runs 100 more itera
 ```bash
 pip install flask
-python scripts/visualizer.py --path output/checkpoints/checkpoint_200/
+python scripts/visualizer.py --path my_experiment/output/checkpoints/checkpoint_200/
 ```

 Opens a web UI with evolution tree, score progression, code diffs, and MAP-Elites grid.
@@ -446,7 +451,7 @@ as `feature_dimensions` in the database config. OpenEvolve maintains Pareto-opti
 | Evolution stuck at same score | Increase `temperature`, add more `num_diverse_programs`, improve system message |
 | Out of memory | Reduce `population_size`, enable `cascade_evaluation` |
 | LLM rate limits | Add `retry_delay: 10` in llm config, or use OptiLLM proxy |
-| Bad evolved code | Enable `enable_artifacts: true` so errors feed back to the LLM |
+| Bad evolved code | Enable `evaluator.enable_artifacts: true` and `prompt.include_artifacts: true` so errors feed back to the LLM |

 ---