Commit 1775f22

AGENTS.md, CI upgrade, .env.example, table report, error handling
- AGENTS.md: project conventions for future coding sessions
- .github/workflows/ci.yml: validate all modules import + benchmarks.json schema
- .env.example: template for LLM_API_KEY / LLM_MODEL / LLM_BASE_URL
- analysis/table_report.py: ASCII table summaries from run logs
- generator.py: LLM timeout, empty response fallback, empty project guard
- mutation_engine.py: empty/short prompt guards, crossover short-word guard
1 parent 0355884 commit 1775f22

10 files changed

Lines changed: 457 additions & 65 deletions

.env.example

Lines changed: 10 additions & 16 deletions
@@ -1,20 +1,14 @@
-# LLM Provider Configuration
-# Pick one provider and set the corresponding variables.
+# LLM provider configuration for Grounded Evolution
+# The grounded loop (infinite_research_loop.py) requires LLM_API_KEY.
+# Lexical loop (eval.py / auto_evolve.py) does not use the LLM.
 
-# --- Mistral AI (default) ---
-LLM_API_KEY="your_mistral_api_key"
+# Required for grounded evolution
+LLM_API_KEY="your_api_key_here"
+
+# Model selection (default: mistral-large-latest)
 LLM_MODEL="mistral-large-latest"
-LLM_BASE_URL="https://api.mistral.ai/v1"
 
-# --- OpenAI ---
-# LLM_API_KEY="sk-..."
-# LLM_MODEL="gpt-4o"
+# API base URL: Mistral, OpenAI, or any OpenAI-compatible endpoint
+LLM_BASE_URL="https://api.mistral.ai/v1"
 # LLM_BASE_URL="https://api.openai.com/v1"
-
-# --- Ollama (local) ---
-# LLM_API_KEY="ollama"
-# LLM_MODEL="qwen2.5:7b"
-# LLM_BASE_URL="http://localhost:11434/v1"
-
-# --- Alternative: OPENAI_API_KEY (used by generator.py fallback) ---
-# OPENAI_API_KEY="sk-..."
+# LLM_BASE_URL="http://localhost:11434/v1" # Ollama
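
Review note: the template above implies one required variable and two optional ones with defaults. A minimal sketch of how a consumer might read them, assuming the defaults listed in AGENTS.md; `LLMConfig` and `load_llm_config` are hypothetical names for illustration and are not taken from `generator.py`:

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LLMConfig:
    """Hypothetical container for the three LLM_* settings."""

    api_key: str
    model: str
    base_url: str


def load_llm_config(env: Optional[Mapping[str, str]] = None) -> LLMConfig:
    """Read LLM settings from the environment, failing fast if the key is missing."""
    source: Mapping[str, str] = os.environ if env is None else env
    api_key: str = source.get("LLM_API_KEY", "").strip()
    if not api_key:
        # Only the grounded loop needs the key; the lexical loop would skip this call.
        raise RuntimeError("LLM_API_KEY is required for the grounded loop")
    return LLMConfig(
        api_key=api_key,
        # Defaults as documented in AGENTS.md (assumed to match the code).
        model=source.get("LLM_MODEL", "mistral-large-latest"),
        base_url=source.get("LLM_BASE_URL", "https://api.mistral.ai/v1"),
    )
```

Passing a plain dict instead of `os.environ` keeps the function easy to exercise without touching the real environment.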

.github/workflows/ci.yml

Lines changed: 46 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -1,11 +1,15 @@
1-
name: CI
1+
name: Validate
22

33
on:
44
push:
55
branches: [main]
66
pull_request:
77
branches: [main]
88

9+
concurrency:
10+
group: ${{ github.workflow }}-${{ github.ref }}
11+
cancel-in-progress: true
12+
913
jobs:
1014
lint:
1115
runs-on: ubuntu-latest
@@ -15,14 +19,51 @@ jobs:
1519
with:
1620
python-version: "3.12"
1721
- run: pip install ruff
18-
- run: ruff check . --ignore=E501,F821
22+
- run: ruff check . --ignore=E501,F821 --statistics
23+
24+
imports:
25+
runs-on: ubuntu-latest
26+
steps:
27+
- uses: actions/checkout@v4
28+
- uses: actions/setup-python@v5
29+
with:
30+
python-version: "3.12"
31+
- run: pip install openai
32+
- name: Check all modules load
33+
run: |
34+
for mod in generator population_manager mutation_engine; do
35+
echo "=== $mod ==="
36+
python -c "import $mod; print('OK')"
37+
done
38+
for mod in evaluator.runtime_evaluator; do
39+
echo "=== $mod ==="
40+
python -c "import $mod; print('OK')"
41+
done
42+
- name: Check scripts parse (no-exec)
43+
run: |
44+
for script in infinite_research_loop.py run_experiment.py beautify_readme.py \
45+
eval.py auto_evolve.py; do
46+
echo "=== $script ==="
47+
python -c "compile(open('$script').read(), '$script', 'exec'); print('syntax OK')"
48+
done
1949
20-
evaluate:
50+
experiment_design:
2151
runs-on: ubuntu-latest
2252
steps:
2353
- uses: actions/checkout@v4
2454
- uses: actions/setup-python@v5
2555
with:
2656
python-version: "3.12"
27-
- name: Quick sanity check
28-
run: python -c "from evaluate import evaluate; print('evaluate.py loads OK')"
57+
- name: Validate benchmarks/tasks.json
58+
run: |
59+
python3 -c "
60+
import json
61+
with open('benchmarks/tasks.json') as f:
62+
tasks = json.load(f)
63+
assert len(tasks) >= 3
64+
for t in tasks:
65+
assert 'name' in t
66+
assert 'hidden_test_files' in t
67+
assert len(t['hidden_test_files']) >= 1
68+
print(f'{len(tasks)} benchmarks OK')
69+
"

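Review note: the inline `python3 -c` check in the `experiment_design` job reports failures only as bare `AssertionError`s. A possible standalone equivalent with descriptive errors is sketched below. It checks the same three conditions and nothing more; `validate_tasks` is a hypothetical name, and the sample data is made up because the real contents of `benchmarks/tasks.json` are not part of this diff.

```python
from typing import Any


def validate_tasks(tasks: list[dict[str, Any]], min_tasks: int = 3) -> int:
    """Return the number of tasks, or raise ValueError on the first problem found."""
    if len(tasks) < min_tasks:
        raise ValueError(f"expected at least {min_tasks} tasks, found {len(tasks)}")
    for index, task in enumerate(tasks):
        # Same two keys the CI step asserts on.
        for key in ("name", "hidden_test_files"):
            if key not in task:
                raise ValueError(f"task {index} is missing '{key}'")
        if len(task["hidden_test_files"]) < 1:
            raise ValueError(f"task {task['name']!r} has no hidden test files")
    return len(tasks)


# Made-up sample data standing in for json.load() on benchmarks/tasks.json.
sample: list[dict[str, Any]] = [
    {"name": f"task_{i}", "hidden_test_files": {"test_hidden.py": "..."}}
    for i in range(3)
]
print(f"{validate_tasks(sample)} benchmarks OK")
```

Raising `ValueError` with the task index or name would make a failed CI run easier to diagnose than a bare assertion, at the cost of keeping a small script in the repository.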
AGENTS.md

Lines changed: 55 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,55 @@
1+
# AGENTS.md — Grounded Evolution Conventions
2+
3+
## Project Identity
4+
This is a **research platform** for execution-grounded prompt evolution.
5+
Framing: evolutionary software optimization, NOT AGI/sentience claims.
6+
Repository is public at `NullLabTests/grounded_evolution`.
7+
8+
## Conventions
9+
10+
### Code Style
11+
- Type hints on **all** function signatures and public variables
12+
- No comments unless absolutely necessary (the code should explain itself)
13+
- Max line length: loosely 120 (no hard enforcement, follow existing style)
14+
- Imports: stdlib, then blank line, then third-party, then blank line, then local
15+
(stdlib comes first; no `isort`/`ruff` ordering enforced — be practical)
16+
- Use `Any` from `typing` for dynamic types, never bare generics omitted
17+
- Prefer `Path` from `pathlib` over `os.path`
18+
- File-level docstrings on every `.py` file
19+
20+
### Project Structure
21+
- `generator.py` — LLM code generation, returns `(text, usage_dict)` tuple
22+
- `evaluator/runtime_evaluator.py` — execution-grounded validation (AST, pytest, hidden tests)
23+
- `mutation_engine.py` — prompt mutation/crossover operators
24+
- `mutation.py` / `evaluate.py` / `evolve_forever.py` / `auto_evolve.py` — lexical-only loop (legacy)
25+
- `population_manager.py` — JSON-based population persistence
26+
- `infinite_research_loop.py` — main grounded loop (calls generator → evaluator → population_manager)
27+
- `run_experiment.py` — orchestrated ablation experiments
28+
- `benchmarks/tasks.json` — 3 benchmark definitions with inline hidden test files
29+
- `experiments/` — all experiment output (logs, archives, ablation runs)
30+
31+
### Two Loops
32+
1. **Lexical loop** (`evaluate.py`/`evolve_forever.py`): keyword-matching fitness. Currently at 218 prompts, best score 1000/1000. Less important now.
33+
2. **Grounded loop** (`infinite_research_loop.py`/`generator.py`/`runtime_evaluator.py`): real code execution fitness. This is the primary focus.
34+
35+
### Environment Variables (never hardcode secrets)
36+
- `LLM_API_KEY` — required for grounded loop
37+
- `LLM_MODEL` — model name (default: `mistral-large-latest`)
38+
- `LLM_BASE_URL` — API base URL (default: `https://api.mistral.ai/v1`)
39+
40+
### Testing
41+
- No test suite for the project itself yet (TODO for future)
42+
- Hidden benchmark tests live in `benchmarks/tasks.json` as `hidden_test_files` dict
43+
- Rust-based tests (`cargo test`) exist in the `generated_projects/` output (not our code)
44+
45+
### Git
46+
- Auto-commits on score improvement from the grounded loop
47+
- Manual commits for structural changes (new features, refactors, docs)
48+
- Commit messages: concise, descriptive, no emoji
49+
50+
### Adding New Features
51+
1. Check if the feature already exists (grep for related terms)
52+
2. Follow the existing pattern (if it's a mutation, add to `mutation_engine.py`)
53+
3. Type hints everywhere
54+
4. Add the new feature to `run_experiment.py` if it's an experimental variable
55+
5. Update EXPERIMENT_DESIGN.md if the experiment protocol changes
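
Review note: the Code Style bullets above are easier to follow with a concrete instance. The sketch below is a hypothetical module, not code from this repository, written to satisfy those bullets as I read them: a file-level docstring, stdlib-first imports, `Path` instead of `os.path`, `Any` for dynamic JSON data, type hints throughout, and no inline comments.

```python
"""Hypothetical helper module illustrating the AGENTS.md code-style conventions."""

import json
from pathlib import Path
from typing import Any


def load_json_records(path: Path) -> list[dict[str, Any]]:
    """Return the JSON records stored at path, or an empty list if the file is absent."""
    if not path.exists():
        return []
    data: Any = json.loads(path.read_text(encoding="utf-8"))
    return data if isinstance(data, list) else [data]
```

There is no third-party or local import group here, so only the stdlib group appears. A module that also imported `openai` would place it after a blank line, per the import-ordering bullet.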

README.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -27,10 +27,10 @@
2727

2828
<!-- EVOLUTION_STATUS_START -->
2929

30-
> **Last Evolution Cycle:** 2026-05-28T17:27:44.326471+00:00 UTC
31-
> **Generation:** 46
30+
> **Last Evolution Cycle:** 2026-05-28T17:39:51.337015+00:00 UTC
31+
> **Generation:** 50
3232
> **Best Score:** 96.0
33-
> **Population Size:** 46
33+
> **Population Size:** 50
3434
3535
<!-- EVOLUTION_STATUS_END -->
3636

analysis/plot_convergence.py

Lines changed: 4 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -188,14 +188,13 @@ def plot_ablation_convergence(conditions: dict[str, list[dict[str, Any]]]) -> No
188188

189189
def main() -> None:
190190
"""Main entry point."""
191+
global ROLLING_WINDOW
191192
use_ablation: bool = "--ablation" in sys.argv
192-
rolling_window: int = ROLLING_WINDOW
193+
window: int = ROLLING_WINDOW
193194
for arg in sys.argv:
194195
if arg.startswith("--rolling="):
195-
rolling_window = int(arg.split("=")[1])
196-
197-
global ROLLING_WINDOW
198-
ROLLING_WINDOW = rolling_window
196+
window = int(arg.split("=")[1])
197+
ROLLING_WINDOW = window
199198

200199
if use_ablation:
201200
conditions = load_ablation_runs()
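
Review note: moving `global ROLLING_WINDOW` to the top of `main()` looks like more than a style change. Python rejects a `global` statement that comes after a use of the same name in the same function, and it does so at compile time. The sketch below demonstrates this with trimmed stand-ins for the old and new function bodies. The module-level value `10` is an assumption, since the real default is not visible in this hunk.

```python
# Trimmed stand-in for the old ordering: the name is read before `global`.
OLD_ORDER: str = '''
ROLLING_WINDOW = 10  # assumed default, not taken from the repository

def main():
    window = ROLLING_WINDOW
    global ROLLING_WINDOW
    ROLLING_WINDOW = window
'''

# Trimmed stand-in for the new ordering: `global` comes first.
NEW_ORDER: str = '''
ROLLING_WINDOW = 10  # assumed default, not taken from the repository

def main():
    global ROLLING_WINDOW
    window = ROLLING_WINDOW
    ROLLING_WINDOW = window
'''


def compiles(source: str) -> bool:
    """Return True if the source parses and compiles without a SyntaxError."""
    try:
        compile(source, "<sketch>", "exec")
    except SyntaxError:
        return False
    return True


print("old ordering compiles:", compiles(OLD_ORDER))
print("new ordering compiles:", compiles(NEW_ORDER))
```

If the pre-commit file matched the old ordering shown in the diff, importing `analysis/plot_convergence.py` would have failed outright. This is the kind of error the new "Check scripts parse" CI step would catch for the scripts it lists.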
