How to set up, run, and debug two-population adversarial co-evolution experiments in GigaEvo.
Two MAP-Elites populations evolve in parallel. Each population's fitness depends on how well it performs against the other population's archive. This creates an arms race that drives both populations toward better solutions.
GAN analogy: Pop A = Generator (produces solutions), Pop B = Discriminator (finds flaws). The adversarial pressure forces Pop A toward genuine local optima, not just solutions that look good in isolation.
When to use: When single-population evolution stagnates at local optima, or when you want to co-evolve complementary capabilities (optimizer vs landscape, constructor vs improver, attack vs defense).
Extends the standard pipeline with two new stages and a sync hook:
Standard Pipeline Adversarial Pipeline
───────────────── ────────────────────
MutationStage MutationStage
ValidateCodeStage ValidateCodeStage
CallProgramFunction CallProgramFunction
FetchOpponentIdsStage ← NEW (NO_CACHE)
FetchOpponentResultsStage ← NEW (InputHashCache)
CallValidatorFunction(validate.py) CallValidatorFunction(evaluate.py) ← MODIFIED
FetchMetrics FetchMetrics
EnsureMetricsStage EnsureMetricsStage
CollectorStage CollectorStage
MainRunSyncHook ← NEW (pre-step hook)
FetchOpponentIdsStage samples opponent ids fresh on every mutation (stochastic
fitness-proportional draw) and is intentionally uncached. FetchOpponentResultsStage
executes the picked opponents and is keyed on the id list, so identical id sets
reuse cached subprocess outputs.
| Component | File | Purpose |
|---|---|---|
AdversarialPipelineBuilder |
gigaevo/adversarial/pipeline.py |
Extends DefaultPipelineBuilder: replaces validate.py with evaluate.py, adds FetchOpponentIdsStage + FetchOpponentResultsStage, wires opponent results as context to CallValidatorFunction |
FetchOpponentIdsStage |
gigaevo/adversarial/stages.py |
Samples N opponent ids from the other population's archive via OpponentArchiveProvider on every call. Marked NO_CACHE because sampling is stochastic |
FetchOpponentResultsStage |
gigaevo/adversarial/stages.py |
Reads opponent codes for the sampled ids, executes each entrypoint() in parallel subprocesses (via run_exec_runner), returns results as context. Cached by id-set hash |
RedisOpponentArchiveProvider |
gigaevo/adversarial/opponent_provider.py |
Reads opponent programs from the other population's MAP-Elites archive in Redis. Fitness-proportional sampling. Cached (30s TTL) |
MainRunSyncHook |
gigaevo/prompts/coevolution/sync.py |
Pre-step hook: blocks engine after each generation until the opponent population has also advanced by >= 1 generation. Polls {prefix}:run_state engine:total_generations |
config/pipeline/adversarial_coevo.yaml |
Config | Hydra config tying it all together |
Pop A process (redis.db=1) Pop B process (redis.db=2)
┌───────────────────────┐ ┌───────────────────────┐
│ CallProgramFunction │ │ CallProgramFunction │
│ → program_output │ │ → program_output │
│ │ │ │
│ FetchOpponentResults │◄──── reads ───│ archive (Pop B DB 2) │
│ → opponent_results │ │ │
│ │ │ FetchOpponentResults │
│ archive (Pop A DB 1) ─┼─── reads ────►│ → opponent_results │
│ │ │ │
│ CallValidatorFunction │ │ CallValidatorFunction │
│ evaluate.py( │ │ evaluate.py( │
│ opponent_results, │ │ opponent_results, │
│ program_output) │ │ program_output) │
│ → metrics dict │ │ → metrics dict │
└───────────────────────┘ └───────────────────────┘
▲ ▲
└──── MainRunSyncHook ────────────────────┘
(lockstep: wait for opponent gen)
problems/<your_problem>/
├── pop_a/
│ ├── evaluate.py # REQUIRED
│ ├── metrics.yaml # REQUIRED
│ ├── task_description.txt # REQUIRED
│ ├── helper.py # optional shared utilities
│ ├── initial_programs/ # REQUIRED: at least 1 seed .py
│ │ └── seed.py
│ └── fallback/ # RECOMMENDED: cold-start opponents
│ └── simple.py
│
└── pop_b/
├── evaluate.py
├── metrics.yaml
├── task_description.txt
├── helper.py
├── initial_programs/
│ └── seed.py
└── fallback/
└── simple.py
Signature (both populations):
def evaluate(opponent_results: list, program_output: object) -> dict[str, float]:
"""
Args:
opponent_results: list of outputs from opponent population's entrypoint()
program_output: output of this population's entrypoint()
Returns:
dict with at least 'fitness' and 'is_valid' keys.
All values must be float.
"""Design rules:
opponent_resultscontains the raw return values of the opponent'sentrypoint()function- Handle empty
opponent_resultsgracefully (cold start) - Return a sentinel dict with
is_valid: 0for invalid programs - Track
actual_fitness(raw objective) separately fromfitness(selection metric) - Ensure
fitnessis in [0, 1] for consistent MAP-Elites behavior
Fitness design pattern (Prover/Improver):
# Pop A: quality + resistance to improvement
fitness = ALPHA * quality + (1 - ALPHA) * resistance
# Pop B: mean improvement achieved
fitness = mean(normalized_improvements)Zero-sum property: For any (Pop A, Pop B) pair, resistance + mean_improvement = 1.0. This creates the adversarial pressure — what's good for one is bad for the other.
Each population needs its own metrics.yaml. Required metrics:
specs:
fitness:
description: "Primary selection metric"
is_primary: true
higher_is_better: true
lower_bound: 0.0
upper_bound: 1.0
# ... other fields
is_valid:
description: "Validity flag"
is_primary: false
# ...Add any additional tracking metrics (quality, resistance, n_opponents, etc.). Only metrics declared in metrics.yaml pass through EnsureMetricsStage — extras are silently dropped.
Seed programs (initial_programs/seed.py): The first program each population starts with. Must define entrypoint().
Fallback programs (fallback/): Used during cold start when the opponent archive is empty. Should provide a basic opponent so the population can begin evolving meaningfully.
- Pop A fallback = simple Pop B implementations (so Pop A has something to resist)
- Pop B fallback = simple Pop A implementations (so Pop B has something to improve)
Describe the adversarial game, not implementation details:
- What role does this population play?
- What does the opponent do?
- What makes a good solution?
- What constraints must be satisfied?
Do NOT list strategies or hardcode timeouts — let the LLM discover approaches.
Each run needs these overrides:
python run.py \
problem.name=<your_problem>/pop_a \
pipeline=adversarial_coevo \
redis.db=<DB_A> \
opponent_redis_db=<DB_B> \
opponent_redis_prefix=<your_problem>/pop_b \
pipeline_builder.per_opponent_timeout=<seconds>Critical: per_opponent_timeout is nested under pipeline_builder, NOT top-level. Using per_opponent_timeout=300 silently falls back to the default (10s).
For N=2 replicate pairs, you need 4 runs:
runs:
- label: P1_A
db: 1
prefix: <problem>/pop_a
pipeline: adversarial_coevo
problem_name: <problem>/pop_a
condition: "Pair 1: Pop A"
extra_overrides:
- opponent_redis_db=2
- opponent_redis_prefix=<problem>/pop_b
- pipeline_builder.per_opponent_timeout=300
- label: P1_B
db: 2
prefix: <problem>/pop_b
pipeline: adversarial_coevo
problem_name: <problem>/pop_b
condition: "Pair 1: Pop B"
extra_overrides:
- opponent_redis_db=1
- opponent_redis_prefix=<problem>/pop_a
- pipeline_builder.per_opponent_timeout=300
# Pair 2: same config, different DBs (3, 4)Label naming: Use underscores not hyphens (e.g. P1_A not P1-A). Hyphens break bash variable names in launch.sh.
| Key | Default | Notes |
|---|---|---|
opponent_provider.cache_ttl |
30.0 | Seconds to cache opponent list |
pipeline_builder.n_opponents |
5 | Opponents per evaluation |
pipeline_builder.per_opponent_timeout |
10.0 | Seconds per opponent subprocess |
pre_step_hook.timeout |
7200.0 | Max seconds to wait for opponent sync |
pre_step_hook.poll_interval |
5.0 | Seconds between sync polls |
gigaevo status -e <task>/<name>Key things to watch:
- Generation parity: Both populations should be within ~2 generations of each other. Large gaps indicate sync hook issues.
- n_opponents: Should be > 0 after gen 1. If stuck at 0, the opponent archive is empty.
- Invalid%: High invalidity (>50%) is normal for gen 1-2, but if it persists, check evaluate.py error handling.
grep "MainRunSyncHook" experiments/<task>/<name>/<logfile>.logIf you see timeout warnings, the opponent population is stuck. Check its log for errors.
fitness(selection metric): May be volatile due to arms race dynamics. Pop A fitness can drop when Pop B improves.actual_fitness(raw objective): Should trend upward over generations. This is the real measure of progress.resistance(Pop A): Should increase as Pop A finds harder-to-improve solutions.
per_opponent_timeout is nested under pipeline_builder:
pipeline_builder.per_opponent_timeout=300 # Correct
per_opponent_timeout=300 # WRONG — silently uses default 10s
When programs return callables (e.g. Pop B returns an improve() function), the callable is serialized with cloudpickle in the subprocess and deserialized in the parent. If the callable has closures over module-level imports (e.g. from helper import foo), sys.path must include the problem directory at deserialization time.
The wrapper.py worker pool handles this via _prepend_sys_path(python_path) before cloudpickle.loads(). If you see ModuleNotFoundError during opponent execution, check that python_path is being passed correctly.
At gen 0, both populations have empty archives. Each tries to fetch opponents from the other's archive and gets nothing. This is handled by:
- Pop A:
if not opponent_results: fitness = quality(quality-only, resistance=1.0) - Pop B:
if not opponent_results: return INVALID(can't evaluate without opponents)
Pop B seeds should still be able to run their entrypoint() to produce a callable, even if they can't be evaluated yet. The fallback directory provides simple opponents for initial evaluation.
If you run experiments from git worktrees (.claude/worktrees/), stale run.py processes may write to Redis DBs that you're trying to flush. Use tools/flush.py which detects both main-repo and worktree processes.
Only metrics declared in metrics.yaml pass through EnsureMetricsStage. If evaluate.py returns a metric not in metrics.yaml, it is silently dropped. Always verify your metrics.yaml matches your evaluate.py output keys.
| Problem | Pop A Role | Pop B Role |
|---|---|---|
adversarial/optimizer_v2 |
Optimizer (minimize f) | Landscape designer (make deceptive f) |
| What | Where |
|---|---|
| Pipeline builder | gigaevo/adversarial/pipeline.py |
| FetchOpponentIdsStage / FetchOpponentResultsStage | gigaevo/adversarial/stages.py |
| OpponentArchiveProvider | gigaevo/adversarial/opponent_provider.py |
| Sync hook | gigaevo/prompts/coevolution/sync.py |
| Pipeline config | config/pipeline/adversarial_coevo.yaml |
| Tests | tests/adversarial_pipeline/ |