Adversarial Co-Evolution Guide

How to set up, run, and debug two-population adversarial co-evolution experiments in GigaEvo.

What Is It

Two MAP-Elites populations evolve in parallel. Each population's fitness depends on how well it performs against the other population's archive. This creates an arms race that drives both populations toward better solutions.

GAN analogy: Pop A = Generator (produces solutions), Pop B = Discriminator (finds flaws). The adversarial pressure forces Pop A toward genuine local optima, not just solutions that look good in isolation.

When to use: When single-population evolution stagnates at local optima, or when you want to co-evolve complementary capabilities (optimizer vs landscape, constructor vs improver, attack vs defense).

Architecture

Pipeline: `pipeline=adversarial_coevo`

Extends the standard pipeline with two new stages and a sync hook:

```text
Standard Pipeline                    Adversarial Pipeline
─────────────────                    ────────────────────
MutationStage                        MutationStage
ValidateCodeStage                    ValidateCodeStage
CallProgramFunction                  CallProgramFunction
                                     FetchOpponentIdsStage      ← NEW (NO_CACHE)
                                     FetchOpponentResultsStage  ← NEW (InputHashCache)
CallValidatorFunction(validate.py)   CallValidatorFunction(evaluate.py)  ← MODIFIED
FetchMetrics                         FetchMetrics
EnsureMetricsStage                   EnsureMetricsStage
CollectorStage                       CollectorStage
                                     MainRunSyncHook  ← NEW (pre-step hook)
```

FetchOpponentIdsStage samples opponent ids fresh on every mutation (stochastic fitness-proportional draw) and is intentionally uncached. FetchOpponentResultsStage executes the picked opponents and is keyed on the id list, so identical id sets reuse cached subprocess outputs.
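The two caching policies can be illustrated with a minimal sketch. It makes two assumptions:

- the opponent archive can be read as a mapping from program id to fitness;
- the cache key ignores the order of ids but keeps duplicates.

`sample_opponent_ids` and `opponent_cache_key` are hypothetical names, not GigaEvo's API.

```python
import hashlib
import json
import random


def sample_opponent_ids(archive: dict[str, float], n: int, rng: random.Random) -> list[str]:
    """Fitness-proportional draw of n opponent ids, with replacement.

    `archive` maps program id -> fitness; negative fitness is treated as 0.
    If every weight is 0, the draw falls back to uniform sampling.
    """
    if not archive or n <= 0:
        return []
    ids = list(archive)
    weights = [max(archive[i], 0.0) for i in ids]
    if sum(weights) <= 0.0:
        return rng.choices(ids, k=n)
    return rng.choices(ids, weights=weights, k=n)


def opponent_cache_key(opponent_ids: list[str]) -> str:
    """Order-insensitive (but duplicate-preserving) hash of the sampled ids."""
    canonical = json.dumps(sorted(opponent_ids))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because the draw is with replacement, a strong elite can appear several times in one sample. Two draws that produce the same multiset of ids map to the same key, so they would reuse the cached subprocess outputs.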

Key Components

| Component | File | Purpose |
|---|---|---|
| `AdversarialPipelineBuilder` | `gigaevo/adversarial/pipeline.py` | Extends `DefaultPipelineBuilder`: replaces `validate.py` with `evaluate.py`, adds `FetchOpponentIdsStage` + `FetchOpponentResultsStage`, and wires the opponent results as `context` into `CallValidatorFunction` |
| `FetchOpponentIdsStage` | `gigaevo/adversarial/stages.py` | Samples `n_opponents` opponent ids from the other population's archive via `OpponentArchiveProvider` on every call. Marked `NO_CACHE` because sampling is stochastic |
| `FetchOpponentResultsStage` | `gigaevo/adversarial/stages.py` | Reads the opponent code for the sampled ids, executes each `entrypoint()` in parallel subprocesses (via `run_exec_runner`), and returns the results as context. Cached by a hash of the id set |
| `RedisOpponentArchiveProvider` | `gigaevo/adversarial/opponent_provider.py` | Reads opponent programs from the other population's MAP-Elites archive in Redis, with fitness-proportional sampling. Cached with a 30 s TTL (`opponent_provider.cache_ttl`) |
| `MainRunSyncHook` | `gigaevo/prompts/coevolution/sync.py` | Pre-step hook: after each generation, blocks the engine until the opponent population has also advanced by at least 1 generation. Polls `{prefix}:run_state engine:total_generations` |
| Hydra config | `config/pipeline/adversarial_coevo.yaml` | Ties all of the above together |
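`RedisOpponentArchiveProvider` is described as caching its opponent list for 30 s. The sketch below shows that kind of time-to-live cache. `TTLCache` is a hypothetical class, and the clock is injectable so the cache can be tested. In practice, `loader` would be whatever reads the opponent archive from Redis.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class TTLCache(Generic[T]):
    """Hold one loaded value for `ttl` seconds, then reload it via `loader`."""

    def __init__(self, loader: Callable[[], T], ttl: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader
        self._ttl = ttl
        self._clock = clock
        self._value: Optional[T] = None
        self._loaded_at: Optional[float] = None

    def get(self) -> T:
        now = self._clock()
        # Reload on first use or once the cached value is at least `ttl` seconds old.
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._value = self._loader()
            self._loaded_at = now
        return self._value  # type: ignore[return-value]
```

One consequence is worth keeping in mind. A new elite in the opponent archive can take up to `cache_ttl` seconds to become visible as an opponent.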

Data Flow

```text
Pop A process (redis.db=1)              Pop B process (redis.db=2)
┌───────────────────────┐               ┌───────────────────────┐
│ CallProgramFunction   │               │ CallProgramFunction   │
│   → program_output    │               │   → program_output    │
│                       │               │                       │
│ FetchOpponentResults  │◄──── reads ───│ archive (Pop B DB 2)  │
│   → opponent_results  │               │                       │
│                       │               │ FetchOpponentResults  │
│ archive (Pop A DB 1) ─┼─── reads ────►│   → opponent_results  │
│                       │               │                       │
│ CallValidatorFunction │               │ CallValidatorFunction │
│   evaluate.py(        │               │   evaluate.py(        │
│     opponent_results, │               │     opponent_results, │
│     program_output)   │               │     program_output)   │
│   → metrics dict      │               │   → metrics dict      │
└───────────────────────┘               └───────────────────────┘
         ▲                                        ▲
         └──── MainRunSyncHook ────────────────────┘
               (lockstep: wait for opponent gen)
```
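The lockstep behaviour of `MainRunSyncHook` amounts to a bounded polling loop. The sketch below is an illustration under assumptions, not the hook's code:

- `wait_for_opponent` and `read_opponent_generation` are hypothetical names.
- The defaults mirror `pre_step_hook.timeout` and `pre_step_hook.poll_interval`.
- If `{prefix}:run_state` is a Redis hash (also an assumption), the reader could be `lambda: int(r.hget(f"{prefix}:run_state", "engine:total_generations") or 0)` with a redis-py client `r`.

```python
import time
from typing import Callable


def wait_for_opponent(
    read_opponent_generation: Callable[[], int],
    target_generation: int,
    timeout: float = 7200.0,
    poll_interval: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll until the opponent reaches target_generation; return False on timeout."""
    deadline = clock() + timeout
    while True:
        if read_opponent_generation() >= target_generation:
            return True
        if clock() >= deadline:
            return False  # the opponent is stuck or crashed: caller should log a warning
        sleep(poll_interval)
```

A natural target is the opponent's last observed generation plus one. That matches the "advanced by at least 1 generation" rule in the table above.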

How to Create a New Adversarial Problem

Step 1: Create Two Population Directories

```text
problems/<your_problem>/
├── pop_a/
│   ├── evaluate.py          # REQUIRED
│   ├── metrics.yaml         # REQUIRED
│   ├── task_description.txt # REQUIRED
│   ├── helper.py            # optional shared utilities
│   ├── initial_programs/    # REQUIRED: at least 1 seed .py
│   │   └── seed.py
│   └── fallback/            # RECOMMENDED: cold-start opponents
│       └── simple.py
│
└── pop_b/
    ├── evaluate.py
    ├── metrics.yaml
    ├── task_description.txt
    ├── helper.py
    ├── initial_programs/
    │   └── seed.py
    └── fallback/
        └── simple.py
```
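A quick pre-flight check for this layout can catch missing files before launch. This is a hypothetical helper, not part of GigaEvo. It:

- treats the three top-level files and at least one seed as required;
- reports a missing fallback/ only as a warning;
- uses `ast` to confirm that each seed defines a top-level `entrypoint()`.

```python
import ast
from pathlib import Path

REQUIRED_FILES = ("evaluate.py", "metrics.yaml", "task_description.txt")


def _py_files(directory: Path) -> list[Path]:
    return sorted(directory.glob("*.py")) if directory.is_dir() else []


def defines_entrypoint(source: str) -> bool:
    """True if the module defines a top-level function named entrypoint."""
    return any(
        isinstance(node, ast.FunctionDef) and node.name == "entrypoint"
        for node in ast.parse(source).body
    )


def check_population_dir(pop_dir: Path) -> list[str]:
    """Return problems with one pop_a/ or pop_b/ directory (empty list = looks OK)."""
    problems = [f"missing {name}" for name in REQUIRED_FILES if not (pop_dir / name).is_file()]
    seeds = _py_files(pop_dir / "initial_programs")
    if not seeds:
        problems.append("initial_programs/ has no seed .py file")
    for seed in seeds:
        if not defines_entrypoint(seed.read_text()):
            problems.append(f"{seed.name} does not define entrypoint()")
    if not _py_files(pop_dir / "fallback"):
        problems.append("warning: fallback/ has no .py files (recommended for cold start)")
    return problems
```

Running it on both `pop_a/` and `pop_b/` before launch is cheap. An empty list from each means the layout matches the tree above.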

Step 2: Write evaluate.py

Signature (both populations):

```python
def evaluate(opponent_results: list, program_output: object) -> dict[str, float]:
    """
    Args:
        opponent_results: list of outputs from opponent population's entrypoint()
        program_output: output of this population's entrypoint()

    Returns:
        dict with at least 'fitness' and 'is_valid' keys.
        All values must be float.
    """
```

Design rules:

opponent_results contains the raw return values of the opponent's entrypoint() function
Handle empty opponent_results gracefully (cold start)
Return a sentinel dict with is_valid: 0.0 for invalid programs (all values must be floats)
Track actual_fitness (raw objective) separately from fitness (selection metric)
Ensure fitness is in [0, 1] for consistent MAP-Elites behavior
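The rules above can be enforced in one place. This is a minimal sketch, assuming evaluate() returns a plain dict:

- `INVALID`, `finalize_metrics`, and `guarded` are hypothetical names.
- Only `fitness` is clamped, so a raw `actual_fitness` keeps its natural scale.
- This guide does not say whether GigaEvo already turns exceptions in evaluate.py into invalid programs, so the wrapper errs on the side of catching them.

```python
import math
from typing import Any, Callable

# Hypothetical sentinel returned for programs that crash or produce unusable output.
INVALID: dict[str, float] = {"fitness": 0.0, "is_valid": 0.0}


def finalize_metrics(raw: dict[str, Any]) -> dict[str, float]:
    """Coerce every metric to float, reject non-finite values, clamp fitness to [0, 1]."""
    try:
        metrics = {key: float(value) for key, value in raw.items()}
    except (TypeError, ValueError):
        return dict(INVALID)
    if "fitness" not in metrics or not all(math.isfinite(v) for v in metrics.values()):
        return dict(INVALID)
    metrics["fitness"] = min(max(metrics["fitness"], 0.0), 1.0)
    metrics.setdefault("is_valid", 1.0)
    return metrics


def guarded(evaluate: Callable[[list, object], dict]) -> Callable[[list, object], dict[str, float]]:
    """Wrap an evaluate() so that any exception becomes the INVALID sentinel."""
    def wrapper(opponent_results: list, program_output: object) -> dict[str, float]:
        try:
            return finalize_metrics(evaluate(opponent_results, program_output))
        except Exception:
            return dict(INVALID)
    return wrapper
```

Decorating the real `evaluate` with `@guarded` keeps the problem-specific logic free of float-casting and bounds checks.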

Fitness design pattern (Prover/Improver):

```python
# Pop A: quality + resistance to improvement
fitness = ALPHA * quality + (1 - ALPHA) * resistance

# Pop B: mean improvement achieved
fitness = mean(normalized_improvements)
```

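Zero-sum property: In this pattern, resistance = 1 - mean_improvement, so for any (Pop A, Pop B) pair, resistance + mean_improvement = 1.0. Every unit of improvement Pop B achieves is a unit of resistance Pop A loses, which is what creates the adversarial pressure. Only the resistance term is strictly zero-sum, because Pop A's fitness also rewards quality.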
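A toy instance of this pattern makes the zero-sum bookkeeping concrete. Everything here is hypothetical:

- the 1-D objective;
- `ALPHA = 0.5`;
- the definition of a normalized improvement as the fraction of remaining headroom an improver captures.

Pop B programs return an `improve()` callable. Pop A's `opponent_results` are therefore callables, and Pop B's are Pop A solutions. The cold-start branches follow the handling described under Cold Start Race Condition.

```python
from statistics import mean
from typing import Callable

ALPHA = 0.5  # hypothetical quality/resistance weight for Pop A


def objective(x: float) -> float:
    """Toy raw objective (hypothetical): 1.0 at x = 3, 0.0 at x = 0 or x = 6."""
    x = min(max(x, 0.0), 6.0)
    return 1.0 - (x - 3.0) ** 2 / 9.0


def normalized_improvement(solution: float, improve: Callable[[float], float]) -> float:
    """Fraction of the remaining headroom an improver captures, clipped to [0, 1]."""
    before = objective(solution)
    headroom = 1.0 - before
    if headroom <= 0.0:
        return 0.0  # already optimal: nothing left to improve
    after = objective(improve(solution))
    return min(max((after - before) / headroom, 0.0), 1.0)


def evaluate_pop_a(opponent_results: list, program_output: float) -> dict[str, float]:
    """Pop A: program_output is a solution; opponents are improve() callables."""
    quality = objective(program_output)
    if not opponent_results:  # cold start: quality only, resistance reported as 1.0
        return {"fitness": quality, "is_valid": 1.0, "quality": quality, "resistance": 1.0}
    resistance = 1.0 - mean(normalized_improvement(program_output, f) for f in opponent_results)
    fitness = ALPHA * quality + (1 - ALPHA) * resistance
    return {"fitness": fitness, "is_valid": 1.0, "quality": quality, "resistance": resistance}


def evaluate_pop_b(opponent_results: list, program_output: Callable[[float], float]) -> dict[str, float]:
    """Pop B: program_output is improve(); opponents are Pop A solutions."""
    if not opponent_results:  # cold start: nothing to improve, so invalid
        return {"fitness": 0.0, "is_valid": 0.0}
    improvement = mean(normalized_improvement(x, program_output) for x in opponent_results)
    return {"fitness": improvement, "is_valid": 1.0}
```

Take a single pair: the solution `0.0` against an improver that always returns `1.5`. The improver captures 0.75 of the headroom, so Pop A's resistance is 0.25 and Pop B's fitness is 0.75, and the two sum to 1.0.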

Step 3: Write metrics.yaml

Each population needs its own metrics.yaml. Required metrics:

```yaml
specs:
  fitness:
    description: "Primary selection metric"
    is_primary: true
    higher_is_better: true
    lower_bound: 0.0
    upper_bound: 1.0
    # ... other fields
  is_valid:
    description: "Validity flag"
    is_primary: false
    # ...
```

Add any additional tracking metrics (quality, resistance, n_opponents, etc.). Only metrics declared in metrics.yaml pass through EnsureMetricsStage — extras are silently dropped.

Step 4: Write Seeds and Fallbacks

Seed programs (initial_programs/seed.py): The program(s) each population starts from; at least one is required. Each must define entrypoint().

Fallback programs (fallback/): Used during cold start when the opponent archive is empty. Should provide a basic opponent so the population can begin evolving meaningfully.

Pop A fallback = simple Pop B implementations (so Pop A has something to resist)
Pop B fallback = simple Pop A implementations (so Pop B has something to improve)
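The cold-start substitution can be pictured as "use the archive if it has anything, otherwise use fallback/". This guide does not document exactly where GigaEvo makes that switch, so treat this as an illustration only:

- `opponent_sources` and `run_entrypoint` are hypothetical.
- The sketch runs `entrypoint()` in-process, whereas `FetchOpponentResultsStage` runs it in subprocesses with `per_opponent_timeout`.

```python
from pathlib import Path


def opponent_sources(archive_codes: list[str], fallback_dir: Path) -> list[str]:
    """Use archive programs when available, else the fallback/ programs (cold start)."""
    if archive_codes:
        return archive_codes
    if not fallback_dir.is_dir():
        return []
    return [path.read_text() for path in sorted(fallback_dir.glob("*.py"))]


def run_entrypoint(code: str) -> object:
    """Execute program source in a fresh namespace and return entrypoint()'s result.

    In-process for illustration only: no isolation and no timeout.
    """
    namespace: dict[str, object] = {"__name__": "opponent_program"}
    exec(compile(code, "<opponent>", "exec"), namespace)
    entrypoint = namespace.get("entrypoint")
    if not callable(entrypoint):
        raise ValueError("program does not define entrypoint()")
    return entrypoint()
```

For Pop A, the fallback programs are simple Pop B implementations. Running one therefore yields an `improve()`-style callable for Pop A to resist.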

Step 5: Write task_description.txt

Describe the adversarial game, not implementation details:

What role does this population play?
What does the opponent do?
What makes a good solution?
What constraints must be satisfied?

Do NOT list strategies or hardcode timeouts — let the LLM discover approaches.

How to Configure and Launch

Required Hydra Overrides

Each run needs these overrides:

```bash
python run.py \
  problem.name=<your_problem>/pop_a \
  pipeline=adversarial_coevo \
  redis.db=<DB_A> \
  opponent_redis_db=<DB_B> \
  opponent_redis_prefix=<your_problem>/pop_b \
  pipeline_builder.per_opponent_timeout=<seconds>
```

Critical: per_opponent_timeout is nested under pipeline_builder, NOT top-level. Using per_opponent_timeout=300 silently falls back to the default (10s).

experiment.yaml Structure

For N=2 replicate pairs, you need 4 runs:

```yaml
runs:
  - label: P1_A
    db: 1
    prefix: <problem>/pop_a
    pipeline: adversarial_coevo
    problem_name: <problem>/pop_a
    condition: "Pair 1: Pop A"
    extra_overrides:
      - opponent_redis_db=2
      - opponent_redis_prefix=<problem>/pop_b
      - pipeline_builder.per_opponent_timeout=300

  - label: P1_B
    db: 2
    prefix: <problem>/pop_b
    pipeline: adversarial_coevo
    problem_name: <problem>/pop_b
    condition: "Pair 1: Pop B"
    extra_overrides:
      - opponent_redis_db=1
      - opponent_redis_prefix=<problem>/pop_a
      - pipeline_builder.per_opponent_timeout=300

  # Pair 2: same config, different DBs (3, 4)
```

Label naming: Use underscores, not hyphens (e.g. P1_A, not P1-A). Labels end up in bash variable names in launch.sh, and hyphens are not valid there.
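Writing the runs list by hand gets error-prone as the number of pairs grows. Below is a hypothetical generator for the structure above:

- It assumes pair k uses Redis DBs 2k-1 and 2k, as in the example.
- `experiment_runs` and `is_bash_safe` are not part of GigaEvo.
- If experiment.yaml needs nothing beyond `runs:`, the result could be written out with PyYAML, e.g. `yaml.safe_dump({"runs": runs}, sort_keys=False)`.

```python
import re


def is_bash_safe(label: str) -> bool:
    """True if label is usable in a bash variable name (letters, digits, underscores)."""
    return re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", label) is not None


def experiment_runs(problem: str, n_pairs: int, timeout: int = 300, first_db: int = 1) -> list[dict]:
    """Build `runs:` entries for n_pairs (Pop A, Pop B) pairs on consecutive Redis DBs."""
    runs = []
    for k in range(1, n_pairs + 1):
        db_a = first_db + 2 * (k - 1)
        db_b = db_a + 1
        # Each side points at the other side's DB and prefix.
        for side, db, opp_side, opp_db in (("a", db_a, "b", db_b), ("b", db_b, "a", db_a)):
            runs.append({
                "label": f"P{k}_{side.upper()}",
                "db": db,
                "prefix": f"{problem}/pop_{side}",
                "pipeline": "adversarial_coevo",
                "problem_name": f"{problem}/pop_{side}",
                "condition": f"Pair {k}: Pop {side.upper()}",
                "extra_overrides": [
                    f"opponent_redis_db={opp_db}",
                    f"opponent_redis_prefix={problem}/pop_{opp_side}",
                    f"pipeline_builder.per_opponent_timeout={timeout}",
                ],
            })
    return runs
```

Generated labels are always of the form `P{k}_A`/`P{k}_B`. `is_bash_safe` is mainly useful for checking labels written by hand.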

Config Defaults (adversarial_coevo.yaml)

| Key | Default | Notes |
|---|---|---|
| `opponent_provider.cache_ttl` | 30.0 | Seconds to cache opponent list |
| `pipeline_builder.n_opponents` | 5 | Opponents per evaluation |
| `pipeline_builder.per_opponent_timeout` | 10.0 | Seconds per opponent subprocess |
| `pre_step_hook.timeout` | 7200.0 | Max seconds to wait for opponent sync |
| `pre_step_hook.poll_interval` | 5.0 | Seconds between sync polls |

Monitoring

Status Check

```bash
gigaevo status -e <task>/<name>
```

Key things to watch:

Generation parity: Both populations should be within ~2 generations of each other. Large gaps indicate sync hook issues.
n_opponents: Should be > 0 after gen 1. If stuck at 0, the opponent archive is empty.
Invalid%: High invalidity (>50%) is normal for gen 1-2, but if it persists, check evaluate.py error handling.
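The rules of thumb above can be turned into a small check on one population's status snapshot. This is a hypothetical helper, with a few interpretive choices:

- "After gen 1" is read as generation > 1.
- "Normal for gen 1-2" is read as a grace period of 2 generations.
- Reading the numbers from `gigaevo status` or Redis is not shown.

```python
def health_warnings(own_gen: int, opponent_gen: int, n_opponents: float, invalid_frac: float,
                    max_gap: int = 2, invalid_limit: float = 0.5, grace_gens: int = 2) -> list[str]:
    """Apply the monitoring rules of thumb to one population's status snapshot."""
    warnings = []
    gap = abs(own_gen - opponent_gen)
    if gap > max_gap:
        warnings.append(f"generation gap {gap} > {max_gap}: check MainRunSyncHook")
    if own_gen > 1 and n_opponents == 0:
        warnings.append("n_opponents is 0 after gen 1: opponent archive looks empty")
    if own_gen > grace_gens and invalid_frac > invalid_limit:
        warnings.append(f"invalid fraction {invalid_frac:.0%} persists past gen {grace_gens}: "
                        "check evaluate.py error handling")
    return warnings
```

Running it for both populations of a pair at each status check gives a quick signal of which log to read first.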

Sync Diagnostics

```bash
grep "MainRunSyncHook" experiments/<task>/<name>/<logfile>.log
```

If you see timeout warnings, the opponent population is stuck. Check its log for errors.

Fitness Interpretation

fitness (selection metric): May be volatile due to arms race dynamics. Pop A fitness can drop when Pop B improves.
actual_fitness (raw objective): Should trend upward over generations. This is the real measure of progress.
resistance (Pop A): Should increase as Pop A finds harder-to-improve solutions.
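To check the "should trend upward" expectation numerically rather than by eye, one option is the least-squares slope of a per-generation `actual_fitness` series, for example the best value in each generation. `trend_slope` is a hypothetical helper. Extracting the series from GigaEvo's logs or Redis is not shown.

```python
def trend_slope(values: list[float]) -> float:
    """Least-squares slope of values against generation index 0..n-1 (0.0 if n < 2)."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var
```

A positive slope on `actual_fitness` alongside a noisy `fitness` series is consistent with the arms-race picture described above.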

Gotchas and Troubleshooting

1. Hydra Config Nesting

per_opponent_timeout is nested under pipeline_builder:

```bash
pipeline_builder.per_opponent_timeout=300  # Correct
per_opponent_timeout=300                    # WRONG — silently uses default 10s
```
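A small lint over override strings can catch this before launch. This sketch is hypothetical:

- Its key map is taken from the defaults table in this guide and leaves out the ambiguous `timeout`.
- It strips Hydra's `+`, `++`, and `~` override prefixes before comparing keys.

```python
# Keys that this guide lists under a config group, mapped to that group.
NESTED_KEYS = {
    "per_opponent_timeout": "pipeline_builder",
    "n_opponents": "pipeline_builder",
    "cache_ttl": "opponent_provider",
    "poll_interval": "pre_step_hook",
}


def misplaced_overrides(overrides: list[str]) -> list[str]:
    """Return a message for each override that sets a nested key at the top level."""
    bad = []
    for override in overrides:
        key = override.split("=", 1)[0].lstrip("+~")
        if key in NESTED_KEYS:
            bad.append(f"{override!r}: use {NESTED_KEYS[key]}.{key}=...")
    return bad
```

Running it over each run's `extra_overrides` before launching turns a silent fallback to the default into an explicit error message.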

2. Cloudpickle Deserialization

When programs return callables (e.g. Pop B returns an improve() function), the callable is serialized with cloudpickle in the subprocess and deserialized in the parent. If the callable references a module-level import (e.g. from helper import foo), cloudpickle stores that module by name rather than by value. sys.path must therefore include the problem directory at deserialization time.

The wrapper.py worker pool handles this via _prepend_sys_path(python_path) before cloudpickle.loads(). If you see ModuleNotFoundError during opponent execution, check that python_path is being passed correctly.
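The failure mode can be reproduced with the standard library. The sketch uses `pickle` as a stand-in for cloudpickle: both store a function from an importable module by reference (module name plus qualified name), so that module must be importable when the bytes are loaded. `sys_path_prepended` is a hypothetical analogue of `_prepend_sys_path`, whose real implementation is not shown in this guide.

```python
import contextlib
import importlib
import pickle
import sys
import tempfile
from pathlib import Path
from typing import Iterator

MODULE = "coevo_helper_demo"  # hypothetical stand-in for a problem's helper.py


@contextlib.contextmanager
def sys_path_prepended(path: str) -> Iterator[None]:
    """Temporarily put `path` at the front of sys.path."""
    sys.path.insert(0, path)
    try:
        yield
    finally:
        if path in sys.path:
            sys.path.remove(path)


def demo() -> tuple[bool, int]:
    """Pickle a helper function, then load it without and with the problem dir on sys.path."""
    sys.modules.pop(MODULE, None)
    with tempfile.TemporaryDirectory() as problem_dir:
        Path(problem_dir, f"{MODULE}.py").write_text("def foo(x):\n    return 2 * x\n")
        with sys_path_prepended(problem_dir):
            payload = pickle.dumps(importlib.import_module(MODULE).foo)  # stored by reference
        sys.modules.pop(MODULE)  # simulate a parent process that never imported the helper
        try:
            pickle.loads(payload)
            failed_without_path = False
        except ModuleNotFoundError:
            failed_without_path = True
        with sys_path_prepended(problem_dir):
            foo = pickle.loads(payload)  # succeeds once the problem dir is importable
        sys.modules.pop(MODULE, None)
        return failed_without_path, foo(21)
```

The first `loads` fails with the same `ModuleNotFoundError` described above. The second succeeds because the directory is on `sys.path` for the duration of the call.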

3. Cold Start Race Condition

At gen 0, both populations have empty archives. Each tries to fetch opponents from the other's archive and gets nothing. This is handled by:

Pop A: if not opponent_results: fitness = quality (quality-only, resistance=1.0)
Pop B: if not opponent_results: return INVALID (can't evaluate without opponents)

Pop B seeds should still be able to run their entrypoint() to produce a callable, even if they can't be evaluated yet. The fallback directory provides simple opponents for initial evaluation.

4. Stale Processes from Worktrees

If you run experiments from git worktrees (.claude/worktrees/), stale run.py processes may write to Redis DBs that you're trying to flush. Use tools/flush.py, which detects both main-repo and worktree processes.

5. EnsureMetricsStage Drops Extras

Only metrics declared in metrics.yaml pass through EnsureMetricsStage. If evaluate.py returns a metric not in metrics.yaml, it is silently dropped. Always verify your metrics.yaml matches your evaluate.py output keys.
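A minimal consistency check, assuming you have a sample output of evaluate() and the parsed `specs:` mapping (for example via PyYAML: `yaml.safe_load(path.read_text())["specs"]`). `metric_mismatches` is a hypothetical helper. It reports two lists:

- `"dropped"`: keys EnsureMetricsStage would discard;
- `"missing"`: declared metrics the evaluator never produced, which are worth reviewing too.

```python
def metric_mismatches(returned: dict[str, float], specs: dict[str, dict]) -> dict[str, list[str]]:
    """Compare evaluate() output keys with the `specs:` section of metrics.yaml."""
    declared = set(specs)
    produced = set(returned)
    return {
        "dropped": sorted(produced - declared),  # returned but undeclared: silently dropped
        "missing": sorted(declared - produced),  # declared but never returned
    }
```

Running this once per population on a sample output (the cold-start path included) catches most mismatches before a long run.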

Example Adversarial Setups

| Problem | Pop A Role | Pop B Role |
|---|---|---|
| `adversarial/optimizer_v2` | Optimizer (minimize f) | Landscape designer (make deceptive f) |

Code Reference

| What | Where |
|---|---|
| Pipeline builder | `gigaevo/adversarial/pipeline.py` |
| `FetchOpponentIdsStage` / `FetchOpponentResultsStage` | `gigaevo/adversarial/stages.py` |
| `OpponentArchiveProvider` | `gigaevo/adversarial/opponent_provider.py` |
| Sync hook | `gigaevo/prompts/coevolution/sync.py` |
| Pipeline config | `config/pipeline/adversarial_coevo.yaml` |
| Tests | `tests/adversarial_pipeline/` |
