AI-native-Systems-Research/ai-native-storage-certus-workbench

Certus Workbench

Comparing AI-driven evolution frameworks on systems code optimization.

Run AI-driven evolution frameworks (GEPA, K-Search, SkyDiscover, Nous) and coding agents against the same compiled system target, under the same optimization objective.

Frameworks

Each framework is optional — install only the ones you want to compare.

| Framework | Type | Repo | Paper |
|---|---|---|---|
| GEPA | Reflective Pareto evolution | github.com/gepa-ai/gepa | arxiv:2507.19457 |
| SkyDiscover | Adaptive population (AdaEvolve, EvoX) | github.com/skydiscover-ai/skydiscover | AdaEvolve, EvoX |
| K-Search | World-model guided tree search | github.com/caoshiyi/K-Search | arxiv:2602.19128 |
| ShinkaEvolve | Async population, UCB model selection | github.com/SakanaAI/ShinkaEvolve | arxiv:2509.19349 |
| CORAL | Multi-agent shared memory | github.com/Human-Agent-Society/CORAL | arxiv:2604.01658 |
| AutoScientists | Multi-agent self-organizing teams | github.com/mims-harvard/AutoScientists | arxiv:2605.28655 |
| Nous | Hypothesis-driven controlled experiments | github.com/AI-native-Systems-Research/agentic-strategy-evolution | |
| Coding Agent | Single Claude session, iterative | (built-in) | |

Setup

# Clone this repo and the certus target:
git clone https://github.com/AI-native-Systems-Research/ai-native-storage-certus-workbench.git
git clone https://github.com/AI-native-Systems-Research/ai-native-storage-certus.git -b unstable

# Clone frameworks you want to compare:
git clone https://github.com/gepa-ai/gepa.git
git clone https://github.com/skydiscover-ai/skydiscover.git
git clone https://github.com/caoshiyi/K-Search.git
git clone https://github.com/SakanaAI/ShinkaEvolve.git

# Install workbench:
cd ai-native-storage-certus-workbench
pip install -e .

# Install frameworks (editable from cloned source):
pip install -e ../gepa
pip install -e ../skydiscover
pip install -e ../K-Search
pip install -e ../ShinkaEvolve

Usage

# Run a single framework on a target:
workbench --framework gepa --target certus_p2p --config 1d_1c_1g

# Compare multiple frameworks (same budget, same seed):
workbench-compare --target certus_p2p --config 1d_1c_1g \
    --frameworks gepa,ksearch,skydiscover,nous,coding_agent \
    --budget 15
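
For scripted sweeps, the comparison command above can be assembled programmatically. The sketch below is only an illustration: `compare_command` is a hypothetical helper, not part of the workbench, and it uses only the flags shown in the usage example.

```python
import shlex


def compare_command(target, config, frameworks, budget):
    """Build the argv list for workbench-compare (hypothetical helper)."""
    return [
        "workbench-compare",
        "--target", target,
        "--config", config,
        # The CLI example passes frameworks as one comma-separated value.
        "--frameworks", ",".join(frameworks),
        "--budget", str(budget),
    ]


cmd = compare_command("certus_p2p", "1d_1c_1g", ["gepa", "nous", "coding_agent"], 15)
# Print a shell-safe rendering of the command for logging or copy-paste.
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run(cmd, check=True)` would launch the comparison, assuming the workbench is installed as described in Setup.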

Structure

targets/                    System targets to evolve
  certus_p2p/               SSD→GPU data path optimization
    target.yaml             Files, build command, scoring, hardware ceilings
    evaluate.py             Multi-config evaluator
    configs/                Hardware configurations (drives × clients × GPUs)
    initial_programs/       Seed code per framework format

frameworks/                 Per-framework runner scripts
  gepa.py                   GEPA optimize_anything wrapper
  ksearch.py                K-Search world-model wrapper
  skydiscover.py            SkyDiscover (AdaEvolve/EvoX) wrapper
  nous.py                   Nous campaign orchestrator wrapper
  coding_agent.py           Raw Claude coding agent
  random_baseline.py        Random mutation control

orchestrator/               Run & compare infrastructure
  run.py                    Run one framework on one target+config
  compare.py                Run multiple, generate comparison report
  preflight.py              Smoke-test frameworks before full run

knowledge/                  Accumulated learnings across experiments
  experiment_experience.md  Operational lessons from prior runs
  evolution_strategy.md     Framework comparison analysis

results/                    Experiment outputs
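
The configs/ entry above describes hardware configurations as drives × clients × GPUs, and the usage examples pass a config named `1d_1c_1g`. Assuming (this is an inference, not something the repository states) that such names encode `<drives>d_<clients>c_<gpus>g`, a small parser might look like the following. `HardwareConfig` and `parse_config_name` are hypothetical names.

```python
import re
from typing import NamedTuple


class HardwareConfig(NamedTuple):
    drives: int
    clients: int
    gpus: int


# Assumed naming scheme: "<drives>d_<clients>c_<gpus>g", e.g. "1d_1c_1g".
_CONFIG_RE = re.compile(r"(\d+)d_(\d+)c_(\d+)g")


def parse_config_name(name: str) -> HardwareConfig:
    """Split a config name into its three counts, or raise ValueError."""
    match = _CONFIG_RE.fullmatch(name)
    if match is None:
        raise ValueError(f"unrecognized config name: {name!r}")
    drives, clients, gpus = (int(group) for group in match.groups())
    return HardwareConfig(drives, clients, gpus)
```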

Adding a New Target

Create a folder under targets/ with:

  1. target.yaml — defines files in scope, build command, scoring formula, hardware ceilings
  2. evaluate.py — evaluator script (build → test → bench → score)
  3. configs/ — hardware configuration variants
  4. initial_programs/ — seed code for frameworks that need it
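
The build → test → bench → score flow of an evaluator can be sketched as below. This is a minimal illustration under stated assumptions, not the actual evaluate.py: the stage commands are supplied by the caller, the benchmark is assumed to print a single number on its last output line, and the score is assumed to be that number normalized by a hardware ceiling and capped at 1.0. The real scoring formula lives in target.yaml and may differ.

```python
import subprocess
from dataclasses import dataclass


@dataclass
class EvalResult:
    stage: str    # last stage reached: "build", "test", "bench", or "score"
    ok: bool      # True only if every stage succeeded
    score: float  # 0.0 on any failure


def run_stage(cmd, timeout_s=600):
    """Run one stage command; return (succeeded, captured stdout)."""
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except (OSError, subprocess.TimeoutExpired):
        return False, ""
    return proc.returncode == 0, proc.stdout


def evaluate(build_cmd, test_cmd, bench_cmd, ceiling):
    """Run build, test, and bench in order; stop at the first failure."""
    if ceiling <= 0:
        raise ValueError("ceiling must be positive")
    for stage, cmd in (("build", build_cmd), ("test", test_cmd)):
        ok, _ = run_stage(cmd)
        if not ok:
            return EvalResult(stage, False, 0.0)
    ok, out = run_stage(bench_cmd)
    if not ok:
        return EvalResult("bench", False, 0.0)
    try:
        # Assumption: the benchmark prints its measurement on the last line.
        measured = float(out.strip().splitlines()[-1])
    except (IndexError, ValueError):
        return EvalResult("bench", False, 0.0)
    # Assumed scoring: fraction of the hardware ceiling, capped at 1.0.
    return EvalResult("score", True, min(measured / ceiling, 1.0))
```

Failing fast with a zero score keeps candidates that do not build or pass tests from being benchmarked at all, which is one plausible way to make scores comparable across frameworks.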

Adding a New Framework

Add a script to frameworks/ that:

  1. Accepts a target config (YAML path)
  2. Accepts an iteration budget
  3. Runs the framework's optimization loop
  4. Writes results to results/<target>/<config>/<framework>/

No abstract base class (ABC) is required: any script that takes these arguments and produces scored candidates will do.
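
As a rough sketch of that contract, a runner script could look like the following. Everything here is illustrative: the flag names, the `candidates.json` output file, and the assumption that the YAML path has the form `targets/<target>/configs/<config>.yaml` are hypothetical, and the random score is a stand-in for a real optimization step plus evaluator call.

```python
import argparse
import json
import random
from pathlib import Path


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Hypothetical framework runner")
    parser.add_argument("--target-config", type=Path, required=True)
    parser.add_argument("--budget", type=int, default=15)
    parser.add_argument("--framework", default="my_framework")
    parser.add_argument("--results-root", type=Path, default=Path("results"))
    parser.add_argument("--seed", type=int, default=0)
    return parser.parse_args(argv)


def results_dir(results_root, target_config, framework):
    # Assumed layout: targets/<target>/configs/<config>.yaml
    target = target_config.parent.parent.name
    config = target_config.stem
    return results_root / target / config / framework


def run(args):
    rng = random.Random(args.seed)
    out_dir = results_dir(args.results_root, args.target_config, args.framework)
    out_dir.mkdir(parents=True, exist_ok=True)
    candidates = []
    for iteration in range(args.budget):
        # Placeholder score: a real runner would invoke the framework's
        # optimization step and the target's evaluator here.
        candidates.append({"iteration": iteration, "score": rng.random()})
    out_path = out_dir / "candidates.json"
    out_path.write_text(json.dumps(candidates, indent=2))
    return out_path
```

To use this as a standalone script, call `run(parse_args())` under an `if __name__ == "__main__":` guard.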

Prior Results

See knowledge/evolution_strategy.md for the full framework comparison from the P2P experiment (7 frameworks × 2 conditions, single-drive). Key finding: the evolutionary frameworks (GEPA, SkyDiscover, ShinkaEvolve) could not handle multi-file Rust FFI code. Only the hypothesis-driven (Nous) and agentic (coding agent) approaches produced meaningful results.

About

Framework for AI-based solution generation
