Measure Brain against the spec's latency/throughput targets, then break it on purpose to validate the recovery story. The output is a set of reproducible baselines + a chaos harness that drives the Phase 14 acceptance gates.
- Phase 12 complete (phase-12-complete tag). Benchmarks read the metric counters that Phase 12 wires; chaos asserts use the same counters to verify recovery.
- spec/19_benchmarks/02_performance_targets.md
- spec/19_benchmarks/04_benchmark_methodology.md
- spec/18_failure_recovery/07_chaos_testing.md
- Per-operation criterion benches in each runtime crate.
- benches/load_generator.rs: sustained-rate end-to-end load harness.
- tests/chaos/: kill-at-point, I/O fault, network failure, corruption injection scenarios.
- tests/soak/: 48 h continuous-load rig (run on dedicated infra; not CI).
- Performance report committed to docs/performance/baselines-<date>.md.
- Tag: phase-13-complete.
Reads: spec §02/02, §02/03, §02/07.
Writes: benches/*.rs in brain-storage, brain-index, brain-ops, brain-planner, brain-server (one bench harness per crate; one benchmark per spec'd operation).
Done when: every cognitive operation has a criterion baseline; the results table is committed to docs/performance/baselines-<date>.md; spec §14 latency targets are met on reference hardware.
Writes: benches/load_generator.rs (binary) — sustains a configurable rate of mixed encode / recall / link traffic over the SDK; reports p50/p95/p99 and per-op error rates.
Done when: the generator hits the spec §02/03 throughput targets without saturating its own CPU, and emits a CSV summary suitable for diffing across runs.
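The p50/p95/p99 and error-rate reporting described above could be computed along these lines. This is a minimal sketch, not the shipped load generator: `OpStats`, `percentile`, `csv_row`, and the CSV column order are all hypothetical names and choices made for illustration, and the nearest-rank percentile definition is an assumption (the methodology spec may prescribe a different one).

```rust
use std::time::Duration;

/// Per-operation tally. Field names are illustrative, not taken from the spec.
pub struct OpStats {
    pub op: &'static str,
    pub latencies: Vec<Duration>, // one entry per successful request
    pub errors: u64,              // number of failed requests
}

/// Nearest-rank percentile over an ascending-sorted slice.
/// `p` is a whole percentage in 1..=100; returns None for an empty slice.
pub fn percentile(sorted: &[Duration], p: usize) -> Option<Duration> {
    if sorted.is_empty() || p == 0 || p > 100 {
        return None;
    }
    // Integer ceil(p * n / 100) avoids floating-point rounding surprises.
    let rank = (p * sorted.len() + 99) / 100;
    Some(sorted[rank - 1])
}

/// One CSV row: op,total,p50_us,p95_us,p99_us,error_rate (assumed layout).
pub fn csv_row(stats: &mut OpStats) -> String {
    stats.latencies.sort_unstable();
    let total = stats.latencies.len() as u64 + stats.errors;
    let error_rate = if total == 0 {
        0.0
    } else {
        stats.errors as f64 / total as f64
    };
    // Latency in microseconds, or 0 when no request succeeded.
    let us = |p: usize| percentile(&stats.latencies, p).map_or(0, |d| d.as_micros());
    format!(
        "{},{},{},{},{},{:.4}",
        stats.op,
        total,
        us(50),
        us(95),
        us(99),
        error_rate
    )
}
```

Fixed-width formatting of the error rate keeps rows byte-stable between runs, which is what makes the summary easy to diff.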
Reads: spec §02/07.
Writes: tests/chaos/{kill_during_wal_write,io_fault,network_partition,bit_flip,resource_exhaustion}.rs.
Done when: each scenario reproduces the spec'd failure mode and asserts the spec'd recovery behaviour (no data loss, no silent corruption, fail-stop where mandated), and Loom coverage exists for the select concurrency-critical paths flagged in §02/07.
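To illustrate the "no silent corruption" assertion for the bit_flip scenario, here is a dependency-free sketch. Everything in it is an assumption made for the example: the frame layout (8-byte checksum followed by the payload), the function names, and the use of FNV-1a as a stand-in for whatever checksum the real storage layer uses. It shows only the shape of the check, which is to flip every bit in turn and require that the reader reports an error rather than returning damaged data.

```rust
/// FNV-1a, used only as a dependency-free stand-in for the real record checksum.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf29ce484222325;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x100000001b3);
    }
    h
}

/// Hypothetical frame layout: 8-byte little-endian checksum, then the payload.
pub fn encode_record(payload: &[u8]) -> Vec<u8> {
    let mut out = fnv1a(payload).to_le_bytes().to_vec();
    out.extend_from_slice(payload);
    out
}

#[derive(Debug, PartialEq)]
pub enum ReadError {
    Truncated,
    ChecksumMismatch,
}

/// Returns the payload only if the stored checksum matches.
pub fn decode_record(frame: &[u8]) -> Result<&[u8], ReadError> {
    if frame.len() < 8 {
        return Err(ReadError::Truncated);
    }
    let (head, payload) = frame.split_at(8);
    let mut buf = [0u8; 8];
    buf.copy_from_slice(head);
    if u64::from_le_bytes(buf) != fnv1a(payload) {
        return Err(ReadError::ChecksumMismatch);
    }
    Ok(payload)
}

/// Flip a single bit of the frame in place (the injected fault).
pub fn flip_bit(frame: &mut [u8], bit: usize) {
    frame[bit / 8] ^= 1 << (bit % 8);
}

/// Chaos-style assertion: every possible single-bit flip must be detected.
pub fn all_single_bit_flips_detected(payload: &[u8]) -> bool {
    let clean = encode_record(payload);
    (0..clean.len() * 8).all(|bit| {
        let mut damaged = clean.clone();
        flip_bit(&mut damaged, bit);
        decode_record(&damaged) == Err(ReadError::ChecksumMismatch)
    })
}
```

A real scenario would inject the flip into on-disk files and then drive recovery through the storage crate; the exhaustive loop here is cheap only because the frame is tiny.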
Writes: tests/soak/{driver,asserts}.rs — drives sustained mixed traffic for 48 h; samples memory, fd count, latency every 60 s; fails the run on memory leak / latency drift / error rate exceeding spec §02/04 thresholds.
Done when: soak completes one 48 h run on dedicated infra with no failures; results land in docs/performance/soak-<date>.md.
Status: scaffolding shipped as crates/brain-sdk-rust/examples/soak.rs. The driver, sampler, and drift-checker are CLI-driven; the 48 h reference run is operator-side (spec §02/15 puts soak at "weekly" cadence, never in CI). The final exit code reflects pass/fail; the CSV and the SOAK_RESULT pass=true|false ... summary line are suitable for committing to docs/performance/soak-<date>.md.
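One simple way to turn periodic samples into a pass/fail drift verdict is to compare the mean of the last few samples against the mean of the first few. The sketch below assumes that approach; it is not the logic in examples/soak.rs. The `Sample` and `Limits` types, the window size, and the threshold values are hypothetical, and the real thresholds come from the spec.

```rust
/// One sample per sampling interval (the plan samples every 60 s).
#[derive(Clone, Copy)]
pub struct Sample {
    pub rss_bytes: u64,
    pub p99_micros: u64,
}

/// Hypothetical thresholds; the real limits are defined by the spec.
pub struct Limits {
    pub max_rss_growth: f64, // e.g. 0.10 allows 10 % growth
    pub max_p99_growth: f64,
}

fn mean(values: impl Iterator<Item = u64>) -> f64 {
    let (sum, n) = values.fold((0.0, 0usize), |(s, n), v| (s + v as f64, n + 1));
    if n == 0 {
        0.0
    } else {
        sum / n as f64
    }
}

/// Relative growth of the last `window` samples over the first `window`.
/// Returns None when there are fewer than 2 * window samples or the baseline is zero.
pub fn growth(samples: &[Sample], window: usize, field: fn(&Sample) -> u64) -> Option<f64> {
    if window == 0 || samples.len() < 2 * window {
        return None;
    }
    let head = mean(samples[..window].iter().map(field));
    let tail = mean(samples[samples.len() - window..].iter().map(field));
    if head == 0.0 {
        None
    } else {
        Some((tail - head) / head)
    }
}

/// Passes only if both metrics are measurable and within their limits.
pub fn soak_passes(samples: &[Sample], window: usize, limits: &Limits) -> bool {
    let rss = growth(samples, window, |s: &Sample| s.rss_bytes);
    let p99 = growth(samples, window, |s: &Sample| s.p99_micros);
    matches!(
        (rss, p99),
        (Some(r), Some(p)) if r <= limits.max_rss_growth && p <= limits.max_p99_growth
    )
}
```

Treating "not enough samples" as a failure is deliberate in this sketch: a run that was cut short should not be recorded as a passing soak.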
- Sub-tasks 13.1–13.4 scaffolded.
- Performance baselines doc + per-crate criterion benches shipped.
- Storage-layer chaos (random_kill, bit_flip, io_fault) green in CI.
- Operator-run 48 h soak result recorded in docs/performance/soak-<date>.md.
- Network-partition / resource-exhaustion / time-anomaly chaos (operator-infra dependent; tracked as Phase 14 acceptance scenarios).
- Tag phase-13-complete once the soak result lands.
Benchmarks need a quiet machine — no other tenants, fixed CPU governor, no thermal throttling. The methodology doc in spec §02/07 covers this; follow it precisely or the numbers are worthless.
Chaos tests intentionally bring the process down. Run them in a sandbox; never against a real corpus.