Repeatable load-test orchestration driven entirely from the control machine over SSH, using
the Ansible inventory that `scripts/gen-inventory.sh` generates from the CloudFormation stack.
There is no config file — the rig's fixed facts (ports, models, PG creds, audit tables,
cooldown, etc.) are hardcoded in lib.sh; you only ever set the handful of run knobs below,
inline. Re-running with the same knobs reproduces a run.

    # all 6 profiles for one gateway (the usual)
    GATEWAY=bifrost scripts/bench/run-tiers.sh
    # one profile, one cycle
    GATEWAY=bifrost PROFILE=nonstream-550 scripts/bench/bench.sh

(Inventory must exist first: `scripts/gen-inventory.sh <stack> <key.pem> [region]`, or
`scripts/control-bootstrap.sh` on the control box.)


| knob | what it does | values · default |
|---|---|---|
| `GATEWAY` | which gateway to test | nexus bifrost litellm kong portkey tensorzero · bifrost |
| `PROFILE` | request shape for a single `bench.sh` cycle (prompt size + stream/non-stream) | nonstream-128 stream-128 nonstream-550 stream-550 nonstream(12k) stream(12k) · nonstream-550. `run-tiers.sh` ignores this: it runs all 6. |
| `STAGES` | load ladder, `conc:dur,…` closed-loop or `@rate:dur,…` open-loop | · built-in ladder (run-tiers uses tier-aware ladders) |
| `RUN_ID` | names the results dir (`results/<RUN_ID>/`) | · timestamp |
| `NEXUS_HOOKS` | nexus only: content scanning | off on · in run-tiers both are swept |
| `NEXUS_AUDIT_BODIES` | nexus only: capture full request+response bodies in the audit | off on · in run-tiers both are swept |

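The `STAGES` grammar in the table can be illustrated with a small sketch that splits a ladder on commas and labels each entry by its leading `@`. This is not the rig's own parser: `describe_stages` is a hypothetical name, and the sample values (including the `s` duration suffix) are assumptions made only for the example.

```shell
#!/usr/bin/env bash
# Sketch only: classify each entry of a STAGES-style ladder.
# The entry grammar (conc:dur and @rate:dur) comes from the knob table;
# describe_stages and the sample values below are made up for illustration.
describe_stages() {
  local stages=$1 entry load dur
  local -a entries
  IFS=',' read -r -a entries <<< "$stages"
  for entry in "${entries[@]}"; do
    load=${entry%%:*}   # text before the first colon
    dur=${entry#*:}     # text after the first colon
    if [[ $load == @* ]]; then
      echo "open-loop rate=${load#@} dur=$dur"
    else
      echo "closed-loop conc=$load dur=$dur"
    fi
  done
}

# Hypothetical ladder: one closed-loop stage, then one open-loop stage.
describe_stages "50:60s,@200:30s"
```

An entry starting with `@` is reported as open-loop (fixed rate); anything else is reported as closed-loop (fixed concurrency).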
Everything else (which is a lot — deep-clean, the PII/redaction scan gate, audit settle,
ports, tables, …) is always-on rig policy or a fixed constant, hardcoded in lib.sh.
There are no skip/disable flags by design.

`run-tiers.sh` covers one gateway, all 6 profiles, one report each. For `GATEWAY=nexus` it also sweeps `NEXUS_HOOKS` × `NEXUS_AUDIT_BODIES` (the headline comparisons). Unattended by default; launch detached for a long run:

    nohup env GATEWAY=bifrost scripts/bench/run-tiers.sh > ~/bifrost-tiers.log 2>&1 &

`bench.sh` runs a single profile through one full cycle: clean → setup → restart → health → cooldown → run (+monitor) → verify-audit → report.

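The order of the `bench.sh` cycle can be sketched as a loop over step names that stops at the first failing step. This is an illustration, not the script's actual implementation: `run_cycle`, `CYCLE_STEPS`, and the runner argument are hypothetical, and stopping on the first failure is an assumption about how the cycle behaves.

```shell
#!/usr/bin/env bash
# Sketch only: walk the bench.sh cycle in order, stopping at the first failure.
# The step names come from the cycle described above; run_cycle and its
# runner argument are hypothetical. monitor is omitted because it runs
# alongside the run step rather than after it.
CYCLE_STEPS=(clean setup restart health cooldown run verify-audit report)

run_cycle() {
  local runner=$1 step
  for step in "${CYCLE_STEPS[@]}"; do
    echo "step: $step"
    if ! "$runner" "$step"; then
      echo "failed at: $step" >&2
      return 1
    fi
  done
}

# Dry run: a runner that does nothing and always succeeds.
dry_run() { :; }
run_cycle dry_run
```

Passing a different runner (for example one that invokes the matching step script) would reuse the same ordering without changing the loop.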
Step scripts (each independently re-runnable, all called by bench.sh):

| script | what it does |
|---|---|
| `clean.sh` | always deep-cleans: stop services → purge durable backlog → TRUNCATE tables → FLUSH Redis → restart, so every run starts verified-empty |
| `setup.sh` | nexus only: apply `NEXUS_HOOKS` / `NEXUS_AUDIT_BODIES`; with hooks on, a PII probe asserts redaction actually fires |
| `restart.sh` | restart each gateway service (cold process, no carried-over pools/heap/cache) |
| `cooldown.sh` | wait until each box is back to baseline (CPU idle + sockets drained) before measuring |
| `run.sh` | drive the load generator at each gateway's private IP with the profile + stages |
| `monitor.sh` | sample server-side CPU%/loadavg per box during the run; pull nexus CPU pprof |
| `verify-audit.sh` | wait 2 min for async audit to settle, then count each gateway's persisted rows vs requests sent (the lossless-vs-lossy ratio) |
| `report.sh` | pull results → `results/<run>/report.md` (per-stage RPS + ok% + p50/p95/p99 / TTFT) |

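The lossless-vs-lossy ratio that `verify-audit.sh` reports is persisted rows divided by requests sent. A minimal sketch of that arithmetic follows, assuming both counts are already known; `audit_ratio` is a hypothetical helper and the counts in the example are invented, not measurements.

```shell
#!/usr/bin/env bash
# Sketch only: persisted-vs-sent ratio as a percentage.
# audit_ratio is a hypothetical helper; how the real script obtains the two
# counts (load generator results, audit tables) is not shown here.
audit_ratio() {
  local sent=$1 persisted=$2
  if [ "$sent" -le 0 ]; then
    echo "sent must be positive" >&2
    return 1
  fi
  awk -v s="$sent" -v p="$persisted" 'BEGIN { printf "%.2f\n", 100 * p / s }'
}

# Invented counts, for illustration only.
audit_ratio 12000 12000   # prints 100.00 (every request has an audit row)
audit_ratio 12000 11400   # prints 95.00  (some audit rows were dropped)
```

A result of 100.00 corresponds to a lossless audit; anything lower means some requests were never persisted.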
GATEWAY=nexus run-tiers.sh sweeps both nexus dimensions automatically. To run one combo
explicitly (e.g. a single bench.sh cycle):

    GATEWAY=nexus NEXUS_HOOKS=on NEXUS_AUDIT_BODIES=on PROFILE=nonstream-550 scripts/bench/bench.sh

hooks-off is the clean hot path (no content scanning); hooks-on scans every request, which is
the compliance cost. bodies-on stores full request+response bodies in the audit (the
heaviest lossless-audit case). Results land under `results/<run-id>/` (gitignored).
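
The two nexus dimensions give four combinations. As a rough sketch, they can be listed as individual `bench.sh` invocations for one profile; `nexus_combos` is a hypothetical helper that only prints the commands, and the order in which `run-tiers.sh` actually visits the combinations is not stated here.

```shell
#!/usr/bin/env bash
# Sketch only: print the four NEXUS_HOOKS x NEXUS_AUDIT_BODIES combinations
# as single bench.sh invocations. nexus_combos is hypothetical; it prints the
# command lines and does not run anything.
nexus_combos() {
  local profile=$1 hooks bodies
  for hooks in off on; do
    for bodies in off on; do
      echo "GATEWAY=nexus NEXUS_HOOKS=$hooks NEXUS_AUDIT_BODIES=$bodies PROFILE=$profile scripts/bench/bench.sh"
    done
  done
}

nexus_combos nonstream-550
```

Printing rather than executing keeps the sketch safe to run anywhere; each printed line has the same shape as the explicit single-combo command above.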