Benchmark orchestration scripts

Repeatable load-test orchestration, driven entirely from the control machine over SSH, using the Ansible inventory that scripts/gen-inventory.sh generates from the CloudFormation stack. There is no config file: the rig's fixed facts (ports, models, PG creds, audit tables, cooldown, etc.) are hardcoded in lib.sh, and you only ever set the handful of run knobs below, inline. Re-running with the same knobs reproduces a run.

Run it

# all 6 profiles for one gateway (the usual)
GATEWAY=bifrost scripts/bench/run-tiers.sh

# one profile, one cycle
GATEWAY=bifrost PROFILE=nonstream-550 scripts/bench/bench.sh

(Inventory must exist first: scripts/gen-inventory.sh <stack> <key.pem> [region], or scripts/control-bootstrap.sh on the control box.)

The run knobs — the ONLY things you set

| knob | what it does | values · default |
|---|---|---|
| `GATEWAY` | which gateway to test | `nexus bifrost litellm kong portkey tensorzero` · `bifrost` |
| `PROFILE` | request shape for a single `bench.sh` cycle (prompt size + stream/non-stream) | `nonstream-128` `stream-128` `nonstream-550` `stream-550` `nonstream`(12k) `stream`(12k) · `nonstream-550`. `run-tiers.sh` ignores this and runs all 6. |
| `STAGES` | load ladder, `conc:dur,…` closed-loop or `@rate:dur,…` open-loop | · built-in ladder (`run-tiers` uses tier-aware ladders) |
| `RUN_ID` | names the results dir (`results/<RUN_ID>/`) | · timestamp |
| `NEXUS_HOOKS` | nexus only: content scanning | `off` `on` · in `run-tiers` both are swept |
| `NEXUS_AUDIT_BODIES` | nexus only: capture full request+response bodies in the audit | `off` `on` · in `run-tiers` both are swept |
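
To make the two `STAGES` forms concrete, here is a minimal sketch of how such a string could be split into stages. It is not the rig's own parser: `parse_stages` is a hypothetical helper, and the duration tokens (`30s`, `60s`) are assumed examples, since the accepted duration format is defined by the scripts themselves.

```shell
#!/bin/sh
# Hypothetical helper, not part of the rig: print one line per stage.
# Assumes "conc:dur" is closed-loop and "@rate:dur" is open-loop;
# the duration token is passed through untouched.
parse_stages() {
  printf '%s\n' "$1" | tr ',' '\n' | while IFS= read -r stage; do
    [ -n "$stage" ] || continue   # skip empty entries
    load=${stage%%:*}             # text before the first ':'
    dur=${stage#*:}               # text after the first ':'
    case $load in
      @*) echo "open-loop rate=${load#@} dur=$dur" ;;
      *)  echo "closed-loop conc=$load dur=$dur" ;;
    esac
  done
}

parse_stages "50:30s,100:30s"   # assumed closed-loop ladder
parse_stages "@200:60s"         # assumed open-loop stage
```

A leading `@` is the only thing that distinguishes the two modes, so a ladder reads the same way whether it steps concurrency or request rate.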

Everything else (which is a lot — deep-clean, the PII/redaction scan gate, audit settle, ports, tables, …) is always-on rig policy or a fixed constant, hardcoded in lib.sh. There are no skip/disable flags by design.

The two entry scripts

run-tiers.sh: one gateway, all 6 profiles, one report each. For GATEWAY=nexus it also sweeps NEXUS_HOOKS × NEXUS_AUDIT_BODIES (the headline comparisons). It runs unattended by default; for a long run, launch it detached:

nohup env GATEWAY=bifrost scripts/bench/run-tiers.sh > ~/bifrost-tiers.log 2>&1 &

bench.sh: one profile, one full cycle (clean → setup → restart → health → cooldown → run (+monitor) → verify-audit → report).
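
The cycle order can be sketched as a dry run. This is not the real bench.sh: `run_step` and `run_cycle` are hypothetical names, the step only prints its name here, and stopping at the first failing step is an assumption about the driver's behaviour.

```shell
#!/bin/sh
# Dry-run sketch of the cycle order; nothing is actually executed.
# run_step is a hypothetical hook: a real driver would invoke the
# matching step script instead of printing the name.
run_step() {
  echo "step: $1"
}

# Walk the steps in the documented order. Aborting on the first
# failure is an assumption, not confirmed behaviour.
# (monitor runs alongside "run", so it has no slot of its own here.)
run_cycle() {
  for step in clean setup restart health cooldown run verify-audit report; do
    run_step "$step" || {
      echo "cycle aborted at: $step" >&2
      return 1
    }
  done
}

run_cycle
```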

Step scripts (each independently re-runnable, all called by bench.sh):

| script | what it does |
|---|---|
| `clean.sh` | always deep-cleans: stop services → purge durable backlog → TRUNCATE tables → FLUSH Redis → restart, so every run starts verified-empty |
| `setup.sh` | nexus only: apply `NEXUS_HOOKS` / `NEXUS_AUDIT_BODIES`; with hooks on, a PII probe asserts redaction actually fires |
| `restart.sh` | restart each gateway service (cold process, no carried-over pools/heap/cache) |
| `cooldown.sh` | wait until each box is back to baseline (CPU idle + sockets drained) before measuring |
| `run.sh` | drive the load generator at each gateway's private IP with the profile + stages |
| `monitor.sh` | sample server-side CPU%/loadavg per box during the run; pull nexus CPU pprof |
| `verify-audit.sh` | wait 2 min for async audit to settle, then count each gateway's persisted rows vs requests sent (the lossless-vs-lossy ratio) |
| `report.sh` | pull results → `results/<run>/report.md` (per-stage RPS + ok% + p50/p95/p99 / TTFT) |
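
The lossless-vs-lossy ratio that `verify-audit.sh` reports is simple arithmetic: persisted audit rows divided by requests sent. A minimal sketch follows; `audit_ratio` is a hypothetical helper and the counts are illustrative inputs, not measured results.

```shell
#!/bin/sh
# Hypothetical helper: percentage of sent requests that reached the
# audit store. Arguments: <persisted rows> <requests sent>.
audit_ratio() {
  awk -v p="$1" -v s="$2" 'BEGIN {
    if (s <= 0) { print "n/a"; exit }   # nothing sent, no ratio
    printf "%.2f%%\n", 100 * p / s
  }'
}

audit_ratio 120000 120000   # illustrative lossless case: 100.00%
audit_ratio 95000 120000    # illustrative lossy case: 79.17%
```

A gateway that persists every request lands at exactly 100%; anything below that means audit rows were dropped under load.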

Nexus headline comparisons

GATEWAY=nexus run-tiers.sh sweeps both nexus dimensions automatically. To run one combo explicitly (e.g. a single bench.sh cycle):

GATEWAY=nexus NEXUS_HOOKS=on NEXUS_AUDIT_BODIES=on PROFILE=nonstream-550 scripts/bench/bench.sh

hooks-off is the clean hot path (no content scanning); hooks-on scans every request — the compliance cost. bodies-on stores full request+response bodies in the audit (the heaviest lossless-audit case). Results land under results/<run-id>/ (gitignored).
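
For reference, the NEXUS_HOOKS × NEXUS_AUDIT_BODIES sweep for one profile expands to four bench.sh cycles. The sketch below only prints the equivalent commands; `nexus_combos` is a hypothetical helper, and the loop order is an assumption rather than the order run-tiers.sh uses.

```shell
#!/bin/sh
# Dry run: print one bench.sh command per hooks/bodies combination.
# Nothing is executed; the iteration order is an assumption.
nexus_combos() {
  profile=$1
  for hooks in off on; do
    for bodies in off on; do
      echo "GATEWAY=nexus NEXUS_HOOKS=$hooks NEXUS_AUDIT_BODIES=$bodies PROFILE=$profile scripts/bench/bench.sh"
    done
  done
}

nexus_combos nonstream-550
```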
