Benchmark orchestration scripts

Repeatable load-test orchestration, driven entirely from the control machine over SSH. It uses the Ansible inventory that scripts/gen-inventory.sh generates from the CloudFormation stack. There is no config file: the rig's fixed facts (ports, models, PG creds, audit tables, cooldown, etc.) are hardcoded in lib.sh, and you only ever set the handful of run knobs below, inline on the command line. Re-running with the same knobs reproduces a run.

Run it

# all 6 profiles for one gateway (the usual)
GATEWAY=bifrost scripts/bench/run-tiers.sh

# one profile, one cycle
GATEWAY=bifrost PROFILE=nonstream-550 scripts/bench/bench.sh

(Inventory must exist first: scripts/gen-inventory.sh <stack> <key.pem> [region], or scripts/control-bootstrap.sh on the control box.)

The run knobs — the ONLY things you set

| knob | what it does | values · default |
| --- | --- | --- |
| GATEWAY | which gateway to test | nexus bifrost litellm kong portkey tensorzero · bifrost |
| PROFILE | request shape for a single bench.sh cycle (prompt size + stream/non-stream) | nonstream-128 stream-128 nonstream-550 stream-550 nonstream(12k) stream(12k) · nonstream-550. run-tiers.sh ignores this and runs all 6. |
| STAGES | load ladder | conc:dur,… closed-loop or @rate:dur,… open-loop · built-in ladder (run-tiers uses tier-aware ladders) |
| RUN_ID | names the results dir (results/<RUN_ID>/) | · timestamp |
| NEXUS_HOOKS | nexus only: content scanning | off on · in run-tiers both are swept |
| NEXUS_AUDIT_BODIES | nexus only: capture full request+response bodies in the audit | off on · in run-tiers both are swept |
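
As a rough illustration of the STAGES grammar above, the sketch below splits a ladder string into one line per stage. The function name parse_stages, the output layout, and the duration suffix (30s, 60s) are assumptions made for this example; the real parsing lives in the bench scripts and may differ.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the bench scripts.
# Assumed grammar: comma-separated stages, "conc:dur" = closed-loop,
# "@rate:dur" = open-loop.
parse_stages() {
  local spec=$1 stage load dur mode
  local -a stages
  IFS=',' read -r -a stages <<< "$spec"
  for stage in "${stages[@]}"; do
    if [[ $stage != *:* ]]; then
      echo "bad stage: '$stage'" >&2
      return 1
    fi
    load=${stage%%:*}   # text before the first colon
    dur=${stage#*:}     # text after the first colon
    mode=closed-loop
    if [[ $load == @* ]]; then
      mode=open-loop
      load=${load#@}    # drop the leading "@"
    fi
    if [[ -z $load || -z $dur ]]; then
      echo "bad stage: '$stage'" >&2
      return 1
    fi
    printf '%s %s %s\n' "$mode" "$load" "$dur"
  done
}

parse_stages "8:30s,@200:60s"
# prints:
#   closed-loop 8 30s
#   open-loop 200 60s
```

Validating the whole ladder before any load is sent keeps a typo in STAGES from surfacing halfway through a long run.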

Everything else (which is a lot — deep-clean, the PII/redaction scan gate, audit settle, ports, tables, …) is always-on rig policy or a fixed constant, hardcoded in lib.sh. There are no skip/disable flags by design.
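
The inline-knob pattern can be sketched with shell default expansion. This is a minimal illustration, not the code in lib.sh: the function resolve_knobs and the timestamp format are hypothetical, while the defaults and the gateway list are taken from the knob list above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of knob resolution; the real scripts may differ.
resolve_knobs() {
  : "${GATEWAY:=bifrost}"                  # documented default
  : "${PROFILE:=nonstream-550}"            # documented default
  : "${RUN_ID:=$(date +%Y%m%d-%H%M%S)}"    # timestamp format is an assumption
  case $GATEWAY in
    nexus|bifrost|litellm|kong|portkey|tensorzero) ;;
    *) echo "unknown GATEWAY: $GATEWAY" >&2; return 1 ;;
  esac
  echo "gateway=$GATEWAY profile=$PROFILE results=results/$RUN_ID/"
}

GATEWAY=kong RUN_ID=demo resolve_knobs
```

Because every knob is either set inline or falls back to a fixed default, the command line alone records what a run did.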

The two entry scripts

  • run-tiers.sh — one gateway, all 6 profiles, one report each. For GATEWAY=nexus it also sweeps NEXUS_HOOKS × NEXUS_AUDIT_BODIES (the headline comparisons). Unattended by default; launch detached for a long run: nohup env GATEWAY=bifrost scripts/bench/run-tiers.sh > ~/bifrost-tiers.log 2>&1 &
  • bench.sh — one single profile, one full cycle: clean → setup → restart → health → cooldown → run (+monitor) → verify-audit → report.
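
The bench.sh cycle above is a fixed sequence that stops at the first failing step. A minimal sketch of that control flow follows. STEP_RUNNER, CYCLE_STEPS, and run_cycle are hypothetical names for this example, skipping setup for non-nexus gateways is an assumption (the real script may call it as a no-op), and the monitor sampling that runs alongside the load is left out.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the cycle order; not the real bench.sh.
CYCLE_STEPS=(clean setup restart health cooldown run verify-audit report)

run_cycle() {
  # STEP_RUNNER is an illustrative hook; it defaults to echo so the
  # sketch only prints the step names instead of running anything.
  local runner=${STEP_RUNNER:-echo} step
  for step in "${CYCLE_STEPS[@]}"; do
    # Assumption: setup is skipped unless the gateway is nexus.
    if [[ $step == setup && ${GATEWAY:-bifrost} != nexus ]]; then
      continue
    fi
    "$runner" "$step" || { echo "step failed: $step" >&2; return 1; }
  done
}

GATEWAY=bifrost run_cycle
```

With the default runner this prints the seven steps a non-nexus gateway would go through, one per line.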

Step scripts (each independently re-runnable, all called by bench.sh):

| script | what it does |
| --- | --- |
| clean.sh | always deep-cleans: stop services → purge durable backlog → TRUNCATE tables → FLUSH Redis → restart, so every run starts verified-empty |
| setup.sh | nexus only: apply NEXUS_HOOKS / NEXUS_AUDIT_BODIES; with hooks on, a PII probe asserts redaction actually fires |
| restart.sh | restart each gateway service (cold process, no carried-over pools/heap/cache) |
| cooldown.sh | wait until each box is back to baseline (CPU idle + sockets drained) before measuring |
| run.sh | drive the load generator at each gateway's private IP with the profile + stages |
| monitor.sh | sample server-side CPU%/loadavg per box during the run; pull nexus CPU pprof |
| verify-audit.sh | wait 2 min for async audit to settle, then count each gateway's persisted rows vs requests sent (the lossless-vs-lossy ratio) |
| report.sh | pull results → results/<run>/report.md (per-stage RPS + ok% + p50/p95/p99 / TTFT) |
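
The lossless-vs-lossy ratio that verify-audit.sh reports comes down to persisted rows divided by requests sent. The sketch below shows that arithmetic only. The helper audit_ratio is hypothetical, and treating "persisted >= sent" as lossless is an assumption; how the real script obtains the two counts is not shown.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify an audit from two counts.
audit_ratio() {
  local persisted=$1 sent=$2
  if (( sent <= 0 )); then
    echo "sent must be positive" >&2
    return 1
  fi
  # awk handles the fractional percentage; bash arithmetic is integer-only.
  awk -v p="$persisted" -v s="$sent" 'BEGIN {
    printf "%d/%d persisted (%.2f%%) %s\n", p, s, 100 * p / s,
           (p >= s ? "lossless" : "lossy")
  }'
}

audit_ratio 9500 10000
# prints: 9500/10000 persisted (95.00%) lossy
```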

Nexus headline comparisons

GATEWAY=nexus run-tiers.sh sweeps both nexus dimensions automatically. To run one combo explicitly (e.g. a single bench.sh cycle):

GATEWAY=nexus NEXUS_HOOKS=on NEXUS_AUDIT_BODIES=on PROFILE=nonstream-550 scripts/bench/bench.sh

hooks-off is the clean hot path (no content scanning); hooks-on scans every request — the compliance cost. bodies-on stores full request+response bodies in the audit (the heaviest lossless-audit case). Results land under results/<run-id>/ (gitignored).
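
The two nexus dimensions form a 2×2 grid of four combos. The sketch below enumerates them as knob prefixes; nexus_combos is a hypothetical helper, and the order in which run-tiers.sh actually visits the combos is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the NEXUS_HOOKS x NEXUS_AUDIT_BODIES sweep.
nexus_combos() {
  local hooks bodies
  for hooks in off on; do
    for bodies in off on; do
      echo "GATEWAY=nexus NEXUS_HOOKS=$hooks NEXUS_AUDIT_BODIES=$bodies"
    done
  done
}

nexus_combos
# prints four lines, from hooks=off/bodies=off to hooks=on/bodies=on
```

Each printed line can be placed in front of a scripts/bench/bench.sh invocation to run that single combo by hand.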