CloudFormation (infra) + Ansible (host-native install) for a fair, reproducible,
multi-box benchmark of 6 LLM gateways against a shared mock upstream.
On-demand: deploy to bring up, delete-stack to tear down. See
ARCHITECTURE.md for the full picture.
| Box | Software | Lang | Port | Datastores |
|---|---|---|---|---|
| mock | nexus-mock-provider (prebuilt) | Go | 3062 | — |
| nexus | Nexus (ai-gw+hub+cp+cp-ui/nginx) | Go | 3050 | PG+Redis+NATS |
| bifrost | Bifrost | Go | 8080 | PG+Redis |
| litellm | LiteLLM | Python | 4000 | PG+Redis |
| kong | Kong AI Gateway (ai-proxy) | Lua | 8000 | PG |
| portkey | Portkey | Node | 8787 | PG+Redis(idle) |
| tensorzero | TensorZero (obs off) | Rust | 3000 | — (ClickHouse-native, disabled) |
| loadtest | Go loadtest (the only load tool) | — | — | — |
- AWS CLI configured; an EC2 key pair.
ansible-core+ collections:ansible-galaxy collection install ansible.posix community.postgresql- Gateway/mock/loadtest binaries + cp-ui ship prebuilt in
artifacts/— the roles copy them, nothing compiles on-box. Nexus's DB tooling (Prisma) ships as the self-containedartifacts/db-migrateasset and the OAuth login helper is vendored asscripts/nexus-auth.sh, so no Nexus source checkout is needed — the rig deploys from assets only.
aws cloudformation deploy \
--stack-name nexus-perf-matrix \
--template-file cloudformation/perf-matrix-stack.yaml \
--capabilities CAPABILITY_IAM \
--parameter-overrides KeyName=<your-key> AdminCidr=<your.ip>/32(Defaults: gateways c6i.4xlarge, mock+loadtest c6i.4xlarge, AL2023 x86. For
Graviton: GatewayInstanceType=c7g.4xlarge + the arm64 LatestAmiId.)
scripts/gen-inventory.sh nexus-perf-matrix ~/.ssh/<your-key>.pem <region>
cd ansible
ansible-playbook -i inventory.ini site.yml # everything
# or one gateway: ansible-playbook -i inventory.ini site.yml --tags kong --limit kongEach role finishes with a smoke check asserting the mock signature
(id == chatcmpl-mock, prompt echoed, usage 9/1/10) — proof the gateway
actually reaches the mock and not a real provider.
Control machine = a laptop, or the in-VPC control box. The stack deploys a small
c6i.xlargecontrol box by default (DeployControl=true) to run all of the above from inside the VPC. For a strictly linear copy-paste sequence (SSH in → setup → run → read results), followdocs/CONTROL-BOX-RUNBOOK.md; control-box internals (IAM, key distribution) are indocs/CONTROL-BOX.md.
GATEWAY=bifrost scripts/bench/run-tiers.sh # one gateway, all 6 tiers → a report each
# or one cycle: GATEWAY=bifrost PROFILE=nonstream-550 scripts/bench/bench.shNo config file — you only ever set a few inline knobs (GATEWAY, PROFILE, STAGES,
NEXUS_HOOKS, NEXUS_AUDIT_BODIES, RUN_ID); everything else is fixed in lib.sh.
Results land in results/<run-id>/report.md. Check generator health first (the
report's validity gate): if the load generator was the bottleneck (FD/port
exhaustion), the numbers don't count — re-run. Compare gateways by TTFT delta (the
shared mock's latency cancels). Knobs: scripts/bench/README.md ·
full guide: docs/LOADTEST-RUNBOOK.md.
aws cloudformation delete-stack --stack-name nexus-perf-matrixNothing persists except the IaC in git. Delete when idle to control cost.
- All host-native (no container overhead), each gateway isolated on its own box.
- Standardized storage: PostgreSQL + Redis everywhere (TensorZero is the documented exception — ClickHouse-native, run with observability off).
- Same mock + same stages for all; each gateway addresses the mock per its own routing rules — see per-gateway gotchas below.
- Nexus runs 100% durable audit (code default
AI_GATEWAY_AUDIT_LOSS_MODE=spillblock, zero-loss — no env needed); when comparing RPS, note who persists what (Bifrost default drops ~99% of logs). - Not directly comparable to competitors' single-box published numbers — this is a fair head-to-head between these gateways (each isolated, shared remote mock).
- Bifrost: no
OPENAI_API_KEYin env (oropenai/*routes to real OpenAI); call the model asmock-provider/mock-gpt-4o. - LiteLLM:
api_baseincludes/v1; may require a master key on the proxy. - Kong: the ai-proxy plugin is enabled (a bare reverse proxy is NOT an AI gateway and its RPS is meaningless).
- Portkey: routes via request headers (
x-portkey-provider+ custom host). - TensorZero: OpenAI-compatible endpoint at
/openai/v1/chat/completions.
cloudformation/ perf-matrix-stack.yaml (8 benchmark boxes + optional control box, SG, IAM) · network-stack · validate-stack
ansible/ site.yml + group_vars + roles/ (common · datastore · mock · 6 gateways · loadtest)
deploy.sh · down.sh (repo root: bring up / tear down)
scripts/ gen-inventory.sh · box.sh · nexus-configure.sh · spin-control-box.sh
control-ssh-setup.sh · control-bootstrap.sh (in-VPC control box)
scripts/bench/ load-test orchestration (clean/setup/restart/run/monitor/report) + profiles
artifacts/ prebuilt linux/amd64 binaries + cp-ui zip + prisma db-migrate
docs/ LOADTEST-RUNBOOK.md · CONTROL-BOX-RUNBOOK.md · CONTROL-BOX.md