Skip to content

Commit be86728

Browse files
committed
[scheduler] Add end-to-end stress tests to scheduler
Booking + accounting stress suite for the Rust scheduler. cluster feed → job query → host matching → dispatch) against a realistic farm in two phases inside one process: 1. **drain** — farm capacity comfortably exceeds demand. Benchmarks booking throughput (frames/s over the active booking window) and requires ≥90% (`STRESS_DRAIN_TARGET`) of frames to dispatch. 2. **saturation** — demand vastly exceeds tight subscription bursts and per-job core caps, so the Redis Lua cap check becomes the binding constraint. Verifies enforcement (no booking above burst / job max-cores) and that rejections actually flowed through the accounting hot path. After each phase, an audit cross-checks every Redis `acct:*` hash against `SUM(proc)` in Postgres plus host/frame/stat invariants — with the recompute and limit-reseed loops pushed beyond the test horizon, agreement proves the dispatch hot path (Lua book + force-rollback) kept accounting exact on its own. Requirements - Postgres with migrations applied, from the repo root: `docker compose up -d flyway` - A Docker daemon (the suite starts a throwaway Redis container via testcontainers; all Redis state dies with the container)
1 parent ca5f70f commit be86728

8 files changed

Lines changed: 2111 additions & 1184 deletions

File tree

Lines changed: 150 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,150 @@
1+
name: OpenCue Scheduler Stress Pipeline
2+
3+
# Runs the scheduler booking + accounting stress suite
4+
# (rust/crates/scheduler/tests/stress_tests.rs): a full pipeline::run against a
5+
# seeded farm, with an end-of-run audit that cross-checks the Redis acct:*
6+
# hashes against SUM(proc) in Postgres and asserts cap enforcement.
7+
#
8+
# When it runs — and when it deliberately doesn't:
9+
# - Pull requests: only when the scheduler crate, its proto dependency, the
10+
# DB migrations, or this workflow change. The suite needs a migrated
11+
# Postgres plus a Redis container and takes several minutes — running it
12+
# for Python/CueGUI/docs changes would burn runner time for zero signal.
13+
# - Nightly on master: catches drift from changes that slipped past the
14+
# paths filter (e.g. shared workspace dependencies) and gives a daily
15+
# throughput data point under fixed scale.
16+
# - Manually (workflow_dispatch): for benchmarking a branch at custom scale.
17+
#
18+
# What is a gate vs. what is informational:
19+
# - The job FAILS on correctness regressions: accounting drift between Redis
20+
# and Postgres, cap breaches (subscription burst / job max-cores), booking
21+
# liveness (<90% drain, no saturation rejections), or leftover test data.
22+
# - The throughput numbers (frames/s) are reported in the step summary but
23+
# are NOT asserted on: shared runners are too noisy for perf gating. For
24+
# real benchmarking run the suite locally in release mode (see
25+
# docs/_docs/developer-guide/scheduler-stress-testing.md).
26+
27+
on:
28+
pull_request:
29+
branches: ["master"]
30+
paths:
31+
- "rust/crates/scheduler/**"
32+
- "rust/crates/opencue-proto/**"
33+
- "rust/Cargo.toml"
34+
- "cuebot/src/main/resources/conf/ddl/postgres/migrations/**"
35+
- ".github/workflows/scheduler-stress-pipeline.yml"
36+
schedule:
37+
# Nightly on master. Odd minute to avoid the top-of-hour scheduling rush.
38+
- cron: "23 9 * * *"
39+
workflow_dispatch:
40+
inputs:
41+
stress_jobs:
42+
description: "Drain-phase job count (default 300)"
43+
required: false
44+
stress_hosts:
45+
description: "Drain-phase host count (default 1200)"
46+
required: false
47+
stress_frames_per_layer:
48+
description: "Drain-phase frames per layer (default 5)"
49+
required: false
50+
stress_timeout_secs:
51+
description: "Per-phase hard timeout in seconds (default 600)"
52+
required: false
53+
54+
concurrency:
55+
group: ${{ github.workflow }}-${{ github.ref }}
56+
cancel-in-progress: true
57+
58+
env:
59+
CARGO_TERM_COLOR: always
60+
CARGO_INCREMENTAL: 0
61+
CARGO_NET_RETRY: 10
62+
RUSTUP_MAX_RETRIES: 10
63+
64+
jobs:
65+
stress:
66+
runs-on: ubuntu-22.04
67+
timeout-minutes: 45
68+
69+
# The suite's Postgres half. Redis is NOT listed here on purpose: the test
70+
# starts its own throwaway Redis via testcontainers, so all accounting
71+
# state is guaranteed to die with the test process.
72+
services:
73+
postgres:
74+
image: postgres:15.1
75+
env:
76+
POSTGRES_USER: cuebot
77+
POSTGRES_PASSWORD: cuebot_password
78+
POSTGRES_DB: cuebot
79+
ports:
80+
- 5432:5432
81+
options: >-
82+
--health-cmd "pg_isready -U cuebot -d cuebot"
83+
--health-interval 5s
84+
--health-timeout 5s
85+
--health-retries 10
86+
87+
steps:
88+
- uses: actions/checkout@v4
89+
90+
- name: Install build dependencies
91+
run: |
92+
sudo apt-get update && sudo apt-get install -y libx11-dev protobuf-compiler libcurl4-openssl-dev postgresql-client
93+
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs > rust-install.sh
94+
bash ./rust-install.sh -y
95+
96+
- name: Cache cargo deps
97+
uses: Swatinem/rust-cache@v2
98+
with:
99+
workspaces: rust
100+
cache-on-failure: true
101+
102+
# Plain psql instead of the sandbox Flyway image: the migrations are
103+
# versioned plain-SQL files, so applying them in numeric order is exactly
104+
# what Flyway would do, without building a JDK image first.
105+
- name: Apply database migrations
106+
working-directory: cuebot/src/main/resources/conf/ddl/postgres/migrations
107+
env:
108+
PGPASSWORD: cuebot_password
109+
run: |
110+
for f in $(ls V*.sql | sort -t V -k2 -n); do
111+
echo "== $f"
112+
psql -q -v ON_ERROR_STOP=1 -h localhost -U cuebot -d cuebot -f "$f"
113+
done
114+
115+
- name: Run stress suite
116+
working-directory: rust
117+
shell: bash
118+
env:
119+
STRESS_JOBS: ${{ inputs.stress_jobs }}
120+
STRESS_HOSTS: ${{ inputs.stress_hosts }}
121+
STRESS_FRAMES_PER_LAYER: ${{ inputs.stress_frames_per_layer }}
122+
STRESS_TIMEOUT_SECS: ${{ inputs.stress_timeout_secs }}
123+
run: |
124+
cargo test -p scheduler --features stress-tests --test stress_tests -- --nocapture 2>&1 | tee stress-output.log
125+
126+
- name: Publish phase report
127+
if: always()
128+
working-directory: rust
129+
shell: bash
130+
run: |
131+
if [ -f stress-output.log ] && grep -q "^================ phase" stress-output.log; then
132+
{
133+
echo "## Scheduler stress suite"
134+
echo '```'
135+
sed -n '/^================ phase/,$p' stress-output.log | sed -n '1,80p'
136+
echo '```'
137+
echo "_Throughput numbers are informational; only the accounting/enforcement assertions gate this job._"
138+
} >> "$GITHUB_STEP_SUMMARY"
139+
else
140+
echo "## Scheduler stress suite: no phase report produced (failed before the run?)" >> "$GITHUB_STEP_SUMMARY"
141+
fi
142+
143+
- name: Upload full output
144+
if: always()
145+
uses: actions/upload-artifact@v4
146+
with:
147+
name: scheduler-stress-output
148+
path: rust/stress-output.log
149+
if-no-files-found: ignore
150+
retention-days: 30
Lines changed: 204 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,204 @@
1+
---
2+
title: "Scheduler Stress Testing"
3+
nav_order: 102
4+
parent: Reference
5+
layout: default
6+
linkTitle: "Scheduler Stress Testing"
7+
date: 2026-06-12
8+
description: >
9+
How to run, tune, and interpret the Rust scheduler's booking and accounting
10+
stress suite, locally and in CI
11+
---
12+
13+
# Scheduler Stress Testing
14+
15+
### Running and interpreting the booking + accounting stress suite
16+
17+
---
18+
19+
## Overview
20+
21+
The stress suite (`rust/crates/scheduler/tests/stress_tests.rs`) exercises the
22+
[Rust scheduler](/docs/developer-guide/scheduler/)'s full production dispatch
23+
path at scale — `pipeline::run` end to end: Redis accounting bootstrap →
24+
cluster feed → pending-job query → host matching → dispatch (proc insert, host
25+
ledger decrement, frame start) — against a deterministic, bulk-seeded farm.
26+
27+
It is both a **correctness gate** and a **benchmark harness**:
28+
29+
- **Correctness**: after each phase an audit cross-checks every Redis `acct:*`
30+
hash the run touched against `SUM(proc)` in Postgres (the canonical record —
31+
see the [Redis-Backed Accounting Reference](/docs/developer-guide/redis-accounting/)),
32+
and verifies cap enforcement and ledger invariants.
33+
- **Benchmark**: it reports booking throughput (frames/s over the active
34+
booking window), host-matching efficiency (wasted attempt %), host-cache hit
35+
ratio, and Redis Lua op counts.
36+
37+
The suite runs two phases in one process:
38+
39+
| Phase | Shape | What it proves |
40+
|---|---|---|
41+
| **drain** | Farm capacity comfortably exceeds demand (default: 1,200 hosts, 6,000 frames) | ≥90% of frames book; throughput measured; accounting stays exact under concurrency, including the force-rollback compensation path |
42+
| **saturation** | Demand vastly exceeds tight subscription bursts and per-job core caps (default: 400 hosts, 3,000 frames, 150-core bursts) | The Redis Lua cap check is the binding constraint: bookings stop exactly at burst, caps are never breached, rejections flow through the hot path |
43+
44+
### Invariants the audit asserts
45+
46+
1. Every `acct:{sub,folder,job,layer,point}` hash holds exactly
47+
`SUM(proc.int_cores_reserved)/100` cores and `SUM(proc.int_gpus_reserved)`
48+
GPUs for its grouping — the same 5-dimension grouping and centicore→core
49+
conversion the recompute loop uses. The suite pushes the recompute and
50+
limit-reseed loops out to a 1-hour interval, so agreement here proves the
51+
*dispatch hot path alone* (Lua book + force-rollback) kept Redis exact —
52+
reconciliation never got a chance to paper over drift.
53+
2. Jobs with no bookings have no leaked Redis counters.
54+
3. Per-(show, alloc) booked cores never exceed the subscription burst.
55+
4. Per-job booked cores never exceed `job_resource.int_max_cores`.
56+
5. Host ledger: `int_cores - int_cores_idle == SUM(proc)` per host, never negative.
57+
6. One `RUNNING` frame per proc row.
58+
7. Trigger-maintained `job_stat.int_waiting_count` matches the frame table.
59+
8. After teardown, zero `stress_%` rows remain in any table the suite touches.
60+
61+
## Running locally
62+
63+
### Prerequisites
64+
65+
- A migrated Postgres on `localhost:5432` (`cuebot` / `cuebot_password`).
66+
From the repo root: `docker compose up -d flyway` (brings up `db` and applies
67+
migrations). If the Flyway image won't build in your environment (e.g.
68+
SSL-inspecting proxies break its package mirrors), apply the migrations
69+
directly — they are plain SQL:
70+
71+
```bash
72+
cd cuebot/src/main/resources/conf/ddl/postgres/migrations
73+
for f in $(ls V*.sql | sort -t V -k2 -n); do
74+
docker exec -i opencue-db-1 psql -q -v ON_ERROR_STOP=1 -U cuebot -d cuebot < "$f"
75+
done
76+
```
77+
78+
- A running Docker daemon. The suite starts its own throwaway Redis container
79+
via testcontainers; all accounting state dies with it.
80+
81+
### Run
82+
83+
```bash
84+
cd rust
85+
cargo test -p scheduler --features stress-tests --test stress_tests -- --nocapture
86+
```
87+
88+
For meaningful benchmark numbers, use a release build:
89+
90+
```bash
91+
cargo test -p scheduler --release --features stress-tests --test stress_tests -- --nocapture
92+
```
93+
94+
### Tuning
95+
96+
| Env var | Default | Meaning |
97+
|---|---|---|
98+
| `STRESS_JOBS` | 300 | drain-phase job count |
99+
| `STRESS_LAYERS` | 4 | drain-phase layers per job |
100+
| `STRESS_FRAMES_PER_LAYER` | 5 | drain-phase frames per layer |
101+
| `STRESS_HOSTS` | 1200 | drain-phase host count |
102+
| `STRESS_TAGS` | 8 | drain-phase manual tag count |
103+
| `STRESS_SAT_JOBS` | 150 | saturation-phase job count |
104+
| `STRESS_SAT_HOSTS` | 400 | saturation-phase host count |
105+
| `STRESS_DRAIN_TARGET` | 0.9 | fraction of drain frames that must book |
106+
| `STRESS_STALL_SECS` | 30 | watchdog: pause jobs after this long without a new booking |
107+
| `STRESS_TIMEOUT_SECS` | 600 | watchdog: per-phase hard timeout |
108+
109+
Seeding is deterministic for a given scale (fixed RNG seed), so consecutive
110+
runs at the same scale book the same workload — diffs in throughput between
111+
runs reflect the code, not the data.
112+
113+
### Reading the report
114+
115+
```
116+
================ phase: drain ================
117+
frames : 6000 seeded, 5988 dispatched (99.8%), waiting 6000 -> 12
118+
throughput : 975.1 frames/s over a 6.1s booking window (wall 43.3s)
119+
matching : 3175 host attempts (41.9% wasted), 39 cluster rounds, host-cache hit 98%
120+
accounting : 7452 redis lua ops, 5988 dispatches (metrics), 24040 booked cores, rejections [...]
121+
audit : OK
122+
```
123+
124+
- **throughput** is measured from the first to the last `proc.ts_booked`, so it
125+
excludes the post-drain shutdown tail of the feed (the `wall` figure includes it).
126+
- **redis lua ops** above the dispatch count means the compensation path ran:
127+
each failed dispatch costs one book plus one force-rollback. The audit
128+
passing alongside a surplus is a *positive* signal — rollbacks netted out.
129+
- In the saturation phase, expect large `subscription=` rejection counts and
130+
every subscription pinned at exactly `burst/burst` cores.
131+
132+
### Cleanup guarantees
133+
134+
All database rows the suite creates are prefixed `stress_`. The suite sweeps
135+
that prefix **before** seeding (so leftovers from a crashed earlier run never
136+
skew results) and **after** the run, then asserts zero residue. Redis state
137+
needs no cleanup — the container is destroyed with the test. If a run is
138+
killed hard (e.g. SIGKILL mid-phase), the next run's pre-sweep removes the
139+
leftovers.
140+
141+
## CI integration
142+
143+
The suite runs in the
144+
[`scheduler-stress-pipeline.yml`](https://github.com/AcademySoftwareFoundation/OpenCue/blob/master/.github/workflows/scheduler-stress-pipeline.yml)
145+
workflow.
146+
147+
### When it runs
148+
149+
| Trigger | Scale | Purpose |
150+
|---|---|---|
151+
| Pull request touching `rust/crates/scheduler/**`, `rust/crates/opencue-proto/**`, `rust/Cargo.toml`, the Postgres migrations, or the workflow itself | defaults | Gate scheduler changes on booking/accounting correctness |
152+
| Nightly (cron, master) | defaults | Catch drift from changes outside the paths filter; daily throughput data point |
153+
| Manual (`workflow_dispatch`) | custom via inputs | Benchmark a branch at chosen scale |
154+
155+
### When it deliberately does not run
156+
157+
- **PRs that don't touch the scheduler or schema** (Python, CueGUI, CueWeb,
158+
docs, …). The suite needs a migrated Postgres, a Docker daemon, and several
159+
minutes of runner time; for those changes it produces zero signal.
160+
- **As a performance gate.** Shared CI runners have noisy CPU/IO, so the
161+
workflow never asserts on frames/s — throughput is published in the job's
162+
step summary (and the full log as an artifact) for humans to eyeball trends.
163+
Benchmark conclusions should come from local release-mode runs on quiet
164+
hardware.
165+
166+
### What fails the job
167+
168+
Only correctness regressions: accounting drift between Redis and Postgres, a
169+
cap breach, booking liveness failures (drain below target, or a saturated farm
170+
producing no Redis rejections), a phase that never converges (hard-timeout),
171+
or test data left behind after cleanup.
172+
173+
### Launching a manual benchmark run
174+
175+
GitHub → Actions → *OpenCue Scheduler Stress Pipeline**Run workflow*, then
176+
optionally override the job/host/frame counts and timeout. Results appear in
177+
the run's step summary; the complete log is attached as the
178+
`scheduler-stress-output` artifact (kept 30 days).
179+
180+
## Scope and limitations
181+
182+
- **RQD is not exercised.** The suite runs in `dry_run_mode`: the full booking
183+
path executes (Redis Lua, proc insert, host ledger, frame start) but no gRPC
184+
launch is sent. Frame *completion* and the Cuebot release path are out of
185+
scope — see the [Redis-Backed Accounting Reference](/docs/developer-guide/redis-accounting/)
186+
for how releases are reconciled.
187+
- **Only scheduler-managed shows** (`show.b_scheduler_managed = true`) are
188+
covered; Cuebot-managed accounting is Cuebot's test territory.
189+
- The recompute / limit-reseed loops are intentionally dormant during the run
190+
(see invariant 1); their CAS semantics are covered separately by
191+
`tests/redis_integration.rs` (`--features redis-tests`).
192+
193+
## Schema gotchas the suite encodes
194+
195+
These bit during development and are asserted/documented in the test code —
196+
keep them in mind when extending the seeding:
197+
198+
- `alloc.str_tag` is `VARCHAR(24)` and `host.str_name` is `VARCHAR(30)`:
199+
generated names must stay short.
200+
- The pending-job query `INNER JOIN`s `folder_resource`: a folder without that
201+
row makes every job in it silently unbookable.
202+
- The `vs_waiting` view requires `job_resource.int_max_cores - int_cores >= 100`
203+
(centicores): `int_max_cores = 0` does **not** mean "unlimited" on the query
204+
path — use a large value instead.

docs/_docs/developer-guide/scheduler.md

Lines changed: 11 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -998,18 +998,20 @@ cargo build --release -p scheduler
998998
cargo test -p scheduler
999999
```
10001000

1001-
**Integration tests** (requires database):
1001+
**Smoke tests** (requires a migrated local Postgres, see `docker compose up -d flyway`):
10021002
```bash
1003-
# Set up test database
1004-
export DATABASE_URL=postgresql://user:pass@localhost/test_db
1005-
1006-
# Run integration tests
1007-
cargo test -p scheduler --test integration_tests
1003+
cargo test -p scheduler --features smoke-tests --test smoke_tests
10081004
```
10091005

1010-
**Stress tests**:
1006+
**Stress tests** (requires a migrated local Postgres plus a Docker daemon for
1007+
the throwaway Redis container; see the
1008+
[Scheduler Stress Testing](/docs/developer-guide/scheduler-stress-testing/)
1009+
guide for tuning, CI behavior, and how to read the report):
10111010
```bash
1012-
cargo test -p scheduler --test stress_tests --release -- --nocapture
1011+
cargo test -p scheduler --features stress-tests --test stress_tests -- --nocapture
1012+
1013+
# Release mode for representative benchmark numbers
1014+
cargo test -p scheduler --release --features stress-tests --test stress_tests -- --nocapture
10131015
```
10141016

10151017
### Code Quality
@@ -1128,7 +1130,7 @@ INFO scheduler: Host cache size: 50000 hosts
11281130
**CPU profiling**:
11291131
```bash
11301132
cargo install samply
1131-
samply record cargo test --test stress_tests --release -- --nocapture
1133+
samply record cargo test -p scheduler --release --features stress-tests --test stress_tests -- --nocapture
11321134
```
11331135

11341136
## API and Extensibility

0 commit comments

Comments
 (0)