AcademySoftwareFoundation
diff --git a/‎.github/workflows/scheduler-stress-pipeline.yml‎
Lines changed: 150 additions & 0 deletions b/‎.github/workflows/scheduler-stress-pipeline.yml‎
Lines changed: 150 additions & 0 deletions
diff --git a/‎docs/_docs/developer-guide/scheduler-stress-testing.md‎
Lines changed: 204 additions & 0 deletions b/‎docs/_docs/developer-guide/scheduler-stress-testing.md‎
Lines changed: 204 additions & 0 deletions
diff --git a/‎docs/_docs/developer-guide/scheduler.md‎
Lines changed: 11 additions & 9 deletions b/‎docs/_docs/developer-guide/scheduler.md‎
Lines changed: 11 additions & 9 deletions
@@ -0,0 +1,150 @@
+name: OpenCue Scheduler Stress Pipeline
+
+# Runs the scheduler booking + accounting stress suite
+# (rust/crates/scheduler/tests/stress_tests.rs): a full pipeline::run against a
+# seeded farm, with an end-of-run audit that cross-checks the Redis acct:*
+# hashes against SUM(proc) in Postgres and asserts cap enforcement.
+#
+# When it runs — and when it deliberately doesn't:
+#   - Pull requests: only when the scheduler crate, its proto dependency, the
+#     DB migrations, or this workflow change. The suite needs a migrated
+#     Postgres plus a Redis container and takes several minutes — running it
+#     for Python/CueGUI/docs changes would burn runner time for zero signal.
+#   - Nightly on master: catches drift from changes that slipped past the
+#     paths filter (e.g. shared workspace dependencies) and gives a daily
+#     throughput data point under fixed scale.
+#   - Manually (workflow_dispatch): for benchmarking a branch at custom scale.
+#
+# What is a gate vs. what is informational:
+#   - The job FAILS on correctness regressions: accounting drift between Redis
+#     and Postgres, cap breaches (subscription burst / job max-cores), booking
+#     liveness (<90% drain, no saturation rejections), or leftover test data.
+#   - The throughput numbers (frames/s) are reported in the step summary but
+#     are NOT asserted on: shared runners are too noisy for perf gating. For
+#     real benchmarking run the suite locally in release mode (see
+#     docs/_docs/developer-guide/scheduler-stress-testing.md).
+
+on:
+  pull_request:
+    branches: ["master"]
+    paths:
+      - "rust/crates/scheduler/**"
+      - "rust/crates/opencue-proto/**"
+      - "rust/Cargo.toml"
+      - "cuebot/src/main/resources/conf/ddl/postgres/migrations/**"
+      - ".github/workflows/scheduler-stress-pipeline.yml"
+  schedule:
+    # Nightly on master. Odd minute to avoid the top-of-hour scheduling rush.
+    - cron: "23 9 * * *"
+  workflow_dispatch:
+    inputs:
+      stress_jobs:
+        description: "Drain-phase job count (default 300)"
+        required: false
+      stress_hosts:
+        description: "Drain-phase host count (default 1200)"
+        required: false
+      stress_frames_per_layer:
+        description: "Drain-phase frames per layer (default 5)"
+        required: false
+      stress_timeout_secs:
+        description: "Per-phase hard timeout in seconds (default 600)"
+        required: false
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: true
+
+env:
+  CARGO_TERM_COLOR: always
+  CARGO_INCREMENTAL: 0
+  CARGO_NET_RETRY: 10
+  RUSTUP_MAX_RETRIES: 10
+
+jobs:
+  stress:
+    runs-on: ubuntu-22.04
+    timeout-minutes: 45
+
+    # The suite's Postgres half. Redis is NOT listed here on purpose: the test
+    # starts its own throwaway Redis via testcontainers, so all accounting
+    # state is guaranteed to die with the test process.
+    services:
+      postgres:
+        image: postgres:15.1
+        env:
+          POSTGRES_USER: cuebot
+          POSTGRES_PASSWORD: cuebot_password
+          POSTGRES_DB: cuebot
+        ports:
+          - 5432:5432
+        options: >-
+          --health-cmd "pg_isready -U cuebot -d cuebot"
+          --health-interval 5s
+          --health-timeout 5s
+          --health-retries 10
+
+    steps:
+      - uses: actions/checkout@v4
+
+      - name: Install build dependencies
+        run: |
+          sudo apt-get update && sudo apt-get install -y libx11-dev protobuf-compiler libcurl4-openssl-dev postgresql-client
+          curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs > rust-install.sh
+          bash ./rust-install.sh -y
+
+      - name: Cache cargo deps
+        uses: Swatinem/rust-cache@v2
+        with:
+          workspaces: rust
+          cache-on-failure: true
+
+      # Plain psql instead of the sandbox Flyway image: the migrations are
+      # versioned plain-SQL files, so applying them in numeric order is exactly
+      # what Flyway would do, without building a JDK image first.
+      - name: Apply database migrations
+        working-directory: cuebot/src/main/resources/conf/ddl/postgres/migrations
+        env:
+          PGPASSWORD: cuebot_password
+        run: |
+          for f in $(ls V*.sql | sort -t V -k2 -n); do
+            echo "== $f"
+            psql -q -v ON_ERROR_STOP=1 -h localhost -U cuebot -d cuebot -f "$f"
+          done
+
+      - name: Run stress suite
+        working-directory: rust
+        shell: bash
+        env:
+          STRESS_JOBS: ${{ inputs.stress_jobs }}
+          STRESS_HOSTS: ${{ inputs.stress_hosts }}
+          STRESS_FRAMES_PER_LAYER: ${{ inputs.stress_frames_per_layer }}
+          STRESS_TIMEOUT_SECS: ${{ inputs.stress_timeout_secs }}
+        run: |
+          cargo test -p scheduler --features stress-tests --test stress_tests -- --nocapture 2>&1 | tee stress-output.log
+
+      - name: Publish phase report
+        if: always()
+        working-directory: rust
+        shell: bash
+        run: |
+          if [ -f stress-output.log ] && grep -q "^================ phase" stress-output.log; then
+            {
+              echo "## Scheduler stress suite"
+              echo '```'
+              sed -n '/^================ phase/,$p' stress-output.log | sed -n '1,80p'
+              echo '```'
+              echo "_Throughput numbers are informational; only the accounting/enforcement assertions gate this job._"
+            } >> "$GITHUB_STEP_SUMMARY"
+          else
+            echo "## Scheduler stress suite: no phase report produced (failed before the run?)" >> "$GITHUB_STEP_SUMMARY"
+          fi
+
+      - name: Upload full output
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: scheduler-stress-output
+          path: rust/stress-output.log
+          if-no-files-found: ignore
+          retention-days: 30
@@ -0,0 +1,204 @@
+---
+title: "Scheduler Stress Testing"
+nav_order: 102
+parent: Reference
+layout: default
+linkTitle: "Scheduler Stress Testing"
+date: 2026-06-12
+description: >
+  How to run, tune, and interpret the Rust scheduler's booking and accounting
+  stress suite, locally and in CI
+---
+
+# Scheduler Stress Testing
+
+### Running and interpreting the booking + accounting stress suite
+
+---
+
+## Overview
+
+The stress suite (`rust/crates/scheduler/tests/stress_tests.rs`) exercises the
+[Rust scheduler](/docs/developer-guide/scheduler/)'s full production dispatch
+path at scale — `pipeline::run` end to end: Redis accounting bootstrap →
+cluster feed → pending-job query → host matching → dispatch (proc insert, host
+ledger decrement, frame start) — against a deterministic, bulk-seeded farm.
+
+It is both a **correctness gate** and a **benchmark harness**:
+
+- **Correctness**: after each phase an audit cross-checks every Redis `acct:*`
+  hash the run touched against `SUM(proc)` in Postgres (the canonical record —
+  see the [Redis-Backed Accounting Reference](/docs/developer-guide/redis-accounting/)),
+  and verifies cap enforcement and ledger invariants.
+- **Benchmark**: it reports booking throughput (frames/s over the active
+  booking window), host-matching efficiency (wasted attempt %), host-cache hit
+  ratio, and Redis Lua op counts.
+
+The suite runs two phases in one process:
+
+| Phase | Shape | What it proves |
+|---|---|---|
+| **drain** | Farm capacity comfortably exceeds demand (default: 1,200 hosts, 6,000 frames) | ≥90% of frames book; throughput measured; accounting stays exact under concurrency, including the force-rollback compensation path |
+| **saturation** | Demand vastly exceeds tight subscription bursts and per-job core caps (default: 400 hosts, 3,000 frames, 150-core bursts) | The Redis Lua cap check is the binding constraint: bookings stop exactly at burst, caps are never breached, rejections flow through the hot path |
+
+### Invariants the audit asserts
+
+1. Every `acct:{sub,folder,job,layer,point}` hash holds exactly
+   `SUM(proc.int_cores_reserved)/100` cores and `SUM(proc.int_gpus_reserved)`
+   GPUs for its grouping — the same 5-dimension grouping and centicore→core
+   conversion the recompute loop uses. The suite pushes the recompute and
+   limit-reseed loops out to a 1-hour interval, so agreement here proves the
+   *dispatch hot path alone* (Lua book + force-rollback) kept Redis exact —
+   reconciliation never got a chance to paper over drift.
+2. Jobs with no bookings have no leaked Redis counters.
+3. Per-(show, alloc) booked cores never exceed the subscription burst.
+4. Per-job booked cores never exceed `job_resource.int_max_cores`.
+5. Host ledger: `int_cores - int_cores_idle == SUM(proc)` per host, never negative.
+6. One `RUNNING` frame per proc row.
+7. Trigger-maintained `job_stat.int_waiting_count` matches the frame table.
+8. After teardown, zero `stress_%` rows remain in any table the suite touches.
+
+## Running locally
+
+### Prerequisites
+
+- A migrated Postgres on `localhost:5432` (`cuebot` / `cuebot_password`).
+  From the repo root: `docker compose up -d flyway` (brings up `db` and applies
+  migrations). If the Flyway image won't build in your environment (e.g.
+  SSL-inspecting proxies break its package mirrors), apply the migrations
+  directly — they are plain SQL:
+
+  ```bash
+  cd cuebot/src/main/resources/conf/ddl/postgres/migrations
+  for f in $(ls V*.sql | sort -t V -k2 -n); do
+    docker exec -i opencue-db-1 psql -q -v ON_ERROR_STOP=1 -U cuebot -d cuebot < "$f"
+  done
+  ```
+
+- A running Docker daemon. The suite starts its own throwaway Redis container
+  via testcontainers; all accounting state dies with it.
+
+### Run
+
+```bash
+cd rust
+cargo test -p scheduler --features stress-tests --test stress_tests -- --nocapture
+```
+
+For meaningful benchmark numbers, use a release build:
+
+```bash
+cargo test -p scheduler --release --features stress-tests --test stress_tests -- --nocapture
+```
+
+### Tuning
+
+| Env var | Default | Meaning |
+|---|---|---|
+| `STRESS_JOBS` | 300 | drain-phase job count |
+| `STRESS_LAYERS` | 4 | drain-phase layers per job |
+| `STRESS_FRAMES_PER_LAYER` | 5 | drain-phase frames per layer |
+| `STRESS_HOSTS` | 1200 | drain-phase host count |
+| `STRESS_TAGS` | 8 | drain-phase manual tag count |
+| `STRESS_SAT_JOBS` | 150 | saturation-phase job count |
+| `STRESS_SAT_HOSTS` | 400 | saturation-phase host count |
+| `STRESS_DRAIN_TARGET` | 0.9 | fraction of drain frames that must book |
+| `STRESS_STALL_SECS` | 30 | watchdog: pause jobs after this long without a new booking |
+| `STRESS_TIMEOUT_SECS` | 600 | watchdog: per-phase hard timeout |
+
+Seeding is deterministic for a given scale (fixed RNG seed), so consecutive
+runs at the same scale book the same workload — diffs in throughput between
+runs reflect the code, not the data.
+
+### Reading the report
+
+```
+================ phase: drain ================
+frames     : 6000 seeded, 5988 dispatched (99.8%), waiting 6000 -> 12
+throughput : 975.1 frames/s over a 6.1s booking window (wall 43.3s)
+matching   : 3175 host attempts (41.9% wasted), 39 cluster rounds, host-cache hit 98%
+accounting : 7452 redis lua ops, 5988 dispatches (metrics), 24040 booked cores, rejections [...]
+audit      : OK
+```
+
+- **throughput** is measured from the first to the last `proc.ts_booked`, so it
+  excludes the post-drain shutdown tail of the feed (the `wall` figure includes it).
+- **redis lua ops** above the dispatch count means the compensation path ran:
+  each failed dispatch costs one book plus one force-rollback. The audit
+  passing alongside a surplus is a *positive* signal — rollbacks netted out.
+- In the saturation phase, expect large `subscription=` rejection counts and
+  every subscription pinned at exactly `burst/burst` cores.
+
+### Cleanup guarantees
+
+All database rows the suite creates are prefixed `stress_`. The suite sweeps
+that prefix **before** seeding (so leftovers from a crashed earlier run never
+skew results) and **after** the run, then asserts zero residue. Redis state
+needs no cleanup — the container is destroyed with the test. If a run is
+killed hard (e.g. SIGKILL mid-phase), the next run's pre-sweep removes the
+leftovers.
+
+## CI integration
+
+The suite runs in the
+[`scheduler-stress-pipeline.yml`](https://github.com/AcademySoftwareFoundation/OpenCue/blob/master/.github/workflows/scheduler-stress-pipeline.yml)
+workflow.
+
+### When it runs
+
+| Trigger | Scale | Purpose |
+|---|---|---|
+| Pull request touching `rust/crates/scheduler/**`, `rust/crates/opencue-proto/**`, `rust/Cargo.toml`, the Postgres migrations, or the workflow itself | defaults | Gate scheduler changes on booking/accounting correctness |
+| Nightly (cron, master) | defaults | Catch drift from changes outside the paths filter; daily throughput data point |
+| Manual (`workflow_dispatch`) | custom via inputs | Benchmark a branch at chosen scale |
+
+### When it deliberately does not run
+
+- **PRs that don't touch the scheduler or schema** (Python, CueGUI, CueWeb,
+  docs, …). The suite needs a migrated Postgres, a Docker daemon, and several
+  minutes of runner time; for those changes it produces zero signal.
+- **As a performance gate.** Shared CI runners have noisy CPU/IO, so the
+  workflow never asserts on frames/s — throughput is published in the job's
+  step summary (and the full log as an artifact) for humans to eyeball trends.
+  Benchmark conclusions should come from local release-mode runs on quiet
+  hardware.
+
+### What fails the job
+
+Only correctness regressions: accounting drift between Redis and Postgres, a
+cap breach, booking liveness failures (drain below target, or a saturated farm
+producing no Redis rejections), a phase that never converges (hard-timeout),
+or test data left behind after cleanup.
+
+### Launching a manual benchmark run
+
+GitHub → Actions → *OpenCue Scheduler Stress Pipeline* → *Run workflow*, then
+optionally override the job/host/frame counts and timeout. Results appear in
+the run's step summary; the complete log is attached as the
+`scheduler-stress-output` artifact (kept 30 days).
+
+## Scope and limitations
+
+- **RQD is not exercised.** The suite runs in `dry_run_mode`: the full booking
+  path executes (Redis Lua, proc insert, host ledger, frame start) but no gRPC
+  launch is sent. Frame *completion* and the Cuebot release path are out of
+  scope — see the [Redis-Backed Accounting Reference](/docs/developer-guide/redis-accounting/)
+  for how releases are reconciled.
+- **Only scheduler-managed shows** (`show.b_scheduler_managed = true`) are
+  covered; Cuebot-managed accounting is Cuebot's test territory.
+- The recompute / limit-reseed loops are intentionally dormant during the run
+  (see invariant 1); their CAS semantics are covered separately by
+  `tests/redis_integration.rs` (`--features redis-tests`).
+
+## Schema gotchas the suite encodes
+
+These bit during development and are asserted/documented in the test code —
+keep them in mind when extending the seeding:
+
+- `alloc.str_tag` is `VARCHAR(24)` and `host.str_name` is `VARCHAR(30)`:
+  generated names must stay short.
+- The pending-job query `INNER JOIN`s `folder_resource`: a folder without that
+  row makes every job in it silently unbookable.
+- The `vs_waiting` view requires `job_resource.int_max_cores - int_cores >= 100`
+  (centicores): `int_max_cores = 0` does **not** mean "unlimited" on the query
+  path — use a large value instead.
@@ -998,18 +998,20 @@ cargo build --release -p scheduler
 cargo test -p scheduler
 ```
 
-**Integration tests** (requires database):
+**Smoke tests** (requires a migrated local Postgres, see `docker compose up -d flyway`):
 ```bash
-# Set up test database
-export DATABASE_URL=postgresql://user:pass@localhost/test_db
-
-# Run integration tests
-cargo test -p scheduler --test integration_tests
+cargo test -p scheduler --features smoke-tests --test smoke_tests
 ```
 
-**Stress tests**:
+**Stress tests** (requires a migrated local Postgres plus a Docker daemon for
+the throwaway Redis container; see the
+[Scheduler Stress Testing](/docs/developer-guide/scheduler-stress-testing/)
+guide for tuning, CI behavior, and how to read the report):
 ```bash
-cargo test -p scheduler --test stress_tests --release -- --nocapture
+cargo test -p scheduler --features stress-tests --test stress_tests -- --nocapture
+
+# Release mode for representative benchmark numbers
+cargo test -p scheduler --release --features stress-tests --test stress_tests -- --nocapture
 ```
 
 ### Code Quality
@@ -1128,7 +1130,7 @@ INFO scheduler: Host cache size: 50000 hosts
 **CPU profiling**:
 ```bash
 cargo install samply
-samply record cargo test --test stress_tests --release -- --nocapture
+samply record cargo test -p scheduler --release --features stress-tests --test stress_tests -- --nocapture
 ```
 
 ## API and Extensibility