|
| 1 | +# Plan: GCP GPU Runner for Integration Tests |
| 2 | + |
| 3 | +**Created:** 2026-03-16 |
| 4 | +**Branch:** feat/gha-gpu-runner |
| 5 | +**Status:** Implemented (pending `pulumi up` and PR merge) |
| 6 | + |
| 7 | +## Overview |
| 8 | + |
| 9 | +The python-audio-separator integration tests currently run on a CPU-only self-hosted |
| 10 | +GHA runner (`e2-standard-4`, 4 vCPU, 16GB RAM). With the new ensemble tests and |
| 11 | +multi-stem verification tests, CI takes 30+ minutes because each model separation runs |
| 12 | +on CPU. A GPU runner should cut this to roughly 5-10 minutes. |
| 13 | + |
| 14 | +## Current State |
| 15 | + |
| 16 | +### Existing runner infrastructure |
| 17 | +- **Location:** `karaoke-gen/infrastructure/compute/github_runners.py` (Pulumi) |
| 18 | +- **Runners:** 3× `e2-standard-4` (general) + 1× `e2-standard-8` (Docker builds) |
| 19 | +- **Labels:** `self-hosted`, `Linux`, `X64`, `gcp`, `large-disk` |
| 20 | +- **Zone:** `us-central1-a` |
| 21 | +- **Models:** Pre-cached at `/opt/audio-separator-models` on runner startup |
| 22 | +- **Org-level:** Runners are registered to `nomadkaraoke` org, available to all repos |
| 23 | +- **NAT:** All runners use Cloud NAT (no external IPs) |
| 24 | + |
| 25 | +### Current integration test workflow |
| 26 | +- File: `.github/workflows/run-integration-tests.yaml` |
| 27 | +- Runs on: `self-hosted` (picks up any org runner) |
| 28 | +- Tests: `poetry run pytest -sv --cov=audio_separator tests/integration` |
| 29 | +- Installs: `poetry install -E cpu` |
| 30 | +- Problem: All model inference on CPU → very slow for Roformer/Demucs models |
| 31 | + |
| 32 | +## Requirements |
| 33 | + |
| 34 | +- [x] GCE VM with NVIDIA GPU (T4 is cheapest, sufficient for inference) |
| 35 | +- [x] CUDA drivers + PyTorch GPU support pre-installed |
| 36 | +- [x] Models pre-cached on persistent disk (same as existing runners) |
| 37 | +- [x] Labeled `gpu` so workflow can target it specifically |
| 38 | +- [x] Cost-effective — only runs when needed (on-demand, not always-on) |
| 39 | +- [x] Integration test workflow updated to use `gpu` label |
| 40 | +- [x] Install `poetry install -E gpu` (onnxruntime-gpu) instead of `-E cpu` |
| 41 | + |
| 42 | +## Technical Approach |
| 43 | + |
| 44 | +### Option A: Dedicated GPU VM (simplest) |
| 45 | + |
| 46 | +Add a new GPU runner VM to the existing Pulumi infrastructure. Use an `n1-standard-4` |
| 47 | +with 1× NVIDIA T4 GPU. GPU cost alone: ~$0.35/hr on-demand, ~$0.11/hr spot. |
| 48 | + |
| 49 | +**Pros:** Simple, fits existing patterns, fast startup (VM already running) |
| 50 | +**Cons:** Always-on cost if not managed; or slow cold-start if managed |
| 51 | + |
| 52 | +### Option B: Spot GPU VM with startup/shutdown management |
| 53 | + |
| 54 | +Same as A but use spot pricing and the existing runner_manager Cloud Function to |
| 55 | +start/stop based on CI demand. |
| 56 | + |
| 57 | +**Pros:** ~70% cheaper (~$0.19/hr vs ~$0.61/hr for VM + T4), fits existing management pattern |
| 58 | +**Cons:** Spot can be preempted mid-test (rare for short jobs); cold start ~2-3 min |
| 59 | + |
| 60 | +### Option C: Use a cloud GPU service (Modal, Lambda Labs, etc.) |
| 61 | + |
| 62 | +Run the integration tests on a cloud GPU service rather than self-hosted. |
| 63 | + |
| 64 | +**Pros:** No infrastructure to manage, pay-per-second |
| 65 | +**Cons:** More complex CI integration, different from existing patterns |
| 66 | + |
| 67 | +### Recommendation: Option B (Spot GPU VM) |
| 68 | + |
| 69 | +The integration tests take <10 minutes on GPU, so spot preemption risk is low. |
| 70 | +Cold start is acceptable since runs are triggered by PR events. Cost: ~$0.03 per CI run. |
| 71 | + |
| 72 | +## Implementation Steps |
| 73 | + |
| 74 | +### 1. Pulumi infrastructure (in karaoke-gen repo) |
| 75 | + |
| 76 | +1. [x] Add `GITHUB_GPU_RUNNER` machine type to `config.py`: `n1-standard-4` + 1× T4 |
| 77 | +2. [x] Add `GPU_RUNNER_LABELS` to `config.py`: `"self-hosted,linux,x64,gcp,gpu"` |
| 78 | +3. [x] Create GPU runner VM in `github_runners.py`: |
| 79 | + - `n1-standard-4` (4 vCPU, 15GB RAM) |
| 80 | + - 1× NVIDIA T4 GPU (`nvidia-tesla-t4`) |
| 81 | + - `guest_accelerators` config |
| 82 | + - `on_host_maintenance: "TERMINATE"` (required for GPU VMs) |
| 83 | + - Same NAT/networking as existing runners |
| 84 | +4. [x] Create GPU startup script (`github_runner_gpu.sh`): |
| 85 | + - Install NVIDIA drivers via CUDA repo (cuda-drivers + cuda-toolkit-12-4) |
| 86 | + - Install CUDA toolkit |
| 87 | + - Verify GPU: `nvidia-smi` |
| 88 | + - Pre-download models to `/opt/audio-separator-models` |
| 89 | + - Register as GHA runner with `gpu` label |
| 90 | +5. [x] Add spot scheduling for cost optimization |
| 91 | +6. [ ] Run `pulumi up` to create the VM |
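The VM settings from step 3 can be sketched as plain keyword arguments that mirror the field names `pulumi_gcp.compute.Instance` accepts (`guest_accelerators`, `scheduling`). This is a minimal sketch, not the contents of `github_runners.py`: the helper name `gpu_runner_instance_args` and its parameters are hypothetical, and it returns a plain dict so it runs without Pulumi installed.

```python
from typing import Any


def gpu_runner_instance_args(zone: str = "us-central1-a", spot: bool = True) -> dict[str, Any]:
    """Build keyword arguments for a T4 runner VM (hypothetical helper)."""
    scheduling: dict[str, Any] = {
        # GPU VMs cannot live-migrate, so host maintenance must terminate them.
        "on_host_maintenance": "TERMINATE",
        # Spot VMs must not restart automatically after preemption.
        "automatic_restart": not spot,
    }
    if spot:
        # The SPOT provisioning model is set together with preemptible=True.
        scheduling.update({"preemptible": True, "provisioning_model": "SPOT"})
    return {
        "machine_type": "n1-standard-4",
        "zone": zone,
        "guest_accelerators": [{"type": "nvidia-tesla-t4", "count": 1}],
        "scheduling": scheduling,
    }
```

Keeping the spot-specific fields in one place makes it easy to flip the runner to on-demand (`spot=False`) if preemptions turn out to be a problem.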
| 92 | + |
| 93 | +### 2. Workflow update (in python-audio-separator repo) |
| 94 | + |
| 95 | +7. [x] Update `run-integration-tests.yaml`: |
| 96 | + - Change `runs-on: self-hosted` to `runs-on: [self-hosted, gpu]` |
| 97 | + - Change `poetry install -E cpu` to `poetry install -E gpu` |
| 98 | + - Add `nvidia-smi` verification step |
| 99 | + - Add 30-minute timeout |
| 100 | +8. [ ] Add fallback: if no GPU runner available, fall back to CPU with longer timeout |
| 101 | + - Deferred: not needed initially, the runner_manager auto-starts the GPU VM on demand |
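Beyond a bare `nvidia-smi` call, the verification step could also confirm that the Python stack sees the GPU. The sketch below is an assumption about how such a check might look, not the step in the actual workflow; `gpu_status` is a hypothetical helper. It relies only on `onnxruntime.get_available_providers()` and `torch.cuda.is_available()`, and treats any missing tool or package as "not available".

```python
import importlib
import shutil
import subprocess


def gpu_status() -> dict[str, bool]:
    """Report which GPU layers are visible; missing tools count as False."""
    status = {"nvidia_smi": False, "onnxruntime_cuda": False, "torch_cuda": False}

    # Driver level: nvidia-smi exits 0 only when it can talk to the driver.
    smi = shutil.which("nvidia-smi")
    if smi:
        status["nvidia_smi"] = subprocess.run([smi], capture_output=True).returncode == 0

    # ONNX Runtime level: the CUDA provider is listed only in GPU builds.
    try:
        ort = importlib.import_module("onnxruntime")
        status["onnxruntime_cuda"] = "CUDAExecutionProvider" in ort.get_available_providers()
    except (ImportError, OSError):
        pass

    # PyTorch level.
    try:
        torch = importlib.import_module("torch")
        status["torch_cuda"] = bool(torch.cuda.is_available())
    except (ImportError, OSError):
        pass

    return status
```

A workflow step could print the result and exit non-zero when any value is `False`, so a silent fall back to CPU inference fails fast instead of producing a 30-minute run.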
| 102 | + |
| 103 | +### 3. Startup script details |
| 104 | + |
| 105 | +The GPU startup script needs to do roughly the following: |
| 106 | +```bash |
| 107 | +# Install NVIDIA drivers (for Debian 12) |
| 108 | +sudo apt-get update |
| 109 | +sudo apt-get install -y "linux-headers-$(uname -r)" nvidia-driver-535 |
| 110 | + |
| 111 | +# Verify GPU |
| 112 | +nvidia-smi |
| 113 | + |
| 114 | +# CUDA runtime (for PyTorch) |
| 115 | +# PyTorch wheels bundle their own CUDA libraries, so the driver is the main requirement |
| 116 | + |
| 117 | +# Pre-download models |
| 118 | +pip install "audio-separator[gpu]" |
| 119 | +python -c " |
| 120 | +from audio_separator.separator import Separator |
| 121 | +sep = Separator(model_file_dir='/opt/audio-separator-models') |
| 122 | +# Download all models used in integration tests |
| 123 | +models = [ |
| 124 | + 'model_bs_roformer_ep_317_sdr_12.9755.ckpt', |
| 125 | + 'mel_band_roformer_karaoke_aufr33_viperx_sdr_10.1956.ckpt', |
| 126 | + 'MGM_MAIN_v4.pth', |
| 127 | + 'UVR-MDX-NET-Inst_HQ_4.onnx', |
| 128 | + 'kuielab_b_vocals.onnx', |
| 129 | + '2_HP-UVR.pth', |
| 130 | + 'htdemucs_6s.yaml', |
| 131 | + 'htdemucs_ft.yaml', |
| 132 | + # Ensemble preset models |
| 133 | + 'bs_roformer_vocals_resurrection_unwa.ckpt', |
| 134 | + 'melband_roformer_big_beta6x.ckpt', |
| 135 | + 'bs_roformer_vocals_revive_v2_unwa.ckpt', |
| 136 | + 'mel_band_roformer_kim_ft2_bleedless_unwa.ckpt', |
| 137 | + 'bs_roformer_vocals_revive_v3e_unwa.ckpt', |
| 138 | + 'mel_band_roformer_vocals_becruily.ckpt', |
| 139 | + 'mel_band_roformer_vocals_fv4_gabox.ckpt', |
| 140 | + 'mel_band_roformer_instrumental_fv7z_gabox.ckpt', |
| 141 | + 'bs_roformer_instrumental_resurrection_unwa.ckpt', |
| 142 | + 'melband_roformer_inst_v1e_plus.ckpt', |
| 143 | + 'mel_band_roformer_instrumental_becruily.ckpt', |
| 144 | + 'mel_band_roformer_instrumental_instv8_gabox.ckpt', |
| 145 | + 'UVR-MDX-NET-Inst_HQ_5.onnx', |
| 146 | + 'mel_band_roformer_karaoke_gabox_v2.ckpt', |
| 147 | + 'mel_band_roformer_karaoke_becruily.ckpt', |
| 148 | + # Multi-stem test models |
| 149 | + '17_HP-Wind_Inst-UVR.pth', |
| 150 | + 'MDX23C-DrumSep-aufr33-jarredou.ckpt', |
| 151 | + 'dereverb_mel_band_roformer_anvuew_sdr_19.1729.ckpt', |
| 152 | +] |
| 153 | +for m in models: |
| 154 | + sep.download_model_and_data(m) |
| 155 | +" |
| 156 | +``` |
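Because the startup script runs on every boot, it can help to log which models are still missing from the cache before downloading. The following is a minimal sketch under stated assumptions: `missing_models` is a hypothetical helper, and it only checks for a file with the listed name, so multi-file models (for example the Demucs `.yaml` entries) may need a more thorough check.

```python
from pathlib import Path


def missing_models(model_dir: str, wanted: list[str]) -> list[str]:
    """Return the entries in `wanted` that have no file in `model_dir` yet."""
    root = Path(model_dir)
    return [name for name in wanted if not (root / name).is_file()]
```

Passing the `models` list from the script above and `/opt/audio-separator-models` as `model_dir` would yield only the names that still need a download on this boot.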
| 157 | + |
| 158 | +## Cost Estimate |
| 159 | + |
| 160 | +| Config | Hourly | Per CI run | Monthly (est. 100 runs) | |
| 161 | +|--------|--------|------------|-------------------------| |
| 162 | +| n1-standard-4 + T4 (on-demand) | $0.61 | $0.10 (~10 min) | $10 | |
| 163 | +| n1-standard-4 + T4 (spot) | $0.19 | $0.03 (~10 min) | $3 | |
| 164 | +| Current CPU (e2-standard-4) | $0.13 | $0.07 (~30 min) | $7 | |
| 165 | + |
| 166 | +Spot GPU is cheaper per run than the current CPU runner because GPU tests finish in roughly a third of the time (~10 min vs 30+ min). |
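The per-run and monthly figures follow from hourly rate × run length. The sketch below reproduces that arithmetic using the hourly rates from the table; the run lengths (about 10 minutes on GPU, about 30 minutes on CPU) are assumptions taken from the estimates elsewhere in this plan, and the function names are illustrative.

```python
def cost_per_run(hourly_usd: float, minutes: float) -> float:
    """Cost of one CI run billed at an hourly rate."""
    return hourly_usd * minutes / 60


def monthly_cost(hourly_usd: float, minutes: float, runs: int = 100) -> float:
    """Cost of `runs` CI runs of the given length."""
    return cost_per_run(hourly_usd, minutes) * runs


# (hourly rate in USD, assumed run length in minutes)
configs = {
    "n1-standard-4 + T4 (on-demand)": (0.61, 10),
    "n1-standard-4 + T4 (spot)": (0.19, 10),
    "e2-standard-4 (CPU)": (0.13, 30),
}

for name, (rate, minutes) in configs.items():
    per_run = cost_per_run(rate, minutes)
    per_month = monthly_cost(rate, minutes)
    print(f"{name}: ${per_run:.3f}/run, ${per_month:.2f}/month")
```

These numbers cover only the time a job is running; boot time and any idle time before the runner_manager stops the VM would add to the real bill.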
| 167 | + |
| 168 | +## Files to Create/Modify |
| 169 | + |
| 170 | +| File | Repo | Action | |
| 171 | +|------|------|--------| |
| 172 | +| `infrastructure/config.py` | karaoke-gen | Add GPU machine type + labels | |
| 173 | +| `infrastructure/compute/github_runners.py` | karaoke-gen | Add GPU runner VM | |
| 174 | +| `infrastructure/compute/startup_scripts/github_runner_gpu.sh` | karaoke-gen | GPU-specific startup | |
| 175 | +| `.github/workflows/run-integration-tests.yaml` | python-audio-separator | Target GPU runner | |
| 176 | + |
| 177 | +## Open Questions |
| 178 | + |
| 179 | +- [x] Should the GPU runner be spot or on-demand? → **Spot** ($0.19/hr, ~$3/mo) |
| 180 | +- [x] Should we keep the CPU fallback for when GPU runner is unavailable? → **Deferred** (runner_manager auto-starts VM) |
| 181 | +- [x] Should the runner startup script install NVIDIA drivers from scratch each boot, |
| 182 | + or use a pre-built GCP Deep Learning VM image? → **From scratch** (idempotent, matches existing pattern) |
| 183 | +- [x] Zone availability: T4 GPUs may not be available in us-central1-a → **Available** in all us-central1 zones (a, b, c, f) |
| 184 | + |
| 185 | +## Rollback Plan |
| 186 | + |
| 187 | +The GPU runner is additive infrastructure. If it fails: |
| 188 | +1. Change workflow back to `runs-on: self-hosted` (CPU) |
| 189 | +2. Destroy the GPU VM via `pulumi destroy --target <urn>`, where `<urn>` is that VM's resource URN |