Skip to content

Commit 153b2e4

Browse files
beveradbclaude
andauthored
fix: ensemble separation ignores custom_output_names, misclassifies stems (#275)
* fix: ensemble separation ignores custom_output_names, misclassifies stems When using ensemble presets with custom_output_names, intermediate per-model separations received custom names that replaced the _(StemType)_ filename markers. This broke stem type classification (regex extraction), causing all stems to be labeled "Unknown"/"Other" and custom_output_names to not match. Fix: pass None to _separate_file for intermediate ensemble files (matching how _process_with_chunking already works), apply custom_output_names only to the final ensembled output. Bumps version to 0.43.1. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * docs: add CI GPU runner infrastructure documentation Documents how the auto-scaling GPU runner system works, including architecture, troubleshooting steps, and the critical requirement to update branch protection rules when renaming integration test jobs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 288e322 commit 153b2e4

5 files changed

Lines changed: 350 additions & 3 deletions

File tree

audio_separator/separator/separator.py

Lines changed: 5 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1279,8 +1279,11 @@ def _separate_ensemble(self, audio_file_path, custom_output_names=None):
12791279
self.model_instance.output_dir = temp_dir
12801280

12811281
try:
1282-
# Perform separation
1283-
model_stems = self._separate_file(path, custom_output_names)
1282+
# Perform separation WITHOUT custom_output_names for intermediate files.
1283+
# Intermediate stems must use the default "base_(StemType)_model.ext" naming
1284+
# so the regex below can extract stem types for classification.
1285+
# custom_output_names is applied later to the final ensembled output.
1286+
model_stems = self._separate_file(path, None)
12841287

12851288
# Extract and normalize stem names from this model's outputs
12861289
model_stem_names = []

docs/CI-GPU-RUNNERS.md

Lines changed: 165 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,165 @@
1+
# CI GPU Runner Infrastructure
2+
3+
This document explains how the GPU-based integration test infrastructure works for this repo.
4+
5+
## Overview
6+
7+
Integration tests require GPU hardware to run ML model inference. GPU VMs are expensive (~$1.62/hr for 3x T4), so they auto-scale to zero when idle. The system automatically starts runners when CI jobs need them and stops them after 15 minutes of inactivity.
8+
9+
## Architecture
10+
11+
```
12+
GitHub webhook (workflow_job.queued)
13+
14+
15+
Cloud Function (github-runner-manager)
16+
17+
├── Job has "gpu" label? → Start GPU runners (3x n1-standard-4 + T4)
18+
├── Job has "self-hosted" label? → Start CPU runners
19+
└── Neither? → Ignore
20+
21+
Cloud Scheduler (every 15 min)
22+
23+
24+
Cloud Function (?action=check_idle)
25+
26+
└── No pending jobs + runner idle > 15 min? → Stop runner
27+
```
28+
29+
### Components
30+
31+
| Component | Location | Purpose |
32+
|-----------|----------|---------|
33+
| Cloud Function | `karaoke-gen/infrastructure/functions/runner_manager/main.py` | Starts/stops runner VMs based on demand |
34+
| Pulumi module | `karaoke-gen/infrastructure/modules/runner_manager.py` | Deploys the function, scheduler, and IAM |
35+
| GPU VM definitions | `karaoke-gen/infrastructure/compute/github_runners.py` | 3x n1-standard-4 with T4 GPU |
36+
| GPU startup script | `karaoke-gen/infrastructure/compute/startup_scripts/github_runner_gpu.sh` | Installs NVIDIA drivers, Python, registers runner |
37+
| Config | `karaoke-gen/infrastructure/config.py` | Runner count, labels, idle timeout |
38+
| GitHub webhook | Org-level (`nomadkaraoke`) | Sends `workflow_job` events to Cloud Function |
39+
40+
### GPU Runner VMs
41+
42+
- **Count**: 3 (configurable via `NUM_GPU_RUNNERS` in config.py)
43+
- **Machine type**: n1-standard-4 (4 vCPU, 15GB RAM) + 1x NVIDIA T4
44+
- **Zone**: us-central1-a
45+
- **Labels**: `self-hosted, linux, x64, gcp, gpu`
46+
- **Startup time**: ~15-20 min (NVIDIA driver install, Python build, model download)
47+
- **Model cache**: ~14GB of ML models pre-downloaded to `/opt/audio-separator-models/`
48+
49+
### Required GitHub Branch Protection Checks
50+
51+
The `Protect main` ruleset (ID: 529535) requires these checks to pass before merge:
52+
53+
- `unit-tests` — from `run-unit-tests.yaml` (runs on GitHub-hosted runners)
54+
- `ensemble-presets` — from `run-integration-tests.yaml` (runs on GPU runners)
55+
- `core-models` — from `run-integration-tests.yaml` (runs on GPU runners)
56+
- `stems-and-quality` — from `run-integration-tests.yaml` (runs on GPU runners)
57+
58+
**IMPORTANT**: If integration test job names change (e.g., splitting or renaming jobs), you MUST update the ruleset to match. The ruleset is configured at:
59+
https://github.com/nomadkaraoke/python-audio-separator/settings/rules/529535
60+
61+
To update via API:
62+
```bash
63+
gh api repos/nomadkaraoke/python-audio-separator/rulesets/529535 \
64+
--method PUT --input - <<'EOF'
65+
{
66+
"name": "Protect main",
67+
"enforcement": "active",
68+
"target": "branch",
69+
"conditions": {"ref_name": {"include": ["~DEFAULT_BRANCH"], "exclude": []}},
70+
"rules": [
71+
{"type": "deletion"},
72+
{"type": "pull_request", "parameters": {
73+
"required_approving_review_count": 0,
74+
"allowed_merge_methods": ["squash"]
75+
}},
76+
{"type": "required_status_checks", "parameters": {
77+
"required_status_checks": [
78+
{"context": "unit-tests", "integration_id": 15368},
79+
{"context": "JOB_NAME_HERE", "integration_id": 15368}
80+
]
81+
}}
82+
]
83+
}
84+
EOF
85+
```
86+
87+
## Troubleshooting
88+
89+
### Integration tests stuck in "queued"
90+
91+
**Symptoms**: PR checks show `pending` for `ensemble-presets`, `core-models`, `stems-and-quality`.
92+
93+
**Diagnosis steps**:
94+
95+
1. Check if GPU runners are online:
96+
```bash
97+
gh api orgs/nomadkaraoke/actions/runners \
98+
--jq '.runners[] | select(.labels[].name == "gpu") | {name, status, busy}'
99+
```
100+
101+
2. Check if GPU VMs exist:
102+
```bash
103+
gcloud compute instances list --project=nomadkaraoke --filter="name~gpu"
104+
```
105+
106+
3. Check Cloud Function logs for webhook delivery:
107+
```bash
108+
gcloud logging read 'resource.labels.service_name="github-runner-manager"' \
109+
--project=nomadkaraoke --limit=20 \
110+
--format="value(timestamp,textPayload,jsonPayload.message)"
111+
```
112+
113+
4. Check GPU runner startup logs (if VMs are RUNNING but GitHub shows offline):
114+
```bash
115+
gcloud compute ssh github-gpu-runner-1 --zone=us-central1-a --project=nomadkaraoke \
116+
--command="tail -50 /var/log/github-runner-startup.log"
117+
```
118+
119+
### GPU VMs don't exist
120+
121+
If `gcloud compute instances list` shows no GPU runners but Pulumi state thinks they exist:
122+
123+
```bash
124+
# 1. Remove stale state (from karaoke-gen/infrastructure/ dir)
125+
pulumi state delete "urn:pulumi:prod::karaoke-gen-infrastructure::gcp:compute/instance:Instance::github-gpu-runner-1" --target-dependents --yes
126+
pulumi state delete "urn:pulumi:prod::karaoke-gen-infrastructure::gcp:compute/instance:Instance::github-gpu-runner-2" --target-dependents --yes
127+
pulumi state delete "urn:pulumi:prod::karaoke-gen-infrastructure::gcp:compute/instance:Instance::github-gpu-runner-3" --target-dependents --yes
128+
129+
# 2. Recreate
130+
pulumi up --yes
131+
132+
# 3. Re-import dependent resources that got removed (runner-manager function, IAM, scheduler)
133+
# Check `pulumi preview` for what needs importing
134+
```
135+
136+
### GPU runner startup fails (NVIDIA driver issues)
137+
138+
The startup script handles kernel header mismatches by upgrading the kernel and rebooting once. If the runner still fails:
139+
140+
```bash
141+
# SSH in and check
142+
gcloud compute ssh github-gpu-runner-1 --zone=us-central1-a --project=nomadkaraoke \
143+
--command="nvidia-smi; dkms status; uname -r"
144+
```
145+
146+
See `karaoke-gen` memory file `project_gpu_runner_drivers.md` for known issues.
147+
148+
### Webhook not firing
149+
150+
Check the org-level webhook configuration:
151+
```bash
152+
gh api orgs/nomadkaraoke/hooks \
153+
--jq '.[] | select(.events[] == "workflow_job") | {id, active, config: {url: .config.url}}'
154+
```
155+
156+
The webhook URL should point to: `https://us-central1-nomadkaraoke.cloudfunctions.net/github-runner-manager`
157+
158+
## Cost
159+
160+
| Scenario | Cost |
161+
|----------|------|
162+
| Per GPU runner hour | ~$0.54/hr (n1-standard-4 + T4) |
163+
| 3 runners × 15 min CI run | ~$0.41 |
164+
| Idle (scale to zero) | $0 |
165+
| Typical daily cost (5 PRs) | ~$2 |

pyproject.toml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -4,7 +4,7 @@ build-backend = "poetry.core.masonry.api"
44

55
[tool.poetry]
66
name = "audio-separator"
7-
version = "0.43.0"
7+
version = "0.43.1"
88
description = "Easy to use audio stem separation, using various models from UVR trained primarily by @Anjok07"
99
authors = ["Andrew Beveridge <andrew@beveridge.uk>"]
1010
license = "MIT"

tests/reproduce_ensemble_bug.py

Lines changed: 113 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,113 @@
1+
"""
2+
Reproduce the ensemble + custom_output_names bug against the live API.
3+
4+
This script simulates exactly what karaoke-gen's audio_processor does:
5+
1. Call the API with preset=instrumental_clean and custom_output_names
6+
2. Download the results
7+
3. Check if the expected filenames exist
8+
9+
Expected behavior (fixed): files named job123_mixed_vocals.flac and job123_mixed_instrumental.flac
10+
Bug behavior (current prod): files named with original filename + _(Unknown)_ or _(Other)_
11+
12+
Usage:
13+
python tests/reproduce_ensemble_bug.py [--api-url URL]
14+
"""
15+
import json
16+
import os
17+
import sys
18+
import tempfile
19+
20+
# Add the repo to path so we can import the API client
21+
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
22+
23+
from audio_separator.remote.api_client import AudioSeparatorAPIClient
24+
25+
26+
def main():
27+
api_url = os.environ.get("AUDIO_SEPARATOR_API_URL")
28+
if not api_url:
29+
print("ERROR: Set AUDIO_SEPARATOR_API_URL environment variable")
30+
sys.exit(1)
31+
32+
test_audio = os.path.join(os.path.dirname(os.path.abspath(__file__)), "inputs", "under_pressure_harmonies.flac")
33+
if not os.path.exists(test_audio):
34+
print(f"ERROR: Test audio file not found: {test_audio}")
35+
sys.exit(1)
36+
37+
with tempfile.TemporaryDirectory(prefix="ensemble_bug_test_") as output_dir:
38+
print(f"API URL: {api_url}")
39+
print(f"Output dir: {output_dir}")
40+
print()
41+
42+
import logging
43+
logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(name)s: %(message)s")
44+
logger = logging.getLogger("test")
45+
46+
client = AudioSeparatorAPIClient(api_url, logger)
47+
48+
# This is exactly what karaoke-gen does in _process_audio_separation_remote
49+
file_prefix = "job123" # Simulates job_id-based prefix
50+
custom_output_names = {
51+
"Vocals": f"{file_prefix}_mixed_vocals",
52+
"Instrumental": f"{file_prefix}_mixed_instrumental",
53+
}
54+
55+
print("=" * 60)
56+
print("TEST: Preset + custom_output_names (reproduces karaoke-gen bug)")
57+
print(f" preset: instrumental_clean")
58+
print(f" custom_output_names: {custom_output_names}")
59+
print("=" * 60)
60+
print()
61+
62+
result = client.separate_audio_and_wait(
63+
test_audio,
64+
preset="instrumental_clean",
65+
timeout=600,
66+
poll_interval=10,
67+
download=True,
68+
output_dir=output_dir,
69+
output_format="flac",
70+
custom_output_names=custom_output_names,
71+
)
72+
73+
print()
74+
print("=" * 60)
75+
print("RESULTS")
76+
print("=" * 60)
77+
print(f"Status: {result.get('status')}")
78+
print(f"Downloaded files: {result.get('downloaded_files', [])}")
79+
print()
80+
81+
# List what's actually in the output dir
82+
actual_files = os.listdir(output_dir)
83+
print(f"Files in output dir: {actual_files}")
84+
print()
85+
86+
# Check for expected files
87+
fmt = "flac"
88+
expected_vocals = f"{file_prefix}_mixed_vocals.{fmt}"
89+
expected_instrumental = f"{file_prefix}_mixed_instrumental.{fmt}"
90+
91+
vocals_exists = os.path.exists(os.path.join(output_dir, expected_vocals))
92+
instrumental_exists = os.path.exists(os.path.join(output_dir, expected_instrumental))
93+
94+
print("EXPECTED FILE CHECK:")
95+
print(f" {expected_vocals}: {'FOUND' if vocals_exists else 'MISSING'}")
96+
print(f" {expected_instrumental}: {'FOUND' if instrumental_exists else 'MISSING'}")
97+
print()
98+
99+
if vocals_exists and instrumental_exists:
100+
print("RESULT: PASS - custom_output_names working correctly")
101+
return 0
102+
else:
103+
print("RESULT: FAIL - custom_output_names NOT applied (bug reproduced)")
104+
print()
105+
print("Actual files downloaded:")
106+
for f in actual_files:
107+
size = os.path.getsize(os.path.join(output_dir, f))
108+
print(f" {f} ({size / 1024:.1f} KB)")
109+
return 1
110+
111+
112+
if __name__ == "__main__":
113+
sys.exit(main())

tests/unit/test_stem_naming.py

Lines changed: 66 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -161,3 +161,69 @@ def test_custom_ensemble_slug_generation(self):
161161
assert "Inst_HQ_5" in filename
162162
assert "karaoke_aufr" in filename
163163
assert filename.startswith("mardy20s_(Vocals)_custom_ensemble_")
164+
165+
166+
class TestEnsembleCustomOutputNames:
167+
"""Test that custom_output_names works correctly with ensemble separation."""
168+
169+
def test_custom_output_names_not_passed_to_intermediate_separation(self):
170+
"""Intermediate per-model separations must NOT receive custom_output_names.
171+
172+
custom_output_names replaces the default '_(StemType)_model' naming, which
173+
removes the _(StemType)_ markers needed by _separate_ensemble to classify
174+
stems. custom_output_names should only be applied to the final ensembled output.
175+
"""
176+
import re
177+
from unittest.mock import patch, MagicMock, call
178+
from audio_separator.separator.separator import Separator
179+
180+
sep = Separator(
181+
log_level=logging.WARNING,
182+
model_file_dir="/tmp/models",
183+
output_dir="/tmp/output",
184+
output_format="flac",
185+
)
186+
sep.model_filenames = ["model_a.ckpt", "model_b.ckpt"]
187+
sep.model_filename = ["model_a.ckpt", "model_b.ckpt"]
188+
sep.ensemble_algorithm = "uvr_max_spec"
189+
sep.ensemble_weights = None
190+
sep.ensemble_preset = "test_preset"
191+
sep.sample_rate = 44100
192+
193+
custom_names = {"Vocals": "job123_mixed_vocals", "Instrumental": "job123_mixed_instrumental"}
194+
195+
with patch.object(sep, '_separate_file') as mock_separate, \
196+
patch.object(sep, 'load_model'), \
197+
patch('audio_separator.separator.separator.Ensembler') as MockEnsembler, \
198+
patch('audio_separator.separator.separator.librosa') as mock_librosa, \
199+
patch('audio_separator.separator.separator.np') as mock_np:
200+
201+
# Mock _separate_file to return files with proper _(StemType)_ naming
202+
mock_separate.side_effect = [
203+
["/tmp/ensemble/song_(Vocals)_model_a.flac", "/tmp/ensemble/song_(Instrumental)_model_a.flac"],
204+
["/tmp/ensemble/song_(Vocals)_model_b.flac", "/tmp/ensemble/song_(Instrumental)_model_b.flac"],
205+
]
206+
207+
# Mock librosa and numpy for ensembling
208+
mock_wav = MagicMock()
209+
mock_wav.ndim = 2
210+
mock_wav.shape = (2, 44100)
211+
mock_librosa.load.return_value = (mock_wav, 44100)
212+
mock_np.asfortranarray.return_value = mock_wav
213+
214+
mock_ensembler = MagicMock()
215+
mock_ensembler.ensemble.return_value = mock_wav
216+
MockEnsembler.return_value = mock_ensembler
217+
218+
# Mock model_instance for write_audio
219+
sep.model_instance = MagicMock()
220+
sep.model_instance.output_dir = "/tmp/output"
221+
222+
sep._separate_ensemble("/tmp/song.flac", custom_output_names=custom_names)
223+
224+
# Key assertion: _separate_file must be called with None, not custom_names
225+
for call_args in mock_separate.call_args_list:
226+
assert call_args[0][1] is None, (
227+
f"_separate_file was called with custom_output_names={call_args[0][1]!r} "
228+
f"but should be None for intermediate ensemble files"
229+
)

0 commit comments

Comments
 (0)