Commit adc5539
feat: multi-model ensemble separation with 9 community-curated presets (#265)
* add ensembler
* refactor(ensembler): fix state mutation, handle mono input, and add fallback writer
* try to fix test
* review comments
* review comments
* fix test
* fix: resolve ensemble PR review issues — CLI compat, state bugs, test coverage
- Revert -m to single value, add --extra_models for ensemble (fixes CLI breaking change)
- Initialize model_filename/model_filenames in __init__ (prevents AttributeError)
- Fix list reference copy in load_model (use list() instead of shared reference)
- Move original_output_dir capture outside per-model loop (state mutation fix)
- Extract stem name map to module-level STEM_NAME_MAP constant
- Preserve mono channel count through ensemble (avoid fake stereo)
- Add trailing newlines to all files
- Add 8 new unit tests: median/min/max_fft, uvr_max/min_spec, invalid algo, weight mismatch
- Add 3 CLI tests: --extra_models, single model string compat, old syntax backward compat
- Update README ensemble examples for new --extra_models flag
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
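The weighted `avg_wave` combination, the weight-mismatch check, and the mono-preservation fix listed above can be illustrated with a minimal NumPy sketch. It assumes each model's output is already loaded as an array shaped `(channels, samples)` and that all models share the same channel count. The function only borrows the algorithm's name and is not the library's actual implementation. Because the channel axis is never broadcast or duplicated, a mono `(1, n)` input stays mono instead of becoming fake stereo.

```python
import numpy as np


def avg_wave(waves, weights=None):
    """Weighted average of per-model waveforms, each shaped (channels, samples)."""
    if weights is None:
        weights = [1.0] * len(waves)
    if len(weights) != len(waves):
        # The "weight mismatch" case: exactly one weight per model is required.
        raise ValueError(f"expected {len(waves)} weights, got {len(weights)}")
    # Trim every output to the shortest one so samples line up.
    n = min(x.shape[-1] for x in waves)
    stacked = np.stack([x[..., :n] for x in waves])  # (models, channels, samples)
    # Reshape weights to (models, 1, 1) so they broadcast over channels and samples.
    w = np.asarray(weights, dtype=np.float64).reshape(-1, *([1] * (stacked.ndim - 1)))
    # The channel axis is untouched, so mono (1, n) input stays mono.
    return (stacked * w).sum(axis=0) / w.sum()
```

Trimming to the shortest output is one simple way to align models whose outputs differ by a few samples. The real ensembler may pad or align differently.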
* feat: add ensemble preset system with 9 community-curated presets
Add a JSON-based ensemble preset system that lets users select known-good
model combinations by name instead of specifying every detail manually.
Presets are sourced from deton24's community-maintained audio separation guide
and cover instrumental (4), vocal (4), and karaoke (1) use cases.
New features:
- ensemble_presets.json with 9 presets (instrumental_clean/full/balanced/low_resource,
vocal_balanced/clean/full/rvc, karaoke)
- --ensemble_preset CLI flag and Separator(ensemble_preset=...) Python API
- --list_presets CLI flag to show available presets
- Preset algorithm/weights can be overridden by explicit user args
- ensemble_algorithm parameter now accepts None (defaults to avg_wave)
- 10 new unit tests for preset loading, validation, override, JSON validity
- 2 new CLI tests for --ensemble_preset and --list_presets
- README updated with preset documentation and usage examples
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
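The preset lookup and override rules described above can be sketched as follows. The JSON layout used here is an assumption for illustration and not the actual schema of `ensemble_presets.json`: a top-level object keyed by preset name, where each entry has `models`, an optional `algorithm`, and optional `weights`. `resolve_preset` is likewise a hypothetical helper. The sketch shows explicit arguments taking priority over preset values, with `avg_wave` as the fallback when no algorithm is given.

```python
import json

# Only the algorithms named in this commit message; the real set may be larger.
KNOWN_ALGORITHMS = {"avg_wave", "median_fft", "min_fft", "max_fft", "uvr_max_spec", "uvr_min_spec"}


def resolve_preset(presets_json, name, algorithm=None, weights=None):
    """Return (models, algorithm, weights); explicit arguments override the preset."""
    presets = json.loads(presets_json)
    if name not in presets:
        raise ValueError(f"unknown preset {name!r}; available: {sorted(presets)}")
    preset = presets[name]
    models = list(preset["models"])  # copy, so callers cannot mutate loaded data
    if len(models) < 2:
        raise ValueError("an ensemble needs at least two models")
    # Explicit argument first, then the preset value, then the avg_wave default.
    algorithm = algorithm or preset.get("algorithm") or "avg_wave"
    if algorithm not in KNOWN_ALGORITHMS:
        raise ValueError(f"unknown ensemble algorithm {algorithm!r}")
    if weights is None:
        weights = preset.get("weights")
    if weights is not None and len(weights) != len(models):
        raise ValueError("weights must have exactly one entry per model")
    return models, algorithm, weights
```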
* fix: correct stem labeling for ensemble — swap mismatched target_instrument, map "other" to "Instrumental"
Three fixes for stem name handling in ensemble mode:
1. common_separator.py: When a model's target_instrument doesn't match
instruments[0], swap primary/secondary stem names so the model's
prediction gets the correct label. Fixes bs_roformer_instrumental_resurrection_unwa,
whose "vocals" output was actually instrumental.
2. separator.py: In _separate_ensemble, when a model produces exactly
2 stems and one is vocal-like, map "other" to "Instrumental" instead
of keeping it as a separate group. This ensures all 2-stem models
contribute to the same Vocals/Instrumental ensemble regardless of
whether they label their non-vocal stem "Instrumental" or "other".
3. separator.py: Use preset name in ensemble output filenames
(preset_<name>) and descriptive slugs for manual ensembles
(custom_ensemble_<slug1>_<slug2>).
Also adds tests/utils_audio_verification.py — a content verification
utility that correlates output stems against known references to detect
label mismatches programmatically.
Verified: all 9 presets now produce exactly 2 correctly-labeled stems
(18/18 OK, 0 mismatches).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
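The two labeling rules in fixes 1 and 2 above can be shown in a minimal sketch, assuming stem names are plain strings. `assign_stem_names`, `normalize_ensemble_stems`, and the `VOCAL_LIKE` set are hypothetical names. The real code in `common_separator.py` and `_separate_ensemble` may detect mismatches and vocal-like stems differently.

```python
# Hypothetical set of stem names treated as "vocal-like".
VOCAL_LIKE = {"vocals", "vocal", "lead vocals", "backing vocals"}


def assign_stem_names(instruments, target_instrument):
    """Return (primary, secondary) so the model's prediction carries its target label."""
    primary, secondary = instruments[0], instruments[1]
    if (
        target_instrument
        and target_instrument.lower() != primary.lower()
        and target_instrument.lower() == secondary.lower()
    ):
        # The model actually predicts instruments[1], so swap the labels.
        primary, secondary = secondary, primary
    return primary, secondary


def normalize_ensemble_stems(stem_names):
    """Rename "other" to "Instrumental" for 2-stem models that also output a vocal-like stem."""
    lowered = [s.lower() for s in stem_names]
    if len(stem_names) == 2 and any(s in VOCAL_LIKE for s in lowered):
        return ["Instrumental" if low == "other" else orig for low, orig in zip(lowered, stem_names)]
    # Multi-stem models (e.g. drums/bass/other/vocals) keep "other" as its own group.
    return list(stem_names)
```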
* test: add e2e integration tests for all 9 ensemble presets with reference spectrograms
- 36 reference spectrogram/waveform PNGs for 9 presets × 2 stems each
- test_ensemble_integration.py: parametrized test that for each preset:
  1. Runs the preset separation on mardy20s.flac
  2. Verifies stems contain correct content (correlation-based)
  3. Compares spectrograms against committed references (SSIM)
- generate_reference_images_ensemble.py: script to regenerate references
- utils_audio_verification.py: content verification utility (already committed)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: add P0/P1 tests for stem swap, preset validation, and filename logic
- 5 tests for CommonSeparator stem name swap (target_instrument mismatch,
no swap when matching, edge cases)
- 2 tests for STEM_NAME_MAP completeness and lowercase invariant
- 2 tests for ensemble output filename format (preset and custom slugs)
- 5 tests for preset validation edge cases (bad weights length, bad
algorithm, single model, weights applied, weights override)
Total: 233 unit tests passing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
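The two ensemble filename formats tested above (`preset_<name>` and `custom_ensemble_<slug1>_<slug2>`) could be produced along these lines. The slug rule used here is an assumption, since the commit does not say how slugs are derived: drop the extension, collapse runs of non-alphanumeric characters to `_`, and lowercase. `model_slug` and `ensemble_label` are hypothetical helpers.

```python
import re
from pathlib import Path


def model_slug(model_filename):
    """Hypothetical slug rule: drop the extension, collapse non-alphanumerics to '_', lowercase."""
    stem = Path(model_filename).stem
    return re.sub(r"[^A-Za-z0-9]+", "_", stem).strip("_").lower()


def ensemble_label(model_filenames, preset_name=None):
    """Label embedded in ensemble output filenames."""
    if preset_name:
        return f"preset_{preset_name}"
    return "custom_ensemble_" + "_".join(model_slug(m) for m in model_filenames)
```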
* docs: add ensemble_preset to Python API parameter reference in README
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* revert: restore original arg order in integration tests
The nargs="+" change on -m was reverted in favor of --extra_models,
so the old CLI arg order (audio-separator -m model audio.wav) works
again. No need to change these tests.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
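Plain `argparse` shows why `nargs="+"` on `-m` broke the old syntax: a greedy `-m` would swallow `audio.wav` as a second model. The sketch below keeps `-m` single-valued and takes additional models through `--extra_models`. Only the flag names come from this commit. The positional `audio_files`, the `dest` names, and `models_to_load` are assumptions, and the real CLI defines many more options.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="audio-separator")
    # Hypothetical positional name; the real CLI may differ.
    parser.add_argument("audio_files", nargs="*")
    # Single value: "-m model audio.wav" leaves audio.wav for the positional.
    parser.add_argument("-m", dest="model_filename", default=None)
    # Additional ensemble members; nargs="+" is greedy, so pass it after the audio path.
    parser.add_argument("--extra_models", nargs="+", default=None)
    parser.add_argument("--ensemble_preset", default=None)
    parser.add_argument("--list_presets", action="store_true")
    return parser


def models_to_load(args):
    """Primary model first, then any --extra_models (hypothetical helper)."""
    models = [args.model_filename] if args.model_filename else []
    models.extend(args.extra_models or [])
    return models
```

`--extra_models` is still greedy. It therefore needs to come after the audio path, or be followed by another flag, so that it does not consume the input file.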
* test: add on-demand regression test to verify stem labels for all 163 models
Runs every supported model on mardy20s.flac and verifies each output stem's
label matches its actual content using correlation against known vocal and
instrumental references.
Usage:
  pytest tests/regression/test_all_models_stem_verification.py -v -s
  pytest ... -k "VR" (single architecture)
  pytest ... -k "resurrection" (single model)
  STEM_VERIFY_REPORT_ONLY=1 pytest ... (report without failing)
Handles:
- Vocal/Instrumental stems: verified via Pearson correlation (>0.7 threshold)
- Sub-stems (drums, bass, guitar, piano): verified not-full-mix; near-silence OK
- Full mix detection: any stem with >0.95 correlation to original mix fails
- Demucs 6-stem models: sub-stems like Piano can be legitimately silent
Not run in CI — requires downloading all models.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
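The core vocal/instrumental check above can be approximated with Pearson correlation via `np.corrcoef`, using the quoted thresholds. A stem must exceed 0.7 correlation with the matching reference, and anything above 0.95 correlation with the original mix counts as a full-mix failure. This simplified sketch assumes the stem, mix, and references share the same sample rate and channel layout. `pearson` and `check_vocal_instrumental` are hypothetical names, and the sub-stem and utility-model rules are omitted.

```python
import numpy as np


def pearson(a, b):
    """Pearson correlation of two signals, flattened and trimmed to equal length."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    n = min(a.size, b.size)
    a, b = a[:n], b[:n]
    if a.std() == 0 or b.std() == 0:
        return float("nan")  # silent/constant signal: correlation is undefined
    return float(np.corrcoef(a, b)[0, 1])


def check_vocal_instrumental(stem, label, mix, vocal_ref, inst_ref, match=0.7, full_mix=0.95):
    """Return (ok, reason) for a stem labelled "Vocals" or "Instrumental"."""
    if pearson(stem, mix) > full_mix:
        return False, "stem is essentially the full mix"
    ref = vocal_ref if label.lower() == "vocals" else inst_ref
    r = pearson(stem, ref)
    # nan > match is False, so a silent stem fails here instead of raising.
    return r > match, f"correlation with {label} reference: {r:.3f}"
```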
* fix: handle utility models and sub-stems in stem verification test
- Utility models (de-echo, de-noise, de-reverb, BVE) get relaxed
verification — their stems don't follow standard vocal/instrumental
patterns on clean source audio
- Sub-stems (drums, bass, guitar, "No X" variants) skip the full-mix
check since "No X" is legitimately ≈ the mix when X isn't present
- Partial vocal stems (backing/lead vocals) skip full-vocal correlation
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: add missing stem types to verification test (drumsep, gender, aspiration, etc.)
Full 163-model run revealed stem types not yet in SUB_STEMS or UTILITY_STEMS:
- Drumsep: kick, snare, toms, hh, ride, crash
- Gender split: male, female
- Specialized: aspiration, bleed, no bleed
- Utility: noreverb
160 passed, 0 real failures, 3 skipped (download failures).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: add 4 multi-instrument test clips for stem verification
New test input audio clips with diverse instrumentation for testing
instrument-specific separation models:
- levee_drums.flac (20s, 24-bit) — Led Zeppelin, drums+guitar+vocals
- clocks_piano.flac (20s, 16-bit) — Coldplay, piano+instruments+vocals
- sing_sing_sing_brass.flac (25s, 16-bit) — Benny Goodman, drums+brass+wind
- only_time_reverb.flac (25s, 16-bit) — Enya, reverb-heavy vocal+synths
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: add multi-stem integration test framework with 30 reference stems
New integration test suite verifying instrument-specific separation models
across 4 test clips with diverse instrumentation:
Test matrix:
- Vocal/Instrumental: resurrection model on all 4 clips
- 4-stem (drums/bass/other/vocals): htdemucs_ft on levee + clocks
- DrumSep pipeline: mix → htdemucs_ft drums → drumsep kit parts
- Karaoke: aufr33/viperx model on levee + clocks
- Wind/Brass: 17_HP-Wind on sing_sing_sing
- De-reverb pipeline: mix → resurrection vocals → dereverb
30 reference stems generated by best-in-class models, committed as
tests/inputs/reference/ref_*.flac. Tests verify new model outputs
correlate > 0.70 with references.
Includes generate_multi_stem_references.py for regenerating references.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refine: karaoke test verifies output differs from standard vocal split
Karaoke models remove lead vocals while preserving backing vocals.
The test now additionally checks that karaoke vocal output differs
from standard vocal output (correlating < 0.95), confirming the model
is doing karaoke-specific extraction, not just a generic split.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: add Under Pressure clip for karaoke backing vocal verification
Queen & David Bowie — Under Pressure 1:35-1:55 (20s, 16-bit).
Section has clear lead vocal over dense backing harmonies, making
karaoke vs standard vocal separation measurably different (0.740
correlation vs 0.961 for Clocks which lacks strong backing vocals).
Karaoke test now runs on 3 clips: levee, clocks, under_pressure.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: bump version to 0.42.0 for ensemble feature release
New minor version for:
- Multi-model ensemble separation
- 9 community-curated ensemble presets
- Stem label fixes (target_instrument swap, contextual "other" mapping)
- New CLI flags: --extra_models, --ensemble_preset, --list_presets
- Multi-stem integration test framework
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test: add meaningful ensemble integration tests
Three tests verifying ensemble presets produce semantically correct output:
1. test_vocal_ensemble_matches_best_single_model: vocal_balanced ensemble
output should correlate >0.90 with the best single model (Resurrection),
confirming ensemble doesn't degrade quality.
2. test_karaoke_ensemble_extracts_lead_only: On Under Pressure (prominent
backing harmonies), karaoke ensemble vocals should differ from standard
vocal extraction (<0.90 correlation), confirming it extracts only the lead vocal.
3. test_karaoke_on_vocals_produces_lead_backing_split: Pipeline test —
mix → vocal model → karaoke model should produce distinct lead and
backing vocal stems (both non-silent, correlation <0.50).
Includes 9 new reference stems for these tests.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: fix find_stem pipeline matching and nan correlation in integration tests
find_stem() matched the first _(StemName) group in filenames, which broke
pipeline tests where the input filename already contained a parenthesized
stem from a prior step. Now uses the last match. Also handle near-silent
stems (e.g. vocals from instrumental-only audio) returning nan correlation.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
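The last-match rule from this fix can be sketched as follows, assuming output names embed stems as `_(StemName)` groups. This `find_stem` is a stand-in with a hypothetical signature, not the test suite's actual helper. Taking the last group means a pipeline output such as `song_(Vocals)_(Lead Vocals).flac` is identified by its newest stem rather than by the one inherited from the input filename.

```python
import re

# Matches "_(StemName)" groups such as "_(Vocals)" or "_(Lead Vocals)".
STEM_GROUP = re.compile(r"_\(([^()]+)\)")


def find_stem(filenames, stem_name):
    """Return the first file whose LAST _(Stem) group equals stem_name (case-insensitive)."""
    for name in filenames:
        groups = STEM_GROUP.findall(name)
        # Earlier groups come from the pipeline's input file; the last one is this step's stem.
        if groups and groups[-1].lower() == stem_name.lower():
            return name
    return None
```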
---------
Co-authored-by: makhlwf <altrhwnyashrf1@gmail.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

1 parent 4efec66, commit adc5539
103 files changed
Lines changed: 30440 additions & 14 deletions
File tree
- audio_separator
  - separator
  - utils
- docs
- tests
  - inputs
    - reference
  - integration
  - regression
  - unit