Commit 194f67a

beveradb and claude authored
Modal → GCP audio separation migration (#273)
* Add Modal → GCP audio separation migration plan

  Plan for deploying audio-separator as a Cloud Run GPU service, replacing the current Modal deployment. Includes ensemble preset support and Dockerfile for Cloud Run.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add Cloud Run GPU deployment and preset support

  - deploy_cloudrun.py: FastAPI server adapted from deploy_modal.py for Cloud Run with L4 GPU. Same API contract, in-memory job tracking, GCS model download on startup, ensemble preset support.
  - Dockerfile.cloudrun: CUDA 12.6 runtime, Python 3.11, FFmpeg, audio-separator[gpu]
  - api_client.py: Add `preset` parameter to separate_audio() and separate_audio_and_wait() for ensemble preset-based separation
  - deploy-to-cloudrun.yml: CI workflow to build and push to Artifact Registry

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Bake models into Docker image, switch to us-east4

  - Dockerfile: Download ensemble preset models (instrumental_clean + karaoke) during build. Eliminates cold-start model download.
  - Remove google-cloud-storage dependency (no GCS model download needed)
  - CI workflow: Switch to us-east4 (L4 GPU quota approved there)
  - Reduce startup probe period (models pre-loaded, startup is fast)

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix Dockerfile multi-line RUN syntax for model download

  Use COPY heredoc + separate RUN instead of inline python -c which Docker can't parse with internal quotes/newlines.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix Dockerfile for amd64/GPU, add Cloud Build config

  - Use Python 3.12 from deadsnakes PPA (onnxruntime-gpu needs >= 3.11)
  - Use apt ffmpeg instead of downloading static build
  - Add cloudbuild.yaml for building with baked models on GCP (E2_HIGHCPU_32 machine has enough RAM for model loading)
  - Update GHA workflow to use Cloud Build instead of local Docker

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix Dockerfile heredoc syntax for Cloud Build compatibility

  Cloud Build's Docker doesn't support BuildKit heredocs (COPY <<). Use a separate script file instead.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add --preset flag to remote CLI

  Adds -p/--preset as a mutually exclusive option alongside -m/--model and --models. Passes preset parameter through to API client.

  Usage: audio-separator-remote separate song.mp3 -p instrumental_clean

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix: strip stem markers from input filenames in Cloud Run server

  When chaining separations (Stage 1 output → Stage 2 input), the input filename contains stem markers like _(Vocals)_ that confuse the Separator's stem grouping regex, causing it to only output 1 stem instead of 2. Strip these markers before processing.

  Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Trigger CI: GPU runner drivers fixed
* Trigger CI: GPU runners fixed (kernel headers)
* Trigger fresh CI run (re-runs inherit stale timeout)
* Trigger CI: GPU runners properly fixed (kernel headers + poetry perms)
* Validate GPU runner fixes (pip bootstrapped)
* Validate GPU runners: Python toolcache rebuilt with pip
* CI retry: idle check paused, runners stable

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
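The stem-marker fix described in the commit message could look roughly like the sketch below. It is an illustration only: the function name `strip_stem_markers` and the regular expression are assumptions, not the code in deploy_cloudrun.py, and the marker shape is inferred from the single example `_(Vocals)_` given above.

```python
import re
from pathlib import Path

# Assumed marker shape: an underscore, a parenthesised stem name, and an
# optional trailing underscore, e.g. "_(Vocals)_" or "_(Instrumental)".
STEM_MARKER = re.compile(r"_\([^()]+\)_?")


def strip_stem_markers(filename: str) -> str:
    """Return the filename with stem markers removed from its base name.

    Hypothetical helper; the real server may normalise names differently.
    """
    path = Path(filename)
    # Replace each marker with a single underscore, then trim stray
    # underscores left at either end of the base name.
    cleaned = STEM_MARKER.sub("_", path.stem).strip("_")
    # Fall back to the original base name if stripping left nothing behind.
    return f"{cleaned or path.stem}{path.suffix}"
```

Under these assumptions a Stage 1 output such as `song_(Vocals)_model.flac` would be submitted to Stage 2 as `song_model.flac`, so the Separator sees no stale marker in the input name.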
1 parent 3f58f95 commit 194f67a

8 files changed

Lines changed: 1186 additions & 5 deletions

deploy-to-cloudrun.yml

Lines changed: 60 additions & 0 deletions
@@ -0,0 +1,60 @@
+name: Deploy to Cloud Run
+
+on:
+  # Deploy when a new PyPI release is published
+  workflow_run:
+    workflows: ["Publish to PyPI"]
+    types: [completed]
+
+  # Deploy on changes to Dockerfile or Cloud Run server
+  push:
+    branches: [main]
+    paths:
+      - "Dockerfile.cloudrun"
+      - "audio_separator/remote/deploy_cloudrun.py"
+      - "audio_separator/ensemble_presets.json"
+      - "cloudbuild.yaml"
+
+  # Manual deployment
+  workflow_dispatch:
+
+jobs:
+  deploy:
+    runs-on: ubuntu-latest
+    # Only run on successful PyPI publish (or push/manual triggers)
+    if: ${{ github.event_name != 'workflow_run' || github.event.workflow_run.conclusion == 'success' }}
+
+    permissions:
+      contents: read
+      id-token: write  # Required for Workload Identity Federation
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Authenticate to Google Cloud
+        uses: google-github-actions/auth@v2
+        with:
+          workload_identity_provider: ${{ secrets.GCP_WORKLOAD_IDENTITY_PROVIDER }}
+          service_account: ${{ secrets.GCP_SERVICE_ACCOUNT }}
+
+      - name: Set up Cloud SDK
+        uses: google-github-actions/setup-gcloud@v2
+
+      # Use Cloud Build for the Docker build — it has native x86 with enough
+      # RAM to load ML models during the build (baking models into the image).
+      - name: Build and push via Cloud Build
+        run: |
+          gcloud builds submit \
+            --config cloudbuild.yaml \
+            --region=us-east4 \
+            --project=nomadkaraoke \
+            --substitutions=SHORT_SHA=${GITHUB_SHA::8}
+
+      - name: Deploy to Cloud Run
+        run: |
+          gcloud run services update audio-separator \
+            --image="us-east4-docker.pkg.dev/nomadkaraoke/audio-separator/api:${GITHUB_SHA::8}" \
+            --region=us-east4 \
+            --project=nomadkaraoke \
+            --quiet
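
Two pieces of logic in the workflow above are easy to misread: the job-level `if:` gate and the `${GITHUB_SHA::8}` expansion that names the image. The Python sketch below restates both so they can be checked in isolation. The function names are hypothetical and nothing here runs in CI; only the expression, the 8-character truncation, and the image path are taken from the workflow.

```python
from typing import Optional


def should_deploy(event_name: str, workflow_run_conclusion: Optional[str] = None) -> bool:
    """Mirror the job's `if:` expression.

    Push and manual triggers always deploy; a `workflow_run` trigger deploys
    only when the upstream "Publish to PyPI" run concluded with success.
    """
    return event_name != "workflow_run" or workflow_run_conclusion == "success"


def image_tag(github_sha: str) -> str:
    """Equivalent of the shell expansion ${GITHUB_SHA::8}: the first 8 characters."""
    return github_sha[:8]


def image_uri(github_sha: str) -> str:
    """Image reference passed to `gcloud run services update --image`."""
    return f"us-east4-docker.pkg.dev/nomadkaraoke/audio-separator/api:{image_tag(github_sha)}"
```

Because the build step and the deploy step derive the tag from the same expansion, the deploy step refers to the image that the build step just pushed, provided cloudbuild.yaml tags the image with `SHORT_SHA`.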

Dockerfile.cloudrun

Lines changed: 94 additions & 0 deletions
@@ -0,0 +1,94 @@
+# Audio Separator API - Cloud Run GPU Deployment
+# Optimized for NVIDIA L4 GPU on Google Cloud Run
+#
+# Models are baked into the image for zero cold-start latency.
+# To update models, rebuild the image.
+#
+# Build: docker build -f Dockerfile.cloudrun -t audio-separator-cloudrun .
+# Run: docker run --gpus all -p 8080:8080 audio-separator-cloudrun
+
+FROM nvidia/cuda:12.6.3-runtime-ubuntu22.04
+
+# Prevent interactive prompts during package installation
+ENV DEBIAN_FRONTEND=noninteractive
+
+# Install Python 3.12 from deadsnakes PPA (onnxruntime-gpu requires >= 3.11)
+# and system dependencies
+RUN apt-get update && apt-get install -y --no-install-recommends \
+    software-properties-common \
+    && add-apt-repository -y ppa:deadsnakes/ppa \
+    && apt-get update && apt-get install -y --no-install-recommends \
+    # Python 3.12
+    python3.12 \
+    python3.12-dev \
+    python3.12-venv \
+    # FFmpeg
+    ffmpeg \
+    # Audio libraries
+    libsndfile1 \
+    libsndfile1-dev \
+    libsox-dev \
+    sox \
+    libportaudio2 \
+    portaudio19-dev \
+    libasound2-dev \
+    libpulse-dev \
+    libjack-dev \
+    libsamplerate0 \
+    libsamplerate0-dev \
+    # Build tools (for compiling Python packages with C extensions)
+    build-essential \
+    gcc \
+    g++ \
+    pkg-config \
+    # Utilities
+    curl \
+    && rm -rf /var/lib/apt/lists/* \
+    && python3.12 --version && ffmpeg -version
+
+# Set Python 3.12 as default and install pip
+RUN update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.12 1 \
+    && update-alternatives --install /usr/bin/python python /usr/bin/python3.12 1 \
+    && curl -sS https://bootstrap.pypa.io/get-pip.py | python3.12 \
+    && python3 -m pip install --no-cache-dir --upgrade pip setuptools wheel
+
+# Install audio-separator with GPU support and API dependencies
+COPY . /tmp/audio-separator-src
+RUN cd /tmp/audio-separator-src \
+    && pip install --no-cache-dir ".[gpu]" \
+    && pip install --no-cache-dir \
+    "fastapi>=0.104.0" \
+    "uvicorn[standard]>=0.24.0" \
+    "python-multipart>=0.0.6" \
+    "filetype>=1.2.0" \
+    && rm -rf /tmp/audio-separator-src
+
+# Set up CUDA library paths
+RUN echo '/usr/local/cuda/lib64' >> /etc/ld.so.conf.d/cuda.conf && ldconfig
+
+# Environment configuration
+ENV MODEL_DIR=/models \
+    STORAGE_DIR=/tmp/storage \
+    PORT=8080 \
+    LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH \
+    PATH=/usr/local/cuda/bin:$PATH \
+    PYTHONUNBUFFERED=1
+
+# Create directories
+RUN mkdir -p /models /tmp/storage/outputs
+
+# Bake ensemble preset models into the image.
+# These are the models used by the default presets (instrumental_clean + karaoke).
+# Total: ~1-1.5 GB. This eliminates cold-start model download time.
+COPY scripts/download_preset_models.py /tmp/download_preset_models.py
+RUN python3 /tmp/download_preset_models.py && rm /tmp/download_preset_models.py && ls -lh /models/
+
+# Expose Cloud Run default port
+EXPOSE 8080
+
+# Health check for container orchestration
+HEALTHCHECK --interval=30s --timeout=10s --start-period=30s --retries=3 \
+    CMD curl -f http://localhost:8080/health || exit 1
+
+# Run the API server
+CMD ["python3", "-m", "audio_separator.remote.deploy_cloudrun"]
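
The `ENV` block in the Dockerfile sets `MODEL_DIR`, `STORAGE_DIR`, and `PORT`, which the server presumably reads at startup. The sketch below shows one way such settings could be loaded, with defaults matching the Dockerfile. `ServerConfig` and `load_config` are hypothetical names and are not taken from deploy_cloudrun.py, whose contents are not shown in this commit view.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ServerConfig:
    """Runtime settings; the defaults mirror the Dockerfile's ENV block."""

    model_dir: str
    storage_dir: str
    port: int


def load_config(env: Optional[Mapping[str, str]] = None) -> ServerConfig:
    """Read settings from the environment (hypothetical helper).

    Passing a mapping instead of relying on os.environ keeps the function
    easy to test.
    """
    env = os.environ if env is None else env
    return ServerConfig(
        model_dir=env.get("MODEL_DIR", "/models"),
        storage_dir=env.get("STORAGE_DIR", "/tmp/storage"),
        # Cloud Run injects PORT at runtime; 8080 is the Dockerfile default.
        port=int(env.get("PORT", "8080")),
    )
```

Reading `PORT` from the environment rather than hard-coding it keeps the container usable both under Cloud Run and with the local `docker run -p 8080:8080` command from the Dockerfile header.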

audio_separator/remote/api_client.py

Lines changed: 12 additions & 4 deletions
@@ -32,6 +32,7 @@ def separate_audio(
         file_path: str,
         model: Optional[str] = None,
         models: Optional[List[str]] = None,
+        preset: Optional[str] = None,
         # Output parameters
         output_format: str = "flac",
         output_bitrate: Optional[str] = None,
@@ -76,8 +77,10 @@ def separate_audio(
         files = {"file": (os.path.basename(file_path), open(file_path, "rb"))}
         data = {}

-        # Handle model parameters (backwards compatibility)
-        if models:
+        # Handle model/preset parameters
+        if preset:
+            data["preset"] = preset
+        elif models:
             data["models"] = json.dumps(models)
         elif model:
             data["model"] = model
@@ -144,6 +147,7 @@ def separate_audio_and_wait(
         file_path: str,
         model: Optional[str] = None,
         models: Optional[List[str]] = None,
+        preset: Optional[str] = None,
         timeout: int = 600,
         poll_interval: int = 10,
         download: bool = True,
@@ -208,13 +212,17 @@ def separate_audio_and_wait(
         import time

         # Submit the separation job with all parameters
-        models_desc = models or ([model] if model else ["default"])
-        self.logger.info(f"Submitting separation job for '{file_path}' with models: {models_desc} (audio-separator v{AUDIO_SEPARATOR_VERSION})")
+        if preset:
+            models_desc = f"preset:{preset}"
+        else:
+            models_desc = models or ([model] if model else ["default"])
+        self.logger.info(f"Submitting separation job for '{file_path}' with {models_desc} (audio-separator v{AUDIO_SEPARATOR_VERSION})")

         result = self.separate_audio(
             file_path,
             model,
             models,
+            preset,
             output_format,
             output_bitrate,
             normalization_threshold,
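
The second hunk above establishes a precedence among the three selectors: `preset` wins over `models`, which wins over `model`. The standalone sketch below isolates that branch so the precedence can be tested without a server. `build_model_fields` is a hypothetical name; in the actual client the logic is inline in `separate_audio()`.

```python
import json
from typing import Dict, List, Optional


def build_model_fields(
    model: Optional[str] = None,
    models: Optional[List[str]] = None,
    preset: Optional[str] = None,
) -> Dict[str, str]:
    """Return the form fields selecting what the server should run.

    At most one of the three keys is sent, chosen in the same order as the
    diff: preset, then models (JSON-encoded), then model.
    """
    data: Dict[str, str] = {}
    if preset:
        data["preset"] = preset
    elif models:
        data["models"] = json.dumps(models)
    elif model:
        data["model"] = model
    return data
```

If none of the three is given, no selector field is sent and the choice of model is left to the server.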

audio_separator/remote/cli.py

Lines changed: 3 additions & 1 deletion
@@ -30,8 +30,9 @@ def main():
     separate_parser = subparsers.add_parser("separate", help="Separate audio files")
     separate_parser.add_argument("audio_files", nargs="+", help="Audio file paths to separate")

-    # Model selection
+    # Model selection (mutually exclusive: preset, single model, or multiple models)
     model_group = separate_parser.add_mutually_exclusive_group()
+    model_group.add_argument("-p", "--preset", help="Ensemble preset name (e.g. instrumental_clean, karaoke, vocal_balanced)")
     model_group.add_argument("-m", "--model", help="Single model to use for separation")
     model_group.add_argument("--models", nargs="+", help="Multiple models to use for separation")

@@ -168,6 +169,7 @@ def handle_separate_command(args, api_client: AudioSeparatorAPIClient, logger: l
     kwargs = {
         "model": args.model,
         "models": args.models,
+        "preset": args.preset,
         "timeout": args.timeout,
         "poll_interval": args.poll_interval,
         "download": True,  # Always download in CLI

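The effect of adding `--preset` to the mutually exclusive group can be reproduced with a reduced parser. The sketch below copies only the arguments visible in the diff; `build_parser` is a hypothetical wrapper, and the real CLI defines further options (such as `--timeout`) that are omitted here.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Reduced copy of the `separate` subcommand's model-selection options."""
    parser = argparse.ArgumentParser(prog="audio-separator-remote")
    subparsers = parser.add_subparsers(dest="command")

    separate = subparsers.add_parser("separate", help="Separate audio files")
    separate.add_argument("audio_files", nargs="+", help="Audio file paths to separate")

    # argparse rejects any command line that supplies more than one of these.
    group = separate.add_mutually_exclusive_group()
    group.add_argument("-p", "--preset", help="Ensemble preset name")
    group.add_argument("-m", "--model", help="Single model to use for separation")
    group.add_argument("--models", nargs="+", help="Multiple models to use for separation")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["separate", "song.mp3", "-p", "instrumental_clean"])
    print(args.preset, args.model, args.models)
```

Combining `-p` with `-m` or `--models` makes argparse exit with a usage error, so the client-side precedence between preset, models, and model is never exercised from the CLI.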