ci: verify ephemeral GPU runner end-to-end by beveradb · Pull Request #287 · nomadkaraoke/python-audio-separator

beveradb · 2026-05-18T04:34:36Z

Summary

End-to-end verification for karaoke-gen PR #781 — the GPU image NVIDIA driver-load fix.

This PR is a single-line touch on audio_separator/separator/__init__.py whose only purpose is to trigger run-integration-tests.yaml's three self-hosted GPU jobs (ensemble-presets, core-models, stems-and-quality).

Expected behaviour

For each of the three jobs the ephemeral GHA runner dispatcher should:

1. Create a fresh GCE VM from the gha-runner-gpu image family (latest is gha-runner-gpu-20260518-035713).
2. Create it with Secure Boot OFF (per the enable_secure_boot=not family.has_gpu change in #781).
3. On first boot, gha-gpu-modprobe.service rebuilds nvidia.ko via DKMS against the running kernel (kernel skew between image-build and runtime is now handled), modprobes nvidia/nvidia_uvm/nvidia_drm, and verifies with nvidia-smi.
4. The "Verify GPU availability" step (nvidia-smi --query-gpu=driver_version,name,memory.total --format=csv,noheader) succeeds.
5. The actual integration tests run.
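
The Secure Boot rule above can be sketched in a few lines. Only the expression `enable_secure_boot=not family.has_gpu` comes from #781; the `ImageFamily` dataclass, the `build_shielded_config` function, the returned dictionary shape, and the `gha-runner-cpu` family name are hypothetical stand-ins for whatever the dispatcher actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageFamily:
    """Hypothetical description of a runner image family."""
    name: str
    has_gpu: bool


def build_shielded_config(family: ImageFamily) -> dict:
    """Return illustrative shielded-instance settings for a new runner VM.

    GPU families get Secure Boot disabled so that a kernel module rebuilt
    by DKMS on first boot can still be loaded; other families keep it on.
    """
    return {"enable_secure_boot": not family.has_gpu}


gpu = ImageFamily(name="gha-runner-gpu", has_gpu=True)
cpu = ImageFamily(name="gha-runner-cpu", has_gpu=False)  # hypothetical family name

print(build_shielded_config(gpu))
print(build_shielded_config(cpu))
```

Deriving the flag from the family rather than hard-coding it per job keeps non-GPU runners on Secure Boot while GPU runners opt out.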

Smoke result before this PR

A smoke VM created by hand from gha-runner-gpu-20260518-035713 (Secure Boot off) booted successfully, and nvidia-smi reported Tesla T4 / driver 595.71.05 / CUDA 13.2 both as root and as the runner user. Total boot-to-GPU-ready time: ~2 minutes.

Test plan

- All three self-hosted GPU jobs pass (ensemble-presets, core-models, stems-and-quality)
- Each job's "Verify GPU availability" step output shows the T4
- No "Failed to query NVIDIA devices" or "Key was rejected by service" entries in the runner logs
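
The last two checks could be automated roughly as below. This is a minimal sketch, not part of the workflow: `parse_gpu_csv` and `find_failures` are hypothetical helper names, the input is assumed to be the plain-text output of the nvidia-smi query used in the "Verify GPU availability" step, and the memory figure in the sample line is illustrative rather than taken from a real run.

```python
import csv
import io

# The two log messages the test plan says must not appear.
FAILURE_MARKERS = (
    "Failed to query NVIDIA devices",
    "Key was rejected by service",
)


def parse_gpu_csv(output: str) -> list[dict]:
    """Parse output of
    nvidia-smi --query-gpu=driver_version,name,memory.total --format=csv,noheader
    into one dict per GPU."""
    rows = []
    reader = csv.reader(io.StringIO(output), skipinitialspace=True)
    for fields in reader:
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        driver, name, memory = (f.strip() for f in fields)
        rows.append(
            {"driver_version": driver, "name": name, "memory_total": memory}
        )
    return rows


def find_failures(log_text: str) -> list[str]:
    """Return every log line that contains one of the failure markers."""
    return [
        line
        for line in log_text.splitlines()
        if any(marker in line for marker in FAILURE_MARKERS)
    ]


# Driver version and GPU name match the smoke result; the memory value is illustrative.
sample = "595.71.05, Tesla T4, 15360 MiB\n"
gpus = parse_gpu_csv(sample)
print(any("T4" in gpu["name"] for gpu in gpus))
print(find_failures("runner started\nintegration tests passed\n"))
```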

@coderabbitai ignore

🤖 Generated with Claude Code

No-op touch on audio_separator/separator/__init__.py to fire the run-integration-tests workflow's three self-hosted GPU jobs (ensemble-presets, core-models, stems-and-quality).

This is the end-to-end verification step for karaoke-gen PR #781 (GPU image NVIDIA driver-load fix + secure-boot disabled for GPU VMs). All three jobs should land on fresh ephemeral GPU VMs created by the updated dispatcher, the gha-gpu-modprobe.service should rebuild the NVIDIA module against the running kernel on first boot, and nvidia-smi should report Tesla T4 at the "Verify GPU availability" step.

Smoke-test of the new image (`gha-runner-gpu-20260518-035713`) on 2026-05-18 already showed end-to-end success; this PR exercises it through the real dispatcher path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

beveradb · 2026-05-18T04:50:34Z

All 3 GPU integration jobs green — verifies karaoke-gen PR #781 end-to-end. Closing per the same convention as #286 (verification trigger, not a real change).

beveradb closed this May 18, 2026

beveradb deleted the feat/sess-20260518-0431-verify-ephemeral-gpu branch May 18, 2026 04:50

beveradb wants to merge 1 commit into main from feat/sess-20260518-0431-verify-ephemeral-gpu
