feat: add Blackwell GPU arch mismatch diagnostic by neuron-tech-ai · Pull Request #653 · jamiepine/voicebox

neuron-tech-ai · 2026-05-14T20:28:42Z

RTX 50-series (Blackwell, sm_120) GPUs produce a CUDA arch mismatch error during model load that surfaces as a generic crash. That is confusing for users who just got a new GPU and expect it to work.

This adds detection for the specific error pattern and shows a clear user-facing message: the installed CUDA libraries were compiled for an older architecture and need to be updated. The message points to the resolution path instead of leaving the user with a raw stack trace.

Affected: Windows and Linux users with RTX 5080/5090 (Blackwell, compute capability 12.0).

Summary by CodeRabbit

New Features
- Enhanced GPU architecture compatibility detection with customized warning messages based on GPU type
- Blackwell-class GPU users receive specific instructions to download the correct CUDA binary from Settings
Bug Fixes
- Improved CUDA compatibility checking with clearer guidance when GPU architecture doesn't match installed CUDA binary
- More informative error messages for unsupported GPU configurations

Add get_cuda_arch() to platform_detect.py. Improve check_cuda_compatibility() warning to give Blackwell (sm_120+) users a clear re-download path via Settings → Server → GPU Acceleration. Surface cuda_arch_warning on every ModelStatus entry in /models/status so the frontend can highlight it.

coderabbitai · 2026-05-14T20:28:53Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 188f1030-5f76-4341-a2d1-8ad5a225bda9

📥 Commits

Reviewing files that changed from the base of the PR and between b35b909 and aa95089.

📒 Files selected for processing (4)

backend/backends/base.py
backend/models.py
backend/routes/models.py
backend/utils/platform_detect.py

📝 Walkthrough

This PR adds GPU CUDA architecture compatibility detection and reporting. It introduces runtime GPU architecture probing via torch, enhances CUDA compatibility checking with GPU-specific warning messages (especially for Blackwell architectures), and integrates the warnings into the model-status API response.
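As a rough illustration of the runtime probing described here, the following is a minimal sketch and not the PR's actual code. It assumes `get_cuda_arch()` wraps `torch.cuda.get_device_capability()` and swallows every failure; the helper `format_sm` is a hypothetical name introduced only for this example.

```python
from typing import Optional


def format_sm(major: int, minor: int) -> str:
    """Format a compute capability pair as an SM string, e.g. (9, 0) -> 'sm_90'."""
    return f"sm_{major}{minor}"


def get_cuda_arch() -> Optional[str]:
    """Return the primary GPU's SM string, or None if it cannot be determined."""
    try:
        # Imported lazily so this module still loads where torch is absent.
        import torch

        if not torch.cuda.is_available():
            return None
        # Device 0 is treated as the "primary" GPU (an assumption of this sketch).
        major, minor = torch.cuda.get_device_capability(0)
        return format_sm(major, minor)
    except Exception:
        # Missing torch, driver problems, etc. are all reported as "unknown".
        return None
```

Returning `None` instead of raising keeps the probe safe to call from a status endpoint, where a diagnostic failure should not take down the response.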

Changes

GPU CUDA Compatibility Warning System

| Layer / File(s) | Summary |
| --- | --- |
| GPU Architecture Detection `backend/utils/platform_detect.py` | New `get_cuda_arch()` function detects primary GPU compute capability at runtime and returns an SM architecture string (e.g., `sm_90`) or `None` when CUDA is unavailable or an error occurs. |
| CUDA Compatibility Warning Logic `backend/backends/base.py` | `check_cuda_compatibility()` now constructs targeted warning messages: Blackwell GPUs (`major >= 12`) receive a "re-download from Settings → Server → GPU Acceleration" message; other unsupported GPUs receive a "download compatible CUDA binary" message. |
| Model Status API Response Integration `backend/models.py`, `backend/routes/models.py` | `ModelStatus` adds optional `cuda_arch_warning` field. `get_model_status()` computes the warning by checking CUDA availability and calling `check_cuda_compatibility()`, then includes it in both the primary success path and exception fallback path. |
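The branching in the warning logic can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `backend/backends/base.py`: the function name `check_arch`, its parameters, and the exact message wording are hypothetical, and the exact-membership test against the supported list is a simplification of how PyTorch builds actually resolve architecture support.

```python
from typing import List, Optional, Tuple

# Menu path quoted from the PR description.
SETTINGS_PATH = "Settings → Server → GPU Acceleration"


def check_arch(
    major: int, minor: int, supported_archs: List[str]
) -> Tuple[bool, Optional[str]]:
    """Return (compatible, warning) for a GPU with the given compute capability.

    `supported_archs` is a list such as ['sm_80', 'sm_90'], for example the
    value returned by torch.cuda.get_arch_list().
    """
    device_arch = f"sm_{major}{minor}"
    if device_arch in supported_archs:
        return True, None

    if major >= 12:
        # Blackwell-class GPU: point at the in-app re-download path.
        warning = (
            f"Your GPU ({device_arch}) is newer than the installed CUDA binary. "
            f"Re-download it from {SETTINGS_PATH}."
        )
    else:
        # Any other unsupported architecture gets the generic message.
        warning = (
            f"Your GPU ({device_arch}) is not supported by the installed CUDA "
            "binary. Download a compatible CUDA binary."
        )
    return False, warning
```

For instance, `check_arch(12, 0, ["sm_80", "sm_90"])` takes the Blackwell branch, while `check_arch(5, 0, ["sm_80", "sm_90"])` falls through to the generic message.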

Sequence Diagram

sequenceDiagram
  participant Client
  participant get_model_status
  participant check_cuda_compatibility
  participant get_cuda_arch
  participant CUDA_Device
  
  Client->>get_model_status: Request model status
  get_model_status->>get_cuda_arch: Detect GPU architecture
  get_cuda_arch->>CUDA_Device: torch.cuda.get_device_capability()
  CUDA_Device-->>get_cuda_arch: (major, minor) compute capability
  get_cuda_arch-->>get_model_status: sm_90 (or None if unavailable)
  get_model_status->>check_cuda_compatibility: Check PyTorch build support
  check_cuda_compatibility-->>get_model_status: (compatible: bool, warning: Optional[str])
  get_model_status-->>Client: ModelStatus with cuda_arch_warning field
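The flow in the diagram, including the fallback path mentioned in the walkthrough, might look roughly like the sketch below. It is an assumption-laden illustration: the real `ModelStatus` lives in `backend/models.py` and its other fields are not shown in this PR summary, so `name` and `loaded` are hypothetical, as are the `is_loaded` and `check` callables that stand in for the backend's model probe and `check_cuda_compatibility()`.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class ModelStatus:
    name: str                                  # hypothetical field
    loaded: bool                               # hypothetical field
    cuda_arch_warning: Optional[str] = None    # field added by this PR


def compute_cuda_arch_warning(
    cuda_available: bool,
    check: Callable[[], Tuple[bool, Optional[str]]],
) -> Optional[str]:
    """Return the warning text, or None when CUDA is absent or compatible."""
    if not cuda_available:
        return None
    compatible, warning = check()
    return None if compatible else warning


def get_model_status(
    names: List[str],
    is_loaded: Callable[[str], bool],
    cuda_available: bool,
    check: Callable[[], Tuple[bool, Optional[str]]],
) -> List[ModelStatus]:
    # Compute the warning once and attach it to every entry.
    warning = compute_cuda_arch_warning(cuda_available, check)
    statuses = []
    for name in names:
        try:
            loaded = is_loaded(name)
        except Exception:
            # Fallback path: the entry is still reported, with the warning.
            loaded = False
        statuses.append(ModelStatus(name, loaded, warning))
    return statuses
```

Computing the warning once per request, rather than per model, keeps the GPU probe off the per-entry path while still letting the frontend highlight any row.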

Estimated Code Review Effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Poem

🐰 A GPU speaks its secret name,
SM architectures, Blackwell's claim!
PyTorch listens, checks the tune,
Warnings flutter to the moon.
Users download, all is well—
CUDA harmony does swell! 🚀

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly and specifically summarizes the main change: adding a diagnostic for Blackwell GPU architecture mismatch. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

neuron-tech-ai wants to merge 1 commit into jamiepine:main from neuron-tech-ai:feat/blackwell-diagnostic
