fix(pt): Treat cuBLAS allocation failures as PyTorch OOM during auto bs #5440
OutisLi wants to merge 1 commit into
…batch sizing

After an oversized batch attempt, PyTorch inference can raise a cuBLAS allocation failure (`CUBLAS_STATUS_ALLOC_FAILED`), especially during … Previously this was treated as a generic RuntimeError, so inference stopped after the first batch size reduction instead of continuing to shrink the batch. Add this cuBLAS allocation failure to the PyTorch auto-batch OOM markers and cover it with a unit test, allowing inference to continue retrying with smaller batch sizes.
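The retry behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in `deepmd/pt/utils/auto_batch_size.py`: `run_with_shrinking_batch`, its parameters, and the halving policy are all hypothetical names and choices made for the example.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def run_with_shrinking_batch(
    run_batch: Callable[[int], T],
    batch_size: int,
    is_oom: Callable[[Exception], bool],
    min_batch_size: int = 1,
) -> Tuple[int, T]:
    """Call run_batch, shrinking batch_size whenever is_oom(exc) is true.

    Hypothetical helper: halving is an assumed shrink policy.
    """
    while True:
        try:
            return batch_size, run_batch(batch_size)
        except RuntimeError as exc:
            # Errors not classified as OOM are real failures: re-raise them.
            if not is_oom(exc):
                raise
            # Already at the floor, so there is nothing left to shrink.
            if batch_size <= min_batch_size:
                raise
            batch_size = max(min_batch_size, batch_size // 2)
```

If `is_oom` does not recognize the cuBLAS message, the first `raise` fires and the loop ends on the first such error, which is the failure mode this PR addresses.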
Pull request overview
Treats cuBLAS allocation failures (CUBLAS_STATUS_ALLOC_FAILED) as GPU OOM signals in the PyTorch auto-batch-sizing path, so inference can continue shrinking the batch size instead of bailing out on a generic RuntimeError.
Changes:
- Add `CUBLAS_STATUS_ALLOC_FAILED` to the PyTorch OOM marker substrings used by `AutoBatchSize.is_oom_error`.
- Add a unit test asserting that this error string is classified as OOM and triggers `torch.cuda.empty_cache()`.
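A substring-based OOM check of the kind listed above might look like the sketch below. It is an assumption-laden stand-in for `AutoBatchSize.is_oom_error`: the `OOM_MARKERS` tuple and the `clear_cache` parameter are hypothetical, and only `CUBLAS_STATUS_ALLOC_FAILED` is a marker actually named in this PR. The cache-clearing step is injected as a callable so the example runs without a GPU; in the real code path that role is played by `torch.cuda.empty_cache()`.

```python
from typing import Callable, Optional

# Hypothetical marker tuple. Only CUBLAS_STATUS_ALLOC_FAILED comes from this PR;
# "CUDA out of memory" is the usual PyTorch message, included for illustration.
OOM_MARKERS = ("CUDA out of memory", "CUBLAS_STATUS_ALLOC_FAILED")


def is_oom_error(
    exc: Exception,
    clear_cache: Optional[Callable[[], None]] = None,
) -> bool:
    """Return True if exc is a RuntimeError whose message contains an OOM marker."""
    if isinstance(exc, RuntimeError) and any(m in str(exc) for m in OOM_MARKERS):
        if clear_cache is not None:
            # Stand-in for torch.cuda.empty_cache() in the real code path.
            clear_cache()
        return True
    return False
```

A test in the spirit of the one added here would pass a recording callable as `clear_cache`, then assert both the `True` result and that the callable ran exactly once.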
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| deepmd/pt/utils/auto_batch_size.py | Extends the RuntimeError message markers considered OOM to include cuBLAS allocation failures. |
| source/tests/pt/test_auto_batch_size.py | Adds coverage to ensure the new cuBLAS allocation failure marker is treated as OOM and clears CUDA cache. |
Walkthrough
This PR expands CUDA out-of-memory error detection to recognize cuBLAS allocation failures.
Pre-merge checks: 4 passed, 1 failed (1 warning).
Codecov Report
All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## master #5440 +/- ##
==========================================
+ Coverage 82.50% 82.84% +0.34%
==========================================
Files 826 830 +4
Lines 87935 91245 +3310
Branches 4206 4376 +170
==========================================
+ Hits 72547 75591 +3044
- Misses 14104 14349 +245
- Partials 1284 1305 +21