Improve error message for ManagedMemoryResource() on unsupported platforms #1835

Merged: Andy-Jost merged 1 commit into NVIDIA:main from Andy-Jost:managed-mr-error-message on Apr 1, 2026
@Andy-Jost Andy-Jost commented Mar 30, 2026

Closes #1617

Summary

ManagedMemoryResource() (no options) calls cuMemGetMemPool to retrieve the default managed memory pool, but on platforms without concurrent managed access (e.g. WSL2), this fails with a cryptic CUDA_ERROR_NOT_SUPPORTED. Meanwhile, explicitly creating a pool via ManagedMemoryResource(options=ManagedMemoryResourceOptions(...)) works fine on the same platform.

This PR catches the error and re-raises it as a RuntimeError with an actionable message pointing users to the explicit options path. The improved message is only emitted when concurrent managed access is confirmed to be unavailable; otherwise the original CUDAError propagates unchanged.

The error is identified via string match on the CUDAError message rather than inspecting a structured error code because (1) we prefer not to change the CUDAError class or the MP_init_current_pool API for this, and (2) this is not a hot path.
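The catch-and-re-raise logic described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `CUDAError` class here is a stand-in, `init_default_pool` and its parameters are hypothetical names, and the wording of the message is assumed. Only the overall shape (string match on the error, check of concurrent managed access, `RuntimeError` pointing to the explicit options path) comes from the description.

```python
class CUDAError(Exception):
    """Stand-in for cuda.core's CUDAError (hypothetical, for illustration)."""


def init_default_pool(init_pool, concurrent_managed_access):
    """Call init_pool() and translate one specific failure into a clearer error.

    init_pool: zero-argument callable standing in for MP_init_current_pool.
    concurrent_managed_access: bool, as it would be read from device properties.
    """
    try:
        return init_pool()
    except CUDAError as e:
        # Identify the failure by string match rather than a structured code,
        # and only rewrite it when the capability is confirmed to be missing.
        if "CUDA_ERROR_NOT_SUPPORTED" in str(e) and not concurrent_managed_access:
            raise RuntimeError(
                "The default managed memory pool is not available on this "
                "platform (concurrent managed access is not supported). "
                "Create a pool explicitly with "
                "ManagedMemoryResource(options=ManagedMemoryResourceOptions(...))."
            ) from e
        # Any other CUDAError propagates unchanged.
        raise
```

Chaining with `from e` keeps the original driver error visible in the traceback, so the clearer message does not hide the underlying cause.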

Changes

  • _managed_memory_resource.pyx — catch CUDAError from MP_init_current_pool in the opts is None path; check concurrent_managed_access via device properties and raise a clear RuntimeError when applicable
  • _memory_pool.pyx — improve the CUDA < 13 fallback error message in MP_init_current_pool to describe the unsupported operation
  • test_managed_memory_warning.py — add test_default_pool_error_without_concurrent_access using the existing device_without_concurrent_managed_access fixture
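The CUDA < 13 fallback mentioned in the second bullet can be pictured as a simple version gate. The sketch below is an assumption about its shape only: `get_current_pool`, `driver_version`, and `query_pool` are hypothetical names, and the real check in `_memory_pool.pyx` may well be done at build time rather than at run time. The message text is the one shown in the review snippet of this PR.

```python
def get_current_pool(driver_version, query_pool):
    """Return the current pool, or fail clearly on CUDA older than 13.0.

    driver_version: (major, minor) tuple; how it is obtained is left out here.
    query_pool: zero-argument callable standing in for the real driver query.
    """
    if driver_version < (13, 0):
        # Describe the unsupported operation instead of failing obscurely.
        raise RuntimeError(
            "Getting the current memory pool for a memory location and "
            "allocation type requires CUDA 13.0 or later"
        )
    return query_pool()
```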

Test Plan

  • Reproduced on WSL2 (RTX 3500 Ada, concurrent_managed_access=False)
  • Verified fix produces the improved error message
  • CI

Made with Cursor

@Andy-Jost Andy-Jost added this to the cuda.core v0.7.0 milestone Mar 30, 2026
@Andy-Jost Andy-Jost added the bug, P1, and cuda.core labels Mar 30, 2026
@Andy-Jost Andy-Jost self-assigned this Mar 30, 2026
@Andy-Jost Andy-Jost requested review from cpcloud, leofang, mdboom, rparolin and rwgk and removed request for leofang March 30, 2026 20:10
Comment on lines +260 to +263
raise RuntimeError(
"Getting the current memory pool for a memory location and "
"allocation type requires CUDA 13.0 or later"
)

Unrelated fix, but a needed improvement.

When ManagedMemoryResource() is called without options on a platform
where the default memory pool does not support managed allocations
(e.g. WSL2), the error from cuMemGetMemPool is now caught and
re-raised as a RuntimeError with actionable guidance.

Made-with: Cursor
@Andy-Jost Andy-Jost force-pushed the managed-mr-error-message branch from a2dc93d to a986306 Compare March 30, 2026 20:18
@rwgk rwgk left a comment

Looks good to me.

One meta request for future PRs:

Could you please add me as a reviewer only after CI has passed? That would really help me respond quickly.

I have review requests flagged with red labels in Gmail, and I try to respond within an hour when I see them. However, the three PRs I reviewed today were assigned to reviewers from the moment they were opened. Yesterday, I checked several times, but there were CI failures and new commits while I was looking, so I ended up setting them aside for the day.

It would also help me gauge when my review is especially important if the reviewer list stayed small — for most PRs, I would expect just one reviewer. When the whole team is assigned immediately, it is harder for me to tell whether my review is specifically needed.

@Andy-Jost Andy-Jost merged commit a81fd07 into NVIDIA:main Apr 1, 2026
186 of 214 checks passed
@Andy-Jost Andy-Jost deleted the managed-mr-error-message branch April 1, 2026 19:07
github-actions Bot pushed a commit that referenced this pull request Apr 2, 2026
Removed preview folders for the following PRs:
- PR #1835
- PR #1842
Development

Successfully merging this pull request may close these issues.

ManagedMemoryResource() without options fails when the default pool does not support managed allocations

2 participants