Improve error message for ManagedMemoryResource() on unsupported platforms#1835
Conversation
| raise RuntimeError( | ||
| "Getting the current memory pool for a memory location and " | ||
| "allocation type requires CUDA 13.0 or later" | ||
| ) |
There was a problem hiding this comment.
Unrelated fix, but a needed improvement.
When ManagedMemoryResource() is called without options on a platform where the default memory pool does not support managed allocations (e.g. WSL2), the error from cuMemGetMemPool is now caught and re-raised as a RuntimeError with actionable guidance. Made-with: Cursor
a2dc93d to
a986306
Compare
|
rwgk
left a comment
There was a problem hiding this comment.
Looks good to me.
One meta request for future PRs:
Could you please add me as a reviewer only after CI has passed? That would really help me respond quickly.
I have review requests flagged with red labels in Gmail, and I try to respond within an hour when I see them. However, the three PRs I reviewed today were assigned to reviewers from the moment they were opened. Yesterday, I checked several times, but there were CI failures and new commits while I was looking, so I ended up setting them aside for the day.
It would also help me gauge when my review is especially important if the reviewer list stayed small — for most PRs, I would expect just one reviewer. When the whole team is assigned immediately, it is harder for me to tell whether my review is specifically needed.
Closes #1617
Summary
ManagedMemoryResource()(no options) callscuMemGetMemPoolto retrieve the default managed memory pool, but on platforms without concurrent managed access (e.g. WSL2), this fails with a crypticCUDA_ERROR_NOT_SUPPORTED. Meanwhile, explicitly creating a pool viaManagedMemoryResource(options=ManagedMemoryResourceOptions(...))works fine on the same platform.This PR catches the error and re-raises it as a
RuntimeErrorwith an actionable message pointing users to the explicit options path. The improved message is only emitted when concurrent managed access is confirmed to be unavailable; otherwise the originalCUDAErrorpropagates unchanged.The error is identified via string match on the
CUDAErrormessage rather than inspecting a structured error code because (1) we prefer not to change theCUDAErrorclass or theMP_init_current_poolAPI for this, and (2) this is not a hot path.Changes
_managed_memory_resource.pyx— catchCUDAErrorfromMP_init_current_poolin theopts is Nonepath; checkconcurrent_managed_accessvia device properties and raise a clearRuntimeErrorwhen applicable_memory_pool.pyx— improve the CUDA < 13 fallback error message inMP_init_current_poolto describe the unsupported operationtest_managed_memory_warning.py— addtest_default_pool_error_without_concurrent_accessusing the existingdevice_without_concurrent_managed_accessfixtureTest Plan
concurrent_managed_access=False)Made with Cursor