Conversation
Remove dependency on the `rmm::mr::device_memory_resource` base class. Resources now satisfy the `cuda::mr::resource` concept directly.

- Replace `shared_ptr<device_memory_resource>` with value types and `cuda::mr::any_resource<cuda::mr::device_accessible>` for type-erased storage
- Replace `set_current_device_resource(ptr)` with `set_current_device_resource_ref`
- Replace `set_per_device_resource(id, ptr)` with `set_per_device_resource_ref`
- Remove `make_owning_wrapper` usage
- Remove `dynamic_cast` on memory resources (no common base class)
- Remove `owning_wrapper.hpp` and `device_memory_resource.hpp` includes
- Add missing `thrust/iterator/transform_output_iterator.h` include (no longer transitively included via CCCL)
# Conflicts: # cpp/src/utilities/cuda_helpers.cuh
/ok to test
In `cpp/src/barrier/sparse_cholesky.cuh`:

```cpp
mem_handler.device_alloc = cudss_device_alloc<void>;
mem_handler.device_free  = cudss_device_dealloc<void>;
```
It doesn't seem like this template parameter is being used. The `rmm::mr::device_memory_resource` class has been removed, but because the template parameter was unused, this change should have no effect.
```diff
-  } else {
-    return total_mem;
-  }
   return total_mem;
```
We cannot currently perform this kind of dynamic cast with the CCCL memory resource design. There is a CCCL feature that we can adopt soon to re-enable this functionality, which I expect to be available for 26.06. Depending on how the migration goes, I may be able to restore this as soon as next week. I have marked this as a TODO on my side.
```diff
-  return total_mem;
+  // TODO (bdice): Restore limiting adaptor check
+  return total_mem;
```
📝 Walkthrough

This pull request updates CI scripts to source new helper scripts that manage PR-specific conda packages and wheels, and refactors C++ GPU memory resource handling from shared-pointer-based to direct object instances. Two new helper scripts are introduced for artifact management.
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~35 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (warning)
Actionable comments posted: 6
🧹 Nitpick comments (2)
cpp/cuopt_cli.cpp (1)
69-72: Dead code: `make_async()` is now unused.

The `make_async()` helper function at line 71 is no longer called anywhere in this file after the migration to value-type memory resources. Consider removing it.

🧹 Proposed removal

```diff
-/**
- * @brief Make an async memory resource for RMM
- * @return std::shared_ptr<rmm::mr::cuda_async_memory_resource>
- */
-inline auto make_async() { return std::make_shared<rmm::mr::cuda_async_memory_resource>(); }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `cpp/cuopt_cli.cpp` around lines 69-72: the helper function `make_async()` is now unused and should be removed. Delete the `inline auto make_async() { return std::make_shared<rmm::mr::cuda_async_memory_resource>(); }` declaration to eliminate dead code, along with any includes or comments that only served it; ensure no remaining references to `make_async` exist and run a build to confirm.

cpp/src/pdlp/termination_strategy/infeasibility_information.cu (1)
18-19: Remove the duplicate Thrust include.

Line 18 adds `<thrust/iterator/transform_output_iterator.h>`, which is already included at line 32. Keeping only one include improves clarity.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `cpp/src/pdlp/termination_strategy/infeasibility_information.cu` around lines 18-19: remove the duplicate include of `<thrust/iterator/transform_output_iterator.h>`, leaving the original include at the later occurrence intact.
ℹ️ Review info

⚙️ Run configuration
- Configuration used: `.coderabbit.yaml`
- Review profile: CHILL
- Plan: Pro Plus
- Run ID: 466802a0-542e-4646-b49c-c1f6235231c7
📒 Files selected for processing (26)
- ci/build_cpp.sh
- ci/build_docs.sh
- ci/build_python.sh
- ci/build_wheel_cuopt.sh
- ci/build_wheel_cuopt_mps_parser.sh
- ci/build_wheel_cuopt_server.sh
- ci/build_wheel_cuopt_sh_client.sh
- ci/build_wheel_libcuopt.sh
- ci/test_cpp.sh
- ci/test_cpp_memcheck.sh
- ci/test_notebooks.sh
- ci/test_python.sh
- ci/test_self_hosted_service.sh
- ci/test_skills_assets.sh
- ci/test_wheel_cuopt.sh
- ci/test_wheel_cuopt_server.sh
- ci/use_conda_packages_from_prs.sh
- ci/use_wheels_from_prs.sh
- cpp/cuopt_cli.cpp
- cpp/src/barrier/sparse_cholesky.cuh
- cpp/src/pdlp/termination_strategy/infeasibility_information.cu
- cpp/src/routing/ges_solver.cu
- cpp/src/utilities/cuda_helpers.cuh
- cpp/tests/mip/load_balancing_test.cu
- cpp/tests/mip/multi_probe_test.cu
- cpp/tests/utilities/base_fixture.hpp
💤 Files with no reviewable changes (1)
- cpp/src/routing/ges_solver.cu
In `ci/build_wheel_libcuopt.sh`:

```shell
source ./ci/use_wheels_from_prs.sh
```
PR wheel constraints are likely dropped later due to PIP_CONSTRAINT reassignment.
Line 8 sources ci/use_wheels_from_prs.sh, which appends constraints to the current PIP_CONSTRAINT. But Line 48 later points PIP_CONSTRAINT to /tmp/constraints.txt, so those earlier entries are no longer used during wheel build.
Suggested fix

```diff
 CUOPT_MPS_PARSER_WHEELHOUSE=$(RAPIDS_PY_WHEEL_NAME="cuopt_mps_parser" rapids-download-wheels-from-github python)
-echo "cuopt-mps-parser @ file://$(echo ${CUOPT_MPS_PARSER_WHEELHOUSE}/cuopt_mps_parser*.whl)" >> /tmp/constraints.txt
-export PIP_CONSTRAINT="/tmp/constraints.txt"
+echo "cuopt-mps-parser @ file://$(echo "${CUOPT_MPS_PARSER_WHEELHOUSE}"/cuopt_mps_parser*.whl)" >> "${PIP_CONSTRAINT}"
```

🧰 Tools
🪛 Shellcheck (0.11.0)
[info] 8-8: Not following: ./ci/use_wheels_from_prs.sh was not specified as input (see shellcheck -x).
(SC1091)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ci/build_wheel_libcuopt.sh` around lines 8 - 9, The script currently sources
ci/use_wheels_from_prs.sh which appends to PIP_CONSTRAINT, but later reassigns
PIP_CONSTRAINT to /tmp/constraints.txt, dropping those entries; update the
script so the PR-sourced constraints are merged into the final
/tmp/constraints.txt instead of being overwritten—e.g., after sourcing
ci/use_wheels_from_prs.sh, if PIP_CONSTRAINT contains extra constraints, append
or merge them into the file referenced by PIP_CONSTRAINT (/tmp/constraints.txt)
or change the later assignment to preserve existing values; refer to the
PIP_CONSTRAINT variable and the ci/use_wheels_from_prs.sh source to locate where
to merge rather than overwrite.
In `ci/test_wheel_cuopt.sh`:

```shell
# sets up a constraints file for 'pip' and puts its location in an exported variable PIP_EXPORT,
# so those constraints will affect all future 'pip install' calls
source rapids-init-pip
source ./ci/use_wheels_from_prs.sh
```
The sourced PR wheel constraints are overwritten later in this script.
Line 11 appends wheel pins via use_wheels_from_prs.sh, but Line 22 recreates ${PIP_CONSTRAINT} with cat >, so those pins are lost before install.
Minimal fix

```diff
-cat > "${PIP_CONSTRAINT}" <<EOF
+cat >> "${PIP_CONSTRAINT}" <<EOF
```

🧰 Tools
🪛 Shellcheck (0.11.0)
[info] 11-11: Not following: ./ci/use_wheels_from_prs.sh was not specified as input (see shellcheck -x).
(SC1091)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ci/test_wheel_cuopt.sh` at line 11, The script sources wheel pins from
use_wheels_from_prs.sh into ${PIP_CONSTRAINT} but then clobbers them when
recreating ${PIP_CONSTRAINT}; update the creation of ${PIP_CONSTRAINT} so it
preserves existing pins (from use_wheels_from_prs.sh) instead of
overwriting—e.g., append the new constraints to ${PIP_CONSTRAINT} (or merge
existing content) rather than using a destructive write; key symbols:
use_wheels_from_prs.sh and the ${PIP_CONSTRAINT} write (currently using cat >)
should be changed to an append/merge approach (cat >> or equivalent).
In `ci/use_conda_packages_from_prs.sh`:

```shell
LIBRMM_CHANNEL=$(rapids-get-pr-artifact rmm 2361 cpp conda)
RMM_CHANNEL=$(rapids-get-pr-artifact rmm 2361 python conda --stable)
LIBUCXX_CHANNEL=$(rapids-get-pr-artifact ucxx 636 cpp conda)
UCXX_CHANNEL=$(rapids-get-pr-artifact ucxx 636 python conda --stable)
LIBRAFT_CHANNEL=$(rapids-get-pr-artifact raft 2996 cpp conda)
RAFT_CHANNEL=$(rapids-get-pr-artifact raft 2996 python conda --stable)
```
Avoid hard-coding PR artifact IDs in shared CI setup.
These fixed IDs are resolved for every consumer of this script, so non-target jobs can silently test against stale/transient artifacts or fail when those artifacts are unavailable.
Suggested hardening

```diff
-# download CI artifacts
-LIBRMM_CHANNEL=$(rapids-get-pr-artifact rmm 2361 cpp conda)
-RMM_CHANNEL=$(rapids-get-pr-artifact rmm 2361 python conda --stable)
-LIBUCXX_CHANNEL=$(rapids-get-pr-artifact ucxx 636 cpp conda)
-UCXX_CHANNEL=$(rapids-get-pr-artifact ucxx 636 python conda --stable)
-LIBRAFT_CHANNEL=$(rapids-get-pr-artifact raft 2996 cpp conda)
-RAFT_CHANNEL=$(rapids-get-pr-artifact raft 2996 python conda --stable)
+if [[ "${RAPIDS_BUILD_TYPE:-}" == "pull-request" ]]; then
+  RMM_PR="${RMM_PR:-2361}"
+  UCXX_PR="${UCXX_PR:-636}"
+  RAFT_PR="${RAFT_PR:-2996}"
+
+  LIBRMM_CHANNEL=$(rapids-get-pr-artifact rmm "${RMM_PR}" cpp conda)
+  RMM_CHANNEL=$(rapids-get-pr-artifact rmm "${RMM_PR}" python conda --stable)
+  LIBUCXX_CHANNEL=$(rapids-get-pr-artifact ucxx "${UCXX_PR}" cpp conda)
+  UCXX_CHANNEL=$(rapids-get-pr-artifact ucxx "${UCXX_PR}" python conda --stable)
+  LIBRAFT_CHANNEL=$(rapids-get-pr-artifact raft "${RAFT_PR}" cpp conda)
+  RAFT_CHANNEL=$(rapids-get-pr-artifact raft "${RAFT_PR}" python conda --stable)
+else
+  RAPIDS_PREPENDED_CONDA_CHANNELS=()
+  RAPIDS_EXTRA_CONDA_CHANNEL_ARGS=()
+fi
```
+fi🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ci/use_conda_packages_from_prs.sh` around lines 6 - 11, The script hard-codes
PR artifact IDs in the calls that populate LIBRMM_CHANNEL, RMM_CHANNEL,
LIBUCXX_CHANNEL, UCXX_CHANNEL, LIBRAFT_CHANNEL and RAFT_CHANNEL; change these
invocations of rapids-get-pr-artifact to take PR IDs from environment variables
or script arguments (e.g. PR_RMM, PR_UCXX, PR_RAFT) instead of literal numbers,
and implement a clear fallback behavior: if a PR ID var is unset or empty, skip
requesting the PR artifact and fall back to the stable channel (or exit with a
clear error if that is preferred for your CI policy). Ensure the logic that
constructs those six variables checks the env vars before calling
rapids-get-pr-artifact and documents the new environment variable names.
In `ci/use_wheels_from_prs.sh`:

```shell
LIBRMM_WHEELHOUSE=$(
  RAPIDS_PY_WHEEL_NAME="librmm_${RAPIDS_PY_CUDA_SUFFIX}" rapids-get-pr-artifact rmm 2361 cpp wheel
)
RMM_WHEELHOUSE=$(
  rapids-get-pr-artifact rmm 2361 python wheel --stable
)
LIBUCXX_WHEELHOUSE=$(
  RAPIDS_PY_WHEEL_NAME="libucxx_${RAPIDS_PY_CUDA_SUFFIX}" rapids-get-pr-artifact ucxx 636 cpp wheel
)
UCXX_WHEELHOUSE=$(
  rapids-get-pr-artifact ucxx 636 python wheel --stable
)
LIBRAFT_WHEELHOUSE=$(
  RAPIDS_PY_WHEEL_NAME="libraft_${RAPIDS_PY_CUDA_SUFFIX}" rapids-get-pr-artifact raft 2996 cpp wheel
)
RAFT_WHEELHOUSE=$(
  rapids-get-pr-artifact raft 2996 python wheel --stable --pkg_name pylibraft
)
```
Do not pin shared wheel bootstrap to fixed external PR IDs.
This unconditionally couples all consuming jobs to specific external PR artifacts, which can expire or drift and destabilize unrelated CI runs.
Suggested hardening
```diff
-LIBRMM_WHEELHOUSE=$(
-  RAPIDS_PY_WHEEL_NAME="librmm_${RAPIDS_PY_CUDA_SUFFIX}" rapids-get-pr-artifact rmm 2361 cpp wheel
-)
+if [[ "${RAPIDS_BUILD_TYPE:-}" == "pull-request" ]]; then
+  RMM_PR="${RMM_PR:-2361}"
+  UCXX_PR="${UCXX_PR:-636}"
+  RAFT_PR="${RAFT_PR:-2996}"
+
+  LIBRMM_WHEELHOUSE=$(
+    RAPIDS_PY_WHEEL_NAME="librmm_${RAPIDS_PY_CUDA_SUFFIX}" rapids-get-pr-artifact rmm "${RMM_PR}" cpp wheel
+  )
+  RMM_WHEELHOUSE=$(rapids-get-pr-artifact rmm "${RMM_PR}" python wheel --stable)
+  LIBUCXX_WHEELHOUSE=$(
+    RAPIDS_PY_WHEEL_NAME="libucxx_${RAPIDS_PY_CUDA_SUFFIX}" rapids-get-pr-artifact ucxx "${UCXX_PR}" cpp wheel
+  )
+  UCXX_WHEELHOUSE=$(rapids-get-pr-artifact ucxx "${UCXX_PR}" python wheel --stable)
+  LIBRAFT_WHEELHOUSE=$(
+    RAPIDS_PY_WHEEL_NAME="libraft_${RAPIDS_PY_CUDA_SUFFIX}" rapids-get-pr-artifact raft "${RAFT_PR}" cpp wheel
+  )
+  RAFT_WHEELHOUSE=$(rapids-get-pr-artifact raft "${RAFT_PR}" python wheel --stable --pkg_name pylibraft)
+fi
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@ci/use_wheels_from_prs.sh` around lines 7 - 24, Replace the hardcoded
external PR IDs used in the rapids-get-pr-artifact invocations (the numeric IDs
in the calls that populate LIBRMM_WHEELHOUSE, RMM_WHEELHOUSE,
LIBUCXX_WHEELHOUSE, UCXX_WHEELHOUSE, LIBRAFT_WHEELHOUSE, RAFT_WHEELHOUSE) with
configurable inputs: read PR IDs from environment variables (e.g., RMM_PR_ID,
UCXX_PR_ID, RAFT_PR_ID) with sensible fallbacks (empty/none meaning "use
stable/latest" or resolve dynamically), and update the rapids-get-pr-artifact
calls to use those variables instead of literal numbers; also ensure the script
logs or skips gracefully if an artifact cannot be found so CI jobs aren’t
tightly coupled to fixed external PR artifacts.
In `cpp/src/utilities/cuda_helpers.cuh`:

```cpp
inline size_t get_device_memory_size()
{
  // Otherwise, we need to get the free memory from the device
  size_t free_mem, total_mem;
  cudaMemGetInfo(&free_mem, &total_mem);

  auto res = rmm::mr::get_current_device_resource();
  auto limiting_adaptor =
    dynamic_cast<rmm::mr::limiting_resource_adaptor<rmm::mr::cuda_async_memory_resource>*>(res);
  // Did we specifiy an explicit memory limit?
  if (limiting_adaptor) {
    printf("limiting_adaptor->get_allocation_limit(): %fMiB\n",
           limiting_adaptor->get_allocation_limit() / (double)1e6);
    printf("used_mem: %fMiB\n", limiting_adaptor->get_allocated_bytes() / (double)1e6);
    printf("free_mem: %fMiB\n",
           (limiting_adaptor->get_allocation_limit() - limiting_adaptor->get_allocated_bytes()) /
             (double)1e6);
    return std::min(total_mem, limiting_adaptor->get_allocation_limit());
  } else {
    return total_mem;
  }
  return total_mem;
}
```
🧩 Analysis chain

🏁 Script executed:

```shell
#!/bin/bash
# Check if limiting_resource_adaptor is still referenced in the codebase
rg -n "limiting_resource_adaptor" --type cpp --type cu
```

Repository: NVIDIA/cuopt
Length of output: 82
🏁 Script executed:

```shell
#!/bin/bash
# Search for limiting_resource_adaptor with corrected syntax (no file type filter)
echo "=== Searching for limiting_resource_adaptor ==="
rg -n "limiting_resource_adaptor"

# Also search for CUDA error checking patterns in the utilities directory
echo -e "\n=== Checking error handling patterns in cuda_helpers.cuh ==="
head -50 cpp/src/utilities/cuda_helpers.cuh | grep -E "(CUDA_CHECK|RAFT_CUDA_TRY|try|catch)"

# Get the full cuda_helpers.cuh file around the function
echo -e "\n=== Full context of get_device_memory_size() ==="
cat -n cpp/src/utilities/cuda_helpers.cuh | sed -n '235,260p'

# Check problem.cu to see how this function is used
echo -e "\n=== Searching for get_device_memory_size usage ==="
rg -n "get_device_memory_size" -A 5 -B 2
```

Repository: NVIDIA/cuopt
Length of output: 2138
🏁 Script executed:

```shell
#!/bin/bash
# Check for RAFT_CUDA_TRY and error handling imports in cuda_helpers.cuh
echo "=== Checking cuda_helpers.cuh header and error handling ==="
head -100 cpp/src/utilities/cuda_helpers.cuh

echo -e "\n=== Searching for RAFT_CUDA_TRY usage in the repository ==="
rg "RAFT_CUDA_TRY" | head -20

echo -e "\n=== Checking problem.cu context around size_factor computation ==="
sed -n '950,970p' cpp/src/mip_heuristics/problem/problem.cu
```

Repository: NVIDIA/cuopt
Length of output: 5634
Add CUDA error checking to cudaMemGetInfo call.
The call to cudaMemGetInfo on line 244 lacks error checking. Per coding guidelines, all CUDA operations must include error checking (e.g., RAFT_CUDA_TRY). The required header <raft/util/cudart_utils.hpp> is already included.
Note: The removal of limiting_resource_adaptor logic was intentional (documented in RELEASE-NOTES.md). The simplified implementation is deliberate to maintain determinism—as documented in cpp/src/mip_heuristics/problem/problem.cu (line 956-957), the code avoids querying free memory or allocation try/catch mechanisms since GPU memory may be shared with other processes.
🔧 Proposed fix

```diff
 inline size_t get_device_memory_size()
 {
   size_t free_mem, total_mem;
-  cudaMemGetInfo(&free_mem, &total_mem);
+  RAFT_CUDA_TRY(cudaMemGetInfo(&free_mem, &total_mem));
   return total_mem;
 }
```

📝 Committable suggestion
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
 inline size_t get_device_memory_size()
 {
-  // Otherwise, we need to get the free memory from the device
   size_t free_mem, total_mem;
-  cudaMemGetInfo(&free_mem, &total_mem);
-  auto res = rmm::mr::get_current_device_resource();
-  auto limiting_adaptor =
-    dynamic_cast<rmm::mr::limiting_resource_adaptor<rmm::mr::cuda_async_memory_resource>*>(res);
-  // Did we specifiy an explicit memory limit?
-  if (limiting_adaptor) {
-    printf("limiting_adaptor->get_allocation_limit(): %fMiB\n",
-           limiting_adaptor->get_allocation_limit() / (double)1e6);
-    printf("used_mem: %fMiB\n", limiting_adaptor->get_allocated_bytes() / (double)1e6);
-    printf("free_mem: %fMiB\n",
-           (limiting_adaptor->get_allocation_limit() - limiting_adaptor->get_allocated_bytes()) /
-             (double)1e6);
-    return std::min(total_mem, limiting_adaptor->get_allocation_limit());
-  } else {
-    return total_mem;
-  }
-  return total_mem;
+  RAFT_CUDA_TRY(cudaMemGetInfo(&free_mem, &total_mem));
+  return total_mem;
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/src/utilities/cuda_helpers.cuh` around lines 241 - 246, The
cudaMemGetInfo call in get_device_memory_size lacks CUDA error checking; wrap
the call with the RAFT_CUDA_TRY macro (or equivalent from
<raft/util/cudart_utils.hpp>) so any cudaMemGetInfo failure is
propagated/logged, e.g., call RAFT_CUDA_TRY(cudaMemGetInfo(&free_mem,
&total_mem)); then return total_mem as before; update get_device_memory_size to
use RAFT_CUDA_TRY to satisfy the coding guideline and existing header utilities.
In `cpp/tests/utilities/base_fixture.hpp`:

```diff
 inline auto make_cuda() { return rmm::mr::cuda_memory_resource(); }

-inline auto make_async() { return std::make_shared<rmm::mr::cuda_async_memory_resource>(); }
+inline auto make_async() { return rmm::mr::cuda_async_memory_resource(); }

-inline auto make_managed() { return std::make_shared<rmm::mr::managed_memory_resource>(); }
+inline auto make_managed() { return rmm::mr::managed_memory_resource(); }

 inline auto make_pool()
 {
   // 1GB of initial pool size
   const size_t initial_pool_size = 1024 * 1024 * 1024;
-  return rmm::mr::make_owning_wrapper<rmm::mr::pool_memory_resource>(make_async(),
-                                                                     initial_pool_size);
+  return rmm::mr::pool_memory_resource(make_async(), initial_pool_size);
 }

 inline auto make_binning()
 {
   auto pool = make_pool();
   // Add a fixed_size_memory_resource for bins of size 256, 512, 1024, 2048 and
   // 4096KiB Larger allocations will use the pool resource
-  auto mr = rmm::mr::make_owning_wrapper<rmm::mr::binning_memory_resource>(pool, 18, 22);
-  return mr;
+  return rmm::mr::binning_memory_resource(pool, 18, 22);
 }
```
🧩 Analysis chain
🌐 Web query:
RMM pool_memory_resource constructor upstream resource ownership 2025
💡 Result:
The `rmm::mr::pool_memory_resource` constructor takes a non-owning reference to the upstream memory resource and does not take ownership of it. The upstream is stored as a `device_async_resource_ref`, which is explicitly a non-owning reference (`rmm::device_async_resource_ref` is an alias for `detail::cccl_async_resource_ref<cuda::mr::resource_ref<cuda::mr::device_accessible>>`). Constructors accept either a `device_async_resource_ref` directly or a raw pointer (`Upstream*`) to the upstream resource, converting it via `to_device_async_resource_ref_checked`, which creates a non-owning ref. The destructor deallocates only the pool blocks allocated from the upstream using `upstream_mr_.allocate`/`deallocate`, but does not destroy the upstream itself. Official docs confirm: "Typically, a resource does not own its upstream" (`rmm::mr::owning_wrapper` docs). RAPIDS RMM discussions emphasize that adaptors like `pool_memory_resource` should not own upstreams, so that multiple adaptors can share the same upstream.
Citations:
- 1: https://docs.rapids.ai/api/librmm/nightly/classrmm_1_1mr_1_1pool__memory__resource
- 2: https://docs.rapids.ai/api/librmm/stable/classrmm_1_1mr_1_1pool__memory__resource
- 3: Cython MR ownership and resource_refs rapidsai/rmm#1662
- 4: https://docs.rapids.ai/api/librmm/26.06/classrmm_1_1mr_1_1owning__wrapper
- 5: https://docs.rapids.ai/api/librmm/stable/pool__memory__resource_8hpp_source
- 6: https://docs.rapids.ai/api/librmm/26.04/pool__memory__resource_8hpp_source
- 7: https://docs.rapids.ai/api/librmm/stable/resource__ref_8hpp_source
- 8: https://docs.rapids.ai/api/librmm/stable/resource__ref_8hpp
Critical lifetime bug: dangling references in factory function chain
The original approval is incorrect. The web search confirms that pool_memory_resource stores a non-owning reference to its upstream memory resource and does not take ownership. This means:
- In `make_pool()`: the temporary from `make_async()` is destroyed immediately after returning, leaving the pool with a dangling reference to destroyed memory.
- In `make_binning()`: the local `pool` variable is destroyed when the function returns, leaving the binning resource with a dangling reference to the destroyed pool.
Both factory functions create chains of non-owning references to temporaries, which violates the lifetime requirements. The code will access dangling pointers when the returned resources are used.
The upstream memory resources must be owned and kept alive for as long as downstream adaptors reference them. Consider either returning owning containers (like `std::shared_ptr` or `any_resource` wrapping the entire chain), or restructuring to avoid the temporary-to-reference pattern.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/tests/utilities/base_fixture.hpp` around lines 29 - 47, make_pool() and
make_binning() currently return memory_resource adaptors that hold non-owning
references to temporaries (from make_async() and the local pool), causing
dangling references; change the factory functions to return owning wrappers that
keep upstream resources alive (e.g., std::shared_ptr or rmm::mr::any_resource)
and construct the pool/bins from those owned resources instead of
temporaries—update make_async()/make_cuda()/make_managed() if needed to return
owned resources or wrappers, then have make_pool() create and retain the async
resource in an owning container and return the pool as an owning object, and
have make_binning() accept/hold the pool owner (or build and return a single
owning wrapper representing the whole chain) so downstream users never reference
destroyed temporaries.
Summary
- Removed the `rmm::mr::device_memory_resource` base class; resources now satisfy the `cuda::mr::resource` concept directly
- Replaced `shared_ptr<device_memory_resource>` with value types and `cuda::mr::any_resource<cuda::mr::device_accessible>` for type-erased storage
- Replaced `set_current_device_resource(ptr)` / `set_per_device_resource(id, ptr)` with `set_current_device_resource_ref` / `set_per_device_resource_ref`
- Removed `make_owning_wrapper` usage and `dynamic_cast` on memory resources (no common base class)
- Added missing `thrust/iterator/transform_output_iterator.h` include (no longer transitively included via CCCL)

Details
8 files changed, 24 insertions, 41 deletions. C++ build passes (all 62 targets). Cython build failures are pre-existing (missing `pylibraft` dev headers, unrelated to this change).
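As a rough before/after sketch of the call-site change the summary describes (variable names are illustrative; exact signatures are per the RMM headers):

```diff
-// before: raw pointer to the polymorphic base class
-auto mr = std::make_shared<rmm::mr::cuda_async_memory_resource>();
-rmm::mr::set_current_device_resource(mr.get());
+// after: value-type resource, passed by reference
+rmm::mr::cuda_async_memory_resource mr{};
+rmm::mr::set_current_device_resource_ref(mr);
```

Note that the `_ref` setters store a non-owning reference, so the resource object must still outlive any allocations made through it.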