Migrate RMM usage to CCCL MR design #1035

Open
bdice wants to merge 6 commits into NVIDIA:main from bdice:rmm-cccl-migration
Conversation

@bdice
Contributor

@bdice bdice commented Apr 3, 2026

Summary

  • Remove dependency on rmm::mr::device_memory_resource base class; resources now satisfy the cuda::mr::resource concept directly
  • Replace shared_ptr<device_memory_resource> with value types and cuda::mr::any_resource<cuda::mr::device_accessible> for type-erased storage
  • Replace set_current_device_resource(ptr) / set_per_device_resource(id, ptr) with set_current_device_resource_ref / set_per_device_resource_ref
  • Remove make_owning_wrapper usage and dynamic_cast on memory resources (no common base class)
  • Add missing thrust/iterator/transform_output_iterator.h include (no longer transitively included via CCCL)

Details

8 files changed, 24 insertions, 41 deletions. C++ build passes (all 62 targets). Cython build failures are pre-existing (missing pylibraft dev headers, unrelated to this change).

Remove dependency on rmm::mr::device_memory_resource base class. Resources
now satisfy the cuda::mr::resource concept directly.

- Replace shared_ptr<device_memory_resource> with value types and
  cuda::mr::any_resource<cuda::mr::device_accessible> for type-erased storage
- Replace set_current_device_resource(ptr) with set_current_device_resource_ref
- Replace set_per_device_resource(id, ptr) with set_per_device_resource_ref
- Remove make_owning_wrapper usage
- Remove dynamic_cast on memory resources (no common base class)
- Remove owning_wrapper.hpp and device_memory_resource.hpp includes
- Add missing thrust/iterator/transform_output_iterator.h include
  (no longer transitively included via CCCL)
@copy-pr-bot

copy-pr-bot bot commented Apr 3, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@bdice bdice force-pushed the rmm-cccl-migration branch 4 times, most recently from a624829 to 8fe66e5 on April 17, 2026 07:09
@bdice bdice added the labels breaking (Introduces a breaking change) and Feature improvement (Improves an existing functionality) on Apr 17, 2026
@bdice bdice force-pushed the rmm-cccl-migration branch 5 times, most recently from 3c4bbbe to 717e877 on April 17, 2026 21:07
@bdice bdice force-pushed the rmm-cccl-migration branch from 717e877 to b42797f on April 18, 2026 15:52
@bdice
Contributor Author

bdice commented Apr 19, 2026

/ok to test

@bdice bdice marked this pull request as ready for review April 19, 2026 07:12
@bdice bdice requested review from a team as code owners April 19, 2026 07:12
Comment on lines +250 to +251
mem_handler.device_alloc = cudss_device_alloc<void>;
mem_handler.device_free = cudss_device_dealloc<void>;
Contributor Author


It doesn't seem like this template parameter is being used. The rmm::mr::device_memory_resource class has been removed, but because the template parameter is unused, this change should have no effect.

  } else {
    return total_mem;
  }
  return total_mem;
Contributor Author


We cannot currently perform this kind of dynamic cast with the CCCL memory resource design. There is a CCCL feature that we can adopt soon to re-enable this functionality, which I expect to be available for 26.06. Depending on how the migration goes, I may be able to restore this as soon as next week. I have marked this as a TODO on my side.

Suggested change
-  return total_mem;
+  // TODO (bdice): Restore limiting adaptor check
+  return total_mem;

@coderabbitai

coderabbitai bot commented Apr 19, 2026

📝 Walkthrough

This pull request updates CI scripts to source new helper scripts that manage PR-specific conda packages and wheels, and refactors C++ GPU memory resource handling from shared pointer-based to direct object instances. Two new helper scripts are introduced for artifact management.

Changes

Cohort / File(s) Summary
CI: Conda Package Management
ci/build_cpp.sh, ci/build_docs.sh, ci/build_python.sh, ci/test_cpp.sh, ci/test_cpp_memcheck.sh, ci/test_notebooks.sh, ci/test_python.sh, ci/test_skills_assets.sh
Scripts now source ./ci/use_conda_packages_from_prs.sh and pass RAPIDS_EXTRA_CONDA_CHANNEL_ARGS to rapids-dependency-file-generator to include PR-derived conda channels. Copyright year ranges updated to 2026 in relevant files.
CI: Wheel Package Management
ci/build_wheel_cuopt.sh, ci/build_wheel_cuopt_mps_parser.sh, ci/build_wheel_cuopt_server.sh, ci/build_wheel_cuopt_sh_client.sh, ci/build_wheel_libcuopt.sh, ci/test_self_hosted_service.sh, ci/test_wheel_cuopt.sh, ci/test_wheel_cuopt_server.sh
Scripts now source ./ci/use_wheels_from_prs.sh after rapids pip initialization to incorporate PR-specific wheel artifacts.
CI: New Helper Scripts
ci/use_conda_packages_from_prs.sh, ci/use_wheels_from_prs.sh
New scripts that retrieve PR artifacts for RAPIDS components (rmm, ucxx, raft) using rapids-get-pr-artifact and configure conda channels and wheel constraints respectively.
C++: Memory Resource Refactoring
cpp/cuopt_cli.cpp, cpp/tests/mip/load_balancing_test.cu, cpp/tests/mip/multi_probe_test.cu
Changed GPU/async memory resource handling from std::shared_ptr<rmm::mr::cuda_async_memory_resource> to direct rmm::mr::cuda_async_memory_resource instances passed directly to resource setup functions.
C++: Memory Handler & Utilities
cpp/src/barrier/sparse_cholesky.cuh, cpp/src/utilities/cuda_helpers.cuh
Updated cuDSS memory handler callbacks and removed RMM resource inspection for allocation limits. Simplified get_device_memory_size() to return only device total memory.
C++: Test Fixture & Includes
cpp/tests/utilities/base_fixture.hpp, cpp/src/pdlp/termination_strategy/infeasibility_information.cu, cpp/src/routing/ges_solver.cu
Refactored memory resource factory functions to return cuda::mr::any_resource<cuda::mr::device_accessible> instead of shared pointers. Added thrust iterator include and removed unused RMM header.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~35 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: Docstring coverage is 23.08%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Title check — ✅ Passed: The title 'Migrate RMM usage to CCCL MR design' clearly and concisely describes the main change: migrating from RMM's memory resource design to CCCL's memory resource design.
  • Description check — ✅ Passed: The description is directly related to the changeset, detailing the specific migration changes including removal of the rmm::mr::device_memory_resource dependency, replacement of shared_ptr with value types, and API changes.



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 6

🧹 Nitpick comments (2)
cpp/cuopt_cli.cpp (1)

69-72: Dead code: make_async() is now unused.

The make_async() helper function at line 71 is no longer called anywhere in this file after the migration to value-type memory resources. Consider removing it.

🧹 Proposed removal
-/**
- * `@brief` Make an async memory resource for RMM
- * `@return` std::shared_ptr<rmm::mr::cuda_async_memory_resource>
- */
-inline auto make_async() { return std::make_shared<rmm::mr::cuda_async_memory_resource>(); }
-
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/cuopt_cli.cpp` around lines 69 - 72, The helper function make_async() is
now unused and should be removed: delete the inline auto make_async() { return
std::make_shared<rmm::mr::cuda_async_memory_resource>(); }
declaration/definition to eliminate dead code and any associated includes or
comments that only served it; ensure no remaining references to make_async exist
and run a build to confirm no usages remain.
cpp/src/pdlp/termination_strategy/infeasibility_information.cu (1)

18-19: Remove the duplicate Thrust include.

Line 18 adds <thrust/iterator/transform_output_iterator.h>, which is already included at Line 32. Keeping only one include improves clarity.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/pdlp/termination_strategy/infeasibility_information.cu` around lines
18 - 19, Remove the duplicate Thrust include by deleting the redundant `#include`
for <thrust/iterator/transform_output_iterator.h> so only the single existing
include remains; locate the duplicate include line (the
<thrust/iterator/transform_output_iterator.h> directive) in
infeasibility_information.cu and remove it, leaving the original include at the
later occurrence intact.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@ci/build_wheel_libcuopt.sh`:
- Around line 8-9: The script currently sources ci/use_wheels_from_prs.sh which
appends to PIP_CONSTRAINT, but later reassigns PIP_CONSTRAINT to
/tmp/constraints.txt, dropping those entries; update the script so the
PR-sourced constraints are merged into the final /tmp/constraints.txt instead of
being overwritten—e.g., after sourcing ci/use_wheels_from_prs.sh, if
PIP_CONSTRAINT contains extra constraints, append or merge them into the file
referenced by PIP_CONSTRAINT (/tmp/constraints.txt) or change the later
assignment to preserve existing values; refer to the PIP_CONSTRAINT variable and
the ci/use_wheels_from_prs.sh source to locate where to merge rather than
overwrite.

In `@ci/test_wheel_cuopt.sh`:
- Line 11: The script sources wheel pins from use_wheels_from_prs.sh into
${PIP_CONSTRAINT} but then clobbers them when recreating ${PIP_CONSTRAINT};
update the creation of ${PIP_CONSTRAINT} so it preserves existing pins (from
use_wheels_from_prs.sh) instead of overwriting—e.g., append the new constraints
to ${PIP_CONSTRAINT} (or merge existing content) rather than using a destructive
write; key symbols: use_wheels_from_prs.sh and the ${PIP_CONSTRAINT} write
(currently using cat >) should be changed to an append/merge approach (cat >> or
equivalent).

In `@ci/use_conda_packages_from_prs.sh`:
- Around line 6-11: The script hard-codes PR artifact IDs in the calls that
populate LIBRMM_CHANNEL, RMM_CHANNEL, LIBUCXX_CHANNEL, UCXX_CHANNEL,
LIBRAFT_CHANNEL and RAFT_CHANNEL; change these invocations of
rapids-get-pr-artifact to take PR IDs from environment variables or script
arguments (e.g. PR_RMM, PR_UCXX, PR_RAFT) instead of literal numbers, and
implement a clear fallback behavior: if a PR ID var is unset or empty, skip
requesting the PR artifact and fall back to the stable channel (or exit with a
clear error if that is preferred for your CI policy). Ensure the logic that
constructs those six variables checks the env vars before calling
rapids-get-pr-artifact and documents the new environment variable names.

In `@ci/use_wheels_from_prs.sh`:
- Around line 7-24: Replace the hardcoded external PR IDs used in the
rapids-get-pr-artifact invocations (the numeric IDs in the calls that populate
LIBRMM_WHEELHOUSE, RMM_WHEELHOUSE, LIBUCXX_WHEELHOUSE, UCXX_WHEELHOUSE,
LIBRAFT_WHEELHOUSE, RAFT_WHEELHOUSE) with configurable inputs: read PR IDs from
environment variables (e.g., RMM_PR_ID, UCXX_PR_ID, RAFT_PR_ID) with sensible
fallbacks (empty/none meaning "use stable/latest" or resolve dynamically), and
update the rapids-get-pr-artifact calls to use those variables instead of
literal numbers; also ensure the script logs or skips gracefully if an artifact
cannot be found so CI jobs aren’t tightly coupled to fixed external PR
artifacts.

In `@cpp/src/utilities/cuda_helpers.cuh`:
- Around line 241-246: The cudaMemGetInfo call in get_device_memory_size lacks
CUDA error checking; wrap the call with the RAFT_CUDA_TRY macro (or equivalent
from <raft/util/cudart_utils.hpp>) so any cudaMemGetInfo failure is
propagated/logged, e.g., call RAFT_CUDA_TRY(cudaMemGetInfo(&free_mem,
&total_mem)); then return total_mem as before; update get_device_memory_size to
use RAFT_CUDA_TRY to satisfy the coding guideline and existing header utilities.

In `@cpp/tests/utilities/base_fixture.hpp`:
- Around line 29-47: make_pool() and make_binning() currently return
memory_resource adaptors that hold non-owning references to temporaries (from
make_async() and the local pool), causing dangling references; change the
factory functions to return owning wrappers that keep upstream resources alive
(e.g., std::shared_ptr or rmm::mr::any_resource) and construct the pool/bins
from those owned resources instead of temporaries—update
make_async()/make_cuda()/make_managed() if needed to return owned resources or
wrappers, then have make_pool() create and retain the async resource in an
owning container and return the pool as an owning object, and have
make_binning() accept/hold the pool owner (or build and return a single owning
wrapper representing the whole chain) so downstream users never reference
destroyed temporaries.

---

Nitpick comments:
In `@cpp/cuopt_cli.cpp`:
- Around line 69-72: The helper function make_async() is now unused and should
be removed: delete the inline auto make_async() { return
std::make_shared<rmm::mr::cuda_async_memory_resource>(); }
declaration/definition to eliminate dead code and any associated includes or
comments that only served it; ensure no remaining references to make_async exist
and run a build to confirm no usages remain.

In `@cpp/src/pdlp/termination_strategy/infeasibility_information.cu`:
- Around line 18-19: Remove the duplicate Thrust include by deleting the
redundant `#include` for <thrust/iterator/transform_output_iterator.h> so only the
single existing include remains; locate the duplicate include line (the
<thrust/iterator/transform_output_iterator.h> directive) in
infeasibility_information.cu and remove it, leaving the original include at the
later occurrence intact.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 466802a0-542e-4646-b49c-c1f6235231c7

📥 Commits

Reviewing files that changed from the base of the PR and between 6b0d8a4 and b42797f.

📒 Files selected for processing (26)
  • ci/build_cpp.sh
  • ci/build_docs.sh
  • ci/build_python.sh
  • ci/build_wheel_cuopt.sh
  • ci/build_wheel_cuopt_mps_parser.sh
  • ci/build_wheel_cuopt_server.sh
  • ci/build_wheel_cuopt_sh_client.sh
  • ci/build_wheel_libcuopt.sh
  • ci/test_cpp.sh
  • ci/test_cpp_memcheck.sh
  • ci/test_notebooks.sh
  • ci/test_python.sh
  • ci/test_self_hosted_service.sh
  • ci/test_skills_assets.sh
  • ci/test_wheel_cuopt.sh
  • ci/test_wheel_cuopt_server.sh
  • ci/use_conda_packages_from_prs.sh
  • ci/use_wheels_from_prs.sh
  • cpp/cuopt_cli.cpp
  • cpp/src/barrier/sparse_cholesky.cuh
  • cpp/src/pdlp/termination_strategy/infeasibility_information.cu
  • cpp/src/routing/ges_solver.cu
  • cpp/src/utilities/cuda_helpers.cuh
  • cpp/tests/mip/load_balancing_test.cu
  • cpp/tests/mip/multi_probe_test.cu
  • cpp/tests/utilities/base_fixture.hpp
💤 Files with no reviewable changes (1)
  • cpp/src/routing/ges_solver.cu

Comment on lines +8 to +9
source ./ci/use_wheels_from_prs.sh



⚠️ Potential issue | 🟠 Major

PR wheel constraints are likely dropped later due to PIP_CONSTRAINT reassignment.

Line 8 sources ci/use_wheels_from_prs.sh, which appends constraints to the current PIP_CONSTRAINT. But Line 48 later points PIP_CONSTRAINT to /tmp/constraints.txt, so those earlier entries are no longer used during wheel build.

Suggested fix
 CUOPT_MPS_PARSER_WHEELHOUSE=$(RAPIDS_PY_WHEEL_NAME="cuopt_mps_parser" rapids-download-wheels-from-github python)
-echo "cuopt-mps-parser @ file://$(echo ${CUOPT_MPS_PARSER_WHEELHOUSE}/cuopt_mps_parser*.whl)" >> /tmp/constraints.txt
-export PIP_CONSTRAINT="/tmp/constraints.txt"
+echo "cuopt-mps-parser @ file://$(echo "${CUOPT_MPS_PARSER_WHEELHOUSE}"/cuopt_mps_parser*.whl)" >> "${PIP_CONSTRAINT}"
🧰 Tools
🪛 Shellcheck (0.11.0)

[info] 8-8: Not following: ./ci/use_wheels_from_prs.sh was not specified as input (see shellcheck -x).

(SC1091)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ci/build_wheel_libcuopt.sh` around lines 8 - 9, The script currently sources
ci/use_wheels_from_prs.sh which appends to PIP_CONSTRAINT, but later reassigns
PIP_CONSTRAINT to /tmp/constraints.txt, dropping those entries; update the
script so the PR-sourced constraints are merged into the final
/tmp/constraints.txt instead of being overwritten—e.g., after sourcing
ci/use_wheels_from_prs.sh, if PIP_CONSTRAINT contains extra constraints, append
or merge them into the file referenced by PIP_CONSTRAINT (/tmp/constraints.txt)
or change the later assignment to preserve existing values; refer to the
PIP_CONSTRAINT variable and the ci/use_wheels_from_prs.sh source to locate where
to merge rather than overwrite.

Comment thread ci/test_wheel_cuopt.sh
# sets up a constraints file for 'pip' and puts its location in an exported variable PIP_EXPORT,
# so those constraints will affect all future 'pip install' calls
source rapids-init-pip
source ./ci/use_wheels_from_prs.sh


⚠️ Potential issue | 🟠 Major

The sourced PR wheel constraints are overwritten later in this script.

Line 11 appends wheel pins via use_wheels_from_prs.sh, but Line 22 recreates ${PIP_CONSTRAINT} with cat >, so those pins are lost before install.

Minimal fix
-cat > "${PIP_CONSTRAINT}" <<EOF
+cat >> "${PIP_CONSTRAINT}" <<EOF
🧰 Tools
🪛 Shellcheck (0.11.0)

[info] 11-11: Not following: ./ci/use_wheels_from_prs.sh was not specified as input (see shellcheck -x).

(SC1091)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ci/test_wheel_cuopt.sh` at line 11, The script sources wheel pins from
use_wheels_from_prs.sh into ${PIP_CONSTRAINT} but then clobbers them when
recreating ${PIP_CONSTRAINT}; update the creation of ${PIP_CONSTRAINT} so it
preserves existing pins (from use_wheels_from_prs.sh) instead of
overwriting—e.g., append the new constraints to ${PIP_CONSTRAINT} (or merge
existing content) rather than using a destructive write; key symbols:
use_wheels_from_prs.sh and the ${PIP_CONSTRAINT} write (currently using cat >)
should be changed to an append/merge approach (cat >> or equivalent).

Comment on lines +6 to +11
LIBRMM_CHANNEL=$(rapids-get-pr-artifact rmm 2361 cpp conda)
RMM_CHANNEL=$(rapids-get-pr-artifact rmm 2361 python conda --stable)
LIBUCXX_CHANNEL=$(rapids-get-pr-artifact ucxx 636 cpp conda)
UCXX_CHANNEL=$(rapids-get-pr-artifact ucxx 636 python conda --stable)
LIBRAFT_CHANNEL=$(rapids-get-pr-artifact raft 2996 cpp conda)
RAFT_CHANNEL=$(rapids-get-pr-artifact raft 2996 python conda --stable)


⚠️ Potential issue | 🟠 Major

Avoid hard-coding PR artifact IDs in shared CI setup.

These fixed IDs are resolved for every consumer of this script, so non-target jobs can silently test against stale/transient artifacts or fail when those artifacts are unavailable.

Suggested hardening
-# download CI artifacts
-LIBRMM_CHANNEL=$(rapids-get-pr-artifact rmm 2361 cpp conda)
-RMM_CHANNEL=$(rapids-get-pr-artifact rmm 2361 python conda --stable)
-LIBUCXX_CHANNEL=$(rapids-get-pr-artifact ucxx 636 cpp conda)
-UCXX_CHANNEL=$(rapids-get-pr-artifact ucxx 636 python conda --stable)
-LIBRAFT_CHANNEL=$(rapids-get-pr-artifact raft 2996 cpp conda)
-RAFT_CHANNEL=$(rapids-get-pr-artifact raft 2996 python conda --stable)
+if [[ "${RAPIDS_BUILD_TYPE:-}" == "pull-request" ]]; then
+  RMM_PR="${RMM_PR:-2361}"
+  UCXX_PR="${UCXX_PR:-636}"
+  RAFT_PR="${RAFT_PR:-2996}"
+
+  LIBRMM_CHANNEL=$(rapids-get-pr-artifact rmm "${RMM_PR}" cpp conda)
+  RMM_CHANNEL=$(rapids-get-pr-artifact rmm "${RMM_PR}" python conda --stable)
+  LIBUCXX_CHANNEL=$(rapids-get-pr-artifact ucxx "${UCXX_PR}" cpp conda)
+  UCXX_CHANNEL=$(rapids-get-pr-artifact ucxx "${UCXX_PR}" python conda --stable)
+  LIBRAFT_CHANNEL=$(rapids-get-pr-artifact raft "${RAFT_PR}" cpp conda)
+  RAFT_CHANNEL=$(rapids-get-pr-artifact raft "${RAFT_PR}" python conda --stable)
+else
+  RAPIDS_PREPENDED_CONDA_CHANNELS=()
+  RAPIDS_EXTRA_CONDA_CHANNEL_ARGS=()
+fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ci/use_conda_packages_from_prs.sh` around lines 6 - 11, The script hard-codes
PR artifact IDs in the calls that populate LIBRMM_CHANNEL, RMM_CHANNEL,
LIBUCXX_CHANNEL, UCXX_CHANNEL, LIBRAFT_CHANNEL and RAFT_CHANNEL; change these
invocations of rapids-get-pr-artifact to take PR IDs from environment variables
or script arguments (e.g. PR_RMM, PR_UCXX, PR_RAFT) instead of literal numbers,
and implement a clear fallback behavior: if a PR ID var is unset or empty, skip
requesting the PR artifact and fall back to the stable channel (or exit with a
clear error if that is preferred for your CI policy). Ensure the logic that
constructs those six variables checks the env vars before calling
rapids-get-pr-artifact and documents the new environment variable names.

Comment thread ci/use_wheels_from_prs.sh
Comment on lines +7 to +24
LIBRMM_WHEELHOUSE=$(
RAPIDS_PY_WHEEL_NAME="librmm_${RAPIDS_PY_CUDA_SUFFIX}" rapids-get-pr-artifact rmm 2361 cpp wheel
)
RMM_WHEELHOUSE=$(
rapids-get-pr-artifact rmm 2361 python wheel --stable
)
LIBUCXX_WHEELHOUSE=$(
RAPIDS_PY_WHEEL_NAME="libucxx_${RAPIDS_PY_CUDA_SUFFIX}" rapids-get-pr-artifact ucxx 636 cpp wheel
)
UCXX_WHEELHOUSE=$(
rapids-get-pr-artifact ucxx 636 python wheel --stable
)
LIBRAFT_WHEELHOUSE=$(
RAPIDS_PY_WHEEL_NAME="libraft_${RAPIDS_PY_CUDA_SUFFIX}" rapids-get-pr-artifact raft 2996 cpp wheel
)
RAFT_WHEELHOUSE=$(
rapids-get-pr-artifact raft 2996 python wheel --stable --pkg_name pylibraft
)


⚠️ Potential issue | 🟠 Major

Do not pin shared wheel bootstrap to fixed external PR IDs.

This unconditionally couples all consuming jobs to specific external PR artifacts, which can expire or drift and destabilize unrelated CI runs.

Suggested hardening
-LIBRMM_WHEELHOUSE=$(
-  RAPIDS_PY_WHEEL_NAME="librmm_${RAPIDS_PY_CUDA_SUFFIX}" rapids-get-pr-artifact rmm 2361 cpp wheel
-)
+if [[ "${RAPIDS_BUILD_TYPE:-}" == "pull-request" ]]; then
+  RMM_PR="${RMM_PR:-2361}"
+  UCXX_PR="${UCXX_PR:-636}"
+  RAFT_PR="${RAFT_PR:-2996}"
+
+  LIBRMM_WHEELHOUSE=$(
+    RAPIDS_PY_WHEEL_NAME="librmm_${RAPIDS_PY_CUDA_SUFFIX}" rapids-get-pr-artifact rmm "${RMM_PR}" cpp wheel
+  )
+  RMM_WHEELHOUSE=$(rapids-get-pr-artifact rmm "${RMM_PR}" python wheel --stable)
+  LIBUCXX_WHEELHOUSE=$(
+    RAPIDS_PY_WHEEL_NAME="libucxx_${RAPIDS_PY_CUDA_SUFFIX}" rapids-get-pr-artifact ucxx "${UCXX_PR}" cpp wheel
+  )
+  UCXX_WHEELHOUSE=$(rapids-get-pr-artifact ucxx "${UCXX_PR}" python wheel --stable)
+  LIBRAFT_WHEELHOUSE=$(
+    RAPIDS_PY_WHEEL_NAME="libraft_${RAPIDS_PY_CUDA_SUFFIX}" rapids-get-pr-artifact raft "${RAFT_PR}" cpp wheel
+  )
+  RAFT_WHEELHOUSE=$(rapids-get-pr-artifact raft "${RAFT_PR}" python wheel --stable --pkg_name pylibraft)
+fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@ci/use_wheels_from_prs.sh` around lines 7 - 24, Replace the hardcoded
external PR IDs used in the rapids-get-pr-artifact invocations (the numeric IDs
in the calls that populate LIBRMM_WHEELHOUSE, RMM_WHEELHOUSE,
LIBUCXX_WHEELHOUSE, UCXX_WHEELHOUSE, LIBRAFT_WHEELHOUSE, RAFT_WHEELHOUSE) with
configurable inputs: read PR IDs from environment variables (e.g., RMM_PR_ID,
UCXX_PR_ID, RAFT_PR_ID) with sensible fallbacks (empty/none meaning "use
stable/latest" or resolve dynamically), and update the rapids-get-pr-artifact
calls to use those variables instead of literal numbers; also ensure the script
logs or skips gracefully if an artifact cannot be found so CI jobs aren’t
tightly coupled to fixed external PR artifacts.

Comment on lines 241 to 246
inline size_t get_device_memory_size()
{
  // Otherwise, we need to get the free memory from the device
  size_t free_mem, total_mem;
  cudaMemGetInfo(&free_mem, &total_mem);

  auto res = rmm::mr::get_current_device_resource();
  auto limiting_adaptor =
    dynamic_cast<rmm::mr::limiting_resource_adaptor<rmm::mr::cuda_async_memory_resource>*>(res);
  // Did we specifiy an explicit memory limit?
  if (limiting_adaptor) {
    printf("limiting_adaptor->get_allocation_limit(): %fMiB\n",
           limiting_adaptor->get_allocation_limit() / (double)1e6);
    printf("used_mem: %fMiB\n", limiting_adaptor->get_allocated_bytes() / (double)1e6);
    printf("free_mem: %fMiB\n",
           (limiting_adaptor->get_allocation_limit() - limiting_adaptor->get_allocated_bytes()) /
             (double)1e6);
    return std::min(total_mem, limiting_adaptor->get_allocation_limit());
  } else {
    return total_mem;
  }
  return total_mem;
}


⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if limiting_resource_adaptor is still referenced in the codebase
rg -n "limiting_resource_adaptor" --type cpp --type cu

Repository: NVIDIA/cuopt

Length of output: 82


🏁 Script executed:

#!/bin/bash
# Search for limiting_resource_adaptor with corrected syntax (no file type filter)
echo "=== Searching for limiting_resource_adaptor ==="
rg -n "limiting_resource_adaptor" 

# Also search for CUDA error checking patterns in the utilities directory
echo -e "\n=== Checking error handling patterns in cuda_helpers.cuh ==="
head -50 cpp/src/utilities/cuda_helpers.cuh | grep -E "(CUDA_CHECK|RAFT_CUDA_TRY|try|catch)"

# Get the full cuda_helpers.cuh file around the function
echo -e "\n=== Full context of get_device_memory_size() ==="
cat -n cpp/src/utilities/cuda_helpers.cuh | sed -n '235,260p'

# Check problem.cu to see how this function is used
echo -e "\n=== Searching for get_device_memory_size usage ==="
rg -n "get_device_memory_size" -A 5 -B 2

Repository: NVIDIA/cuopt

Length of output: 2138


🏁 Script executed:

#!/bin/bash
# Check for RAFT_CUDA_TRY and error handling imports in cuda_helpers.cuh
echo "=== Checking cuda_helpers.cuh header and error handling ==="
head -100 cpp/src/utilities/cuda_helpers.cuh

echo -e "\n=== Searching for RAFT_CUDA_TRY usage in the repository ==="
rg "RAFT_CUDA_TRY" | head -20

echo -e "\n=== Checking problem.cu context around size_factor computation ==="
sed -n '950,970p' cpp/src/mip_heuristics/problem/problem.cu

Repository: NVIDIA/cuopt

Length of output: 5634


Add CUDA error checking to cudaMemGetInfo call.

The call to cudaMemGetInfo on line 244 lacks error checking. Per coding guidelines, all CUDA operations must include error checking (e.g., RAFT_CUDA_TRY). The required header <raft/util/cudart_utils.hpp> is already included.

Note: The removal of limiting_resource_adaptor logic was intentional (documented in RELEASE-NOTES.md). The simplified implementation is deliberate to maintain determinism—as documented in cpp/src/mip_heuristics/problem/problem.cu (line 956-957), the code avoids querying free memory or allocation try/catch mechanisms since GPU memory may be shared with other processes.

🔧 Proposed fix
 inline size_t get_device_memory_size()
 {
   size_t free_mem, total_mem;
-  cudaMemGetInfo(&free_mem, &total_mem);
+  RAFT_CUDA_TRY(cudaMemGetInfo(&free_mem, &total_mem));
   return total_mem;
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-inline size_t get_device_memory_size()
-{
-  // Otherwise, we need to get the free memory from the device
-  size_t free_mem, total_mem;
-  cudaMemGetInfo(&free_mem, &total_mem);
-  auto res = rmm::mr::get_current_device_resource();
-  auto limiting_adaptor =
-    dynamic_cast<rmm::mr::limiting_resource_adaptor<rmm::mr::cuda_async_memory_resource>*>(res);
-  // Did we specifiy an explicit memory limit?
-  if (limiting_adaptor) {
-    printf("limiting_adaptor->get_allocation_limit(): %fMiB\n",
-           limiting_adaptor->get_allocation_limit() / (double)1e6);
-    printf("used_mem: %fMiB\n", limiting_adaptor->get_allocated_bytes() / (double)1e6);
-    printf("free_mem: %fMiB\n",
-           (limiting_adaptor->get_allocation_limit() - limiting_adaptor->get_allocated_bytes()) /
-             (double)1e6);
-    return std::min(total_mem, limiting_adaptor->get_allocation_limit());
-  } else {
-    return total_mem;
-  }
-  return total_mem;
-}
+inline size_t get_device_memory_size()
+{
+  size_t free_mem, total_mem;
+  RAFT_CUDA_TRY(cudaMemGetInfo(&free_mem, &total_mem));
+  return total_mem;
+}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/src/utilities/cuda_helpers.cuh` around lines 241 - 246, The
cudaMemGetInfo call in get_device_memory_size lacks CUDA error checking; wrap
the call with the RAFT_CUDA_TRY macro (or equivalent from
<raft/util/cudart_utils.hpp>) so any cudaMemGetInfo failure is
propagated/logged, e.g., call RAFT_CUDA_TRY(cudaMemGetInfo(&free_mem,
&total_mem)); then return total_mem as before; update get_device_memory_size to
use RAFT_CUDA_TRY to satisfy the coding guideline and existing header utilities.

Comment on lines +29 to +47
 inline auto make_cuda() { return rmm::mr::cuda_memory_resource(); }

-inline auto make_async() { return std::make_shared<rmm::mr::cuda_async_memory_resource>(); }
+inline auto make_async() { return rmm::mr::cuda_async_memory_resource(); }

-inline auto make_managed() { return std::make_shared<rmm::mr::managed_memory_resource>(); }
+inline auto make_managed() { return rmm::mr::managed_memory_resource(); }

 inline auto make_pool()
 {
   // 1GB of initial pool size
   const size_t initial_pool_size = 1024 * 1024 * 1024;
-  return rmm::mr::make_owning_wrapper<rmm::mr::pool_memory_resource>(make_async(),
-                                                                     initial_pool_size);
+  return rmm::mr::pool_memory_resource(make_async(), initial_pool_size);
 }

 inline auto make_binning()
 {
   auto pool = make_pool();
   // Add a fixed_size_memory_resource for bins of size 256, 512, 1024, 2048 and
   // 4096KiB Larger allocations will use the pool resource
-  auto mr = rmm::mr::make_owning_wrapper<rmm::mr::binning_memory_resource>(pool, 18, 22);
-  return mr;
+  return rmm::mr::binning_memory_resource(pool, 18, 22);


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

RMM pool_memory_resource constructor upstream resource ownership 2025

💡 Result:

The rmm::mr::pool_memory_resource constructor takes a non-owning reference to the upstream memory resource and does not take ownership of it. The upstream is stored as a device_async_resource_ref, which is explicitly a non-owning reference (rmm::device_async_resource_ref is an alias for detail::cccl_async_resource_ref<cuda::mr::resource_ref<cuda::mr::device_accessible>>). Constructors accept either device_async_resource_ref directly or a raw pointer (Upstream*) to the upstream resource, converting it via to_device_async_resource_ref_checked, which creates a non-owning ref. The destructor deallocates only the pool blocks allocated from the upstream using upstream_mr_.allocate/deallocate, but does not destroy the upstream itself. Official docs confirm: "Typically, a resource does not own its upstream" (rmm::mr::owning_wrapper docs). RAPIDS RMM discussions emphasize that adaptors like pool_memory_resource should not own upstreams to allow multiple adaptors sharing the same upstream.


Critical lifetime bug: dangling references in factory function chain

The original approval is incorrect. The web search confirms that pool_memory_resource stores a non-owning reference to its upstream memory resource and does not take ownership. This means:

  • In make_pool(): the temporary from make_async() is destroyed immediately after returning, leaving the pool with a dangling reference to destroyed memory.
  • In make_binning(): the local pool variable is destroyed when the function returns, leaving the binning resource with a dangling reference to the destroyed pool.

Both factory functions create chains of non-owning references to temporaries, which violates the lifetime requirements. The code will access dangling pointers when the returned resources are used.

The upstream memory resources must be owned and kept alive for the duration that downstream adaptors reference them. Consider either returning owning containers (like std::shared_ptr or any_resource wrapping the entire chain), or restructure to avoid the temporary-to-reference pattern.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/tests/utilities/base_fixture.hpp` around lines 29 - 47, make_pool() and
make_binning() currently return memory_resource adaptors that hold non-owning
references to temporaries (from make_async() and the local pool), causing
dangling references; change the factory functions to return owning wrappers that
keep upstream resources alive (e.g., std::shared_ptr or rmm::mr::any_resource)
and construct the pool/bins from those owned resources instead of
temporaries—update make_async()/make_cuda()/make_managed() if needed to return
owned resources or wrappers, then have make_pool() create and retain the async
resource in an owning container and return the pool as an owning object, and
have make_binning() accept/hold the pool owner (or build and return a single
owning wrapper representing the whole chain) so downstream users never reference
destroyed temporaries.

