Skip to content

Switch to public RAFT numpy_serialize APIs#2038

Open
julianmi wants to merge 2 commits intorapidsai:mainfrom
julianmi:raft-public-npy-helpers
Open

Switch to public RAFT numpy_serialize APIs#2038
julianmi wants to merge 2 commits intorapidsai:mainfrom
julianmi:raft-public-npy-helpers

Conversation

@julianmi
Copy link
Copy Markdown
Contributor

cuVS uses parse_descr, read_header, get_numpy_dtype, write_header from raft::detail::numpy_serializer as discussed here. This PR proposes to expose them publicly in RAFT instead and use them throughout cuVS.

There are similar issues with the raft/core/detail/macros.hpp to be address in a separate PR.

Depends on rapidsai/raft#3003

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 24, 2026

📝 Walkthrough

Summary by CodeRabbit

  • Refactor

    • Improved serialization and deserialization logic to use stable public APIs instead of internal implementation details for enhanced long-term compatibility.
  • Chores

    • Updated RAFT dependency configuration and copyright year ranges.

Walkthrough

Multiple C++ source and header files are updated to use the public raft::numpy_serializer API instead of the internal raft::detail::numpy_serializer. The RAFT dependency fork and pinned tag are modified in CMake configuration. Copyright year headers are updated in some files.

Changes

Cohort / File(s) Summary
C Deserialization Implementations
c/src/neighbors/brute_force.cpp, c/src/neighbors/cagra.cpp, c/src/neighbors/ivf_flat.cpp, c/src/neighbors/mg_cagra.cpp, c/src/neighbors/mg_ivf_flat.cpp, c/src/neighbors/mg_ivf_pq.cpp
Updated deserialization functions to call raft::numpy_serializer::parse_descr instead of internal raft::detail::numpy_serializer::parse_descr; added raft/core/numpy_serializer.hpp includes; SPDX copyright year ranges updated in multi-GPU variants.
C++ Serialization Implementations
cpp/src/neighbors/brute_force_serialize.cu, cpp/src/neighbors/detail/cagra/cagra_serialize.cuh, cpp/src/neighbors/ivf_flat/ivf_flat_serialize.cuh, cpp/src/neighbors/mg/snmg.cuh
Updated serialization functions to use raft::numpy_serializer::get_numpy_dtype<T>() instead of internal raft::detail::numpy_serializer::get_numpy_dtype<T>(); header includes and SPDX ranges updated.
C++ Load Implementation
cpp/src/neighbors/detail/cagra/cagra_build.cuh
Updated ACE disk-load logic to read NumPy headers via public raft::numpy_serializer::read_header instead of internal API; added corresponding header include.
C++ Public Headers
cpp/include/cuvs/neighbors/cagra.hpp, cpp/include/cuvs/util/file_io.hpp
Updated disk-backed load paths and NumPy file creation to use public raft::numpy_serializer namespace instead of internal detail variant; added required header includes.
HNSW Implementation
cpp/src/neighbors/detail/hnsw.hpp
Updated HNSW serialization to call public raft::numpy_serializer::read_header for graph, dataset, and mapping files instead of internal mdspan serializer; adjusted includes accordingly.
Build Configuration
cpp/cmake/thirdparty/get_raft.cmake
Changed RAFT_FORK from rapidsai to julianmi and pinned RAFT_PINNED_TAG to expose-public-npy-helpers; SPDX copyright year updated to 2026.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 42.86% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main change: switching from internal to public RAFT numpy_serializer APIs across multiple cuVS files.
Description check ✅ Passed The description clearly explains the rationale and scope of the changes, including context and dependencies.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@c/src/neighbors/brute_force.cpp`:
- Around line 242-244: The code reads 4 bytes into dtype_string using is.read
and immediately calls raft::numpy_serializer::parse_descr on those bytes;
however it doesn't check whether the read succeeded. Update the block around
dtype_string/is.read to verify the stream read (e.g., check is.gcount() == 4 or
is.fail()/is.good()) before calling raft::numpy_serializer::parse_descr, and on
failure handle the truncated header by returning an error/throwing an exception
or logging and aborting the parse so parse_descr never receives uninitialized
data.

In `@c/src/neighbors/cagra.cpp`:
- Around line 877-880: The code reads 4 bytes into dtype_string then calls
raft::numpy_serializer::parse_descr without validating the read; first check the
read succeeded (e.g., verify is.read(...) and that is.gcount() == 4 or that the
stream is in a good state) before constructing std::string(dtype_string, 4) and
calling raft::numpy_serializer::parse_descr; if the read fails/returns fewer
than 4 bytes, handle the error (throw, return an error code, or log and abort)
instead of passing potentially truncated data to parse_descr.

In `@c/src/neighbors/ivf_flat.cpp`:
- Around line 303-306: The code reads four bytes into dtype_string then calls
raft::numpy_serializer::parse_descr without verifying the read succeeded;
validate the read length immediately after is.read(dtype_string, 4) (e.g., check
is.gcount() == 4 or test stream state like if (!is || is.gcount() != 4)) and on
short reads/failure throw or return a clear error (or set an error status)
before calling raft::numpy_serializer::parse_descr, so malformed or truncated
files are handled safely.

In `@c/src/neighbors/mg_cagra.cpp`:
- Around line 403-406: The 4-byte dtype header read into dtype_string before
calling raft::numpy_serializer::parse_descr is not validated; check the istream
read result (e.g., is.read(...) and then is.gcount() == 4 or !is.fail()) and
handle short/truncated reads by logging/throwing/returning an error instead of
calling parse_descr with partial data; apply the same validation to the other
occurrence that reads 4 bytes so neither is.read -> parse_descr path can receive
garbage.

In `@c/src/neighbors/mg_ivf_flat.cpp`:
- Around line 400-403: The code reads 4 bytes into dtype_string and calls
raft::numpy_serializer::parse_descr without validating the read; update both the
deserialize and distribute code paths to check is.read(...) and the stream state
(e.g., verify gcount() == 4 or is.good()/is) before constructing
std::string(dtype_string,4) and calling parse_descr, and handle truncated reads
by returning/throwing an error or reporting via existing error path; reference
the dtype_string buffer, the is.read call, and the parse_descr invocation so you
change both occurrences (around the calls near deserialize and distribute).

In `@c/src/neighbors/mg_ivf_pq.cpp`:
- Around line 392-395: The code calls is.read into dtype_string and immediately
hands those bytes to raft::numpy_serializer::parse_descr without checking the
read result; update the logic around dtype_string/is.read to validate that 4
bytes were actually read (e.g., check is.gcount() == 4 and/or
is.good()/is.fail()) before calling parse_descr, and handle the short-read error
path (return an error, throw, or log and exit) so parse_descr is never called
with incomplete data; reference the dtype_string buffer, the is.read call, and
raft::numpy_serializer::parse_descr when making the change and ensure is.close()
still runs in all code paths.

In `@cpp/cmake/thirdparty/get_raft.cmake`:
- Around line 9-10: The CMake variables RAFT_FORK and RAFT_PINNED_TAG are
hard-pinned to a personal fork/branch; change the defaults to use the official
upstream (e.g., "rapidsai/raft") and a safe default tag (empty or a release
tag), and expose RAFT_FORK and RAFT_PINNED_TAG as CMake CACHE variables so they
can only be overridden explicitly via -D on the command line; additionally gate
any non-upstream usage behind an opt-in flag (e.g., USE_CUSTOM_RAFT or
RAFT_USE_FORK) so CI/releases use the official repo by default and personal-fork
overrides are explicitly documented and temporary.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 393379db-9f11-4936-953c-40a2df4cec4c

📥 Commits

Reviewing files that changed from the base of the PR and between f2bffb6 and ae5534e.

📒 Files selected for processing (15)
  • c/src/neighbors/brute_force.cpp
  • c/src/neighbors/cagra.cpp
  • c/src/neighbors/ivf_flat.cpp
  • c/src/neighbors/mg_cagra.cpp
  • c/src/neighbors/mg_ivf_flat.cpp
  • c/src/neighbors/mg_ivf_pq.cpp
  • cpp/cmake/thirdparty/get_raft.cmake
  • cpp/include/cuvs/neighbors/cagra.hpp
  • cpp/include/cuvs/util/file_io.hpp
  • cpp/src/neighbors/brute_force_serialize.cu
  • cpp/src/neighbors/detail/cagra/cagra_build.cuh
  • cpp/src/neighbors/detail/cagra/cagra_serialize.cuh
  • cpp/src/neighbors/detail/hnsw.hpp
  • cpp/src/neighbors/ivf_flat/ivf_flat_serialize.cuh
  • cpp/src/neighbors/mg/snmg.cuh

Comment on lines 242 to +244
char dtype_string[4];
is.read(dtype_string, 4);
auto dtype = raft::detail::numpy_serializer::parse_descr(std::string(dtype_string, 4));
auto dtype = raft::numpy_serializer::parse_descr(std::string(dtype_string, 4));
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Guard against truncated header reads before parsing dtype.

Line 243 reads 4 bytes but never checks success. On short/corrupt files, Line 244 may parse uninitialized bytes.

💡 Proposed fix
-    char dtype_string[4];
-    is.read(dtype_string, 4);
-    auto dtype = raft::numpy_serializer::parse_descr(std::string(dtype_string, 4));
+    char dtype_string[4]{};
+    if (!is.read(dtype_string, sizeof(dtype_string))) {
+      RAFT_FAIL("Invalid or truncated index header in file %s", filename);
+    }
+    auto dtype =
+      raft::numpy_serializer::parse_descr(std::string(dtype_string, sizeof(dtype_string)));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@c/src/neighbors/brute_force.cpp` around lines 242 - 244, The code reads 4
bytes into dtype_string using is.read and immediately calls
raft::numpy_serializer::parse_descr on those bytes; however it doesn't check
whether the read succeeded. Update the block around dtype_string/is.read to
verify the stream read (e.g., check is.gcount() == 4 or is.fail()/is.good())
before calling raft::numpy_serializer::parse_descr, and on failure handle the
truncated header by returning an error/throwing an exception or logging and
aborting the parse so parse_descr never receives uninitialized data.

Comment thread c/src/neighbors/cagra.cpp
Comment on lines 877 to 880
char dtype_string[4];
is.read(dtype_string, 4);
auto dtype = raft::detail::numpy_serializer::parse_descr(std::string(dtype_string, 4));
auto dtype = raft::numpy_serializer::parse_descr(std::string(dtype_string, 4));

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Guard the dtype prefix read before parse_descr.

Line 878 reads 4 bytes but does not validate read length before parsing on Line 879. Corrupt/truncated files can produce invalid dtype decoding.

Proposed fix
-    char dtype_string[4];
-    is.read(dtype_string, 4);
-    auto dtype = raft::numpy_serializer::parse_descr(std::string(dtype_string, 4));
+    char dtype_string[4] = {};
+    is.read(dtype_string, sizeof(dtype_string));
+    RAFT_EXPECTS(is.gcount() == static_cast<std::streamsize>(sizeof(dtype_string)),
+                 "Failed to read dtype header from %s",
+                 filename);
+    auto dtype =
+      raft::numpy_serializer::parse_descr(std::string(dtype_string, sizeof(dtype_string)));
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
char dtype_string[4];
is.read(dtype_string, 4);
auto dtype = raft::detail::numpy_serializer::parse_descr(std::string(dtype_string, 4));
auto dtype = raft::numpy_serializer::parse_descr(std::string(dtype_string, 4));
char dtype_string[4] = {};
is.read(dtype_string, sizeof(dtype_string));
RAFT_EXPECTS(is.gcount() == static_cast<std::streamsize>(sizeof(dtype_string)),
"Failed to read dtype header from %s",
filename);
auto dtype =
raft::numpy_serializer::parse_descr(std::string(dtype_string, sizeof(dtype_string)));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@c/src/neighbors/cagra.cpp` around lines 877 - 880, The code reads 4 bytes
into dtype_string then calls raft::numpy_serializer::parse_descr without
validating the read; first check the read succeeded (e.g., verify is.read(...)
and that is.gcount() == 4 or that the stream is in a good state) before
constructing std::string(dtype_string, 4) and calling
raft::numpy_serializer::parse_descr; if the read fails/returns fewer than 4
bytes, handle the error (throw, return an error code, or log and abort) instead
of passing potentially truncated data to parse_descr.

Comment on lines 303 to 306
char dtype_string[4];
is.read(dtype_string, 4);
auto dtype = raft::detail::numpy_serializer::parse_descr(std::string(dtype_string, 4));
auto dtype = raft::numpy_serializer::parse_descr(std::string(dtype_string, 4));

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Fail early on short dtype-header reads.

Line 304 reads the dtype prefix without checking byte count before parsing on Line 305. This should be validated to handle malformed files safely.

Proposed fix
-    char dtype_string[4];
-    is.read(dtype_string, 4);
-    auto dtype = raft::numpy_serializer::parse_descr(std::string(dtype_string, 4));
+    char dtype_string[4] = {};
+    is.read(dtype_string, sizeof(dtype_string));
+    RAFT_EXPECTS(is.gcount() == static_cast<std::streamsize>(sizeof(dtype_string)),
+                 "Failed to read dtype header from %s",
+                 filename);
+    auto dtype =
+      raft::numpy_serializer::parse_descr(std::string(dtype_string, sizeof(dtype_string)));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@c/src/neighbors/ivf_flat.cpp` around lines 303 - 306, The code reads four
bytes into dtype_string then calls raft::numpy_serializer::parse_descr without
verifying the read succeeded; validate the read length immediately after
is.read(dtype_string, 4) (e.g., check is.gcount() == 4 or test stream state like
if (!is || is.gcount() != 4)) and on short reads/failure throw or return a clear
error (or set an error status) before calling
raft::numpy_serializer::parse_descr, so malformed or truncated files are handled
safely.

Comment on lines 403 to 406
char dtype_string[4];
is.read(dtype_string, 4);
auto dtype = raft::detail::numpy_serializer::parse_descr(std::string(dtype_string, 4));
auto dtype = raft::numpy_serializer::parse_descr(std::string(dtype_string, 4));
is.close();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Validate dtype header reads before parsing.

On Line 404 and Line 435, the 4-byte read is not validated. Truncated files can pass partial/garbage bytes into parse_descr, causing invalid dtype dispatch.

Proposed fix
-    char dtype_string[4];
-    is.read(dtype_string, 4);
-    auto dtype = raft::numpy_serializer::parse_descr(std::string(dtype_string, 4));
+    char dtype_string[4] = {};
+    is.read(dtype_string, sizeof(dtype_string));
+    RAFT_EXPECTS(is.gcount() == static_cast<std::streamsize>(sizeof(dtype_string)),
+                 "Failed to read dtype header from %s",
+                 filename);
+    auto dtype =
+      raft::numpy_serializer::parse_descr(std::string(dtype_string, sizeof(dtype_string)));
@@
-    char dtype_string[4];
-    is.read(dtype_string, 4);
-    auto dtype = raft::numpy_serializer::parse_descr(std::string(dtype_string, 4));
+    char dtype_string[4] = {};
+    is.read(dtype_string, sizeof(dtype_string));
+    RAFT_EXPECTS(is.gcount() == static_cast<std::streamsize>(sizeof(dtype_string)),
+                 "Failed to read dtype header from %s",
+                 filename);
+    auto dtype =
+      raft::numpy_serializer::parse_descr(std::string(dtype_string, sizeof(dtype_string)));

Also applies to: 434-437

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@c/src/neighbors/mg_cagra.cpp` around lines 403 - 406, The 4-byte dtype header
read into dtype_string before calling raft::numpy_serializer::parse_descr is not
validated; check the istream read result (e.g., is.read(...) and then
is.gcount() == 4 or !is.fail()) and handle short/truncated reads by
logging/throwing/returning an error instead of calling parse_descr with partial
data; apply the same validation to the other occurrence that reads 4 bytes so
neither is.read -> parse_descr path can receive garbage.

Comment on lines 400 to 403
char dtype_string[4];
is.read(dtype_string, 4);
auto dtype = raft::detail::numpy_serializer::parse_descr(std::string(dtype_string, 4));
auto dtype = raft::numpy_serializer::parse_descr(std::string(dtype_string, 4));
is.close();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Validate header read length before calling parse_descr in both paths.

Both blocks parse the dtype descriptor immediately after read(...) without checking stream state. Truncated files can lead to undefined behavior due to partially initialized buffers.

💡 Proposed fix (apply in both deserialize and distribute)
-    char dtype_string[4];
-    is.read(dtype_string, 4);
-    auto dtype = raft::numpy_serializer::parse_descr(std::string(dtype_string, 4));
+    char dtype_string[4]{};
+    if (!is.read(dtype_string, sizeof(dtype_string))) {
+      RAFT_FAIL("Invalid or truncated index header in file %s", filename);
+    }
+    auto dtype =
+      raft::numpy_serializer::parse_descr(std::string(dtype_string, sizeof(dtype_string)));

Also applies to: 431-434

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@c/src/neighbors/mg_ivf_flat.cpp` around lines 400 - 403, The code reads 4
bytes into dtype_string and calls raft::numpy_serializer::parse_descr without
validating the read; update both the deserialize and distribute code paths to
check is.read(...) and the stream state (e.g., verify gcount() == 4 or
is.good()/is) before constructing std::string(dtype_string,4) and calling
parse_descr, and handle truncated reads by returning/throwing an error or
reporting via existing error path; reference the dtype_string buffer, the
is.read call, and the parse_descr invocation so you change both occurrences
(around the calls near deserialize and distribute).

Comment on lines 392 to 395
char dtype_string[4];
is.read(dtype_string, 4);
auto dtype = raft::detail::numpy_serializer::parse_descr(std::string(dtype_string, 4));
auto dtype = raft::numpy_serializer::parse_descr(std::string(dtype_string, 4));
is.close();
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Validate read() result before dtype parsing.

Line 393 does not verify that 4 bytes were actually read. A short read can propagate invalid bytes into parse_descr on Line 394.

Proposed fix
-    char dtype_string[4];
-    is.read(dtype_string, 4);
-    auto dtype = raft::numpy_serializer::parse_descr(std::string(dtype_string, 4));
+    char dtype_string[4] = {};
+    is.read(dtype_string, sizeof(dtype_string));
+    RAFT_EXPECTS(is.gcount() == static_cast<std::streamsize>(sizeof(dtype_string)),
+                 "Failed to read dtype header from %s",
+                 filename);
+    auto dtype =
+      raft::numpy_serializer::parse_descr(std::string(dtype_string, sizeof(dtype_string)));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@c/src/neighbors/mg_ivf_pq.cpp` around lines 392 - 395, The code calls is.read
into dtype_string and immediately hands those bytes to
raft::numpy_serializer::parse_descr without checking the read result; update the
logic around dtype_string/is.read to validate that 4 bytes were actually read
(e.g., check is.gcount() == 4 and/or is.good()/is.fail()) before calling
parse_descr, and handle the short-read error path (return an error, throw, or
log and exit) so parse_descr is never called with incomplete data; reference the
dtype_string buffer, the is.read call, and raft::numpy_serializer::parse_descr
when making the change and ensure is.close() still runs in all code paths.

Comment on lines +9 to +10
set(RAFT_FORK "julianmi")
set(RAFT_PINNED_TAG "expose-public-npy-helpers")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "Checking upstream dependency status (rapidsai/raft PR `#3003`)..."
curl -s https://api.github.com/repos/rapidsai/raft/pulls/3003 \
  | jq '{state, merged, merge_commit_sha, head_repo: .head.repo.full_name, head_ref: .head.ref, base_ref: .base.ref}'

echo
echo "Checking whether expose-public-npy-helpers exists in official and fork repos..."
git ls-remote --heads https://github.com/rapidsai/raft.git expose-public-npy-helpers || true
git ls-remote --tags  https://github.com/rapidsai/raft.git expose-public-npy-helpers || true
git ls-remote --heads https://github.com/julianmi/raft.git expose-public-npy-helpers || true
git ls-remote --tags  https://github.com/julianmi/raft.git expose-public-npy-helpers || true

Repository: rapidsai/cuvs

Length of output: 407


🏁 Script executed:

cat -n cpp/cmake/thirdparty/get_raft.cmake

Repository: rapidsai/cuvs

Length of output: 3417


🏁 Script executed:

# Search for any comments or related discussions about this fork usage
rg "expose-public-npy-helpers|RAFT_FORK|julianmi" --type cmake --type markdown -C 3

Repository: rapidsai/cuvs

Length of output: 9046


🏁 Script executed:

# Check git history for when this fork was introduced and any commit messages
git log --oneline -20 -- cpp/cmake/thirdparty/get_raft.cmake

Repository: rapidsai/cuvs

Length of output: 109


🏁 Script executed:

# Search for any open or recent PRs that might explain the fork usage
git log --all --oneline --grep="raft\|RAFT" | head -30

Repository: rapidsai/cuvs

Length of output: 1759


Avoid hard-pinning production builds to a personal RAFT fork/ref.

This introduces a supply-chain risk for CI and releases if that fork/ref is rewritten or removed. The expose-public-npy-helpers branch currently exists in julianmi/raft, but pinning to a personal fork without a documented integration path or timeline creates long-term dependency fragility. Prefer official rapidsai/raft by default, and gate fork/ref overrides behind explicit opt-in CMake cache vars for temporary cross-PR testing.

Suggested hardening
-set(RAFT_FORK "julianmi")
-set(RAFT_PINNED_TAG "expose-public-npy-helpers")
+set(RAFT_FORK "rapidsai")
+set(RAFT_PINNED_TAG "${rapids-cmake-checkout-tag}")
+
+set(CUVS_RAFT_FORK_OVERRIDE "" CACHE STRING "Optional RAFT fork override for temporary testing")
+set(CUVS_RAFT_TAG_OVERRIDE  "" CACHE STRING "Optional RAFT tag/branch override for temporary testing")
+
+if(CUVS_RAFT_FORK_OVERRIDE)
+  set(RAFT_FORK "${CUVS_RAFT_FORK_OVERRIDE}")
+endif()
+if(CUVS_RAFT_TAG_OVERRIDE)
+  set(RAFT_PINNED_TAG "${CUVS_RAFT_TAG_OVERRIDE}")
+endif()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@cpp/cmake/thirdparty/get_raft.cmake` around lines 9 - 10, The CMake variables
RAFT_FORK and RAFT_PINNED_TAG are hard-pinned to a personal fork/branch; change
the defaults to use the official upstream (e.g., "rapidsai/raft") and a safe
default tag (empty or a release tag), and expose RAFT_FORK and RAFT_PINNED_TAG
as CMake CACHE variables so they can only be overridden explicitly via -D on the
command line; additionally gate any non-upstream usage behind an opt-in flag
(e.g., USE_CUSTOM_RAFT or RAFT_USE_FORK) so CI/releases use the official repo by
default and personal-fork overrides are explicitly documented and temporary.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

1 participant