Switch to public RAFT numpy_serialize APIs by julianmi · Pull Request #2038 · rapidsai/cuvs

julianmi · 2026-04-24T12:53:08Z

cuVS uses parse_descr, read_header, get_numpy_dtype, write_header from raft::detail::numpy_serializer as discussed here. This PR proposes to expose them publicly in RAFT instead and use them throughout cuVS.

There are similar issues with the raft/core/detail/macros.hpp to be address in a separate PR.

Depends on rapidsai/raft#3003

coderabbitai · 2026-04-24T13:00:16Z

📝 Walkthrough

Summary by CodeRabbit

Refactor
- Improved serialization and deserialization logic to use stable public APIs instead of internal implementation details for enhanced long-term compatibility.
Chores
- Updated RAFT dependency configuration and copyright year ranges.

Walkthrough

Multiple C++ source and header files are updated to use the public raft::numpy_serializer API instead of the internal raft::detail::numpy_serializer. The RAFT dependency fork and pinned tag are modified in CMake configuration. Copyright year headers are updated in some files.

Changes

Cohort / File(s)	Summary
C Deserialization Implementations `c/src/neighbors/brute_force.cpp`, `c/src/neighbors/cagra.cpp`, `c/src/neighbors/ivf_flat.cpp`, `c/src/neighbors/mg_cagra.cpp`, `c/src/neighbors/mg_ivf_flat.cpp`, `c/src/neighbors/mg_ivf_pq.cpp`	Updated deserialization functions to call `raft::numpy_serializer::parse_descr` instead of internal `raft::detail::numpy_serializer::parse_descr`; added `raft/core/numpy_serializer.hpp` includes; SPDX copyright year ranges updated in multi-GPU variants.
C++ Serialization Implementations `cpp/src/neighbors/brute_force_serialize.cu`, `cpp/src/neighbors/detail/cagra/cagra_serialize.cuh`, `cpp/src/neighbors/ivf_flat/ivf_flat_serialize.cuh`, `cpp/src/neighbors/mg/snmg.cuh`	Updated serialization functions to use `raft::numpy_serializer::get_numpy_dtype<T>()` instead of internal `raft::detail::numpy_serializer::get_numpy_dtype<T>()`; header includes and SPDX ranges updated.
C++ Load Implementation `cpp/src/neighbors/detail/cagra/cagra_build.cuh`	Updated ACE disk-load logic to read NumPy headers via public `raft::numpy_serializer::read_header` instead of internal API; added corresponding header include.
C++ Public Headers `cpp/include/cuvs/neighbors/cagra.hpp`, `cpp/include/cuvs/util/file_io.hpp`	Updated disk-backed load paths and NumPy file creation to use public `raft::numpy_serializer` namespace instead of internal detail variant; added required header includes.
HNSW Implementation `cpp/src/neighbors/detail/hnsw.hpp`	Updated HNSW serialization to call public `raft::numpy_serializer::read_header` for graph, dataset, and mapping files instead of internal mdspan serializer; adjusted includes accordingly.
Build Configuration `cpp/cmake/thirdparty/get_raft.cmake`	Changed `RAFT_FORK` from `rapidsai` to `julianmi` and pinned `RAFT_PINNED_TAG` to `expose-public-npy-helpers`; SPDX copyright year updated to 2026.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 42.86% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main change: switching from internal to public RAFT numpy_serializer APIs across multiple cuVS files.
Description check	✅ Passed	The description clearly explains the rationale and scope of the changes, including context and dependencies.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 7

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@c/src/neighbors/brute_force.cpp`:
- Around line 242-244: The code reads 4 bytes into dtype_string using is.read
and immediately calls raft::numpy_serializer::parse_descr on those bytes;
however it doesn't check whether the read succeeded. Update the block around
dtype_string/is.read to verify the stream read (e.g., check is.gcount() == 4 or
is.fail()/is.good()) before calling raft::numpy_serializer::parse_descr, and on
failure handle the truncated header by returning an error/throwing an exception
or logging and aborting the parse so parse_descr never receives uninitialized
data.

In `@c/src/neighbors/cagra.cpp`:
- Around line 877-880: The code reads 4 bytes into dtype_string then calls
raft::numpy_serializer::parse_descr without validating the read; first check the
read succeeded (e.g., verify is.read(...) and that is.gcount() == 4 or that the
stream is in a good state) before constructing std::string(dtype_string, 4) and
calling raft::numpy_serializer::parse_descr; if the read fails/returns fewer
than 4 bytes, handle the error (throw, return an error code, or log and abort)
instead of passing potentially truncated data to parse_descr.

In `@c/src/neighbors/ivf_flat.cpp`:
- Around line 303-306: The code reads four bytes into dtype_string then calls
raft::numpy_serializer::parse_descr without verifying the read succeeded;
validate the read length immediately after is.read(dtype_string, 4) (e.g., check
is.gcount() == 4 or test stream state like if (!is || is.gcount() != 4)) and on
short reads/failure throw or return a clear error (or set an error status)
before calling raft::numpy_serializer::parse_descr, so malformed or truncated
files are handled safely.

In `@c/src/neighbors/mg_cagra.cpp`:
- Around line 403-406: The 4-byte dtype header read into dtype_string before
calling raft::numpy_serializer::parse_descr is not validated; check the istream
read result (e.g., is.read(...) and then is.gcount() == 4 or !is.fail()) and
handle short/truncated reads by logging/throwing/returning an error instead of
calling parse_descr with partial data; apply the same validation to the other
occurrence that reads 4 bytes so neither is.read -> parse_descr path can receive
garbage.

In `@c/src/neighbors/mg_ivf_flat.cpp`:
- Around line 400-403: The code reads 4 bytes into dtype_string and calls
raft::numpy_serializer::parse_descr without validating the read; update both the
deserialize and distribute code paths to check is.read(...) and the stream state
(e.g., verify gcount() == 4 or is.good()/is) before constructing
std::string(dtype_string,4) and calling parse_descr, and handle truncated reads
by returning/throwing an error or reporting via existing error path; reference
the dtype_string buffer, the is.read call, and the parse_descr invocation so you
change both occurrences (around the calls near deserialize and distribute).

In `@c/src/neighbors/mg_ivf_pq.cpp`:
- Around line 392-395: The code calls is.read into dtype_string and immediately
hands those bytes to raft::numpy_serializer::parse_descr without checking the
read result; update the logic around dtype_string/is.read to validate that 4
bytes were actually read (e.g., check is.gcount() == 4 and/or
is.good()/is.fail()) before calling parse_descr, and handle the short-read error
path (return an error, throw, or log and exit) so parse_descr is never called
with incomplete data; reference the dtype_string buffer, the is.read call, and
raft::numpy_serializer::parse_descr when making the change and ensure is.close()
still runs in all code paths.

In `@cpp/cmake/thirdparty/get_raft.cmake`:
- Around line 9-10: The CMake variables RAFT_FORK and RAFT_PINNED_TAG are
hard-pinned to a personal fork/branch; change the defaults to use the official
upstream (e.g., "rapidsai/raft") and a safe default tag (empty or a release
tag), and expose RAFT_FORK and RAFT_PINNED_TAG as CMake CACHE variables so they
can only be overridden explicitly via -D on the command line; additionally gate
any non-upstream usage behind an opt-in flag (e.g., USE_CUSTOM_RAFT or
RAFT_USE_FORK) so CI/releases use the official repo by default and personal-fork
overrides are explicitly documented and temporary.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 393379db-9f11-4936-953c-40a2df4cec4c

📥 Commits

Reviewing files that changed from the base of the PR and between f2bffb6 and ae5534e.

📒 Files selected for processing (15)

c/src/neighbors/brute_force.cpp
c/src/neighbors/cagra.cpp
c/src/neighbors/ivf_flat.cpp
c/src/neighbors/mg_cagra.cpp
c/src/neighbors/mg_ivf_flat.cpp
c/src/neighbors/mg_ivf_pq.cpp
cpp/cmake/thirdparty/get_raft.cmake
cpp/include/cuvs/neighbors/cagra.hpp
cpp/include/cuvs/util/file_io.hpp
cpp/src/neighbors/brute_force_serialize.cu
cpp/src/neighbors/detail/cagra/cagra_build.cuh
cpp/src/neighbors/detail/cagra/cagra_serialize.cuh
cpp/src/neighbors/detail/hnsw.hpp
cpp/src/neighbors/ivf_flat/ivf_flat_serialize.cuh
cpp/src/neighbors/mg/snmg.cuh

coderabbitai · 2026-04-24T13:00:22Z

    char dtype_string[4];
    is.read(dtype_string, 4);
-    auto dtype = raft::detail::numpy_serializer::parse_descr(std::string(dtype_string, 4));
+    auto dtype = raft::numpy_serializer::parse_descr(std::string(dtype_string, 4));


⚠️ Potential issue | 🟠 Major

Guard against truncated header reads before parsing dtype.

Line 243 reads 4 bytes but never checks success. On short/corrupt files, Line 244 may parse uninitialized bytes.

💡 Proposed fix

- char dtype_string[4]; - is.read(dtype_string, 4); - auto dtype = raft::numpy_serializer::parse_descr(std::string(dtype_string, 4)); + char dtype_string[4]{}; + if (!is.read(dtype_string, sizeof(dtype_string))) { + RAFT_FAIL("Invalid or truncated index header in file %s", filename); + } + auto dtype = + raft::numpy_serializer::parse_descr(std::string(dtype_string, sizeof(dtype_string)));

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@c/src/neighbors/brute_force.cpp` around lines 242 - 244, The code reads 4 bytes into dtype_string using is.read and immediately calls raft::numpy_serializer::parse_descr on those bytes; however it doesn't check whether the read succeeded. Update the block around dtype_string/is.read to verify the stream read (e.g., check is.gcount() == 4 or is.fail()/is.good()) before calling raft::numpy_serializer::parse_descr, and on failure handle the truncated header by returning an error/throwing an exception or logging and aborting the parse so parse_descr never receives uninitialized data.

coderabbitai · 2026-04-24T13:00:23Z

    char dtype_string[4];
    is.read(dtype_string, 4);
-    auto dtype = raft::detail::numpy_serializer::parse_descr(std::string(dtype_string, 4));
+    auto dtype = raft::numpy_serializer::parse_descr(std::string(dtype_string, 4));



⚠️ Potential issue | 🟠 Major

Guard the dtype prefix read before parse_descr.

Line 878 reads 4 bytes but does not validate read length before parsing on Line 879. Corrupt/truncated files can produce invalid dtype decoding.

Proposed fix

- char dtype_string[4]; - is.read(dtype_string, 4); - auto dtype = raft::numpy_serializer::parse_descr(std::string(dtype_string, 4)); + char dtype_string[4] = {}; + is.read(dtype_string, sizeof(dtype_string)); + RAFT_EXPECTS(is.gcount() == static_cast<std::streamsize>(sizeof(dtype_string)), + "Failed to read dtype header from %s", + filename); + auto dtype = + raft::numpy_serializer::parse_descr(std::string(dtype_string, sizeof(dtype_string)));

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

char dtype_string[4];

is.read(dtype_string, 4);

auto dtype = raft::detail::numpy_serializer::parse_descr(std::string(dtype_string, 4));

auto dtype = raft::numpy_serializer::parse_descr(std::string(dtype_string, 4));

char dtype_string[4] = {};

is.read(dtype_string, sizeof(dtype_string));

RAFT_EXPECTS(is.gcount() == static_cast<std::streamsize>(sizeof(dtype_string)),

"Failed to read dtype header from %s",

filename);

auto dtype =

raft::numpy_serializer::parse_descr(std::string(dtype_string, sizeof(dtype_string)));

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@c/src/neighbors/cagra.cpp` around lines 877 - 880, The code reads 4 bytes into dtype_string then calls raft::numpy_serializer::parse_descr without validating the read; first check the read succeeded (e.g., verify is.read(...) and that is.gcount() == 4 or that the stream is in a good state) before constructing std::string(dtype_string, 4) and calling raft::numpy_serializer::parse_descr; if the read fails/returns fewer than 4 bytes, handle the error (throw, return an error code, or log and abort) instead of passing potentially truncated data to parse_descr.

coderabbitai · 2026-04-24T13:00:23Z

    char dtype_string[4];
    is.read(dtype_string, 4);
-    auto dtype = raft::detail::numpy_serializer::parse_descr(std::string(dtype_string, 4));
+    auto dtype = raft::numpy_serializer::parse_descr(std::string(dtype_string, 4));



⚠️ Potential issue | 🟠 Major

Fail early on short dtype-header reads.

Line 304 reads the dtype prefix without checking byte count before parsing on Line 305. This should be validated to handle malformed files safely.

Proposed fix

- char dtype_string[4]; - is.read(dtype_string, 4); - auto dtype = raft::numpy_serializer::parse_descr(std::string(dtype_string, 4)); + char dtype_string[4] = {}; + is.read(dtype_string, sizeof(dtype_string)); + RAFT_EXPECTS(is.gcount() == static_cast<std::streamsize>(sizeof(dtype_string)), + "Failed to read dtype header from %s", + filename); + auto dtype = + raft::numpy_serializer::parse_descr(std::string(dtype_string, sizeof(dtype_string)));

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@c/src/neighbors/ivf_flat.cpp` around lines 303 - 306, The code reads four bytes into dtype_string then calls raft::numpy_serializer::parse_descr without verifying the read succeeded; validate the read length immediately after is.read(dtype_string, 4) (e.g., check is.gcount() == 4 or test stream state like if (!is || is.gcount() != 4)) and on short reads/failure throw or return a clear error (or set an error status) before calling raft::numpy_serializer::parse_descr, so malformed or truncated files are handled safely.

coderabbitai · 2026-04-24T13:00:23Z

    char dtype_string[4];
    is.read(dtype_string, 4);
-    auto dtype = raft::detail::numpy_serializer::parse_descr(std::string(dtype_string, 4));
+    auto dtype = raft::numpy_serializer::parse_descr(std::string(dtype_string, 4));
    is.close();


⚠️ Potential issue | 🟠 Major

Validate dtype header reads before parsing.

On Line 404 and Line 435, the 4-byte read is not validated. Truncated files can pass partial/garbage bytes into parse_descr, causing invalid dtype dispatch.

Proposed fix

- char dtype_string[4]; - is.read(dtype_string, 4); - auto dtype = raft::numpy_serializer::parse_descr(std::string(dtype_string, 4)); + char dtype_string[4] = {}; + is.read(dtype_string, sizeof(dtype_string)); + RAFT_EXPECTS(is.gcount() == static_cast<std::streamsize>(sizeof(dtype_string)), + "Failed to read dtype header from %s", + filename); + auto dtype = + raft::numpy_serializer::parse_descr(std::string(dtype_string, sizeof(dtype_string))); @@ - char dtype_string[4]; - is.read(dtype_string, 4); - auto dtype = raft::numpy_serializer::parse_descr(std::string(dtype_string, 4)); + char dtype_string[4] = {}; + is.read(dtype_string, sizeof(dtype_string)); + RAFT_EXPECTS(is.gcount() == static_cast<std::streamsize>(sizeof(dtype_string)), + "Failed to read dtype header from %s", + filename); + auto dtype = + raft::numpy_serializer::parse_descr(std::string(dtype_string, sizeof(dtype_string)));

Also applies to: 434-437

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@c/src/neighbors/mg_cagra.cpp` around lines 403 - 406, The 4-byte dtype header read into dtype_string before calling raft::numpy_serializer::parse_descr is not validated; check the istream read result (e.g., is.read(...) and then is.gcount() == 4 or !is.fail()) and handle short/truncated reads by logging/throwing/returning an error instead of calling parse_descr with partial data; apply the same validation to the other occurrence that reads 4 bytes so neither is.read -> parse_descr path can receive garbage.

coderabbitai · 2026-04-24T13:00:23Z

    char dtype_string[4];
    is.read(dtype_string, 4);
-    auto dtype = raft::detail::numpy_serializer::parse_descr(std::string(dtype_string, 4));
+    auto dtype = raft::numpy_serializer::parse_descr(std::string(dtype_string, 4));
    is.close();


⚠️ Potential issue | 🟠 Major

Validate header read length before calling parse_descr in both paths.

Both blocks parse the dtype descriptor immediately after read(...) without checking stream state. Truncated files can lead to undefined behavior due to partially initialized buffers.

💡 Proposed fix (apply in both deserialize and distribute)

- char dtype_string[4]; - is.read(dtype_string, 4); - auto dtype = raft::numpy_serializer::parse_descr(std::string(dtype_string, 4)); + char dtype_string[4]{}; + if (!is.read(dtype_string, sizeof(dtype_string))) { + RAFT_FAIL("Invalid or truncated index header in file %s", filename); + } + auto dtype = + raft::numpy_serializer::parse_descr(std::string(dtype_string, sizeof(dtype_string)));

Also applies to: 431-434

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@c/src/neighbors/mg_ivf_flat.cpp` around lines 400 - 403, The code reads 4 bytes into dtype_string and calls raft::numpy_serializer::parse_descr without validating the read; update both the deserialize and distribute code paths to check is.read(...) and the stream state (e.g., verify gcount() == 4 or is.good()/is) before constructing std::string(dtype_string,4) and calling parse_descr, and handle truncated reads by returning/throwing an error or reporting via existing error path; reference the dtype_string buffer, the is.read call, and the parse_descr invocation so you change both occurrences (around the calls near deserialize and distribute).

coderabbitai · 2026-04-24T13:00:23Z

    char dtype_string[4];
    is.read(dtype_string, 4);
-    auto dtype = raft::detail::numpy_serializer::parse_descr(std::string(dtype_string, 4));
+    auto dtype = raft::numpy_serializer::parse_descr(std::string(dtype_string, 4));
    is.close();


⚠️ Potential issue | 🟠 Major

Validate read() result before dtype parsing.

Line 393 does not verify that 4 bytes were actually read. A short read can propagate invalid bytes into parse_descr on Line 394.

Proposed fix

- char dtype_string[4]; - is.read(dtype_string, 4); - auto dtype = raft::numpy_serializer::parse_descr(std::string(dtype_string, 4)); + char dtype_string[4] = {}; + is.read(dtype_string, sizeof(dtype_string)); + RAFT_EXPECTS(is.gcount() == static_cast<std::streamsize>(sizeof(dtype_string)), + "Failed to read dtype header from %s", + filename); + auto dtype = + raft::numpy_serializer::parse_descr(std::string(dtype_string, sizeof(dtype_string)));

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@c/src/neighbors/mg_ivf_pq.cpp` around lines 392 - 395, The code calls is.read into dtype_string and immediately hands those bytes to raft::numpy_serializer::parse_descr without checking the read result; update the logic around dtype_string/is.read to validate that 4 bytes were actually read (e.g., check is.gcount() == 4 and/or is.good()/is.fail()) before calling parse_descr, and handle the short-read error path (return an error, throw, or log and exit) so parse_descr is never called with incomplete data; reference the dtype_string buffer, the is.read call, and raft::numpy_serializer::parse_descr when making the change and ensure is.close() still runs in all code paths.

coderabbitai · 2026-04-24T13:00:23Z

+set(RAFT_FORK "julianmi")
+set(RAFT_PINNED_TAG "expose-public-npy-helpers")


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash set -euo pipefail echo "Checking upstream dependency status (rapidsai/raft PR `#3003`)..." curl -s https://api.github.com/repos/rapidsai/raft/pulls/3003 \ | jq '{state, merged, merge_commit_sha, head_repo: .head.repo.full_name, head_ref: .head.ref, base_ref: .base.ref}' echo echo "Checking whether expose-public-npy-helpers exists in official and fork repos..." git ls-remote --heads https://github.com/rapidsai/raft.git expose-public-npy-helpers || true git ls-remote --tags https://github.com/rapidsai/raft.git expose-public-npy-helpers || true git ls-remote --heads https://github.com/julianmi/raft.git expose-public-npy-helpers || true git ls-remote --tags https://github.com/julianmi/raft.git expose-public-npy-helpers || true

Repository: rapidsai/cuvs

Length of output: 407

🏁 Script executed:

cat -n cpp/cmake/thirdparty/get_raft.cmake

Repository: rapidsai/cuvs

Length of output: 3417

🏁 Script executed:

# Search for any comments or related discussions about this fork usage rg "expose-public-npy-helpers|RAFT_FORK|julianmi" --type cmake --type markdown -C 3

Repository: rapidsai/cuvs

Length of output: 9046

🏁 Script executed:

# Check git history for when this fork was introduced and any commit messages git log --oneline -20 -- cpp/cmake/thirdparty/get_raft.cmake

Repository: rapidsai/cuvs

Length of output: 109

🏁 Script executed:

# Search for any open or recent PRs that might explain the fork usage git log --all --oneline --grep="raft\|RAFT" | head -30

Repository: rapidsai/cuvs

Length of output: 1759

Avoid hard-pinning production builds to a personal RAFT fork/ref.

This introduces a supply-chain risk for CI and releases if that fork/ref is rewritten or removed. The expose-public-npy-helpers branch currently exists in julianmi/raft, but pinning to a personal fork without a documented integration path or timeline creates long-term dependency fragility. Prefer official rapidsai/raft by default, and gate fork/ref overrides behind explicit opt-in CMake cache vars for temporary cross-PR testing.

Suggested hardening

-set(RAFT_FORK "julianmi") -set(RAFT_PINNED_TAG "expose-public-npy-helpers") +set(RAFT_FORK "rapidsai") +set(RAFT_PINNED_TAG "${rapids-cmake-checkout-tag}") + +set(CUVS_RAFT_FORK_OVERRIDE "" CACHE STRING "Optional RAFT fork override for temporary testing") +set(CUVS_RAFT_TAG_OVERRIDE "" CACHE STRING "Optional RAFT tag/branch override for temporary testing") + +if(CUVS_RAFT_FORK_OVERRIDE) + set(RAFT_FORK "${CUVS_RAFT_FORK_OVERRIDE}") +endif() +if(CUVS_RAFT_TAG_OVERRIDE) + set(RAFT_PINNED_TAG "${CUVS_RAFT_TAG_OVERRIDE}") +endif()

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@cpp/cmake/thirdparty/get_raft.cmake` around lines 9 - 10, The CMake variables RAFT_FORK and RAFT_PINNED_TAG are hard-pinned to a personal fork/branch; change the defaults to use the official upstream (e.g., "rapidsai/raft") and a safe default tag (empty or a release tag), and expose RAFT_FORK and RAFT_PINNED_TAG as CMake CACHE variables so they can only be overridden explicitly via -D on the command line; additionally gate any non-upstream usage behind an opt-in flag (e.g., USE_CUSTOM_RAFT or RAFT_USE_FORK) so CI/releases use the official repo by default and personal-fork overrides are explicitly documented and temporary.

Use to be exposed RAFT numpy_serialize APIs

ae5534e

julianmi requested review from a team as code owners April 24, 2026 12:53

github-project-automation Bot added this to Unstructured Data Processing Apr 24, 2026

coderabbitai Bot reviewed Apr 24, 2026

View reviewed changes

Merge branch 'main' into raft-public-npy-helpers

0d5d280

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Switch to public RAFT numpy_serialize APIs#2038

Switch to public RAFT numpy_serialize APIs#2038
julianmi wants to merge 2 commits intorapidsai:mainfrom
julianmi:raft-public-npy-helpers

julianmi commented Apr 24, 2026

Uh oh!

coderabbitai Bot commented Apr 24, 2026

Summary by CodeRabbit

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Apr 24, 2026

Uh oh!

coderabbitai Bot Apr 24, 2026

Uh oh!

coderabbitai Bot Apr 24, 2026

Uh oh!

coderabbitai Bot Apr 24, 2026

Uh oh!

coderabbitai Bot Apr 24, 2026

Uh oh!

coderabbitai Bot Apr 24, 2026

Uh oh!

coderabbitai Bot Apr 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

-    char dtype_string[4];
-    is.read(dtype_string, 4);
-    auto dtype = raft::detail::numpy_serializer::parse_descr(std::string(dtype_string, 4));
-    auto dtype = raft::numpy_serializer::parse_descr(std::string(dtype_string, 4));
+    char dtype_string[4] = {};
+    is.read(dtype_string, sizeof(dtype_string));
+    RAFT_EXPECTS(is.gcount() == static_cast<std::streamsize>(sizeof(dtype_string)),
+                 "Failed to read dtype header from %s",
+                 filename);
+    auto dtype =
+      raft::numpy_serializer::parse_descr(std::string(dtype_string, sizeof(dtype_string)));

		set(RAFT_FORK "julianmi")
		set(RAFT_PINNED_TAG "expose-public-npy-helpers")

Conversation

julianmi commented Apr 24, 2026

Uh oh!

coderabbitai Bot commented Apr 24, 2026

Summary by CodeRabbit

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Apr 24, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant