Skip to content

feat(rust): add serialize/deserialize to brute force index#2231

Open
jamie8johnson wants to merge 3 commits into
rapidsai:mainfrom
jamie8johnson:rust-brute-force-serialize
Open

feat(rust): add serialize/deserialize to brute force index#2231
jamie8johnson wants to merge 3 commits into
rapidsai:mainfrom
jamie8johnson:rust-brute-force-serialize

Conversation

@jamie8johnson

Copy link
Copy Markdown
Contributor

Add serialize/deserialize to the Rust brute force index

What

Adds serialize and deserialize to the Rust brute_force::Index, wrapping the
existing C entry points cuvsBruteForceSerialize / cuvsBruteForceDeserialize.
This brings the brute force binding to parity with the CAGRA binding, which already
exposes serialize/deserialize.

  • Index::serialize(&self, res, filename) writes the index to disk.
  • Index::deserialize(res, filename) -> Result<Index> loads an index from disk.

Both methods mirror the CAGRA implementation:

  • A private path_to_cstring helper converts the filesystem path to a CString,
    returning Error::InvalidArgument (instead of panicking) for paths that are not
    valid UTF-8 or that contain an interior NUL byte. The path is validated before any
    FFI call is made.
  • Every FFI call is wrapped in check_cuvs.
  • deserialize constructs the Index handle first (via Index::new()), so that if
    the underlying cuvsBruteForceDeserialize call fails, the handle's Drop still
    runs and releases the C-side index allocation (RAII-safe error path).

The doc comments note that the serialization format may change between cuVS versions,
matching the wording in the C header.

Notes for reviewers

  • No new bindings were generated. cuvsBruteForceSerialize and
    cuvsBruteForceDeserialize are already present in rust/cuvs-sys/src/bindings.rs
    (brute_force is pulled in through core/all.h), so this change is purely additive
    on the safe Rust wrapper side and touches no generated code.
  • Test helper lifetime detail. The brute force Index keeps a non-owning device
    view of its dataset (_dataset). The serialize round-trip test deliberately keeps
    the host ndarray array in the same scope as the index for the duration of the
    test, because the device tensor's shape pointer borrows that array's dimension
    storage. Moving the host array while the index is alive would dangle that pointer
    (this is a property of the existing ManagedTensor view, not of these new methods).
  • Conflicts with sibling in-flight Rust PRs (e.g. the IVF-SQ bindings PR feat(rust): add IVF-SQ bindings #2229
    and other Rust binding PRs): if conflicts arise, resolve by merging main into this
    branch rather than rebasing, per the project's no-rebase contribution guideline.

Testing

Two new unit tests were added alongside the existing test_l2, mirroring the CAGRA
serialize tests:

  • test_brute_force_serialize_deserialize — builds an index, serializes it, asserts
    the output file exists and is non-empty, deserializes it back, and re-verifies that
    a self-neighbor search on the loaded index returns each query as its own nearest
    neighbor.
  • test_brute_force_serialize_rejects_interior_nul — confirms that a path containing
    an interior NUL byte surfaces as Error::InvalidArgument rather than panicking.

All brute force tests pass (run single-threaded on a single GPU):

cargo test -p cuvs brute_force -- --test-threads=1
test brute_force::tests::test_brute_force_serialize_deserialize ... ok
test brute_force::tests::test_brute_force_serialize_rejects_interior_nul ... ok
test brute_force::tests::test_l2 ... ok
test result: ok. 3 passed; 0 failed; 0 ignored

cargo fmt and cargo clippy are clean for the changed file.

On path_to_cstring: intentionally mirrored byte-for-byte from cagra/index.rs (a third copy arrives with the in-flight IVF-SQ PR #2229). Happy to hoist it into a shared module here or as a follow-up once the serialize PRs settle — whichever you prefer.

Wrap the existing C entry points cuvsBruteForceSerialize and
cuvsBruteForceDeserialize in safe Rust methods on brute_force::Index,
bringing the brute force binding to parity with the CAGRA binding.

- Index::serialize(&self, res, filename) writes the index to disk.
- Index::deserialize(res, filename) loads an index from disk, creating
  the Index handle first so failing FFI calls still Drop the C-side
  allocation (RAII-safe error path).
- A path_to_cstring helper rejects non-UTF-8 paths and paths containing
  an interior NUL byte with Error::InvalidArgument before any FFI call,
  mirroring the CAGRA implementation.

No bindings regeneration was needed: both symbols are already present in
cuvs-sys (brute_force is pulled in via core/all.h).

Tests (mirroring the CAGRA serialize tests):
- test_brute_force_serialize_deserialize: round-trips the index to disk,
  asserts the file exists and is non-empty, reloads it, and re-verifies a
  self-neighbor search on the loaded index.
- test_brute_force_serialize_rejects_interior_nul: confirms an interior
  NUL byte in the path surfaces as Error::InvalidArgument, not a panic.
@copy-pr-bot

copy-pr-bot Bot commented Jun 10, 2026

Copy link
Copy Markdown

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: eaf3b9f9-8778-43f5-b5b0-29eb762720b7

📥 Commits

Reviewing files that changed from the base of the PR and between 71cc61a and 7eaa1ef.

📒 Files selected for processing (1)
  • rust/cuvs/src/brute_force.rs

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Brute-force KNN indices can now be serialized to and deserialized from files.
    • Enhanced error handling for invalid file paths.
  • Tests

    • Added comprehensive test coverage for index serialization, deserialization, and error handling scenarios.

Walkthrough

This PR adds file serialization support to the Rust brute-force KNN Index, introducing a safe path-to-CString helper, public serialize/deserialize methods that wrap cuVS C-API calls, and comprehensive tests for round-trip serialization and invalid path handling.

Changes

Index Serialization Support

Layer / File(s) Summary
Serialization API and path helper
rust/cuvs/src/brute_force.rs
Imports CString and Path; adds private path_to_cstring helper that rejects non-UTF-8 and interior-NUL paths with Error::InvalidArgument; adds public Index::serialize and Index::deserialize methods that route paths through the helper and call cuVS FFI, with deserialize allocating the Index handle before FFI to ensure RAII cleanup on error.
Serialization tests
rust/cuvs/src/brute_force.rs
Introduces test helpers for dataset generation and search verification; adds round-trip serialize/deserialize test confirming top-1 self-neighbor behavior after file reload; adds test verifying interior-NUL filenames return Error::InvalidArgument.

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely describes the main change: adding serialize/deserialize functionality to the brute force index in Rust.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, explaining what was added, how it works, testing details, and implementation notes.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@rust/cuvs/src/brute_force.rs`:
- Around line 269-270: The test currently uses a fixed temp filename
("test_brute_force_index.bin") which can collide across parallel runs; replace
that with a unique temporary file using the tempfile crate (e.g., create a
tempfile::NamedTempFile and use its path for serialization), or generate a
unique filename (PID + UUID) in the temp dir; update the code that builds
filepath (currently the std::env::temp_dir().join(...) expression) to use the
NamedTempFile path (or unique name) and ensure the tempfile crate is added to
Cargo.toml if not already present so index.serialize(&res, &filepath) writes to
a non-colliding location.

In `@UPSTREAM_PR_BODY.md`:
- Around line 57-63: The fenced code block in UPSTREAM_PR_BODY.md is missing a
language identifier which triggers MD040; update the triple-backtick fence that
surrounds the cargo test output to include a language (e.g., change ``` to
```text) so the block is explicitly marked as plain text and the markdownlint
warning is resolved.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: af398801-b0ca-4312-8e90-144f5c458bfb

📥 Commits

Reviewing files that changed from the base of the PR and between 78135be and 71cc61a.

📒 Files selected for processing (2)
  • UPSTREAM_PR_BODY.md
  • rust/cuvs/src/brute_force.rs

Comment thread rust/cuvs/src/brute_force.rs Outdated
Comment thread UPSTREAM_PR_BODY.md Outdated
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

1 participant