Add ACE CAGRA Graph Reordering Option by julianmi · Pull Request #2032 · rapidsai/cuvs

julianmi · 2026-04-17T09:50:11Z

The disk-backed ACE algorithm partitions the dataset while building the CAGRA graph, so the resulting graph uses the reordered index space. Building an HNSW index with from_cagra therefore uses the reordered dataset together with the CAGRA graph. Downstream consumers that build an HNSW index would need the reordered dataset, which is typically large in the cases that call for the disk-backed ACE algorithm. Remapping the graph to the original index space can reduce network transfers for downstream consumers that already have the original dataset available locally.
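To illustrate what remapping to the original index space means, here is a minimal in-memory sketch. It is not the code in this PR: `remap_graph_in_memory` and the `reordered_to_original` mapping are hypothetical names, and the sketch assumes a fixed-degree, row-major graph with 32-bit IDs that fits in host memory, whereas the PR operates on a disk-backed graph.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper for illustration only (not a cuVS API).
// graph:                 row-major, n_rows x degree, neighbor IDs in reordered space
// reordered_to_original: reordered_to_original[r] is the original ID of the
//                        vector stored at reordered position r (a permutation)
// Returns the same graph expressed in original ID space.
std::vector<std::uint32_t> remap_graph_in_memory(
  const std::vector<std::uint32_t>& graph,
  const std::vector<std::uint32_t>& reordered_to_original,
  std::size_t degree)
{
  const std::size_t n_rows = reordered_to_original.size();
  if (graph.size() != n_rows * degree) {
    throw std::invalid_argument("graph size must equal n_rows * degree");
  }

  std::vector<std::uint32_t> remapped(graph.size());
  for (std::size_t r = 0; r < n_rows; ++r) {
    // The row moves to the position of its original ID.
    const std::uint32_t dst_row = reordered_to_original[r];
    if (dst_row >= n_rows) { throw std::out_of_range("mapping entry out of range"); }

    for (std::size_t k = 0; k < degree; ++k) {
      // Each neighbor ID is translated through the same mapping.
      const std::uint32_t neighbor = graph[r * degree + k];
      if (neighbor >= n_rows) { throw std::out_of_range("neighbor ID out of range"); }
      remapped[dst_row * degree + k] = reordered_to_original[neighbor];
    }
  }
  return remapped;
}
```

Both the row position and every neighbor entry go through the mapping, which is why the remapped graph can be paired directly with the original dataset.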

This PR adds an option to remap the CAGRA graph to the original index space using remap_disk_graph_to_original_ids. When the original dataset is passed to from_cagra for a disk-backed index, the graph is first remapped to the original index space, and the HNSW index is then assembled from the remapped graph and the original dataset using serialize_to_hnswlib_with_original_dataset. See examples/cpp/src/cagra_hnsw_ace_example.cu, which uses REMAP_GRAPH_TO_ORIGINAL_IDS, for an example.

hnsw::build() is unchanged. To use this option, call cagra::build with ACE disk parameters and then call from_cagra with the original dataset.

The cuvs_cagra_hnswlib bench wrapper and JSON config gain a use_original_id_graph flag to benchmark these changes. The reordering is mostly I/O bound, so its cost scales roughly with the disk specifications and the graph size. The remapping overhead is typically smaller than the cost of sending the reordered dataset over a slow network interface.
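For a rough sense of the trade-off, the following sketch compares the bytes involved. It is a back-of-the-envelope aid, not a measurement: the helper names are hypothetical, it assumes one 32-bit ID per graph edge and a dense dataset with a fixed element size, and it ignores any extra passes the actual remapping may make over the graph.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only (not part of cuVS).

// Bytes of a fixed-degree graph, assuming one 32-bit ID per edge.
constexpr std::uint64_t graph_bytes(std::uint64_t n_rows, std::uint64_t degree)
{
  return n_rows * degree * sizeof(std::uint32_t);
}

// Bytes of a dense dataset with `dim` elements of `elem_size` bytes per row.
constexpr std::uint64_t dataset_bytes(std::uint64_t n_rows,
                                      std::uint64_t dim,
                                      std::uint64_t elem_size)
{
  return n_rows * dim * elem_size;
}

// Seconds needed to move `bytes` at a sustained rate of `bytes_per_second`.
constexpr double transfer_seconds(std::uint64_t bytes, double bytes_per_second)
{
  return static_cast<double>(bytes) / bytes_per_second;
}
```

Plugging in the graph size with local disk throughput and the dataset size with network throughput gives a first estimate of when remapping is the cheaper path. The benchmark flag above is the way to confirm it on real hardware.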

- Passing the original dataset to `from_cagra` reorders the CAGRA graph on disk to original ID space.
- Added a new benchmarking parameter `use_original_id_graph` to control the remapping behavior.

copy-pr-bot · 2026-04-17T09:50:14Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

cjnolet

We are planning to start using our new "Dataset" abstraction to represent datasets, and will be deprecating the mdspan-based datasets. Let's please wait until that lands so we can adjust accordingly.

For reordering, we will want to make a new dataset and update that accordingly.

cjnolet · 2026-04-17T10:37:21Z

 #include <cuvs/neighbors/hnsw.hpp>
+#include <cuvs/util/file_io.hpp>
+
+#include <raft/core/detail/mdspan_numpy_serializer.hpp>


Definitely should not be using detail APIs from RAFT.

If this is really needed in cuVS, it should be properly exposed through a public API.

Thanks, this makes sense to me. We use parse_descr, read_header, get_numpy_dtype, write_header from raft::detail::numpy_serializer throughout the codebase. These would be good candidates to expose.

Yeah, so my point is that we can't be using internal APIs. They will need to be exposed before the PR can be merged. But it's also important that public APIs are also generally reusable and we aren't just exposing them to avoid exposing internal APIs.

julianmi added 2 commits April 17, 2026 11:22

Add ACE graph reordering option

ac14c93

- Passing the original dataset to `from_cagra` reorders the CAGRA graph on disk to original ID space.
- Added a new benchmarking parameter `use_original_id_graph` to control the remapping behavior.

hnsw::build keeps using the reordered graph

5fff527

github-project-automation Bot added this to Unstructured Data Processing Apr 17, 2026

cjnolet requested changes Apr 17, 2026

View reviewed changes

aamijar assigned julianmi Apr 21, 2026

aamijar added non-breaking Introduces a non-breaking change improvement Improves an existing functionality labels Apr 21, 2026

aamijar moved this to In Progress in Unstructured Data Processing Apr 21, 2026

julianmi added 6 commits April 21, 2026 08:14

Add C++ testing of the graph reordering

132807e

Add Python interface and test

aa5aa8a

Formatting

47cb5ac

Add mem limit benchmarking support

35f8787

Merge remote-tracking branch 'upstream/main' into ace-reorder-graph

e4c10a0

Add ACE reordering example

e1d28ca

julianmi wants to merge 8 commits into rapidsai:main from julianmi:ace-reorder-graph
