Multi-segment CAGRA search #2035
Introduces a globally-optimized path for searching across multiple CAGRA
index segments without per-segment device-to-host copies.
- c/include/cuvs/selection/select_k.h, c/src/selection/select_k.cpp:
new C API function cuvsSelectK that wraps cuvs::selection::select_k
for selecting the k smallest float values from a flat device array
- BufferedCagraSearch (internal interface): searchIntoBuffer() writes
per-segment CAGRA results into a slice of a caller-owned device buffer
without syncing the stream or copying to host
- CagraIndexImpl: implements BufferedCagraSearch; searchIntoBuffer()
computes byte offsets into the global buffer using segmentIdx * topK
- SelectKHelper: Panama FFI binding for cuvsSelectK
- MultiSegmentCagraSearch: orchestrates the full pipeline — queue all
per-segment searches, sync once, run cuvsSelectK on GPU, sync again,
single D2H copy, decode results
- MultiSegmentSearchResults: simple result carrier with count,
segmentIndices, ordinals, and distances arrays
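The pipeline's final select-and-decode step can be modeled on the CPU: per-segment top-k hits sit contiguously in one flat buffer (segment s occupies rows [s*k, (s+1)*k)), the k globally smallest distances are selected, and each winning flat position is mapped back to a (segmentIndex, ordinal) pair. A rough Java sketch, with invented class and record names; the real selection runs on the GPU via cuvsSelectK:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// CPU model of the GPU-side global selection step: per-segment top-k results
// live contiguously in one flat buffer, and the global top-k is picked from
// the whole buffer in one pass, then decoded back to (segment, ordinal).
public class GlobalTopKModel {

  public record Hit(int segment, int ordinal, float distance) {}

  public static List<Hit> selectK(float[] flatDistances, int[] flatOrdinals, int k) {
    int n = flatDistances.length;
    List<Integer> order = new ArrayList<>();
    for (int i = 0; i < n; i++) order.add(i);
    // Sort candidate positions by distance (stand-in for the device select-k).
    order.sort(Comparator.comparingDouble(i -> flatDistances[i]));
    List<Hit> out = new ArrayList<>();
    for (int j = 0; j < Math.min(k, n); j++) {
      int flat = order.get(j);
      // flat / k recovers the segment because each segment owns k slots.
      out.add(new Hit(flat / k, flatOrdinals[flat], flatDistances[flat]));
    }
    return out;
  }
}
```

The decode arithmetic is the interesting part: because every segment writes exactly k rows at a fixed offset, the flat winning index alone identifies both the segment and the within-segment hit.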
Implements concurrent per-segment GPU search that eliminates per-segment
device-to-host copies and CPU blocking on workspace deallocation.
Key changes:
- CudaStreamPool: fixed-size pool of non-blocking CUDA streams (one
cuvsResources_t per slot). Segments are assigned to slots via
round-robin so searches on different slots run concurrently. Pool size
defaults to 8 and is overridden via the system property
com.nvidia.cuvs.streamPoolSize.
- MultiSegmentCagraSearch: single-query search across N index segments
using the stream pool. All per-segment CAGRA kernels write into a
shared device buffer (no per-segment D2H copy or stream sync). A
single cuvsSelectK call finds the global top-k entirely on GPU, then
one D2H copy transfers the results.
- BufferedCagraSearch / CagraIndexImpl: new searchIntoBuffer() method
that queues a CAGRA search kernel on a caller-supplied stream and
writes results at a given row offset into a pre-allocated device
buffer.
- cuvsRMMAsyncMemoryResourceEnable() (C API + Java bindings): switches
the current device memory resource to cuda_async_memory_resource so
that workspace deallocations issued by CAGRA's search plan destructor
are stream-ordered and non-blocking. Without this, cudaFree serializes
kernel launches across streams regardless of stream assignment,
nullifying the stream pool benefit.
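The round-robin slot assignment can be sketched in plain Java. Only the system property name and the default of 8 come from the text above; everything else is an illustrative stand-in, not the real CudaStreamPool:

```java
import java.util.concurrent.atomic.AtomicInteger;

// Round-robin assignment of segments to stream-pool slots. Each slot would
// own one non-blocking CUDA stream and one cuvsResources_t; here we only
// model the slot arithmetic.
public class SlotAssignment {
  private final int poolSize;
  private final AtomicInteger counter = new AtomicInteger();

  public SlotAssignment() {
    // Pool size defaults to 8, overridable via the system property.
    this.poolSize = Integer.getInteger("com.nvidia.cuvs.streamPoolSize", 8);
  }

  /** Returns the slot for the next segment; wraps around the pool. */
  public int nextSlot() {
    return Math.floorMod(counter.getAndIncrement(), poolSize);
  }

  public int size() {
    return poolSize;
  }
}
```

Segments assigned to different slots land on different streams and can run concurrently; segments sharing a slot serialize behind each other on that slot's stream.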
CudaStreamPool was a static singleton shared across all threads. With
multiple concurrent query threads, calls to MultiSegmentCagraSearch.search()
would alias onto the same pool slots after Math.floorMod, causing concurrent
cudaEventRecord and cudaStreamWaitEvent calls on the same event handle
(undefined behavior) and cross-thread stream interference. Additionally,
CudaStreamPool.closeInstance() was called from CuVSResourcesImpl.close(),
so whichever thread closed its resources first would destroy the shared pool
while other threads were still using it.
Fix: make CudaStreamPool a per-CuVSResources instance rather than a static
singleton. One pool is created and owned by each CuVSResourcesImpl; it is
closed when that instance is closed. Since CuVSResources is thread-local in
the Lucene integration, each query thread gets its own independent set of
streams and events with no sharing or locking required.
- CudaStreamPool: remove static singleton (getOrCreate, closeInstance, static
volatile instance, static AtomicInteger slotCounter); add package-private
constructor; replace static slotCounter with an instance int and a new
nextSlot(int count) method.
- CuVSResourcesImpl: add final CudaStreamPool streamPool field (sized from
com.nvidia.cuvs.streamPoolSize system property); close it directly in
close(); add static getStreamPool(CuVSResources) helper for
MultiSegmentCagraSearch to retrieve the per-resources pool.
- MultiSegmentCagraSearch: get pool via CuVSResourcesImpl.getStreamPool
and advance via pool.nextSlot. Remove redundant cuvsStreamSync after
cuvsSelectK — the D2H copies are enqueued on the same stream so CUDA
ordering already serializes them. Replace three separate hostArena.allocate
calls with one contiguous allocation (Long.BYTES-aligned) sliced into three
typed views, reducing OS-level allocation overhead per query.
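The one-allocation-three-views pattern can be sketched with the Panama FFM API (JDK 22). This is a hypothetical helper: the element types mirror the result carrier described earlier (int segment indices, int ordinals, float distances), but the real layout may differ:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;

// One contiguous host allocation sliced into three typed views, replacing
// three separate Arena.allocate calls. Long.BYTES alignment satisfies all
// three element types, so the slices can start at any of the computed offsets.
public class ResultBuffers {
  public static MemorySegment[] sliceThree(Arena arena, int k) {
    long segBytes = (long) k * Integer.BYTES;   // segment indices
    long ordBytes = (long) k * Integer.BYTES;   // ordinals
    long distBytes = (long) k * Float.BYTES;    // distances
    MemorySegment all = arena.allocate(segBytes + ordBytes + distBytes, Long.BYTES);
    return new MemorySegment[] {
      all.asSlice(0, segBytes),
      all.asSlice(segBytes, ordBytes),
      all.asSlice(segBytes + ordBytes, distBytes)
    };
  }
}
```

A single allocation means one OS-level reservation per query instead of three, while the typed slices keep the call sites unchanged.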
The persistent kernel runner was previously keyed on (dataset_desc, graph, fixed search params), which forced a destroy/recreate cycle for every segment when searching a multi-segment Lucene index: each segment has a different graph pointer and potentially a different auto-computed max_iterations, producing a different hash on every call.

C++ changes (search_single_cta_kernel-inl.cuh):
- Move dataset_desc_ptr, graph_ptr, and graph_degree from fixed runner state into per-job fields in job_desc_t. The persistent kernel reads them from the job descriptor, so one runner instance can serve any number of segments without being rebuilt.
- Remove dataset_desc and graph arguments from persistent_runner_t constructor and calculate_parameter_hash; the runner is now keyed only on fixed kernel parameters (block_size, smem_size, itopk, etc.).
- Update select_and_run to initialize the device descriptor on the caller's stream and synchronize before submission, then pass dd_dev_ptr, graph.data_handle(), and graph_degree to runner::launch.
- Remove dd_host from persistent_runner_t; dataset upload is now the caller's responsibility on each launch.

Java changes:
- Add persistent, persistentLifetime, and persistentDeviceUsage fields, getters, and Builder methods to CagraSearchParams.
- Wire the three persistent params through CuVSParamsHelper into the Panama-generated cuvsCagraSearchParams struct.
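The re-keying can be pictured as a cache whose key contains only the fixed launch parameters, while per-segment state travels in the job descriptor. A hypothetical Java model (field and record names are invented; the real code is C++ in search_single_cta_kernel-inl.cuh):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Model of the runner-cache change: the cache key holds only fixed kernel
// parameters, while per-segment state (descriptor pointer, graph pointer,
// graph degree) rides along in the per-job descriptor instead.
public class RunnerCacheModel {
  // Fixed launch parameters: identical for every segment of a query.
  public record RunnerKey(int blockSize, int smemSize, int itopk) {}

  // Per-job fields the persistent kernel now reads from the job descriptor.
  public record JobDesc(long datasetDescPtr, long graphPtr, int graphDegree) {}

  private final Map<RunnerKey, String> runners = new ConcurrentHashMap<>();
  private int creations = 0;

  /** Returns a cached runner for the key, creating one only on first use. */
  public String runnerFor(RunnerKey key) {
    return runners.computeIfAbsent(key, k -> "runner-" + (++creations));
  }
}
```

Two segments with different graph pointers but identical launch parameters map to the same key, so the runner is created once per query shape rather than once per segment.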
…raSearch

In persistent mode, `searchIntoBuffer` blocks on the CPU until the GPU signals completion via a system-scope atomic. Previously, segments were searched sequentially, so the GPU processed one segment at a time per query, leaving its job queue mostly idle between segment dispatches.

Submit one async task per pool slot so all slots' segment searches are in-flight simultaneously. The persistent runner's job queue can hold all N segment jobs at once, allowing GPU workers to execute segments in parallel (bounded by worker_queue_size).

Segments are grouped by pool slot rather than submitted one-per-segment to prevent concurrent access to the same cuvsResources_t handle: the descriptor_cache stored inside the RAFT resources object is not thread-safe, and multiple threads calling cuvsCagraSearch with the same handle causes a SIGSEGV. Grouping ensures each cuvsResources_t is accessed by at most one thread at a time.

Effective parallelism is min(numSegments, pool.size()); increasing cuvsStreamPoolSize raises the ceiling. In non-persistent mode the existing sequential loop is unchanged: kernel launches are asynchronous and return immediately, so Java-level parallelism adds overhead without benefit.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
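The grouping rule described above can be sketched in plain Java: segment seg goes to slot seg % poolSize, and one task per non-empty group would then walk its segments sequentially, so no two threads ever touch the same slot's cuvsResources_t. This is an illustrative model, not the actual MultiSegmentCagraSearch code:

```java
import java.util.ArrayList;
import java.util.List;

// Grouping segments by pool slot: slot s receives segments s, s+P, s+2P, ...
// One async task per group keeps each cuvsResources_t single-threaded while
// up to min(numSegments, poolSize) groups run concurrently.
public class SlotGrouping {
  public static List<List<Integer>> groupBySlot(int numSegments, int poolSize) {
    List<List<Integer>> groups = new ArrayList<>();
    for (int s = 0; s < poolSize; s++) groups.add(new ArrayList<>());
    for (int seg = 0; seg < numSegments; seg++) {
      groups.get(seg % poolSize).add(seg);
    }
    return groups;
  }
}
```

With 5 segments and an 8-slot pool, 5 groups are non-empty, matching the stated effective parallelism of min(numSegments, pool.size()).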
Multi-segment search (C++/C/Java):
- Add search_multi_segment() C++ function: builds a single search plan sized for the largest segment, packs per-segment descriptors, and launches search_kernel_ms — a new SINGLE_CTA kernel with grid dimensions (1, num_queries, num_segments) so each CTA independently searches one (query, segment) pair in a single kernel call.
- Add cuvsCagraSearchMultiSegment() C API wrapping the above.
- Simplify MultiSegmentCagraSearch.java to call cuvsCagraSearchMultiSegment() unconditionally, removing the previous per-segment stream pool, thread pool, CUDA event synchronization, and persistent/non-persistent branching. The search now completes in four phases: multi-segment kernel, GPU-side select-k, single D2H copy, result decoding.
- Add BufferedCagraSearch.getIndexHandle() to expose the raw cuvsCagraIndex_t handle needed by the multi-segment kernel dispatch.

Workspace pool (C/Java):
- Add cuvsResourcesSetWorkspacePool(): configures the per-resources temporary workspace as an uncapped RMM pool that grows without shrinking. After warmup, cuvsRMMAlloc/cuvsRMMFree hit the pool cache instead of calling cudaMallocAsync/cudaFreeAsync, eliminating CUDA context lock contention under concurrent query threads.
- Route cuvsRMMAlloc/cuvsRMMFree through the workspace resource so Java-side output buffer allocations also benefit from the pool.
- Expose setWorkspacePool() in the CuVSResources Java interface with implementations in CuVSResourcesImpl and SynchronizedCuVSResources.

Refactoring (search_single_cta_kernel-inl.cuh):
- Extract TopkVariant enum and select_topk_variant() helper, shared by search_kernel_config and search_kernel_config_ms, replacing duplicated if/else trees in both choose_itopk_and_mx_candidates() bodies.
- Extract kernel_dispatch_params::compute() to centralize max_candidates and max_itopk computation shared by select_and_run and select_and_run_multi_segment.
- Extract hashmap_element_count() static helper on the search struct, used by both set_params (single-segment) and run_multi_segment.
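The (1, num_queries, num_segments) grid assigns each CTA one (query, segment) pair. A small Java model of how CTA coordinates could map to distinct output offsets in the shared result buffer; the segment-major layout here is an assumption for illustration, not taken from the kernel source:

```java
// Model of the search_kernel_ms grid: gridDim = (1, numQueries, numSegments),
// so blockIdx.y selects the query and blockIdx.z the segment. Each CTA writes
// its k results at a distinct, non-overlapping offset in the shared buffer.
public class GridModel {
  /** Flat output offset (in elements) for CTA (query = y, segment = z). */
  public static long outputOffset(int y, int z, int numQueries, int k) {
    // Assumed segment-major layout: all of segment z's query rows precede
    // segment z+1's rows; each (query, segment) pair owns exactly k slots.
    return ((long) z * numQueries + y) * k;
  }
}
```

Because offsets are disjoint per (y, z), every CTA can write without coordination, which is what lets one kernel launch cover all segments.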
Add HALF(2) to the DataType enum and wire it through the Java layer so
callers can build and exchange float16 matrices without any Java-side
type conversion:
- CuVSMatrix.Builder: new addVector(short[]) overload; each element is
a raw IEEE 754 binary16 bit pattern held in a short
- LinkerHelper: add C_SHORT via canonicalLayouts("short")
- CuVSMatrixBaseImpl: map HALF → C_SHORT in valueLayoutFromType();
decode DLPack (kDLFloat, bits=16) → DataType.HALF in
dataTypeFromTensor()
- CuVSMatrixInternal: HALF maps to the same kDLFloat DLPack type code
as FLOAT; the bits field (16 vs 32) distinguishes them on the C side
- JDKProvider.MatrixBuilder: implement addVector(short[]) following the
same MemorySegment.ofArray pattern as the other overloads
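For callers producing the short[] bit patterns, JDK 20+ already provides Float.floatToFloat16 and Float.float16ToFloat, so no external half-float library is needed. A small illustrative helper (not part of the cuVS API):

```java
// Raw IEEE 754 binary16 bit patterns carried in Java shorts, matching the
// addVector(short[]) overload described above: each short is the bit pattern,
// not a numeric value.
public class HalfBits {
  public static short[] toHalf(float[] v) {
    short[] out = new short[v.length];
    for (int i = 0; i < v.length; i++) out[i] = Float.floatToFloat16(v[i]);
    return out;
  }

  public static float[] fromHalf(short[] bits) {
    float[] out = new float[bits.length];
    for (int i = 0; i < bits.length; i++) out[i] = Float.float16ToFloat(bits[i]);
    return out;
  }
}
```

This keeps the Java side conversion-free in the hot path: vectors can be converted once at ingest and handed to the builder as opaque bit patterns.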
Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.
📝 Walkthrough

This pull request introduces multi-segment CAGRA search functionality, GPU-based selection, and RMM memory pool management. It adds C and C++ APIs for multi-segment approximate nearest neighbor search, async memory allocation, and select-K operations, alongside corresponding Java bindings and internal implementations supporting concurrent per-segment GPU processing and stream-based result aggregation.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes
Actionable comments posted: 17
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
java/cuvs-java/src/main/java/com/nvidia/cuvs/CagraSearchParams.java (1)
293-322: ⚠️ Potential issue | 🟡 Minor

The `toString()` method does not include the new persistent fields.

The three new fields (`persistent`, `persistentLifetime`, `persistentDeviceUsage`) are not included in the `toString()` output. While not critical, this creates an inconsistency where debugging output won't show the full configuration.

🔧 Proposed fix to include persistent fields in toString()
```diff
  + ", randXORMask=" + randXORMask
+ + ", persistent="
+ + persistent
+ + ", persistentLifetime="
+ + persistentLifetime
+ + ", persistentDeviceUsage="
+ + persistentDeviceUsage
  + "]";
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@java/cuvs-java/src/main/java/com/nvidia/cuvs/CagraSearchParams.java` around lines 293 - 322, The toString() in class CagraSearchParams currently omits the new fields persistent, persistentLifetime, and persistentDeviceUsage; update the CagraSearchParams.toString() method to append these three fields (with their names and values) into the returned string (same formatting style as the other fields) so debugging output shows the full configuration.
🧹 Nitpick comments (3)
java/cuvs-java/src/test/java/com/nvidia/cuvs/CheckedCuVSResources.java (1)
59-62: Add destroyed-state guard in the new delegate method.
`setWorkspacePool(...)` should call `checkNotDestroyed()` before delegating, consistent with this wrapper's defensive contract.

Proposed change

```diff
  @Override
  public void setWorkspacePool(long sizeBytes) {
+   checkNotDestroyed();
    inner.setWorkspacePool(sizeBytes);
  }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@java/cuvs-java/src/test/java/com/nvidia/cuvs/CheckedCuVSResources.java` around lines 59 - 62, The new delegate method setWorkspacePool(long sizeBytes) is missing the wrapper's destroyed-state guard; modify CheckedCuVSResources.setWorkspacePool to call checkNotDestroyed() at the start (before delegating to inner.setWorkspacePool(sizeBytes)) so it matches the class's defensive contract and uses the same destroyed-state check as other methods.

java/cuvs-java/src/main/java/com/nvidia/cuvs/CagraSearchParams.java (1)
527-537: Consider adding validation for `persistentDeviceUsage` bounds.

The Javadoc states the value "must be greater than 0.0 and not greater than 1.0", but the setter doesn't validate this constraint. Invalid values would only fail at the native layer.
🛡️ Optional: Add validation in the builder
```diff
  public Builder withPersistentDeviceUsage(float persistentDeviceUsage) {
+   if (persistentDeviceUsage <= 0.0f || persistentDeviceUsage > 1.0f) {
+     throw new IllegalArgumentException(
+         "persistentDeviceUsage must be > 0.0 and <= 1.0, got: " + persistentDeviceUsage);
+   }
    this.persistentDeviceUsage = persistentDeviceUsage;
    return this;
  }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@java/cuvs-java/src/main/java/com/nvidia/cuvs/CagraSearchParams.java` around lines 527 - 537, The builder setter withPersistentDeviceUsage currently assigns persistentDeviceUsage without enforcing the Javadoc constraint; update Builder.withPersistentDeviceUsage to validate that persistentDeviceUsage > 0.0f and <= 1.0f and throw an IllegalArgumentException (including the invalid value in the message) when the check fails so invalid values are rejected early before reaching native code.

cpp/src/neighbors/detail/cagra/cagra_search.cuh (1)
388-417: Consider hoisting `query_norms` allocation outside the loop for the CosineExpanded metric.

When multiple segments use the `CosineExpanded` metric, `query_norms` is allocated and computed redundantly for each segment. Since queries are the same across segments (repeated query vector), the norms could be computed once.

This is a minor optimization opportunity. The current implementation is correct; the redundant computation is bounded by the number of segments.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@cpp/src/neighbors/detail/cagra/cagra_search.cuh` around lines 388 - 417, Hoist the allocation and computation of query_norms out of the segment loop and reuse it for every segment whose indices[i]->metric() == cuvs::distance::DistanceType::CosineExpanded: allocate query_norms once (using raft::make_device_vector) before the for-loop, run raft::linalg::reduce on the shared queries data to compute the norms, then inside the loop call raft::linalg::matrix_vector_op (as currently done) using the precomputed query_norms for each CosineExpanded segment; after the loop free or let query_norms go out of scope. Ensure you still call cuvs::neighbors::ivf::detail::postprocess_distances for non-CosineExpanded branches and keep all existing ops (raft::compose_op, raft::sq_op, raft::div_const_op, raft::cast_op, raft::add_const_op, raft::div_checkzero_op) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@c/include/cuvs/core/c_api.h`:
- Around line 229-240: The current implementation of
cuvsRMMAsyncMemoryResourceEnable stores cuda_async_memory_resource in a
thread_local async_mr and passes it to rmm::mr::set_current_device_resource(),
creating a lifetime mismatch vs the API doc that says the change is global;
change the implementation so the async memory resource is process-scoped (not
thread_local) by replacing thread_local async_mr with a static/process-global
instance (e.g., a static unique_ptr or static object) so it outlives the thread
and remains valid for rmm::mr::set_current_device_resource(), and ensure
cuvsRMMAsyncMemoryResourceEnable and any cleanup use that same process-global
symbol (cuda_async_memory_resource / async_mr) when setting or resetting the
current device resource; alternatively, if thread-local semantics are intended,
update the cuvsRMMAsyncMemoryResourceEnable documentation to state it only
affects the calling thread and keep async_mr thread_local.
In `@c/src/neighbors/cagra.cpp`:
- Around line 710-733: The loop currently casts every indices[i]->addr and
builds device views without validating each segment's types/devices; update the
loop that fills idx_vec, q_vec, n_vec, d_vec to perform the same per-segment
checks used in cuvsCagraSearch: assert indices[i] != nullptr and addr != 0
(already present), then verify indices[i]->dtype.code == kDLFloat &&
indices[i]->dtype.bits == 32 for every i, and validate that queries[i],
neighbors[i], and distances[i] are device-backed DLPack tensors with the
expected element types (float for queries/distances, uint32 for neighbors)
before calling reinterpret_cast<const IndexT*>(indices[i]->addr) and
cuvs::core::from_dlpack to populate q_vec[i], n_vec[i], d_vec[i]; replace the
blind casts with RAFT_EXPECTS that include i in the error messages so a bad
segment fails fast and clearly.
- Around line 726-736: The loop that builds idx_vec/q_vec/n_vec/d_vec must also
validate that all segment indices use the same distance metric as the first
segment to prevent mixing incompatible metrics; inside the for-loop in cagra.cpp
(after the existing RAFT_EXPECTS that checks indices[i] non-null) compare
indices[i]->metric (or the actual metric field name on your IndexT struct/class)
to indices[0]->metric and fail fast with RAFT_EXPECTS (or equivalent) and a
clear message like "Mixed distance metrics across segments: expected %s but got
%s at segment %u"; keep this check before pushing idx_vec[i] and before calling
cuvs::neighbors::cagra::search_multi_segment so the function only runs when all
segments share the same metric.
In `@c/src/selection/select_k.cpp`:
- Around line 14-40: cuvsSelectK dereferences shape[1] and casts buffers without
validating the DLPack tensors; add explicit validation at the top of cuvsSelectK
for in_val, out_val, out_idx (non-null), then check each
DLManagedTensor->dl_tensor for expected ndim (==2), shapes (rows match expected
1 or compatible), dtype (in_val/out_val float32, out_idx int64), device type
(CUDA) and device id, byte_offset == 0, and contiguous row-major
strides/compatibility before creating device views with
raft::make_device_matrix_view; if any check fails return an appropriate
cuvsError_t (or throw inside translate_exceptions) instead of proceeding to
casts and calling cuvs::selection::select_k so malformed callers cannot crash or
corrupt memory.
In `@cpp/include/cuvs/neighbors/cagra.hpp`:
- Around line 1752-1806: Add Doxygen documentation for each overloaded
search_multi_segment declaration so they appear in generated API docs; either
add brief doxygen blocks above each overload or use `@copydoc` to reference the
primary search_multi_segment doc block (e.g., use `@copydoc`
search_multi_segment(raft::resources const&,
cuvs::neighbors::cagra::search_params const&, const std::vector<const
cuvs::neighbors::cagra::index<float, uint32_t>*>&, const
std::vector<raft::device_matrix_view<const float, int64_t, raft::row_major>>&,
const std::vector<raft::device_matrix_view<int64_t, int64_t, raft::row_major>>&,
const std::vector<raft::device_matrix_view<float, int64_t, raft::row_major>>&))
for the overloads with half/int8_t/uint8_t and uint32_t/int64_t neighbor types
so each signature (the overloads of search_multi_segment) is documented.
- Around line 1744-1806: The header declares the template parameter order as <T,
OutputIdxT, IdxT> but the instantiation macro (used with <T, IdxT, OutputIdxT>
e.g. (data_t, uint32_t, int64_t)) expects <T, IdxT, OutputIdxT>, causing the
IdxT/OutputIdxT swap that trips the static_assert in cagra_search.cuh:276 and
breaks link-time overloads for int64_t; fix by changing the template parameter
order in the search_multi_segment declarations to <T, IdxT, OutputIdxT> (or
alternatively update the instantiation macro to match the declared order) so
types map correctly, and add missing Doxygen for overloads 2–8 by inserting a
`@copydoc` search_multi_segment (or equivalent documentation block) above each of
those overloaded search_multi_segment declarations to satisfy the public API
docs requirement.
In `@cpp/src/neighbors/cagra.cuh`:
- Around line 409-420: The wrapper search_multi_segment currently forwards
indices, queries, neighbors, and distances without validation; add the same
upfront shape checks used by search() before calling
cagra::detail::search_multi_segment: verify indices.size() == queries.size() ==
neighbors.size() == distances.size(), then for each segment i ensure
queries[i].n_rows == neighbors[i].n_rows, neighbors[i].n_cols ==
distances[i].n_cols (k matches), queries[i].n_cols equals the index dimension
for indices[i] (or indices[i]->dim() / appropriate accessor), and that k > 0 and
dims are consistent; if any check fails, return/throw a clear error (or use
RAFT/CUASSERT used elsewhere) rather than forwarding to the detail
implementation.
In `@java/cuvs-java/src/main/java/com/nvidia/cuvs/CuVSResources.java`:
- Around line 60-76: The Java API should validate workspace pool sizes before
calling native code: update the Javadoc for setWorkspacePool to state the valid
range is > 0, and in the implementation of setWorkspacePool (the method that
currently invokes the native cuvsResourcesSetWorkspacePool) add a check that
rejects non-positive values (<= 0) by throwing an appropriate Java exception
(e.g., IllegalArgumentException) with a clear message; only call
cuvsResourcesSetWorkspacePool when the value is positive to avoid
signed->unsigned wraparound in native size_t.
In `@java/cuvs-java/src/main/java/com/nvidia/cuvs/MultiSegmentSearchResults.java`:
- Around line 24-27: MultiSegmentSearchResults currently stores native uint32
ordinals in an int[] (field ordinals) which corrupts values > Integer.MAX_VALUE;
change ordinals from int[] to long[] (and any constructor/getter signatures) and
ensure code that decodes native uint32_t values writes unsigned values into the
long (e.g., value & 0xFFFFFFFFL) so ordinals remain non-negative; update any
consumers (notably MultiSegmentCagraSearch) to compare against a long sentinel
(e.g., -1L) or otherwise handle long ordinals instead of treating negative ints
as sentinels.
In `@java/cuvs-java/src/main/java/com/nvidia/cuvs/SynchronizedCuVSResources.java`:
- Around line 43-46: The setWorkspacePool method in SynchronizedCuVSResources is
not using the shared lock and must be serialized like access(); modify
setWorkspacePool to acquire the same lock used by access() (e.g., wrap the call
to inner.setWorkspacePool(sizeBytes) in the synchronized block or lock guard
used by access()) so that mutations to workspace pool are protected by the same
synchronization as access().
In `@java/cuvs-java/src/main/java22/com/nvidia/cuvs/internal/CagraIndexImpl.java`:
- Around line 391-396: The offset calculation uses segmentIdx * topK
(neighborByteOffset/distanceByteOffset) but the code builds tensors with shape
{numQueries, topK} (numQueries from queryVectors.size()), so when numQueries > 1
the buffer offsets are wrong; either enforce single-query by adding a guard (if
(numQueries != 1) throw new IllegalArgumentException(...)) near where numQueries
is computed (queryVectors) or update the offsets to multiply by numQueries
(neighborByteOffset = segmentIdx * numQueries * topK * C_INT_BYTE_SIZE and
distanceByteOffset = segmentIdx * numQueries * topK * Float.BYTES) before
creating neighborSlice/distanceSlice (globalNeighborsDP/globalDistancesDP).
In `@java/cuvs-java/src/main/java22/com/nvidia/cuvs/internal/CudaStreamPool.java`:
- Around line 115-119: Make nextSlot atomic and wrap the returned slot index by
changing slotCounter to an AtomicInteger and using an atomic get-and-add plus
modulo; specifically, replace the non-atomic increment in nextSlot with
something like int start = slotCounter.getAndAdd(count); then return
Math.floorMod(start, poolSize) (or equivalent using your pool size field) so the
operation is thread-safe and the returned slot is bounded by the pool size.
- Around line 125-131: In CudaStreamPool.close(), wrap the cuvsResourcesDestroy
call with the same error validation used elsewhere by replacing the raw
cuvsResourcesDestroy(resources[i]) invocation with a call to
checkCuVSError(cuvsResourcesDestroy(resources[i]), "cuvsResourcesDestroy") so
cleanup failures are logged/handled; keep the surrounding loop and existing
calls to checkCudaError(cudaEventDestroy(events[i]), "cudaEventDestroy") and
checkCudaError(cudaStreamDestroy(streams[i]), "cudaStreamDestroy") unchanged and
reference the close(), cuvsResourcesDestroy, checkCuVSError, events, resources,
streams, and size symbols to locate the change.
- Around line 59-85: The constructor CudaStreamPool currently leaks native
handles if a create call fails mid-loop; modify the CudaStreamPool(int size)
constructor to perform rollback cleanup on failure by tracking the current index
and, if any checkCudaError/checkCuVSError throws, iterating over
already-initialized entries in resources[], streams[], and events[] to call the
corresponding destroy functions (cudaStreamDestroy for streams[],
cudaEventDestroy for events[], and the cuvs resources destroy routine for
resources[]) before rethrowing the exception; implement this by wrapping the
allocation loop in try/catch (or try/finally with a success flag) and invoking
the same cleanup logic as close() for indices < currentIndex so no native
handles are leaked if construction aborts.
In
`@java/cuvs-java/src/main/java22/com/nvidia/cuvs/internal/CuVSResourcesImpl.java`:
- Around line 31-33: The CudaStreamPool is currently allocated in the field
initializer (streamPool) which can leak native resources if
CuVSResourcesImpl(Path) throws before construction completes; move creation of
the CudaStreamPool from the field initializer into the CuVSResourcesImpl(Path)
constructor (use Integer.getInteger(CudaStreamPool.SIZE_PROPERTY,
CudaStreamPool.DEFAULT_SIZE) to determine size), assign it to the streamPool
field there, and in the constructor failure path ensure you call
streamPool.close() (or otherwise tear it down) before rethrowing so native
resources are not leaked; keep the streamPool field declaration but initialize
it only in the constructor and ensure close() is reachable on exceptions.
In `@java/cuvs-java/src/main/java22/com/nvidia/cuvs/MultiSegmentCagraSearch.java`:
- Around line 119-145: The code must enforce the single-query contract or size
outputs from the actual query row count: in MultiSegmentCagraSearch, after
obtaining var queryVectors = (CuVSMatrixInternal)
queries.get(i).getQueryVectors(), read its row count (e.g.,
queryVectors.getRowCount()/numRows()/size(0) — use the actual accessor on
CuVSMatrixInternal) into int nq; if nq > 1 throw an IllegalArgumentException
rejecting multi-row queries, or alternatively set segShape = new long[] {nq, k}
and compute neighbor/distance byte offsets and tensor sizes using nq*k (update
nByteOffset/dByteOffset and prepareTensor calls for
neighborsArray/distancesArray accordingly) so prepareTensor and the
globalNeighborsDP/globalDistancesDP slices match the query row count.
- Around line 96-111: The code in MultiSegmentCagraSearch currently uses
CagraSearchParams built from queries.get(0) (via
CuVSParamsHelper.buildCagraSearchParams) and thus drops per-segment settings
from subsequent CagraQuery entries; fix by either validating that all CagraQuery
instances in queries have identical search params and no per-segment filters
(throw IllegalArgumentException from MultiSegmentCagraSearch if any CagraQuery
differs from queries.get(0)), or plumb per-segment parameters through the native
call: extend the native wrapper (the cuvsCagraSearchMultiSegment binding) and
CuVSParamsHelper to accept/allocate an array of CagraSearchParams (build one
MemorySegment per CagraQuery) and pass that array/handle when invoking the
multi-segment search so each segment’s filters/params are applied.
---
Outside diff comments:
In `@java/cuvs-java/src/main/java/com/nvidia/cuvs/CagraSearchParams.java`:
- Around line 293-322: The toString() in class CagraSearchParams currently omits
the new fields persistent, persistentLifetime, and persistentDeviceUsage; update
the CagraSearchParams.toString() method to append these three fields (with their
names and values) into the returned string (same formatting style as the other
fields) so debugging output shows the full configuration.
---
Nitpick comments:
In `@cpp/src/neighbors/detail/cagra/cagra_search.cuh`:
- Around line 388-417: Hoist the allocation and computation of query_norms out
of the segment loop and reuse it for every segment whose indices[i]->metric() ==
cuvs::distance::DistanceType::CosineExpanded: allocate query_norms once (using
raft::make_device_vector) before the for-loop, run raft::linalg::reduce on the
shared queries data to compute the norms, then inside the loop call
raft::linalg::matrix_vector_op (as currently done) using the precomputed
query_norms for each CosineExpanded segment; after the loop free or let
query_norms go out of scope. Ensure you still call
cuvs::neighbors::ivf::detail::postprocess_distances for non-CosineExpanded
branches and keep all existing ops (raft::compose_op, raft::sq_op,
raft::div_const_op, raft::cast_op, raft::add_const_op, raft::div_checkzero_op)
unchanged.
In `@java/cuvs-java/src/main/java/com/nvidia/cuvs/CagraSearchParams.java`:
- Around line 527-537: The builder setter withPersistentDeviceUsage currently
assigns persistentDeviceUsage without enforcing the Javadoc constraint; update
Builder.withPersistentDeviceUsage to validate that persistentDeviceUsage > 0.0f
and <= 1.0f and throw an IllegalArgumentException (including the invalid value
in the message) when the check fails so invalid values are rejected early before
reaching native code.
In `@java/cuvs-java/src/test/java/com/nvidia/cuvs/CheckedCuVSResources.java`:
- Around line 59-62: The new delegate method setWorkspacePool(long sizeBytes) is
missing the wrapper's destroyed-state guard; modify
CheckedCuVSResources.setWorkspacePool to call checkNotDestroyed() at the start
(before delegating to inner.setWorkspacePool(sizeBytes)) so it matches the
class's defensive contract and uses the same destroyed-state check as other
methods.
📒 Files selected for processing (40)

- c/CMakeLists.txt
- c/include/cuvs/core/c_api.h
- c/include/cuvs/neighbors/cagra.h
- c/include/cuvs/selection/select_k.h
- c/src/core/c_api.cpp
- c/src/neighbors/cagra.cpp
- c/src/selection/select_k.cpp
- cpp/include/cuvs/neighbors/cagra.hpp
- cpp/src/neighbors/cagra.cuh
- cpp/src/neighbors/cagra_search_inst.cu.in
- cpp/src/neighbors/detail/cagra/cagra_search.cuh
- cpp/src/neighbors/detail/cagra/search_single_cta.cuh
- cpp/src/neighbors/detail/cagra/search_single_cta_inst.cuh
- cpp/src/neighbors/detail/cagra/search_single_cta_kernel-inl.cuh
- cpp/src/neighbors/detail/cagra/search_single_cta_kernel.cuh
- java/cuvs-java/src/main/java/com/nvidia/cuvs/CagraIndex.java
- java/cuvs-java/src/main/java/com/nvidia/cuvs/CagraSearchParams.java
- java/cuvs-java/src/main/java/com/nvidia/cuvs/CuVSAceParams.java
- java/cuvs-java/src/main/java/com/nvidia/cuvs/CuVSMatrix.java
- java/cuvs-java/src/main/java/com/nvidia/cuvs/CuVSResources.java
- java/cuvs-java/src/main/java/com/nvidia/cuvs/HnswAceParams.java
- java/cuvs-java/src/main/java/com/nvidia/cuvs/HnswIndex.java
- java/cuvs-java/src/main/java/com/nvidia/cuvs/HnswIndexParams.java
- java/cuvs-java/src/main/java/com/nvidia/cuvs/MultiSegmentSearchResults.java
- java/cuvs-java/src/main/java/com/nvidia/cuvs/SynchronizedCuVSResources.java
- java/cuvs-java/src/main/java/com/nvidia/cuvs/spi/CuVSProvider.java
- java/cuvs-java/src/main/java/com/nvidia/cuvs/spi/UnsupportedProvider.java
- java/cuvs-java/src/main/java22/com/nvidia/cuvs/MultiSegmentCagraSearch.java
- java/cuvs-java/src/main/java22/com/nvidia/cuvs/internal/BufferedCagraSearch.java
- java/cuvs-java/src/main/java22/com/nvidia/cuvs/internal/CagraIndexImpl.java
- java/cuvs-java/src/main/java22/com/nvidia/cuvs/internal/CuVSMatrixBaseImpl.java
- java/cuvs-java/src/main/java22/com/nvidia/cuvs/internal/CuVSMatrixInternal.java
- java/cuvs-java/src/main/java22/com/nvidia/cuvs/internal/CuVSParamsHelper.java
- java/cuvs-java/src/main/java22/com/nvidia/cuvs/internal/CuVSResourcesImpl.java
- java/cuvs-java/src/main/java22/com/nvidia/cuvs/internal/CudaStreamPool.java
- java/cuvs-java/src/main/java22/com/nvidia/cuvs/internal/HnswIndexImpl.java
- java/cuvs-java/src/main/java22/com/nvidia/cuvs/internal/SelectKHelper.java
- java/cuvs-java/src/main/java22/com/nvidia/cuvs/internal/common/LinkerHelper.java
- java/cuvs-java/src/main/java22/com/nvidia/cuvs/spi/JDKProvider.java
- java/cuvs-java/src/test/java/com/nvidia/cuvs/CheckedCuVSResources.java
/**
 * @brief Switches the working memory resource to use stream-ordered asynchronous allocation
 * (cudaMallocAsync / cudaFreeAsync). Unlike the pool resource, this resource returns memory to
 * the stream immediately without blocking the CPU, eliminating device-wide synchronization on
 * deallocation. This is especially beneficial when multiple CAGRA searches run concurrently on
 * separate CUDA streams, because the internal workspace allocations no longer serialize kernel
 * launches. Be aware that this function will change the memory resource for the whole process
 * and the new memory resource will be used until explicitly changed.
 *
 * @return cuvsError_t
 */
cuvsError_t cuvsRMMAsyncMemoryResourceEnable();
🧩 Analysis chain
🏁 Script executed:
find . -name "c_api.cpp" -path "*/c/src/core/*" | head -5
Repository: rapidsai/cuvs
Length of output: 79
🏁 Script executed:
cat -n c/src/core/c_api.cpp | sed -n '160,195p'
Repository: rapidsai/cuvs
Length of output: 1665
🏁 Script executed:
cd c/src/core && grep -n "thread_local" c_api.cpp
Repository: rapidsai/cuvs
Length of output: 269
🏁 Script executed:
cd c/include/cuvs/core && grep -n "cuvsRMMAsyncMemoryResourceEnable" c_api.h -A 15
Repository: rapidsai/cuvs
Length of output: 580
🏁 Script executed:
# Search for RMM set_current_device_resource usage patterns in the codebase
rg "set_current_device_resource" --type cpp -B 2 -A 2
Repository: rapidsai/cuvs
Length of output: 2101
🏁 Script executed:
# Check if there are any docs changes or migration guides in the PR
find . -name "*.md" -o -name "*.rst" | xargs grep -l "RMM\|AsyncMemory\|cuvsRMM" 2>/dev/null | head -10
Repository: rapidsai/cuvs
Length of output: 200
🏁 Script executed:
# Check for any deprecation warnings or migration documentation
find . -type f \( -name "*.h" -o -name "*.md" \) -path "*/docs/*" | xargs grep -l "cuvsRMM\|memory.*resource" 2>/dev/null | head -5
Repository: rapidsai/cuvs
Length of output: 89
🏁 Script executed:
# Check if there are any API change notes or migration guides in the docs
cat docs/source/api_basics.rst | head -100
Repository: rapidsai/cuvs
Length of output: 2518
🏁 Script executed:
# Check CHANGELOG for any RMM API changes mentioned
rg "Async|AsyncMemory|cuvsRMMAsync" CHANGELOG.md -B 2 -A 2
Repository: rapidsai/cuvs
Length of output: 39
🏁 Script executed:
# Check if reset_current_device_resource clears the async_mr
cat -n c/src/core/c_api.cpp | sed -n '192,210p'
Repository: rapidsai/cuvs
Length of output: 784
🏁 Script executed:
# Check how pool memory resource is handled - it passes rvalues
# Let's see if there's any documentation on RMM's set_current_device_resource behavior
rg "set_current_device_resource" --type h | head -20
Repository: rapidsai/cuvs
Length of output: 331
🏁 Script executed:
# Look for RMM in vendored code or requirements to understand its semantics
find . -name "*.txt" -o -name "*.cmake" | xargs grep -i "rmm" | head -10
Repository: rapidsai/cuvs
Length of output: 779
🏁 Script executed:
# Check if there's any documentation update about async memory resource being process-wide
rg "cuvsRMMAsyncMemoryResourceEnable" docs/ 2>/dev/null || echo "No docs match found"
Repository: rapidsai/cuvs
Length of output: 76
🏁 Script executed:
# Let's check how the pool resource implementation differs from async
# The pool passes a rvalue (temporary), while async passes a pointer to thread_local
# Search for any notes on RMM memory resource ownership in comments
rg "ownership|lifetime|thread" c/src/core/c_api.cpp | head -20
Repository: rapidsai/cuvs
Length of output: 275
🏁 Script executed:
# Check if cuvsRMMPoolMemoryResourceEnable has similar issues by checking if it's thread_local or static
rg "pool_mr" c/src/core/c_api.cpp -B 5 -A 5
Repository: rapidsai/cuvs
Length of output: 665
🏁 Script executed:
# Let's check the RMM version used and look for any behavior documentation
rg "rmm" vcpkg.json conanfile.txt requirements.txt 2>/dev/null | head -10
Repository: rapidsai/cuvs
Length of output: 39
🏁 Script executed:
# Check if there are any thread safety comments or warnings in the C API header
cat -n c/include/cuvs/core/c_api.h | sed -n '220,250p'
Repository: rapidsai/cuvs
Length of output: 1728
The async-memory resource owner cannot be thread_local when this API changes the current resource globally.
The implementation at c/src/core/c_api.cpp:188 stores cuda_async_memory_resource in a thread_local async_mr and passes it to rmm::mr::set_current_device_resource(), but the documentation explicitly states this function "will change the memory resource for the whole process" (line 235). This creates a critical lifetime mismatch:
- If set_current_device_resource() is device-scoped (affecting all threads), then when the enabling thread exits, its thread_local async_mr is destroyed while still registered as the current resource, leaving RMM with a dangling pointer.
- If thread-local semantics were intended, the documentation must be updated to reflect that this only affects the calling thread.
The pool resource avoids this issue by passing temporary rvalues to set_current_device_resource(), allowing RMM to manage the lifetime. Either make async_mr process/device-scoped (not thread_local), or narrow the documentation and implementation to clarify thread-local semantics.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@c/include/cuvs/core/c_api.h` around lines 229 - 240, The current
implementation of cuvsRMMAsyncMemoryResourceEnable stores
cuda_async_memory_resource in a thread_local async_mr and passes it to
rmm::mr::set_current_device_resource(), creating a lifetime mismatch vs the API
doc that says the change is global; change the implementation so the async
memory resource is process-scoped (not thread_local) by replacing thread_local
async_mr with a static/process-global instance (e.g., a static unique_ptr or
static object) so it outlives the thread and remains valid for
rmm::mr::set_current_device_resource(), and ensure
cuvsRMMAsyncMemoryResourceEnable and any cleanup use that same process-global
symbol (cuda_async_memory_resource / async_mr) when setting or resetting the
current device resource; alternatively, if thread-local semantics are intended,
update the cuvsRMMAsyncMemoryResourceEnable documentation to state it only
affects the calling thread and keep async_mr thread_local.
// Only float32 is supported for multi-segment search.
RAFT_EXPECTS(
  indices[0]->dtype.code == kDLFloat && indices[0]->dtype.bits == 32,
  "Multi-segment search only supports float32 indices");

using T         = float;
using IdxT      = uint32_t;
using OutIdxT   = uint32_t;
using DistanceT = float;
using IndexT    = cuvs::neighbors::cagra::index<T, IdxT>;

std::vector<const IndexT*> idx_vec(num_segments);
std::vector<raft::device_matrix_view<const T, int64_t, raft::row_major>> q_vec(num_segments);
std::vector<raft::device_matrix_view<OutIdxT, int64_t, raft::row_major>> n_vec(num_segments);
std::vector<raft::device_matrix_view<DistanceT, int64_t, raft::row_major>> d_vec(num_segments);

for (uint32_t i = 0; i < num_segments; i++) {
  RAFT_EXPECTS(indices[i] != nullptr && indices[i]->addr != 0,
               "Index at position %u is null or not built", i);
  idx_vec[i] = reinterpret_cast<const IndexT*>(indices[i]->addr);
  q_vec[i] = cuvs::core::from_dlpack<std::remove_reference_t<decltype(q_vec[i])>>(queries[i]);
  n_vec[i] = cuvs::core::from_dlpack<std::remove_reference_t<decltype(n_vec[i])>>(neighbors[i]);
  d_vec[i] = cuvs::core::from_dlpack<std::remove_reference_t<decltype(d_vec[i])>>(distances[i]);
}
Validate every segment before the reinterpret_cast.
Only indices[0] is checked for float32. The loop then blindly casts every indices[i]->addr to index<float, uint32_t>* and builds float32 device views from queries[i], neighbors[i], and distances[i]. If any later segment is half/int8/uint8 or any tensor is not device-backed, this path will feed the kernel mismatched types and can produce garbage results or GPU faults. Please mirror the per-input dtype/device/null checks from cuvsCagraSearch for each segment before from_dlpack.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@c/src/neighbors/cagra.cpp` around lines 710 - 733, The loop currently casts
every indices[i]->addr and builds device views without validating each segment's
types/devices; update the loop that fills idx_vec, q_vec, n_vec, d_vec to
perform the same per-segment checks used in cuvsCagraSearch: assert indices[i]
!= nullptr and addr != 0 (already present), then verify indices[i]->dtype.code
== kDLFloat && indices[i]->dtype.bits == 32 for every i, and validate that
queries[i], neighbors[i], and distances[i] are device-backed DLPack tensors with
the expected element types (float for queries/distances, uint32 for neighbors)
before calling reinterpret_cast<const IndexT*>(indices[i]->addr) and
cuvs::core::from_dlpack to populate q_vec[i], n_vec[i], d_vec[i]; replace the
blind casts with RAFT_EXPECTS that include i in the error messages so a bad
segment fails fast and clearly.
for (uint32_t i = 0; i < num_segments; i++) {
  RAFT_EXPECTS(indices[i] != nullptr && indices[i]->addr != 0,
               "Index at position %u is null or not built", i);
  idx_vec[i] = reinterpret_cast<const IndexT*>(indices[i]->addr);
  q_vec[i] = cuvs::core::from_dlpack<std::remove_reference_t<decltype(q_vec[i])>>(queries[i]);
  n_vec[i] = cuvs::core::from_dlpack<std::remove_reference_t<decltype(n_vec[i])>>(neighbors[i]);
  d_vec[i] = cuvs::core::from_dlpack<std::remove_reference_t<decltype(d_vec[i])>>(distances[i]);
}

cuvs::neighbors::cagra::search_multi_segment(
  *res_ptr, search_params, idx_vec, q_vec, n_vec, d_vec);
Reject mixed-distance metrics across segments.
This API combines raw distances from all segments into one global ranking, so all indices must use the same metric. Right now nothing checks that indices[i] matches indices[0] on metric, which means mixing L2/IP/Cosine segments will return nonsensical top-k ordering even though the header says the distances are comparable across segments.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@c/src/neighbors/cagra.cpp` around lines 726 - 736, The loop that builds
idx_vec/q_vec/n_vec/d_vec must also validate that all segment indices use the
same distance metric as the first segment to prevent mixing incompatible
metrics; inside the for-loop in cagra.cpp (after the existing RAFT_EXPECTS that
checks indices[i] non-null) compare indices[i]->metric (or the actual metric
field name on your IndexT struct/class) to indices[0]->metric and fail fast with
RAFT_EXPECTS (or equivalent) and a clear message like "Mixed distance metrics
across segments: expected %s but got %s at segment %u"; keep this check before
pushing idx_vec[i] and before calling
cuvs::neighbors::cagra::search_multi_segment so the function only runs when all
segments share the same metric.
extern "C" cuvsError_t cuvsSelectK(cuvsResources_t res,
                                   DLManagedTensor* in_val,
                                   DLManagedTensor* out_val,
                                   DLManagedTensor* out_idx)
{
  return cuvs::core::translate_exceptions([=] {
    auto* res_ptr = reinterpret_cast<raft::resources*>(res);

    int64_t n = in_val->dl_tensor.shape[1];
    int64_t k = out_val->dl_tensor.shape[1];

    auto in_view = raft::make_device_matrix_view<const float, int64_t, raft::row_major>(
      static_cast<const float*>(in_val->dl_tensor.data), 1, n);

    auto out_val_view = raft::make_device_matrix_view<float, int64_t, raft::row_major>(
      static_cast<float*>(out_val->dl_tensor.data), 1, k);

    auto out_idx_view = raft::make_device_matrix_view<int64_t, int64_t, raft::row_major>(
      static_cast<int64_t*>(out_idx->dl_tensor.data), 1, k);

    cuvs::selection::select_k(
      *res_ptr,
      in_view,
      std::nullopt,  // implicit positions [0, n) as in_idx
      out_val_view,
      out_idx_view,
      true);  // select_min = true (smallest distance = nearest neighbor)
Validate the DLPack contract before dereferencing shape[1] and casting buffers.
cuvsSelectK currently assumes non-null 2D CUDA tensors with float32 / float32 / int64 dtypes, zero offset, and contiguous row-major layout. Without checking ndim, shape, device, dtype, byte_offset, and stride compatibility, malformed callers can crash here or feed corrupted views into select_k.
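As a hedged illustration of the contract being asked for, here is a host-side validator over a simplified DLTensor-like struct (field names echo DLPack but this is not the real dlpack.h, and the real check would also cover device type and dtype code). A row-major `1 x n` tensor is compact when `strides` is null (the DLPack convention for contiguous data) or equals `{n, 1}`:

```cpp
// Simplified stand-in for DLTensor; real code would validate DLDataType.code
// and DLDevice.device_type as well.
struct tensor {
  void* data;
  int ndim;
  long long shape[2];
  long long* strides;               // nullptr means compact row-major
  unsigned long long byte_offset;
  int dtype_bits;
};

bool is_valid_row_vector(const tensor& t, int expected_bits) {
  if (t.data == nullptr || t.ndim != 2 || t.shape[0] != 1) return false;
  if (t.dtype_bits != expected_bits) return false;
  if (t.byte_offset != 0) return false;  // views below assume a zero offset
  if (t.strides != nullptr &&
      (t.strides[0] != t.shape[1] || t.strides[1] != 1)) return false;
  return true;
}
```

Running such checks on `in_val`, `out_val`, and `out_idx` before dereferencing `shape[1]` turns a malformed caller into a clean error instead of a crash or a corrupted view.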
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@c/src/selection/select_k.cpp` around lines 14 - 40, cuvsSelectK dereferences
shape[1] and casts buffers without validating the DLPack tensors; add explicit
validation at the top of cuvsSelectK for in_val, out_val, out_idx (non-null),
then check each DLManagedTensor->dl_tensor for expected ndim (==2), shapes (rows
match expected 1 or compatible), dtype (in_val/out_val float32, out_idx int64),
device type (CUDA) and device id, byte_offset == 0, and contiguous row-major
strides/compatibility before creating device views with
raft::make_device_matrix_view; if any check fails return an appropriate
cuvsError_t (or throw inside translate_exceptions) instead of proceeding to
casts and calling cuvs::selection::select_k so malformed callers cannot crash or
corrupt memory.
void search_multi_segment(
  raft::resources const& res,
  cuvs::neighbors::cagra::search_params const& params,
  const std::vector<const cuvs::neighbors::cagra::index<float, uint32_t>*>& indices,
  const std::vector<raft::device_matrix_view<const float, int64_t, raft::row_major>>& queries,
  const std::vector<raft::device_matrix_view<uint32_t, int64_t, raft::row_major>>& neighbors,
  const std::vector<raft::device_matrix_view<float, int64_t, raft::row_major>>& distances);

void search_multi_segment(
  raft::resources const& res,
  cuvs::neighbors::cagra::search_params const& params,
  const std::vector<const cuvs::neighbors::cagra::index<float, uint32_t>*>& indices,
  const std::vector<raft::device_matrix_view<const float, int64_t, raft::row_major>>& queries,
  const std::vector<raft::device_matrix_view<int64_t, int64_t, raft::row_major>>& neighbors,
  const std::vector<raft::device_matrix_view<float, int64_t, raft::row_major>>& distances);

void search_multi_segment(
  raft::resources const& res,
  cuvs::neighbors::cagra::search_params const& params,
  const std::vector<const cuvs::neighbors::cagra::index<half, uint32_t>*>& indices,
  const std::vector<raft::device_matrix_view<const half, int64_t, raft::row_major>>& queries,
  const std::vector<raft::device_matrix_view<uint32_t, int64_t, raft::row_major>>& neighbors,
  const std::vector<raft::device_matrix_view<float, int64_t, raft::row_major>>& distances);

void search_multi_segment(
  raft::resources const& res,
  cuvs::neighbors::cagra::search_params const& params,
  const std::vector<const cuvs::neighbors::cagra::index<half, uint32_t>*>& indices,
  const std::vector<raft::device_matrix_view<const half, int64_t, raft::row_major>>& queries,
  const std::vector<raft::device_matrix_view<int64_t, int64_t, raft::row_major>>& neighbors,
  const std::vector<raft::device_matrix_view<float, int64_t, raft::row_major>>& distances);

void search_multi_segment(
  raft::resources const& res,
  cuvs::neighbors::cagra::search_params const& params,
  const std::vector<const cuvs::neighbors::cagra::index<int8_t, uint32_t>*>& indices,
  const std::vector<raft::device_matrix_view<const int8_t, int64_t, raft::row_major>>& queries,
  const std::vector<raft::device_matrix_view<uint32_t, int64_t, raft::row_major>>& neighbors,
  const std::vector<raft::device_matrix_view<float, int64_t, raft::row_major>>& distances);

void search_multi_segment(
  raft::resources const& res,
  cuvs::neighbors::cagra::search_params const& params,
  const std::vector<const cuvs::neighbors::cagra::index<int8_t, uint32_t>*>& indices,
  const std::vector<raft::device_matrix_view<const int8_t, int64_t, raft::row_major>>& queries,
  const std::vector<raft::device_matrix_view<int64_t, int64_t, raft::row_major>>& neighbors,
  const std::vector<raft::device_matrix_view<float, int64_t, raft::row_major>>& distances);

void search_multi_segment(
  raft::resources const& res,
  cuvs::neighbors::cagra::search_params const& params,
  const std::vector<const cuvs::neighbors::cagra::index<uint8_t, uint32_t>*>& indices,
  const std::vector<raft::device_matrix_view<const uint8_t, int64_t, raft::row_major>>& queries,
  const std::vector<raft::device_matrix_view<uint32_t, int64_t, raft::row_major>>& neighbors,
  const std::vector<raft::device_matrix_view<float, int64_t, raft::row_major>>& distances);

void search_multi_segment(
  raft::resources const& res,
  cuvs::neighbors::cagra::search_params const& params,
  const std::vector<const cuvs::neighbors::cagra::index<uint8_t, uint32_t>*>& indices,
  const std::vector<raft::device_matrix_view<const uint8_t, int64_t, raft::row_major>>& queries,
  const std::vector<raft::device_matrix_view<int64_t, int64_t, raft::row_major>>& neighbors,
  const std::vector<raft::device_matrix_view<float, int64_t, raft::row_major>>& distances);
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Declaration in cpp/src/neighbors/detail/cagra/cagra_search.cuh"
rg -n -A8 -B2 'template <typename T, typename OutputIdxT = uint32_t, typename IdxT = uint32_t' \
  cpp/src/neighbors/detail/cagra/cagra_search.cuh
echo
echo "Explicit instantiation site in cpp/src/neighbors/cagra_search_inst.cu.in"
rg -n -A8 -B2 'CUVS_INST_CAGRA_SEARCH_MULTI_SEGMENT|search_multi_segment<' \
  cpp/src/neighbors/cagra_search_inst.cu.in
Repository: rapidsai/cuvs
Length of output: 2317
🏁 Script executed:
# Check for static_assert on IdxT in the implementation
rg -n "static_assert.*IdxT" cpp/src/neighbors/detail/cagra/cagra_search.cuh
echo "---"
# Read the header to see all overloads and their documentation
cat -n cpp/include/cuvs/neighbors/cagra.hpp | sed -n '1710,1810p'
Repository: rapidsai/cuvs
Length of output: 6679
Fix template parameter order mismatch and add missing documentation.
The instantiation macro calls the template with parameter order <T, IdxT, OutputIdxT>, but the template declares them as <T, OutputIdxT, IdxT>. When the macro instantiates with (data_t, uint32_t, int64_t), it produces <data_t, uint32_t, int64_t>, incorrectly mapping IdxT to int64_t. This violates the static_assert at cagra_search.cuh:276 (IdxT == uint32_t) and will fail at link time for the int64_t output overloads.
Additionally, overloads 2–8 (lines 1752–1806) lack Doxygen documentation. Per the coding guidelines, all public functions must be documented. Add a @copydoc search_multi_segment directive before each additional overload, or provide explicit documentation blocks.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@cpp/include/cuvs/neighbors/cagra.hpp` around lines 1744 - 1806, The header
declares the template parameter order as <T, OutputIdxT, IdxT> but the
instantiation macro (used with <T, IdxT, OutputIdxT> e.g. (data_t, uint32_t,
int64_t)) expects <T, IdxT, OutputIdxT>, causing the IdxT/OutputIdxT swap that
trips the static_assert in cagra_search.cuh:276 and breaks link-time overloads
for int64_t; fix by changing the template parameter order in the
search_multi_segment declarations to <T, IdxT, OutputIdxT> (or alternatively
update the instantiation macro to match the declared order) so types map
correctly, and add missing Doxygen for overloads 2–8 by inserting a `@copydoc`
search_multi_segment (or equivalent documentation block) above each of those
overloaded search_multi_segment declarations to satisfy the public API docs
requirement.
public int nextSlot(int count) {
  int start = slotCounter;
  slotCounter += count;
  return start;
}
Make nextSlot atomic and modulo-bounded.
At Line 116–118, slotCounter updates are racy and the returned value is not wrapped despite the method contract; this can cause slot collisions and eventual overflow issues.
🔧 Proposed fix
 import java.lang.foreign.Arena;
 import java.lang.foreign.MemorySegment;
+import java.util.concurrent.atomic.AtomicInteger;
@@
-  private int slotCounter;
+  private final AtomicInteger slotCounter = new AtomicInteger(0);
@@
   public int nextSlot(int count) {
-    int start = slotCounter;
-    slotCounter += count;
-    return start;
+    if (count <= 0 || count > size) {
+      throw new IllegalArgumentException("count must be in [1, size]");
+    }
+    return Math.floorMod(slotCounter.getAndAdd(count), size);
   }

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
public int nextSlot(int count) {
  if (count <= 0 || count > size) {
    throw new IllegalArgumentException("count must be in [1, size]");
  }
  return Math.floorMod(slotCounter.getAndAdd(count), size);
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@java/cuvs-java/src/main/java22/com/nvidia/cuvs/internal/CudaStreamPool.java`
around lines 115 - 119, Make nextSlot atomic and wrap the returned slot index by
changing slotCounter to an AtomicInteger and using an atomic get-and-add plus
modulo; specifically, replace the non-atomic increment in nextSlot with
something like int start = slotCounter.getAndAdd(count); then return
Math.floorMod(start, poolSize) (or equivalent using your pool size field) so the
operation is thread-safe and the returned slot is bounded by the pool size.
@Override
public void close() {
  for (int i = 0; i < size; i++) {
    checkCudaError(cudaEventDestroy(events[i]), "cudaEventDestroy");
    cuvsResourcesDestroy(resources[i]);
    checkCudaError(cudaStreamDestroy(streams[i]), "cudaStreamDestroy");
  }
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify all cuvsResourcesDestroy call sites are validated.
rg -n -C2 '\bcuvsResourcesDestroy\s*\(' --type=java
Repository: rapidsai/cuvs
Length of output: 1298
Wrap cuvsResourcesDestroy with error validation during cleanup.
Line 129 calls cuvsResourcesDestroy without checkCuVSError, allowing cleanup failures to be silently ignored. Other call sites in the codebase (e.g., CuVSResourcesImpl.java) properly validate the return code. Apply the same pattern here for consistency:
Proposed fix
-      cuvsResourcesDestroy(resources[i]);
+      checkCuVSError(cuvsResourcesDestroy(resources[i]), "cuvsResourcesDestroy");

📝 Committable suggestion
@Override
public void close() {
  for (int i = 0; i < size; i++) {
    checkCudaError(cudaEventDestroy(events[i]), "cudaEventDestroy");
    checkCuVSError(cuvsResourcesDestroy(resources[i]), "cuvsResourcesDestroy");
    checkCudaError(cudaStreamDestroy(streams[i]), "cudaStreamDestroy");
  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@java/cuvs-java/src/main/java22/com/nvidia/cuvs/internal/CudaStreamPool.java`
around lines 125 - 131, In CudaStreamPool.close(), wrap the cuvsResourcesDestroy
call with the same error validation used elsewhere by replacing the raw
cuvsResourcesDestroy(resources[i]) invocation with a call to
checkCuVSError(cuvsResourcesDestroy(resources[i]), "cuvsResourcesDestroy") so
cleanup failures are logged/handled; keep the surrounding loop and existing
calls to checkCudaError(cudaEventDestroy(events[i]), "cudaEventDestroy") and
checkCudaError(cudaStreamDestroy(streams[i]), "cudaStreamDestroy") unchanged and
reference the close(), cuvsResourcesDestroy, checkCuVSError, events, resources,
streams, and size symbols to locate the change.
private final CudaStreamPool streamPool =
    new CudaStreamPool(
        Integer.getInteger(CudaStreamPool.SIZE_PROPERTY, CudaStreamPool.DEFAULT_SIZE));
Construct the stream pool inside the constructor.
Allocating CudaStreamPool in a field initializer leaks native resources if CuVSResourcesImpl(Path) throws later during cuvsResourcesCreate/cuvsDeviceIdGet, because the object never finishes construction and close() is never reachable. Build the pool inside the constructor and tear it down in the constructor's failure path.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In
`@java/cuvs-java/src/main/java22/com/nvidia/cuvs/internal/CuVSResourcesImpl.java`
around lines 31 - 33, The CudaStreamPool is currently allocated in the field
initializer (streamPool) which can leak native resources if
CuVSResourcesImpl(Path) throws before construction completes; move creation of
the CudaStreamPool from the field initializer into the CuVSResourcesImpl(Path)
constructor (use Integer.getInteger(CudaStreamPool.SIZE_PROPERTY,
CudaStreamPool.DEFAULT_SIZE) to determine size), assign it to the streamPool
field there, and in the constructor failure path ensure you call
streamPool.close() (or otherwise tear it down) before rethrowing so native
resources are not leaked; keep the streamPool field declaration but initialize
it only in the constructor and ensure close() is reachable on exceptions.
CagraSearchParams searchParameters = queries.get(0).getCagraSearchParameters();

try (var resourcesAccessor = resources.access()) {
  long cuvsRes = resourcesAccessor.handle();
  var cuvsStream = getStream(cuvsRes);

  try (var globalNeighborsDP = allocateRMMSegment(cuvsRes, neighborsBytes);
      var globalDistancesDP = allocateRMMSegment(cuvsRes, distancesBytes);
      var outIdxDP = allocateRMMSegment(cuvsRes, outIdxBytes);
      var outValDP = allocateRMMSegment(cuvsRes, outValBytes)) {

    // --- Phase 1: call cuvsCagraSearchMultiSegment ---
    // Single kernel launch covers all segments; results land in globalNeighborsDP /
    // globalDistancesDP on the same CUDA stream, so SelectK below sees them via ordering.
    try (var arena = Arena.ofConfined()) {
      MemorySegment sp = CuVSParamsHelper.buildCagraSearchParams(arena, searchParameters);
Per-segment query settings are being dropped.
This path always builds search params from queries.get(0), and the native multi-segment API you're calling has no filter argument, so any search params or prefilter carried by queries[1..] are silently ignored. Either validate that every CagraQuery is equivalent and unfiltered, or plumb per-segment filtering/params through the native API before exposing List<CagraQuery> here.
Also applies to: 147-156
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@java/cuvs-java/src/main/java22/com/nvidia/cuvs/MultiSegmentCagraSearch.java`
around lines 96 - 111, The code in MultiSegmentCagraSearch currently uses
CagraSearchParams built from queries.get(0) (via
CuVSParamsHelper.buildCagraSearchParams) and thus drops per-segment settings
from subsequent CagraQuery entries; fix by either validating that all CagraQuery
instances in queries have identical search params and no per-segment filters
(throw IllegalArgumentException from MultiSegmentCagraSearch if any CagraQuery
differs from queries.get(0)), or plumb per-segment parameters through the native
call: extend the native wrapper (the cuvsCagraSearchMultiSegment binding) and
CuVSParamsHelper to accept/allocate an array of CagraSearchParams (build one
MemorySegment per CagraQuery) and pass that array/handle when invoking the
multi-segment search so each segment’s filters/params are applied.
long[] segShape = {1, k};
for (int i = 0; i < numSegments; i++) {
  // Index handle
  indexArray.setAtIndex(ValueLayout.ADDRESS, i, buffered[i].getIndexHandle());

  // Query DLTensor
  var queryVectors = (CuVSMatrixInternal) queries.get(i).getQueryVectors();
  queriesArray.setAtIndex(ValueLayout.ADDRESS, i, queryVectors.toTensor(arena));

  // Neighbors DLTensor — slice of global buffer
  long nByteOffset = (long) i * k * Integer.BYTES;
  MemorySegment nSlice =
      MemorySegment.ofAddress(globalNeighborsDP.handle().address() + nByteOffset);
  neighborsArray.setAtIndex(
      ValueLayout.ADDRESS,
      i,
      prepareTensor(arena, nSlice, segShape, kDLUInt(), 32, kDLCUDA()));

  // Distances DLTensor — slice of global buffer
  long dByteOffset = (long) i * k * Float.BYTES;
  MemorySegment dSlice =
      MemorySegment.ofAddress(globalDistancesDP.handle().address() + dByteOffset);
  distancesArray.setAtIndex(
      ValueLayout.ADDRESS,
      i,
      prepareTensor(arena, dSlice, segShape, kDLFloat(), 32, kDLCUDA()));
}
Enforce the single-query contract before sizing output buffers.
Each segment gets a [1, k] neighbors/distances tensor, but the input query tensor comes straight from queries.get(i).getQueryVectors(). If any caller passes nq > 1, the native search will try to write nq * k results into storage sized for only k, which is an out-of-bounds device write. Please reject multi-row query tensors here or size the output slices from the actual query row count.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@java/cuvs-java/src/main/java22/com/nvidia/cuvs/MultiSegmentCagraSearch.java`
around lines 119 - 145, The code must enforce the single-query contract or size
outputs from the actual query row count: in MultiSegmentCagraSearch, after
obtaining var queryVectors = (CuVSMatrixInternal)
queries.get(i).getQueryVectors(), read its row count (e.g.,
queryVectors.getRowCount()/numRows()/size(0) — use the actual accessor on
CuVSMatrixInternal) into int nq; if nq > 1 throw an IllegalArgumentException
rejecting multi-row queries, or alternatively set segShape = new long[] {nq, k}
and compute neighbor/distance byte offsets and tensor sizes using nq*k (update
nByteOffset/dByteOffset and prepareTensor calls for
neighborsArray/distancesArray accordingly) so prepareTensor and the
globalNeighborsDP/globalDistancesDP slices match the query row count.
This PR adds a CAGRA search API and associated implementation for batching workloads involving a single query and multiple indexes (e.g. index segments built from one dataset by cuvs-lucene). This promotes GPU utilization for non-batched queries by parallelizing work over multiple CTAs, while minimizing the number of concurrent kernels for potential host-side parallelism.
Addresses cuvs-lucene issue 124.
Associated cuvs-lucene PR: rapidsai/cuvs-lucene#133.