Cuvs-Lucene-139: Fix GPU OOMs && Java Heap OOMs by nvzm123 · Pull Request #141 · rapidsai/cuvs-lucene

nvzm123 · 2026-04-29T04:26:26Z

GPU OOM fix:
Replaced CuVSMatrix.deviceBuilder with CuVSMatrix.hostBuilder. deviceBuilder eagerly loaded the full dataset onto the GPU. Using hostBuilder leaves it to CAGRA to decide how best to stream data from host memory to the GPU.
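To make the memory pressure concrete, here is a small back-of-the-envelope sketch. The corpus size (100 million vectors of 768 float32 dimensions) is a hypothetical figure chosen for illustration, not a number from this PR, and the class and method names are made up for the example.

```java
/** Hypothetical helper: estimates the size of a dense float32 dataset. */
final class DatasetFootprint {

  /** Bytes needed to hold `count` float32 vectors of `dims` dimensions contiguously. */
  static long float32Bytes(long count, long dims) {
    // multiplyExact throws ArithmeticException instead of silently overflowing
    return Math.multiplyExact(Math.multiplyExact(count, dims), (long) Float.BYTES);
  }

  public static void main(String[] args) {
    // Hypothetical corpus: 100M vectors x 768 dimensions
    long bytes = float32Bytes(100_000_000L, 768L);
    double gib = bytes / (1024.0 * 1024.0 * 1024.0);
    System.out.println(bytes + " bytes, about " + Math.round(gib) + " GiB");
  }
}
```

A dataset of that shape needs roughly 286 GiB in one piece, which is more than a single GPU typically offers, so copying it to the device up front cannot succeed regardless of how the index is built afterwards.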

Java Heap OOM fix:
Stream subsets of the data during HNSW graph construction instead of loading the full dataset onto the Java heap.
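The streaming idea can be sketched roughly as follows. This is not the code from this PR: `VectorSource` and `forEachBatch` are hypothetical names, and the sketch only shows the general pattern of keeping at most one batch of vectors on the heap at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch: visit vectors in fixed-size batches instead of all at once. */
final class BatchedVectorStreaming {

  /** Hypothetical stand-in for anything that can produce one vector per ordinal. */
  interface VectorSource {
    int size();

    float[] vectorAt(int ordinal);
  }

  /** Feeds the source to `sink` in batches of at most `batchSize`; returns the batch count. */
  static int forEachBatch(VectorSource source, int batchSize, Consumer<List<float[]>> sink) {
    if (batchSize <= 0) {
      throw new IllegalArgumentException("batchSize must be positive");
    }
    int batches = 0;
    List<float[]> batch = new ArrayList<>(batchSize);
    for (int ord = 0; ord < source.size(); ord++) {
      batch.add(source.vectorAt(ord));
      if (batch.size() == batchSize) {
        sink.accept(batch);
        // Start a fresh list so the previous batch becomes eligible for GC
        batch = new ArrayList<>(batchSize);
        batches++;
      }
    }
    if (!batch.isEmpty()) {
      // Flush the final, possibly smaller, batch
      sink.accept(batch);
      batches++;
    }
    return batches;
  }
}
```

With this shape, peak heap usage is bounded by `batchSize * dimensions * 4` bytes plus whatever the sink retains, rather than by the size of the whole dataset.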

Note: this PR also includes the commit from cuvs-lucene-137 (#137)


copy-pr-bot · 2026-04-29T04:26:29Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

nvzm123 · 2026-04-29T04:27:07Z

cc: @narangvivek10 @kshitizgupta21 @singhmanas1 @cjnolet

narangvivek10 · 2026-05-22T16:34:44Z

+      int size,
+      int dimensions,
+      CuVSMatrix adjacencyListMatrix,
+      CuVSMatrix vectorDataset,


We have a createMultiLayerHnswGraph method just above this one that differs mainly in this parameter. Is there a strong reason to keep a second copy of nearly identical code? Could we instead modify the method above, changing its vectors parameter to vectorDataset?
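One way to read this suggestion, sketched under assumptions: both call sites could pass their data through a small row-access abstraction, so only one method body is needed. `VectorRows`, `ofHeapArray`, `ofAccessor`, and `largestNormOrdinal` are hypothetical names, and the shared method is a trivial stand-in for the real graph-building logic.

```java
import java.util.function.IntFunction;

/** Hypothetical sketch: one shared method body for two kinds of vector storage. */
final class UnifiedVectorAccessSketch {

  /** Hypothetical row-access abstraction that both storage kinds can satisfy. */
  interface VectorRows {
    int size();

    float[] row(int ordinal);
  }

  /** Adapter for vectors already held in a heap array. */
  static VectorRows ofHeapArray(float[][] vectors) {
    return new VectorRows() {
      public int size() { return vectors.length; }
      public float[] row(int ordinal) { return vectors[ordinal]; }
    };
  }

  /** Adapter for a dataset that hands out rows on demand (for example, matrix-backed). */
  static VectorRows ofAccessor(int size, IntFunction<float[]> accessor) {
    return new VectorRows() {
      public int size() { return size; }
      public float[] row(int ordinal) { return accessor.apply(ordinal); }
    };
  }

  /** Stand-in for the shared logic: ordinal of the row with the largest squared norm. */
  static int largestNormOrdinal(VectorRows rows) {
    int best = -1;
    double bestNorm = -1.0;
    for (int i = 0; i < rows.size(); i++) {
      double norm = 0.0;
      for (float v : rows.row(i)) {
        norm += (double) v * v;
      }
      if (norm > bestNorm) {
        bestNorm = norm;
        best = i;
      }
    }
    return best;
  }
}
```

The existing caller would wrap its array with `ofHeapArray`, the new caller would wrap its dataset with `ofAccessor`, and the duplicated method would go away.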

narangvivek10 · 2026-05-22T16:42:05Z

+        builder.addVector(mergedVectors.vectorValue(it.index()));
+      }
+      CuVSHostMatrix dataset = builder.build();
+      writeFieldInternal(fieldInfo, dataset, size);


We can derive size from dataset, so is there a strong reason to pass size as a separate primitive int parameter? Will there be a situation where the size parameter's value differs from dataset.size()?
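A minimal sketch of deriving the count inside the method instead of passing it alongside the dataset. The `Dataset` interface here is a hypothetical stand-in, and this sketch assumes its `size()` returns a `long`; `writeField` is a placeholder for the real writer, not its actual signature.

```java
/** Hypothetical sketch: derive the row count from the dataset itself. */
final class DeriveSizeSketch {

  /** Hypothetical stand-in for a dataset type that knows its own row count. */
  interface Dataset {
    long size();
  }

  /** Narrows the row count to int, failing loudly if it does not fit. */
  static int rowCount(Dataset dataset) {
    // toIntExact throws ArithmeticException rather than truncating silently
    return Math.toIntExact(dataset.size());
  }

  /** Placeholder for the writer: takes only the dataset, so the two values cannot disagree. */
  static String writeField(String fieldName, Dataset dataset) {
    int size = rowCount(dataset);
    return fieldName + ":" + size;
  }
}
```

Dropping the extra parameter removes the possibility of a caller passing a count that does not match the dataset.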

narangvivek10 · 2026-05-22T16:49:16Z

+   * @param size      number of vectors in the dataset
+   * @throws IOException
+   */
+  private void writeFieldInternal(FieldInfo fieldInfo, CuVSHostMatrix dataset, int size)


A similar observation applies to the duplicated writeFieldInternal method: it is almost the same code as the one above, with mainly one parameter changed from vectors to dataset.

narangvivek10 · 2026-05-22T16:51:57Z

-            CuVSMatrix.DataType.FLOAT);
-
-    // Add vectors one by one - builder copies directly to device memory
+        CuVSMatrix.hostBuilder( // was: CuVSMatrix.deviceBuilder(resources, ...


Since this is part of Utils, maybe we could let the caller of this method choose between host and device, in case that is needed anywhere in the future?
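A rough sketch of what a caller-selectable placement could look like. Everything here is hypothetical: `Placement`, `Builder`, and `newBuilder` are illustrative names, and the returned builders are string-describing stand-ins rather than the library's real builder types.

```java
/** Hypothetical sketch: let the caller pick host or device placement. */
final class MatrixPlacementSketch {

  /** Where the matrix data should be staged. */
  enum Placement { HOST, DEVICE }

  /** Hypothetical stand-in; real code would return the library's builder type. */
  interface Builder {
    String describe();
  }

  /** Single factory that dispatches on the requested placement. */
  static Builder newBuilder(Placement placement, long rows, long cols) {
    switch (placement) {
      case HOST:
        return () -> "host builder " + rows + "x" + cols;
      case DEVICE:
        return () -> "device builder " + rows + "x" + cols;
      default:
        throw new IllegalArgumentException("unknown placement: " + placement);
    }
  }

  /** Convenience overload that keeps host placement as the default. */
  static Builder newBuilder(long rows, long cols) {
    return newBuilder(Placement.HOST, rows, cols);
  }
}
```

Keeping a default overload means existing callers continue to get host placement, while a future caller that needs device memory can opt in explicitly.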

EC2 Default User added 2 commits April 28, 2026 21:13

cuvs-lucene-137: increase max-number of HNSW layers in CAGRA_HNSW's c…

2449b4f

…onversion from CAGRA to HNSW

cuvs-lucene__139: This code allows us to construct the HNSW graph on GPU

7c5c448

without loading the full set of data on the Java Heap, but instead allows us to stream the set of data to the Java Heap.

nvzm123 requested a review from a team as a code owner April 29, 2026 04:26

nvzm123 mentioned this pull request Apr 29, 2026

Avoid GPU OOMs and also Java-heap OOMs #139

Open

narangvivek10 assigned nvzm123 Apr 30, 2026

narangvivek10 added the improvement (Improves an existing functionality) and non-breaking (Introduces a non-breaking change) labels Apr 30, 2026

nvzm123 mentioned this pull request May 5, 2026

look into observed weak ivf-pq indexing-time performance #143

Open

narangvivek10 reviewed May 22, 2026

nvzm123 wants to merge 2 commits into rapidsai:main from nvzm123:cuvslucene-139__zackm


2 participants
