Cuvs-Lucene-139: Fix GPU OOMs && Java Heap OOMs #141
…onversion from CAGRA to HNSW
without loading the full set of data on the Java Heap, but instead allows us to stream the set of data to the Java Heap.
    int size,
    int dimensions,
    CuVSMatrix adjacencyListMatrix,
    CuVSMatrix vectorDataset,
We have a createMultiLayerHnswGraph method just above this one that differs mainly in this parameter. Is there a strong reason to keep a near-identical copy of the method? Could we instead modify the method above, changing its vectors parameter to vectorDataset?
      builder.addVector(mergedVectors.vectorValue(it.index()));
    }
    CuVSHostMatrix dataset = builder.build();
    writeFieldInternal(fieldInfo, dataset, size);
We can derive size from dataset, so is there a strong reason to pass size as a separate primitive int parameter? Will there be a situation where the value of the size parameter differs from dataset.size()?
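To make the suggestion concrete, here is a minimal sketch of deriving the count from the dataset and keeping the three-argument form only as a checked delegate. HostMatrix is a hypothetical stand-in for CuVSHostMatrix, the String field name and return value are placeholders for illustration, and the assumption that size() returns a long is not confirmed by this PR.

```java
final class SizeFromDataset {
    /** Hypothetical stand-in for CuVSHostMatrix; only the row count matters here. */
    interface HostMatrix {
        long size(); // assumption: the real matrix reports its row count as a long
    }

    /** Preferred form: the vector count is read from the dataset itself. */
    static String writeFieldInternal(String fieldName, HostMatrix dataset) {
        // toIntExact fails loudly instead of silently truncating a huge dataset.
        int size = Math.toIntExact(dataset.size());
        return fieldName + ":" + size; // placeholder for the real write logic
    }

    /** Legacy form: rejects a caller-supplied size that disagrees with the dataset. */
    static String writeFieldInternal(String fieldName, HostMatrix dataset, int size) {
        if (size != dataset.size()) {
            throw new IllegalArgumentException(
                "size " + size + " does not match dataset.size() " + dataset.size());
        }
        return writeFieldInternal(fieldName, dataset);
    }
}
```

If the two values can never legitimately differ, the three-argument overload could be dropped entirely; the check above only matters while both forms coexist.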
     * @param size number of vectors in the dataset
     * @throws IOException
     */
    private void writeFieldInternal(FieldInfo fieldInfo, CuVSHostMatrix dataset, int size)
Similar observation here: this writeFieldInternal method duplicates the one above with almost the same code, the main difference being one parameter changed from vectors to dataset.
            CuVSMatrix.DataType.FLOAT);

    // Add vectors one by one - builder copies directly to device memory
    CuVSMatrix.hostBuilder( // was: CuVSMatrix.deviceBuilder(resources, ...
Since this is part of Utils, maybe we could let the caller of this method choose between host and device, in case that is needed anywhere in the future?
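One possible shape for that, sketched with hypothetical names only: MemoryKind, Builder, and RecordingBuilder below are illustrative stand-ins and not part of cuvs-java. In the real utility, the factory method would presumably dispatch to CuVSMatrix.hostBuilder or CuVSMatrix.deviceBuilder; their exact arguments are not shown in this excerpt, so they are left out here.

```java
final class MatrixBuilderChoice {
    /** Hypothetical selector for where the matrix should be materialized. */
    enum MemoryKind { HOST, DEVICE }

    /** Hypothetical stand-in for a matrix builder. */
    interface Builder {
        void addVector(float[] vector);
        String describe();
    }

    /** Records only the memory kind and row count, for illustration. */
    static final class RecordingBuilder implements Builder {
        private final MemoryKind kind;
        private int rows;

        RecordingBuilder(MemoryKind kind) { this.kind = kind; }

        @Override public void addVector(float[] vector) { rows++; }

        @Override public String describe() { return kind + ":" + rows; }
    }

    /** In real code this is where the host or device builder would be selected. */
    static Builder newBuilder(MemoryKind kind) {
        return new RecordingBuilder(kind);
    }

    /** Caller picks the memory kind explicitly. */
    static Builder fill(float[][] vectors, MemoryKind kind) {
        Builder builder = newBuilder(kind);
        for (float[] vector : vectors) {
            builder.addVector(vector);
        }
        return builder;
    }

    /** Default overload keeps this PR's behavior: build on the host. */
    static Builder fill(float[][] vectors) {
        return fill(vectors, MemoryKind.HOST);
    }
}
```

Keeping HOST as the default means existing callers get the OOM-safe path without changes, while a future caller with a small dataset could still opt into DEVICE.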
GPU OOM fix:
Replaced CuVSMatrix.deviceBuilder with CuVSMatrix.hostBuilder. deviceBuilder was eagerly loading the full dataset onto the GPU; using hostBuilder leaves it to CAGRA to decide how to stream data from host memory to the GPU optimally.
Java Heap OOM fix:
Stream subsets of data during HNSW graph construction instead of trying to load the full dataset onto the Java Heap.
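The general idea of streaming subsets can be sketched as below. This is not the code from this PR: VectorSource, BatchConsumer, and forEachBatch are hypothetical names, and the sketch only shows how a fixed-size buffer bounds the number of vectors referenced on the heap at any one time.

```java
final class BatchedVectorStream {
    /** Hypothetical random-access source, e.g. backed by off-heap or on-disk storage. */
    interface VectorSource {
        int size();
        float[] vector(int ordinal);
    }

    /** Receives one batch at a time; only the first `count` buffer entries are valid. */
    interface BatchConsumer {
        void accept(int firstOrdinal, float[][] batch, int count);
    }

    /**
     * Walks the source in fixed-size batches, reusing one buffer so that at most
     * batchSize vectors are referenced by this method at once.
     * Returns the number of batches delivered.
     */
    static int forEachBatch(VectorSource source, int batchSize, BatchConsumer consumer) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        float[][] buffer = new float[batchSize][];
        int total = source.size();
        int batches = 0;
        for (int start = 0; start < total; start += batchSize) {
            int count = Math.min(batchSize, total - start);
            for (int i = 0; i < count; i++) {
                buffer[i] = source.vector(start + i); // overwrite the previous batch
            }
            consumer.accept(start, buffer, count);
            batches++;
        }
        return batches;
    }
}
```

The consumer must finish with a batch before returning, since the buffer is overwritten on the next iteration; a consumer that needs to retain vectors would have to copy them.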
Note: this PR also includes the commit from cuvs-lucene-137 (#137)