Commit 7dd7bc1

Implement CUDA multipass for knn > GPU_MAX_SELECTION_K (#7381)
KNN search on the GPU fails silently when k is larger than the macro GPU_MAX_SELECTION_K, producing garbage output (all zeros, indices larger than the total number of points, or even negative indices). GPU_MAX_SELECTION_K is 2048 if CUDA_VERSION > 9000 and 1024 otherwise. The CPU KNN search has no such limit. To support larger k on the GPU without changing GPU_MAX_SELECTION_K, this commit implements a multipass algorithm that splits the KNN search into batches, each of size < GPU_MAX_SELECTION_K.
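
The batching idea can be illustrated with a minimal CPU-only sketch. This is not the CUDA code from this commit. The names `MAX_SELECTION_K`, `select_k`, and `multipass_knn` are hypothetical, the cap is set to a tiny value for readability, and the sketch assumes a pass may select up to the cap (the commit message asks for batches below it). Each pass selects the next-nearest batch while skipping indices that earlier passes already returned.

```python
import heapq

# Hypothetical stand-in for GPU_MAX_SELECTION_K (tiny on purpose).
MAX_SELECTION_K = 4


def select_k(dists, k, exclude):
    """Single pass: the k smallest (distance, index) pairs not in `exclude`.

    Mimics a selection primitive that cannot return more than
    MAX_SELECTION_K results in one call.
    """
    if k > MAX_SELECTION_K:
        raise ValueError("k exceeds the single-pass selection limit")
    candidates = ((d, i) for i, d in enumerate(dists) if i not in exclude)
    return heapq.nsmallest(k, candidates)


def multipass_knn(points, query, k):
    """Return (indices, squared distances) of the k nearest points to `query`.

    Runs as many capped passes as needed, so k may exceed MAX_SELECTION_K.
    """
    # Squared Euclidean distance from the query to every point.
    dists = [sum((a - b) ** 2 for a, b in zip(p, query)) for p in points]
    k = min(k, len(points))  # cannot return more neighbors than points

    result = []    # (distance, index) pairs in ascending order
    taken = set()  # indices already selected by earlier passes
    while len(result) < k:
        batch = min(MAX_SELECTION_K, k - len(result))
        picked = select_k(dists, batch, taken)
        result.extend(picked)
        taken.update(i for _, i in picked)

    return [i for _, i in result], [d for d, _ in result]


if __name__ == "__main__":
    pts = [(float(i),) for i in range(10)]
    # k = 7 exceeds the cap of 4, so two passes are needed (4 + 3).
    print(multipass_knn(pts, (0.0,), 7))
```

Because every pass returns the nearest remaining candidates, concatenating the batches yields the neighbors in ascending distance order without a final merge. A real GPU implementation would likely track a distance threshold rather than an exclusion set; that choice is outside what this sketch shows.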
1 parent 55a51af commit 7dd7bc1

4 files changed

Lines changed: 433 additions & 49 deletions

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -69,6 +69,7 @@
 - Fix advanced indexing bug with sliced boolean masks on CUDA devices (PR #7340)
 - Fix logic for adding -allow-unsupported-compiler to nvcc (PR #7337)
 - Fix linker error "library limit of 65535 objects exceeded" with Ninja generator on MSVC (PR #7335)
+- Implement CUDA multipass for KNN > `GPU_MAX_SELECTION_K` (PR #7381)
 - Download tarballs instead of Git repos for "3rdparty/uvatlas" (PR #7371)
 - macOS x86_64 not longer supported, only macOS arm64 is supported.
 - Python 3.13+3.14 support
