This PR adds farthest_point_sampling to physicsnemo.nn.functional, implementing greedy FPS for point clouds with a pure-PyTorch CPU fallback and a single-launch fused Warp CUDA kernel. It follows the existing FunctionSpec dispatch pattern established by MeshPoissonDiskSample and MeshToVoxelFraction.
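A minimal pure-PyTorch sketch of greedy FPS pins down the semantics the review relies on. It is not the PR's torch backend. The function name `fps_reference`, the `(B, N, D)` input layout, and starting from index 0 are assumptions. Smallest-index tie-breaking comes from `torch.argmax`, which returns the first maximal index.

```python
import torch


def fps_reference(points: torch.Tensor, num_samples: int) -> torch.Tensor:
    """Greedy farthest-point sampling (hypothetical reference, not the PR's code).

    points: (B, N, D) float tensor. Returns (B, num_samples) int64 indices.
    Assumes the first sample of every cloud is index 0.
    """
    if points.dim() != 3:
        raise ValueError("points must have shape (B, N, D)")
    batch, num_points, _ = points.shape
    if not 1 <= num_samples <= num_points:
        raise ValueError("num_samples must be in [1, N]")
    device = points.device
    idx = torch.zeros(batch, num_samples, dtype=torch.long, device=device)
    # Squared distance from each point to its nearest already-selected sample.
    min_dist = torch.full((batch, num_points), float("inf"), dtype=points.dtype, device=device)
    rows = torch.arange(batch, device=device)
    current = torch.zeros(batch, dtype=torch.long, device=device)
    for k in range(num_samples):
        idx[:, k] = current
        selected = points[rows, current].unsqueeze(1)  # (B, 1, D)
        dist = ((points - selected) ** 2).sum(dim=-1)  # (B, N)
        min_dist = torch.minimum(min_dist, dist)
        # torch.argmax returns the first maximal index: smallest-index tie-breaking.
        current = torch.argmax(min_dist, dim=-1)
    return idx
```

On the collinear cloud `[0, 1, 2, 10]` (shape `(1, 4, 1)`), this picks index 0, then the outlier at index 3, then 2, then 1, the kind of known-answer case the test suite is described as covering.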
Warp kernel (kernels.py): the kernel launches one block per cloud, and strided lane ownership of min_dist avoids cross-lane aliasing; block-wide tile_max/tile_min reductions correctly implement argmax with smallest-index tie-breaking, matching torch.argmax. The algorithm is correct for all valid num_samples ∈ [1, N].
Dispatch (farthest_point_sampling.py): auto-selects warp on CUDA, torch on CPU; explicit implementation overrides are validated against availability; torch.library.custom_op + register_fake enable torch.compile compatibility for the warp path.
Test suite (test_farthest_point_sampling.py): good coverage, including known-answer collinear/outlier cases, batching, backend parity, opcheck, and fullgraph=True compile verification.
Warp-accelerated FPS using @torch.library.custom_op for torch.compile compatibility; follows the established pattern of unconditional warp import and wp.init() at module level.
Single-launch fused Warp kernel using tile_max/tile_min for block-wide argmax; algorithm is correct — strided lane ownership avoids cross-lane aliasing on min_dist, and the sentinel num_points correctly handles tie-breaking by smallest index.
FunctionSpec dispatch class with warp/torch backends; auto-dispatch correctly routes CUDA tensors to warp and CPU tensors to torch; compare_forward uses a sorted-set comparison, which is weaker than an ordered comparison for a deterministic algorithm.
Comprehensive test suite with known-answer, parity, batching, opcheck, and compile tests; unconditional top-level import of private _warp_impl module means any warp load failure prevents collection of all tests including pure-torch ones.
benchmarks/physicsnemo/nn/functional/registry.py
Correctly registers FarthestPointSampling in the benchmark FUNCTIONAL_SPECS tuple.
Clean package init; exports both the class and the functional form.
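The tie-breaking argument in the kernel notes can be checked with a small pure-Python emulation of one plausible reading of it: strided lane ownership, a block-wide max (the tile_max step), then a block-wide min over per-lane index proposals (the tile_min step), with `num_points` as the "no candidate" sentinel. This illustrates the reasoning only; it is not Warp code, and the function name is hypothetical.

```python
from typing import List, Sequence


def block_argmax_strided(values: Sequence[float], num_lanes: int) -> int:
    """Emulate a block-wide argmax with smallest-index tie-breaking.

    Lane t owns indices t, t + num_lanes, t + 2 * num_lanes, ... so no two
    lanes touch the same element (no cross-lane aliasing).
    """
    n = len(values)
    if n == 0 or num_lanes < 1:
        raise ValueError("need at least one value and one lane")
    # Each lane reduces its own strided slice; empty lanes contribute -inf.
    lane_max: List[float] = [
        max((values[i] for i in range(t, n, num_lanes)), default=float("-inf"))
        for t in range(num_lanes)
    ]
    block_max = max(lane_max)  # tile_max analogue
    # Each lane proposes its first owned index holding block_max, else the sentinel n.
    proposals = [
        next((i for i in range(t, n, num_lanes) if values[i] == block_max), n)
        for t in range(num_lanes)
    ]
    return min(proposals)  # tile_min analogue: globally smallest maximal index
```

Because each lane scans its slice in increasing index order and the final reduction is a min, the result agrees with `values.index(max(values))`, i.e. the first-occurrence rule of `torch.argmax`.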
Comments Outside Diff (2)
test/nn/functional/geometry/test_farthest_point_sampling.py, lines 640-642
Unconditional top-level import of private _warp_impl module
fps_warp_op is imported unconditionally at module collection time and is only used in test_fps_opcheck. If Warp fails to load for any reason (CUDA toolkit mismatch, compilation error), this import failure prevents pytest from collecting the entire test file — including tests like test_fps_higher_dim and test_fps_error_handling that only exercise the torch backend and don't need warp at all. Since torch.library.opcheck requires the raw registered op, consider moving this import inside test_fps_opcheck so a warp load failure only skips that one test.
physicsnemo/nn/functional/geometry/farthest_point_sampling/farthest_point_sampling.py, lines 459-468
compare_forward sorted-set comparison is weaker than order comparison for a deterministic algorithm
FPS is fully deterministic for tie-free inputs, and both backends are documented to traverse points in the same order. Sorting before assert_close means this method would silently pass if the two backends selected the same index set in different orders, which would indicate a tie-breaking divergence. Since the benchmark framework relies on compare_forward to validate backend parity, a direct (unsorted) comparison would be a stricter and more informative check for tie-free inputs. If the sorted comparison is intentional for robustness with near-tie float64 inputs, a comment clarifying the trade-off would help.
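A sketch of the stricter check this comment proposes, assuming int64 index tensors of shape `(B, num_samples)`. The function name and the `ordered` flag are hypothetical and are not the benchmark framework's `compare_forward` signature. For integer tensors, `torch.testing.assert_close` defaults to `rtol=0, atol=0`, i.e. exact equality.

```python
import torch


def compare_fps_indices(idx_a: torch.Tensor, idx_b: torch.Tensor, ordered: bool = True) -> None:
    """Raise AssertionError if two FPS index tensors disagree.

    ordered=True compares the selection order exactly (strict; for tie-free inputs).
    ordered=False compares only the set of selected indices per cloud.
    """
    if not ordered:
        idx_a = torch.sort(idx_a, dim=-1).values
        idx_b = torch.sort(idx_b, dim=-1).values
    # Integer dtypes: exact comparison.
    torch.testing.assert_close(idx_a, idx_b)
```

With `idx_a = [[0, 3, 1]]` and `idx_b = [[0, 1, 3]]`, the sorted comparison passes while the ordered one raises, which is exactly the tie-breaking divergence the comment says would otherwise go unnoticed.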
PhysicsNeMo Pull Request
Description
Adds farthest-point sampling (FPS) — a core point-cloud primitive the library was missing — as a FunctionSpec functional under nn/functional/geometry/, with a pure-PyTorch baseline and a fused, Warp-accelerated CUDA backend.
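Greedy FPS, as both backends are described in the review, can be written as the recurrence below. Here $\mathbf{p}_0, \dots, \mathbf{p}_{N-1}$ are the points of one cloud and $M$ is num_samples. The start $i_0 = 0$ is an assumption, since the description does not say which point is selected first. The arg max breaks ties toward the smallest index, as the review states for both backends.

```latex
\begin{aligned}
i_0 &= 0, \qquad d_0(j) = \lVert \mathbf{p}_j - \mathbf{p}_{i_0} \rVert_2^2, \\
i_k &= \min \Big\{\, j : d_{k-1}(j) = \max_{0 \le j' < N} d_{k-1}(j') \,\Big\}, \\
d_k(j) &= \min\!\big( d_{k-1}(j),\ \lVert \mathbf{p}_j - \mathbf{p}_{i_k} \rVert_2^2 \big), \qquad k = 1, \dots, M-1.
\end{aligned}
```

Squared distances give the same arg max as Euclidean distances and avoid a square root per point per step.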
Checklist
Dependencies
Review Process
All PRs are reviewed by the PhysicsNeMo team before merging.
Depending on which files are changed, GitHub may automatically assign a maintainer for review.
We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score.
This score reflects the AI’s assessment of merge readiness and is not a qualitative judgment of your work, nor is it an indication that the PR will be accepted or rejected.
AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment, but they are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.