Farthest Point Sampling by mnabian · Pull Request #1696 · NVIDIA/physicsnemo

mnabian · 2026-06-05T01:02:44Z

PhysicsNeMo Pull Request

Description

Adds farthest-point sampling (FPS) — a core point-cloud primitive the library was missing — as a FunctionSpec functional under nn/functional/geometry/, with a pure-PyTorch baseline and a fused, Warp-accelerated CUDA backend.

case	torch (ms)	warp (ms)	speedup	match
small-p1024-d3-k128	5.384	0.221	24.31×	ok
medium-p4096-d3-k512	21.491	1.771	12.13×	ok
large-p16384-d3-k1024	43.121	15.273	2.82×	ok
batched-b4-p4096-d3-k512	21.407	1.772	12.08×	ok

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.
The CHANGELOG.md is up to date with these changes.
An issue is linked to this pull request.
If I am implementing a new model or modifying any existing model, I have followed the Models Implementation Coding Standards.

Dependencies

Review Process

All PRs are reviewed by the PhysicsNeMo team before merging.

Depending on which files are changed, GitHub may automatically assign a maintainer for review.

We are also testing AI-based code review tools (e.g., Greptile), which may add automated comments with a confidence score.
This score reflects the AI’s assessment of merge readiness and is not a qualitative judgment of your work, nor is
it an indication that the PR will be accepted / rejected.

AI-generated feedback should be reviewed critically for usefulness.
You are not required to respond to every AI comment, but they are intended to help both authors and reviewers.
Please react to Greptile comments with 👍 or 👎 to provide feedback on their accuracy.

copy-pr-bot · 2026-06-05T01:02:47Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

greptile-apps · 2026-06-05T01:11:35Z

Greptile Summary

This PR adds farthest_point_sampling to physicsnemo.nn.functional, implementing greedy FPS for point clouds with a pure-PyTorch CPU fallback and a single-launch fused Warp CUDA kernel. It follows the existing FunctionSpec dispatch pattern established by MeshPoissonDiskSample and MeshToVoxelFraction.

Warp kernel (kernels.py): one block per cloud with strided lane ownership of min_dist avoids cross-lane aliasing; block-wide tile_max/tile_min reductions correctly implement argmax with smallest-index tie-breaking matching torch.argmax. Algorithm is correct for all valid num_samples ∈ [1, N].
Dispatch (farthest_point_sampling.py): auto-selects warp on CUDA, torch on CPU; explicit implementation overrides are validated against availability; torch.library.custom_op + register_fake enable torch.compile compatibility for the warp path.
Test suite (test_farthest_point_sampling.py): good coverage including known-answer collinear/outlier cases, batching, backend parity, opcheck, and fullgraph=True compile verification.

Important Files Changed

Filename	Overview
physicsnemo/nn/functional/geometry/farthest_point_sampling/_warp_impl.py	Warp-accelerated FPS using @torch.library.custom_op for torch.compile compatibility; follows the established pattern of unconditional warp import and wp.init() at module level.
physicsnemo/nn/functional/geometry/farthest_point_sampling/kernels.py	Single-launch fused Warp kernel using tile_max/tile_min for block-wide argmax; algorithm is correct — strided lane ownership avoids cross-lane aliasing on min_dist, and the sentinel `num_points` correctly handles tie-breaking by smallest index.
physicsnemo/nn/functional/geometry/farthest_point_sampling/_torch_impl.py	Pure-PyTorch FPS implementation; correct vectorized algorithm — inf-initialized min_dist with torch.minimum update prevents re-selection; Python for-loop will cause graph breaks under torch.compile (expected, documented).
physicsnemo/nn/functional/geometry/farthest_point_sampling/farthest_point_sampling.py	FunctionSpec dispatch class with warp/torch backends; auto-dispatch correctly routes CUDA tensors to warp and CPU tensors to torch; compare_forward uses sorted set comparison which is weaker than order comparison for a deterministic algorithm.
physicsnemo/nn/functional/geometry/farthest_point_sampling/utils.py	Input validation correctly normalizes 2D/3D inputs, enforces num_samples bounds, and returns was_unbatched flag.
test/nn/functional/geometry/test_farthest_point_sampling.py	Comprehensive test suite with known-answer, parity, batching, opcheck, and compile tests; unconditional top-level import of private _warp_impl module means any warp load failure prevents collection of all tests including pure-torch ones.
benchmarks/physicsnemo/nn/functional/registry.py	Correctly registers FarthestPointSampling in the benchmark FUNCTIONAL_SPECS tuple.
physicsnemo/nn/functional/geometry/farthest_point_sampling/init.py	Clean package init, exports both class and functional form.

Comments Outside Diff (2)

test/nn/functional/geometry/test_farthest_point_sampling.py, line 640-642 (link)

Unconditional top-level import of private _warp_impl module

fps_warp_op is imported unconditionally at module collection time and is only used in test_fps_opcheck. If Warp fails to load for any reason (CUDA toolkit mismatch, compilation error), this import failure prevents pytest from collecting the entire test file — including tests like test_fps_higher_dim and test_fps_error_handling that only exercise the torch backend and don't need warp at all. Since torch.library.opcheck requires the raw registered op, consider moving this import inside test_fps_opcheck so a warp load failure only skips that one test.
physicsnemo/nn/functional/geometry/farthest_point_sampling/farthest_point_sampling.py, line 459-468 (link)

compare_forward sorted-set comparison is weaker than order comparison for a deterministic algorithm

FPS is fully deterministic for tie-free inputs and both backends are documented to traverse in the same order. Sorting before assert_close means this method would silently pass even if the two backends produce identical index sets but in different orders — which would indicate a tie-breaking divergence. Since the benchmark framework relies on compare_forward to validate backend parity, a direct (un-sorted) comparison would be a stricter and more informative check for tie-free inputs. If sorted comparison is intentional for robustness with near-tie float64 inputs, that's worth a comment clarifying the trade-off.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

_{Reviews (1): Last reviewed commit: "add FPS" | Re-trigger Greptile}

mnabian · 2026-06-05T21:09:47Z

/blossom-ci

add FPS

1e37765

mnabian requested a review from loliverhennigh as a code owner June 5, 2026 01:02

mnabian self-assigned this Jun 5, 2026

Merge branch 'main' into fps

9700874

mnabian mentioned this pull request Jun 5, 2026

Add AeroJEPA model + SuperWing tutorial recipe (experimental) #1690

Open

6 tasks

address greptile comments

b83d161

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Farthest Point Sampling#1696

Farthest Point Sampling#1696
mnabian wants to merge 3 commits into
NVIDIA:mainfrom
mnabian:fps

mnabian commented Jun 5, 2026 •

edited

Loading

Uh oh!

copy-pr-bot Bot commented Jun 5, 2026

Uh oh!

greptile-apps Bot commented Jun 5, 2026 •

edited

Loading

Comments Outside Diff (2)

Uh oh!

mnabian commented Jun 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

mnabian commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

PhysicsNeMo Pull Request

Description

Checklist

Dependencies

Review Process

Uh oh!

copy-pr-bot Bot commented Jun 5, 2026

Uh oh!

greptile-apps Bot commented Jun 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Important Files Changed

Comments Outside Diff (2)

Uh oh!

mnabian commented Jun 5, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

mnabian commented Jun 5, 2026 •

edited

Loading

greptile-apps Bot commented Jun 5, 2026 •

edited

Loading