perf(patchworkpp): kill per-patch heap traffic in R-VPF / R-GPF (+14.8% Hz)#100
Merged
Merged
Conversation
…8% Hz) Three changes inside PatchWorkpp::extract_piecewiseground and PatchWorkpp::estimate_plane that together take KITTI seq 00 from 97.5 Hz to 111.9 Hz (median, 24-core i7-12700, 2900 timed frames): * estimate_plane: replace MatrixX3f eigen_ground / centered / centered.adjoint() * centered with a single-pass scalar accumulation of mean and 9 cross-products, then build the 3x3 cov on the stack. No more per-call Eigen heap allocations. * extract_piecewiseground: promote src_wo_verticals and src_tmp to reused instance scratch members. Per-patch malloc pressure on the glibc heap (which was serialising the loop, see issue #96) goes away after the first few patches because vector::clear() retains capacity. * estimateGround main loop: `auto& zone` instead of `auto zone` for the ConcentricZoneModel_[zone_idx] read. Avoids a deep-copy of the full 3-level nested vector per outer iteration; the in-place std::sort is safe because each (zone, ring, sector) patch is read once and the CZM is flushed at the top of every estimateGround call. Also (smaller wins, kept for cleanliness): * JacobiSVD<Matrix3f> -> SelfAdjointEigenSolver::computeDirect on the 3x3 PSD covariance in both cpp/common/src/plane_fit.cpp and the patchworkpp in-place estimate_plane. Closed-form, no Jacobi iterations. singular_values_ is repacked descending so every consumer (linearity_ / planarity_ in common, flatness_thr index (2) in patchwork classic, ground_flatness=minCoeff() and line_variable=sv(0)/sv(1) in patchwork++) keeps the same convention bit-for-bit. * const& on addCloud's 'add' parameter, RevertCandidate loop variables, and the temporal_ground_revert / calc_point_to_plane_d / calc_mean_stdev signatures. Numerical equivalence verified end-to-end on KITTI seq 00: * patchwork (pw protocol): P/R/F1 unchanged to 0.01 (bit-identical). * patchwork++ (pp protocol): F1 96.62 -> 96.63 (delta 0.01, well within the ±0.05 macro budget). Algebraic identity of JacobiSVD vs eigh on 500 real KITTI patch covariances confirmed to FP precision for all derived scalars (linearity_, planarity_, ground_flatness, line_variable, normal_ up to sign). Patchwork classic is unchanged on the perf side (TBB-bound, SVD is sub-µs/patch after parallel_for); the win is concentrated in Patchwork++ where these allocations dominated the profile.
LimHyungTae
added a commit
that referenced
this pull request
May 23, 2026
* chore(release): v1.4.1 Patch bump for the per-patch heap-traffic refactor (#100). pypatchworkpp.patchworkpp gains 14.8% Hz (97.5 -> 111.9 Hz on KITTI seq 00, i7-12700), driven by killing short-lived vector<PointXYZ> / Eigen::Matrix allocations in R-VPF + R-GPF. Closes part of #96. Numerical equivalence verified end-to-end on KITTI seq 00 (4541 frames): F1 delta 0.00 for patchwork classic (bit-identical), +0.01 for patchwork++ (within the ±0.05 macro budget). Bumps: - python/pyproject.toml 1.4.0 -> 1.4.1 - cpp/CMakeLists.txt 1.4.0 -> 1.4.1 CHANGELOG.md updated with the full v1.4.1 entry. See #100. * chore(release): mdformat CHANGELOG.md * chore(release): mdformat 0.7.9 escapes
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Cuts per-frame Patchwork++ time on KITTI seq 00 from 10.26 ms → 8.94 ms (97.5 Hz → 111.9 Hz, +14.8% Hz, median of 3 runs, i7-12700). Patchwork classic is unaffected (already TBB-bound, see #95).
The win comes from killing short-lived heap allocations in the per-patch hot loop, exactly the bottleneck identified in #96.
What changed
High-impact (this is where the 14.8% comes from):
PatchWorkpp::estimate_plane: dropEigen::MatrixX3f eigen_ground+centered+centered.adjoint() * centered. Replace with a single-pass scalar accumulation of mean and 9 cross-products, then build the 3×3 cov on the stack. No more per-call Eigen heap allocations.PatchWorkpp::extract_piecewiseground: promotesrc_wo_verticalsandsrc_tmpto reused instance scratch members.vector::clear()keeps capacity, so per-patch malloc pressure on the glibc heap (which was serializing the loop per Why Patchwork++ has no TBB parallelisation mode #96) drops away after the first few patches.PatchWorkpp::estimateGroundmain loop:auto& zoneinstead ofauto zoneforConcentricZoneModel_[zone_idx]. Avoids a deep-copy of the full 3-level nested vector per outer iteration. Safe: the in-placestd::sortmutates patches that are read once and then flushed at the top of everyestimateGroundcall.Lower-impact, kept for cleanliness:
JacobiSVD<Matrix3f>→SelfAdjointEigenSolver::computeDirectfor the 3×3 PSD covariance in bothcpp/common/src/plane_fit.cppand the in-placePatchWorkpp::estimate_plane. Closed-form, no Jacobi iterations.singular_values_is repacked descending so every consumer (linearity/planarity incommon,flatness_thrindex(2)in patchwork classic,ground_flatness=minCoeff()andline_variable=sv(0)/sv(1)in patchwork++) keeps the same convention bit-for-bit.const&onaddCloud'saddparameter,RevertCandidateloop vars, and thetemporal_ground_revert/calc_point_to_plane_d/calc_mean_stdevsignatures.Benchmarks
KITTI seq 00, 2900 timed frames, median of 3 runs:
Patchwork classic is unchanged on the perf side: TBB parallel_for already amortizes allocations across 24 cores and SVD is sub-µs/patch.
Numerical equivalence
Full KITTI seq 00 (4541 frames):
Algebraic identity of JacobiSVD vs eigh verified on 500 real KITTI patch covariances:
normal_(up to sign),singular_values_,linearity_,planarity_,ground_flatness, andline_variableall match to FP precision. All deltas well within the ±0.05 macro / ±0.10 per-seq budget.Refs #96.
Test plan
pip install -v ./python/builds clean on Linux (gcc 13)evaluate_semantickitti.py --method patchworkpp --eval_protocol patchworkpp --seqs 00matches baseline within ±0.05 F1evaluate_semantickitti.py --method patchwork --eval_protocol patchwork --seqs 00bit-identical to baselinebench_hz.py --method patchworkpp --seq 00: +14.8% Hz, stable across 3 runsbench_hz.py --method patchwork --seq 00: within run-to-run noise of baseline