perf(patchwork): TBB parallel_for in classic Patchwork (+1.73× Hz); Patchwork++ stays sequential after benchmark by LimHyungTae · Pull Request #95 · url-kaist/patchwork-plusplus

LimHyungTae · 2026-05-21T07:48:40Z

What

Part B of the TBB / common-library refactor agreed in #89 (Part A is #94). Adds tbb::parallel_for to the classic Patchwork main loop and benchmarks Patchwork++ under the same pattern. Patchwork++ does not become parallel — measurement showed it hurts at every thread count.

Headline numbers (KITTI seq 00, i7-12700, 24 logical cores)

Configuration	Median ms / frame	Median Hz
Classic Patchwork single-thread (taskset -c 0)	8.31	120.4
Classic Patchwork parallel (this PR)	4.81	207.8 (+1.73×)
Patchwork++ single-thread	9.02	110.9
Patchwork++ — TBB at 2 threads (rejected)	10.73	93.2
Patchwork++ — TBB at 8 threads (rejected)	11.05	90.5
Patchwork++ — TBB at 24 threads (rejected)	14.55	68.7

Per-patch work in Patchwork++ is too small (~14 µs avg) and dominated by short-lived std::vector / Eigen::Matrix allocations inside R-VPF + R-GPF. Concurrent malloc serialises on the heap allocator, and TBB scheduler overhead exceeds the parallelisation benefit at every thread count. Single-threaded Patchwork++ already runs at ~2× the paper's reported 55 Hz on i7-7700K, so there is no real-time motivation to parallelise. The decision is documented in a long-form code comment in estimateGround.

Numerical equivalence — passes

KITTI 00-10 full sweep (23,201 frames), Patchwork++ paper protocol:

Configuration	Master	This PR	ΔF1
`--method patchwork` × pp proto	96.0172	96.0172	0 (byte-identical)
`--method patchworkpp` × pp proto	96.2918	96.2919	+0.0001 (float noise)

Both well under the ±0.05 macro budget set in the refactor plan.

Build changes

cpp/CMakeLists.txt: find_package(TBB CONFIG REQUIRED) (falls back to MODULE), with an error message listing the install command for Ubuntu / macOS / Windows if TBB is missing.
cpp/patchwork/CMakeLists.txt: links TBB::tbb.
cpp/patchworkpp/CMakeLists.txt: unchanged (no TBB use).
The CI workflows already install libtbb-dev on Ubuntu via apt; macOS / Windows wheel builds should pick TBB up automatically — I'll watch the CI matrix here.

Adds

python/examples/bench_hz.py — small per-frame timing harness that reports median / mean / p95 / p99 ms and Hz from getTimeTaken(). Useful for future perf work.

Code changes summary

cpp/patchwork/src/patchwork.cpp: replace the triple-nested for loop with a flat tbb::parallel_for over a pre-built patch index list; serial reduction phase walks the outcome buffer in deterministic order.
cpp/patchwork/include/patchwork/patchwork.h: extract_initial_seeds and perform_regionwise_segmentation are now const (they were already pure functions of their arguments + params_; needed so the TBB worker can call them).
cpp/patchworkpp/src/patchworkpp.cpp: unchanged behaviour, but the main loop now has the "why we stayed sequential" comment.
Various small const / namespace tidying in the patchworkpp headers from the TBB prototyping that's harmless to keep.

Test plan

pip install -v ./python/ builds.
Smoke test seq 00 50 frames matches v1.3.1 baseline exactly.
Equivalence sweep KITTI 00-10 (table above).
Hz benchmark (table above), scaling test on Patchwork++ (1/2/4/8/16/24 threads, all in commit message).
pre-commit run clean.

Follow-up

After this lands, plan is to cut v1.4.0 (per the refactor plan agreed earlier):

v1.4.0 = Part A (refactor(common): extract shared PointXYZ / PCAFeature / plane-fit math (Part A of TBB work) #94) + Part B (this PR).
Bump python/pyproject.toml + cpp/CMakeLists.txt to 1.4.0.
CHANGELOG entry noting the +1.73× classic-Patchwork speedup and the no-op Patchwork++ result.
Cut the GitHub Release to trigger PyPI publish.

Cross-refs: #89, #94.

The classic Patchwork main loop in cpp/patchwork/src/patchwork.cpp is now parallelised with a single tbb::parallel_for over all (zone, ring, sector) patches, mirroring the upstream ~/git/patchwork pattern. Per-patch work (sort + plane fit + GLE) runs in worker threads; a serial reduction then accumulates ground / nonground in deterministic order. median per-frame time on KITTI seq 00 (i7-12700, 24 logical cores): single-thread (taskset -c 0) 8.31 ms (120.4 Hz) parallel (default TBB scheduler) 4.81 ms (207.8 Hz) → 1.73x speedup Patchwork++ (cpp/patchworkpp/src/patchworkpp.cpp) was also benchmarked under the same TBB pattern at 1 / 2 / 4 / 8 / 16 / 24 threads. Every multi-thread configuration was SLOWER than single-thread on KITTI: 1 thread → 111 Hz (baseline) 2 threads → 93 Hz 4 threads → 91 Hz 8 threads → 91 Hz 16 threads → 85 Hz 24 threads → 69 Hz The per-patch work in Patchwork++ is small (~14 µs avg) and dominated by short-lived std::vector / Eigen::Matrix allocations inside R-VPF and R-GPF. Concurrent malloc serialises on the heap allocator and TBB scheduler overhead exceeds the parallelisation benefit at every thread count. Single-threaded Patchwork++ already runs at ~2x the paper's reported 55 Hz on i7-7700K, so there is no real-time motivation to parallelise. Patchwork++ remains single-threaded; the estimateGround loop has a long-form comment explaining why. Numerical equivalence verified on KITTI 00-10 full sweep (23,201 frames), both methods, Patchwork++ paper protocol: patchwork x pp protocol pre: 96.0172 post: 96.0172 (byte-identical) patchwork++ x pp protocol pre: 96.2918 post: 96.2919 (Δ +0.0001) Both within the ±0.05 budget set in the refactor plan. Build: - Adds find_package(TBB CONFIG/MODULE REQUIRED) to cpp/CMakeLists.txt with a helpful error message listing the install command for Ubuntu / macOS / Windows. - cpp/patchwork/CMakeLists.txt links TBB::tbb; cpp/patchworkpp/ does not (since it does not use TBB). Also adds: - python/examples/bench_hz.py — small per-frame timing harness that reports median / mean / p95 / p99 ms and Hz from getTimeTaken(). - A `const` qualifier on PatchWork::extract_initial_seeds and PatchWork::perform_regionwise_segmentation since neither writes to *this any more — needed so the TBB worker can call them.

CI runners (cpp_api on Ubuntu/macOS/Windows, python_package and cibuildwheel jobs) do not install libtbb-dev, so PR #95's FATAL_ERROR on missing TBB broke 11 of 18 checks. Switch to a soft find: - find_package(TBB CONFIG/MODULE QUIET) — sets TBB_FOUND or not. - When TBB_FOUND, classic Patchwork links TBB::tbb and gets a PATCHWORK_HAS_TBB compile define. - cpp/patchwork/src/patchwork.cpp now guards both the #include and the parallel_for site with #ifdef PATCHWORK_HAS_TBB and falls back to a sequential loop over the same patch-index list when TBB is unavailable. - cpp/CMakeLists.txt prints a STATUS message either way so users know whether they got the 1.73x speedup or not. Tested locally: - With libtbb-dev installed: "-- TBB found — classic Patchwork will use tbb::parallel_for." → builds + runs, matches v1.3.1 numbers. - With -DCMAKE_DISABLE_FIND_PACKAGE_TBB=ON: "-- TBB not found — classic Patchwork falls back to a sequential loop." → builds clean, no TBB symbols required. Patchwork++ remains untouched (issue #96).

Minor bump for the common-library refactor (#94) and the optional TBB parallelisation of classic Patchwork (#95). pypatchworkpp.patchwork gains 1.73x (120 -> 208 Hz on KITTI seq 00, i7-12700). pypatchworkpp.patchworkpp stays sequential after measurement — see #96 for the why. TBB is an optional build dependency: missing TBB falls back to a sequential loop with a CMake STATUS message; CI/wheel builds keep working without libtbb-dev installed. Bumps: - python/pyproject.toml 1.3.1 -> 1.4.0 - cpp/CMakeLists.txt 1.3.1 -> 1.4.0 CHANGELOG.md updated with the full v1.4.0 entry. See #94, #95, #96.

LimHyungTae mentioned this pull request May 21, 2026

Why Patchwork++ has no TBB parallelisation mode #96

Closed

LimHyungTae merged commit 95f9557 into master May 21, 2026
17 checks passed

LimHyungTae deleted the refactor/tbb-parallel branch May 21, 2026 08:23

LimHyungTae mentioned this pull request May 21, 2026

chore(release): v1.4.0 #97

Merged

LimHyungTae mentioned this pull request May 23, 2026

perf(patchworkpp): kill per-patch heap traffic in R-VPF / R-GPF (+14.8% Hz) #100

Merged

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(patchwork): TBB parallel_for in classic Patchwork (+1.73× Hz); Patchwork++ stays sequential after benchmark#95

perf(patchwork): TBB parallel_for in classic Patchwork (+1.73× Hz); Patchwork++ stays sequential after benchmark#95
LimHyungTae merged 2 commits into
masterfrom
refactor/tbb-parallel

LimHyungTae commented May 21, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

LimHyungTae commented May 21, 2026

What

Headline numbers (KITTI seq 00, i7-12700, 24 logical cores)

Numerical equivalence — passes

Build changes

Adds

Code changes summary

Test plan

Follow-up

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant