perf(patchwork): TBB parallel_for in classic Patchwork (+1.73× Hz); Patchwork++ stays sequential after benchmark (#95)
* perf(patchwork): TBB parallel_for over patches in classic Patchwork
The classic Patchwork main loop in cpp/patchwork/src/patchwork.cpp is
now parallelised with a single tbb::parallel_for over all
(zone, ring, sector) patches, mirroring the upstream ~/git/patchwork
pattern. Per-patch work (sort + plane fit + GLE) runs in worker
threads; a serial reduction then accumulates ground / nonground in
deterministic order.
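
The pattern described above can be sketched as follows. This is a minimal illustration, not the code in cpp/patchwork/src/patchwork.cpp: `PatchResult`, `process_patch`, and `segment` are hypothetical names, the per-patch work is a placeholder (sort plus a height threshold instead of plane fit + GLE), and each element of `patches` stands for one flattened (zone, ring, sector) patch. The `PATCHWORK_HAS_TBB` guard is the one introduced by the follow-up commit below, used here so the sketch also builds without TBB.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

#ifdef PATCHWORK_HAS_TBB
#include <tbb/blocked_range.h>
#include <tbb/parallel_for.h>
#endif

// Hypothetical per-patch output; the real code stores point clouds.
struct PatchResult {
  std::vector<float> ground;
  std::vector<float> nonground;
};

// Placeholder per-patch work: sort heights, then label values at or
// below the threshold as ground. Takes its input by value so the sort
// touches only a private copy.
PatchResult process_patch(std::vector<float> heights, float threshold) {
  std::sort(heights.begin(), heights.end());
  PatchResult res;
  for (float h : heights) {
    (h <= threshold ? res.ground : res.nonground).push_back(h);
  }
  return res;
}

PatchResult segment(const std::vector<std::vector<float>>& patches,
                    float threshold) {
  // One pre-sized slot per patch: workers never write to shared state.
  std::vector<PatchResult> results(patches.size());
  auto body = [&](std::size_t i) {
    results[i] = process_patch(patches[i], threshold);
  };

#ifdef PATCHWORK_HAS_TBB
  tbb::parallel_for(
      tbb::blocked_range<std::size_t>(0, patches.size()),
      [&](const tbb::blocked_range<std::size_t>& r) {
        for (std::size_t i = r.begin(); i != r.end(); ++i) body(i);
      });
#else
  for (std::size_t i = 0; i < patches.size(); ++i) body(i);
#endif

  // Serial reduction in patch-index order, so the output does not
  // depend on how the scheduler ordered the workers.
  PatchResult total;
  for (const PatchResult& r : results) {
    assert(std::is_sorted(r.ground.begin(), r.ground.end()));
    total.ground.insert(total.ground.end(), r.ground.begin(), r.ground.end());
    total.nonground.insert(total.nonground.end(), r.nonground.begin(),
                           r.nonground.end());
  }
  return total;
}
```

Because every worker writes only to its own `results[i]`, no locking is needed, and the concatenation order is fixed by the patch index rather than by thread timing.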

Median per-frame time on KITTI seq 00 (i7-12700, 24 logical cores):
  single-thread (taskset -c 0)       8.31 ms  (120.4 Hz)
  parallel (default TBB scheduler)   4.81 ms  (207.8 Hz)
  → 1.73x speedup

Patchwork++ (cpp/patchworkpp/src/patchworkpp.cpp) was also benchmarked
under the same TBB pattern at 1 / 2 / 4 / 8 / 16 / 24 threads. Every
multi-thread configuration was SLOWER than single-thread on KITTI:
   1 thread  → 111 Hz (baseline)
   2 threads →  93 Hz
   4 threads →  91 Hz
   8 threads →  91 Hz
  16 threads →  85 Hz
  24 threads →  69 Hz

The per-patch work in Patchwork++ is small (~14 µs avg) and dominated
by short-lived std::vector / Eigen::Matrix allocations inside R-VPF
and R-GPF. Concurrent malloc calls contend on the heap allocator, and
TBB scheduler overhead exceeds the parallelisation benefit at every
thread count. Single-threaded Patchwork++ already runs at ~2x the
paper's reported 55 Hz on i7-7700K, so there is no real-time
motivation to parallelise. Patchwork++ remains single-threaded; the
estimateGround loop has a long-form comment explaining why.
Numerical equivalence verified on KITTI 00-10 full sweep (23,201
frames), both methods, Patchwork++ paper protocol:
  patchwork   x pp protocol   pre: 96.0172   post: 96.0172   (byte-identical)
  patchwork++ x pp protocol   pre: 96.2918   post: 96.2919   (Δ +0.0001)
Both within the ±0.05 budget set in the refactor plan.
Build:
- Adds find_package(TBB CONFIG/MODULE REQUIRED) to cpp/CMakeLists.txt
with a helpful error message listing the install command for
Ubuntu / macOS / Windows.
- cpp/patchwork/CMakeLists.txt links TBB::tbb; cpp/patchworkpp/ does
not (since it does not use TBB).
Also adds:
- python/examples/bench_hz.py — small per-frame timing harness that
reports median / mean / p95 / p99 ms and Hz from getTimeTaken().
- A `const` qualifier on PatchWork::extract_initial_seeds and
PatchWork::perform_regionwise_segmentation since neither writes to
*this any more — needed so the TBB worker can call them.
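
A small sketch of why the `const` qualifier matters, assuming the worker reaches the object through a const path (for example a const reference or a const member function). `Segmenter`, `extract_seeds`, and `count_seeds` are hypothetical names; only the const-qualification pattern mirrors the change described above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical class, not the repository's PatchWork.
class Segmenter {
 public:
  explicit Segmenter(float seed_margin) : seed_margin_(seed_margin) {
    assert(seed_margin >= 0.0f);
  }

  // const: reads configuration only and writes to a caller-owned
  // buffer, so concurrent calls on the same object do not race on it.
  void extract_seeds(const std::vector<float>& sorted_heights,
                     std::vector<float>& seeds) const {
    seeds.clear();
    if (sorted_heights.empty()) return;
    const float limit = sorted_heights.front() + seed_margin_;
    for (float h : sorted_heights) {
      if (h < limit) seeds.push_back(h);
    }
  }

 private:
  float seed_margin_;
};

// A worker holding only a const reference can call const members;
// without the qualifier on extract_seeds this would not compile.
std::size_t count_seeds(const Segmenter& seg,
                        const std::vector<float>& sorted_heights) {
  std::vector<float> seeds;  // per-call scratch, private to this worker
  seg.extract_seeds(sorted_heights, seeds);
  return seeds.size();
}
```

The qualifier also documents the thread-safety contract: any later change that makes the function write to a member will fail to compile instead of silently introducing a data race.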
* build: make TBB optional in classic Patchwork
CI runners (cpp_api on Ubuntu/macOS/Windows, python_package and
cibuildwheel jobs) do not install libtbb-dev, so PR #95's FATAL_ERROR
on missing TBB broke 11 of 18 checks. Switch to a soft find:
- find_package(TBB CONFIG/MODULE QUIET), which sets TBB_FOUND only
when TBB is present and no longer aborts the configure step.
- When TBB_FOUND, classic Patchwork links TBB::tbb and gets a
PATCHWORK_HAS_TBB compile define.
- cpp/patchwork/src/patchwork.cpp now guards both the #include and
the parallel_for site with #ifdef PATCHWORK_HAS_TBB and falls
back to a sequential loop over the same patch-index list when
TBB is unavailable.
- cpp/CMakeLists.txt prints a STATUS message either way so users
know whether they got the 1.73x speedup or not.
Tested locally:
- With libtbb-dev installed: "-- TBB found — classic Patchwork will
use tbb::parallel_for." → builds + runs, matches v1.3.1 numbers.
- With -DCMAKE_DISABLE_FIND_PACKAGE_TBB=ON: "-- TBB not found —
classic Patchwork falls back to a sequential loop." → builds
clean, no TBB symbols required.
Patchwork++ remains untouched (issue #96).