Skip to content

Commit 30a99c6

Browse files
committed
perf(patchworkpp): kill per-patch heap traffic in R-VPF / R-GPF (+14.8% Hz)
Three changes inside PatchWorkpp::extract_piecewiseground and PatchWorkpp::estimate_plane that together take KITTI seq 00 from 97.5 Hz to 111.9 Hz (median, 24-core i7-12700, 2900 timed frames): * estimate_plane: replace MatrixX3f eigen_ground / centered / centered.adjoint() * centered with a single-pass scalar accumulation of mean and 9 cross-products, then build the 3x3 cov on the stack. No more per-call Eigen heap allocations. * extract_piecewiseground: promote src_wo_verticals and src_tmp to reused instance scratch members. Per-patch malloc pressure on the glibc heap (which was serialising the loop, see issue #96) goes away after the first few patches because vector::clear() retains capacity. * estimateGround main loop: `auto& zone` instead of `auto zone` for the ConcentricZoneModel_[zone_idx] read. Avoids a deep-copy of the full 3-level nested vector per outer iteration; the in-place std::sort is safe because each (zone, ring, sector) patch is read once and the CZM is flushed at the top of every estimateGround call. Also (smaller wins, kept for cleanliness): * JacobiSVD<Matrix3f> -> SelfAdjointEigenSolver::computeDirect on the 3x3 PSD covariance in both cpp/common/src/plane_fit.cpp and the patchworkpp in-place estimate_plane. Closed-form, no Jacobi iterations. singular_values_ is repacked descending so every consumer (linearity_ / planarity_ in common, flatness_thr index (2) in patchwork classic, ground_flatness=minCoeff() and line_variable=sv(0)/sv(1) in patchwork++) keeps the same convention bit-for-bit. * const& on addCloud's 'add' parameter, RevertCandidate loop variables, and the temporal_ground_revert / calc_point_to_plane_d / calc_mean_stdev signatures. Numerical equivalence verified end-to-end on KITTI seq 00: * patchwork (pw protocol): P/R/F1 unchanged to 0.01 (bit-identical). * patchwork++ (pp protocol): F1 96.62 -> 96.63 (delta 0.01, well within the ±0.05 macro budget). Algebraic identity of JacobiSVD vs eigh on 500 real KITTI patch covariances confirmed to FP precision for all derived scalars (linearity_, planarity_, ground_flatness, line_variable, normal_ up to sign). Patchwork classic is unchanged on the perf side (TBB-bound, SVD is sub-µs/patch after parallel_for); the win is concentrated in Patchwork++ where these allocations dominated the profile.
1 parent cb8e449 commit 30a99c6

3 files changed

Lines changed: 104 additions & 56 deletions

File tree

cpp/common/src/plane_fit.cpp

Lines changed: 18 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -19,16 +19,26 @@ void estimate_plane(const std::vector<PointXYZ>& seeds, PCAFeature& out, float t
1919
const Eigen::Matrix3f cov =
2020
(centered.adjoint() * centered) / std::max<float>(1.0f, static_cast<float>(pts.rows() - 1));
2121

22-
Eigen::JacobiSVD<Eigen::Matrix3f> svd(cov, Eigen::ComputeFullU);
23-
Eigen::Vector3f normal = svd.matrixU().col(2);
22+
// Closed-form 3x3 symmetric eigendecomposition. Covariance is PSD,
23+
// so eigenvalues == singular values; eigenvectors come back ascending,
24+
// so col(0) is the plane normal (smallest variance direction) and
25+
// col(2) is the principal axis. Repack singular_values_ in descending
26+
// order so the (s0 >= s1 >= s2) convention used by linearity / planarity
27+
// and by downstream patchwork++ consumers (line_variable, ground_flatness)
28+
// is preserved bit-for-bit relative to the previous JacobiSVD output.
29+
Eigen::SelfAdjointEigenSolver<Eigen::Matrix3f> eig;
30+
eig.computeDirect(cov, Eigen::ComputeEigenvectors);
31+
const Eigen::Vector3f evals = eig.eigenvalues().cwiseMax(0.0f);
32+
33+
Eigen::Vector3f normal = eig.eigenvectors().col(0);
2434
if (normal(2) < 0.0f) normal = -normal;
2535

26-
out.normal_ = normal;
27-
out.principal_ = svd.matrixU().col(0);
28-
out.singular_values_ = svd.singularValues();
29-
out.mean_ = mean;
30-
out.d_ = -normal.dot(mean);
31-
out.th_dist_d_ = th_dist - out.d_;
36+
out.normal_ = normal;
37+
out.principal_ = eig.eigenvectors().col(2);
38+
out.singular_values_ << evals(2), evals(1), evals(0);
39+
out.mean_ = mean;
40+
out.d_ = -normal.dot(mean);
41+
out.th_dist_d_ = th_dist - out.d_;
3242

3343
const float s0 = out.singular_values_(0);
3444
const float s1 = out.singular_values_(1);

cpp/patchworkpp/include/patchwork/patchworkpp.h

Lines changed: 12 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -195,27 +195,34 @@ class PatchWorkpp {
195195
vector<PointXYZ> ground_pc_, non_ground_pc_;
196196
vector<PointXYZ> regionwise_ground_, regionwise_nonground_;
197197

198+
// Reused scratch buffers for R-VPF / R-GPF inside extract_piecewiseground.
199+
// Promoting these to members avoids per-patch heap (de)allocation, which
200+
// dominated the per-frame profile (~14 µs avg patch, hundreds of patches
201+
// per cloud) — see issue #96.
202+
vector<PointXYZ> src_wo_verticals_;
203+
vector<PointXYZ> src_tmp_;
204+
198205
vector<PointXYZ> cloud_ground_, cloud_nonground_;
199206

200207
vector<PointXYZ> centers_, normals_;
201208

202209
Eigen::MatrixX3f toEigenCloud(const vector<PointXYZ> &cloud);
203210
Eigen::VectorXi toIndices(const vector<PointXYZ> &cloud);
204211

205-
void addCloud(vector<PointXYZ> &cloud, vector<PointXYZ> &add);
212+
void addCloud(vector<PointXYZ> &cloud, const vector<PointXYZ> &add);
206213

207214
void flush_patches(std::vector<Zone> &czm);
208215

209216
void pc2czm(const Eigen::MatrixXf &src, std::vector<Zone> &czm);
210217

211218
void reflected_noise_removal(Eigen::MatrixXf &cloud_in);
212219

213-
void temporal_ground_revert(std::vector<double> ring_flatness,
214-
std::vector<patchwork::RevertCandidate> candidates,
220+
void temporal_ground_revert(const std::vector<double> &ring_flatness,
221+
const std::vector<patchwork::RevertCandidate> &candidates,
215222
int concentric_idx);
216223

217-
double calc_point_to_plane_d(PointXYZ p, Eigen::VectorXf normal, double d);
218-
void calc_mean_stdev(std::vector<double> vec, double &mean, double &stdev);
224+
double calc_point_to_plane_d(const PointXYZ &p, const Eigen::VectorXf &normal, double d);
225+
void calc_mean_stdev(const std::vector<double> &vec, double &mean, double &stdev);
219226

220227
void update_elevation_thr();
221228
void update_flatness_thr();

cpp/patchworkpp/src/patchworkpp.cpp

Lines changed: 74 additions & 43 deletions
Original file line numberDiff line numberDiff line change
@@ -26,7 +26,7 @@ Eigen::VectorXi PatchWorkpp::toIndices(const vector<PointXYZ> &cloud) {
2626
return dst;
2727
}
2828

29-
void PatchWorkpp::addCloud(vector<PointXYZ> &cloud, vector<PointXYZ> &add) {
29+
void PatchWorkpp::addCloud(vector<PointXYZ> &cloud, const vector<PointXYZ> &add) {
3030
cloud.insert(cloud.end(), add.begin(), add.end());
3131
}
3232

@@ -49,34 +49,60 @@ void PatchWorkpp::flush_patches(vector<Zone> &czm) {
4949
void PatchWorkpp::estimate_plane(const vector<PointXYZ> &ground) {
5050
if (ground.empty()) return;
5151

52-
Eigen::MatrixX3f eigen_ground(ground.size(), 3);
53-
int j = 0;
54-
for (auto &p : ground) {
55-
eigen_ground.row(j++) << p.x, p.y, p.z;
52+
// Single-pass accumulation of mean + cross-products. Avoids the
53+
// per-call Eigen::MatrixX3f / centered / adjoint-product heap
54+
// allocations that dominated the profile. Cov is computed from the
55+
// raw second moments via cov_ij = (sum p_i p_j - N * mean_i * mean_j) / (N - 1).
56+
const size_t n = ground.size();
57+
double sx = 0.0, sy = 0.0, sz = 0.0;
58+
double sxx = 0.0, syy = 0.0, szz = 0.0;
59+
double sxy = 0.0, sxz = 0.0, syz = 0.0;
60+
for (const auto &p : ground) {
61+
const double x = p.x, y = p.y, z = p.z;
62+
sx += x;
63+
sy += y;
64+
sz += z;
65+
sxx += x * x;
66+
syy += y * y;
67+
szz += z * z;
68+
sxy += x * y;
69+
sxz += x * z;
70+
syz += y * z;
5671
}
57-
Eigen::MatrixX3f centered = eigen_ground.rowwise() - eigen_ground.colwise().mean();
58-
Eigen::MatrixX3f cov =
59-
(centered.adjoint() * centered) / static_cast<double>(eigen_ground.rows() - 1);
60-
61-
pc_mean_.resize(3);
62-
pc_mean_ << eigen_ground.colwise().mean()(0), eigen_ground.colwise().mean()(1),
63-
eigen_ground.colwise().mean()(2);
64-
65-
Eigen::JacobiSVD<Eigen::MatrixX3f> svd(cov, Eigen::DecompositionOptions::ComputeFullU);
66-
singular_values_ = svd.singularValues();
67-
68-
// use the least singular vector as normal
69-
normal_ = (svd.matrixU().col(2));
70-
71-
if (normal_(2) < 0) {
72-
for (int i = 0; i < 3; i++) normal_(i) *= -1;
73-
}
74-
75-
// mean ground seeds value
76-
Eigen::Vector3f seeds_mean = pc_mean_.head<3>();
72+
const double inv_n = 1.0 / static_cast<double>(n);
73+
const double mx = sx * inv_n, my = sy * inv_n, mz = sz * inv_n;
74+
const double denom = (n > 1) ? static_cast<double>(n - 1) : 1.0;
75+
const double inv_d = 1.0 / denom;
76+
77+
Eigen::Matrix3f cov;
78+
cov(0, 0) = static_cast<float>((sxx - n * mx * mx) * inv_d);
79+
cov(1, 1) = static_cast<float>((syy - n * my * my) * inv_d);
80+
cov(2, 2) = static_cast<float>((szz - n * mz * mz) * inv_d);
81+
cov(0, 1) = cov(1, 0) = static_cast<float>((sxy - n * mx * my) * inv_d);
82+
cov(0, 2) = cov(2, 0) = static_cast<float>((sxz - n * mx * mz) * inv_d);
83+
cov(1, 2) = cov(2, 1) = static_cast<float>((syz - n * my * mz) * inv_d);
84+
85+
pc_mean_ << static_cast<float>(mx), static_cast<float>(my), static_cast<float>(mz);
86+
87+
// Closed-form 3x3 symmetric eigendecomposition. Covariance is PSD,
88+
// so eigenvalues == singular values; SelfAdjointEigenSolver returns
89+
// them ascending. col(0) is the plane normal direction (smallest
90+
// variance); singular_values_ is repacked in descending order so the
91+
// downstream consumers in estimateGround (line_variable =
92+
// singular_values_(0)/singular_values_(1), ground_flatness =
93+
// singular_values_.minCoeff()) keep the same semantics as the old
94+
// JacobiSVD output.
95+
Eigen::SelfAdjointEigenSolver<Eigen::Matrix3f> eig;
96+
eig.computeDirect(cov, Eigen::ComputeEigenvectors);
97+
const Eigen::Vector3f evals = eig.eigenvalues().cwiseMax(0.0f);
98+
99+
normal_ = eig.eigenvectors().col(0);
100+
if (normal_(2) < 0.0f) normal_ = -normal_;
101+
102+
singular_values_ << evals(2), evals(1), evals(0);
77103

78104
// according to normal.T*[x,y,z] = -d
79-
d_ = -(normal_.transpose() * seeds_mean)(0, 0);
105+
d_ = -normal_.dot(pc_mean_);
80106
}
81107

82108
void PatchWorkpp::extract_initial_seeds(const int zone_idx,
@@ -201,7 +227,7 @@ void PatchWorkpp::estimateGround(Eigen::MatrixXf cloud_in) {
201227
// benefit from TBB because it has no R-VPF and fewer allocations
202228
// per patch.
203229
for (int zone_idx = 0; zone_idx < params_.num_zones; ++zone_idx) {
204-
auto zone = ConcentricZoneModel_[zone_idx];
230+
auto &zone = ConcentricZoneModel_[zone_idx];
205231

206232
for (int ring_idx = 0; ring_idx < params_.num_rings_each_zone[zone_idx]; ++ring_idx) {
207233
const int num_sectors = params_.num_sectors_each_zone[zone_idx];
@@ -275,7 +301,7 @@ void PatchWorkpp::estimateGround(Eigen::MatrixXf cloud_in) {
275301
if (params_.enable_TGR) {
276302
temporal_ground_revert(ringwise_flatness, candidates, concentric_idx);
277303
} else {
278-
for (auto candidate : candidates) {
304+
for (auto &candidate : candidates) {
279305
addCloud(cloud_nonground_, candidate.regionwise_ground);
280306
}
281307
}
@@ -391,8 +417,8 @@ void PatchWorkpp::reflected_noise_removal(Eigen::MatrixXf &cloud_in) {
391417
cout << "PatchWorkpp::reflected_noise_removal() - Number of Noises : " << cnt << endl;
392418
}
393419

394-
void PatchWorkpp::temporal_ground_revert(std::vector<double> ring_flatness,
395-
std::vector<patchwork::RevertCandidate> candidates,
420+
void PatchWorkpp::temporal_ground_revert(const std::vector<double> &ring_flatness,
421+
const std::vector<patchwork::RevertCandidate> &candidates,
396422
int concentric_idx) {
397423
if (params_.verbose)
398424
std::cout << "\033[1;34m"
@@ -408,7 +434,7 @@ void PatchWorkpp::temporal_ground_revert(std::vector<double> ring_flatness,
408434
<< std::endl;
409435
}
410436

411-
for (auto candidate : candidates) {
437+
for (const auto &candidate : candidates) {
412438
// Debug
413439
if (params_.verbose) {
414440
cout << "\033[1;33m" << candidate.sector_idx << "th flat_sector_candidate"
@@ -470,26 +496,29 @@ void PatchWorkpp::extract_piecewiseground(const int zone_idx,
470496

471497
// 1. Region-wise Vertical Plane Fitting (R-VPF)
472498
// : removes potential vertical plane under the ground plane
473-
vector<PointXYZ> src_wo_verticals;
474-
src_wo_verticals = src;
499+
// src_wo_verticals_ and src_tmp_ are reused instance scratch buffers
500+
// (see header) — `clear()` keeps capacity, so the per-patch malloc
501+
// pressure on the glibc heap goes away after the first few patches.
502+
src_wo_verticals_.clear();
503+
src_wo_verticals_.insert(src_wo_verticals_.end(), src.begin(), src.end());
475504

476505
if (params_.enable_RVPF) {
477506
for (int i = 0; i < params_.num_iter; i++) {
478-
extract_initial_seeds(zone_idx, src_wo_verticals, ground_pc_, params_.th_seeds_v);
507+
extract_initial_seeds(zone_idx, src_wo_verticals_, ground_pc_, params_.th_seeds_v);
479508
estimate_plane(ground_pc_);
480509

481510
if (zone_idx == 0 && normal_(2) < params_.uprightness_thr) {
482-
vector<PointXYZ> src_tmp;
483-
src_tmp = src_wo_verticals;
484-
src_wo_verticals.clear();
511+
src_tmp_.clear();
512+
src_tmp_.swap(src_wo_verticals_); // src_tmp_ now holds the old src_wo_verticals_;
513+
// src_wo_verticals_ is empty (capacity retained).
485514

486-
for (auto point : src_tmp) {
515+
for (const auto &point : src_tmp_) {
487516
double distance = calc_point_to_plane_d(point, normal_, d_);
488517

489518
if (abs(distance) < params_.th_dist_v) {
490519
non_ground_dst.push_back(point);
491520
} else {
492-
src_wo_verticals.push_back(point);
521+
src_wo_verticals_.push_back(point);
493522
}
494523
}
495524
} else
@@ -500,13 +529,13 @@ void PatchWorkpp::extract_piecewiseground(const int zone_idx,
500529
// 2. Region-wise Ground Plane Fitting (R-GPF)
501530
// : fits the ground plane
502531

503-
extract_initial_seeds(zone_idx, src_wo_verticals, ground_pc_);
532+
extract_initial_seeds(zone_idx, src_wo_verticals_, ground_pc_);
504533
estimate_plane(ground_pc_);
505534

506535
for (int i = 0; i < params_.num_iter; i++) {
507536
ground_pc_.clear();
508537

509-
for (auto point : src_wo_verticals) {
538+
for (const auto &point : src_wo_verticals_) {
510539
double distance = calc_point_to_plane_d(point, normal_, d_);
511540

512541
if (i < params_.num_iter - 1) {
@@ -538,11 +567,13 @@ void PatchWorkpp::extract_piecewiseground(const int zone_idx,
538567
}
539568
}
540569

541-
double PatchWorkpp::calc_point_to_plane_d(PointXYZ p, Eigen::VectorXf normal, double d) {
570+
double PatchWorkpp::calc_point_to_plane_d(const PointXYZ &p,
571+
const Eigen::VectorXf &normal,
572+
double d) {
542573
return normal(0) * p.x + normal(1) * p.y + normal(2) * p.z + d;
543574
}
544575

545-
void PatchWorkpp::calc_mean_stdev(std::vector<double> vec, double &mean, double &stdev) {
576+
void PatchWorkpp::calc_mean_stdev(const std::vector<double> &vec, double &mean, double &stdev) {
546577
if (vec.size() <= 1) return;
547578

548579
mean = std::accumulate(vec.begin(), vec.end(), 0.0) / vec.size();

0 commit comments

Comments
 (0)