
perf: wave 3 follow-up improvements #21

Merged
Autoparallel merged 3 commits into main from codex/perf-wave3-followup
Mar 7, 2026

Conversation

@Autoparallel
Member

Summary

  • supersedes #20 ("perf: more improvements") with the same follow-up work, rebased cleanly onto main
  • adds bench_diffgeo coverage to the perf smoke/deep GitHub workflows, including fallback reporting when the baseline branch predates that benchmark target
  • specializes DEC curvature traversal for direct SoA/CSR storage
  • updates the Wave 3 notes with the continuation profiler pass and rejected experiments
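For readers unfamiliar with the storage layout named in the third bullet, here is a minimal sketch of a direct traversal over CSR adjacency with attributes kept in separate flat arrays (SoA). It is illustrative only: the names `row_offsets`, `col_indices`, `edge_weights`, `vertex_values`, and the weighted neighbor sum itself are assumptions, and this is not the DEC curvature kernel changed in this PR.

```python
def csr_neighbor_sums(row_offsets, col_indices, edge_weights, vertex_values):
    """For each vertex i, sum edge_weights[k] * vertex_values[j] over its neighbors j.

    CSR layout: the neighbors of vertex i live in
    col_indices[row_offsets[i] : row_offsets[i + 1]].
    SoA layout: weights and vertex values are separate flat arrays
    indexed by edge slot k and vertex id j respectively.
    """
    n = len(row_offsets) - 1
    out = [0.0] * n
    for i in range(n):
        acc = 0.0
        # Walk the contiguous slice of edge slots that belongs to vertex i.
        for k in range(row_offsets[i], row_offsets[i + 1]):
            acc += edge_weights[k] * vertex_values[col_indices[k]]
        out[i] = acc
    return out


# Hypothetical 3-vertex path graph 0 - 1 - 2 with unit edge weights.
row_offsets = [0, 1, 3, 4]
col_indices = [1, 0, 2, 1]
edge_weights = [1.0, 1.0, 1.0, 1.0]
vertex_values = [10.0, 20.0, 30.0]

sums = csr_neighbor_sums(row_offsets, col_indices, edge_weights, vertex_values)
```

The inner loop touches only contiguous index ranges, which is the usual motivation for specializing a traversal to this layout instead of going through a generic accessor.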

Validation

  • cmake --build build -j8
  • ctest --test-dir build --output-on-failure -j8
  • 14/14 tests passing locally

Measured Impact

  • bench_curvature_kernel/400: 888653 ns -> 816836 ns CPU (-8.08%)
  • bench_geometry Grid 1000x1000 curvature: 5.716 ms -> 5.358 ms (-6.26%)
  • bench_geometry Grid 1000x1000 structure: 14.516 ms -> 13.661 ms (-5.89%)
  • perf CI now reports bench_diffgeo on follow-up branches even when main lacks the target, instead of silently omitting it
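The percentages above are relative changes against the baseline, and the last bullet describes reporting a benchmark even when no baseline exists. The following sketch shows both ideas under stated assumptions: `percent_delta` and `compare` are hypothetical helper names, and the "no baseline" label is made up for illustration. It is not the script used by the perf workflows.

```python
def percent_delta(baseline, current):
    """Relative change of current versus baseline, in percent (negative = faster)."""
    return (current - baseline) / baseline * 100.0


def compare(baseline, current):
    """Build report rows; `baseline` is None when the baseline ref lacks the target."""
    rows = []
    for name in sorted(current):
        base_value = None if baseline is None else baseline.get(name)
        if base_value is None:
            # Fallback: still report the head measurement rather than omitting it.
            rows.append((name, None, current[name], "no baseline"))
        else:
            delta = percent_delta(base_value, current[name])
            rows.append((name, base_value, current[name], f"{delta:+.2f}%"))
    return rows


# CPU times in ns quoted in the Measured Impact list above.
head = {"bench_curvature_kernel/400": 816836.0}
base = {"bench_curvature_kernel/400": 888653.0}

with_baseline = compare(base, head)
without_baseline = compare(None, head)
```

With the baseline present, the single row carries the delta string `-8.08%`, matching the first bullet. With `baseline=None`, the same benchmark is still listed, only without a delta.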

Notes

  • the continuation profile and rejected experiments are recorded in notes/perf/20260306-wave3/

@github-actions

github-actions Bot commented Mar 7, 2026

Perf Smoke Report

Baseline ref: main
Baseline: e0e8e9f93e5aa434e0ab8e7dde63316c6160b5f9
Backend: parallel
Threads: 4

PR smoke: bench_geometry (baseline e0e8e9f)

  • Baseline source: artifacts/perf/base/bench_geometry_base.txt
  • Current source: artifacts/perf/head/bench_geometry_head.txt
  • Baseline commit: e0e8e9f93e5aa434e0ab8e7dde63316c6160b5f9
  • Comparable benchmarks: 16/16
  • Improved: 2 | Regressed: 14

Top Wins

| Benchmark | Baseline | Current | Delta |
| --- | --- | --- | --- |
| bench_geometry_flow_ms/100x100 | 79.000 us | 78.000 us | -1.27% |
| bench_geometry_flow_ms/250x250 | 474.000 us | 473.000 us | -0.21% |

Top Regressions

| Benchmark | Baseline | Current | Delta |
| --- | --- | --- | --- |
| bench_geometry_structure_ms/100x100 | 432.000 us | 485.000 us | +12.27% |
| bench_geometry_structure_ms/1000x1000 | 95.640 ms | 106.204 ms | +11.05% |
| bench_geometry_structure_ms/250x250 | 5.100 ms | 5.522 ms | +8.27% |
| bench_geometry_frame_ms/1000x1000 | 214.942 ms | 229.595 ms | +6.82% |
| bench_geometry_structure_ms/500x500 | 24.228 ms | 25.868 ms | +6.77% |

Full Comparison

| Benchmark | Baseline | Current | Delta | Status |
| --- | --- | --- | --- | --- |
| bench_geometry_curvature_ms/1000x1000 | 109.488 ms | 113.456 ms | +3.62% | regressed |
| bench_geometry_curvature_ms/100x100 | 1.110 ms | 1.140 ms | +2.70% | regressed |
| bench_geometry_curvature_ms/250x250 | 6.773 ms | 7.085 ms | +4.61% | regressed |
| bench_geometry_curvature_ms/500x500 | 27.218 ms | 28.838 ms | +5.95% | regressed |
| bench_geometry_flow_ms/1000x1000 | 9.814 ms | 9.935 ms | +1.23% | regressed |
| bench_geometry_flow_ms/100x100 | 79.000 us | 78.000 us | -1.27% | improved |
| bench_geometry_flow_ms/250x250 | 474.000 us | 473.000 us | -0.21% | improved |
| bench_geometry_flow_ms/500x500 | 2.445 ms | 2.452 ms | +0.29% | regressed |
| bench_geometry_frame_ms/1000x1000 | 214.942 ms | 229.595 ms | +6.82% | regressed |
| bench_geometry_frame_ms/100x100 | 1.621 ms | 1.703 ms | +5.06% | regressed |
| bench_geometry_frame_ms/250x250 | 12.347 ms | 13.080 ms | +5.94% | regressed |
| bench_geometry_frame_ms/500x500 | 53.891 ms | 57.158 ms | +6.06% | regressed |
| bench_geometry_structure_ms/1000x1000 | 95.640 ms | 106.204 ms | +11.05% | regressed |
| bench_geometry_structure_ms/100x100 | 432.000 us | 485.000 us | +12.27% | regressed |
| bench_geometry_structure_ms/250x250 | 5.100 ms | 5.522 ms | +8.27% | regressed |
| bench_geometry_structure_ms/500x500 | 24.228 ms | 25.868 ms | +6.77% | regressed |

PR smoke: bench_dod (baseline e0e8e9f)

  • Baseline source: artifacts/perf/base/bench_dod_base.json
  • Current source: artifacts/perf/head/bench_dod_head.json
  • Baseline commit: e0e8e9f93e5aa434e0ab8e7dde63316c6160b5f9
  • Comparable benchmarks: 11/11
  • Improved: 3 | Regressed: 8

Top Wins

| Benchmark | Baseline | Current | Delta |
| --- | --- | --- | --- |
| bench_curvature_kernel/400 | 14.887 ms | 12.694 ms | -14.73% |
| bench_curl_energy/2000/16 | 1.854 ms | 1.791 ms | -3.41% |
| bench_diffusion_build/2000 | 23.676 ms | 23.520 ms | -0.66% |

Top Regressions

| Benchmark | Baseline | Current | Delta |
| --- | --- | --- | --- |
| bench_eigenbasis/2000/16 | 17.634 ms | 18.191 ms | +3.16% |
| bench_flow_kernel/400 | 3.168 ms | 3.250 ms | +2.57% |
| bench_markov_step/2000 | 40.304 us | 40.693 us | +0.97% |
| bench_1form_gram/2000/16 | 922.350 us | 927.913 us | +0.60% |
| bench_weak_derivative/2000/16 | 144.261 us | 145.113 us | +0.59% |

Full Comparison

| Benchmark | Baseline | Current | Delta | Status |
| --- | --- | --- | --- | --- |
| bench_1form_gram/2000/16 | 922.350 us | 927.913 us | +0.60% | regressed |
| bench_curl_energy/2000/16 | 1.854 ms | 1.791 ms | -3.41% | improved |
| bench_curvature_kernel/400 | 14.887 ms | 12.694 ms | -14.73% | improved |
| bench_diffusion_build/2000 | 23.676 ms | 23.520 ms | -0.66% | improved |
| bench_eigenbasis/2000/16 | 17.634 ms | 18.191 ms | +3.16% | regressed |
| bench_flow_kernel/400 | 3.168 ms | 3.250 ms | +2.57% | regressed |
| bench_hodge_solve/2000/16 | 146.058 us | 146.261 us | +0.14% | regressed |
| bench_markov_multi_step/2000/20 | 807.024 us | 810.383 us | +0.42% | regressed |
| bench_markov_multi_step/20000/20 | 8.631 ms | 8.632 ms | +0.01% | regressed |
| bench_markov_step/2000 | 40.304 us | 40.693 us | +0.97% | regressed |
| bench_weak_derivative/2000/16 | 144.261 us | 145.113 us | +0.59% | regressed |

PR smoke: bench_pipelines (baseline e0e8e9f)

  • Baseline source: artifacts/perf/base/bench_pipelines_base.json
  • Current source: artifacts/perf/head/bench_pipelines_head.json
  • Baseline commit: e0e8e9f93e5aa434e0ab8e7dde63316c6160b5f9
  • Comparable benchmarks: 10/10
  • Improved: 4 | Regressed: 6

Top Wins

| Benchmark | Baseline | Current | Delta |
| --- | --- | --- | --- |
| bench_hodge_phase_weak_derivative | 5.201 ms | 5.047 ms | -2.96% |
| bench_pipeline_diffusion_main/20 | 31.800 ms | 31.459 ms | -1.07% |
| bench_pipeline_spectral_main | 57.898 ms | 57.525 ms | -0.64% |
| bench_pipeline_diffusion_main/100 | 35.666 ms | 35.656 ms | -0.03% |

Top Regressions

| Benchmark | Baseline | Current | Delta |
| --- | --- | --- | --- |
| bench_hodge_phase_gram | 6.458 ms | 6.665 ms | +3.21% |
| bench_hodge_phase_curl_energy | 52.176 ms | 53.758 ms | +3.03% |
| bench_hodge_phase_eigenbasis | 676.718 ms | 689.545 ms | +1.90% |
| bench_pipeline_hodge_main | 820.761 ms | 832.987 ms | +1.49% |
| bench_hodge_phase_circular | 12.681 ms | 12.770 ms | +0.70% |

Full Comparison

| Benchmark | Baseline | Current | Delta | Status |
| --- | --- | --- | --- | --- |
| bench_hodge_phase_circular | 12.681 ms | 12.770 ms | +0.70% | regressed |
| bench_hodge_phase_curl_energy | 52.176 ms | 53.758 ms | +3.03% | regressed |
| bench_hodge_phase_eigenbasis | 676.718 ms | 689.545 ms | +1.90% | regressed |
| bench_hodge_phase_gram | 6.458 ms | 6.665 ms | +3.21% | regressed |
| bench_hodge_phase_solve | 5.409 ms | 5.409 ms | +0.01% | regressed |
| bench_hodge_phase_weak_derivative | 5.201 ms | 5.047 ms | -2.96% | improved |
| bench_pipeline_diffusion_main/100 | 35.666 ms | 35.656 ms | -0.03% | improved |
| bench_pipeline_diffusion_main/20 | 31.800 ms | 31.459 ms | -1.07% | improved |
| bench_pipeline_hodge_main | 820.761 ms | 832.987 ms | +1.49% | regressed |
| bench_pipeline_spectral_main | 57.898 ms | 57.525 ms | -0.64% | improved |

PR smoke: bench_diffgeo (baseline e0e8e9f)

  • Baseline source: artifacts/perf/base/bench_diffgeo_base.json
  • Current source: artifacts/perf/head/bench_diffgeo_head.json
  • Baseline commit: e0e8e9f93e5aa434e0ab8e7dde63316c6160b5f9
  • Comparable benchmarks: 10/10
  • Improved: 1 | Regressed: 9

Top Wins

| Benchmark | Baseline | Current | Delta |
| --- | --- | --- | --- |
| bench_diffgeo_phase_circular/4000/64/32/32 | 12.595 ms | 12.558 ms | -0.30% |

Top Regressions

| Benchmark | Baseline | Current | Delta |
| --- | --- | --- | --- |
| bench_diffgeo_phase_eigenbasis/1000/50/32/32 | 53.242 ms | 55.938 ms | +5.06% |
| bench_diffgeo_phase_k2_up/4000/64/32/32 | 106.383 ms | 107.353 ms | +0.91% |
| bench_diffgeo_phase_eigenbasis/4000/64/32/32 | 562.468 ms | 565.953 ms | +0.62% |
| bench_diffgeo_phase_circular/1000/50/32/32 | 3.866 ms | 3.889 ms | +0.60% |
| bench_diffgeo_phase_structure_build/1000/50/32/32 | 11.868 ms | 11.932 ms | +0.54% |

Full Comparison

| Benchmark | Baseline | Current | Delta | Status |
| --- | --- | --- | --- | --- |
| bench_diffgeo_phase_circular/1000/50/32/32 | 3.866 ms | 3.889 ms | +0.60% | regressed |
| bench_diffgeo_phase_circular/4000/64/32/32 | 12.595 ms | 12.558 ms | -0.30% | improved |
| bench_diffgeo_phase_eigenbasis/1000/50/32/32 | 53.242 ms | 55.938 ms | +5.06% | regressed |
| bench_diffgeo_phase_eigenbasis/4000/64/32/32 | 562.468 ms | 565.953 ms | +0.62% | regressed |
| bench_diffgeo_phase_k1_up/4000/64/32/32 | 76.475 ms | 76.527 ms | +0.07% | regressed |
| bench_diffgeo_phase_k2_up/4000/64/32/32 | 106.383 ms | 107.353 ms | +0.91% | regressed |
| bench_diffgeo_phase_structure_build/1000/50/32/32 | 11.868 ms | 11.932 ms | +0.54% | regressed |
| bench_diffgeo_phase_structure_build/4000/64/32/32 | 47.901 ms | 47.966 ms | +0.14% | regressed |
| bench_diffgeo_pipeline/1000/50/32/32 | 121.459 ms | 122.056 ms | +0.49% | regressed |
| bench_diffgeo_pipeline/4000/64/32/32 | 833.608 ms | 836.911 ms | +0.40% | regressed |
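
The tables in this report mix `us` and `ms`, and every row with a positive delta is labeled regressed, including deltas as small as +0.01%. The sketch below shows one way to normalize the units and tally statuses, with an optional noise band. It is an assumption-laden illustration: `parse_time`, `classify`, and the `noise_pct` parameter are hypothetical names and are not options of the actual perf workflow.

```python
from collections import Counter

# Conversion factors to nanoseconds for the unit suffixes seen in the report.
UNIT_TO_NS = {"ns": 1.0, "us": 1e3, "ms": 1e6, "s": 1e9}


def parse_time(text):
    """Convert a string such as '432.000 us' to nanoseconds."""
    value, unit = text.split()
    return float(value) * UNIT_TO_NS[unit]


def classify(baseline_ns, current_ns, noise_pct=0.0):
    """Label a row by the sign of its delta, ignoring changes within +/- noise_pct."""
    delta = (current_ns - baseline_ns) / baseline_ns * 100.0
    if delta < -noise_pct:
        return "improved"
    if delta > noise_pct:
        return "regressed"
    return "unchanged"


# Three rows copied from the bench_diffgeo comparison above.
rows = [
    ("bench_diffgeo_phase_circular/4000/64/32/32", "12.595 ms", "12.558 ms"),
    ("bench_diffgeo_phase_eigenbasis/1000/50/32/32", "53.242 ms", "55.938 ms"),
    ("bench_diffgeo_phase_k1_up/4000/64/32/32", "76.475 ms", "76.527 ms"),
]

# Sign-only tally, which appears to match how this report assigns statuses.
strict = Counter(classify(parse_time(b), parse_time(c)) for _, b, c in rows)

# Same rows with a hypothetical 1% noise band.
banded = Counter(classify(parse_time(b), parse_time(c), noise_pct=1.0) for _, b, c in rows)
```

Under the sign-only rule, the three sample rows tally as one improved and two regressed. With the 1% band, only the +5.06% eigenbasis row still counts as a regression.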

@Autoparallel merged commit 3114b42 into main on Mar 7, 2026
8 checks passed