benchmark: alloc profile testitem (Ipopt + MadNLP) #95
Open
jack-champagne wants to merge 4 commits into
Wires the new HBJ analyzer (`benchmark_memory!` + `report_alloc_profile`) into the benchmark suite as a fourth testitem. Produces JLD2 allocation-profile artifacts under `benchmark/results/allocs/` (already covered by the existing workflow's `benchmark/results/` upload) and prints a top-K breakdown by type / leaf call site / frame for each solver.

The testitem uses the same bilinear N=51 problem as the existing Ipopt-vs-MadNLP timing testitem so allocation hotspots line up with the timing numbers.

`sample_rate = 0.01` keeps the trace tractable: `Profile.Allocs` slows the solve roughly linearly in the number of allocations, and the bilinear solve produces millions of allocs; full-rate sampling on N=10 hung >15 min in earlier experiments (cf. closed DTO#71). The 1/sample_rate extrapolation applied by `report_alloc_profile` rebuilds the totals.

Bumps HBJ pin from 5401542c (v0.2.0 prep) to c38418cb (post-#12, analyzer + Piccolo-aligned CI) since the analyzer's exports (`top_alloc_types`, `report_alloc_profile`, …) didn't exist at the v0.2.0 prep commit. Other consumers of the bench env (timing, scaling) already work against c38418cb, so there is no behavioral change for them.

Stacked on `benchmarks/directtrajopt-initial-v2` (PR #75) so reviewers see only this testitem's diff. Will retarget to main when #75 lands.
Codecov Report: All modified and coverable lines are covered by tests.
`Profile.Allocs` adds ~30-40 min per solve regardless of `max_iter`: local timing on a fast workstation showed 47m for the testitem alone, ~74m on GH Actions runners. That makes it impractical to gate every PR on the alloc profile (pushing to a 180 min timeout would burn ~115 min of CI per benchmark/src/ change).

Local hard data (bilinear N=51, `max_iter=30`, `sample_rate=0.01`): Ipopt section ~22 min local / ~42 min CI, MadNLP section ~25 min local / ~32 min CI.

Solution: split into a dedicated alloc-profile workflow with a paths filter (`benchmark/alloc_profile.jl`, `benchmark/problem_utils.jl`, `benchmark/Project.toml`, `.github/workflows/alloc-profile.yml`). The main benchmark workflow filters the alloc-profile testitem out and reverts to `timeout-minutes: 60` (now ~32 min wall time again). The alloc-profile workflow gets its own 90 min budget and uploads artifacts under `benchmark/results/allocs/`.

The TestItemRunner filter mirrors the approach already used in `test/runtests.jl` to keep main-package CI from picking up benchmark testitems.
Summary
Adds an allocation profile of the bilinear-N=51 solve under each solver (Ipopt + MadNLP), using HBJ's new `benchmark_memory!` + `report_alloc_profile` API. Produces a JLD2 `AllocProfileResult` artifact per solver and prints a top-K breakdown by allocated type, leaf call site, and frame.

Picks up where closed DTO#71 left off: that PR's analyzer logic has since shipped in HBJ as `src/analyze.jl`; this PR is the DTO-side wiring that exercises it.

Stacked on #75 so reviewers see only this PR's diff. GitHub will auto-retarget to `main` once #75 merges.
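
For readers unfamiliar with sampled allocation profiling, here is a minimal sketch of the idea behind the top-K-by-type breakdown and the 1/sample_rate extrapolation. It uses only Julia's stdlib `Profile.Allocs` (Julia 1.8+), not HBJ's analyzer; `workload` and `estimate_by_type` are hypothetical names invented for this illustration, and the toy workload stands in for the bilinear solve.

```julia
using Profile

# Hypothetical stand-in workload; the PR profiles a bilinear N=51 solve instead.
function workload(n)
    store = Vector{Vector{Float64}}()
    for i in 1:n
        push!(store, zeros(Float64, 8))  # arrays escape, so each one is heap-allocated
    end
    return store
end

# Group sampled allocations by type, then scale by 1/sample_rate to estimate totals.
# Works on any records that expose `.type` and `.size`.
function estimate_by_type(allocs, sample_rate)
    counts = Dict{Any,Int}()
    bytes = Dict{Any,Int}()
    for a in allocs
        counts[a.type] = get(counts, a.type, 0) + 1
        bytes[a.type] = get(bytes, a.type, 0) + a.size
    end
    scale = 1 / sample_rate
    rows = [(type = t, est_count = counts[t] * scale, est_bytes = bytes[t] * scale)
            for t in keys(counts)]
    return sort(rows; by = r -> r.est_bytes, rev = true)
end

rate = 0.01
workload(10)                     # warm up so compilation is not part of the trace
Profile.Allocs.clear()
Profile.Allocs.@profile sample_rate=rate workload(100_000)
results = Profile.Allocs.fetch()

# Print the top 5 types by estimated bytes.
for row in first(estimate_by_type(results.allocs, rate), 5)
    println(row.type, "  ~", round(Int, row.est_count), " allocs  ~",
            round(Int, row.est_bytes), " bytes")
end
```

Because only about 1 in 100 allocations is recorded at `sample_rate = 0.01`, the scaled counts are estimates, and types that allocate rarely may be missing from the table altogether.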
What's in the PR
- `benchmark/alloc_profile.jl`: new testitem `"Alloc profile: bilinear N=51 (Ipopt + MadNLP)"`. Uses `max_iter = 30` and `sample_rate = 0.01` (full-rate sampling hangs the bilinear solve >15 min, per closed DTO#71's notes; even `0.01` runs each solver at ~30-40 min under `Profile.Allocs` overhead, which is fundamental to Julia's `Profile.Allocs` regardless of `max_iter`).
- `.github/workflows/alloc-profile.yml` (new): dedicated workflow for the alloc profile testitem. Paths-filtered to `benchmark/alloc_profile.jl`, `benchmark/problem_utils.jl`, `benchmark/Project.toml`, and itself, so it only runs when the alloc-profile config changes. 90-min timeout. Uploads `benchmark/results/allocs/` as a separate artifact.
- `.github/workflows/benchmark.yml`: filters the alloc-profile testitem out of the main benchmark workflow. Main workflow's timeout reverts to 60 min (back from the brief 180 min experiment) since the heavy testitem has moved.
- `benchmark/Project.toml`: HBJ pin bumped from `5401542c` (v0.2.0 prep) to `c38418cb` (post-HBJ#12, ships the analyzer).

Why split workflows
Local hard data on a fast workstation: alloc-profile testitem alone = 47m19s. GH Actions = ~74m.

`Profile.Allocs` adds ~30-40 min per solve regardless of `max_iter`: the per-allocation check overhead is the bottleneck, not the sampling rate. Keeping the testitem in the main benchmark workflow forced every src/ or benchmark/ PR to pay that cost (or, pre-split, hit the timeout).

By splitting, the main benchmark workflow stays at ~32 min for the fast green-light signal; the alloc profile runs in its own 90-min-budget workflow only when the alloc-profile config actually changes. Same TestItemRunner filter pattern already used to keep main-package CI from picking up benchmark testitems.
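
A rough sketch of how such a filter could look, assuming the testitem is selected by its name prefix (the PR's workflow files may select it differently). TestItemRunner calls the `filter` function with a named tuple carrying `filename`, `name`, and `tags`; the predicate names and the second record below are hypothetical and exist only for illustration.

```julia
# Predicate over the metadata TestItemRunner passes to `filter`
# (a named tuple with fields `filename`, `name`, `tags`).
# Matching on the name prefix is an assumption made for this sketch.
is_alloc_profile(ti) = startswith(ti.name, "Alloc profile")

main_benchmark_filter(ti) = !is_alloc_profile(ti)   # main benchmark workflow
alloc_profile_filter(ti) = is_alloc_profile(ti)     # dedicated alloc-profile workflow

# Hypothetical metadata records shaped like what the filter receives.
items = [
    (filename = "benchmark/alloc_profile.jl",
     name = "Alloc profile: bilinear N=51 (Ipopt + MadNLP)",
     tags = Symbol[]),
    (filename = "benchmark/timing.jl",               # hypothetical file
     name = "Timing: hypothetical testitem",         # hypothetical testitem name
     tags = Symbol[]),
]

println([ti.name for ti in items if main_benchmark_filter(ti)])
println([ti.name for ti in items if alloc_profile_filter(ti)])

# In a runner script this would be wired up roughly as:
#   using TestItemRunner
#   @run_package_tests filter=main_benchmark_filter
```

Keeping both predicates derived from one `is_alloc_profile` definition means the two workflows cannot both run, or both skip, the same testitem.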
Sample output (from local run, identical shape on CI)
Hot spot: `Memory{ForwardDiff.Dual{...}}` buffers in the integrators' jacobian + hessian AD pipelines. Same conclusion as the evaluator micro-bench (`eval_hessian_lagrangian` dominates per-iter time). Real, actionable.

Test plan
- `Benchmarks` workflow green (~32 min): confirms the alloc-profile filter removed it cleanly
- `Alloc Profile` workflow green (~75 min): confirms the split workflow runs the testitem and uploads artifacts
- `alloc-profile-95-<sha>` contains `alloc_bilinear_N51_ipopt_<sha>_allocs.jld2` + the MadNLP equivalent
- `=== Alloc profile: ... ===` sections with populated "Top N allocated types" tables (top entry not a `Profile.Allocs.*` noise type)

Follow-ups
main.