bench: refresh all benchmark suites at v0.5.908 (2026-05-14) by proggeramlug · Pull Request #765 · PerryTS/perry

proggeramlug · 2026-05-14T08:21:41Z

Summary

Full rerun of all four benchmark suites (compute polyglot, JSON polyglot, honest_bench, suite/) against Perry v0.5.908 on an otherwise-idle machine, plus a full doc refresh.

Confirms yesterday's apparent regressions were parallel-build contamination. A parallel cargo build (Ralph's issue-665-resolver-opt-in worktree) was running through part of the 2026-05-13 v0.5.891 sweep, inflating σ on every Perry compute cell. Today's idle-machine numbers drop σ from 25-57 ms to 0.3-2.2 ms — and Perry compute medians land back within 1-4 ms of the v0.5.585 historical baseline across all 9 cells.
Verifies that the partial fix for #745 (RSS regression on JSON polyglot: 85→254 MB roundtrip, 100→411 MB parse-and-iterate, v0.5.279 → v0.5.891) landed in v0.5.900. JSON polyglot RSS dropped 254 → 227 MB on roundtrip and 411 → 309 MB on parse-and-iterate. Wall-time is back to v0.5.279 levels. A residual ~150-200 MB gap vs the v0.5.279 floor is flagged in a #745 comment; the mark-sweep + no-lazy-tape path is actually slightly worse than v0.5.891, so the residual has its own root cause.
honest_bench: 300/300 output-matched rows. Perry slightly faster on all 3 workloads vs v0.5.891 (image_conv 365 → 354 ms, json_small 39.6 → 39.2 ms, json_full 1155 → 1098 ms).
suite/: method_calls back to 9 ms (yesterday's 25 ms was single-run noise from concurrent CPU load). closure (50 ms) and factorial (107 ms) regressions vs v0.5.173 persist and are flagged as open follow-ups in benchmarks/suite/results/RESULTS.md.
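
For readers who want to sanity-check a sweep for this kind of contamination, here is a minimal sketch of the median/σ check the summary relies on. It is not the repository's runner: the function names, the 5 ms limit, and the sample timings are all hypothetical, and it assumes σ means the sample standard deviation over the runs of one cell.

```python
import statistics


def summarize(samples_ms):
    """Return (median, sample standard deviation) for one benchmark cell."""
    return statistics.median(samples_ms), statistics.stdev(samples_ms)


def looks_contaminated(samples_ms, sigma_limit_ms=5.0):
    """Flag a cell whose run-to-run spread exceeds a chosen limit.

    The 5 ms default is an illustrative assumption; it simply sits between
    the idle-machine and contaminated sigma ranges quoted in the summary.
    """
    _, sigma = summarize(samples_ms)
    return sigma > sigma_limit_ms


# Made-up timings for illustration only; not measurements from this sweep.
quiet = [96.0, 97.0, 96.5, 97.2, 96.8, 96.1, 97.0, 96.4, 96.9, 96.6, 96.7]
noisy = [96.0, 150.0, 97.0, 180.0, 96.5, 110.0, 97.2, 200.0, 96.8, 130.0, 96.1]

for label, samples in (("quiet", quiet), ("noisy", noisy)):
    median, sigma = summarize(samples)
    print(f"{label}: median={median:.1f} ms sigma={sigma:.1f} ms "
          f"contaminated={looks_contaminated(samples)}")
```

The median stays usable even when a few runs are disturbed, which is why a large σ alongside a plausible median is a hint to rerun on an idle machine rather than to report a regression.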

Diff highlights

README.md, benchmarks/README.md, benchmarks/polyglot/RESULTS{,_AUTO,_OPT}.md, benchmarks/honest_bench/REPORT.md, benchmarks/json_polyglot/RESULTS.md — refreshed tables + prose at v0.5.908 / 2026-05-14
benchmarks/honest_bench/charts/*.png, results/results.json, results/metadata.json, results/summary.txt — regenerated from this sweep
benchmarks/suite/results/RESULTS.md — new file; the suite/run_benchmarks.sh runner doesn't write a permanent results file, so this committed one captures the v0.5.908 snapshot + delta tables vs v0.5.891 and v0.5.173 baselines
Historical comparison notes (vs v0.5.585, v0.5.891, v0.5.279) preserved throughout for trend visibility

Test plan

Compute polyglot RUNS=11 default mode → benchmarks/polyglot/RESULTS_AUTO.md body
Compute polyglot RUNS=11 PERRY_FAST_MATH=1 rerun → RESULTS_AUTO.md addendum
honest_bench (5 warmup + 20 measured, output-correctness gated) → 300/300 match
JSON polyglot RUNS=11 → benchmarks/json_polyglot/RESULTS.md
suite/ microbenchmarks single-run → benchmarks/suite/results/RESULTS.md
python3 scripts/report.py regenerates REPORT.md from results.json
python3 scripts/plot.py regenerates charts/*.png
All _TBD_ placeholders from yesterday's partial sweep filled in
grep for stale v0.5.891 / 2026-05-13 confirmed remaining hits are intentional historical references
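
The stale-reference check in the last step can be approximated in Python when a scriptable version of the grep is preferable. This is a sketch under assumptions: the function name, the set of file suffixes, and the choice of root directory are hypothetical, and deciding which hits are intentional historical references is still a manual review.

```python
import re
from pathlib import Path

# Version and date strings from the previous sweep, as named in the test plan.
STALE = re.compile(r"v0\.5\.891|2026-05-13")


def find_stale(root, suffixes=(".md", ".txt", ".json")):
    """Return (path, line number, line) for every line mentioning a stale string."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if STALE.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Calling `find_stale("benchmarks")` from the repository root would list each remaining mention so it can be confirmed as a deliberate historical comparison.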

No code changes — pure docs + bench results refresh. CI's cargo-test / parity / compile-smoke / api-docs-drift / security-audit paths shouldn't have anything to verify here.

Maintainer note: per CLAUDE.md flow, no version bump or CHANGELOG entry in this branch — those go on at merge time.

Full rerun of polyglot, JSON polyglot, honest_bench, and suite/ microbenchmarks on an otherwise-idle machine. Confirms that yesterday's v0.5.891 sweep (#745 follow-up) was dominated by parallel cargo-build contamination: σ on Perry compute cells dropped from 25-57 ms to 0.3-2.2 ms.

Key results:

- Compute polyglot matches v0.5.585 historical numbers within 1-4 ms across all 9 cells (default + --fast-math); fast-math cleanly reproduces 8× / 3.6× / 2.9× speedups on loop_overhead / math_intensive / accumulate.
- honest_bench: Perry slightly faster on all 3 workloads vs v0.5.891 (image_conv 365 → 354 ms; json_full 1155 → 1098 ms); 300/300 output-matched rows.
- #745 partial fix verification: JSON polyglot RSS dropped 254 → 227 MB roundtrip and 411 → 309 MB iterate after v0.5.900's GC trigger-ratchet fix. Residual ~150 MB gap vs v0.5.279 baseline flagged on the issue.
- suite/: method_calls back to 9 ms (yesterday's 25 ms was noise); closure/factorial regressions vs v0.5.173 persist as known follow-ups.

Docs refreshed: top-level README, benchmarks/README, polyglot RESULTS{,_AUTO,_OPT}.md, honest_bench REPORT.md (+ regenerated charts), json_polyglot RESULTS.md (auto), suite/results/RESULTS.md (new). All with 2026-05-14 / v0.5.908 datestamps and historical deltas vs v0.5.891 and v0.5.279.
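
The RSS figures quoted for #745 can be broken down with a few lines of arithmetic. The sketch below only restates numbers from the text (the v0.5.279 floor, the v0.5.891 reading, and this sweep's reading); the dictionary layout and function name are illustrative assumptions.

```python
# RSS in MB, copied from the figures quoted in this PR.
RSS_MB = {
    "roundtrip": {"floor": 85, "before_fix": 254, "this_sweep": 227},
    "parse-and-iterate": {"floor": 100, "before_fix": 411, "this_sweep": 309},
}


def rss_breakdown(cell):
    """Split the original regression into the recovered part and the residual."""
    regression = cell["before_fix"] - cell["floor"]
    recovered = cell["before_fix"] - cell["this_sweep"]
    residual = cell["this_sweep"] - cell["floor"]
    return {
        "regression": regression,
        "recovered": recovered,
        "residual": residual,
        "recovered_fraction": recovered / regression,
    }


for name, cell in RSS_MB.items():
    b = rss_breakdown(cell)
    print(f"{name}: recovered {b['recovered']} MB of {b['regression']} MB "
          f"({b['recovered_fraction']:.0%}), residual {b['residual']} MB")
```

On these inputs the residual is 142 MB for roundtrip and 209 MB for parse-and-iterate, which is consistent with the roughly 150-200 MB gap flagged on the issue.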

proggeramlug merged commit 8a7ea99 into main May 14, 2026
9 checks passed

proggeramlug deleted the worktree-refresh-benchmarks branch May 14, 2026 08:25
