bench: refresh all benchmark suites at v0.5.908 (2026-05-14) #765

Merged: proggeramlug merged 1 commit into main from worktree-refresh-benchmarks on May 14, 2026

@proggeramlug (Contributor)

Summary

Full rerun of all four benchmark suites (compute polyglot, JSON polyglot, honest_bench, suite/) against Perry v0.5.908 on an otherwise-idle machine, plus a full doc refresh.

  • Confirms yesterday's apparent regressions were parallel-build contamination. A parallel cargo build (Ralph's issue-665-resolver-opt-in worktree) was running through part of the 2026-05-13 v0.5.891 sweep, inflating σ on every Perry compute cell. Today's idle-machine numbers drop σ from 25-57 ms to 0.3-2.2 ms — and Perry compute medians land back within 1-4 ms of the v0.5.585 historical baseline across all 9 cells.
  • Verifies the partial fix for #745 (RSS regression on JSON polyglot: 85→254 MB roundtrip, 100→411 MB parse-and-iterate, v0.5.279 → v0.5.891), which landed in v0.5.900. JSON polyglot RSS dropped 254 → 227 MB on roundtrip and 411 → 309 MB on parse-and-iterate, and wall-time is back to v0.5.279 levels. A residual ~150-200 MB gap vs the v0.5.279 floor is flagged in a comment on #745. The mark-sweep + no-lazy-tape path is actually slightly worse than v0.5.891, so the residual has its own root cause.
  • honest_bench: 300/300 output-matched rows. Perry is slightly faster on all 3 workloads vs v0.5.891 (image_conv 365 → 354 ms, json_small 39.6 → 39.2 ms, json_full 1155 → 1098 ms).
  • suite/: method_calls is back to 9 ms (yesterday's 25 ms was single-run noise from concurrent CPU load). The closure (50 ms) and factorial (107 ms) regressions vs v0.5.173 persist and are flagged as open follow-ups in benchmarks/suite/results/RESULTS.md.
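
For readers who want to check the #745 numbers, the recovered and residual RSS can be recomputed from the figures quoted above. This is a minimal sketch: the dictionary layout and the `rss_deltas` helper are illustrative names, not part of the benchmark tooling, and the `v0.5.908` key stands for this sweep's measurement.

```python
# Peak RSS in MB, copied from the figures quoted in this PR description.
rss_mb = {
    "roundtrip": {"v0.5.279": 85, "v0.5.891": 254, "v0.5.908": 227},
    "parse-and-iterate": {"v0.5.279": 100, "v0.5.891": 411, "v0.5.908": 309},
}


def rss_deltas(cell):
    """Return (MB recovered since v0.5.891, MB still above the v0.5.279 floor)."""
    recovered = cell["v0.5.891"] - cell["v0.5.908"]
    residual = cell["v0.5.908"] - cell["v0.5.279"]
    return recovered, residual


for name, cell in rss_mb.items():
    recovered, residual = rss_deltas(cell)
    print(f"{name}: recovered {recovered} MB, residual {residual} MB")
# roundtrip: recovered 27 MB, residual 142 MB
# parse-and-iterate: recovered 102 MB, residual 209 MB
```

The residuals of 142 MB and 209 MB are what the summary rounds to a ~150-200 MB gap.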

Diff highlights

  • README.md, benchmarks/README.md, benchmarks/polyglot/RESULTS{,_AUTO,_OPT}.md, benchmarks/honest_bench/REPORT.md, benchmarks/json_polyglot/RESULTS.md — refreshed tables + prose at v0.5.908 / 2026-05-14
  • benchmarks/honest_bench/charts/*.png, results/results.json, results/metadata.json, results/summary.txt — regenerated from this sweep
  • benchmarks/suite/results/RESULTS.md — new file; the suite/run_benchmarks.sh runner doesn't write a permanent results file, so this committed one captures the v0.5.908 snapshot + delta tables vs v0.5.891 and v0.5.173 baselines
  • Historical comparison notes (vs v0.5.585, v0.5.891, v0.5.279) preserved throughout for trend visibility

Test plan

  • Compute polyglot RUNS=11 default mode → benchmarks/polyglot/RESULTS_AUTO.md body
  • Compute polyglot RUNS=11 PERRY_FAST_MATH=1 rerun → RESULTS_AUTO.md addendum
  • honest_bench (5 warmup + 20 measured, output-correctness gated) → 300/300 match
  • JSON polyglot RUNS=11 → benchmarks/json_polyglot/RESULTS.md
  • suite/ microbenchmarks single-run → benchmarks/suite/results/RESULTS.md
  • python3 scripts/report.py regenerates REPORT.md from results.json
  • python3 scripts/plot.py regenerates charts/*.png
  • All _TBD_ placeholders from yesterday's partial sweep filled in
  • grep for stale v0.5.891 / 2026-05-13 confirmed remaining hits are intentional historical references
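
The stale-reference check in the last item could be scripted roughly as follows. This is a sketch under assumptions: the `find_stale` helper, the suffix list, and the idea of scanning from a single root directory are illustrative and do not describe how the check was actually run for this PR.

```python
import re
from pathlib import Path

# Strings that identify yesterday's partial sweep.
STALE = re.compile(r"v0\.5\.891|2026-05-13")


def find_stale(root, suffixes=(".md", ".json", ".txt")):
    """Yield (path, line_number, line) for each line that mentions the old sweep."""
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix not in suffixes:
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if STALE.search(line):
                yield path, lineno, line.strip()


# Example use (hypothetical root directory):
# for path, lineno, line in find_stale("benchmarks"):
#     print(f"{path}:{lineno}: {line}")
```

Each hit still needs a human decision, since some references to v0.5.891 are intentional historical comparisons.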

No code changes — pure docs + bench results refresh. CI's cargo-test / parity / compile-smoke / api-docs-drift / security-audit paths shouldn't have anything to verify here.

Maintainer note: per the CLAUDE.md flow, there is no version bump or CHANGELOG entry in this branch — those are added at merge time.

Full rerun of polyglot, JSON polyglot, honest_bench, and suite/
microbenchmarks on an otherwise-idle machine. Confirms that
yesterday's v0.5.891 sweep (#745 follow-up) was dominated by
parallel cargo-build contamination — σ on Perry compute cells
dropped from 25-57 ms to 0.3-2.2 ms.
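
The σ comparison above suggests a simple guard against contaminated sweeps: compute the median and sample standard deviation per cell and flag any cell whose spread is too large. The sketch below is illustrative only. The helper names, the 5 ms limit (chosen to sit between the 0.3-2.2 ms and 25-57 ms ranges reported here), and the sample values are assumptions, not part of the benchmark runners or measurements from this sweep.

```python
import statistics


def summarize(samples_ms):
    """Return (median, sample standard deviation) for one benchmark cell."""
    return statistics.median(samples_ms), statistics.stdev(samples_ms)


def looks_contaminated(samples_ms, sigma_limit_ms=5.0):
    """Flag a cell whose run-to-run spread exceeds the chosen limit."""
    _, sigma = summarize(samples_ms)
    return sigma > sigma_limit_ms


# Made-up samples for illustration only; not measurements from this sweep.
quiet = [100.0, 100.5, 99.8, 100.2, 100.1]
noisy = [100.0, 160.0, 99.8, 145.0, 100.1]

print(looks_contaminated(quiet), looks_contaminated(noisy))  # False True
```

The median barely moves between the two made-up cells while σ changes by two orders of magnitude, which is why spread is the more useful contamination signal.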

Key results:
- Compute polyglot matches v0.5.585 historical numbers within
  1-4 ms across all 9 cells (default + --fast-math); fast-math
  cleanly reproduces 8× / 3.6× / 2.9× speedups on loop_overhead /
  math_intensive / accumulate.
- honest_bench: Perry slightly faster on all 3 workloads vs
  v0.5.891 (image_conv 365 → 354 ms; json_full 1155 → 1098 ms);
  300/300 output-matched rows.
- #745 partial fix verification: JSON polyglot RSS dropped
  254 → 227 MB roundtrip and 411 → 309 MB iterate after v0.5.900's
  GC trigger-ratchet fix. Residual ~150 MB gap vs v0.5.279 baseline
  flagged on the issue.
- suite/: method_calls back to 9 ms (yesterday's 25 ms was noise);
  closure/factorial regressions vs v0.5.173 persist as known
  follow-ups.

Docs refreshed: top-level README, benchmarks/README, polyglot
RESULTS{,_AUTO,_OPT}.md, honest_bench REPORT.md (+ regenerated
charts), json_polyglot RESULTS.md (auto), suite/results/RESULTS.md
(new). All with 2026-05-14 / v0.5.908 datestamps and historical
deltas vs v0.5.891 and v0.5.279.
@proggeramlug proggeramlug merged commit 8a7ea99 into main May 14, 2026
9 checks passed
@proggeramlug proggeramlug deleted the worktree-refresh-benchmarks branch May 14, 2026 08:25