Commit 17fafdb

docs(bench): polyglot JSON suite + consolidated benchmarks/README.md (v0.5.241)
New benchmarks/json_polyglot/ — 6 implementations of identical 10k-record
parse + stringify workload:
- bench.ts (Perry / Bun / Node)
- bench.go (encoding/json)
- bench.rs + Cargo.toml (serde_json)
- bench.swift (Foundation)
- bench.cpp (nlohmann/json)
- run.sh runs each best-of-5 with idiomatic + optimized flag profiles

Results on M1 Max / macOS 26.4 (best of 5, ms / MB):

    perry (gen-gc + lazy tape)    optimized     65 /  85   ← LEAD
    rust serde_json LTO+1cgu      optimized    180 /  11
    rust serde_json               idiomatic    192 /  11
    bun                           idiomatic    242 /  80
    perry (mark-sweep+nolazy)     idiomatic    351 / 102
    node                          idiomatic    359 / 182
    c++ -O3 -flto (nlohmann)      optimized    778 /  25
    go (encoding/json)                         785 /  22
    c++ -O2 (nlohmann)            idiomatic    843 /  25
    swift -O -wmo (Foundation)    optimized   3706 /  33

Perry leads on time (2.8× over Rust LTO, 3.7× over Bun, 12× over Go/C++).
RSS mid-pack: beats Node, ties Bun, up to 8× higher than typed-struct
languages (fundamental to dynamic JSON parsing).

New benchmarks/README.md — single GitHub-renderable consolidated page:
- TL;DR tables (JSON polyglot + compute microbench)
- Methodology, hardware, fairness statement
- Full compiler-flag table per language (idiomatic + optimized)
- JSON library choice rationale (nlohmann is "popular default", not fastest)
- Honest disclaimers per cell (lazy-tape workload-specificity, Rust RSS fundamental to typed deserialization, Go/Swift floor explanations)
- Memory + GC stability suite reference
- Strengths section (4 wins with one-line "why" each)
- Weaknesses section (8 known gaps with tracking refs)
- "What this page does not measure" honesty section
- Reproducing instructions
- Design / implementation reference links

The point: ONE page a skeptical reader can open and verify every claim.
"Numbers that don't survive scrutiny don't belong here."

Dependencies added: brew install nlohmann-json (3.12.0) for the C++ bench.
1 parent 18c2b01 commit 17fafdb

15 files changed

Lines changed: 1211 additions & 29 deletions

CHANGELOG.md

Lines changed: 58 additions & 0 deletions
@@ -2,6 +2,64 @@

Detailed changelog for Perry. See CLAUDE.md for concise summaries.

## v0.5.241 — Polyglot JSON benchmark suite + consolidated `benchmarks/README.md`. The repo previously had two benchmark sources — `benchmarks/polyglot/` (8 compute microbenches across 10 runtimes, last refreshed at v0.5.164) and `benchmarks/suite/` (Perry-only roundtrip / readonly / GC-pressure benches). What was missing was a JSON encoding/decoding comparison against native runtimes (Go, Rust, C++, Swift) and a single consolidated page that a skeptical reader could open, see every benchmark in context, and verify Perry's claims. This commit fixes both.

**`benchmarks/json_polyglot/`** — new directory, 6 benchmark implementations of the identical 10k-record / ~1 MB blob / 50-iteration parse + stringify workload:

- `bench.ts` — TypeScript (runs on Perry, Bun, Node)
- `bench.go` — Go using `encoding/json`
- `bench.rs` + `Cargo.toml` — Rust using `serde_json`
- `bench.swift` — Swift using `Foundation.JSONEncoder`/`JSONDecoder`
- `bench.cpp` — C++ using nlohmann/json (the de facto default JSON library for C++)
- `run.sh` — runner that compiles each implementation under both *idiomatic* (default release-mode flags most projects use) and *optimized* (aggressive LTO + single codegen unit + fast-math where applicable) profiles, runs each best-of-5, captures wall-clock ms and peak RSS via `/usr/bin/time -l`, and writes a sorted markdown table to `RESULTS.md`.

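The shape of the shared workload can be sketched as follows. This is a minimal illustration, not the contents of `bench.ts`: the `BenchRecord` fields, the `buildRecords` and `runWorkload` helpers, and the checksum are all assumptions made up for the example; only the 10k-record count, the 50 iterations, and the parse + stringify loop come from the description above.

```typescript
// Hypothetical record shape; the real bench.ts fields are not shown in this changelog.
interface BenchRecord {
  id: number;
  name: string;
  active: boolean;
  score: number;
  tags: string[];
}

// Build `count` synthetic records deterministically so every run sees the same blob.
function buildRecords(count: number): BenchRecord[] {
  const records: BenchRecord[] = [];
  for (let i = 0; i < count; i++) {
    records.push({
      id: i,
      name: `user_${i}`,
      active: i % 2 === 0,
      score: i * 1.5,
      tags: ["alpha", "beta", `tag_${i % 10}`],
    });
  }
  return records;
}

// One timed run: `iterations` rounds of parse + stringify over the same blob.
function runWorkload(blob: string, iterations: number): { ms: number; checksum: number } {
  let checksum = 0;
  const start = Date.now();
  for (let i = 0; i < iterations; i++) {
    const parsed = JSON.parse(blob) as BenchRecord[];
    const out = JSON.stringify(parsed);
    // Fold both results into a checksum so the work cannot be discarded as dead code.
    checksum += parsed.length + out.length;
  }
  return { ms: Date.now() - start, checksum };
}

const blob = JSON.stringify(buildRecords(10000));
const result = runWorkload(blob, 50);
console.log(`blob=${blob.length} bytes, time=${result.ms} ms, checksum=${result.checksum}`);
```

Note that the loop only reads `.length` and re-stringifies; it never walks individual elements, which matches the access pattern this benchmark is described as measuring.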
**Each language is listed twice in the results**, once per profile (idiomatic and optimized); Bun, which has only an idiomatic row, is the exception. A reader who points out that "Perry's defaults are themselves aggressive" therefore sees both the default-build floor and the full-optimization ceiling for every comparison runtime. The point: meet skeptics on their own ground.

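The best-of-N selection and sorted-table step can be sketched in a few lines. This is an illustrative TypeScript version, not the logic of `run.sh` (which is a shell script); `Sample`, `bestOf`, `toSortedRows`, and `toMarkdown` are hypothetical names, and the timings are placeholders rather than measurements.

```typescript
type Profile = "idiomatic" | "optimized";

interface Sample {
  impl: string;      // hypothetical implementation label
  profile: Profile;
  runsMs: number[];  // wall-clock time of each run, in milliseconds
}

interface Row {
  impl: string;
  profile: Profile;
  bestMs: number;
}

// Best-of-N: keep the fastest run, which discards runs slowed by unrelated system noise.
function bestOf(runsMs: number[]): number {
  if (runsMs.length === 0) throw new Error("need at least one run");
  return Math.min(...runsMs);
}

// Reduce each sample to its best run, then sort fastest-first.
function toSortedRows(samples: Sample[]): Row[] {
  return samples
    .map((s) => ({ impl: s.impl, profile: s.profile, bestMs: bestOf(s.runsMs) }))
    .sort((a, b) => a.bestMs - b.bestMs);
}

// Render the rows as a GitHub-flavored markdown table.
function toMarkdown(rows: Row[]): string {
  const header = "| Implementation | Profile | Time (ms) |\n|---|---|---:|";
  const body = rows.map((r) => `| ${r.impl} | ${r.profile} | ${r.bestMs} |`);
  return [header, ...body].join("\n");
}

// Placeholder timings for illustration only; these are not measured results.
const samples: Sample[] = [
  { impl: "impl_a", profile: "idiomatic", runsMs: [12, 10, 11, 13, 10] },
  { impl: "impl_a", profile: "optimized", runsMs: [9, 8, 8, 9, 10] },
  { impl: "impl_b", profile: "idiomatic", runsMs: [30, 29, 31, 28, 30] },
];
const rows = toSortedRows(samples);
console.log(toMarkdown(rows));
```

Reporting the minimum rather than the mean is a common choice for short CPU-bound benchmarks, since noise can only make a run slower, never faster.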
**Run on 2026-04-25, macOS arm64, M1 Max:**

| Implementation | Profile | Time (ms) | Peak RSS (MB) |
|---|---|---:|---:|
| **perry (gen-gc + lazy tape)** | optimized | **65** | 85 |
| rust serde_json (LTO+1cgu) | optimized | 180 | 11 |
| rust serde_json | idiomatic | 192 | 11 |
| bun | idiomatic | 242 | 80 |
| perry (mark-sweep, no lazy) | idiomatic | 351 | 102 |
| node | idiomatic | 359 | 182 |
| node --max-old=4096 | optimized | 362 | 181 |
| c++ -O3 -flto (nlohmann/json) | optimized | 778 | 25 |
| go (encoding/json) | optimized | 785 | 22 |
| go (encoding/json) | idiomatic | 785 | 24 |
| c++ -O2 (nlohmann/json) | idiomatic | 843 | 25 |
| swift -O -wmo (Foundation) | optimized | 3706 | 33 |
| swift -O (Foundation) | idiomatic | 3710 | 34 |

**Perry leads on time across the entire field**: 2.8× over Rust serde_json LTO, 3.7× over Bun, 5.5× over Node, 12× over Go encoding/json and C++ nlohmann/json, 57× over Swift Foundation. **Perry's RSS is mid-pack**: 85 MB beats Node's 182 MB, roughly ties Bun (80 MB), and is well above the typed-struct languages: about 8× Rust (11 MB), 3.9× Go (22 MB), and 3.4× C++ (25 MB). The RSS gap to typed-struct languages is fundamental to dynamic JSON parsing (every parsed value is a heap-allocated NaN-boxed object) and is acknowledged explicitly in the consolidated readme. The fix is typed JSON parse (`JSON.parse<T>(blob)`), tracked in `docs/json-typed-parse-plan.md` (Step 1 done in v0.5.200).

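The multipliers quoted for the results table can be rechecked with simple division. The snippet below is a small verification sketch: the numbers are copied from the table in this entry, while the `ratio` helper and the object keys are names invented for the example.

```typescript
// Best times (ms) and peak RSS (MB) copied from the results table above.
const timeMs = { perry: 65, rustLto: 180, bun: 242, node: 359, cppO3: 778, go: 785, swiftWmo: 3706 };
const rssMb = { perry: 85, rust: 11, go: 22, cpp: 25 };

// Ratio of a to b, rounded to one decimal place.
function ratio(a: number, b: number): number {
  return Math.round((a / b) * 10) / 10;
}

const speedups = {
  rustLto: ratio(timeMs.rustLto, timeMs.perry),   // 2.8
  bun: ratio(timeMs.bun, timeMs.perry),           // 3.7
  node: ratio(timeMs.node, timeMs.perry),         // 5.5
  cppO3: ratio(timeMs.cppO3, timeMs.perry),       // 12
  go: ratio(timeMs.go, timeMs.perry),             // 12.1
  swiftWmo: ratio(timeMs.swiftWmo, timeMs.perry), // 57
};

const rssGaps = {
  rust: ratio(rssMb.perry, rssMb.rust), // 7.7
  go: ratio(rssMb.perry, rssMb.go),     // 3.9
  cpp: ratio(rssMb.perry, rssMb.cpp),   // 3.4
};

console.log(speedups, rssGaps);
```

The time multipliers match the quoted figures after rounding. On memory, the "8×" figure corresponds to the Rust comparison (7.7×); the gaps to Go and C++ are closer to 4× and 3.4×.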
Honest disclaimers documented per-cell:

- Perry's lazy-tape win is workload-specific (parse-then-iterate-every-element is a net loss; parse-then-`.length`-or-stringify is what this bench measures).
- Rust's RSS lead is fundamental to typed deserialization, not an unfair advantage.
- Go's "optimized" ≈ idiomatic — `-ldflags="-s -w" -trimpath` strips debug info, no perf delta. Go has no `-ffast-math` or `reassoc` flag; some compute deltas are unrecoverable in stock Go.
- Swift's slow time is real, not a setup problem. Foundation JSON goes through `Mirror`-based reflection on `Codable` types and is genuinely slow on macOS. swift-json is faster; not included because Foundation is the standard.
- nlohmann/json is the de facto popular C++ library, not the fastest. Replacing it with simdjson would beat Perry on parse-only workloads (no stringify support).

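The two access patterns named in the first disclaimer look like this in TypeScript. The snippet only shows the shape of each pattern and makes no timing claim of its own; `Item`, `parseThenLength`, and `parseThenSum` are hypothetical names, and the 1000-element blob is arbitrary.

```typescript
// Hypothetical element type for the illustration.
interface Item {
  id: number;
  value: number;
}

const sampleBlob = JSON.stringify(
  Array.from({ length: 1000 }, (_, i): Item => ({ id: i, value: i * 2 })),
);

// Pattern A: parse, then touch only the top-level length (or re-stringify).
// Per the disclaimer above, this is the shape the benchmark measures and
// the one where a lazy parse can skip materializing individual elements.
function parseThenLength(blob: string): number {
  const parsed = JSON.parse(blob) as Item[];
  return parsed.length;
}

// Pattern B: parse, then read a field from every element.
// Per the disclaimer above, this shape is a net loss for the lazy path,
// because every element has to be materialized anyway.
function parseThenSum(blob: string): number {
  const parsed = JSON.parse(blob) as Item[];
  let sum = 0;
  for (const item of parsed) {
    sum += item.value;
  }
  return sum;
}

const lengthOnly = parseThenLength(sampleBlob);
const fullWalk = parseThenSum(sampleBlob);
console.log(lengthOnly, fullWalk);
```

A reader evaluating the headline number for their own use case would want to check which of these two shapes their code resembles.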
**`benchmarks/README.md`** — new consolidated landing page. Single GitHub-renderable markdown page that pulls together every benchmark in the repo. Sections:

1. **TL;DR** — JSON polyglot table + compute microbench table at-a-glance.
2. **How to read this page** — methodology, hardware, fairness statement, what idiomatic vs optimized means.
3. **JSON polyglot — full data** — workload TypeScript code, full compiler-flag table per language, library choice rationale, honest disclaimers.
4. **Compute microbenches — full data** — links to existing `benchmarks/polyglot/RESULTS.md` + `RESULTS_OPT.md`, summary of where Perry wins (`reassoc contract` on f64 ops giving LLVM the freedom to autovectorize) and where C++ closes the gap with `-O3 -ffast-math`.
5. **Memory + GC stability** — links to `scripts/run_memory_stability_tests.sh`, RSS-history table for `bench_json_roundtrip` direct path showing the v0.5.193 → v0.5.236 drop from 213 MB → 107 MB.
6. **Strengths** — JSON parse+stringify roundtrip, f64 tight loops, object allocation in tight loops, generational GC defaults that adapt.
7. **Weaknesses** — RSS on dynamic-JSON, stop-the-world GC, no old-gen compaction, shadow stack still parallel-not-replacing the conservative scanner, TypeScript parity gaps, no JIT, single-threaded by default, non-incremental compilation.
8. **What this page does not measure** — GC tail latency, JIT warmup, async/await, I/O, realistic application workloads, contention, compile time / binary size.
9. **Reproducing** — exact commands per benchmark suite.
10. **Design / implementation references** — links to all 6 design plan docs.

The closing line: "If you spot something that looks unfair, biased, or wrong: open an issue at https://github.com/PerryTS/perry/issues … the point of this page is to be defensible, not to win. Numbers that don't survive scrutiny don't belong here."

`brew install nlohmann-json` (v3.12.0) added as a dependency for the C++ bench. The build is clean across all 7 runtimes on this M1 Max + macOS 26.4 setup.

## v0.5.240 — Gen-GC docs: academic + industry lineage appendix. Adds a defensibility section to `docs/generational-gc-plan.md` mapping each phase of Perry's GC architecture to its canonical paper and a list of shipping VMs that use the same techniques. The point of the appendix isn't to claim novelty — the opposite: every design decision traces to a paper or a real-world VM that does the same thing. The contribution Perry makes is engineering, not algorithms.

**Single strongest reference: Bartlett 1988, *Mostly Copying Garbage Collection*** (DEC SRC Technical Note TN-13). This describes Perry's C4b almost verbatim — conservative scan of registers + C stack discovers candidate pointers and pins them; precise scan of heap fields finds movable objects; forwarding pointers in evacuated objects' headers; pinned objects stay in place. Perry's `CONS_PINNED` HashSet, `pin_currently_marked_as_conservative` helper, `GC_FLAG_FORWARDED` flag, and `rewrite_forwarded_references` walker collectively implement Bartlett's algorithm in Rust. The generational extension follows Ungar 1984 (*Generation Scavenging*).
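The mostly-copying scheme described above can be sketched on a toy heap. This is a TypeScript illustration of the general technique (pin what ambiguous roots might reference, evacuate the rest, leave forwarding addresses, rewrite references); it is not Perry's Rust code, and `ToyHeap`, `collect`, `pinned`, and `forwardedTo` are hypothetical names chosen for the example.

```typescript
interface HeapObject {
  fields: number[];     // precise pointers: addresses of referenced objects
  pinned: boolean;      // set when an ambiguous (conservative) root may point here
  forwardedTo?: number; // forwarding address left behind after evacuation
}

class ToyHeap {
  objects = new Map<number, HeapObject>();
  private nextAddr = 1;
  alloc(fields: number[] = []): number {
    const addr = this.nextAddr++;
    this.objects.set(addr, { fields: [...fields], pinned: false });
    return addr;
  }
}

// Returns the updated precise roots. Ambiguous stack words are never rewritten.
function collect(heap: ToyHeap, stackWords: number[], preciseRoots: number[]): number[] {
  const survivors = new Set<number>();
  const worklist: number[] = [];
  const keep = (addr: number): void => {
    if (!survivors.has(addr)) {
      survivors.add(addr);
      worklist.push(addr);
    }
  };
  // Map an old address to its post-collection address, evacuating on first visit.
  const forward = (addr: number): number => {
    const obj = heap.objects.get(addr);
    if (obj === undefined) return addr; // not a heap address
    if (obj.pinned) {
      keep(addr); // pinned objects survive in place
      return addr;
    }
    if (obj.forwardedTo === undefined) {
      obj.forwardedTo = heap.alloc(obj.fields); // copy, leave forwarding address
      keep(obj.forwardedTo);
    }
    return obj.forwardedTo;
  };

  // 1. Conservative scan: any word that looks like a heap address pins its target.
  for (const word of stackWords) {
    const obj = heap.objects.get(word);
    if (obj !== undefined) {
      obj.pinned = true;
      keep(word);
    }
  }
  // 2. Precise roots can be rewritten, so their targets may move.
  const newRoots = preciseRoots.map((r) => forward(r));
  // 3. Trace survivors, rewriting each field to the forwarded address.
  while (worklist.length > 0) {
    const obj = heap.objects.get(worklist.pop()!)!;
    obj.fields = obj.fields.map((f) => forward(f));
  }
  // 4. Reclaim everything that did not survive, then clear pins for the next cycle.
  for (const addr of [...heap.objects.keys()]) {
    if (!survivors.has(addr)) heap.objects.delete(addr);
  }
  for (const obj of heap.objects.values()) obj.pinned = false;
  return newRoots;
}

const heap = new ToyHeap();
const a = heap.alloc();    // reachable only through b
const b = heap.alloc([a]); // referenced by an ambiguous stack word, so pinned
const c = heap.alloc([b]); // referenced by a precise root, so movable
const d = heap.alloc();    // unreachable garbage
const [newC] = collect(heap, [b, 999], [c]);
console.log({ newC, size: heap.objects.size });
```

In the usage above, `b` keeps its address because a conservative word might be pointing at it, while `c` and `a` are copied and every precise reference to them is rewritten. The word `999` matches no object and is ignored.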

CLAUDE.md

Lines changed: 2 additions & 1 deletion
@@ -8,7 +8,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

Perry is a native TypeScript compiler written in Rust that compiles TypeScript source code directly to native executables. It uses SWC for TypeScript parsing and LLVM for code generation.

-**Current Version:** 0.5.240
+**Current Version:** 0.5.241

## TypeScript Parity Status

@@ -149,6 +149,7 @@ First-resolved directory cached in `compile_package_dirs`; subsequent imports re

Keep entries to 1-2 lines max. Full details in CHANGELOG.md.

- **v0.5.241** — Polyglot JSON benchmark suite + consolidated `benchmarks/README.md`. New `benchmarks/json_polyglot/` runs identical 10k-record / ~1 MB blob / 50-iteration parse + stringify workload across Perry / Bun / Node / Go / Rust serde_json / Swift Foundation / C++ nlohmann. Each language listed twice (idiomatic + optimized flag profile) so skeptics see both the default-build floor and the aggressive-tuning ceiling. **Perry leads on time at 65 ms** vs Rust LTO 180 ms / Bun 242 ms / Node 359 ms / C++ 778 ms / Go 785 ms / Swift 3706 ms. Perry's RSS (85 MB) is mid-pack — beats Node (182 MB), comparable to Bun (80 MB), up to 8× higher than typed-struct languages (Rust 11 MB, Go 22 MB). Honest framing called out: dynamic-typing fundamental cost, lazy-tape workload-specificity, library-choice (nlohmann vs simdjson) trade-offs. New `benchmarks/README.md` consolidates everything into ONE GitHub-renderable page: TL;DR tables, methodology, full compiler-flags table, JSON polyglot results, compute microbench results (linked from existing `benchmarks/polyglot/`), memory + GC stability suite, strengths, weaknesses, "what this doesn't measure" honesty section, reproducing instructions, design-doc links. Single page for defensibility — every implementation, every flag, every methodology decision in one place.
- **v0.5.240** — Gen-GC docs: academic + industry lineage appendix added to `docs/generational-gc-plan.md`. Maps each phase (A/B/C/C4/C4b/D) to its canonical paper and lists shipping VMs that use the same techniques (V8, JSC, HotSpot, SpiderMonkey, .NET, OCaml, Mono, Go, LuaJIT). Single strongest reference: **Bartlett 1988 *Mostly Copying Garbage Collection*** (DEC SRC TN-13) — Perry's `CONS_PINNED` + `GC_FLAG_FORWARDED` + reference rewriting + conservative-pin policy is essentially Bartlett's algorithm in Rust extended to the generational case (per Ungar 1984). 8-paper bibliography + textbook reference (Jones/Hosking/Moss *Garbage Collection Handbook*). Defensibility material: every design decision traces to a paper or shipping VM doing the same thing. No code changes.
- **v0.5.239** — Gen-GC **roadmap complete (architectural)**: `docs/generational-gc-plan.md` Log table filled in with 21 commits across Phases A→D. The original Phase D scope listed a conservative-scanner shrink ("scan only the C stack below JS frames") as a sub-goal; **deferred** with rationale documented in plan §Deferred-follow-ups. The naive simple-shrink (skip ranges by SP-at-push only) is unsafe — Rust runtime frames sandwiched between JS frames (e.g., `js_array_map` between caller-JS and callback-JS) hold JSValue locals that need conservative coverage; skipping them prematurely frees live objects. A correct implementation requires platform-specific frame-pointer chain walking on entry to `js_shadow_frame_push`, with deep alternating-call test coverage. Conservative-scan time is sub-1% of every measured benchmark, so the optimization deferred is genuinely marginal. Phase D's three primary ship criteria — `PERRY_GEN_GC=1` default, escape hatch retained, docs updated — are all met by v0.5.237/238/239.
- **v0.5.238** — Gen-GC **Phase D part 2 prep**: flip `PERRY_SHADOW_STACK` codegen default to ON. Every compiled JS function now emits shadow-stack push/pop in its prologue/epilogue plus slot-set calls at every safepoint, giving the GC tracer a precise view of pointer-typed locals in JS frames. `PERRY_SHADOW_STACK=0`/`off`/`false` opts out (bisection escape hatch). Sets up the precondition for the next commit's conservative-scanner shrink — without shadow stack live, dropping JS-frame conservative coverage would lose pointer roots. Bench impact within noise on every measured workload (`bench_json_roundtrip` direct 380 ms / 107 MB, lazy 68 ms / 90 MB, `07_object_create` 0-1 ms / 6.5 MB, `bench_gc_pressure` 16-17 ms / 26.6 MB — all ±1 ms vs `PERRY_SHADOW_STACK=0`). 168/168 unit tests, 9/9 `test_json_*.ts` × 4 modes, 18/18 + 6/6 memory-stability all clean.
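
The shadow-stack mechanism mentioned in the v0.5.238 entry (push in the prologue, slot-set at safepoints, pop in the epilogue) can be modeled in a few lines. This is a conceptual TypeScript sketch, not Perry's generated code or runtime API; `ShadowStack`, `pushFrame`, `setSlot`, `popFrame`, and `roots` are hypothetical names, and addresses are plain numbers.

```typescript
type Slot = number | null; // an object address, or null when the local holds no pointer

class ShadowStack {
  private frames: Slot[][] = [];

  // Function prologue: reserve one slot per pointer-typed local.
  pushFrame(slotCount: number): void {
    this.frames.push(new Array<Slot>(slotCount).fill(null));
  }

  // Safepoint: record the current value of a pointer-typed local.
  setSlot(index: number, value: Slot): void {
    const top = this.frames[this.frames.length - 1];
    if (top === undefined || index < 0 || index >= top.length) {
      throw new Error("bad slot");
    }
    top[index] = value;
  }

  // Function epilogue: drop the frame and its roots.
  popFrame(): void {
    if (this.frames.pop() === undefined) {
      throw new Error("shadow stack underflow");
    }
  }

  // What a tracer would see: every non-null slot in every live frame.
  roots(): number[] {
    const out: number[] = [];
    for (const frame of this.frames) {
      for (const slot of frame) {
        if (slot !== null) out.push(slot);
      }
    }
    return out;
  }
}

const stack = new ShadowStack();
stack.pushFrame(2);               // outer function prologue
stack.setSlot(0, 101);            // outer local holds object 101
stack.pushFrame(1);               // callee prologue
stack.setSlot(0, 202);            // callee local holds object 202
const duringCall = stack.roots();
stack.popFrame();                 // callee epilogue
const afterReturn = stack.roots();
console.log(duringCall, afterReturn);
```

Because the tracer reads exact slots rather than guessing from raw stack words, roots found this way are precise and their targets can be moved, unlike objects reached only through a conservative scan.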

Cargo.lock

Lines changed: 27 additions & 27 deletions

Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -109,7 +109,7 @@ opt-level = "s" # Optimize for size in stdlib
opt-level = 3

[workspace.package]
-version = "0.5.240"
+version = "0.5.241"
edition = "2021"
license = "MIT"
repository = "https://github.com/PerryTS/perry"
