Commit 9b99e07

perf(sso): Step 2 flip — emit SSO by default (v0.5.216)

DirectParser::parse_string_value and tape-path materialize_string_value both now emit inline SHORT_STRING_TAG values for strings ≤ 5 bytes unconditionally. Zero heap allocation on the short-string hot path. PERRY_SSO_FORCE env var retained as no-op for script compatibility.

Also fixed js_number_coerce in builtins.rs to accept SSO strings via is_any_string() + str_bytes_from_jsvalue decoder. This was the one regression surfaced by flipping the default (caught on test_gap_json_advanced's reviver test: `Number(sso)` was returning NaN). Now correctly decodes SSO into a stack scratch buffer and parses bytes as f64.

Measurements (best-of-5, macOS ARM64):

bench_sso_strings, synthetic short-string workload (20k records × 4 string values × 30 iters, all ≤ 5 bytes):

  direct pre-flip:  290 ms / 123 MB
  direct+SSO:       150 ms /  76 MB   1.9× faster, 38% less RSS
  Node 25.8:        250 ms /  91 MB   Perry 1.7× faster, 17% less RSS
  Bun 1.3.12:       130 ms /  71 MB   Perry 15% slower, 7% more RSS
                                      (closest Perry has gotten to Bun on a short-string workload)

Main JSON benches (lazy-default path):

  bench_json_roundtrip:         80 ms / 108 MB  (was 90/130 at v0.5.215)
  bench_json_readonly:          80 ms /  90 MB  (unchanged)
  bench_json_readonly_indexed:  90 ms /  90 MB  (unchanged)

The main benches are dominated by .length + stringify fast paths where string materialization isn't on the hot path, so SSO wins there are small. The win shows on workloads that hit direct parse (< 1 KB blobs, non-array roots, opt-out) or force-materialize via indexed access.

Full test sweep:

  10/10 test_json_*.ts match Node byte-for-byte
  Runtime tests 136/136
  Gap tests 25/28 (test_gap_json_advanced flipped back to passing)
  Fastify 5/5
  Thread 4/4
  Parity 117/130 (up from 106/118; new corpus tests added, pre-existing fails unchanged in noise)

Step 2 of docs/sso-migration-plan.md is now done. Steps 3+ (object key storage, string methods, codegen literals, stdlib) are deferred: the measured main-bench win is modest, so the effort-reward ratio for deeper migration isn't tier-1 priority anymore. Most of the SSO value is captured by the producer flip.

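
The inline emit and the `Number(sso)` fix described in this message can be sketched in a few lines of Rust. This is a minimal illustration, not Perry's actual code. It assumes the bit layout given in the v0.5.213 changelog entry further down (tag `0x7FF9_0000_0000_0000`, length in bits 40..47, data in bits 0..39, first byte in the least significant byte). The function bodies are hypothetical stand-ins for `JSValue::try_short_string`, `short_string_to_buf`, and `js_number_coerce`, and the parse step only approximates JavaScript's string-to-number rules.

```rust
/// Tag value quoted in the v0.5.213 changelog entry.
const SHORT_STRING_TAG: u64 = 0x7FF9_0000_0000_0000;
/// Assumption: the tag occupies the upper 16 bits of the NaN-boxed value.
const TAG_MASK: u64 = 0xFFFF_0000_0000_0000;
const MAX_SSO_LEN: usize = 5;

/// Hypothetical stand-in for `JSValue::try_short_string`.
/// Returns the NaN-boxed bits, or None when the input does not fit inline.
fn try_short_string(bytes: &[u8]) -> Option<u64> {
    if bytes.len() > MAX_SSO_LEN {
        return None;
    }
    // 8-bit length stored in bits 40..47 of the payload.
    let mut payload = (bytes.len() as u64) << 40;
    for (i, &b) in bytes.iter().enumerate() {
        // First byte lands in the least significant byte of the payload.
        payload |= (b as u64) << (8 * i);
    }
    Some(SHORT_STRING_TAG | payload)
}

fn is_short_string(bits: u64) -> bool {
    bits & TAG_MASK == SHORT_STRING_TAG
}

/// Decodes the inline bytes into a caller-provided stack buffer and
/// returns the string length. No heap allocation is involved.
fn short_string_to_buf(bits: u64, buf: &mut [u8; MAX_SSO_LEN]) -> usize {
    let len = ((bits >> 40) & 0xFF) as usize;
    for (i, slot) in buf.iter_mut().enumerate() {
        *slot = (bits >> (8 * i)) as u8;
    }
    len.min(MAX_SSO_LEN)
}

/// Simplified Number() coercion for SSO inputs only.
/// Note: Rust's f64 parser accepts spellings such as "inf" that JavaScript
/// rejects, so this is an approximation of the real coercion rules.
fn number_coerce_sso(bits: u64) -> f64 {
    if !is_short_string(bits) {
        return f64::NAN;
    }
    let mut scratch = [0u8; MAX_SSO_LEN];
    let len = short_string_to_buf(bits, &mut scratch);
    let text = match std::str::from_utf8(&scratch[..len]) {
        Ok(s) => s.trim(),
        Err(_) => return f64::NAN,
    };
    if text.is_empty() {
        return 0.0; // Number("") is 0 in JavaScript
    }
    text.parse::<f64>().unwrap_or(f64::NAN)
}

fn main() {
    let v = try_short_string(b"42").expect("2 bytes fit inline");
    assert!(is_short_string(v));
    assert_eq!(number_coerce_sso(v), 42.0);
    // Six bytes do not fit; a real producer would fall back to a heap string.
    assert!(try_short_string(b"abcdef").is_none());
    println!("Number(sso) = {}", number_coerce_sso(v));
}
```

The regression fixed here corresponds to the `is_short_string` check: a coercion routine that only recognizes the heap string tag has no branch for the inline tag, so the value falls through to a non-string path and yields NaN.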
1 parent 35faf4b commit 9b99e07

7 files changed

Lines changed: 125 additions & 84 deletions

CLAUDE.md

Lines changed: 2 additions & 1 deletion
@@ -8,7 +8,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

Perry is a native TypeScript compiler written in Rust that compiles TypeScript source code directly to native executables. It uses SWC for TypeScript parsing and LLVM for code generation.

-**Current Version:** 0.5.215
+**Current Version:** 0.5.216

## TypeScript Parity Status

@@ -148,6 +148,7 @@ First-resolved directory cached in `compile_package_dirs`; subsequent imports re
Keep entries to 1-2 lines max. Full details in CHANGELOG.md.

- **v0.5.205** — Fix #183: `perry compile --target web` on a real-world app (Bloom Jump built on the Bloom engine) produced a WASM binary the browser refused to load — `Compiling function #687 failed: expected 0 elements on the stack for fallthru, found 103` (count varies with engine state). Root cause in `crates/perry-codegen-wasm/src/emit.rs`: the four direct-`Call`-instruction code paths — `Expr::Call` FuncRef arm (~4302), `Expr::Call` ExternFuncRef arm (~4324), `Expr::New` user-class ctor (~5844), `Expr::SuperCall` parent-ctor (~5894), `Expr::StaticMethodCall` direct-static path (~5979) — each emit `emit_expr(arg)` per source arg and pad up with `TAG_UNDEFINED` when `args.len() < expected`, but had no matching drop-excess branch when `args.len() > expected`. WASM `call` consumes exactly the callee's declared param count, so when JS's "extra args evaluated for side effects, then silently ignored" semantics met Perry's WASM codegen, every extra evaluated arg leaked past the call and accumulated on the enclosing block's operand stack — 103 values by the time `_start`'s final `end` hit the validator. The shape that triggered it in jump/bloom was `bloom/src/core/colors.ts`'s `Colors = new __AnonShape_2(...24 PropertyGets...)` landing on a Phase-3-synthesized ctor with lower declared arity, multiplied across bloom's 10 submodules. Fix: after each existing `for _ in args.len()..expected { I64Const(TAG_UNDEFINED) }` pad-up loop, add the mirror `for _ in expected..args.len() { Drop }` — matches JS semantics (extras evaluated for side effects but discarded) and keeps the operand stack aligned with the callee's WASM type at every direct-Call site. 
Verified end-to-end against the exact issue repro cloned fresh from `github.com/Bloom-Engine/jump` + `github.com/Bloom-Engine/engine`: both path A `file:./vendor/bloom/` and path B `file:../engine/` now compile to a WebAssembly-validating `.wasm` (416,923 / 413,780 bytes respectively, 140 FFI imports intact, `WebAssembly.compile` resolves clean on node 20+); a synthetic `takesFive(mc(),mc(),1,2,3,4)` minimal case that previously failed `Compiling function #213 failed: ... found 1` also validates. `cargo test --release -p perry-runtime -p perry-hir -p perry-codegen-wasm -p perry`: 262/262 passed. Note: issue #183 also claimed path A found only 1 module and emitted 9 FFI imports — could not reproduce in a fresh clone (both paths find 10 modules identically); most likely an artifact of the reporter's local `vendor/bloom` snapshot predating the `exports` map, and the "runGame silently no-ops" symptom the user actually observed was the browser refusing to instantiate the invalid WASM with the surrounding JS glue swallowing the error — fixed here.
+- **v0.5.216** — SSO Step 2 flip: `DirectParser::parse_string_value` AND tape-path `materialize_string_value` both now emit inline `SHORT_STRING_TAG` values for strings ≤ 5 bytes unconditionally. Zero heap allocation on the short-string hot path. `PERRY_SSO_FORCE` env var retained as no-op for script compatibility — all values fall through to the unconditional emit. Also fixed `js_number_coerce` in `crates/perry-runtime/src/builtins.rs` to accept SSO strings via `is_any_string()` + `str_bytes_from_jsvalue` decoder, so `Number("42")` (where `"42"` is an SSO value from `JSON.parse` with a reviver) correctly produces `42` — this was the one real regression surfaced by flipping the default (caught on `test_gap_json_advanced`'s reviver test). Measured win on the `bench_sso_strings.ts` synthetic (20k records × 4 short string values × 30 iters — every string ≤ 5 bytes so prime SSO territory): direct path 290 ms / 123 MB → **150 ms / 76 MB** (1.9× faster, 38% less RSS). vs Node 250 ms / 91 MB → Perry 1.7× faster / 17% less RSS; vs Bun 130 ms / 71 MB → Perry 15% slower / 7% more RSS (closest Perry has gotten to Bun on a short-string workload). On the main JSON benches the direct-forced path sees 7-12% time / 2-5% RSS wins; the lazy default path (PERRY_JSON_TAPE unset) sees a smaller 0-10% win depending on shape because the main benches are dominated by numbers + medium-length strings that don't fit SSO. Full test sweep: 10/10 `test_json_*.ts` match Node byte-for-byte; runtime tests 136/136; gap tests 25/28 (`test_gap_json_advanced` flipped from fail → pass, matching baseline); fastify 5/5; thread 4/4; parity 117/130 (up from 106/118 at v0.5.215 — new tests were added to the parity corpus; within noise on pre-existing fails). Step 2 of `docs/sso-migration-plan.md` is now done. Step 3 (object key storage) and later steps are deferred — the measured main-bench win is modest, so the effort-reward ratio for deeper migration isn't worth tier-1 priority anymore; most of the SSO value is already captured by the producer flip.
- **v0.5.215** — SSO Step 1.5 — codegen PropertyGet three-way branch for SHORT_STRING_TAG receivers. Closes the last two `test_json_*.ts` regressions that failed under `PERRY_SSO_FORCE=1` at v0.5.214. Root cause: `crates/perry-codegen/src/expr.rs::Expr::PropertyGet` receiver-validity guard at ~line 2647 masked `tag & 0xFFFD` and checked `== 0x7FFD`. SSO (0x7FF9) failed the guard, routed to the "invalid" block, returned `undefined`. Widening the mask was not safe (the subsequent PIC fast path's `*(obj_handle + 16)` read lands in arbitrary userspace memory for SSO receivers, verified crashes). Fix: added an explicit `is_sso = tag == 0x7FF9` check BEFORE the existing POINTER/STRING validity check; SSO values now route to a dedicated `sso_idx` block that calls `js_object_get_field_by_name_f64(obj_bits, key_handle)` directly — which has an SSO-aware entry (v0.5.214) that returns `.length` from the NaN-box length byte and `undefined` for other keys. PIC fast path / invalid branch / SSO branch all merge into the existing `final_merge_idx` via a three-way phi. No change to the hot path for POINTER/STRING receivers (the `is_sso` check is predicted "not SSO" and branch-folded by LLVM). Also fixed `js_array_join` in `crates/perry-runtime/src/array.rs` to handle SSO elements inline (was falling through to `is_number()` → "NaN" output on `arr.join(",")` when elements were SSO strings); new branch decodes SSO to a stack scratch buffer and pushes bytes directly without a heap roundtrip via `materialize_to_heap`. **Final test result: 10/10 `test_json_*.ts` tests match Node byte-for-byte under both default and `PERRY_SSO_FORCE=1` modes.** Broader sweep: Runtime tests 136/136; gap tests 25/28 (up from 24/28 at v0.5.214 — one pre-existing compile failure unrelated to SSO flipped positive); fastify tests 5/5; thread tests 4/4. Step 1 of the SSO migration is now fully complete. 
Step 2 (flip default to `PERRY_SSO_FORCE=1`) requires the measured perf win to justify — the short-string-heavy benchmark pending, will measure and decide.
- **v0.5.214** — SSO Step 1 consumer-arm migration (follow-up to v0.5.213 infrastructure landing). Added `PERRY_SSO_FORCE=1` env-var gate (cached via `OnceLock` in `crates/perry-runtime/src/json.rs::sso_emit_enabled`) that flips `DirectParser::parse_string_value` to emit inline SSO values for strings ≤ 5 bytes — default OFF, used exclusively by the migration test matrix. Added parallel `SHORT_STRING_TAG` arms to every `== STRING_TAG` dispatch in `json.rs` stringify paths: `stringify_object_inner` field-value inline dispatch + replacer block, `stringify_array_depth` element inline dispatch, `extract_string_array`, the 3 `replaced_tag` sites on the replacer spacer paths, `js_json_stringify_full` top-level replacer arm, and the spacer-as-string check (for `JSON.stringify(obj, null, " ")` with short indent). Runtime additions: `js_jsvalue_to_string` now materializes SSO to a heap `StringHeader` via `js_string_materialize_to_heap` for the common "caller needs `*mut StringHeader`" contract; `js_object_get_field_by_name` handles `.length` on an SSO receiver by reading the length byte directly from the NaN-box payload (returns `JSValue::undefined()` for other keys on SSO values, matching the string-property baseline). Measured: 8 out of 10 `test_json_*.ts` tests match Node byte-for-byte under `PERRY_SSO_FORCE=1`; all 10 match under default (SSO-off) mode — no user-visible regressions from the infrastructure landing. Remaining 2 failures are both caused by **Step 1.5** (new section in `docs/sso-migration-plan.md`): the codegen's PropertyGet receiver-validity guard at `crates/perry-codegen/src/expr.rs:~2647` masks `tag & 0xFFFD` and checks `== 0x7FFD`, which accepts POINTER_TAG + STRING_TAG but rejects SHORT_STRING_TAG (0x7FF9). SSO receivers fall to the "invalid" branch → return `undefined`. 
Widening the mask to `0xFFF9` accepts SSO but the PIC fast path's subsequent `*(obj_handle + 16)` read lands in arbitrary userspace memory for SSO values (the low 48 bits are SSO data, not a heap pointer) — verified: widening without further guarding crashed 2 tests under SSO_FORCE. Safe fix is a three-way codegen branch: POINTER/STRING → PIC fast path, SSO → call `js_object_get_field_by_name_f64` directly skipping the PIC memory read, else → invalid. Estimated ~2 hours, one codegen site — scheduled as Step 1.5. Runtime tests 136/136 (unchanged — no new unit tests added this commit). Also verified: no infrastructure crashes on stringify / equality / comparison / typeof / length paths when SSO values do reach them from the runtime side.
- **v0.5.213** — Small String Optimization (SSO) infrastructure (tier 1 #2 per `docs/memory-perf-roadmap.md`). **Infrastructure-only landing**; no creation sites migrated yet. New tag `SHORT_STRING_TAG = 0x7FF9_0000_0000_0000` encoding strings of length 0..=5 inline in the 48-bit NaN-box payload (8-bit length at bits 40..47 + 5 bytes of data at bits 0..39). Zero heap allocation for short strings when emitted — the value IS the data. Added: `JSValue::try_short_string(&[u8])` (constructor), `short_string_to_buf` / `short_string_len` (decoders), `is_short_string` / `is_any_string` (predicates, with `is_string` kept strict for legacy call sites that rely on `as_string_ptr` returning a real heap pointer), `js_string_new_sso(ptr, len) -> f64` (SSO-aware creation that falls back to heap for long inputs), `str_bytes_from_jsvalue(value, &mut scratch)` (central decoder producing `(*const u8, u32)` view for either representation), `js_string_materialize_to_heap(value)` (compatibility shim that allocates a heap StringHeader from an SSO value). Consumer-side dispatch already wired in: `typeof` (builtins.rs, accepts both tags), `js_jsvalue_equals` + `js_jsvalue_compare` (value.rs — SSO fast path when both operands are SSO because encoding is canonical, otherwise decode via scratch buffers and byte-compare), `js_value_length_f64` (direct bit extraction for SSO, no heap access), `js_jsvalue_to_string` (materializes SSO to heap when caller needs `*mut StringHeader`), three stringify arms in json.rs (top-level `stringify_value`, object field inline dispatch in `stringify_object_inner`, array element inline dispatch in `stringify_array_depth`). 6 new unit tests in `value::tests` cover roundtrip, rejection of 6+ byte inputs, embedded-NUL handling (length is authoritative), tag-band distinctness from POINTER/INT32/NUMBER/UNDEFINED, empty-string roundtrip, and byte-order stability (first byte lands in LSB of payload — invariant relied on by any future SIMD bulk-decoder). 
**Why infrastructure-only:** flipping `DirectParser::parse_string_value` to emit SSO without first auditing every consumer produces regressions — `grep "== STRING_TAG" crates/perry-runtime/src/json.rs` alone shows 20+ sites, and the broader consumer surface spans object.rs property-get helpers, string.rs methods (split/replace/slice/indexOf/etc.), regex.rs match extractors, set.rs/map.rs key equality, stdlib HTTP/DB paths, and codegen string-literal emission. Attempting the flip in-session reproduced the hazard: 3 `test_json_lazy_*.ts` tests diffed from Node with stringify emitting `"null"` where SSO values should have decoded. Rolled back the producer flip; kept every consumer arm already added so Step 1 of the migration is ~50% complete. New doc `docs/sso-migration-plan.md` sequences the 6-step roll-out (stringify consumers → DirectParser emit → object key storage → string methods → codegen literals → stdlib) with per-step ship criteria and a decision gate after Step 2 to re-evaluate whether Steps 3-6 are worth the effort vs jumping to tier 2/3 (escape analysis + generational GC). Runtime tests 130 → 136 (added 6 SSO tests). All 10 existing `test_json_*` regressions green under infrastructure-only landing.
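
The PropertyGet guard problem described in the v0.5.214 and v0.5.215 entries comes down to a few lines of mask arithmetic, sketched here in Rust. This is an illustration under stated assumptions, not the LLVM IR that `crates/perry-codegen/src/expr.rs` emits. The `SHORT_STRING_TAG` value `0x7FF9` and the `tag & 0xFFFD == 0x7FFD` guard come from the changelog. The concrete values `0x7FFD` for `POINTER_TAG` and `0x7FFF` for `STRING_TAG` are assumptions inferred from that mask, and the `Route` enum and function names are hypothetical.

```rust
// Upper 16 bits of a NaN-boxed value.
const POINTER_TAG16: u16 = 0x7FFD; // assumption, inferred from the guard mask
const STRING_TAG16: u16 = 0x7FFF; // assumption, inferred from the guard mask
const SHORT_STRING_TAG16: u16 = 0x7FF9; // from the v0.5.213 entry

/// Hypothetical names for the three codegen blocks the changelog describes.
#[derive(Debug, PartialEq)]
enum Route {
    /// POINTER/STRING receivers: safe to read through the heap handle.
    PicFastPath,
    /// SSO receivers: call the runtime helper, never dereference the payload.
    SsoRuntimeCall,
    /// Anything else: the property read yields `undefined`.
    Invalid,
}

fn tag_of(bits: u64) -> u16 {
    (bits >> 48) as u16
}

/// The guard as it stood before v0.5.215.
fn old_guard_accepts(bits: u64) -> bool {
    tag_of(bits) & 0xFFFD == 0x7FFD
}

/// Three-way dispatch: test for SSO first, then apply the unchanged guard.
fn route_property_get(bits: u64) -> Route {
    let tag = tag_of(bits);
    if tag == SHORT_STRING_TAG16 {
        Route::SsoRuntimeCall
    } else if tag & 0xFFFD == 0x7FFD {
        Route::PicFastPath
    } else {
        Route::Invalid
    }
}

fn boxed(tag: u16, payload: u64) -> u64 {
    ((tag as u64) << 48) | (payload & 0x0000_FFFF_FFFF_FFFF)
}

fn main() {
    let ptr = boxed(POINTER_TAG16, 0x1000);
    let heap_str = boxed(STRING_TAG16, 0x2000);
    let sso = boxed(SHORT_STRING_TAG16, 0x0200_0000_3234);

    // The old guard accepts both heap-backed tags and rejects SSO.
    assert!(old_guard_accepts(ptr));
    assert!(old_guard_accepts(heap_str));
    assert!(!old_guard_accepts(sso));

    // A widened mask (0xFFF9) would accept SSO as well, but the fast path
    // would then treat the inline string bytes as a heap address.
    assert_eq!(tag_of(sso) & 0xFFF9, tag_of(ptr) & 0xFFF9);

    assert_eq!(route_property_get(ptr), Route::PicFastPath);
    assert_eq!(route_property_get(heap_str), Route::PicFastPath);
    assert_eq!(route_property_get(sso), Route::SsoRuntimeCall);
    assert_eq!(route_property_get(1.5f64.to_bits()), Route::Invalid);
    println!("routes ok");
}
```

Checking for the SSO tag before the existing guard leaves the POINTER/STRING condition untouched, which is consistent with the changelog's note that the hot path for those receivers did not change.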

Cargo.lock

Lines changed: 27 additions & 27 deletions
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -109,7 +109,7 @@ opt-level = "s" # Optimize for size in stdlib
opt-level = 3

[workspace.package]
-version = "0.5.215"
+version = "0.5.216"
edition = "2021"
license = "MIT"
repository = "https://github.com/PerryTS/perry"
