Context: docs/memory-perf-roadmap.md tier 1 is landed — Perry beats Node on
both time and RSS on bench_json_roundtrip (373 ms / 144 MB vs 385 ms /
188 MB). Bun is still ~1.5× ahead on both (250 ms / 83 MB). The remaining
gap is not fixable by incremental GC tuning — it's architectural. This
doc captures the plan to close it.
Goal: beat Bun on both time and peak RSS on bench_json_roundtrip
(and ideally on a broader benchmark set) without breaking Node
compatibility.
Node compatibility is a hard constraint. At runtime, every change
here must compile down to bytecode indistinguishable from JSON.parse(…)
semantics that Node/V8 will execute identically. The <T> type
argument is compile-time-only in TypeScript and is fully erased by
tsc, so Node never sees it — safe.
The benchmark's parse → tree → stringify loop has a middle
representation — a 60k-object JSValue tree — that exists only so
arbitrary JS code could inspect it. On bench_json_roundtrip,
nothing does beyond parsed.length. All three of (Bun, Node, simdjson)
accept this middle step as a given. That's the wheel worth reinventing.
Perry is uniquely positioned to attack this because:
- TypeScript types at compile time → know the shape up front
- Compiles to native code → can specialize per call site
- Already does type-directed codegen elsewhere (Buffer intrinsics, typed arrays, i32 loop counters)
Four steps, landing in order. Each is independently shippable, each compounds on the prior.
## Step 1 — schema-directed `JSON.parse<T>`
What: accept an optional TypeScript type argument on JSON.parse.
When the compiler can see T as a concrete object or array type at the
call site, generate a specialized parser path that:
- Pre-builds the expected keys array (no shape-cache lookup per record)
- Allocates objects with the right field count (no overflow map)
- Skips hash-lookup on keys in the expected order (but tolerates out-of-order and missing fields — JSON is unordered)
- Uses typed extraction for known leaf types (f64 for number, etc.)
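The "expected order, but tolerate out-of-order" key rule can be sketched as below. `match_field` and the cursor protocol are illustrative, not Perry's actual API — just the shape of the fast path.

```rust
// Try the next expected key first (the common, in-declaration-order case);
// fall back to a scan for out-of-order keys. Unknown keys return None.
fn match_field(expected: &[&str], cursor: &mut usize, key: &str) -> Option<usize> {
    if *cursor < expected.len() && expected[*cursor] == key {
        let idx = *cursor;
        *cursor += 1; // fast path: no hash lookup, just a cursor bump
        return Some(idx);
    }
    // out-of-order or unknown key: JSON objects are unordered, so scan
    expected.iter().position(|&k| k == key)
}

fn main() {
    let expected = ["a", "b", "c"];
    let mut cur = 0;
    assert_eq!(match_field(&expected, &mut cur, "a"), Some(0)); // in order
    assert_eq!(match_field(&expected, &mut cur, "c"), Some(2)); // out of order
    assert_eq!(match_field(&expected, &mut cur, "zzz"), None);  // extra field
}
```

The scan fallback is what makes the fast path correctness-preserving: order is an optimization hint, never a requirement.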
Compatibility:
- Runtime: identical to `JSON.parse(blob)` — the `<T>` erases.
- TypeScript: add a `parse<T>(text, reviver?): T` overload in Perry's ambient types. Projects using stock `lib.es5.d.ts` use `JSON.parse(blob) as T` — also erases to the same runtime call.
- Fallback: when the input doesn't match T (extra/missing/wrong-typed fields), fall through to the generic parser. Correctness-preserving.
Design:
Type descriptor, built at codegen time as static data:
```rust
enum TypeKind { Any, Number, String, Boolean, Null, Object, Array }

struct TypeDescriptor {
    kind: TypeKind,
    // OBJECT: list of expected fields
    fields: *const FieldDescriptor,
    field_count: u32,
    // ARRAY: element type
    element: *const TypeDescriptor,
}

struct FieldDescriptor {
    name_ptr: *const u8,
    name_len: u32,
    type_ptr: *const TypeDescriptor,
}
```

Call-site codegen emits one static descriptor per distinct type shape, dedup-ed at module scope. The static descriptor lives in `.rodata` — zero runtime cost to build.
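A hypothetical hand-expansion of what codegen might emit for `JSON.parse<{a: number, b: string}>`. The real descriptors use raw pointers as above; this sketch uses `&'static` references so it builds standalone, and all names are illustrative.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum TypeKind { Any, Number, String, Boolean, Null, Object, Array }

struct FieldDescriptor {
    name: &'static str,          // stands in for name_ptr/name_len
    ty: &'static TypeDescriptor, // stands in for type_ptr
}

struct TypeDescriptor {
    kind: TypeKind,
    fields: &'static [FieldDescriptor],       // OBJECT: expected fields
    element: Option<&'static TypeDescriptor>, // ARRAY: element type
}

// leaf descriptors, shared across shapes (dedup at module scope)
static NUMBER: TypeDescriptor =
    TypeDescriptor { kind: TypeKind::Number, fields: &[], element: None };
static STRING: TypeDescriptor =
    TypeDescriptor { kind: TypeKind::String, fields: &[], element: None };

// one static descriptor for the shape {a: number, b: string}
static ITEM_FIELDS: [FieldDescriptor; 2] = [
    FieldDescriptor { name: "a", ty: &NUMBER },
    FieldDescriptor { name: "b", ty: &STRING },
];
static ITEM: TypeDescriptor =
    TypeDescriptor { kind: TypeKind::Object, fields: &ITEM_FIELDS, element: None };

fn main() {
    assert_eq!(ITEM.fields.len(), 2);
    assert_eq!(ITEM.fields[0].name, "a");
    assert_eq!(ITEM.fields[1].ty.kind, TypeKind::String);
}
```

Everything here is `static`, so the whole descriptor graph is baked into the binary's read-only data — exactly the zero-runtime-cost property the design relies on.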
Runtime entry: `js_json_parse_typed(blob, *const TypeDescriptor) -> JSValue`.
Walks the descriptor tree as it parses; on a shape miss, tail-calls
`js_json_parse` with the same input.
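The fallback contract in miniature — function names follow the doc, but the bodies are crude stand-ins (the "typed" path accepts only bare integers), and `JsValue` here is a toy enum rather than Perry's NaN-boxed value:

```rust
#[derive(Debug, PartialEq)]
enum JsValue { Int(i64), Raw(String) }

// stand-in for the descriptor-directed fast path: only accepts integers
fn try_typed(blob: &str) -> Result<JsValue, ()> {
    blob.trim().parse::<i64>().map(JsValue::Int).map_err(|_| ())
}

// stand-in for the existing generic js_json_parse
fn js_json_parse(blob: &str) -> JsValue {
    JsValue::Raw(blob.trim().to_string())
}

// on any shape miss, hand the *same input* to the generic parser,
// so correctness never depends on T being accurate
fn js_json_parse_typed(blob: &str) -> JsValue {
    try_typed(blob).unwrap_or_else(|_| js_json_parse(blob))
}

fn main() {
    assert_eq!(js_json_parse_typed("7"), JsValue::Int(7));          // fast path
    assert_eq!(js_json_parse_typed("[1]"), JsValue::Raw("[1]".into())); // fallback
}
```

The point of the shape: the typed path may reject anything it isn't sure about, because rejection costs only a re-parse, never a wrong answer.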
Expected win on bench_json_roundtrip: ~20-40% parse speedup.
Not enough to beat Bun alone — sets up infrastructure for Step 2.
Scope:
- `crates/perry-hir`: recognize `JSON.parse<T>` in the type-argument position; carry T through to HIR.
- `crates/perry-codegen`: emit static type descriptors + the routed call.
- `crates/perry-runtime/src/json.rs`: new `js_json_parse_typed`.
- Types overlay: add the `parse<T>(…): T` overload.
Tests: new test-files/test_json_typed_*.ts — must not touch existing
test_json_*.ts files. Parity test against Node's JSON.parse(blob) as T
semantics (identical, since the type is erased).
Benchmarks: new benchmarks/suite/bench_json_typed_roundtrip.ts —
mirrors bench_json_roundtrip but adds <Item[]> type argument.
Side-by-side measurement keeps both benches, shows the delta.
## Step 2 — tape-based lazy parse
What: replace DirectParser::parse_value with a two-phase design:
Phase 1 — tape build. One SIMD-friendly pass over the blob, emits a
flat Tape of (offset, kind) structural positions. No tree, no
JSValue allocation, no strings. Output size bounded by input size. On
bench_json_roundtrip's 1 MB blob, the tape is ~100 KB of u32s.
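Phase 1 in a scalar (non-SIMD) sketch. The entry layout and kind codes are invented for illustration; the real builder would pack (offset, kind) into u32s and vectorize the structural scan.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind { ObjOpen, ObjClose, ArrOpen, ArrClose, Str, Prim }

#[derive(Debug, Clone, Copy, PartialEq)]
struct Entry { offset: u32, kind: Kind }

// One pass over the blob: record structural positions, allocate nothing
// but the flat tape. No JSValue, no strings, no tree.
fn build_tape(blob: &[u8]) -> Vec<Entry> {
    let mut tape = Vec::new();
    let mut i = 0usize;
    while i < blob.len() {
        match blob[i] {
            b'{' => tape.push(Entry { offset: i as u32, kind: Kind::ObjOpen }),
            b'}' => tape.push(Entry { offset: i as u32, kind: Kind::ObjClose }),
            b'[' => tape.push(Entry { offset: i as u32, kind: Kind::ArrOpen }),
            b']' => tape.push(Entry { offset: i as u32, kind: Kind::ArrClose }),
            b':' | b',' => {} // separators need no tape entry in this sketch
            b'"' => {
                tape.push(Entry { offset: i as u32, kind: Kind::Str });
                i += 1;
                while i < blob.len() && blob[i] != b'"' {
                    if blob[i] == b'\\' { i += 1; } // skip escaped char
                    i += 1;
                }
            }
            c if c.is_ascii_whitespace() => {}
            _ => {
                // number / true / false / null: record only the first byte
                tape.push(Entry { offset: i as u32, kind: Kind::Prim });
                while i + 1 < blob.len() && !b",}] \t\r\n".contains(&blob[i + 1]) {
                    i += 1;
                }
            }
        }
        i += 1;
    }
    tape
}

fn main() {
    let tape = build_tape(br#"{"a": 1, "b": [true]}"#);
    let kinds: Vec<Kind> = tape.iter().map(|e| e.kind).collect();
    assert_eq!(kinds, vec![
        Kind::ObjOpen, Kind::Str, Kind::Prim, Kind::Str,
        Kind::ArrOpen, Kind::Prim, Kind::ArrClose, Kind::ObjClose,
    ]);
}
```

Note the invariant that makes Phase 2 possible: every entry carries an offset back into the blob, so decoding any leaf later is a bounded re-read of the original bytes.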
Phase 2 — lazy materialization. JSON.parse returns a TapedJsValue
— a small handle (tape pointer, root position, blob pointer) wrapped in
a JSValue. Property access, iteration, and .length read from the tape
and decode on demand. Full materialization only happens if the user:
- Mutates a field (`parsed.x = 5`) → materialize the enclosing object and all ancestors up to the root, then switch to tree mode for that path
- Passes the value to an FFI boundary that reads bytes opaquely
- Calls a method that walks structurally (Object.keys, for…in)
Subtrees never touched stay as tape views forever. On
bench_json_roundtrip where only .length is read, 99% of the tree
never materializes.
Stringify pairing: if a TapedJsValue hasn't been mutated, stringify
is a memcpy of the relevant blob bytes (with re-escaping handled by the
tape kinds). Zero tree walk. If mutated, fall through to the generic
stringifier.
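The materialize-on-write rule and the memcpy stringify pairing, modeled in miniature. All internals here are invented: the real `TapedJsValue` is a NaN-boxed handle over a tape, and tape reads would not reparse a slice as this toy does. It only shows the representation switch and which path `stringify` takes.

```rust
use std::collections::BTreeMap;

enum Repr {
    // borrowed view into the original blob bytes
    Tape { blob: &'static str, start: usize, end: usize },
    // materialized tree (toy: flat string→number map)
    Tree(BTreeMap<String, f64>),
}

struct TapedJsValue { repr: Repr }

// toy parser for flat objects like {"a":1,"b":2} — no nesting, no escapes
fn materialize(s: &str) -> BTreeMap<String, f64> {
    s.trim_matches(|c| c == '{' || c == '}')
        .split(',')
        .filter_map(|kv| {
            let (k, v) = kv.split_once(':')?;
            Some((k.trim().trim_matches('"').to_string(), v.trim().parse().ok()?))
        })
        .collect()
}

impl TapedJsValue {
    fn get(&self, key: &str) -> Option<f64> {
        match &self.repr {
            // tape path: decode on demand; the representation does not change
            Repr::Tape { blob, start, end } =>
                materialize(&blob[*start..*end]).get(key).copied(),
            Repr::Tree(map) => map.get(key).copied(),
        }
    }

    fn set(&mut self, key: &str, v: f64) {
        // write barrier: the first mutation materializes the whole view
        if let Repr::Tape { blob, start, end } = &self.repr {
            let (blob, start, end) = (*blob, *start, *end);
            self.repr = Repr::Tree(materialize(&blob[start..end]));
        }
        if let Repr::Tree(map) = &mut self.repr {
            map.insert(key.to_string(), v);
        }
    }

    fn stringify(&self) -> String {
        match &self.repr {
            // never mutated: a straight copy of the original blob bytes
            Repr::Tape { blob, start, end } => blob[*start..*end].to_string(),
            // mutated: generic stringifier over the tree
            Repr::Tree(map) => {
                let body: Vec<String> =
                    map.iter().map(|(k, v)| format!("\"{}\":{}", k, v)).collect();
                format!("{{{}}}", body.join(","))
            }
        }
    }
}

fn main() {
    let blob = r#"{"a":1,"b":2}"#;
    let mut p = TapedJsValue { repr: Repr::Tape { blob, start: 0, end: blob.len() } };
    assert_eq!(p.stringify(), blob);   // untouched → byte copy of the blob
    assert_eq!(p.get("b"), Some(2.0)); // read served without materializing
    p.set("a", 5.0);                   // write barrier fires here
    assert_eq!(p.stringify(), r#"{"a":5,"b":2}"#); // generic path after mutation
}
```

Reads deliberately leave the representation alone — only writes flip it — which is exactly why the read-only benchmark never pays for a tree.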
Compatibility:
- Runtime: `JSON.parse(blob)` returns a value that behaves indistinguishably from the current tree for all observable operations (property access, iteration, stringify). Performance characteristics differ.
- No source-code changes required — purely a runtime refactor.
- All existing JSON tests should pass unchanged.
Design challenges:
- NaN-boxing representation for `TapedJsValue` (new pointer type vs. reusing `POINTER_TAG` with a discriminant flag in the header)
- Proxy semantics: property access on tape views dispatches through a helper that walks the tape; must maintain pointer identity where required (`a === a` must hold for the same tape path)
- Write barrier: mutation triggers materialization — needs to propagate the new tree pointer back through all referring JSValues
Expected win on bench_json_roundtrip: RSS ≤50 MB (below Bun's 83 MB),
time ≤150 ms (below Bun's 250 ms).
Scope:
- `crates/perry-runtime/src/json.rs`: tape types, tape builder (SIMD structural scan), taped parser entry
- `crates/perry-runtime/src/value.rs`: new tag or flag for taped values, `is_taped` predicate
- `crates/perry-runtime`: object/array accessors route through a `materialize_if_taped` helper
- Every existing `is_object`/`is_array` consumer audited for taped-view semantics
Tests: new test-files/test_json_lazy_*.ts exercising
pure-read, single-mutation, full-materialization paths. All existing
test_json_*.ts must continue passing byte-for-byte against Node.
Benchmarks:
- `benchmarks/suite/bench_json_lazy_readonly.ts` — read `.length` only, never touch fields. Expected: ~10× faster, ~20× less RSS than current.
- `benchmarks/suite/bench_json_lazy_full.ts` — touch every field, forcing full materialization. Expected: at most 2× slower than current (the tape-then-materialize overhead).
## Step 3 — generational GC
What: young nursery + old space. Precise root tracking via codegen shadow stacks. Non-moving within a generation.
Why it's still needed: Steps 1 and 2 solve the JSON case. Other workloads that allocate short-lived temporaries (array comprehensions, string building, iterator chains) still get Perry's current flat-arena treatment. Generational GC closes the Bun gap for them too.
Scope / risk: 3-4 weeks. See docs/memory-perf-roadmap.md tier 3 #6.
## Step 4 — mutation overlay (parked)
What: when a taped value is mutated, record the mutation in a path-indexed overlay instead of materializing. On stringify, emit un-mutated ranges as memcpy from the blob, emit mutated paths through the generic stringifier, splice.
Expected win: 100× on the parse → mutate-1% → stringify pattern.
Parked: too speculative until Step 2 proves out. The tape infrastructure is a prerequisite.
## Milestones
- Step 1a: `JSON.parse<T>` signature + codegen pass-through (no fast path yet — just carry the type argument end-to-end without regressing anything)
- Step 1b: `js_json_parse_typed` with a pre-built shape for the top-level object. Add tests, benchmarks. Measure.
- Step 1c: extend to `Array<T>` and nested object types.
- Gate: schema-directed parse working on the new typed benchmark. Ship as a minor version bump.
- Step 2a: tape types + builder (no consumer yet). Unit tests for the builder itself.
- Step 2b: taped parser entry behind a feature flag (`PERRY_LAZY_JSON=1`). Both paths coexist.
- Step 2c: flip the flag default to on after a full regression sweep (`test_json_*`, all gap tests, all regression benches).
- Gate: `bench_json_roundtrip` RSS below Bun's. Time at most 1.2× Bun's. Ship as a major version bump.
- Step 3 planning follows the roadmap.
## Invariants
- `JSON.parse(blob)` with no type argument behaves identically to today — byte-for-byte compatible with Node for all existing inputs.
- `JSON.parse<T>(blob)` at runtime is identical to `JSON.parse(blob) as T`. TypeScript erases the `<T>`; Perry's compiler may use it for specialization, but never for semantic change.
- Mutation semantics of parse output are identical between the tape-backed and tree-backed implementations, including `===` identity on repeated property access within the same expression.
- All 28 `test_gap_*` tests at 24/28 or better throughout. `07_object_create`, `12_binary_trees`, `02_loop_overhead`, `06_math_intensive`, `bench_gc_pressure`, `bench_array_grow` within 5% of the v0.5.198 baseline for every intermediate commit.
## Test plan
Step 1 tests (all new files, no conflict with existing `test_json_*.ts`):

- `test-files/test_json_typed_basic.ts` — `JSON.parse<{a: number, b: string}>` on exact-shape input
- `test-files/test_json_typed_extra_fields.ts` — input has fields not in T; should be tolerated (present in the result)
- `test-files/test_json_typed_missing_fields.ts` — T declares more fields than the input; missing ones → undefined
- `test-files/test_json_typed_nested.ts` — nested object and array shapes
- `test-files/test_json_typed_array.ts` — `JSON.parse<Item[]>`
- `test-files/test_json_typed_mismatch.ts` — wrong runtime type for a field (e.g. a string in a number slot); must fall through to the generic parser without crashing
Step 2 tests (new):
- `test-files/test_json_lazy_readonly.ts` — read one field, compare to Node byte-for-byte
- `test-files/test_json_lazy_identity.ts` — `const a = p.x; const b = p.x;` asserts `a === b` (pointer identity on repeated access)
- `test-files/test_json_lazy_mutate.ts` — mutate one field, then verify the ancestor chain is materialized correctly
- `test-files/test_json_lazy_stringify.ts` — parse then immediate stringify must produce the same bytes (modulo JSON normalization)
Benchmarks (new):
- `benchmarks/suite/bench_json_typed_roundtrip.ts` — mirror of `bench_json_roundtrip` with the `<Item[]>` type argument
- `benchmarks/suite/bench_json_lazy_readonly.ts` — parse + read `.length` only, 50 iters × 1 MB blob
- `benchmarks/suite/bench_json_lazy_full.ts` — parse + touch every field, 50 iters × 1 MB blob
Each benchmark compared against Node and Bun for the same workload.
## Changelog
| Date | Version | Change | Result |
|---|---|---|---|
| 2026-04-24 | v0.5.198 | Tier 1 complete (roadmap) | 373 ms / 144 MB — beats Node, behind Bun |
| TBD | | Step 1a: `JSON.parse<T>` passthrough | no perf change, plumbing only |
| TBD | | Step 1b: typed parse with pre-built shape | |
| TBD | | Step 2: tape-based lazy parse | |