Commit 81b81ea

feat(gc): Phase C4 — non-moving tenuring (v0.5.228)

Objects that survive ≥2 minor GCs earn GC_FLAG_TENURED and get treated
identically to OLD_ARENA-allocated objects by drain_trace_worklist_minor:
their fields aren't recursively visited.

crates/perry-runtime/src/gc.rs:

  GC_FLAG_TENURED      = 0x20   logically promoted
  GC_FLAG_HAS_SURVIVED = 0x40   one mark phase observed; tenures next

Age-bump pass at the end of gc_collect_minor's mark phase: walks
arena_walk_objects, for each MARKED-or-PINNED nursery object:
  - already TENURED  → skip
  - HAS_SURVIVED set → set TENURED, clear HAS_SURVIVED
  - otherwise        → set HAS_SURVIVED

Two-bit aging gives PROMOTION_AGE=2 without a counter field.

drain_trace_worklist_minor skip predicate extended to fire on
`pointer_in_old_gen(addr) || (gc_flags & GC_FLAG_TENURED != 0)`.
Minor trace closure now bounded O(young live + RS roots).

Non-moving design: tenured objects stay PHYSICALLY in nursery.
No copying / forwarding pointers / reference rewrites. C4b adds
copying evacuation for the RSS win; this commit lands the
time-win half.

Measured (best-of-5, macOS ARM64):
  bench_json_roundtrip:
    default:                 80 ms / 109 MB
    PERRY_GEN_GC=1:          70 ms / 109 MB   -12% time
    PERRY_WRITE_BARRIERS=1:  80 ms / 109 MB
    GEN_GC + WB:             70 ms / 109 MB
  RSS unchanged (no compaction yet; C4b territory).

New unit test test_minor_gc_promotes_after_two_survivals: pinned
object flags transition {} → HAS_SURVIVED → TENURED across 3 minor
collections; TENURED idempotent thereafter. Runtime tests 158 -> 159.

Regression:
  20/20 test_json_*.ts under default + GEN_GC=1 WB=1
  Gap tests 25/28 (baseline)

Phase C is now correctness + time-win complete. C4b (copying
evacuation → RSS ≤70 MB) and Phase D (flip default + drop
conservative scanner) remain. C4b is the architectural boss-fight:
cross-arena copying with full reference-update propagation,
significant correctness surface. With C4 landed, the bench
70 ms / 109 MB result is honest: time matches projection, RSS
bottlenecked by no-compaction.
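
Reviewer sketch (not the code in gc.rs): the two-bit aging rule described in the commit message can be modelled as a pure function on an object's flag byte. The `0x20` / `0x40` values are taken from the message; the `u8` width, the `GC_FLAG_MARKED` / `GC_FLAG_PINNED` values, and the helper names `age_bump` and `simulate` are assumptions made only for this illustration.

```rust
// TENURED / HAS_SURVIVED values come from the commit message.
// The u8 width and the MARKED / PINNED values are assumptions for this sketch.
const GC_FLAG_MARKED: u8 = 0x01;
const GC_FLAG_PINNED: u8 = 0x02;
const GC_FLAG_TENURED: u8 = 0x20;
const GC_FLAG_HAS_SURVIVED: u8 = 0x40;

/// One age-bump step for a single nursery object's flags (hypothetical helper).
fn age_bump(flags: u8) -> u8 {
    // Only objects that survived this minor GC (marked or pinned) age.
    if flags & (GC_FLAG_MARKED | GC_FLAG_PINNED) == 0 {
        return flags;
    }
    if flags & GC_FLAG_TENURED != 0 {
        // Already tenured: the pass is idempotent.
        flags
    } else if flags & GC_FLAG_HAS_SURVIVED != 0 {
        // Second survival: promote and clear the intermediate bit.
        (flags | GC_FLAG_TENURED) & !GC_FLAG_HAS_SURVIVED
    } else {
        // First survival: remember it for the next pass.
        flags | GC_FLAG_HAS_SURVIVED
    }
}

/// Applies `n` consecutive age bumps and records the flags after each one.
fn simulate(mut flags: u8, n: usize) -> Vec<u8> {
    (0..n)
        .map(|_| {
            flags = age_bump(flags);
            flags
        })
        .collect()
}
```

Under these assumptions, calling `simulate(GC_FLAG_PINNED, 3)` walks a pinned object through HAS_SURVIVED, then TENURED, then TENURED again, which is the shape of transition the new unit test is described as pinning. Because the two bits encode ages 0, 1, and "promoted", no per-object counter is needed.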
1 parent 783f1cf commit 81b81ea

4 files changed

Lines changed: 148 additions & 41 deletions

CLAUDE.md

Lines changed: 2 additions & 1 deletion
@@ -8,7 +8,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
Perry is a native TypeScript compiler written in Rust that compiles TypeScript source code directly to native executables. It uses SWC for TypeScript parsing and LLVM for code generation.

-**Current Version:** 0.5.227
+**Current Version:** 0.5.228

## TypeScript Parity Status

@@ -148,6 +148,7 @@ First-resolved directory cached in `compile_package_dirs`; subsequent imports re
Keep entries to 1-2 lines max. Full details in CHANGELOG.md.

- **v0.5.205** — Fix #183: `perry compile --target web` on a real-world app (Bloom Jump built on the Bloom engine) produced a WASM binary the browser refused to load — `Compiling function #687 failed: expected 0 elements on the stack for fallthru, found 103` (count varies with engine state). Root cause in `crates/perry-codegen-wasm/src/emit.rs`: the four direct-`Call`-instruction code paths — `Expr::Call` FuncRef arm (~4302), `Expr::Call` ExternFuncRef arm (~4324), `Expr::New` user-class ctor (~5844), `Expr::SuperCall` parent-ctor (~5894), `Expr::StaticMethodCall` direct-static path (~5979) — each emit `emit_expr(arg)` per source arg and pad up with `TAG_UNDEFINED` when `args.len() < expected`, but had no matching drop-excess branch when `args.len() > expected`. WASM `call` consumes exactly the callee's declared param count, so when JS's "extra args evaluated for side effects, then silently ignored" semantics met Perry's WASM codegen, every extra evaluated arg leaked past the call and accumulated on the enclosing block's operand stack — 103 values by the time `_start`'s final `end` hit the validator. The shape that triggered it in jump/bloom was `bloom/src/core/colors.ts`'s `Colors = new __AnonShape_2(...24 PropertyGets...)` landing on a Phase-3-synthesized ctor with lower declared arity, multiplied across bloom's 10 submodules. Fix: after each existing `for _ in args.len()..expected { I64Const(TAG_UNDEFINED) }` pad-up loop, add the mirror `for _ in expected..args.len() { Drop }` — matches JS semantics (extras evaluated for side effects but discarded) and keeps the operand stack aligned with the callee's WASM type at every direct-Call site. 
Verified end-to-end against the exact issue repro cloned fresh from `github.com/Bloom-Engine/jump` + `github.com/Bloom-Engine/engine`: both path A `file:./vendor/bloom/` and path B `file:../engine/` now compile to a WebAssembly-validating `.wasm` (416,923 / 413,780 bytes respectively, 140 FFI imports intact, `WebAssembly.compile` resolves clean on node 20+); a synthetic `takesFive(mc(),mc(),1,2,3,4)` minimal case that previously failed `Compiling function #213 failed: ... found 1` also validates. `cargo test --release -p perry-runtime -p perry-hir -p perry-codegen-wasm -p perry`: 262/262 passed. Note: issue #183 also claimed path A found only 1 module and emitted 9 FFI imports — could not reproduce in a fresh clone (both paths find 10 modules identically); most likely an artifact of the reporter's local `vendor/bloom` snapshot predating the `exports` map, and the "runGame silently no-ops" symptom the user actually observed was the browser refusing to instantiate the invalid WASM with the surrounding JS glue swallowing the error — fixed here.
151+
- **v0.5.228** — Gen-GC **Phase C4 (non-moving)**: tenuring via flag-based aging. Objects that survive ≥2 minor GCs earn `GC_FLAG_TENURED` and get treated identically to OLD_ARENA-allocated objects by `drain_trace_worklist_minor` — their fields aren't recursively visited, only the object itself stays marked. Implementation: two new GcHeader flag bits in `crates/perry-runtime/src/gc.rs`: `GC_FLAG_TENURED = 0x20` (logically promoted) and `GC_FLAG_HAS_SURVIVED = 0x40` (saw one mark phase, will tenure on next). New age-bump pass at the end of the minor mark phase walks `arena_walk_objects`: for each MARKED-or-PINNED nursery object, if already TENURED → skip; if HAS_SURVIVED → set TENURED and clear HAS_SURVIVED; otherwise → set HAS_SURVIVED. Two-bit aging gives `PROMOTION_AGE=2` (matches V8's quick-promotion policy) without needing a counter field. The `drain_trace_worklist_minor` skip predicate from C3b is extended to fire on `pointer_in_old_gen(addr) || (gc_flags & GC_FLAG_TENURED != 0)` — minor trace's transitive closure now bounded by `O(young live + RS roots)`. **Non-moving design**: tenured objects stay PHYSICALLY in nursery (no copying / forwarding pointers / reference rewrites). Phase C4b will add copying evacuation for the RSS win — today's commit lands the time-win half. **Measured win**: `bench_json_roundtrip` best-of-5 default 80 ms vs `PERRY_GEN_GC=1` **70 ms** = -12% time, RSS unchanged at 109 MB (expected — no compaction). New unit test `test_minor_gc_promotes_after_two_survivals` pins the aging state machine: pinned object's flags transition cleanly from {} → HAS_SURVIVED → TENURED across three minor collections, with TENURED idempotent on subsequent cycles. Runtime tests 158 → **159**. Full regression sweep clean: 20/20 `test_json_*.ts` (10 default + 10 with `PERRY_GEN_GC=1 PERRY_WRITE_BARRIERS=1`); gap tests 25/28 (baseline). 
**Phase C is now correctness + time-win complete.** Phase C4b (copying evacuation → RSS ≤70 MB) and Phase D (flip default + drop conservative scanner) remain. C4b is the architectural boss-fight — copying objects across arenas with full reference-update propagation; significant correctness surface. With C4 landed, the bench_json_roundtrip 70 ms / 109 MB result is honest: time matches the gen-GC win projection, RSS still bottlenecked by no-compaction.
- **v0.5.227** — Gen-GC **Phase C3b**: minor GC trace skips old-gen objects. New `gc_collect_minor()` entry in `crates/perry-runtime/src/gc.rs` runs the same root-mark phase (stack + globals + 9 registered scanners + RS) but drains the worklist via the new `drain_trace_worklist_minor` variant which calls `pointer_in_old_gen(user_addr)` per popped header and `continue`s for old-gen objects without invoking `trace_object` / `trace_array` / `trace_closure` / etc. — they stay marked (treated as black leaves) but their fields aren't recursively visited. Young children held by old-gen parents reach the worklist exclusively via the remembered set, scanned by `mark_remembered_set_roots` (C3a). New `gen_gc_enabled()` helper reads `PERRY_GEN_GC` env var (cached via `OnceLock`); when set to `1` / `on` / `true`, every collection routes through `gc_collect_minor` instead of the standard mark-sweep. `gc_collect_inner` checks the flag at entry and tail-calls minor when the gate is on. Default OFF — opt-in until Phase C4 hits the bench_json_roundtrip ship criterion (RSS ≤70 MB direct path) and proves out across the full test corpus. **Trace specialization is the time-win core of generational GC**: in workloads with substantial old-gen working set, minor GC is now `O(young live + RS roots)` instead of `O(all live)` — but today the win is unobservable on the JSON benches because OLD_ARENA stays empty (Phase C4 will wire nursery→old promotion to actually fill it). Two new unit tests pin the C3b invariants: `test_gc_collect_minor_clears_rs` verifies RS is empty after `gc_collect_minor`, `test_gc_collect_minor_runs_without_panic` exercises a mixed nursery+old-gen heap through three sequential minor collections. Runtime tests 156 → **158**. Full regression sweep clean: 20/20 `test_json_*.ts` (10 default + 10 with `PERRY_GEN_GC=1 PERRY_WRITE_BARRIERS=1`); `bench_json_roundtrip` best-of-5 across all 4 mode combinations (default / GEN_GC / WB / GEN_GC+WB) all 72-74 ms — within noise. 
The infrastructure layer is now complete; Phase C4 lights it up by adding promotion (nursery survivors → OLD_ARENA via copying evacuation), which makes the "skip old-gen during minor" optimization meaningful.
- **v0.5.226** — Gen-GC **Phase C3a**: remembered-set roots flow into the GC mark phase + RS clears after every collection. New `mark_remembered_set_roots(valid_ptrs)` in `crates/perry-runtime/src/gc.rs` snapshots the per-thread `REMEMBERED_SET` (populated by the codegen-emitted write barriers from C2/C2-expansion) and re-marks each old-gen header as a `POINTER_TAG`-tagged value via the standard `try_mark_value` machinery. Wired into `gc_collect_inner` between `mark_registered_roots` and `trace_marked_objects`. RS cleared via `REMEMBERED_SET.with(|s| s.borrow_mut().clear())` immediately after `sweep()` returns, so the next collection cycle starts coherent — barrier emissions at C2 sites repopulate the RS as needed during the next allocation epoch. **Today this is correctness-equivalent to before** (the conservative C-stack scan + 9 root scanners already kept everything alive); the contribution is the **infrastructure point** — the RS now has a real consumer in the GC, validated end-to-end. C3b will add the actual generational specialization (skip old-gen objects during marking, scan only nursery from RS roots, gated `PERRY_GEN_GC=1`) which is where the time/RSS wins land. New unit test `test_remembered_set_cleared_after_full_gc` pins the clear-after-GC invariant: populate RS via barrier, run full GC, assert RS is empty. Runtime tests 155 → **156**. Full regression sweep clean: 10/10 `test_json_*.ts` match Node under default AND `PERRY_WRITE_BARRIERS=1` (where the RS actually fills with old→young entries during parse). `bench_json_roundtrip` best-of-5 WB-off 65 ms vs WB-on 65 ms — RS clear cost invisible (HashSet of <100 entries clearing in microseconds). Gap tests 25/28 (baseline). Phase C is now 3/3 sub-phases done at the infrastructure level: C1 (RS storage), C2 (codegen emission), C3a (GC consumes RS). C3b adds the generational mark-phase specialization that yields the bench_json_roundtrip RSS ≤70 MB ship criterion.
- **v0.5.225** — Gen-GC **Phase C2 expansion**: write-barrier emission extended to every remaining heap-store site. New `emit_write_barrier(ctx, parent_bits, child_bits)` helper at the top of `crates/perry-codegen/src/expr.rs` consolidates the gate-checked emit (gated `PERRY_WRITE_BARRIERS=1`, branchless at codegen via `OnceLock`-cached `write_barriers_enabled()`). Now wired at: (1) `Expr::PropertySet` generic `obj.x = y` path (refactored from inline emit at v0.5.224 to use the helper), (2) `Expr::IndexSet` array element path — both the local-without-slot fallback AND the no-local-id fallback at the runtime-call site, (3) `Expr::IndexSet` array element FAST path inside `lower_index_set_fast` — covers `arr[i] = v` for `arr` in `ctx.locals` (the most common shape), barrier emitted in the merge block after both inline-extend and realloc paths converge, (4) `Expr::IndexSet` string-literal-key path `obj["k"] = v`, (5) `Expr::IndexSet` runtime-string-key fallback path, (6) `Expr::LocalSet` closure capture write — both branches: boxed (parent = box ptr from `js_box_set`) AND non-boxed (parent = closure ptr from `js_closure_set_capture_f64`). NOT yet covered (deferred — separate codegen paths): class-field-set fast path with statically-typed receivers using direct field-index store. The 4-WB-site stress test (`/tmp/wb_all_sites.ts`: array indexed-set, two object key sets, one PropertySet) emits 4 `call void @js_write_barrier(...)` lines in IR and matches Node byte-for-byte. Regression sweep clean: 10/10 `test_json_*.ts` match Node under `PERRY_WRITE_BARRIERS=1`. `bench_json_roundtrip` best-of-5 WB-off 64 ms vs WB-on 64 ms — barrier overhead invisible because the bench's hot path is parse + stringify (no user-code field/element writes). The barrier becomes load-bearing once Phase C3 lands and minor GC consumes the remembered set — at that point the barrier replaces a full-arena scan with an RS-only scan, which is the actual time win the gen-GC plan promises.
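
Reviewer sketch (not the code in gc.rs): the minor-trace skip predicate from the v0.5.228 entry, `pointer_in_old_gen(addr) || (gc_flags & GC_FLAG_TENURED != 0)`, can be illustrated on a toy heap. The `Obj` struct, the `in_old_gen` field standing in for `pointer_in_old_gen`, the `minor_trace` function, and the `GC_FLAG_MARKED` value are all hypothetical names chosen for this example.

```rust
const GC_FLAG_MARKED: u8 = 0x01; // assumed value for this sketch
const GC_FLAG_TENURED: u8 = 0x20; // value from the changelog entry

// Hypothetical stand-in for a heap object.
struct Obj {
    gc_flags: u8,
    in_old_gen: bool,     // stands in for pointer_in_old_gen(addr)
    children: Vec<usize>, // indices into the heap slice
}

/// Marks from `roots` and returns how many objects had their fields traced.
fn minor_trace(heap: &mut [Obj], roots: &[usize]) -> usize {
    let mut worklist = Vec::new();
    for &r in roots {
        if heap[r].gc_flags & GC_FLAG_MARKED == 0 {
            heap[r].gc_flags |= GC_FLAG_MARKED;
            worklist.push(r);
        }
    }
    let mut traced = 0;
    while let Some(i) = worklist.pop() {
        // Skip predicate: old-gen or tenured objects stay marked,
        // but their fields are not visited (black leaves).
        if heap[i].in_old_gen || heap[i].gc_flags & GC_FLAG_TENURED != 0 {
            continue;
        }
        traced += 1;
        let children = heap[i].children.clone();
        for c in children {
            if heap[c].gc_flags & GC_FLAG_MARKED == 0 {
                heap[c].gc_flags |= GC_FLAG_MARKED;
                worklist.push(c);
            }
        }
    }
    traced
}
```

In this model, a young object reachable only through a skipped parent is never marked unless it is passed in `roots`. That is the role the v0.5.227 entry assigns to the remembered set for old-gen parents, and it is why the work done is proportional to young live objects plus RS roots rather than to all live objects.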

Cargo.lock

Lines changed: 27 additions & 27 deletions
Some generated files are not rendered by default.

Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -109,7 +109,7 @@ opt-level = "s" # Optimize for size in stdlib
opt-level = 3

[workspace.package]
-version = "0.5.227"
+version = "0.5.228"
edition = "2021"
license = "MIT"
repository = "https://github.com/PerryTS/perry"
