
Commit 6f0ac74

feat(gc): Phase C4b-α — forwarding-pointer infrastructure (v0.5.229)
Lays the data-structure foundation for cross-arena copying evacuation.
No actual evacuation logic yet; that's C4b-β + γ in a focused next session.

crates/perry-runtime/src/gc.rs:

  GC_FLAG_FORWARDED = 0x80 (last available bit in u8 gc_flags)
    Convention: when GC_FLAG_FORWARDED is set, the user-payload's first 8 bytes hold the new user pointer (the address returned by arena_alloc_gc_old when the object was relocated). Every Perry GC type starts its payload with a header of at least 8 bytes (StringHeader / ArrayHeader / ObjectHeader / ClosureHeader / etc.), so this storage location is universally available without changing object layouts.

  forwarding_address(header) -> *mut u8
    Reads the new user-pointer from a forwarded header. Debug-asserts the flag is set.

  set_forwarding_address(header, new_user_addr)
    Installs the forwarding pointer + sets GC_FLAG_FORWARDED while preserving every other gc_flags bit.

  3 new unit tests pin invariants:
    test_forwarding_pointer_roundtrip
    test_forwarding_does_not_disturb_other_flags
    test_forwarding_pointer_value_is_8_bytes_at_user_offset_zero

docs/generational-gc-plan.md extended with the full C4b-β/γ design:
  - Pinning policy via GC_FLAG_CONS_PINNED for objects discovered by conservative stack scan (can't safely rewrite their referencing words → don't move them)
  - C4b-β = conservative-pinning + evacuation pass (gated PERRY_GEN_GC_EVACUATE=1 initially)
  - C4b-γ = reference rewriting via shadow stack + globals + 9 registered root scanners + heap walks
  - Ship criterion: bench_json_roundtrip direct RSS ≤70 MB

Runtime tests 159 -> 162. Regression clean: 10/10 test_json_*.ts match Node, with no behavior change because nothing calls the helpers yet.

Honest scoping: C4b-β + γ together are the architectural boss-fight of the gen-GC plan (~3-5 days, significant correctness surface around conservative-stack pinning + ref-rewrite walks for the 9 registered root scanners). Shipping the infrastructure layer here so the real evacuation can land cleanly without having to also relitigate the data-structure design.
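The forwarding-pointer convention described in the commit message can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `crates/perry-runtime/src/gc.rs`: the `GcHeader` layout (every field other than `gc_flags`, and its 8-byte size), the raw-pointer parameter types, and the assumption that the user payload begins immediately after the header are all hypothetical. Only `GC_FLAG_FORWARDED = 0x80`, the two helper names, and the "first 8 payload bytes hold the new user pointer" rule come from the commit.

```rust
use std::mem::size_of;

/// Flag value taken from the commit message.
pub const GC_FLAG_FORWARDED: u8 = 0x80;

/// HYPOTHETICAL header layout. Only the u8 `gc_flags` field is confirmed
/// by the commit; the other fields exist to make the sketch self-contained.
#[repr(C)]
pub struct GcHeader {
    pub obj_type: u8,
    pub gc_flags: u8,
    pub _reserved: u16,
    pub size: u32,
}

/// ASSUMPTION: the user payload starts right after the GC header.
unsafe fn user_ptr(header: *mut GcHeader) -> *mut u8 {
    unsafe { (header as *mut u8).add(size_of::<GcHeader>()) }
}

/// Stores `new_user_addr` in the first 8 payload bytes and sets the
/// forwarded bit, leaving every other `gc_flags` bit untouched.
///
/// Safety: `header` must point to a live header followed by at least
/// 8 writable payload bytes.
pub unsafe fn set_forwarding_address(header: *mut GcHeader, new_user_addr: *mut u8) {
    unsafe {
        let slot = user_ptr(header) as *mut u64;
        // Unaligned write so the sketch makes no claim about payload alignment.
        slot.write_unaligned(new_user_addr as usize as u64);
        (*header).gc_flags |= GC_FLAG_FORWARDED;
    }
}

/// Reads the new user pointer back out of a forwarded object.
///
/// Safety: `header` must point to a header previously passed to
/// `set_forwarding_address`.
pub unsafe fn forwarding_address(header: *mut GcHeader) -> *mut u8 {
    unsafe {
        debug_assert!((*header).gc_flags & GC_FLAG_FORWARDED != 0);
        let slot = user_ptr(header) as *const u64;
        slot.read_unaligned() as usize as *mut u8
    }
}
```

Because the address overwrites the payload's own type header, a forwarded object is no longer usable as a string, array, or closure; a walker would need to check `GC_FLAG_FORWARDED` before interpreting the payload.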
1 parent 81b81ea commit 6f0ac74

5 files changed

Lines changed: 215 additions & 29 deletions


CLAUDE.md

Lines changed: 2 additions & 1 deletion
@@ -8,7 +8,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co

Perry is a native TypeScript compiler written in Rust that compiles TypeScript source code directly to native executables. It uses SWC for TypeScript parsing and LLVM for code generation.

-**Current Version:** 0.5.228
+**Current Version:** 0.5.229

## TypeScript Parity Status

@@ -148,6 +148,7 @@ First-resolved directory cached in `compile_package_dirs`; subsequent imports re
Keep entries to 1-2 lines max. Full details in CHANGELOG.md.

- **v0.5.205** — Fix #183: `perry compile --target web` on a real-world app (Bloom Jump built on the Bloom engine) produced a WASM binary the browser refused to load — `Compiling function #687 failed: expected 0 elements on the stack for fallthru, found 103` (count varies with engine state). Root cause in `crates/perry-codegen-wasm/src/emit.rs`: the four direct-`Call`-instruction code paths — `Expr::Call` FuncRef arm (~4302), `Expr::Call` ExternFuncRef arm (~4324), `Expr::New` user-class ctor (~5844), `Expr::SuperCall` parent-ctor (~5894), `Expr::StaticMethodCall` direct-static path (~5979) — each emit `emit_expr(arg)` per source arg and pad up with `TAG_UNDEFINED` when `args.len() < expected`, but had no matching drop-excess branch when `args.len() > expected`. WASM `call` consumes exactly the callee's declared param count, so when JS's "extra args evaluated for side effects, then silently ignored" semantics met Perry's WASM codegen, every extra evaluated arg leaked past the call and accumulated on the enclosing block's operand stack — 103 values by the time `_start`'s final `end` hit the validator. The shape that triggered it in jump/bloom was `bloom/src/core/colors.ts`'s `Colors = new __AnonShape_2(...24 PropertyGets...)` landing on a Phase-3-synthesized ctor with lower declared arity, multiplied across bloom's 10 submodules. Fix: after each existing `for _ in args.len()..expected { I64Const(TAG_UNDEFINED) }` pad-up loop, add the mirror `for _ in expected..args.len() { Drop }` — matches JS semantics (extras evaluated for side effects but discarded) and keeps the operand stack aligned with the callee's WASM type at every direct-Call site. 
Verified end-to-end against the exact issue repro cloned fresh from `github.com/Bloom-Engine/jump` + `github.com/Bloom-Engine/engine`: both path A `file:./vendor/bloom/` and path B `file:../engine/` now compile to a WebAssembly-validating `.wasm` (416,923 / 413,780 bytes respectively, 140 FFI imports intact, `WebAssembly.compile` resolves clean on node 20+); a synthetic `takesFive(mc(),mc(),1,2,3,4)` minimal case that previously failed `Compiling function #213 failed: ... found 1` also validates. `cargo test --release -p perry-runtime -p perry-hir -p perry-codegen-wasm -p perry`: 262/262 passed. Note: issue #183 also claimed path A found only 1 module and emitted 9 FFI imports — could not reproduce in a fresh clone (both paths find 10 modules identically); most likely an artifact of the reporter's local `vendor/bloom` snapshot predating the `exports` map, and the "runGame silently no-ops" symptom the user actually observed was the browser refusing to instantiate the invalid WASM with the surrounding JS glue swallowing the error — fixed here.
+- **v0.5.229** — Gen-GC **Phase C4b-α**: forwarding-pointer infrastructure (per `docs/generational-gc-plan.md` §C4b-α). New `GC_FLAG_FORWARDED = 0x80` (the last available bit in the u8 `gc_flags` byte) marks objects that have been evacuated to a new location. The new address is stored in the **user-payload's first 8 bytes** — a convention every Perry GC type accommodates because every payload starts with a header (StringHeader / ArrayHeader / ObjectHeader / ClosureHeader / etc.) of at least 8 bytes. Two helpers in `crates/perry-runtime/src/gc.rs`: `forwarding_address(header) -> *mut u8` reads the new user-pointer from a forwarded header (debug-asserts the flag is set); `set_forwarding_address(header, new_user_addr)` installs the forwarding pointer + sets `GC_FLAG_FORWARDED` while preserving every other gc_flags bit. **No actual evacuation logic yet** — this commit ships the data-structure layer so C4b-β (conservative-pinning + copy pass) and C4b-γ (reference rewriting) can build on a tested primitive. 3 new unit tests pin the invariants: `test_forwarding_pointer_roundtrip` (set + read returns the same pointer), `test_forwarding_does_not_disturb_other_flags` (every pre-existing gc_flags bit survives the install), `test_forwarding_pointer_value_is_8_bytes_at_user_offset_zero` (the storage convention is load-bearing for any future walker that wants to skip evacuated objects by reading the new address inline). Runtime tests 159 → **162**. Full regression clean: 10/10 `test_json_*.ts` match Node — no behavior change because nothing calls the helpers yet. `docs/generational-gc-plan.md` extended with the full C4b-β/γ design: pinning policy via `GC_FLAG_CONS_PINNED`, three sub-step rollout, ship criterion `bench_json_roundtrip` direct RSS ≤70 MB. C4b-β + γ together are the architectural boss-fight (~3-5 days, significant correctness surface around conservative-stack pinning + ref-rewrite walks for the 9 registered root scanners) — scoping them as separate commits in a focused next session.
- **v0.5.228** — Gen-GC **Phase C4 (non-moving)**: tenuring via flag-based aging. Objects that survive ≥2 minor GCs earn `GC_FLAG_TENURED` and get treated identically to OLD_ARENA-allocated objects by `drain_trace_worklist_minor` — their fields aren't recursively visited, only the object itself stays marked. Implementation: two new GcHeader flag bits in `crates/perry-runtime/src/gc.rs`: `GC_FLAG_TENURED = 0x20` (logically promoted) and `GC_FLAG_HAS_SURVIVED = 0x40` (saw one mark phase, will tenure on next). New age-bump pass at the end of the minor mark phase walks `arena_walk_objects`: for each MARKED-or-PINNED nursery object, if already TENURED → skip; if HAS_SURVIVED → set TENURED and clear HAS_SURVIVED; otherwise → set HAS_SURVIVED. Two-bit aging gives `PROMOTION_AGE=2` (matches V8's quick-promotion policy) without needing a counter field. The `drain_trace_worklist_minor` skip predicate from C3b is extended to fire on `pointer_in_old_gen(addr) || (gc_flags & GC_FLAG_TENURED != 0)` — minor trace's transitive closure now bounded by `O(young live + RS roots)`. **Non-moving design**: tenured objects stay PHYSICALLY in nursery (no copying / forwarding pointers / reference rewrites). Phase C4b will add copying evacuation for the RSS win — today's commit lands the time-win half. **Measured win**: `bench_json_roundtrip` best-of-5 default 80 ms vs `PERRY_GEN_GC=1` **70 ms** = -12% time, RSS unchanged at 109 MB (expected — no compaction). New unit test `test_minor_gc_promotes_after_two_survivals` pins the aging state machine: pinned object's flags transition cleanly from {} → HAS_SURVIVED → TENURED across three minor collections, with TENURED idempotent on subsequent cycles. Runtime tests 158 → **159**. Full regression sweep clean: 20/20 `test_json_*.ts` (10 default + 10 with `PERRY_GEN_GC=1 PERRY_WRITE_BARRIERS=1`); gap tests 25/28 (baseline). 
**Phase C is now correctness + time-win complete.** Phase C4b (copying evacuation → RSS ≤70 MB) and Phase D (flip default + drop conservative scanner) remain. C4b is the architectural boss-fight — copying objects across arenas with full reference-update propagation; significant correctness surface. With C4 landed, the bench_json_roundtrip 70 ms / 109 MB result is honest: time matches the gen-GC win projection, RSS still bottlenecked by no-compaction.
- **v0.5.227** — Gen-GC **Phase C3b**: minor GC trace skips old-gen objects. New `gc_collect_minor()` entry in `crates/perry-runtime/src/gc.rs` runs the same root-mark phase (stack + globals + 9 registered scanners + RS) but drains the worklist via the new `drain_trace_worklist_minor` variant which calls `pointer_in_old_gen(user_addr)` per popped header and `continue`s for old-gen objects without invoking `trace_object` / `trace_array` / `trace_closure` / etc. — they stay marked (treated as black leaves) but their fields aren't recursively visited. Young children held by old-gen parents reach the worklist exclusively via the remembered set, scanned by `mark_remembered_set_roots` (C3a). New `gen_gc_enabled()` helper reads `PERRY_GEN_GC` env var (cached via `OnceLock`); when set to `1` / `on` / `true`, every collection routes through `gc_collect_minor` instead of the standard mark-sweep. `gc_collect_inner` checks the flag at entry and tail-calls minor when the gate is on. Default OFF — opt-in until Phase C4 hits the bench_json_roundtrip ship criterion (RSS ≤70 MB direct path) and proves out across the full test corpus. **Trace specialization is the time-win core of generational GC**: in workloads with substantial old-gen working set, minor GC is now `O(young live + RS roots)` instead of `O(all live)` — but today the win is unobservable on the JSON benches because OLD_ARENA stays empty (Phase C4 will wire nursery→old promotion to actually fill it). Two new unit tests pin the C3b invariants: `test_gc_collect_minor_clears_rs` verifies RS is empty after `gc_collect_minor`, `test_gc_collect_minor_runs_without_panic` exercises a mixed nursery+old-gen heap through three sequential minor collections. Runtime tests 156 → **158**. Full regression sweep clean: 20/20 `test_json_*.ts` (10 default + 10 with `PERRY_GEN_GC=1 PERRY_WRITE_BARRIERS=1`); `bench_json_roundtrip` best-of-5 across all 4 mode combinations (default / GEN_GC / WB / GEN_GC+WB) all 72-74 ms — within noise. 
The infrastructure layer is now complete; Phase C4 lights it up by adding promotion (nursery survivors → OLD_ARENA via copying evacuation), which makes the "skip old-gen during minor" optimization meaningful.
- **v0.5.226** — Gen-GC **Phase C3a**: remembered-set roots flow into the GC mark phase + RS clears after every collection. New `mark_remembered_set_roots(valid_ptrs)` in `crates/perry-runtime/src/gc.rs` snapshots the per-thread `REMEMBERED_SET` (populated by the codegen-emitted write barriers from C2/C2-expansion) and re-marks each old-gen header as a `POINTER_TAG`-tagged value via the standard `try_mark_value` machinery. Wired into `gc_collect_inner` between `mark_registered_roots` and `trace_marked_objects`. RS cleared via `REMEMBERED_SET.with(|s| s.borrow_mut().clear())` immediately after `sweep()` returns, so the next collection cycle starts coherent — barrier emissions at C2 sites repopulate the RS as needed during the next allocation epoch. **Today this is correctness-equivalent to before** (the conservative C-stack scan + 9 root scanners already kept everything alive); the contribution is the **infrastructure point** — the RS now has a real consumer in the GC, validated end-to-end. C3b will add the actual generational specialization (skip old-gen objects during marking, scan only nursery from RS roots, gated `PERRY_GEN_GC=1`) which is where the time/RSS wins land. New unit test `test_remembered_set_cleared_after_full_gc` pins the clear-after-GC invariant: populate RS via barrier, run full GC, assert RS is empty. Runtime tests 155 → **156**. Full regression sweep clean: 10/10 `test_json_*.ts` match Node under default AND `PERRY_WRITE_BARRIERS=1` (where the RS actually fills with old→young entries during parse). `bench_json_roundtrip` best-of-5 WB-off 65 ms vs WB-on 65 ms — RS clear cost invisible (HashSet of <100 entries clearing in microseconds). Gap tests 25/28 (baseline). Phase C is now 3/3 sub-phases done at the infrastructure level: C1 (RS storage), C2 (codegen emission), C3a (GC consumes RS). C3b adds the generational mark-phase specialization that yields the bench_json_roundtrip RSS ≤70 MB ship criterion.
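
The two-bit aging scheme in the v0.5.228 entry can be modeled as a pure function over the flags byte. This is a minimal sketch: the flag values and the three transitions come from the changelog text, while the function name `age_bump` and the idea of isolating the step as a standalone function are assumptions for illustration (the entry describes the real pass as walking `arena_walk_objects` over MARKED-or-PINNED nursery objects).

```rust
/// Flag values taken from the v0.5.228 changelog entry.
pub const GC_FLAG_TENURED: u8 = 0x20;
pub const GC_FLAG_HAS_SURVIVED: u8 = 0x40;

/// HYPOTHETICAL helper: returns the flags of a surviving nursery object
/// after one age-bump at the end of a minor mark phase.
pub fn age_bump(gc_flags: u8) -> u8 {
    if gc_flags & GC_FLAG_TENURED != 0 {
        // Already tenured: the transition is idempotent.
        gc_flags
    } else if gc_flags & GC_FLAG_HAS_SURVIVED != 0 {
        // Second survival: promote and clear the intermediate bit.
        (gc_flags | GC_FLAG_TENURED) & !GC_FLAG_HAS_SURVIVED
    } else {
        // First survival: remember it for the next cycle.
        gc_flags | GC_FLAG_HAS_SURVIVED
    }
}
```

Two bits are enough to encode a promotion age of 2 without a counter field, and bits outside these two pass through unchanged.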

Cargo.lock

Lines changed: 27 additions & 27 deletions
Generated file; diff not rendered by default.

Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -109,7 +109,7 @@ opt-level = "s" # Optimize for size in stdlib
opt-level = 3

[workspace.package]
-version = "0.5.228"
+version = "0.5.229"
edition = "2021"
license = "MIT"
repository = "https://github.com/PerryTS/perry"
