diff --git a/CLAUDE.md b/CLAUDE.md
Lines changed: 2 additions & 1 deletion
diff --git a/Cargo.lock b/Cargo.lock
Lines changed: 27 additions & 27 deletions
diff --git a/Cargo.toml b/Cargo.toml
Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ This file provides guidance to Claude Code (claude.ai/code) when working with co
 
 Perry is a native TypeScript compiler written in Rust that compiles TypeScript source code directly to native executables. It uses SWC for TypeScript parsing and LLVM for code generation.
 
-**Current Version:** 0.5.226
+**Current Version:** 0.5.227
 
 ## TypeScript Parity Status
 
@@ -148,6 +148,7 @@ First-resolved directory cached in `compile_package_dirs`; subsequent imports re
 Keep entries to 1-2 lines max. Full details in CHANGELOG.md.
 
 - **v0.5.205** — Fix #183: `perry compile --target web` on a real-world app (Bloom Jump built on the Bloom engine) produced a WASM binary the browser refused to load — `Compiling function #687 failed: expected 0 elements on the stack for fallthru, found 103` (count varies with engine state). Root cause in `crates/perry-codegen-wasm/src/emit.rs`: the five direct-`Call`-instruction code paths — `Expr::Call` FuncRef arm (~4302), `Expr::Call` ExternFuncRef arm (~4324), `Expr::New` user-class ctor (~5844), `Expr::SuperCall` parent-ctor (~5894), `Expr::StaticMethodCall` direct-static path (~5979) — each emit `emit_expr(arg)` per source arg and pad up with `TAG_UNDEFINED` when `args.len() < expected`, but had no matching drop-excess branch when `args.len() > expected`. WASM `call` consumes exactly the callee's declared param count, so when JS's "extra args evaluated for side effects, then silently ignored" semantics met Perry's WASM codegen, every extra evaluated arg leaked past the call and accumulated on the enclosing block's operand stack — 103 values by the time `_start`'s final `end` hit the validator. The shape that triggered it in jump/bloom was `bloom/src/core/colors.ts`'s `Colors = new __AnonShape_2(...24 PropertyGets...)` landing on a Phase-3-synthesized ctor with lower declared arity, multiplied across bloom's 10 submodules. Fix: after each existing `for _ in args.len()..expected { I64Const(TAG_UNDEFINED) }` pad-up loop, add the mirror `for _ in expected..args.len() { Drop }` — matches JS semantics (extras evaluated for side effects but discarded) and keeps the operand stack aligned with the callee's WASM type at every direct-Call site. Verified end-to-end against the exact issue repro cloned fresh from `github.com/Bloom-Engine/jump` + `github.com/Bloom-Engine/engine`: both path A `file:./vendor/bloom/` and path B `file:../engine/` now compile to a WebAssembly-validating `.wasm` (416,923 / 413,780 bytes respectively, 140 FFI imports intact, `WebAssembly.compile` resolves clean on node 20+); a synthetic `takesFive(mc(),mc(),1,2,3,4)` minimal case that previously failed `Compiling function #213 failed: ... found 1` also validates. `cargo test --release -p perry-runtime -p perry-hir -p perry-codegen-wasm -p perry`: 262/262 passed. Note: issue #183 also claimed path A found only 1 module and emitted 9 FFI imports — could not reproduce in a fresh clone (both paths find 10 modules identically); most likely an artifact of the reporter's local `vendor/bloom` snapshot predating the `exports` map, and the "runGame silently no-ops" symptom the user actually observed was the browser refusing to instantiate the invalid WASM with the surrounding JS glue swallowing the error — fixed here.
+- **v0.5.227** — Gen-GC **Phase C3b**: minor GC trace skips old-gen objects. New `gc_collect_minor()` entry in `crates/perry-runtime/src/gc.rs` runs the same root-mark phase (stack + globals + 9 registered scanners + RS) but drains the worklist via the new `drain_trace_worklist_minor` variant which calls `pointer_in_old_gen(user_addr)` per popped header and `continue`s for old-gen objects without invoking `trace_object` / `trace_array` / `trace_closure` / etc. — they stay marked (treated as black leaves) but their fields aren't recursively visited. Young children held by old-gen parents reach the worklist exclusively via the remembered set, scanned by `mark_remembered_set_roots` (C3a). New `gen_gc_enabled()` helper reads `PERRY_GEN_GC` env var (cached via `OnceLock`); when set to `1` / `on` / `true`, every collection routes through `gc_collect_minor` instead of the standard mark-sweep. `gc_collect_inner` checks the flag at entry and tail-calls minor when the gate is on. Default OFF — opt-in until Phase C4 hits the bench_json_roundtrip ship criterion (RSS ≤70 MB direct path) and proves out across the full test corpus. **Trace specialization is the time-win core of generational GC**: in workloads with substantial old-gen working set, minor GC is now `O(young live + RS roots)` instead of `O(all live)` — but today the win is unobservable on the JSON benches because OLD_ARENA stays empty (Phase C4 will wire nursery→old promotion to actually fill it). Two new unit tests pin the C3b invariants: `test_gc_collect_minor_clears_rs` verifies RS is empty after `gc_collect_minor`, `test_gc_collect_minor_runs_without_panic` exercises a mixed nursery+old-gen heap through three sequential minor collections. Runtime tests 156 → **158**. Full regression sweep clean: 20/20 `test_json_*.ts` (10 default + 10 with `PERRY_GEN_GC=1 PERRY_WRITE_BARRIERS=1`); `bench_json_roundtrip` best-of-5 across all 4 mode combinations (default / GEN_GC / WB / GEN_GC+WB) all 72-74 ms — within noise. The infrastructure layer is now complete; Phase C4 lights it up by adding promotion (nursery survivors → OLD_ARENA via copying evacuation), which makes the "skip old-gen during minor" optimization meaningful.
 - **v0.5.226** — Gen-GC **Phase C3a**: remembered-set roots flow into the GC mark phase + RS clears after every collection. New `mark_remembered_set_roots(valid_ptrs)` in `crates/perry-runtime/src/gc.rs` snapshots the per-thread `REMEMBERED_SET` (populated by the codegen-emitted write barriers from C2/C2-expansion) and re-marks each old-gen header as a `POINTER_TAG`-tagged value via the standard `try_mark_value` machinery. Wired into `gc_collect_inner` between `mark_registered_roots` and `trace_marked_objects`. RS cleared via `REMEMBERED_SET.with(|s| s.borrow_mut().clear())` immediately after `sweep()` returns, so the next collection cycle starts coherent — barrier emissions at C2 sites repopulate the RS as needed during the next allocation epoch. **Today this is correctness-equivalent to before** (the conservative C-stack scan + 9 root scanners already kept everything alive); the contribution is the **infrastructure point** — the RS now has a real consumer in the GC, validated end-to-end. C3b will add the actual generational specialization (skip old-gen objects during marking, scan only nursery from RS roots, gated `PERRY_GEN_GC=1`) which is where the time/RSS wins land. New unit test `test_remembered_set_cleared_after_full_gc` pins the clear-after-GC invariant: populate RS via barrier, run full GC, assert RS is empty. Runtime tests 155 → **156**. Full regression sweep clean: 10/10 `test_json_*.ts` match Node under default AND `PERRY_WRITE_BARRIERS=1` (where the RS actually fills with old→young entries during parse). `bench_json_roundtrip` best-of-5 WB-off 65 ms vs WB-on 65 ms — RS clear cost invisible (HashSet of <100 entries clearing in microseconds). Gap tests 25/28 (baseline). Phase C is now 3/3 sub-phases done at the infrastructure level: C1 (RS storage), C2 (codegen emission), C3a (GC consumes RS). C3b adds the generational mark-phase specialization that yields the bench_json_roundtrip RSS ≤70 MB ship criterion.
 - **v0.5.225** — Gen-GC **Phase C2 expansion**: write-barrier emission extended to every remaining heap-store site. New `emit_write_barrier(ctx, parent_bits, child_bits)` helper at the top of `crates/perry-codegen/src/expr.rs` consolidates the gate-checked emit (gated `PERRY_WRITE_BARRIERS=1`, branchless at codegen via `OnceLock`-cached `write_barriers_enabled()`). Now wired at: (1) `Expr::PropertySet` generic `obj.x = y` path (refactored from inline emit at v0.5.224 to use the helper), (2) `Expr::IndexSet` array element path — both the local-without-slot fallback AND the no-local-id fallback at the runtime-call site, (3) `Expr::IndexSet` array element FAST path inside `lower_index_set_fast` — covers `arr[i] = v` for `arr` in `ctx.locals` (the most common shape), barrier emitted in the merge block after both inline-extend and realloc paths converge, (4) `Expr::IndexSet` string-literal-key path `obj["k"] = v`, (5) `Expr::IndexSet` runtime-string-key fallback path, (6) `Expr::LocalSet` closure capture write — both branches: boxed (parent = box ptr from `js_box_set`) AND non-boxed (parent = closure ptr from `js_closure_set_capture_f64`). NOT yet covered (deferred — separate codegen paths): class-field-set fast path with statically-typed receivers using direct field-index store. The 4-WB-site stress test (`/tmp/wb_all_sites.ts`: array indexed-set, two object key sets, one PropertySet) emits 4 `call void @js_write_barrier(...)` lines in IR and matches Node byte-for-byte. Regression sweep clean: 10/10 `test_json_*.ts` match Node under `PERRY_WRITE_BARRIERS=1`. `bench_json_roundtrip` best-of-5 WB-off 64 ms vs WB-on 64 ms — barrier overhead invisible because the bench's hot path is parse + stringify (no user-code field/element writes). The barrier becomes load-bearing once Phase C3 lands and minor GC consumes the remembered set — at that point the barrier replaces a full-arena scan with an RS-only scan, which is the actual time win the gen-GC plan promises.
 - **v0.5.224** — Gen-GC **Phase C sub-phase 2**: codegen emits `js_write_barrier(parent_bits, child_bits)` after the generic `Expr::PropertySet` heap store in `crates/perry-codegen/src/expr.rs`. Gated behind `PERRY_WRITE_BARRIERS=1` env var (cached via `OnceLock` in new `crates/perry-codegen/src/codegen.rs::write_barriers_enabled()`) — default OFF, so no production-perf impact until Phase C3 lands and minor GC actually consumes the remembered set. Initial-scope coverage: the generic PropertySet path (used for `any`-typed receivers like `const h: any = {}; h.v = ...`). Class-field-set fast paths (statically-typed receivers using direct field-index lookup) are intentionally NOT instrumented in this sub-phase — those go through different codegen and will be wired in C2 follow-up. Verified end-to-end via `PERRY_SAVE_LL=<dir>`: a 3-field-write program emits exactly 3 `call void @js_write_barrier(i64 ..., i64 ...)` lines after the matching `js_object_set_field_by_name` calls, runs cleanly, output matches Node. **Regression sweep clean under WB on**: 10/10 `test_json_*.ts` match Node byte-for-byte; runtime tests 155/155 unchanged (sub-phase C1 6 tests still cover the runtime side); `bench_json_roundtrip` best-of-5 WB-off 66 ms vs WB-on 65 ms — within noise because the bench doesn't write object fields on the hot path (parse+stringify only). The barrier's per-call cost (one bitcast + one extern call + the runtime's O(blocks) old-vs-young range scan) is currently the dominant overhead per heap store; sub-phase C3's `GC_FLAG_YOUNG` bit-test will replace the range scan with a single conditional branch. **Next**: sub-phase C3 (minor GC implementation — scan precise roots + remembered set, evacuate survivors to old-gen, clear RS, gated behind `PERRY_GEN_GC=1`). C3 is where the precision and arena split actually become useful — bench_json_roundtrip direct-path RSS should drop to ≤70 MB per the Phase C ship criterion.
 
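Reviewer's sketch (not part of the commit): the v0.5.227 entry describes a minor-GC worklist drain that treats old-gen objects as black leaves and reaches their young children only through the remembered set. The toy model below illustrates that idea under stated assumptions. `Obj`, `parse_gate`, `mark_and_push`, and `drain_minor_sketch` are hypothetical names, the heap is a plain vector of indices rather than Perry's real header layout, and pushing the children of each remembered-set entry directly is an assumption about how the RS feeds the worklist.

```rust
use std::collections::HashSet;
use std::sync::OnceLock;

/// Hypothetical toy heap object; Perry's real headers look different.
pub struct Obj {
    pub old_gen: bool,
    pub marked: bool,
    pub children: Vec<usize>, // indices into the heap slice
}

/// Pure helper: which values of the env var switch the gate on.
pub fn parse_gate(value: Option<&str>) -> bool {
    matches!(value, Some("1" | "on" | "true"))
}

/// Env gate read once and cached, in the spirit of `gen_gc_enabled()`.
pub fn gen_gc_enabled() -> bool {
    static FLAG: OnceLock<bool> = OnceLock::new();
    *FLAG.get_or_init(|| parse_gate(std::env::var("PERRY_GEN_GC").ok().as_deref()))
}

/// Mark an object and queue it, unless it is already marked.
fn mark_and_push(heap: &mut [Obj], worklist: &mut Vec<usize>, idx: usize) {
    if !heap[idx].marked {
        heap[idx].marked = true;
        worklist.push(idx);
    }
}

/// Minor-collection trace. Returns how many objects had their fields visited.
pub fn drain_minor_sketch(
    heap: &mut [Obj],
    roots: &[usize],
    remembered: &HashSet<usize>,
) -> usize {
    let mut worklist = Vec::new();
    for &r in roots {
        mark_and_push(heap, &mut worklist, r);
    }
    // Assumption: each remembered old-gen parent contributes its children.
    for &parent in remembered {
        let kids = heap[parent].children.clone();
        for c in kids {
            mark_and_push(heap, &mut worklist, c);
        }
    }
    let mut traced = 0;
    while let Some(idx) = worklist.pop() {
        if heap[idx].old_gen {
            continue; // black leaf: stays marked, fields are not visited
        }
        traced += 1;
        let kids = heap[idx].children.clone();
        for c in kids {
            mark_and_push(heap, &mut worklist, c);
        }
    }
    traced
}
```

In this model a young object whose only referrer is an old-gen object missing from the remembered set is never marked, which is why the write barriers from C2 have to record every old-to-young store before a minor collection can be trusted.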
@@ -109,7 +109,7 @@ opt-level = "s"       # Optimize for size in stdlib
 opt-level = 3
 
 [workspace.package]
-version = "0.5.226"
+version = "0.5.227"
 edition = "2021"
 license = "MIT"
 repository = "https://github.com/PerryTS/perry"
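Reviewer's sketch (not part of the commit): the v0.5.205 context entry in the CLAUDE.md hunk explains that a WASM `call` consumes exactly the callee's declared parameter count, so a direct call site must pad missing arguments and drop surplus ones. The model below is a minimal illustration of that pad-or-drop rule. The `Instr` enum, `emit_direct_call`, `net_stack_height`, and the `TAG_UNDEFINED` value are hypothetical stand-ins and are not taken from `emit.rs`.

```rust
/// Hypothetical miniature instruction set, just enough to count stack slots.
#[derive(Debug, Clone, PartialEq)]
pub enum Instr {
    EvalArg(usize), // evaluate source argument i, pushing one value
    I64Const(i64),  // push a constant
    Drop,           // pop one value
    Call(u32),      // call function index; pops the callee's params
}

/// Placeholder bit pattern; the real tag value is not given in the source.
pub const TAG_UNDEFINED: i64 = 1;

/// Emit a direct call where the callee declares `expected` params and the
/// call site supplies `arg_count` source arguments.
pub fn emit_direct_call(func_idx: u32, arg_count: usize, expected: usize) -> Vec<Instr> {
    let mut out = Vec::new();
    // Every source argument is evaluated, so side effects still happen.
    for i in 0..arg_count {
        out.push(Instr::EvalArg(i));
    }
    // Too few arguments: pad up with undefined.
    for _ in arg_count..expected {
        out.push(Instr::I64Const(TAG_UNDEFINED));
    }
    // Too many arguments: drop the trailing extras (they sit on top of the stack).
    for _ in expected..arg_count {
        out.push(Instr::Drop);
    }
    out.push(Instr::Call(func_idx));
    out
}

/// Operand-stack height left behind, ignoring the callee's results.
pub fn net_stack_height(code: &[Instr], callee_params: usize) -> i64 {
    let mut height = 0i64;
    for instr in code {
        match instr {
            Instr::EvalArg(_) | Instr::I64Const(_) => height += 1,
            Instr::Drop => height -= 1,
            Instr::Call(_) => height -= callee_params as i64,
        }
    }
    height
}
```

With six source arguments and a five-parameter callee, the sequence contains one `Drop` and leaves no stray operand behind. Without the drop loop the same call would leave one value on the stack, which is the kind of leftover the validator error in the entry reports.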