## v0.5.843 - perf(transform): #691 skip `__async_throw` closure alloc when no awaiting try/catch: promise_all_chains 41.7→37.1 ms (−11%, 1.77×→1.69× Bun)

Phase 1 of the issue #691 workstream (Bun gap on `promise_all_chains.ts`). The v0.5.816–v0.5.833 arc had closed the runtime/queue overhead; the issue attributed the remaining 1.77× gap, by profile, to the ~84% callback bucket. Re-profiling with `PERRY_MT_PROFILE=1` plus a 0/1/2/3-await sweep showed that the gap is **not** per-state-transition cost: Bun's first-await delta over us is +12 ms, and additional awaits cost roughly the same on both runtimes. The dominant cost is **per-async-fn-invocation closure allocation** (150,849 closures for 50k `unitOfWork` calls, i.e. 3 per invocation, 99,900 of them with non-singleton captures).

Each `was_plain_async` invocation was allocating two closures. One of them, `__async_throw`, is cold path (only invoked when an awaited promise rejects) but was built unconditionally with `captures.clone() + mutable_captures.clone()` at the wrapper site. When `linearize_body` reports no awaiting try/catch (`catches.is_empty()`), the throw body collapses to a single `Stmt::Throw(__throw_val)` rethrow that references zero captures, so allocating it is pure dead weight.

Change at `crates/perry-transform/src/generator.rs::build_async_step_driver_direct`: the signature now takes `throw_closure_expr: Option<Expr>`. The caller passes `None` for the no-user-catch case. The driver then emits `Stmt::Throw(value)` inline in the dispatch's is-error arm and skips both the `let __async_throw = ...` outer stmt and the `throw_id` capture in `step_captures`. The inline throw is caught by the existing outer try/catch, which re-enters `__step(e, true)` and returns `Promise.reject` via the `is_error` short-circuit, so semantics are identical.

Verification: `PERRY_MT_PROFILE=1 /tmp/pac` shows closure allocs drop 150,849 → 100,899 (−49,950 = exactly 1 per `unitOfWork` invocation), `cap_singleton_miss` drops 99,900 → 49,950, and the callback bucket drops 37.5 → 32.1 ms (−14%).
All 7 `test_microtask_inv_*.ts` probes pass, including `01_two_fn_interleave`, which the rejected aggressive inline gate broke; this change preserves microtask ordering because the existing per-await `AsyncStepChain` boundaries are untouched. `test_issue_256_microtask_ordering.ts` passes. Gap tests are at 26/28 (the same 2 pre-existing failures: `console_methods`, `regexp_advanced`). Full `cargo test --release --workspace` is clean.

The user-catch path (`catches` non-empty) is untouched: it still routes through the lazy-but-allocated `__async_throw` closure that holds the inlined catch body. Phase 2 candidate (per the #691 plan): stack-allocate the step closure's capture environment for async fns with proven-non-escaping captures. Not in this commit.
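
The `Option<Expr>` switch described in this entry can be sketched roughly as follows. This is a minimal illustration, not the actual `build_async_step_driver_direct`: the `Expr`/`Stmt` mini-AST, the `StepDriver` struct, and the helper names `build_step_driver` and `closure_allocs_per_call` are hypothetical stand-ins, and the real perry-transform types are assumed to be considerably richer.

```rust
// Hypothetical mini-AST standing in for the real perry-transform IR.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Ident(String),
    Closure { captures: Vec<String>, body: Vec<Stmt> },
    Call { callee: Box<Expr>, args: Vec<Expr> },
}

#[derive(Debug, Clone, PartialEq)]
enum Stmt {
    Let { name: String, init: Expr },
    Throw(Expr),
    Expr(Expr),
}

// Hypothetical result of building the step driver for one async fn.
#[derive(Debug)]
struct StepDriver {
    outer_stmts: Vec<Stmt>,     // emitted once per async-fn invocation
    step_captures: Vec<String>, // names captured by the step closure
    is_error_arm: Vec<Stmt>,    // body of the dispatch's is-error arm
}

// `None` models the no-user-catch case: no `__async_throw` closure is
// built, and the is-error arm rethrows inline instead.
fn build_step_driver(captures: &[String], throw_closure_expr: Option<Expr>) -> StepDriver {
    let mut outer_stmts = Vec::new();
    let mut step_captures = captures.to_vec();
    let value = Expr::Ident("value".to_string());

    let is_error_arm = match throw_closure_expr {
        Some(closure) => {
            // User-catch path: allocate the closure and capture it in the step.
            outer_stmts.push(Stmt::Let { name: "__async_throw".into(), init: closure });
            step_captures.push("__async_throw".into());
            vec![Stmt::Expr(Expr::Call {
                callee: Box::new(Expr::Ident("__async_throw".into())),
                args: vec![value],
            })]
        }
        // Fast path: a bare rethrow that references zero captures.
        None => vec![Stmt::Throw(value)],
    };

    StepDriver { outer_stmts, step_captures, is_error_arm }
}

// Counts the closure allocations the outer statements perform per call.
fn closure_allocs_per_call(driver: &StepDriver) -> usize {
    driver
        .outer_stmts
        .iter()
        .filter(|s| matches!(s, Stmt::Let { init: Expr::Closure { .. }, .. }))
        .count()
}
```

In this sketch, passing `None` yields a driver with no outer `let` and no extra step capture, which mirrors the one-closure-per-invocation saving reported in the verification numbers; passing `Some(closure)` keeps the allocated path that the user-catch case still uses.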