The optimized ARM selection path (the default, non-`--relocatable` self-contained
image) uses r4-r8 as scratch / promoted locals but emitted no prologue, so a
caller's callee-saved registers were silently clobbered — a systemic AAPCS
violation. It was masked because non-leaf callers route through the direct
selector (which over-saves) and the frozen byte gate only covers the
`--relocatable` direct path. The bug blocks flipping the non-leaf thin-forwarder
prologue lever (#428).
Fix: after range-realloc, wrap a body that genuinely touches a callee-saved
register in a conservative `push {r4-r8,lr}` / `pop {r4-r8,pc}`;
`shrink_callee_saved_saves` then trims the save to the registers actually used.
Two new liveness helpers:
- `body_uses_callee_saved` — the emit-side twin of `shrink_callee_saved_saves`,
reading the same control-flow allowlist + `reg_effect` classification as
push-conditions. Conservative on `reg_effect` `None` (an unmodeled op may touch
a callee-saved register → force the push): under-saving is the AAPCS bug,
over-saving is harmless (shrink pads it down).
- `ensure_callee_saved_prologue` — wraps the body, no-op when a prologue already
exists (direct path) or the body is callee-saved-free (a spurious push is
permanent — shrink never removes one, only pads).
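A minimal sketch of the two helpers' decision logic, in Python for brevity. This
is not the real implementation: the instruction model (a mnemonic plus the set
of registers it touches, or `None` when unmodeled), `has_prologue`, and the body
being modeled without its return instruction are all assumptions made for
illustration.

```python
from typing import FrozenSet, List, Optional, Tuple

CALLEE_SAVED = frozenset({"r4", "r5", "r6", "r7", "r8"})

# Hypothetical model: (mnemonic, registers touched), None = unmodeled op.
Instr = Tuple[str, Optional[FrozenSet[str]]]


def body_uses_callee_saved(body: List[Instr]) -> bool:
    """True if any instruction may touch r4-r8."""
    for _mnemonic, regs in body:
        if regs is None:
            # Conservative: an unmodeled op may touch a callee-saved register.
            return True
        if regs & CALLEE_SAVED:
            return True
    return False


def has_prologue(body: List[Instr]) -> bool:
    """Assumed check: a body that starts with a push already has a prologue."""
    return bool(body) and body[0][0] == "push"


def ensure_callee_saved_prologue(body: List[Instr]) -> List[Instr]:
    """Wrap the body in a full save/restore, or return it unchanged."""
    if has_prologue(body) or not body_uses_callee_saved(body):
        return body
    save = frozenset(CALLEE_SAVED | {"lr"})
    restore = frozenset(CALLEE_SAVED | {"pc"})
    return [("push", save)] + body + [("pop", restore)]


leaf = [("add", frozenset({"r0", "r1"}))]
clobbering = [("mov", frozenset({"r4", "r0"}))]
unmodeled = [("mystery_op", None)]
```

Under these assumptions, `leaf` comes back unchanged, `clobbering` and
`unmodeled` are both wrapped, and wrapping an already-wrapped body is a no-op.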
The wrap decision is made on the post-realloc body, where realloc has lowered
low-pressure r4-r8 scratch back to r0-r3, so a save is added only for genuinely
clobbered registers (a pre-realloc decision would over-save). The wrap runs in
both realloc-on (shrink trims) and realloc-off (full save kept) configurations.
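To illustrate why the decision point matters, here is a toy model in Python. It
is a deliberate simplification: `toy_realloc` renames whole registers across the
body and ignores live ranges, which is not how the real range-realloc works, and
both function names are hypothetical.

```python
LOW = ["r0", "r1", "r2", "r3"]
HIGH = {"r4", "r5", "r6", "r7", "r8"}


def needs_save(body):
    """body: list of frozensets of registers touched per instruction."""
    return any(regs & HIGH for regs in body)


def toy_realloc(body):
    """Rename r4-r8 to unused r0-r3 slots while any remain (toy model)."""
    used = set().union(*body)
    free_low = [r for r in LOW if r not in used]
    mapping = {}
    for reg in sorted(used & HIGH):
        if free_low:
            mapping[reg] = free_low.pop(0)
    return [frozenset(mapping.get(r, r) for r in regs) for regs in body]


# Low pressure: r4/r5 are scratch, r1-r3 are free.
low_pressure = [frozenset({"r0", "r4"}), frozenset({"r4", "r5"})]
# High pressure: r0-r3 are all occupied, so r4 must stay.
high_pressure = [frozenset({"r0", "r1", "r2", "r3", "r4"})]
```

In this model a pre-realloc check on `low_pressure` would request a save, while
the post-realloc check does not; `high_pressure` needs the save either way.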
Scope: callee-saved (r4-r8) only. Caller-saved (r0-r3) preservation across
import calls on the optimized path is a separate known gap (#197). Functions with
>4 params (incoming stack args) decline to the direct selector, so the wrap is a
no-op there and cannot perturb stack-arg offsets (verified byte-identical to
`--no-optimize`). The fix is default-on (a correctness fix is not flag-gated).
It adds save/restore cycles to optimized-path functions that use r4-r8, but
that is a non-shipped path (gale ships `--relocatable`), so there is no silicon
cost. The frozen byte gate stays bit-identical (direct path untouched).
Validation:
- New unicorn sentinel oracle (`callee_saved_490_differential.py`, CI-gated):
sets r4-r8 to sentinels, asserts result == wasmtime AND sentinels restored,
across 16-bit push and 32-bit PUSH.W (high-register) forms; passes default,
`SYNTH_RANGE_REALLOC=0`, and `SYNTH_DEAD_FRAME_ELIM=1`. FAILs without the fix.
- 4 new liveness unit tests; full workspace suite + bulk-memory optimized-path
differential green; frozen byte gate bit-identical.
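The shape of the sentinel check can be sketched as follows. This is not the
contents of `callee_saved_490_differential.py`: the real oracle runs the emitted
code under unicorn and compares against wasmtime, whereas this sketch substitutes
plain Python callables for both, and the sentinel values and function names are
assumptions.

```python
# Hypothetical sentinel values for r4-r8.
SENTINELS = {f"r{n}": 0xA5A50000 + n for n in range(4, 9)}


def check_callee_saved(run, args, expected):
    """Run `run` on a register file; return (result_ok, clobbered_registers)."""
    regs = dict(SENTINELS)
    for i, arg in enumerate(args):
        regs[f"r{i}"] = arg
    out = run(dict(regs))
    clobbered = sorted(r for r, v in SENTINELS.items() if out.get(r) != v)
    return out.get("r0") == expected, clobbered


def good_add(regs):
    """Stand-in for a function that leaves r4-r8 alone."""
    regs["r0"] = (regs["r0"] + regs["r1"]) & 0xFFFFFFFF
    return regs


def bad_add(regs):
    """Stand-in for the bug: r4 used as scratch and never restored."""
    regs["r4"] = regs["r0"]
    regs["r0"] = (regs["r4"] + regs["r1"]) & 0xFFFFFFFF
    return regs
```

`bad_add` returns the right sum but leaves `r4` overwritten, which is why the
result comparison alone is not enough and the sentinel assertion is needed.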
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>