fix(arm): lower functions needing the AAPCS stack-arg path with >8 scalar params/args (#503, #242) #504
Merged
…alar params/args (#503, #242)

The arm direct selector (`select_with_stack`, the SHIPPED `--relocatable` path) SKIPPED (emitted no code for) any function whose signature needs the AAPCS stack-argument path beyond a conservative cap: `num_params > 8`, or a call passing `arg_count > 8`. gale hit this on the falcon flight component: 3 reachable helpers (10-param, 25-param, and a 64-bit case) were dropped from the ELF.

The incoming-param homing (`compute_local_layout` → `incoming_params`, offset `frame_size+24+(k-4)*4`) and the outgoing store (`emit_stack_args`, offset `(k-4)*4`) are both already GENERIC in the param/arg index: no fixed ≤8 structure. This lifts the two conservative `> 8` refusals and leans on the existing 12-bit `[sp,#imm]` guards (incoming homing + `emit_stack_args`) as the real Ok-or-Err backstop. Functions with ≤8 i32 params/args never reach the cap, so the shared path is untouched and the frozen byte gate stays bit-identical.

Scope: only the >8-scalar-i32 case. The 64-bit stack-param case (AAPCS pair-alignment + back-fill + 8-byte slots) stays refused; that lowering is a #503 follow-up. All remaining stack-arg refusal warnings now cite #503, not the closed #359 (ask #2 in the issue).

Gated by `scripts/repro/stack_args_503_differential.py` (CI: `stack-args-503-oracle`): `sum10`, `sum25` (incoming, reading param 24, a high stack offset and not just the first slot), and a caller passing 10 args (outgoing, exercising `emit_stack_args`) all execute bit-identically to wasmtime under unicorn via the `--relocatable` path.

Unit tests `test_503_ten_param_reads_incoming_stack` + `test_503_nine_arg_call_lowers_to_outgoing_stack` replace the now-obsolete too-many-args-errs test.

Frozen byte gate bit-identical; control_step (13/13), flight_seam, callee-saved-490, block-brif-483, r12-spill-496 oracles all green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A 6-param function that BOTH reads its incoming stack params (4,5) AND passes them through a 6-arg call — the realistic falcon shape and the only path where the incoming-offset formula `frame_size+24+(k-4)*4` must skip over the `outgoing_arg_bytes` region that also lives in `frame_size`. Reads params 4,5 again after the call to confirm the incoming region (above the frame) survives the inner call's use of the outgoing region (frame bottom). 3 vectors, all bit-identical to wasmtime. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
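The non-overlap argument in this commit message can be checked with a small model. This is a sketch under stated assumptions, not the selector's code: `passthrough_layout` and `regions_overlap` are hypothetical names, and the model assumes `frame_size` is the local bytes plus `outgoing_arg_bytes`. Only the two offset formulas are taken from the commit messages.

```python
# Hypothetical model of the frame in the 6-param passthrough case.
# Only the two offset formulas come from the commit messages; the names
# and the way frame_size is composed are assumptions for illustration.

REG_ARGS = 4   # AAPCS: the first four i32 values travel in r0-r3
SLOT = 4       # bytes per scalar i32 stack slot


def passthrough_layout(num_params, num_call_args, local_bytes):
    """Return (outgoing_offsets, incoming_offsets), all SP-relative."""
    outgoing_arg_bytes = max(0, num_call_args - REG_ARGS) * SLOT
    # Assumption: frame_size is the locals plus the outgoing-argument region.
    frame_size = local_bytes + outgoing_arg_bytes
    outgoing = [(k - REG_ARGS) * SLOT for k in range(REG_ARGS, num_call_args)]
    incoming = [frame_size + 24 + (k - REG_ARGS) * SLOT
                for k in range(REG_ARGS, num_params)]
    return outgoing, incoming


def regions_overlap(outgoing, incoming):
    """True if any incoming slot shares a byte with an outgoing slot."""
    out_bytes = {b for off in outgoing for b in range(off, off + SLOT)}
    return any(b in out_bytes
               for off in incoming for b in range(off, off + SLOT))


outgoing, incoming = passthrough_layout(num_params=6, num_call_args=6,
                                        local_bytes=0)
print(outgoing, incoming, regions_overlap(outgoing, incoming))
```

In this model the outgoing slots sit at the frame bottom and the incoming slots start above `frame_size + 24`, so writing the outgoing arguments for the inner call cannot touch params 4 and 5.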
This was referenced Jun 26, 2026
avrabe added a commit that referenced this pull request Jun 26, 2026
… the #509 negative result (#242) (#510)

Appends a dated slice to VCR-ORACLE-001's history covering the 2026-06-26 second arc: #496 (R12-exhaustion decline-to-direct, PR #502), #503 (AAPCS stack-arg cap-lift, shipped v0.16.0, PR #504), #507 (optimized-path br_table decline, PR #508), each with its new CI execution oracle (r12-spill-496, stack-args-503, br-table-507).

Most importantly it records #509 as a LOAD-BEARING NEGATIVE RESULT so the decline-to-direct shortcut is not re-attempted: the direct selector (the shipped `--relocatable` path) drops the carried value of a value-returning br/br_if/br_table-direct, and the shortcut cannot fix it because the IR carries no block-result arity (`WasmOp::Block` is a unit variant). A non-empty-vstack decline can't distinguish a carried result from an unwound void-target value, so it over-refuses valid code. This is the sharpest evidence yet for VCR-SEL-001's selector-collapse motivation (the hand-written selectors lack the structured block-arity model a verified DSL carries by construction); the real fix is arity-threading, cross-cutting and silicon-gated.

Docs-only, behavior-frozen by construction: frozen byte gate bit-identical; rivet validate non-xref ERROR count 0 (unchanged 49/75/0 baseline).

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
What
The arm direct selector (`select_with_stack`, the shipped `--relocatable` path falcon uses) skipped (emitted no code for) any function whose signature needs the AAPCS stack-argument path beyond a conservative cap: `num_params > 8`, or a call passing `arg_count > 8`. gale hit this on the falcon flight component: 3 reachable helpers (10-param, 25-param, and a 64-bit case) were dropped from the ELF, breaking self-contained firmware.

Fix
The incoming-param homing (`compute_local_layout` → `incoming_params`, offset `frame_size+24+(k-4)*4`) and the outgoing store (`emit_stack_args`, offset `(k-4)*4`) are both already generic in the param/arg index `k`: neither has a fixed ≤8 structure. This lifts the two conservative `> 8` refusals and leans on the existing 12-bit `[sp,#imm]` guards (incoming homing site + `emit_stack_args`) as the real Ok-or-Err backstop (#180/#185).

Frozen-safe by construction: functions with ≤8 i32 params/args never reach the cap, so the shared path is untouched and the frozen byte gate stays bit-identical (verified).
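The offset arithmetic and the immediate-range backstop described under Fix can be sketched as follows. This is an illustration rather than the selector's implementation: `incoming_offset`, `outgoing_offset`, and `check_imm12` are hypothetical names, the 16-byte frame in the example is made up, and the 0..4095 range assumes the 12-bit `[sp,#imm]` guard accepts unsigned immediates only.

```python
# Hypothetical sketch of the stack-argument offsets quoted in this PR.
# Only the two formulas come from the description; names are illustrative.

REG_ARGS = 4        # AAPCS passes the first four i32 values in r0-r3
SLOT = 4            # one 4-byte slot per scalar i32 stack argument
IMM12_MAX = 0xFFF   # assumed upper bound of the 12-bit [sp,#imm] guard


def incoming_offset(k, frame_size):
    """SP-relative offset of incoming param k (k >= 4) inside the callee."""
    if k < REG_ARGS:
        raise ValueError("params 0-3 arrive in registers, not on the stack")
    return frame_size + 24 + (k - REG_ARGS) * SLOT


def outgoing_offset(k):
    """SP-relative offset where the caller stores outgoing arg k (k >= 4)."""
    if k < REG_ARGS:
        raise ValueError("args 0-3 are passed in registers, not on the stack")
    return (k - REG_ARGS) * SLOT


def check_imm12(offset):
    """Return the offset if it is encodable, else raise (the Err side)."""
    if not 0 <= offset <= IMM12_MAX:
        raise ValueError(f"offset {offset} does not fit a 12-bit immediate")
    return offset


# sum25 reads param 24; the 16-byte frame here is an invented example value.
print(check_imm12(incoming_offset(24, frame_size=16)))
# The last of 10 outgoing args is index 9.
print(check_imm12(outgoing_offset(9)))
```

Because both formulas are linear in `k`, nothing in them depends on a cap of 8; the range check is the only point at which a large index or a large frame is rejected.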
Scope
Only the >8-scalar-i32 case (falcon's func_57/func_58). The 64-bit stack-param case (AAPCS pair-alignment + back-fill + 8-byte slots — func_163) stays refused; that lowering is a #503 follow-up. All remaining stack-arg refusal warnings now cite #503, not the closed #359 (ask #2 in the issue).
Gate
New `scripts/repro/stack_args_503_differential.py` (CI: `stack-args-503-oracle`), all via the `--relocatable` shipped path, executed under unicorn vs wasmtime:

- `sum10`: 10 i32 params (incoming).
- `sum25`: 25 params, reads param 24 (a high incoming stack offset, not just the first slot, so a high-index read is actually exercised).
- `caller`: passes 10 args through a callee (outgoing, exercises `emit_stack_args` + the lifted outgoing cap).

All execute bit-identically to wasmtime. (Found and fixed a harness bug along the way: ARM symbol values carry the Thumb bit, so byte offsets must mask `& ~1`.)

Verification
- `stack-args-503-oracle`: PASS (10/10 vectors).
- `test_503_ten_param_reads_incoming_stack` + `test_503_nine_arg_call_lowers_to_outgoing_stack` replace the obsolete too-many-args-errs test; full `synth-synthesis` suite (463) green.
- `cargo fmt --check`, `clippy -D warnings` clean.

This is on the shipped path (unlike the optimized-path #490/#483/#496), so it warrants a bug-fix release + on-silicon confirmation from gale on the falcon component.
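The harness bug noted under Gate (symbol values carrying the Thumb bit) comes down to one mask. A minimal sketch, assuming the harness turns an ELF function symbol's value into a byte offset within the code section; `is_thumb` and `symbol_to_code_offset` are hypothetical helper names and are not taken from `stack_args_503_differential.py`:

```python
# Hypothetical helpers; the real harness may structure this differently.

THUMB_BIT = 1  # bit 0 of an ARM ELF function symbol marks Thumb code


def is_thumb(st_value):
    """True if the symbol value has the Thumb bit set."""
    return bool(st_value & THUMB_BIT)


def symbol_to_code_offset(st_value, section_addr=0):
    """Byte offset of the function's first instruction in its section.

    Without the mask, a Thumb symbol at 0x40 is reported as 0x41 and any
    slice of the section taken from that offset starts one byte too late.
    """
    return (st_value & ~THUMB_BIT) - section_addr


print(hex(symbol_to_code_offset(0x41)), is_thumb(0x41))
```

The same masked value is what a caller would use to index into the section bytes, while the unmasked value (bit 0 set) is what an emulator typically expects as the entry address for Thumb execution.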
Addresses #503 — the >8-scalar-i32 case + diagnostic relabel. This unblocks 2 of the 3 falcon functions (func_57 10-param, func_58 25-param). func_163 (5 params incl. a 64-bit) still skips: the 64-bit stack-param lowering remains open under #503 as the follow-up. Do NOT auto-close on merge.
🤖 Generated with Claude Code