
fix(arm): lower functions needing the AAPCS stack-arg path with >8 scalar params/args (#503, #242) #504

Merged
avrabe merged 2 commits into main from fix/503-aapcs-stack-arg-cap-lift on Jun 26, 2026

@avrabe avrabe commented Jun 26, 2026


What

The arm direct selector (select_with_stack — the shipped --relocatable path falcon uses) skipped (emitted no code for) any function whose signature needs the AAPCS stack-argument path beyond a conservative cap: num_params > 8, or a call passing arg_count > 8. gale hit this on the falcon flight component — 3 reachable helpers (10-param, 25-param, and a 64-bit case) were dropped from the ELF, breaking self-contained firmware.

Fix

The incoming-param homing (compute_local_layout → incoming_params, offset frame_size+24+(k-4)*4) and the outgoing store (emit_stack_args, offset (k-4)*4) are both already generic in the index, with no fixed ≤8 structure. This lifts the two conservative > 8 refusals and leans on the existing 12-bit [sp,#imm] guards (incoming homing site + emit_stack_args) as the real Ok-or-Err backstop (#180/#185).
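
The two offset formulas and the immediate-range backstop can be sketched as plain arithmetic. This is a minimal illustration, not the selector's code: every function and constant name below is hypothetical, and `ValueError` merely stands in for the selector's `Err`. Only the constant 24, the `(k-4)*4` term, and the 12-bit limit come from the description above; 4095 is the largest value an unsigned 12-bit immediate can hold.

```python
# Hypothetical names; only the formulas are taken from the PR description.
WORD = 4              # one i32 stack slot, in bytes
REG_ARGS = 4          # AAPCS passes the first four word-sized args in r0-r3
INCOMING_BIAS = 24    # constant from the description; its breakdown is not stated there
IMM12_MAX = 0xFFF     # largest unsigned 12-bit [sp, #imm] offset


def check_imm12(offset: int) -> int:
    """Stand-in for the Ok-or-Err guard: reject offsets that do not fit in 12 bits."""
    if not 0 <= offset <= IMM12_MAX:
        raise ValueError(f"sp offset {offset} does not fit a 12-bit immediate")
    return offset


def incoming_offset(k: int, frame_size: int) -> int:
    """sp-relative offset of incoming stack param k (k >= 4)."""
    if k < REG_ARGS:
        raise ValueError("params 0-3 arrive in registers, not on the stack")
    return check_imm12(frame_size + INCOMING_BIAS + (k - REG_ARGS) * WORD)


def outgoing_offset(k: int) -> int:
    """sp-relative offset where outgoing stack arg k (k >= 4) is stored."""
    if k < REG_ARGS:
        raise ValueError("args 0-3 are passed in registers, not on the stack")
    return check_imm12((k - REG_ARGS) * WORD)


if __name__ == "__main__":
    # Param 24 of a 25-param function, with an assumed 16-byte frame.
    print(incoming_offset(24, frame_size=16))  # 120
    print(outgoing_offset(9))                  # 20
```

Nothing in either formula depends on the count being at most 8; the only hard limit in this sketch is the immediate range, which is why the guard can act as the backstop once the cap is lifted.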

Frozen-safe by construction: functions with ≤8 i32 params/args never reach the cap, so the shared path is untouched — the frozen byte gate stays bit-identical (verified).

Scope

Only the >8-scalar-i32 case (falcon's func_57/func_58). The 64-bit stack-param case (AAPCS pair-alignment + back-fill + 8-byte slots — func_163) stays refused; that lowering is a #503 follow-up. All remaining stack-arg refusal warnings now cite #503, not the closed #359 (ask #2 in the issue).

Gate

New scripts/repro/stack_args_503_differential.py (CI: stack-args-503-oracle), all via the --relocatable shipped path, executed under unicorn vs wasmtime:

  • sum10 — 10 i32 params (incoming).
  • sum25 — 25 params, reads param 24 (a high incoming stack offset, not just the first slot — so a high-index read is actually exercised).
  • caller — passes 10 args through a callee (outgoing, exercises emit_stack_args + the lifted outgoing cap).

All execute bit-identically to wasmtime. (Found and fixed a harness bug along the way: ARM symbol values for Thumb functions carry the Thumb bit in bit 0, so byte offsets into the code must be computed from the masked value, value & ~1.)
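
The two harness-side details mentioned above (masking the Thumb bit and comparing results bit for bit) can be sketched as follows. This is not the code in stack_args_503_differential.py; the helper names are hypothetical, and the sketch assumes the emulator side reports r0 as an unsigned 32-bit integer while the Wasm side may report a signed i32.

```python
# Hypothetical helpers illustrating the harness notes; not the actual script.
THUMB_BIT = 1
U32_MASK = 0xFFFF_FFFF


def code_offset(symbol_value: int) -> int:
    """Byte offset of a function's first instruction.

    Thumb function symbols have bit 0 set as an interworking marker,
    so the bit is cleared before the value is used to index code bytes.
    """
    return symbol_value & ~THUMB_BIT


def same_i32_bits(arm_r0: int, wasm_result: int) -> bool:
    """Compare two results as 32-bit patterns, ignoring signedness."""
    return (arm_r0 & U32_MASK) == (wasm_result & U32_MASK)


if __name__ == "__main__":
    print(hex(code_offset(0x101)))        # 0x100
    print(same_i32_bits(0xFFFFFFFF, -1))  # True
```

Masking both sides to 32 bits keeps a negative i32 on the Wasm side from being reported as a mismatch against the same bit pattern read from a register.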

Verification

  • New stack-args-503-oracle: PASS (10/10 vectors).
  • Frozen byte gate: bit-identical.
  • Unit tests: test_503_ten_param_reads_incoming_stack + test_503_nine_arg_call_lowers_to_outgoing_stack replace the obsolete too-many-args-errs test; full synth-synthesis suite (463) green.
  • Regression: control_step 13/13, flight_seam, callee-saved-490, block-brif-483, r12-spill-496 all green.
  • cargo fmt --check, clippy -D warnings clean.

This is on the shipped path (unlike the optimized-path #490/#483/#496), so it warrants a bug-fix release + on-silicon confirmation from gale on the falcon component.

Addresses #503 — the >8-scalar-i32 case + diagnostic relabel. This unblocks 2 of the 3 falcon functions (func_57 10-param, func_58 25-param). func_163 (5 params incl. a 64-bit) still skips: the 64-bit stack-param lowering remains open under #503 as the follow-up. Do NOT auto-close on merge.

🤖 Generated with Claude Code

avrabe and others added 2 commits June 26, 2026 06:32
…alar params/args (#503, #242)

The arm direct selector (`select_with_stack`, the SHIPPED `--relocatable` path)
SKIPPED — emitted no code for — any function whose signature needs the AAPCS
stack-argument path beyond a conservative cap: `num_params > 8`, or a call
passing `arg_count > 8`. gale hit this on the falcon flight component: 3 reachable
helpers (10-param, 25-param, and a 64-bit case) were dropped from the ELF.

The incoming-param homing (`compute_local_layout` → `incoming_params`, offset
`frame_size+24+(k-4)*4`) and the outgoing store (`emit_stack_args`, offset
`(k-4)*4`) are both already GENERIC in the param/arg index — no fixed ≤8
structure. This lifts the two conservative `> 8` refusals and leans on the
existing 12-bit `[sp,#imm]` guards (incoming homing + `emit_stack_args`) as the
real Ok-or-Err backstop. Functions with ≤8 i32 params/args never reach the cap,
so the shared path is untouched and the frozen byte gate stays bit-identical.

Scope: only the >8-scalar-i32 case. The 64-bit stack-param case (AAPCS
pair-alignment + back-fill + 8-byte slots) stays refused — that lowering is a
#503 follow-up. All remaining stack-arg refusal warnings now cite #503, not the
closed #359 (ask #2 in the issue).

Gated by scripts/repro/stack_args_503_differential.py (CI: stack-args-503-oracle):
sum10, sum25 (incoming, reading param 24 — a high stack offset, not just the
first slot), and a caller passing 10 args (outgoing, exercising emit_stack_args)
all execute bit-identically to wasmtime under unicorn via the --relocatable path.
Unit tests test_503_ten_param_reads_incoming_stack +
test_503_nine_arg_call_lowers_to_outgoing_stack replace the now-obsolete
too-many-args-errs test. Frozen byte gate bit-identical; control_step (13/13),
flight_seam, callee-saved-490, block-brif-483, r12-spill-496 oracles all green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
A 6-param function that BOTH reads its incoming stack params (4,5) AND passes
them through a 6-arg call — the realistic falcon shape and the only path where
the incoming-offset formula `frame_size+24+(k-4)*4` must skip over the
`outgoing_arg_bytes` region that also lives in `frame_size`. Reads params 4,5
again after the call to confirm the incoming region (above the frame) survives
the inner call's use of the outgoing region (frame bottom). 3 vectors, all
bit-identical to wasmtime.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
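
The layout property this second commit tests can be illustrated with a small model of the two sp-relative regions. It is a sketch under stated assumptions: the names are hypothetical, the 16-byte frame is an invented example value, and the only facts taken from the commit message are the formula `frame_size+24+(k-4)*4`, the outgoing region sitting at the frame bottom, and `outgoing_arg_bytes` being counted inside `frame_size`.

```python
# Hypothetical model of the frame regions; names and the example frame size are assumptions.
WORD = 4
REG_ARGS = 4
INCOMING_BIAS = 24


def outgoing_arg_bytes(max_call_args: int) -> int:
    """Bytes reserved at the frame bottom for stack-passed call arguments."""
    return max(0, max_call_args - REG_ARGS) * WORD


def frame_regions(frame_size: int, num_params: int, max_call_args: int):
    """Return (outgoing, incoming) as ranges of sp-relative byte offsets."""
    out_bytes = outgoing_arg_bytes(max_call_args)
    if frame_size < out_bytes:
        raise ValueError("frame_size must include the outgoing-argument area")
    outgoing = range(0, out_bytes)
    in_base = frame_size + INCOMING_BIAS
    in_bytes = max(0, num_params - REG_ARGS) * WORD
    incoming = range(in_base, in_base + in_bytes)
    return outgoing, incoming


if __name__ == "__main__":
    # 6-param function making a 6-arg call, assumed 16-byte frame.
    outgoing, incoming = frame_regions(frame_size=16, num_params=6, max_call_args=6)
    print(list(outgoing[::WORD]))  # [0, 4]
    print(list(incoming[::WORD]))  # [40, 44]
```

Because the incoming base is `frame_size + 24` and the outgoing area ends at or below `frame_size`, the two ranges cannot overlap in this model, which matches the commit's observation that params 4 and 5 survive the inner call.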
codecov Bot commented Jun 26, 2026

Codecov Report

❌ Patch coverage is 89.28571% with 3 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/synth-synthesis/src/instruction_selector.rs | 89.28% | 3 Missing ⚠️ |


@avrabe avrabe merged commit 8e917a2 into main Jun 26, 2026
21 checks passed
@avrabe avrabe deleted the fix/503-aapcs-stack-arg-cap-lift branch June 26, 2026 05:07
avrabe added a commit that referenced this pull request Jun 26, 2026
… the #509 negative result (#242) (#510)

Appends a dated slice to VCR-ORACLE-001's history covering the 2026-06-26 second
arc: #496 (R12-exhaustion decline-to-direct, PR #502), #503 (AAPCS stack-arg
cap-lift, shipped v0.16.0, PR #504), #507 (optimized-path br_table decline, PR
#508), each with its new CI execution oracle (r12-spill-496, stack-args-503,
br-table-507).

Most importantly it records #509 as a LOAD-BEARING NEGATIVE RESULT so the
decline-to-direct shortcut is not re-attempted: the direct selector (the shipped
--relocatable path) drops the carried value of a value-returning br/br_if/
br_table-direct, and the shortcut cannot fix it because the IR carries no
block-result arity (WasmOp::Block is a unit variant) — a non-empty-vstack decline
can't distinguish a carried result from an unwound void-target value, so it
over-refuses valid code. This is the sharpest evidence yet for VCR-SEL-001's
selector-collapse motivation (the hand-written selectors lack the structured
block-arity model a verified DSL carries by construction); the real fix is
arity-threading, cross-cutting and silicon-gated.

Docs-only, behavior-frozen by construction: frozen byte gate bit-identical;
rivet validate non-xref ERROR count 0 (unchanged 49/75/0 baseline).

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
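
The #509 argument recorded above can be made concrete with a toy model of a Wasm branch. This is an illustrative sketch only: the function names are hypothetical, the value stack is a plain list, and the model covers a single block whose entry stack height is zero. It shows why a rule keyed on a non-empty value stack cannot separate the two cases when the target's result arity is unknown.

```python
# Toy model; names are hypothetical and the IR is reduced to a list of values.
from typing import List


def values_carried_by_branch(vstack: List[int], target_arity: int) -> List[int]:
    """With the target's result arity known, a branch carries the top `target_arity` values."""
    if target_arity > len(vstack):
        raise ValueError("not enough values on the stack for the branch target")
    return vstack[len(vstack) - target_arity:]


def declines_without_arity(vstack: List[int]) -> bool:
    """The shortcut rule: refuse whenever any value is live at the branch."""
    return len(vstack) > 0


if __name__ == "__main__":
    live = [7]
    # Void target: the live value is simply unwound and nothing is carried.
    print(values_carried_by_branch(live, target_arity=0))  # []
    # Result target: the same live value is the carried result.
    print(values_carried_by_branch(live, target_arity=1))  # [7]
    # Without arity, both cases look identical to the shortcut rule.
    print(declines_without_arity(live))                    # True
```

Both branches present the same stack to the shortcut rule, so it must either refuse the valid void-target case or accept the case that drops a result; telling them apart needs the arity that the IR does not carry.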
Development

Successfully merging this pull request may close these issues.

Arg-register lowering drops/shifts the first arg for calls with 5 args + struct(sret) return (cortex-m4f, --native-pointer-abi)
