Skip to content

Commit b8cfd03

Browse files
avrabeclaude
andauthored
fix(#372): i64.load/i64.store correctness + v0.11.47 (#373)
* track(#372): i64.load/i64.store 3-layer gap — repro + numeric oracle + decoder WIP NOT a fix (partial WIP). Diagnosed why falcon loud-skips 39 i64.load/store sites on v0.11.46. It's THREE layers, worse than the issue assumed (gale thought the lowering was ready): L1 DECODER GAP: convert_operator decodes the narrow i64 loads (I64Load8..32) and I32Load/Store but has NO arm for full-width I64Load/I64Store -> _ => None -> dropped -> loud-skipped since v0.11.46 (GI-FPU-001). (decoder arm added here as WIP — INSUFFICIENT alone: see L2.) L2 OPTIMIZED-PATH STUB: with the op decoded, the default optimized path drops it -> stub (ld64 -> `bx lr`, st64 -> `mov r0,r1`). So the decoder arm ALONE is net-negative (turns the honest loud-skip into a silent stub again). Needs an optimizer decline -> direct-selector fallback (mirror #120/#188). L3 ENCODER DROPS THE ADDRESS: arm_encoder.rs:5303 I64Ldr/I64Str use addr.base + addr.offset but IGNORE addr.index (the address register). Emits [R11+offset]/[R11+offset+4], dropping the operand -> reads the WRONG location. Proven numerically: ld64(16) returns mem[0] (0xaa..), not mem[16] (i64_load_store_372_differential.py). Same class as #206 indexed-load drop. Fix (next block): decoder arm + optimizer decline + I64Ldr/I64Str encoder index (materialize ip = addr_reg + offset; ldr/str [R11, ip] {,+4}, like #206), gated on the numeric differential. Frozen fixtures use no i64.load/store. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * fix(#372): lower i64.load/i64.store correctly (decoder + decline + encoder index) falcon loud-skipped 39 i64.load/store sites on v0.11.46. The gap was 3 layers, and the lowering itself was broken (not just unwired): L1 DECODER: convert_operator decoded the narrow i64 loads (I64Load8..32) and I32Load/Store but had NO arm for full-width I64Load/I64Store -> _ => None -> dropped -> loud-skipped. Added the two arms. L2 OPTIMIZER: the default optimized path has no IR opcode for them and dropped them to a stub (ld64 -> `bx lr`). optimize_full now DECLINES i64.load/store -> falls back to the direct selector (the #120/#188 pattern). L3 ENCODER (the real bug): arm_encoder.rs I64Ldr/I64Str used addr.base+offset and IGNORED addr.offset_reg -> emitted [R11+offset], dropping the address. Proven: ld64(16) returned mem[0], not mem[16]. New i64_effective_base() materializes `ADD.W ip, base, index` (byte-verified) then loads/stores via [ip,#off]/[ip,#off+4]; non-indexed frame access (offset_reg=None) is unchanged -> byte-identical. Same class as #206. Verified: ld64(16)=mem[16] and st64 use the address on BOTH the optimized and direct paths (i64_load_store_372_differential.py); frozen oracles byte-identical (control_step 0x00210A55 13/13, flight_seam 0x07FDF307, div_const 338/338); i64-FRAME fixtures byte-identical (u64_unpack); unit test test_372_i64_ldr_indexed_materializes_address; 39 suites green; fmt+clippy clean. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> * chore(release): v0.11.47 — #372 i64.load/store correctness Pin sweep 0.11.46 -> 0.11.47 (workspace + 10 path-deps + MODULE.bazel + Cargo.lock). CHANGELOG v0.11.47 with falsification. rivet GI-MEM-001 -> implemented + GI-MEM-VER-001. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
1 parent 050e077 commit b8cfd03

19 files changed

Lines changed: 257 additions & 46 deletions

File tree

CHANGELOG.md

Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -7,6 +7,34 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
77

88
## [Unreleased]
99

10+
## [0.11.47] - 2026-06-18
11+
12+
**CORRECTNESS — #372: `i64.load`/`i64.store` now lower correctly (they were
13+
dropped, then mis-addressed). Unblocks 39 falcon i64-memory sites.**
14+
15+
- **#372 — full-width i64 load/store** (#373): three independent defects, the
16+
deepest a real miscompile:
17+
- **decoder**`convert_operator` decoded the narrow forms
18+
(`I64Load8..32`, `I64Store8..32`) and `I32Load/Store` but had **no arm for
19+
full-width `I64Load`/`I64Store`**, so they fell through `_ => None`, were
20+
dropped, and (since v0.11.46) loud-skipped. Added the two arms.
21+
- **optimizer** — the optimized path has no IR opcode for them and dropped
22+
them to a stub (`ld64``bx lr`). `optimize_full` now **declines**
23+
`i64.load`/`i64.store` and falls back to the direct selector (the #120/#188
24+
pattern), which lowers them as a lo/hi register pair.
25+
- **encoder (the real bug)**`I64Ldr`/`I64Str` used `addr.base + offset`
26+
and **ignored `addr.offset_reg`**, emitting `[R11 + offset]` and **dropping
27+
the address operand**: `ld64(16)` read `mem[0]`, not `mem[16]` (same class
28+
as #206). The new `i64_effective_base` materializes `ADD.W ip, base, index`
29+
then loads/stores via `[ip, #off]`/`[ip, #off+4]`. Non-indexed (frame) i64
30+
access keeps the plain `[base, #off]` form → byte-identical.
31+
32+
**Falsification:** `ld64(addr)`/`st64(addr)` now read/write `mem[addr]` on
33+
both the optimized and direct paths (`i64_load_store_372_differential.py`);
34+
i64-*frame* access and the three frozen oracles stay bit-identical
35+
(control_step `0x00210A55` 13/13, flight_seam `0x07FDF307`, div_const 338/338;
36+
`u64_unpack` byte-identical). No fixture uses `i64.load`/`i64.store`.
37+
1038
## [0.11.46] - 2026-06-17
1139

1240
**CORRECTNESS — #369: an op the backend cannot lower now LOUD-SKIPS its

Cargo.lock

Lines changed: 17 additions & 17 deletions
Some generated files are not rendered by default. Learn more about customizing how changed files appear on GitHub.

Cargo.toml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -27,7 +27,7 @@ resolver = "2"
2727
# semver to publish, so the convention now catches up: workspace
2828
# version follows the release tag, bumped pre-tag in the release
2929
# checklist. See docs/release-process.md.
30-
version = "0.11.46"
30+
version = "0.11.47"
3131
edition = "2024"
3232
rust-version = "1.88"
3333
authors = ["PulseEngine Team"]

MODULE.bazel

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -7,7 +7,7 @@ module(
77
name = "synth",
88
# Kept in lockstep with [workspace.package] version in Cargo.toml.
99
# Both are bumped pre-tag — see docs/release-process.md.
10-
version = "0.11.46",
10+
version = "0.11.47",
1111
)
1212

1313
# Bazel dependencies

artifacts/gale-integration.yaml

Lines changed: 53 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -695,3 +695,56 @@ artifacts:
695695
pass-criteria: >
696696
cargo test test_369 green; f32 module emits no fadd/dmul symbol;
697697
control_step / flight_seam / div_const differentials byte-identical.
698+
699+
- id: GI-MEM-001
700+
type: sw-req
701+
title: Full-width i64.load/i64.store lower correctly (#372)
702+
description: >
703+
gale/jess #372 (surfaced by the GI-FPU-001 loud-skip): full-width
704+
`i64.load`/`i64.store` were a 3-layer gap blocking 39 falcon sites.
705+
DECODER had no full-width I64Load/I64Store arm (only narrow I64Load8..32),
706+
so they dropped to `_ => None` and loud-skipped. OPTIMIZER had no IR
707+
opcode and stubbed them (`ld64` -> `bx lr`). ENCODER (the real bug)
708+
I64Ldr/I64Str ignored `addr.offset_reg`, emitting `[R11+offset]` and
709+
DROPPING the address (ld64(16) read mem[0], not mem[16] — #206 class).
710+
IMPLEMENTED v0.11.47 (#373): decoder arms + optimizer decline->direct
711+
fallback (#120/#188 pattern) + `i64_effective_base` materializing
712+
`ADD.W ip, base, index` then `[ip,#off]`/`[ip,#off+4]`; non-indexed frame
713+
i64 access unchanged (byte-identical). Verified value-level on both paths.
714+
status: implemented
715+
tags: [gale, jess, i64, memory, encoder, correctness, falcon, release-v0.11.47]
716+
links:
717+
- type: derives-from
718+
target: GI-005
719+
- type: traces-to
720+
target: gale:372
721+
fields:
722+
req-type: functional
723+
priority: must
724+
verification-criteria: >
725+
scripts/repro/i64_load_store_372_differential.py: ld64(16) returns
726+
mem[16] (0x8877665544332211), not mem[0], on BOTH the optimized and
727+
--no-optimize ELFs (unicorn); st64 materializes the index symmetrically;
728+
unit test test_372_i64_ldr_indexed_materializes_address; the three frozen
729+
oracles (control_step 0x00210A55, flight_seam 0x07FDF307, div_const
730+
338/338) and i64-frame fixtures (u64_unpack) stay byte-identical.
731+
732+
- id: GI-MEM-VER-001
733+
type: sw-verification
734+
title: "i64.load/store value-level + frozen byte-identity (#372)"
735+
description: >
736+
Verifies GI-MEM-001 (v0.11.47). Numeric differential
737+
i64_load_store_372_differential.py proves ld64 reads mem[addr] (was mem[0]
738+
pre-fix) on both compile paths; unit test asserts indexed I64Ldr emits the
739+
ADD.W index materialization and non-indexed does not; 39 suites green;
740+
control_step/flight_seam/div_const + u64_unpack byte-identical.
741+
status: implemented
742+
tags: [gale, i64, verification, frozen, release-v0.11.47]
743+
links:
744+
- type: verifies
745+
target: GI-MEM-001
746+
fields:
747+
method: test
748+
pass-criteria: >
749+
ld64(16)=0x8877665544332211 on optimized + direct ELFs; frozen oracles
750+
+ u64_unpack byte-identical; cargo test test_372 green.

crates/synth-backend-awsm/Cargo.toml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,6 @@ categories.workspace = true
1111
description = "aWsm backend integration for the Synth compiler"
1212

1313
[dependencies]
14-
synth-core = { path = "../synth-core", version = "0.11.46" }
14+
synth-core = { path = "../synth-core", version = "0.11.47" }
1515
anyhow.workspace = true
1616
thiserror.workspace = true

crates/synth-backend-riscv/Cargo.toml

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -11,8 +11,8 @@ categories.workspace = true
1111
description = "RISC-V encoder, ELF builder, PMP allocator, and bare-metal startup for synth"
1212

1313
[dependencies]
14-
synth-core = { path = "../synth-core", version = "0.11.46" }
15-
synth-synthesis = { path = "../synth-synthesis", version = "0.11.46" }
14+
synth-core = { path = "../synth-core", version = "0.11.47" }
15+
synth-synthesis = { path = "../synth-synthesis", version = "0.11.47" }
1616
anyhow.workspace = true
1717
thiserror.workspace = true
1818
tracing.workspace = true

crates/synth-backend-wasker/Cargo.toml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,6 @@ categories.workspace = true
1111
description = "Wasker backend integration for the Synth compiler"
1212

1313
[dependencies]
14-
synth-core = { path = "../synth-core", version = "0.11.46" }
14+
synth-core = { path = "../synth-core", version = "0.11.47" }
1515
anyhow.workspace = true
1616
thiserror.workspace = true

crates/synth-backend/Cargo.toml

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -15,7 +15,7 @@ default = ["arm-cortex-m"]
1515
arm-cortex-m = ["synth-synthesis"]
1616

1717
[dependencies]
18-
synth-core = { path = "../synth-core", version = "0.11.46" }
19-
synth-synthesis = { path = "../synth-synthesis", version = "0.11.46", optional = true }
18+
synth-core = { path = "../synth-core", version = "0.11.47" }
19+
synth-synthesis = { path = "../synth-synthesis", version = "0.11.47", optional = true }
2020
anyhow.workspace = true
2121
thiserror.workspace = true

crates/synth-backend/src/arm_encoder.rs

Lines changed: 74 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -5307,10 +5307,20 @@ impl ArmEncoder {
53075307
} else {
53085308
addr.offset as u32
53095309
};
5310-
bytes.extend_from_slice(&self.encode_thumb32_ldr(rdlo, &addr.base, offset)?);
5310+
// #372: a memory `i64.load` carries an index register
5311+
// (`reg_imm(R11, addr_reg, offset)` = R11 + addr + offset). The
5312+
// immediate `encode_thumb32_ldr` below uses only base+offset and
5313+
// would SILENTLY DROP `offset_reg` — the #206 defect, here for
5314+
// i64. Materialize the effective base `ip = base + index` first
5315+
// (ADD.W ip, base, index — byte-verified), then load with
5316+
// immediate offsets. Frame i64 loads (no `offset_reg`, e.g. a
5317+
// spilled local at `[SP, #off]`) keep the plain `[base,#off]`
5318+
// form unchanged — so existing output is byte-identical.
5319+
let base = self.i64_effective_base(&mut bytes, addr);
5320+
bytes.extend_from_slice(&self.encode_thumb32_ldr(rdlo, &base, offset)?);
53115321
bytes.extend_from_slice(&self.encode_thumb32_ldr(
53125322
rdhi,
5313-
&addr.base,
5323+
&base,
53145324
offset.wrapping_add(4),
53155325
)?);
53165326
Ok(bytes)
@@ -5324,10 +5334,12 @@ impl ArmEncoder {
53245334
} else {
53255335
addr.offset as u32
53265336
};
5327-
bytes.extend_from_slice(&self.encode_thumb32_str(rdlo, &addr.base, offset)?);
5337+
// #372: same index-materialization as I64Ldr (see above).
5338+
let base = self.i64_effective_base(&mut bytes, addr);
5339+
bytes.extend_from_slice(&self.encode_thumb32_str(rdlo, &base, offset)?);
53285340
bytes.extend_from_slice(&self.encode_thumb32_str(
53295341
rdhi,
5330-
&addr.base,
5342+
&base,
53315343
offset.wrapping_add(4),
53325344
)?);
53335345
Ok(bytes)
@@ -6355,6 +6367,29 @@ impl ArmEncoder {
63556367
Ok(bytes)
63566368
}
63576369

6370+
/// #372: resolve the base register for an `I64Ldr`/`I64Str` whose address
6371+
/// may carry an index register. If `addr.offset_reg` is set (a memory
6372+
/// `i64.load`/`i64.store`: `R11 + addr + offset`), emit `ADD.W ip, base,
6373+
/// index` and return `ip` (R12) as the base for the two immediate-offset
6374+
/// halves. If unset (a frame access at `[base, #off]`), return `addr.base`
6375+
/// unchanged — emitting nothing — so non-indexed i64 access is byte-identical.
6376+
/// `ip = base + index` is computed BEFORE the halves load, so an `rdlo`
6377+
/// aliasing the index register is safe (the address is already materialized).
6378+
fn i64_effective_base(&self, bytes: &mut Vec<u8>, addr: &MemAddr) -> Reg {
6379+
match addr.offset_reg {
6380+
Some(idx) => {
6381+
let ip = Reg::R12;
6382+
// ADD.W ip, addr.base, idx (Thumb-2, byte-verified vs as)
6383+
let hw1: u16 = 0xEB00 | reg_to_bits(&addr.base) as u16;
6384+
let hw2: u16 = 0x0C00 | reg_to_bits(&idx) as u16;
6385+
bytes.extend_from_slice(&hw1.to_le_bytes());
6386+
bytes.extend_from_slice(&hw2.to_le_bytes());
6387+
ip
6388+
}
6389+
None => addr.base,
6390+
}
6391+
}
6392+
63586393
/// Encode Thumb-2 32-bit LDR
63596394
fn encode_thumb32_ldr(&self, rd: &Reg, base: &Reg, offset: u32) -> Result<Vec<u8>> {
63606395
let rd_bits = reg_to_bits(rd);
@@ -8775,6 +8810,41 @@ mod tests {
87758810
assert!(code.len() >= 4, "I64Ldr should emit at least 4 bytes");
87768811
}
87778812

8813+
#[test]
8814+
fn test_372_i64_ldr_indexed_materializes_address() {
8815+
// #372: a memory i64.load carries an index register (R11 + addr + off).
8816+
// The encoder must materialize `ip = base + index` (ADD.W) and load via
8817+
// `[ip,#off]` — NOT drop the index. A frame (non-indexed) i64.load must
8818+
// stay byte-identical (plain `[base,#off]`, no ADD).
8819+
let encoder = ArmEncoder::new_thumb2();
8820+
let indexed = encoder
8821+
.encode(&ArmOp::I64Ldr {
8822+
rdlo: Reg::R0,
8823+
rdhi: Reg::R1,
8824+
addr: MemAddr::reg_imm(Reg::R11, Reg::R0, 0),
8825+
})
8826+
.unwrap();
8827+
// ADD.W ip, fp, r0 = eb0b 0c00 (byte-verified vs arm-none-eabi-as).
8828+
assert_eq!(
8829+
&indexed[0..4],
8830+
&[0x0b, 0xeb, 0x00, 0x0c],
8831+
"indexed I64Ldr must start with ADD.W ip, base, index"
8832+
);
8833+
let frame = encoder
8834+
.encode(&ArmOp::I64Ldr {
8835+
rdlo: Reg::R0,
8836+
rdhi: Reg::R1,
8837+
addr: MemAddr::imm(Reg::SP, 8),
8838+
})
8839+
.unwrap();
8840+
// No index -> no ADD.W prefix (byte-identical frame access).
8841+
assert_ne!(
8842+
&frame[0..2],
8843+
&[0x0b, 0xeb],
8844+
"frame (non-indexed) I64Ldr must NOT emit an ADD.W"
8845+
);
8846+
}
8847+
87788848
#[test]
87798849
fn test_encode_i64_str_thumb2() {
87808850
let encoder = ArmEncoder::new_thumb2();

0 commit comments

Comments
 (0)