
Commit 2c05dca

avrabe and claude authored
feat(vcr-ra): optimized-path const-CSE behind SYNTH_CONST_CSE (flag-off) (#242) (#514)

The optimized (non-`--relocatable`) ARM path re-materializes a constant at every use: the same `i32.const N` becomes a fresh movw/movt (or mov) each time. On the silicon hot path this is the dominant redundancy class (61% of flat_flight's const materializations target a value already in a register).

Add a pressure-neutral const cache in `ir_to_arm` (optimizer_bridge.rs): when the wanted value already lives in a still-valid register, alias the new vreg to that register and emit NO materialization. Aliasing never adds register pressure: the value is already resident, so it can only SHARE a register, never demand one. The cache (`reg_holds_const`, keyed by u32 bit-pattern so a negative i32 matches its movw/movt reconstruction) is rebuilt from the EMITTED ARM at the top of each lowering step, so it survives the many `continue` arms, and is RESET at every control-flow boundary (an unmodeled `reg_effect` op), confining reuse to straight-line segments.

Byte-CHANGING codegen, so it ships DEFAULT-OFF (`SYNTH_CONST_CSE`):

- OFF ⇒ byte-identical. Gated by const_cse_reduction_242.rs's golden, a pinned FNV-1a of the flag-off optimized-path `.text` for const_cse.wat, captured against the pre-change tree (stash-compare verified equal). The frozen differential fixtures compile `--relocatable` (direct path), so this golden is the ONLY gate pinning optimized-path-OFF bytes.
- ON ⇒ semantics-preserving. New CI oracle const_cse_differential.py executes the flag-on build under unicorn and diffs the returned value vs wasmtime across large/small/negative/mixed consts, reuse across an if/else (cache must reset), and a 12-live-local function that forces real spills.
- ON ⇒ real reduction on headroom: large3 (a >16-bit const reused 3×) is strictly smaller (movw+movt pairs collapse to aliases); inert under register pressure (never a regression).

NOT a default-on flip; that is a separate, silicon-gated step.

Two prerequisites are NAMED in the code and test, not assumed handled:

- reg_effect DEF-COMPLETENESS (broader than the #513 consistency oracle, which only pins that reg_effect and rewrite_op AGREE; they could agree and both under-report a def, leaving a stale alias).
- ALIAS-EVICTION: aliasing makes two live vregs share one register, breaking the spill model's vreg↔reg bijection. Not reachable today (the IR optimizer dedups consecutive identical consts upstream), but the flip must prove unreachability or make the spill path alias-aware.

VCR-RA / epic #242. Behavior frozen on every shipped path.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
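The cache behaviour described in the message can be modelled in isolation. The following is a minimal sketch, not the code from optimizer_bridge.rs: the `Op` enum, the `u8` register type, and the `reconcile`/`holder_of` names are hypothetical stand-ins for `ArmOp`, `Reg`, and the inline logic in `ir_to_arm`. It shows why keying by the u32 bit-pattern lets a negative `i32` match its movw/movt reconstruction, and how a control-flow boundary empties the cache.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the emitted ARM ops the cache inspects.
#[derive(Clone, Debug)]
enum Op {
    Movw { rd: u8, imm16: u16 },
    Movt { rd: u8, imm16: u16 },
    MovImm { rd: u8, imm: u32 },
    /// A modeled non-const op, listing the registers it writes.
    Other { defs: Vec<u8> },
    /// An unmodeled op (branch, call, label): a control-flow boundary.
    Boundary,
}

/// Fold newly emitted ops into the register -> constant cache.
fn reconcile(cache: &mut HashMap<u8, u32>, emitted: &[Op]) {
    for op in emitted {
        match op {
            // movw writes the low half and zeroes the high half.
            Op::Movw { rd, imm16 } => {
                cache.insert(*rd, *imm16 as u32);
            }
            // movt keeps the low half and replaces the high half.
            Op::Movt { rd, imm16 } => {
                let lo = cache.get(rd).copied().unwrap_or(0) & 0xFFFF;
                cache.insert(*rd, lo | ((*imm16 as u32) << 16));
            }
            Op::MovImm { rd, imm } => {
                cache.insert(*rd, *imm);
            }
            // Any other write clobbers whatever constant the register held.
            Op::Other { defs } => {
                for d in defs {
                    cache.remove(d);
                }
            }
            // Nothing can be trusted across a control-flow boundary.
            Op::Boundary => cache.clear(),
        }
    }
}

/// Find a register that currently holds `value`, compared by bit-pattern.
fn holder_of(cache: &HashMap<u8, u32>, value: i32) -> Option<u8> {
    let want = value as u32;
    cache.iter().find_map(|(r, v)| (*v == want).then_some(*r))
}

fn main() {
    let mut cache = HashMap::new();
    // -100000 is 0xFFFE7960: movw r4, #0x7960 then movt r4, #0xFFFE.
    reconcile(
        &mut cache,
        &[Op::Movw { rd: 4, imm16: 0x7960 }, Op::Movt { rd: 4, imm16: 0xFFFE }],
    );
    assert_eq!(holder_of(&cache, -100000), Some(4));

    // A write to r4 by a non-const op evicts the entry.
    reconcile(&mut cache, &[Op::Other { defs: vec![4] }]);
    assert_eq!(holder_of(&cache, -100000), None);

    // A boundary clears every entry, confining reuse to straight-line code.
    reconcile(&mut cache, &[Op::MovImm { rd: 2, imm: 200 }, Op::Boundary]);
    assert!(cache.is_empty());
}
```

In this model a cache hit costs nothing, which is the sense in which the message calls aliasing pressure-neutral. For `large3`, two of the three `i32.const 100000` uses would hit, removing two movw(4)+movt(4) pairs, about 16 bytes.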
1 parent cd70aab commit 2c05dca

5 files changed

Lines changed: 496 additions & 0 deletions

.github/workflows/ci.yml

Lines changed: 38 additions & 0 deletions
@@ -536,3 +536,41 @@ jobs:
         env:
           SYNTH: ./target/debug/synth
         run: python scripts/repro/br_table_507_differential.py
+
+  const-cse-242-oracle:
+    name: const-CSE flag-on execution oracle
+    # VCR-RA const-CSE (#242): EXECUTE the optimized path compiled with
+    # SYNTH_CONST_CSE=1 under unicorn (UC_ARCH_ARM / Thumb) and diff the returned
+    # value vs wasmtime across redundant-const shapes (large/small/negative/mixed
+    # consts, reuse ACROSS an if/else where the cache must reset, and a
+    # 12-live-local function that forces real spills). The const cache aliases a
+    # repeated const to the register already holding it; this proves the aliasing
+    # is semantics-preserving on the flag-ON path. The flag ships DEFAULT-OFF
+    # (off ⇒ byte-identical, pinned by const_cse_reduction_242.rs's golden), so
+    # nothing else exercises the optimized-path const cache. Isolated job:
+    # emulation deps pip-installed here ONLY.
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v7
+      - uses: dtolnay/rust-toolchain@stable
+      - name: Cache Cargo dependencies
+        uses: actions/cache@v5
+        with:
+          path: |
+            ~/.cargo/registry
+            ~/.cargo/git
+            target/
+          key: ${{ runner.os }}-cargo-${{ hashFiles('**/Cargo.lock') }}
+          restore-keys: |
+            ${{ runner.os }}-cargo-
+      - name: Build synth
+        run: cargo build -p synth-cli
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.x"
+      - name: Install emulation deps
+        run: pip install wasmtime unicorn pyelftools
+      - name: Run const-CSE flag-on oracle
+        env:
+          SYNTH: ./target/debug/synth
+        run: python scripts/repro/const_cse_differential.py
Lines changed: 170 additions & 0 deletions
@@ -0,0 +1,170 @@
+//! VCR-RA const-CSE (#242) — CI-gated reduction + frozen-safety oracle.
+//!
+//! The optimized (non-`--relocatable`) ARM path re-materializes a constant at
+//! every use (`i32.const N` → a fresh `movw`/`movt` each time). gale measured
+//! this on silicon: flat_flight spends 61% of its const materializations on
+//! values already held in a register. `SYNTH_CONST_CSE=1` enables a
+//! pressure-neutral const cache (`optimizer_bridge.rs`) that aliases a repeated
+//! const to the register already holding it, emitting nothing.
+//!
+//! This is byte-CHANGING codegen, so the flag ships DEFAULT-OFF. Two claims are
+//! locked here as executable CI gates; semantic equivalence under flag-ON is the
+//! separate `const_cse_differential.py` unicorn-vs-wasmtime oracle:
+//!
+//! 1. FROZEN-SAFE (OFF ≡ pre-change baseline). With the flag OFF the optimized
+//!    path emits a SPECIFIC, pinned `.text` — the golden below was captured
+//!    against the pre-const-CSE tree (verified equal by a `git stash` compare:
+//!    post-change-OFF hash == pre-change hash, both `8c3dfcbb…`). The frozen
+//!    differential fixtures (control_step / flight_algo) compile `--relocatable`
+//!    → they exercise the DIRECT path and never touch this code, so this golden
+//!    is the ONLY gate pinning optimized-path-OFF bytes. A golden, not a
+//!    compile-twice determinism check: determinism alone would not catch the
+//!    flag-off path drifting away from the byte-identical baseline.
+//!
+//! 2. REAL REDUCTION ON HEADROOM. On `large3` — a >16-bit const reused 3× with
+//!    ample free registers — the flag-ON `.text` is STRICTLY SMALLER (the two
+//!    redundant `movw`+`movt` pairs collapse to register aliases). If a future
+//!    change makes CSE inert on headroom, this fails.
+//!
+//! WHAT THIS DOES NOT CLAIM — the named prerequisites for a default-ON flip
+//! (a separate, silicon-gated release, NOT this PR):
+//! - reg_effect DEF-COMPLETENESS. The cache's "never stale-wrong" property
+//!   rests on `liveness::reg_effect` reporting EVERY GP-register a non-const op
+//!   writes (so the reconciliation clears a clobbered alias). That is a broader
+//!   property than the #513 reg_effect↔rewrite_op *consistency* oracle, which
+//!   only pins that the two AGREE — they could agree and both under-report. The
+//!   flip must be gated on op-coverage of reg_effect, not on #513.
+//! - ALIAS-EVICTION. Aliasing `dest` to an existing register makes two live
+//!   vregs share one physical register, breaking the spill model's vreg↔reg
+//!   bijection. If the OLDER alias is chosen as a spill victim while the
+//!   younger keeps the alias, the freed register is reused under the younger →
+//!   stale read. Not reachable in today's fixtures (the IR optimizer dedups
+//!   two consecutive identical consts before they reach this pass), but the
+//!   flip must either prove unreachability or make the spill path alias-aware.
+
+use std::collections::HashMap;
+use std::path::Path;
+use std::process::Command;
+
+use object::read::elf::ElfFile32;
+use object::{Object, ObjectSection};
+
+/// Golden FNV-1a-64 of the flag-OFF optimized-path `.text` for `const_cse.wat`.
+/// Captured against the pre-const-CSE tree (stash-compare verified). Re-bless
+/// ONLY when an intentional optimized-path lowering change is made — a surprise
+/// failure here means the supposedly-frozen flag-off path drifted.
+const GOLDEN_OFF_TEXT_FNV1A: u64 = 0xa68a_a2da_e5af_e4a7;
+const GOLDEN_OFF_TEXT_LEN: usize = 576;
+
+fn synth() -> &'static str {
+    env!("CARGO_BIN_EXE_synth")
+}
+
+fn fixture() -> std::path::PathBuf {
+    Path::new(env!("CARGO_MANIFEST_DIR"))
+        .join("../..")
+        .join("scripts/repro/const_cse.wat")
+}
+
+/// Compile the const-CSE fixture via the optimized path. `cse` toggles
+/// `SYNTH_CONST_CSE`; returns the raw ELF bytes.
+fn compile(out: &str, cse: bool) -> Vec<u8> {
+    let mut cmd = Command::new(synth());
+    if cse {
+        cmd.env("SYNTH_CONST_CSE", "1");
+    }
+    let status = cmd
+        .args([
+            "compile",
+            fixture().to_str().unwrap(),
+            "-o",
+            out,
+            "-b",
+            "arm",
+            "--target",
+            "cortex-m4",
+            "--all-exports",
+        ])
+        .status()
+        .expect("run synth compile");
+    assert!(status.success(), "synth compile failed (cse={cse})");
+    std::fs::read(out).expect("read ELF")
+}
+
+/// Map every named section to its bytes.
+fn sections(elf: &[u8]) -> HashMap<String, Vec<u8>> {
+    let obj = ElfFile32::<object::Endianness>::parse(elf).expect("parse ELF");
+    let mut out = HashMap::new();
+    for sec in obj.sections() {
+        if let Ok(name) = sec.name()
+            && !name.is_empty()
+        {
+            out.insert(name.to_string(), sec.data().unwrap_or(&[]).to_vec());
+        }
+    }
+    out
+}
+
+/// `.text` bytes of one named function, by reading its symbol size.
+fn func_text_len(elf: &[u8], name: &str) -> usize {
+    use object::ObjectSymbol;
+    let obj = ElfFile32::<object::Endianness>::parse(elf).expect("parse ELF");
+    for sym in obj.symbols() {
+        if sym.name() == Ok(name) {
+            return sym.size() as usize;
+        }
+    }
+    panic!("symbol {name} not found");
+}
+
+fn fnv1a64(bytes: &[u8]) -> u64 {
+    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
+    for &b in bytes {
+        h ^= b as u64;
+        h = h.wrapping_mul(0x0000_0100_0000_01b3);
+    }
+    h
+}
+
+/// CLAIM 1 — flag OFF emits the pinned, pre-change-identical `.text`.
+#[test]
+fn const_cse_off_matches_frozen_baseline_242() {
+    let off = compile("/tmp/const_cse_off.elf", false);
+    let text = sections(&off).remove(".text").expect(".text present");
+    assert_eq!(
+        text.len(),
+        GOLDEN_OFF_TEXT_LEN,
+        "flag-off .text length drifted from the frozen baseline"
+    );
+    assert_eq!(
+        fnv1a64(&text),
+        GOLDEN_OFF_TEXT_FNV1A,
+        "flag-off optimized-path .text drifted from the pre-const-CSE baseline \
+         — the default-off path is supposed to be byte-identical; re-bless the \
+         golden ONLY if this was an intentional optimized-path lowering change"
+    );
+}
+
+/// CLAIM 2 — flag ON strictly shrinks `large3` (a >16-bit const reused 3× with
+/// register headroom): the redundant movw+movt pairs become register aliases.
+#[test]
+fn const_cse_on_shrinks_headroom_function_242() {
+    let off = compile("/tmp/const_cse_red_off.elf", false);
+    let on = compile("/tmp/const_cse_red_on.elf", true);
+
+    let off_len = func_text_len(&off, "large3");
+    let on_len = func_text_len(&on, "large3");
+
+    assert!(
+        on_len < off_len,
+        "const-CSE must shrink large3 on headroom: off={off_len}B on={on_len}B"
+    );
+
+    // Each eliminated `i32.const 100000` removes a movw(4)+movt(4) = 8 bytes;
+    // two of the three are redundant, so expect ~16 bytes saved.
+    assert!(
+        off_len - on_len >= 8,
+        "expected ≥8B saved (≥1 movw+movt pair), got {}B",
+        off_len - on_len
+    );
+}
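The golden gate above is only as trustworthy as the hash it pins. As a quick sanity sketch (not part of the commit), the same `fnv1a64` routine can be checked against two widely published FNV-1a 64-bit test vectors: the empty input, which must return the offset basis, and the single byte `"a"`. The function body is copied from the test file; the `main` harness is an illustrative assumption.

```rust
/// FNV-1a, 64-bit: XOR each byte into the state, then multiply by the prime.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325; // offset basis
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
    }
    h
}

fn main() {
    // Empty input leaves the offset basis untouched.
    assert_eq!(fnv1a64(b""), 0xcbf2_9ce4_8422_2325);
    // Published FNV-1a 64-bit vector for the one-byte input "a".
    assert_eq!(fnv1a64(b"a"), 0xaf63_dc4c_8601_ec8c);
    // Order matters (XOR before multiply), so swapped bytes hash differently.
    assert_ne!(fnv1a64(b"ab"), fnv1a64(b"ba"));
    println!("fnv1a64 matches the reference vectors");
}
```

Pinning the length next to the hash, as the test does with `GOLDEN_OFF_TEXT_LEN`, gives a more readable failure when the `.text` size changes, since the length assertion fires first.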

crates/synth-synthesis/src/optimizer_bridge.rs

Lines changed: 73 additions & 0 deletions
@@ -2983,8 +2983,54 @@ impl OptimizerBridge {
             });
         }
 
+        // VCR-RA const-CSE (#242): when the same constant is already
+        // materialized in a still-valid register, alias the new vreg to that
+        // register and emit NO materialization. Pressure-neutral-or-better — the
+        // value is already in a register, so aliasing never adds register
+        // pressure (it can only share one). `reg_holds_const` (keyed by the
+        // u32 bit-pattern, so a negative i32 const matches its movw/movt
+        // reconstruction) is derived from the EMITTED ARM at the top of each
+        // iteration — so it survives the many `continue` arms — and is reset at
+        // every control-flow boundary, confining CSE to straight-line segments.
+        // Opt-in `SYNTH_CONST_CSE=1`; off ⇒ byte-identical (no new state read).
+        let const_cse = std::env::var("SYNTH_CONST_CSE").is_ok();
+        let mut reg_holds_const: HashMap<Reg, u32> = HashMap::new();
+        let mut cse_seen_len = 0usize;
+
         // Second pass: generate ARM instructions
         for inst in instructions {
+            if const_cse {
+                // Reconcile the cache with everything emitted since last iteration.
+                for op in &arm_instrs[cse_seen_len..] {
+                    match op {
+                        ArmOp::Movw { rd, imm16 } => {
+                            reg_holds_const.insert(*rd, *imm16 as u32);
+                        }
+                        ArmOp::Movt { rd, imm16 } => {
+                            let lo = reg_holds_const.get(rd).copied().unwrap_or(0) & 0xFFFF;
+                            reg_holds_const.insert(*rd, lo | ((*imm16 as u32) << 16));
+                        }
+                        ArmOp::Mov {
+                            rd,
+                            op2: Operand2::Imm(v),
+                        } => {
+                            reg_holds_const.insert(*rd, *v as u32);
+                        }
+                        other => match crate::liveness::reg_effect(other) {
+                            // Any other write clobbers the const in its dest reg(s).
+                            Some(eff) => {
+                                for d in eff.defs {
+                                    reg_holds_const.remove(&d);
+                                }
+                            }
+                            // Unmodeled op (branch/bl/bx/label) = control-flow
+                            // boundary: the cache cannot be trusted across it.
+                            None => reg_holds_const.clear(),
+                        },
+                    }
+                }
+                cse_seen_len = arm_instrs.len();
+            }
             match &inst.opcode {
                 Opcode::Nop => continue,

@@ -3073,6 +3119,33 @@ impl OptimizerBridge {
                     {
                         continue;
                     }
+                    // const-CSE: if this exact value already lives in a register
+                    // (and `dest` isn't a pre-assigned local/param), alias `dest`
+                    // to it and emit nothing. The aliased register is protected
+                    // from reuse while `dest` is live (it's in `vreg_to_arm`).
+                    //
+                    // DEFAULT-OFF prerequisite for the flip (see
+                    // const_cse_reduction_242.rs): aliasing makes two live vregs
+                    // share one physical register, breaking the spill model's
+                    // vreg↔reg bijection (the eviction path at ~3168 spills a
+                    // SINGLE victim vreg). If the older alias is spilled while the
+                    // younger keeps the alias, the freed register is reused under
+                    // the younger → stale read. Not reachable today (the IR
+                    // optimizer dedups consecutive identical consts upstream), but
+                    // default-on must prove unreachability or make the spill path
+                    // alias-aware (spill ALL vregs sharing the victim register).
+                    let cse_want = *value as u32;
+                    if const_cse
+                        && !vreg_to_arm.contains_key(&dest.0)
+                        && !local_vregs.contains(&dest.0)
+                        && let Some(rs) = reg_holds_const
+                            .iter()
+                            .find_map(|(r, v)| if *v == cse_want { Some(*r) } else { None })
+                    {
+                        vreg_to_arm.insert(dest.0, rs);
+                        vreg_alloc_order.push(dest.0);
+                        continue;
+                    }
                     // Allocate a register for this constant
                     let rd = if let Some(&r) = vreg_to_arm.get(&dest.0) {
                         r
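The ALIAS-EVICTION prerequisite named in the comment above can be illustrated with a small model. This is a sketch under stated assumptions, not the spill path in optimizer_bridge.rs: the map type `HashMap<u32, u8>` and the functions `spill_single`, `spill_alias_aware`, and `reg_still_referenced` are hypothetical. It contrasts a single-victim spill, which leaves the younger alias pointing at a freed register, with the alias-aware variant the comment proposes (spill all vregs sharing the victim register).

```rust
use std::collections::HashMap;

/// Single-victim spill: drop one vreg's mapping and hand its register back.
fn spill_single(vreg_to_reg: &mut HashMap<u32, u8>, victim: u32) -> Option<u8> {
    vreg_to_reg.remove(&victim)
}

/// Alias-aware spill: drop every vreg that shares the victim's register.
/// Returns the evicted vregs in ascending order.
fn spill_alias_aware(vreg_to_reg: &mut HashMap<u32, u8>, victim: u32) -> Vec<u32> {
    let Some(&reg) = vreg_to_reg.get(&victim) else {
        return Vec::new();
    };
    let mut sharing: Vec<u32> = vreg_to_reg
        .iter()
        .filter(|(_, r)| **r == reg)
        .map(|(v, _)| *v)
        .collect();
    sharing.sort_unstable();
    for v in &sharing {
        vreg_to_reg.remove(v);
    }
    sharing
}

/// True if some live vreg still believes its value is in `reg`.
fn reg_still_referenced(vreg_to_reg: &HashMap<u32, u8>, reg: u8) -> bool {
    vreg_to_reg.values().any(|r| *r == reg)
}

fn main() {
    // vreg 1 materialized a const in r4; vreg 2 was aliased to r4 by const-CSE.
    let aliased = HashMap::from([(1u32, 4u8), (2, 4), (3, 5)]);

    // Hazard: spilling only vreg 1 "frees" r4 while vreg 2 still maps to it.
    let mut m = aliased.clone();
    assert_eq!(spill_single(&mut m, 1), Some(4));
    assert!(reg_still_referenced(&m, 4)); // a later reuse of r4 would be a stale read

    // Alias-aware eviction removes both sharers, so r4 is genuinely free.
    let mut m = aliased.clone();
    assert_eq!(spill_alias_aware(&mut m, 1), vec![1, 2]);
    assert!(!reg_still_referenced(&m, 4));
}
```

Whether the real fix takes this form or instead proves the aliased case unreachable is left open by the commit; the model only makes the broken vreg↔reg bijection concrete.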

scripts/repro/const_cse.wat

Lines changed: 62 additions & 0 deletions
@@ -0,0 +1,62 @@
+;; VCR-RA const-CSE (#242) execution-oracle fixture.
+;;
+;; Each export materializes the SAME constant several times in a straight-line
+;; segment — exactly the redundant-const-materialization pattern gale measured
+;; on silicon (flat_flight: 61% of materializations redundant). With
+;; SYNTH_CONST_CSE=1 the optimized path aliases the repeated const to the
+;; register that already holds it and emits no second movw/movt. This harness
+;; proves that aliasing is SEMANTICS-PRESERVING (result bit-identical to
+;; wasmtime) across:
+;;   - large3 : a >16-bit const (movw+movt) reused 3×
+;;   - small3 : a <16-bit const (single mov) reused 3×
+;;   - neg    : a negative const (sign-extended bit-pattern) reused 2×
+;;   - mixed  : the same const interleaved with other consts/ops (xor/mul)
+;;   - ctrl   : a const reused ACROSS an if/else — the cache must RESET at the
+;;              control-flow boundary, so the post-merge use re-materializes;
+;;              a stale alias here would read a register the other arm clobbered
+;;   - spill12: 12 simultaneously-live locals each = param+100000 → forces real
+;;              register spills; proves the const cache stays correct when the
+;;              aliased register is itself spilled (STR is a use, not a def, so
+;;              the cache entry survives the spill and still names the live value)
+(module
+  (func (export "large3") (param i32) (result i32)
+    local.get 0 i32.const 100000 i32.add i32.const 100000 i32.add i32.const 100000 i32.add)
+
+  (func (export "small3") (param i32) (result i32)
+    local.get 0 i32.const 200 i32.add i32.const 200 i32.add i32.const 200 i32.add)
+
+  (func (export "neg") (param i32) (result i32)
+    local.get 0 i32.const -100000 i32.add i32.const -100000 i32.add)
+
+  (func (export "mixed") (param i32) (result i32)
+    local.get 0 i32.const 70000 i32.mul i32.const 70000 i32.add
+    i32.const 12345 i32.xor i32.const 70000 i32.add)
+
+  (func (export "ctrl") (param i32) (result i32)
+    (local.get 0) (i32.const 50000) (i32.add)
+    (local.get 0)
+    (if (result i32) (then (i32.const 50000)) (else (i32.const 60000)))
+    (i32.add)
+    (i32.const 50000) (i32.add))
+
+  (func (export "spill12") (param i32) (result i32)
+    (local $v0 i32) (local $v1 i32) (local $v2 i32) (local $v3 i32)
+    (local $v4 i32) (local $v5 i32) (local $v6 i32) (local $v7 i32)
+    (local $v8 i32) (local $v9 i32) (local $v10 i32) (local $v11 i32)
+    (local.set $v0 (i32.add (local.get 0) (i32.const 100000)))
+    (local.set $v1 (i32.add (local.get 0) (i32.const 100000)))
+    (local.set $v2 (i32.add (local.get 0) (i32.const 100000)))
+    (local.set $v3 (i32.add (local.get 0) (i32.const 100000)))
+    (local.set $v4 (i32.add (local.get 0) (i32.const 100000)))
+    (local.set $v5 (i32.add (local.get 0) (i32.const 100000)))
+    (local.set $v6 (i32.add (local.get 0) (i32.const 100000)))
+    (local.set $v7 (i32.add (local.get 0) (i32.const 100000)))
+    (local.set $v8 (i32.add (local.get 0) (i32.const 100000)))
+    (local.set $v9 (i32.add (local.get 0) (i32.const 100000)))
+    (local.set $v10 (i32.add (local.get 0) (i32.const 100000)))
+    (local.set $v11 (i32.add (local.get 0) (i32.const 100000)))
+    (local.get $v0) (local.get $v1) (i32.add) (local.get $v2) (i32.add)
+    (local.get $v3) (i32.add) (local.get $v4) (i32.add) (local.get $v5) (i32.add)
+    (local.get $v6) (i32.add) (local.get $v7) (i32.add) (local.get $v8) (i32.add)
+    (local.get $v9) (i32.add) (local.get $v10) (i32.add) (local.get $v11) (i32.add)
+    (i32.const 100000) (i32.add)))
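For reference, the values a differential oracle should observe from this fixture follow directly from Wasm `i32` semantics (wrapping two's-complement arithmetic). The sketch below mirrors each export as a plain Rust function; the Rust function names match the export names, but the functions themselves are illustrative and not part of the commit or of const_cse_differential.py.

```rust
// Reference models of the fixture exports, using wrapping i32 arithmetic.

fn large3(x: i32) -> i32 {
    x.wrapping_add(100000).wrapping_add(100000).wrapping_add(100000)
}

fn small3(x: i32) -> i32 {
    x.wrapping_add(200).wrapping_add(200).wrapping_add(200)
}

fn neg(x: i32) -> i32 {
    x.wrapping_add(-100000).wrapping_add(-100000)
}

fn mixed(x: i32) -> i32 {
    let t = x.wrapping_mul(70000).wrapping_add(70000);
    (t ^ 12345).wrapping_add(70000)
}

fn ctrl(x: i32) -> i32 {
    let a = x.wrapping_add(50000);
    // Wasm `if` takes the then-arm for any non-zero condition.
    let b = if x != 0 { 50000 } else { 60000 };
    a.wrapping_add(b).wrapping_add(50000)
}

fn spill12(x: i32) -> i32 {
    // Twelve locals, each param + 100000, summed, plus one final 100000.
    let v = x.wrapping_add(100000);
    let mut acc = v;
    for _ in 1..12 {
        acc = acc.wrapping_add(v);
    }
    acc.wrapping_add(100000)
}

fn main() {
    assert_eq!(large3(1), 300001);
    assert_eq!(small3(0), 600);
    assert_eq!(neg(0), -200000);
    assert_eq!(mixed(0), 144057);
    assert_eq!(ctrl(0), 160000); // else-arm: 50000 + 60000 + 50000
    assert_eq!(ctrl(1), 150001); // then-arm: 1 + 50000 + 50000 + 50000
    assert_eq!(spill12(0), 1300000);
    // Overflow wraps rather than trapping.
    assert_eq!(large3(i32::MAX), -2147183649);
}
```

The two `ctrl` cases are the interesting ones: each arm of the `if` yields a different constant, so a cache that failed to reset at the boundary would be expected to show up as a wrong sum on at least one of them.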
