test(vcr-oracle): estimator↔encoder agreement oracle for the optimized path (#498, #242) #511
Merged
…d path (#498, #242)

VCR-ORACLE-001. The optimized ARM path (`ir_to_arm`) resolves branch displacements by summing a hand-maintained byte-size *estimator* over the instruction stream. The estimator is a mirror of the Thumb-2 encoder, kept by hand only because synth-synthesis cannot depend on synth-backend (the encoder lives downstream). When the mirror drifts, a forward branch spanning the drifting op lands at the wrong byte (the #483-class miscompile). This is the structural cause behind #498.

- Extract the inline `instr_byte_size`/`reg_num` closures from `ir_to_arm` to a module-level `pub fn estimate_arm_byte_size(op: &ArmOp) -> usize` (logic-identical; whitespace-normalized diff of the match body is empty). Frozen byte gate + 59 synthesis tests confirm the optimized path is bit-identical.
- Add `crates/synth-backend/tests/estimator_encoder_agreement.rs` (synth-backend CAN see both the estimator and the real encoder): for every op the optimized path emits, at the operand shapes it emits them in, assert `estimate_arm_byte_size(op) == ArmEncoder::encode(op).len()` OR a documented `KNOWN_GAP` pinned to its exact measured (est, enc) pair. A no-wildcard `coverage()` match over all 220 `ArmOp` variants is a compile-time tripwire: a new variant won't compile until consciously classified OnPath/OffPath. (It forces classification, NOT an agreement case: an OnPath variant with no `cases()` entry still passes vacuously; adding it is a documented manual step.)

Scope: a gap-documenting REGRESSION GUARD, NOT a #498 fix. Correcting the estimator is byte-changing codegen (separately gated).

Findings the oracle records (correct + extend #498's report):
- #498's claim that `Cmp` high-reg drifts is FALSE: 16-bit CMP (T2, 0x45xx) encodes high regs → 2 bytes. The real high-reg drifts are `Cmn`/`Adds`/`Subs` (no 16-bit high-reg / flag-setting form) → 4, est 2.
- `Popcnt` is absent from the estimator entirely (`_ => 2`) but the encoder expands it to 86 bytes: an 84-byte hole, the largest single drift.
- `I64DivU/RemU/DivS/RemS`, `I64Popcnt`, `I64Extend32S` over-estimate.
- far `BOffset`/`BCondOffset` need the 4-byte form but the estimator sizes the pre-resolution 0-offset placeholder as 2 (single-pass chicken-and-egg).
- `Mov` small-negative imm: encoder's signed `imm <= 255` test emits a wrong-value 2-byte `MOVS #(imm&0xFF)`, a latent encoder bug, surfaced here.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
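The shape of the oracle described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in `estimator_encoder_agreement.rs`: `ToyOp`, `Coverage`, `estimate`, `encode`, and `check` are hypothetical stand-ins for the real `ArmOp`, `estimate_arm_byte_size`, and `ArmEncoder::encode`, and only the (est, enc) pairs for `Popcnt` and high-register `Adds` are taken from the findings above.

```rust
// Toy stand-ins (assumptions): the real `ArmOp` has 220 variants.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum ToyOp {
    MovLow,   // 16-bit form; estimator and encoder agree
    AddsHigh, // flag-setting add on a high register; encoder needs 4 bytes
    Popcnt,   // multi-instruction expansion the estimator does not know
    Nop,      // assumed never emitted by the optimized path
}

#[derive(Debug, PartialEq)]
pub enum Coverage {
    OnPath,
    OffPath,
}

/// No wildcard arm: adding a variant to `ToyOp` fails to compile
/// until it is classified here (the tripwire).
pub fn coverage(op: ToyOp) -> Coverage {
    match op {
        ToyOp::MovLow | ToyOp::AddsHigh | ToyOp::Popcnt => Coverage::OnPath,
        ToyOp::Nop => Coverage::OffPath,
    }
}

/// Hand-maintained size table (the "mirror") with a catch-all default.
pub fn estimate(op: ToyOp) -> usize {
    match op {
        ToyOp::MovLow => 2,
        _ => 2,
    }
}

/// Ground truth: the bytes the encoder emits (contents are placeholders,
/// only the length matters for this sketch).
pub fn encode(op: ToyOp) -> Vec<u8> {
    let len = match op {
        ToyOp::MovLow | ToyOp::Nop => 2,
        ToyOp::AddsHigh => 4,
        ToyOp::Popcnt => 86,
    };
    vec![0u8; len]
}

/// Documented drifts, each pinned to its exact measured (est, enc) pair.
pub const KNOWN_GAP: &[(ToyOp, usize, usize)] = &[
    (ToyOp::AddsHigh, 2, 4),
    (ToyOp::Popcnt, 2, 86),
];

/// Passes when sizes agree or match a pinned gap exactly; otherwise reports
/// which of the two failure paths was hit.
pub fn check(op: ToyOp) -> Result<(), String> {
    let (est, enc) = (estimate(op), encode(op).len());
    match KNOWN_GAP.iter().find(|(g, _, _)| *g == op) {
        Some(&(_, g_est, g_enc)) if (g_est, g_enc) == (est, enc) => Ok(()),
        Some(_) => Err(format!("KNOWN_GAP CHANGED: {op:?} est={est} enc={enc}")),
        None if est == enc => Ok(()),
        None => Err(format!("NEW DRIFT: {op:?} est={est} enc={enc}")),
    }
}
```

Pinning the exact pair, rather than merely allowing a mismatch, means that a later change to either side of a known gap still fails the check and has to be re-examined.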
avrabe added a commit that referenced this pull request on Jun 26, 2026
…n estimator (#511 follow-on, #242) (#512)

The `#[cfg(test)]` byte-counting helper `count_arm_byte_size` was a hand-maintained mirror of the optimized-path size table: a drifted copy with its own `_ => 4` default and only a partial op set. PR #511 extracted that table to `estimate_arm_byte_size` AND established the real independent check (the `estimator_encoder_agreement` oracle, which pins the table against the actual Thumb-2 encoder, the ground truth). With the encoder as the independent oracle, the local hand-drifted proxy is redundant.

Replace its body with `arm.iter().map(estimate_arm_byte_size).sum()` and delete the now-unused `reg_idx` test helper (−43 lines).

The three byte-count tests (`test_issue94_*`) assert `bytes < 30` on POST-optimization sequences (Mov/Movw/Asr, all ≤4 in both tables) plus direct structural checks (`!has_runtime_shift`, `asr_count == 1`); the production estimator's `_ => 2` default yields counts ≤ the old proxy, so every assertion still holds.

Test-only: production codegen is untouched. `estimate_arm_byte_size` is unchanged; only the test helper's body is replaced (frozen-by-construction). Whole synth-synthesis suite green (463 lib tests); no unused-symbol warnings (confirms `reg_idx` had no other consumer).

Scope: test consolidation, not a codegen change.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
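A rough sketch of the consolidated helper, assuming a toy op type: `SeqOp`, `estimate_seq_byte_size`, and `count_seq_byte_size` are hypothetical names, and the per-op sizes are illustrative values within the "≤4" bound stated above rather than the project's table.

```rust
// Hypothetical stand-in for `ArmOp`, limited to the ops the tests mention.
#[derive(Debug, Clone, Copy)]
pub enum SeqOp {
    Mov,
    Movw,
    Asr,
    Other,
}

/// Stand-in for the single production size table (sizes are assumptions).
pub fn estimate_seq_byte_size(op: &SeqOp) -> usize {
    match op {
        SeqOp::Movw => 4,
        SeqOp::Asr => 4,
        _ => 2, // catch-all default, as in the production estimator
    }
}

/// The test helper no longer carries its own table; it only sums the
/// production estimator over the sequence.
pub fn count_seq_byte_size(arm: &[SeqOp]) -> usize {
    arm.iter().map(estimate_seq_byte_size).sum()
}
```

With a single table there is no second copy left to drift, so a `bytes < 30` style assertion measures the same sizes the branch resolver uses.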
avrabe added a commit that referenced this pull request on Jun 26, 2026
… (#513)

The register allocator reads each op's def/use classification two ways that MUST agree: `reg_effect` (liveness: which registers an op defines vs uses) and `rewrite_op` (renaming: which fields it rewrites through the def-map vs the use-map). If they drift (an op edited in one but not the other, or a new op modeled inconsistently), the allocator renames a def as a use and silently miscompiles. liveness.rs is the actively-churned heart of VCR-RA and nothing pinned this invariant. This is the Track-A (allocator) analogue of the #511 Track-B (encoder) agreement oracle.

There is no third ground truth here, so the achievable invariant is mutual CONSISTENCY, checked structurally without a register extractor: build the def/use maps FROM `reg_effect`'s classification (def regs → a def sentinel, use regs → a use sentinel, read-modify-write regs → one shared sentinel so `rewrite_op` doesn't decline), apply `rewrite_op`, then read the result back with `reg_effect`. If the two agree on every field, every register is rerouted to a sentinel; a SURVIVING original register means `rewrite_op` routed a field through the opposite map, which is the drift.

What the oracle pins, for all 55 modeled ops:
- the def/use ROLE of every field (survivor check), and
- the read-modify-write PROPERTY of dual-role fields (Movt/MovtSym/SelectMove `rd`): a register `reg_effect` reports in both defs and uses must make `rewrite_op` DECLINE when the two maps disagree on it; otherwise the shared sentinel would mask a drift that turned the RMW field def-only or use-only.
- `is_modeled`: a no-wildcard match over all 220 `ArmOp` variants. A new variant won't compile until classified (the tripwire; it already caught `B` and `Nop` being mis-bucketed during authoring). The modeled (true) side is exhaustive (careful 55-variant extraction, all constructed + checked); the unmodeled (false) side is spot-sampled.

Scope: a regression GUARD, not a bug fix. The classification AGREES for every modeled op today (measured exhaustively). Test-only; no production code changes. Negative tests confirmed non-vacuous on BOTH branches: misrouting one op's `rd` (def→use) trips the survivor check; dropping `Movt`'s RMW decline trips the RMW check. 464 synth-synthesis lib tests green.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
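The sentinel round trip described in this commit can be sketched on a two-op toy model. Everything here is an assumption made for illustration: `MiniOp`, `Effect`, the sentinel values, the signatures of `reg_effect` and `rewrite_op`, and the deliberately broken `rewrite_op_drifted` are not the project's definitions.

```rust
use std::collections::HashMap;

pub type Reg = u8;
pub type Map = HashMap<Reg, Reg>;

// Sentinels must not collide with any register used in the op under test.
pub const DEF_SENTINEL: Reg = 200;
pub const USE_SENTINEL: Reg = 201;
pub const RMW_SENTINEL: Reg = 202;

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum MiniOp {
    Add { rd: Reg, rn: Reg, rm: Reg }, // rd is a def; rn and rm are uses
    Movt { rd: Reg },                  // rd is read-modify-write
}

pub struct Effect {
    pub defs: Vec<Reg>,
    pub uses: Vec<Reg>,
}

/// Liveness view: which registers the op defines and which it uses.
pub fn reg_effect(op: &MiniOp) -> Effect {
    match *op {
        MiniOp::Add { rd, rn, rm } => Effect { defs: vec![rd], uses: vec![rn, rm] },
        MiniOp::Movt { rd } => Effect { defs: vec![rd], uses: vec![rd] },
    }
}

fn look(m: &Map, r: Reg) -> Reg {
    *m.get(&r).unwrap_or(&r)
}

/// Renaming view; declines (None) when an RMW field maps differently in the two maps.
pub fn rewrite_op(op: &MiniOp, defs: &Map, uses: &Map) -> Option<MiniOp> {
    match *op {
        MiniOp::Add { rd, rn, rm } => Some(MiniOp::Add {
            rd: look(defs, rd),
            rn: look(uses, rn),
            rm: look(uses, rm),
        }),
        MiniOp::Movt { rd } => {
            let (d, u) = (look(defs, rd), look(uses, rd));
            if d == u { Some(MiniOp::Movt { rd: d }) } else { None }
        }
    }
}

/// Deliberately drifted variant: routes Add's `rd` through the use map.
pub fn rewrite_op_drifted(op: &MiniOp, defs: &Map, uses: &Map) -> Option<MiniOp> {
    match *op {
        MiniOp::Add { rd, rn, rm } => Some(MiniOp::Add {
            rd: look(uses, rd),
            rn: look(uses, rn),
            rm: look(uses, rm),
        }),
        _ => rewrite_op(op, defs, uses),
    }
}

/// Survivor check: original registers still visible after the round trip.
pub fn survivors(op: &MiniOp, rewrite: fn(&MiniOp, &Map, &Map) -> Option<MiniOp>) -> Vec<Reg> {
    let eff = reg_effect(op);
    let (mut defs, mut uses) = (Map::new(), Map::new());
    for &r in &eff.defs {
        let rmw = eff.uses.contains(&r);
        defs.insert(r, if rmw { RMW_SENTINEL } else { DEF_SENTINEL });
    }
    for &r in &eff.uses {
        let rmw = eff.defs.contains(&r);
        uses.insert(r, if rmw { RMW_SENTINEL } else { USE_SENTINEL });
    }
    let rewritten = rewrite(op, &defs, &uses).expect("rewrite declined");
    let after = reg_effect(&rewritten);
    let originals: Vec<Reg> = eff.defs.iter().chain(&eff.uses).copied().collect();
    after
        .defs
        .iter()
        .chain(&after.uses)
        .copied()
        .filter(|r| originals.contains(r))
        .collect()
}
```

In this toy model an empty survivor list means the two views agree on every field, while the drifted rewrite leaves `rd` of an `Add` untouched because that register never appears in the use map.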
What
VCR-ORACLE-001 (epic #242). A CI-gated oracle that pins the optimized ARM path's hand-maintained byte-size estimator against the real Thumb-2 encoder, for every op the path emits.
Why this matters
`ir_to_arm` resolves branch displacements by summing `estimate_arm_byte_size` over the instruction stream. That estimator is a hand-maintained mirror of `ArmEncoder::encode`, kept by hand only because `synth-synthesis` cannot depend on `synth-backend` (the encoder lives downstream); this is the structural cause behind #498. When the mirror drifts, a forward branch spanning the drifting op lands at the wrong byte: under-estimate → short (the #483-class miscompile), over-estimate → long. `synth-backend` sits downstream and can see both, so the oracle lives there.
How
- Extract the inline `instr_byte_size`/`reg_num` closures from `ir_to_arm` to a module-level `pub fn estimate_arm_byte_size(op: &ArmOp) -> usize`. Logic-identical: the whitespace-normalized diff of the match body is empty; frozen byte gate + 59 synthesis tests confirm the optimized path is bit-identical.
- Add the oracle (`crates/synth-backend/tests/estimator_encoder_agreement.rs`): for each op the optimized path emits, at the operand shapes it emits them in, assert `estimate == encode().len()` OR a documented `KNOWN_GAP` pinned to its exact measured `(est, enc)` pair. A no-wildcard `coverage()` match over all 220 `ArmOp` variants is a compile-time tripwire: a new variant won't compile until consciously classified OnPath/OffPath. (It forces classification, not an agreement case; that remains a documented manual step.)

Both failure paths verified non-vacuous: a perturbed gap value trips `KNOWN_GAP CHANGED`; an `agree` case pointed at a drift op trips `NEW DRIFT`.
Scope
A gap-documenting regression guard, NOT a #498 fix. Correcting the estimator is byte-changing codegen (shifts every optimized-path branch displacement) and stays a separately-gated step (re-freeze + execution differential + silicon). No release — pure refactor + test.
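To make the displacement shift concrete, here is a simplified sketch that treats a forward branch's displacement as the sum of the sizes of the ops it spans, ignoring the Thumb PC offset. `SpanOp` and the three functions are hypothetical; the `Popcnt` (est 2, enc 86), high-register `Adds` (est 2, enc 4), and `I64DivU` (est 100, enc 74) figures are the ones reported in this PR, and the `Mov` sizes are assumed.

```rust
/// One op inside the span of a forward branch (hypothetical type).
pub struct SpanOp {
    pub name: &'static str,
    pub estimated: usize, // what the estimator reports
    pub encoded: usize,   // what the encoder actually emits
}

/// Displacement the single-pass resolver computes from the estimator.
pub fn estimated_span(body: &[SpanOp]) -> usize {
    body.iter().map(|op| op.estimated).sum()
}

/// Displacement the branch needs, given the real encoded sizes.
pub fn encoded_span(body: &[SpanOp]) -> usize {
    body.iter().map(|op| op.encoded).sum()
}

/// Negative: the branch lands short of its target. Positive: it lands long.
pub fn landing_error(body: &[SpanOp]) -> i64 {
    estimated_span(body) as i64 - encoded_span(body) as i64
}
```

A span of `Mov` (2/2), `Popcnt` (2/86), and high-register `Adds` (2/4) is estimated at 6 bytes but encodes to 92, so the branch lands 86 bytes short; a span containing only `I64DivU` lands 26 bytes long. Fixing any one estimate changes these sums for every branch that spans the op, which is why the correction is treated as byte-changing codegen.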
Findings (correct + extend #498's original report)
- #498's claim that `Cmp` high-reg drifts is FALSE: 16-bit CMP (T2, `0x45xx`) encodes high regs → 2 bytes; the estimator default is right. The real high-reg drifts are `Cmn`/`Adds`/`Subs` (no 16-bit high-reg / flag-setting form) → 4, est 2.
- `Popcnt` is absent from the estimator entirely (`_ => 2`) but the encoder expands it to 86 bytes: an 84-byte hole, the largest single drift.
- `I64DivU/RemU/DivS/RemS`, `I64Popcnt`, `I64Extend32S` over-estimate (e.g. DivU est 100 vs 74).
- Far `BOffset`/`BCondOffset` need the 4-byte form but the estimator sizes the pre-resolution 0-offset placeholder as 2 (single-pass chicken-and-egg).
- `Mov` small-negative imm: the encoder's signed `imm <= 255` test emits a wrong-value 2-byte `MOVS #(imm&0xFF)`, a latent encoder bug, surfaced here as a side effect.

Refs #498, #242.
🤖 Generated with Claude Code
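The `Mov` small-negative finding comes down to a signed comparison with no lower bound. The sketch below reproduces the pattern in isolation; the function names and the `i32` immediate type are assumptions, and the ranged predicate is only one possible guard, not the project's fix.

```rust
/// Predicate as described in the finding: a signed `imm <= 255` test,
/// which also accepts every negative immediate.
pub fn fits_movs_imm8_signed(imm: i32) -> bool {
    imm <= 255
}

/// One possible stricter guard (an assumption): require 0..=255.
pub fn fits_movs_imm8_ranged(imm: i32) -> bool {
    (0..=255).contains(&imm)
}

/// The 8-bit immediate field that would be emitted: `imm & 0xFF`.
pub fn movs_imm8_field(imm: i32) -> u8 {
    (imm & 0xFF) as u8
}
```

For `imm = -1` the signed test passes and the emitted field is `0xFF`, so a 2-byte `MOVS` would load 255 instead of -1; the ranged guard rejects the value, leaving it to a wider encoding.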