test(vcr-oracle): estimator↔encoder agreement oracle for the optimized path (#498, #242)#511

Merged
avrabe merged 1 commit into main from vcr-oracle-498-estimator-encoder-agreement
Jun 26, 2026

Conversation

@avrabe avrabe commented Jun 26, 2026

What

VCR-ORACLE-001 (epic #242). A CI-gated oracle that pins the optimized ARM path's hand-maintained byte-size estimator against the real Thumb-2 encoder, for every op the path emits.

Why this matters

ir_to_arm resolves branch displacements by summing estimate_arm_byte_size over the instruction stream. That estimator is a hand-maintained mirror of ArmEncoder::encode, kept by hand only because synth-synthesis cannot depend on synth-backend (the encoder lives downstream) — the structural cause behind #498. When the mirror drifts, a forward branch spanning the drifting op lands at the wrong byte: under-estimate → short (the #483-class miscompile), over-estimate → long. synth-backend sits downstream and can see both, so the oracle lives there.
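The failure mode can be illustrated with a minimal sketch. Everything here (`ToyOp`, `toy_estimate`, `toy_encoded_len`, `forward_displacement`) is a hypothetical stand-in, not the real `ArmOp`, estimator, or encoder; only the 2-vs-4 byte split for a high-register `Adds` is taken from the findings in this PR.

```rust
/// Toy stand-in for `ArmOp`; variant names are illustrative only.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ToyOp {
    Mov,
    Adds, // stands for the high-register form
}

/// Hypothetical drifting estimator: `Adds` is sized as 2 bytes.
fn toy_estimate(op: ToyOp) -> usize {
    match op {
        ToyOp::Mov => 2,
        ToyOp::Adds => 2, // drift: the encoder emits 4 bytes
    }
}

/// Hypothetical ground truth (what the encoder would emit).
fn toy_encoded_len(op: ToyOp) -> usize {
    match op {
        ToyOp::Mov => 2,
        ToyOp::Adds => 4,
    }
}

/// Forward displacement over the ops a branch skips, under a given size function.
fn forward_displacement(skipped: &[ToyOp], size: fn(ToyOp) -> usize) -> usize {
    skipped.iter().map(|&op| size(op)).sum()
}
```

With `[Mov, Adds, Mov]` between the branch and its target, the estimated displacement comes out 2 bytes shorter than the encoded one, so the branch would land short of its target.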

How

  1. Extract the inline instr_byte_size/reg_num closures from ir_to_arm to a module-level pub fn estimate_arm_byte_size(op: &ArmOp) -> usize. Logic-identical — the whitespace-normalized diff of the match body is empty; frozen byte gate + 59 synthesis tests confirm the optimized path is bit-identical.
  2. Oracle (crates/synth-backend/tests/estimator_encoder_agreement.rs): for each op the optimized path emits, at the operand shapes it emits them in, assert estimate == encode().len() OR a documented KNOWN_GAP pinned to its exact measured (est, enc) pair. A no-wildcard coverage() match over all 220 ArmOp variants is a compile-time tripwire — a new variant won't compile until consciously classified OnPath/OffPath. (It forces classification, not an agreement case; that remains a documented manual step.)

Both failure paths verified non-vacuous: a perturbed gap value trips KNOWN_GAP CHANGED; an agree case pointed at a drift op trips NEW DRIFT.
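A rough sketch of the shape described above, under stated assumptions: `ToyOp`, `Coverage`, `check`, and the size functions are hypothetical stand-ins rather than the contents of `estimator_encoder_agreement.rs`. Only the `(2, 4)` and `(2, 86)` pairs are taken from the findings in this PR.

```rust
/// Toy stand-in for `ArmOp` (the real enum has 220 variants).
#[derive(Clone, Copy, Debug, PartialEq)]
enum ToyOp {
    Mov,
    Adds,
    Popcnt,
    Nop,
}

#[derive(Debug, PartialEq)]
enum Coverage {
    OnPath,
    OffPath,
}

/// No `_` arm: adding a variant to `ToyOp` is a compile error until it is classified.
fn coverage(op: ToyOp) -> Coverage {
    match op {
        ToyOp::Mov | ToyOp::Adds | ToyOp::Popcnt => Coverage::OnPath,
        ToyOp::Nop => Coverage::OffPath,
    }
}

/// Hypothetical estimator with a catch-all 2-byte default.
fn toy_estimate(op: ToyOp) -> usize {
    match op {
        ToyOp::Mov => 2,
        _ => 2,
    }
}

/// Hypothetical encoder lengths.
fn toy_encoded_len(op: ToyOp) -> usize {
    match op {
        ToyOp::Mov | ToyOp::Nop => 2,
        ToyOp::Adds => 4,
        ToyOp::Popcnt => 86,
    }
}

/// Documented gaps, each pinned to its exact (est, enc) pair.
const KNOWN_GAPS: &[(ToyOp, usize, usize)] = &[(ToyOp::Adds, 2, 4), (ToyOp::Popcnt, 2, 86)];

/// Passes on agreement or on an exactly matching known gap; fails otherwise.
fn check(op: ToyOp, known_gaps: &[(ToyOp, usize, usize)]) -> Result<(), String> {
    let (est, enc) = (toy_estimate(op), toy_encoded_len(op));
    match known_gaps.iter().find(|(gap_op, _, _)| *gap_op == op) {
        Some(&(_, g_est, g_enc)) if (g_est, g_enc) == (est, enc) => Ok(()),
        Some(_) => Err(format!("KNOWN_GAP CHANGED: {:?} est={} enc={}", op, est, enc)),
        None if est == enc => Ok(()),
        None => Err(format!("NEW DRIFT: {:?} est={} enc={}", op, est, enc)),
    }
}
```

Pinning the exact pair rather than a tolerance means that any estimator or encoder change touching a gapped op fails the check, including a partial fix; the trade-off is that the table must be updated by hand each time.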

Scope

A gap-documenting regression guard, NOT a #498 fix. Correcting the estimator is byte-changing codegen (shifts every optimized-path branch displacement) and stays a separately-gated step (re-freeze + execution differential + silicon). No release — pure refactor + test.

Findings (correct + extend #498's original report)

  • #498's claim that Cmp high-reg drifts is FALSE: 16-bit CMP (T2, 0x45xx) encodes high regs → 2 bytes; the estimator default is right. The real high-reg drifts are Cmn/Adds/Subs (no 16-bit high-reg / flag-setting form) → 4, est 2.
  • Popcnt is absent from the estimator entirely (_ => 2) but the encoder expands it to 86 bytes — an 84-byte hole, the largest single drift.
  • I64DivU/RemU/DivS/RemS, I64Popcnt, I64Extend32S over-estimate (e.g. DivU: est 100 vs enc 74).
  • far BOffset/BCondOffset need the 4-byte form but the estimator sizes the pre-resolution 0-offset placeholder as 2 (single-pass chicken-and-egg).
  • Mov small-negative imm: the encoder's signed imm <= 255 test emits a wrong-value 2-byte MOVS #(imm&0xFF) — a latent encoder bug, surfaced here as a side effect.
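The last finding can be reduced to a small sketch. The function names are hypothetical and this is not the encoder's actual code; it only shows why a signed `imm <= 255` test admits negative immediates, and what one possible range check would look like.

```rust
/// Reduction of the check described above: with a signed immediate,
/// `imm <= 255` is also true for every negative value.
fn buggy_fits_movs_imm8(imm: i32) -> bool {
    imm <= 255
}

/// The 8-bit payload a 2-byte `MOVS #imm8` would carry for that immediate.
fn movs_imm8_payload(imm: i32) -> u8 {
    (imm & 0xFF) as u8
}

/// One possible guard (an assumption, not the project's fix): require 0..=255.
fn fits_movs_imm8(imm: i32) -> bool {
    (0..=255).contains(&imm)
}
```

For `imm = -1` the signed test passes and the payload is `0xFF`, so the register would receive 255 instead of -1.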

Refs #498, #242.

🤖 Generated with Claude Code

…d path (#498, #242)

VCR-ORACLE-001. The optimized ARM path (`ir_to_arm`) resolves branch
displacements by summing a hand-maintained byte-size *estimator* over the
instruction stream — a mirror of the Thumb-2 encoder, kept by hand only
because synth-synthesis cannot depend on synth-backend (the encoder lives
downstream). When the mirror drifts, a forward branch spanning the drifting op
lands at the wrong byte (the #483-class miscompile). This is the structural
cause behind #498.

- Extract the inline `instr_byte_size`/`reg_num` closures from `ir_to_arm` to a
  module-level `pub fn estimate_arm_byte_size(op: &ArmOp) -> usize`
  (logic-identical; whitespace-normalized diff of the match body is empty).
  Frozen byte gate + 59 synthesis tests confirm the optimized path is
  bit-identical.
- Add `crates/synth-backend/tests/estimator_encoder_agreement.rs` (synth-backend
  CAN see both the estimator and the real encoder): for every op the optimized
  path emits, at the operand shapes it emits them in, assert
  `estimate_arm_byte_size(op) == ArmEncoder::encode(op).len()` OR a documented
  `KNOWN_GAP` pinned to its exact measured (est, enc) pair. A no-wildcard
  `coverage()` match over all 220 `ArmOp` variants is a compile-time tripwire:
  a new variant won't compile until consciously classified OnPath/OffPath. (It
  forces classification, NOT an agreement case — an OnPath variant with no
  `cases()` entry still passes vacuously; adding it is a documented manual step.)

Scope: a gap-documenting REGRESSION GUARD, NOT a #498 fix — correcting the
estimator is byte-changing codegen (separately gated). Findings the oracle
records (correct + extend #498's report):
- #498's claim that `Cmp` high-reg drifts is FALSE: 16-bit CMP (T2, 0x45xx)
  encodes high regs → 2 bytes. The real high-reg drifts are `Cmn`/`Adds`/`Subs`
  (no 16-bit high-reg / flag-setting form) → 4, est 2.
- `Popcnt` is absent from the estimator entirely (`_ => 2`) but the encoder
  expands it to 86 bytes — an 84-byte hole, the largest single drift.
- `I64DivU/RemU/DivS/RemS`, `I64Popcnt`, `I64Extend32S` over-estimate.
- far `BOffset`/`BCondOffset` need the 4-byte form but the estimator sizes the
  pre-resolution 0-offset placeholder as 2 (single-pass chicken-and-egg).
- `Mov` small-negative imm: encoder's signed `imm <= 255` test emits a
  wrong-value 2-byte `MOVS #(imm&0xFF)` — a latent encoder bug, surfaced here.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

codecov Bot commented Jun 26, 2026

Codecov Report

❌ Patch coverage is 95.18072% with 4 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| crates/synth-synthesis/src/optimizer_bridge.rs | 95.18% | 4 Missing ⚠️ |

@avrabe avrabe merged commit 861d4c0 into main Jun 26, 2026
22 checks passed
@avrabe avrabe deleted the vcr-oracle-498-estimator-encoder-agreement branch June 26, 2026 11:18
avrabe added a commit that referenced this pull request Jun 26, 2026
…n estimator (#511 follow-on, #242) (#512)

The `#[cfg(test)]` byte-counting helper `count_arm_byte_size` was a
hand-maintained mirror of the optimized-path size table — a drifted copy with
its own `_ => 4` default and only a partial op set. PR #511 extracted that table
to `estimate_arm_byte_size` AND established the real independent check (the
`estimator_encoder_agreement` oracle, which pins the table against the actual
Thumb-2 encoder, the ground truth). With the encoder as the independent oracle,
the local hand-drifted proxy is redundant.

Replace its body with `arm.iter().map(estimate_arm_byte_size).sum()` and delete
the now-unused `reg_idx` test helper (−43 lines). The three byte-count tests
(`test_issue94_*`) assert `bytes < 30` on POST-optimization sequences
(Mov/Movw/Asr, all ≤4 in both tables) plus direct structural checks
(`!has_runtime_shift`, `asr_count == 1`); the production estimator's `_ => 2`
default yields counts ≤ the old proxy, so every assertion still holds.

Test-only: production codegen is untouched — `estimate_arm_byte_size` is
unchanged, only the test helper's body is replaced (frozen-by-construction).
Whole synth-synthesis suite green (463 lib tests); no unused-symbol warnings
(confirms `reg_idx` had no other consumer). Scope: test consolidation, not a
codegen change.
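A minimal sketch of the consolidated helper, assuming a toy op type: `ToyOp` and `toy_estimate_arm_byte_size` are illustrative stand-ins, and the per-op sizes are assumptions chosen only to stay within the "≤4" bound stated above.

```rust
/// Toy stand-in for `ArmOp`; variants and sizes are illustrative assumptions.
enum ToyOp {
    Mov,
    Movw,
    Asr,
}

/// Stand-in for the production estimator, with a `_ => 2` default.
fn toy_estimate_arm_byte_size(op: &ToyOp) -> usize {
    match op {
        ToyOp::Movw => 4,
        _ => 2,
    }
}

/// Test helper: no private size table, just a sum over the shared estimator.
fn count_arm_byte_size(arm: &[ToyOp]) -> usize {
    arm.iter().map(toy_estimate_arm_byte_size).sum()
}
```

Because the helper delegates, any later estimator correction flows into the byte-count tests automatically instead of needing a second table kept in sync.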

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
avrabe added a commit that referenced this pull request Jun 26, 2026
… (#513)

The register allocator reads each op's def/use classification two ways that MUST
agree: `reg_effect` (liveness — which registers an op defines vs uses) and
`rewrite_op` (renaming — which fields it rewrites through the def-map vs the
use-map). If they drift — an op edited in one but not the other, or a new op
modeled inconsistently — the allocator renames a def as a use and silently
miscompiles. liveness.rs is the actively-churned heart of VCR-RA and nothing
pinned this invariant.

This is the Track-A (allocator) analogue of the #511 Track-B (encoder)
agreement oracle. There is no third ground truth here, so the achievable
invariant is mutual CONSISTENCY, checked structurally without a register
extractor: build the def/use maps FROM `reg_effect`'s classification (def regs →
a def sentinel, use regs → a use sentinel, read-modify-write regs → one shared
sentinel so `rewrite_op` doesn't decline), apply `rewrite_op`, then read the
result back with `reg_effect`. If the two agree on every field, every register
is rerouted to a sentinel; a SURVIVING original register means `rewrite_op`
routed a field through the opposite map — the drift.
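The survivor check can be sketched on a two-op toy model. All names below (`ToyOp`, `Reg`, the sentinels, `survivors`, `drifted_rewrite_op`) are hypothetical; the real `reg_effect` and `rewrite_op` signatures may differ, and the read-modify-write case is left out for brevity.

```rust
use std::collections::HashMap;

type Reg = u8;
const DEF_SENTINEL: Reg = 100;
const USE_SENTINEL: Reg = 101;

#[derive(Clone, Copy, Debug, PartialEq)]
enum ToyOp {
    Add { rd: Reg, rn: Reg, rm: Reg },
    Mov { rd: Reg, rm: Reg },
}

type RegMap = HashMap<Reg, Reg>;
type Rewrite = fn(&ToyOp, &RegMap, &RegMap) -> ToyOp;

/// Liveness view: (defs, uses).
fn reg_effect(op: &ToyOp) -> (Vec<Reg>, Vec<Reg>) {
    match *op {
        ToyOp::Add { rd, rn, rm } => (vec![rd], vec![rn, rm]),
        ToyOp::Mov { rd, rm } => (vec![rd], vec![rm]),
    }
}

/// Renaming view: def fields go through `def_map`, use fields through `use_map`.
fn rewrite_op(op: &ToyOp, def_map: &RegMap, use_map: &RegMap) -> ToyOp {
    let d = |r: Reg| *def_map.get(&r).unwrap_or(&r);
    let u = |r: Reg| *use_map.get(&r).unwrap_or(&r);
    match *op {
        ToyOp::Add { rd, rn, rm } => ToyOp::Add { rd: d(rd), rn: u(rn), rm: u(rm) },
        ToyOp::Mov { rd, rm } => ToyOp::Mov { rd: d(rd), rm: u(rm) },
    }
}

/// Deliberately drifted variant: `Mov`'s `rd` is routed through the use map.
fn drifted_rewrite_op(op: &ToyOp, def_map: &RegMap, use_map: &RegMap) -> ToyOp {
    let d = |r: Reg| *def_map.get(&r).unwrap_or(&r);
    let u = |r: Reg| *use_map.get(&r).unwrap_or(&r);
    match *op {
        ToyOp::Add { rd, rn, rm } => ToyOp::Add { rd: d(rd), rn: u(rn), rm: u(rm) },
        ToyOp::Mov { rd, rm } => ToyOp::Mov { rd: u(rd), rm: u(rm) },
    }
}

/// Registers that were not rerouted to a sentinel. Non-empty means the two views disagree.
/// Assumes every field of `op` holds a distinct, non-sentinel register.
fn survivors(op: &ToyOp, rewrite: Rewrite) -> Vec<Reg> {
    let (defs, uses) = reg_effect(op);
    let def_map: RegMap = defs.iter().map(|&r| (r, DEF_SENTINEL)).collect();
    let use_map: RegMap = uses.iter().map(|&r| (r, USE_SENTINEL)).collect();
    let (new_defs, new_uses) = reg_effect(&rewrite(op, &def_map, &use_map));
    new_defs
        .into_iter()
        .chain(new_uses)
        .filter(|r| *r != DEF_SENTINEL && *r != USE_SENTINEL)
        .collect()
}
```

The maps are built from one function's classification and consumed by the other, so the check needs no independent description of each op's registers; a misrouted field simply finds no entry in the map it was sent through and keeps its original value.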

What the oracle pins, for all 55 modeled ops:
- the def/use ROLE of every field (survivor check), and
- the read-modify-write PROPERTY of dual-role fields (Movt/MovtSym/SelectMove
  `rd`): a register `reg_effect` reports in both defs and uses must make
  `rewrite_op` DECLINE when the two maps disagree on it — otherwise the shared
  sentinel would mask a drift that turned the RMW field def-only or use-only.

- `is_modeled`: a no-wildcard match over all 220 `ArmOp` variants — a new
  variant won't compile until classified (the tripwire; it already caught `B`
  and `Nop` being mis-bucketed during authoring). The modeled (true) side is
  exhaustive (careful 55-variant extraction, all constructed + checked); the
  unmodeled (false) side is spot-sampled.

Scope: a regression GUARD, not a bug fix — the classification AGREES for every
modeled op today (measured exhaustively). Test-only; no production code changes.
Negative tests confirmed non-vacuous on BOTH branches: misrouting one op's `rd`
(def→use) trips the survivor check; dropping `Movt`'s RMW decline trips the RMW
check. 464 synth-synthesis lib tests green.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>