Skip to content

Pinch prototype#17

Closed
fmasalha wants to merge 1 commit into
mozilla-firefox:mainfrom
fmasalha:pinchProtoJan5
Closed

Pinch prototype#17
fmasalha wants to merge 1 commit into
mozilla-firefox:mainfrom
fmasalha:pinchProtoJan5

Conversation

@fmasalha
Copy link
Copy Markdown
Contributor

@fmasalha fmasalha commented Jan 6, 2026

No description provided.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Jan 6, 2026

View this pull request in Lando to land it once approved.

@shtrom
Copy link
Copy Markdown
Contributor

shtrom commented Jan 7, 2026

@fmasalha you'll need to edit your PR to point to the autoland branch (see bug 2007050).

@fmasalha fmasalha closed this Jan 9, 2026
@shtrom shtrom reopened this Jan 12, 2026
@shtrom shtrom closed this Jan 12, 2026
runlevel5 added a commit to runlevel5/firefox that referenced this pull request Apr 30, 2026
…est + ma_*TestCarry

Audit-driven cleanup of redundant scratch-acquires and dead instructions
across several helper families. All fixes share the same theme:
identify a GPR scratch that's allocated for a single-use intermediate
when the same operation can be done in-place or via a pre-existing
delegation.

Concretely:

(1) branchMul32: scratch is dead after `mulld dest, dest, scratch`
(consumed as RB), so the sign-extension round-trip reuses it instead
of acquiring a second scratch.

(2) branchTest32(Address, Imm32) / branchTest32(AbsoluteAddress, Imm32) /
branchTestPtr(Address, Imm32) / test32MovePtr: previously had inline
"if (16-bit-fit) andi. else { scratch2 + move + and_ }". Replaced with
a direct call to and32/andPtr, which already has the rlwinm
contiguous-mask fast path (item mozilla-firefox#16/mozilla-firefox#17 in PLAN.md) for tag-mask /
header-bitfield immediates that aren't 16-bit-fit but are contig
runs. Also folds the trailing extsw inside and32.

(3) loadUnalignedSimd128(BaseIndex) / storeUnalignedSimd128(BaseIndex) /
loadDouble(BaseIndex) / loadFloat32(BaseIndex) /
store{8,16,32,Ptr}(BaseIndex) non-fitting-offset paths: previously
all had `Register scratch2 = temps.Acquire(); movePtr(ImmWord(offset),
scratch2); stXx(src, scratch, scratch2)` (or load equivalent). The
addPtr(ImmWord, Register) helper already picks up POWER10 paddi (1
prefixed insn) when the offset fits 34-bit signed, falling back to
the same movePtr+add on P9/P8. So we just `addPtr(ImmWord(offset),
scratch); stX(src, scratch, 0)` (D-form with 0 displacement, or
stdx(src, r0, scratch) for stdptr to keep alignment-agnostic). Drops
scratch2 acquire entirely and shrinks the offset emit on POWER10
from 2 insns to 1.

(4) ma_add32TestCarry: cmplw compares only the low 32 bits as
unsigned, so the explicit `rldicl ..., 0, 32` zero-extends on rs and
rd were dead. Drop them along with the scratch they needed; just
move32 + add32 + cmplw + ma_b.

(5) ma_addPtrTestCarry: rs is preserved by addPtr (which writes only
rd), so when rd != rs we can compare result vs rs directly without
saving rs to a scratch first. The rd == rs case still needs the
scratch since addPtr clobbers the input — split into two paths.

PLAN.md items mozilla-firefox#19, mozilla-firefox#20 superseded by these; remaining items mozilla-firefox#29-mozilla-firefox#31
(POWER10 prefixed-load/store family) still standalone wins.
runlevel5 added a commit to runlevel5/firefox that referenced this pull request May 11, 2026
…"lvx-only dest write"

Twelve more probes building on the trunc_sat helper. Key results:

  mozilla-firefox#10  load-via-scratch loadUnalignedSimd128: 20/20 REPRO (no help)
  mozilla-firefox#15  compact single-write-to-dest SIMD: 10/10 REPRO at 30s
  mozilla-firefox#16  scalar body with correct lane mapping: 6/10 = 60% REPRO
  mozilla-firefox#17  sync;isync between two dest writes: 9/10 REPRO (no help)
  mozilla-firefox#18  single-instruction xvcvspsxws dest,src: 10/10 REPRO
  mozilla-firefox#19  3-insn body with 50 nops between dest writes: 8/10 REPRO
  mozilla-firefox#20  xvcvspsxws scratch + stxvd2x + lvx dest: 10/20 = 50% REPRO  *
  mozilla-firefox#21  NaN-correct probe mozilla-firefox#20 (3 SIMD ops on scratch): 7/10 REPRO

Probe mozilla-firefox#20 is the best partial fix. The KEY pattern: dest is written
exactly once, via Altivec lvx from memory, with the trunc_sat result
computed in scratch first. Adding more SIMD ops on scratch (probe mozilla-firefox#21)
hurts even though the dest-write count is unchanged.

NaN handling currently dropped (NaN -> INT32_MIN instead of NaN -> 0).
Visually breaks lc.wasm rendering but does not affect the OOB rate
measurement. Needs a wasm-correct re-derivation before landing.

Residual 50% fires at DIFFERENT bytecode locations (29750, 105808)
with the SAME captured vr5 = vperm-control byte pattern. Same root-cause
corruption manifesting through other Ion SIMD paths. Helper-by-helper
patching reduces but cannot eliminate -- the bug appears upstream of
multiple lowered SIMD ops.

Build verified clean, source committed for the ppc64 branch.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants