Pinch prototype#17
Closed
fmasalha wants to merge 1 commit into
Closed
Conversation
Contributor
|
View this pull request in Lando to land it once approved. |
Contributor
|
@fmasalha you'll need to edit your PR to point to the |
runlevel5
added a commit
to runlevel5/firefox
that referenced
this pull request
Apr 30, 2026
…est + ma_*TestCarry
Audit-driven cleanup of redundant scratch-acquires and dead instructions
across several helper families. All fixes share the same theme:
identify a GPR scratch that's allocated for a single-use intermediate
when the same operation can be done in-place or via a pre-existing
delegation.
Concretely:
(1) branchMul32: scratch is dead after `mulld dest, dest, scratch`
(consumed as RB), so the sign-extension round-trip reuses it instead
of acquiring a second scratch.
(2) branchTest32(Address, Imm32) / branchTest32(AbsoluteAddress, Imm32) /
branchTestPtr(Address, Imm32) / test32MovePtr: previously had inline
"if (16-bit-fit) andi. else { scratch2 + move + and_ }". Replaced with
a direct call to and32/andPtr, which already has the rlwinm
contiguous-mask fast path (item mozilla-firefox#16/mozilla-firefox#17 in PLAN.md) for tag-mask /
header-bitfield immediates that aren't 16-bit-fit but are contig
runs. Also folds the trailing extsw inside and32.
(3) loadUnalignedSimd128(BaseIndex) / storeUnalignedSimd128(BaseIndex) /
loadDouble(BaseIndex) / loadFloat32(BaseIndex) /
store{8,16,32,Ptr}(BaseIndex) non-fitting-offset paths: previously
all had `Register scratch2 = temps.Acquire(); movePtr(ImmWord(offset),
scratch2); stXx(src, scratch, scratch2)` (or load equivalent). The
addPtr(ImmWord, Register) helper already picks up POWER10 paddi (1
prefixed insn) when the offset fits 34-bit signed, falling back to
the same movePtr+add on P9/P8. So we just `addPtr(ImmWord(offset),
scratch); stX(src, scratch, 0)` (D-form with 0 displacement, or
stdx(src, r0, scratch) for stdptr to keep alignment-agnostic). Drops
scratch2 acquire entirely and shrinks the offset emit on POWER10
from 2 insns to 1.
(4) ma_add32TestCarry: cmplw compares only the low 32 bits as
unsigned, so the explicit `rldicl ..., 0, 32` zero-extends on rs and
rd were dead. Drop them along with the scratch they needed; just
move32 + add32 + cmplw + ma_b.
(5) ma_addPtrTestCarry: rs is preserved by addPtr (which writes only
rd), so when rd != rs we can compare result vs rs directly without
saving rs to a scratch first. The rd == rs case still needs the
scratch since addPtr clobbers the input — split into two paths.
PLAN.md items mozilla-firefox#19, mozilla-firefox#20 superseded by these; remaining items mozilla-firefox#29-mozilla-firefox#31
(POWER10 prefixed-load/store family) still standalone wins.
runlevel5
added a commit
to runlevel5/firefox
that referenced
this pull request
May 11, 2026
…"lvx-only dest write" Twelve more probes building on the trunc_sat helper. Key results: mozilla-firefox#10 load-via-scratch loadUnalignedSimd128: 20/20 REPRO (no help) mozilla-firefox#15 compact single-write-to-dest SIMD: 10/10 REPRO at 30s mozilla-firefox#16 scalar body with correct lane mapping: 6/10 = 60% REPRO mozilla-firefox#17 sync;isync between two dest writes: 9/10 REPRO (no help) mozilla-firefox#18 single-instruction xvcvspsxws dest,src: 10/10 REPRO mozilla-firefox#19 3-insn body with 50 nops between dest writes: 8/10 REPRO mozilla-firefox#20 xvcvspsxws scratch + stxvd2x + lvx dest: 10/20 = 50% REPRO * mozilla-firefox#21 NaN-correct probe mozilla-firefox#20 (3 SIMD ops on scratch): 7/10 REPRO Probe mozilla-firefox#20 is the best partial fix. The KEY pattern: dest is written exactly once, via Altivec lvx from memory, with the trunc_sat result computed in scratch first. Adding more SIMD ops on scratch (probe mozilla-firefox#21) hurts even though the dest-write count is unchanged. NaN handling currently dropped (NaN -> INT32_MIN instead of NaN -> 0). Visually breaks lc.wasm rendering but does not affect the OOB rate measurement. Needs a wasm-correct re-derivation before landing. Residual 50% fires at DIFFERENT bytecode locations (29750, 105808) with the SAME captured vr5 = vperm-control byte pattern. Same root-cause corruption manifesting through other Ion SIMD paths. Helper-by-helper patching reduces but cannot eliminate -- the bug appears upstream of multiple lowered SIMD ops. Build verified clean, source committed for the ppc64 branch.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.