Skip to content

Bug 1989335 - Copy permission list before iterating#24

Closed
boek wants to merge 1 commit into
mozilla-firefox:autolandfrom
boek:bug_1989335
Closed

Bug 1989335 - Copy permission list before iterating#24
boek wants to merge 1 commit into
mozilla-firefox:autolandfrom
boek:bug_1989335

Conversation

@boek
Copy link
Copy Markdown
Contributor

@boek boek commented Jan 14, 2026

No description provided.

@github-actions
Copy link
Copy Markdown
Contributor

View this pull request in Lando to land it once approved.

lando-worker Bot pushed a commit that referenced this pull request Jan 20, 2026
@lando-worker
Copy link
Copy Markdown

lando-worker Bot commented Jan 20, 2026

Pull request closed by commit 1f79df6

@lando-worker lando-worker Bot closed this Jan 20, 2026
akliuxingyuan pushed a commit to akliuxingyuan/android-components that referenced this pull request Mar 24, 2026
runlevel5 added a commit to runlevel5/firefox that referenced this pull request Apr 30, 2026
…ouble

Two related cleanup batches.

(1) Atomic-helpers — drop dead pre-store register-copy. Power ISA v3.0B
§3.3.16 specifies that stwcx./stdcx./stbcx./sthcx. read RS without
modifying it, so the four lwarx/stwcx-loop helpers in this file each
had a pointless `xs_mr scratch, value; stXcx scratch, ...` pair that
boils down to `stXcx value, ...`. Removing the move:

  - Saves 1 insn per atomic-loop iteration in CompareExchange (4-byte),
    CompareExchange64, AtomicExchange (4-byte), AtomicExchange64.
  - Eliminates the `scratch2` (or `scratch`) GPR acquire entirely in
    each, since these helpers only acquired it for that one move.

On heavily-contended atomics (Atomics.compareExchange, SharedArrayBuffer
fetch-add inner loops), retries can spin many times so per-iteration
cost compounds.

(2) Drop orphaned MacroAssemblerPPC64Compat::convertUInt64ToDouble
(single-Register, no-Register64). It dated from before commit
ddee1e2 collapsed the public 3-arg API to `mtvsrd + fcfidu`
(POWER7+); the Compat helper was never delegated to and now has zero
callers anywhere in tree. Same shape as the ExtractLow32x4Lanes
incident from 2026-04-30 — the kind of orphan that
`./mach build binaries -Werror=unused-function` doesn't catch but a
full firefox build does. Drops the broken signed-split branchy
fallback (12+ insns + 2 GPR scratches + branch) along with it.

Verified on real P9 (obj-sm-dbgopt):
- jit-test --jitflags=none: 13715 PASS / 0 FAIL (default)
- MOZ_PPC64_FORCE_POWER8=1 jit-test --jitflags=none: 13715 / 0
- jstests default: PASS
- MOZ_PPC64_FORCE_POWER8=1 jstests: PASS
- Sim MOZ_PPC64_FORCE_POWER9=1 jit-test: 13651 / 0
- Sim MOZ_PPC64_FORCE_POWER10=1 jit-test: 13651 / 0
- Sim MOZ_PPC64_FORCE_POWER8=1 jit-test: 13651 / 0

PLAN.md items mozilla-firefox#24-mozilla-firefox#28.
runlevel5 added a commit to runlevel5/firefox that referenced this pull request Apr 30, 2026
runlevel5 added a commit to runlevel5/firefox that referenced this pull request May 11, 2026
…identify V0 cross-call state

Three more probes:

  mozilla-firefox#22  probe mozilla-firefox#20 + signal-handler dump disabled: 4/10 = 40% (= mozilla-firefox#20 noise).
       Rules out the lc1 stderr dump as the corruption source.

  mozilla-firefox#23  probe mozilla-firefox#20 + constant-pool patcher (PatchConstantPoolLoad P8 fallback
       routed through V0 scratch): 4/10 = 40% (= mozilla-firefox#20 noise). Rules out the
       constant-pool load path as the residual source.

  mozilla-firefox#24  globalized probe mozilla-firefox#20 pattern: loadUnalignedSimd128 P8 fallback
       rewritten to lxvd2x V0; xxpermdi V0; stxvd2x V0; lvx dest. Result:
       10/10 with 16-23 OOBs per iter (200 total) -- DRASTICALLY WORSE.
       V0 (= ScratchSimd128Reg) is in active cross-call use by the wasm
       Ion caller; clobbering it inside loadUnalignedSimd128 breaks 1000s
       of valid runs.

Probe mozilla-firefox#24 forces the conclusion that the bug is upstream of any single
helper: the wasm Ion register allocator is using V0 as both a persistent
vperm-control register (for the matrix-loop shuffle constant) AND as a
re-usable scratch for downstream helpers. Probe mozilla-firefox#20 works at 30%
specifically because the truncSatFloat32x4ToInt32x4 callers happen to
not depend on V0's prior contents; generalizing the pattern to
loadUnalignedSimd128 doesn't survive that constraint.

Real fix paths (next session):
  (a) reserve a dedicated VR for the wasm-loop vperm-control instead of
      sharing V0 with helpers' ScratchSimd128Scope, or
  (b) audit every helper's V0 use and round-trip the prior contents.

Source state: probe mozilla-firefox#20 trunc_sat retained as the best-known partial
fix (~20-50% REPRO range over multiple runs); all other patches
reverted to clean baseline.
runlevel5 added a commit to runlevel5/firefox that referenced this pull request May 11, 2026
…to peers regresses

Probe mozilla-firefox#26: probe mozilla-firefox#25 + apply same lvx-final pattern to
unsignedTruncSatFloat32x4ToInt32x4 (single-op xvcvspuxws). Result on P9 ×
FORCE_POWER8 over 30 iters at 30 s: 11/30 = 36.7% REPRO -- significantly
WORSE than probe mozilla-firefox#25's 5/30 = 16.7%.

Probe mozilla-firefox#24 already showed that V0 (= ScratchSimd128Reg) carries cross-call
state. Adding a new helper that clobbers V0 via the lvx-final path breaks
callers that depend on V0's prior contents. Each helper has its own
caller-V0 contract; extending the pattern blindly does not compose.

Reverted unsignedTruncSat to single-op xvcvspuxws. Final state on the
ppc64 branch: probe mozilla-firefox#25 alone, 16.7% REPRO baseline.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants