Commit 2d4ff1c

PPC64: clampDoubleToUint8 — P9 branchless via xsmaxjdp + isel
The previous clampDoubleToUint8 had two mispredictable branches (≤0/NaN and ≥255), each with a small body that jumped to a shared exit. Hot on Uint8ClampedArray store paths.

POWER9 added xsmaxjdp/xsminjdp which use Java/JS semantics (ISA v3.0B §7.6.1.7): any NaN is treated as "less than any number that is not a NaN". So xsmaxjdp(input, 0) collapses {NaN, -Inf, ≤ 0} all to 0 in a single instruction, and the entire "≤ 0 or NaN → 0" branch dance disappears.

After the max, fctid (round-to-nearest-even per FPSCR default, which matches ECMA Uint8ClampedArray's round-half-to-even) saturates out-of-int64 values to INT64_MAX. The remaining upper clamp (output > 255 → 255) is one cmpdi + isel.

POWER9 path (7 insns, no branches):

    zeroDouble fpscratch
    xsmaxjdp   fpscratch, input, fpscratch   ; max(input, 0); NaN→0
    fctid      fpscratch, fpscratch
    mfvsrd     output, fpscratch
    li         max255, 255
    cmpdi      output, 255
    isel       output, max255, output, GreaterThan

POWER8 path: unchanged (xsmaxjdp unavailable; fctid maps NaN to INT64_MAX which would clamp to 255 instead of the spec-required 0, so we keep the explicit NaN-filtering branches).

Verified end-to-end:
- Real P9 jit-test --jitflags=none: 13715 PASS / 0 FAIL (default)
- Real P9 jit-test --jitflags=none MOZ_PPC64_FORCE_POWER8=1: 13715 / 0
- Real P9 jstests default: PASS
- Real P9 jstests MOZ_PPC64_FORCE_POWER8=1: PASS
- Sim MOZ_PPC64_FORCE_POWER9=1 jit-test: 13651 / 0
- Sim MOZ_PPC64_FORCE_POWER10=1 jit-test: 13651 / 0
- Sim MOZ_PPC64_FORCE_POWER8=1 jit-test: 13651 / 0

PLAN.md item mozilla-firefox#23 (POWER9 fast path; POWER8 fallback unchanged).
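
For readers who want to check the intended result without PPC64 hardware, the three steps described in the commit message (max with zero, saturating round-to-nearest-even conversion, upper clamp) can be sketched as a scalar C++ model. This is a minimal illustration and not SpiderMonkey code: the names `maxNaNLoses`, `toInt64Saturating`, and `clampDoubleToUint8Model` are hypothetical, and the NaN handling in the max step simply follows the semantics as the commit message states them.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical scalar model of the branchless sequence; names are
// illustrative and do not exist in SpiderMonkey.

// Max step, modeled as the commit message describes it: a NaN operand
// loses to any operand that is not a NaN.
inline double maxNaNLoses(double a, double b) {
  if (std::isnan(a)) return b;
  if (std::isnan(b)) return a;
  return a > b ? a : b;
}

// Conversion step for non-NaN inputs: saturate at the int64 bounds,
// otherwise round using the current rounding mode, which is
// round-to-nearest-even unless the program has changed it.
inline int64_t toInt64Saturating(double d) {
  constexpr double kTwoPow63 = 9223372036854775808.0;
  if (d >= kTwoPow63) return std::numeric_limits<int64_t>::max();
  if (d <= -kTwoPow63) return std::numeric_limits<int64_t>::min();
  return std::llrint(d);
}

inline uint8_t clampDoubleToUint8Model(double input) {
  // NaN, -Inf, and every value <= 0 become 0 here.
  double nonNegative = maxNaNLoses(input, 0.0);
  int64_t rounded = toInt64Saturating(nonNegative);
  // Upper clamp: the role played by cmpdi + isel in the commit.
  return static_cast<uint8_t>(rounded > 255 ? 255 : rounded);
}
```

Under the default rounding mode, this model maps 1.5 and 2.5 both to 2 (ties go to even), 255.5 to 255 (it rounds to 256 and is then clamped), and NaN to 0.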
1 parent 88c5aeb commit 2d4ff1c

1 file changed

Lines changed: 26 additions & 7 deletions

js/src/jit/ppc64/MacroAssembler-ppc64.cpp
@@ -388,10 +388,34 @@ void MacroAssemblerPPC64Compat::handleFailureWithHandlerTail(
 }
 
 void MacroAssembler::clampDoubleToUint8(FloatRegister input, Register output) {
-  Label positive, below255, done;
   ScratchDoubleScope fpscratch(asMasm());
 
-  // <= 0 or NaN --> 0
+  if (HasPOWER9()) {
+    // P9 xsmaxjdp uses Java/JS semantics (ISA v3.0B §7.6.1.7): any NaN
+    // is treated as "less than any number that is not a NaN", so
+    // xsmaxjdp(input, 0) collapses {NaN, -Inf, ≤ 0} to 0 in one insn —
+    // the "≤ 0 or NaN → 0" branch dance disappears.
+    //
+    // After the max, fctid (round-to-nearest-even per FPSCR default,
+    // matches ECMA Uint8ClampedArray's round-half-to-even) saturates
+    // out-of-int64 values to INT64_MAX. Remaining upper clamp
+    // (output > 255 → 255) is one cmpdi + isel.
+    zeroDouble(fpscratch);
+    as_xsmaxjdp(fpscratch, input, fpscratch);
+    as_fctid(fpscratch, fpscratch);
+    as_mfvsrd(output, fpscratch);
+    UseScratchRegisterScope temps(asMasm());
+    Register max255 = temps.Acquire();
+    xs_li(max255, 255);
+    as_cmpdi(output, 255);
+    as_isel(output, max255, output, GreaterThan);
+    return;
+  }
+
+  // POWER8 fallback: xsmaxjdp is unavailable, so filter NaN explicitly
+  // before fctid. Per Power ISA, fctid maps NaN to INT64_MAX, which
+  // would clamp to 255 instead of the spec-required 0.
+  Label positive, below255, done;
   zeroDouble(fpscratch);
   branchDouble(DoubleGreaterThan, input, fpscratch, &positive);
   {
@@ -401,7 +425,6 @@ void MacroAssembler::clampDoubleToUint8(FloatRegister input, Register output) {
 
   bind(&positive);
 
-  // >= 255 --> 255
   loadConstantDouble(255.0, fpscratch);
   branchDouble(DoubleLessThan, input, fpscratch, &below255);
   {
@@ -411,12 +434,8 @@ void MacroAssembler::clampDoubleToUint8(FloatRegister input, Register output) {
 
   bind(&below255);
 
-  // Round to nearest even and convert to int64.
-  // PPC64's frin rounds ties away from zero, not to even. Use fctid
-  // which rounds according to FPSCR (default: round-to-nearest-even).
   as_fctid(fpscratch, input);
   as_mfvsrd(output, fpscratch);
-
   bind(&done);
 }
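
For comparison, the control flow of the retained POWER8 fallback can be sketched as scalar C++. This is a rough model under stated assumptions, not the emitted code: the two block bodies are elided between the hunks above, so the sketch assumes (as the removed comments and the commit message suggest) that they produce 0 and 255 respectively and jump to `done`. The name `clampFallbackModel` is hypothetical.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical scalar model of the branchy fallback; the function name
// is illustrative only.
inline uint8_t clampFallbackModel(double input) {
  // Mirrors branchDouble(DoubleGreaterThan, input, 0.0, &positive):
  // an ordered comparison is false for NaN, so NaN takes the
  // "<= 0 or NaN --> 0" body (assumed to yield 0).
  if (!(input > 0.0)) {
    return 0;
  }
  // Mirrors branchDouble(DoubleLessThan, input, 255.0, &below255):
  // anything not strictly below 255 takes the ">= 255 --> 255" body
  // (assumed to yield 255).
  if (!(input < 255.0)) {
    return 255;
  }
  // Only values in (0, 255) reach the conversion, so it never sees NaN
  // or an out-of-range input. std::llrint uses the current rounding
  // mode, round-to-nearest-even unless the program has changed it.
  return static_cast<uint8_t>(std::llrint(input));
}
```

The sketch makes the cost visible: both filters are data-dependent branches, which is what the POWER9 path replaces with straight-line code.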