PPU/SPU LLVM: Use native ARM shuffles in recompilers instead of emulating x86 pshufb #18056

Open
Whatcookie wants to merge 2 commits into RPCS3:master from Whatcookie:TBLTBX

@Whatcookie
Member

Finally properly emulates the PS3's most iconic instruction (according to me) efficiently on ARM machines too!
Brings SHUFB from 9 instructions down to 5, though it should be 4 if LLVM would just emit BCAX...

Should result in a nice speedup on ARM machines. In another pull request I will tackle the ROTQBY family of instructions.
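For readers unfamiliar with the instruction, here is a minimal scalar sketch of what SHUFB computes, assuming the byte-select rules of the SPU ISA and registers modelled as 16-byte arrays in SPU (big-endian) byte order. The names `Vec16` and `shufb_reference` are hypothetical and do not come from the RPCS3 source, and the sketch says nothing about how this PR lowers the operation to TBL/TBX. It only illustrates why a single x86 `pshufb` is an awkward fit: the selector indexes a 32-byte table and has three special constant cases.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical model of a 128-bit SPU register, byte 0 = most significant byte.
using Vec16 = std::array<std::uint8_t, 16>;

// Scalar reference for SHUFB (illustrative only, not RPCS3 code).
// Each control byte in `c` either produces a constant or selects one byte
// from the 32-byte concatenation a || b.
inline Vec16 shufb_reference(const Vec16& a, const Vec16& b, const Vec16& c)
{
    Vec16 out{};
    for (std::size_t i = 0; i < out.size(); ++i)
    {
        const std::uint8_t sel = c[i];
        if ((sel & 0xC0) == 0x80)
        {
            out[i] = 0x00; // selector 10xxxxxx -> constant 0x00
        }
        else if ((sel & 0xE0) == 0xC0)
        {
            out[i] = 0xFF; // selector 110xxxxx -> constant 0xFF
        }
        else if ((sel & 0xE0) == 0xE0)
        {
            out[i] = 0x80; // selector 111xxxxx -> constant 0x80
        }
        else
        {
            // Otherwise the low 5 bits index a || b: 0..15 -> a, 16..31 -> b.
            const std::uint8_t idx = sel & 0x1F;
            out[i] = idx < 16 ? a[idx] : b[idx - 16];
        }
    }
    return out;
}
```

On AArch64, a TBL lookup with a two-register table covers the 32-byte index range directly and returns zero for out-of-range indices, while TBX leaves the destination byte unchanged in that case, which is presumably what makes a short native sequence possible.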

@Whatcookie
Member Author

Crashes in some games with a message along the lines of: LLVM Emergency Exit Invoked: 'Error while trying to spill X8 from class GPR64: Cannot scavenge register without an emergency spill slot!'

Seems to be an LLVM bug. Will check with newer LLVM versions, and if it's not fixed, try to open an issue with reproducible code on the LLVM repo.

@Whatcookie Whatcookie marked this pull request as draft January 16, 2026 03:29
Comment thread rpcs3/Emu/Cell/PPUTranslator.cpp Outdated
@rcaridade145
Contributor

The only issues I've seen online regarding this were fixed by adding AliasAnalysis:

llvm/llvm-project#64277

@Whatcookie
Member Author

So I couldn't find a way to prevent LLVM from failing to compile some SPU programs when TBL2/TBX2 is used, but I couldn't really accept leaving the performance on the table.

So instead, we try to compile the program with TBL2/TBX2, and if that crashes, we try again with plain TBL/TBX. It works, but it's a somewhat insane solution, I guess.

I only tested on ARM Linux, so I would appreciate testing on macOS ARM and on x86, so we can be sure I didn't fuck something up.
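The retry strategy described in this comment can be sketched roughly as follows. This is only an illustration under stated assumptions: `ShuffleMode`, `CompileFn`, `CompileResult`, and `compile_with_fallback` are hypothetical names, and a C++ exception stands in for the LLVM failure. The comment does not say how the PR actually detects or recovers from the crash, so none of this should be read as the real implementation.

```cpp
#include <functional>
#include <initializer_list>
#include <optional>
#include <stdexcept>
#include <string>

// Hypothetical switch between the two code generation strategies.
enum class ShuffleMode
{
    TwoRegisterTable,    // the TBL2/TBX2 path (faster, but may fail to compile)
    SingleRegisterTable, // the plain TBL/TBX path (fallback)
};

// Hypothetical compile callback: returns the compiled artifact or throws.
// The exception is an assumption standing in for an LLVM compile failure.
using CompileFn = std::function<std::string(ShuffleMode)>;

struct CompileResult
{
    std::string code;
    ShuffleMode mode; // which strategy finally succeeded
};

// Try the preferred mode first, then retry the same program with the fallback.
inline std::optional<CompileResult> compile_with_fallback(const CompileFn& compile)
{
    for (const ShuffleMode mode : {ShuffleMode::TwoRegisterTable,
                                   ShuffleMode::SingleRegisterTable})
    {
        try
        {
            return CompileResult{compile(mode), mode};
        }
        catch (const std::runtime_error&)
        {
            // Compilation failed in this mode; fall through to the next one.
        }
    }
    return std::nullopt; // both attempts failed
}
```

The cost of this design is that a failing program is compiled twice, but programs that compile cleanly keep the faster shuffle sequence.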

@Whatcookie Whatcookie marked this pull request as ready for review May 11, 2026 04:27
@shinra-electric
Contributor

shinra-electric commented May 11, 2026

Segfaults when building SPU cache on macOS Arm.

F {LLVM JIT} SIG: Thread terminated due to fatal error: Segfault writing location 00000003000195c4 at 000000018427d40c.
Emu Thread Name: 'LLVM JIT'.

RPCS3 2.log.zip

Edit: Also hangs with the "Thread too sleepy" error in Puppeteer, though this is not related to this PR

E SIG: Thread [PPU[0x1000000] main_thread] is too sleepy. Waiting for it 290786.875us already!

…6 pshufb

- SHUFB from 9 instructions down to 5
- Though it should be 4 if LLVM would just emit BCAX...
- Some SPU programs inexplicably fail to compile when TBL2/TBX2 are used.
- As an insane workaround, first try to compile with TBL2/TBX2; if LLVM crashes while compiling, try to compile the same program without TBL2/TBX2.

@Whatcookie
Member Author

> Segfaults when building SPU cache on macOS Arm.
>
> F {LLVM JIT} SIG: Thread terminated due to fatal error: Segfault writing location 00000003000195c4 at 000000018427d40c.
> Emu Thread Name: 'LLVM JIT'.
>
> RPCS3 2.log.zip
>
> Edit: Also hangs with the "Thread too sleepy" error in Puppeteer, though this is not related to this PR
>
> E SIG: Thread [PPU[0x1000000] main_thread] is too sleepy. Waiting for it 290786.875us already!

Thanks for testing, I pushed a new build, could you test it too?

@shinra-electric
Contributor

> Thanks for testing, I pushed a new build, could you test it too?

The new build fixes the segfaults. Gets in-game after a couple of tries, just like the main branch.

RPCS3.log.zip

Labels

CPU: Arm64, Optimization (Optimizes existing code)

5 participants