ARM64EC: Optimize GPR and MM state setting#5410

Open
Sonicadvance1 wants to merge 1 commit into FEX-Emu:main from Sonicadvance1:131

@Sonicadvance1
Member

I think the code improvement speaks for itself here.

Before:
Ew, Brother, Ew! What's That?

After:
That's the good stuff!

@bylaws
Collaborator

bylaws commented Apr 3, 2026

Does this present a measurable improvement in any particular benchmark? This code path is super cold and only happens on a signal, and at that point this code doesn't remotely dominate. I don't think it makes sense to complicate this with inline asm.

@Sonicadvance1
Member Author

It showed up enough when running games that I noticed it anyway. If llvm-mingw were smart enough it would optimize this itself, but it just... doesn't.

@bylaws
Collaborator

bylaws commented Apr 3, 2026

Right, but any numbers to back it up? I very much don't see how this isn't negligible vs signal overhead.

@Sonicadvance1
Member Author

I'll need to double check. The codegen was bad enough that I didn't even check again.

@Sonicadvance1
Member Author

So Elden Ring was the game I saw this on; it's spamming thread SIGUSR1 constantly or something, which triggers this code path. I must have gotten the game into a weird state where it was happening even more frequently, because the single-digit CPU usage percentages weren't showing back up.

But I did measure in the "regular" state and saw a measurable 25%-33% performance uplift on this code, which of course didn't correlate to an FPS increase in this particular case. So still worth it.
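
For a rough sense of why a 25%-33% uplift on this path need not move FPS, here is a back-of-the-envelope sketch using Amdahl's law. The function name and the sample runtime fractions are hypothetical, not measurements from this PR. The only inputs taken from the thread are the 25%-33% local uplift (read here as a 1.25x-1.33x speedup, which is an assumption) and the single-digit CPU share seen in the "weird" state.

```python
def overall_speedup(fraction: float, local_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when `fraction` of the runtime
    becomes `local_speedup` times faster and the rest is unchanged."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be within [0, 1]")
    if local_speedup <= 0.0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


if __name__ == "__main__":
    # Hypothetical single-digit shares of CPU time spent in the state-setting path.
    for fraction in (0.01, 0.05, 0.09):
        # Assumed reading of "25%-33% uplift" as a 1.25x-1.33x local speedup.
        for local in (1.25, 1.33):
            total = overall_speedup(fraction, local)
            print(f"{fraction:.0%} of runtime, {local}x local -> {total:.4f}x overall")
```

Under these assumptions, even a 9% share with a 1.33x local speedup works out to a bit over 2% overall, and a frame rate that is bounded elsewhere (for example by the GPU) would not reflect even that.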
