perf: soft revert of the performance regressions introduced in #1469#2885
Conversation
| { | ||
| // MUST BE A METHOD AND AGGRESSIVELY INLINED, OTHERWISE THE JIT WILL NOT ELIMINATE THE BRANCH | ||
| [MethodImpl(MethodImplOptions.AggressiveInlining)] | ||
| private static bool IsUnalignedSafe() => |
There was a problem hiding this comment.
What if it is a static readonly field? Theoretically, doesn't the JIT evaluate those only once and uses them like a const later on?
There was a problem hiding this comment.
You would think, but nope, here's disasm:
; Method C:CopyWithAlignmentFallback2(byref,byref,uint):float (FullOpts)
G_M000_IG01: ;; offset=0x0000
sub rsp, 40
G_M000_IG02: ;; offset=0x0004
test byte ptr [(reloc 0x7ffb0f9b41d8)], 1
je SHORT G_M000_IG08
G_M000_IG03: ;; offset=0x000D
cmp byte ptr [(reloc 0x7ffb0f6fb090)], 0
jne SHORT G_M000_IG06
G_M000_IG04: ;; offset=0x0016
vmovss xmm0, dword ptr [reloc @RWD00]
G_M000_IG05: ;; offset=0x001E
add rsp, 40
ret
G_M000_IG06: ;; offset=0x0023
vxorps xmm0, xmm0, xmm0
G_M000_IG07: ;; offset=0x0027
add rsp, 40
ret
G_M000_IG08: ;; offset=0x002C
mov rcx, 0x7FFB0F9B4168
call CORINFO_HELP_GET_NONGCSTATIC_BASE
jmp SHORT G_M000_IG03
RWD00 dd 3F800000h ; 1While this one is:
; Method C:CopyWithAlignmentFallback2(byref,byref,uint):float (FullOpts)
G_M000_IG01: ;; offset=0x0000
G_M000_IG02: ;; offset=0x0000
vxorps xmm0, xmm0, xmm0
G_M000_IG03: ;; offset=0x0004
ret
; Total bytes of code: 5Godbolt shows similar overhead, so this is not a config issue.
There was a problem hiding this comment.
Maybe one of those things that it does when doing PGO and tiering?
Anyway, not that important.
| fixed (void* src = &source, dst = &destination) | ||
| Buffer.MemoryCopy(src, dst, byteCount, byteCount); |
There was a problem hiding this comment.
Also, for cases where we are sure the memory is aligned (like those where we control the type and ensure proper layouts), you also have the option of Unsafe.CopyBlock
There was a problem hiding this comment.
I chose Buffer.MemoryCopy because it's more straightforward than creating two spans and using Span.CopyTo between them. But as you said, they map to the same method, so if span does not provide anything else compared to buffer I think that this is kind of bikeshedd-y ?
There was a problem hiding this comment.
While Unsafe.CopyBlock is an intrinsic, it is not optimized for the wide array of ranges one might have when making copies
There was a problem hiding this comment.
For these utility methods, it's enough to use Buffer.MemoryCopy, yes.
My point is at those call sites where we could be using a Span. Maybe we have more of those in the future, as more of the API is converted to use/accept spans.
PR Details
Calls to
Utilities.CopyMemorymapping toSystem.Buffer.MemoryCopywere swapped forUnsafe.CopyBlockUnalignedin #1469 without clarification.Those two methods are not equivalent,
CopyBlockUnalignedis safer but significantly slower compared toMemoryCopyas it does not have any fast path based on the allocation sizes.As for the safety of the unaligned version, under x86/64 and recent ARM, reading or writing data outside of their natural alignment do not cause any issues besides slowdown. I've ensured that other platforms, like ARM7 and connected devices' memory, keep using the unaligned version.
The vast majority of our codebase ensures that fields and allocations are aligned, so using operations that expect to operate on aligned memory provides a significant performance improvement.
Scenes with a significant amount of draw calls rely on memcopy to fill in graphics commands and shader parameters.
In one of the heavier sections I have at my disposal, frames can take up to 23 millisecond to prepare on the CPU side, introducing this change cuts it down to 14 millisecond.
Here's the Profiler sessions.zip
More information in
dotnet/runtime#1650 (comment)
dotnet/runtime#80068 (comment)
Related Issue
N/A
Types of changes
Checklist