Commit 50cef42

TimDettmers and claude committed
Fix BNB_WARP_SIZE detection for HIP host compilation pass
__GFX9__ is only defined during the device compilation pass, not during host compilation. This caused BNB_WARP_SIZE to be 32 on the host pass even for gfx942 (CDNA, warp=64), making the conditional WARP_TRANSPOSE vs DIRECT selection wrong.

Use __AMDGCN_WAVEFRONT_SIZE instead, which the HIP compiler defines correctly in both host and device passes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 0b33411 commit 50cef42

1 file changed: +5 -4 lines changed

csrc/common.cuh

Lines changed: 5 additions & 4 deletions
@@ -12,11 +12,12 @@
 // ============================================================================
 
 #if BNB_HIP
-// AMD GFX9 (CDNA) uses 64-wide warps; RDNA uses 32-wide
-#ifdef __GFX9__
-#define BNB_WARP_SIZE 64
+// Use the compiler-provided wavefront size, which is correctly defined in
+// both host and device compilation passes. CDNA (gfx9xx) = 64, RDNA = 32.
+#ifdef __AMDGCN_WAVEFRONT_SIZE
+#define BNB_WARP_SIZE __AMDGCN_WAVEFRONT_SIZE
 #else
-#define BNB_WARP_SIZE 32
+#define BNB_WARP_SIZE 64 // Safe default for HIP (matches CDNA)
 #endif
 #else
 #define BNB_WARP_SIZE 32
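
The following is a minimal sketch of why a host/device disagreement on BNB_WARP_SIZE matters. It is not code from this repository. It assumes the WARP_TRANSPOSE vs DIRECT choice depends on whether the block's thread count is a multiple of the warp size, which is the documented requirement for warp-transposed block loads in CUB-style libraries. `LoadAlgo` and `pick_load_algo` are hypothetical names introduced for illustration, and the `BNB_HIP` fallback of 0 is a stand-in for the project's build flag.

```cpp
// Hypothetical stand-in for the project's build flag (assumption):
// 0 means "not a HIP build", so this file also compiles with a plain host compiler.
#ifndef BNB_HIP
#define BNB_HIP 0
#endif

// Same macro logic as the patched csrc/common.cuh hunk.
#if BNB_HIP
#ifdef __AMDGCN_WAVEFRONT_SIZE
#define BNB_WARP_SIZE __AMDGCN_WAVEFRONT_SIZE
#else
#define BNB_WARP_SIZE 64 // Safe default for HIP (matches CDNA)
#endif
#else
#define BNB_WARP_SIZE 32
#endif

// Hypothetical enum standing in for the WARP_TRANSPOSE / DIRECT load algorithms.
enum class LoadAlgo { Direct, WarpTranspose };

// Assumed selection rule: warp-transposed loads need the block size to be
// a multiple of the warp size; otherwise fall back to direct loads.
constexpr LoadAlgo pick_load_algo(int threads_per_block, int warp_size) {
    return (threads_per_block % warp_size == 0) ? LoadAlgo::WarpTranspose
                                                : LoadAlgo::Direct;
}

// Sanity check on whatever value the macro resolved to in this compilation pass.
static_assert(BNB_WARP_SIZE == 32 || BNB_WARP_SIZE == 64,
              "unexpected warp size");

// A 32-thread block is a whole warp when the warp size is 32 ...
static_assert(pick_load_algo(32, 32) == LoadAlgo::WarpTranspose,
              "32 threads, warp 32");
// ... but only half a wavefront when the warp size is 64.
static_assert(pick_load_algo(32, 64) == LoadAlgo::Direct,
              "32 threads, warp 64");
```

Under these assumptions, a host pass that sees 32 and a device pass that sees 64 would choose different algorithms for the same 32-thread kernel, which is the kind of mismatch the commit message describes. Deriving the value from a single macro that both passes see removes that divergence.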
