Commit 4fae098

TimDettmers and claude committed
Guard blocksize=64 quantize instantiations for warp size compatibility
On AMD CDNA GPUs (warp size 64), blocksize=64 would mean only 1 thread per warp in the quantize kernels, which is incompatible. Wrap these instantiations with #if BNB_WARP_SIZE == 32 so they only compile on NVIDIA.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 214943c commit 4fae098


1 file changed, 10 additions and 0 deletions


csrc/kernels.cu

Lines changed: 10 additions & 0 deletions
@@ -2461,36 +2461,46 @@ MAKE_kQuantizeBlockwise(half, 1024, 4, 0, General8bit)
 MAKE_kQuantizeBlockwise(half, 512, 2, 0, General8bit)
 MAKE_kQuantizeBlockwise(half, 256, 2, 0, General8bit)
 MAKE_kQuantizeBlockwise(half, 128, 2, 0, General8bit)
+#if BNB_WARP_SIZE == 32
 MAKE_kQuantizeBlockwise(half, 64, 2, 0, General8bit)
+#endif
 MAKE_kQuantizeBlockwise(half, 4096, 4, 0, FP4)
 MAKE_kQuantizeBlockwise(half, 2048, 4, 0, FP4)
 MAKE_kQuantizeBlockwise(half, 1024, 4, 0, FP4)
 MAKE_kQuantizeBlockwise(half, 512, 2, 0, FP4)
 MAKE_kQuantizeBlockwise(half, 256, 2, 0, FP4)
 MAKE_kQuantizeBlockwise(half, 128, 2, 0, FP4)
+#if BNB_WARP_SIZE == 32
 MAKE_kQuantizeBlockwise(half, 64, 2, 0, FP4)
+#endif
 MAKE_kQuantizeBlockwise(half, 4096, 4, 0, NF4)
 MAKE_kQuantizeBlockwise(half, 2048, 4, 0, NF4)
 MAKE_kQuantizeBlockwise(half, 1024, 4, 0, NF4)
 MAKE_kQuantizeBlockwise(half, 512, 2, 0, NF4)
 MAKE_kQuantizeBlockwise(half, 256, 2, 0, NF4)
 MAKE_kQuantizeBlockwise(half, 128, 2, 0, NF4)
+#if BNB_WARP_SIZE == 32
 MAKE_kQuantizeBlockwise(half, 64, 2, 0, NF4)
+#endif
 MAKE_kQuantizeBlockwise(float, 4096, 4, 0, General8bit)
 MAKE_kQuantizeBlockwise(float, 4096, 4, 1, General8bit)
 MAKE_kQuantizeBlockwise(float, 2048, 4, 0, General8bit)
 MAKE_kQuantizeBlockwise(float, 1024, 4, 0, General8bit)
 MAKE_kQuantizeBlockwise(float, 512, 2, 0, General8bit)
 MAKE_kQuantizeBlockwise(float, 256, 2, 0, General8bit)
 MAKE_kQuantizeBlockwise(float, 128, 2, 0, General8bit)
+#if BNB_WARP_SIZE == 32
 MAKE_kQuantizeBlockwise(float, 64, 2, 0, General8bit)
+#endif
 MAKE_kQuantizeBlockwise(float, 4096, 4, 0, FP4)
 MAKE_kQuantizeBlockwise(float, 2048, 4, 0, FP4)
 MAKE_kQuantizeBlockwise(float, 1024, 4, 0, FP4)
 MAKE_kQuantizeBlockwise(float, 512, 2, 0, FP4)
 MAKE_kQuantizeBlockwise(float, 256, 2, 0, FP4)
 MAKE_kQuantizeBlockwise(float, 128, 2, 0, FP4)
+#if BNB_WARP_SIZE == 32
 MAKE_kQuantizeBlockwise(float, 64, 2, 0, FP4)
+#endif
 MAKE_kQuantizeBlockwise(float, 4096, 4, 0, NF4)
 MAKE_kQuantizeBlockwise(float, 2048, 4, 0, NF4)
 MAKE_kQuantizeBlockwise(float, 1024, 4, 0, NF4)
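
The guard pattern in this diff can be illustrated with a minimal host-only C++ sketch. It is not the bitsandbytes code: `count_blocks`, `MAKE_count_blocks`, and `dispatch_count_blocks` are hypothetical stand-ins for the kernel template, its instantiation macro, and a caller. The fallback definition of `BNB_WARP_SIZE` is also an assumption made so the sketch compiles on its own; the commit does not show where the real macro is defined. The sketch shows that once an explicit instantiation is wrapped in `#if BNB_WARP_SIZE == 32`, any code path selecting that block size likely needs the same guard, otherwise it would refer to an instantiation that was never compiled.

```cpp
#include <cstddef>

// Assumption: the build normally supplies BNB_WARP_SIZE (32 or 64).
// Default to 32 here only so the sketch is self-contained.
#ifndef BNB_WARP_SIZE
#define BNB_WARP_SIZE 32
#endif

// Hypothetical stand-in for a kernel template parameterized by block size.
// It only computes how many blocks cover n elements.
template <typename T, int BLOCK_SIZE>
std::size_t count_blocks(std::size_t n) {
    return (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
}

// Hypothetical instantiation macro in the style of MAKE_kQuantizeBlockwise.
#define MAKE_count_blocks(dtype, blocksize) \
    template std::size_t count_blocks<dtype, blocksize>(std::size_t);

MAKE_count_blocks(float, 128)
#if BNB_WARP_SIZE == 32
// Only emitted when the warp size is 32, mirroring the guards in the diff.
MAKE_count_blocks(float, 64)
#endif

// Hypothetical dispatcher: returns -1 for block sizes that were not compiled in.
long dispatch_count_blocks(int blocksize, std::size_t n) {
    switch (blocksize) {
    case 128:
        return static_cast<long>(count_blocks<float, 128>(n));
#if BNB_WARP_SIZE == 32
    case 64:
        return static_cast<long>(count_blocks<float, 64>(n));
#endif
    default:
        return -1;  // unsupported on this build
    }
}
```

Building the sketch with `-DBNB_WARP_SIZE=64` removes both the `64` instantiation and its `case`, so a request for block size 64 falls through to the unsupported branch instead of failing at link time.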
