
Commit c538ced

TimDettmers and claude committed
Guard all blocksize=64 quantize instantiations for warp size compat
The previous commit missed the float/NF4 blocksize=64 instantiation and all bnb_bfloat16 blocksize=64 instantiations. These use BLOCK_LOAD_WARP_TRANSPOSE with 32 threads (64/2), which requires block_dim >= warp_size. On CDNA (warp size 64), 32 threads is insufficient.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 4fae098 commit c538ced
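The thread-count arithmetic behind the message can be checked at compile time. This is a minimal sketch, not code from the commit: it assumes, as the "64/2" in the message implies, that a `MAKE_kQuantizeBlockwise(dtype, blocksize, n, ...)` instantiation launches `blocksize / n` threads, and it encodes the message's rule that BLOCK_LOAD_WARP_TRANSPOSE needs at least one full warp per block. The names `quantize_threads`, `warp_transpose_ok`, `kNvidiaWarp`, and `kCdnaWarp` are hypothetical.

```cpp
// Hypothetical helper: threads launched by a blockwise-quantize instantiation,
// assuming each thread handles num_per_th of the block_size elements.
constexpr int quantize_threads(int block_size, int num_per_th) {
    return block_size / num_per_th;
}

// Hypothetical check of the commit's stated rule: block_dim >= warp_size.
constexpr bool warp_transpose_ok(int block_size, int num_per_th, int warp_size) {
    return quantize_threads(block_size, num_per_th) >= warp_size;
}

// Warp widths from the commit message: 32 on NVIDIA, 64 on AMD CDNA.
constexpr int kNvidiaWarp = 32;
constexpr int kCdnaWarp = 64;

// blocksize=128 with 2 elements per thread -> 64 threads: valid on both.
static_assert(quantize_threads(128, 2) == 64, "128/2 should launch 64 threads");
static_assert(warp_transpose_ok(128, 2, kCdnaWarp), "64 threads fill one CDNA wavefront");

// blocksize=64 with 2 elements per thread -> 32 threads: only a 32-wide warp fits,
// which is why these instantiations are wrapped in #if BNB_WARP_SIZE == 32.
static_assert(quantize_threads(64, 2) == 32, "64/2 should launch 32 threads");
static_assert(warp_transpose_ok(64, 2, kNvidiaWarp), "32 threads fill one NVIDIA warp");
static_assert(!warp_transpose_ok(64, 2, kCdnaWarp), "32 threads are half a CDNA wavefront");
```

Every other size in the diff (4096/4, 2048/4, 1024/4, 512/2, 256/2, 128/2) yields at least 64 threads, so only the blocksize=64 lines need a guard.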

1 file changed, +8 -0 lines changed

csrc/kernels.cu

Lines changed: 8 additions & 0 deletions
@@ -2507,7 +2507,9 @@ MAKE_kQuantizeBlockwise(float, 1024, 4, 0, NF4)
 MAKE_kQuantizeBlockwise(float, 512, 2, 0, NF4)
 MAKE_kQuantizeBlockwise(float, 256, 2, 0, NF4)
 MAKE_kQuantizeBlockwise(float, 128, 2, 0, NF4)
+#if BNB_WARP_SIZE == 32
 MAKE_kQuantizeBlockwise(float, 64, 2, 0, NF4)
+#endif
 
 MAKE_kQuantizeBlockwise(bnb_bfloat16, 4096, 4, 0, General8bit)
 MAKE_kQuantizeBlockwise(bnb_bfloat16, 4096, 4, 1, General8bit)
@@ -2516,21 +2518,27 @@ MAKE_kQuantizeBlockwise(bnb_bfloat16, 1024, 4, 0, General8bit)
 MAKE_kQuantizeBlockwise(bnb_bfloat16, 512, 2, 0, General8bit)
 MAKE_kQuantizeBlockwise(bnb_bfloat16, 256, 2, 0, General8bit)
 MAKE_kQuantizeBlockwise(bnb_bfloat16, 128, 2, 0, General8bit)
+#if BNB_WARP_SIZE == 32
 MAKE_kQuantizeBlockwise(bnb_bfloat16, 64, 2, 0, General8bit)
+#endif
 MAKE_kQuantizeBlockwise(bnb_bfloat16, 4096, 4, 0, FP4)
 MAKE_kQuantizeBlockwise(bnb_bfloat16, 2048, 4, 0, FP4)
 MAKE_kQuantizeBlockwise(bnb_bfloat16, 1024, 4, 0, FP4)
 MAKE_kQuantizeBlockwise(bnb_bfloat16, 512, 2, 0, FP4)
 MAKE_kQuantizeBlockwise(bnb_bfloat16, 256, 2, 0, FP4)
 MAKE_kQuantizeBlockwise(bnb_bfloat16, 128, 2, 0, FP4)
+#if BNB_WARP_SIZE == 32
 MAKE_kQuantizeBlockwise(bnb_bfloat16, 64, 2, 0, FP4)
+#endif
 MAKE_kQuantizeBlockwise(bnb_bfloat16, 4096, 4, 0, NF4)
 MAKE_kQuantizeBlockwise(bnb_bfloat16, 2048, 4, 0, NF4)
 MAKE_kQuantizeBlockwise(bnb_bfloat16, 1024, 4, 0, NF4)
 MAKE_kQuantizeBlockwise(bnb_bfloat16, 512, 2, 0, NF4)
 MAKE_kQuantizeBlockwise(bnb_bfloat16, 256, 2, 0, NF4)
 MAKE_kQuantizeBlockwise(bnb_bfloat16, 128, 2, 0, NF4)
+#if BNB_WARP_SIZE == 32
 MAKE_kQuantizeBlockwise(bnb_bfloat16, 64, 2, 0, NF4)
+#endif
 
 // Template instantiations for blocksize=32 specialized kernel (4-bit only)
 #define MAKE_kQuantizeBlockwiseSmall(dtype, data_type_name) \
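The guarded lines all follow one pattern: a hand-written `#if BNB_WARP_SIZE == 32` around each blocksize=64 instantiation, which is how the previous commit could miss some of them. One possible complement, sketched here as a hypothetical and not taken from bitsandbytes, is to state the warp requirement as a `static_assert` next to the instantiation so the constraint is spelled out in code. `QuantizeBlockwiseConfig` and `kQuantizeConfigValid` are invented names, and the fallback `BNB_WARP_SIZE` value of 32 is an assumption for standalone compilation.

```cpp
// Assumption: the real build defines BNB_WARP_SIZE (32 on NVIDIA, 64 on CDNA);
// this fallback only lets the sketch compile on its own.
#ifndef BNB_WARP_SIZE
#define BNB_WARP_SIZE 32
#endif

// Hypothetical per-instantiation config: instantiating it with fewer threads
// than one warp stops compilation with a readable message.
template <int BLOCK_SIZE, int NUM_PER_TH, int WARP_SIZE = BNB_WARP_SIZE>
struct QuantizeBlockwiseConfig {
    static constexpr int kThreads = BLOCK_SIZE / NUM_PER_TH;
    static_assert(kThreads >= WARP_SIZE,
                  "BLOCK_LOAD_WARP_TRANSPOSE needs block_dim >= warp_size");
};

// Hypothetical trait: the same condition as a constexpr bool, usable in
// host-side checks without triggering the static_assert.
template <int BLOCK_SIZE, int NUM_PER_TH, int WARP_SIZE = BNB_WARP_SIZE>
constexpr bool kQuantizeConfigValid = (BLOCK_SIZE / NUM_PER_TH) >= WARP_SIZE;
```

The `static_assert` does not replace the `#if`: on a warp-64 build the blocksize=64 lines still have to be skipped, and the assert only makes a forgotten guard fail loudly at the instantiation site.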