Guard all blocksize=64 quantize instantiations for warp size compat
The previous commit missed the float/NF4 instantiations and all
bnb_bfloat16 instantiations at blocksize=64. These launch 32 threads
(64/2) and load with BLOCK_LOAD_WARP_TRANSPOSE, which requires
block_dim >= warp_size. On CDNA (warp size 64), 32 threads are
insufficient, so these instantiations must be guarded as well.
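
A minimal sketch of the constraint, not code from this repository: the
helper names threads_for_blocksize and warp_transpose_ok are
hypothetical, and the two-values-per-thread factor is an assumption
taken from the 64/2 = 32 figure above.

```cpp
// Hypothetical helper: threads launched per block when each thread
// handles two values (assumption based on the 64/2 = 32 case).
constexpr int threads_for_blocksize(int blocksize, int values_per_thread = 2) {
    return blocksize / values_per_thread;
}

// Hypothetical check: the condition stated in the message,
// block_dim >= warp_size, for a given blocksize and warp size.
constexpr bool warp_transpose_ok(int blocksize, int warp_size) {
    return threads_for_blocksize(blocksize) >= warp_size;
}

// Warp size 32: 32 threads cover one full warp, so blocksize=64 is fine.
static_assert(warp_transpose_ok(64, 32), "blocksize=64 fits a 32-wide warp");

// Warp size 64 (CDNA): 32 threads are half a warp, so it must be guarded.
static_assert(!warp_transpose_ok(64, 64), "blocksize=64 needs a guard on a 64-wide warp");
```

Under the same assumption, blocksize=128 gives 64 threads and would
pass the check for either warp size.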
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>