Use conditional load/store algorithm for warp size compatibility
BLOCK_LOAD_WARP_TRANSPOSE requires threads >= warp_size. On CDNA
(warp size 64), kQuantizeBlockwise with BLOCK_SIZE=64 uses only
32 threads, so the requirement is not met. Fall back to
BLOCK_LOAD_DIRECT / BLOCK_STORE_DIRECT when threads <
BNB_WARP_SIZE. This avoids rocprim compilation errors while
keeping WARP_TRANSPOSE for larger block sizes.
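The selection rule described above can be sketched at compile time roughly as follows. This is only an illustration, not the code in this commit: `LoadAlgo`, `kWarpSize`, `select_load_algo`, and `BlockIoPolicy` are hypothetical names standing in for the real block load/store algorithm enums and for BNB_WARP_SIZE, and the warp size of 64 is assumed to model a CDNA build.

```cpp
// Hypothetical stand-in for the block load/store algorithm choices
// (DIRECT vs. WARP_TRANSPOSE); not the real CUB/hipCUB enum.
enum class LoadAlgo { Direct, WarpTranspose };

// Assumed warp width for a CDNA build; plays the role of BNB_WARP_SIZE.
constexpr int kWarpSize = 64;

// Pick WARP_TRANSPOSE only when the block has at least one full warp
// of threads; otherwise fall back to the DIRECT variant.
constexpr LoadAlgo select_load_algo(int threads, int warp_size) {
    return threads >= warp_size ? LoadAlgo::WarpTranspose : LoadAlgo::Direct;
}

// Hypothetical policy type: a kernel templated on its thread count could
// read the chosen algorithm from here when declaring its load/store types.
template <int THREADS>
struct BlockIoPolicy {
    static constexpr LoadAlgo algo = select_load_algo(THREADS, kWarpSize);
};

// 32 threads (the BLOCK_SIZE=64 case from the message) falls back to DIRECT;
// a block with a full warp keeps WARP_TRANSPOSE.
static_assert(BlockIoPolicy<32>::algo == LoadAlgo::Direct, "fallback expected");
static_assert(BlockIoPolicy<64>::algo == LoadAlgo::WarpTranspose, "transpose expected");
```

Because the choice is a constant expression, the unsupported WARP_TRANSPOSE instantiation is never requested for the 32-thread case, which is the property the fallback relies on.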
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>