feat: Add M-based dispatch for NVFP4 GEMM (hand-written for M<64)
Dispatches gemm_nvfp4 between:
- M < 64: hand-written kernel (mma.sync, auto split-K, BF16 output)
- M >= 64: CUTLASS SM_120 GEMM (BF16 output)
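
The dispatch rule above can be sketched as a small selection function. This is an illustration only: `select_nvfp4_gemm_path`, `M_THRESHOLD`, and the returned path labels are hypothetical names, not identifiers from this commit; only the threshold of 64 comes from the message.

```python
# Hypothetical sketch of the M-based dispatch; names are illustrative only.
M_THRESHOLD = 64  # from the commit message: the hand-written path covers M < 64


def select_nvfp4_gemm_path(m: int) -> str:
    """Return a label for the GEMM path that would handle a problem with M rows."""
    if m <= 0:
        raise ValueError("M must be positive")
    # Small M: hand-written kernel (mma.sync, auto split-K, BF16 output).
    if m < M_THRESHOLD:
        return "handwritten"
    # M >= 64: CUTLASS SM_120 GEMM (BF16 output).
    return "cutlass_sm120"
```

Note that M = 64 itself falls on the CUTLASS side of the boundary.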
The hand-written kernel uses flat row-major scales and doesn't fold
tensor scales into the epilogue, so they're applied after the GEMM.
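
Why applying the tensor scales after the GEMM gives the same result can be shown with a toy example. It assumes each tensor scale is a single scalar per operand, which this message does not state explicitly; `matmul` and `apply_tensor_scales` are hypothetical helpers, and plain Python floats stand in for the dequantized FP4 values.

```python
# Toy illustration, assuming one scalar tensor scale per operand (an assumption).
def matmul(a, b):
    """Plain row-major matrix multiply on nested lists."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]


def apply_tensor_scales(c, scale_a, scale_b):
    """Multiply the GEMM output by the product of the two per-tensor scales."""
    s = scale_a * scale_b
    return [[s * v for v in row] for row in c]


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[0.5, -1.0], [2.0, 0.25]]
scale_a, scale_b = 0.5, 4.0

# Scales applied after the GEMM, as the hand-written path is described to do.
post_scaled = apply_tensor_scales(matmul(a, b), scale_a, scale_b)

# Scales applied to the operands before the GEMM, for comparison.
pre_scaled = matmul([[scale_a * v for v in row] for row in a],
                    [[scale_b * v for v in row] for row in b])
```

Because scalar factors commute with the matrix product, both orderings agree for these inputs; with real low-precision arithmetic the two can differ by rounding.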
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>