feat: Add LinearNVFP4 module for Blackwell GPU inference
Implements LinearNVFP4, an nn.Linear subclass that quantizes its weights
to NVFP4 on the first forward pass and uses the block-scaled MMA for inference.
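For readers unfamiliar with the format, here is a rough sketch of what block-scaled FP4 quantization does to one weight row. It is not the code in this commit: it is plain Python with no PyTorch dependency, the names quantize_block, quantize_row and dequantize_row are hypothetical, and it assumes the usual NVFP4 layout of E2M1 elements in 16-value blocks. The block scale is kept as a Python float here, whereas the real format stores it in FP8 and adds a per-tensor scale, and rounding ties are resolved naively.

```python
# Illustrative only: a simplified model of block-scaled FP4 quantization.
E2M1_VALUES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)  # representable magnitudes
BLOCK = 16  # assumed number of elements sharing one scale


def quantize_block(block):
    """Return (scale, codes) where every code is a signed E2M1 value."""
    amax = max(abs(x) for x in block)
    # Map the largest magnitude in the block onto 6.0, the largest E2M1 value.
    scale = amax / 6.0 if amax > 0 else 1.0
    codes = []
    for x in block:
        target = abs(x) / scale
        mag = min(E2M1_VALUES, key=lambda v: abs(target - v))  # nearest value
        codes.append(-mag if x < 0 else mag)
    return scale, codes


def quantize_row(row):
    """Split a row into BLOCK-sized chunks and quantize each one."""
    assert len(row) % BLOCK == 0, "row length must be a multiple of BLOCK"
    return [quantize_block(row[i:i + BLOCK]) for i in range(0, len(row), BLOCK)]


def dequantize_row(blocks):
    """Reconstruct approximate values from (scale, codes) pairs."""
    return [scale * code for scale, codes in blocks for code in codes]


if __name__ == "__main__":
    row = [float(i - 8) for i in range(16)]
    restored = dequantize_row(quantize_row(row))
    print(max(abs(a - b) for a, b in zip(row, restored)))
```

Because the widest gap between adjacent E2M1 values is 2 (from 4 to 6), the per-element error in this sketch is bounded by one block scale.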
Features:
- Lazy weight quantization (on first forward)
- Optional Hadamard rotation (rotate=True)
- Activation quantization in the forward pass
- NVFP4 GEMM via hardware MMA instruction
- Automatic input reshape for batched inputs
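As background on the rotate=True option, the sketch below shows why a Hadamard rotation is compatible with a linear layer. It is an assumption-laden illustration, not this commit's implementation: the commit message does not say which Hadamard size is used or where the rotation is applied, and the names hadamard and dot are hypothetical. The idea is that a normalized Hadamard transform is orthogonal, so rotating both the activations and the weights leaves their dot product unchanged while spreading outliers across the vector before quantization.

```python
# Illustrative only: normalized fast Walsh-Hadamard transform in plain Python.
import math


def hadamard(vec):
    """Apply the orthonormal Hadamard transform to a power-of-two-length list."""
    n = len(vec)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    out = list(vec)
    h = 1
    while h < n:
        # Butterfly step: combine elements that are h apart.
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    norm = 1.0 / math.sqrt(n)  # makes the transform orthonormal
    return [v * norm for v in out]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    w = [0.5, -1.0, 2.0, 0.25]
    # The two printed values agree up to floating-point rounding.
    print(dot(x, w), dot(hadamard(x), hadamard(w)))
    # A single outlier is spread evenly over the rotated vector.
    print(hadamard([8.0, 0.0, 0.0, 0.0]))
```

A flatter distribution of magnitudes within each block tends to waste less of the 4-bit range on a single large value, which is the usual motivation for rotating before quantizing.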
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>