Commit 7f55f7c
bench: Add LinearNVFP4 end-to-end benchmarks
LinearNVFP4 vs FP16 nn.Linear on RTX PRO 6000:
bs=1-128, hidden=4096, shapes include FFN (11008).
LinearNVFP4 is ~10x slower than cuBLAS FP16, with 3.6x memory savings.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 1e2dc09 commit 7f55f7c
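The commit message does not say how the 3.6x memory figure was measured. One plausible reading, sketched below, is the ratio of weight storage alone. It assumes the commonly documented NVFP4 layout: 4-bit values with one 8-bit (FP8) scale per block of 16 elements, and a single per-tensor scale that is ignored as negligible. The function names and the two example shapes are illustrative, not taken from the benchmark.

```python
# Assumed NVFP4 layout (not confirmed by the commit): 4-bit elements,
# one 8-bit scale shared by each block of 16 elements.
FP16_BITS = 16
FP4_BITS = 4
SCALE_BITS = 8
BLOCK_SIZE = 16


def nvfp4_bits_per_weight() -> float:
    """Effective storage per weight: the 4-bit value plus its share of the block scale."""
    return FP4_BITS + SCALE_BITS / BLOCK_SIZE


def weight_bytes(in_features: int, out_features: int, bits_per_weight: float) -> float:
    """Bytes needed to store an in_features x out_features weight matrix."""
    return in_features * out_features * bits_per_weight / 8


def savings_ratio() -> float:
    """FP16 weight storage divided by NVFP4 weight storage."""
    return FP16_BITS / nvfp4_bits_per_weight()


if __name__ == "__main__":
    # Illustrative shapes built from the hidden (4096) and FFN (11008) sizes.
    for in_f, out_f in [(4096, 4096), (4096, 11008)]:
        fp16 = weight_bytes(in_f, out_f, FP16_BITS)
        nvfp4 = weight_bytes(in_f, out_f, nvfp4_bits_per_weight())
        print(f"{in_f}x{out_f}: FP16 {fp16 / 2**20:.1f} MiB, "
              f"NVFP4 {nvfp4 / 2**20:.1f} MiB, ratio {fp16 / nvfp4:.2f}x")
```

Under these assumptions each weight costs 4.5 bits, so the ratio is 16 / 4.5, which rounds to the 3.6x quoted above. Activations, biases, and allocator overhead would shift a measured number away from this estimate.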
1 file changed: 15 additions, 0 deletions (lines 75-89 added after unchanged lines 72-74).
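The contents of the added lines are not shown on this page. As a rough illustration of the kind of comparison the commit message describes, here is a minimal timing harness. It is a sketch, not the committed code: `time_ms`, `slowdown`, the batch-size grid, and the shape list are all hypothetical, since the message only gives the range bs=1-128, hidden=4096, and the FFN size 11008.

```python
import statistics
import time
from typing import Callable, Optional

# Hypothetical sweep: the commit states only the range and the two sizes.
BATCH_SIZES = [1, 2, 4, 8, 16, 32, 64, 128]
SHAPES = [(4096, 4096), (4096, 11008), (11008, 4096)]  # (in_features, out_features)


def time_ms(fn: Callable[[], object],
            warmup: int = 10,
            iters: int = 50,
            sync: Optional[Callable[[], None]] = None) -> float:
    """Return the median wall-clock time of fn() in milliseconds.

    sync, if given, is called before and after each timed call so that
    asynchronous work (for example queued GPU kernels) is included.
    """
    for _ in range(warmup):
        fn()  # untimed warm-up calls
    samples = []
    for _ in range(iters):
        if sync is not None:
            sync()
        start = time.perf_counter()
        fn()
        if sync is not None:
            sync()
        samples.append((time.perf_counter() - start) * 1e3)
    return statistics.median(samples)


def slowdown(candidate_ms: float, baseline_ms: float) -> float:
    """How many times slower the candidate is than the baseline."""
    return candidate_ms / baseline_ms
```

With PyTorch on a CUDA device, one would pass `sync=torch.cuda.synchronize` and time `lambda: layer(x)` once for an FP16 `torch.nn.Linear` baseline and once for the `LinearNVFP4` layer, for each batch size and shape in the grid. The median is used here because it is less sensitive to occasional slow iterations than the mean.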