Commit 7dcb79f

TimDettmers and claude committed
Add comprehensive CUDA graph benchmark: NVFP4 vs BF16
Five benchmark modes: BF16 cuBLAS (events + graph), NVFP4 eager pipeline, NVFP4 GEMM-only graph, and full NVFP4 pipeline graph. Tests GLM-4.7 gate_up and down projections across token counts (8-512) and skewed distributions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 9f0c6c3 commit 7dcb79f

1 file changed: +412 -146 lines changed
