Commit 7dcb79f
Add comprehensive CUDA graph benchmark: NVFP4 vs BF16
Five benchmark modes: BF16 cuBLAS (events + graph), NVFP4 eager pipeline,
NVFP4 GEMM-only graph, and full NVFP4 pipeline graph. Tests GLM-4.7
gate_up and down projections across token counts (8-512) and skewed
distributions.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

1 parent 9f0c6c3 commit 7dcb79f
1 file changed, 412 insertions(+), 146 deletions(-)