Commit 240d9af
feat: Add _raw wrappers for hand-written NVFP4 GEMM kernel
Exposes _gemm_nvfp4_hw_raw and _gemm_nvfp4_hw_splitk_raw for
CUDA-graph-safe benchmarking. These are zero-allocation wrappers
around cgemm_nvfp4 and cgemm_nvfp4_splitk that retrieve the
stream at call time (needed for graph capture).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>1 parent c8e31ed commit 240d9af
1 file changed
+60
-0
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1046 | 1046 | | |
1047 | 1047 | | |
1048 | 1048 | | |
| 1049 | + | |
| 1050 | + | |
| 1051 | + | |
| 1052 | + | |
| 1053 | + | |
| 1054 | + | |
| 1055 | + | |
| 1056 | + | |
| 1057 | + | |
| 1058 | + | |
| 1059 | + | |
| 1060 | + | |
| 1061 | + | |
| 1062 | + | |
| 1063 | + | |
| 1064 | + | |
| 1065 | + | |
| 1066 | + | |
| 1067 | + | |
| 1068 | + | |
| 1069 | + | |
| 1070 | + | |
| 1071 | + | |
| 1072 | + | |
| 1073 | + | |
| 1074 | + | |
| 1075 | + | |
| 1076 | + | |
| 1077 | + | |
| 1078 | + | |
| 1079 | + | |
| 1080 | + | |
| 1081 | + | |
| 1082 | + | |
| 1083 | + | |
| 1084 | + | |
| 1085 | + | |
| 1086 | + | |
| 1087 | + | |
| 1088 | + | |
| 1089 | + | |
| 1090 | + | |
| 1091 | + | |
| 1092 | + | |
| 1093 | + | |
| 1094 | + | |
| 1095 | + | |
| 1096 | + | |
| 1097 | + | |
| 1098 | + | |
| 1099 | + | |
| 1100 | + | |
| 1101 | + | |
| 1102 | + | |
| 1103 | + | |
| 1104 | + | |
| 1105 | + | |
| 1106 | + | |
| 1107 | + | |
| 1108 | + | |
1049 | 1109 | | |
1050 | 1110 | | |
1051 | 1111 | | |
| |||
0 commit comments