[bench] wvSplitK: per-kernel timing inside captured CUDA graph

c52ff9e
Draft

[bench] wvSplitK skinny GEMM: capture timed iters into a CUDA graph #928
