perf: Add _raw functions for CUDA-graph-safe kernel calls #2379
| Job | Run time |
|---|---|
| | 1m 39s |
| | 20s |
| | 14s |
| | 3m 28s |
| | 3m 23s |
| | 46s |
| | 6m 31s |
| | 14s |
| | 3m 13s |
| | 3m 28s |
| | 3m 19s |
| | 3m 31s |
| | 4m 52s |
| | 4m 39s |
| | 5m 5s |
| | 3m 15s |
| | 4m 14s |
| | 3m 31s |
| | 4m 38s |
| | 4m 41s |
| | 3m 21s |
| | 3m 33s |
| | 3m 47s |
| | 4m 21s |
| | 3m 18s |
| | 4m 43s |
| | 4m 27s |
| | 6m 23s |
| | 5m 55s |
| | 5m 34s |
| | 3m 21s |
| | 4m 51s |
| | 4m 8s |
| | 3m 55s |
| | 5m 30s |
| | 5m 41s |
| | 6m 12s |
| | 5m 57s |
| | 6m 23s |
| | 7m 13s |
| | 4m 16s |
| | 6m 1s |
| | 4m 56s |
| | 4m 23s |
| | 4m 8s |
| | 0s |
| | 0s |
| | 0s |
| | 0s |
| Total | 3h 7m 18s |