Add fused GatedDeltaNet decode Triton kernel #248
| Job | Run time |
|---|---|
| | 10m 48s |
| | 31m 25s |
| | 32m 29s |
| | 12m 12s |
| | 9m 12s |
| | 31m 22s |
| | 28m 30s |
| | 10m 45s |
| | 10m 17s |
| | 10m 51s |
| | 11m 33s |
| | 10m 18s |
| | 11m 0s |
| | 10m 10s |
| | 11m 2s |
| | 11m 29s |
| | 10m 40s |
| | 10m 57s |
| | 10m 25s |
| | 13m 18s |
| | 10m 55s |
| Total | 5h 9m 38s |