Add CUDA graph capture/replay for qwen 3.5 moe decode method #226
| Job | Run time |
|---|---|
| | 37m 32s |
| | 12m 9s |
| | 36m 5s |
| | 8m 46s |
| | 33m 18s |
| | 33m 48s |
| | 9m 10s |
| | 12m 59s |
| | 10m 30s |
| | 10m 33s |
| | 9m 52s |
| | 10m 33s |
| | 10m 54s |
| | 10m 39s |
| | 9m 45s |
| | 10m 19s |
| | 10m 26s |
| | 10m 9s |
| | 9m 53s |
| | 10m 29s |
| | 11m 5s |
| Total | 5h 18m 54s |