Commit 209709d
bench: expand kk_validation_experiments to exp1_shuffler + exp4_tp
Production-mode (no-nsys) A/B on exp2_cutlass showed ~0 ms/iter delta
because the fast cutlass forward (~32ms) fully overlaps the 1.5ms KK
GIL bubble. Adding two configs where KK is a larger fraction of the
critical path so the C++ KK saving is visible end-to-end:
- exp1_shuffler: default triton kernel — longer, less-aggressive forward
- exp4_tp: --tp_size 2 halves per-rank forward, raising KK fraction
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent de2e302 commit 209709d
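The reasoning in the commit message can be sketched with a little arithmetic. This is a minimal illustration, not the benchmark's code: the 32 ms forward and 1.5 ms KK bubble come from the message, while the function names, the exact halving under --tp_size 2, and the simple "exposed = bubble minus overlap window" model are assumptions made here for illustration.

```python
# Hypothetical back-of-the-envelope model; names and the overlap model are
# assumptions, only the 32 ms and 1.5 ms figures come from the commit message.

def kk_fraction(kk_ms: float, forward_ms: float) -> float:
    """KK bubble duration as a fraction of the per-rank forward time."""
    return kk_ms / forward_ms


def exposed_bubble_ms(kk_ms: float, overlap_ms: float) -> float:
    """Part of the KK bubble not hidden behind concurrently running GPU work."""
    return max(0.0, kk_ms - overlap_ms)


KK_MS = 1.5               # KK GIL bubble (from the commit message)
CUTLASS_FORWARD_MS = 32.0  # exp2_cutlass forward (from the commit message)

# exp2_cutlass: the forward can cover the whole bubble, so nothing is exposed
# and removing the bubble cannot change ms/iter.
print(exposed_bubble_ms(KK_MS, CUTLASS_FORWARD_MS))

# exp4_tp: assuming --tp_size 2 exactly halves the per-rank forward,
# the same bubble is twice as large relative to the forward.
print(kk_fraction(KK_MS, CUTLASS_FORWARD_MS))
print(kk_fraction(KK_MS, CUTLASS_FORWARD_MS / 2))
```

Under this model a saving only shows up end-to-end once the bubble exceeds whatever work is available to hide it, which is why the new configs aim to raise the KK share of the critical path.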
1 file changed
Lines changed: 5 additions & 10 deletions