Commit f047447
bench(moe): R37 speed A/B — TEMP=2.0 auto-default is free
Warm-cache 3-run A/B on Qwen3.6-35B IQ4_XS (-n 50 -T 0):
TEMP=2.0 (auto): 3.5 t/s
TEMP=1.0 (forced off): 3.0 t/s
TEMP=2.0 (repeat): 3.1 t/s
Within noise: the two TEMP=2.0 runs differ by 0.4 t/s, while TEMP=1.0
sits only 0.1 t/s below the TEMP=2.0 repeat. Confirms the f0e51ab
auto-default flip has no measurable speed cost; the softmax temp divide
is one float op per active expert per layer per token. Pure quality win
for the 117-tok cliff, no cost.
Option C of the R36 follow-up plan complete.
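
For illustration only, here is a minimal Python sketch of why the temperature costs one divide per active expert. It assumes the temperature is applied to the selected experts' router logits just before the softmax; the repository's actual routing code is not shown in this commit, and `route`, `logits`, `top_k`, and `temp` are hypothetical names.

```python
import math

def route(logits, top_k, temp=1.0):
    """Select the top_k experts and return (expert_index, weight) pairs.

    Hypothetical sketch: weights are a softmax over the selected logits,
    each divided by temp first. This is an assumption about where the
    temperature applies, not the project's actual code.
    """
    # Indices of the top_k largest router logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # The only extra work relative to temp=1.0: one divide per active expert.
    scaled = [logits[i] / temp for i in top]
    # Numerically stable softmax over the scaled logits.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [(i, e / z) for i, e in zip(top, exps)]

example_logits = [2.0, 0.5, 1.0, -1.0]
sharp = route(example_logits, top_k=2, temp=1.0)
flat = route(example_logits, top_k=2, temp=2.0)
```

With `temp=2.0` the same experts are selected but their weights are flatter. The added arithmetic is `top_k` divides per layer per token, which is negligible next to the expert matrix multiplies.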
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 3461509 commit f047447
1 file changed
Lines changed: 14 additions & 0 deletions
[Diff body not preserved: 14 lines added at new file lines 76-89.]