Commit f047447

unamedkr and claude committed
bench(moe): R37 speed A/B — TEMP=2.0 auto-default is free
Warm-cache 3-run A/B on Qwen3.6-35B IQ4_XS (-n 50 -T 0):

  TEMP=2.0 (auto):       3.5 t/s
  TEMP=1.0 (forced off): 3.0 t/s
  TEMP=2.0 (repeat):     3.1 t/s

Within noise. Confirms f0e51ab auto-default flip has zero measurable speed cost: the softmax temp divide is one float op per active expert per layer per token. Pure quality win for the 117-tok cliff, no cost.

Option C of the R36 follow-up plan complete.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
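To make the "one float op per active expert" claim concrete, here is a minimal Python sketch of temperature-scaled gating. It is not the code from this repository: the function name `moe_gate_weights`, its parameters, and the choice to apply the temperature to the selected experts' logits before the softmax are all assumptions made for illustration.

```python
import math

def moe_gate_weights(router_logits, top_k, temp):
    """Hypothetical MoE router: softmax over the top_k logits at temperature temp."""
    # Indices of the top_k largest logits (the "active" experts for this token).
    active = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)[:top_k]
    # The temperature step: exactly one division per active expert.
    scaled = [router_logits[i] / temp for i in active]
    # Numerically stable softmax over the scaled logits.
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return {i: e / total for i, e in zip(active, exps)}

# Illustrative logits only, not taken from any real model.
logits = [2.0, 0.5, -1.0, 1.0]
w_t1 = moe_gate_weights(logits, top_k=2, temp=1.0)
w_t2 = moe_gate_weights(logits, top_k=2, temp=2.0)
print(w_t1)
print(w_t2)
```

Under these assumptions, raising the temperature from 1.0 to 2.0 flattens the gate weights across the same two experts, and the only extra work is the per-expert division, which is consistent with the negligible cost reported above.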
1 parent 3461509 commit f047447

1 file changed

Lines changed: 14 additions & 0 deletions

bench/results/2026-04-22_moe_temp_cliff_break.md

@@ -73,6 +73,20 @@ drift is a MoE-specific pathology, not DeltaNet's fault.
 So: T=2.0 **breaks the hard 117-tok cliff** and recovers ~50 additional
 coherent tokens. Full essay-length generation still needs more work.
 
+## Speed (R37 A/B, warm cache, auto-serial -j 1)
+
+Qwen3.6-35B IQ4_XS, -n 50 -T 0, warm cache 3-run:
+
+| Config | decode t/s |
+|---|---:|
+| TEMP=2.0 (auto-default) | 3.5 |
+| TEMP=1.0 (`TQ_NO_MOE_TEMP_AUTO=1`) | 3.0 |
+| TEMP=2.0 (repeat) | 3.1 |
+
+Within measurement noise (3.0-3.5 range). The softmax temperature divide
+adds one float division per active expert per layer per token = trivial.
+**Auto-default flip has zero speed cost.**
+
 ## Safety
 
 - `"The capital of France is"` → `"Paris."` (correct) at T=2.0
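The "within measurement noise" reading of the speed table can be sanity-checked with a few lines of Python. This is a rough sketch, not a statistical test: it uses only the three throughput figures from the table, and treating the spread between the two TEMP=2.0 runs as the noise estimate is an assumption of this example (the names `by_config`, `repeat_spread`, and `gap` are likewise illustrative).

```python
# Decode throughput in tokens/s, copied from the speed table in this commit.
runs = [
    ("TEMP=2.0", 3.5),
    ("TEMP=1.0", 3.0),
    ("TEMP=2.0", 3.1),
]

def by_config(measurements):
    """Group throughput values by configuration label."""
    grouped = {}
    for config, tps in measurements:
        grouped.setdefault(config, []).append(tps)
    return grouped

grouped = by_config(runs)
temp2 = grouped["TEMP=2.0"]
temp1 = grouped["TEMP=1.0"]

# Run-to-run spread of the only repeated config: a crude noise estimate.
repeat_spread = max(temp2) - min(temp2)

# Gap between the per-config mean throughputs.
gap = sum(temp2) / len(temp2) - sum(temp1) / len(temp1)

# The configs are indistinguishable if the gap is no larger than the spread.
within_noise = abs(gap) <= repeat_spread
print(f"repeat spread: {repeat_spread:.2f} t/s")
print(f"mean gap:      {gap:.2f} t/s")
print(f"within noise:  {within_noise}")
```

With a single repeat per configuration this only shows that the difference between the two settings is no larger than the variation between identical runs. More repeats would be needed to bound the cost more tightly.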
