
Commit 0552242 (parent aaa103b): sharpen benchmarks

1 file changed: README.md (37 additions, 55 deletions)

No GPU? TIDE works in pure PyTorch (CPU fallback, no CUDA kernels needed).

All benchmarks on **NVIDIA A100-SXM4-40GB**, bf16, 2000 WikiText calibration samples.
16 prompts (8 reasoning/math + 8 general knowledge).

### Prefill: 100% Exit Rate

Every token finds an early exit point. On reasoning + general prompts:

```
Model                   Layers  Exit Rate  Early Exits (before last checkpoint)
======================  ======  =========  =====================================
DeepSeek R1 Distill 8B  32      100%       5% exit at Layer 11 (1/3 depth)
Qwen3 8B                36      100%       10% exit across L11 + L23 (1/3-2/3)
```
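
The exit decision behind these numbers can be illustrated with a per-checkpoint convergence test. A minimal sketch, assuming (this is an illustration, not TIDE's actual code) that a token exits once the cosine similarity between its hidden states at two consecutive checkpoint layers reaches the threshold:

```python
import torch

def should_exit(h_prev: torch.Tensor, h_curr: torch.Tensor,
                threshold: float) -> torch.Tensor:
    """Per-token early-exit test: cosine similarity between the hidden
    states at two consecutive checkpoint layers, compared to a threshold.
    Returns a boolean mask over the token dimension."""
    sim = torch.nn.functional.cosine_similarity(h_prev, h_curr, dim=-1)
    return sim >= threshold

# Toy example: 4 tokens, hidden size 8. Token 0 is unchanged between the
# two checkpoints (similarity ~1.0), so it exits at any threshold <= 1.0.
h_prev = torch.randn(4, 8)
h_curr = h_prev.clone()
h_curr[1:] += torch.randn(3, 8)  # perturb tokens 1-3
mask = should_exit(h_prev, h_curr, threshold=0.85)
print(mask[0].item())  # True: token 0 always exits
```

A lower threshold admits more tokens at earlier checkpoints, which is why the 0.50 setting shows exits spread across more layers than 0.85.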

### Latency: Up to 7% Faster Prefill

Single reasoning prompt, 20 runs averaged on A100:

```
Model                   Baseline  TIDE     Speedup
======================  ========  =======  =======
DeepSeek R1 Distill 8B  39.08ms   36.26ms  -7.2%
Qwen3 8B (36 layers)    46.82ms   44.14ms  -5.7%
```
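
A minimal sketch of the kind of timing harness behind these averages (a hypothetical helper, not the repo's benchmark script), assuming a callable that runs one prefill pass; on GPU you would also call `torch.cuda.synchronize()` around the timed region:

```python
import time

def mean_latency_ms(run_once, n_runs: int = 20, n_warmup: int = 3) -> float:
    """Average wall-clock latency of `run_once` in milliseconds,
    discarding warmup iterations (kernel compilation, cache effects)."""
    for _ in range(n_warmup):
        run_once()
    start = time.perf_counter()
    for _ in range(n_runs):
        run_once()
    return (time.perf_counter() - start) / n_runs * 1e3

# Usage with a stand-in workload:
latency = mean_latency_ms(lambda: sum(range(10_000)))
print(f"{latency:.3f}ms")
```

Averaging over repeated runs after warmup is what makes sub-10% deltas like -5.7% distinguishable from timer noise.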

### Throughput: Up to 8% More Tokens/sec

```
Model                   Batch  Baseline     TIDE         Gain
======================  =====  ===========  ===========  =====
DeepSeek R1 Distill 8B  1      973 tok/s    1,037 tok/s  +6.5%
Qwen3 8B                1      258 tok/s    271 tok/s    +5.0%
Qwen3 8B                8      1,781 tok/s  1,926 tok/s  +8.1%
```

### Decode: 99% of Reasoning Tokens Exit Early

DeepSeek R1 Distill 8B solving a math problem, 256 tokens, `temperature=0`:

```
Threshold  Decode Exit Rate  Unique Tokens  Quality
=========  ================  =============  =========================
1.0 (off)  0%                99             Correct solution
0.85       98%               95             Correct solution
0.70       99%               95             Correct solution (stable)
0.50       99.6%             95             Correct solution (stable)
```

**99% of decode tokens exit early** while the model still solves the math problem correctly. Output remains coherent with 95+ unique tokens.

### Convergence: 340K Tokens Analyzed

```
Model                   Layers  Tokens   Finding
======================  ======  =======  =====================
DeepSeek R1 Distill 8B  32      339,853  100% converge by L31
Qwen3 8B                36      314,530  100% converge by L35
GPT-2 (124M)            12      78,843   100% converge by L11
```

The penultimate checkpoint captures the full model output for every token: the last few layers contribute negligible change to the hidden-state representations.

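This kind of analysis can be reproduced offline from logged hidden states. A minimal sketch (hypothetical tensor layout, not the repo's analysis script), assuming the hidden states at each checkpoint layer were stacked into one tensor; it reports, per checkpoint, the fraction of tokens already within the similarity threshold of the final checkpoint:

```python
import torch

def convergence_by_checkpoint(hiddens: torch.Tensor,
                              threshold: float = 0.85) -> torch.Tensor:
    """hiddens: (n_checkpoints, n_tokens, d) hidden states per checkpoint,
    with the last checkpoint treated as the reference output.
    Returns the per-checkpoint fraction of tokens whose state is within
    `threshold` cosine similarity of the final checkpoint's state."""
    final = hiddens[-1]  # (n_tokens, d) reference states
    sim = torch.nn.functional.cosine_similarity(
        hiddens, final.unsqueeze(0), dim=-1)  # broadcast -> (n_checkpoints, n_tokens)
    return (sim >= threshold).float().mean(dim=1)

# Toy check: if every checkpoint already equals the final one,
# convergence is 100% at every checkpoint.
h = torch.randn(1, 5, 16).expand(3, 5, 16)
print(convergence_by_checkpoint(h))  # tensor([1., 1., 1.])
```

A "100% converge by L31" row corresponds to the entry for that checkpoint reaching 1.0 over the full token set.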
## Tuning the Threshold
