Commit c71f053

K3 Stage 2 evidence (run3): faithful forward + kernel-correct query layout
After porting the real qwen3_dflash.py forward (a single non-causal pass, with context KV built from fc(aux) + hidden_norm + per-layer kv) and fixing the draft query layout to match copy_and_expand_dflash_inputs_kernel (bonus token at C, drafts read from mask positions C+1..), acceptance moved 0.000 -> 0.003 (7 of 2349 drafted tokens accepted), with lossless_vs_ar=True across all four prompts.

This is still far from the reference of ~0.447 acceptance rate / ~7.7 acceptance length. The remaining gap is attributed to two things: the exact aux hidden-state tap semantics (which output_hidden_states indices map to the trained aux layers, plus the EAGLE-3 collection/ordering) and the shared embed/lm_head details. Both live in vLLM's base proposer aux-collection path (eagle.py / llm_base_proposer.py, not fully available), so closing the gap needs that source or a numerical reference.

The structural port, weights, and plumbing are validated (strict load, lossless AR).

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
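For readers wondering how output can stay lossless at a 0.003 acceptance rate, here is a minimal sketch of greedy draft verification. It assumes the run verifies drafts by exact match against the verifier's greedy token at each position; the commit does not show the verification code, and `verify_block` and its argument names are hypothetical.

```python
def verify_block(draft_tokens, verifier_tokens):
    """Greedy verification of one drafted block (illustrative sketch).

    Assumption: verifier_tokens[i] is the verifier's greedy choice at the
    position of draft_tokens[i], and verifier_tokens has one extra trailing
    entry (the token emitted when every draft is accepted).
    """
    accepted = 0
    for drafted, verified in zip(draft_tokens, verifier_tokens):
        if drafted != verified:
            break  # first mismatch ends the accepted prefix
        accepted += 1
    # Emit the accepted drafts, then the verifier's own token at the first
    # mismatch (or the trailing token if nothing mismatched).
    emitted = list(draft_tokens[:accepted]) + [verifier_tokens[accepted]]
    return accepted, emitted


# A block whose first draft is already wrong still emits one verifier token,
# so the output matches plain autoregressive decoding, just without speedup.
no_match = verify_block([4, 9, 9], [3, 8, 1, 2])
partial = verify_block([5, 6, 7], [5, 6, 9, 1])
print(no_match, partial)
```

Under this scheme a bad drafter costs throughput but never changes the emitted tokens, which is consistent with lossless_vs_ar=True alongside near-zero acceptance.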
1 parent 736584b commit c71f053

2 files changed

Lines changed: 249 additions & 0 deletions

Lines changed: 241 additions & 0 deletions
@@ -0,0 +1,241 @@
1+
{
2+
"schema_version": 1,
3+
"kind": "k3_dflash_specdecode_acceptance",
4+
"config": {
5+
"verifier_id": "google/gemma-4-26B-A4B-it",
6+
"drafter_id": "z-lab/gemma-4-26B-A4B-it-DFlash",
7+
"block_size": 16,
8+
"num_steps": 1,
9+
"max_new_tokens": 48,
10+
"n_prompts": 4,
11+
"aux_layer_ids": [
12+
2,
13+
7,
14+
12,
15+
18,
16+
23,
17+
28
18+
]
19+
},
20+
"aggregate": {
21+
"acceptance_rate": 0.002979991485738612,
22+
"acceptance_length": 1.0429447852760736,
23+
"total_accepted": 7,
24+
"total_drafted": 2349,
25+
"total_blocks": 163,
26+
"lossless_vs_ar": true,
27+
"reference_humaneval": {
28+
"acceptance_length": 7.7,
29+
"acceptance_rate": 0.447
30+
}
31+
},
32+
"per_prompt": [
33+
{
34+
"prompt": "Write a Python function that returns the n-th Fibonacci number.",
35+
"blocks": 45,
36+
"block_accepts": [
37+
0,
38+
0,
39+
0,
40+
1,
41+
0,
42+
0,
43+
0,
44+
0,
45+
0,
46+
0,
47+
0,
48+
0,
49+
0,
50+
0,
51+
0,
52+
0,
53+
0,
54+
0,
55+
0,
56+
0,
57+
0,
58+
0,
59+
1,
60+
0,
61+
0,
62+
0,
63+
0,
64+
0,
65+
0,
66+
0,
67+
0,
68+
0,
69+
0,
70+
0,
71+
0,
72+
0,
73+
0,
74+
0,
75+
0,
76+
1,
77+
0,
78+
0,
79+
0,
80+
0,
81+
0
82+
],
83+
"mean_accepted_per_block": 0.06666666666666667,
84+
"tokens_generated": 48,
85+
"verifier_forwards_spec": 45,
86+
"lossless_vs_ar": true,
87+
"decoded": "There are several ways to implement this depending on whether you prioritize readability, memory, or speed. Below are the three most common approaches.\n\n### 1. The Efficient Approach (Iterative)\nThis "
88+
},
89+
{
90+
"prompt": "Explain in two sentences why the sky is blue.",
91+
"blocks": 47,
92+
"block_accepts": [
93+
0,
94+
0,
95+
0,
96+
0,
97+
0,
98+
0,
99+
0,
100+
0,
101+
0,
102+
0,
103+
0,
104+
0,
105+
0,
106+
0,
107+
0,
108+
0,
109+
0,
110+
0,
111+
0,
112+
0,
113+
0,
114+
0,
115+
1,
116+
0,
117+
0,
118+
0,
119+
0,
120+
0,
121+
0,
122+
0,
123+
0,
124+
0,
125+
0,
126+
0,
127+
0,
128+
0,
129+
0,
130+
0,
131+
0,
132+
0,
133+
0,
134+
0,
135+
0,
136+
0,
137+
0,
138+
0,
139+
0
140+
],
141+
"mean_accepted_per_block": 0.02127659574468085,
142+
"tokens_generated": 48,
143+
"verifier_forwards_spec": 47,
144+
"lossless_vs_ar": true,
145+
"decoded": "The sky appears blue because of a phenomenon called Rayleigh scattering, where sunlight interacts with the gases and particles in Earth's atmosphere. As sunlight reaches the atmosphere, shorter blue w"
146+
},
147+
{
148+
"prompt": "List three prime numbers greater than 100.",
149+
"blocks": 37,
150+
"block_accepts": [
151+
0,
152+
0,
153+
0,
154+
0,
155+
0,
156+
1,
157+
2,
158+
0,
159+
0,
160+
0,
161+
0,
162+
0,
163+
0,
164+
0,
165+
0,
166+
0,
167+
0,
168+
0,
169+
0,
170+
0,
171+
0,
172+
0,
173+
0,
174+
0,
175+
0,
176+
0,
177+
0,
178+
0,
179+
0,
180+
0,
181+
0,
182+
0,
183+
0,
184+
0,
185+
0,
186+
0,
187+
0
188+
],
189+
"mean_accepted_per_block": 0.08108108108108109,
190+
"tokens_generated": 40,
191+
"verifier_forwards_spec": 37,
192+
"lossless_vs_ar": true,
193+
"decoded": "Here are three prime numbers greater than 100:\n\n1. **101**\n2. **103**\n3. **107**"
194+
},
195+
{
196+
"prompt": "Summarize the plot of Romeo and Juliet in one sentence.",
197+
"blocks": 34,
198+
"block_accepts": [
199+
0,
200+
0,
201+
0,
202+
0,
203+
0,
204+
0,
205+
0,
206+
0,
207+
0,
208+
0,
209+
0,
210+
0,
211+
0,
212+
0,
213+
0,
214+
0,
215+
0,
216+
0,
217+
0,
218+
0,
219+
0,
220+
0,
221+
0,
222+
0,
223+
0,
224+
0,
225+
0,
226+
0,
227+
0,
228+
0,
229+
0,
230+
0,
231+
0,
232+
0
233+
],
234+
"mean_accepted_per_block": 0.0,
235+
"tokens_generated": 34,
236+
"verifier_forwards_spec": 34,
237+
"lossless_vs_ar": true,
238+
"decoded": "Two star-crossed lovers from feuding noble families take their own lives in a tragic misunderstanding, ultimately transforming their deaths into a catalyst for peace between their households."
239+
}
240+
]
241+
}
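The aggregate fields above can be cross-checked against the per-prompt arrays. The sketch below is not the script that produced the file. It assumes acceptance_rate is total_accepted / total_drafted and acceptance_length is tokens emitted per verifier forward (accepted drafts plus one verifier token per block), which is consistent with the numbers reported. The sparse `nonzero` encoding and all variable names are illustrative.

```python
# Per-prompt accept counts copied from the JSON above, stored sparsely as
# {block_index: accepted_drafts}; every block not listed accepted 0 drafts.
per_prompt = [
    {"blocks": 45, "nonzero": {3: 1, 22: 1, 39: 1}},
    {"blocks": 47, "nonzero": {22: 1}},
    {"blocks": 37, "nonzero": {5: 1, 6: 2}},
    {"blocks": 34, "nonzero": {}},
]
# Taken directly from the "aggregate" object; not re-derived here.
TOTAL_DRAFTED = 2349

total_blocks = sum(p["blocks"] for p in per_prompt)
total_accepted = sum(sum(p["nonzero"].values()) for p in per_prompt)

# Assumed definition: fraction of drafted tokens the verifier accepted.
acceptance_rate = total_accepted / TOTAL_DRAFTED

# Assumed definition: each verifier forward emits its accepted drafts plus
# one token of its own, so the mean emitted length is 1 + accepted / blocks.
acceptance_length = (total_blocks + total_accepted) / total_blocks

print(f"blocks={total_blocks} accepted={total_accepted}")
print(f"acceptance_rate={acceptance_rate:.6f}")
print(f"acceptance_length={acceptance_length:.4f}")
```

The same accounting explains the per-prompt tokens_generated values: each equals blocks plus accepted drafts (for example 45 + 3 = 48 for the first prompt).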

results/research/logs/k3_dflash_specdecode_run3.log

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
+[k3-sd] loading verifier google/gemma-4-26B-A4B-it
+Loading weights: 100%|██████████| 1013/1013 [00:08<00:00, 114.37it/s]
+[k3-sd] loading drafter z-lab/gemma-4-26B-A4B-it-DFlash
+[k3-sd] prompt 0: blocks=45 mean_accept=0.07 accepts=[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0] lossless=True
+[k3-sd] prompt 1: blocks=47 mean_accept=0.02 accepts=[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] lossless=True
+[k3-sd] prompt 2: blocks=37 mean_accept=0.08 accepts=[0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] lossless=True
+[k3-sd] prompt 3: blocks=34 mean_accept=0.00 accepts=[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] lossless=True
+[k3-sd] AGGREGATE acceptance_rate=0.003 acceptance_length=1.04 lossless=True (ref ~0.447 / ~7.7) -> results/research/k3_dflash_specdecode_run3.json
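The per-prompt log lines follow a regular shape, so they can be parsed and sanity-checked when comparing runs. The sketch below is one possible way to do that, not tooling from this repository. The regular expression is inferred from the four prompt lines above, and `parse_prompt_line` is a hypothetical helper.

```python
import ast
import re

# Pattern inferred from the "[k3-sd] prompt N: ..." lines in the log above.
LINE_RE = re.compile(
    r"\[k3-sd\] prompt (?P<idx>\d+): blocks=(?P<blocks>\d+) "
    r"mean_accept=(?P<mean>[\d.]+) accepts=(?P<accepts>\[[^\]]*\]) "
    r"lossless=(?P<lossless>True|False)"
)


def parse_prompt_line(line):
    """Return a dict for one per-prompt log line, or None if it does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    return {
        "idx": int(match.group("idx")),
        "blocks": int(match.group("blocks")),
        "mean_accept": float(match.group("mean")),
        # literal_eval safely turns the printed Python list into a real list.
        "accepts": ast.literal_eval(match.group("accepts")),
        "lossless": match.group("lossless") == "True",
    }


# Rebuild the prompt 2 line from the log: 5 zeros, then 1, 2, then 30 zeros.
accepts_text = ", ".join(["0"] * 5 + ["1", "2"] + ["0"] * 30)
sample = (
    "[k3-sd] prompt 2: blocks=37 mean_accept=0.08 "
    f"accepts=[{accepts_text}] lossless=True"
)
record = parse_prompt_line(sample)
print(record["blocks"], sum(record["accepts"]), record["lossless"])
```

A quick consistency check is that `len(record["accepts"])` equals `record["blocks"]` and that the rounded mean of the list equals the logged mean_accept.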
