K3 Stage 2 evidence (run3): faithful forward + kernel-correct query layout

cursoragent · FluffyAIcode · cursoragent · commit c71f053548f3 · 2026-06-09T09:02:21.000Z
After porting the real qwen3_dflash.py forward (single non-causal pass,
context KV from fc(aux) + hidden_norm + per-layer kv) and fixing the draft
query layout per copy_and_expand_dflash_inputs_kernel (bonus at C, drafts
from mask positions C+1..), the acceptance rate moved from 0.000 to 0.003
(7 of 2349 drafted tokens accepted), with lossless_vs_ar=True on all 4
prompts.
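
As a sanity check on the numbers above, the aggregate metrics can be
recomputed from the per-block accept counts. This is a minimal sketch, not
the harness code: the two formulas (acceptance_rate = accepted / drafted,
acceptance_length = 1 + accepted / blocks) are inferred from the reported
values, and the names aggregate_acceptance and sparse are hypothetical.
total_drafted is taken from the results file as-is.

```python
def sparse(n_blocks, hits):
    """Build a block_accepts list of length n_blocks from {index: count}."""
    return [hits.get(i, 0) for i in range(n_blocks)]


def aggregate_acceptance(block_accepts_per_prompt, total_drafted):
    """Recompute aggregate metrics from per-block accepted-draft counts.

    Assumed definitions (inferred, not taken from the harness source):
      acceptance_rate   = accepted drafts / drafted tokens
      acceptance_length = 1 bonus token + mean accepted drafts per block
    """
    total_blocks = sum(len(b) for b in block_accepts_per_prompt)
    total_accepted = sum(sum(b) for b in block_accepts_per_prompt)
    return {
        "total_blocks": total_blocks,
        "total_accepted": total_accepted,
        "acceptance_rate": total_accepted / total_drafted,
        "acceptance_length": 1 + total_accepted / total_blocks,
    }


# Per-prompt accept counts from k3_dflash_specdecode_run3.json.
runs = [
    sparse(45, {3: 1, 22: 1, 39: 1}),
    sparse(47, {22: 1}),
    sparse(37, {5: 1, 6: 2}),
    sparse(34, {}),
]
metrics = aggregate_acceptance(runs, total_drafted=2349)
print(metrics)
```

Under the same assumption, tokens_generated for each prompt equals its
block count plus its accepted drafts (45 + 3 = 48 for prompt 0).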

This is still far from the HumanEval reference (acceptance rate ~0.447,
acceptance length ~7.7; this run: 0.003 and 1.04). The remaining gap is
the exact aux hidden-state tap semantics (which output_hidden_states
indices map to the trained aux layers, plus the EAGLE-3
collection/ordering) and the shared embed/lm_head details. Both live in
vLLM's base proposer aux-collection path (eagle.py /
llm_base_proposer.py, not fully available), so closing the gap needs that
source or a numerical reference. The structural port, weights, and
plumbing are validated (strict load, lossless AR).

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
diff --git a/results/research/k3_dflash_specdecode_run3.json b/results/research/k3_dflash_specdecode_run3.json
@@ -0,0 +1,241 @@
+{
+  "schema_version": 1,
+  "kind": "k3_dflash_specdecode_acceptance",
+  "config": {
+    "verifier_id": "google/gemma-4-26B-A4B-it",
+    "drafter_id": "z-lab/gemma-4-26B-A4B-it-DFlash",
+    "block_size": 16,
+    "num_steps": 1,
+    "max_new_tokens": 48,
+    "n_prompts": 4,
+    "aux_layer_ids": [
+      2,
+      7,
+      12,
+      18,
+      23,
+      28
+    ]
+  },
+  "aggregate": {
+    "acceptance_rate": 0.002979991485738612,
+    "acceptance_length": 1.0429447852760736,
+    "total_accepted": 7,
+    "total_drafted": 2349,
+    "total_blocks": 163,
+    "lossless_vs_ar": true,
+    "reference_humaneval": {
+      "acceptance_length": 7.7,
+      "acceptance_rate": 0.447
+    }
+  },
+  "per_prompt": [
+    {
+      "prompt": "Write a Python function that returns the n-th Fibonacci number.",
+      "blocks": 45,
+      "block_accepts": [
+        0,
+        0,
+        0,
+        1,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        1,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        1,
+        0,
+        0,
+        0,
+        0,
+        0
+      ],
+      "mean_accepted_per_block": 0.06666666666666667,
+      "tokens_generated": 48,
+      "verifier_forwards_spec": 45,
+      "lossless_vs_ar": true,
+      "decoded": "There are several ways to implement this depending on whether you prioritize readability, memory, or speed. Below are the three most common approaches.\n\n### 1. The Efficient Approach (Iterative)\nThis "
+    },
+    {
+      "prompt": "Explain in two sentences why the sky is blue.",
+      "blocks": 47,
+      "block_accepts": [
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        1,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0
+      ],
+      "mean_accepted_per_block": 0.02127659574468085,
+      "tokens_generated": 48,
+      "verifier_forwards_spec": 47,
+      "lossless_vs_ar": true,
+      "decoded": "The sky appears blue because of a phenomenon called Rayleigh scattering, where sunlight interacts with the gases and particles in Earth's atmosphere. As sunlight reaches the atmosphere, shorter blue w"
+    },
+    {
+      "prompt": "List three prime numbers greater than 100.",
+      "blocks": 37,
+      "block_accepts": [
+        0,
+        0,
+        0,
+        0,
+        0,
+        1,
+        2,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0
+      ],
+      "mean_accepted_per_block": 0.08108108108108109,
+      "tokens_generated": 40,
+      "verifier_forwards_spec": 37,
+      "lossless_vs_ar": true,
+      "decoded": "Here are three prime numbers greater than 100:\n\n1.  **101**\n2.  **103**\n3.  **107**"
+    },
+    {
+      "prompt": "Summarize the plot of Romeo and Juliet in one sentence.",
+      "blocks": 34,
+      "block_accepts": [
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0,
+        0
+      ],
+      "mean_accepted_per_block": 0.0,
+      "tokens_generated": 34,
+      "verifier_forwards_spec": 34,
+      "lossless_vs_ar": true,
+      "decoded": "Two star-crossed lovers from feuding noble families take their own lives in a tragic misunderstanding, ultimately transforming their deaths into a catalyst for peace between their households."
+    }
+  ]
+}
diff --git a/results/research/logs/k3_dflash_specdecode_run3.log b/results/research/logs/k3_dflash_specdecode_run3.log
@@ -0,0 +1,8 @@
+[k3-sd] loading verifier google/gemma-4-26B-A4B-it
+Loading weights: 100%|██████████| 1013/1013 [00:08<00:00, 114.37it/s]
+[k3-sd] loading drafter z-lab/gemma-4-26B-A4B-it-DFlash
+[k3-sd] prompt 0: blocks=45 mean_accept=0.07 accepts=[0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0] lossless=True
+[k3-sd] prompt 1: blocks=47 mean_accept=0.02 accepts=[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] lossless=True
+[k3-sd] prompt 2: blocks=37 mean_accept=0.08 accepts=[0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] lossless=True
+[k3-sd] prompt 3: blocks=34 mean_accept=0.00 accepts=[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0] lossless=True
+[k3-sd] AGGREGATE acceptance_rate=0.003 acceptance_length=1.04 lossless=True (ref ~0.447 / ~7.7)  -> results/research/k3_dflash_specdecode_run3.json