Commit 183a99c
mtp: copy correct row of t_h_pre_norm based on prior n_accepted

Real bug fix. Previously llama_mtp_relay_h copied the LAST row of
ctx_target's t_h_pre_norm into ctx_mtp's t_inp_h. That is only correct
when the verifier accepts ALL drafts in the previous round; on partial
acceptance, the row whose hidden state produced the next id_last is row
n_accepted, not the last row.

For a verify batch [sampled, d0, ..., d_{K-1}] at positions [p..p+K]:
- bonus = verifier's sample at row n_accepted (the rejected position, or
  the last row if all K drafts were accepted)
- next id_last lives at position p + n_accepted + 1
- MTP needs h at position p + n_accepted = ROW n_accepted of t_h_pre_norm
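
The row arithmetic above can be sketched as follows. This is a minimal
illustration, not the code in this commit: relay_choice, pick_relay_row and
last_row_index are hypothetical names, and the batch layout is the one
described above (row 0 is the sampled token, rows 1..K are the drafts).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: which row to relay and where it sits in the sequence.
struct relay_choice {
    int32_t src_row;      // row of t_h_pre_norm to copy into the MTP input
    int32_t h_pos;        // absolute position of that hidden state
    int32_t id_last_pos;  // absolute position of the next id_last (the bonus token)
};

// p          : position of the sampled token (row 0 of the verify batch)
// n_drafts   : K, the number of drafted tokens in the batch
// n_accepted : how many drafts the verifier accepted, in [0, K]
inline relay_choice pick_relay_row(int32_t p, int32_t n_drafts, int32_t n_accepted) {
    assert(n_drafts >= 1);
    assert(n_accepted >= 0 && n_accepted <= n_drafts);
    return { n_accepted, p + n_accepted, p + n_accepted + 1 };
}

// What the pre-fix code effectively used: always the last row of the batch.
inline int32_t last_row_index(int32_t n_drafts) {
    return n_drafts;  // rows are 0..K, so the last row is K
}
```

The two choices agree only when n_accepted == K; for any partial acceptance
pick_relay_row returns an earlier row than last_row_index.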

The bug was mostly hidden at K=1 in canonical paths (most rounds
full-accept, so row n_accepted = K = last row = correct), but it degraded
acceptance whenever a draft was rejected. At K>=2, partial-accept
dominates and MTP cascades on the wrong h, collapsing acceptance to ~30%.

Changes:
- llama_mtp_relay_h signature: int32_t n_rows → int32_t src_row.
  Copies a single row at the specified index from src into row 0 of dst.
  Caller picks the row.
- llama_mtp_relay_h_self unchanged in semantics: t_mtp_out has only the
  one row produced by the previous chain step's single-token decode.
- common_speculative_state_mtp: track last_n_accepted (set by accept(),
  consumed by next draft()'s k=0 relay). begin() resets it to -1, which
  the relay maps to row 0 (only the prompt's last position is in the
  trunk's outputs after prefill).
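
A rough sketch of the two pieces described in the list above, on plain
buffers rather than llama contexts. Everything here is a hypothetical
stand-in: relay_row and mtp_state_sketch are illustrative names, and the
row-major float layout is an assumption made only to show the indexing and
the -1 → row 0 mapping.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Copy row `src_row` of a row-major [n_rows x n_embd] buffer into row 0 of dst.
// Assumption: the real relay operates on tensors; only the indexing is shown.
inline void relay_row(const std::vector<float> & src, int32_t n_embd,
                      int32_t src_row, std::vector<float> & dst) {
    assert(n_embd > 0 && src_row >= 0);
    const size_t row_len = static_cast<size_t>(n_embd);
    const size_t offset  = static_cast<size_t>(src_row) * row_len;
    assert(offset + row_len <= src.size());  // requested row must exist
    assert(dst.size() >= row_len);           // dst must hold at least one row
    std::memcpy(dst.data(), src.data() + offset, row_len * sizeof(float));
}

// Hypothetical mirror of the state tracking: accept() records the count,
// the next draft's k=0 relay consumes it, begin() resets it.
struct mtp_state_sketch {
    int32_t last_n_accepted = -1;

    void begin()                     { last_n_accepted = -1; }
    void accept(int32_t n_accepted)  { last_n_accepted = n_accepted; }

    // -1 means "fresh after prefill": only one trunk output row exists, so use row 0.
    int32_t relay_src_row() const {
        return last_n_accepted < 0 ? 0 : last_n_accepted;
    }
};
```

Leaving the row choice to the caller keeps the copy routine trivial; the
policy (which row is correct after a partial accept) lives in one place,
next to the state that knows n_accepted.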

Measured on Qwen3.6-q8_0-mtp.gguf, send_req.sh (dense Python code, 400
tokens, temp=1, seed=42):

        before fix                 after fix
  K=1   84% accept, 11.4 tok/s     88% accept, 12.5 tok/s
  K=2   30% accept,  9.5 tok/s     86% accept, 16.9 tok/s (+78%)
  K=3   not viable                 73% accept, 17.5 tok/s

K=2 now matches vLLM's documented sweet spot for Qwen3.6 / DeepSeek
MTP on code workloads. K=3 is a marginal win on top.
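
The relative gains can be rechecked from the tok/s figures in the table
with a one-line helper (pct_gain is a hypothetical name used only here):

```cpp
#include <cassert>

// Relative throughput change in percent between two tok/s measurements.
inline double pct_gain(double before_tps, double after_tps) {
    assert(before_tps > 0.0);
    return (after_tps / before_tps - 1.0) * 100.0;
}
```

pct_gain(9.5, 16.9) is about 77.9, the +78% quoted for K=2;
pct_gain(11.4, 12.5) is about 9.6 for K=1; and pct_gain(16.9, 17.5) is
about 3.6, which is the marginal step from K=2 to K=3.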

Architecture confirmation: an independent walk of vLLM's chain code
(SpecDecodeBaseProposer.propose, qwen3_5_mtp.forward) confirms vLLM's
K>1 chain is a pure self-roll on the MTP block's post-residual hidden
with hnorm reapplied each step. This is the same mechanism this codebase
already implements; the only delta vs vLLM was the row-selection bug
fixed here.

1 parent 17d47df commit 183a99c
3 files changed
Lines changed: 60 additions & 25 deletions