Commit 16e577c
[TRTLLM-12669][fix] only route plain-TP greedy MTP-Eagle draft sampling through draft_sampler

The previous fix routed every greedy MTP-Eagle draft step through
draft_sampler(), but that call does not forward mapping_lm_head_tp. For the
LM-head-TP-in-ADP configuration draft_sampler() then takes its ADP branch with
a None mapping and crashes during warmup with
"'NoneType' object has no attribute 'tp_group'" (Executor worker returned
error), e.g. DeepSeek-R1 nvfp4 latency_adp_lmtp_tp4.

Only plain tensor parallelism (tp_size>1 without attention DP) shards the draft
logits over the vocab dim and needs draft_sampler()'s all-gather argmax. The
LM-head-TP-in-ADP case already yields full-vocab logits per rank (gathered
upstream) and the no-TP / Eagle3 cases need nothing, so all of those take the
plain d2t-aware argmax (_draft_sampler_greedy), restoring the pre-regression
behavior for ADP while keeping the plain-TP hang fix.

Signed-off-by: ZhaoyangWang <zhaoyangw@nvidia.com>

1 parent 85468e1 commit 16e577c
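
The routing rule described in the commit message can be sketched as a small predicate. This is a minimal illustration only, not the code from the diff (whose contents are not shown here): the `DraftConfig` class, its field names, and `pick_greedy_path` are hypothetical; only `draft_sampler` and `_draft_sampler_greedy` are names taken from the message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DraftConfig:
    """Hypothetical stand-in for the runtime's parallelism settings."""
    tp_size: int = 1                   # tensor-parallel world size
    enable_attention_dp: bool = False  # attention data parallelism (ADP)
    is_mtp_eagle: bool = True          # False models the Eagle3 case


def needs_draft_sampler(cfg: DraftConfig) -> bool:
    """True only for plain TP: MTP-Eagle, tp_size > 1, no attention DP."""
    return cfg.is_mtp_eagle and cfg.tp_size > 1 and not cfg.enable_attention_dp


def pick_greedy_path(cfg: DraftConfig) -> str:
    """Return the name of the greedy sampling path the message describes."""
    if needs_draft_sampler(cfg):
        # Vocab-sharded logits: needs the all-gather argmax.
        return "draft_sampler"
    # Full-vocab logits on every rank (ADP, no TP, or Eagle3).
    return "_draft_sampler_greedy"


if __name__ == "__main__":
    print(pick_greedy_path(DraftConfig(tp_size=4)))
    print(pick_greedy_path(DraftConfig(tp_size=4, enable_attention_dp=True)))
```

Under these assumptions, any configuration with attention DP enabled never reaches `draft_sampler`, which is the property that avoids the `None` mapping crash described above.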
1 file changed
Lines changed: 12 additions & 1 deletion
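
To see why vocab-sharded logits need a gather step before the argmax, here is a plain-Python simulation. It is a sketch of the general technique, not the implementation of `draft_sampler()`: no real collective is issued, the d2t mapping is not modeled, and the function names `local_best` and `sharded_argmax` are hypothetical.

```python
def local_best(shard, offset):
    """Return (max value, global vocab index) for one rank's logit shard."""
    idx = max(range(len(shard)), key=shard.__getitem__)
    return shard[idx], offset + idx


def sharded_argmax(shards):
    """Greedy token over logits split along the vocab dim, one shard per rank."""
    candidates = []
    offset = 0
    for shard in shards:
        candidates.append(local_best(shard, offset))
        offset += len(shard)
    # Simulated all-gather: every rank now sees all (value, index) pairs
    # and picks the same winner (lowest index wins ties).
    _, best_index = max(candidates, key=lambda c: (c[0], -c[1]))
    return best_index


if __name__ == "__main__":
    full = [0.1, 2.5, -1.0, 0.3, 4.2, 0.0, 1.1, 3.9]
    shards = [full[i:i + 2] for i in range(0, len(full), 2)]  # 4 "ranks"
    print(sharded_argmax(shards), full.index(max(full)))
```

Without the gather, each rank would only report the best token inside its own shard, so ranks would disagree on the draft token. When every rank already holds full-vocab logits, a local argmax is sufficient.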
Diff content not preserved. The single hunk spans original lines 1210-1216 and new lines 1210-1227: original line 1213 is removed and new lines 1213-1224 are added, with three unchanged context lines on each side.