[TRTLLM-12669][fix] only route plain-TP greedy MTP-Eagle draft sampling through draft_sampler

zhaoyangwang-nvidia · zhaoyangwang-nvidia · commit 16e577cd0148 · 2026-06-14T09:51:45.000-07:00
The previous fix routed every greedy MTP-Eagle draft step through
draft_sampler(), but that call does not forward mapping_lm_head_tp. In the
LM-head-TP-in-ADP configuration, draft_sampler() therefore takes its ADP
branch with a None mapping and crashes during warmup with
"'NoneType' object has no attribute 'tp_group'" (Executor worker returned
error). One affected configuration is DeepSeek-R1 nvfp4
latency_adp_lmtp_tp4.

Only plain tensor parallelism (tp_size>1 without attention DP) shards the draft
logits over the vocab dim and needs draft_sampler()'s all-gather argmax. The
LM-head-TP-in-ADP case already yields full-vocab logits per rank (gathered
upstream), and the no-TP and Eagle3 cases need no gather at all, so all of
those take the plain d2t-aware argmax (_draft_sampler_greedy). This restores
the pre-regression behavior for ADP while keeping the plain-TP hang fix.
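
The routing rule above can be sketched in isolation. This is a minimal
illustration, not the code in eagle3.py: the Mapping dataclass and the names
needs_tp_gather and sharded_argmax are hypothetical stand-ins, and the
all-gather is emulated by concatenating per-rank Python lists rather than by
a real collective.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Mapping:
    # Hypothetical stand-in for the real mapping object; only the two fields
    # read by the condition in this commit are modeled.
    tp_size: int = 1
    enable_attention_dp: bool = False


def needs_tp_gather(is_mtp_eagle: bool, mapping: Optional[Mapping]) -> bool:
    """True only for plain-TP MTP-Eagle, where draft logits are vocab-sharded."""
    return bool(is_mtp_eagle and mapping is not None and mapping.tp_size > 1
                and not mapping.enable_attention_dp)


def sharded_argmax(shards: List[List[float]]) -> int:
    """Emulate an all-gather over the vocab dim followed by a global argmax."""
    # Concatenating the per-rank shards in rank order rebuilds the full vocab.
    full = [x for shard in shards for x in shard]
    return max(range(len(full)), key=full.__getitem__)
```

With two ranks holding [0.1, 0.9] and [2.0, 0.3], a rank-local argmax picks
local index 1 on rank 0 and local index 0 on rank 1, so the ranks disagree;
sharded_argmax over both shards picks global index 2 on every rank. When
each rank already holds full-vocab logits, as in the LM-head-TP-in-ADP case,
a local argmax is already the global one and needs_tp_gather is False.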

Signed-off-by: ZhaoyangWang <zhaoyangw@nvidia.com>
diff --git a/tensorrt_llm/_torch/speculative/eagle3.py b/tensorrt_llm/_torch/speculative/eagle3.py
@@ -1210,7 +1210,18 @@ def draft_decoder(
         # before argmax (and falls back to a plain argmax when no TP gather is
         # needed). Eagle3 (non-MTP) keeps its d2t-aware argmax.
         if spec_metadata.is_all_greedy_sample:
-            if self.is_mtp_eagle:
+            # Only plain tensor parallelism (tp_size>1 without attention DP)
+            # shards the draft logits over the vocab dim and thus needs
+            # draft_sampler()'s all-gather argmax. The LM-head-TP-in-ADP case
+            # already produces full-vocab logits per rank (gathered upstream),
+            # and the no-TP / Eagle3 cases need nothing, so they take the plain
+            # d2t-aware argmax. (Routing ADP/LM-head-TP through draft_sampler
+            # without its mapping_lm_head_tp arg hits the None-mapping branch
+            # and crashes with 'NoneType has no attribute tp_group'.)
+            if (self.is_mtp_eagle and self.model_config is not None
+                    and hasattr(self.model_config, 'mapping')
+                    and self.model_config.mapping.tp_size > 1
+                    and not self.model_config.mapping.enable_attention_dp):
                 return self.draft_sampler(logits)
             return self._draft_sampler_greedy(logits, d2t)
         # Non-greedy (advanced) draft sampling has the same TP hazard as the