Commit 764edb7
[TRTLLM-12669][fix] refresh is_all_greedy_sample before CUDA graph key selection
The one-engine CUDA graph key includes is_all_greedy_sample to dispatch
between the argmax fast-path and the advanced-sampling graph variant. The flag
was only (re)computed inside populate_sampling_params_for_one_model, which runs
in _prepare_inputs AFTER maybe_get_cuda_graph has already built the key. The key
therefore used the previous iteration's stale flag, and warmup left it False
(from the advanced-sampling capture pass). On the first real decode iteration, a
greedy batch would therefore replay the advanced-sampling graph even though
populate skips filling the sampling/draft_probs buffers for greedy batches, so
the graph read uninitialized slot-indexed data. For MTP with num_nextn>=2 this
hung the executor ("Hang detected on rank 0").
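The ordering problem described above can be reproduced in a toy model. This is only a sketch: `ToySpecMetadata`, `graph_key`, `populate`, and the temperature-based greediness test are hypothetical stand-ins, not the TensorRT-LLM code; only the flag name `is_all_greedy_sample` comes from the commit message.

```python
from dataclasses import dataclass


@dataclass
class ToySpecMetadata:
    # Hypothetical stand-in for spec_metadata. False mimics the value
    # left behind by the advanced-sampling capture pass during warmup.
    is_all_greedy_sample: bool = False


def is_greedy(request: dict) -> bool:
    # Assumption for this sketch: temperature 0 means greedy (argmax).
    return request.get("temperature", 0.0) == 0.0


def graph_key(batch_size: int, meta: ToySpecMetadata) -> tuple:
    # The key includes the flag, so a stale flag selects the wrong graph.
    return (batch_size, meta.is_all_greedy_sample)


def populate(meta: ToySpecMetadata, requests: list) -> None:
    # Stand-in for the step that recomputes the flag while preparing inputs.
    meta.is_all_greedy_sample = all(is_greedy(r) for r in requests)


def buggy_step(meta: ToySpecMetadata, requests: list) -> tuple:
    key = graph_key(len(requests), meta)  # key is built first, from the stale flag
    populate(meta, requests)              # flag is refreshed only afterwards
    return key


meta = ToySpecMetadata()
requests = [{"temperature": 0.0}, {"temperature": 0.0}]
key = buggy_step(meta, requests)
print(key, meta.is_all_greedy_sample)
```

In this toy run the batch is fully greedy, yet the key still carries `False`, so the advanced-sampling variant would be selected while the flag itself ends the step as `True`.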
Fix:
- Extract the greediness detection into _scan_one_model_sampling (single source
of truth) and add update_is_all_greedy_sample, called before the graph key is
built so the key matches the buffers populate fills. populate now reuses the
same scan.
- Defensively reset spec_metadata.is_all_greedy_sample to True after CUDA graph
warmup so the stale capture-only False does not seed the first iteration.
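A minimal sketch of the two fix points, under stated assumptions: the method names mirror those in the commit message, but their signatures, the `ToySpecMetadata` class, the `temperatures` buffer, and the temperature-based greediness test are invented for illustration and are not the actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class ToySpecMetadata:
    # Hypothetical stand-in for spec_metadata.
    is_all_greedy_sample: bool = True
    temperatures: list = field(default_factory=list)  # toy sampling buffer

    def _scan_one_model_sampling(self, requests: list) -> bool:
        # Single source of truth for greediness (toy rule: temperature == 0).
        return all(r.get("temperature", 0.0) == 0.0 for r in requests)

    def update_is_all_greedy_sample(self, requests: list) -> None:
        # Cheap refresh, meant to run before the graph key is built.
        self.is_all_greedy_sample = self._scan_one_model_sampling(requests)

    def populate_sampling_params_for_one_model(self, requests: list) -> None:
        # Reuses the same scan, so the key and the buffers cannot disagree.
        self.is_all_greedy_sample = self._scan_one_model_sampling(requests)
        if self.is_all_greedy_sample:
            return  # greedy batches skip filling the sampling buffers
        self.temperatures = [r.get("temperature", 0.0) for r in requests]


def fixed_step(meta: ToySpecMetadata, requests: list) -> tuple:
    meta.update_is_all_greedy_sample(requests)        # 1. refresh the flag
    key = (len(requests), meta.is_all_greedy_sample)  # 2. build the graph key
    meta.populate_sampling_params_for_one_model(requests)  # 3. fill buffers
    return key


meta = ToySpecMetadata(is_all_greedy_sample=False)  # stale value after warmup
meta.is_all_greedy_sample = True                    # defensive reset after warmup

greedy_key = fixed_step(meta, [{"temperature": 0.0}, {"temperature": 0.0}])
sampled_key = fixed_step(meta, [{"temperature": 0.0}, {"temperature": 0.8}])
next_greedy_key = fixed_step(meta, [{"temperature": 0.0}])
print(greedy_key, sampled_key, next_greedy_key)
```

The last call shows why the refresh matters beyond warmup: the previous iteration left the flag `False`, but the key for the new greedy batch is computed from the current requests rather than from that leftover value.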
Signed-off-by: ZhaoyangWang <zhaoyangw@nvidia.com>

1 parent 4aa80bf commit 764edb7
2 files changed
Lines changed: 66 additions & 14 deletions
| File | Original lines | New lines | Change |
|---|---|---|---|
| 1 of 2 | (none) | 1334-1340 | 7 lines added |
| 1 of 2 | (none) | 4594-4604 | 11 lines added |
| 2 of 2 | (none) | 39 | 1 line added |
| 2 of 2 | 577-580 | 578-581 | 4 lines replaced by 4 |
| 2 of 2 | 582-584 | 583-587 | 3 lines replaced by 5 |
| 2 of 2 | 589-595 | (none) | 7 lines removed |
| 2 of 2 | (none) | 707-744 | 38 lines added |