Commit 485792d
[TRTLLM-12669][fix] Pre-capture both greedy and advanced sampling CUDA graphs during warmup
On-the-fly CUDA graph capture is disabled outside the warmup window
(allow_capture context manager) because a late capture can resize the shared
cuda_graph_workspace tensor and invalidate addresses baked into previously
captured graphs. As a result, the graph for the is_all_greedy_sample=False key
introduced for one-engine spec dec was never captured: warmup only ran
dummy requests with greedy sampling params, so inference batches using
temperature / top_k / top_p fell back to eager execution.
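The failure mode described above can be illustrated with a toy model. This is a minimal sketch, not the TensorRT-LLM implementation: `ToyGraphRunner`, its `run` method, and the `(batch_size, is_all_greedy_sample)` key layout are hypothetical stand-ins, and no GPU work is performed. It only shows why a key that warmup never visits ends up on the eager path once capture is closed.

```python
from contextlib import contextmanager


class ToyGraphRunner:
    """Hypothetical stand-in for a CUDA graph cache; nothing runs on a GPU."""

    def __init__(self):
        self.graphs = {}              # key -> placeholder for a captured graph
        self.capture_allowed = False  # True only inside the warmup window

    @contextmanager
    def allow_capture(self):
        # Capture is only permitted while this context manager is active.
        self.capture_allowed = True
        try:
            yield
        finally:
            self.capture_allowed = False

    def run(self, batch_size, is_all_greedy_sample):
        # Assumed key layout for illustration only.
        key = (batch_size, is_all_greedy_sample)
        if key in self.graphs:
            return "graph"      # replay a previously captured graph
        if self.capture_allowed:
            self.graphs[key] = object()
            return "captured"   # first visit during warmup
        return "eager"          # cache miss after warmup: no capture allowed


runner = ToyGraphRunner()
with runner.allow_capture():
    # Warmup as it behaved before the fix: greedy dummy requests only.
    warmup_mode = runner.run(batch_size=32, is_all_greedy_sample=True)

greedy_mode = runner.run(batch_size=32, is_all_greedy_sample=True)
sampled_mode = runner.run(batch_size=32, is_all_greedy_sample=False)
```

In this toy, the greedy batch replays a graph while the batch with non-greedy sampling falls through to the eager branch, because its key was never visited while capture was allowed.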
Fix: run the warmup capture loop twice for one-engine spec dec. The first
pass captures the greedy fast path (existing behavior). The second pass
sets spec_metadata.is_all_greedy_sample to False before the forward call so
that maybe_get_cuda_graph computes the non-greedy key. It also sets a runtime
attribute that populate_sampling_params_for_one_model honors: the function
overrides the greedy detection derived from the dummy requests and writes
synthetic non-greedy values into the per-request buffers.
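The two-pass warmup can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in this commit: `ToySpecMetadata`, `toy_populate_sampling_params`, `toy_warmup`, the `force_non_greedy` argument, and the sampling values are all hypothetical, and "capturing" a graph is reduced to recording its key in a set.

```python
from dataclasses import dataclass


@dataclass
class ToySpecMetadata:
    # Hypothetical stand-in for the real spec metadata object.
    is_all_greedy_sample: bool = True


def toy_populate_sampling_params(requests, force_non_greedy=False):
    """Return per-request sampling values, optionally overridden for warmup."""
    if force_non_greedy:
        # Synthetic non-greedy values (illustrative numbers only).
        return [{"temperature": 1.0, "top_k": 2, "top_p": 0.9} for _ in requests]
    return [dict(r) for r in requests]


def toy_warmup(batch_sizes, one_engine_spec_dec):
    """Record which (batch_size, is_all_greedy_sample) keys warmup would capture."""
    captured = set()
    # Pass 1 is always greedy; pass 2 runs only for one-engine spec dec.
    passes = [True, False] if one_engine_spec_dec else [True]
    for greedy in passes:
        for bs in batch_sizes:
            # Dummy requests carry greedy-looking params in this toy.
            dummy = [{"temperature": 0.0, "top_k": 1, "top_p": 1.0}] * bs
            meta = ToySpecMetadata(is_all_greedy_sample=greedy)
            params = toy_populate_sampling_params(dummy, force_non_greedy=not greedy)
            assert len(params) == bs
            captured.add((bs, meta.is_all_greedy_sample))
    return captured


spec_keys = toy_warmup([1, 32], one_engine_spec_dec=True)
plain_keys = toy_warmup([1, 32], one_engine_spec_dec=False)
```

With one-engine spec dec enabled, the toy records both the greedy and the non-greedy key for every batch size; otherwise only the greedy keys are recorded, which mirrors the "second pass is skipped" behavior described in the message.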
Other paths are unaffected: non-one-engine spec dec and non-spec dec
default is_all_greedy_sample to True, so the second pass is skipped.
End-to-end (qwen3_8b_eagle3, bs=32, T=0.7/top_k=50/top_p=0.9):
  rej_off baseline:     TPS=3713.73
  rej_on (before fix):  TPS=3854.01 (+3.8%; non-greedy ran eager)
  rej_on (after fix):   TPS=6013.58 (+62.0%; non-greedy uses graph)
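The quoted percentages appear to be relative to the rej_off baseline. A short check, using only the rounded TPS figures from the message (the helper name `pct_change` is ours), recomputes them and also derives the gain of the fix over the pre-fix rej_on run:

```python
def pct_change(new, old):
    """Relative change of `new` over `old`, in percent."""
    return (new / old - 1.0) * 100.0


# TPS figures copied from the commit message.
tps = {"rej_off": 3713.73, "rej_on_before": 3854.01, "rej_on_after": 6013.58}

before_vs_baseline = pct_change(tps["rej_on_before"], tps["rej_off"])
after_vs_baseline = pct_change(tps["rej_on_after"], tps["rej_off"])
after_vs_before = pct_change(tps["rej_on_after"], tps["rej_on_before"])
```

From the rounded inputs this gives roughly +3.8% and +61.9% against the baseline (the latter differs from the quoted +62.0% only in the last digit, presumably from rounding) and roughly +56% for the fix relative to the pre-fix rej_on run.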
Signed-off-by: ZhaoyangWang <zhaoyangw@nvidia.com>
1 parent df6be84 commit 485792d
2 files changed
Lines changed: 71 additions & 24 deletions
File 1 of 2 (file name and diff content not preserved): 54 additions, 24 deletions
  @@ -1071,31 +1071,61 @@
  Original lines 1074-1084 replaced by new lines 1074-1086.
  Original lines 1086-1098 replaced by new lines 1088-1128.

File 2 of 2 (file name and diff content not preserved): 17 additions, 0 deletions
  @@ -647,6 +647,23 @@
  New lines 650-666 inserted after line 649.