Commit beb4b32
[TRTLLM-12669][perf] Cache d2t target indices in spec metadata

The d2t-projected target vocab indices computed inside the rejection-path
d2t padding step (arange(draft_vocab) + (source + d2t.to(device)) % vocab_size)
were rebuilt on every iteration even though the d2t tensor is model-static.
Cache the result in SpecMetadataBase.d2t_target_indices on first use and
reuse it on subsequent iterations.
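
The compute-once, reuse-afterwards pattern described above can be sketched roughly as follows. This is a framework-free illustration, not the code from this commit: plain Python lists stand in for device tensors, the class name `SpecMetadataSketch`, the method `get_d2t_target_indices`, and the `builds` counter are hypothetical, and the mapping assumes that d2t holds a per-draft-token offset into the target vocabulary.

```python
from typing import List, Optional


class SpecMetadataSketch:
    """Hypothetical stand-in for the metadata object that owns the cache."""

    def __init__(self, d2t: List[int], vocab_size: int) -> None:
        self.d2t = d2t  # model-static draft-to-target offsets (assumed layout)
        self.vocab_size = vocab_size
        # Stays None until the first request, then holds the cached indices.
        self.d2t_target_indices: Optional[List[int]] = None
        self.builds = 0  # counts how often the indices were built (illustration only)

    def get_d2t_target_indices(self) -> List[int]:
        # Build the index sequence once; later calls return the same object.
        if self.d2t_target_indices is None:
            self.builds += 1
            self.d2t_target_indices = [
                (i + off) % self.vocab_size for i, off in enumerate(self.d2t)
            ]
        return self.d2t_target_indices


meta = SpecMetadataSketch(d2t=[0, 2, 5], vocab_size=16)
first = meta.get_d2t_target_indices()   # builds the indices
second = meta.get_d2t_target_indices()  # served from the cache
```

Because the offsets never change for a given model, the cached list needs no invalidation in this sketch; a real implementation would also have to keep the cached tensor on the right device.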

Profile breakdown (llama70b bs=32, CUDA graph off) showed
accept_draft.rejection.d2t_padding at 88 us/iter, the second-largest
rejection-path step after compute target_probs (127 us). Building the index
sequence accounts for ~10-20 us of that (3-4 kernels: arange + d2t H2D copy +
add + mod); the rest is the slot-indexed scatter into full_draft_probs, which
is already pre-allocated.
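
For orientation, the remaining scatter step can be pictured as below. This is a simplified sketch under stated assumptions, not the commit's implementation: it handles a single row rather than a slot-indexed batch, uses Python lists in place of tensors, and the function name `pad_draft_probs` is hypothetical.

```python
from typing import List


def pad_draft_probs(
    draft_probs: List[float],
    target_indices: List[int],
    full_row: List[float],
) -> List[float]:
    """Scatter draft-vocab probabilities into a pre-allocated full-vocab row."""
    # Reset the reused buffer so stale values from earlier iterations vanish.
    for i in range(len(full_row)):
        full_row[i] = 0.0
    # Place each draft probability at its projected target-vocab position.
    for prob, target_idx in zip(draft_probs, target_indices):
        full_row[target_idx] = prob
    return full_row


full_row = [0.0] * 8  # allocated once, reused on every iteration
padded = pad_draft_probs([0.5, 0.25, 0.25], [0, 3, 7], full_row)
```

With the indices cached, only this write into the existing buffer is left on the per-iteration path.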

Verified on llama70b bs=32 over 3 rounds (mean ± stdev):
  Before: rej_on vs rej_off gap ≈ -10.0% (single-run baseline)
  After : rej_on vs rej_off gap = -8.71% ± 0.9% (3-round mean)
Net within-run improvement ≈ +1.3%. qwen235b unchanged (already positive).

Output accuracy verified across 22 (model, bs, mode) configurations: all
1760 outputs terminate normally (EOT or max_tokens), no regressions.

Signed-off-by: ZhaoyangWang <zhaoyangw@nvidia.com>

1 parent e173cbf commit beb4b32
1 file changed
Lines changed: 28 additions & 15 deletions