Commit fa65e35
[TRTLLM-12669][perf] Add torch.compile(max-autotune) to compute_probs_from_logits
Profiling on H200 shows +15% rejection sampling throughput (1135 → 1304 tok/s)
at bs=16 with Qwen3-8B Eagle3 dynamic tree.
Signed-off-by: ZhaoyangWang <zhaoyangw@nvidia.com>
1 parent b7d6987 commit fa65e35
1 file changed
Lines changed: 1 addition & 0 deletions
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| 117 | 117 | |
| 118 | 118 | |
| 119 | 119 | |
| | 120 | + |
| 120 | 121 | |
| 121 | 122 | |
| 122 | 123 | |
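
The diff records a single added line, and the commit title names `torch.compile` with `max-autotune`. A minimal sketch of what such a change could look like follows. The function body here is a hypothetical stand-in (a plain softmax over the vocabulary dimension), and the names `compute_probs_eager` and `compute_probs_compiled` are illustrative assumptions, not the actual TensorRT-LLM code:

```python
import torch


def compute_probs_eager(logits: torch.Tensor) -> torch.Tensor:
    """Hypothetical stand-in body: softmax over the last (vocab) dimension."""
    return torch.softmax(logits.float(), dim=-1)


# Assumed shape of the one-line change: wrap the function with torch.compile
# in "max-autotune" mode. Compilation is lazy, so nothing is compiled until
# the wrapped function is first called.
compute_probs_compiled = torch.compile(compute_probs_eager, mode="max-autotune")

if __name__ == "__main__":
    logits = torch.randn(16, 32)  # small illustrative shape: (batch, vocab)
    probs = compute_probs_eager(logits)
    print(probs.sum(dim=-1))  # each row sums to approximately 1
```

The same effect is commonly achieved with the decorator form, `@torch.compile(mode="max-autotune")`, placed directly above the function definition, which would match a one-line addition. The first call pays the compilation and autotuning cost, so throughput figures such as those in the commit message are only meaningful after warm-up.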