Commit d599fb5
[TRTLLM-12669][feat] Enable rejection sampling by default for Eagle3 one-model

Flip the default of `use_rejection_sampling` from `False` to `True` on
`DecodingBaseConfig`. With the refactor of the all-greedy fast path in
place, this is safe: the runtime guard in `_can_use_rejection_sampling`
still requires a non-greedy batch, so all-greedy batches keep taking the
argmax fast path unchanged. Only batches that already opted into
non-greedy sampling now see the rejection-sampling acceptance behavior.
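
The guard described above can be sketched roughly as follows. This is a minimal illustration, not the code in this commit: `RequestSampling`, its fields, and `can_use_rejection_sampling` are hypothetical stand-ins, and reading "non-greedy batch" as "at least one non-greedy request" is an assumption.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class RequestSampling:
    """Hypothetical per-request sampling settings (not a TensorRT-LLM type)."""
    temperature: float = 0.0
    top_k: int = 1

    @property
    def is_greedy(self) -> bool:
        # Assumption: temperature 0 or top_k == 1 means argmax decoding.
        return self.temperature == 0.0 or self.top_k == 1


def can_use_rejection_sampling(flag_enabled: bool,
                               batch: Sequence[RequestSampling]) -> bool:
    """Stand-in for the runtime guard.

    Rejection sampling is used only when the flag is on AND the batch
    contains at least one non-greedy request. All-greedy batches return
    False and stay on the argmax fast path.
    """
    return flag_enabled and any(not r.is_greedy for r in batch)


all_greedy = [RequestSampling(), RequestSampling()]
mixed = [RequestSampling(), RequestSampling(temperature=0.8, top_k=50)]
```

Under these assumptions, turning the flag on by default changes nothing for `all_greedy`, because the guard still returns `False` there.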

Benchmark results on Qwen3-235B-A22B + Eagle3 (tp=8) show a consistent
+6.4% to +9.4% throughput gain and a +3.4 to +4.3 pp acceptance-rate gain
across batch sizes 1-16 vs the exact-match baseline. Other Eagle3
deployments see smaller but uniformly positive acceptance-rate gains.
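
For intuition on why acceptance can rise, here is a sketch of the textbook speculative-sampling acceptance rule next to an exact-match check. It is not taken from this commit. It assumes the draft token is sampled from a draft distribution `q`, and that the exact-match baseline samples a target token independently from `p` and compares. All function names are hypothetical.

```python
import random
from typing import Sequence


def accept_exact_match(draft_token: int, target_token: int) -> bool:
    # Baseline: keep the draft token only if the target picked the same one.
    return draft_token == target_token


def accept_rejection_sampling(draft_token: int,
                              p_target: Sequence[float],
                              q_draft: Sequence[float],
                              rng: random.Random) -> bool:
    # Standard rule: accept with probability min(1, p(x) / q(x)).
    p, q = p_target[draft_token], q_draft[draft_token]
    if q <= 0.0:
        return False  # the draft could not have proposed this token
    return rng.random() < min(1.0, p / q)


def expected_acceptance_rejection(p: Sequence[float], q: Sequence[float]) -> float:
    # Sum over x of q(x) * min(1, p(x)/q(x)) = sum over x of min(p(x), q(x)).
    return sum(min(pi, qi) for pi, qi in zip(p, q))


def expected_acceptance_exact_match(p: Sequence[float], q: Sequence[float]) -> float:
    # Probability that independent samples from p and q coincide.
    return sum(pi * qi for pi, qi in zip(p, q))


p = [0.6, 0.3, 0.1]  # illustrative target distribution
q = [0.5, 0.4, 0.1]  # illustrative draft distribution
```

Because `min(p, q) >= p * q` whenever both values lie in [0, 1], the rejection-sampling rule never accepts less often in expectation than this exact-match model, which is consistent with the direction of the reported gains.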

Two prior `raise ValueError` paths are converted to silent fallbacks so
the new default does not break existing users:
- Non-Eagle3 spec configs (PARD, DFlash, MTP, ...) silently disable the
  flag in `TorchLlmArgs` post-validation, since rejection sampling is only
  wired up for Eagle3 one-model paths.
- SA-enhanced Eagle3 configs disable the flag in the per-config
  validator, since SA may override proposed draft tokens.
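
The two fallbacks can be pictured with the sketch below. It folds both checks into one hypothetical helper for brevity, whereas the commit places them in `TorchLlmArgs` post-validation and in the per-config validator. `SpecConfigSketch`, its field names, and `resolve_rejection_sampling` are assumptions made for illustration, not the real classes.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SpecConfigSketch:
    """Hypothetical stand-in for a speculative decoding config."""
    decoding_type: str = "Eagle3"
    one_model: bool = True
    sa_enhanced: bool = False            # hypothetical field name
    use_rejection_sampling: bool = True  # new default after this commit


def resolve_rejection_sampling(cfg: SpecConfigSketch) -> SpecConfigSketch:
    """Silently turn the flag off where rejection sampling is not supported.

    Previously these cases raised ValueError. With the default now True,
    raising would break users who never set the flag themselves.
    """
    supported = (cfg.decoding_type == "Eagle3"
                 and cfg.one_model
                 and not cfg.sa_enhanced)
    if cfg.use_rejection_sampling and not supported:
        return replace(cfg, use_rejection_sampling=False)
    return cfg


eagle3 = resolve_rejection_sampling(SpecConfigSketch())
mtp = resolve_rejection_sampling(SpecConfigSketch(decoding_type="MTP"))
sa = resolve_rejection_sampling(SpecConfigSketch(sa_enhanced=True))
opt_out = resolve_rejection_sampling(
    SpecConfigSketch(use_rejection_sampling=False))
```

In this sketch only `eagle3` keeps the flag enabled. `mtp` and `sa` fall back without an error, and `opt_out` keeps the explicit `False`.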

Users who want the prior exact-match behavior can still pass
`use_rejection_sampling=False` explicitly.

Signed-off-by: ZhaoyangWang <zhaoyangw@nvidia.com>

1 parent 83e7903 commit d599fb5
1 file changed
Lines changed: 17 additions & 14 deletions
| Hunk | Original lines | New lines | Additions | Deletions |
|---|---|---|---|---|
| 1 | 897-907 | 897-909 | 5 | 3 |
| 2 | 958-970 | 960-973 | 6 | 5 |
| 3 | 4140-4151 | 4143-4154 | 6 | 6 |