Commit eaa50b7
[None][feat] Acceptance-rate-based speculation gate
Gate speculative decoding on a rolling per-request acceptance rate. When
the moving-average AR over acceptance_rate_window_size falls below
acceptance_rate_threshold, speculation is disabled for that request;
when it rises back above the threshold, speculation re-engages.
- Wire SpeculationGate through the three hot-path call sites (PP, non-
overlap, overlap scheduler) and hoist the early-return so the
no-op-when-disabled guarantee is visible in the diff.
- Rename _record_batch_acceptance_rate -> _update_batch_acceptance_rate
and document the overlap-scheduler interaction.
- Add the acceptance_rate_window_size / acceptance_rate_threshold knobs
to TorchLlmArgs / DecodingBaseConfig.
- Add unit tests under tests/unittest/_torch/speculative/test_spec_gate.py.
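
To make the gating rule above concrete, here is a minimal sketch of a moving-average gate for a single request. It is not the SpeculationGate added by this commit: the class name RollingAcceptanceGate, the update() method, and the choice to keep speculation on until the window has filled are all assumptions made for illustration. Only the two knob names (window size and threshold) come from the commit message.

```python
from collections import deque


class RollingAcceptanceGate:
    """Hypothetical illustration; not the SpeculationGate from this commit."""

    def __init__(self, window_size: int, threshold: float) -> None:
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        self.threshold = threshold
        # Bounded deque: appending to a full window drops the oldest sample.
        self._window = deque(maxlen=window_size)
        self._sum = 0.0

    def update(self, acceptance_rate: float) -> bool:
        """Record one acceptance-rate sample; return True if speculation stays on."""
        if len(self._window) == self._window.maxlen:
            # Subtract the sample that is about to be evicted.
            self._sum -= self._window[0]
        self._window.append(acceptance_rate)
        self._sum += acceptance_rate
        return self.enabled

    @property
    def enabled(self) -> bool:
        # Assumption: keep speculating until the window is full, so a few
        # early rejections cannot trip the gate on their own.
        if len(self._window) < self._window.maxlen:
            return True
        return self._sum / len(self._window) >= self.threshold


# One gate per request, mirroring acceptance_rate_window_size and
# acceptance_rate_threshold from the commit message.
gate = RollingAcceptanceGate(window_size=2, threshold=0.4)
for sample in (0.0, 0.0, 1.0):
    print(sample, gate.update(sample))
```

How acceptance rate is sampled while speculation is off, and how the gate interacts with the overlap scheduler, are outside this sketch.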
Squashed from:
0e4906a AR based speculation off
e099d11 fix CI
ef245d7 Rename _record_batch_acceptance_rate and add overlap scheduler comment.
955bd69 [None][perf] Make AR gate no-op explicit at call sites
Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>
1 parent b03b78f commit eaa50b7
4 files changed
Lines changed: 182 additions & 137 deletions
File tree
- tensorrt_llm
- _torch
- pyexecutor
- speculative
- llmapi
- tests/unittest/_torch/speculative/hw_agnostic