Commit d30d2e3
[None][feat] Acceptance-rate-based speculation gate
Gate speculative decoding on a rolling per-request acceptance rate. When
the moving-average AR over acceptance_rate_window_size falls below
acceptance_rate_threshold, speculation is disabled for that request;
when it rises back above the threshold, speculation re-engages.
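
The gating rule described above can be sketched as follows. This is a minimal illustration, not the SpeculationGate from this commit: the class name `RollingAcceptanceGate`, the `update` method, and the per-step definition of AR as accepted/drafted tokens are all assumptions. It also assumes that no decision is made until the window is full and that an average exactly equal to the threshold keeps speculation on. How the real gate obtains samples while speculation is off is not stated here and is left out.

```python
from collections import deque


class RollingAcceptanceGate:
    """Hypothetical per-request gate; names and details are illustrative only."""

    def __init__(self, window_size: int, threshold: float) -> None:
        if window_size <= 0:
            raise ValueError("window_size must be positive")
        self.threshold = threshold
        # deque(maxlen=...) silently drops the oldest sample once full
        self._samples = deque(maxlen=window_size)
        self.enabled = True

    def update(self, num_accepted: int, num_drafted: int) -> bool:
        """Record one step's acceptance rate and return the gate state."""
        if num_drafted > 0:
            self._samples.append(num_accepted / num_drafted)
        # Assumption: decide only once a full window of samples exists
        if len(self._samples) == self._samples.maxlen:
            moving_avg = sum(self._samples) / len(self._samples)
            self.enabled = moving_avg >= self.threshold
        return self.enabled
```

With `window_size=3` and `threshold=0.5`, the samples 1.0, 0.0, 0.0 average to about 0.33 and turn the gate off; two further samples of 1.0 bring the window average to about 0.67 and turn it back on.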
- Wire SpeculationGate through the three hot-path call sites (PP, non-
overlap, overlap scheduler) and hoist the early-return so the
no-op-when-disabled guarantee is visible in the diff.
- Rename _record_batch_acceptance_rate -> _update_batch_acceptance_rate
and document the overlap-scheduler interaction.
- Add the acceptance_rate_window_size / acceptance_rate_threshold knobs
to TorchLlmArgs / DecodingBaseConfig.
- Add unit tests under tests/unittest/_torch/speculative/test_spec_gate.py.
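
The no-op guarantee from the first bullet can be pictured as a guard at the top of each call site. The sketch below is illustrative only: `update_gate_if_enabled` and `CountingGate` are hypothetical names, and it assumes the gate is represented as `None` when the feature is off, which this commit message does not state.

```python
from typing import Optional


class CountingGate:
    """Hypothetical gate that records how many times it was updated."""

    def __init__(self) -> None:
        self.calls = 0

    def update(self, num_accepted: int, num_drafted: int) -> bool:
        self.calls += 1
        # Placeholder rule for the sketch: keep speculating at AR >= 0.5
        return num_drafted == 0 or num_accepted / num_drafted >= 0.5


def update_gate_if_enabled(gate: Optional[CountingGate],
                           num_accepted: int,
                           num_drafted: int) -> bool:
    """Return True if speculation should stay on for the next step."""
    # Hoisted early return: with no gate configured, nothing else runs,
    # so the disabled path is visibly a no-op at the call site.
    if gate is None:
        return True
    return gate.update(num_accepted, num_drafted)
```

Putting the `None` check first means a reader of the call site can see that the disabled path does no bookkeeping, without having to open the gate implementation.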
Squashed from:
0e4906a AR based speculation off
e099d11 fix CI
ef245d7 Rename _record_batch_acceptance_rate and add overlap scheduler comment.
955bd69 [None][perf] Make AR gate no-op explicit at call sites
Signed-off-by: Zheyu Fu <zheyuf@NVIDIA.com>

1 parent 2275d1e commit d30d2e3
4 files changed
Lines changed: 182 additions & 137 deletions
File tree
- tensorrt_llm
  - _torch
    - pyexecutor
    - speculative
  - llmapi
- tests/unittest/_torch/speculative/hw_agnostic