Commit 68e7ea3
spec : parallel drafting support (#22838)
* spec : refactor
* spec : drop support for incompatible vocabs
* spec : update common_speculative_init()
* cont : pass seq_id
* cont : dedup ctx_seq_rm_type
* server : sketch the ctx_dft decode loop
* server : draft prompt cache and checkpoints
* server : improve ctx names
* server, spec : transition to unified spec context
* cont : sync main and drft contexts
* cont : async drft eval when possible
* cont : handle non-ckpt models
* cont : pass correct n_past for drafting
* cont : process images through the draft context
* spec : handle draft running out of context
* server : fix mtmd draft processing
* server : fix URL for draft model
* server : add comment
* server : clean-up + dry
* speculative-simple : update
* spec : fix n_past type
* server : fix slot ctx_drft ptr
* tools : update readme
* naming : improve consistency
* spec : refactor for multi-sequence speculative context
* cont : prepare params
* cont : prepare params
* spec : support parallel drafts
* server : support parallel drafting
* llama : reuse device buffers when possible
* server, spec : clean-up
* cont : clean-up
* cont : minor
* spec : reset `drafting` flag at the end
* spec : introduce `common_speculative_process()`
* spec : allow for multiple spec types (chain of speculators)
  * replace the old type field (of type common_speculative_type) in the
    common_params_speculative struct with a vector, so that multiple
    types can be specified
  * introduce common_get_enabled_speculative_impls(const std::vector<enum common_speculative_type>)
    to determine which implementations the user has enabled
  * introduce common_speculative_type_from_names(const std::vector<std::string> & names)
    to parse the spec types provided by the user
  * all speculators run sequentially; the best one wins (we verify its drafted tokens)
  * maximize the expected number of accepted tokens for the current round by computing the
    product of the probability of accepting the current token (n_acc_tokens / n_gen_drafts)
    and the draft's length
---------
Co-authored-by: Petros Sideris <petros.sideris@nokia.com>

Parent: 928b486
14 files changed, 1255 additions, 1040 deletions
File tree
- common
- examples/speculative-simple
- include
- src
- tools
- cli
- server
- tests/unit