Commit d14ce3d
llama : MTP clean-up (#23269)
* llama : disable equal splits for recurrent memory with partial rollback
* spec : re-enable p-min with MTP drafts
* spec : re-enable ngram spec in combination with RS rollback
* spec : fix ngram-map-* params
* spec : fix acceptance logic in combined ngram + draft configs
* graph : fix reuse for combined `token` + `embd` batches
* spec : log parameters for each speculative implementation
- add LOG_INF in each constructor with implementation type and parameters
- extract device string logic into common_speculative_get_devices_str()
- move 'adding speculative implementation' log from init into constructors
Assisted-by: llama.cpp:local pi
* spec : extend --spec-default with ngram-map-k4v
Assisted-by: llama.cpp:local pi
* minor : fix n_embd log
* args : update draft.n_max == 3 + regen docs
* spec : relax ngram-mod rejection thold to 0.25 @ 5 low
* logs : improve
* docs : update speculative decoding CLI argument documentation
- Add missing draft model CPU scheduling and tensor override parameters
- Update --spec-type to include all available types (excluding draft-eagle3 WIP)
- Fix default values to match implementation (n_max=3, n_min=0, p_min=0.0)
- Remove deprecated options (spec-draft-ctx-size, spec-draft-replace)
- Add environment variables for new parameters
Assisted-by: llama.cpp:local pi
* arg : step-back on adding k4v to the default spec config
* cont : fix name
1 parent 6db1304 commit d14ce3d
15 files changed
Lines changed: 293 additions & 134 deletions
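As a rough illustration of how the documented defaults (n_max=3, n_min=0, p_min=0.0) might interact, here is a minimal sketch. It assumes the usual reading of these options: n_max caps the draft length, p_min stops drafting at the first low-confidence token, and n_min discards drafts that end up too short. The names `draft_params_sketch` and `draft_keep_count` are hypothetical and are not taken from the llama.cpp sources.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical parameter bundle: field names follow the commit message,
// not the actual llama.cpp struct layout.
struct draft_params_sketch {
    int   n_max = 3;    // maximum number of tokens to draft per step
    int   n_min = 0;    // drafts shorter than this are discarded
    float p_min = 0.0f; // stop drafting once a token's probability falls below this
};

// Returns how many of the proposed draft tokens would be kept, given the
// per-token probabilities reported by the drafter (assumed semantics).
static size_t draft_keep_count(const std::vector<float> & probs, const draft_params_sketch & p) {
    size_t n = 0;
    for (float prob : probs) {
        if (static_cast<int>(n) >= p.n_max) {
            break; // reached the draft length cap
        }
        if (prob < p.p_min) {
            break; // drafter is not confident enough to continue
        }
        ++n;
    }
    if (static_cast<int>(n) < p.n_min) {
        return 0; // draft too short to be worth verifying
    }
    return n;
}
```

With the defaults, p_min=0.0 never cuts a draft short and n_min=0 never discards one, so only the n_max=3 cap applies. Raising p_min trades draft length for a higher expected acceptance rate.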