Commit 8da1c15
committed
feat(llama-cpp): expose new speculative-decoding option keys
Upstream `spec : parallel drafting support` (ggml-org/llama.cpp#22838)
adds the `ngram_mod`, `ngram_map_k`, and `ngram_map_k4v` speculative
families and beefs up the draft-model knobs. The previous bump only
adapted the API; this exposes the new fields through the grpc-server
options dictionary so model configs can drive them.
New `options:` keys (all under `backend: llama-cpp`):
ngram_mod (`ngram_mod` type):
spec_ngram_mod_n_min / spec_ngram_mod_n_max / spec_ngram_mod_n_match
ngram_map_k (`ngram_map_k` type):
spec_ngram_map_k_size_n / spec_ngram_map_k_size_m / spec_ngram_map_k_min_hits
ngram_map_k4v (`ngram_map_k4v` type):
spec_ngram_map_k4v_size_n / spec_ngram_map_k4v_size_m /
spec_ngram_map_k4v_min_hits
ngram lookup caches (`ngram_cache` type):
spec_lookup_cache_static / lookup_cache_static
spec_lookup_cache_dynamic / lookup_cache_dynamic
Draft-model tuning (active when `spec_type` is `draft`):
draft_cache_type_k / spec_draft_cache_type_k
draft_cache_type_v / spec_draft_cache_type_v
draft_threads / spec_draft_threads
draft_threads_batch / spec_draft_threads_batch
draft_cpu_moe / spec_draft_cpu_moe (bool flag)
draft_n_cpu_moe / spec_draft_n_cpu_moe (first N MoE layers on CPU)
draft_override_tensor / spec_draft_override_tensor
(comma-separated <tensor regex>=<buffer type>; re-implements upstream's
static parse_tensor_buffer_overrides since it isn't exported)
`spec_type` already accepted comma-separated lists after the previous
commit, matching upstream's `common_speculative_types_from_names`.
Docs: refresh `docs/content/advanced/model-configuration.md` with
per-family tables and a note about multi-type chaining.
Builds locally with `make docker-build-llama-cpp` (linux/amd64
cpu-llama-cpp AVX variant).
Signed-off-by: Ettore Di Giacinto <mudler@localai.io>1 parent 39ee9a9 commit 8da1c15
2 files changed
Lines changed: 186 additions & 3 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
36 | 36 | | |
37 | 37 | | |
38 | 38 | | |
| 39 | + | |
| 40 | + | |
39 | 41 | | |
40 | 42 | | |
41 | 43 | | |
| |||
728 | 730 | | |
729 | 731 | | |
730 | 732 | | |
| 733 | + | |
| 734 | + | |
| 735 | + | |
| 736 | + | |
| 737 | + | |
| 738 | + | |
| 739 | + | |
| 740 | + | |
| 741 | + | |
| 742 | + | |
| 743 | + | |
| 744 | + | |
| 745 | + | |
| 746 | + | |
| 747 | + | |
| 748 | + | |
| 749 | + | |
| 750 | + | |
| 751 | + | |
| 752 | + | |
| 753 | + | |
| 754 | + | |
| 755 | + | |
| 756 | + | |
| 757 | + | |
| 758 | + | |
| 759 | + | |
| 760 | + | |
| 761 | + | |
| 762 | + | |
| 763 | + | |
| 764 | + | |
| 765 | + | |
| 766 | + | |
| 767 | + | |
| 768 | + | |
| 769 | + | |
| 770 | + | |
| 771 | + | |
| 772 | + | |
| 773 | + | |
| 774 | + | |
| 775 | + | |
| 776 | + | |
| 777 | + | |
| 778 | + | |
| 779 | + | |
| 780 | + | |
| 781 | + | |
| 782 | + | |
| 783 | + | |
| 784 | + | |
| 785 | + | |
| 786 | + | |
| 787 | + | |
| 788 | + | |
| 789 | + | |
| 790 | + | |
| 791 | + | |
| 792 | + | |
| 793 | + | |
| 794 | + | |
| 795 | + | |
| 796 | + | |
| 797 | + | |
| 798 | + | |
| 799 | + | |
| 800 | + | |
| 801 | + | |
| 802 | + | |
| 803 | + | |
| 804 | + | |
| 805 | + | |
| 806 | + | |
| 807 | + | |
| 808 | + | |
| 809 | + | |
| 810 | + | |
| 811 | + | |
| 812 | + | |
| 813 | + | |
| 814 | + | |
| 815 | + | |
| 816 | + | |
| 817 | + | |
| 818 | + | |
| 819 | + | |
| 820 | + | |
| 821 | + | |
| 822 | + | |
| 823 | + | |
| 824 | + | |
| 825 | + | |
| 826 | + | |
| 827 | + | |
| 828 | + | |
| 829 | + | |
| 830 | + | |
| 831 | + | |
| 832 | + | |
| 833 | + | |
| 834 | + | |
| 835 | + | |
| 836 | + | |
| 837 | + | |
| 838 | + | |
| 839 | + | |
| 840 | + | |
| 841 | + | |
| 842 | + | |
| 843 | + | |
| 844 | + | |
| 845 | + | |
| 846 | + | |
| 847 | + | |
| 848 | + | |
| 849 | + | |
| 850 | + | |
| 851 | + | |
| 852 | + | |
| 853 | + | |
| 854 | + | |
| 855 | + | |
| 856 | + | |
| 857 | + | |
| 858 | + | |
| 859 | + | |
| 860 | + | |
| 861 | + | |
731 | 862 | | |
732 | 863 | | |
733 | 864 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
251 | 251 | | |
252 | 252 | | |
253 | 253 | | |
| 254 | + | |
| 255 | + | |
254 | 256 | | |
255 | 257 | | |
256 | | - | |
| 258 | + | |
257 | 259 | | |
258 | 260 | | |
259 | 261 | | |
260 | 262 | | |
| 263 | + | |
| 264 | + | |
| 265 | + | |
| 266 | + | |
| 267 | + | |
| 268 | + | |
| 269 | + | |
| 270 | + | |
| 271 | + | |
| 272 | + | |
| 273 | + | |
| 274 | + | |
| 275 | + | |
| 276 | + | |
| 277 | + | |
| 278 | + | |
| 279 | + | |
| 280 | + | |
| 281 | + | |
261 | 282 | | |
262 | 283 | | |
263 | 284 | | |
264 | | - | |
265 | | - | |
| 285 | + | |
| 286 | + | |
| 287 | + | |
| 288 | + | |
| 289 | + | |
| 290 | + | |
| 291 | + | |
| 292 | + | |
| 293 | + | |
| 294 | + | |
| 295 | + | |
| 296 | + | |
| 297 | + | |
| 298 | + | |
| 299 | + | |
| 300 | + | |
| 301 | + | |
| 302 | + | |
| 303 | + | |
| 304 | + | |
| 305 | + | |
| 306 | + | |
| 307 | + | |
| 308 | + | |
| 309 | + | |
| 310 | + | |
| 311 | + | |
| 312 | + | |
| 313 | + | |
| 314 | + | |
| 315 | + | |
266 | 316 | | |
267 | 317 | | |
268 | 318 | | |
| |||
277 | 327 | | |
278 | 328 | | |
279 | 329 | | |
| 330 | + | |
| 331 | + | |
280 | 332 | | |
281 | 333 | | |
282 | 334 | | |
| |||
0 commit comments