I've been using `--spec-type ngram-mod` and it's surprisingly good. On my machine it actually beats using a draft model. I'm genuinely impressed.

Right now the ngram cache only learns from the current conversation and recent generations. I want to give it a head start by feeding it a text file full of domain-specific content (like my own codebase, documentation, or outputs from bigger models).
What I want:

A simple parameter like `--ngram-text ./my_stuff.txt`. The server would read the file, tokenize it, and build the ngram cache from it at startup (a rough sketch of what this could look like follows the list below).

Why this would be awesome:
- Achieve functionality similar to LoRA or MoE: switch the ngram model based on different scenarios.
- Enable ngram tuning: we can control the ngram data to easily fine-tune the ngram model and run automated benchmarks.
- Possibly transfer model capabilities: collect outputs from the strongest LLMs (models like GLM, Kimi, DeepSeek; there are many distilled datasets on Hugging Face) as ngram source material, allowing the ngram model to guide a smaller model and improve its performance.
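To make the request concrete, here is a minimal sketch of what the startup seeding could look like, built on `common/common.h` and `common/ngram-cache.h` as I understand them today. The helper name `seed_ngram_cache_from_file` is hypothetical; the `common_tokenize` and `common_ngram_cache_update` calls are my best reading of the current headers, so the exact signatures should be checked against the tree.

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include <vector>

#include "common.h"       // common_tokenize
#include "ngram-cache.h"  // common_ngram_cache, common_ngram_cache_update

// Hypothetical helper, not existing llama.cpp code: read a plain-text file,
// tokenize it with the loaded model, and build an ngram cache from it.
static common_ngram_cache seed_ngram_cache_from_file(
        llama_context * ctx, const std::string & path,
        int ngram_min, int ngram_max) {
    // Slurp the whole file; domain corpora for this use case should be
    // small enough that streaming isn't worth the complexity in a sketch.
    std::ifstream file(path);
    std::stringstream ss;
    ss << file.rdbuf();
    std::string text = ss.str();

    // Tokenize with the target model's vocab so the cached ngrams line up
    // with the token ids the server will actually see during generation.
    std::vector<llama_token> tokens = common_tokenize(ctx, text, /*add_special=*/false);

    // One-shot build: passing nnew == tokens.size() marks every token as new.
    common_ngram_cache cache;
    common_ngram_cache_update(cache, ngram_min, ngram_max, tokens,
                              (int) tokens.size(), /*print_progress=*/true);
    return cache;
}
```

The command-line hook would then just call this with the value of the proposed flag, e.g. `llama-server -m model.gguf --spec-type ngram-mod --ngram-text ./my_stuff.txt`, and use the returned cache wherever the dynamic cache is consulted today.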
Why this discussion:
I almost opened an issue but saw the note that new features should be discussed first. So here I am.
I think the internal API already exists; this just needs a command-line hook. See llama.cpp/common/ngram-cache.h, lines 76 to 86 at commit 0d0764d.
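The save/load/merge half of that header also seems to cover the scenario-switching idea above: build per-domain caches offline, then pick one at startup. A minimal sketch, assuming `common_ngram_cache_save`, `common_ngram_cache_load`, and `common_ngram_cache_merge` keep their current shapes; the function name and the merge-into-dynamic-cache wiring are my assumptions, not existing code:

```cpp
#include <string>

#include "ngram-cache.h"

// Sketch: load a precomputed, domain-specific cache at startup and fold it
// into the server's dynamic cache. File name and wiring are assumptions.
static void load_domain_ngram_cache(common_ngram_cache & dynamic_cache,
                                    std::string path) {
    // Load a cache previously written with common_ngram_cache_save().
    common_ngram_cache static_cache = common_ngram_cache_load(path);

    // Merge it into the dynamic cache so subsequent generations keep
    // updating on top of the seeded statistics.
    common_ngram_cache_merge(dynamic_cache, static_cache);
}
```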
Thanks!
Replies: 1 comment

That would be very awesome, if possible!