Commit 6bd9224
Add context window, temperature, parallel requests, and sampling controls
New CLI flags (matching llama-server patterns):
--ctx-size N Context window / KV cache size (sliding window)
--temp N Default sampling temperature (default: 0.6)
--top-p N Top-p nucleus sampling (default: 1.0)
--repeat-penalty N Repetition penalty factor
--parallel N Max concurrent request slots (default: 1)
Per-request overrides via JSON body:
temperature, top_p, repetition_penalty, max_tokens
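A sketch of how the flags and per-request overrides might be exercised together. The binary name (`./server`), port, and endpoint path are assumptions for illustration; only the flag names and JSON field names come from the commit message above:

```shell
# Start with a 4096-token context, temperature 0.6, and two concurrent request slots
# (binary name, port, and endpoint path are hypothetical; adjust to this project):
./server --ctx-size 4096 --temp 0.6 --parallel 2 &

# Per-request JSON fields override the CLI defaults for this request only:
curl http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "messages": [{"role": "user", "content": "Hello"}],
    "temperature": 0.2,
    "top_p": 0.9,
    "repetition_penalty": 1.1,
    "max_tokens": 256
  }'
```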
Concurrency control:
AsyncSemaphore actor limits concurrent inference tasks
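The commit's AsyncSemaphore could look something like the following minimal sketch (an assumed implementation, not necessarily the commit's actual code): an actor that holds a counter seeded from `--parallel` and parks excess callers on continuations.

```swift
// Minimal AsyncSemaphore sketch: caps concurrent inference tasks at `count`.
// Callers beyond the limit suspend until a running task signals completion.
actor AsyncSemaphore {
    private var available: Int
    private var waiters: [CheckedContinuation<Void, Never>] = []

    init(count: Int) {
        self.available = count
    }

    func wait() async {
        if available > 0 {
            available -= 1
            return
        }
        // No slot free: suspend this task until signal() resumes it.
        await withCheckedContinuation { waiters.append($0) }
    }

    func signal() {
        if !waiters.isEmpty {
            // Hand the freed slot directly to the oldest waiter.
            waiters.removeFirst().resume()
        } else {
            available += 1
        }
    }
}

// Typical use around an inference call (runInference is hypothetical):
// await semaphore.wait()
// let result = try await runInference(request)
// await semaphore.signal()
```

Resuming a waiter directly in `signal()` (rather than incrementing the counter) gives FIFO ordering, so queued requests are served in arrival order.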
Also: commit Package.resolved for reproducible builds

1 parent 78ff596
3 files changed: 361 additions & 10 deletions