
Commit 6bd9224

simba authored and committed
Add context window, temperature, parallel requests, and sampling controls
New CLI flags (matching llama-server patterns):
  --ctx-size N        Context window / KV cache size (sliding window)
  --temp N            Default sampling temperature (default: 0.6)
  --top-p N           Top-p nucleus sampling (default: 1.0)
  --repeat-penalty N  Repetition penalty factor
  --parallel N        Max concurrent request slots (default: 1)

Per-request overrides via JSON body:
  temperature, top_p, repetition_penalty, max_tokens

Concurrency control:
  AsyncSemaphore actor limits concurrent inference tasks

Also: commit Package.resolved for reproducible builds
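
The commit message names an AsyncSemaphore actor but this page does not show its source. As a rough illustration only, a counting semaphore built on a Swift actor could look like the sketch below. The method names `acquire` and `release`, the `limit` parameter (standing in for the value of `--parallel N`), and the `withSlot` helper are all assumptions made for this example, not the commit's actual API.

```swift
// Hypothetical sketch of an actor-based counting semaphore.
// Not the implementation from this commit; names are illustrative.
actor AsyncSemaphore {
    private var available: Int
    private var waiters: [CheckedContinuation<Void, Never>] = []

    init(limit: Int) {
        precondition(limit > 0, "limit must be positive")
        available = limit
    }

    /// Claims a slot, suspending the caller while none is free.
    /// Note: this simple version does not react to task cancellation.
    func acquire() async {
        if available > 0 {
            available -= 1
            return
        }
        await withCheckedContinuation { continuation in
            waiters.append(continuation)
        }
    }

    /// Frees a slot. If a caller is waiting, the slot is handed to the
    /// oldest waiter directly, so `available` stays unchanged.
    func release() {
        if waiters.isEmpty {
            available += 1
        } else {
            waiters.removeFirst().resume()
        }
    }
}

/// Runs `work` while holding one slot and releases the slot on both
/// success and failure (`defer` cannot contain `await`, hence do/catch).
func withSlot<T: Sendable>(
    _ semaphore: AsyncSemaphore,
    _ work: @Sendable () async throws -> T
) async throws -> T {
    await semaphore.acquire()
    do {
        let result = try await work()
        await semaphore.release()
        return result
    } catch {
        await semaphore.release()
        throw error
    }
}
```

Under these assumptions, a server would create one `AsyncSemaphore(limit: n)` at startup and wrap each inference request in `withSlot`, so at most `n` requests generate tokens at once while the rest wait in arrival order.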
1 parent 78ff596 commit 6bd9224

3 files changed

Lines changed: 361 additions & 10 deletions


.gitignore

Lines changed: 0 additions & 1 deletion
@@ -1,7 +1,6 @@
 # Swift / SPM
 .build/
 .swiftpm/
-Package.resolved
 *.xcodeproj/
 *.xcworkspace/
 xcuserdata/

Package.resolved

Lines changed: 275 additions & 0 deletions
Generated file; diff not rendered by default.
