You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
feat(turboquant): wire --turbo-kv flag into server and KVCache
Phase 2: Server.swift integration of TurboQuant KV-cache compression.
CLI:
--turbo-kv Enable 3-bit PolarQuant+QJL KV compression on all
KVCacheSimple layers. Compresses history > 8192 tokens
to ~3.5 bits/token — recommended for 100k+ context.
Default: disabled (zero overhead when off).
KVCache.swift (submodule):
KVCacheSimple.turboQuantEnabled: Bool = false
Now settable at runtime so Server.swift can activate per-request.
Server.swift:
- @Flag --turbo-kv added to CLI
- turboKV stored in ServerConfig
- Startup log shows turbo_kv=enabled/disabled
- Sets .turboQuantEnabled = true on each KVCacheSimple before prefill
0 commit comments