Commit 5b75fb7
quantcpp 0.10.0: infinite scrollback + progressive KV in Python
Generation no longer stops when the context fills. When the KV cache is full,
the engine shifts automatically: it discards the oldest half, keeps the most
recent half, and continues generating. No OOM, no forced stop, and no loss of
tokens in the output being generated.
llama.cpp provides a comparable context shift in its own front ends, and vLLM
handles KV-cache pressure by preempting whole requests rather than trimming a
sequence. quant.cpp performs the shift transparently inside the generation
loop, so callers need no extra handling.
Verified: SmolLM2-135M at ctx=64, generated 500 tokens with 9 automatic
context shifts. The engine logged each shift and continued seamlessly.
Combined with progressive KV (k_highres=128), the architecture mirrors
human memory: recent tokens stay in FP32 (vivid), older tokens are held at
4-bit (faded), and the oldest are shifted out. Nothing inside the active
window is dropped.
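The tiering described above can be sketched as a simple age test. This is an illustration only, assuming that k_highres counts the most recent tokens kept at FP32 and that everything older in the window is stored at 4-bit; `kv_tier`, `kv_tier_for`, and `kv_highres_count` are hypothetical names, not identifiers from quant.cpp.

```c
#include <stddef.h>

/* Hypothetical tier labels for a cached token (not quant.cpp identifiers). */
typedef enum { KV_TIER_FP32, KV_TIER_4BIT } kv_tier;

/* Classify the cached token at index `idx` when the cache holds `pos` tokens.
 * Assumption: the newest `k_highres` tokens stay FP32, older ones are 4-bit.
 * Requires idx < pos. */
static kv_tier kv_tier_for(size_t idx, size_t pos, size_t k_highres)
{
    size_t age = pos - 1 - idx;  /* 0 means the newest token */
    return age < k_highres ? KV_TIER_FP32 : KV_TIER_4BIT;
}

/* Number of tokens currently held at full precision. */
static size_t kv_highres_count(size_t pos, size_t k_highres)
{
    return pos < k_highres ? pos : k_highres;
}
```

Under this reading, a window shorter than k_highres (for example ctx=64 with k_highres=128) would hold every token at FP32, so the 4-bit tier only comes into play with larger contexts.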
Implementation:
- src/engine/tq_generate.c: context shift in the generation loop (multi-file build)
- quant.h: same logic for the single-header build (Python bindings path)
- Shifts FP32 K/V caches, FP16 V cache, and quantized K cache
- Keeps max_seq_len/2 most recent tokens on each shift
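The shift listed above can be sketched roughly as follows. This is a minimal illustration, not the code in tq_generate.c or quant.h: `kv_buffer` and `kv_shift_half` are hypothetical names, and the sketch assumes each cache is one contiguous buffer with a fixed number of bytes per token slot (4 for FP32, 2 for FP16, a packed block size for the quantized K cache).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor: one contiguous cache buffer holding max_seq_len
 * token slots, each `bytes_per_token` wide. */
typedef struct {
    unsigned char *data;
    size_t bytes_per_token;
} kv_buffer;

/* If the cache is full, slide the newest max_seq_len/2 tokens to the front
 * of every buffer and return the new write position. Otherwise return `pos`
 * unchanged. */
static size_t kv_shift_half(kv_buffer *bufs, size_t n_bufs,
                            size_t pos, size_t max_seq_len)
{
    size_t keep = max_seq_len / 2;
    if (pos < max_seq_len || keep == 0) {
        return pos;                      /* still room, nothing to do */
    }
    size_t discard = pos - keep;         /* oldest tokens to drop */
    for (size_t i = 0; i < n_bufs; i++) {
        size_t w = bufs[i].bytes_per_token;
        /* memmove tolerates overlapping source and destination ranges */
        memmove(bufs[i].data, bufs[i].data + discard * w, keep * w);
    }
    return keep;
}
```

A generation loop would call this before writing each new token and continue from the returned position. The sketch only moves bytes for a single layer; any handling of positional encodings after the shift is outside its scope.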
Strategy document saved: docs/strategy_progressive_kv.md
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent 54e8b24 commit 5b75fb7
3 files changed
Lines changed: 142 additions & 2 deletions
Diff bodies were not captured; the line-number gutters give the following per-file changes (file names not shown):

| File (diff order) | Changed lines | Additions | Deletions |
|---|---|---|---|
| 1 | new lines 1-56 added | 56 | 0 |
| 2 | old line 15500 replaced by new lines 15500-15531 | 32 | 1 |
| 3 | new line 10 added; old line 324 replaced by new lines 325-377 | 54 | 1 |