| 1 | +"""Kakeya Inference Engine v0.4 architecture. |
| 2 | + |
| 3 | +This subpackage implements the v0.4 GA design as specified in |
| 4 | +ADR 0008 §11 (the v0.4 amendment dated 2026-06-08): the verifier |
| 5 | +maintains a minimal sink+window KV cache, and at every generation |
| 6 | +step accepts transient K/V tensors at evicted positions reconstructed |
| 7 | +from the dLM proposer's parallel forward pass. |
| 8 | + |
| 9 | +The architecture's load-bearing fact, recorded in ADR 0008 §11.3: |
| 10 | +the dLM proposer has no KV cache, so its K/V tensors at every |
| 11 | +position are computed transiently on each forward pass and discarded. This |
| 12 | +makes the proposer a constant-memory K/V reconstruction source. |
| 13 | + |
| 14 | +Implementation phases per ADR 0008 §11.7: |
| 15 | + |
| 16 | +* **K1**: same-model toy (proposer and verifier share Gemma 3-1B |
| 17 | +  weights). Implement K/V routing infrastructure. Validate on a |
| 18 | +  synthetic needle-in-a-haystack (NIAH) task that recall ≈ oracle |
| 19 | +  when the projection is identity. |
| 20 | +* **K2**: cross-model toy (proposer = Gemma 3-1B, verifier = Gemma |
| 21 | +  3-4B). Train per-layer linear projection f_θ. |
| 22 | +* **K3**: production scale. |
| 23 | +* **K4**: KakeyaLattice composition. |
| 24 | +* **K5**: default flip + docs. |
| 25 | + |
| 26 | +This `__init__.py` is intentionally a thin re-export layer. The |
| 27 | +production-style API (a `DLMRestoredVerifier` class wrapping the |
| 28 | +whole pipeline) lands in K1.C; K1.A / K1.B build the foundation. |
| 29 | +""" |
| 30 | + |
| 31 | +from inference_engine.v04.kv_capture import ( |
| 32 | +    KVCapture, |
| 33 | +    capture_proposer_kv, |
| 34 | +    register_kv_capture_hooks, |
| 35 | +) |
| 36 | + |
| 37 | +__all__ = [ |
| 38 | +    "KVCapture", |
| 39 | +    "capture_proposer_kv", |
| 40 | +    "register_kv_capture_hooks", |
| 41 | +] |
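
The module docstring describes a verifier that keeps only a sink+window KV cache and receives transient K/V at evicted positions from the proposer. The following is a minimal sketch of that bookkeeping, not the package's implementation: `split_positions`, `assemble_kv`, `num_sink`, and `window` are hypothetical names introduced for illustration, and plain strings stand in for K/V tensors. It assumes "sink" means the first few positions and "window" means the most recent ones.

```python
from typing import Dict, List, Tuple


def split_positions(seq_len: int, num_sink: int, window: int) -> Tuple[List[int], List[int]]:
    """Return (kept, evicted) token positions for a sink+window cache.

    Hypothetical helper: kept = first `num_sink` positions plus the last
    `window` positions; everything in between counts as evicted.
    """
    if seq_len < 0 or num_sink < 0 or window < 0:
        raise ValueError("seq_len, num_sink and window must be non-negative")
    sink_end = min(num_sink, seq_len)
    # The window never overlaps the sink, even for short sequences.
    window_start = max(sink_end, seq_len - window)
    kept = list(range(sink_end)) + list(range(window_start, seq_len))
    evicted = list(range(sink_end, window_start))
    return kept, evicted


def assemble_kv(cached: Dict[int, str], transient: Dict[int, str], seq_len: int) -> List[str]:
    """Merge cached and transient entries into one per-position list.

    Cached (verifier) entries take precedence; transient (proposer)
    entries fill the evicted positions. Every position must be covered.
    """
    merged = []
    for pos in range(seq_len):
        if pos in cached:
            merged.append(cached[pos])
        elif pos in transient:
            merged.append(transient[pos])
        else:
            raise KeyError(f"no K/V available for position {pos}")
    return merged


# Example: 10 tokens, 2 sink positions, window of 3.
kept, evicted = split_positions(seq_len=10, num_sink=2, window=3)
cached = {p: f"verifier_kv[{p}]" for p in kept}        # persists across steps
transient = {p: f"proposer_kv[{p}]" for p in evicted}  # rebuilt each step, then dropped
full = assemble_kv(cached, transient, seq_len=10)
```

In this sketch the persistent state grows with `num_sink + window` rather than with sequence length, which is the property the docstring attributes to the design. In the K2 phase, a learned per-layer projection would presumably be applied to the transient entries before the merge; that step is omitted here.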