Commit dfd2dd3
feat(cache): implement Prefix Caching (OPT-PC01)
Complete implementation of prefix caching, which reuses cached KV blocks across prompts that share a common prefix:
kv_cache.zig:
- PrefixCacheConfig with LRU/LFU/FIFO eviction policies
- CachedPrefix struct with token/block tracking
- PrefixCache with hash-based lookup
- matchLongestPrefix() for finding cached prefixes
- Copy-on-write block sharing via ref_count
- 4 tests: basic, longest_match, eviction, benchmark
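The diff body for kv_cache.zig is not shown on this page, so the following is only a rough, language-agnostic sketch (in Python, not the project's Zig) of how a hash-keyed prefix cache with longest-prefix matching and reference-counted block sharing could behave. The class layout, the `insert` and `release` helpers, and the `block_ids` field are assumptions made for illustration; only the names `CachedPrefix`, `PrefixCache`, `matchLongestPrefix` and `ref_count` come from the commit message.

```python
from dataclasses import dataclass


@dataclass
class CachedPrefix:
    tokens: tuple       # token ids covered by this entry
    block_ids: list     # KV blocks holding the prefix (hypothetical layout)
    ref_count: int = 0  # number of live requests currently sharing the blocks


class PrefixCache:
    def __init__(self):
        # A dict is a hash table, so lookups by token tuple are hash-based.
        self._entries = {}

    def insert(self, tokens, block_ids):
        key = tuple(tokens)
        self._entries[key] = CachedPrefix(key, list(block_ids))

    def match_longest_prefix(self, tokens):
        """Return (entry, matched_len), trying the longest candidate first."""
        for n in range(len(tokens), 0, -1):
            entry = self._entries.get(tuple(tokens[:n]))
            if entry is not None:
                entry.ref_count += 1  # blocks are now shared with this request
                return entry, n
        return None, 0

    def release(self, entry):
        # Called when a request finishes; an entry with ref_count == 0
        # would be a safe candidate for eviction.
        entry.ref_count -= 1


cache = PrefixCache()
cache.insert([1, 2, 3], block_ids=[0])
cache.insert([1, 2, 3, 4, 5], block_ids=[0, 1])
entry, matched = cache.match_longest_prefix([1, 2, 3, 4, 5, 6, 7])
print(matched, entry.ref_count)  # 5 1
```

This sketch rehashes every candidate prefix, which is quadratic in prompt length; a real implementation would more likely hash incrementally or per block.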
tri_inference.zig:
- PagedSchedulerConfig.enable_prefix_caching option
- PagedBatchingScheduler integration with PrefixCache
- submitRequest() checks cache before prefill
- cachePrefixAfterPrefill() for caching new prompts
- prefix_cache_hits/misses statistics
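The scheduler flow listed above (check the cache on submit, prefill only the unmatched tail, then cache the prompt) might look roughly like the sketch below. It is written in Python for illustration rather than in the project's Zig, and the `Scheduler` class, its set-based stand-in for `PrefixCache`, and the return value of `submit_request` are all assumptions; the real `PagedBatchingScheduler` code is not shown on this page.

```python
class Scheduler:
    def __init__(self, enable_prefix_caching=True):
        self.enable_prefix_caching = enable_prefix_caching
        self._cached = set()  # stand-in for PrefixCache: set of token tuples
        self.prefix_cache_hits = 0
        self.prefix_cache_misses = 0

    def _longest_cached(self, tokens):
        # Longest cached prefix length, or 0 if nothing matches.
        for n in range(len(tokens), 0, -1):
            if tuple(tokens[:n]) in self._cached:
                return n
        return 0

    def submit_request(self, tokens):
        """Return how many tokens still need prefill for this request."""
        matched = 0
        if self.enable_prefix_caching:
            matched = self._longest_cached(tokens)
            if matched:
                self.prefix_cache_hits += 1
            else:
                self.prefix_cache_misses += 1
        to_prefill = len(tokens) - matched
        if self.enable_prefix_caching:
            # Mirrors the idea of cachePrefixAfterPrefill(): remember the prompt.
            self._cached.add(tuple(tokens))
        return to_prefill


sched = Scheduler()
print(sched.submit_request([1, 2, 3, 4]))        # 4 (miss: full prefill)
print(sched.submit_request([1, 2, 3, 4, 5, 6]))  # 2 (hit: only the new tail)
```

With `enable_prefix_caching=False` the sketch always prefills the whole prompt and leaves both counters at zero.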
Benchmark results:
- Prefill reduction: 90.1% (11,000 → 1,090 tokens)
- Cache hit rate: 100% for repeated prompts
- Tests: 19/19 passing
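The 90.1% figure follows directly from the two token counts reported above. A small check, with `prefill_reduction` as a hypothetical helper name:

```python
def prefill_reduction(total_tokens, computed_tokens):
    # Fraction of prefill tokens that did not have to be recomputed.
    return (total_tokens - computed_tokens) / total_tokens


# (11,000 - 1,090) / 11,000 = 9,910 / 11,000
print(f"{prefill_reduction(11_000, 1_090):.1%}")  # 90.1%
```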
Use cases: chatbots, few-shot learning, RAG applications
Co-authored-by: Ona <no-reply@ona.com>
1 parent ad461b9 commit dfd2dd3
4 files changed
Lines changed: 535 additions & 8 deletions
Diff content was not captured for the two files shown; only the hunk positions survive:
- First file: 34 lines added after original line 135 (new lines 136-169).
- Second file: line 56 modified; 4 lines added after original line 102 (new lines 103-106); original line 105 modified (new line 109); original lines 159-161 replaced (new lines 163-165).