Commit add46de
Server reads KV bytes from engine.kv_state(), not the slab pool
The 2026-05-30 short test #2 confirmed:
- the bench in-flight metrics poller works (7313 samples over
  58 turns; median 110 per turn, max 429)
- the orphan-session fix works (idle pool_in_use settles to 0)
- a slab IS acquired during turns (in-flight pool_in_use peak = 1)
But scheduler_kv_live_bytes still read 0.0 in 58/58 turns. Root
cause: SlabPool.live_kv_bytes (added in PR #24) sums the slabs'
live_kv_bytes_override, which is only ever set by PooledVerifier,
and PooledVerifier is never wired into scripts/serve.py.
Wrapping the verifier in PooledVerifier would require plumbing the
slab through Scheduler -> Engine -> SpeculativeDecoder -> Verifier,
which is a non-trivial structural change.
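The root cause described above can be illustrated with a minimal sketch. The Slab and SlabPool classes here are simplified stand-ins, not the project's real classes; in particular, the None default for the override and the property form of live_kv_bytes are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Slab:
    # Assumption: the override starts unset. Per the commit message, only
    # PooledVerifier ever writes it in the real code.
    live_kv_bytes_override: Optional[int] = None


@dataclass
class SlabPool:
    slabs: List[Slab] = field(default_factory=list)

    @property
    def live_kv_bytes(self) -> int:
        # An unset override contributes nothing to the aggregate.
        return sum(s.live_kv_bytes_override or 0 for s in self.slabs)


# One slab is acquired during a turn, but nothing writes the override.
pool = SlabPool(slabs=[Slab()])
unwired_reading = pool.live_kv_bytes

# What a wired-in wrapper would do: report the real byte count.
pool.slabs[0].live_kv_bytes_override = 4096
wired_reading = pool.live_kv_bytes
```

With no writer for the override, the pool gauge stays at zero even while a slab is in use, which matches the 58/58 zero readings.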
Cheaper fix
-----------
The verifier already holds the real KV cache tensors and is the
canonical source of truth for live KV bytes. Expose it directly:
- kv_cache_proposer/verifier.py
    SinkWindowVerifier.live_kv_bytes() -> int
    Sums layer.keys.numel() * element_size(), plus the same for
    values, across the cache. Returns 0 when the cache is None
    (between reset() and prefill()). _record_peak_kv now reads
    through it.
- inference_engine/backends/mlx/verifier.py
    MLXSinkWindowVerifier.live_kv_bytes() -> int
    Same surface as the CPU verifier; reads from
    cache_ops.total_kv_bytes(self.cache). _record_peak_kv now
    reads through it too.
- inference_engine/server/engine.py
    Engine protocol: new kv_state() -> int method.
    SpeculativeEngine: returns decoder.verifier.live_kv_bytes()
    if the verifier exposes that method, else 0.
    Defensive on the verifier surface so that legacy verifiers
    which don't expose the optional method don't break the engine.
- inference_engine/server/app.py
    /metrics handler: replace
        kv_live_bytes=pool.live_kv_bytes
    with
        kv_live_bytes=int(engine.kv_state())
The pool-side gauge is preserved as infrastructure; once
PooledVerifier is wired (post-v0.3.0), the slab will report
correctly via the override and the pool aggregate will match the
engine value. For v0.3, the engine is the source of truth.
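The verifier-side sum and the defensive engine-side lookup listed above might look roughly like the sketch below. It is not the project's code: FakeTensor stands in for torch.Tensor (only numel() and element_size() are modelled), and LayerKV, SketchVerifier, LegacyVerifier and SketchEngine are hypothetical names. The real SpeculativeEngine reaches the verifier through decoder.verifier; the sketch holds it directly for brevity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FakeTensor:
    """Stand-in exposing the two tensor methods the byte count relies on."""
    n_elements: int
    bytes_per_element: int

    def numel(self) -> int:
        return self.n_elements

    def element_size(self) -> int:
        return self.bytes_per_element


@dataclass
class LayerKV:
    keys: Optional[FakeTensor]
    values: Optional[FakeTensor]


class SketchVerifier:
    def __init__(self) -> None:
        # None models the window between reset() and prefill().
        self.cache: Optional[List[LayerKV]] = None

    def live_kv_bytes(self) -> int:
        cache = self.cache
        if cache is None:
            return 0
        total = 0
        for layer in cache:
            for tensor in (layer.keys, layer.values):
                if tensor is not None:  # tolerate a layer with no K/V yet
                    total += tensor.numel() * tensor.element_size()
        return total


class LegacyVerifier:
    """A verifier that predates the optional live_kv_bytes() method."""


class SketchEngine:
    def __init__(self, verifier: object) -> None:
        self.verifier = verifier

    def kv_state(self) -> int:
        # Look the method up defensively so legacy verifiers report 0.
        fn = getattr(self.verifier, "live_kv_bytes", None)
        return int(fn()) if callable(fn) else 0


verifier = SketchVerifier()
engine = SketchEngine(verifier)
before_prefill = engine.kv_state()

# Two layers, each with 8 two-byte elements for keys and for values.
verifier.cache = [LayerKV(FakeTensor(8, 2), FakeTensor(8, 2)) for _ in range(2)]
after_prefill = engine.kv_state()

legacy_reading = SketchEngine(LegacyVerifier()).kv_state()
```

The getattr lookup keeps kv_state() total: a verifier without the method yields 0 rather than an AttributeError during a /metrics scrape.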
Thread safety
-------------
Both verifiers' live_kv_bytes() reads are int-attribute walks over
tensor shape descriptors. Under CPython, torch.Tensor.numel() and
the mlx array.size attribute are each atomic reads, so a concurrent
worker writing the cache yields some valid intermediate value,
never garbage. Documented inline.
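One way to picture the lock-free read is the simplified sketch below. It models only whole-cache replacement (a writer publishing a fully built list or None), not in-place growth of the real cache tensors, and every name in it (Holder, LAYER_BYTES, N_LAYERS) is hypothetical. The reader binds the cache reference once, so it sums either the old or the new object and never a half-published one.

```python
import threading
from typing import List, Optional

LAYER_BYTES = 1024  # hypothetical per-layer byte count
N_LAYERS = 4


class Holder:
    def __init__(self) -> None:
        self.cache: Optional[List[int]] = None

    def live_kv_bytes(self) -> int:
        cache = self.cache  # bind once; later writes cannot affect this read
        return 0 if cache is None else sum(cache)


holder = Holder()
stop = threading.Event()


def writer() -> None:
    # Alternate between "prefilled" and "reset" as fast as possible.
    while not stop.is_set():
        holder.cache = [LAYER_BYTES] * N_LAYERS  # publish a fully built list
        holder.cache = None


thread = threading.Thread(target=writer)
thread.start()
try:
    # Poll without any lock, as a metrics scrape would.
    observed = {holder.live_kv_bytes() for _ in range(20000)}
finally:
    stop.set()
    thread.join()
```

Every polled value is either 0 or the full total, because the writer never mutates a list after publishing it.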
Tests (no mock; all real concrete classes)
------------------------------------------
tests/core/test_verifier.py
  + test_live_kv_bytes_zero_before_prefill
  + test_live_kv_bytes_nonzero_after_prefill
  + test_live_kv_bytes_returns_zero_when_layer_kv_is_null
tests/backends/mlx/test_verifier.py
  + test_live_kv_bytes_zero_before_prefill
  + test_live_kv_bytes_nonzero_after_prefill
tests/inference_engine/server/test_engine.py
  + test_kv_state_reads_from_verifier_live_kv_bytes
    (with a concrete _VerifierDouble exposing live_kv_bytes)
  + test_kv_state_returns_zero_when_verifier_has_no_method
  + test_kv_state_called_each_invocation
    (asserts the /metrics scrape contract: no caching)
tests/inference_engine/server/test_app_metrics_and_auth.py
  + test_metrics_kv_live_bytes_reflects_engine_kv_state
    (regression test pinning the fix; uses a _KVAwareEngine
    subclass returning a deterministic non-zero value and
    asserts that it appears in the Prometheus text exposition)
Test doubles updated (DeterministicEngine in conftest.py,
_RaisingEngine in two test files): all return kv_state() == 0
as a no-real-cache default.
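The no-caching contract and the metrics regression test could be exercised along the lines of the sketch below. The classes and the render_metrics helper are hypothetical stand-ins written for this illustration; the real tests use the project's own doubles and the real /metrics handler, whose output format is only assumed here from the 0.0 reading quoted earlier.

```python
class CountingVerifier:
    """Concrete double (no mocking library) that counts how often it is read."""

    def __init__(self) -> None:
        self.calls = 0
        self.value = 100

    def live_kv_bytes(self) -> int:
        self.calls += 1
        return self.value


class SketchEngine:
    def __init__(self, verifier: object) -> None:
        self.verifier = verifier

    def kv_state(self) -> int:
        fn = getattr(self.verifier, "live_kv_bytes", None)
        return int(fn()) if callable(fn) else 0


def render_metrics(engine: SketchEngine) -> str:
    # Hypothetical one-gauge rendering in Prometheus text style.
    return f"scheduler_kv_live_bytes {float(int(engine.kv_state()))}\n"


def test_kv_state_called_each_invocation() -> None:
    verifier = CountingVerifier()
    engine = SketchEngine(verifier)
    assert engine.kv_state() == 100
    verifier.value = 250  # the next scrape must see the new value
    assert engine.kv_state() == 250
    assert verifier.calls == 2  # one verifier read per call, nothing cached


def test_metrics_kv_live_bytes_reflects_engine_kv_state() -> None:
    verifier = CountingVerifier()
    verifier.value = 12345
    text = render_metrics(SketchEngine(verifier))
    assert "scheduler_kv_live_bytes 12345.0" in text
```

Using a counting double rather than a mock keeps the assertion about call count tied to a real method, in line with the "no mock" note in the heading above.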
Verified locally:
  pytest tests/inference_engine/server/test_engine.py
         tests/inference_engine/server/test_app_metrics_and_auth.py
         tests/core/test_verifier.py
    -> 65 passed
  pytest tests/inference_engine/
    -> 389 passed (no regression)
Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>

1 parent bc3fbcd commit add46de
11 files changed
Lines changed: 256 additions & 10 deletions
File tree
- inference_engine
- backends/mlx
- server
- kv_cache_proposer
- tests
- backends/mlx
- core
- inference_engine/server