Commit 1a72e81
fix(metrics): address P0 review-council findings
Two high-severity issues raised by the review-council pass on PR #306:
1. (#306-1) Subscriber late-binding could drop early ticks via the ZMQ
slow-joiner pattern. Move MetricsSnapshotSubscriber construction +
start() BEFORE launcher.launch() so the SUB handshake completes
during the subprocess-spawn window. ZMQ tolerates connect-before-bind
on IPC: the connect resolves once the binder appears. The
prior ordering (subscribe AFTER launch returns) left a window where
the aggregator could begin ticking on STARTED before the SUB
subscription warmed up, dropping early live snapshots and, in the
worst case, missing COMPLETE entirely.
2. (#306-2) MetricsPublisher._write_atomic_fallback runs synchronous
f.flush + fsync(file) + fsync(parent dir) + rename on the
aggregator's event loop. On a busy host this can block the loop for
tens to hundreds of ms, long enough to back-pressure event-record
processing. Wrap the call with asyncio.to_thread inside publish_final.
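A minimal sketch of the offload described in item 2, assuming Python 3.9+ and a POSIX host. `write_atomic` and the `publish_final` signature shown here are hypothetical stand-ins, not the actual MetricsPublisher code. The sketch fsyncs the parent directory after the rename, which may differ from the order used in the real method:

```python
import asyncio
import os
import tempfile


def write_atomic(path: str, data: bytes) -> None:
    """Blocking atomic write (hypothetical helper, for illustration only)."""
    parent = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=parent)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # push file contents to disk
        os.replace(tmp, path)  # atomic rename within the same directory
    except BaseException:
        # Do not leave a stray temp file behind on failure.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    # fsync the parent directory so the rename itself is durable (POSIX).
    dir_fd = os.open(parent, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


async def publish_final(path: str, data: bytes) -> None:
    # Run the blocking write in a worker thread so the event loop
    # stays free to process other work while the disk syncs.
    await asyncio.to_thread(write_atomic, path, data)
```

The blocking function itself is unchanged; only the call site moves off the loop, which is why a change of this shape needs no API change.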
Both fixes are localized: no API changes, no test changes required.
Existing integration tests (test_concurrency_benchmark,
test_end_to_end_oracle) exercise both paths end-to-end and still pass.
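The ordering problem in item 1 can be illustrated without ZMQ. The sketch below is a stdlib-only simulation, not the project's subscriber code. `ToyPublisher` is a hypothetical in-process stand-in for a PUB socket that, like PUB/SUB, keeps nothing for subscribers that join late, and the `create_task` call stands in for launcher.launch():

```python
import asyncio


class ToyPublisher:
    """Hypothetical stand-in for a PUB socket: late joiners miss earlier messages."""

    def __init__(self):
        self.queues = []

    def subscribe(self):
        q = asyncio.Queue()
        self.queues.append(q)
        return q

    def publish(self, msg):
        # Deliver only to subscribers that exist right now.
        for q in self.queues:
            q.put_nowait(msg)


async def run(subscribe_first: bool):
    pub = ToyPublisher()

    async def aggregator():
        for tick in ["STARTED", "LIVE", "COMPLETE"]:
            pub.publish(tick)
            await asyncio.sleep(0)  # yield between ticks

    # Fixed ordering: subscribe before the aggregator is launched.
    q = pub.subscribe() if subscribe_first else None
    task = asyncio.create_task(aggregator())  # stand-in for launch()
    await asyncio.sleep(0)  # the aggregator gets to publish its first tick
    if q is None:
        q = pub.subscribe()  # prior ordering: subscribe after launch returns
    await task

    received = []
    while not q.empty():
        received.append(q.get_nowait())
    return received
```

With `subscribe_first=True` the subscriber sees every tick. With `False` it misses whatever was published before it joined, which is the slow-joiner window the commit closes.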
The third P0 item (#306-3, unbounded raw-sample retention) is the
agreed memory trade documented in metrics_pubsub_design_v5.md §11.
It is not changed in this PR; adding "--persist-raw" is tracked as a
follow-up instead.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 4a6fa28 commit 1a72e81
2 files changed
Lines changed: 23 additions & 17 deletions
File tree
- src/inference_endpoint
- async_utils/services/metrics_aggregator
- commands/benchmark
Lines changed: 8 additions & 1 deletion