ADR 0008 §6.4 PR-D1 originally proposed a coupled change: (a)
remove ADR-0007-vintage server dead code, (b) refactor the HTTP
shim's chat-completions handler onto SessionStore. Implementation
revealed that (a) is a pure subtraction with no behavioral dependence
on (b), so the two are split (same pattern as PR-A3 / PR-A3b). This
PR is (a); (b) becomes PR-D2 (queued, not in this diff).
Net diff: 540 deletions, 78 insertions. ADR 0008 §6.4 amended
to record the split.
Files modified (production):
inference_engine/server/engine.py
-12 lines: EngineResult.path_selection / .tokens_skipped /
.prefill_duration_seconds fields removed; SpeculativeEngine
no longer forwards them from SpeculativeRunResult.
inference_engine/server/metrics.py
-74 lines: path_selection_total, continuation_tokens_skipped_total,
verifier_prefill_duration_seconds, cache_invariant_violations_total
metrics removed from Metrics + factory; record_path_selection and
record_cache_invariant_violation methods removed.
inference_engine/server/app.py
-53 lines: _session_acceptance_rate and _emit_path_selection_metric
helpers removed; the two call sites in the streaming + non-
streaming completion paths now pass acceptance_rate=None to
record_completion. The OpenAI response loses its acceptance_rate
field as a result, which is acceptable on a feature-frozen deprecated
shim per ADR 0008 §2.7. Migrate to gRPC for richer telemetry.
inference_engine/scheduler/session.py
-7 lines: engine_result field on Session removed. The scheduler
worker (scheduler.py) no longer stashes engine.generate()'s
result on the session; the only reader was app.py's removed
helpers.
inference_engine/scheduler/scheduler.py
±3 lines (renamed assignment to del): the line that wrote
session.engine_result = result is gone.
scripts/bench_agentic/bench_long_session.py
-211/+30 lines: removed _PATH_SELECTION_METRIC /
_CONTINUATION_TOKENS_SKIPPED_METRIC / _CACHE_INVARIANT_VIOLATIONS_METRIC
constants, the labeled-line regex, the labeled-metric branch in
_parse_prom_text, the _extract_label helper (no callers after
PR-D1), the _adr_0007_summary aggregator, the adr_0007 payload
field, and the §2.10 block in render_summary.
Module docstring updated to point at PR-E1's
bench_session_long_run.py for the replacement bench.
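
For reference, a minimal sketch of what a Prometheus text parser reduces to once the labeled-metric branch is gone. This is not the code in bench_long_session.py: the function name parse_unlabeled_prom_text and the sample metric names are hypothetical, and the sketch assumes only unlabeled `name value` sample lines matter.

```python
def parse_unlabeled_prom_text(text: str) -> dict[str, float]:
    """Parse unlabeled `name value` samples from Prometheus text exposition.

    Hypothetical helper for illustration; not the bench script's parser.
    """
    samples: dict[str, float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # Labeled samples (name{...} value) are deliberately ignored here.
        if "{" in line:
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        try:
            # parts[2], if present, is an optional timestamp and is dropped.
            samples[parts[0]] = float(parts[1])
        except ValueError:
            continue
    return samples
```

With no labeled branch there is no need for a label-extraction helper, which is consistent with _extract_label having no callers after this PR.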
Files modified (tests):
tests/inference_engine/server/test_metrics.py
-107 lines: 4 entries dropped from test_build_registers_all_documented_metrics
expected set; 9 tests removed from the 'ADR 0007 §2.10 -
path_selection observability' section.
tests/inference_engine/server/test_app_metrics_and_auth.py
-98 lines: 4 ADR-0007-specific tests removed
(test_metrics_path_selection_metrics_present_on_idle_metrics_scrape,
test_metrics_path_selection_recorded_after_completion,
test_session_acceptance_rate_returns_none_when_result_missing_rate,
test_emit_path_selection_metric_noop_when_path_unset).
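
Because the shim's OpenAI response no longer carries acceptance_rate, callers that used to read it need to tolerate its absence. A minimal client-side sketch, assuming the field used to sit at the top level of the decoded JSON response (the exact location is an assumption, and read_acceptance_rate is a hypothetical helper, not part of this repo):

```python
from typing import Any, Optional


def read_acceptance_rate(response: dict[str, Any]) -> Optional[float]:
    """Return acceptance_rate if the response still carries it, else None.

    Assumes a top-level "acceptance_rate" key; adjust if it lived elsewhere.
    """
    value = response.get("acceptance_rate")
    # bool is a subclass of int, so exclude it explicitly.
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return float(value)
    return None
```

Treating a missing field and an explicit null the same way keeps a caller working against both pre- and post-PR-D1 shim builds.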
Local verification (Linux VM, py3.12):
PYTHONPATH=.:sdks/python pytest <Linux CI gate set>
682 passed (previously 695; 13 ADR-0007-specific tests removed),
TOTAL 1660 stmts, 100.00% coverage (previously 1694; 34 dead stmts removed).
Per ADR 0008 §9: this PR is pure deletion / cleanup on the
Linux-runnable surface. Zero MLX runtime code touched. The §9
carve-out applies; no Mac M4 report needed.
Next PR after merge:
PR-D2 (§6.4 amended, queued): HTTP-shim refactor onto SessionStore
proper. Each /v1/chat/completions request becomes a
single-shot session; PooledVerifier retires; Deprecation / Sunset
headers added per §2.7.
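
As a rough illustration of the Deprecation / Sunset headers queued for PR-D2, the sketch below builds both values with the standard library only. It assumes the RFC 9745 form for Deprecation (an `@` followed by a Unix timestamp) and the RFC 8594 form for Sunset (an HTTP-date). The helper name and the dates in the usage note are hypothetical; nothing here is taken from ADR 0008 §2.7 or from PR-D2.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def deprecation_headers(deprecated_at: datetime, sunset_at: datetime) -> dict[str, str]:
    """Build Deprecation and Sunset header values (hypothetical helper).

    Both arguments must be timezone-aware datetimes.
    """
    return {
        # Assumed RFC 9745 style: "@" + seconds since the Unix epoch.
        "Deprecation": f"@{int(deprecated_at.timestamp())}",
        # Assumed RFC 8594 style: HTTP-date, always expressed in GMT.
        "Sunset": format_datetime(sunset_at.astimezone(timezone.utc), usegmt=True),
    }
```

For example, with placeholder dates, `deprecation_headers(datetime(2025, 1, 1, tzinfo=timezone.utc), datetime(2026, 1, 1, tzinfo=timezone.utc))` yields `Deprecation: @1735689600` and `Sunset: Thu, 01 Jan 2026 00:00:00 GMT`.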
Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>