ADR 0007 revision: remove fallback semantics per project no-fallback principle
User feedback (2026-05-31): the previous draft's 'mismatch handling /
automatic fallback to reset' framing in §2.4 / §2.8 violates the project's
engineering principle 'no mock, no fallback, no overfit'. Mismatch
between expected and actual cache state should never be a silent
graceful-degradation path; it must be either (a) a different but
equally first-class state transition, or (b) a critical bug raised
to the operator.
The previous draft conflated two separate things into one
fallback-shaped pattern. This revision separates them.
Changes
-------
§2.4 — restructured from 'reset criteria' to 'path selection'
Two named first-class paths:
2.4.a Continuation path (incremental prefill on the new tail)
2.4.b New-session path (cold-start full prefill)
2.4.c Path semantics — both produce bit-identical output;
selecting new-session is NOT degradation, it is the
correct action when the input does not satisfy the
continuation precondition. The path-selection function
is total over the input space, so 'mismatch' as a
runtime concept does not exist.
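
A minimal Python sketch of what a total path-selection function could look like. The type names ContinuationPlan and NewSession come from the PR 7-2 description; everything else is an assumption for illustration. In particular, the continuation precondition used here (non-empty cached sequence that is a strict prefix of the prompt) is a stand-in for §2.4.a, whose exact wording is not reproduced in this message, and the sketch ignores sink-window trimming.

```python
from dataclasses import dataclass
from typing import Sequence, Union


@dataclass(frozen=True)
class ContinuationPlan:
    # Number of prompt tokens already covered by the cache;
    # only prompt[skip_n:] needs incremental prefill.
    skip_n: int


@dataclass(frozen=True)
class NewSession:
    # Cold-start full prefill; the cache state is replaced.
    pass


PathSelection = Union[ContinuationPlan, NewSession]


def path_select(cached: Sequence[int], prompt: Sequence[int]) -> PathSelection:
    """Map every (cached, prompt) pair to exactly one first-class path."""
    n = len(cached)
    # ASSUMED precondition standing in for §2.4.a: the cache is non-empty
    # and is a strict prefix of the prompt, so there is a new tail to prefill.
    if 0 < n < len(prompt) and list(prompt[:n]) == list(cached):
        return ContinuationPlan(skip_n=n)
    # Every other input selects the new-session path. There is no error
    # branch and no exception: the function is total over its inputs.
    return NewSession()
```

Because both branches return a value of the sum type, a caller has to handle each path explicitly, and there is no third "mismatch" outcome to recover from.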
§2.5 — 'eviction' wording removed
Cache state lifecycle described in terms of §2.4's two
transitions (extend vs replace). 'Implicit eviction' was a
fallback-shaped phrase; replaced with 'state is overwritten via
path-selection'.
§2.8 — 'graceful degradation' explicitly rejected
Renamed to 'path totality'. The phrase 'graceful degradation' is
now called out as deliberately not used, with a one-paragraph
rationale citing the project's no-fallback principle.
§2.9 NEW — 'Anomaly invariants (these are bugs, not states)'
Three required invariants:
INV-1 parallel-sequence consistency between
cached_token_sequence and K/V tensor seq dim
INV-2 cache_position_start monotonic non-decreasing within
a continuation chain
INV-3 continuation-path output bit-identical to full-prefill
output for any input satisfying §2.4.a
INV-1 / INV-2 enforced by Python assert statements; violations
raise AssertionError to the route handler, which surfaces them as
HTTP 500 with the OpenAI error envelope and a unique error id
for log correlation. The implementation does NOT retry, does NOT
fall back, and does NOT silently choose the new-session path to recover.
INV-3 enforced offline by the §2.7 determinism gate test
(mandatory before merge).
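
The INV-1 / INV-2 handling described above might be sketched as follows. This is an illustration under stated assumptions, not the project's code: the helper names (check_inv1, check_inv2, handle_request), the use of a UUID as the error id, and the "cache_invariant_violation" code string are all hypothetical. Only the envelope shape, an "error" object with message, type, param, and code fields, follows the OpenAI error format.

```python
import uuid
from typing import Callable, List, Tuple


def check_inv1(cached_token_sequence: List[int], kv_seq_len: int) -> None:
    # INV-1: the parallel token sequence and the K/V seq dim must agree.
    assert len(cached_token_sequence) == kv_seq_len, (
        f"INV-1: token sequence length {len(cached_token_sequence)} "
        f"!= K/V seq dim {kv_seq_len}"
    )


def check_inv2(prev_start: int, new_start: int) -> None:
    # INV-2: cache_position_start never decreases within a continuation chain.
    assert new_start >= prev_start, (
        f"INV-2: cache_position_start moved backwards "
        f"({prev_start} -> {new_start})"
    )


def handle_request(run: Callable[[], dict]) -> Tuple[int, dict]:
    """Hypothetical route-handler wrapper: returns (HTTP status, JSON body)."""
    try:
        return 200, run()
    except AssertionError as exc:
        # Unique id so the HTTP response can be correlated with server logs.
        error_id = uuid.uuid4().hex
        # No retry, no fallback, no silent switch to the new-session path:
        # the violation is reported to the operator as a bug.
        return 500, {
            "error": {
                "message": f"cache invariant violated (error id {error_id}): {exc}",
                "type": "server_error",
                "param": None,
                "code": "cache_invariant_violation",  # hypothetical code string
            }
        }
```

One caveat worth keeping in mind: Python drops assert statements when run with -O, so enforcement by assert only holds if the service is never started in optimized mode.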
§2.10 — observability metrics renamed to match the new framing
cross_request_kv_reuse_decisions_total{outcome=hit|partial|miss}
→ path_selection_total{path=continuation|new_session}
Both labels are first-class outcomes; neither is an 'error'
or 'fallback'.
cross_request_kv_reuse_tokens_skipped_total
→ continuation_tokens_skipped_total
verifier_prefill_duration_seconds{path=continuation|new_session}
(added path label so the per-path cost profile is observable)
cache_invariant_violations_total{kind=inv1|inv2}
NEW counter; should always be 0; non-zero is a critical
operational alert (page on it).
§2.7 — OQ on Mac Metal numerical determinism strengthened
The previous OQ said 'if strict bit-identical fails, fall back
to ULP-equivalent'. That is itself a fallback in disguise.
Revised: any relaxation must be written into this ADR
explicitly before the gate is changed; tests that adapt their
strictness based on whether the strict path passes are
prohibited.
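
To make "strict bit-identical" concrete, here is a small sketch of a comparison that has a single fixed strictness. It assumes the outputs can be flattened to float32 values; the helper names are hypothetical and the real gate would compare verifier outputs rather than plain Python lists.

```python
import struct
from typing import Sequence


def float32_bits(values: Sequence[float]) -> bytes:
    # Serialize as little-endian IEEE 754 float32 so equality is on raw bits.
    return struct.pack(f"<{len(values)}f", *values)


def assert_bit_identical(continuation_out: Sequence[float],
                         full_prefill_out: Sequence[float]) -> None:
    # One fixed comparison: no tolerance parameter and no second, looser
    # check that runs when the strict one fails.
    assert len(continuation_out) == len(full_prefill_out), "length differs"
    assert float32_bits(continuation_out) == float32_bits(full_prefill_out), (
        "outputs are not bit-identical"
    )
```

Comparing raw bytes is deliberately stricter than numeric equality: 0.0 and -0.0 compare equal with ==, yet they fail this check because their bit patterns differ.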
§5 — implementation plan annotated for the new invariants
PR 7-1: INV-1 assert at every cache mutation site
PR 7-2: rename find_reusable_prefix → path_select returning a
ContinuationPlan | NewSession sum type; INV-2 assert in
path_select; tests must cover both path outputs
explicitly (not just 'most common path')
PR 7-4: route INV violations to OpenAI error envelope per §2.9
PR 7-5: §2.7 determinism gate also covers all path-selection
branches
§6 — validation criteria updated
New gate item: 'INV-1 and INV-2 assertions never fire during
validation runs'. cache_invariant_violations_total must be 0
across the determinism test, the synthetic suite, and the 4h
Mac M4 run. Any non-zero value is a release blocker.
Verification
------------
$ grep -i 'fallback\|fall back\|graceful degr' docs/adr/0007-cross-request-kv-reuse.md
All remaining hits are explicit prohibitions of the pattern, not
designs invoking it. Verified by inspection.
Diff: +166 / -27 lines, all docs/adr/0007-*.md.
Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
- | 7-2 | `Verifier.find_reusable_prefix` + `prefill_incremental` (MLX + CPU) | the prefix-match algorithm + the incremental prefill path; unit tests with synthetic verifier | 100% |
- | 7-3 | `SpeculativeDecoder` integration | accept reusable-prefix hint; route between full-prefill and incremental-prefill paths | 100% |
- | 7-4 | `SpeculativeEngine` route-handler integration | call find_reusable_prefix before delegating to decoder.generate; emit decision to metrics | 100% |
- | 7-5 | Determinism gate test | bit-identical comparison between reuse path and always-reset path on a 30-turn synthetic conversation | mandatory before merge |
- | 7-6 | bench_long_session_v2 + 4h Mac re-run | bench observes per-turn cost stable at O(new_message) and §2.3.a still holds | 4h Mac evidence |
+ | 7-1 | `SinkWindowKVCache` + parallel token sequence (MLX + CPU) | `logical_token_sequence` + `logical_position_start`; `update_and_fetch` / `trim` / `reset` paths sync the parallel sequence; **INV-1 assert at every mutation site**; unit tests | 100% on touched modules |
+ | 7-2 | `Verifier.path_select(prompt) -> ContinuationPlan \| NewSession` + `prefill_incremental(skip_n)` (MLX + CPU) | the path-selection function (§2.4) + the incremental prefill path; **INV-2 assert in path-select**; unit tests with synthetic verifier covering both paths' inputs explicitly | 100% |
+ | 7-3 | `SpeculativeDecoder` integration | accept the path-selection result; route between full-prefill and incremental-prefill paths | 100% |
+ | 7-4 | `SpeculativeEngine` route-handler integration | call verifier.path_select before delegating to decoder.generate; emit `path_selection_total` metric; route INV violations to OpenAI error envelope per §2.9 | 100% |
+ | 7-5 | Determinism gate test (§2.7 + INV-3) | bit-identical comparison between continuation path and always-reset path on a 30-turn synthetic conversation; covers all path-selection branches; mandatory before merge | mandatory before merge |
+ | 7-6 | bench_long_session_v2 + 4h Mac re-run | bench observes per-turn cost stable at O(new_message); §2.3.a still holds; INV violations counter is 0 over 4h | 4h Mac evidence |