Commit 7875416
Fix PR-N1 INV-1 / negative-token / HistoryTruncated tests for real numerics
The Mac smoke run revealed three more tests in test_coordinator_real.py
and test_generator_real.py that were inherently FakeVerifier-only
constructions and don't translate cleanly to real numerics, plus one
test whose expected value was computed at the wrong time:
1. test_negative_token_id_raises_value_error (DROPPED)
The original asserted that the coordinator surfaces
ValueError("non-negative") from SessionStore.append_tokens. With the
real Qwen3 verifier, prefill calls torch.embedding(token_ids, ...),
which raises IndexError on a negative id BEFORE the coordinator
reaches the store-level validation. The contract itself is still
tested in tests/inference_engine/session/test_store.py against
SessionStore directly, where no verifier sits in the call path.
2. test_inv1_violation_through_session_state_corruption (DROPPED)
The coordinator MIRRORS verifier.cached_token_sequence onto
session.cached_token_sequence right before the store's INV-1
check, so a direct corruption of session.cached_token_sequence
is overwritten before INV-1 can observe it. The previous
FakeVerifier-side _LyingVerifier injected the lie at k_seq_length(),
which IS observable; the real verifier can't be made to lie without
a wrapper or subclass, which would defeat the purpose of an
integration test. INV-1 enforcement is exercised at the SessionStore
layer in tests/inference_engine/session/test_store.py against a
parametric CacheInspector stub (acceptable under the no-doubles
principle's parametric-stub carve-out for protocol-contract tests).
3. test_inv1_violation_propagates_through_generate (DROPPED)
Same root cause as #2: the generator also mirrors verifier
state at every step. Session corruption is overwritten before
INV-1 sees it.
4. test_truncated_event_when_cache_is_in_truncated_mode (FIXED)
The test asserted truncated[0].dropped_token_count ==
len(sess.history_token_ids) - len(sess.cached_token_sequence),
reading the lengths AFTER generate returned. But generate appends
the newly emitted token to history_token_ids before the test reads
it, so the difference computed afterwards is off by 1 (history
grows by 1 over the course of the call). The fix snapshots the
lengths BEFORE calling generate, so the expected dropped count
reflects the state at the moment HistoryTruncated was actually
emitted (at start-of-generate, before the first token is committed).
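A minimal sketch of the two mechanisms behind items 2, 3 and 4, assuming a toy model: ToySession, ToyVerifier, toy_generate, check_inv1, Inv1Violation and the window-based truncation are all hypothetical stand-ins, not the real coordinator, generator, or SessionStore APIs. Part (a) shows why corrupting session.cached_token_sequence never reaches the INV-1-style check when the session mirrors verifier state first; part (b) shows why the expected dropped count has to be snapshotted before generate runs.

```python
from dataclasses import dataclass, field


@dataclass
class HistoryTruncated:
    """Hypothetical event: number of history tokens not held in the cache."""
    dropped_token_count: int


@dataclass
class ToySession:
    history_token_ids: list = field(default_factory=list)
    cached_token_sequence: list = field(default_factory=list)


@dataclass
class ToyVerifier:
    window: int  # hypothetical cache capacity; older tokens fall out (truncated mode)
    cached_token_sequence: list = field(default_factory=list)

    def sync(self, history):
        # Truncated mode: the cache keeps only the last `window` tokens.
        self.cached_token_sequence = list(history[-self.window:])


class Inv1Violation(Exception):
    """Hypothetical stand-in for the store's INV-1 failure."""


def check_inv1(session, verifier):
    # INV-1-style check: the session's view of the cache must match the verifier's.
    if session.cached_token_sequence != verifier.cached_token_sequence:
        raise Inv1Violation("session cache diverges from verifier cache")


def toy_generate(session, verifier, next_token):
    events = []
    verifier.sync(session.history_token_ids)
    # Mirror verifier state onto the session right before the check; this
    # overwrites anything a test wrote into session.cached_token_sequence.
    session.cached_token_sequence = list(verifier.cached_token_sequence)
    check_inv1(session, verifier)
    dropped = len(session.history_token_ids) - len(session.cached_token_sequence)
    if dropped > 0:
        # Emitted at start-of-generate, before any token is committed.
        events.append(HistoryTruncated(dropped))
    # Commit the new token: history grows by 1 during the call,
    # while the truncated cache stays at `window` tokens.
    session.history_token_ids.append(next_token)
    verifier.sync(session.history_token_ids)
    session.cached_token_sequence = list(verifier.cached_token_sequence)
    check_inv1(session, verifier)
    return events


# (a) Injected corruption is overwritten before the check sees it.
sess = ToySession(history_token_ids=list(range(10)), cached_token_sequence=[999])
toy_generate(sess, ToyVerifier(window=4), next_token=10)  # no Inv1Violation raised

# (b) Snapshot lengths BEFORE generate (the fixed test pattern).
sess = ToySession(history_token_ids=list(range(10)),
                  cached_token_sequence=list(range(6, 10)))
expected = len(sess.history_token_ids) - len(sess.cached_token_sequence)  # 10 - 4
truncated = toy_generate(sess, ToyVerifier(window=4), next_token=10)
after = len(sess.history_token_ids) - len(sess.cached_token_sequence)     # 11 - 4
print(truncated[0].dropped_token_count, expected, after)  # 6 6 7
```

In this toy, reading the lengths after the call yields 7 instead of 6, which is the off-by-one the fixed test avoids.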
Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
1 parent e706090
3 files changed
Lines changed: 421 additions & 50 deletions
File tree
- results/platform-tests
- tests/integration