Commit 7875416

and

committed

Fix PR-N1 INV-1 / negative-token / HistoryTruncated tests for real numerics

The Mac smoke run revealed three more tests in test_coordinator_real.py + test_generator_real.py that were inherently FakeVerifier-only constructions and don't translate cleanly to real numerics: 1. test_negative_token_id_raises_value_error (DROPPED) The original asserted the coordinator surfaces ValueError("non- negative") from SessionStore.append_tokens. With the real Qwen3 verifier, prefill calls torch.embedding(token_ids, ...) which raises IndexError on a negative id BEFORE the coordinator reaches the store-level validation. The contract itself is still tested in tests/inference_engine/session/ test_store.py against SessionStore directly (where the verifier path isn't on the critical path). 2. test_inv1_violation_through_session_state_corruption (DROPPED) The coordinator MIRRORS verifier.cached_token_sequence onto session.cached_token_sequence right before the store's INV-1 check, so a direct corruption of session.cached_token_sequence is overwritten before INV-1 fires. The previous FakeVerifier- side _LyingVerifier injected the lie at k_seq_length() which IS observable; the real verifier can't be made to lie without composition / subclass that defeats the integration purpose. INV-1 enforcement is exercised at the SessionStore layer in tests/inference_engine/session/test_store.py against a parametric CacheInspector stub (acceptable per the no-doubles principle's parametric-stub carve-out for protocol contract tests). 3. test_inv1_violation_propagates_through_generate (DROPPED) Same root cause as #2: the generator also mirrors verifier state at every step. Session corruption is overwritten before INV-1 sees it. 4. test_truncated_event_when_cache_is_in_truncated_mode (FIXED) Asserted truncated[0].dropped_token_count == len(sess.history_token_ids) - len(sess.cached_token_sequence) reading the lengths AFTER generate runs. But generate appends the newly-emitted token to history_token_ids before the test reads, so the difference computed AFTER generate is off by 1 (history grows by 1 over the course of the call). Snapshot the lengths BEFORE calling generate to compute the dropped- count baseline at the moment HistoryTruncated was actually emitted (which is at start-of-generate, before the first token is committed). Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>

1 parent e706090 commit 7875416Copy full SHA for 7875416

3 files changed

results/platform-tests
- smoke-all-prs-1780370637.junit.xml
tests/integration
- test_coordinator_real.py
- test_generator_real.py

Comments

(0)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Commit 7875416

File tree

0 commit comments