
Commit 0f2493b

PR-N2: remove DeterministicEngine + DeterministicTokenizer from scheduler tests
ADR 0008 / no-test-doubles cleanup, second installment.

PR-N2 retires the scheduler-side engine/tokenizer test doubles and
migrates their dispatch / admission-control / lifecycle tests to
tests/integration/ where they run against a real SpeculativeEngine
over Qwen3-0.6B. Per the user's principle: 'fake = mock, all banned'.

PR-N1 cleared the verifier protocol mirrors; PR-N2 clears the
scheduler-conftest engine + tokenizer mirrors. PR-N3 will tackle the
HTTP shim's separate copies and engine subtypes; PR-N4 will clean up
the SDK conftest stub + final CI consolidation.

What was deleted
----------------

tests/inference_engine/scheduler/conftest.py    -197 / +62 lines net.
  DeterministicEngine and DeterministicTokenizer classes (~120 lines)
  deleted. Their fixtures (tokenizer, short_engine, long_engine,
  slow_engine, reject_scheduler, queue_scheduler) deleted. The
  slab-pool fixtures (slab_config, small_pool, single_pool) stay;
  they're verifier-independent and consumed by the new Linux-side
  validation tests.

tests/inference_engine/scheduler/test_scheduler.py    -421 lines.
  20 tests against DeterministicEngine. Migrated selectively (see
  Added).

What was added
--------------

tests/integration/test_scheduler_real.py    +422 lines, 12 tests
  Scheduler integration tests against the real SpeculativeEngine:
  - construction validation (pool size match)
  - happy path: submit + iter_tokens → COMPLETED + slab released
  - admission control: REJECT (pool exhausted), QUEUE
    (admit-after-completion)
  - cancellation (mid-stream + idempotent-after-completion)
  - engine error propagation (parametric error injector wraps the
    real engine; same composition pattern PR-N1 used for gRPC
    error-mapping tests)
  - 3-way concurrency (all complete)
  - shutdown (cancels active + rejects pending)
  - active_count zeros after drain
  The migrated tests use looser assertions than the originals (real
  engine output varies); structural invariants are what matters for
  scheduler correctness.

tests/integration/conftest.py    +82 lines
  - Existing pytest_collection_modifyitems hook (auto-marks
    everything under tests/integration/ with
    @pytest.mark.integration).
  - New session-scoped real_speculative_engine fixture (Qwen3-0.6B +
    SparseLogitsProposer + SpeculativeDecoder + SpeculativeEngine
    wrapper). Mirrors the long-standing fixture under
    tests/system/conftest.py but uses 0.6B not 1.7B to match the
    rest of the integration suite.

tests/inference_engine/scheduler/test_scheduler_validation.py    +102 lines, 5 tests
  Pre-engine validation paths on Linux:
  - construction validation (pool dim mismatch)
  - submit() argument validation: empty prompt, zero max_new_tokens,
    empty EOS
  All run with engine=None; the validation rejects before the
  scheduler enqueues a worker that would consult the engine.

scripts/review_pr_n2_on_mac.sh    +88 lines
  Mac M4 reviewer aid. Runs pytest -m integration tests/integration/
  and produces pr-n2-mac-integration-tests-<unix>.json under
  results/platform-tests/.

What stays in place
-------------------

tests/inference_engine/scheduler/test_pooled_verifier.py
  Still uses _FakeVerifier / _RaisingVerifier. PR-D2 retires
  PooledVerifier entirely; cleanup before then is throwaway.

tests/inference_engine/server/conftest.py
  Has its own copy of DeterministicEngine + DeterministicTokenizer
  used by the HTTP shim tests. PR-N3 scope.

tests/inference_engine/server/test_app_*.py + test_engine.py +
test_tokenizer.py + their server-specific subtypes (_RaisingEngine,
_ProxyEngine, _AlwaysHoldingEngine, _KVAwareSlowEngine,
_BrokenTokenizer, _EmptyTemplateTokenizer, _NoEosTokenizer)
  PR-N3 scope.

CI workflow change
------------------

.github/workflows/ci.yaml: dropped --cov=inference_engine.scheduler
in favor of explicit per-module coverage:
  - inference_engine.scheduler.config (Linux ✓)
  - inference_engine.scheduler.session (Linux ✓)
  - inference_engine.scheduler.pooled_verifier (Linux ✓; via
    test_pooled_verifier.py exempt)
  - inference_engine.scheduler.scheduler (integration only)

Also pre-emptively switched to the coverage-run pattern (was
`pytest --cov=` before; the GitHub-hosted runner's torch+pytest-cov
interaction surfaced as SIGSEGV during PR-N1).

Linux verification
------------------

PYTHONPATH=.:sdks/python coverage run -m pytest <Linux gate paths>:
680 passed (was 695 on main, -15 net = removed 20 scheduler
FakeEngine tests, added 5 verifier-independent validation tests).
100% coverage on 1456 stmts (was 1660 on main; -204 net stmts is
inference_engine.scheduler.scheduler now integration-only).

Mac M4 evidence (REQUIRED for merge)
------------------------------------

Per ADR 0008 §9: this PR's runtime correctness lives in the
integration suite. Reviewer runs:

  bash scripts/review_pr_n2_on_mac.sh
  git add results/platform-tests/pr-n2-mac-*
  git commit -m 'Mac M4 review evidence for PR-N2'
  git push

Acceptance: all integration tests pass against real Qwen3-0.6B,
including PR-N1's coordinator/generator suites and the INV-3
byte-exact GA gate (PR-E1). The Mac evidence is load-bearing because
Linux CI cannot exercise the scheduler+real-engine path.

Stack
-----

PR-N2 is branched off main, independent of PR-N1 (#53). The two
touch disjoint test files; can merge in either order. Once both
land, PR-N3 cleans up the HTTP shim's doubles + subtypes.

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
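The commit message mentions a "parametric error injector" that wraps the real engine for the error-propagation tests. The following is a minimal sketch of that composition pattern, not the code from `tests/integration/test_scheduler_real.py`. It assumes the wrapped engine exposes `generate(prompt_ids, max_new_tokens, eos_token_ids, on_token=None)`, the signature the removed `DeterministicEngine` had; the class name `ErrorInjectingEngine` and the `fail_after_tokens` parameter are hypothetical.

```python
from typing import Any, Callable, List, Optional


class ErrorInjectingEngine:
    """Hypothetical wrapper: delegates to a real engine and raises on cue."""

    def __init__(
        self, inner: Any, error: Exception, fail_after_tokens: int = 0
    ) -> None:
        if fail_after_tokens < 0:
            raise ValueError("fail_after_tokens must be >= 0")
        self._inner = inner
        self._error = error
        self._fail_after_tokens = fail_after_tokens

    def __getattr__(self, name: str) -> Any:
        # Only called for attributes not found on the wrapper itself, so
        # everything except generate() is forwarded to the wrapped engine.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self._inner, name)

    def generate(
        self,
        prompt_ids: List[int],
        max_new_tokens: int,
        eos_token_ids: List[int],
        on_token: Optional[Callable[[int], bool]] = None,
    ) -> Any:
        if self._fail_after_tokens == 0:
            # Fail before the real engine does any work.
            raise self._error
        seen = 0

        def counting_on_token(tok: int) -> bool:
            nonlocal seen
            seen += 1
            # Let the caller observe the token first, then inject the error.
            stop = bool(on_token(tok)) if on_token is not None else False
            if seen >= self._fail_after_tokens:
                raise self._error
            return stop

        return self._inner.generate(
            prompt_ids, max_new_tokens, eos_token_ids,
            on_token=counting_on_token,
        )
```

Because the wrapper only intercepts `generate`, the engine underneath can stay real. If the engine stops before emitting `fail_after_tokens` tokens, no error is raised, so a test built on this sketch would need a prompt that yields enough tokens.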
1 parent bec3d7b commit 0f2493b

7 files changed

Lines changed: 728 additions & 611 deletions

File tree

.github/workflows/ci.yaml

Lines changed: 16 additions & 11 deletions
@@ -72,7 +72,17 @@ jobs:
           # PYTHONPATH route avoids a setuptools build step in CI.
           PYTHONPATH: .:sdks/python
         run: |
-          pytest \
+          # PR-N2 (ADR 0008) cleanup: this gate covers ONLY
+          # verifier-independent code. The Linux runner cannot load
+          # real Qwen3 weights; PR-N2 retired the DeterministicEngine
+          # + DeterministicTokenizer test doubles that previously
+          # stood in for them. Engine / scheduler runtime tests
+          # moved to tests/integration/ (Mac M4 / CUDA gate).
+          #
+          # Coverage is invoked via ``coverage run -m pytest`` not
+          # ``pytest --cov=`` to avoid a torch+pytest-cov race at
+          # conftest-import time on the hosted Linux runner.
+          coverage run -m pytest \
             tests/inference_engine/server/ \
             tests/inference_engine/memory/ \
             tests/inference_engine/scheduler/ \
@@ -81,18 +91,13 @@ jobs:
             tests/sdk/python/ \
             tests/training/repr_align/ \
             tests/backends/mlx/test_env.py \
-            --cov=inference_engine.server \
-            --cov=inference_engine.memory \
-            --cov=inference_engine.scheduler \
-            --cov=inference_engine.pipeline \
-            --cov=inference_engine.session \
-            --cov=kakeya \
-            --cov=training.repr_align \
-            --cov-report=term \
-            --cov-report=xml:coverage.xml \
-            --cov-fail-under=100 \
             --junitxml=junit.xml \
             -v
+          coverage report \
+            --include='inference_engine/server/*,inference_engine/memory/*,inference_engine/scheduler/config.py,inference_engine/scheduler/session.py,inference_engine/scheduler/pooled_verifier.py,inference_engine/pipeline/*,inference_engine/session/store.py,sdks/python/kakeya/*,training/repr_align/*' \
+            --fail-under=100
+          coverage xml -o coverage.xml \
+            --include='inference_engine/server/*,inference_engine/memory/*,inference_engine/scheduler/config.py,inference_engine/scheduler/session.py,inference_engine/scheduler/pooled_verifier.py,inference_engine/pipeline/*,inference_engine/session/store.py,sdks/python/kakeya/*,training/repr_align/*'

       - name: Upload coverage artifact
         if: always()
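The workflow passes the same long `--include` value to both `coverage report` and `coverage xml`. One way to keep the two invocations from drifting apart is to build both command lines from a single list. The sketch below is illustrative only and not part of the commit; `INCLUDE_PATTERNS` and `coverage_commands` are hypothetical names, while the `--include`, `--fail-under`, and `-o` flags are the ones already used in the workflow.

```python
import shlex
from typing import List, Sequence

# Patterns copied from the workflow's --include value, one per entry.
INCLUDE_PATTERNS: List[str] = [
    "inference_engine/server/*",
    "inference_engine/memory/*",
    "inference_engine/scheduler/config.py",
    "inference_engine/scheduler/session.py",
    "inference_engine/scheduler/pooled_verifier.py",
    "inference_engine/pipeline/*",
    "inference_engine/session/store.py",
    "sdks/python/kakeya/*",
    "training/repr_align/*",
]


def coverage_commands(
    patterns: Sequence[str],
    fail_under: int = 100,
    xml_path: str = "coverage.xml",
) -> List[List[str]]:
    """Return argv lists for the report and XML steps, sharing one include."""
    include = "--include=" + ",".join(patterns)
    return [
        ["coverage", "report", include, f"--fail-under={fail_under}"],
        ["coverage", "xml", "-o", xml_path, include],
    ]


if __name__ == "__main__":
    # Print shell-quoted commands; nothing is executed here.
    for argv in coverage_commands(INCLUDE_PATTERNS):
        print(shlex.join(argv))
```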

scripts/review_pr_n2_on_mac.sh

Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
+#!/usr/bin/env bash
+# Mac M4 review aid for PR-N2 (no-test-doubles cleanup, scope =
+# DeterministicEngine + DeterministicTokenizer in scheduler/conftest.py
+# + the test_scheduler.py tests that depended on them).
+#
+# PR-N2 retired the scheduler/-side ``DeterministicEngine`` and
+# ``DeterministicTokenizer`` test doubles. Their dispatch /
+# admission-control / lifecycle tests moved to
+# tests/integration/test_scheduler_real.py, where they run against
+# the real ``SpeculativeEngine`` over Qwen3-0.6B.
+#
+# The HTTP shim's separate copy of these doubles (in
+# ``tests/inference_engine/server/conftest.py``) and the engine-
+# subtype doubles (``_RaisingEngine``, ``_ProxyEngine``, etc.) are
+# PR-N3 scope and remain in place on this branch.
+#
+# Produces 1 artifact:
+#
+# results/platform-tests/pr-n2-mac-integration-tests-<unix>.json
+# pytest -m integration tests/integration/test_scheduler_real.py
+# against real Qwen3-0.6B + SpeculativeEngine. Acceptance: all
+# pass; structural invariants hold (state transitions, slab
+# acquire/release, admission control, concurrency).
+#
+# Usage (from repo root, on Mac M4):
+#
+# bash scripts/review_pr_n2_on_mac.sh
+#
+# Then commit:
+#
+# git add results/platform-tests/pr-n2-mac-*
+# git commit -m "Mac M4 review evidence for PR-N2"
+# git push
+
+set -euo pipefail
+
+ROOT="$(cd "$(dirname "$0")/.." && pwd)"
+cd "$ROOT"
+
+stamp="$(date +%s)"
+out_dir="results/platform-tests"
+mkdir -p "$out_dir"
+
+junit="$out_dir/pr-n2-mac-integration-tests-${stamp}.junit.xml"
+report="$out_dir/pr-n2-mac-integration-tests-${stamp}.json"
+
+echo "==> integration suite (PR-N2 migrated scheduler tests + INV-3 GA gate)"
+PYTHONPATH=.:sdks/python python3 -m pytest \
+    -m integration \
+    tests/integration/ \
+    --junitxml="$junit" \
+    -v
+
+PYTHONPATH=.:sdks/python python3 - "$junit" "$report" <<'PY'
+import json
+import platform
+import sys
+import xml.etree.ElementTree as ET
+junit_path, out_path = sys.argv[1:3]
+jr = ET.parse(junit_path).getroot()
+testsuites = list(jr.iter("testsuite"))
+total_tests = sum(int(ts.get("tests", "0")) for ts in testsuites)
+total_failures = sum(int(ts.get("failures", "0")) for ts in testsuites)
+total_errors = sum(int(ts.get("errors", "0")) for ts in testsuites)
+total_skipped = sum(int(ts.get("skipped", "0")) for ts in testsuites)
+report = {
+    "schema_version": 1,
+    "kind": "pr_n2_mac_integration_tests",
+    "host": {
+        "platform": platform.platform(),
+        "machine": platform.machine(),
+        "python": platform.python_version(),
+    },
+    "junit": {
+        "tests": total_tests, "failures": total_failures,
+        "errors": total_errors, "skipped": total_skipped,
+    },
+}
+with open(out_path, "w", encoding="utf-8") as fh:
+    json.dump(report, fh, indent=2)
+print(f" -> {out_path}")
+PY
+
+echo
+echo "==> Done. Commit:"
+echo " git add $out_dir/pr-n2-mac-*"
+echo " git commit -m 'Mac M4 review evidence for PR-N2'"
+echo " git push"
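The embedded Python in the script sums the `tests`, `failures`, `errors`, and `skipped` attributes across every `<testsuite>` element of the JUnit file. The same aggregation can be sketched as a standalone function that works on an XML string, which makes it easy to check without running the integration suite. The function names `summarize_junit` and `all_passed` are hypothetical, and the pass criterion in `all_passed` is an assumption rather than something the script computes.

```python
import xml.etree.ElementTree as ET
from typing import Dict

_COUNT_KEYS = ("tests", "failures", "errors", "skipped")


def summarize_junit(xml_text: str) -> Dict[str, int]:
    """Sum the per-suite counters of a JUnit XML document."""
    root = ET.fromstring(xml_text)
    # iter() also yields the root itself when the root is a <testsuite>.
    suites = list(root.iter("testsuite"))
    return {
        key: sum(int(suite.get(key, "0")) for suite in suites)
        for key in _COUNT_KEYS
    }


def all_passed(counts: Dict[str, int]) -> bool:
    """Assumed criterion: at least one test ran, none failed or errored."""
    return (
        counts["tests"] > 0
        and counts["failures"] == 0
        and counts["errors"] == 0
    )
```

Like the script, this counts every `<testsuite>` element it finds, so a report with nested suites would be double-counted.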
Lines changed: 18 additions & 179 deletions
@@ -1,135 +1,30 @@
-"""Shared fixtures for scheduler tests.
-
-Defines local copies of the deterministic test doubles
-(``DeterministicTokenizer``, ``DeterministicEngine``) so this branch
-can be tested independently of the E2 server branch. When both land,
-a follow-up commit consolidates them into a single shared location.
-
-These are real concrete classes — not ``unittest.mock`` objects.
+"""Shared fixtures for the verifier-independent scheduler tests.
+
+PR-N2 retired the ``DeterministicEngine`` + ``DeterministicTokenizer``
+test doubles that previously lived here. The scheduler's runtime
+behavior — admission control, lifecycle, cancellation, concurrency,
+shutdown — moved to ``tests/integration/test_scheduler_real.py``
+where it runs against a real ``SpeculativeEngine`` over Qwen3-0.6B.
+
+What stays on Linux: the slab-pool fixtures (verifier-independent;
+they describe storage shape, not model behavior). They're consumed by
+``test_scheduler_validation.py`` (argument validation paths that
+reject before the engine is touched).
+
+The previously co-located ``test_pooled_verifier.py`` is intentionally
+left in place with its own ``_FakeVerifier`` because PR-D2 retires
+the ``PooledVerifier`` module entirely (HTTP shim refactor onto
+``SessionStore``); cleaning up the test file before the module
+disappears would be throwaway work.
 """

 from __future__ import annotations

-from typing import Any, Callable, List, Optional
-
 import pytest
 import torch

 from inference_engine.memory.pool import SlabPool
 from inference_engine.memory.slab import SlabConfig
-from inference_engine.scheduler.config import AdmissionPolicy, SchedulerConfig
-from inference_engine.scheduler.scheduler import Scheduler
-
-
-# ---------------------------------------------------------------------------
-# Test doubles (local copies; identical behaviour to E2's versions)
-# ---------------------------------------------------------------------------
-
-
-class DeterministicTokenizer:
-    """Minimal HF-AutoTokenizer-shaped tokenizer; word-id mapping."""
-
-    def __init__(self) -> None:
-        self._token_to_id: dict[str, int] = {"<|im_end|>": 0, "<|unk|>": 1}
-        self._id_to_token: dict[int, str] = {0: "<|im_end|>", 1: "<|unk|>"}
-        self.eos_token_id: Optional[int] = 0
-        self.unk_token_id: Optional[int] = 1
-
-    def _intern(self, word: str) -> int:
-        if word not in self._token_to_id:
-            new_id = len(self._token_to_id)
-            self._token_to_id[word] = new_id
-            self._id_to_token[new_id] = word
-        return self._token_to_id[word]
-
-    def apply_chat_template( # pragma: no cover - unused by scheduler tests
-        self, *args, **kwargs
-    ) -> Any:
-        raise NotImplementedError
-
-    def decode( # pragma: no cover - unused by scheduler tests
-        self, token_ids, *, skip_special_tokens=False
-    ):
-        raise NotImplementedError
-
-    def convert_tokens_to_ids( # pragma: no cover - unused by scheduler tests
-        self, token: str
-    ) -> Optional[int]:
-        return self._token_to_id.get(token)
-
-
-class DeterministicEngine:
-    """Engine test double emitting a fixed token sequence."""
-
-    def __init__(
-        self,
-        fixed_tokens: List[int],
-        tokenizer: DeterministicTokenizer,
-        model_id_label: str = "kakeya-test",
-        per_token_delay_s: float = 0.0,
-    ) -> None:
-        if not fixed_tokens:
-            raise ValueError("fixed_tokens must be non-empty")
-        if per_token_delay_s < 0:
-            raise ValueError("per_token_delay_s must be >= 0")
-        self._fixed_tokens = list(fixed_tokens)
-        self._tokenizer = tokenizer
-        self._model_id_label = model_id_label
-        self._per_token_delay_s = per_token_delay_s
-
-    @property
-    def tokenizer(self) -> DeterministicTokenizer:
-        return self._tokenizer
-
-    @property
-    def model_id_label(self) -> str:
-        return self._model_id_label
-
-    def generate(
-        self,
-        prompt_ids: List[int],
-        max_new_tokens: int,
-        eos_token_ids: List[int],
-        on_token: Optional[Callable[[int], bool]] = None,
-    ):
-        if not prompt_ids:
-            raise ValueError("prompt_ids must be non-empty")
-        if max_new_tokens <= 0:
-            raise ValueError(
-                f"max_new_tokens must be positive, got {max_new_tokens}"
-            )
-        if not eos_token_ids:
-            raise ValueError("eos_token_ids must be non-empty")
-        eos_set = set(int(i) for i in eos_token_ids)
-        emitted: List[int] = []
-        for tok in self._fixed_tokens:
-            if len(emitted) >= max_new_tokens:
-                break
-            if self._per_token_delay_s > 0:
-                import time
-                time.sleep(self._per_token_delay_s)
-            emitted.append(int(tok))
-            if on_token is not None and on_token(int(tok)):
-                break
-            if int(tok) in eos_set:
-                break
-
-        # Lightweight result struct identical to what
-        # SpeculativeDecoder.GenerationResult exposes (only the fields
-        # the scheduler actually reads).
-        class _Result:
-            def __init__(self, output_token_ids):
-                self.output_token_ids = output_token_ids
-                self.acceptance_rate = 1.0
-                self.proposer_forward_calls = len(output_token_ids)
-                self.verifier_forward_calls = len(output_token_ids)
-
-        return _Result(emitted)
-
-
-# ---------------------------------------------------------------------------
-# Pytest fixtures
-# ---------------------------------------------------------------------------


 @pytest.fixture
@@ -148,59 +43,3 @@ def small_pool(slab_config: SlabConfig) -> SlabPool:
 @pytest.fixture
 def single_pool(slab_config: SlabConfig) -> SlabPool:
     return SlabPool(num_slabs=1, slab_config=slab_config)
-
-
-@pytest.fixture
-def tokenizer() -> DeterministicTokenizer:
-    return DeterministicTokenizer()
-
-
-@pytest.fixture
-def short_engine(tokenizer: DeterministicTokenizer) -> DeterministicEngine:
-    hello = tokenizer._intern("hello")
-    world = tokenizer._intern("world")
-    bang = tokenizer._intern("!")
-    return DeterministicEngine(
-        fixed_tokens=[hello, world, bang, tokenizer.eos_token_id],
-        tokenizer=tokenizer,
-    )
-
-
-@pytest.fixture
-def long_engine(tokenizer: DeterministicTokenizer) -> DeterministicEngine:
-    ids = [tokenizer._intern(f"tok{i}") for i in range(50)]
-    return DeterministicEngine(
-        fixed_tokens=ids, tokenizer=tokenizer, model_id_label="long",
-    )
-
-
-@pytest.fixture
-def slow_engine(tokenizer: DeterministicTokenizer) -> DeterministicEngine:
-    ids = [tokenizer._intern(f"slow{i}") for i in range(20)]
-    return DeterministicEngine(
-        fixed_tokens=ids, tokenizer=tokenizer,
-        model_id_label="slow", per_token_delay_s=0.01,
-    )
-
-
-@pytest.fixture
-def reject_scheduler(short_engine, small_pool):
-    return Scheduler(
-        engine=short_engine, pool=small_pool,
-        config=SchedulerConfig(
-            max_concurrent=small_pool.total_count,
-            admission_policy=AdmissionPolicy.REJECT,
-        ),
-    )
-
-
-@pytest.fixture
-def queue_scheduler(short_engine, small_pool):
-    return Scheduler(
-        engine=short_engine, pool=small_pool,
-        config=SchedulerConfig(
-            max_concurrent=small_pool.total_count,
-            admission_policy=AdmissionPolicy.QUEUE,
-            queue_max_wait_s=2.0,
-        ),
-    )
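The new module docstring says the Linux-side validation tests exercise paths that reject before the engine is touched, which is why they can pass `engine=None`. The sketch below illustrates that ordering only. It is not the real `Scheduler.submit`; the names `validate_submit_args` and `submit` are hypothetical, and the checks simply mirror the `ValueError` conditions visible in the removed `DeterministicEngine.generate`.

```python
from typing import Any, Optional, Sequence


def validate_submit_args(
    prompt_ids: Sequence[int],
    max_new_tokens: int,
    eos_token_ids: Sequence[int],
) -> None:
    """Reject malformed requests without consulting any engine."""
    if not prompt_ids:
        raise ValueError("prompt_ids must be non-empty")
    if max_new_tokens <= 0:
        raise ValueError(
            f"max_new_tokens must be positive, got {max_new_tokens}"
        )
    if not eos_token_ids:
        raise ValueError("eos_token_ids must be non-empty")


def submit(
    engine: Optional[Any],
    prompt_ids: Sequence[int],
    max_new_tokens: int,
    eos_token_ids: Sequence[int],
) -> Any:
    """Hypothetical entry point: validate first, touch the engine second."""
    validate_submit_args(prompt_ids, max_new_tokens, eos_token_ids)
    # Reached only for valid arguments; with engine=None this line would
    # fail, which is why validation tests never need a real engine.
    return engine.generate(prompt_ids, max_new_tokens, eos_token_ids)
```

With this ordering, a call such as `submit(None, [], 4, [0])` raises `ValueError` before `engine` is dereferenced.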
