
Commit 4e58d3d

PR-N4: remove SDK conftest stub + finalize no-doubles cleanup
Final installment of the no-test-doubles cleanup. Closes the sequence PR-N1 → N2 → N3 → N4. After PR-N4 lands, NO test doubles implementing the verifier / engine / tokenizer protocols remain in the Linux test tree.

What was deleted
----------------

tests/sdk/python/conftest.py (-203 lines)
  Contained _start_runtime / _stop_runtime helpers that spun up an in-process gRPC server with a FakeVerifier (later replaced by _MinimalVerifierStub in PR-N1's preview cleanup) on a background thread. The runtime_address and runtime_address_no_inspector fixtures are gone with it.

tests/sdk/python/test_client.py (-157 lines, 13 tests)
  Exercised the Client + Session lifecycle against the FakeVerifier-backed runtime.

tests/sdk/python/test_session.py (-502 lines, 33 tests)
  Exercised Session.append, .generate, .info, and .close end-to-end against the FakeVerifier-backed runtime.

What was added
--------------

tests/integration/test_sdk_real.py (+137 lines, 11 tests)
  SDK Client + Session integration tests against a real Qwen3-0.6B-backed gRPC runtime:
  - Client: create_session round-trip, eos_token_ids round-trip, idempotent close, address property
  - Session: append + generate yield tokens + metadata, info reflects history, close returns final length, close is locally idempotent
  - End-to-end error mapping: SessionNotFoundError on unknown id, InvalidArgumentError on max_tokens=0, SessionClosedError on append-after-close

tests/integration/conftest.py (+170 lines)
  - pytest_collection_modifyitems hook (auto-marks everything under tests/integration/ with @pytest.mark.integration)
  - real_speculative_engine fixture (session-scoped, Qwen3-0.6B)
  - real_grpc_runtime_address fixture (session-scoped in-process gRPC server backed by a real Qwen3-0.6B verifier on a background thread; yields the host:port the SDK can connect to)

tests/integration/__init__.py (+0 lines, placeholder)

scripts/review_pr_n4_on_mac.sh (+86 lines)
  Mac M4 reviewer aid that runs the full accumulated integration suite (PR-E1 INV-3 + PR-N1 coordinator/generator + PR-N2 scheduler + PR-N3 http_shim/engine/tokenizer/streaming + PR-N4 SDK).

What stays on Linux
-------------------

tests/sdk/python/test_errors.py (unchanged, 9 tests)
  Pure _wrap_grpc_error mapping with synthesized grpc.RpcError objects: verifier-independent, transport-only error-class translation. Stays on Linux.

CI workflow change
------------------

.github/workflows/ci.yaml: the gate now runs coverage run -m pytest and enforces 100% with coverage report --include= (replacing the old pytest --cov= targets, which covered the whole kakeya package). kakeya.client and kakeya.session are no longer in the covered set. The Linux gate now covers ONLY:

  inference_engine/server/{auth, config, errors, grpc_app, metrics, schemas, proto_gen}
  inference_engine/memory/*
  inference_engine/scheduler/{config, session, pooled_verifier}
  inference_engine/pipeline/*
  inference_engine/session/store
  sdks/python/kakeya/{__init__, errors}
  training/repr_align/*

That is the verifier-independent boundary, frozen post PR-N4.

Final state of the no-doubles cleanup
-------------------------------------

PR-N1 (#53): retired the FakeVerifier hierarchy (tests/inference_engine/session/test_coordinator.py, test_generator.py, and the FakeVerifier-using sections of test_grpc_app.py).
PR-N2 (#54): retired DeterministicEngine + DeterministicTokenizer (tests/inference_engine/scheduler/conftest.py + test_scheduler.py).
PR-N3 (#55): retired the HTTP shim cluster (server/conftest.py + 6 test files + their subtypes).
PR-N4 (this): retired the SDK conftest stub.

The integration suite at tests/integration/ now contains:

  test_inv3_session_determinism_gate.py (PR-E1)
  test_coordinator_real.py (PR-N1)
  test_generator_real.py (PR-N1)
  test_scheduler_real.py (PR-N2)
  test_http_shim_real.py (PR-N3)
  test_engine_real.py (PR-N3)
  test_tokenizer_real.py (PR-N3)
  test_streaming_real.py (PR-N3)
  test_sdk_real.py (PR-N4)

Linux verification
------------------

PYTHONPATH=.:sdks/python coverage run -m pytest <Linux gate paths>:
  - 649 passed (was 695 on main; the net -46 is exactly the 13 + 33 removed SDK runtime tests; the 9 SDK error-mapping tests are kept).
  - 100% coverage on 999 stmts (was 1660 on main; the net -661 stmts are all in verifier-dependent modules that are now integration-only).

Mac M4 evidence (REQUIRED for merge)
------------------------------------

Per ADR 0008 §9, this PR's runtime correctness lives in the integration suite. The reviewer runs:

  bash scripts/review_pr_n4_on_mac.sh
  git add results/platform-tests/pr-n4-mac-*
  git commit -m 'Mac M4 review evidence for PR-N4'
  git push

Stack
-----

PR-N4 is branched off main and is independent of PR-N1 (#53) / PR-N2 (#54) / PR-N3 (#55) at the file level, with one exception: tests/integration/conftest.py. Each of N1/N2/N3/N4 adds its own fixtures there, so the fixture definitions are disjoint, but because the file itself is shared, the four contributions need to be reconciled post-merge. The recommended merge order:

  1. PR-N1 (verifier doubles): adds the conftest with the marker hook
  2. PR-N2 (engine/tokenizer doubles): adds real_speculative_engine
  3. PR-N3 (HTTP shim doubles): uses real_speculative_engine
  4. PR-N4 (this, SDK doubles): adds real_grpc_runtime_address

If a different order lands first, the integration conftest needs a small merge to combine fixtures.

Co-authored-by: FluffyAIcode <FluffyAIcode@users.noreply.github.com>
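Because N1 through N4 each carry their own copy of tests/integration/conftest.py, the reconciliation described under Stack starts with finding fixture names that more than one copy defines. A minimal, stdlib-only sketch using the `ast` module follows; the helper names (`fixture_names`, `duplicate_fixtures`) are hypothetical and not part of this commit, and the sketch only recognizes module-level functions decorated with `pytest.fixture` or a bare imported `fixture`.

```python
"""Hypothetical pre-merge helper (not part of this commit): report fixture
names defined by more than one conftest variant."""
from __future__ import annotations

import ast
from collections import defaultdict


def _is_fixture_decorator(node: ast.expr) -> bool:
    """True for @pytest.fixture, @pytest.fixture(...), @fixture, @fixture(...)."""
    if isinstance(node, ast.Call):
        node = node.func  # unwrap the decorator-factory call
    if isinstance(node, ast.Attribute):
        return node.attr == "fixture"
    if isinstance(node, ast.Name):
        return node.id == "fixture"
    return False


def fixture_names(source: str) -> set[str]:
    """Names of module-level functions decorated as pytest fixtures."""
    tree = ast.parse(source)
    return {
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and any(_is_fixture_decorator(d) for d in node.decorator_list)
    }


def duplicate_fixtures(variants: dict[str, str]) -> dict[str, list[str]]:
    """Map each fixture name defined in 2+ variants to the sorted variant labels."""
    owners: dict[str, list[str]] = defaultdict(list)
    for label, source in variants.items():
        for name in fixture_names(source):
            owners[name].append(label)
    return {name: sorted(labels) for name, labels in owners.items() if len(labels) > 1}


if __name__ == "__main__":
    n2 = "import pytest\n@pytest.fixture(scope='session')\ndef real_speculative_engine(): ...\n"
    n4 = n2 + "@pytest.fixture(scope='session')\ndef real_grpc_runtime_address(): ...\n"
    print(duplicate_fixtures({"PR-N2": n2, "PR-N4": n4}))
    # {'real_speculative_engine': ['PR-N2', 'PR-N4']}
```

A reported duplicate is not necessarily a conflict (each PR carries the union), but it marks the definitions that should be diffed before keeping a single copy.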
1 parent bec3d7b commit 4e58d3d

8 files changed

Lines changed: 426 additions & 738 deletions


.github/workflows/ci.yaml

Lines changed: 21 additions & 11 deletions
@@ -72,7 +72,22 @@ jobs:
           # PYTHONPATH route avoids a setuptools build step in CI.
           PYTHONPATH: .:sdks/python
         run: |
-          pytest \
+          # PR-N1/N2/N3/N4 (ADR 0008) cleanup: this gate covers ONLY
+          # verifier-independent code. The Linux runner cannot load
+          # real Qwen3 weights; the cleanup PRs retired the
+          # FakeVerifier / DeterministicEngine / DeterministicTokenizer
+          # / _MinimalVerifierStub test doubles. Verifier-dependent
+          # modules (``inference_engine.session.coordinator``,
+          # ``inference_engine.session.generator``,
+          # ``inference_engine.scheduler.scheduler``,
+          # ``inference_engine.server.{app, engine, tokenizer, streaming}``,
+          # ``kakeya.{client, session}``) move to the
+          # tests/integration/ suite, gated on Mac M4 / CUDA hosts.
+          #
+          # Coverage is invoked via ``coverage run -m pytest`` rather
+          # than ``pytest --cov=`` to avoid a torch+pytest-cov race
+          # at conftest-import time on the hosted Linux runner.
+          coverage run -m pytest \
             tests/inference_engine/server/ \
             tests/inference_engine/memory/ \
             tests/inference_engine/scheduler/ \
@@ -81,18 +96,13 @@ jobs:
             tests/sdk/python/ \
             tests/training/repr_align/ \
             tests/backends/mlx/test_env.py \
-            --cov=inference_engine.server \
-            --cov=inference_engine.memory \
-            --cov=inference_engine.scheduler \
-            --cov=inference_engine.pipeline \
-            --cov=inference_engine.session \
-            --cov=kakeya \
-            --cov=training.repr_align \
-            --cov-report=term \
-            --cov-report=xml:coverage.xml \
-            --cov-fail-under=100 \
             --junitxml=junit.xml \
             -v
+          coverage report \
+            --include='inference_engine/server/auth.py,inference_engine/server/config.py,inference_engine/server/errors.py,inference_engine/server/grpc_app.py,inference_engine/server/metrics.py,inference_engine/server/schemas.py,inference_engine/server/proto_gen/**/*.py,inference_engine/memory/*,inference_engine/scheduler/config.py,inference_engine/scheduler/session.py,inference_engine/scheduler/pooled_verifier.py,inference_engine/pipeline/*,inference_engine/session/store.py,sdks/python/kakeya/__init__.py,sdks/python/kakeya/errors.py,training/repr_align/*' \
+            --fail-under=100
+          coverage xml -o coverage.xml \
+            --include='inference_engine/server/auth.py,inference_engine/server/config.py,inference_engine/server/errors.py,inference_engine/server/grpc_app.py,inference_engine/server/metrics.py,inference_engine/server/schemas.py,inference_engine/server/proto_gen/**/*.py,inference_engine/memory/*,inference_engine/scheduler/config.py,inference_engine/scheduler/session.py,inference_engine/scheduler/pooled_verifier.py,inference_engine/pipeline/*,inference_engine/session/store.py,sdks/python/kakeya/__init__.py,sdks/python/kakeya/errors.py,training/repr_align/*'


       - name: Upload coverage artifact
         if: always()
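The `--include` value is spelled out twice, once for `coverage report` and once for `coverage xml`, so the two copies can drift. A minimal sketch of keeping one list and deriving both command lines from it, assuming a small Python helper invoked from the workflow; the helper, `LINUX_GATE_INCLUDE`, and `coverage_commands` are hypothetical names, not part of this commit. The `coverage report --include=... --fail-under=...` and `coverage xml -o FILE --include=...` flags are standard coverage.py CLI options.

```python
"""Hypothetical CI helper (not part of this commit): derive both coverage
commands from a single include list so they cannot drift apart."""
from __future__ import annotations

import subprocess
import sys

# Copied from the workflow's --include value: the verifier-independent boundary.
LINUX_GATE_INCLUDE = [
    "inference_engine/server/auth.py",
    "inference_engine/server/config.py",
    "inference_engine/server/errors.py",
    "inference_engine/server/grpc_app.py",
    "inference_engine/server/metrics.py",
    "inference_engine/server/schemas.py",
    "inference_engine/server/proto_gen/**/*.py",
    "inference_engine/memory/*",
    "inference_engine/scheduler/config.py",
    "inference_engine/scheduler/session.py",
    "inference_engine/scheduler/pooled_verifier.py",
    "inference_engine/pipeline/*",
    "inference_engine/session/store.py",
    "sdks/python/kakeya/__init__.py",
    "sdks/python/kakeya/errors.py",
    "training/repr_align/*",
]


def coverage_commands(include: list[str]) -> list[list[str]]:
    """argv lists for the 100% report gate and the XML export, in run order."""
    include_arg = "--include=" + ",".join(include)
    return [
        ["coverage", "report", include_arg, "--fail-under=100"],
        ["coverage", "xml", "-o", "coverage.xml", include_arg],
    ]


def main() -> int:
    # Stop at the first failing step, like a `set -e` shell block would.
    for argv in coverage_commands(LINUX_GATE_INCLUDE):
        rc = subprocess.run(argv, check=False).returncode
        if rc != 0:
            return rc
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Because the arguments are passed as an argv list with no shell, the glob patterns need no single-quoting, unlike the inline `--include='...'` in the YAML.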

scripts/review_pr_n4_on_mac.sh

Lines changed: 86 additions & 0 deletions
@@ -0,0 +1,86 @@
#!/usr/bin/env bash
# Mac M4 review aid for PR-N4 (no-test-doubles cleanup, FINAL).
#
# PR-N4 retires the last verifier-protocol stand-in: the
# ``_MinimalVerifierStub`` (formerly ``FakeVerifier`` import) in
# ``tests/sdk/python/conftest.py``. The SDK transport tests
# (Client + Session) move to ``tests/integration/test_sdk_real.py``
# where they run against a real Qwen3-0.6B-backed gRPC runtime.
#
# After PR-N4: NO test doubles remain in the Linux test tree
# implementing the verifier / engine / tokenizer protocols. The
# Linux CI gate covers ONLY truly verifier-independent code; the
# integration suite is the binding gate for runtime correctness.
#
# Produces 1 artifact (plus the raw pytest .junit.xml it summarizes):
#
#   results/platform-tests/pr-n4-mac-integration-tests-<unix>.json
#     summary of `pytest -m integration tests/integration/`, which runs
#     the full accumulated integration suite (PR-E1 INV-3 + PR-N1
#     coordinator/generator + PR-N2 scheduler + PR-N3 http_shim/engine/
#     tokenizer/streaming + PR-N4 SDK).
#
# Usage (from repo root, on Mac M4):
#
#   bash scripts/review_pr_n4_on_mac.sh
#
# Then commit:
#
#   git add results/platform-tests/pr-n4-mac-*
#   git commit -m "Mac M4 review evidence for PR-N4"
#   git push

set -euo pipefail

ROOT="$(cd "$(dirname "$0")/.." && pwd)"
cd "$ROOT"

stamp="$(date +%s)"
out_dir="results/platform-tests"
mkdir -p "$out_dir"

junit="$out_dir/pr-n4-mac-integration-tests-${stamp}.junit.xml"
report="$out_dir/pr-n4-mac-integration-tests-${stamp}.json"

echo "==> integration suite (full accumulated PR-N1..N4 + PR-E1 GA gate)"
PYTHONPATH=.:sdks/python python3 -m pytest \
  -m integration \
  tests/integration/ \
  --junitxml="$junit" \
  -v

PYTHONPATH=.:sdks/python python3 - "$junit" "$report" <<'PY'
import json
import platform
import sys
import xml.etree.ElementTree as ET
junit_path, out_path = sys.argv[1:3]
jr = ET.parse(junit_path).getroot()
testsuites = list(jr.iter("testsuite"))
total_tests = sum(int(ts.get("tests", "0")) for ts in testsuites)
total_failures = sum(int(ts.get("failures", "0")) for ts in testsuites)
total_errors = sum(int(ts.get("errors", "0")) for ts in testsuites)
total_skipped = sum(int(ts.get("skipped", "0")) for ts in testsuites)
report = {
    "schema_version": 1,
    "kind": "pr_n4_mac_integration_tests",
    "host": {
        "platform": platform.platform(),
        "machine": platform.machine(),
        "python": platform.python_version(),
    },
    "junit": {
        "tests": total_tests, "failures": total_failures,
        "errors": total_errors, "skipped": total_skipped,
    },
}
with open(out_path, "w", encoding="utf-8") as fh:
    json.dump(report, fh, indent=2)
print(f" -> {out_path}")
PY

echo
echo "==> Done. Commit:"
echo " git add $out_dir/pr-n4-mac-*"
echo " git commit -m 'Mac M4 review evidence for PR-N4'"
echo " git push"
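The script's JSON report records host details and summed JUnit totals. A reviewer-side check on a committed report might look like the sketch below. The pass rule in `is_green` (matching `schema_version`/`kind`, at least one test, zero failures and zero errors) is an assumption about what counts as acceptable evidence, not a policy stated in ADR 0008, and all function names here are hypothetical.

```python
"""Hypothetical reviewer-side check (not part of this commit) for the
pr-n4-mac-integration-tests-<unix>.json evidence files."""
from __future__ import annotations

import json
from pathlib import Path
from typing import Any

EXPECTED_KIND = "pr_n4_mac_integration_tests"  # the "kind" the script writes


def load_report(path: str | Path) -> dict[str, Any]:
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def is_green(report: dict[str, Any]) -> bool:
    """Assumed rule: right schema/kind, >= 1 test, no failures or errors."""
    if report.get("schema_version") != 1 or report.get("kind") != EXPECTED_KIND:
        return False
    junit = report.get("junit", {})
    return (
        junit.get("tests", 0) > 0
        and junit.get("failures", 0) == 0
        and junit.get("errors", 0) == 0
    )


def newest_report(out_dir: str | Path = "results/platform-tests") -> Path | None:
    """Most recent JSON report, ordered by the <unix> stamp in the file name."""
    candidates = Path(out_dir).glob("pr-n4-mac-integration-tests-*.json")
    return max(candidates, key=lambda p: int(p.stem.rsplit("-", 1)[1]), default=None)
```

The glob ends in `.json`, so the sibling `.junit.xml` files that share the same stamp are ignored.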

tests/integration/__init__.py

Whitespace-only changes.

tests/integration/conftest.py

Lines changed: 170 additions & 0 deletions
@@ -0,0 +1,170 @@
"""Shared fixtures and marker plumbing for the integration suite.

Tests under ``tests/integration/`` exercise the v0.3 runtime against
**real** model weights (typically Qwen3-0.6B from the HF cache).
They are NOT part of the Linux unit-test gate (model loading is
HF-cache- and hardware-bound). Every test in this directory gets the
``@pytest.mark.integration`` marker auto-applied below, so the suite
can be selected explicitly with
``pytest -m integration tests/integration/``.

This conftest is created independently by PR-E1, PR-N1, PR-N2, PR-N3,
and PR-N4 (they all branched off main while none had merged yet);
the file content is the union and de-duplicates cleanly because each
PR appends its own real-engine / real-runtime fixtures.

Per ADR 0008 §9: this suite is the binding GA gate. Mac M4 reviewer
scripts (``scripts/review_pr_n*_on_mac.sh``) drive it manually
until PR-E2 ships the self-hosted runner workflow.
"""

from __future__ import annotations

import pytest


def pytest_collection_modifyitems(config, items):  # noqa: ARG001
    """Auto-mark every test under ``tests/integration/`` with
    ``@pytest.mark.integration``."""
    for item in items:
        if "tests/integration/" in str(item.fspath):
            item.add_marker(pytest.mark.integration)


# ---------------------------------------------------------------------------
# Real engine fixture, used by PR-N3's HTTP shim integration tests
# and PR-N4's SDK integration tests.
# ---------------------------------------------------------------------------


@pytest.fixture(scope="session")
def real_speculative_engine():
    """Real :class:`SpeculativeEngine` over Qwen3-0.6B."""
    import torch

    from inference_engine.proposer import SparseLogitsProposer
    from inference_engine.server.engine import SpeculativeEngine
    from kv_cache_proposer.proposer import ProposerConfig
    from kv_cache_proposer.speculative import SpeculativeDecoder
    from kv_cache_proposer.verifier import SinkWindowVerifier, VerifierConfig

    proposer_cfg = ProposerConfig(dtype=torch.bfloat16, device="cpu")
    verifier_cfg = VerifierConfig(
        model_id="Qwen/Qwen3-0.6B",
        dtype=torch.bfloat16, device="cpu",
        sink_size=4, window_size=64,
    )
    proposer = SparseLogitsProposer(proposer_cfg)
    verifier = SinkWindowVerifier(verifier_cfg)
    decoder = SpeculativeDecoder(
        proposer=proposer, verifier=verifier,
        block_size=8, num_diffusion_steps=2,
    )
    return SpeculativeEngine(
        decoder=decoder,
        tokenizer=verifier.tokenizer,
        model_id_label="kakeya-integration",
    )


# ---------------------------------------------------------------------------
# Real gRPC runtime fixture, used by PR-N4's SDK integration tests.
# An in-process gRPC server backed by a real verifier on a background
# thread, yielding the host:port string the SDK can connect to.
# ---------------------------------------------------------------------------


@pytest.fixture(scope="session")
def real_grpc_runtime_address():
    """Run an in-process gRPC ``RuntimeService`` backed by a real
    Qwen3-0.6B :class:`SinkWindowVerifier` on a background thread.

    Yields the ``host:port`` address string the SDK can connect to.
    Session-scoped: model load (~3-5 s on CPU) is paid once. Each
    integration SDK test creates its own session via the SDK; the
    underlying verifier is shared and reset on each ``prefill`` call.
    """
    import asyncio
    import threading
    import time

    import grpc
    import torch

    from inference_engine.server.grpc_app import RuntimeServiceServicer
    from inference_engine.server.proto_gen.kakeya.v1 import (
        runtime_pb2_grpc,
    )
    from inference_engine.session import (
        AppendTokensCoordinator,
        GenerationCoordinator,
        SessionStore,
    )
    from kv_cache_proposer.verifier import SinkWindowVerifier, VerifierConfig

    verifier_cfg = VerifierConfig(
        model_id="Qwen/Qwen3-0.6B",
        dtype=torch.bfloat16, device="cpu",
        sink_size=4, window_size=64,
    )
    verifier = SinkWindowVerifier(verifier_cfg)
    store = SessionStore(capacity=4, cache_inspector=verifier)
    append_coord = AppendTokensCoordinator(store, verifier)
    gen_coord = GenerationCoordinator(store, verifier)

    loop = asyncio.new_event_loop()
    holder: dict = {
        "server": None,
        "port": None,
        "started": threading.Event(),
    }

    async def _serve():
        # Build the server INSIDE the worker thread's loop so any
        # internal asyncio.Future is bound to this loop, not the
        # main-thread default loop (the "Future attached to a
        # different loop" failure PR-B4 hit).
        server = grpc.aio.server()
        runtime_pb2_grpc.add_RuntimeServiceServicer_to_server(
            RuntimeServiceServicer(
                store,
                append_coordinator=append_coord,
                generation_coordinator=gen_coord,
            ),
            server,
        )
        holder["server"] = server
        holder["port"] = server.add_insecure_port("127.0.0.1:0")
        await server.start()
        holder["started"].set()
        await server.wait_for_termination()

    def _run():
        asyncio.set_event_loop(loop)
        loop.run_until_complete(_serve())

    thread = threading.Thread(target=_run, daemon=True)
    thread.start()
    if not holder["started"].wait(timeout=15.0):
        raise RuntimeError(
            "background gRPC runtime failed to start within 15s",
        )

    address = f"127.0.0.1:{holder['port']}"
    try:
        yield address
    finally:
        async def _shutdown():
            await holder["server"].stop(grace=0.1)

        try:
            fut = asyncio.run_coroutine_threadsafe(_shutdown(), loop)
            fut.result(timeout=2.0)
        except Exception:  # pragma: no cover - best-effort cleanup
            pass
        thread.join(timeout=2.0)
        time.sleep(0.05)
        try:
            loop.close()
        except Exception:  # pragma: no cover - best-effort cleanup
            pass
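The `real_grpc_runtime_address` fixture follows a general pattern: create a private event loop, build and start the server inside a worker thread that runs that loop, signal readiness with a `threading.Event`, and shut down from the main thread with `asyncio.run_coroutine_threadsafe`. A stdlib-only sketch of the same lifecycle follows, with an `asyncio` TCP echo server standing in for `grpc.aio.server()` so it runs without model weights or grpcio. `BackgroundEchoServer` and its line-echo protocol are illustrative only.

```python
"""Stdlib-only sketch of the background-loop server lifecycle used by
real_grpc_runtime_address; an asyncio TCP echo server stands in for
grpc.aio.server(). BackgroundEchoServer is a hypothetical name."""
from __future__ import annotations

import asyncio
import threading


class BackgroundEchoServer:
    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()
        self._started = threading.Event()
        self._stop: asyncio.Event | None = None
        self._port: int | None = None
        self._thread = threading.Thread(target=self._run, daemon=True)

    async def _handle(self, reader: asyncio.StreamReader,
                      writer: asyncio.StreamWriter) -> None:
        data = await reader.readline()
        writer.write(data)  # echo one line back, then hang up
        await writer.drain()
        writer.close()
        await writer.wait_closed()

    async def _serve(self) -> None:
        # Created inside the worker loop, so every future is bound to it.
        self._stop = asyncio.Event()
        server = await asyncio.start_server(self._handle, "127.0.0.1", 0)
        self._port = server.sockets[0].getsockname()[1]  # OS-assigned port
        self._started.set()
        async with server:  # closes the listener on exit
            await self._stop.wait()  # analogue of wait_for_termination()

    def _run(self) -> None:
        asyncio.set_event_loop(self._loop)
        self._loop.run_until_complete(self._serve())

    def start(self, timeout: float = 5.0) -> str:
        self._thread.start()
        if not self._started.wait(timeout):
            raise RuntimeError(f"background server failed to start within {timeout}s")
        return f"127.0.0.1:{self._port}"

    def stop(self, timeout: float = 2.0) -> None:
        async def _shutdown() -> None:
            assert self._stop is not None
            self._stop.set()

        # Schedule shutdown on the worker loop, wait for it, then reap the thread.
        asyncio.run_coroutine_threadsafe(_shutdown(), self._loop).result(timeout)
        self._thread.join(timeout)
        self._loop.close()
```

Binding to port 0 and reading the OS-assigned port back, as the fixture does with `add_insecure_port("127.0.0.1:0")`, avoids collisions when several test runs share a host.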
