Skip to content

Commit 21d3f33

Browse files
committed
test: add adapter unit tests + adapters README (#94 review fixes)
Addresses two gate failures on #104 surfaced by code review: 1. "Tests required" gate — feat: prefix declared a behaviour change but tests/ had no test for the new adapter (the eval/-side test only runs with live Azure credentials). Adds tests/test_eval_azure_openai_adapter.py: 13 fully-offline cases covering _resolve_config (defaults, override, empty-string fallback, missing-env error listing), the constructor (env wiring, explicit API version, missing-env, missing-SDK), and the two SDK call paths (complete_json structured-output mode, complete user-message dispatch, null-content returns "" / "{}"). The SDK is mocked at sys.modules level so the test never hits the network and never requires the openai extra to be installed. 2. "src/ README audit" gate — every src/ package needs a README.md per CLAUDE.md. Adds src/eval/adapters/README.md documenting the layer's purpose, the current adapter, a 7-step "adding a new adapter" recipe, and why the layer lives at the top of the import order. Also applies the reviewer's non-blocking sentinel-string suggestion: the magic "azure-deployment" string passed as judge_model in eval/test_golden_patterns.py is now the named constant _AZURE_DEPLOYMENT_SENTINEL with a comment explaining why the runner threads it through but the Azure adapter discards it. Local gates: 205 unit tests pass (was 192, +13 new), mypy clean on 43 source files, ruff/format/import-linter all green. Refs #94
1 parent 32b1203 commit 21d3f33

3 files changed

Lines changed: 275 additions & 3 deletions

File tree

eval/test_golden_patterns.py

Lines changed: 8 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -50,6 +50,13 @@
5050

5151
patterns = load_golden_dataset(_PATTERNS_PATH)
5252

53+
# Sentinel passed to EvalRunner.judge_model. The runner threads this through
54+
# to LLMClient.complete_json(model=...), where the Azure adapter discards it
55+
# — Azure addresses by deployment name (set at adapter construction), not by
56+
# the model parameter. Named constant makes the intent obvious to a reader
57+
# of this fixture without needing to chase into the adapter.
58+
_AZURE_DEPLOYMENT_SENTINEL = "azure-deployment-from-env"
59+
5360

5461
@pytest.fixture(scope="module")
5562
def runner() -> EvalRunner:
@@ -62,9 +69,7 @@ def runner() -> EvalRunner:
6269
return EvalRunner(
6370
answer_fn=client.complete,
6471
judge_client=client,
65-
# Azure addresses by deployment, set at client construction. The
66-
# runner still passes this through for Protocol conformance.
67-
judge_model="azure-deployment",
72+
judge_model=_AZURE_DEPLOYMENT_SENTINEL,
6873
)
6974

7075

src/eval/adapters/README.md

Lines changed: 33 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,33 @@
1+
# `src/eval/adapters`
2+
3+
Concrete `LLMClient` adapters for the eval harness. The judge in [`src/eval/judge.py`](../judge.py) calls an `LLMClient` Protocol — never a vendor SDK directly. Each adapter in this package implements that Protocol for one provider, so the eval core stays vendor-neutral and a downstream consumer can swap providers by changing one wiring line in their test fixture.
4+
5+
## Why this layer exists
6+
7+
Without the Protocol seam, swapping LLM providers would mean touching the eval core. With it, vendor lock-in is confined to one file per provider. The layer demonstrates that the harness's "provider-agnostic" claim is structural, not aspirational: the eval core has zero imports of any vendor SDK.
8+
9+
## Current adapters
10+
11+
| File | Provider | Optional extra | Env contract |
12+
|---|---|---|---|
13+
| [`azure_openai.py`](azure_openai.py) | Azure OpenAI | `uv sync --extra eval` | `AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_DEPLOYMENT`, optional `AZURE_OPENAI_API_VERSION` (default `2024-10-21`) |
14+
15+
## Adding a new adapter
16+
17+
1. Add the SDK to `[project.optional-dependencies]` in `pyproject.toml` — either to the existing `eval` extra or a new provider-scoped one.
18+
2. Add the SDK's top-level module to `[[tool.mypy.overrides]]` with `ignore_missing_imports = true`, matching the existing `openai.*` / `opentelemetry.*` entries. This keeps mypy clean on stock `uv sync --extra dev` checkouts.
19+
3. Implement `complete_json(*, model: str, prompt: str) -> str` per the `LLMClient` Protocol in [`src/eval/judge.py`](../judge.py). Optionally add a `complete(prompt: str) -> str` for use as an `EvalRunner.answer_fn`.
20+
4. **Lazy-import the SDK inside `__init__`** so the adapter module remains importable without the optional extra installed. The import error path should raise a clear, named exception (e.g. `AzureOpenAIConfigError`) telling the reader which `uv sync --extra ...` to run.
21+
5. Read configuration from environment variables at construction time. Raise the same named exception listing every missing var when env is incomplete — fail fast, fail clear.
22+
6. Add an offline unit test in [`tests/`](../../../tests/) that mocks the SDK at the `sys.modules` level (see `tests/test_eval_azure_openai_adapter.py` for the pattern). This keeps the unit suite credential-free; live-credential paths are exercised by [`eval/test_golden_patterns.py`](../../../eval/test_golden_patterns.py).
23+
7. Document the env contract in this README's table above and in [`docs/EVAL_HARNESS.md`](../../../docs/EVAL_HARNESS.md)'s "Worked patterns" section.
24+
25+
## Why adapters live under `src/eval/`
26+
27+
The import-linter contract in `pyproject.toml` puts `src.eval` at the top of the layered import order:
28+
29+
```
30+
api | eval -> agent -> tools -> data -> observability -> models
31+
```
32+
33+
Adapters can therefore depend on anything in `src/`; nothing in `src/` depends on them. That asymmetry is exactly what the layered architecture exists to encode — vendor-specific code stays at the boundary, never leaks down into the eval primitives or the model layer.
Lines changed: 234 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,234 @@
1+
"""Offline unit tests for the Azure OpenAI eval adapter.
2+
3+
These tests never hit the network. The `openai` SDK is replaced at the
4+
`sys.modules` level so the adapter's lazy import resolves to a `MagicMock`,
5+
which lets us assert on the constructor arguments and the chat-completions
6+
call shape without an API key.
7+
8+
The live-credential path is exercised by `eval/test_golden_patterns.py`,
9+
which is skipped on stock checkouts.
10+
"""
11+
12+
from __future__ import annotations
13+
14+
import sys
15+
from types import SimpleNamespace
16+
from unittest.mock import MagicMock
17+
18+
import pytest
19+
20+
from src.eval.adapters.azure_openai import (
21+
_DEFAULT_API_VERSION,
22+
AzureOpenAIClient,
23+
AzureOpenAIConfigError,
24+
_resolve_config,
25+
)
26+
27+
# ---------------------------------------------------------------------------
28+
# _resolve_config — pure function, no SDK involved
29+
# ---------------------------------------------------------------------------
30+
31+
32+
class TestResolveConfig:
33+
"""`_resolve_config` reads env, applies the default API version, and
34+
raises a single `AzureOpenAIConfigError` naming every missing var."""
35+
36+
def test_returns_env_values_with_default_api_version(self) -> None:
37+
env = {
38+
"AZURE_OPENAI_ENDPOINT": "https://x.openai.azure.com",
39+
"AZURE_OPENAI_API_KEY": "key",
40+
"AZURE_OPENAI_DEPLOYMENT": "gpt-4o-mini",
41+
}
42+
endpoint, key, deploy, version = _resolve_config(env)
43+
assert endpoint == "https://x.openai.azure.com"
44+
assert key == "key"
45+
assert deploy == "gpt-4o-mini"
46+
assert version == _DEFAULT_API_VERSION
47+
48+
def test_explicit_api_version_overrides_default(self) -> None:
49+
env = {
50+
"AZURE_OPENAI_ENDPOINT": "https://x.openai.azure.com",
51+
"AZURE_OPENAI_API_KEY": "key",
52+
"AZURE_OPENAI_DEPLOYMENT": "deploy",
53+
"AZURE_OPENAI_API_VERSION": "2025-01-01",
54+
}
55+
_, _, _, version = _resolve_config(env)
56+
assert version == "2025-01-01"
57+
58+
def test_empty_api_version_falls_back_to_default(self) -> None:
59+
env = {
60+
"AZURE_OPENAI_ENDPOINT": "https://x.openai.azure.com",
61+
"AZURE_OPENAI_API_KEY": "key",
62+
"AZURE_OPENAI_DEPLOYMENT": "deploy",
63+
"AZURE_OPENAI_API_VERSION": "",
64+
}
65+
_, _, _, version = _resolve_config(env)
66+
assert version == _DEFAULT_API_VERSION
67+
68+
def test_raises_listing_all_missing_when_none_set(self) -> None:
69+
with pytest.raises(AzureOpenAIConfigError) as exc:
70+
_resolve_config({})
71+
msg = str(exc.value)
72+
assert "AZURE_OPENAI_ENDPOINT" in msg
73+
assert "AZURE_OPENAI_API_KEY" in msg
74+
assert "AZURE_OPENAI_DEPLOYMENT" in msg
75+
76+
def test_raises_listing_only_missing(self) -> None:
77+
env = {
78+
"AZURE_OPENAI_ENDPOINT": "x",
79+
"AZURE_OPENAI_DEPLOYMENT": "d",
80+
# AZURE_OPENAI_API_KEY missing
81+
}
82+
with pytest.raises(AzureOpenAIConfigError) as exc:
83+
_resolve_config(env)
84+
msg = str(exc.value)
85+
assert "AZURE_OPENAI_API_KEY" in msg
86+
assert "AZURE_OPENAI_ENDPOINT" not in msg
87+
assert "AZURE_OPENAI_DEPLOYMENT" not in msg
88+
89+
90+
# ---------------------------------------------------------------------------
91+
# AzureOpenAIClient — SDK is mocked at sys.modules level
92+
# ---------------------------------------------------------------------------
93+
94+
95+
@pytest.fixture
96+
def _env(monkeypatch: pytest.MonkeyPatch) -> None:
97+
"""Populate the three required env vars with test values."""
98+
monkeypatch.setenv("AZURE_OPENAI_ENDPOINT", "https://x.openai.azure.com")
99+
monkeypatch.setenv("AZURE_OPENAI_API_KEY", "test-key")
100+
monkeypatch.setenv("AZURE_OPENAI_DEPLOYMENT", "test-deploy")
101+
monkeypatch.delenv("AZURE_OPENAI_API_VERSION", raising=False)
102+
103+
104+
@pytest.fixture
105+
def _mock_openai(monkeypatch: pytest.MonkeyPatch) -> MagicMock:
106+
"""Install a fake `openai` module exporting a `AzureOpenAI` constructor.
107+
108+
The adapter's lazy `from openai import AzureOpenAI` will resolve to the
109+
`MagicMock` returned here, so call-args assertions work without any SDK
110+
installed.
111+
"""
112+
mock_constructor = MagicMock(name="AzureOpenAI")
113+
fake_module = SimpleNamespace(AzureOpenAI=mock_constructor)
114+
monkeypatch.setitem(sys.modules, "openai", fake_module)
115+
return mock_constructor
116+
117+
118+
class TestAzureOpenAIClientConstruction:
119+
"""Constructor wires env config into the SDK client and surfaces clear
120+
errors when prerequisites are missing."""
121+
122+
def test_init_constructs_sdk_with_resolved_env_config(
123+
self, _env: None, _mock_openai: MagicMock
124+
) -> None:
125+
AzureOpenAIClient()
126+
_mock_openai.assert_called_once_with(
127+
azure_endpoint="https://x.openai.azure.com",
128+
api_key="test-key",
129+
api_version=_DEFAULT_API_VERSION,
130+
)
131+
132+
def test_init_passes_explicit_api_version(
133+
self,
134+
_env: None,
135+
_mock_openai: MagicMock,
136+
monkeypatch: pytest.MonkeyPatch,
137+
) -> None:
138+
monkeypatch.setenv("AZURE_OPENAI_API_VERSION", "2025-01-01")
139+
AzureOpenAIClient()
140+
kwargs = _mock_openai.call_args.kwargs
141+
assert kwargs["api_version"] == "2025-01-01"
142+
143+
def test_init_raises_when_env_missing(
144+
self, monkeypatch: pytest.MonkeyPatch
145+
) -> None:
146+
for name in (
147+
"AZURE_OPENAI_ENDPOINT",
148+
"AZURE_OPENAI_API_KEY",
149+
"AZURE_OPENAI_DEPLOYMENT",
150+
):
151+
monkeypatch.delenv(name, raising=False)
152+
with pytest.raises(AzureOpenAIConfigError, match="AZURE_OPENAI_ENDPOINT"):
153+
AzureOpenAIClient()
154+
155+
def test_init_raises_when_openai_sdk_missing(
156+
self,
157+
_env: None,
158+
monkeypatch: pytest.MonkeyPatch,
159+
) -> None:
160+
# Force the lazy import inside __init__ to ImportError. Setting the
161+
# module to None makes `from openai import AzureOpenAI` raise the
162+
# exact ImportError the adapter catches.
163+
monkeypatch.setitem(sys.modules, "openai", None)
164+
with pytest.raises(AzureOpenAIConfigError, match="openai SDK not installed"):
165+
AzureOpenAIClient()
166+
167+
168+
class TestAzureOpenAIClientCalls:
169+
"""`complete` and `complete_json` dispatch correctly to the SDK and
170+
return the assistant message content."""
171+
172+
@staticmethod
173+
def _mock_response(content: str | None) -> MagicMock:
174+
"""Build a ChatCompletion-shaped MagicMock with the given content."""
175+
message = MagicMock()
176+
message.content = content
177+
choice = MagicMock()
178+
choice.message = message
179+
response = MagicMock()
180+
response.choices = [choice]
181+
return response
182+
183+
def test_complete_json_uses_structured_output_mode(
184+
self, _env: None, _mock_openai: MagicMock
185+
) -> None:
186+
sdk_instance = _mock_openai.return_value
187+
sdk_instance.chat.completions.create.return_value = self._mock_response(
188+
'{"ok": true}'
189+
)
190+
191+
client = AzureOpenAIClient()
192+
body = client.complete_json(model="ignored-per-Protocol", prompt="judge this")
193+
194+
assert body == '{"ok": true}'
195+
call = sdk_instance.chat.completions.create.call_args
196+
assert call.kwargs["model"] == "test-deploy"
197+
assert call.kwargs["response_format"] == {"type": "json_object"}
198+
messages = call.kwargs["messages"]
199+
assert messages[0]["role"] == "system"
200+
assert "JSON" in messages[0]["content"]
201+
assert messages[1] == {"role": "user", "content": "judge this"}
202+
203+
def test_complete_json_returns_empty_json_on_null_content(
204+
self, _env: None, _mock_openai: MagicMock
205+
) -> None:
206+
sdk_instance = _mock_openai.return_value
207+
sdk_instance.chat.completions.create.return_value = self._mock_response(None)
208+
209+
client = AzureOpenAIClient()
210+
assert client.complete_json(model="x", prompt="x") == "{}"
211+
212+
def test_complete_dispatches_user_message_to_deployment(
213+
self, _env: None, _mock_openai: MagicMock
214+
) -> None:
215+
sdk_instance = _mock_openai.return_value
216+
sdk_instance.chat.completions.create.return_value = self._mock_response("hi")
217+
218+
client = AzureOpenAIClient()
219+
assert client.complete("say hi") == "hi"
220+
221+
call = sdk_instance.chat.completions.create.call_args
222+
assert call.kwargs["model"] == "test-deploy"
223+
assert call.kwargs["messages"] == [{"role": "user", "content": "say hi"}]
224+
# complete() does not pin response_format — only complete_json does
225+
assert "response_format" not in call.kwargs
226+
227+
def test_complete_returns_empty_string_on_null_content(
228+
self, _env: None, _mock_openai: MagicMock
229+
) -> None:
230+
sdk_instance = _mock_openai.return_value
231+
sdk_instance.chat.completions.create.return_value = self._mock_response(None)
232+
233+
client = AzureOpenAIClient()
234+
assert client.complete("x") == ""

0 commit comments

Comments
 (0)