test: add adapter unit tests + adapters README (#94 review fixes)

constk · constk · commit 21d3f3367e76 · 2026-05-26T00:29:00.000+10:00
Addresses two gate failures on #104 surfaced by code review:

1. "Tests required" gate: the feat: prefix declared a behaviour change, but tests/ had no test for the new adapter (the eval/-side test only runs with live Azure credentials). Adds tests/test_eval_azure_openai_adapter.py with 13 fully offline cases covering:
   - _resolve_config (defaults, override, empty-string fallback, missing-env error listing)
   - the constructor (env wiring, explicit API version, missing env, missing SDK)
   - the two SDK call paths (complete_json structured-output mode, complete user-message dispatch, and null content returning "{}" from complete_json and "" from complete)
   The SDK is mocked at the sys.modules level, so the tests never hit the network and never require the openai extra to be installed.

2. "src/ README audit" gate: every src/ package needs a README.md per CLAUDE.md. Adds src/eval/adapters/README.md documenting the layer's purpose, the current adapter, a 7-step "adding a new adapter" recipe, and why the layer lives at the top of the import order.

Also applies the reviewer's non-blocking sentinel-string suggestion: the magic "azure-deployment" string passed as judge_model in eval/test_golden_patterns.py is now the named constant _AZURE_DEPLOYMENT_SENTINEL, with a comment explaining why the runner threads it through but the Azure adapter discards it.

Local gates: 205 unit tests pass (was 192, +13 new), mypy clean on 43 source files, ruff/format/import-linter all green.

Refs #94
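
For readers unfamiliar with the sys.modules-level mocking mentioned above, here is a minimal sketch of the mechanism, independent of this repository. The module name `fake_vendor_sdk` and the helper `load_client_class` are hypothetical, chosen so the sketch runs without any SDK installed. In the real tests pytest's `monkeypatch.setitem` does the bookkeeping and restores `sys.modules` automatically.

```python
import sys
from types import SimpleNamespace
from unittest.mock import MagicMock


def load_client_class():
    """Stand-in for an adapter's lazy import (hypothetical module name)."""
    # The import statement consults sys.modules first, at call time.
    from fake_vendor_sdk import Client

    return Client


# 1. Any object stored in sys.modules under the module name satisfies the
#    import; `from ... import Client` then becomes a plain attribute lookup.
mock_constructor = MagicMock(name="Client")
sys.modules["fake_vendor_sdk"] = SimpleNamespace(Client=mock_constructor)
resolved = load_client_class()

# 2. Mapping the name to None makes the same import raise ImportError
#    (specifically ModuleNotFoundError), simulating a missing SDK.
sys.modules["fake_vendor_sdk"] = None
try:
    load_client_class()
    import_failed = False
except ImportError:
    import_failed = True
finally:
    # Clean up by hand here; monkeypatch would undo this for us in pytest.
    del sys.modules["fake_vendor_sdk"]
```

Because the lookup happens when the function runs rather than when the module is loaded, this only works for imports placed inside a function or constructor, which is why the adapter imports its SDK lazily.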
diff --git a/eval/test_golden_patterns.py b/eval/test_golden_patterns.py
@@ -50,6 +50,13 @@
 
 patterns = load_golden_dataset(_PATTERNS_PATH)
 
+# Sentinel passed to EvalRunner.judge_model. The runner threads this through
+# to LLMClient.complete_json(model=...), where the Azure adapter discards it
+# — Azure addresses by deployment name (set at adapter construction), not by
+# the model parameter. Named constant makes the intent obvious to a reader
+# of this fixture without needing to chase into the adapter.
+_AZURE_DEPLOYMENT_SENTINEL = "azure-deployment-from-env"
+
 
 @pytest.fixture(scope="module")
 def runner() -> EvalRunner:
@@ -62,9 +69,7 @@ def runner() -> EvalRunner:
     return EvalRunner(
         answer_fn=client.complete,
         judge_client=client,
-        # Azure addresses by deployment, set at client construction. The
-        # runner still passes this through for Protocol conformance.
-        judge_model="azure-deployment",
+        judge_model=_AZURE_DEPLOYMENT_SENTINEL,
     )
 
 
diff --git a/src/eval/adapters/README.md b/src/eval/adapters/README.md
@@ -0,0 +1,33 @@
+# `src/eval/adapters`
+
+Concrete `LLMClient` adapters for the eval harness. The judge in [`src/eval/judge.py`](../judge.py) calls an `LLMClient` Protocol — never a vendor SDK directly. Each adapter in this package implements that Protocol for one provider, so the eval core stays vendor-neutral and a downstream consumer can swap providers by changing one wiring line in their test fixture.
+
+## Why this layer exists
+
+Without the Protocol seam, swapping LLM providers would mean touching the eval core. With it, vendor lock-in is confined to one file per provider. The layer demonstrates that the harness's "provider-agnostic" claim is structural, not aspirational: the eval core has zero imports of any vendor SDK.
+
+## Current adapters
+
+| File | Provider | Optional extra | Env contract |
+|---|---|---|---|
+| [`azure_openai.py`](azure_openai.py) | Azure OpenAI | `uv sync --extra eval` | `AZURE_OPENAI_ENDPOINT`, `AZURE_OPENAI_API_KEY`, `AZURE_OPENAI_DEPLOYMENT`, optional `AZURE_OPENAI_API_VERSION` (default `2024-10-21`) |
+
+## Adding a new adapter
+
+1. Add the SDK to `[project.optional-dependencies]` in `pyproject.toml` — either to the existing `eval` extra or a new provider-scoped one.
+2. Add the SDK's top-level module to `[[tool.mypy.overrides]]` with `ignore_missing_imports = true`, matching the existing `openai.*` / `opentelemetry.*` entries. This keeps mypy clean on stock `uv sync --extra dev` checkouts.
+3. Implement `complete_json(*, model: str, prompt: str) -> str` per the `LLMClient` Protocol in [`src/eval/judge.py`](../judge.py). Optionally add a `complete(prompt: str) -> str` method for use as an `EvalRunner.answer_fn`.
+4. **Lazy-import the SDK inside `__init__`** so the adapter module remains importable without the optional extra installed. The import error path should raise a clear, named exception (e.g. `AzureOpenAIConfigError`) telling the reader which `uv sync --extra ...` to run.
+5. Read configuration from environment variables at construction time. Raise the same named exception listing every missing var when env is incomplete — fail fast, fail clear.
+6. Add an offline unit test in [`tests/`](../../../tests/) that mocks the SDK at the `sys.modules` level (see `tests/test_eval_azure_openai_adapter.py` for the pattern). This keeps the unit suite credential-free; live-credential paths are exercised by [`eval/test_golden_patterns.py`](../../../eval/test_golden_patterns.py).
+7. Document the env contract in this README's table above and in [`docs/EVAL_HARNESS.md`](../../../docs/EVAL_HARNESS.md)'s "Worked patterns" section.
+
+## Why adapters live under `src/eval/`
+
+The import-linter contract in `pyproject.toml` puts `src.eval` at the top of the layered import order:
+
+```
+api | eval -> agent -> tools -> data -> observability -> models
+```
+
+Adapters can therefore depend on any layer below `eval`; nothing in those lower layers depends on them. That asymmetry is exactly what the layered architecture exists to encode: vendor-specific code stays at the boundary and never leaks down into the eval primitives or the model layer.
diff --git a/tests/test_eval_azure_openai_adapter.py b/tests/test_eval_azure_openai_adapter.py
@@ -0,0 +1,234 @@
+"""Offline unit tests for the Azure OpenAI eval adapter.
+
+These tests never hit the network. The `openai` SDK is replaced at the
+`sys.modules` level so the adapter's lazy import resolves to a `MagicMock`,
+which lets us assert on the constructor arguments and the chat-completions
+call shape without an API key.
+
+The live-credential path is exercised by `eval/test_golden_patterns.py`,
+which is skipped on stock checkouts.
+"""
+
+from __future__ import annotations
+
+import sys
+from types import SimpleNamespace
+from unittest.mock import MagicMock
+
+import pytest
+
+from src.eval.adapters.azure_openai import (
+    _DEFAULT_API_VERSION,
+    AzureOpenAIClient,
+    AzureOpenAIConfigError,
+    _resolve_config,
+)
+
+# ---------------------------------------------------------------------------
+# _resolve_config — pure function, no SDK involved
+# ---------------------------------------------------------------------------
+
+
+class TestResolveConfig:
+    """`_resolve_config` reads env, applies the default API version, and
+    raises a single `AzureOpenAIConfigError` naming every missing var."""
+
+    def test_returns_env_values_with_default_api_version(self) -> None:
+        env = {
+            "AZURE_OPENAI_ENDPOINT": "https://x.openai.azure.com",
+            "AZURE_OPENAI_API_KEY": "key",
+            "AZURE_OPENAI_DEPLOYMENT": "gpt-4o-mini",
+        }
+        endpoint, key, deploy, version = _resolve_config(env)
+        assert endpoint == "https://x.openai.azure.com"
+        assert key == "key"
+        assert deploy == "gpt-4o-mini"
+        assert version == _DEFAULT_API_VERSION
+
+    def test_explicit_api_version_overrides_default(self) -> None:
+        env = {
+            "AZURE_OPENAI_ENDPOINT": "https://x.openai.azure.com",
+            "AZURE_OPENAI_API_KEY": "key",
+            "AZURE_OPENAI_DEPLOYMENT": "deploy",
+            "AZURE_OPENAI_API_VERSION": "2025-01-01",
+        }
+        _, _, _, version = _resolve_config(env)
+        assert version == "2025-01-01"
+
+    def test_empty_api_version_falls_back_to_default(self) -> None:
+        env = {
+            "AZURE_OPENAI_ENDPOINT": "https://x.openai.azure.com",
+            "AZURE_OPENAI_API_KEY": "key",
+            "AZURE_OPENAI_DEPLOYMENT": "deploy",
+            "AZURE_OPENAI_API_VERSION": "",
+        }
+        _, _, _, version = _resolve_config(env)
+        assert version == _DEFAULT_API_VERSION
+
+    def test_raises_listing_all_missing_when_none_set(self) -> None:
+        with pytest.raises(AzureOpenAIConfigError) as exc:
+            _resolve_config({})
+        msg = str(exc.value)
+        assert "AZURE_OPENAI_ENDPOINT" in msg
+        assert "AZURE_OPENAI_API_KEY" in msg
+        assert "AZURE_OPENAI_DEPLOYMENT" in msg
+
+    def test_raises_listing_only_missing(self) -> None:
+        env = {
+            "AZURE_OPENAI_ENDPOINT": "x",
+            "AZURE_OPENAI_DEPLOYMENT": "d",
+            # AZURE_OPENAI_API_KEY missing
+        }
+        with pytest.raises(AzureOpenAIConfigError) as exc:
+            _resolve_config(env)
+        msg = str(exc.value)
+        assert "AZURE_OPENAI_API_KEY" in msg
+        assert "AZURE_OPENAI_ENDPOINT" not in msg
+        assert "AZURE_OPENAI_DEPLOYMENT" not in msg
+
+
+# ---------------------------------------------------------------------------
+# AzureOpenAIClient — SDK is mocked at sys.modules level
+# ---------------------------------------------------------------------------
+
+
+@pytest.fixture
+def _env(monkeypatch: pytest.MonkeyPatch) -> None:
+    """Set the three required env vars and unset the optional API version."""
+    monkeypatch.setenv("AZURE_OPENAI_ENDPOINT", "https://x.openai.azure.com")
+    monkeypatch.setenv("AZURE_OPENAI_API_KEY", "test-key")
+    monkeypatch.setenv("AZURE_OPENAI_DEPLOYMENT", "test-deploy")
+    monkeypatch.delenv("AZURE_OPENAI_API_VERSION", raising=False)
+
+
+@pytest.fixture
+def _mock_openai(monkeypatch: pytest.MonkeyPatch) -> MagicMock:
+    """Install a fake `openai` module exporting an `AzureOpenAI` constructor.
+
+    The adapter's lazy `from openai import AzureOpenAI` will resolve to the
+    `MagicMock` returned here, so call-args assertions work without any SDK
+    installed.
+    """
+    mock_constructor = MagicMock(name="AzureOpenAI")
+    fake_module = SimpleNamespace(AzureOpenAI=mock_constructor)
+    monkeypatch.setitem(sys.modules, "openai", fake_module)
+    return mock_constructor
+
+
+class TestAzureOpenAIClientConstruction:
+    """Constructor wires env config into the SDK client and surfaces clear
+    errors when prerequisites are missing."""
+
+    def test_init_constructs_sdk_with_resolved_env_config(
+        self, _env: None, _mock_openai: MagicMock
+    ) -> None:
+        AzureOpenAIClient()
+        _mock_openai.assert_called_once_with(
+            azure_endpoint="https://x.openai.azure.com",
+            api_key="test-key",
+            api_version=_DEFAULT_API_VERSION,
+        )
+
+    def test_init_passes_explicit_api_version(
+        self,
+        _env: None,
+        _mock_openai: MagicMock,
+        monkeypatch: pytest.MonkeyPatch,
+    ) -> None:
+        monkeypatch.setenv("AZURE_OPENAI_API_VERSION", "2025-01-01")
+        AzureOpenAIClient()
+        kwargs = _mock_openai.call_args.kwargs
+        assert kwargs["api_version"] == "2025-01-01"
+
+    def test_init_raises_when_env_missing(
+        self, monkeypatch: pytest.MonkeyPatch
+    ) -> None:
+        for name in (
+            "AZURE_OPENAI_ENDPOINT",
+            "AZURE_OPENAI_API_KEY",
+            "AZURE_OPENAI_DEPLOYMENT",
+        ):
+            monkeypatch.delenv(name, raising=False)
+        with pytest.raises(AzureOpenAIConfigError, match="AZURE_OPENAI_ENDPOINT"):
+            AzureOpenAIClient()
+
+    def test_init_raises_when_openai_sdk_missing(
+        self,
+        _env: None,
+        monkeypatch: pytest.MonkeyPatch,
+    ) -> None:
+        # Force the lazy import inside __init__ to ImportError. Setting the
+        # module to None makes `from openai import AzureOpenAI` raise the
+        # exact ImportError the adapter catches.
+        monkeypatch.setitem(sys.modules, "openai", None)
+        with pytest.raises(AzureOpenAIConfigError, match="openai SDK not installed"):
+            AzureOpenAIClient()
+
+
+class TestAzureOpenAIClientCalls:
+    """`complete` and `complete_json` dispatch correctly to the SDK and
+    return the assistant message content."""
+
+    @staticmethod
+    def _mock_response(content: str | None) -> MagicMock:
+        """Build a ChatCompletion-shaped MagicMock with the given content."""
+        message = MagicMock()
+        message.content = content
+        choice = MagicMock()
+        choice.message = message
+        response = MagicMock()
+        response.choices = [choice]
+        return response
+
+    def test_complete_json_uses_structured_output_mode(
+        self, _env: None, _mock_openai: MagicMock
+    ) -> None:
+        sdk_instance = _mock_openai.return_value
+        sdk_instance.chat.completions.create.return_value = self._mock_response(
+            '{"ok": true}'
+        )
+
+        client = AzureOpenAIClient()
+        body = client.complete_json(model="ignored-per-Protocol", prompt="judge this")
+
+        assert body == '{"ok": true}'
+        call = sdk_instance.chat.completions.create.call_args
+        assert call.kwargs["model"] == "test-deploy"
+        assert call.kwargs["response_format"] == {"type": "json_object"}
+        messages = call.kwargs["messages"]
+        assert messages[0]["role"] == "system"
+        assert "JSON" in messages[0]["content"]
+        assert messages[1] == {"role": "user", "content": "judge this"}
+
+    def test_complete_json_returns_empty_json_on_null_content(
+        self, _env: None, _mock_openai: MagicMock
+    ) -> None:
+        sdk_instance = _mock_openai.return_value
+        sdk_instance.chat.completions.create.return_value = self._mock_response(None)
+
+        client = AzureOpenAIClient()
+        assert client.complete_json(model="x", prompt="x") == "{}"
+
+    def test_complete_dispatches_user_message_to_deployment(
+        self, _env: None, _mock_openai: MagicMock
+    ) -> None:
+        sdk_instance = _mock_openai.return_value
+        sdk_instance.chat.completions.create.return_value = self._mock_response("hi")
+
+        client = AzureOpenAIClient()
+        assert client.complete("say hi") == "hi"
+
+        call = sdk_instance.chat.completions.create.call_args
+        assert call.kwargs["model"] == "test-deploy"
+        assert call.kwargs["messages"] == [{"role": "user", "content": "say hi"}]
+        # complete() does not pin response_format — only complete_json does
+        assert "response_format" not in call.kwargs
+
+    def test_complete_returns_empty_string_on_null_content(
+        self, _env: None, _mock_openai: MagicMock
+    ) -> None:
+        sdk_instance = _mock_openai.return_value
+        sdk_instance.chat.completions.create.return_value = self._mock_response(None)
+
+        client = AzureOpenAIClient()
+        assert client.complete("x") == ""
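
The adapter source is not part of this diff, so the following is only a sketch of a config resolver that would satisfy the `TestResolveConfig` cases above. It is not the actual `_resolve_config`. The names `resolve_config`, `ConfigError`, and `REQUIRED` are hypothetical, the error wording is invented, and treating an empty required variable as missing is an assumption the tests do not confirm. The default version string is the one stated in the README table.

```python
from __future__ import annotations

from collections.abc import Mapping

DEFAULT_API_VERSION = "2024-10-21"  # default documented in the README table
REQUIRED = (
    "AZURE_OPENAI_ENDPOINT",
    "AZURE_OPENAI_API_KEY",
    "AZURE_OPENAI_DEPLOYMENT",
)


class ConfigError(RuntimeError):
    """Hypothetical stand-in for AzureOpenAIConfigError."""


def resolve_config(env: Mapping[str, str]) -> tuple[str, str, str, str]:
    """Return (endpoint, key, deployment, api_version) read from `env`."""
    # Collect every missing variable first so one error names them all.
    # Assumption: an empty string counts as missing.
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ConfigError("missing environment variables: " + ", ".join(missing))
    # `or` makes an empty AZURE_OPENAI_API_VERSION fall back to the default.
    version = env.get("AZURE_OPENAI_API_VERSION") or DEFAULT_API_VERSION
    endpoint, key, deployment = (env[name] for name in REQUIRED)
    return endpoint, key, deployment, version
```

Taking the environment as a `Mapping` argument instead of reading `os.environ` directly is what lets the tests pass plain dictionaries and stay free of `monkeypatch` for this function.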