release: v0.20.0 (align SDK adapter session-limit enforcement) (#22)

flyersworder · claude · web-flow · commit 5ce86ae0ecbd · 2026-05-10T21:46:20.000+02:00
* release: v0.20.0 (align SDK adapter session-limit enforcement)

Brings `create_sdk_mcp_server` in line with `create_langchain_tools`:
every wrapped tool now pre-checks `ContractSession.check_limits()` by
default. Before 0.20.0, only `run_query` self-checked limits; lookup
tools (`describe_table`, `list_metrics`, etc.) bypassed the check entirely.
The two adapters now behave identically.

Practical effect: `max_duration_seconds` measures wall-clock time from
the first tool call (any tool), not just from the first `run_query`. For
most users this is invisible — lookups complete in milliseconds. The
narrow population that sees a behavior change: agents with tight
`max_duration_seconds` AND lookup-heavy prompts that browse extensively
before querying. The new behavior matches the YAML's documented intent
('the agent has N seconds total') and closes the runaway-loop gap where
an agent stuck on lookups previously bypassed the duration cap.
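
A minimal sketch of the gap described above, not the library's code.
`SketchSession` is a hypothetical stand-in for `ContractSession`; it
assumes the duration clock starts on the first `check_limits()` call,
which is how this message describes the behavior. The fake clock and the
two-second lookups are invented for illustration.

```python
from __future__ import annotations

import time
from collections.abc import Callable


class LimitExceededError(Exception):
    """Stand-in for the library exception of the same name."""


class SketchSession:
    """Hypothetical stand-in for ContractSession; models only the duration cap."""

    def __init__(
        self,
        max_duration_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_duration_seconds = max_duration_seconds
        self._clock = clock
        self._started_at: float | None = None

    def check_limits(self) -> None:
        now = self._clock()
        if self._started_at is None:
            # Assumption: the clock starts at the first limit check.
            self._started_at = now
        elapsed = now - self._started_at
        if elapsed > self.max_duration_seconds:
            raise LimitExceededError(
                f"duration {elapsed:.0f}s > {self.max_duration_seconds:.0f}s"
            )


def simulate(check_lookups: bool) -> str:
    """Ten 2-second lookups followed by one query, under a 10-second cap."""
    fake_now = 0.0
    session = SketchSession(10.0, clock=lambda: fake_now)
    try:
        for _ in range(10):
            if check_lookups:  # 0.20.0 default: lookups are checked too
                session.check_limits()
            fake_now += 2.0  # each lookup "takes" two seconds
        session.check_limits()  # run_query always self-checks
    except LimitExceededError as exc:
        return f"blocked: {exc}"
    return "query allowed"


print(simulate(check_lookups=True))
print(simulate(check_lookups=False))
```

With lookups unchecked, the clock in this sketch only starts at the
query, so twenty seconds of browsing never count against the cap.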

Public API unchanged. New optional `apply_middleware: bool = True`
kwarg on `create_sdk_mcp_server` mirrors the LangChain adapter; pass
`False` to opt out and restore pre-0.20.0 semantics.
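
A rough sketch of what the kwarg toggles, using stand-ins rather than
the real adapter. `ExhaustedSession`, `select_callable`, and
`text_envelope` are hypothetical names; only the envelope shape and the
`BLOCKED` prefix are taken from the diff in this PR.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import Any

ToolFn = Callable[[dict[str, Any]], Awaitable[dict[str, Any]]]


class LimitExceededError(Exception):
    """Stand-in for the library exception of the same name."""


class ExhaustedSession:
    """Hypothetical session whose budget is already spent."""

    def check_limits(self) -> None:
        raise LimitExceededError("max_retries exceeded")


def text_envelope(text: str) -> dict[str, Any]:
    """Build an MCP-style text content envelope."""
    return {"content": [{"type": "text", "text": text}]}


def with_session_check(inner: ToolFn, session: ExhaustedSession) -> ToolFn:
    """Check limits before the tool runs; return a BLOCKED envelope on overrun."""

    async def wrapped(args: dict[str, Any]) -> dict[str, Any]:
        try:
            session.check_limits()
        except LimitExceededError as exc:
            # "\u2014" is the dash used in the library's BLOCKED prefix.
            return text_envelope(f"BLOCKED \u2014 Session limit exceeded: {exc}")
        return await inner(args)

    return wrapped


def select_callable(
    inner: ToolFn, session: ExhaustedSession, apply_middleware: bool = True
) -> ToolFn:
    """Pick the wrapped or the raw callable, as the registration loop does."""
    return with_session_check(inner, session) if apply_middleware else inner


async def describe_table(args: dict[str, Any]) -> dict[str, Any]:
    """Toy lookup tool that never checks limits itself."""
    return text_envelope(f"columns of {args['table']}: id, total")


def call(apply_middleware: bool) -> str:
    fn = select_callable(describe_table, ExhaustedSession(), apply_middleware)
    envelope = asyncio.run(fn({"table": "orders"}))
    return envelope["content"][0]["text"]


print(call(apply_middleware=True))
print(call(apply_middleware=False))
```

With the flag off, the lookup tool answers even though the session is
exhausted, which is the pre-0.20.0 behavior.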

SQL validation is intentionally NOT auto-applied, same reasoning as
the LangChain adapter: doing so would block `inspect_query`'s purpose
of reporting violations as JSON. `run_query` self-validation at
`factory.py:632-702` already covers the cost path.

6 new tests on the `_wrap_with_session_check` helper and the new
kwarg surface. Full suite: 591 passed, 8 skipped, 0 regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* review: address 2 sub-threshold items from automated review

- Issue A (score 75): Wrapper-emitted BLOCKED envelopes now include
  the canonical `Remaining: {budget}` suffix matching run_query's
  self-emitted blocks (factory.py:627-628). This affects the wrappers
  in both the SDK and LangChain adapters (_wrap_with_session_check,
  _to_structured_tool, ContractMiddleware._check). Agents whose
  retry-planning logic depended on the suffix now see consistent
  output regardless of whether the limit fired in run_query or in
  the wrapper layer.

- Issue D (score 72): Clarified the apply_middleware=False docstring
  in sdk.py. The earlier 'matching create_langchain_tools' phrasing
  understated a transport-inherent divergence — with apply_middleware=False
  the LangChain adapter still raises ToolException via its always-active
  prefix sniff, while the SDK adapter passes BLOCKED envelopes through
  as plain MCP text content (the SDK MCP transport has no status='error'
  field). The docstring now spells out that timing is aligned but error
  transport differs by design.
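
The transport divergence in Issue D can be illustrated with a small
sketch. `ToolError` stands in for LangChain's `ToolException`, and both
`deliver_*` functions are hypothetical simplifications of what each
adapter does with a BLOCKED envelope when `apply_middleware=False`.

```python
from typing import Any

BLOCKED_PREFIX = "BLOCKED \u2014"  # same prefix string the adapters use


class ToolError(Exception):
    """Stand-in for LangChain's ToolException."""


def text_envelope(text: str) -> dict[str, Any]:
    """Build an MCP-style text content envelope."""
    return {"content": [{"type": "text", "text": text}]}


def deliver_sdk_style(envelope: dict[str, Any]) -> dict[str, Any]:
    """SDK side: the envelope goes through unchanged as text content."""
    return envelope


def deliver_langchain_style(envelope: dict[str, Any]) -> str:
    """LangChain side: a BLOCKED prefix becomes a structured error."""
    text = envelope["content"][0]["text"]
    if text.startswith(BLOCKED_PREFIX):
        raise ToolError(text)
    return text


blocked = text_envelope(f"{BLOCKED_PREFIX} Session limit exceeded: max_retries")

# Same text reaches the agent either way; only the error signal differs.
sdk_text = deliver_sdk_style(blocked)["content"][0]["text"]
try:
    deliver_langchain_style(blocked)
    langchain_text = None
except ToolError as exc:
    langchain_text = str(exc)

print(sdk_text == langchain_text)
```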

3 tests extended to assert the Remaining: suffix appears on session-
limit-exceeded paths in both adapters. Full suite: 591 passed,
8 skipped, 0 regressions.
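
For reference, a sketch of how the `Remaining:` suffix is built and how
an agent-side consumer might read it back. `with_remaining` copies the
one-line helper from the diff; `BudgetSession`, its keys, and
`parse_remaining` are hypothetical, since the real `remaining()` payload
is not shown in this PR.

```python
from __future__ import annotations

import json
from datetime import timedelta
from typing import Any


class BudgetSession:
    """Hypothetical stand-in for ContractSession; the keys are invented."""

    def remaining(self) -> dict[str, Any]:
        return {"retries": 0, "duration": timedelta(seconds=42)}


def with_remaining(message: str, session: BudgetSession) -> str:
    """Append the budget as JSON; default=str covers non-JSON values."""
    return f"{message}\nRemaining: {json.dumps(session.remaining(), default=str)}"


def parse_remaining(text: str) -> dict[str, Any] | None:
    """Hypothetical consumer: recover the budget dict from a block."""
    _head, marker, tail = text.rpartition("\nRemaining: ")
    if not marker:
        return None  # no suffix present
    return json.loads(tail)


block = with_remaining("BLOCKED \u2014 Session limit exceeded: retries", BudgetSession())
print(parse_remaining(block))
```

Because `default=str` stringifies values JSON cannot encode, the
`timedelta` here comes back as a string, not a number.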

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -2,6 +2,27 @@
 
 All notable changes to this project will be documented in this file.
 
+## [0.20.0] - 2026-05-10
+
+### Changed
+
+- **`create_sdk_mcp_server` now auto-applies session-limit enforcement to all 9 tools by default**, matching the v0.19.0 behavior of `create_langchain_tools`. Before v0.20.0, only `run_query` self-checked `ContractSession` limits; lookup tools (`describe_table`, `list_metrics`, etc.) bypassed the check. The two adapters now behave identically: a single contract YAML enforces the same way under SDK and LangChain.
+- **Practical effect**: `max_duration_seconds` now measures wall-clock from the *first tool call* (any tool), not just from the first `run_query`. For most contracts this is invisible — lookups complete in milliseconds. The narrow population that sees a behavior change: agents with tight `max_duration_seconds` AND lookup-heavy prompts that browse extensively before querying. The new behavior matches the YAML's documented intent ("the agent has N seconds total"), and closes the runaway-loop gap where an agent stuck on lookup tools previously bypassed the duration cap.
+- **Escape hatch**: pass `apply_middleware=False` to `create_sdk_mcp_server` to restore pre-0.20.0 behavior.
+
+### Added
+
+- New `_wrap_with_session_check(inner, session)` helper in `tools/sdk.py` — exported as a private symbol so tests can verify the enforcement wrapper directly without going through the SDK's `@tool` decorator. Mirrors the in-tool enforcement pattern in `tools/langchain.py:_to_structured_tool`.
+
+### Fixed
+
+- Wrapper-emitted BLOCKED envelopes now include the canonical `Remaining: {budget}` suffix that `run_query`'s self-emitted blocks have always carried (per `factory.py:627-628`). Pre-0.20.0 the LangChain wrapper (introduced in v0.19.0) and the new SDK wrapper both omitted this suffix, so agents whose retry-planning logic depended on the suffix would lose context once they hit the wrapper layer instead of `run_query`'s own block. Applies to both `_wrap_with_session_check` (SDK) and `_to_structured_tool` / `ContractMiddleware._check` (LangChain).
+
+### Compatibility
+
+- Public API unchanged — `create_sdk_mcp_server` gains one optional kwarg with a sensible default. Pre-built `ToolDef` lists, custom sessions, and all other call shapes continue to work.
+- 6 new tests in `tests/test_tools/test_sdk.py` cover the wrapper behavior and the new kwarg.
+
 ## [0.19.0] - 2026-05-10
 
 ### Added
diff --git a/pyproject.toml b/pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "agentic-data-contracts"
-version = "0.19.0"
+version = "0.20.0"
 description = "YAML-first, domain-driven data governance for AI agents"
 readme = "README.md"
 requires-python = ">=3.12"
diff --git a/src/agentic_data_contracts/tools/langchain.py b/src/agentic_data_contracts/tools/langchain.py
@@ -32,6 +32,7 @@
 
 from __future__ import annotations
 
+import json
 from collections.abc import Awaitable, Callable
 from typing import Any
 
@@ -50,6 +51,13 @@
 _BLOCKED_PREFIX = "BLOCKED —"
 
 
+def _with_remaining(message: str, session: ContractSession) -> str:
+    """Append the canonical ``Remaining: {budget}`` suffix used by
+    ``run_query`` (factory.py:627-628) so wrapper-emitted blocks carry
+    the same diagnostic footprint as run_query's own blocks."""
+    return f"{message}\nRemaining: {json.dumps(session.remaining(), default=str)}"
+
+
 def _unwrap_mcp_text(envelope: dict[str, Any]) -> str:
     """Pull the first text block out of an MCP-style content envelope.
 
@@ -131,7 +139,10 @@ async def _coroutine(**kwargs: Any) -> tuple[str, dict[str, Any]]:
                 session.check_limits()
             except LimitExceededError as e:
                 raise ToolException(
-                    f"{_BLOCKED_PREFIX} Session limit exceeded: {e}"
+                    _with_remaining(
+                        f"{_BLOCKED_PREFIX} Session limit exceeded: {e}",
+                        session,
+                    )
                 ) from e
 
         envelope = await inner(kwargs)
@@ -209,7 +220,10 @@ def _check(self, request: ToolCallRequest) -> ToolMessage | None:
             self._session.check_limits()
         except LimitExceededError as e:
             return ToolMessage(
-                content=f"{_BLOCKED_PREFIX} Session limit exceeded: {e}",
+                content=_with_remaining(
+                    f"{_BLOCKED_PREFIX} Session limit exceeded: {e}",
+                    self._session,
+                ),
                 name=name,
                 tool_call_id=tool_call_id,
                 status="error",
@@ -226,9 +240,10 @@ def _check(self, request: ToolCallRequest) -> ToolMessage | None:
             if result.blocked:
                 self._session.record_retry()
                 return ToolMessage(
-                    content=(
+                    content=_with_remaining(
                         f"{_BLOCKED_PREFIX} Violations:\n"
-                        + "\n".join(f"- {r}" for r in result.reasons)
+                        + "\n".join(f"- {r}" for r in result.reasons),
+                        self._session,
                     ),
                     name=name,
                     tool_call_id=tool_call_id,
diff --git a/src/agentic_data_contracts/tools/sdk.py b/src/agentic_data_contracts/tools/sdk.py
@@ -1,15 +1,77 @@
-"""Claude Agent SDK integration — wraps ToolDefs into an SDK MCP server."""
+"""Claude Agent SDK integration — wraps ToolDefs into an SDK MCP server.
+
+By default (since v0.20.0) every wrapped tool pre-checks
+``ContractSession.check_limits()`` and short-circuits with a canonical
+``BLOCKED — Session limit exceeded`` envelope on overrun. This aligns the
+SDK adapter with ``create_langchain_tools`` so a single contract YAML
+behaves the same way under both adapters — in particular,
+``max_duration_seconds`` measures wall-clock from the first tool call,
+not just from the first ``run_query``.
+
+SQL validation is intentionally **not** auto-applied. Doing so would
+block ``inspect_query`` from reporting violations as JSON; the canonical
+``run_query`` self-validation at ``factory.py:632-702`` already covers
+the cost path.
+
+Pass ``apply_middleware=False`` to opt out (preserves the pre-0.20.0
+behavior where only ``run_query`` self-checked limits).
+"""
 
 from __future__ import annotations
 
+import functools
+import json
+from collections.abc import Awaitable, Callable
 from typing import Any
 
 from agentic_data_contracts.adapters.base import DatabaseAdapter
 from agentic_data_contracts.core.contract import DataContract
-from agentic_data_contracts.core.session import ContractSession
+from agentic_data_contracts.core.session import ContractSession, LimitExceededError
 from agentic_data_contracts.semantic.base import SemanticSource
 from agentic_data_contracts.tools.factory import ToolDef, create_tools
 
+_BLOCKED_PREFIX = "BLOCKED —"
+
+
+def _with_remaining(message: str, session: ContractSession) -> str:
+    """Append the canonical ``Remaining: {budget}`` suffix used by
+    ``run_query`` (factory.py:627-628) so wrapper-emitted blocks carry
+    the same diagnostic footprint as run_query's own blocks."""
+    return f"{message}\nRemaining: {json.dumps(session.remaining(), default=str)}"
+
+
+def _wrap_with_session_check(
+    inner: Callable[[dict[str, Any]], Awaitable[dict[str, Any]]],
+    session: ContractSession,
+) -> Callable[[dict[str, Any]], Awaitable[dict[str, Any]]]:
+    """Wrap an MCP-style tool callable with a pre-call session-limit check.
+
+    Returns the canonical ``BLOCKED — Session limit exceeded`` envelope on
+    overrun without invoking the inner function. SQL validation is
+    intentionally NOT applied here — that would short-circuit
+    ``inspect_query`` whose purpose is to *report* violations as JSON.
+    """
+
+    @functools.wraps(inner)
+    async def wrapped(args: dict[str, Any]) -> dict[str, Any]:
+        try:
+            session.check_limits()
+        except LimitExceededError as e:
+            return {
+                "content": [
+                    {
+                        "type": "text",
+                        "text": _with_remaining(
+                            f"{_BLOCKED_PREFIX} Session limit exceeded: {e}",
+                            session,
+                        ),
+                    }
+                ]
+            }
+        return await inner(args)
+
+    return wrapped
+
 
 def create_sdk_mcp_server(
     contract: DataContract,
@@ -18,6 +80,7 @@ def create_sdk_mcp_server(
     semantic_source: SemanticSource | None = None,
     session: ContractSession | None = None,
     tools: list[ToolDef] | None = None,
+    apply_middleware: bool = True,
     server_name: str = "data-contracts",
     server_version: str = "1.0.0",
 ) -> Any:
@@ -30,8 +93,25 @@ def create_sdk_mcp_server(
         contract: The data contract to enforce.
         adapter: Optional database adapter for query execution.
         semantic_source: Optional semantic source (auto-loaded if not given).
-        session: Optional session for tracking enforcement state.
+        session: Optional session for tracking enforcement state. One is
+            created automatically if omitted.
         tools: Pre-built ToolDefs (if None, created via create_tools).
+        apply_middleware: When ``True`` (default since v0.20.0), every
+            wrapped tool pre-checks ``session.check_limits()`` and
+            short-circuits on overrun. Aligned with ``create_langchain_tools``
+            on enforcement *timing* (clock starts at first tool call), but
+            error transport differs by design — see note below. Set
+            ``False`` to restore pre-0.20.0 behavior in which only
+            ``run_query`` self-checks limits (lookup tools bypass).
+
+            Note on cross-adapter parity: with ``apply_middleware=False``,
+            this adapter passes a tool's BLOCKED envelope through to the
+            agent as-is (the SDK MCP transport carries error context as
+            text content; there is no ``status="error"`` field). The
+            LangChain adapter additionally sniffs the ``BLOCKED —``
+            prefix and converts it into a ``ToolException``. Both surface
+            the same text to the agent; only the structured-error signal
+            differs.
         server_name: Name for the MCP server.
         server_version: Version for the MCP server.
 
@@ -51,6 +131,9 @@ def create_sdk_mcp_server(
         )
         raise ImportError(msg) from None
 
+    if session is None:
+        session = ContractSession(contract)
+
     if tools is None:
         tools = create_tools(
             contract,
@@ -61,7 +144,14 @@ def create_sdk_mcp_server(
 
     sdk_tools = []
     for t in tools:
-        decorated = sdk_tool(t.name, t.description, t.input_schema)(t.callable)
+        callable_to_register = (
+            _wrap_with_session_check(t.callable, session)
+            if apply_middleware
+            else t.callable
+        )
+        decorated = sdk_tool(t.name, t.description, t.input_schema)(
+            callable_to_register
+        )
         sdk_tools.append(decorated)
 
     return _create_server(
diff --git a/tests/test_tools/test_langchain.py b/tests/test_tools/test_langchain.py
@@ -244,7 +244,10 @@ async def test_session_limit_exceeded_raises_tool_exception(
     contract: DataContract, adapter: DuckDBAdapter, semantic: YamlSource
 ) -> None:
     """Even non-SQL tools must surface session-limit exhaustion. Fixture
-    sets max_retries=3 (tests/fixtures/valid_contract.yml:45)."""
+    sets max_retries=3 (tests/fixtures/valid_contract.yml:45). The
+    raised ToolException must include the ``Remaining:`` budget summary
+    so the agent sees the same diagnostic info ``run_query`` would have
+    emitted directly."""
     session = ContractSession(contract)
     for _ in range(4):  # exceed max_retries=3
         session.record_retry()
@@ -256,6 +259,7 @@ async def test_session_limit_exceeded_raises_tool_exception(
         await describe.ainvoke({"schema": "analytics", "table": "orders"})
     msg = str(exc.value).lower()
     assert "limit" in msg or "exceeded" in msg
+    assert "remaining:" in msg
 
 
 # ─── apply_middleware=False escape hatch ──────────────────────────────────────
@@ -324,6 +328,7 @@ async def _handler(_req: ToolCallRequest) -> ToolMessage:  # pragma: no cover
     assert isinstance(result, ToolMessage)
     assert result.status == "error"
     assert "BLOCKED" in str(result.content)
+    assert "Remaining:" in str(result.content)  # agent must see budget
     assert result.tool_call_id == "tc-1"
 
 
diff --git a/tests/test_tools/test_sdk.py b/tests/test_tools/test_sdk.py
diff --git a/uv.lock b/uv.lock