release: v0.21.0 (describe_table emits column descriptions)

flyersworder · claude · flyersworder · commit 9b82c58205b0 · 2026-05-17T20:06:01.000+02:00
Closes the day-one gap where describe_table dropped Column.description on
the way out, even when populated by the adapter or available in the
semantic source. The tool now overlays descriptions with the semantic
source taking precedence, falls back to the adapter's Column.description,
and omits the field when both are empty.
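
The precedence described above can be sketched in isolation roughly as
follows. This is a minimal illustration, not the code in factory.py: the
`Column` dataclass is a simplified stand-in for the adapter type, and
`merge_descriptions` is a hypothetical helper name used only here.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Column:
    # Simplified stand-in for the adapter's Column type (assumption: the
    # real class may carry more fields or different defaults).
    name: str
    type: str
    nullable: bool = True
    description: Optional[str] = None


def merge_descriptions(
    adapter_cols: list[Column],
    semantic_cols: Optional[list[Column]],
) -> list[dict[str, Any]]:
    """Hypothetical helper: overlay semantic descriptions on adapter columns."""
    # Keep only non-empty semantic descriptions, keyed by column name.
    sem_descs = {
        c.name: c.description for c in (semantic_cols or []) if c.description
    }
    merged: list[dict[str, Any]] = []
    # The adapter's column list decides which columns exist at all.
    for c in adapter_cols:
        col: dict[str, Any] = {
            "name": c.name,
            "type": c.type,
            "nullable": c.nullable,
        }
        # Semantic source wins; the adapter description fills gaps.
        desc = sem_descs.get(c.name) or c.description
        if desc:
            # Omit the key entirely when no description exists anywhere.
            col["description"] = desc
        merged.append(col)
    return merged


merged = merge_descriptions(
    adapter_cols=[
        Column("status", "VARCHAR", description="stale catalog comment"),
        Column("plan", "VARCHAR", description="Subscription tier"),
        Column("tenant_id", "VARCHAR", description=""),
    ],
    semantic_cols=[
        Column("status", "VARCHAR", description="Order status"),
        Column("ghost", "VARCHAR", description="not in the warehouse"),
    ],
)
```

In this sketch `status` takes the semantic description, `plan` keeps the
adapter's, `tenant_id` has no `description` key, and `ghost` never appears
because the adapter does not report it.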

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -1,6 +1,6 @@
 repos:
   - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.15.10
+    rev: v0.15.13
     hooks:
       - id: ruff-check
         args: [--fix]
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -2,6 +2,28 @@
 
 All notable changes to this project will be documented in this file.
 
+## [0.21.0] - 2026-05-17
+
+### Fixed
+
+- **`describe_table` now emits column descriptions to the agent.** Since the tool factory's first commit (`0296613`), the tool serialised columns as `{name, type, nullable}` only — `Column.description` was silently dropped on the way out, even when populated by the adapter (e.g., a Denodo deployment carrying authored catalog comments) or available in the contract's semantic source. This is the single largest *context* improvement a data-contract library can make: per the [Datacult "boring work" benchmark](https://www.datacult.com/post/the-boring-work-that-makes-ai-analytics-actually-work-why-winning-with-ai-in-analytics-is-an-investment-in-a-rich-data-context-not-better-llm-models), adding column descriptions moved an agent's SQL accuracy from 0% to 15% and SQL generation from 38.5% to 100% — the largest jump in their six-layer experiment. The fix overlays descriptions onto the tool response with this precedence: (1) semantic source via `SemanticSource.get_table_schema(schema, table)`, which is the canonical agent-facing authority; (2) `Column.description` from the adapter, which captures warehouse catalog comments; (3) field omitted entirely when both are empty, keeping responses tight.
+- **The `SemanticSource.get_table_schema` protocol method is no longer dead code from the tool layer's perspective.** All three built-in semantic sources (`YamlSource`, `DbtSource`, `CubeSource`) already populated `TableSchema.columns[*].description` from their respective inputs; the tool just never consulted them. Now it does.
+
+### Added
+
+- 3 new tests in `tests/test_tools/test_factory.py` covering the merge behaviour: `test_describe_table_includes_semantic_descriptions` (semantic-source descriptions reach the agent), `test_describe_table_falls_back_to_adapter_description` (adapter-supplied descriptions surface when the semantic source has no entry, and the field is omitted when both are empty), and `test_describe_table_semantic_overrides_adapter_description` (semantic source wins when both have descriptions for the same column).
+
+### Compatibility
+
+- **Backward-compatible response shape.** The new `description` field is *additive only* — consumers that ignore unknown keys see no behaviour change. The field is omitted (not set to `""`) when no description exists, so JSON payload size is unchanged for description-less columns.
+- **No new failure modes.** The merge guards `semantic_source is None`, `get_table_schema(...)` returning `None`, columns appearing in one source but not the other, and empty-string descriptions. A column described in the semantic source but absent from the warehouse is silently dropped — the adapter's column list is the source of truth for *which* columns exist; the semantic source only adorns them.
+- **No new dependencies.** The fix uses interfaces that already existed in the codebase.
+
+### Internal
+
+- `uv lock --upgrade` refreshed transitive dependencies (notable bumps: `sqlglot 30.6.0 → 30.8.0`, `langchain 1.2.17 → 1.3.0`, `langgraph 1.1.10 → 1.2.0`, `pydantic 2.13.3 → 2.13.4`, `cryptography 47.0.0 → 48.0.0`). Full 602-test suite + ruff + ty all green against the new versions.
+- `.pre-commit-config.yaml`: `ruff-pre-commit` rev bumped to `v0.15.13` to match the lockfile-pinned `ruff` binary, preventing the silent local-vs-hook drift where the same file passes `uv run ruff` but a stale hook env flags it.
+
 ## [0.20.0] - 2026-05-10
 
 ### Changed
diff --git a/docs/architecture.md b/docs/architecture.md
@@ -344,7 +344,7 @@ Two modes: tool factory for quick starts, middleware for BYO tools.
 
 ### 9 Tools
 
-1. **`describe_table(schema, table)`** — Column details from the database adapter
+1. **`describe_table(schema, table)`** — Column details, merging the database adapter's catalog view with authored descriptions from the semantic source (semantic wins; adapter fills gaps)
 2. **`preview_table(schema, table, limit?)`** — Sample rows
 3. **`list_metrics(domain?, tier?, indicator_kind?)`** — Browse metrics with filters
 4. **`lookup_metric(metric_name)`** — Full metric definition with SQL and impact edges
diff --git a/pyproject.toml b/pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "agentic-data-contracts"
-version = "0.20.0"
+version = "0.21.0"
 description = "YAML-first, domain-driven data governance for AI agents"
 readme = "README.md"
 requires-python = ">=3.12"
diff --git a/src/agentic_data_contracts/tools/factory.py b/src/agentic_data_contracts/tools/factory.py
@@ -244,9 +244,29 @@ async def describe_table(args: dict[str, Any]) -> dict[str, Any]:
                 f" for {qualified}."
             )
         ts = adapter.describe_table(schema_name, table_name)
-        cols = [
-            {"name": c.name, "type": c.type, "nullable": c.nullable} for c in ts.columns
-        ]
+        # Overlay authored descriptions from the semantic source onto adapter
+        # output. Semantic source wins because it is the canonical agent-facing
+        # documentation; adapter-supplied descriptions (e.g. warehouse column
+        # comments) fill in where the semantic source has no entry. Columns
+        # with no description anywhere omit the field to keep responses tight.
+        sem_descs: dict[str, str] = {}
+        if semantic_source is not None:
+            sem_ts = semantic_source.get_table_schema(schema_name, table_name)
+            if sem_ts is not None:
+                sem_descs = {
+                    c.name: c.description for c in sem_ts.columns if c.description
+                }
+        cols: list[dict[str, Any]] = []
+        for c in ts.columns:
+            col: dict[str, Any] = {
+                "name": c.name,
+                "type": c.type,
+                "nullable": c.nullable,
+            }
+            desc = sem_descs.get(c.name) or c.description
+            if desc:
+                col["description"] = desc
+            cols.append(col)
         return _text_response(
             json.dumps({"schema": schema_name, "table": table_name, "columns": cols})
         )
diff --git a/tests/test_tools/test_factory.py b/tests/test_tools/test_factory.py
@@ -3,6 +3,7 @@
 
 import pytest
 
+from agentic_data_contracts.adapters.base import Column, TableSchema
 from agentic_data_contracts.adapters.duckdb import DuckDBAdapter
 from agentic_data_contracts.core.contract import DataContract
 from agentic_data_contracts.semantic.yaml_source import YamlSource
@@ -94,6 +95,98 @@ async def test_describe_table_without_adapter(
     assert "unavailable" in text.lower() or "no database" in text.lower()
 
 
+@pytest.mark.asyncio
+async def test_describe_table_includes_semantic_descriptions(
+    contract: DataContract, adapter: DuckDBAdapter, semantic: YamlSource
+) -> None:
+    """Column descriptions from the semantic source must reach the agent."""
+    tools = create_tools(contract, adapter=adapter, semantic_source=semantic)
+    tool = next(t for t in tools if t.name == "describe_table")
+    result = await tool.callable({"schema": "analytics", "table": "orders"})
+    payload = json.loads(result["content"][0]["text"])
+    cols_by_name = {c["name"]: c for c in payload["columns"]}
+    assert cols_by_name["amount"]["description"] == "Order total in USD"
+    assert cols_by_name["tenant_id"]["description"] == (
+        "Tenant identifier for multi-tenancy"
+    )
+
+
+@pytest.mark.asyncio
+async def test_describe_table_falls_back_to_adapter_description(
+    contract: DataContract, semantic: YamlSource
+) -> None:
+    """When semantic source has no entry, adapter-supplied descriptions surface.
+
+    Mirrors deployments (e.g. Denodo) where the warehouse catalog already
+    carries authored column comments and the adapter populates Column.description.
+    """
+
+    class DescriptionAwareAdapter(DuckDBAdapter):
+        def describe_table(self, schema: str, table: str) -> TableSchema:
+            if (schema, table) == ("analytics", "subscriptions"):
+                return TableSchema(
+                    columns=[
+                        Column(name="id", type="INTEGER", description="Plan FK"),
+                        Column(
+                            name="plan",
+                            type="VARCHAR",
+                            description="Subscription tier from billing system",
+                        ),
+                        Column(name="tenant_id", type="VARCHAR"),
+                    ]
+                )
+            return super().describe_table(schema, table)
+
+    desc_adapter = DescriptionAwareAdapter(":memory:")
+    desc_adapter.connection.execute(
+        "CREATE SCHEMA analytics;"
+        "CREATE TABLE analytics.subscriptions ("
+        "id INTEGER, plan VARCHAR, tenant_id VARCHAR);"
+    )
+    tools = create_tools(contract, adapter=desc_adapter, semantic_source=semantic)
+    tool = next(t for t in tools if t.name == "describe_table")
+    result = await tool.callable({"schema": "analytics", "table": "subscriptions"})
+    payload = json.loads(result["content"][0]["text"])
+    cols_by_name = {c["name"]: c for c in payload["columns"]}
+    assert (
+        cols_by_name["plan"]["description"] == "Subscription tier from billing system"
+    )
+    # No description anywhere → field omitted to keep responses tight.
+    assert "description" not in cols_by_name["tenant_id"]
+
+
+@pytest.mark.asyncio
+async def test_describe_table_semantic_overrides_adapter_description(
+    contract: DataContract, semantic: YamlSource
+) -> None:
+    """Authored semantic-source descriptions win over adapter catalog comments."""
+
+    class CompetingAdapter(DuckDBAdapter):
+        def describe_table(self, schema: str, table: str) -> TableSchema:
+            return TableSchema(
+                columns=[
+                    Column(
+                        name="status",
+                        type="VARCHAR",
+                        description="catalog-side stale description",
+                    ),
+                ]
+            )
+
+    competing = CompetingAdapter(":memory:")
+    competing.connection.execute(
+        "CREATE SCHEMA analytics; CREATE TABLE analytics.orders (status VARCHAR);"
+    )
+    tools = create_tools(contract, adapter=competing, semantic_source=semantic)
+    tool = next(t for t in tools if t.name == "describe_table")
+    result = await tool.callable({"schema": "analytics", "table": "orders"})
+    payload = json.loads(result["content"][0]["text"])
+    cols_by_name = {c["name"]: c for c in payload["columns"]}
+    assert cols_by_name["status"]["description"] == (
+        "Order status: pending, completed, cancelled"
+    )
+
+
 @pytest.mark.asyncio
 async def test_run_query_valid(
     contract: DataContract, adapter: DuckDBAdapter, semantic: YamlSource
diff --git a/uv.lock b/uv.lock