
Commit 9b82c58

flyersworder and claude committed
release: v0.21.0 (describe_table emits column descriptions)
Closes the day-one gap where describe_table dropped Column.description on the way out, even when populated by the adapter or available in the semantic source. It now overlays descriptions with semantic-source-wins precedence, falling back to the adapter's Column.description and omitting the field when both are empty.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 5ce86ae commit 9b82c58

7 files changed

Lines changed: 528 additions & 337 deletions


.pre-commit-config.yaml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 repos:
   - repo: https://github.com/astral-sh/ruff-pre-commit
-    rev: v0.15.10
+    rev: v0.15.13
     hooks:
       - id: ruff-check
         args: [--fix]

CHANGELOG.md

Lines changed: 22 additions & 0 deletions
@@ -2,6 +2,28 @@

 All notable changes to this project will be documented in this file.

+## [0.21.0] - 2026-05-17
+
+### Fixed
+
+- **`describe_table` now emits column descriptions to the agent.** Since the tool factory's first commit (`0296613`), the tool serialised columns as `{name, type, nullable}` only — `Column.description` was silently dropped on the way out, even when populated by the adapter (e.g., a Denodo deployment carrying authored catalog comments) or available in the contract's semantic source. This is the single largest *context* improvement a data-contract library can make: per the [Datacult "boring work" benchmark](https://www.datacult.com/post/the-boring-work-that-makes-ai-analytics-actually-work-why-winning-with-ai-in-analytics-is-an-investment-in-a-rich-data-context-not-better-llm-models), adding column descriptions moved an agent's SQL accuracy from 0% to 15% and SQL generation from 38.5% to 100% — the largest jump in their six-layer experiment. The fix overlays descriptions onto the tool response with this precedence: (1) semantic source via `SemanticSource.get_table_schema(schema, table)`, which is the canonical agent-facing authority; (2) `Column.description` from the adapter, which captures warehouse catalog comments; (3) field omitted entirely when both are empty, keeping responses tight.
+- **The `SemanticSource.get_table_schema` protocol method is no longer dead code from the tool layer's perspective.** All three built-in semantic sources (`YamlSource`, `DbtSource`, `CubeSource`) already populated `TableSchema.columns[*].description` from their respective inputs; the tool just never consulted them. Now it does.
+
+### Added
+
+- 3 new tests in `tests/test_tools/test_factory.py` covering the merge behaviour: `test_describe_table_includes_semantic_descriptions` (semantic-source descriptions reach the agent), `test_describe_table_falls_back_to_adapter_description` (adapter-supplied descriptions surface when the semantic source has no entry, and the field is omitted when both are empty), and `test_describe_table_semantic_overrides_adapter_description` (semantic source wins when both have descriptions for the same column).
+
+### Compatibility
+
+- **Backward-compatible response shape.** The new `description` field is *additive only* — consumers that ignore unknown keys see no behaviour change. The field is omitted (not set to `""`) when no description exists, so JSON payload size is unchanged for description-less columns.
+- **No new failure modes.** The merge guards `semantic_source is None`, `get_table_schema(...)` returning `None`, columns appearing in one source but not the other, and empty-string descriptions. A column described in the semantic source but absent from the warehouse is silently dropped — the adapter's column list is the source of truth for *which* columns exist; the semantic source only adorns them.
+- **No new dependencies.** The fix uses interfaces that already existed in the codebase.
+
+### Internal
+
+- `uv lock --upgrade` refreshed transitive dependencies (notable bumps: `sqlglot 30.6.0 → 30.8.0`, `langchain 1.2.17 → 1.3.0`, `langgraph 1.1.10 → 1.2.0`, `pydantic 2.13.3 → 2.13.4`, `cryptography 47.0.0 → 48.0.0`). Full 602-test suite + ruff + ty all green against the new versions.
+- `.pre-commit-config.yaml`: `ruff-pre-commit` rev bumped to `v0.15.13` to match the lockfile-pinned `ruff` binary, preventing the silent local-vs-hook drift where the same file passes `uv run ruff` but a stale hook env flags it.
+
 ## [0.20.0] - 2026-05-10

 ### Changed

docs/architecture.md

Lines changed: 1 addition & 1 deletion
@@ -344,7 +344,7 @@ Two modes: tool factory for quick starts, middleware for BYO tools.

 ### 9 Tools

-1. **`describe_table(schema, table)`** — Column details from the database adapter
+1. **`describe_table(schema, table)`** — Column details, merging the database adapter's catalog view with authored descriptions from the semantic source (semantic wins; adapter fills gaps)
 2. **`preview_table(schema, table, limit?)`** — Sample rows
 3. **`list_metrics(domain?, tier?, indicator_kind?)`** — Browse metrics with filters
 4. **`lookup_metric(metric_name)`** — Full metric definition with SQL and impact edges

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [project]
 name = "agentic-data-contracts"
-version = "0.20.0"
+version = "0.21.0"
 description = "YAML-first, domain-driven data governance for AI agents"
 readme = "README.md"
 requires-python = ">=3.12"

src/agentic_data_contracts/tools/factory.py

Lines changed: 23 additions & 3 deletions
@@ -244,9 +244,29 @@ async def describe_table(args: dict[str, Any]) -> dict[str, Any]:
                 f" for {qualified}."
             )
         ts = adapter.describe_table(schema_name, table_name)
-        cols = [
-            {"name": c.name, "type": c.type, "nullable": c.nullable} for c in ts.columns
-        ]
+        # Overlay authored descriptions from the semantic source onto adapter
+        # output. Semantic source wins because it is the canonical agent-facing
+        # documentation; adapter-supplied descriptions (e.g. warehouse column
+        # comments) fill in where the semantic source has no entry. Columns
+        # with no description anywhere omit the field to keep responses tight.
+        sem_descs: dict[str, str] = {}
+        if semantic_source is not None:
+            sem_ts = semantic_source.get_table_schema(schema_name, table_name)
+            if sem_ts is not None:
+                sem_descs = {
+                    c.name: c.description for c in sem_ts.columns if c.description
+                }
+        cols: list[dict[str, Any]] = []
+        for c in ts.columns:
+            col: dict[str, Any] = {
+                "name": c.name,
+                "type": c.type,
+                "nullable": c.nullable,
+            }
+            desc = sem_descs.get(c.name) or c.description
+            if desc:
+                col["description"] = desc
+            cols.append(col)
         return _text_response(
             json.dumps({"schema": schema_name, "table": table_name, "columns": cols})
         )

tests/test_tools/test_factory.py

Lines changed: 93 additions & 0 deletions
@@ -3,6 +3,7 @@

 import pytest

+from agentic_data_contracts.adapters.base import Column, TableSchema
 from agentic_data_contracts.adapters.duckdb import DuckDBAdapter
 from agentic_data_contracts.core.contract import DataContract
 from agentic_data_contracts.semantic.yaml_source import YamlSource
@@ -94,6 +95,98 @@ async def test_describe_table_without_adapter(
     assert "unavailable" in text.lower() or "no database" in text.lower()


+@pytest.mark.asyncio
+async def test_describe_table_includes_semantic_descriptions(
+    contract: DataContract, adapter: DuckDBAdapter, semantic: YamlSource
+) -> None:
+    """Column descriptions from the semantic source must reach the agent."""
+    tools = create_tools(contract, adapter=adapter, semantic_source=semantic)
+    tool = next(t for t in tools if t.name == "describe_table")
+    result = await tool.callable({"schema": "analytics", "table": "orders"})
+    payload = json.loads(result["content"][0]["text"])
+    cols_by_name = {c["name"]: c for c in payload["columns"]}
+    assert cols_by_name["amount"]["description"] == "Order total in USD"
+    assert cols_by_name["tenant_id"]["description"] == (
+        "Tenant identifier for multi-tenancy"
+    )
+
+
+@pytest.mark.asyncio
+async def test_describe_table_falls_back_to_adapter_description(
+    contract: DataContract, semantic: YamlSource
+) -> None:
+    """When semantic source has no entry, adapter-supplied descriptions surface.
+
+    Mirrors deployments (e.g. Denodo) where the warehouse catalog already
+    carries authored column comments and the adapter populates Column.description.
+    """
+
+    class DescriptionAwareAdapter(DuckDBAdapter):
+        def describe_table(self, schema: str, table: str) -> TableSchema:
+            if (schema, table) == ("analytics", "subscriptions"):
+                return TableSchema(
+                    columns=[
+                        Column(name="id", type="INTEGER", description="Plan FK"),
+                        Column(
+                            name="plan",
+                            type="VARCHAR",
+                            description="Subscription tier from billing system",
+                        ),
+                        Column(name="tenant_id", type="VARCHAR"),
+                    ]
+                )
+            return super().describe_table(schema, table)
+
+    desc_adapter = DescriptionAwareAdapter(":memory:")
+    desc_adapter.connection.execute(
+        "CREATE SCHEMA analytics;"
+        "CREATE TABLE analytics.subscriptions ("
+        "id INTEGER, plan VARCHAR, tenant_id VARCHAR);"
+    )
+    tools = create_tools(contract, adapter=desc_adapter, semantic_source=semantic)
+    tool = next(t for t in tools if t.name == "describe_table")
+    result = await tool.callable({"schema": "analytics", "table": "subscriptions"})
+    payload = json.loads(result["content"][0]["text"])
+    cols_by_name = {c["name"]: c for c in payload["columns"]}
+    assert (
+        cols_by_name["plan"]["description"] == "Subscription tier from billing system"
+    )
+    # No description anywhere → field omitted to keep responses tight.
+    assert "description" not in cols_by_name["tenant_id"]
+
+
+@pytest.mark.asyncio
+async def test_describe_table_semantic_overrides_adapter_description(
+    contract: DataContract, semantic: YamlSource
+) -> None:
+    """Authored semantic-source descriptions win over adapter catalog comments."""
+
+    class CompetingAdapter(DuckDBAdapter):
+        def describe_table(self, schema: str, table: str) -> TableSchema:
+            return TableSchema(
+                columns=[
+                    Column(
+                        name="status",
+                        type="VARCHAR",
+                        description="catalog-side stale description",
+                    ),
+                ]
+            )
+
+    competing = CompetingAdapter(":memory:")
+    competing.connection.execute(
+        "CREATE SCHEMA analytics; CREATE TABLE analytics.orders (status VARCHAR);"
+    )
+    tools = create_tools(contract, adapter=competing, semantic_source=semantic)
+    tool = next(t for t in tools if t.name == "describe_table")
+    result = await tool.callable({"schema": "analytics", "table": "orders"})
+    payload = json.loads(result["content"][0]["text"])
+    cols_by_name = {c["name"]: c for c in payload["columns"]}
+    assert cols_by_name["status"]["description"] == (
+        "Order status: pending, completed, cancelled"
+    )
+
+
 @pytest.mark.asyncio
 async def test_run_query_valid(
     contract: DataContract, adapter: DuckDBAdapter, semantic: YamlSource
