feat(managed-databases): migrate to /v1/databases API and fix query catalog by eddietejeda · Pull Request #11 · hotdata-dev/hotdata-ibis

eddietejeda · 2026-05-23T18:41:20Z

Summary

Migrate managed databases to /v1/databases API — replaces the legacy /v1/connections approach with dedicated DatabasesApi endpoints (create_database, delete_database, list_databases, get_database). CreateConnectionRequest / build_managed_config are gone; CreateDatabaseRequest with schema/table declarations is used instead.
Fix managed table queries — the Hotdata server requires "default"."schema"."table" as the SQL catalog for managed databases (not the raw connection_id). The backend now uses "default" in generated SQL while mapping it back to the real connection_id for information-schema API calls via the new _resolve_database_connection_id helper.
Fix unknown query run statuses — the async polling loop in execute_query now raises immediately on statuses like "cancelled" or "timed_out" instead of spinning until the 600 s timeout.
Pass database_id through query execution — _safe_raw_sql, to_pyarrow, and _get_schema_using_query now forward self._database_id as the X-Database-Id header so managed-database context is preserved end-to-end.
README: new Managed databases section — complete working example covering create → upload → sync-wait → ibis expression query → raw SQL → drop, plus key points about the "default" catalog, table pre-declaration, and async load behaviour.

Test plan

Smoke-tested locally end-to-end: create database, upload table, wait for sync, ibis expression filter+order, con.sql() sum, to_pyarrow, drop database — all 8 steps pass
Unit tests updated to match new API shapes (test_hotdata_backend.py, test_hotdata_http.py)
Run uv run pytest to verify no regressions

🤖 Generated with Claude Code

…atalog Managed databases now use the dedicated `/v1/databases` endpoints instead of the legacy `/v1/connections` API. Fixes managed table queries by correctly using `"default"` as the SQL catalog (required by the server) while still routing info-schema API calls through the underlying `connection_id`. Changes: - http.py: add DatabasesApi; migrate create/delete/list/get to /v1/databases; add _IN_FLIGHT guard in execute_query so cancelled/timed-out runs raise immediately instead of spinning until timeout; add database_id kwarg to execute_query for X-Database-Id header support - backend.py: rewrite _resolve_managed_connection to use get_database by id + description fallback; fix _table_location to cache _database_id and _database_connection_id on resolution; add _resolve_database_connection_id helper; fix get_schema to use real connection_id when catalog == "default"; fix create_table to return "default" catalog for managed tables; thread database_id through _safe_raw_sql, to_pyarrow, and _get_schema_using_query; fix _infer_default_schema and _infer_default_connection to check cached values before making API calls; fix create_database/drop_database to use new databases API - managed.py: remove MANAGED_SOURCE_TYPE and build_managed_config (replaced by CreateDatabaseRequest) - README.md: add Managed databases section with complete working example and key points about the "default" catalog, table pre-declaration, and async sync - tests: update to match new API shapes and function signatures Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

claude · 2026-05-23T18:43:59Z

+        if self._database_connection_id is None:
+            try:
+                db = self._http.get_database(self._database_id)
+                self._database_connection_id = db.get("default_connection_id")
+            except HotdataAPIError:
+                pass


nit: (not blocking) the bare except HotdataAPIError: pass swallows every API error — including 500s, auth failures, and network errors — and silently falls through to using "default" as the connection_id in subsequent info-schema calls, producing a confusing downstream error. Consider letting non-404s propagate so the original cause surfaces:

if self._database_connection_id is None: try: db = self._http.get_database(self._database_id) except HotdataAPIError as exc: if exc.status_code == 404: return None raise _ibis_err_from_hotdata(exc) from exc self._database_connection_id = db.get("default_connection_id")

claude · 2026-05-23T18:44:07Z

+        # when the SQL catalog is "default" (the prefix managed databases require).
+        self._database_id = self._database_id or db_record["id"]
+        self._database_connection_id = conn_id


nit: (not blocking) asymmetric update — _database_id is "sticky" (only set when unset), but _database_connection_id is overwritten unconditionally. If a caller works with two managed databases through the same con (e.g. calls _table_location against db_A then db_B), _database_id keeps pointing at db_A while _database_connection_id points at db_B's connection. Subsequent execute_query calls then send X-Database-Id: <db_A_id> to the server while info-schema lookups go through db_B's connection_id — a hidden split-brain.

The README documents a one-managed-db-per-connection workflow, so this isn't hit on the happy path, but the two fields should move together. Either always update both, or always preserve both — whichever matches the intended invariant.

claude · 2026-05-23T18:44:13Z

+                if status not in _IN_FLIGHT:
+                    raise HotdataAPIError(f"Unexpected query run status: {status!r}")


nit: (not blocking) the PR description calls out this new behavior (raise immediately on "cancelled" / "timed_out" instead of spinning until the 600 s timeout) as a deliberate bug fix, but there's no test covering it. A small unit test that mocks /v1/query-runs/{id} to return status: "cancelled" and asserts HotdataAPIError raises promptly would lock the behavior in and prevent a future regression back to the 600s-spin.

claude · 2026-05-23T18:44:19Z

+        api_conn = (
+            self._resolve_database_connection_id() or conn
+            if conn == "default"
+            else conn
+        )


super nit: (not blocking) the combined or/ternary parses correctly but is harder to read at a glance — the or and the if/else are operating at different levels. A plain branch would be clearer:

if conn == "default": api_conn = self._resolve_database_connection_id() or conn else: api_conn = conn

claude Bot reviewed May 23, 2026

View reviewed changes

claude Bot approved these changes May 23, 2026

View reviewed changes

eddietejeda merged commit 5c3f63e into main May 24, 2026
7 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(managed-databases): migrate to /v1/databases API and fix query catalog#11

feat(managed-databases): migrate to /v1/databases API and fix query catalog#11
eddietejeda merged 1 commit into
mainfrom
feat/managed-databases-v2

eddietejeda commented May 23, 2026

Uh oh!

claude Bot May 23, 2026

Uh oh!

claude Bot May 23, 2026

Uh oh!

claude Bot May 23, 2026

Uh oh!

claude Bot May 23, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

		if status not in _IN_FLIGHT:
		raise HotdataAPIError(f"Unexpected query run status: {status!r}")

Conversation

eddietejeda commented May 23, 2026

Summary

Test plan

Uh oh!

claude Bot May 23, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot May 23, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot May 23, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot May 23, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant