Skip to content

feat(managed-databases): migrate to /v1/databases API and fix query catalog#11

Merged
eddietejeda merged 1 commit into
mainfrom
feat/managed-databases-v2
May 24, 2026
Merged

feat(managed-databases): migrate to /v1/databases API and fix query catalog#11
eddietejeda merged 1 commit into
mainfrom
feat/managed-databases-v2

Conversation

@eddietejeda

Copy link
Copy Markdown
Contributor

Summary

  • Migrate managed databases to /v1/databases API — replaces the legacy /v1/connections approach with dedicated DatabasesApi endpoints (create_database, delete_database, list_databases, get_database). CreateConnectionRequest / build_managed_config are gone; CreateDatabaseRequest with schema/table declarations is used instead.
  • Fix managed table queries — the Hotdata server requires "default"."schema"."table" as the SQL catalog for managed databases (not the raw connection_id). The backend now uses "default" in generated SQL while mapping it back to the real connection_id for information-schema API calls via the new _resolve_database_connection_id helper.
  • Fix unknown query run statuses — the async polling loop in execute_query now raises immediately on statuses like "cancelled" or "timed_out" instead of spinning until the 600 s timeout.
  • Pass database_id through query execution_safe_raw_sql, to_pyarrow, and _get_schema_using_query now forward self._database_id as the X-Database-Id header so managed-database context is preserved end-to-end.
  • README: new Managed databases section — complete working example covering create → upload → sync-wait → ibis expression query → raw SQL → drop, plus key points about the "default" catalog, table pre-declaration, and async load behaviour.

Test plan

  • Smoke-tested locally end-to-end: create database, upload table, wait for sync, ibis expression filter+order, con.sql() sum, to_pyarrow, drop database — all 8 steps pass
  • Unit tests updated to match new API shapes (test_hotdata_backend.py, test_hotdata_http.py)
  • Run uv run pytest to verify no regressions

🤖 Generated with Claude Code

…atalog

Managed databases now use the dedicated `/v1/databases` endpoints instead of
the legacy `/v1/connections` API. Fixes managed table queries by correctly
using `"default"` as the SQL catalog (required by the server) while still
routing info-schema API calls through the underlying `connection_id`.

Changes:
- http.py: add DatabasesApi; migrate create/delete/list/get to /v1/databases;
  add _IN_FLIGHT guard in execute_query so cancelled/timed-out runs raise
  immediately instead of spinning until timeout; add database_id kwarg to
  execute_query for X-Database-Id header support
- backend.py: rewrite _resolve_managed_connection to use get_database by id +
  description fallback; fix _table_location to cache _database_id and
  _database_connection_id on resolution; add _resolve_database_connection_id
  helper; fix get_schema to use real connection_id when catalog == "default";
  fix create_table to return "default" catalog for managed tables; thread
  database_id through _safe_raw_sql, to_pyarrow, and _get_schema_using_query;
  fix _infer_default_schema and _infer_default_connection to check cached
  values before making API calls; fix create_database/drop_database to use
  new databases API
- managed.py: remove MANAGED_SOURCE_TYPE and build_managed_config (replaced
  by CreateDatabaseRequest)
- README.md: add Managed databases section with complete working example and
  key points about the "default" catalog, table pre-declaration, and async sync
- tests: update to match new API shapes and function signatures

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Comment on lines +409 to +414
if self._database_connection_id is None:
try:
db = self._http.get_database(self._database_id)
self._database_connection_id = db.get("default_connection_id")
except HotdataAPIError:
pass

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: (not blocking) the bare except HotdataAPIError: pass swallows every API error — including 500s, auth failures, and network errors — and silently falls through to using "default" as the connection_id in subsequent info-schema calls, producing a confusing downstream error. Consider letting non-404s propagate so the original cause surfaces:

if self._database_connection_id is None:
    try:
        db = self._http.get_database(self._database_id)
    except HotdataAPIError as exc:
        if exc.status_code == 404:
            return None
        raise _ibis_err_from_hotdata(exc) from exc
    self._database_connection_id = db.get("default_connection_id")

Comment on lines +395 to +397
# when the SQL catalog is "default" (the prefix managed databases require).
self._database_id = self._database_id or db_record["id"]
self._database_connection_id = conn_id

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: (not blocking) asymmetric update — _database_id is "sticky" (only set when unset), but _database_connection_id is overwritten unconditionally. If a caller works with two managed databases through the same con (e.g. calls _table_location against db_A then db_B), _database_id keeps pointing at db_A while _database_connection_id points at db_B's connection. Subsequent execute_query calls then send X-Database-Id: <db_A_id> to the server while info-schema lookups go through db_B's connection_id — a hidden split-brain.

The README documents a one-managed-db-per-connection workflow, so this isn't hit on the happy path, but the two fields should move together. Either always update both, or always preserve both — whichever matches the intended invariant.

Comment thread src/ibis_hotdata/http.py
Comment on lines +173 to +174
if status not in _IN_FLIGHT:
raise HotdataAPIError(f"Unexpected query run status: {status!r}")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: (not blocking) the PR description calls out this new behavior (raise immediately on "cancelled" / "timed_out" instead of spinning until the 600 s timeout) as a deliberate bug fix, but there's no test covering it. A small unit test that mocks /v1/query-runs/{id} to return status: "cancelled" and asserts HotdataAPIError raises promptly would lock the behavior in and prevent a future regression back to the 600s-spin.

Comment on lines +430 to +434
api_conn = (
self._resolve_database_connection_id() or conn
if conn == "default"
else conn
)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super nit: (not blocking) the combined or/ternary parses correctly but is harder to read at a glance — the or and the if/else are operating at different levels. A plain branch would be clearer:

if conn == "default":
    api_conn = self._resolve_database_connection_id() or conn
else:
    api_conn = conn

@eddietejeda eddietejeda merged commit 5c3f63e into main May 24, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant