feat(managed-databases): migrate to /v1/databases API and fix query catalog#11
Conversation
…atalog Managed databases now use the dedicated `/v1/databases` endpoints instead of the legacy `/v1/connections` API. Fixes managed table queries by correctly using `"default"` as the SQL catalog (required by the server) while still routing info-schema API calls through the underlying `connection_id`. Changes: - http.py: add DatabasesApi; migrate create/delete/list/get to /v1/databases; add _IN_FLIGHT guard in execute_query so cancelled/timed-out runs raise immediately instead of spinning until timeout; add database_id kwarg to execute_query for X-Database-Id header support - backend.py: rewrite _resolve_managed_connection to use get_database by id + description fallback; fix _table_location to cache _database_id and _database_connection_id on resolution; add _resolve_database_connection_id helper; fix get_schema to use real connection_id when catalog == "default"; fix create_table to return "default" catalog for managed tables; thread database_id through _safe_raw_sql, to_pyarrow, and _get_schema_using_query; fix _infer_default_schema and _infer_default_connection to check cached values before making API calls; fix create_database/drop_database to use new databases API - managed.py: remove MANAGED_SOURCE_TYPE and build_managed_config (replaced by CreateDatabaseRequest) - README.md: add Managed databases section with complete working example and key points about the "default" catalog, table pre-declaration, and async sync - tests: update to match new API shapes and function signatures Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
| if self._database_connection_id is None: | ||
| try: | ||
| db = self._http.get_database(self._database_id) | ||
| self._database_connection_id = db.get("default_connection_id") | ||
| except HotdataAPIError: | ||
| pass |
There was a problem hiding this comment.
nit: (not blocking) the bare except HotdataAPIError: pass swallows every API error — including 500s, auth failures, and network errors — and silently falls through to using "default" as the connection_id in subsequent info-schema calls, producing a confusing downstream error. Consider letting non-404s propagate so the original cause surfaces:
if self._database_connection_id is None:
try:
db = self._http.get_database(self._database_id)
except HotdataAPIError as exc:
if exc.status_code == 404:
return None
raise _ibis_err_from_hotdata(exc) from exc
self._database_connection_id = db.get("default_connection_id")| # when the SQL catalog is "default" (the prefix managed databases require). | ||
| self._database_id = self._database_id or db_record["id"] | ||
| self._database_connection_id = conn_id |
There was a problem hiding this comment.
nit: (not blocking) asymmetric update — _database_id is "sticky" (only set when unset), but _database_connection_id is overwritten unconditionally. If a caller works with two managed databases through the same con (e.g. calls _table_location against db_A then db_B), _database_id keeps pointing at db_A while _database_connection_id points at db_B's connection. Subsequent execute_query calls then send X-Database-Id: <db_A_id> to the server while info-schema lookups go through db_B's connection_id — a hidden split-brain.
The README documents a one-managed-db-per-connection workflow, so this isn't hit on the happy path, but the two fields should move together. Either always update both, or always preserve both — whichever matches the intended invariant.
| if status not in _IN_FLIGHT: | ||
| raise HotdataAPIError(f"Unexpected query run status: {status!r}") |
There was a problem hiding this comment.
nit: (not blocking) the PR description calls out this new behavior (raise immediately on "cancelled" / "timed_out" instead of spinning until the 600 s timeout) as a deliberate bug fix, but there's no test covering it. A small unit test that mocks /v1/query-runs/{id} to return status: "cancelled" and asserts HotdataAPIError raises promptly would lock the behavior in and prevent a future regression back to the 600s-spin.
| api_conn = ( | ||
| self._resolve_database_connection_id() or conn | ||
| if conn == "default" | ||
| else conn | ||
| ) |
There was a problem hiding this comment.
super nit: (not blocking) the combined or/ternary parses correctly but is harder to read at a glance — the or and the if/else are operating at different levels. A plain branch would be clearer:
if conn == "default":
api_conn = self._resolve_database_connection_id() or conn
else:
api_conn = conn
Summary
/v1/databasesAPI — replaces the legacy/v1/connectionsapproach with dedicatedDatabasesApiendpoints (create_database,delete_database,list_databases,get_database).CreateConnectionRequest/build_managed_configare gone;CreateDatabaseRequestwith schema/table declarations is used instead."default"."schema"."table"as the SQL catalog for managed databases (not the rawconnection_id). The backend now uses"default"in generated SQL while mapping it back to the realconnection_idfor information-schema API calls via the new_resolve_database_connection_idhelper.execute_querynow raises immediately on statuses like"cancelled"or"timed_out"instead of spinning until the 600 s timeout.database_idthrough query execution —_safe_raw_sql,to_pyarrow, and_get_schema_using_querynow forwardself._database_idas theX-Database-Idheader so managed-database context is preserved end-to-end."default"catalog, table pre-declaration, and async load behaviour.Test plan
con.sql()sum,to_pyarrow, drop database — all 8 steps passtest_hotdata_backend.py,test_hotdata_http.py)uv run pytestto verify no regressions🤖 Generated with Claude Code