feat(managed-databases): migrate to /v1/databases API and fix query catalog
Managed databases now use the dedicated `/v1/databases` endpoints instead of
the legacy `/v1/connections` API. Fixes managed table queries by correctly
using `"default"` as the SQL catalog (required by the server) while still
routing info-schema API calls through the underlying `connection_id`.
Changes:
- http.py: add DatabasesApi; migrate create/delete/list/get to /v1/databases;
add _IN_FLIGHT guard in execute_query so cancelled/timed-out runs raise
immediately instead of spinning until timeout; add database_id kwarg to
execute_query for X-Database-Id header support
- backend.py: rewrite _resolve_managed_connection to use get_database by id +
description fallback; fix _table_location to cache _database_id and
_database_connection_id on resolution; add _resolve_database_connection_id
helper; fix get_schema to use real connection_id when catalog == "default";
fix create_table to return "default" catalog for managed tables; thread
database_id through _safe_raw_sql, to_pyarrow, and _get_schema_using_query;
fix _infer_default_schema and _infer_default_connection to check cached
values before making API calls; fix create_database/drop_database to use
new databases API
- managed.py: remove MANAGED_SOURCE_TYPE and build_managed_config (replaced
by CreateDatabaseRequest)
- README.md: add Managed databases section with complete working example and
key points about the "default" catalog, table pre-declaration, and async sync
- tests: update to match new API shapes and function signatures
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
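
The in-flight guard described for `execute_query` can be pictured as a polling loop that keeps waiting only while the run is in a known in-flight state. What follows is a minimal sketch under stated assumptions, not the code in `http.py`: the status strings, the contents of `_IN_FLIGHT`, the `QueryRunError` class, and the `get_status` callable are all hypothetical and chosen for illustration.

```python
import time
from typing import Callable

# Assumed status names; the real server vocabulary may differ.
_IN_FLIGHT = frozenset({"queued", "running"})
_SUCCESS = "succeeded"


class QueryRunError(RuntimeError):
    """Hypothetical error for a run that ended in a non-success state."""


def wait_for_run(
    get_status: Callable[[], str],
    poll_interval_s: float = 0.5,
    poll_timeout_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until the run succeeds, ends badly, or times out."""
    deadline = time.monotonic() + poll_timeout_s
    while True:
        status = get_status()
        if status == _SUCCESS:
            return status
        if status not in _IN_FLIGHT:
            # Cancelled, timed out, failed, or unknown: stop polling at once
            # instead of waiting for the client-side deadline.
            raise QueryRunError(f"query run ended with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"query run still {status!r} after {poll_timeout_s}s"
            )
        sleep(poll_interval_s)
```

Treating every status outside `_IN_FLIGHT` as terminal means an unexpected server state surfaces as an error right away rather than as a slow timeout.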
README.md: +56 -2 (56 additions & 2 deletions)
@@ -52,6 +52,8 @@ con = ibis.connect(

 **Mapping:** Ibis **catalog** = Hotdata connection id; **database** = remote schema; **table** = table name. SQL references look like `connection.schema.table`. With a single connection and schema, defaults are inferred; otherwise set `default_connection` / `default_schema` or qualify `con.table(..., database=(conn_id, schema))`.

+> **Managed databases:** SQL and Ibis expressions against managed database tables use `"default"` as the catalog rather than the connection id. The backend resolves this automatically; see [Managed databases](#managed-databases) below.
+
 **Execution:** SQL is compiled with Ibis’s **Postgres** SQLGlot compiler. The client submits queries asynchronously with `POST /v1/query`, polls `GET /v1/query-runs/{id}`, then downloads ready results as Arrow IPC from `GET /v1/results/{id}`. Tuning: `poll_interval_s`, `poll_timeout_s` on `connect()`.

 **Types:** Typed tables come from Hotdata’s information schema. `con.sql(...)` types are inferred from a small preview query and Arrow schema; see [Hotdata SQL](https://www.hotdata.dev/docs/sql) for server behavior.
@@ -68,8 +70,8 @@ Supported today:
 - **SQL-backed expressions:** Ibis expressions compile with the Postgres SQLGlot compiler and execute through Hotdata. Common `SELECT` workloads such as projection, filtering, joins, grouping, aggregation, ordering, limits, scalar expressions, and `con.sql(...)` work when the generated SQL is accepted by Hotdata.
 - **Result materialization:** `.execute()` returns pandas objects. `.to_pyarrow()` and `.to_pyarrow_batches()` use the Arrow IPC result data exposed by Hotdata without converting through JSON rows; batches are split locally after the result is downloaded.
 - **Raw SQL escape hatch:** `con.sql("SELECT ...", dialect="postgres")` is the most reliable way to use Hotdata-specific federated table names or SQL that Ibis does not model directly.
-- **Managed database lifecycle:** `create_database("sales", schema="public", tables=["orders"])` registers a managed connection (Ibis catalog). `create_table("orders", pandas_df, database=("sales", "public"))` uploads Parquet and loads it with replace mode. Query as `sales.public.orders` in SQL. `drop_table` clears a managed table; `drop_database` deletes the connection.
-- **Parquet uploads:** `create_table` accepts pandas DataFrames, PyArrow tables, or schema-only empty tables. Tables must live in a managed connection; declare them with `create_database(..., tables=[...])` first. Loads always use replace mode; pass `overwrite=True` to replace an existing synced table (the default `overwrite=False` raises if the table already exists).
+- **Managed database lifecycle:** `create_database("sales", schema="public", tables=["orders"])` provisions a managed connection (Ibis catalog). `create_table("orders", pandas_df, database=("sales", "public"))` uploads Parquet and loads it. Query using `database=("default", "public")` or the `"default"."public"."orders"` SQL prefix. `drop_table` clears a managed table; `drop_database` deletes the connection. See [Managed databases](#managed-databases) for a complete example.
+- **Parquet uploads:** `create_table` accepts pandas DataFrames, PyArrow tables, or schema-only empty tables. Tables must live in a managed connection; declare them with `create_database(..., tables=[...])` first. Loads are asynchronous; poll `_managed_table_synced(conn_id, schema, table)` if you need to query immediately. Loads always use replace mode; pass `overwrite=True` to replace an existing synced table (the default `overwrite=False` raises if the table already exists).

 Not supported as full Ibis backend features:

@@ -81,6 +83,58 @@ Not supported as full Ibis backend features:
 - **Complete Ibis compliance:** The backend is experimental and has focused test coverage for connection, discovery, schema mapping, execution, uploads, and Arrow results. It has not yet been validated against the full Ibis backend test suite.
 - **Hotdata platform APIs beyond SQL and managed databases:** embeddings, indexes, query history management, sandbox lifecycle management, and other Hotdata-specific APIs are outside the Ibis backend surface.

+## Managed databases
+
+Managed databases are temporary, workspace-owned connections for uploading and querying your own data. Tables must be declared at creation time, loads are asynchronous, and SQL uses `"default"` as the catalog (not the raw connection id).
+
+```python
+import time
+import ibis
+import pandas as pd
+
+con = ibis.hotdata.connect(
+    api_url="https://api.hotdata.dev",
+    token="YOUR_API_TOKEN",
+    workspace_id="ws_…",
+)
+
+# 1. Create the managed database and declare tables upfront.
+#    Tables must be declared here: load_managed_table rejects undeclared names.
[...]
+# Use database=("default", schema): managed databases require "default" as the
+# SQL catalog; the backend resolves the underlying connection automatically.
+t = con.table("orders", database=("default", "public"))
+result = t.filter(t.amount > 10).order_by("amount").execute()
+
+# 6. Or with raw SQL (same "default" catalog prefix).
+result = con.sql('SELECT sum(amount) AS total FROM "default"."public"."orders"').execute()
+
+# 7. Clean up.
+con.drop_database("my-dataset")
+```
+
+**Key points:**
+- `create_database(..., tables=[...])`: table names must be listed here before uploading.
+- `create_table(..., database=(db_id, schema))`: pass the managed database id (from `_resolve_managed_connection`) as the first element of the tuple, not the connection id.
+- The SQL catalog is `"default"`, not the connection id; `"default"."schema"."table"` is the correct form.
+- After `create_table`, Ibis table references automatically use `database=("default", schema)`; use the same form for subsequent `con.table(...)` calls.
+- Loads are asynchronous. Poll `_managed_table_synced(conn_id, schema, table)` or add a small sleep before querying.
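
The last key points (the `"default"` catalog prefix and asynchronous loads) can be sketched with two small helpers. Both `managed_table_sql_name` and `wait_until` are hypothetical names introduced here for illustration and are not part of the backend. The `check` callable stands in for a call such as `_managed_table_synced(conn_id, schema, table)`, which is assumed (not confirmed by the diff) to return a truthy value once the table is queryable.

```python
import time
from typing import Callable


def managed_table_sql_name(schema: str, table: str) -> str:
    """Build a quoted three-part name that uses "default" as the catalog."""

    def quote(ident: str) -> str:
        # Double any embedded quotes, as in standard SQL identifier quoting.
        return '"' + ident.replace('"', '""') + '"'

    return ".".join(quote(part) for part in ("default", schema, table))


def wait_until(
    check: Callable[[], bool],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call check() until it is truthy; return False if timeout_s elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(interval_s)
```

Under those assumptions, a caller might wrap the sync check as `wait_until(lambda: con._managed_table_synced(conn_id, "public", "orders"))` and only run `con.sql(...)` against `managed_table_sql_name("public", "orders")` once it returns `True`. A bounded wait of this kind is usually preferable to a fixed sleep because it returns as soon as the load finishes and reports a timeout explicitly.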