hotdata-dev
diff --git a/README.md b/README.md
Lines changed: 56 additions & 2 deletions
diff --git a/pyproject.toml b/pyproject.toml
Lines changed: 1 addition & 1 deletion
diff --git a/src/ibis_hotdata/backend.py b/src/ibis_hotdata/backend.py
Lines changed: 73 additions & 29 deletions
@@ -52,6 +52,8 @@ con = ibis.connect(
 
 **Mapping:** Ibis **catalog** = Hotdata connection id; **database** = remote schema; **table** = table name. SQL references look like `connection.schema.table`. With a single connection and schema, defaults are inferred; otherwise set `default_connection` / `default_schema` or qualify `con.table(..., database=(conn_id, schema))`.
 
+> **Managed databases:** SQL and Ibis expressions against managed database tables use `"default"` as the catalog rather than the connection id. The backend resolves this automatically — see [Managed databases](#managed-databases) below.
+
 **Execution:** SQL is compiled with Ibis’s **Postgres** SQLGlot compiler. The client submits queries asynchronously with `POST /v1/query`, polls `GET /v1/query-runs/{id}`, then downloads ready results as Arrow IPC from `GET /v1/results/{id}`. Tuning: `poll_interval_s`, `poll_timeout_s` on `connect()`.
 
 **Types:** Typed tables come from Hotdata’s information schema. `con.sql(...)` types are inferred from a small preview query and Arrow schema; see [Hotdata SQL](https://www.hotdata.dev/docs/sql) for server behavior.
@@ -68,8 +70,8 @@ Supported today:
 - **SQL-backed expressions:** Ibis expressions compile with the Postgres SQLGlot compiler and execute through Hotdata. Common `SELECT` workloads such as projection, filtering, joins, grouping, aggregation, ordering, limits, scalar expressions, and `con.sql(...)` work when the generated SQL is accepted by Hotdata.
 - **Result materialization:** `.execute()` returns pandas objects. `.to_pyarrow()` and `.to_pyarrow_batches()` use the Arrow IPC result data exposed by Hotdata without converting through JSON rows; batches are split locally after the result is downloaded.
 - **Raw SQL escape hatch:** `con.sql("SELECT ...", dialect="postgres")` is the most reliable way to use Hotdata-specific federated table names or SQL that Ibis does not model directly.
-- **Managed database lifecycle:** `create_database("sales", schema="public", tables=["orders"])` registers a managed connection (Ibis catalog). `create_table("orders", pandas_df, database=("sales", "public"))` uploads Parquet and loads it with replace mode. Query as `sales.public.orders` in SQL. `drop_table` clears a managed table; `drop_database` deletes the connection.
-- **Parquet uploads:** `create_table` accepts pandas DataFrames, PyArrow tables, or schema-only empty tables. Tables must live in a managed connection — declare them with `create_database(..., tables=[...])` first. Loads always use replace mode; pass `overwrite=True` to replace an existing synced table (the default `overwrite=False` raises if the table already exists).
+- **Managed database lifecycle:** `create_database("sales", schema="public", tables=["orders"])` provisions a managed connection (Ibis catalog). `create_table("orders", pandas_df, database=("sales", "public"))` uploads Parquet and loads it. Query with `con.table("orders", database=("default", "public"))` or the fully qualified SQL name `"default"."public"."orders"`. `drop_table` clears a managed table; `drop_database` deletes the managed database. See [Managed databases](#managed-databases) for a complete example.
+- **Parquet uploads:** `create_table` accepts pandas DataFrames, PyArrow tables, or schema-only empty tables. Tables must live in a managed connection — declare them with `create_database(..., tables=[...])` first. Loads are asynchronous; poll `_managed_table_synced(conn_id, schema, table)` if you need to query immediately. Loads always use replace mode; pass `overwrite=True` to replace an existing synced table (the default `overwrite=False` raises if the table already exists).
 
 Not supported as full Ibis backend features:
 
@@ -81,6 +83,58 @@ Not supported as full Ibis backend features:
 - **Complete Ibis compliance:** The backend is experimental and has focused test coverage for connection, discovery, schema mapping, execution, uploads, and Arrow results. It has not yet been validated against the full Ibis backend test suite.
 - **Hotdata platform APIs beyond SQL and managed databases:** embeddings, indexes, query history management, sandbox lifecycle management, and other Hotdata-specific APIs are outside the Ibis backend surface.
 
+## Managed databases
+
+Managed databases are temporary, workspace-owned connections for uploading and querying your own data. Tables must be declared at creation time, loads are asynchronous, and SQL uses `"default"` as the catalog (not the raw connection id).
+
+```python
+import time
+import ibis
+import pandas as pd
+
+con = ibis.hotdata.connect(
+    api_url="https://api.hotdata.dev",
+    token="YOUR_API_TOKEN",
+    workspace_id="ws_…",
+)
+
+# 1. Create the managed database and declare tables upfront.
+#    Tables must be declared here — load_managed_table rejects undeclared names.
+con.create_database("my-dataset", schema="public", tables=["orders"])
+
+# 2. Resolve the database id + underlying connection id.
+db = con._resolve_managed_connection("my-dataset")
+db_id   = db["id"]                      # "dbid…"
+conn_id = db["default_connection_id"]   # "conn…"
+
+# 3. Upload data (pandas DataFrame or PyArrow table).
+df = pd.DataFrame({"order_id": [1, 2, 3], "amount": [9.99, 49.99, 5.00]})
+con.create_table("orders", df, database=(db_id, "public"), overwrite=True)
+
+# 4. Loads are async — wait for the table to sync before querying.
+while not con._managed_table_synced(conn_id, "public", "orders"):
+    time.sleep(1)
+
+# 5. Query with Ibis expressions.
+#    Use database=("default", schema) — managed databases require "default" as the
+#    SQL catalog; the backend resolves the underlying connection automatically.
+t = con.table("orders", database=("default", "public"))
+result = t.filter(t.amount > 10).order_by("amount").execute()
+
+# 6. Or with raw SQL (same "default" catalog prefix).
+result = con.sql('SELECT sum(amount) AS total FROM "default"."public"."orders"').execute()
+
+# 7. Clean up.
+con.drop_database("my-dataset")
+```
+
+**Key points:**
+- `create_database(..., tables=[...])` — table names must be listed here before uploading.
+- `create_table(..., database=(db_id, schema))` — pass the managed database id (from `_resolve_managed_connection`) as the first element of the tuple, not the connection id.
+- SQL catalog is `"default"`, not the connection id — `"default"."schema"."table"` is the correct form.
+- `create_table` returns a table expression bound to `database=("default", schema)`; use the same form for subsequent `con.table(...)` calls.
+- Loads are asynchronous. Poll `_managed_table_synced(conn_id, schema, table)` or add a small sleep before querying.
+
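The polling loop in step 4 of the README example never gives up if a load stalls. A bounded wait could look like the sketch below. `wait_until` is a hypothetical helper, not part of `ibis_hotdata`, and the default timeout and interval are arbitrary assumptions.

```python
import time
from collections.abc import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
) -> None:
    """Call ``predicate`` until it returns True or ``timeout_s`` elapses.

    Hypothetical helper for illustration; raises TimeoutError on expiry.
    """
    deadline = time.monotonic() + timeout_s
    while not predicate():
        # Check the deadline only after a failed attempt, so the predicate
        # always runs at least once.
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout_s} s")
        time.sleep(interval_s)
```

With the names from the README example, step 4 might then read `wait_until(lambda: con._managed_table_synced(conn_id, "public", "orders"))`.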
 ## Development
 
 ```bash
 
@@ -24,7 +24,7 @@ classifiers = [
 ]
 dependencies = [
   "ibis-framework>=10.0,<11",
-  "hotdata>=0.2.0",
+  "hotdata>=0.2.3",
   "pyarrow>=15",
   "pyarrow-hotfix>=0.6",
   "pandas>=2",
 
@@ -43,7 +43,7 @@
 from ibis.backends.sql import SQLBackend
 
 from ibis_hotdata.http import HotdataAPIError, HotdataClient
-from ibis_hotdata.managed import DEFAULT_SCHEMA, MANAGED_SOURCE_TYPE
+from ibis_hotdata.managed import DEFAULT_SCHEMA
 from ibis_hotdata.types import dtype_from_hotdata_sql_type
 
 _INFORMATION_SCHEMA_PAGE_SIZE = 500
@@ -144,6 +144,7 @@ def do_connect(
         verify_ssl: bool | str = True,
         default_connection: str | None = None,
         default_schema: str | None = None,
+        database_id: str | None = None,
         poll_interval_s: float = 0.25,
         poll_timeout_s: float = 600.0,
     ) -> None:
@@ -181,6 +182,9 @@ def do_connect(
         self.disconnect()
         self._default_connection = default_connection
         self._default_schema = default_schema
+        self._database_id = database_id
+        # Resolved lazily: the actual connection_id behind _database_id (for info schema API).
+        self._database_connection_id: str | None = None
         self._poll_interval_s = poll_interval_s
         self._poll_timeout_s = poll_timeout_s
 
@@ -200,9 +204,9 @@ def disconnect(self) -> None:
     # --- hierarchy ---------------------------------------------------------
 
     def _infer_default_connection(self) -> str:
-        ids = self._connection_ids()
         if self._default_connection is not None:
             return self._default_connection
+        ids = self._connection_ids()
         if len(ids) == 1:
             self._default_connection = ids[0]
             return self._default_connection
@@ -212,6 +216,8 @@ def _infer_default_connection(self) -> str:
         )
 
     def _infer_default_schema(self, connection_id: str) -> str:
+        if self._default_schema is not None:
+            return self._default_schema
         schemas = sorted(
             {
                 row["schema"]
@@ -220,12 +226,6 @@ def _infer_default_schema(self, connection_id: str) -> str:
                 )
             }
         )
-        if self._default_schema is not None:
-            if self._default_schema not in schemas:
-                raise com.IbisInputError(
-                    f"Unknown schema {self._default_schema!r} for connection {connection_id!r}"
-                )
-            return self._default_schema
         if len(schemas) == 1:
             self._default_schema = schemas[0]
             return self._default_schema
@@ -331,13 +331,22 @@ def _resolve_connection(self, name_or_id: str) -> dict[str, Any]:
         raise com.IbisError(f"Unknown Hotdata connection {name_or_id!r}")
 
     def _resolve_managed_connection(self, name_or_id: str) -> dict[str, Any]:
-        conn = self._resolve_connection(name_or_id)
-        if conn.get("source_type") != MANAGED_SOURCE_TYPE:
-            raise com.IbisInputError(
-                f"{name_or_id!r} is not a managed database "
-                f"(source_type={conn.get('source_type')!r})"
-            )
-        return conn
+        """Resolve a managed database by id or description, returning its detail dict."""
+        # Try direct ID lookup first
+        try:
+            return self._http.get_database(name_or_id)
+        except HotdataAPIError as exc:
+            if exc.status_code != 404:
+                raise _ibis_err_from_hotdata(exc) from exc
+        # Fall back to description scan
+        data = self._http.list_databases()
+        for db in data.get("databases", []):
+            if db.get("description") == name_or_id:
+                try:
+                    return self._http.get_database(db["id"])
+                except HotdataAPIError as exc:
+                    raise _ibis_err_from_hotdata(exc) from exc
+        raise com.IbisError(f"Unknown managed database {name_or_id!r}")
 
     def _managed_table_synced(
         self,
@@ -380,8 +389,30 @@ def _table_location(
                 raise com.IbisInputError(
                     "create_table with database=schema requires default_connection or current catalog"
                 )
-        conn_record = self._resolve_managed_connection(conn)
-        return conn_record["id"], schema
+        db_record = self._resolve_managed_connection(conn)
+        conn_id = db_record["default_connection_id"]
+        # Keep the cached mapping in sync so get_schema can use the real connection_id
+        # when the SQL catalog is "default" (the prefix managed databases require).
+        self._database_id = self._database_id or db_record["id"]
+        self._database_connection_id = conn_id
+        return conn_id, schema
+
+    def _resolve_database_connection_id(self) -> str | None:
+        """Return the actual connection_id for the current managed database context.
+
+        Managed database SQL uses ``"default"`` as the catalog, but the information
+        schema REST API still needs the real ``connection_id``.  This method resolves
+        that mapping lazily and caches the result.
+        """
+        if self._database_id is None:
+            return None
+        if self._database_connection_id is None:
+            try:
+                db = self._http.get_database(self._database_id)
+                self._database_connection_id = db.get("default_connection_id")
+            except HotdataAPIError:
+                pass
+        return self._database_connection_id
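The catalog-to-connection mapping described in the docstring above can be written as a pure function, which makes the fallback behaviour easier to see. This is an illustrative sketch only: `api_connection_id` is a hypothetical name, and the fallback to the literal catalog when nothing has been resolved mirrors the conditional used in `get_schema` in this diff.

```python
from __future__ import annotations


def api_connection_id(catalog: str, resolved_connection_id: str | None) -> str:
    """Map a SQL catalog to the connection id used for information-schema calls.

    Hypothetical helper: only the "default" catalog is remapped; any other
    catalog is already a real connection id and passes through unchanged.
    """
    if catalog != "default":
        return catalog
    # No managed database resolved yet: fall back to the catalog itself.
    return resolved_connection_id or catalog
```

Writing the two branches out separately avoids relying on how `a or b if cond else c` is parsed.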
 
     # --- schema / sql execution --------------------------------------------
 
@@ -394,9 +425,16 @@ def get_schema(
     ) -> sch.Schema:
         conn = catalog or self.current_catalog
         schema_name = database or self.current_database
+        # Managed database tables use "default" as the SQL catalog but the info
+        # schema REST API needs the real connection_id.
+        api_conn = (
+            (self._resolve_database_connection_id() or conn)
+            if conn == "default"
+            else conn
+        )
         matches: list[dict[str, Any]] = []
         for row in self._iterate_information_schema(
-            {"connection_id": conn, "schema": schema_name, "table": table_name},
+            {"connection_id": api_conn, "schema": schema_name, "table": table_name},
             include_columns=True,
         ):
             if row["table"] == table_name and row["schema"] == schema_name:
@@ -420,6 +458,7 @@ def _get_schema_using_query(self, query: str) -> sch.Schema:
         try:
             data = self._http.execute_query(
                 preview_sql,
+                database_id=self._database_id,
                 poll_interval_s=self._poll_interval_s,
                 poll_timeout_s=self._poll_timeout_s,
             )
@@ -441,6 +480,7 @@ def _safe_raw_sql(
         try:
             payload = self._http.execute_query(
                 query,
+                database_id=self._database_id,
                 poll_interval_s=self._poll_interval_s,
                 poll_timeout_s=self._poll_timeout_s,
             )
@@ -470,6 +510,7 @@ def to_pyarrow(
         try:
             payload = self._http.execute_query(
                 sql,
+                database_id=self._database_id,
                 poll_interval_s=self._poll_interval_s,
                 poll_timeout_s=self._poll_timeout_s,
             )
@@ -525,21 +566,18 @@ def create_database(
             raise com.UnsupportedOperationError(
                 "Hotdata create_database creates a managed connection (catalog); catalog= is not supported"
             )
+        # Check if a database with this description already exists
+        existing = None
         try:
-            existing = self._resolve_connection(name)
+            existing = self._resolve_managed_connection(name)
         except com.IbisError:
-            existing = None
+            pass
         if existing is not None:
             if not force:
                 raise com.IbisInputError(f"Managed database {name!r} already exists")
-            if existing.get("source_type") != MANAGED_SOURCE_TYPE:
-                raise com.IbisInputError(
-                    f"{name!r} is not a managed database "
-                    f"(source_type={existing.get('source_type')!r})"
-                )
             return
         try:
-            self._http.create_managed_database(name, schema=schema, tables=list(tables or ()))
+            self._http.create_managed_database(description=name, schema=schema, tables=list(tables or ()))
         except HotdataAPIError as exc:
             raise _ibis_err_from_hotdata(exc) from exc
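`create_database` now checks for an existing database through `_resolve_managed_connection`, which tries an id lookup first and then scans descriptions. The lookup order can be sketched against an in-memory list. `resolve_database` and the record layout are assumptions for illustration; the real method calls the Hotdata HTTP client instead.

```python
from typing import Any


def resolve_database(
    name_or_id: str, databases: list[dict[str, Any]]
) -> dict[str, Any]:
    """Return the record whose id, or else whose description, equals ``name_or_id``."""
    # An exact id match takes priority over any description match.
    for db in databases:
        if db.get("id") == name_or_id:
            return db
    # Nothing here enforces unique descriptions; the first match wins.
    for db in databases:
        if db.get("description") == name_or_id:
            return db
    raise LookupError(f"Unknown managed database {name_or_id!r}")
```

One consequence of this order is that a description which happens to equal another database's id is never reached by name.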
 
@@ -557,15 +595,15 @@ def drop_database(
                 "Hotdata drop_database deletes a managed connection (catalog); catalog= is not supported"
             )
         try:
-            conn = self._resolve_managed_connection(name)
+            db = self._resolve_managed_connection(name)
         except com.IbisInputError:
             raise
         except com.IbisError:
             if force:
                 return
             raise
         try:
-            self._http.delete_connection(conn["id"])
+            self._http.delete_database(db["id"])
         except HotdataAPIError as exc:
             if force and exc.status_code == 404:
                 return
@@ -622,6 +660,9 @@ def create_table(
 
         data = self._local_table_to_parquet(obj, schema)
         connection_id, schema_name = self._table_location(database)
+        # Cache the resolved connection_id so get_schema can use it for info schema
+        # API calls when the "default" catalog is used in managed database contexts.
+        self._database_connection_id = connection_id
         if not overwrite and self._managed_table_synced(connection_id, schema_name, name):
             raise com.IbisInputError(
                 f"Table {name!r} already exists; pass overwrite=True to replace"
@@ -636,7 +677,10 @@ def create_table(
             )
         except HotdataAPIError as exc:
             raise _ibis_err_from_hotdata(exc) from exc
-        return self.table(name, database=(connection_id, schema_name))
+        # Managed database SQL requires "default" as the catalog prefix, not the
+        # raw connection_id.  _table_location always sets _database_id when resolving
+        # a managed connection, so we can always use the "default" catalog here.
+        return self.table(name, database=("default", schema_name))
 
     def drop_table(
         self,