Use Ibis to create on-demand databases, upload data, and query with Python expressions — get pandas or Arrow results back without writing SQL.
Requirements: Python 3.10+, ibis-framework ≥10,<11, hotdata ≥0.2.3.
pip install hotdata-ibis
# or: uv pip install hotdata-ibisimport time
import pandas as pd
import ibis
con = ibis.hotdata.connect(
api_url="https://api.hotdata.dev",
token="YOUR_API_KEY",
workspace_id="ws_...",
)
# 1. Create a database and declare the tables you'll load
con.create_database("sales", tables=["orders"])
# 2. Upload a pandas DataFrame (or PyArrow table)
df = pd.DataFrame({
"order_id": [1, 2, 3],
"amount": [9.99, 49.99, 5.00],
"region": ["west", "east", "west"],
})
con.create_table("orders", df, database=("sales", "main"), overwrite=True)
# 3. Uploads are async — wait briefly before querying
time.sleep(2)
# 4. Query with Ibis expressions
# Managed tables are always accessed with catalog "default"
t = con.table("orders", database=("default", "main"))
result = (
t.group_by("region")
.agg(total=t.amount.sum())
.order_by(ibis.desc("total"))
.execute() # returns a pandas DataFrame
)
# 5. Clean up
con.drop_table("orders", database=("sales", "main"))
con.drop_database("sales")con = ibis.hotdata.connect(
api_url="https://api.hotdata.dev",
token="YOUR_API_KEY",
workspace_id="ws_...",
# optional
session_id=None, # sandbox id (X-Session-Id header)
timeout=120.0, # per-request HTTP timeout in seconds
verify_ssl=True, # False to skip TLS verification, or path to CA bundle
default_connection=None, # default catalog (connection id); auto-detected if only one exists
default_schema=None, # default schema; auto-detected if only one exists
database_id=None, # bind an existing managed database id at connect time
poll_interval_s=0.25, # polling interval for async queries
poll_timeout_s=600.0, # max time to wait for a query result
)URL-style also works, with the same parameters as query string keys:
con = ibis.connect(
"hotdata://api.hotdata.dev/"
"?token=...&workspace_id=ws_..."
"&default_connection=my_conn&default_schema=public"
)Managed databases are the primary way to bring data into Hotdata with Ibis. Declare a database and its tables, upload data, and query immediately.
# Declare the database and all table names up front
con.create_database("analytics", tables=["events", "users"])
# Upload from a pandas DataFrame
con.create_table("events", events_df, database=("analytics", "main"), overwrite=True)
# PyArrow tables also work
import pyarrow as pa
table = pa.table({"id": [1, 2], "name": ["alice", "bob"]})
con.create_table("users", table, database=("analytics", "main"), overwrite=True)
# Schema-only (no data): creates an empty table with the declared schema
import ibis.expr.schema as sch
con.create_table(
"staging",
schema=sch.Schema({"id": "int64", "ts": "timestamp"}),
database=("analytics", "main"),
)Table names must be declared when the database is created — you cannot upload to a table name that was not listed in tables=.
When querying, use "default" as the catalog:
t = con.table("events", database=("default", "main"))
result = (
t.filter(t.event_type == "click")
.group_by("user_id")
.agg(n=t.count())
.execute()
)Or with raw SQL:
result = con.sql(
'SELECT user_id, COUNT(*) AS n '
'FROM "default"."main"."events" '
'WHERE event_type = \'click\' '
'GROUP BY user_id'
).execute()Pass force=True to silently skip errors when the database or table does not exist:
con.drop_table("events", database=("analytics", "main"))
con.drop_table("events", database=("analytics", "main"), force=True) # no-op if missing
con.drop_database("analytics")
con.drop_database("analytics", force=True) # no-op if missing| Operation | database= argument |
|---|---|
create_table / drop_table |
("your-database-name", schema) |
con.table(...) when querying |
("default", schema) |
t = con.table("orders", database=("default", "main"))
summary = (
t.filter(t.amount > 10)
.group_by("region")
.agg(total=t.amount.sum(), n=t.count())
.order_by(ibis.desc("total"))
.execute()
).execute() returns a pandas DataFrame. .to_pyarrow() returns an Arrow table. .to_pyarrow_batches() returns a RecordBatchReader — note that Hotdata returns a single Arrow IPC payload per query, so this method downloads the full result first and then splits it into local batches.
base = con.sql(
'SELECT * FROM "default"."main"."orders"',
dialect="postgres",
)
result = base.filter(base.amount > 10).execute()You can chain Ibis expressions on the result of con.sql(...).
If you have existing databases or warehouses connected to your Hotdata workspace (Postgres, Snowflake, BigQuery, etc.), you can query them through the same Ibis connection:
con = ibis.hotdata.connect(
api_url="https://api.hotdata.dev",
token="YOUR_API_KEY",
workspace_id="ws_...",
default_connection="my_postgres",
default_schema="public",
)
t = con.table("orders") # resolves to my_postgres.public.ordersDiscover what's available:
con.list_catalogs() # connection IDs
con.list_databases(catalog="my_postgres") # schemas
con.list_tables(database=("my_postgres", "public")) # tables| Feature | Status |
|---|---|
create_database / drop_database (managed) |
✅ |
create_table from pandas / PyArrow / schema-only |
✅ |
drop_table |
✅ |
con.table(...) with full schema metadata |
✅ |
| Ibis expressions: filter, select, join, group_by, agg, order_by, limit | ✅ |
con.sql(...) raw SQL |
✅ |
.execute() → pandas, .to_pyarrow(), .to_pyarrow_batches() |
✅ |
list_catalogs, list_databases, list_tables |
✅ |
| Arrow / Parquet column types (timestamp, decimal, list, duration, …) | ✅ |
| Temporary tables | ❌ |
In-memory tables (ibis.memtable(...)) |
❌ |
| Python UDFs | ❌ |
| INSERT / UPDATE / DELETE on external connections | ❌ |
SQL compilation uses Ibis's Postgres dialect. Column types returned by Hotdata's information schema are resolved via PyArrow's type system, so Parquet-loaded tables with Arrow-native types (timestamps with time zones, decimals, lists, durations) are mapped correctly to Ibis types.
uv sync # installs dev group (pytest, ruff, httpx)
uv run pytest
uv run ruff check src testsCI: uv sync --locked && uv run pytest.
Set your credentials, then run any example script:
export HOTDATA_API_KEY=...
export HOTDATA_WORKSPACE=...
uv run python examples/01_catalog_introspection.py
uv run python examples/02_execute_sql.py 'SELECT COUNT(*) AS n FROM tpch.tpch_sf1.customer'
uv run python examples/03_connect_via_url.py
uv run python examples/04_ibis_table_workflows.py