Commit 185fe7a

committed
docs: rewrite README to lead with managed databases
Focus on create → upload → query → drop workflow as the primary pattern. Move connections/external sources to a secondary section. Add addressing summary table (create_table vs query catalog conventions).
1 parent baa9414 commit 185fe7a

1 file changed

Lines changed: 128 additions & 93 deletions

README.md

@@ -1,168 +1,205 @@
11
# hotdata-ibis
22

3-
Use [Ibis](https://ibis-project.org/) to query and upload data in your [Hotdata](https://www.hotdata.dev/docs/api-reference) workspace — write Python expressions instead of SQL, get pandas or Arrow results back.
3+
Use [Ibis](https://ibis-project.org/) to create on-demand databases, upload data, and query with Python expressions. You get pandas or Arrow results back without writing SQL.
44

55
**Requirements:** Python 3.10+, **ibis-framework** 10.x, **hotdata** ≥0.2.3.
66

77
## Install
88

99
```bash
10-
uv pip install hotdata-ibis
11-
# or: pip install hotdata-ibis
10+
pip install hotdata-ibis
11+
# or: uv pip install hotdata-ibis
1212
```
1313

14-
## Quick start
14+
## Quickstart: create a database and query it
1515

1616
```python
17+
import time
18+
import pandas as pd
1719
import ibis
1820

1921
con = ibis.hotdata.connect(
2022
api_url="https://api.hotdata.dev",
21-
token="YOUR_API_TOKEN",
22-
workspace_id="ws_",
23+
token="YOUR_API_KEY",
24+
workspace_id="ws_...",
2325
)
2426

25-
# List available tables
26-
con.list_tables()
27+
# 1. Create a database and declare the tables you'll load
28+
con.create_database("sales", schema="public", tables=["orders"])
29+
30+
# 2. Upload a pandas DataFrame (or PyArrow table)
31+
df = pd.DataFrame({
32+
"order_id": [1, 2, 3],
33+
"amount": [9.99, 49.99, 5.00],
34+
"region": ["west", "east", "west"],
35+
})
36+
con.create_table("orders", df, database=("sales", "public"), overwrite=True)
37+
38+
# 3. Uploads are async — wait briefly before querying
39+
time.sleep(2)
2740

28-
# Query with Ibis expressions
29-
t = con.table("customer", database=("my_connection", "tpch_sf1"))
30-
df = (
31-
t.filter(t.c_mktsegment == "AUTOMOBILE")
32-
.select("c_custkey", "c_name")
33-
.limit(100)
34-
.execute() # returns a pandas DataFrame
41+
# 4. Query with Ibis expressions
42+
# Managed tables are always accessed with catalog "default"
43+
t = con.table("orders", database=("default", "public"))
44+
result = (
45+
t.group_by("region")
46+
.agg(total=t.amount.sum())
47+
.order_by(ibis.desc("total"))
48+
.execute() # returns a pandas DataFrame
3549
)
50+
51+
# 5. Clean up
52+
con.drop_table("orders", database=("sales", "public"))
53+
con.drop_database("sales")
3654
```
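
The fixed `time.sleep(2)` in step 3 can be too short for larger uploads. A minimal sketch of a polling alternative follows. `wait_until` is a hypothetical helper, not part of hotdata-ibis, and it assumes that querying a table that is not ready yet raises an exception.

```python
import time


def wait_until(check, timeout_s=30.0, interval_s=0.5):
    """Call check() until it returns without raising, or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            return check()
        except Exception:
            # Assumption: a table that is still loading surfaces as an exception.
            # Narrow this to the backend's actual error type once it is known.
            if time.monotonic() >= deadline:
                raise
            time.sleep(interval_s)
```

With the quickstart connection, this could be called as `wait_until(lambda: con.table("orders", database=("default", "public")).execute())`. The last exception is re-raised on timeout so the underlying failure stays visible.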
3755

3856
## Connect
3957

4058
```python
4159
con = ibis.hotdata.connect(
4260
api_url="https://api.hotdata.dev",
43-
token="YOUR_API_TOKEN",
44-
workspace_id="ws_…",
45-
default_connection="my_connection", # skip qualifying every table reference
46-
default_schema="public", # skip qualifying every table reference
47-
session_id=None, # optional sandbox session
48-
timeout=120.0,
49-
verify_ssl=True,
50-
poll_interval_s=0.25,
51-
poll_timeout_s=600.0,
61+
token="YOUR_API_KEY",
62+
workspace_id="ws_...",
5263
)
5364
```
5465

55-
URL style also works — token can go in the query string or the URL password segment:
66+
A URL-style connection string also works:
67+
68+
```python
69+
con = ibis.connect("hotdata://api.hotdata.dev/?token=...&workspace_id=ws_...")
70+
```
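
When the token comes from the environment, building the URL with `urllib.parse.urlencode` keeps special characters from breaking the query string. This is a sketch only: `hotdata_url` is a hypothetical helper, and it assumes the backend decodes standard percent-encoded query parameters.

```python
import os
from urllib.parse import urlencode


def hotdata_url(token, workspace_id, host="api.hotdata.dev"):
    """Build a hotdata:// connection URL with percent-encoded credentials."""
    query = urlencode({"token": token, "workspace_id": workspace_id})
    return f"hotdata://{host}/?{query}"


# HOTDATA_API_KEY and HOTDATA_WORKSPACE are the variables the example scripts read.
url = hotdata_url(
    os.environ.get("HOTDATA_API_KEY", "YOUR_API_KEY"),
    os.environ.get("HOTDATA_WORKSPACE", "ws_..."),
)
# con = ibis.connect(url)
```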
71+
72+
## Managed databases
73+
74+
Managed databases are the primary way to bring data into Hotdata with Ibis. Declare a database and its tables, upload data, and query it once the upload has been processed (uploads are asynchronous).
75+
76+
### Create and load
5677

5778
```python
58-
con = ibis.connect("hotdata://api.hotdata.dev/?token=…&workspace_id=ws_…")
79+
# Declare the database and all table names up front
80+
con.create_database("analytics", schema="public", tables=["events", "users"])
81+
82+
# Upload from a pandas DataFrame
83+
con.create_table("events", events_df, database=("analytics", "public"), overwrite=True)
84+
85+
# PyArrow tables also work
86+
import pyarrow as pa
87+
table = pa.table({"id": [1, 2], "name": ["alice", "bob"]})
88+
con.create_table("users", table, database=("analytics", "public"), overwrite=True)
5989
```
6090

61-
**Table addressing:** Hotdata organizes data as `connection → schema → table`. In Ibis terms that maps to `catalog → database → table`. With a single connection and schema, defaults are inferred automatically. For multiple connections or schemas, pass `database=(connection_id, schema)` when referencing a table, or set `default_connection` / `default_schema` at connect time.
91+
Table names must be declared when the database is created — you cannot add new table names later without recreating the database.
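
Because of this rule, it can help to compare the planned uploads against the declared names before calling `create_table`. A small sketch follows. `undeclared_tables` is a hypothetical helper, and the string values stand in for real DataFrames.

```python
def undeclared_tables(declared, uploads):
    """Return upload names that are missing from the declared list, sorted."""
    return sorted(set(uploads) - set(declared))


declared = ["events", "users"]  # what was passed to create_database(tables=...)
uploads = {"events": "events_df", "sessions": "sessions_df"}  # placeholder values

missing = undeclared_tables(declared, uploads)
if missing:
    print(f"Not declared in create_database(tables=...): {missing}")
```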
92+
93+
### Query
94+
95+
When querying a managed table, pass `"default"` as the catalog rather than the name of the database you created:
96+
97+
```python
98+
t = con.table("events", database=("default", "public"))
99+
100+
result = (
101+
t.filter(t.event_type == "click")
102+
.group_by("user_id")
103+
.agg(n=t.count())
104+
.execute()
105+
)
106+
```
107+
108+
Or with raw SQL:
109+
110+
```python
111+
result = con.sql(
112+
'SELECT user_id, COUNT(*) AS n '
113+
'FROM "default"."public"."events" '
114+
'WHERE event_type = \'click\' '
115+
'GROUP BY user_id'
116+
).execute()
117+
```
118+
119+
### Delete
120+
121+
```python
122+
con.drop_table("events", database=("analytics", "public"))
123+
con.drop_database("analytics")
124+
```
125+
126+
### Addressing summary
127+
128+
| Operation | `database=` argument |
129+
|-----------|----------------------|
130+
| `create_table` / `drop_table` | `("your-database-name", schema)` |
131+
| `con.table(...)` when querying | `("default", schema)` |
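
The two conventions in the table can be wrapped in one function so call sites do not have to remember which catalog applies. This is an illustrative sketch: `database_arg` is a hypothetical helper, not a hotdata-ibis API.

```python
def database_arg(operation, database_name, schema="public"):
    """Return the database= tuple for a managed-table operation."""
    if operation in ("create_table", "drop_table"):
        # Uploads and drops address the managed database by its own name.
        return (database_name, schema)
    if operation == "table":
        # Queries address managed tables through the "default" catalog.
        return ("default", schema)
    raise ValueError(f"unknown operation: {operation!r}")
```

For example, `con.table("events", database=database_arg("table", "analytics"))` would resolve to `("default", "public")`.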
62132

63133
## Querying
64134

65135
### Ibis expressions
66136

67137
```python
68-
t = con.table("orders")
138+
t = con.table("orders", database=("default", "public"))
69139

70-
# Filter, select, aggregate — all run as SQL on Hotdata
71140
summary = (
72-
t.filter(t.status == "shipped")
141+
t.filter(t.amount > 10)
73142
.group_by("region")
74143
.agg(total=t.amount.sum(), n=t.count())
75-
.order_by("total", ascending=False)
144+
.order_by(ibis.desc("total"))
76145
.execute()
77146
)
78147
```
79148

80-
`.execute()` returns a **pandas DataFrame**. Use `.to_pyarrow()` for an Arrow table or `.to_pyarrow_batches()` for a record batch reader.
149+
`.execute()` returns a **pandas DataFrame**. Use `.to_pyarrow()` for an Arrow table or `.to_pyarrow_batches()` to stream batches without materializing the full result.
81150

82151
### Raw SQL
83152

84-
When you need Hotdata-specific syntax, federated table names, or SQL that Ibis doesn't model:
85-
86153
```python
87-
df = con.sql(
88-
"SELECT region, SUM(amount) AS total FROM my_conn.public.orders GROUP BY region",
154+
base = con.sql(
155+
'SELECT * FROM "default"."public"."orders"',
89156
dialect="postgres",
90-
).execute()
157+
)
158+
result = base.filter(base.amount > 10).execute()
91159
```
92160

93-
You can chain Ibis expressions on the result of `con.sql(...)` the same way you would on `con.table(...)`.
161+
You can chain Ibis expressions on the result of `con.sql(...)`.
94162

95-
### Discover what's available
163+
## Connecting to existing sources
96164

97-
```python
98-
con.list_catalogs() # Hotdata connection ids
99-
con.list_databases(catalog="my_connection") # schemas for a connection
100-
con.list_tables(database=("my_connection", "public"))
101-
con.get_schema("orders", catalog="my_connection", database="public")
102-
```
103-
104-
## Managed databases
105-
106-
Managed databases let you upload your own data (pandas DataFrames or PyArrow tables) and query it alongside your other Hotdata connections. They are provisioned on demand and scoped to your workspace.
165+
If you have existing databases or warehouses connected to your Hotdata workspace (Postgres, Snowflake, BigQuery, etc.), you can query them through the same Ibis connection:
107166

108167
```python
109-
import time
110-
import ibis
111-
import pandas as pd
112-
113168
con = ibis.hotdata.connect(
114169
api_url="https://api.hotdata.dev",
115-
token="YOUR_API_TOKEN",
116-
workspace_id="ws_…",
170+
token="YOUR_API_KEY",
171+
workspace_id="ws_...",
172+
default_connection="my_postgres",
173+
default_schema="public",
117174
)
118175

119-
# 1. Create the database and declare which tables you'll upload.
120-
# Table names must be declared here — uploads to undeclared names are rejected.
121-
con.create_database("my-dataset", schema="public", tables=["orders"])
122-
123-
# 2. Upload data.
124-
df = pd.DataFrame({"order_id": [1, 2, 3], "amount": [9.99, 49.99, 5.00]})
125-
con.create_table("orders", df, database=("my-dataset", "public"), overwrite=True)
126-
127-
# 3. Uploads are asynchronous — wait a moment before querying.
128-
time.sleep(2)
129-
130-
# 4. Query with Ibis expressions.
131-
# Managed tables use "default" as the catalog — the backend handles this automatically.
132-
t = con.table("orders", database=("default", "public"))
133-
result = t.filter(t.amount > 10).order_by("amount").execute()
176+
t = con.table("orders") # resolves to my_postgres.public.orders
177+
```
134178

135-
# 5. Or with raw SQL.
136-
result = con.sql('SELECT SUM(amount) AS total FROM "default"."public"."orders"').execute()
179+
Discover what's available:
137180

138-
# 6. Clean up.
139-
con.drop_table("orders", database=("my-dataset", "public"))
140-
con.drop_database("my-dataset")
181+
```python
182+
con.list_catalogs() # connection IDs
183+
con.list_databases(catalog="my_postgres") # schemas
184+
con.list_tables(database=("my_postgres", "public")) # tables
141185
```
142186

143-
**Things to know:**
144-
- Declare all table names in `create_database(..., tables=[...])` before uploading — you can't add them later without recreating the database.
145-
- Use `database=("my-dataset", schema)` when uploading (`create_table`) or dropping tables (`drop_table`).
146-
- Use `database=("default", schema)` when querying — managed tables always use `"default"` as the SQL catalog prefix.
147-
- `create_table` accepts pandas DataFrames, PyArrow tables, or an Ibis schema for creating an empty table.
148-
- Uploads use replace mode. Pass `overwrite=True` to replace a table that already exists; without it, uploading to an existing table raises an error.
149-
150187
## What's supported
151188

152189
| Feature | Status |
153-
|---|---|
154-
| `list_catalogs`, `list_databases`, `list_tables` ||
190+
|---------|--------|
191+
| `create_database` / `drop_database` (managed) ||
192+
| `create_table` / `drop_table` (DataFrame or Arrow upload) ||
155193
| `con.table(...)` with full schema metadata ||
156194
| Ibis expressions: filter, select, join, group\_by, agg, order\_by, limit ||
157195
| `con.sql(...)` raw SQL ||
158196
| `.execute()` → pandas, `.to_pyarrow()`, `.to_pyarrow_batches()` ||
159-
| `create_database` / `drop_database` (managed) ||
160-
| `create_table` / `drop_table` (managed, Parquet upload) ||
197+
| `list_catalogs`, `list_databases`, `list_tables` ||
161198
| Temporary tables ||
162199
| Python UDFs ||
163200
| INSERT / UPDATE / DELETE on external connections ||
164201

165-
SQL compilation uses Ibis's Postgres dialect as the closest fit. Most common `SELECT` workloads run fine; complex expressions may generate SQL that Hotdata doesn't support — use `con.sql(...)` as a fallback.
202+
SQL compilation uses Ibis's Postgres dialect. Use `con.sql(...)` as a fallback for expressions that don't compile cleanly.
166203

167204
## Development
168205

@@ -179,18 +216,16 @@ CI: `uv sync --locked && uv run pytest`.
179216
Set your credentials, then run any example script:
180217

181218
```bash
182-
export HOTDATA_API_KEY=
183-
export HOTDATA_WORKSPACE=
219+
export HOTDATA_API_KEY=...
220+
export HOTDATA_WORKSPACE=...
184221
uv run python examples/01_catalog_introspection.py
185222
uv run python examples/02_execute_sql.py 'SELECT COUNT(*) AS n FROM tpch.tpch_sf1.customer'
186223
uv run python examples/03_connect_via_url.py
187224
uv run python examples/04_ibis_table_workflows.py
188225
```
189226

190-
The examples assume a TPC-H dataset at `tpch.tpch_sf1`. To provision it: create a DuckDB connection in Hotdata, then run `CALL dbgen(sf = 1)` using DuckDB's [tpch extension](https://duckdb.org/docs/extensions/tpch.html).
191-
192227
## References
193228

229+
- [Hotdata documentation](https://www.hotdata.dev/docs/ibis)
194230
- [Hotdata Python SDK](https://github.com/hotdata-dev/sdk-python)
195-
- [Hotdata API reference](https://www.hotdata.dev/docs/api-reference) · [Hotdata SQL](https://www.hotdata.dev/docs/sql)
196-
- [Ibis documentation](https://ibis-project.org/) · [Ibis backend concepts](https://ibis-project.org/concepts/backend-table-hierarchy.qmd)
231+
- [Ibis documentation](https://ibis-project.org/)
