Commit 3281852

docs: update README to reflect current implementation
- Correct ibis-framework version requirement to >=10,<11
- Document all connect() optional parameters with inline comments
- Add URL query string example with optional parameters
- Document schema-only create_table (empty table from schema)
- Document force=True on drop operations
- Note to_pyarrow_batches() downloads full result then splits locally
- Add in-memory tables (unsupported) and Arrow type mapping to feature table
1 parent 80c82ed commit 3281852

1 file changed

Lines changed: 36 additions & 7 deletions

README.md

@@ -2,7 +2,7 @@
 
 Use [Ibis](https://ibis-project.org/) to create on-demand databases, upload data, and query with Python expressions — get pandas or Arrow results back without writing SQL.
 
-**Requirements:** Python 3.10+, **ibis-framework** 10.x, **hotdata** ≥0.2.3.
+**Requirements:** Python 3.10+, **ibis-framework** ≥10,<11, **hotdata** ≥0.2.3.
 
 ## Install
 
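
Note on the hunk above: the corrected pin `≥10,<11` accepts any 10.x release and rejects 11.0 and later. A minimal sketch of that check follows, assuming a plain numeric major version. `satisfies_ibis_pin` is a hypothetical helper, not part of this project, and pre-release suffixes are not handled; `packaging.specifiers.SpecifierSet` would be the more robust choice where that package is available.

```python
def satisfies_ibis_pin(version: str) -> bool:
    """Return True if `version` falls inside the README's range >=10,<11.

    Hypothetical helper: only the major component is inspected.
    """
    major = int(version.split(".", 1)[0])
    return 10 <= major < 11


for candidate in ("9.5.0", "10.0.0", "10.4.1", "11.0.0"):
    print(candidate, satisfies_ibis_pin(candidate))
```
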
@@ -60,13 +60,26 @@ con = ibis.hotdata.connect(
     api_url="https://api.hotdata.dev",
     token="YOUR_API_KEY",
     workspace_id="ws_...",
+    # optional
+    session_id=None, # sandbox id (X-Session-Id header)
+    timeout=120.0, # per-request HTTP timeout in seconds
+    verify_ssl=True, # False to skip TLS verification, or path to CA bundle
+    default_connection=None, # default catalog (connection id); auto-detected if only one exists
+    default_schema=None, # default schema; auto-detected if only one exists
+    database_id=None, # bind an existing managed database id at connect time
+    poll_interval_s=0.25, # polling interval for async queries
+    poll_timeout_s=600.0, # max time to wait for a query result
 )
 ```
 
-URL-style also works:
+URL-style also works, with the same parameters as query string keys:
 
 ```python
-con = ibis.connect("hotdata://api.hotdata.dev/?token=...&workspace_id=ws_...")
+con = ibis.connect(
+    "hotdata://api.hotdata.dev/"
+    "?token=...&workspace_id=ws_..."
+    "&default_connection=my_conn&default_schema=public"
+)
 ```
 
 ## Managed databases
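
Note on the hunk above: since the URL form takes the same parameters as query string keys, the URL can be assembled from keyword arguments instead of concatenated by hand. The sketch below uses only the standard library. `build_hotdata_url` is a hypothetical helper and `ws_example` is a placeholder value; how the backend parses non-string values such as `timeout` from the query string is not stated in the diff, so only string options are shown.

```python
from urllib.parse import urlencode


def build_hotdata_url(host: str, **params: object) -> str:
    """Assemble a hotdata:// URL from keyword options (hypothetical helper)."""
    # Leave out unset options so the backend can apply its own defaults.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"hotdata://{host}/?{query}"


url = build_hotdata_url(
    "api.hotdata.dev",
    token="YOUR_API_KEY",
    workspace_id="ws_example",  # placeholder, not a real workspace id
    default_connection="my_conn",
    default_schema="public",
    database_id=None,  # omitted from the URL because it is unset
)
print(url)
```

`urlencode` also percent-encodes characters that are unsafe in a query string, which matters if a token contains `&` or `=`.
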
@@ -86,9 +99,17 @@ con.create_table("events", events_df, database=("analytics", "public"), overwrit
 import pyarrow as pa
 table = pa.table({"id": [1, 2], "name": ["alice", "bob"]})
 con.create_table("users", table, database=("analytics", "public"), overwrite=True)
+
+# Schema-only (no data): creates an empty table with the declared schema
+import ibis.expr.schema as sch
+con.create_table(
+    "staging",
+    schema=sch.Schema({"id": "int64", "ts": "timestamp"}),
+    database=("analytics", "public"),
+)
 ```
 
-Table names must be declared when the database is created — you cannot add new table names later without recreating the database.
+Table names must be declared when the database is created — you cannot upload to a table name that was not listed in `tables=`.
 
 ### Query
 
@@ -118,9 +139,14 @@ result = con.sql(
 
 ### Delete
 
+Pass `force=True` to silently skip errors when the database or table does not exist:
+
 ```python
 con.drop_table("events", database=("analytics", "public"))
+con.drop_table("events", database=("analytics", "public"), force=True) # no-op if missing
+
 con.drop_database("analytics")
+con.drop_database("analytics", force=True) # no-op if missing
 ```
 
 ### Addressing summary
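
Note on the hunk above: the `force=True` behaviour documented here (a no-op when the target is missing, an error otherwise) can be illustrated with a toy stand-in. `FakeCatalog` is hypothetical and only models the semantics described in the README text; the exception type the real backend raises is not shown in the diff, so `KeyError` is an assumption.

```python
class FakeCatalog:
    """Toy in-memory stand-in used only to illustrate force=True semantics."""

    def __init__(self) -> None:
        self.tables = {"events"}

    def drop_table(self, name: str, force: bool = False) -> None:
        if name not in self.tables:
            if force:
                return  # missing table: silently do nothing
            raise KeyError(f"table {name!r} does not exist")  # assumed error type
        self.tables.remove(name)


catalog = FakeCatalog()
catalog.drop_table("events")              # removes the table
catalog.drop_table("events", force=True)  # already gone: no error

try:
    catalog.drop_table("events")          # without force, the miss is reported
except KeyError as exc:
    print("error:", exc)
```

This makes `force=True` convenient for idempotent teardown scripts, at the cost of hiding typos in table names.
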
@@ -146,7 +172,7 @@ summary = (
 )
 ```
 
-`.execute()` returns a **pandas DataFrame**. Use `.to_pyarrow()` for an Arrow table or `.to_pyarrow_batches()` to stream batches without materializing the full result.
+`.execute()` returns a **pandas DataFrame**. `.to_pyarrow()` returns an Arrow table. `.to_pyarrow_batches()` returns a `RecordBatchReader` — note that Hotdata returns a single Arrow IPC payload per query, so this method downloads the full result first and then splits it into local batches.
 
 ### Raw SQL
 
@@ -189,17 +215,20 @@ con.list_tables(database=("my_postgres", "public")) # tables
 | Feature | Status |
 |---------|--------|
 | `create_database` / `drop_database` (managed) ||
-| `create_table` / `drop_table` (DataFrame or Arrow upload) ||
+| `create_table` from pandas / PyArrow / schema-only ||
+| `drop_table` ||
 | `con.table(...)` with full schema metadata ||
 | Ibis expressions: filter, select, join, group\_by, agg, order\_by, limit ||
 | `con.sql(...)` raw SQL ||
 | `.execute()` → pandas, `.to_pyarrow()`, `.to_pyarrow_batches()` ||
 | `list_catalogs`, `list_databases`, `list_tables` ||
+| Arrow / Parquet column types (timestamp, decimal, list, duration, …) ||
 | Temporary tables ||
+| In-memory tables (`ibis.memtable(...)`) ||
 | Python UDFs ||
 | INSERT / UPDATE / DELETE on external connections ||
 
-SQL compilation uses Ibis's Postgres dialect. Use `con.sql(...)` as a fallback for expressions that don't compile cleanly.
+SQL compilation uses Ibis's Postgres dialect. Column types returned by Hotdata's information schema are resolved via PyArrow's type system, so Parquet-loaded tables with Arrow-native types (timestamps with time zones, decimals, lists, durations) are mapped correctly to Ibis types.
 
 ## Development
 