docs: update README to reflect current implementation

eddietejeda · eddietejeda · commit 328185247c24 · 2026-05-26T14:54:37.000-07:00
- Correct ibis-framework version requirement to >=10,<11
- Document all connect() optional parameters with inline comments
- Add URL query string example with optional parameters
- Document schema-only create_table (empty table from schema)
- Document force=True on drop operations
- Note to_pyarrow_batches() downloads full result then splits locally
- Add in-memory tables (unsupported) and Arrow type mapping to feature table
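
For reviewers checking the URL query string example, here is a minimal sketch of how the new URL breaks down into keys that match the `connect()` keyword names. It uses only the standard library and illustrates the URL's structure; it does not claim to show how the backend itself parses the string. The token and workspace values are the elided placeholders from the README, kept as-is.

```python
from urllib.parse import parse_qs, urlsplit

# URL copied from the README example in this commit (placeholders left elided).
url = (
    "hotdata://api.hotdata.dev/"
    "?token=...&workspace_id=ws_..."
    "&default_connection=my_conn&default_schema=public"
)

parts = urlsplit(url)

# parse_qs returns a list of values per key; each key appears once here,
# so keep the first value.
params = {key: values[0] for key, values in parse_qs(parts.query).items()}

print(parts.scheme)    # URL scheme that selects the backend
print(parts.hostname)  # API host
for key, value in params.items():
    print(f"{key} = {value!r}")
```

Every query string value arrives as a string. Whether non-string options such as `timeout` or `verify_ssl` are converted when passed this way is not covered by this commit, so passing them as keyword arguments is the safer assumption.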
diff --git a/README.md b/README.md
@@ -2,7 +2,7 @@
 
 Use [Ibis](https://ibis-project.org/) to create on-demand databases, upload data, and query with Python expressions — get pandas or Arrow results back without writing SQL.
 
-**Requirements:** Python 3.10+, **ibis-framework** 10.x, **hotdata** ≥0.2.3.
+**Requirements:** Python 3.10+, **ibis-framework** ≥10,<11, **hotdata** ≥0.2.3.
 
 ## Install
 
@@ -60,13 +60,26 @@ con = ibis.hotdata.connect(
     api_url="https://api.hotdata.dev",
     token="YOUR_API_KEY",
     workspace_id="ws_...",
+    # optional
+    session_id=None,           # sandbox id (X-Session-Id header)
+    timeout=120.0,             # per-request HTTP timeout in seconds
+    verify_ssl=True,           # False to skip TLS verification, or path to CA bundle
+    default_connection=None,   # default catalog (connection id); auto-detected if only one exists
+    default_schema=None,       # default schema; auto-detected if only one exists
+    database_id=None,          # bind an existing managed database id at connect time
+    poll_interval_s=0.25,      # polling interval for async queries
+    poll_timeout_s=600.0,      # max time to wait for a query result
 )
 ```
 
-URL-style also works:
+URL-style also works, with the same parameters as query string keys:
 
 ```python
-con = ibis.connect("hotdata://api.hotdata.dev/?token=...&workspace_id=ws_...")
+con = ibis.connect(
+    "hotdata://api.hotdata.dev/"
+    "?token=...&workspace_id=ws_..."
+    "&default_connection=my_conn&default_schema=public"
+)
 ```
 
 ## Managed databases
@@ -86,9 +99,17 @@ con.create_table("events", events_df, database=("analytics", "public"), overwrit
 import pyarrow as pa
 table = pa.table({"id": [1, 2], "name": ["alice", "bob"]})
 con.create_table("users", table, database=("analytics", "public"), overwrite=True)
+
+# Schema-only (no data): creates an empty table with the declared schema
+import ibis.expr.schema as sch
+con.create_table(
+    "staging",
+    schema=sch.Schema({"id": "int64", "ts": "timestamp"}),
+    database=("analytics", "public"),
+)
 ```
 
-Table names must be declared when the database is created — you cannot add new table names later without recreating the database.
+Table names must be declared when the database is created — you cannot upload to a table name that was not listed in `tables=`.
 
 ### Query
 
@@ -118,9 +139,14 @@ result = con.sql(
 
 ### Delete
 
+Pass `force=True` to silently skip errors when the database or table does not exist:
+
 ```python
 con.drop_table("events", database=("analytics", "public"))
+con.drop_table("events", database=("analytics", "public"), force=True)  # no-op if missing
+
 con.drop_database("analytics")
+con.drop_database("analytics", force=True)  # no-op if missing
 ```
 
 ### Addressing summary
@@ -146,7 +172,7 @@ summary = (
 )
 ```
 
-`.execute()` returns a **pandas DataFrame**. Use `.to_pyarrow()` for an Arrow table or `.to_pyarrow_batches()` to stream batches without materializing the full result.
+`.execute()` returns a **pandas DataFrame**. `.to_pyarrow()` returns an Arrow table. `.to_pyarrow_batches()` returns a `RecordBatchReader` — note that Hotdata returns a single Arrow IPC payload per query, so this method downloads the full result first and then splits it into local batches.
 
 ### Raw SQL
 
@@ -189,17 +215,20 @@ con.list_tables(database=("my_postgres", "public"))    # tables
 | Feature | Status |
 |---------|--------|
 | `create_database` / `drop_database` (managed) | ✅ |
-| `create_table` / `drop_table` (DataFrame or Arrow upload) | ✅ |
+| `create_table` from pandas / PyArrow / schema-only | ✅ |
+| `drop_table` | ✅ |
 | `con.table(...)` with full schema metadata | ✅ |
 | Ibis expressions: filter, select, join, group\_by, agg, order\_by, limit | ✅ |
 | `con.sql(...)` raw SQL | ✅ |
 | `.execute()` → pandas, `.to_pyarrow()`, `.to_pyarrow_batches()` | ✅ |
 | `list_catalogs`, `list_databases`, `list_tables` | ✅ |
+| Arrow / Parquet column types (timestamp, decimal, list, duration, …) | ✅ |
 | Temporary tables | ❌ |
+| In-memory tables (`ibis.memtable(...)`) | ❌ |
 | Python UDFs | ❌ |
 | INSERT / UPDATE / DELETE on external connections | ❌ |
 
-SQL compilation uses Ibis's Postgres dialect. Use `con.sql(...)` as a fallback for expressions that don't compile cleanly.
+SQL compilation uses Ibis's Postgres dialect. Column types returned by Hotdata's information schema are resolved via PyArrow's type system, so Parquet-loaded tables with Arrow-native types (timestamps with time zones, decimals, lists, durations) are mapped correctly to Ibis types.
 
 ## Development