Commit d43d8a5

eddietejeda and claude committed
fix(types): handle Arrow-style type names from Parquet/managed tables; update docs
- types.py: add _ARROW_TYPE_MAP for Arrow-style names (Date32, Float64, Utf8, etc.) returned by the information_schema for Parquet/managed table columns
- tests: add parametrized test_dtype_from_hotdata_arrow_type_names covering all Arrow-style names and case-insensitivity
- README: update hotdata requirement to >=0.2.3; document Arrow-style type support in both the feature list and the Connect → Types section

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1 parent c5815ce commit d43d8a5

3 files changed

Lines changed: 71 additions & 4 deletions

README.md

Lines changed: 3 additions & 3 deletions
@@ -2,7 +2,7 @@
 
 Experimental [Ibis](https://ibis-project.org/) backend for [Hotdata](https://www.hotdata.dev/docs/api-reference): compile expressions with Ibis, run federated SQL over the Hotdata API. REST calls use the official **[hotdata](https://github.com/hotdata-dev/sdk-python)** Python SDK. Repo examples use **httpx** (listed under the **dev** dependency group).
 
-**Requirements:** Python 3.10+, **ibis-framework** 10.x, **hotdata** ≥0.2.
+**Requirements:** Python 3.10+, **ibis-framework** 10.x, **hotdata** ≥0.2.3.
 
 ## Install
 
@@ -16,7 +16,7 @@ uv pip install hotdata-ibis
 - **Ibis connection API** — connect with `ibis.hotdata.connect(...)` or `ibis.connect("hotdata://...")`.
 - **Hotdata catalog mapping** — expose Hotdata connections, schemas, and tables through Ibis catalogs, databases, and tables.
 - **SQL-backed expression execution** — compile Ibis expressions with the Postgres SQLGlot compiler and execute them through Hotdata query APIs.
-- **Typed table discovery** — load schema metadata from Hotdata information schema and map SQL types into Ibis types.
+- **Typed table discovery** — load schema metadata from Hotdata information schema and map SQL types into Ibis types. Both SQL-style names (`INTEGER`, `VARCHAR`) and Arrow-style names (`Float64`, `Utf8`) returned by Parquet/managed tables are handled.
 - **Arrow and pandas results** — materialize expressions as pandas DataFrames, PyArrow tables, or local Arrow record batches.
 - **Raw SQL escape hatch** — use `con.sql(..., dialect="postgres")` when Hotdata-specific federated SQL is clearer than modeled Ibis expressions.
 - **Managed database writes** — create managed connections with `create_database`, load local pandas or PyArrow data through `create_table`, and clean up with `drop_table` / `drop_database`.
@@ -56,7 +56,7 @@ con = ibis.connect(
 
 **Execution:** SQL is compiled with Ibis’s **Postgres** SQLGlot compiler. The client submits queries asynchronously with `POST /v1/query`, polls `GET /v1/query-runs/{id}`, then downloads ready results as Arrow IPC from `GET /v1/results/{id}`. Tuning: `poll_interval_s`, `poll_timeout_s` on `connect()`.
 
-**Types:** Typed tables come from Hotdata’s information schema. `con.sql(...)` types are inferred from a small preview query and Arrow schema; see [Hotdata SQL](https://www.hotdata.dev/docs/sql) for server behavior.
+**Types:** Typed tables come from Hotdata’s information schema. `con.sql(...)` types are inferred from a small preview query and Arrow schema. Both SQL-style names (`INTEGER`, `DOUBLE PRECISION`) and Arrow-style names (`Float64`, `Utf8`, `Date32`) returned by Parquet/managed tables are supported; see [Hotdata SQL](https://www.hotdata.dev/docs/sql) for server behavior.
 
 ## Ibis Support Overview
 
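
Reviewer note: the **Execution** paragraph in the README describes a submit, poll, download cycle tuned by `poll_interval_s` and `poll_timeout_s`. The sketch below illustrates only the polling step in isolation. It is not the backend's code: `wait_until_ready`, the `get_status` callable, and the status strings `"running"`, `"ready"`, and `"failed"` are assumptions made for illustration, with `get_status` standing in for a call to `GET /v1/query-runs/{id}`.

```python
import time
from typing import Callable, List


def wait_until_ready(
    get_status: Callable[[], str],
    poll_interval_s: float = 0.5,
    poll_timeout_s: float = 30.0,
) -> None:
    """Call get_status() until it returns "ready" or the timeout elapses.

    The status strings are hypothetical; the real API may use other values.
    """
    deadline = time.monotonic() + poll_timeout_s
    while True:
        status = get_status()
        if status == "ready":
            return
        if status == "failed":
            raise RuntimeError("query run failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"query not ready after {poll_timeout_s}s")
        time.sleep(poll_interval_s)


# Fake status source standing in for GET /v1/query-runs/{id}.
_statuses = iter(["running", "running", "ready"])
calls: List[str] = []


def fake_status() -> str:
    status = next(_statuses)
    calls.append(status)
    return status


wait_until_ready(fake_status, poll_interval_s=0.01, poll_timeout_s=1.0)
```

Using `time.monotonic()` for the deadline keeps the timeout unaffected by wall-clock adjustments; the status is checked before the deadline so a run that finishes on the last poll is still accepted.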

src/ibis_hotdata/types.py

Lines changed: 39 additions & 1 deletion
@@ -5,11 +5,49 @@
 import ibis.expr.datatypes as dt
 from ibis.backends.sql.datatypes import PostgresType
 
+# Arrow-style type names returned by Hotdata's information_schema when tables are
+# loaded from Parquet/Arrow sources. PostgresType.from_string() treats these as
+# USERDEFINED unknowns, so we resolve them explicitly before falling through.
+_ARROW_TYPE_MAP: dict[str, type[dt.DataType]] = {
+    # dates
+    "date32": dt.Date,
+    "date64": dt.Date,
+    # floats
+    "float16": dt.Float16,
+    "float32": dt.Float32,
+    "float64": dt.Float64,
+    # unsigned ints
+    "uint8": dt.UInt8,
+    "uint16": dt.UInt16,
+    "uint32": dt.UInt32,
+    "uint64": dt.UInt64,
+    # strings
+    "utf8": dt.String,
+    "largeutf8": dt.String,
+    # binary
+    "largebinary": dt.Binary,
+    # time
+    "time32": dt.Time,
+    "time64": dt.Time,
+}
+
 
 def dtype_from_hotdata_sql_type(sql_type: str | None, *, nullable: bool) -> dt.DataType:
-    """Best-effort mapping from Hotdata `/information_schema` column `data_type` strings."""
+    """Best-effort mapping from Hotdata `/information_schema` column `data_type` strings.
+
+    Hotdata may return either SQL-style names (``INTEGER``, ``VARCHAR``, ``DOUBLE
+    PRECISION``, …) or Arrow-style names (``Date32``, ``Float64``, ``Utf8``, …).
+    SQL-style names are delegated to the Postgres dialect parser; Arrow-style names
+    are resolved via an explicit lookup table before falling back to the parser.
+    """
     if not sql_type:
         return dt.String(nullable=nullable)
+
+    # Arrow-style names (case-insensitive lookup).
+    arrow_cls = _ARROW_TYPE_MAP.get(sql_type.strip().lower())
+    if arrow_cls is not None:
+        return arrow_cls(nullable=nullable)
+
     try:
         return PostgresType.from_string(sql_type.strip(), nullable=nullable)
     except Exception:  # ibis/sqlglot raise a variety of parse errors; fall back to String
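
Reviewer note: the resolution order in this hunk (empty input, then case-insensitive Arrow lookup, then the SQL parser, then a String fallback) can be seen in a dependency-free sketch. This is an illustration under assumptions, not the module itself: plain strings stand in for the ibis dtype classes, the map is abbreviated, and `parse_sql_type` is a hypothetical stand-in for `PostgresType.from_string()` that recognizes only two names.

```python
from typing import Optional

# Abbreviated stand-in for _ARROW_TYPE_MAP; values are plain strings, not ibis dtypes.
ARROW_TYPE_MAP = {
    "date32": "date",
    "float64": "float64",
    "utf8": "string",
    "largebinary": "binary",
}


def parse_sql_type(name: str) -> str:
    """Hypothetical stand-in for the Postgres parser; knows two SQL-style names."""
    known = {"INTEGER": "int32", "VARCHAR": "string"}
    return known[name.upper()]  # raises KeyError for anything else


def resolve(sql_type: Optional[str]) -> str:
    # 1. Missing or empty type names default to string.
    if not sql_type:
        return "string"
    key = sql_type.strip()
    # 2. Arrow-style names are matched case-insensitively before the parser runs.
    arrow = ARROW_TYPE_MAP.get(key.lower())
    if arrow is not None:
        return arrow
    # 3. Everything else goes to the SQL parser; unparseable input falls back to string.
    try:
        return parse_sql_type(key)
    except KeyError:
        return "string"
```

Placing the lookup ahead of the parser matters because, per the comment in the diff, the parser would otherwise treat the Arrow-style names as unknown user-defined types.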

tests/test_hotdata_types.py

Lines changed: 29 additions & 0 deletions
@@ -36,3 +36,32 @@ def test_dtype_from_hotdata_vendor_name_maps_or_string_fallback():
 def test_dtype_from_hotdata_malformed_fallback_string():
     out = dtype_from_hotdata_sql_type('"', nullable=False)
     assert isinstance(out, dt.String)
+
+
+@pytest.mark.parametrize(
+    ("sql_type", "nullable", "expected_cls"),
+    [
+        # Arrow-style names returned when tables are loaded from Parquet/Arrow sources
+        ("Date32", True, dt.Date),
+        ("Date64", False, dt.Date),
+        ("Float32", True, dt.Float32),
+        ("Float64", False, dt.Float64),
+        ("UInt8", True, dt.UInt8),
+        ("UInt16", True, dt.UInt16),
+        ("UInt32", True, dt.UInt32),
+        ("UInt64", True, dt.UInt64),
+        ("Utf8", True, dt.String),
+        ("LargeUtf8", False, dt.String),
+        ("LargeBinary", True, dt.Binary),
+        ("Time32", True, dt.Time),
+        ("Time64", False, dt.Time),
+        # Case-insensitive
+        ("date32", True, dt.Date),
+        ("FLOAT64", True, dt.Float64),
+        ("UTF8", True, dt.String),
+    ],
+)
+def test_dtype_from_hotdata_arrow_type_names(sql_type, nullable, expected_cls):
+    out = dtype_from_hotdata_sql_type(sql_type, nullable=nullable)
+    assert out.nullable is nullable
+    assert isinstance(out, expected_cls)
