fix(types): handle Arrow-style type names from Parquet/managed tables; update docs by eddietejeda · Pull Request #12 · hotdata-dev/hotdata-ibis

eddietejeda · 2026-05-24T18:27:43Z

Summary

types.py: add _ARROW_TYPE_MAP to handle Arrow-style type names (Date32, Float64, Utf8, LargeBinary, etc.) returned by the information schema for Parquet/managed-table columns. Previously these fell through to PostgresType.from_string() and were silently mapped to String.
tests/test_hotdata_types.py: parametrized test covering all Arrow-style names and case-insensitivity.
README.md: bump hotdata requirement to ≥0.2.3; document Arrow-style type support in the feature list and the Connect → Types section.

Test plan

uv run pytest tests/test_hotdata_types.py — new test_dtype_from_hotdata_arrow_type_names cases pass
uv run pytest — full suite passes
README version badge / requirement line reads hotdata ≥0.2.3

🤖 Generated with Claude Code

…; update docs - types.py: add _ARROW_TYPE_MAP for Arrow-style names (Date32, Float64, Utf8, etc.) returned by the information_schema for Parquet/managed table columns - tests: add parametrized test_dtype_from_hotdata_arrow_type_names covering all Arrow-style names and case-insensitivity - README: update hotdata requirement to >=0.2.3; document Arrow-style type support in both the feature list and the Connect → Types section Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- backend.py: add _find_managed_connection helper that returns None for not-found vs raising IbisError; use it in create_database so real 5xx API failures are no longer swallowed by the broad `except IbisError: pass` - backend.py: always overwrite _database_id in _table_location (drop the `or`) so both cached fields stay in sync when multiple managed databases are used - backend.py: add explicit parens to api_conn ternary in get_schema for clarity - backend.py: document the database_id parameter in do_connect docstring - README.md: rewrite as user-facing docs — quick start first, plain language, no private method calls in examples, support table replaces spec-style prose Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

claude · 2026-05-24T18:42:27Z

+    # time
+    "time32": dt.Time,
+    "time64": dt.Time,
+}


nit: the map handles unsigned Arrow ints but not signed ones (Int8, Int16, Int32, Int64). Because the Postgres dialect treats int8 as an alias for BIGINT and int4 as INTEGER, an Arrow-style Int8 (signed 8-bit) column will silently fall through to PostgresType.from_string("Int8") and resolve to dt.Int64 — an 8× widening rather than a USERDEFINED fallback. Parquet schemas routinely produce Int8/Int16 for small signed ints, so if Hotdata can ever emit those names, consider adding them explicitly:

"int8": dt.Int8, "int16": dt.Int16, "int32": dt.Int32, "int64": dt.Int64,

(not blocking — only matters if Hotdata actually returns these names; the unsigned counterparts being present suggests it may.)