fix(types): handle Arrow-style type names from Parquet/managed tables; update docs #12
Merged
…; update docs

- types.py: add _ARROW_TYPE_MAP for Arrow-style names (Date32, Float64, Utf8, etc.) returned by the information_schema for Parquet/managed table columns
- tests: add parametrized test_dtype_from_hotdata_arrow_type_names covering all Arrow-style names and case-insensitivity
- README: update hotdata requirement to >=0.2.3; document Arrow-style type support in both the feature list and the Connect → Types section

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- backend.py: add _find_managed_connection helper that returns None for not-found vs raising IbisError; use it in create_database so real 5xx API failures are no longer swallowed by the broad `except IbisError: pass`
- backend.py: always overwrite _database_id in _table_location (drop the `or`) so both cached fields stay in sync when multiple managed databases are used
- backend.py: add explicit parens to api_conn ternary in get_schema for clarity
- backend.py: document the database_id parameter in do_connect docstring
- README.md: rewrite as user-facing docs: quick start first, plain language, no private method calls in examples, support table replaces spec-style prose

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
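The first bullet above describes separating "not found" from real API failures. A minimal sketch of that pattern follows; it is not the PR's code. `ApiError`, `find_managed_connection`, the `list_connections`/`create_connection` callables, and the dict-shaped connections are all hypothetical stand-ins for whatever backend.py actually uses.

```python
class ApiError(Exception):
    """Hypothetical error type carrying an HTTP-like status code."""

    def __init__(self, status, message=""):
        super().__init__(message or f"API error {status}")
        self.status = status


def find_managed_connection(list_connections, name):
    """Return the connection called `name`, or None if it does not exist.

    Errors raised by `list_connections` (for example a 5xx) are not caught
    here, so they propagate to the caller instead of looking like "not found".
    """
    for conn in list_connections():
        if conn["name"] == name:
            return conn
    return None


def create_database(list_connections, create_connection, name):
    """Create the connection only when the lookup cleanly reports absence."""
    existing = find_managed_connection(list_connections, name)
    if existing is not None:
        return existing
    return create_connection(name)
```

With this shape there is no broad `except` around the lookup, so a failing API call surfaces as an exception while a missing connection is an ordinary `None`.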
6c66835 to 33785bb
    # time
    "time32": dt.Time,
    "time64": dt.Time,
}
nit: the map handles unsigned Arrow ints but not signed ones (Int8, Int16, Int32, Int64). Because the Postgres dialect treats int8 as an alias for BIGINT and int4 as INTEGER, an Arrow-style Int8 (signed 8-bit) column will silently fall through to PostgresType.from_string("Int8") and resolve to dt.Int64, an 8× widening rather than a USERDEFINED fallback. Parquet schemas routinely produce Int8/Int16 for small signed ints, so if Hotdata can ever emit those names, consider adding them explicitly:

    "int8": dt.Int8,
    "int16": dt.Int16,
    "int32": dt.Int32,
    "int64": dt.Int64,

(not blocking: only matters if Hotdata actually returns these names; the unsigned counterparts being present suggests it may.)
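The fall-through described in the review comment can be illustrated with a toy model. This is only a sketch: plain strings stand in for the ibis datatypes, `POSTGRES_ALIASES` models just the two aliases the comment names, and `resolve`, the map names, and the "Unknown" fallback are hypothetical rather than taken from types.py.

```python
# Stand-in strings play the role of ibis datatypes (dt.Int8, dt.Int64, ...)
# so the sketch runs without ibis installed.
ARROW_MAP_UNSIGNED_ONLY = {
    "uint8": "UInt8",
    "uint16": "UInt16",
    "uint32": "UInt32",
    "uint64": "UInt64",
}

# The signed entries suggested in the review comment.
SIGNED_INTS = {
    "int8": "Int8",
    "int16": "Int16",
    "int32": "Int32",
    "int64": "Int64",
}

# Toy model of the Postgres aliases named in the comment:
# int8 is BIGINT (64-bit), int4 is INTEGER (32-bit).
POSTGRES_ALIASES = {"int8": "Int64", "int4": "Int32"}


def resolve(type_name, arrow_map):
    """Case-insensitive lookup in the Arrow map, then the Postgres fallback."""
    key = type_name.strip().lower()
    if key in arrow_map:
        return arrow_map[key]
    # Hypothetical fallback value for names neither table knows.
    return POSTGRES_ALIASES.get(key, "Unknown")


with_signed = {**ARROW_MAP_UNSIGNED_ONLY, **SIGNED_INTS}
widened = resolve("Int8", ARROW_MAP_UNSIGNED_ONLY)  # hits the Postgres alias
exact = resolve("Int8", with_signed)                # hits the Arrow map first
```

Because the Arrow map is consulted before the fallback, adding the signed entries is enough to stop `Int8` from being read as the Postgres 64-bit alias.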
Summary
- `types.py`: add `_ARROW_TYPE_MAP` to handle Arrow-style type names (`Date32`, `Float64`, `Utf8`, `LargeBinary`, etc.) returned by the information schema for Parquet/managed-table columns. Previously these fell through to `PostgresType.from_string()` and were silently mapped to `String`.
- `tests/test_hotdata_types.py`: parametrized test covering all Arrow-style names and case-insensitivity.
- `README.md`: bump `hotdata` requirement to ≥0.2.3; document Arrow-style type support in the feature list and the Connect → Types section.

Test plan
- `uv run pytest tests/test_hotdata_types.py`: new `test_dtype_from_hotdata_arrow_type_names` cases pass
- `uv run pytest`: full suite passes
- `hotdata` ≥0.2.3

🤖 Generated with Claude Code