Commit 57697dd
fix: incorrect Parquet INT96 timestamp values from ArrowReader (apache#2301)
## Which issue does this PR close?
- Closes apache#2299.
## What changes are included in this PR?
- Add `coerce_int96_timestamps()` to patch the Arrow schema before
reading, using arrow-rs's schema hint mechanism
(`ArrowReaderOptions::with_schema`) to read
INT96 columns at the resolution specified by the Iceberg table schema
- `timestamp`/`timestamptz` → microsecond,
`timestamp_ns`/`timestamptz_ns` → nanosecond, per the [Iceberg
spec](https://iceberg.apache.org/spec/#primitive-types)
- Falls back to microsecond when no field ID is available (matching
Iceberg Java's `TimestampInt96Reader` behavior)
- Applied after all three schema resolution branches (embedded field IDs,
name mapping, and positional fallback), so the fix covers both native and
migrated tables
- Handles INT96 inside nested types (structs, lists, maps) via
`ArrowSchemaVisitor` traversal
- The visitor and its tests live in a standalone `arrow/int96.rs` module to
keep `reader.rs` manageable
- Made `visit_schema` in `arrow/schema.rs` `pub(crate)` so the coercion
visitor can reuse the existing traversal
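The resolution rule and nested traversal listed above can be sketched in std-only Rust. This is a simplified model under stated assumptions, not the PR's code: the `Field`, `DataType`, `TimeUnit` and `IcebergTimestamp` types and the `coerce` function are hypothetical stand-ins. The real change rewrites an arrow `Schema` with an `ArrowSchemaVisitor` and passes it to `ArrowReaderOptions::with_schema`. How INT96 columns are recognised is also assumed here (a dedicated `Int96Timestamp` variant), as is treating a field ID that is absent from the table schema like a missing ID.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for arrow's `TimeUnit` (only the two units needed).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TimeUnit {
    Microsecond,
    Nanosecond,
}

/// Iceberg timestamp primitive types, per the Iceberg spec.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IcebergTimestamp {
    Timestamp,
    Timestamptz,
    TimestampNs,
    TimestamptzNs,
}

impl IcebergTimestamp {
    /// Resolution the Iceberg type asks for.
    fn unit(self) -> TimeUnit {
        match self {
            IcebergTimestamp::Timestamp | IcebergTimestamp::Timestamptz => TimeUnit::Microsecond,
            IcebergTimestamp::TimestampNs | IcebergTimestamp::TimestamptzNs => TimeUnit::Nanosecond,
        }
    }
}

/// Hypothetical, simplified Arrow-like type tree.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    /// Column stored as Parquet INT96; the unit is what the reader will produce.
    Int96Timestamp(TimeUnit),
    Struct(Vec<Field>),
    List(Box<Field>),
    Map { key: Box<Field>, value: Box<Field> },
    Other,
}

#[derive(Debug, Clone, PartialEq)]
struct Field {
    name: String,
    field_id: Option<i32>,
    data_type: DataType,
}

/// Rewrite every INT96 timestamp to the unit the Iceberg schema requests,
/// recursing into structs, lists and maps. With no field ID (or, as an
/// assumption of this sketch, an ID not found in the table schema) it falls
/// back to microseconds.
fn coerce(field: &Field, iceberg: &HashMap<i32, IcebergTimestamp>) -> Field {
    let data_type = match &field.data_type {
        DataType::Int96Timestamp(_) => {
            let unit = field
                .field_id
                .and_then(|id| iceberg.get(&id))
                .map(|t| t.unit())
                .unwrap_or(TimeUnit::Microsecond);
            DataType::Int96Timestamp(unit)
        }
        DataType::Struct(children) => {
            DataType::Struct(children.iter().map(|c| coerce(c, iceberg)).collect())
        }
        DataType::List(element) => DataType::List(Box::new(coerce(element, iceberg))),
        DataType::Map { key, value } => DataType::Map {
            key: Box::new(coerce(key, iceberg)),
            value: Box::new(coerce(value, iceberg)),
        },
        DataType::Other => DataType::Other,
    };
    Field {
        name: field.name.clone(),
        field_id: field.field_id,
        data_type,
    }
}
```

In this model, a `timestamp_ns` field keeps nanoseconds, while a `timestamptz` map value or an ID-less list element is coerced to microseconds, no matter how deeply it is nested.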
## Are these changes tested?
- `test_read_int96_timestamps_with_field_ids` — files with embedded
field IDs (branch 1)
- `test_read_int96_timestamps_without_field_ids` — migrated files
without field IDs (branches 2/3)
- `test_read_int96_timestamps_in_struct` — INT96 inside a struct field
- `test_read_int96_timestamps_in_list` — INT96 inside a list field
(3-level Parquet LIST encoding)
- `test_read_int96_timestamps_in_map` — INT96 as map values
- All tests use dates outside the range representable as i64 nanoseconds
(~1677-2262) to confirm that these values no longer overflow
- [Apache DataFusion Comet](https://github.com/apache/datafusion-comet)
used the repro test from apache/datafusion-comet#3856, which passes with
this change (see apache/datafusion-comet#3857)
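Why the tests pick dates outside ~1677-2262: a signed 64-bit count of nanoseconds spans only about ±292 years around 1970, while a count of microseconds spans about ±292,000 years. The std-only Rust sketch below decodes an INT96 value, assuming the usual legacy layout (8-byte little-endian nanoseconds within the day, followed by a 4-byte little-endian Julian day number, with 1970-01-01 at Julian day 2,440,588). It uses checked arithmetic so that an out-of-range result surfaces as `None`. The helper names are hypothetical, and this is not the arrow-rs decoder.

```rust
/// Julian day number of 1970-01-01 (the Unix epoch).
const JULIAN_DAY_OF_EPOCH: i64 = 2_440_588;
const NANOS_PER_DAY: i64 = 86_400_000_000_000;
const MICROS_PER_DAY: i64 = 86_400_000_000;

/// Split a 12-byte INT96 value into (julian_day, nanos_of_day).
/// Assumed layout: bytes 0..8 = nanoseconds within the day (LE i64),
/// bytes 8..12 = Julian day number (LE i32).
fn split_int96(raw: [u8; 12]) -> (i64, i64) {
    let mut nanos_bytes = [0u8; 8];
    nanos_bytes.copy_from_slice(&raw[0..8]);
    let mut day_bytes = [0u8; 4];
    day_bytes.copy_from_slice(&raw[8..12]);
    let nanos_of_day = i64::from_le_bytes(nanos_bytes);
    let julian_day = i64::from(i32::from_le_bytes(day_bytes));
    (julian_day, nanos_of_day)
}

/// Microseconds since the epoch (sub-microsecond digits are truncated).
fn int96_to_micros(raw: [u8; 12]) -> Option<i64> {
    let (julian_day, nanos_of_day) = split_int96(raw);
    (julian_day - JULIAN_DAY_OF_EPOCH)
        .checked_mul(MICROS_PER_DAY)?
        .checked_add(nanos_of_day / 1_000)
}

/// Nanoseconds since the epoch; `None` when the result does not fit in i64.
fn int96_to_nanos(raw: [u8; 12]) -> Option<i64> {
    let (julian_day, nanos_of_day) = split_int96(raw);
    (julian_day - JULIAN_DAY_OF_EPOCH)
        .checked_mul(NANOS_PER_DAY)?
        .checked_add(nanos_of_day)
}

/// Build an INT96 value from its two parts (handy for tests).
fn make_int96(julian_day: i32, nanos_of_day: i64) -> [u8; 12] {
    let mut raw = [0u8; 12];
    raw[0..8].copy_from_slice(&nanos_of_day.to_le_bytes());
    raw[8..12].copy_from_slice(&julian_day.to_le_bytes());
    raw
}
```

For a value 110,000 days before the epoch (late 1668), `int96_to_nanos` returns `None` while `int96_to_micros` still succeeds. Reading at microsecond resolution only gives up sub-microsecond digits, which the Iceberg `timestamp`/`timestamptz` types do not carry anyway.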
(cherry picked from commit a2f067d)

Parent: 176eb19. 4 files changed, 1158 additions and 1 deletion.