[SPARK-57828][SQL] Add vectorized Parquet reader support for nanosecond-precision timestamps #56942

Open
yadavay-amzn wants to merge 1 commit into apache:master from yadavay-amzn:SPARK-57828

What changes were proposed in this pull request?

Adds vectorized Parquet reader support for nanosecond-precision timestamp columns (INT64 TIMESTAMP(NANOS)). A new TimestampNanosUpdater in ParquetVectorUpdaterFactory decomposes epoch-nanoseconds into the two-child column-vector representation (epochMicros: Long, nanosWithinMicro: Short). The decomposition matches DateTimeUtils.epochNanosToTimestampNanos and the existing row-based ParquetRowConverter exactly, including on the dictionary-decoding path. Previously these columns forced the row-based reader (isBatchReadSupported=false, otherwise SchemaColumnConvertNotSupportedException).
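
For readers unfamiliar with the two-child layout, the decomposition can be illustrated with a minimal standalone sketch. This is not the PR's code: the class and method names below are hypothetical, and it assumes floor semantics (so that nanosWithinMicro stays in the range 0..999 for pre-epoch values), which is a guess about how DateTimeUtils.epochNanosToTimestampNanos behaves rather than something stated in this description.

```java
// Hypothetical illustration only; not the actual TimestampNanosUpdater.
public final class NanosDecomposeSketch {
    static final long NANOS_PER_MICRO = 1_000L;

    // Whole microseconds since the epoch, rounded toward negative infinity
    // (assumed semantics) so pre-epoch values stay consistent.
    static long epochMicros(long epochNanos) {
        return Math.floorDiv(epochNanos, NANOS_PER_MICRO);
    }

    // Remaining nanoseconds inside that microsecond; always in [0, 999]
    // under floor semantics, so it fits in a short.
    static short nanosWithinMicro(long epochNanos) {
        return (short) Math.floorMod(epochNanos, NANOS_PER_MICRO);
    }

    public static void main(String[] args) {
        // A positive value, a pre-epoch value, and an exact second.
        long[] samples = {1_500L, -1L, 1_000_000_000L};
        for (long n : samples) {
            long micros = epochMicros(n);
            short nanos = nanosWithinMicro(n);
            // Recombining the two parts must give back the original value.
            long roundTrip = micros * NANOS_PER_MICRO + nanos;
            System.out.println(n + " -> (" + micros + ", " + nanos + "), round trip " + roundTrip);
        }
    }
}
```

Under these assumptions, -1 ns maps to (-1, 999) rather than (0, -1). Handling pre-epoch values correctly is the main reason to prefer floor division over Java's truncating `/` and `%` operators.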

Why are the changes needed?

Part of nanosecond-precision timestamp support (SPARK-56822). The vectorized reader is the default fast path; nanosecond-precision timestamp columns should be readable through it - producing identical results to the row-based reader - rather than falling back to row-by-row reads.

Does this PR introduce any user-facing change?

Yes - nanosecond-precision timestamp columns in Parquet are now read via the vectorized reader (a performance improvement); results are identical to the row-based reader.

How was this patch tested?

ParquetTimestampNanosSuite was updated to run under both readers (withAllParquetReaders). It also gains a new vectorized-vs-row-based parity test over edge-case nanosecond values (positive, pre-epoch, exact-second, nulls) at precisions 7/8/9 for both NTZ and LTZ, and a rebase-mode-invariance test. All 11 tests pass; the ParquetIOSuite nanosecond tests pass; scalastyle and checkstyle are clean.

Was this patch authored or co-authored using generative AI tooling?

Authored with assistance by Claude Opus 4.8.
