[SPARK-57828][SQL] Add vectorized Parquet reader support for nanosecond-precision timestamps by yadavay-amzn · Pull Request #56942 · apache/spark

yadavay-amzn · 2026-07-01T20:32:03Z

What changes were proposed in this pull request?

Adds vectorized Parquet reader support for nanosecond-precision timestamp columns (INT64 TIMESTAMP(NANOS)). A new TimestampNanosUpdater in ParquetVectorUpdaterFactory decomposes epoch nanoseconds into the two-child column-vector representation (epochMicros: Long, nanosWithinMicro: Short). It matches DateTimeUtils.epochNanosToTimestampNanos and the existing row-based ParquetRowConverter exactly, including on the dictionary-decoding path. Previously, these columns forced a fallback to the row-based reader: isBatchReadSupported was false, because the vectorized path would otherwise throw SchemaColumnConvertNotSupportedException.
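The per-value split can be sketched in Java as follows. This is a hypothetical illustration, not the code in this PR. The class and method names are invented. The sketch also assumes the split uses floor division, so that nanosWithinMicro stays in [0, 999] for pre-epoch values. Whether that matches DateTimeUtils.epochNanosToTimestampNanos exactly should be checked against the actual source. The dictionary method shows the idea that the dictionary path applies the same split to looked-up values. It is not Parquet's real dictionary API.

```java
// Hypothetical sketch: split INT64 TIMESTAMP(NANOS) values into (epochMicros, nanosWithinMicro).
// Assumes floor semantics; not the actual TimestampNanosUpdater implementation.
public final class TimestampNanosSplit {
    private static final long NANOS_PER_MICRO = 1_000L;

    private TimestampNanosSplit() {}

    // Whole microseconds since the epoch, rounded toward negative infinity so that
    // pre-epoch values keep a non-negative remainder.
    public static long epochMicros(long epochNanos) {
        return Math.floorDiv(epochNanos, NANOS_PER_MICRO);
    }

    // Nanoseconds left over within that microsecond; always in [0, 999], so it fits a short.
    public static short nanosWithinMicro(long epochNanos) {
        return (short) Math.floorMod(epochNanos, NANOS_PER_MICRO);
    }

    // Hypothetical dictionary-decoded path: look up each id's INT64 value, then split it
    // exactly like the plain path so both paths fill the two children identically.
    public static void decodeDictionary(long[] dictionary, int[] ids,
                                        long[] microsOut, short[] nanosOut) {
        for (int i = 0; i < ids.length; i++) {
            long epochNanos = dictionary[ids[i]];
            microsOut[i] = epochMicros(epochNanos);
            nanosOut[i] = nanosWithinMicro(epochNanos);
        }
    }

    public static void main(String[] args) {
        // Positive, pre-epoch, and exact-second sample values.
        long[] samples = {1_234_567_891L, -1L, 1_000_000_000L};
        for (long n : samples) {
            System.out.println(n + " -> (" + epochMicros(n) + ", " + nanosWithinMicro(n) + ")");
        }
    }
}
```

Running main prints `1234567891 -> (1234567, 891)`, `-1 -> (-1, 999)` and `1000000000 -> (1000000, 0)`. Math.floorDiv and Math.floorMod are used instead of `/` and `%` because truncating division would map -1 ns to (0, -1), giving a negative remainder for pre-epoch timestamps.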

Why are the changes needed?

Part of nanosecond-precision timestamp support (SPARK-56822). The vectorized reader is the default fast path. Nanosecond-precision timestamp columns should be readable through it, producing results identical to the row-based reader, rather than falling back to row-by-row reads.

Does this PR introduce any user-facing change?

Yes. Nanosecond-precision timestamp columns in Parquet are now read by the vectorized reader, which improves performance. Results are identical to the row-based reader.

How was this patch tested?

ParquetTimestampNanosSuite now runs under both readers (withAllParquetReaders). The PR also adds a vectorized-vs-row-based parity test over edge-case nanosecond values (positive, pre-epoch, exact-second, nulls) at precisions 7, 8 and 9 for both NTZ and LTZ, and a rebase-mode-invariance test. All 11 tests pass, the nanosecond tests in ParquetIOSuite pass, and scalastyle and checkstyle report no violations.

Was this patch authored or co-authored using generative AI tooling?

Authored with assistance from Claude Opus 4.8.

Commit 2783a1e: [SPARK-57828][SQL] Add vectorized Parquet reader support for nanosecond-precision timestamps

yadavay-amzn wants to merge 1 commit into apache:master from yadavay-amzn:SPARK-57828
