[SPARK-57160][CONNECT] Add Spark Connect protocol support for nanosecond-capable timestamp types and literals by MaxGekk · Pull Request #56909 · apache/spark

MaxGekk · 2026-06-30T20:48:52Z

What changes were proposed in this pull request?

This PR adds the Spark Connect protocol surface for nanosecond timestamps so they can travel over the wire, both as types and as literals. There is no behavior change yet -- the converters that consume these messages land in follow-up sub-tasks of SPARK-56822.

types.proto: two new data-type kinds, TimestampNTZNanos and TimestampLTZNanos, each with an optional precision (7..9).
expressions.proto: matching literal arms that carry the value as epoch_micros + nanos_within_micro (0..999) plus an optional precision. Two components are used instead of a single int64 of nanoseconds because a signed 64-bit count of nanoseconds since the epoch spans only about 292 years on either side of 1970 (roughly 1677 to 2262), so it cannot cover the full 0001..9999 year range; this mirrors the Catalyst value TimestampNanosVal.
Regenerated the Python stubs under python/pyspark/sql/connect/proto/.

NTZ and LTZ are kept as separate kinds/arms (like timestamp vs timestamp_ntz), and non-negative fields use uint32.
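
As a rough illustration of the two-component encoding described above, the sketch below splits a nanoseconds-since-epoch integer into epoch_micros and nanos_within_micro and joins them back. It is not the generated stub or the Catalyst code: NanosTimestampParts, split_epoch_nanos, and join_epoch_nanos are hypothetical names, and the floor-division treatment of pre-1970 values is an assumption inferred from nanos_within_micro being non-negative.

```python
from typing import NamedTuple, Optional

NANOS_PER_MICRO = 1_000
# Precision bounds and default as stated in this PR (7..9, default 9).
MIN_PRECISION, MAX_PRECISION, DEFAULT_PRECISION = 7, 9, 9


class NanosTimestampParts(NamedTuple):
    """Hypothetical stand-in for the literal arm's fields, not the real proto class."""
    epoch_micros: int        # microseconds since 1970-01-01T00:00:00Z (may be negative)
    nanos_within_micro: int  # extra nanoseconds inside that microsecond, 0..999
    precision: int           # fractional-second digits, 7..9


def split_epoch_nanos(epoch_nanos: int, precision: Optional[int] = None) -> NanosTimestampParts:
    """Split nanoseconds-since-epoch into the two wire components."""
    p = DEFAULT_PRECISION if precision is None else precision
    if not MIN_PRECISION <= p <= MAX_PRECISION:
        raise ValueError(f"precision must be in [{MIN_PRECISION}, {MAX_PRECISION}], got {p}")
    # divmod floors toward negative infinity, so the remainder stays in 0..999
    # even for instants before the epoch (assumed to match the intended semantics).
    micros, nanos = divmod(epoch_nanos, NANOS_PER_MICRO)
    return NanosTimestampParts(micros, nanos, p)


def join_epoch_nanos(parts: NanosTimestampParts) -> int:
    """Recombine the two components into nanoseconds-since-epoch."""
    if not 0 <= parts.nanos_within_micro < NANOS_PER_MICRO:
        raise ValueError("nanos_within_micro must be in [0, 999]")
    return parts.epoch_micros * NANOS_PER_MICRO + parts.nanos_within_micro


parts = split_epoch_nanos(1_700_000_000_123_456_789, precision=7)
before_epoch = split_epoch_nanos(-1)  # one nanosecond before the epoch
```

Python integers are unbounded, so the round trip works here for any input; on the wire, only the components need to fit their fixed-width fields.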

Why are the changes needed?

Today the Connect DataType message has only microsecond timestamp kinds (timestamp, timestamp_ntz) with no precision field, and the Expression.Literal message encodes timestamp literals as a single int64 of microseconds. There is no way to express a nanosecond-capable timestamp type or a sub-microsecond literal over the wire, so no Connect client/server path can carry the new types. The protocol must be extended before any converter, Arrow, or client work can proceed.

Does this PR introduce any user-facing change?

No. This only adds protobuf message definitions. Once the consuming paths are implemented, the new types will remain gated behind spark.sql.timestampNanosTypes.enabled.

How was this patch tested?

buf build / buf lint succeed for the modified protos (field numbers appended, no reuse/renumber).
./dev/connect-gen-protos.sh regenerates the committed Python stubs; ./dev/check-protos.py reports no drift (pyspark-connect and pyspark-streaming: SUCCESS).
build/sbt "connect/testOnly *LiteralExpressionProtoConverterSuite" (44 tests) and build/sbt "connect-client-jvm/testOnly *ColumnNodeToProtoConverterSuite" (18 tests) pass, confirming the additive proto fields do not break existing proto plumbing.

No functional tests in this PR (there are no consumers of the new fields yet); behavior is covered by the converter and end-to-end sub-tasks.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor (Claude Opus 4.8)

…ond-capable timestamp types and literals

Extend the Spark Connect protobuf protocol to represent TimestampNTZNanos(p) and TimestampLTZNanos(p) (p in [7, 9]) both as data types and as literal values, and regenerate the Python stubs.

yadavay-amzn

LGTM with a small nit

Clean, well-scoped protocol addition - field numbers are allocated correctly and the encoding faithfully mirrors the Catalyst value type.

Nit: the DataType TimestampNTZNanos/TimestampLTZNanos messages don't state the omitted-precision default, while the literal arms do ("defaults to 9"). Since the type defaults to 9 too (TimestampNTZNanosType.apply()), mirroring that one-liner would keep them symmetric.

What I verified:

Field numbers append past the literal oneof's reserved 27, 28 (Geometry/Geography) and the DataType oneof's used range - additive and wire-compatible, no reuse or renumber.
epoch_micros + nanos_within_micro in [0, 999] matches Catalyst TimestampNanosVal (epochMicros, MAX_NANOS_WITHIN_MICRO = 999); the two-component encoding is consistent with the stated reason that a single int64 of nanos can't span the year range.
Precision 7/8/9 and default 9 match Timestamp{NTZ,LTZ}NanosType (MIN_PRECISION/MAX_PRECISION/DEFAULT_PRECISION).
NTZ and LTZ as separate kinds/arms is consistent with the existing timestamp vs timestamp_ntz split.
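
The range argument can be checked with a little arithmetic. The sketch below uses only the Python standard library (proleptic Gregorian calendar, UTC) and is an illustration rather than code from the PR; the helper from_epoch_micros is a hypothetical name.

```python
from datetime import datetime, timedelta, timezone

INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
ONE_MICRO = timedelta(microseconds=1)


def from_epoch_micros(micros: int) -> datetime:
    """Convert microseconds since the epoch to a UTC datetime."""
    return EPOCH + timedelta(microseconds=micros)


# Earliest and latest instants an int64 count of *nanoseconds* can reach,
# rounded to microseconds because datetime has microsecond resolution.
nanos_lo = from_epoch_micros(INT64_MIN // 1_000)
nanos_hi = from_epoch_micros(INT64_MAX // 1_000)

# Microsecond offsets of the ends of the 0001..9999 year range.
min_micros = (datetime(1, 1, 1, tzinfo=timezone.utc) - EPOCH) // ONE_MICRO
max_micros = (datetime(9999, 12, 31, 23, 59, 59, 999999, tzinfo=timezone.utc) - EPOCH) // ONE_MICRO

print(nanos_lo.year, nanos_hi.year)                  # int64 nanoseconds: years 1677 to 2262
print(INT64_MIN <= min_micros <= max_micros <= INT64_MAX)  # int64 microseconds cover the full range
print(min_micros * 1_000 < INT64_MIN, max_micros * 1_000 > INT64_MAX)  # nanoseconds overflow at both ends
```

Under these assumptions, an int64 of microseconds plus a small 0..999 remainder covers the whole range at nanosecond resolution, which is the property the two-component encoding relies on.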

yadavay-amzn approved these changes Jun 30, 2026

MaxGekk wants to merge 1 commit into apache:master from MaxGekk:nanos-proto
