Commit bc72d93

[SPARK-57160][CONNECT] Add Spark Connect protocol support for nanosecond-capable timestamp types and literals
### What changes were proposed in this pull request?

This PR adds the Spark Connect protocol surface for nanosecond timestamps so they can travel over the wire, both as types and as literals. There is no behavior change yet -- the converters that consume these messages land in follow-up sub-tasks of SPARK-56822.

- `types.proto`: two new data-type kinds, `TimestampNTZNanos` and `TimestampLTZNanos`, each with an optional `precision` (7..9).
- `expressions.proto`: matching literal arms that carry the value as `epoch_micros` + `nanos_within_micro` (0..999) plus an optional `precision`. Two components are used instead of a single int64 of nanoseconds because nanoseconds-since-epoch cannot cover the full `0001..9999` year range; this mirrors the Catalyst value `TimestampNanosVal`.
- Regenerated the Python stubs under `python/pyspark/sql/connect/proto/`.

NTZ and LTZ are kept as separate kinds/arms (like `timestamp` vs `timestamp_ntz`), and non-negative fields use `uint32`.

### Why are the changes needed?

Today the Connect `DataType` message has only microsecond timestamp kinds (`timestamp`, `timestamp_ntz`) with no precision field, and the `Expression.Literal` message encodes timestamp literals as a single int64 of microseconds. There is no way to express a nanosecond-capable timestamp type or a sub-microsecond literal over the wire, so no Connect client/server path can carry the new types. The protocol must be extended before any converter, Arrow, or client work can proceed.

### Does this PR introduce _any_ user-facing change?

No. This only adds protobuf message definitions; the new types remain gated behind `spark.sql.timestampNanosTypes.enabled` once the consuming paths are implemented.

### How was this patch tested?

- `buf build` / `buf lint` succeed for the modified protos (field numbers appended, no reuse/renumber).
- `./dev/connect-gen-protos.sh` regenerates the committed Python stubs; `./dev/check-protos.py` reports no drift (pyspark-connect and pyspark-streaming: SUCCESS).
- `build/sbt "connect/testOnly *LiteralExpressionProtoConverterSuite"` (44 tests) and `build/sbt "connect-client-jvm/testOnly *ColumnNodeToProtoConverterSuite"` (18 tests) pass, confirming the additive proto fields do not break existing proto plumbing.

No functional tests in this PR (there are no consumers of the new fields yet); behavior is covered by the converter and end-to-end sub-tasks.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor (Claude Opus 4.8)

Closes #56909 from MaxGekk/nanos-proto.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
1 parent df728ff commit bc72d93
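
The two-component encoding described in the commit message can be illustrated with a small standalone sketch. It uses only the Python standard library rather than the generated stubs, and the helper names (`split_epoch_nanos`, `join_epoch_nanos`, `micros_since_epoch`) are hypothetical and not part of Spark. The intent is only to show why microseconds fit in an int64 across years `0001..9999` while nanoseconds do not, and how a nanosecond count might be split into `epoch_micros` and `nanos_within_micro`.

```python
from datetime import datetime, timezone
from typing import Tuple

INT64_MAX = 2**63 - 1
NANOS_PER_MICRO = 1_000


def split_epoch_nanos(epoch_nanos: int) -> Tuple[int, int]:
    """Split nanoseconds since the epoch into (epoch_micros, nanos_within_micro)."""
    # Floor division keeps the remainder in [0, 999], also for pre-epoch instants.
    return divmod(epoch_nanos, NANOS_PER_MICRO)


def join_epoch_nanos(epoch_micros: int, nanos_within_micro: int) -> int:
    """Inverse of split_epoch_nanos; returns an unbounded Python int."""
    if not 0 <= nanos_within_micro <= 999:
        raise ValueError("nanos_within_micro must be in [0, 999]")
    return epoch_micros * NANOS_PER_MICRO + nanos_within_micro


def micros_since_epoch(dt: datetime) -> int:
    """Exact integer microseconds between the UNIX epoch and an aware datetime."""
    delta = dt - datetime(1970, 1, 1, tzinfo=timezone.utc)
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds


# Last second of year 9999, the upper end of the year range named in the commit message.
max_micros = micros_since_epoch(datetime(9999, 12, 31, 23, 59, 59, tzinfo=timezone.utc))

print(max_micros <= INT64_MAX)                    # True: microseconds fit in int64
print(max_micros * NANOS_PER_MICRO <= INT64_MAX)  # False: nanoseconds overflow int64
print(split_epoch_nanos(-1))                      # (-1, 999): 1 ns before the epoch
```

Using floor division for the split is a design choice of this sketch: it keeps `nanos_within_micro` non-negative, which is consistent with the `[0, 999]` range and the unsigned field type mentioned above.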

6 files changed

Lines changed: 389 additions & 125 deletions

python/pyspark/sql/connect/proto/expressions_pb2.py

Lines changed: 69 additions & 65 deletions

python/pyspark/sql/connect/proto/expressions_pb2.pyi

Lines changed: 114 additions & 0 deletions
@@ -729,6 +729,99 @@ class Expression(google.protobuf.message.Message):
                 self, oneof_group: typing_extensions.Literal["_precision", b"_precision"]
             ) -> typing_extensions.Literal["precision"] | None: ...

+        class TimestampNTZNanos(google.protobuf.message.Message):
+            """A TIMESTAMP_NTZ literal with nanosecond-capable precision. The physical value is carried
+            as microseconds since the UNIX epoch plus the extra nanoseconds within that microsecond,
+            because a single int64 of nanoseconds cannot span the supported year range.
+            """
+
+            DESCRIPTOR: google.protobuf.descriptor.Descriptor
+
+            EPOCH_MICROS_FIELD_NUMBER: builtins.int
+            NANOS_WITHIN_MICRO_FIELD_NUMBER: builtins.int
+            PRECISION_FIELD_NUMBER: builtins.int
+            epoch_micros: builtins.int
+            """Microseconds since the UNIX epoch (without timezone information)."""
+            nanos_within_micro: builtins.int
+            """Additional nanoseconds within epoch_micros, in [0, 999]."""
+            precision: builtins.int
+            """Number of fractional-second digits (7, 8, or 9). If omitted, defaults to 9 (nanoseconds)."""
+            def __init__(
+                self,
+                *,
+                epoch_micros: builtins.int = ...,
+                nanos_within_micro: builtins.int = ...,
+                precision: builtins.int | None = ...,
+            ) -> None: ...
+            def HasField(
+                self,
+                field_name: typing_extensions.Literal[
+                    "_precision", b"_precision", "precision", b"precision"
+                ],
+            ) -> builtins.bool: ...
+            def ClearField(
+                self,
+                field_name: typing_extensions.Literal[
+                    "_precision",
+                    b"_precision",
+                    "epoch_micros",
+                    b"epoch_micros",
+                    "nanos_within_micro",
+                    b"nanos_within_micro",
+                    "precision",
+                    b"precision",
+                ],
+            ) -> None: ...
+            def WhichOneof(
+                self, oneof_group: typing_extensions.Literal["_precision", b"_precision"]
+            ) -> typing_extensions.Literal["precision"] | None: ...
+
+        class TimestampLTZNanos(google.protobuf.message.Message):
+            """A TIMESTAMP_LTZ literal with nanosecond-capable precision. See TimestampNTZNanos for the
+            rationale behind the two-component physical value.
+            """
+
+            DESCRIPTOR: google.protobuf.descriptor.Descriptor
+
+            EPOCH_MICROS_FIELD_NUMBER: builtins.int
+            NANOS_WITHIN_MICRO_FIELD_NUMBER: builtins.int
+            PRECISION_FIELD_NUMBER: builtins.int
+            epoch_micros: builtins.int
+            """Microseconds since the UNIX epoch."""
+            nanos_within_micro: builtins.int
+            """Additional nanoseconds within epoch_micros, in [0, 999]."""
+            precision: builtins.int
+            """Number of fractional-second digits (7, 8, or 9). If omitted, defaults to 9 (nanoseconds)."""
+            def __init__(
+                self,
+                *,
+                epoch_micros: builtins.int = ...,
+                nanos_within_micro: builtins.int = ...,
+                precision: builtins.int | None = ...,
+            ) -> None: ...
+            def HasField(
+                self,
+                field_name: typing_extensions.Literal[
+                    "_precision", b"_precision", "precision", b"precision"
+                ],
+            ) -> builtins.bool: ...
+            def ClearField(
+                self,
+                field_name: typing_extensions.Literal[
+                    "_precision",
+                    b"_precision",
+                    "epoch_micros",
+                    b"epoch_micros",
+                    "nanos_within_micro",
+                    b"nanos_within_micro",
+                    "precision",
+                    b"precision",
+                ],
+            ) -> None: ...
+            def WhichOneof(
+                self, oneof_group: typing_extensions.Literal["_precision", b"_precision"]
+            ) -> typing_extensions.Literal["precision"] | None: ...
+
         NULL_FIELD_NUMBER: builtins.int
         BINARY_FIELD_NUMBER: builtins.int
         BOOLEAN_FIELD_NUMBER: builtins.int
@@ -751,6 +844,8 @@ class Expression(google.protobuf.message.Message):
         STRUCT_FIELD_NUMBER: builtins.int
         SPECIALIZED_ARRAY_FIELD_NUMBER: builtins.int
         TIME_FIELD_NUMBER: builtins.int
+        TIMESTAMP_NTZ_NANOS_FIELD_NUMBER: builtins.int
+        TIMESTAMP_LTZ_NANOS_FIELD_NUMBER: builtins.int
         DATA_TYPE_FIELD_NUMBER: builtins.int
         @property
         def null(self) -> pyspark.sql.connect.proto.types_pb2.DataType: ...
@@ -786,6 +881,13 @@ class Expression(google.protobuf.message.Message):
         @property
         def time(self) -> global___Expression.Literal.Time: ...
         @property
+        def timestamp_ntz_nanos(self) -> global___Expression.Literal.TimestampNTZNanos:
+            """Nanosecond-capable timestamp literals (precision 7..9). NTZ and LTZ are distinct
+            arms so the literal kind is self-describing.
+            """
+        @property
+        def timestamp_ltz_nanos(self) -> global___Expression.Literal.TimestampLTZNanos: ...
+        @property
         def data_type(self) -> pyspark.sql.connect.proto.types_pb2.DataType:
             """Data type information for the literal.
             This field is required only in the root literal message for null values or
@@ -818,6 +920,8 @@ class Expression(google.protobuf.message.Message):
             struct: global___Expression.Literal.Struct | None = ...,
             specialized_array: global___Expression.Literal.SpecializedArray | None = ...,
             time: global___Expression.Literal.Time | None = ...,
+            timestamp_ntz_nanos: global___Expression.Literal.TimestampNTZNanos | None = ...,
+            timestamp_ltz_nanos: global___Expression.Literal.TimestampLTZNanos | None = ...,
             data_type: pyspark.sql.connect.proto.types_pb2.DataType | None = ...,
         ) -> None: ...
         def HasField(
@@ -867,8 +971,12 @@ class Expression(google.protobuf.message.Message):
                 b"time",
                 "timestamp",
                 b"timestamp",
+                "timestamp_ltz_nanos",
+                b"timestamp_ltz_nanos",
                 "timestamp_ntz",
                 b"timestamp_ntz",
+                "timestamp_ntz_nanos",
+                b"timestamp_ntz_nanos",
                 "year_month_interval",
                 b"year_month_interval",
             ],
@@ -920,8 +1028,12 @@ class Expression(google.protobuf.message.Message):
                 b"time",
                 "timestamp",
                 b"timestamp",
+                "timestamp_ltz_nanos",
+                b"timestamp_ltz_nanos",
                 "timestamp_ntz",
                 b"timestamp_ntz",
+                "timestamp_ntz_nanos",
+                b"timestamp_ntz_nanos",
                 "year_month_interval",
                 b"year_month_interval",
             ],
@@ -952,6 +1064,8 @@ class Expression(google.protobuf.message.Message):
                 "struct",
                 "specialized_array",
                 "time",
+                "timestamp_ntz_nanos",
+                "timestamp_ltz_nanos",
             ]
             | None
         ): ...
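
The stub above documents the value ranges only in docstrings (`nanos_within_micro` in `[0, 999]`, `precision` one of 7, 8, or 9, defaulting to 9 when omitted). As a rough illustration of those documented constraints, the following sketch models the three fields with a plain dataclass. `NanosTimestampFields` and `effective_precision` are hypothetical names invented for this example; they are not part of the generated stubs, and where Spark actually performs such checks is not stated in this commit.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_PRECISION = 9  # per the stub docstring: "If omitted, defaults to 9 (nanoseconds)"


@dataclass(frozen=True)
class NanosTimestampFields:
    """Hypothetical stand-in for the fields of TimestampNTZNanos / TimestampLTZNanos."""

    epoch_micros: int
    nanos_within_micro: int = 0
    precision: Optional[int] = None  # None models an unset optional proto field

    def __post_init__(self) -> None:
        # Ranges taken from the docstrings in the generated stub.
        if not 0 <= self.nanos_within_micro <= 999:
            raise ValueError("nanos_within_micro must be in [0, 999]")
        if self.precision is not None and self.precision not in (7, 8, 9):
            raise ValueError("precision must be 7, 8, or 9")

    @property
    def effective_precision(self) -> int:
        """Precision a reader would assume, applying the documented default."""
        return DEFAULT_PRECISION if self.precision is None else self.precision


literal = NanosTimestampFields(epoch_micros=1_700_000_000_000_000, nanos_within_micro=123)
print(literal.effective_precision)  # 9, because precision was left unset
```

In the generated stubs, the analogous presence check would go through `HasField("precision")`, which the `.pyi` signatures above list as a valid field name.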
