[SPARK-57456][SQL] Support nanosecond-precision timestamp types in the JSON datasource (v1 and v2) #56865
Open
MaxGekk wants to merge 2 commits into
uros-b (Member) approved these changes on Jun 30, 2026 and left a comment:
LGTM, thank you @MaxGekk and @dongjoon-hyun!
…e JSON datasource (v1 and v2)

Umbrella: SPARK-56822 (Timestamps with nanosecond precision).

This PR adds read and write support for the nanosecond-capable timestamp types `TIMESTAMP_NTZ(p)` and `TIMESTAMP_LTZ(p)` (`p` in 7-9) to the JSON datasource (v1 `JsonFileFormat` and v2 `JsonTable`), reaching parity with the microsecond `TimestampType` / `TimestampNTZType`, and removes the SPARK-57166 rejection guardrail.

- `JacksonParser`: adds `TimestampLTZNanosType` / `TimestampNTZNanosType` read cases that delegate to `parseNanos` / `parseWithoutTimeZoneNanos` with the column precision.
- `JacksonGenerator`: adds the corresponding write cases that delegate to `formatNanos` / `formatWithoutTimeZoneNanos`.
- `JsonFileFormat` (v1) and `JsonTable` (v2): drop the `AnyTimestampNanoType` rejection so the types are accepted by the read/write paths.

Schema inference (`JsonInferSchema`) keeps inferring microsecond types by default; nanos are reached only via an explicit user schema.

The existing `timestampFormat` / `timestampNTZFormat` options drive the nanos path (no new options): the type carries the precision and the count of `S` letters in the pattern controls the fractional digits emitted on write.

Under the LEGACY time parser policy the legacy LTZ formatter cannot represent sub-microsecond digits, so nanos are rejected with `TIMESTAMP_NANOS_WITH_LEGACY_TIME_PARSER`.

JSON rejected nanos timestamp types in its datasource capability checks and lacked the conversions to round-trip them, so these columns could not be written or read through JSON.

Yes. With `spark.sql.timestampNanosTypes.enabled=true`, columns of type `TIMESTAMP_NTZ(7-9)` / `TIMESTAMP_LTZ(7-9)` can now be written to and read from JSON, and parsed/generated by `from_json` / `to_json`. This is a change within the unreleased master/branch only.

- `JsonExpressionsSuite`: `JsonToStructs` nanos parsing.
- `JsonFunctionsSuite`: flipped the existing `from_json` nanos test to assert success + truncation; added `to_json` and `to_json` / `from_json` round-trip tests.
- `FileBasedDataSourceSuite`: removed JSON from the SPARK-57166 rejection list; added v1/v2 round-trip, nested struct/map/array round-trip, and a LEGACY-policy rejection test.
- `JsonSuite`: `DataFrameReader.json(Dataset[String])` read, custom-schema round-trip, and a mixed microsecond/nanosecond schema round-trip (run under v1, v2, legacy, and unsafe-row variants).

Generated-by: Cursor 2.1, Claude Opus 4.8
… accept only string input The numeric-epoch shorthand (a JSON integer parsed as epoch seconds) is legacy TimestampType behavior and is intentionally not carried over to the nanosecond timestamp types, which accept only string input. Co-authored-by: Isaac
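The string-only rule in this commit message can be pictured with a small standalone sketch. It is not Spark's `JacksonParser`: the class `StringOnlyNanosSketch` and the helper `parseNanosValue` are hypothetical names, the JSON scalar is modeled as a plain `Object`, and throwing an exception is an assumption made for illustration (the commit message does not say how Spark reports the rejection).

```java
import java.time.LocalDateTime;

final class StringOnlyNanosSketch {
    // Hypothetical helper, not Spark code: takes an already-decoded JSON scalar
    // and parses it only when it is a string. Numbers are rejected instead of
    // being read as epoch seconds.
    static LocalDateTime parseNanosValue(Object jsonValue) {
        if (jsonValue instanceof String) {
            // ISO-8601 local date-time with up to 9 fractional digits.
            return LocalDateTime.parse((String) jsonValue);
        }
        throw new IllegalArgumentException(
            "nanosecond timestamps accept only string input, got: " + jsonValue);
    }

    public static void main(String[] args) {
        System.out.println(parseNanosValue("2026-06-30T12:34:56.123456789").getNano());
        try {
            parseNanosValue(1782822896L); // a JSON integer: no epoch-seconds shorthand
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```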
What changes were proposed in this pull request?
Umbrella: SPARK-56822 (Timestamps with nanosecond precision).
This PR adds read and write support for the nanosecond-capable timestamp types `TIMESTAMP_NTZ(p)` and `TIMESTAMP_LTZ(p)` (`p` in 7-9) to the JSON datasource, for both the v1 (`JsonFileFormat`) and v2 (`JsonTable`) paths, reaching parity with the microsecond `TimestampType` / `TimestampNTZType`, and removes the SPARK-57166 rejection guardrail.

Specifically:
- `JacksonParser`: adds `TimestampLTZNanosType` / `TimestampNTZNanosType` read cases that delegate to the existing `parseNanos` / `parseWithoutTimeZoneNanos` formatter methods with the column precision.
- `JacksonGenerator`: adds the corresponding write cases that delegate to `formatNanos` / `formatWithoutTimeZoneNanos`.
- `JsonFileFormat` (v1) and `JsonTable` (v2): drop the `AnyTimestampNanoType` rejection in `supportDataType` / `supportsDataType`.

Notes:
- Schema inference (`JsonInferSchema`) keeps inferring microsecond `TimestampType` / `TimestampNTZType` by default; nanosecond types are reached only via an explicit user schema.
- The existing `timestampFormat` / `timestampNTZFormat` options drive the nanos path. The column type carries the precision, and the count of `S` letters in the pattern controls how many fractional-second digits are emitted on write (text output needs up to 9 `S` for full precision; reads with the default formatter parse the full fraction and truncate to the declared precision).
- Under the LEGACY time parser policy the legacy LTZ formatter cannot represent sub-microsecond digits, so nanos are rejected with `UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_WITH_LEGACY_TIME_PARSER` (the NTZ formatter always uses the ISO-8601 path).

Why are the changes needed?
JSON rejected nanos timestamp types in its datasource capability checks and lacked the conversions to round-trip them, so these columns could not be written or read through JSON. This extends nanosecond-precision timestamp support (umbrella SPARK-56822) to the JSON datasource, matching the existing microsecond timestamp behavior and the Parquet/ORC/Avro/CSV nanosecond support.
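To make the description's note about `S` letters and read-side truncation concrete, here is a minimal `java.time` sketch. It does not call Spark: the class `FractionDigitsSketch` and the helpers `write` and `readTruncated` are hypothetical, and the patterns are example values rather than Spark defaults. It only illustrates two behaviors the description mentions: the number of `S` letters fixes how many fractional digits are written, and a parsed fraction is truncated (not rounded) to the declared precision `p`.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class FractionDigitsSketch {
    // Formats with the given pattern. The count of 'S' letters decides how many
    // fractional-second digits are emitted; excess digits are truncated.
    static String write(LocalDateTime ts, String pattern) {
        return DateTimeFormatter.ofPattern(pattern).format(ts);
    }

    // Parses an ISO-8601 local date-time (up to 9 fractional digits), then
    // truncates nano-of-second to `precision` digits. Assumes precision is 7-9.
    static LocalDateTime readTruncated(String text, int precision) {
        LocalDateTime parsed = LocalDateTime.parse(text);
        int unit = 1;
        for (int i = precision; i < 9; i++) {
            unit *= 10; // 10^(9 - precision) nanoseconds per retained digit step
        }
        int nanos = parsed.getNano();
        return parsed.withNano(nanos - nanos % unit);
    }

    public static void main(String[] args) {
        LocalDateTime ts = LocalDateTime.of(2026, 6, 30, 12, 34, 56, 123456789);
        // Six 'S' letters: only microsecond digits are written.
        System.out.println(write(ts, "yyyy-MM-dd'T'HH:mm:ss.SSSSSS"));
        // Nine 'S' letters: the full nanosecond fraction is written.
        System.out.println(write(ts, "yyyy-MM-dd'T'HH:mm:ss.SSSSSSSSS"));
        // Reading into a precision-7 column keeps 7 fractional digits.
        System.out.println(readTruncated("2026-06-30T12:34:56.123456789", 7).getNano());
    }
}
```

With the sample value above, the six-letter pattern drops the last three digits on write, which is why a pattern with fewer than nine `S` letters cannot round-trip a `TIMESTAMP_NTZ(9)` value through text.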
Does this PR introduce any user-facing change?
Yes. With `spark.sql.timestampNanosTypes.enabled=true`, columns of type `TIMESTAMP_NTZ(7-9)` / `TIMESTAMP_LTZ(7-9)` can now be written to and read from JSON files, and parsed/generated by `from_json` / `to_json`. Previously such columns were rejected with `UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE`. This is a change within the unreleased master/branch only.

How was this patch tested?
- `JsonExpressionsSuite`: `JsonToStructs` nanosecond parsing at the catalyst expression level.
- `JsonFunctionsSuite`: flipped the existing `from_json` nanosecond test to assert successful parsing and the truncated value (instead of an unsupported-type error); added `to_json` and `to_json` / `from_json` round-trip tests.
- `FileBasedDataSourceSuite`: removed JSON from the SPARK-57166 rejection list; added end-to-end round-trip (precisions 7-9, NTZ and LTZ, v1 and v2), a nested struct/array/map round-trip, and a LEGACY time-parser-policy rejection test (write and read).
- `JsonSuite`: `DataFrameReader.json(Dataset[String])` read, a custom-schema file round-trip, and a mixed microsecond/nanosecond schema round-trip; these run under the `JsonV1Suite`, `JsonV2Suite`, `JsonLegacyTimeParserSuite`, and `JsonUnsafeRowSuite` variants.

Was this patch authored or co-authored using generative AI tooling?
Generated-by: Cursor 2.1, Claude Opus 4.8