[SPARK-57456][SQL] Support nanosecond-precision timestamp types in the JSON datasource (v1 and v2) by MaxGekk · Pull Request #56865 · apache/spark

MaxGekk · 2026-06-29T15:29:03Z

What changes were proposed in this pull request?

Umbrella: SPARK-56822 (Timestamps with nanosecond precision).

This PR adds read and write support for the nanosecond-capable timestamp types TIMESTAMP_NTZ(p) and TIMESTAMP_LTZ(p) (p in 7-9) to the JSON datasource, for both the v1 (JsonFileFormat) and v2 (JsonTable) paths, reaching parity with the microsecond TimestampType / TimestampNTZType, and removes the SPARK-57166 rejection guardrail.

Specifically:

- JacksonParser: adds TimestampLTZNanosType / TimestampNTZNanosType read cases that delegate to the existing parseNanos / parseWithoutTimeZoneNanos formatter methods with the column precision.
- JacksonGenerator: adds the corresponding write cases that delegate to formatNanos / formatWithoutTimeZoneNanos.
- JsonFileFormat (v1) and JsonTable (v2): drop the AnyTimestampNanoType rejection in supportDataType / supportsDataType.

Notes:

- Schema inference (JsonInferSchema) keeps inferring microsecond TimestampType / TimestampNTZType by default; nanosecond types are reached only via an explicit user schema.
- No new options: the existing timestampFormat / timestampNTZFormat options drive the nanos path. The column type carries the precision, and the count of S letters in the pattern controls how many fractional-second digits are emitted on write (text output needs up to 9 S for full precision; reads with the default formatter parse the full fraction and truncate to the declared precision).
- The legacy time parser policy rejects nanos: the legacy LTZ formatter cannot represent sub-microsecond digits, so it raises UNSUPPORTED_FEATURE.TIMESTAMP_NANOS_WITH_LEGACY_TIME_PARSER (the NTZ formatter always uses the ISO-8601 path).
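
The effect of the S count on written text can be illustrated outside Spark. The following is a minimal sketch that uses plain java.time rather than Spark's formatter classes; the class name FractionDigitsSketch and the pattern string are made up for illustration, and it is an assumption that Spark's nanos formatter treats the S letters the same way.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Sketch only: plain java.time, not Spark's formatter classes.
final class FractionDigitsSketch {
    // Formats `ts` with a pattern whose fraction part has `digits` 'S' letters.
    static String format(LocalDateTime ts, int digits) {
        String pattern = "yyyy-MM-dd'T'HH:mm:ss." + "S".repeat(digits);
        return DateTimeFormatter.ofPattern(pattern).format(ts);
    }

    public static void main(String[] args) {
        // A value that carries all nine fractional digits.
        LocalDateTime ts = LocalDateTime.of(2026, 6, 29, 15, 29, 3, 123_456_789);
        for (int digits : new int[] {6, 7, 9}) {
            System.out.println(digits + " x S -> " + format(ts, digits));
        }
    }
}
```

With fewer than nine S letters the trailing digits are dropped from the text (not rounded), which is why a pattern written for microseconds would silently lose the sub-microsecond part on write.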

Why are the changes needed?

JSON rejected nanos timestamp types in its datasource capability checks and lacked the conversions to round-trip them, so these columns could not be written or read through JSON. This extends nanosecond-precision timestamp support (umbrella SPARK-56822) to the JSON datasource, matching the existing microsecond timestamp behavior and the Parquet/ORC/Avro/CSV nanosecond support.

Does this PR introduce any user-facing change?

Yes. With spark.sql.timestampNanosTypes.enabled=true, columns of type TIMESTAMP_NTZ(7-9) / TIMESTAMP_LTZ(7-9) can now be written to and read from JSON files, and parsed/generated by from_json / to_json. Previously such columns were rejected with UNSUPPORTED_DATA_TYPE_FOR_DATASOURCE. This is a change within the unreleased master/branch only.

How was this patch tested?

- JsonExpressionsSuite: JsonToStructs nanosecond parsing at the catalyst expression level.
- JsonFunctionsSuite: flipped the existing from_json nanosecond test to assert successful parsing and the truncated value (instead of an unsupported-type error); added to_json and to_json / from_json round-trip tests.
- FileBasedDataSourceSuite: removed JSON from the SPARK-57166 rejection list; added end-to-end round-trip (precisions 7-9, NTZ and LTZ, v1 and v2), a nested struct/array/map round-trip, and a LEGACY time-parser-policy rejection test (write and read).
- JsonSuite: DataFrameReader.json(Dataset[String]) read, a custom-schema file round-trip, and a mixed microsecond/nanosecond schema round-trip; these run under the JsonV1Suite, JsonV2Suite, JsonLegacyTimeParserSuite, and JsonUnsafeRowSuite variants.
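
The two properties these tests assert, truncation to the declared precision on read and a lossless round-trip, can be modelled without Spark. This is a hedged sketch in plain java.time; NanosRoundTripSketch and its truncate / read / write helpers are hypothetical names and are not the code of the suites listed above.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Sketch only: models truncate-on-read and a round-trip check with java.time.
final class NanosRoundTripSketch {
    // Drops fractional digits beyond `precision` (7, 8 or 9) without rounding.
    static LocalDateTime truncate(LocalDateTime ts, int precision) {
        int unit = 1;
        for (int i = precision; i < 9; i++) {
            unit *= 10;
        }
        return ts.withNano(ts.getNano() - ts.getNano() % unit);
    }

    // Parses an ISO-8601 local timestamp, then truncates it to `precision`.
    static LocalDateTime read(String text, int precision) {
        return truncate(LocalDateTime.parse(text), precision);
    }

    // Writes with nine fraction digits so that no stored digit is lost.
    static String write(LocalDateTime ts) {
        return DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSSSSSSS").format(ts);
    }

    public static void main(String[] args) {
        LocalDateTime p7 = read("2026-06-29T15:29:03.123456789", 7);
        System.out.println(write(p7));
        // Reading the written text back at the same precision yields the same value.
        System.out.println(read(write(p7), 7).equals(p7));
    }
}
```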

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor 2.1, Claude Opus 4.8

dongjoon-hyun

+1, LGTM.

uros-b

LGTM, thank you @MaxGekk and @dongjoon-hyun!

… accept only string input

The numeric-epoch shorthand (a JSON integer parsed as epoch seconds) is legacy TimestampType behavior and is intentionally not carried over to the nanosecond timestamp types, which accept only string input.

Co-authored-by: Isaac
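
As a rough illustration of that rule, the sketch below accepts a decoded JSON scalar and treats only strings as timestamps. parseNanosValue is a hypothetical helper, not a Spark or Jackson API, and signalling the rejection with an exception is an assumption: the commit message does not say how JacksonParser reports non-string input.

```java
import java.time.LocalDateTime;

// Sketch only: `parseNanosValue` is a hypothetical helper, not a Spark API.
final class StringOnlyInputSketch {
    // Accepts a decoded JSON scalar; only String values are treated as timestamps.
    static LocalDateTime parseNanosValue(Object jsonValue) {
        if (jsonValue instanceof String) {
            return LocalDateTime.parse((String) jsonValue);
        }
        // Numbers are rejected here rather than being read as epoch seconds.
        throw new IllegalArgumentException(
            "expected a JSON string for a nanosecond timestamp, got: " + jsonValue);
    }

    public static void main(String[] args) {
        System.out.println(parseNanosValue("2026-06-30T14:43:51.000000001"));
        try {
            parseNanosValue(1782830631L);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```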

MaxGekk · 2026-06-30T14:43:51Z

Merging to master/4.x. Thank you, @dongjoon-hyun and @uros-b for review.

Closes #56865 from MaxGekk/nanos-json-ds.

Authored-by: Maxim Gekk <max.gekk@gmail.com>
Signed-off-by: Max Gekk <max.gekk@gmail.com>
(cherry picked from commit 59fdb3e)
Signed-off-by: Max Gekk <max.gekk@gmail.com>

dongjoon-hyun approved these changes Jun 29, 2026

MaxGekk mentioned this pull request Jun 29, 2026

[SPARK-57458][SQL] Support nanosecond-precision timestamp types in the XML datasource #56854

Closed

MaxGekk requested a review from HyukjinKwon June 29, 2026 22:25

uros-b approved these changes Jun 30, 2026

MaxGekk added 2 commits June 30, 2026 12:38

MaxGekk force-pushed the nanos-json-ds branch from f1316df to 8358a95 on June 30, 2026 10:51

MaxGekk closed this in 59fdb3e Jun 30, 2026

MaxGekk wants to merge 2 commits into apache:master from MaxGekk:nanos-json-ds
