[WIP][SPARK-56981][SQL] Add physical representation and UnsafeRow support for nanosecond timestamps #56059

Open
MaxGekk wants to merge 6 commits into apache:master from MaxGekk:nanos-in-rows

MaxGekk commented May 22, 2026

What changes were proposed in this pull request?

This PR implements the physical row layer for nanosecond-capable timestamp types, as defined in SPARK-56822 SPIP: Timestamps with nanosecond precision.
Logical types TimestampNTZNanosType(p) and TimestampLTZNanosType(p) (p in [7, 9]) were added in #55952 but still mapped to UninitializedPhysicalType, so values could not be stored in or read from InternalRow / UnsafeRow. This change wires up the minimum physical infrastructure that downstream work (casts, Parquet, expressions) can depend on.

SPIP internal representation

Per the SPIP, a value is a composite of:

  • Epoch microseconds (long, 8 bytes) — same proleptic-Gregorian epoch microsecond count as existing microsecond timestamps
  • Nanoseconds within that microsecond (short, 0-999) — sub-micro fractional part, not a full nanosecond offset from epoch
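To make the composite concrete, the sketch below splits a signed count of nanoseconds since 1970-01-01T00:00:00Z into the (epochMicros, nanosWithinMicro) pair and combines it back. The class and method names are hypothetical illustrations, not Spark APIs. Floor division is used so that instants before the epoch still yield nanosWithinMicro in [0, 999]. One consequence of this layout: a single signed 64-bit nanosecond count only spans about 292 years on either side of 1970, whereas epoch microseconds span roughly 292,000 years, which is why the inverse conversion below checks for overflow.

```java
// Hypothetical sketch: convert between a plain epoch-nanosecond count and the
// SPIP composite (epochMicros, nanosWithinMicro). Not Spark code.
final class NanosSplitSketch {
  static final long NANOS_PER_MICRO = 1_000L;

  // floorDiv rounds toward negative infinity, so -1 ns maps to micro -1.
  static long epochMicros(long epochNanos) {
    return Math.floorDiv(epochNanos, NANOS_PER_MICRO);
  }

  // floorMod is always non-negative for a positive divisor: result is in [0, 999].
  static short nanosWithinMicro(long epochNanos) {
    return (short) Math.floorMod(epochNanos, NANOS_PER_MICRO);
  }

  // Inverse conversion; throws ArithmeticException if the instant does not fit
  // in a long nanosecond count (the composite itself has no such limit).
  static long toEpochNanos(long epochMicros, short nanosWithinMicro) {
    return Math.addExact(Math.multiplyExact(epochMicros, NANOS_PER_MICRO), nanosWithinMicro);
  }
}
```

For example, 1,234,567 ns splits into 1,234 µs and 567 ns, and -1 ns splits into -1 µs and 999 ns.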

The logical defaultSize of both types remains 10 bytes (an 8-byte long plus a 2-byte short). In UnsafeRow, values use the same variable-length pattern as CalendarInterval: an 8-byte fixed-length field slot (offset + size) points to a 16-byte, aligned payload (epochMicros + nanosWithinMicro with padding). Because the payload size is fixed, values can be updated in place.
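The slot-plus-payload layout can be sketched with a plain java.nio.ByteBuffer standing in for UnsafeRow memory. This is a minimal sketch, assuming a single-field row with no null bitset, the slot encoded as (offset << 32) | size in the way UnsafeRow encodes its other variable-length fields, and nanosWithinMicro widened to a long in the second half of the payload (the commit log below notes that the payload writer uses putLong for nanos). The class name is hypothetical; this is not Spark's UnsafeWriter.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch of one nanos field in an UnsafeRow-like buffer:
// [8-byte slot: offset << 32 | size][8 bytes epochMicros][8 bytes nanos + padding]
final class NanosSlotSketch {
  static final int SLOT_SIZE = 8;
  static final int PAYLOAD_SIZE = 16;

  static ByteBuffer writeRow(long epochMicros, short nanosWithinMicro) {
    ByteBuffer row = ByteBuffer.allocate(SLOT_SIZE + PAYLOAD_SIZE).order(ByteOrder.nativeOrder());
    long offsetAndSize = ((long) SLOT_SIZE << 32) | PAYLOAD_SIZE;
    row.putLong(0, offsetAndSize);
    row.putLong(SLOT_SIZE, epochMicros);
    // Widened to long: the upper 48 bits act as zero padding up to 16 bytes.
    row.putLong(SLOT_SIZE + 8, nanosWithinMicro);
    return row;
  }

  private static int payloadOffset(ByteBuffer row) {
    return (int) (row.getLong(0) >>> 32);
  }

  static long readEpochMicros(ByteBuffer row) {
    return row.getLong(payloadOffset(row));
  }

  static short readNanosWithinMicro(ByteBuffer row) {
    // Read the full long that was written and keep the low 16 bits.
    return (short) (row.getLong(payloadOffset(row) + 8) & 0xFFFF);
  }

  // The payload has a fixed size, so an update overwrites the same 16 bytes
  // and nothing else in the row has to move.
  static void updateInPlace(ByteBuffer row, long epochMicros, short nanosWithinMicro) {
    int offset = payloadOffset(row);
    row.putLong(offset, epochMicros);
    row.putLong(offset + 8, nanosWithinMicro);
  }
}
```

The fixed 16-byte size is what separates this from, say, strings: a variable-size payload could not be overwritten in place with a value of a different length.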

Implementation summary

  • Unsafe value types (distinct classes for NTZ vs LTZ semantics at the Java layer; shared byte layout):
    • TimestampNTZNanos - physical value for TIMESTAMP_NTZ(p)
    • TimestampLTZNanos - physical value for TIMESTAMP_LTZ(p)
  • Physical types: PhysicalTimestampNTZNanosType, PhysicalTimestampLTZNanosType registered in PhysicalDataType.applyDefault
  • Row access: specialized getters/setters on InternalRow, UnsafeRow, UnsafeArrayData, codegen (CodeGenerator, InterpretedUnsafeProjection, SpecializedGettersReader), and literal validation
  • Columnar: ColumnVector stubs throw SparkUnsupportedOperationException until columnar support is added
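The idea of distinct Java classes for NTZ and LTZ over a shared layout can be illustrated with a small value-class sketch. The names NanosValueSketch, NtzNanosSketch and LtzNanosSketch are hypothetical simplified stand-ins for TimestampNTZNanos and TimestampLTZNanos; the exception type thrown on invalid input is also an assumption. It shows the three properties the unit tests target: equality, hashCode, and validation of nanosWithinMicro in [0, 999].

```java
import java.util.Objects;

// Hypothetical stand-ins: same fields and layout, different Java types.
abstract class NanosValueSketch {
  final long epochMicros;
  final short nanosWithinMicro;

  NanosValueSketch(long epochMicros, short nanosWithinMicro) {
    if (nanosWithinMicro < 0 || nanosWithinMicro > 999) {
      throw new IllegalArgumentException(
          "nanosWithinMicro must be in [0, 999], got " + nanosWithinMicro);
    }
    this.epochMicros = epochMicros;
    this.nanosWithinMicro = nanosWithinMicro;
  }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    // getClass() comparison: an NTZ value never equals an LTZ value,
    // even when both fields match.
    if (o == null || getClass() != o.getClass()) return false;
    NanosValueSketch other = (NanosValueSketch) o;
    return epochMicros == other.epochMicros && nanosWithinMicro == other.nanosWithinMicro;
  }

  @Override
  public int hashCode() {
    return Objects.hash(epochMicros, nanosWithinMicro);
  }
}

final class NtzNanosSketch extends NanosValueSketch {
  NtzNanosSketch(long epochMicros, short nanosWithinMicro) {
    super(epochMicros, nanosWithinMicro);
  }
}

final class LtzNanosSketch extends NanosValueSketch {
  LtzNanosSketch(long epochMicros, short nanosWithinMicro) {
    super(epochMicros, nanosWithinMicro);
  }
}
```

Keeping the types distinct lets the compiler catch accidental mixing of wall-clock (NTZ) and instant (LTZ) semantics, while the shared base keeps the validation and byte layout in one place.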

Why are the changes needed?

Without a concrete physical type and UnsafeRow accessors, any code path that materializes rows containing nanosecond timestamps fails or falls through to UninitializedPhysicalType. This change unblocks the remaining sub-tasks.

Does this PR introduce any user-facing change?

No. Logical types exist but are not yet exposed through SQL; behavior of TimestampType, TimestampNTZType, and microsecond storage is unchanged.

How was this patch tested?

  • unsafe/testOnly *TimestampNanosSuite: unsafe value equality, hashCode, validation (nanosWithinMicro ∈ [0, 999])
  • catalyst/testOnly *TimestampNanosRowSuite: GenericInternalRow and UnsafeRow roundtrips (NTZ + LTZ, null/non-null), codegen and interpreted unsafe projection, literal validation
  • DataTypeSuite: PhysicalDataType is not UninitializedPhysicalType for p in {7, 8, 9}; defaultSize remains 10

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Cursor Auto

MaxGekk added 6 commits May 22, 2026 09:59
Register physical types and UnsafeRow/InternalRow accessors for
TimestampNTZNanosType and TimestampLTZNanosType so values can be stored
and read using the SPIP composite layout (epoch micros + sub-micro nanos).
…anosSuite

Wrap the `assertThrows` calls in `invalidNanosWithinMicroNTZ`/`LTZ` so
each line stays within the 100-character limit enforced by checkstyle.

Co-authored-by: Isaac
…estamps

Route nullable TIMESTAMP_NTZ/LTZ nanos fields through UnsafeWriter.write
instead of setNull8Bytes so the 16-byte variable-length region is reserved
and in-place setTimestampNTZNanos/LTZ remains valid.
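Why a null field still needs its 16-byte region can be shown with a small sketch. It is a hypothetical model, not Spark's writer: writeNull() reserves and zeroes the payload and records a real offset before marking the field null, while writeNullSlotOnly() models the earlier behaviour of only clearing the 8-byte slot. A later in-place setter works in the first case and has nowhere to write in the second.

```java
import java.nio.ByteBuffer;

// Hypothetical model of one nullable nanos field: [slot][16-byte payload].
final class NullableNanosSketch {
  private boolean isNull;
  private final ByteBuffer row = ByteBuffer.allocate(8 + 16);

  // Reserve and zero the payload, record (offset << 32 | size), then mark null.
  void writeNull() {
    row.putLong(8, 0L);
    row.putLong(16, 0L);
    row.putLong(0, (8L << 32) | 16L);
    isNull = true;
  }

  // Models clearing only the 8-byte slot: no offset or size is recorded.
  void writeNullSlotOnly() {
    row.putLong(0, 0L);
    isNull = true;
  }

  // In-place setter: relies on a previously reserved 16-byte region.
  void setInPlace(long epochMicros, short nanosWithinMicro) {
    long offsetAndSize = row.getLong(0);
    int offset = (int) (offsetAndSize >>> 32);
    int size = (int) offsetAndSize;
    if (offset == 0 || size != 16) {
      throw new IllegalStateException("no reserved payload region for in-place update");
    }
    row.putLong(offset, epochMicros);
    row.putLong(offset + 8, nanosWithinMicro);
    isNull = false;
  }

  boolean isNull() {
    return isNull;
  }

  long epochMicros() {
    return row.getLong((int) (row.getLong(0) >>> 32));
  }
}
```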
…ndian safety

writePayload stores nanos via putLong; reading with getShort returned zero on
big-endian. Match the write path with getLong and a 16-bit mask.
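The endianness bug can be reproduced with java.nio.ByteBuffer in place of Spark's memory access layer (an assumption made so the sketch is self-contained). A short widened to a long keeps its value in the low-order bytes. Those bytes come first only on little-endian, so a 2-byte read at the start of the slot returns 0 on big-endian. Reading the full long and masking the low 16 bits is correct for both byte orders.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical reproduction of the write/read mismatch described above.
final class EndianReadSketch {
  // Writer: nanos widened to an 8-byte long, as in the payload writer.
  static ByteBuffer payload(ByteOrder order, short nanos) {
    ByteBuffer buf = ByteBuffer.allocate(8).order(order);
    buf.putLong(0, nanos);
    return buf;
  }

  // Buggy reader: bytes 0-1 hold the low 16 bits only on little-endian.
  static short readWithGetShort(ByteBuffer buf) {
    return buf.getShort(0);
  }

  // Fixed reader: read the long the writer produced, keep the low 16 bits.
  static short readWithGetLongMask(ByteBuffer buf) {
    return (short) (buf.getLong(0) & 0xFFFF);
  }
}
```

With nanos = 123, the big-endian buffer holds bytes 00 00 00 00 00 00 00 7B, so getShort(0) yields 0 while the masked getLong yields 123.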
…timestamps

Document unsafe projection support for TimestampNTZNanosType and
TimestampLTZNanosType instead of relying only on DatetimeType extending
AtomicType. Add a test to pin the contract.
…types

Provide defaults in Literal.defaultDefault so planning rules that call
Literal.default on TimestampNTZNanosType/TimestampLTZNanosType (e.g. outer-
join null replacement, window tolerance) don't throw noDefaultForDataTypeError.

Co-authored-by: Isaac