Fix timezone-shifted TIMESTAMP in nested complex types (ES-1978662) #1492
Open
sreekanth-db wants to merge 1 commit into
TIMESTAMP fields inside nested complex types (STRUCT/ARRAY/MAP) are serialized by Arrow as epoch microseconds. ComplexDataTypeParser built the value via Timestamp.from(instant), which anchors an absolute instant that getString()/getObject() then re-render in the JVM default timezone, producing a spurious offset (e.g. a -5h shift) for nested timestamps while scalar TIMESTAMP retrieval was unaffected.

Build the Timestamp from the UTC wall-clock instead, mirroring the scalar conversion path (ArrowToJavaObjectConverter.convertToTimestamp), so the JVM zone cancels out on render. This also fixes nested timestamps in ARRAY and MAP, which share the same parsing path.

Adds unit tests (STRUCT and ARRAY) that force a non-UTC JVM zone and assert no shift.

Signed-off-by: Sreekanth Vadigi <sreekanth.vadigi@databricks.com>
Problem
With EnableComplexDatatypeSupport=1, retrieving a TIMESTAMP field nested inside a complex type (STRUCT/ARRAY/MAP) returned a timezone-shifted value, while scalar TIMESTAMP retrieval was correct.

For example, on a JVM with a UTC-5 default timezone, a scalar timestamp returned 2017-03-26 01:01:02.345 (correct), but the same value inside a STRUCT returned 2017-03-25 20:01:02.345 (a -5h shift). Both getString() and getObject() were affected.

Root cause
Arrow serializes nested TIMESTAMP fields as epoch microseconds. ComplexDataTypeParser.convertPrimitive() built the value with Timestamp.from(instant), which anchors an absolute instant. getString()/getObject() then re-render it in the JVM default timezone, producing the offset. The scalar path (ArrowToJavaObjectConverter.convertToTimestamp) avoids this by rebuilding a LocalDateTime and using Timestamp.valueOf(...), so the JVM zone cancels out on render.

Fix
Build the Timestamp from the UTC wall-clock (Timestamp.valueOf(LocalDateTime.ofInstant(instant, ZoneOffset.UTC))), mirroring the scalar path. Since convertPrimitive() is shared by struct/array/map parsing, this fixes nested timestamps in all three.

This is a no-op for UTC JVMs (where Timestamp.from already produced the correct result), so there is no regression for the previously-correct case.

Tests
Added unit tests for the STRUCT and ARRAY paths that force a non-UTC JVM default timezone (America/Bogota, UTC-5, no DST) and assert no shift. Verified they fail against the old code (expected: <2017-03-26 01:01:02.345> but was: <2017-03-25 20:01:02.345>) and pass with the fix.

This pull request and its description were written by Isaac.
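
For reviewers who want to reproduce the shift without the driver, the following is a minimal standalone sketch of the two construction strategies described above. It is not the driver code: the class NestedTimestampSketch and its methods (toInstant, viaFrom, viaUtcWallClock, sampleMicros) are hypothetical names used for illustration only, and it assumes the nested value arrives as epoch microseconds, as the description states.

```java
import java.sql.Timestamp;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.temporal.ChronoUnit;
import java.util.TimeZone;

// Hypothetical illustration class; not part of the driver.
public class NestedTimestampSketch {

    // Converts epoch microseconds (the assumed Arrow encoding) into an Instant.
    static Instant toInstant(long epochMicros) {
        return Instant.EPOCH.plus(epochMicros, ChronoUnit.MICROS);
    }

    // Old behavior: anchors the absolute instant, so toString() renders it
    // in the JVM default timezone.
    static Timestamp viaFrom(long epochMicros) {
        return Timestamp.from(toInstant(epochMicros));
    }

    // Fixed behavior: takes the UTC wall-clock fields and rebuilds the
    // Timestamp from them, so the JVM default timezone cancels out on render.
    static Timestamp viaUtcWallClock(long epochMicros) {
        LocalDateTime utcWallClock =
            LocalDateTime.ofInstant(toInstant(epochMicros), ZoneOffset.UTC);
        return Timestamp.valueOf(utcWallClock);
    }

    // Epoch microseconds for 2017-03-26 01:01:02.345 UTC (the PR's example value).
    static long sampleMicros() {
        Instant sample = LocalDateTime.of(2017, 3, 26, 1, 1, 2, 345_000_000)
            .toInstant(ZoneOffset.UTC);
        return ChronoUnit.MICROS.between(Instant.EPOCH, sample);
    }

    public static void main(String[] args) {
        TimeZone original = TimeZone.getDefault();
        try {
            // Force a non-UTC default zone, as the unit tests do.
            TimeZone.setDefault(TimeZone.getTimeZone("America/Bogota"));
            long micros = sampleMicros();
            System.out.println(viaFrom(micros));         // 2017-03-25 20:01:02.345
            System.out.println(viaUtcWallClock(micros)); // 2017-03-26 01:01:02.345
        } finally {
            // Restore the default zone so other code in the JVM is unaffected.
            TimeZone.setDefault(original);
        }
    }
}
```

Setting the default zone to UTC instead should make both methods print the same string, which matches the no-op claim for UTC JVMs.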