[SPARK-57748][SQL] Use a dedicated tree-pattern bit for the TIME to TIMESTAMP cast rewrite in ComputeCurrentTime#56888
Conversation
|
@MaxGekk thanks for the detailed writeup on the JIRA. One deliberate deviation from the proposed shape worth flagging: the JIRA framed the bit as the TIMESTAMP_NTZ family (CAST_TO_TIMESTAMP_NTZ), but ComputeCurrentTime also rewrites TIME -> TIMESTAMP_LTZ casts (Cast.isTimeToTimestampLTZ), and a standalone CAST(TIME '10:00' AS TIMESTAMP) does not co-occur with a CURRENT_LIKE node. With an NTZ-only bit, narrowing the predicate from the broad CAST would stop reaching those LTZ casts and regress them. So I keyed the bit on any timestamp target type and named it CAST_TO_TIMESTAMP; the node-level isTimeToTimestamp{NTZ,LTZ} guards are unchanged. Happy to rename or split NTZ/LTZ if you prefer a different shape. |
…TAMP cast rewrites in ComputeCurrentTime Replaces the broad CAST pruning predicate in ComputeCurrentTime with a dedicated CAST_TO_TIMESTAMP tree-pattern bit set on Cast nodes targeting any timestamp type (NTZ or LTZ family). The rule now only descends into plans containing a TIME->TIMESTAMP cast; node-level isTimeToTimestampNTZ/isTimeToTimestampLTZ guards keep rewrite semantics unchanged.
81937d0 to
d732fb5
Compare
MaxGekk
left a comment
There was a problem hiding this comment.
0 blocking, 1 non-blocking, 0 nits.
Correct, well-tested, strictly-better pruning optimization — the narrowed predicate is exactly equivalent for the rule's output, and it's the target-keyed dedicated pattern that SPARK-57618's review concluded was the safe way to do this.
Suggestions (1)
Cast.scala:693: add a comment linking theCAST_TO_TIMESTAMPtarget set to theisTimeToTimestamp{NTZ,LTZ}guards (and recording why the key must stay the target type) to prevent future drift — see inline.
Verification
- Pruning equivalence:
CAST_TO_TIMESTAMPis set on exactly the 4 timestamp target types the twoisTimeToTimestamp{NTZ,LTZ}guards accept, so everyCastthe rule rewrites is still reached; the old broadCASTonly added traversal the rule ignored. Target-only ⇒ superset on the source axis (no missed rewrite).final override+ standard upward bit propagation. - Safe against the eager-construction hazard: the match keys on the constructor-provided target
dataType, never onchild.dataType— node patterns are computed at construction before the child is resolved, where readingchild.dataTypethrows (theOuterReference/CREATE FUNCTION ... RETURNS TABLEbreak removed in51136ecb5e2). This is exactly the "only the always-available target type is a safe key" conclusion from the SPARK-57618 review, so this dedicated target-keyed bit is the sanctioned follow-up to the broadCASTpredicate. - The
SPARK-57748: TIME->TIMESTAMP cast is rewritten even with no CURRENT_LIKE nodetest directly covers the regression the narrowing risked.
|
|
||
| final override def nodePatternsInternal(): Seq[TreePattern] = Seq(CAST) | ||
| final override def nodePatternsInternal(): Seq[TreePattern] = dataType match { | ||
| case _: TimestampNTZType | _: TimestampNTZNanosType | |
There was a problem hiding this comment.
Non-blocking (maintainability): CAST_TO_TIMESTAMP must stay set on a superset of the targets that Cast.isTimeToTimestampNTZ / isTimeToTimestampLTZ accept — otherwise ComputeCurrentTime silently stops reaching a rewrite it should perform (a CURRENT_DATE-derived TIME->TIMESTAMP left un-stabilized), and no current test would catch the drift since they're two independent match lists. Consider a short comment here cross-referencing those guards (and the rule).
Worth also recording why this keys on the target dataType and not child.dataType: node patterns are computed eagerly at construction before the child is resolved, so reading child.dataType throws (the OuterReference / CREATE FUNCTION ... RETURNS TABLE case removed in 51136ec). A future attempt to narrow the bit by also matching the source type would reintroduce that break — the target type is the only safe key.
What changes were proposed in this pull request?
Introduce a dedicated
CAST_TO_TIMESTAMPtree-pattern bit, set onCastnodes whose target type is any timestamp type (TIMESTAMP_NTZ/TIMESTAMP_NTZ(p)/TIMESTAMP/TIMESTAMP_LTZ(p)).ComputeCurrentTime's pruning predicate changes fromCURRENT_LIKE || CASTtoCURRENT_LIKE || CAST_TO_TIMESTAMP. The node-levelisTimeToTimestampNTZ/isTimeToTimestampLTZguards are unchanged, so rewrite semantics are identical.Why are the changes needed?
SPARK-57618 widened the rule's pruning predicate to the broad
CASTbit so it could findCAST(TIME AS TIMESTAMP_NTZ/LTZ)casts (whose date fields derive fromCURRENT_DATE). ButCASTappears in nearly every query, soComputeCurrentTimenow traverses the full expression tree of essentially every plan even though the TIME -> TIMESTAMP rewrite fires rarely. A dedicated bit keyed on the timestamp target type restores effective pruning while keeping the rewrite reachable.Note: the JIRA proposed an NTZ-only bit. I broadened it to all timestamp targets because the rule also rewrites TIME -> TIMESTAMP_LTZ, and a standalone
CAST(TIME '10:00' AS TIMESTAMP)does not co-occur with aCURRENT_LIKEnode, so an NTZ-only bit would stop reaching those casts and regress them. HenceCAST_TO_TIMESTAMP.Does this PR introduce any user-facing change?
No.
How was this patch tested?
New
ComputeCurrentTimeSuiteassertions: the bit is set for NTZ and LTZ timestamp targets, absent for non-timestamp targets, independent of the source type, and the TIME -> TIMESTAMP rewrites still fire. ExistingComputeCurrentTimeSuite/CastSuite/CastWithAnsiSuitepass (158 tests).Was this patch authored or co-authored using generative AI tooling?
Authored with assistance by Claude Opus 4.8.