[SPARK-57748][SQL] Use a dedicated tree-pattern bit for the TIME to TIMESTAMP cast rewrite in ComputeCurrentTime#56888
Conversation
|
@MaxGekk thanks for the detailed writeup on the JIRA. One deliberate deviation from the proposed shape worth flagging: the JIRA framed the bit as the TIMESTAMP_NTZ family (CAST_TO_TIMESTAMP_NTZ), but ComputeCurrentTime also rewrites TIME -> TIMESTAMP_LTZ casts (Cast.isTimeToTimestampLTZ), and a standalone CAST(TIME '10:00' AS TIMESTAMP) does not co-occur with a CURRENT_LIKE node. With an NTZ-only bit, narrowing the predicate from the broad CAST would stop reaching those LTZ casts and regress them. So I keyed the bit on any timestamp target type and named it CAST_TO_TIMESTAMP; the node-level isTimeToTimestamp{NTZ,LTZ} guards are unchanged. Happy to rename or split NTZ/LTZ if you prefer a different shape. |
81937d0 to
d732fb5
Compare
MaxGekk
left a comment
There was a problem hiding this comment.
0 blocking, 1 non-blocking, 0 nits.
Correct, well-tested, strictly-better pruning optimization — the narrowed predicate is exactly equivalent for the rule's output, and it's the target-keyed dedicated pattern that SPARK-57618's review concluded was the safe way to do this.
Suggestions (1)
Cast.scala:693: add a comment linking theCAST_TO_TIMESTAMPtarget set to theisTimeToTimestamp{NTZ,LTZ}guards (and recording why the key must stay the target type) to prevent future drift — see inline.
Verification
- Pruning equivalence:
CAST_TO_TIMESTAMPis set on exactly the 4 timestamp target types the twoisTimeToTimestamp{NTZ,LTZ}guards accept, so everyCastthe rule rewrites is still reached; the old broadCASTonly added traversal the rule ignored. Target-only ⇒ superset on the source axis (no missed rewrite).final override+ standard upward bit propagation. - Safe against the eager-construction hazard: the match keys on the constructor-provided target
dataType, never onchild.dataType— node patterns are computed at construction before the child is resolved, where readingchild.dataTypethrows (theOuterReference/CREATE FUNCTION ... RETURNS TABLEbreak removed in51136ecb5e2). This is exactly the "only the always-available target type is a safe key" conclusion from the SPARK-57618 review, so this dedicated target-keyed bit is the sanctioned follow-up to the broadCASTpredicate. - The
SPARK-57748: TIME->TIMESTAMP cast is rewritten even with no CURRENT_LIKE nodetest directly covers the regression the narrowing risked.
|
|
||
| final override def nodePatternsInternal(): Seq[TreePattern] = Seq(CAST) | ||
| final override def nodePatternsInternal(): Seq[TreePattern] = dataType match { | ||
| case _: TimestampNTZType | _: TimestampNTZNanosType | |
There was a problem hiding this comment.
Non-blocking (maintainability): CAST_TO_TIMESTAMP must stay set on a superset of the targets that Cast.isTimeToTimestampNTZ / isTimeToTimestampLTZ accept — otherwise ComputeCurrentTime silently stops reaching a rewrite it should perform (a CURRENT_DATE-derived TIME->TIMESTAMP left un-stabilized), and no current test would catch the drift since they're two independent match lists. Consider a short comment here cross-referencing those guards (and the rule).
Worth also recording why this keys on the target dataType and not child.dataType: node patterns are computed eagerly at construction before the child is resolved, so reading child.dataType throws (the OuterReference / CREATE FUNCTION ... RETURNS TABLE case removed in 51136ec). A future attempt to narrow the bit by also matching the source type would reintroduce that break — the target type is the only safe key.
There was a problem hiding this comment.
Added a comment above nodePatternsInternal recording both points: the superset invariant w.r.t. ComputeCurrentTime / isTimeToTimestamp{NTZ,LTZ}, and why we key on the target type only (node patterns are computed eagerly at construction before the child is resolved). See bab3bfe.
…TAMP cast rewrites in ComputeCurrentTime Replaces the broad CAST pruning predicate in ComputeCurrentTime with a dedicated CAST_TO_TIMESTAMP tree-pattern bit set on Cast nodes targeting any timestamp type (NTZ or LTZ family). The rule now only descends into plans containing a TIME->TIMESTAMP cast; node-level isTimeToTimestampNTZ/isTimeToTimestampLTZ guards keep rewrite semantics unchanged.
d732fb5 to
bab3bfe
Compare
What changes were proposed in this pull request?
Introduce a dedicated
CAST_TO_TIMESTAMPtree-pattern bit, set onCastnodes whose target type is any timestamp type (TIMESTAMP_NTZ/TIMESTAMP_NTZ(p)/TIMESTAMP/TIMESTAMP_LTZ(p)).ComputeCurrentTime's pruning predicate changes fromCURRENT_LIKE || CASTtoCURRENT_LIKE || CAST_TO_TIMESTAMP. The node-levelisTimeToTimestampNTZ/isTimeToTimestampLTZguards are unchanged, so rewrite semantics are identical.Why are the changes needed?
SPARK-57618 widened the rule's pruning predicate to the broad
CASTbit so it could findCAST(TIME AS TIMESTAMP_NTZ/LTZ)casts (whose date fields derive fromCURRENT_DATE). ButCASTappears in nearly every query, soComputeCurrentTimenow traverses the full expression tree of essentially every plan even though the TIME -> TIMESTAMP rewrite fires rarely. A dedicated bit keyed on the timestamp target type restores effective pruning while keeping the rewrite reachable.Note: the JIRA proposed an NTZ-only bit. I broadened it to all timestamp targets because the rule also rewrites TIME -> TIMESTAMP_LTZ, and a standalone
CAST(TIME '10:00' AS TIMESTAMP)does not co-occur with aCURRENT_LIKEnode, so an NTZ-only bit would stop reaching those casts and regress them. HenceCAST_TO_TIMESTAMP.Does this PR introduce any user-facing change?
No.
How was this patch tested?
New
ComputeCurrentTimeSuiteassertions: the bit is set for NTZ and LTZ timestamp targets, absent for non-timestamp targets, independent of the source type, and the TIME -> TIMESTAMP rewrites still fire. ExistingComputeCurrentTimeSuite/CastSuite/CastWithAnsiSuitepass (158 tests).Was this patch authored or co-authored using generative AI tooling?
Authored with assistance by Claude Opus 4.8.