Skip to content

[SPARK-57748][SQL] Use a dedicated tree-pattern bit for the TIME to TIMESTAMP cast rewrite in ComputeCurrentTime#56888

Open
yadavay-amzn wants to merge 1 commit into
apache:masterfrom
yadavay-amzn:SPARK-57748
Open

[SPARK-57748][SQL] Use a dedicated tree-pattern bit for the TIME to TIMESTAMP cast rewrite in ComputeCurrentTime#56888
yadavay-amzn wants to merge 1 commit into
apache:masterfrom
yadavay-amzn:SPARK-57748

Conversation

@yadavay-amzn

Copy link
Copy Markdown
Contributor

What changes were proposed in this pull request?

Introduce a dedicated CAST_TO_TIMESTAMP tree-pattern bit, set on Cast nodes whose target type is any timestamp type (TIMESTAMP_NTZ / TIMESTAMP_NTZ(p) / TIMESTAMP / TIMESTAMP_LTZ(p)). ComputeCurrentTime's pruning predicate changes from CURRENT_LIKE || CAST to CURRENT_LIKE || CAST_TO_TIMESTAMP. The node-level isTimeToTimestampNTZ / isTimeToTimestampLTZ guards are unchanged, so rewrite semantics are identical.

Why are the changes needed?

SPARK-57618 widened the rule's pruning predicate to the broad CAST bit so it could find CAST(TIME AS TIMESTAMP_NTZ/LTZ) casts (whose date fields derive from CURRENT_DATE). But CAST appears in nearly every query, so ComputeCurrentTime now traverses the full expression tree of essentially every plan even though the TIME -> TIMESTAMP rewrite fires rarely. A dedicated bit keyed on the timestamp target type restores effective pruning while keeping the rewrite reachable.

Note: the JIRA proposed an NTZ-only bit. I broadened it to all timestamp targets because the rule also rewrites TIME -> TIMESTAMP_LTZ, and a standalone CAST(TIME '10:00' AS TIMESTAMP) does not co-occur with a CURRENT_LIKE node, so an NTZ-only bit would stop reaching those casts and regress them. Hence CAST_TO_TIMESTAMP.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

New ComputeCurrentTimeSuite assertions: the bit is set for NTZ and LTZ timestamp targets, absent for non-timestamp targets, independent of the source type, and the TIME -> TIMESTAMP rewrites still fire. Existing ComputeCurrentTimeSuite / CastSuite / CastWithAnsiSuite pass (158 tests).

Was this patch authored or co-authored using generative AI tooling?

Authored with assistance by Claude Opus 4.8.

@yadavay-amzn

Copy link
Copy Markdown
Contributor Author

@MaxGekk thanks for the detailed writeup on the JIRA. One deliberate deviation from the proposed shape worth flagging: the JIRA framed the bit as the TIMESTAMP_NTZ family (CAST_TO_TIMESTAMP_NTZ), but ComputeCurrentTime also rewrites TIME -> TIMESTAMP_LTZ casts (Cast.isTimeToTimestampLTZ), and a standalone CAST(TIME '10:00' AS TIMESTAMP) does not co-occur with a CURRENT_LIKE node. With an NTZ-only bit, narrowing the predicate from the broad CAST would stop reaching those LTZ casts and regress them. So I keyed the bit on any timestamp target type and named it CAST_TO_TIMESTAMP; the node-level isTimeToTimestamp{NTZ,LTZ} guards are unchanged. Happy to rename or split NTZ/LTZ if you prefer a different shape.

…TAMP cast rewrites in ComputeCurrentTime

Replaces the broad CAST pruning predicate in ComputeCurrentTime with a dedicated CAST_TO_TIMESTAMP tree-pattern bit set on Cast nodes targeting any timestamp type (NTZ or LTZ family). The rule now only descends into plans containing a TIME->TIMESTAMP cast; node-level isTimeToTimestampNTZ/isTimeToTimestampLTZ guards keep rewrite semantics unchanged.

@MaxGekk MaxGekk left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0 blocking, 1 non-blocking, 0 nits.
Correct, well-tested, strictly-better pruning optimization — the narrowed predicate is exactly equivalent for the rule's output, and it's the target-keyed dedicated pattern that SPARK-57618's review concluded was the safe way to do this.

Suggestions (1)

  • Cast.scala:693: add a comment linking the CAST_TO_TIMESTAMP target set to the isTimeToTimestamp{NTZ,LTZ} guards (and recording why the key must stay the target type) to prevent future drift — see inline.

Verification

  • Pruning equivalence: CAST_TO_TIMESTAMP is set on exactly the 4 timestamp target types the two isTimeToTimestamp{NTZ,LTZ} guards accept, so every Cast the rule rewrites is still reached; the old broad CAST only added traversal the rule ignored. Target-only ⇒ superset on the source axis (no missed rewrite). final override + standard upward bit propagation.
  • Safe against the eager-construction hazard: the match keys on the constructor-provided target dataType, never on child.dataType — node patterns are computed at construction before the child is resolved, where reading child.dataType throws (the OuterReference / CREATE FUNCTION ... RETURNS TABLE break removed in 51136ecb5e2). This is exactly the "only the always-available target type is a safe key" conclusion from the SPARK-57618 review, so this dedicated target-keyed bit is the sanctioned follow-up to the broad CAST predicate.
  • The SPARK-57748: TIME->TIMESTAMP cast is rewritten even with no CURRENT_LIKE node test directly covers the regression the narrowing risked.


final override def nodePatternsInternal(): Seq[TreePattern] = Seq(CAST)
final override def nodePatternsInternal(): Seq[TreePattern] = dataType match {
case _: TimestampNTZType | _: TimestampNTZNanosType |

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-blocking (maintainability): CAST_TO_TIMESTAMP must stay set on a superset of the targets that Cast.isTimeToTimestampNTZ / isTimeToTimestampLTZ accept — otherwise ComputeCurrentTime silently stops reaching a rewrite it should perform (a CURRENT_DATE-derived TIME->TIMESTAMP left un-stabilized), and no current test would catch the drift since they're two independent match lists. Consider a short comment here cross-referencing those guards (and the rule).

Worth also recording why this keys on the target dataType and not child.dataType: node patterns are computed eagerly at construction before the child is resolved, so reading child.dataType throws (the OuterReference / CREATE FUNCTION ... RETURNS TABLE case removed in 51136ec). A future attempt to narrow the bit by also matching the source type would reintroduce that break — the target type is the only safe key.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants