[SPARK-54588][SQL] Add time_format function to convert TIME values to formatted string representations#53320
[SPARK-54588][SQL] Add time_format function to convert TIME values to formatted string representations#53320vinodkc wants to merge 2 commits into
Conversation
MaxGekk
left a comment
There was a problem hiding this comment.
1 blocking, 1 non-blocking, 3 nits. A clean, well-tested TIME analogue of date_format (caching, eval/codegen parity, nulls, and since all correct); one error-quality issue plus doc nits.
Note: this PR is closed-unmerged — leaving this for the record / in case the work continues in a successor PR.
Correctness (1)
time_formatleaks a rawjava.time.temporal.UnsupportedTemporalTypeExceptionfor a date-field pattern (e.g.'HH:MM:ss', whereMM= month) instead of a Spark error — syntactically-invalid patterns are wrapped cleanly, but a valid-but-inapplicable letter is not. See inline ontimeExpressions.scala.
Suggestions (1)
- No nanosecond (
SSSSSSSSS, precision 7–9) fractional-second coverage; tests stop at micros. See inline onTimeExpressionsSuite.scala.
Nits (3)
time.sqlcomment claimsMM"returns epoch month (01)" but the golden output for that query is the exception above — factually wrong. Plus a broken@paramfragment infunctions.scalaand "date format" → "time format" in the@ExpressionDescription. See inline.
@vinodkc — would you consider reopening this PR to continue the work? The function is in good shape; the one blocking item is error-handling polish for date-field patterns.
| val formatter = formatterOption.getOrElse { | ||
| TimeFormatter(format.toString, TimeFormatter.defaultLocale, isParsing = false) | ||
| } | ||
| UTF8String.fromString(formatter.format(nanos)) |
There was a problem hiding this comment.
A format that references a date-only field (e.g. MM, yyyy, dd) leaks a raw java.time.temporal.UnsupportedTemporalTypeException: Unsupported field: MonthOfYear here — see the golden output for SELECT time_format(TIME'14:30:45', 'HH:MM:ss') in time.sql.out. TimeFormatter.validatePatternString passes (MM is a valid letter), then LocalTime.format throws at runtime because a TIME has no date fields. Syntactically-invalid patterns are already wrapped cleanly (the 'invalid[[[' test), so this valid-but-inapplicable-letter path is the gap; date_format never hits it (timestamps carry all fields). Suggest validating the pattern to time-applicable fields, or catching the UnsupportedTemporalTypeException and raising a clear Spark error (an INVALID_DATETIME_PATTERN-style message scoped to TIME) rather than leaking the JDK exception.
| SELECT time_format(TIME'14:30:45', 'HH-mm-ss'); | ||
|
|
||
| -- Test common mistake: MM (month) vs mm (minute) | ||
| -- MM is for month, mm is for minute - TIME has no date so MM returns epoch month (01) |
There was a problem hiding this comment.
This comment says MM "returns epoch month (01)", but the golden output for the query below is UnsupportedTemporalTypeException: Unsupported field: MonthOfYear — it throws, it doesn't return 01. Date fields don't default to epoch for a TIME. Worth correcting the comment to say date fields are unsupported and raise an error (this also ties to the error-handling suggestion on the expression).
| * @param time | ||
| * A column of time values to be formatted. | ||
| * @param format | ||
| * A time format string. for valid patterns. |
There was a problem hiding this comment.
@param format reads "A time format string. for valid patterns." — a dangling fragment (a clause referencing the Datetime Patterns doc was dropped). Complete it, e.g. "A time format string. See <a href="https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html\">Datetime Patterns for valid patterns."
|
|
||
| // scalastyle:off line.size.limit | ||
| @ExpressionDescription( | ||
| usage = "_FUNC_(time, format) - Converts a time to a value of string in the format specified by the date format given by the second argument.", |
There was a problem hiding this comment.
Nit: the usage string says "...in the format specified by the date format given by the second argument" — copied from date_format. This function formats a TIME, so it should read "time format".
| TimeFormat(Literal(localTime(9, 5, 0), TimeType()), Literal("hh:mm:ss a")), | ||
| "09:05:00 AM") | ||
| checkEvaluation( | ||
| TimeFormat(timeLit, Literal("HH:mm:ss.SSSSSS")), |
There was a problem hiding this comment.
Optional: the fractional-second cases stop at 6 digits (micros). TIME supports precision up to 9 (nanos), so a TIME'…123456789' with SSSSSSSSS case (which checkEvaluation exercises in both interpreted and codegen paths) would lock in nanosecond formatting.
38609b1 to
2e1fb3a
Compare
2e1fb3a to
0b12215
Compare
What changes were proposed in this pull request?
This PR adds a new
time_formatfunction that converts TIME data type values to formatted string representations, providing functionality similar to date_format but specifically designed for TIME values.Why are the changes needed?
Users need a standard way to format TIME values as strings for:
Does this PR introduce any user-facing change?
Yes, this PR adds a new public API function.
Scala API
Python API
SQL Usage
Format Pattern Support
HH:mm:ss14:30:45hh:mm:ss a02:30:45 PMH:mm9:15HH:mm:ss.SSS14:30:45.123HH:mm:ss.SSSSSS14:30:45.123456HH-mm-ss14-30-45'Time:' HH:mmTime: 14:30How was this patch tested?
Added tests in TimeExpressionsSuite, TimeFunctionsSuiteBase
Was this patch authored or co-authored using generative AI tooling?
No