feat(temporal): add Spark-style unix extractors and weekday#6920
feat(temporal): add Spark-style unix extractors and weekday#6920BABTUNA wants to merge 3 commits into
Conversation
Adds unix_seconds, unix_millis, unix_micros, unix_timestamp, to_unix_timestamp, and weekday for Spark parity (Eventual-Inc#3798). The unix extractors are thin Python and SQL wrappers over the existing to_unix_epoch with an explicit time_unit. weekday wraps day_of_week directly since Daft's day_of_week already uses Spark's Monday=0, Sunday=6 numbering.
Greptile SummaryThis PR adds six Spark-style temporal functions (
Confidence Score: 5/5Safe to merge. All six new functions are thin wrappers over well-tested existing UDFs with no new Rust logic, and the test suite covers value correctness, null propagation, alias equivalence, and SQL integration. The changes introduce no new execution logic — they wire Spark-named aliases to the already-proven UnixTimestamp and DayOfWeek UDFs. The SQL and Python paths converge on the same underlying function with the same argument handling. Tests are thorough and the weekday numbering is verified against all seven days of the week. No files require special attention. Important Files Changed
Reviews (3): Last reviewed commit: "chore(temporal): address Greptile review..." | Re-trigger Greptile |
| next_day, | ||
| unix_seconds, | ||
| unix_millis, | ||
| unix_micros, | ||
| unix_timestamp, | ||
| to_unix_timestamp, | ||
| weekday, | ||
| ) |
There was a problem hiding this comment.
The new imports are not in alphabetical order within the
from .datetime import block. to_unix_timestamp (starts with t) is placed after the unix_* entries (start with u), and within the unix_* group the order is unix_seconds, unix_millis, unix_micros instead of the correct unix_micros, unix_millis, unix_seconds. The __all__ list in the same file has them in the correct order, so this is an inconsistency within the file.
| next_day, | |
| unix_seconds, | |
| unix_millis, | |
| unix_micros, | |
| unix_timestamp, | |
| to_unix_timestamp, | |
| weekday, | |
| ) | |
| next_day, | |
| to_unix_timestamp, | |
| unix_micros, | |
| unix_millis, | |
| unix_seconds, | |
| unix_timestamp, | |
| weekday, | |
| ) |
There was a problem hiding this comment.
Sorted the imports alphabetically to match the all ordering.
| def weekday(expr: Expression) -> Expression: | ||
| """Returns the day of the week with Monday=0 numbering. | ||
|
|
||
| Mirrors Spark's ``weekday``. Equivalent to :func:`day_of_week` since Daft already | ||
| uses Monday=0, Sunday=6 numbering internally. | ||
|
|
||
| Args: | ||
| expr: A Date or Timestamp expression. | ||
|
|
||
| Returns: | ||
| Expression: a UInt32 expression with the weekday (Mon=0, Sun=6). | ||
| """ | ||
| return day_of_week(expr) | ||
|
|
||
|
|
||
| def current_date() -> Expression: | ||
| """Returns the current date (UTC). | ||
|
|
There was a problem hiding this comment.
Missing Expression-class counterparts
All existing temporal extractors (day_of_week, to_unix_epoch, unix_date, week_of_year, etc.) have corresponding Expression methods in daft/expressions/expressions.py. The six new functions (weekday, unix_seconds, unix_millis, unix_micros, unix_timestamp, to_unix_timestamp) do not. Users who prefer the method-chaining style (e.g., df["ts"].weekday(), df["ts"].unix_seconds()) have no equivalent path, which is inconsistent with every other temporal function in the module.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
There was a problem hiding this comment.
Added all six as Expression methods (unix_seconds, unix_millis, unix_micros, unix_timestamp, to_unix_timestamp, weekday) plus a method-form test, so df['ts'].weekday() / df['ts'].unix_seconds() etc. work like the rest of the temporal extractors.
Sort the new from .datetime imports alphabetically to match the __all__ list. Add Expression method forms (unix_seconds, unix_millis, unix_micros, unix_timestamp, to_unix_timestamp, weekday) so the new functions are reachable via method-chaining like every other temporal extractor.
|
@greptile |
|
Want your agent to iterate on Greptile's feedback? Try greploops. |
|
@greptile re-review |
…tractors Resolved conflicts in lib.rs (kept pub mod unix_timestamp), __init__.py __all__ (kept both weekday and weekofyear), test_temporals.py imports + tests (kept both unix/weekday and add_months/months_between blocks), and temporal.rs SQL module (kept both AddMonths/MonthsBetween and Unix extractor handlers). Added test_unix_seconds_timestamp_input parameterized across midnight, mid-day, and the 1970-01-01 00:00:01 epoch boundary to exercise Timestamp inputs across all three resolutions.
Summary
Implements six more functions from issue #3798 by adding Spark-style
unix_seconds,unix_millis,unix_micros,unix_timestamp,to_unix_timestamp, andweekdayas wrappers over existing Daft primitives.The unix extractors are thin Python and SQL wrappers over the existing
to_unix_epochwith an explicittime_unit.weekdayis an alias overday_of_week, which already uses Spark's Monday=0 / Sunday=6 numbering internally. No new Rust UDFs were needed.Why
The issue asks for parity with PySpark's temporal functions. This PR focuses on:
timestamp_seconds/millis/microsconstructors in reverse.weekdayalias so users migrating from Spark don't have to know about Daft'sday_of_week.Changes Made
daft/functions/datetime.py:unix_seconds(expr),unix_millis(expr),unix_micros(expr)delegate toto_unix_epochwithtime_unit="s"/"ms"/"us".unix_timestamp(expr)andto_unix_timestamp(expr)are Spark-named aliases forunix_seconds.weekday(expr)delegates today_of_week(Daft already uses Mon=0, Sun=6).daft/functions/__init__.pyin alphabetical order.mod unix_timestamptopub mod unix_timestampinsrc/daft-functions-temporal/src/lib.rsso the SQL crate can access the underlyingUnixTimestampUDF.SQLUnixSeconds,SQLUnixMillis,SQLUnixMicros,SQLUnixTimestamp, andSQLWeekdayinsrc/daft-sql/src/modules/temporal.rs. All five unix handlers go through a smallbuild_unix_extractorhelper that injects the righttime_unitliteral. The SQL planner registersto_unix_timestampas an alias forunix_timestamp.tests/dataframe/test_temporals.py:unix_seconds,unix_millis,unix_micros.unix_timestampandto_unix_timestampalias equivalence.weekdaynull propagation.Behavior
unix_seconds(date(2021, 1, 1))returns1609459200.unix_millis(date(2021, 1, 1))returns1609459200000.unix_micros(date(2021, 1, 1))returns1609459200000000.unix_timestampandto_unix_timestampare equivalent tounix_seconds.weekday(date(2021, 1, 1))returns4(Friday, with Mon=0).unix_timestamp(col)is implemented. The zero-arg form (unix_timestamp()returning current epoch) and the two-arg form (unix_timestamp(str, format)parsing a formatted string) are out of scope here and could be follow-ups.Test Plan
cargo check -p daft-functions-temporal -p daft-sqlmake buildDAFT_RUNNER=native pytest -q tests/dataframe/test_temporals.py -k "unix_seconds or unix_millis or unix_micros or unix_timestamp_alias or unix_extractors or weekday"Related Issues