
feat(temporal): add Spark-style unix extractors and weekday #6920

Open
BABTUNA wants to merge 3 commits into Eventual-Inc:main from BABTUNA:feat/temporal-unix-extractors

Conversation

Contributor

@BABTUNA BABTUNA commented May 12, 2026

Summary

Implements six more functions from issue #3798 by adding Spark-style unix_seconds, unix_millis, unix_micros, unix_timestamp, to_unix_timestamp, and weekday as wrappers over existing Daft primitives.

The unix extractors are thin Python and SQL wrappers over the existing to_unix_epoch with an explicit time_unit. weekday is an alias over day_of_week, which already uses Spark's Monday=0 / Sunday=6 numbering internally. No new Rust UDFs were needed.

Why

The issue asks for parity with PySpark's temporal functions. This PR focuses on:

  • Spark-named unix-time extractors at fixed resolutions, mirroring the already-shipped timestamp_seconds/millis/micros constructors in reverse.
  • A weekday alias so users migrating from Spark don't have to know about Daft's day_of_week.

Changes Made

  • Add Python wrappers in daft/functions/datetime.py:
    • unix_seconds(expr), unix_millis(expr), unix_micros(expr) delegate to to_unix_epoch with time_unit="s"/"ms"/"us".
    • unix_timestamp(expr) and to_unix_timestamp(expr) are Spark-named aliases for unix_seconds.
    • weekday(expr) delegates to day_of_week (Daft already uses Mon=0, Sun=6).
  • Export all six names from daft/functions/__init__.py in alphabetical order.
  • Change mod unix_timestamp to pub mod unix_timestamp in src/daft-functions-temporal/src/lib.rs so the SQL crate can access the underlying UnixTimestamp UDF.
  • Add SQL handlers SQLUnixSeconds, SQLUnixMillis, SQLUnixMicros, SQLUnixTimestamp, and SQLWeekday in src/daft-sql/src/modules/temporal.rs. All of the unix handlers go through a small build_unix_extractor helper that injects the right time_unit literal. The SQL planner registers to_unix_timestamp as an alias for unix_timestamp.
  • Add focused tests in tests/dataframe/test_temporals.py:
    • One test each for unix_seconds, unix_millis, unix_micros.
    • unix_timestamp and to_unix_timestamp alias equivalence.
    • Null propagation across all three extractors.
    • Parameterized 7-day weekday numbering check (Mon=0 through Sun=6).
    • weekday null propagation.
    • SQL integration covering all six functions in a single query.
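
The delegation described in the list above can be sketched in plain Python. This is a minimal model, not the code from the PR: `to_unix_epoch` and `day_of_week` here are hypothetical stand-ins that operate on a single `date`/`datetime` value rather than on a Daft Expression, and naive datetimes are assumed to be UTC.

```python
from datetime import date, datetime, timezone

# Hypothetical stand-ins for Daft's to_unix_epoch and day_of_week. They work
# on one Python value instead of an Expression, purely for illustration.
_UNITS_PER_SECOND = {"s": 1, "ms": 1_000, "us": 1_000_000}
_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_unix_epoch(value, time_unit="s"):
    if value is None:
        return None  # null in, null out
    if isinstance(value, datetime):
        # Assumption: naive datetimes are treated as UTC.
        dt = value if value.tzinfo else value.replace(tzinfo=timezone.utc)
    else:
        dt = datetime(value.year, value.month, value.day, tzinfo=timezone.utc)
    delta = dt - _EPOCH
    micros = (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    return micros * _UNITS_PER_SECOND[time_unit] // 1_000_000


def day_of_week(value):
    # Python's date.weekday() also uses Monday=0 .. Sunday=6.
    return None if value is None else value.weekday()


# The six wrappers: each one only fixes an argument or renames a primitive.
def unix_seconds(value):
    return to_unix_epoch(value, time_unit="s")


def unix_millis(value):
    return to_unix_epoch(value, time_unit="ms")


def unix_micros(value):
    return to_unix_epoch(value, time_unit="us")


def unix_timestamp(value):
    return unix_seconds(value)


def to_unix_timestamp(value):
    return unix_seconds(value)


def weekday(value):
    return day_of_week(value)
```

Because every wrapper funnels into one of two primitives, null handling and unit conversion live in a single place, which is the property the PR relies on when it says no new Rust UDFs were needed.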

Behavior

  • unix_seconds(date(2021, 1, 1)) returns 1609459200.
  • unix_millis(date(2021, 1, 1)) returns 1609459200000.
  • unix_micros(date(2021, 1, 1)) returns 1609459200000000.
  • unix_timestamp and to_unix_timestamp are equivalent to unix_seconds.
  • weekday(date(2021, 1, 1)) returns 4 (Friday, with Mon=0).
  • Null in the input row propagates to null in the output.
  • Only Spark's one-argument form unix_timestamp(col) is implemented. The zero-arg form (unix_timestamp() returning current epoch) and the two-arg form (unix_timestamp(str, format) parsing a formatted string) are out of scope here and could be follow-ups.
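
The values listed above can be cross-checked independently of Daft with the Python standard library. The snippet assumes the date is interpreted as midnight UTC, which is what the listed numbers imply.

```python
import calendar
from datetime import date

d = date(2021, 1, 1)

# timegm interprets the time tuple as UTC, so this is midnight UTC on that date.
seconds = calendar.timegm(d.timetuple())
millis = seconds * 1_000
micros = seconds * 1_000_000

# date.weekday() uses Monday=0 .. Sunday=6, the same numbering as weekday here.
weekday_mon0 = d.weekday()

print(seconds, millis, micros, weekday_mon0)
# prints: 1609459200 1609459200000 1609459200000000 4
```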

Test Plan

  • cargo check -p daft-functions-temporal -p daft-sql
  • make build
  • DAFT_RUNNER=native pytest -q tests/dataframe/test_temporals.py -k "unix_seconds or unix_millis or unix_micros or unix_timestamp_alias or unix_extractors or weekday"

Related Issues

Adds unix_seconds, unix_millis, unix_micros, unix_timestamp, to_unix_timestamp,
and weekday for Spark parity (Eventual-Inc#3798). The unix extractors are thin Python and
SQL wrappers over the existing to_unix_epoch with an explicit time_unit.
weekday wraps day_of_week directly since Daft's day_of_week already uses
Spark's Monday=0, Sunday=6 numbering.
@github-actions github-actions Bot added the feat label May 12, 2026
Contributor

greptile-apps Bot commented May 12, 2026

Greptile Summary

This PR adds six Spark-style temporal functions (unix_seconds, unix_millis, unix_micros, unix_timestamp, to_unix_timestamp, weekday) as thin Python wrappers over the existing to_unix_epoch / day_of_week primitives, with matching SQL handlers and Expression methods for method-chaining parity.

  • Python functions and Expression methods are wired to existing to_unix_epoch (via _call_builtin_scalar_fn) and day_of_week; no new Rust UDFs were written.
  • SQL handlers use a shared build_unix_extractor helper that injects the right time_unit string literal into the existing UnixTimestamp UDF, and SQLWeekday directly dispatches to DayOfWeek.
  • The unix_timestamp module in daft-functions-temporal is changed from mod to pub mod so daft-sql can reference UnixTimestamp.
  • Tests cover value correctness, null propagation, alias equivalence, all-seven-weekday numbering, and SQL integration.

Confidence Score: 5/5

Safe to merge. All six new functions are thin wrappers over well-tested existing UDFs with no new Rust logic, and the test suite covers value correctness, null propagation, alias equivalence, and SQL integration.

The changes introduce no new execution logic — they wire Spark-named aliases to the already-proven UnixTimestamp and DayOfWeek UDFs. The SQL and Python paths converge on the same underlying function with the same argument handling. Tests are thorough and the weekday numbering is verified against all seven days of the week.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| daft/functions/datetime.py | Adds six new public functions (unix_seconds, unix_millis, unix_micros, unix_timestamp, to_unix_timestamp, weekday) as thin wrappers over existing primitives. Docstrings, return types, and delegation are correct. |
| daft/expressions/expressions.py | Adds the six new Expression methods following the same inline-import pattern already used by day_of_week, to_unix_epoch, and every other temporal method in the file. Placement is consistent with existing ordering. |
| daft/functions/__init__.py | Exports the six new names; both the import block and __all__ are in alphabetical order. |
| src/daft-functions-temporal/src/lib.rs | Single-line visibility change: unix_timestamp module promoted from mod to pub mod so the SQL crate can import UnixTimestamp. No logic changes. |
| src/daft-sql/src/modules/temporal.rs | Adds SQL handlers for all six functions. build_unix_extractor correctly constructs a BuiltinScalarFn with UnixTimestamp (name = 'to_unix_epoch') and the right time_unit string literal, matching the Python path. |
| tests/dataframe/test_temporals.py | Good coverage: per-unit value checks, null propagation, alias equivalence, all-seven-weekday parametrized cases, SQL integration, and method-form tests. |

Reviews (3): Last reviewed commit: "chore(temporal): address Greptile review..."

Comment on lines 84 to 91
next_day,
unix_seconds,
unix_millis,
unix_micros,
unix_timestamp,
to_unix_timestamp,
weekday,
)
Contributor

P2 The new imports are not in alphabetical order within the from .datetime import block. to_unix_timestamp (starts with t) is placed after the unix_* entries (start with u), and within the unix_* group the order is unix_seconds, unix_millis, unix_micros instead of the correct unix_micros, unix_millis, unix_seconds. The __all__ list in the same file has them in the correct order, so this is an inconsistency within the file.

Suggested change
- next_day,
- unix_seconds,
- unix_millis,
- unix_micros,
- unix_timestamp,
- to_unix_timestamp,
- weekday,
- )
+ next_day,
+ to_unix_timestamp,
+ unix_micros,
+ unix_millis,
+ unix_seconds,
+ unix_timestamp,
+ weekday,
+ )
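
The ordering in the suggestion is what a plain lexicographic sort of the names produces. A quick check, using only the names visible in the commented lines (the rest of the import block is not shown here and is left out):

```python
# Names in the order they appeared in the original import block.
names = [
    "next_day",
    "unix_seconds",
    "unix_millis",
    "unix_micros",
    "unix_timestamp",
    "to_unix_timestamp",
    "weekday",
]

# sorted() compares strings character by character, so "to_..." sorts before
# every "unix_..." name, and "unix_micros" before "unix_millis" ("c" < "l").
ordered = sorted(names)

for name in ordered:
    print(f"    {name},")
```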

Contributor Author

Sorted the imports alphabetically to match the __all__ ordering.

Comment on lines +1310 to 1327
def weekday(expr: Expression) -> Expression:
    """Returns the day of the week with Monday=0 numbering.

    Mirrors Spark's ``weekday``. Equivalent to :func:`day_of_week` since Daft already
    uses Monday=0, Sunday=6 numbering internally.

    Args:
        expr: A Date or Timestamp expression.

    Returns:
        Expression: a UInt32 expression with the weekday (Mon=0, Sun=6).
    """
    return day_of_week(expr)


def current_date() -> Expression:
    """Returns the current date (UTC).

Contributor

P2 Missing Expression-class counterparts

All existing temporal extractors (day_of_week, to_unix_epoch, unix_date, week_of_year, etc.) have corresponding Expression methods in daft/expressions/expressions.py. The six new functions (weekday, unix_seconds, unix_millis, unix_micros, unix_timestamp, to_unix_timestamp) do not. Users who prefer the method-chaining style (e.g., df["ts"].weekday(), df["ts"].unix_seconds()) have no equivalent path, which is inconsistent with every other temporal function in the module.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Contributor Author

Added all six as Expression methods (unix_seconds, unix_millis, unix_micros, unix_timestamp, to_unix_timestamp, weekday) plus a method-form test, so df['ts'].weekday() / df['ts'].unix_seconds() etc. work like the rest of the temporal extractors.
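
The relationship between the function form and the method form can be illustrated with a toy class. `Expr` below is a hypothetical stand-in that wraps a list of Python values. It is not Daft's Expression and says nothing about how Daft evaluates expressions; it only shows the method delegating to the module-level function so both spellings share one implementation.

```python
from datetime import date


def weekday(expr):
    """Function form: maps each value to its Monday=0 weekday, keeping nulls."""
    return Expr(None if v is None else v.weekday() for v in expr.values)


class Expr:
    """Toy column wrapper (hypothetical), used only to show the delegation."""

    def __init__(self, values):
        self.values = list(values)

    def weekday(self):
        # Method form: a one-line delegation to the function form.
        return weekday(self)


col = Expr([date(2021, 1, 1), None, date(2021, 1, 3)])
via_function = weekday(col).values
via_method = col.weekday().values
```

Both calls return the same values, so a test of the method form only has to confirm that the delegation is wired up.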

Sort the new from .datetime imports alphabetically to match the __all__
list. Add Expression method forms (unix_seconds, unix_millis, unix_micros,
unix_timestamp, to_unix_timestamp, weekday) so the new functions are
reachable via method-chaining like every other temporal extractor.
Contributor Author

BABTUNA commented May 12, 2026

@greptile

Contributor

greptile-apps Bot commented May 12, 2026

Want your agent to iterate on Greptile's feedback? Try greploops.

@madvart madvart self-requested a review May 12, 2026 20:00
Contributor Author

BABTUNA commented May 13, 2026

@greptile re-review

…tractors

Resolved conflicts in lib.rs (kept pub mod unix_timestamp), __init__.py
__all__ (kept both weekday and weekofyear), test_temporals.py imports +
tests (kept both unix/weekday and add_months/months_between blocks),
and temporal.rs SQL module (kept both AddMonths/MonthsBetween and Unix
extractor handlers).

Added test_unix_seconds_timestamp_input parameterized across midnight,
mid-day, and the 1970-01-01 00:00:01 epoch boundary to exercise
Timestamp inputs across all three resolutions.
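
Expected values for a parameterized test of this shape can be derived with the standard library. The sketch below is not the test from the PR: the midnight and mid-day timestamps are assumed (2021-01-01 is borrowed from the Behavior section), all inputs are taken as UTC, and `expected_epoch` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

# One timedelta per resolution; floor-dividing by it counts whole units.
RESOLUTIONS = {
    "s": timedelta(seconds=1),
    "ms": timedelta(milliseconds=1),
    "us": timedelta(microseconds=1),
}


def expected_epoch(ts, unit):
    """Whole units of `unit` elapsed between the Unix epoch and `ts`."""
    return (ts - EPOCH) // RESOLUTIONS[unit]


# Assumed inputs: only the epoch-boundary case is named in the commit message.
CASES = {
    "midnight": datetime(2021, 1, 1, 0, 0, 0, tzinfo=timezone.utc),
    "mid_day": datetime(2021, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
    "epoch_boundary": datetime(1970, 1, 1, 0, 0, 1, tzinfo=timezone.utc),
}

table = {
    name: {unit: expected_epoch(ts, unit) for unit in RESOLUTIONS}
    for name, ts in CASES.items()
}
```

Dividing one `timedelta` by another with `//` returns an integer, which avoids the float rounding that `datetime.timestamp()` can introduce at microsecond resolution.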