Fix datetime casting macro for ClickHouse#851
Conversation
Previously, when a `NULL` value was encountered in the `timestamp_field`, the `toString` function would convert this NULL value to a `Nullable(String)`. This would result in the split function not working properly, since it expects non-nullable input values. The datetime parsing can be simplified by using `parseDateTimeBestEffortOrNull` instead of doing manual parsing of the date, bypassing the issue of `Nullable` types entirely.
WalkthroughReplaced ClickHouse timestamp parsing in a macro: moved from slicing/splitting and toDateTime to parseDateTimeBestEffortOrNull(toString(...)) inside COALESCE and changed the fallback to toDateTime('1970-01-01 00:00:00', 'UTC'). Macro signature unchanged. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant C as Caller (dbt model)
participant M as Macro: clickhouse__edr_cast_as_timestamp
participant CH as ClickHouse
C->>M: Provide timestamp_field
M->>CH: parseDateTimeBestEffortOrNull(toString(timestamp_field), 'UTC')
alt Parsed OK
CH-->>M: Parsed Timestamp (UTC-aware)
M-->>C: Parsed Timestamp
else Parse fails (NULL)
CH-->>M: NULL
M->>M: COALESCE(NULL, toDateTime('1970-01-01 00:00:00', 'UTC'))
M-->>C: 1970-01-01 00:00:00 (UTC)
end
note over M,CH: Replaced split/slice + toDateTime with best-effort UTC-aware parsing
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
✨ Finishing Touches🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
|
👋 @StevenReitsma |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
macros/utils/data_types/cast_column.sql (2)
26-26: Prefer CAST over toString to avoid TZ-format surprises and keep types consistent.Using CAST aligns with the rest of the macros and avoids toString’s server-TZ formatting for DateTime inputs. Functionally equivalent for strings/nullable strings.
Apply this diff:
- parseDateTimeBestEffortOrNull(toString({{ timestamp_field }})), + parseDateTimeBestEffortOrNull(cast({{ timestamp_field }} as {{ elementary.edr_type_string() }})),
27-27: Consider explicit timezone on the epoch fallback.toDateTime without TZ uses server settings; make it deterministic if you expect UTC everywhere.
- toDateTime('1970-01-01 00:00:00') + toDateTime('1970-01-01 00:00:00', 'UTC')If edr_type_timestamp resolves to DateTime64, you may want to switch the fallback to toDateTime64 with the appropriate scale.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
macros/utils/data_types/cast_column.sql(1 hunks)
🔇 Additional comments (1)
macros/utils/data_types/cast_column.sql (1)
24-29: Good fix: safer parsing for Nullable inputs in ClickHouse.Switching to parseDateTimeBestEffortOrNull avoids the Nullable(String) split failure and simplifies the logic. Keeps the epoch fallback behavior unchanged.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (3)
macros/utils/data_types/cast_column.sql (3)
26-27: Align types to avoid implicit casts: prefer DateTime64 on both sides of COALESCE.Depending on ClickHouse version, parseDateTimeBestEffortOrNull can yield DateTime64 while the fallback is DateTime. Let’s keep both expressions the same type/precision to prevent surprises and preserve sub-second precision.
Apply:
- parseDateTimeBestEffortOrNull(toString({{ timestamp_field }}), 'UTC'), - toDateTime('1970-01-01 00:00:00', 'UTC') + parseDateTime64BestEffortOrNull(toString({{ timestamp_field }}), 3, 'UTC'), + toDateTime64('1970-01-01 00:00:00', 3, 'UTC')If your ClickHouse version lacks parseDateTime64BestEffortOrNull, keep your current change but consider casting the first branch to DateTime to enforce consistency.
26-27: Optional: consider numeric epoch inputs (seconds/milliseconds).If timestamp_field can be numeric epochs, add an extra branch. Example (keeps logic simple for seconds):
- parseDateTimeBestEffortOrNull(toString({{ timestamp_field }}), 'UTC'), + parseDateTime64BestEffortOrNull(toString({{ timestamp_field }}), 3, 'UTC'), + -- Optional: handle integer epoch seconds + toDateTime64OrNull(CAST({{ timestamp_field }} AS Int64), 3, 'UTC'), toDateTime64('1970-01-01 00:00:00', 3, 'UTC')If milliseconds are possible, you’ll need a scale check (e.g., length or threshold) and divide by 1000 before toDateTime64.
24-29: Add unit/SQL tests for edge cases.Recommend cases: NULL, plain ISO8601 (with/without Z), with offset (+02:00), with fractional seconds, already-typed DateTime/DateTime64, invalid string, and integer epoch (if applicable).
I can add a dbt test macro + fixtures for ClickHouse covering these.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
macros/utils/data_types/cast_column.sql(1 hunks)
🔇 Additional comments (2)
macros/utils/data_types/cast_column.sql (2)
24-29: Good fix: removes fragile split and handles NULLs via best-effort parsing.This simplification avoids the Nullable(String) + split failure and is more robust across ISO8601 variants. Nice.
26-27: Confirm timezone semantics are desired.Passing 'UTC' normalizes timestamps to UTC even when the source lacks/has a different TZ. Validate this aligns with downstream expectations (previous code implicitly used server TZ for toDateTime fallback).
|
Approved! Thank you for your contribution! |
Previously, when a
NULLvalue was encountered in thetimestamp_field, thetoStringfunction would convert this NULL value to aNullable(String). This would result in the split function not working properly, since it expects non-nullable input values. The datetime parsing can be simplified by usingparseDateTimeBestEffortOrNullinstead of doing manual parsing of the date, bypassing the issue ofNullabletypes entirely.Summary by CodeRabbit
Bug Fixes
Other