Skip to content

Fix datetime casting macro for ClickHouse#851

Merged
michael-myaskovsky merged 3 commits into
elementary-data:masterfrom
StevenReitsma:patch-1
Sep 8, 2025
Merged

Fix datetime casting macro for ClickHouse#851
michael-myaskovsky merged 3 commits into
elementary-data:masterfrom
StevenReitsma:patch-1

Conversation

@StevenReitsma
Copy link
Copy Markdown
Contributor

@StevenReitsma StevenReitsma commented Sep 2, 2025

Previously, when a NULL value was encountered in the timestamp_field, the toString function would convert this NULL value to a Nullable(String). This would result in the split function not working properly, since it expects non-nullable input values. The datetime parsing can be simplified by using parseDateTimeBestEffortOrNull instead of doing manual parsing of the date, bypassing the issue of Nullable types entirely.

Summary by CodeRabbit

  • Bug Fixes

    • Improved timestamp parsing for ClickHouse sources: now uses best-effort, UTC-aware parsing to handle varied or unexpected formats and reduce ingestion failures.
    • Ensures a safe fallback to the Unix epoch in UTC when parsing fails.
  • Other

    • No changes to public interfaces or configuration required.

Previously, when a `NULL` value was encountered in the `timestamp_field`, the `toString` function would convert this NULL value to a `Nullable(String)`. This would result in the split function not working properly, since it expects non-nullable input values. The datetime parsing can be simplified by using `parseDateTimeBestEffortOrNull` instead of doing manual parsing of the date, bypassing the issue of `Nullable` types entirely.
@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Sep 2, 2025

Walkthrough

Replaced ClickHouse timestamp parsing in a macro: moved from slicing/splitting and toDateTime to parseDateTimeBestEffortOrNull(toString(...)) inside COALESCE and changed the fallback to toDateTime('1970-01-01 00:00:00', 'UTC'). Macro signature unchanged.

Changes

Cohort / File(s) Summary
ClickHouse timestamp casting
macros/utils/data_types/cast_column.sql
Replaced toDateTime(left(splitByChar('+', toString(...))[1], 19)) with parseDateTimeBestEffortOrNull(toString(...), 'UTC') inside COALESCE; updated fallback to toDateTime('1970-01-01 00:00:00', 'UTC'). No API/signature changes.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant C as Caller (dbt model)
    participant M as Macro: clickhouse__edr_cast_as_timestamp
    participant CH as ClickHouse

    C->>M: Provide timestamp_field
    M->>CH: parseDateTimeBestEffortOrNull(toString(timestamp_field), 'UTC')
    alt Parsed OK
        CH-->>M: Parsed Timestamp (UTC-aware)
        M-->>C: Parsed Timestamp
    else Parse fails (NULL)
        CH-->>M: NULL
        M->>M: COALESCE(NULL, toDateTime('1970-01-01 00:00:00', 'UTC'))
        M-->>C: 1970-01-01 00:00:00 (UTC)
    end
    note over M,CH: Replaced split/slice + toDateTime with best-effort UTC-aware parsing
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

I nibble strings and chase the signs,
From brittle cuts to kinder lines.
If formats hide or parsers stall,
I map to UTC and catch the fall.
Hop—epoch steady, timestamps call. 🐇⌚️

✨ Finishing Touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore or @coderabbit ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented Sep 2, 2025

👋 @StevenReitsma
Thank you for raising your pull request.
Please make sure to add tests and document all user-facing changes.
You can do this by editing the docs files in the elementary repository.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
macros/utils/data_types/cast_column.sql (2)

26-26: Prefer CAST over toString to avoid TZ-format surprises and keep types consistent.

Using CAST aligns with the rest of the macros and avoids toString’s server-TZ formatting for DateTime inputs. Functionally equivalent for strings/nullable strings.

Apply this diff:

-        parseDateTimeBestEffortOrNull(toString({{ timestamp_field }})),
+        parseDateTimeBestEffortOrNull(cast({{ timestamp_field }} as {{ elementary.edr_type_string() }})),

27-27: Consider explicit timezone on the epoch fallback.

toDateTime without TZ uses server settings; make it deterministic if you expect UTC everywhere.

-        toDateTime('1970-01-01 00:00:00')
+        toDateTime('1970-01-01 00:00:00', 'UTC')

If edr_type_timestamp resolves to DateTime64, you may want to switch the fallback to toDateTime64 with the appropriate scale.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between f04e3c3 and 1f68112.

📒 Files selected for processing (1)
  • macros/utils/data_types/cast_column.sql (1 hunks)
🔇 Additional comments (1)
macros/utils/data_types/cast_column.sql (1)

24-29: Good fix: safer parsing for Nullable inputs in ClickHouse.

Switching to parseDateTimeBestEffortOrNull avoids the Nullable(String) split failure and simplifies the logic. Keeps the epoch fallback behavior unchanged.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
macros/utils/data_types/cast_column.sql (3)

26-27: Align types to avoid implicit casts: prefer DateTime64 on both sides of COALESCE.

Depending on ClickHouse version, parseDateTimeBestEffortOrNull can yield DateTime64 while the fallback is DateTime. Let’s keep both expressions the same type/precision to prevent surprises and preserve sub-second precision.

Apply:

-        parseDateTimeBestEffortOrNull(toString({{ timestamp_field }}), 'UTC'),
-        toDateTime('1970-01-01 00:00:00', 'UTC')
+        parseDateTime64BestEffortOrNull(toString({{ timestamp_field }}), 3, 'UTC'),
+        toDateTime64('1970-01-01 00:00:00', 3, 'UTC')

If your ClickHouse version lacks parseDateTime64BestEffortOrNull, keep your current change but consider casting the first branch to DateTime to enforce consistency.


26-27: Optional: consider numeric epoch inputs (seconds/milliseconds).

If timestamp_field can be numeric epochs, add an extra branch. Example (keeps logic simple for seconds):

-        parseDateTimeBestEffortOrNull(toString({{ timestamp_field }}), 'UTC'),
+        parseDateTime64BestEffortOrNull(toString({{ timestamp_field }}), 3, 'UTC'),
+        -- Optional: handle integer epoch seconds
+        toDateTime64OrNull(CAST({{ timestamp_field }} AS Int64), 3, 'UTC'),
         toDateTime64('1970-01-01 00:00:00', 3, 'UTC')

If milliseconds are possible, you’ll need a scale check (e.g., length or threshold) and divide by 1000 before toDateTime64.


24-29: Add unit/SQL tests for edge cases.

Recommend cases: NULL, plain ISO8601 (with/without Z), with offset (+02:00), with fractional seconds, already-typed DateTime/DateTime64, invalid string, and integer epoch (if applicable).

I can add a dbt test macro + fixtures for ClickHouse covering these.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 1f68112 and 287a8b7.

📒 Files selected for processing (1)
  • macros/utils/data_types/cast_column.sql (1 hunks)
🔇 Additional comments (2)
macros/utils/data_types/cast_column.sql (2)

24-29: Good fix: removes fragile split and handles NULLs via best-effort parsing.

This simplification avoids the Nullable(String) + split failure and is more robust across ISO8601 variants. Nice.


26-27: Confirm timezone semantics are desired.

Passing 'UTC' normalizes timestamps to UTC even when the source lacks/has a different TZ. Validate this aligns with downstream expectations (previous code implicitly used server TZ for toDateTime fallback).

@michael-myaskovsky
Copy link
Copy Markdown
Contributor

Approved! Thank you for your contribution!

@michael-myaskovsky michael-myaskovsky merged commit 945d310 into elementary-data:master Sep 8, 2025
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants