Add integration tests for data_freshness_sla and volume_threshold (#965)
Merged commit be7de95 into feature/volume-threshold-test
* Add data_freshness_sla and volume_threshold tests

  Add two new Elementary tests:
  - data_freshness_sla: checks whether data was updated before a specified SLA deadline
  - volume_threshold: monitors row count changes with configurable warn/error thresholds, using Elementary's metric caching to avoid redundant computation

  Fixes applied:
  - volume_threshold: union historical metrics with new metrics for comparison
  - volume_threshold: deterministic dedup with source_priority tiebreaker
  - volume_threshold: let get_time_bucket handle defaults
  - data_freshness_sla: treat future-dated data as fresh (remove upper bound)
  - data_freshness_sla: escape single quotes in where_expression
  - data_freshness_sla: simplify deadline_passed logic
  - data_freshness_sla: rename max_timestamp_utc to max_timestamp (no UTC conversion)
  - data_freshness_sla: fix macro comment to match actual behavior
  - data_freshness_sla: document UTC assumption, add ephemeral model check

* fix: handle missing table in read_table when raise_if_empty=False (BigQuery test_seed_run_results)

* Revert "fix: handle missing table in read_table when raise_if_empty=False (BigQuery test_seed_run_results)"

  This reverts commit 9fc552e.

* Add integration tests for data_freshness_sla and volume_threshold (#965)

  - Add integration tests for data_freshness_sla and volume_threshold
  - Fix sla_time YAML sexagesimal issue - use AM/PM format

* Fix sqlfmt issues and Postgres round() bug in test macros

  - Reformat both test_data_freshness_sla.sql and test_volume_threshold.sql to pass sqlfmt
  - Fix Postgres round() bug: cast expression to numeric for round(numeric, int) compatibility
  - Restructure Jinja conditionals in data_freshness_sla to be sqlfmt-compatible
  - Extract where_suffix Jinja set to avoid parentheses inside SQL CASE expressions
  - Use edr_boolean_literal for is_failure CASE and WHERE clause
  - Remove 'where severity_level > 0' filter to prevent NULL fail_calc validation error

* Fix volume_threshold integration tests to use complete buckets

  Elementary only processes complete time buckets (the latest full bucket before run_started_at). With daily buckets, today's bucket is incomplete and gets excluded. The tests were putting anomalous data in today's bucket, so the macro never saw the spike/drop. Fix: shift all test data one day back so the anomalous data lands in yesterday's bucket (the latest complete bucket) and baseline data in the day before.

* Fix cross-database compatibility in volume_threshold macro

  - Replace 'cast(... as numeric)' with edr_type_numeric() for adapter-aware typing
  - Replace 'limit 1' with a row_number() pattern (SQL Server/Fabric compatible)
  - Replace '||' string concatenation with edr_concat() (SQL Server/Fabric compatible)

* Fix Dremio: rename 'prev' alias to 'prev_b' (reserved keyword)

* Fix Dremio: rename 'result' CTE to 'volume_result' (reserved keyword)

* Fix fusion pytz issue and SQL Server/Fabric concat in data_freshness_sla

  - Skip pytz.all_timezones validation in dbt-fusion (known discrepancy, dbt-labs/dbt-fusion#143)
  - Replace || concatenation with edr_concat() in test_data_freshness_sla.sql for SQL Server/Fabric

* Fix fusion pytz.localize() producing incorrect results in calculate_sla_deadline_utc

  In dbt-fusion, pytz.localize() produces incorrect timezone-aware datetimes, causing deadline_passed to be False when it should be True. Use datetime.timezone.utc and replace(tzinfo=) in the fusion path instead.

* Fix fusion: use naive UTC datetimes to avoid broken tz-aware operations

  In dbt-fusion, datetime.timezone.utc doesn't exist and timezone-aware datetime comparison produces incorrect results. Use datetime.utcnow() with manual offset calculation so all comparisons use naive datetimes.
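The complete-bucket rule that the test fix above relies on can be sketched in Python. This is a hypothetical helper for illustration only, not Elementary's actual implementation: with daily buckets, the latest complete bucket is the one ending at the last midnight at or before `run_started_at`, so "today's" partial bucket is excluded.

```python
from datetime import datetime, timedelta


def latest_complete_daily_bucket(run_started_at: datetime) -> tuple[datetime, datetime]:
    """Return (bucket_start, bucket_end) of the latest *complete* daily bucket.

    A daily bucket [midnight, next midnight) is complete only if its end is
    at or before run_started_at, so the current day's partial bucket is skipped.
    """
    bucket_end = run_started_at.replace(hour=0, minute=0, second=0, microsecond=0)
    bucket_start = bucket_end - timedelta(days=1)
    return bucket_start, bucket_end


# A run at 2024-06-15 07:30 sees [2024-06-14, 2024-06-15) as the latest
# complete bucket -- anomalous test data must land on June 14, not June 15.
start, end = latest_complete_daily_bucket(datetime(2024, 6, 15, 7, 30))
print(start.date(), end.date())  # 2024-06-14 2024-06-15
```

This is why the tests shift all seeded data one day back: the spike/drop must fall inside yesterday's bucket to be visible to the macro at all.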
* Fix Fabric, BigQuery, and fusion failures in volume_threshold and data_freshness_sla

  - Fabric: rename CTE current_bucket -> curr_bucket (SQL Server rejects the CURRENT_ prefix)
  - BigQuery: use \' instead of '' to escape single quotes in where_suffix string literals
  - fusion: replace pytz.localize() with stdlib datetime.timezone.utc in calculate_sla_deadline_utc to fix deadline_passed always being False

* Fix sqlfmt formatting in test_data_freshness_sla.sql

* Fix Fabric: replace scalar subquery with JOIN in previous_bucket CTE

  SQL Server/Fabric cannot resolve a CTE name inside a nested subquery within another CTE. Replace (select bucket_start from curr_bucket) with an INNER JOIN to curr_bucket.

* Fix sqlfmt formatting in test_volume_threshold.sql

* Fix freshness SLA deadline check and volume threshold Fabric compatibility

  Move the SLA deadline_passed check from compile-time Python (broken in dbt-fusion due to pytz issues) to SQL, using edr_condition_as_boolean with edr_current_timestamp_in_utc. Replace the cross-CTE subquery in volume_threshold with a LAG window function to fix the Fabric/SQL Server "Invalid object name" error.

* Revert fusion path to use naive UTC datetimes instead of datetime.timezone.utc

  dbt-fusion does not support datetime.timezone.utc, causing "undefined value" render errors. Revert to the pytz probe approach.

* Address CodeRabbit review feedback

  - Fix clock-flaky SLA tests: use Etc/GMT-14 for deadline-passed cases so the deadline is always in the past regardless of CI runner time
  - Use exact status assertions (== "error") instead of != "pass" in volume threshold tests to catch severity regressions
  - Add negative threshold and min_row_count validation in volume_threshold
  - Document the DST limitation in the fusion sla_utils path

* Fix volume threshold assertions: dbt reports 'fail' not 'error'

* Fix ClickHouse and DuckDB CI failures

  ClickHouse: reorder columns in the freshness SLA final_result CTE so is_failure is computed before sla_deadline_utc is cast to string (ClickHouse resolves column refs to already-aliased columns in the same SELECT, causing a DateTime vs String type mismatch). DuckDB: add explicit casts to LAG window function results in volume_threshold to work around a DuckDB internal binder bug that confuses TIMESTAMP and FLOAT types across multiple LAG calls.

* Fix ClickHouse freshness SLA: rename string-cast alias to avoid shadowing

  ClickHouse resolves column references against output aliases regardless of SELECT clause order. The cast(sla_deadline_utc as string) with the same alias name caused the is_failure comparison to use the string version instead of the timestamp, producing a DateTime vs String type mismatch. Renamed to sla_deadline_utc_str internally and re-aliased in the final SELECT.

* Fix DuckDB volume_threshold: replace LAG with ROW_NUMBER + self-join

  DuckDB has an internal binder bug where LAG window functions over UNION ALL sources confuse TIMESTAMP and FLOAT column types, causing "Failed to bind column reference bucket_end: inequal types". Using ROW_NUMBER + self-join achieves the same result without triggering the bug, and is cross-database compatible.

* Fix fusion: use pytz.utc instead of datetime.timezone.utc; fix brittle test timezone

  - sla_utils.sql: replace datetime.timezone.utc with pytz.utc in the fusion path. dbt-fusion's modules.datetime does not expose datetime.timezone, causing dbt1501 "undefined value" errors in all data_freshness_sla tests on fusion.
  - test_data_freshness_sla.py: change test_deadline_not_passed_does_not_fail from Etc/GMT-14 (UTC+14, deadline = 09:59 UTC) to plain UTC (deadline = 23:59 UTC). Etc/GMT-14 caused the test to fail whenever CI ran after 09:59 UTC.

* Fix Fabric, ClickHouse, and Vertica failures in volume_threshold and freshness_sla

  volume_threshold (Fabric + ClickHouse):
  - Cast row_number() to signed int in the bucket_numbered CTE to fix the ClickHouse NO_COMMON_TYPE error (UInt64 vs Int64) on JOIN with bucket_num - 1.
  - Move max(bucket_num) into a separate max_bucket CTE and use an INNER JOIN instead of a scalar subquery in the WHERE clause to fix the SQL Server/Fabric "Invalid object name 'bucket_numbered'" error.

  data_freshness_sla (Vertica):
  - Replace timezone: "Etc/GMT-14" with timezone: "UTC" for tests that need a deadline in the past. Etc/GMT-14 behaved incorrectly in some pytz versions, causing Vertica tests to return 'pass' instead of 'fail'. 12:01am UTC (= 00:01 UTC) is always in the past when CI runs at 07:00+ UTC.

* Fix Dremio: rename 'prev' alias to 'prev_b' (reserved keyword in Dremio)

* Address CodeRabbit review comments: clarify DATA_FRESH semantics, handle zero-baseline spikes, add timestamp_column validation to volume_threshold

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Itamar Hartstein <haritamar@gmail.com>
Co-authored-by: devin-ai-integration[bot] <158243242+devin-ai-integration[bot]@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Add integration tests for data_freshness_sla and volume_threshold
Summary
Adds integration test coverage for the two new Elementary tests introduced in this PR branch:
`data_freshness_sla` and `volume_threshold`.

- `test_data_freshness_sla.py` (6 tests): `where_expression` filtering
- `test_volume_threshold.py` (8 tests): `direction` parameter (spike-only, drop-only), `min_row_count` skips small baselines, `where_expression` filtering

Both files follow existing integration test conventions (pytest fixtures, `DbtProject.test()`, `generate_dates`, etc.).

Review & Testing Checklist for Human
- `test_stale_data_fails` uses `sla_time: "00:01"` UTC — will break if CI runs between 00:00–00:01 UTC.
- `test_deadline_not_passed_does_not_fail` uses `Etc/GMT-14` to push the deadline into the future — verify this works with pytz in the SLA macro.
- `test_custom_thresholds` warn assertion: asserts `status == "warn"` for an 8% change. This depends on the `fail_calc='max(severity_level)'` / `warn_if='>=1'` / `error_if='>=2'` config in the macro — verify this produces a `"warn"` (not `"fail"` or `"pass"`) in `elementary_test_results`.
- Run the tests locally (`pytest test_data_freshness_sla.py test_volume_threshold.py -vvv --target postgres`) to validate.
- Reuse of `test_id` without re-seeding: in `test_with_where_expression` and `test_custom_thresholds`, the second assertion relies on data persisting from the first seed. Verify this works with the test infrastructure.
- `day_of_week` / `day_of_month` scheduling for `data_freshness_sla` is not tested. Consider if this is acceptable or needs follow-up.

Notes
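The warn/fail semantics the checklist asks reviewers to verify can be sketched as follows. This is a hypothetical Python mapping (not the macro's SQL), assuming the `fail_calc='max(severity_level)'`, `warn_if='>=1'`, `error_if='>=2'` configuration described above, and that dbt reports the error case with status `"fail"` as noted in the commit history:

```python
def volume_test_status(severity_levels: list[int]) -> str:
    """Map per-bucket severity levels to a dbt test status.

    Mirrors fail_calc='max(severity_level)' with warn_if='>=1' and
    error_if='>=2': 0 = within thresholds, 1 = warn threshold crossed,
    2 = error threshold crossed. dbt reports the error case as 'fail'.
    """
    result = max(severity_levels, default=0)  # fail_calc over all buckets
    if result >= 2:  # error_if='>=2'
        return "fail"
    if result >= 1:  # warn_if='>=1'
        return "warn"
    return "pass"


print(volume_test_status([0, 1]))  # warn  (e.g. an 8% change past the warn threshold)
print(volume_test_status([0, 2]))  # fail  (error threshold dominates)
print(volume_test_status([]))      # pass  (no complete buckets flagged)
```

Under this mapping an 8% change that crosses only the warn threshold yields severity_level 1 and hence status `"warn"`, which is exactly what `test_custom_thresholds` asserts.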