
Commit 4121a3f

Merge pull request #2119 from elementary-data/add-sla-and-volume-threshold-test-docs
Clarify anomaly detection methods and execution vs data freshness
2 parents e37213b + a4e28cf commit 4121a3f

3 files changed

Lines changed: 14 additions & 10 deletions


docs/data-tests/data-freshness-sla.mdx

Lines changed: 6 additions & 4 deletions
@@ -12,7 +12,9 @@ import AiGenerateTest from '/snippets/ai-generate-test.mdx';
 
 Verifies that data in a model was updated before a specified SLA deadline time.
 
-This test checks the maximum timestamp value of a specified column in your data to determine whether the data was refreshed before your deadline. Unlike `freshness_anomalies` (which uses ML-based anomaly detection), this test validates against a fixed, explicit SLA time — making it ideal when you have a concrete contractual or operational deadline.
+This test checks the maximum timestamp value of a specified column in your data to determine whether the data was actually refreshed before your deadline. Unlike `freshness_anomalies` (which uses z-score based anomaly detection as a dbt test, or ML-based detection in Elementary Cloud), this test validates against a fixed, explicit SLA time, making it ideal when you have a concrete contractual or operational deadline.
+
+Unlike `execution_sla` (which only checks if the dbt model _ran_ on time), `data_freshness_sla` checks whether the actual _data_ is fresh. A pipeline can run successfully but still serve stale data if, for example, an upstream source didn't update. This test catches that.
 
 ### Use Case
 

@@ -131,9 +133,9 @@ models:
 
 | Feature | `data_freshness_sla` | `freshness_anomalies` | `execution_sla` |
 | --- | --- | --- | --- |
-| What it checks | Data timestamps | Data timestamps | Pipeline run time |
-| Detection method | Fixed SLA deadline | ML-based anomaly detection | Fixed SLA deadline |
-| Best for | Contractual/operational deadlines | Detecting unexpected delays | Pipeline execution deadlines |
+| What it checks | Actual data freshness (timestamps in the data) | Actual data freshness (timestamps in the data) | Pipeline execution (did the model run?) |
+| Detection method | Fixed SLA deadline | Z-score (dbt test) / ML (Cloud) | Fixed SLA deadline |
+| Best for | Contractual/operational deadlines on data | Detecting unexpected delays in data updates | Ensuring the pipeline itself ran on time |
 | Works with sources | Yes | Yes | No (models only) |
 
 ### Notes
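The distinction the new copy draws (fresh data vs. an on-time run) can be sketched in a few lines. This is an illustrative sketch only; the function and parameter names are invented here, not Elementary's actual implementation:

```python
from datetime import datetime, timedelta

def data_freshness_sla_passes(max_timestamp: datetime,
                              sla_deadline: datetime,
                              max_staleness: timedelta) -> bool:
    # At the SLA deadline, the newest value in the monitored timestamp
    # column must be no older than `max_staleness` (hypothetical knob).
    return sla_deadline - max_timestamp <= max_staleness

deadline = datetime(2024, 6, 1, 8, 0)  # 08:00 SLA, illustrative

# Data last updated 07:30 the same morning: within a 2-hour window.
fresh = data_freshness_sla_passes(datetime(2024, 6, 1, 7, 30),
                                  deadline, timedelta(hours=2))

# Data last updated 22:00 the previous day: the pipeline may well have
# run on time, but the data itself is stale.
stale = data_freshness_sla_passes(datetime(2024, 5, 31, 22, 0),
                                  deadline, timedelta(hours=2))
```

The second case is exactly the "pipeline ran but served stale data" failure mode the added paragraph describes: an `execution_sla`-style check would pass while this one fails.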

docs/data-tests/execution-sla.mdx

Lines changed: 3 additions & 1 deletion
@@ -12,7 +12,9 @@ import AiGenerateTest from '/snippets/ai-generate-test.mdx';
 
 Verifies that dbt models are executed successfully before a specified SLA deadline time.
 
-This test checks whether your pipeline completed before a specified deadline on the days you care about. It queries `dbt_run_results` for successful runs of the model and validates that at least one run completed before the SLA deadline.
+This test checks whether your pipeline _ran_ before a specified deadline on the days you care about. It queries `dbt_run_results` for successful runs of the model and validates that at least one run completed before the SLA deadline.
+
+Note that this test only verifies that the model executed, not that the data is actually fresh. If you need to verify that the underlying data was updated (e.g., an upstream source refreshed), use [`data_freshness_sla`](/data-tests/data-freshness-sla) instead.
 
 ### Use Case
 
docs/data-tests/volume-threshold.mdx

Lines changed: 5 additions & 5 deletions
@@ -12,7 +12,7 @@ import AiGenerateTest from '/snippets/ai-generate-test.mdx';
 
 Monitors row count changes between time buckets using configurable percentage thresholds with multiple severity levels.
 
-Unlike `volume_anomalies` (which uses ML-based anomaly detection to determine what's "normal"), this test lets you define explicit percentage thresholds for warnings and errors giving you precise control over when to be alerted. It uses Elementary's metric caching infrastructure to avoid recalculating row counts for buckets that have already been computed.
+Unlike `volume_anomalies` (which uses z-score based anomaly detection as a dbt test, or ML-based detection in Elementary Cloud), this test lets you define explicit percentage thresholds for warnings and errors, giving you precise control over when to be alerted. It uses Elementary's metric caching infrastructure to avoid recalculating row counts for buckets that have already been computed.
 
 ### Use Case
 
@@ -124,12 +124,12 @@ models:
 
 | Parameter | Required | Default | Description |
 | ------------------------- | -------- | ------- | ---------------------------------------------------------------------------- |
-| `timestamp_column` | Yes | | Column to determine time periods |
+| `timestamp_column` | Yes | - | Column to determine time periods |
 | `warn_threshold_percent` | No | 5 | Percentage change that triggers a warning |
 | `error_threshold_percent` | No | 10 | Percentage change that triggers an error |
 | `direction` | No | `both` | Direction to monitor: `both`, `spike`, or `drop` |
 | `time_bucket` | No | `{period: day, count: 1}` | Time bucket configuration |
-| `where_expression` | No | | SQL expression to filter the data |
+| `where_expression` | No | - | SQL expression to filter the data |
 | `days_back` | No | 14 | Days of metric history to retain |
 | `backfill_days` | No | 2 | Days to recalculate on each run |
 | `min_row_count` | No | 100 | Minimum rows in the previous bucket required to trigger the check |
@@ -138,7 +138,7 @@ models:
 
 | Feature | `volume_threshold` | `volume_anomalies` |
 | --- | --- | --- |
-| Detection method | Fixed percentage thresholds | ML-based anomaly detection |
+| Detection method | Fixed percentage thresholds | Z-score (dbt test) / ML (Cloud) |
 | Severity levels | Dual (warn + error) | Single (pass/fail) |
 | Best for | Known acceptable ranges | Unknown/variable patterns |
 | Configuration | Explicit thresholds | Sensitivity tuning |
@@ -147,6 +147,6 @@ models:
 ### Notes
 
 - The `warn_threshold_percent` must be less than or equal to `error_threshold_percent`
-- The test uses Elementary's metric caching infrastructure — row counts for previously computed time buckets are reused across runs
+- The test uses Elementary's metric caching infrastructure. Row counts for previously computed time buckets are reused across runs
 - If the previous bucket has fewer rows than `min_row_count`, the test passes (insufficient data for a meaningful comparison)
 - The test only evaluates completed time buckets
