feat: add has_no_aggr_outliers stateless rolling-window sigma outlier check #1118
Merged
mwojtyczka merged 9 commits into databrickslabs:main on Apr 28, 2026
Contributor
All commits in a PR should be signed ('git commit -S ...'). See https://docs.github.com/en/authentication/managing-commit-signature-verification/signing-commits
9690920 to b987f48
mwojtyczka
reviewed
Apr 21, 2026
Contributor
Thanks for the contribution @vpottam-nvidia — this is a nicely scoped, stateless counterpart to the existing is_aggr_* checks and it's great to see it paired with both unit and integration tests.
A couple of housekeeping items:
- Please sign your commits: DQX requires signed commits per the First Contribution guide. You'll need to enable git config --global commit.gpgsign true (or equivalent) and force-push the re-signed history.
- Please remove the CHANGELOG.md entry: the changelog is generated automatically during release from PR titles/labels, so manual entries create merge conflicts and get regenerated anyway.
- Please address the inline review comments.
mwojtyczka
requested changes
Apr 21, 2026
ghanse
previously requested changes
Apr 21, 2026
Collaborator
ghanse
left a comment
Thanks for submitting! Requested a few changes.
98ca5e2 to 581c37d
mwojtyczka
reviewed
Apr 22, 2026
mwojtyczka
requested changes
Apr 22, 2026
- Implements has_no_aggr_outliers (dataset-level check): rolling-window sigma outlier detection for time-series aggregates using pure PySpark. Replaces the earlier is_aggr_not_anomalous name per reviewer request.
- Validates that time_column is a date/timestamp type (raises InvalidParameterError instead of a cryptic Spark error at query time).
- Normalizes group_by column names in the alias via get_column_name_or_alias(normalize=True).
- Adds 15 integration tests and full unit test coverage.
- Adds has_no_aggr_outliers to all_dataset_checks.yaml (exercises apply_checks_by_metadata end-to-end per reviewer request).
- Adds the test_benchmark_has_no_aggr_outliers perf test.
- Updates the quality_checks.mdx reference documentation.
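For readers unfamiliar with the technique, the rolling-window sigma idea named in the first bullet can be sketched in plain Python. This is only an illustration under assumptions, not the PR's PySpark implementation: the function name rolling_sigma_outliers and the parameters lookback and sigma are hypothetical, the baseline here uses only the points preceding the current one, and the zero-variance handling is a design choice made for this sketch. It is stateless in the sense that the baseline is recomputed from the input on every call rather than read from stored history.

```python
from statistics import mean, stdev
from typing import List, Sequence


def rolling_sigma_outliers(
    values: Sequence[float],
    lookback: int = 5,   # hypothetical parameter: number of preceding points in the baseline
    sigma: float = 3.0,  # hypothetical parameter: allowed deviation in standard deviations
) -> List[bool]:
    """Flag each value that deviates from its trailing-window mean by more than sigma std devs."""
    flags: List[bool] = []
    for i, current in enumerate(values):
        # Baseline window: up to `lookback` points strictly before the current one.
        window = values[max(0, i - lookback):i]
        if len(window) < 2:
            # Not enough history to estimate a standard deviation; do not flag.
            flags.append(False)
            continue
        mu = mean(window)
        sd = stdev(window)  # sample standard deviation
        if sd == 0:
            # Assumption for this sketch: with a perfectly flat baseline, any change is an outlier.
            flags.append(current != mu)
            continue
        flags.append(abs(current - mu) > sigma * sd)
    return flags


# Example: daily aggregate values (e.g. row counts per day) with one spike.
daily_counts = [100, 102, 98, 101, 99, 250, 100]
flags = rolling_sigma_outliers(daily_counts, lookback=5, sigma=3.0)
print(flags)
```

In this example only the sixth value (250) is flagged. Note that once a spike enters the trailing window it inflates the baseline's standard deviation, so later points are less likely to be flagged until the spike ages out of the window.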
2789070 to e6c0da3
mwojtyczka
reviewed
Apr 27, 2026
Co-authored-by: Marcin Wojtyczka <marcin.wojtyczka@databricks.com>
Adds has_no_aggr_outliers, a new dataset-level quality check that detects outliers in time-series aggregates using a stateless rolling-window sigma method.
Tests
Documentation and Demos