feat: add has_no_aggr_outliers stateless rolling-window sigma outlier check #1118

Merged
mwojtyczka merged 9 commits into databrickslabs:main from vpottam-nvidia:feature/is-aggr-not-anomalous
Apr 28, 2026

Conversation

@vpottam-nvidia
Contributor

@vpottam-nvidia vpottam-nvidia commented Apr 21, 2026

Adds has_no_aggr_outliers, a new dataset-level quality check that detects outliers in time-series aggregates using a stateless rolling-window sigma method.
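
To illustrate the general idea of a stateless rolling-window sigma check, here is a minimal pure-Python sketch. It is not the DQX implementation (the PR uses PySpark), and the function name `flag_sigma_outliers` and its parameters `window`, `sigma`, and `min_points` are hypothetical, chosen only for this example. Each aggregate value is compared against the mean and sample standard deviation of the values in the preceding window, so no state is stored between runs.

```python
from statistics import mean, stdev


def flag_sigma_outliers(values, window=7, sigma=3.0, min_points=3):
    """Return indices of values that deviate from the mean of the preceding
    `window` values by more than `sigma` sample standard deviations.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    outliers = []
    for i, value in enumerate(values):
        # Only look backwards, so the current point never influences its own baseline.
        history = values[max(0, i - window):i]
        if len(history) < min_points:
            continue  # not enough history to estimate a baseline
        mu = mean(history)
        sd = stdev(history)
        if sd == 0:
            # Flat history: treat any deviation as an outlier (a design choice of this sketch).
            if value != mu:
                outliers.append(i)
            continue
        if abs(value - mu) > sigma * sd:
            outliers.append(i)
    return outliers


# Daily row counts with one obvious spike at index 6.
daily_counts = [100, 102, 98, 101, 99, 100, 250, 101]
print(flag_sigma_outliers(daily_counts, window=5, sigma=3.0))  # [6]
```

Note that the value after the spike (index 7) is not flagged, because the spike inflates the standard deviation of its window; this is a known trade-off of plain sigma-based baselines.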

Tests

  • manually tested
  • added unit tests
  • added integration tests
  • added end-to-end tests
  • added performance tests

Documentation and Demos

  • added/updated docs
  • added/updated demos

@vpottam-nvidia vpottam-nvidia requested a review from a team as a code owner April 21, 2026 00:52
@vpottam-nvidia vpottam-nvidia requested review from nehamilak-db and removed request for a team April 21, 2026 00:52
@github-actions
Contributor

All commits in PR should be signed ('git commit -S ...'). See https://docs.github.com/en/authentication/managing-commit-signature-verification/signing-commits

@vpottam-nvidia vpottam-nvidia force-pushed the feature/is-aggr-not-anomalous branch from 9690920 to b987f48 Compare April 21, 2026 01:16
Contributor

@mwojtyczka mwojtyczka left a comment

Thanks for the contribution @vpottam-nvidia — this is a nicely scoped, stateless counterpart to the existing is_aggr_* checks and it's great to see it paired with both unit and integration tests.

A few housekeeping items:

  1. Please sign your commits. DQX requires signed commits per the First Contribution guide. You'll need to enable commit signing with git config --global commit.gpgsign true (or equivalent) and force-push the re-signed history.
  2. Please remove the CHANGELOG.md entry. The changelog is generated automatically during release from PR titles/labels, so manual entries create merge conflicts and get regenerated anyway.
  3. Please address the inline review comments.

Comment thread CHANGELOG.md Outdated
Comment thread src/databricks/labs/dqx/check_funcs.py Outdated
Comment thread src/databricks/labs/dqx/check_funcs.py Outdated
Comment thread src/databricks/labs/dqx/check_funcs.py Outdated
Comment thread src/databricks/labs/dqx/check_funcs.py Outdated
Comment thread tests/integration/test_has_no_aggr_outliers.py
@mwojtyczka mwojtyczka added the under-review This PR is currently being reviewed by one of DQX maintainers. label Apr 21, 2026
ghanse previously requested changes Apr 21, 2026
Collaborator

@ghanse ghanse left a comment

Thanks for submitting! Requested a few changes.

Comment thread src/databricks/labs/dqx/check_funcs.py Outdated
Comment thread src/databricks/labs/dqx/check_funcs.py Outdated
Comment thread src/databricks/labs/dqx/check_funcs.py Outdated
Comment thread src/databricks/labs/dqx/check_funcs.py Outdated
Comment thread src/databricks/labs/dqx/check_funcs.py Outdated
Comment thread src/databricks/labs/dqx/check_funcs.py Outdated
@vpottam-nvidia vpottam-nvidia force-pushed the feature/is-aggr-not-anomalous branch 4 times, most recently from 98ca5e2 to 581c37d Compare April 21, 2026 17:42
Comment thread src/databricks/labs/dqx/check_funcs.py Outdated
Comment thread src/databricks/labs/dqx/check_funcs.py Outdated
Comment thread src/databricks/labs/dqx/check_funcs.py Outdated
Comment thread tests/integration/test_is_aggr_not_anomalous.py Outdated
Comment thread src/databricks/labs/dqx/check_funcs.py
Comment thread docs/dqx/docs/reference/quality_checks.mdx Outdated
Comment thread src/databricks/labs/dqx/check_funcs.py Outdated
Comment thread src/databricks/labs/dqx/check_funcs.py Outdated
Comment thread src/databricks/labs/dqx/check_funcs.py Outdated
Comment thread src/databricks/labs/dqx/check_funcs.py Outdated
Comment thread src/databricks/labs/dqx/check_funcs.py Outdated
Comment thread src/databricks/labs/dqx/check_funcs.py Outdated
- Implements has_no_aggr_outliers (dataset-level check): rolling-window
  sigma outlier detection for time-series aggregates using pure PySpark.
  Replaces the earlier is_aggr_not_anomalous name per reviewer request.
- Validates time_column is date/timestamp type (raises InvalidParameterError
  instead of cryptic Spark error at query time).
- Normalizes group_by column names in alias via get_column_name_or_alias(normalize=True).
- Adds 15 integration tests and full unit test coverage.
- Adds has_no_aggr_outliers to all_dataset_checks.yaml (exercises
  apply_checks_by_metadata end-to-end per reviewer request).
- Adds test_benchmark_has_no_aggr_outliers perf test.
- Updates quality_checks.mdx reference documentation.
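
The early type validation mentioned in the commit message can be sketched roughly as follows. This is an assumption-laden illustration, not the code from check_funcs.py: `InvalidParameterError` is redefined locally as a stand-in, `validate_time_column` is a hypothetical name, and the schema is modelled as a plain dict of column name to type name rather than a Spark schema.

```python
class InvalidParameterError(ValueError):
    """Local stand-in for the exception named in the commit message."""


# Assumption: only these two type names count as temporal in this sketch.
TEMPORAL_TYPES = {"date", "timestamp"}


def validate_time_column(schema: dict[str, str], time_column: str) -> None:
    """Fail fast with a readable error instead of a late query-time failure.

    Hypothetical helper; `schema` maps column names to simple type names.
    """
    if time_column not in schema:
        raise InvalidParameterError(f"Column '{time_column}' not found in schema")
    col_type = schema[time_column].lower()
    if col_type not in TEMPORAL_TYPES:
        raise InvalidParameterError(
            f"time_column '{time_column}' must be date or timestamp, got '{col_type}'"
        )


schema = {"event_date": "date", "amount": "double"}
validate_time_column(schema, "event_date")  # passes silently

try:
    validate_time_column(schema, "amount")
except InvalidParameterError as err:
    print(err)
```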
@vpottam-nvidia vpottam-nvidia force-pushed the feature/is-aggr-not-anomalous branch from 2789070 to e6c0da3 Compare April 24, 2026 18:43
@mwojtyczka mwojtyczka changed the title from "feat: add is_aggr_not_anomalous stateless rolling-window sigma anomaly check" to "feat: add has_no_aggr_outliers stateless rolling-window sigma anomaly check" Apr 27, 2026
@mwojtyczka mwojtyczka changed the title from "feat: add has_no_aggr_outliers stateless rolling-window sigma anomaly check" to "feat: add has_no_aggr_outliers stateless rolling-window sigma outlier check" Apr 27, 2026
Comment thread docs/dqx/docs/reference/quality_checks.mdx Outdated
Comment thread docs/dqx/docs/reference/quality_checks.mdx Outdated
Co-authored-by: Marcin Wojtyczka <marcin.wojtyczka@databricks.com>
Co-authored-by: Marcin Wojtyczka <marcin.wojtyczka@databricks.com>
Contributor

@mwojtyczka mwojtyczka left a comment

LGTM

@mwojtyczka mwojtyczka requested a review from ghanse April 27, 2026 11:24
@mwojtyczka mwojtyczka dismissed ghanse’s stale review April 28, 2026 07:57

feedback implemented

@mwojtyczka mwojtyczka merged commit 9fb35f6 into databrickslabs:main Apr 28, 2026
37 checks passed
Labels

Approved to Merge When PR is reviewed and approved. To be merged once all tests pass

3 participants