Commit efd188b

author
ci bot
committed
Merge branch 'aarthy/fixes' into 'enterprise'
feat(monitors): allow filtering by anomaly types and show more history

See merge request dkinternal/testgen/dataops-testgen!410
2 parents b930b69 + 74f6a21 commit efd188b

34 files changed

Lines changed: 906 additions & 448 deletions

README.md

Lines changed: 5 additions & 5 deletions
@@ -1,5 +1,5 @@
 # DataOps Data Quality TestGen
-![apache 2.0 license Badge](https://img.shields.io/badge/License%20-%20Apache%202.0%20-%20blue) ![PRs Badge](https://img.shields.io/badge/PRs%20-%20Welcome%20-%20green) [![Latest Version](https://img.shields.io/badge/dynamic/json?url=https%3A%2F%2Fhub.docker.com%2Fv2%2Frepositories%2Fdatakitchen%2Fdataops-testgen%2Ftags%2F&query=results%5B0%5D.name&label=latest%20version&color=06A04A)](https://hub.docker.com/r/datakitchen/dataops-testgen) [![Docker Pulls](https://img.shields.io/badge/dynamic/json?url=https%3A%2F%2Fhub.docker.com%2Fv2%2Frepositories%2Fdatakitchen%2Fdataops-testgen%2F&query=pull_count&style=flat&label=docker%20pulls&color=06A04A)](https://hub.docker.com/r/datakitchen/dataops-testgen) [![Documentation](https://img.shields.io/badge/docs-On%20datakitchen.io-06A04A?style=flat)](https://docs.datakitchen.io/articles/#!dataops-testgen-help/dataops-testgen-help) [![Static Badge](https://img.shields.io/badge/Slack-Join%20Discussion-blue?style=flat&logo=slack)](https://data-observability-slack.datakitchen.io/join)
+![apache 2.0 license Badge](https://img.shields.io/badge/License%20-%20Apache%202.0%20-%20blue) ![PRs Badge](https://img.shields.io/badge/PRs%20-%20Welcome%20-%20green) [![Latest Version](https://img.shields.io/badge/dynamic/json?url=https%3A%2F%2Fhub.docker.com%2Fv2%2Frepositories%2Fdatakitchen%2Fdataops-testgen%2Ftags%2F&query=results%5B0%5D.name&label=latest%20version&color=06A04A)](https://hub.docker.com/r/datakitchen/dataops-testgen) [![Docker Pulls](https://img.shields.io/badge/dynamic/json?url=https%3A%2F%2Fhub.docker.com%2Fv2%2Frepositories%2Fdatakitchen%2Fdataops-testgen%2F&query=pull_count&style=flat&label=docker%20pulls&color=06A04A)](https://hub.docker.com/r/datakitchen/dataops-testgen) [![Documentation](https://img.shields.io/badge/docs-On%20datakitchen.io-06A04A?style=flat)](https://docs.datakitchen.io/articles/dataops-testgen-help/dataops-testgen-help) [![Static Badge](https://img.shields.io/badge/Slack-Join%20Discussion-blue?style=flat&logo=slack)](https://data-observability-slack.datakitchen.io/join)
 
 *<p style="text-align: center;">DataOps Data Quality TestGen, or "TestGen" for short, can help you find data issues so you can alert your users and notify your suppliers. It does this by delivering simple, fast data quality test generation and execution by data profiling, new dataset screening and hygiene review, algorithmic generation of data quality validation tests, ongoing production testing of new data refreshes, and continuous anomaly monitoring of datasets. TestGen is part of DataKitchen's Open Source Data Observability.</p>*
 
@@ -110,7 +110,7 @@ Within the virtual environment, install the TestGen package with pip.
 pip install dataops-testgen
 ```
 
-Verify that the [_testgen_ command line](https://docs.datakitchen.io/articles/#!dataops-testgen-help/testgen-commands-and-details) works.
+Verify that the [_testgen_ command line](https://docs.datakitchen.io/articles/dataops-testgen-help/testgen-commands-and-details) works.
 ```shell
 testgen --help
 ```
@@ -187,7 +187,7 @@ python3 dk-installer.py tg delete-demo
 
 ### Upgrade to latest version
 
-New releases of TestGen are announced on the `#releases` channel on [Data Observability Slack](https://data-observability-slack.datakitchen.io/join), and release notes can be found on the [DataKitchen documentation portal](https://docs.datakitchen.io/articles/#!dataops-testgen-help/testgen-release-notes/a/h1_1691719522). Use the following command to upgrade to the latest released version.
+New releases of TestGen are announced on the `#releases` channel on [Data Observability Slack](https://data-observability-slack.datakitchen.io/join), and release notes can be found on the [DataKitchen documentation portal](https://docs.datakitchen.io/articles/dataops-testgen-help/testgen-release-notes/a/h1_1691719522). Use the following command to upgrade to the latest released version.
 
 ```shell
 python3 dk-installer.py tg upgrade
@@ -203,7 +203,7 @@ python3 dk-installer.py tg delete
 
 ### Access the _testgen_ CLI
 
-The [_testgen_ command line](https://docs.datakitchen.io/articles/#!dataops-testgen-help/testgen-commands-and-details) can be accessed within the running container.
+The [_testgen_ command line](https://docs.datakitchen.io/articles/dataops-testgen-help/testgen-commands-and-details) can be accessed within the running container.
 
 ```shell
 docker compose exec engine bash
@@ -232,7 +232,7 @@ We recommend you start by going through the [Data Observability Overview Demo](h
 For support requests, [join the Data Observability Slack](https://data-observability-slack.datakitchen.io/join) 👋 and post on the `#support` channel.
 
 ### Connect to your database
-Follow [these instructions](https://docs.datakitchen.io/articles/#!dataops-testgen-help/connect-your-database) to improve the quality of data in your database.
+Follow [these instructions](https://docs.datakitchen.io/articles/dataops-testgen-help/connect-your-database) to improve the quality of data in your database.
 
 ### Community
 Talk and learn with other data practitioners who are building with DataKitchen. Share knowledge, get help, and contribute to our open-source project.

pyproject.toml

Lines changed: 2 additions & 2 deletions
@@ -99,8 +99,8 @@ tg-patch-streamlit = "testgen.ui.scripts.patch_streamlit:patch"
 [project.urls]
 "Source Code" = "https://github.com/DataKitchen/dataops-testgen"
 "Bug Tracker" = "https://github.com/DataKitchen/dataops-testgen/issues"
-"Documentation" = "https://docs.datakitchen.io/articles/#!dataops-testgen-help/dataops-testgen-help"
-"Release Notes" = "https://docs.datakitchen.io/articles/#!dataops-testgen-help/testgen-release-notes"
+"Documentation" = "https://docs.datakitchen.io/articles/dataops-testgen-help/dataops-testgen-help"
+"Release Notes" = "https://docs.datakitchen.io/articles/dataops-testgen-help/testgen-release-notes"
 "Slack" = "https://data-observability-slack.datakitchen.io/join"
 "Homepage" = "https://example.com"
 
testgen/commands/test_thresholds_prediction.py

Lines changed: 15 additions & 7 deletions
@@ -3,6 +3,7 @@
 from typing import ClassVar
 
 import pandas as pd
+from scipy import stats
 
 from testgen.common.database.database_service import (
     execute_db_queries,
@@ -29,13 +30,14 @@ class TestThresholdsPrediction:
         "prediction",
     )
     num_forecast = 10
+    t_distribution_threshold = 20
     z_score_map: ClassVar = {
-        ("lower_tolerance", PredictSensitivity.low): -2.0,  # 2.5th percentile
-        ("lower_tolerance", PredictSensitivity.medium): -1.5,  # 7th percentile
-        ("lower_tolerance", PredictSensitivity.high): -1.0,  # 16th percentile
-        ("upper_tolerance", PredictSensitivity.high): 1.0,  # 84th percentile
-        ("upper_tolerance", PredictSensitivity.medium): 1.5,  # 93rd percentile
-        ("upper_tolerance", PredictSensitivity.low): 2.0,  # 97.5th percentile
+        ("lower_tolerance", PredictSensitivity.low): -3.0,  # 0.13th percentile
+        ("lower_tolerance", PredictSensitivity.medium): -2.5,  # 0.62nd percentile
+        ("lower_tolerance", PredictSensitivity.high): -2.0,  # 2.3rd percentile
+        ("upper_tolerance", PredictSensitivity.high): 2.0,  # 97.7th percentile
+        ("upper_tolerance", PredictSensitivity.medium): 2.5,  # 99.4th percentile
+        ("upper_tolerance", PredictSensitivity.low): 3.0,  # 99.87th percentile
     }
 
     def __init__(self, test_suite: TestSuite, run_date: datetime):
@@ -71,9 +73,15 @@ def run(self) -> None:
             ] if self.test_suite.predict_holiday_codes else None,
         )
 
+        num_points = len(history)
         for key, z_score in self.z_score_map.items():
+            if num_points < self.t_distribution_threshold:
+                percentile = stats.norm.cdf(z_score)
+                multiplier = stats.t.ppf(percentile, df=num_points - 1)
+            else:
+                multiplier = z_score
             column = f"{key[0]}|{key[1].value}"
-            forecast[column] = forecast["mean"] + (z_score * forecast["se"])
+            forecast[column] = forecast["mean"] + (multiplier * forecast["se"])
 
         next_date = forecast.index[0]
         sensitivity = self.test_suite.predict_sensitivity or PredictSensitivity.medium
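
The small-sample branch added in `run()` can be sketched in isolation. The diff relies on `scipy.stats`; the version below is a dependency-free approximation written for illustration only, so `t_cdf`, `t_ppf`, and `tolerance_multiplier` are hypothetical helper names and the numeric integration stands in for `stats.t.ppf`. It also assumes at least two history points, which the diff does not state.

```python
import math
from statistics import NormalDist

T_DISTRIBUTION_THRESHOLD = 20  # mirrors t_distribution_threshold in the diff


def t_cdf(x: float, df: int) -> float:
    """Student's t CDF, integrating the density over [0, |x|] with Simpson's rule."""
    if x == 0:
        return 0.5
    coeff = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    steps = 2 * max(100, math.ceil(abs(x) / 0.02))  # even count, step width <= 0.01
    width = abs(x) / steps

    def pdf(v: float) -> float:
        return coeff * (1 + v * v / df) ** (-(df + 1) / 2)

    total = pdf(0.0) + pdf(abs(x))
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * width)
    area = total * width / 3
    # The density is symmetric around zero, so only one side is integrated.
    return 0.5 + area if x > 0 else 0.5 - area


def t_ppf(p: float, df: int) -> float:
    """Student's t quantile by bisection on t_cdf (stand-in for scipy.stats.t.ppf)."""
    lo, hi = -1000.0, 1000.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def tolerance_multiplier(z_score: float, num_points: int) -> float:
    """Return z_score unchanged for long histories, else the matching t quantile."""
    if num_points < 2:
        # Assumption of this sketch only: df = num_points - 1 must be >= 1.
        raise ValueError("this sketch needs at least two history points")
    if num_points < T_DISTRIBUTION_THRESHOLD:
        percentile = NormalDist().cdf(z_score)  # same role as stats.norm.cdf
        return t_ppf(percentile, df=num_points - 1)
    return z_score
```

With 20 or more history points the multiplier is the z-score itself; with fewer, the t quantile at the same percentile is larger in magnitude, which widens the tolerance band around `forecast["mean"]`.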

testgen/common/notifications/notifications.py

Lines changed: 1 addition & 1 deletion
@@ -393,7 +393,7 @@ def get_body_template(self) -> str:
 </tr>
 <tr class="footer">
     <td>
-        <a href="https://docs.datakitchen.io/articles/#!dataops-testgen-help/introduction-to-dataops-testgen"
+        <a href="https://docs.datakitchen.io/articles/dataops-testgen-help/introduction-to-dataops-testgen"
            target="_blank">TestGen Help</a>
     </td>
     <td align="right">

testgen/template/data_chars/data_chars_update.sql

Lines changed: 45 additions & 15 deletions
@@ -17,20 +17,37 @@ WITH new_chars AS (
     schema_name,
     table_name,
     run_date
+),
+updated_records AS (
+    UPDATE data_table_chars
+    SET approx_record_ct = n.approx_record_ct,
+        record_ct = n.record_ct,
+        column_ct = n.column_ct,
+        last_refresh_date = n.run_date,
+        drop_date = NULL
+    FROM new_chars n
+    INNER JOIN data_table_chars d ON (
+        n.table_groups_id = d.table_groups_id
+        AND n.schema_name = d.schema_name
+        AND n.table_name = d.table_name
+    )
+    WHERE data_table_chars.table_id = d.table_id
+    RETURNING data_table_chars.*, d.drop_date as old_drop_date
+)
+INSERT INTO data_structure_log (
+    table_groups_id,
+    table_id,
+    table_name,
+    change_date,
+    change
 )
-UPDATE data_table_chars
-SET approx_record_ct = n.approx_record_ct,
-    record_ct = n.record_ct,
-    column_ct = n.column_ct,
-    last_refresh_date = n.run_date,
-    drop_date = NULL
-FROM new_chars n
-INNER JOIN data_table_chars d ON (
-    n.table_groups_id = d.table_groups_id
-    AND n.schema_name = d.schema_name
-    AND n.table_name = d.table_name
-)
-WHERE data_table_chars.table_id = d.table_id;
+SELECT u.table_groups_id,
+    u.table_id,
+    u.table_name,
+    u.last_refresh_date,
+    'A'
+FROM updated_records u
+WHERE u.old_drop_date IS NOT NULL;
 
 -- Add new records
 WITH new_chars AS (
@@ -170,7 +187,7 @@ update_chars AS (
     )
     WHERE data_column_chars.table_id = d.table_id
         AND data_column_chars.column_name = d.column_name
-    RETURNING data_column_chars.*, d.db_data_type as old_data_type
+    RETURNING data_column_chars.*, d.db_data_type as old_data_type, d.drop_date as old_drop_date, n.run_date as run_date
 )
 INSERT INTO data_structure_log (
     table_groups_id,
@@ -193,7 +210,20 @@ SELECT u.table_groups_id,
     u.old_data_type,
     u.db_data_type
 FROM update_chars u
-WHERE u.old_data_type <> u.db_data_type;
+WHERE u.old_data_type <> u.db_data_type
+    AND u.old_drop_date IS NULL
+UNION ALL
+SELECT u.table_groups_id,
+    u.table_id,
+    u.column_id,
+    u.table_name,
+    u.column_name,
+    u.run_date,
+    'A',
+    NULL,
+    u.db_data_type
+FROM update_chars u
+WHERE u.old_drop_date IS NOT NULL;
 
 
 -- Add new records
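
The table-level change in this file turns the plain `UPDATE` into a data-modifying CTE whose `RETURNING` clause exposes the pre-update `drop_date`, so that only rows that had been marked as dropped are written to `data_structure_log`. The following in-memory sketch mimics that flow under stated assumptions: `refresh_table_chars` and the dictionary layout are hypothetical, and reading `'A'` as "added" is an inference from context rather than something the diff states.

```python
from datetime import date


def refresh_table_chars(existing: dict, new_chars: list[dict]) -> list[dict]:
    """Update matching rows in place and return log entries for re-added tables.

    `existing` is keyed by (table_groups_id, schema_name, table_name), matching
    the join condition used in the SQL.
    """
    log_entries = []
    for n in new_chars:
        key = (n["table_groups_id"], n["schema_name"], n["table_name"])
        row = existing.get(key)
        if row is None:
            continue  # brand-new tables are handled by a separate statement
        # Capture the value before the update, like `d.drop_date as old_drop_date`.
        old_drop_date = row["drop_date"]
        row.update(
            approx_record_ct=n["approx_record_ct"],
            record_ct=n["record_ct"],
            column_ct=n["column_ct"],
            last_refresh_date=n["run_date"],
            drop_date=None,
        )
        # Only rows that were previously dropped produce a log entry.
        if old_drop_date is not None:
            log_entries.append({
                "table_groups_id": row["table_groups_id"],
                "table_id": row["table_id"],
                "table_name": row["table_name"],
                "change_date": row["last_refresh_date"],
                "change": "A",
            })
    return log_entries
```

The column-level statement follows the same idea, with one extra condition: data type changes are logged only when `old_drop_date IS NULL`, so a re-added column yields a single `'A'` entry rather than an `'A'` entry plus a type-change entry.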

testgen/ui/components/frontend/js/components/help_menu.js

Lines changed: 1 addition & 1 deletion
@@ -23,7 +23,7 @@ import { Icon } from './icon.js';
 
 const { a, div, span } = van.tags;
 
-const baseHelpUrl = 'https://docs.datakitchen.io/articles/#!dataops-testgen-help/';
+const baseHelpUrl = 'https://docs.datakitchen.io/articles/dataops-testgen-help/';
 const releaseNotesTopic = 'testgen-release-notes';
 const upgradeTopic = 'upgrade-testgen';
 
testgen/ui/components/frontend/js/components/monitor_anomalies_summary.js

Lines changed: 55 additions & 28 deletions
@@ -21,8 +21,13 @@
  * @property {number} lookback_end
  * @property {string?} project_code
  * @property {string?} table_group_id
+ *
+ * @typedef SummaryOptions
+ * @type {object}
+ * @property {function(string)?} onTagClick
+ * @property {object?} activeTypes
  */
-import { emitEvent } from '../utils.js';
+import { emitEvent, getValue, loadStylesheet } from '../utils.js';
 import { formatDuration, humanReadableDuration } from '../display_utils.js';
 import { withTooltip } from './tooltip.js';
 import van from '../van.min.js';
@@ -31,49 +36,63 @@ const { a, div, i, span } = van.tags;
 
 /**
  * @param {MonitorSummary} summary
- * @param {any?} topLabel
+ * @param {string?} label
+ * @param {SummaryOptions?} options
  */
-const AnomaliesSummary = (summary, label = 'Anomalies') => {
+const AnomaliesSummary = (summary, label = 'Anomalies', options = {}) => {
+    loadStylesheet('anomalies-summary', summaryStylesheet);
+
     if (!summary.lookback) {
         return span({class: 'text-secondary mt-3 mb-2'}, 'No monitor runs yet');
     }
 
-    const SummaryTag = (label, value, hasErrors, isTraining, isPending) => div(
-        {class: 'flex-row fx-gap-1'},
-        div(
-            {class: `flex-row fx-justify-center anomaly-tag ${value > 0 ? 'has-anomalies' : hasErrors ? 'has-errors' : isTraining ? 'is-training' : isPending ? 'is-pending' : ''}`},
-            value > 0
-                ? value
-                : hasErrors
-                    ? withTooltip(
-                        i({class: 'material-symbols-rounded'}, 'warning'),
-                        {text: 'Execution error', position: 'top-right'},
-                    )
-                    : isTraining
+    const SummaryTag = (typeKey, tagLabel, value, hasErrors, isTraining, isPending) => {
+        const isClickable = !!options.onTagClick;
+        const isActive = van.derive(() => (getValue(options.activeTypes) ?? []).includes(typeKey));
+
+        return div(
+            {
+                class: () => `flex-row fx-gap-1 p-1 border-radius-1 summary-tag ${isClickable ? 'clickable' : ''} ${isActive.val ? 'active' : ''}`,
+                onclick: isClickable ? (event) => {
+                    event.stopPropagation();
+                    options.onTagClick(typeKey);
+                } : undefined,
+            },
+            div(
+                {class: `flex-row fx-justify-center anomaly-tag ${value > 0 ? 'has-anomalies' : hasErrors ? 'has-errors' : isTraining ? 'is-training' : isPending ? 'is-pending' : ''}`},
+                value > 0
+                    ? value
+                    : hasErrors
                         ? withTooltip(
-                            i({class: 'material-symbols-rounded'}, 'more_horiz'),
-                            {text: 'Training model', position: 'top-right'},
+                            i({class: 'material-symbols-rounded'}, 'warning'),
+                            {text: 'Execution error', position: 'top-right'},
                         )
-                        : isPending
+                        : isTraining
                             ? withTooltip(
-                                span({class: 'pl-2 pr-2', style: 'position: relative;'}, '-'),
-                                {text: 'No results yet or not configured'},
+                                i({class: 'material-symbols-rounded'}, 'more_horiz'),
+                                {text: 'Training model', position: 'top-right'},
                             )
-                            : i({class: 'material-symbols-rounded'}, 'check'),
-        ),
-        span({}, label),
-    );
+                            : isPending
+                                ? withTooltip(
+                                    span({class: 'pl-2 pr-2', style: 'position: relative;'}, '-'),
+                                    {text: 'No results yet or not configured'},
+                                )
+                                : i({class: 'material-symbols-rounded'}, 'check'),
+            ),
+            span({}, tagLabel),
+        );
+    };
 
     const numRuns = summary.lookback === 1 ? 'run' : `${summary.lookback} runs`;
     const duration = humanReadableDuration(formatDuration(summary.lookback_start, new Date()), true)
     const labelElement = span({class: 'text-small text-secondary'}, `${label} in last ${numRuns} (${duration})`);
 
     const contentElement = div(
         {class: 'flex-row fx-gap-5'},
-        SummaryTag('Freshness', summary.freshness_anomalies, summary.freshness_has_errors, summary.freshness_is_training, summary.freshness_is_pending),
-        SummaryTag('Volume', summary.volume_anomalies, summary.volume_has_errors, summary.volume_is_training, summary.volume_is_pending),
-        SummaryTag('Schema', summary.schema_anomalies, summary.schema_has_errors, false, summary.schema_is_pending),
-        SummaryTag('Metrics', summary.metric_anomalies, summary.metric_has_errors, summary.metric_is_training, summary.metric_is_pending),
+        SummaryTag('freshness', 'Freshness', summary.freshness_anomalies, summary.freshness_has_errors, summary.freshness_is_training, summary.freshness_is_pending),
+        SummaryTag('volume', 'Volume', summary.volume_anomalies, summary.volume_has_errors, summary.volume_is_training, summary.volume_is_pending),
+        SummaryTag('schema', 'Schema', summary.schema_anomalies, summary.schema_has_errors, false, summary.schema_is_pending),
+        SummaryTag('metrics', 'Metrics', summary.metric_anomalies, summary.metric_has_errors, summary.metric_is_training, summary.metric_is_pending),
     );
 
     if (summary.project_code && summary.table_group_id) {
@@ -96,4 +115,12 @@ const AnomaliesSummary = (summary, label = 'Anomalies') => {
     return div({class: 'flex-column fx-gap-2'}, labelElement, contentElement);
 };
 
+const summaryStylesheet = new CSSStyleSheet();
+summaryStylesheet.replace(`
+.summary-tag.clickable:hover,
+.summary-tag.active {
+    background: var(--select-hover-background);
+}
+`);
+
 export { AnomaliesSummary };
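
The class logic in `SummaryTag` has two independent parts: a precedence chain that picks the state of the inner tag, and the `clickable`/`active` flags on the outer wrapper. A language-neutral restatement in Python may make the precedence easier to check. This is only a sketch: both function names are hypothetical, and `active_types` stands in for the resolved value of `options.activeTypes`.

```python
def anomaly_tag_state(value, has_errors: bool, is_training: bool, is_pending: bool) -> str:
    """First matching state wins, in the same order as the nested ternaries."""
    if (value or 0) > 0:  # a missing count is treated as zero
        return "has-anomalies"
    if has_errors:
        return "has-errors"
    if is_training:
        return "is-training"
    if is_pending:
        return "is-pending"
    return ""


def wrapper_flags(type_key: str, active_types, has_click_handler: bool) -> list[str]:
    """Flags added to the outer element when filtering by anomaly type is enabled."""
    flags = []
    if has_click_handler:
        flags.append("clickable")
    if type_key in (active_types or []):  # like `getValue(...) ?? []`
        flags.append("active")
    return flags
```

A positive anomaly count therefore hides an execution error on the same tag, and a tag can be marked `active` even when no click handler is supplied.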

testgen/ui/components/frontend/js/components/monitor_settings_form.js

Lines changed: 3 additions & 3 deletions
@@ -286,9 +286,9 @@ const PredictionForm = (
 name: 'predict_sensitivity',
 label: 'Sensitivity',
 options: [
-    { label: 'Low', value: 'low', help: 'Fewer alerts. Flag values outside 2 standard deviations of predicted value.' },
-    { label: 'Medium', value: 'medium', help: 'Balanced. Flag values outside 1.5 standard deviations of predicted value.' },
-    { label: 'High', value: 'high', help: 'More alerts. Flag values outside 1 standard deviation of predicted value.' },
+    { label: 'Low', value: 'low', help: 'Fewer alerts. Flag values outside 3 standard deviations of predicted value.' },
+    { label: 'Medium', value: 'medium', help: 'Balanced. Flag values outside 2.5 standard deviations of predicted value.' },
+    { label: 'High', value: 'high', help: 'More alerts. Flag values outside 2 standard deviations of predicted value.' },
 ],
 value: predictSensitivity,
 onChange: (value) => predictSensitivity.val = value,
