Replies: 2 comments 3 replies
I am relatively new to DQX, and others may have a better answer, but the new error and warning columns each contain a JSON array with information about every rule that failed. You can use JSON SQL queries if you are looking for a specific check; see https://docs.databricks.com/aws/en/semi-structured/json for some examples. For the dashboard, we also created an additional dataset with the check results, which has one row per check error or warning. We use the "explode" function for this: https://sparkbyexamples.com/pyspark/pyspark-explode-array-and-map-columns-to-rows/
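To make the "one row per check error or warning" idea concrete, here is a rough sketch, not anything taken from DQX itself. The first function models the explode step on plain Python dicts so it can be run without a cluster; the second shows what I would expect the PySpark version to look like, assuming the result column is an array column. The `id` column, the function names and the `check` alias are made up for illustration.

```python
import json


def explode_results(rows, result_col="_errors"):
    """Plain-Python model of explode: one output row per failed check."""
    out = []
    for row in rows:
        raw = row.get(result_col)
        # Assumption: the column holds a JSON string or an already-parsed list,
        # and is null (None) when no check failed for that row.
        checks = json.loads(raw) if isinstance(raw, str) else (raw or [])
        for check in checks:
            out.append({"id": row["id"], "result_col": result_col, "check": check})
    return out


def explode_results_spark(df, result_col="_errors"):
    """PySpark counterpart, assuming result_col is an array column."""
    # Imported lazily so the plain-Python model above runs without Spark.
    from pyspark.sql import functions as F

    # explode() emits one row per array element and drops rows whose array
    # is null or empty, so only rows with at least one failed check remain.
    return df.select("*", F.explode(F.col(result_col)).alias("check"))
```

Running the same function once for `_errors` and once for `_warnings` and unioning the results would give a single dataset for a dashboard, if both columns share the same element type.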
If you apply checks without a split, you can get the valid rows by keeping only the records where both the _errors and _warnings columns are null. Conversely, the invalid records are those where at least one of the two columns is not null. We are also adding a summary table, where the summary stats from the checks on all tables will be centralized in a single table.
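The null-based rule can also be turned into an explicit label per row. This is only a sketch under the assumption that `_errors` and `_warnings` are null when no check failed; the `dq_status` column name and both function names are invented for the example and are not part of DQX.

```python
def row_status(errors, warnings):
    """Plain-Python model of the labeling rule for a single row."""
    if errors is not None:
        return "error"
    if warnings is not None:
        return "warning"
    return "valid"


def add_status_column(df):
    """PySpark counterpart: adds a hypothetical dq_status column."""
    # Imported lazily so row_status() can be tried without Spark installed.
    from pyspark.sql import functions as F

    return df.withColumn(
        "dq_status",
        F.when(F.col("_errors").isNotNull(), "error")
        .when(F.col("_warnings").isNotNull(), "warning")
        .otherwise("valid"),
    )
```

Filtering on `dq_status == "valid"` then returns the rows that passed every applied rule, and the other two values cover the rows that failed at either criticality.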
We have recently started using the DQX library, which is very intuitive and helpful for building our data quality checks on Databricks. According to the documentation, when a record fails a DQ rule, the _errors and/or _warnings columns (depending on the rule's criticality) are populated with the corresponding DQ rules that failed.
Is it possible to also identify or label the rows that do not fail any of the rules that have been applied? If so, could you please give me some guidance on how to achieve this with the DQX library?