Merged
31 commits
9e35d35
feat: add variable substitution support for check definitions
fedeflowers Mar 15, 2026
ecc09c2
Apply suggestions from code review
fedeflowers Mar 18, 2026
870afe6
add variables param in contracts apply_checks_by_metadata, apply_chec…
fedeflowers Mar 18, 2026
877e748
add change to parametrize variables from load_checks instead of apply…
fedeflowers Mar 19, 2026
0a1899f
Merge branch 'main' into feature/parametrization_rules
mwojtyczka Mar 20, 2026
a3a78de
Restore test_apply_checks_by_metadata_and_save_in_table_loads_checks_…
mwojtyczka Mar 20, 2026
b896032
add EXTRA_PARAMS compatibility, added unit and integrations tests for…
fedeflowers Mar 23, 2026
c1ff146
add test parametrization variables, checked col is missing and anothe…
fedeflowers Mar 23, 2026
b25a510
Merge branch 'main' into feature/parametrization_rules
mwojtyczka Mar 27, 2026
1213a01
Merge branch 'main' into feature/parametrization_rules
mwojtyczka Mar 30, 2026
100c9cf
fix tests for variable parametrization of core on load checks, revert…
fedeflowers Mar 31, 2026
e2e1f1e
fix reverted extra space on apply checks in table file
fedeflowers Mar 31, 2026
4e3a81d
add docs, fix overloading, deduplication of tests, removed integratio…
fedeflowers Apr 3, 2026
7fdd172
add docs for variable parametrization, fix dqx demo
fedeflowers Apr 5, 2026
a3a21a6
fix tests duplication
fedeflowers Apr 5, 2026
908873c
fix test readded test_extra_params_variables_substitution_and_overrid…
fedeflowers Apr 5, 2026
4446b52
add doc warnign and test with empty dictionary
fedeflowers Apr 6, 2026
8a9eb79
Update docs and fmt
ghanse Apr 7, 2026
be8c556
Merge branch 'main' into feature/parametrization_rules
mwojtyczka Apr 14, 2026
ffd982a
Apply suggestion from @mwojtyczka
mwojtyczka Apr 14, 2026
1eae942
Apply suggestions from code review
mwojtyczka Apr 14, 2026
a30271b
added tests
mwojtyczka Apr 14, 2026
07a0968
added vars resolution when saving checks and discourage using vars fo…
mwojtyczka Apr 14, 2026
55d4e1f
updated tests
mwojtyczka Apr 14, 2026
6b8687b
fix docs
mwojtyczka Apr 14, 2026
aa4883c
fixed ci
mwojtyczka Apr 14, 2026
b6fdb8c
fix CI
mwojtyczka Apr 14, 2026
5b12453
Merge branch 'main' into feature/parametrization_rules
mwojtyczka Apr 20, 2026
c45a260
Merge branch 'main' into feature/parametrization_rules
mwojtyczka Apr 21, 2026
77d2db4
Merge branch 'main' into feature/parametrization_rules
mwojtyczka Apr 21, 2026
6811258
fmt
mwojtyczka Apr 22, 2026
60 changes: 59 additions & 1 deletion demos/dqx_demo_library.py
Expand Up @@ -1481,4 +1481,62 @@ def safe_parse_json(col):

# explode warnings
warnings_df = valid_and_quarantine_df.select(F.explode(F.col("dq_warnings")).alias("dq")).select(F.expr("dq.*"))
display(warnings_df)

# COMMAND ----------

# MAGIC %md
# MAGIC ## Advanced: Variable Substitution
# MAGIC
# MAGIC DQX supports variable substitution in declarative check definitions (YAML, JSON, or Delta tables).
# MAGIC This allows you to parameterize your rules and inject values at **load time** via the `variables` parameter in `load_checks`.
# MAGIC
# MAGIC ### Example Usage
# MAGIC
# MAGIC 1. Define a rule with `{{ placeholder }}` syntax.
# MAGIC 2. Pass a dictionary of variables when loading the rules.

# COMMAND ----------

from databricks.labs.dqx.config import WorkspaceFileChecksStorageConfig

# Define parameterized checks
parameterized_checks_yaml = """
- criticality: error
  name: "threshold_check_{{ threshold_name }}"
  check:
    function: is_not_greater_than
    arguments:
      column: "{{ target_column }}"
      limit: "{{ max_value }}"
"""

# Save to a temporary file
# demo_file_directory is defined at the beginning of this notebook
temp_checks_path = os.path.join(demo_file_directory, "parameterized_checks.yml")
with open(temp_checks_path, "w") as f:
    f.write(parameterized_checks_yaml)

dq_engine = DQEngine(WorkspaceClient())

# Load checks with variable resolution
# Resolution happens during the load process
resolved_checks = dq_engine.load_checks(
    config=WorkspaceFileChecksStorageConfig(location=temp_checks_path),
    variables={
        "threshold_name": "critical",
        "target_column": "col1",
        "max_value": 100
    }
)

# The resolved checks now have the values injected
# Note: DQEngine internally converts string numbers to their appropriate types if needed during validation or apply
print(yaml.dump(resolved_checks))

# Apply the resolved checks to a DataFrame
data = spark.createDataFrame([[50], [150]], "col1: int")
result_df = dq_engine.apply_checks_by_metadata(data, resolved_checks)
display(result_df)
77 changes: 77 additions & 0 deletions docs/dqx/docs/guide/additional_configuration.mdx
Expand Up @@ -171,3 +171,80 @@ from pyspark.sql import functions as F

skipped = checked_df.select(F.explode("_errors").alias("e")).filter(F.col("e.skipped") == True)
```

## Defining default variables for substitution

DQX allows you to define engine-level defaults for variables used in declarative check definitions (YAML, JSON, or Delta tables). These defaults are automatically applied during `load_checks` and `save_checks` unless overridden by the per-call `variables` parameter.

<Tabs>
<TabItem value="Python" label="Python" default>
```python
from databricks.labs.dqx.engine import DQEngine
from databricks.labs.dqx.config import ExtraParams, FileChecksStorageConfig, TableChecksStorageConfig
from databricks.sdk import WorkspaceClient

# Initialize engine with default variables
dq_engine = DQEngine(
    WorkspaceClient(),
    extra_params=ExtraParams(
        variables={
            "min_temp": 0,
            "max_temp": 50,
            "region": "GLOBAL"
        }
    )
)

# Load checks - uses 'min_temp' and 'max_temp' from defaults,
# but overrides 'region' specifically for this call.
resolved_checks = dq_engine.load_checks(
    config=FileChecksStorageConfig(location="checks.yml"),
    variables={"region": "EMEA"},
)

# Save checks - resolves variables before computing fingerprints and persisting.
# Uses 'min_temp' and 'max_temp' from defaults, overrides 'region' for this call.
dq_engine.save_checks(
    checks=checks,
    config=TableChecksStorageConfig(location="catalog.schema.checks_table"),
    variables={"region": "EMEA"},
)
```
</TabItem>
</Tabs>
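
The override behaviour described above can be pictured as a plain dictionary merge in which per-call values win over engine-level defaults. The following is a minimal sketch under that assumption; `merge_variables` is a hypothetical helper written for illustration and is not part of the DQX API.

```python
from typing import Any, Optional


def merge_variables(
    defaults: Optional[dict[str, Any]],
    overrides: Optional[dict[str, Any]],
) -> dict[str, Any]:
    """Hypothetical helper: per-call values take precedence over defaults."""
    # Later entries in a dict display replace earlier ones with the same key.
    return {**(defaults or {}), **(overrides or {})}


engine_defaults = {"min_temp": 0, "max_temp": 50, "region": "GLOBAL"}
effective = merge_variables(engine_defaults, {"region": "EMEA"})
print(effective)  # {'min_temp': 0, 'max_temp': 50, 'region': 'EMEA'}
```

The merge builds a new dictionary, so the defaults themselves are left unchanged between calls.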

<Admonition type="warning" title="Variable substitution in workflows">
Variable substitution is not currently supported in DQX installable workflows. Variables can be defined and stored as YAML in the configuration file but will not be applied during workflow execution.

Variable substitution is only available when defining checks declaratively (as dictionaries or in files/tables). It is not supported when using DQX classes (e.g., `DQRowRule`) directly.
</Admonition>

## Overwriting run metadata

By default, DQX automatically generates a unique `run_id` for each engine instance and uses the current timestamp as the `run_time`. You can manually overwrite these values using `ExtraParams` if you need to align DQX results with external systems or re-run checks for a specific historical point in time.

<Tabs>
<TabItem value="Python" label="Python" default>
```python
from databricks.labs.dqx.engine import DQEngine
from databricks.labs.dqx.config import ExtraParams
from databricks.sdk import WorkspaceClient

extra_params = ExtraParams(
    run_id_overwrite="custom-execution-id-123",
    run_time_overwrite="2024-01-01T12:00:00Z"
)

dq_engine = DQEngine(WorkspaceClient(), extra_params=extra_params)
```
</TabItem>
<TabItem value="Workflows" label="Workflows">
You can set the following fields in the [configuration file](/docs/installation/#configuration-file) to overwrite the run metadata when using DQX workflows:
```yaml
extra_params:
  run_id_overwrite: custom-execution-id-123
  run_time_overwrite: 2024-01-01T12:00:00Z
```
</TabItem>
</Tabs>

74 changes: 74 additions & 0 deletions docs/dqx/docs/guide/quality_checks_definition.mdx
Expand Up @@ -720,6 +720,80 @@ Example checks saved in a Delta or Lakebase table (compact format — `for_each_

If `run_config_name` is not provided, "default" is used. Typically, the input table or job name is used for run config name to establish a one-to-one mapping between tables or jobs and checks.

## Variable Substitution

DQX supports variable substitution in declarative check definitions (YAML, JSON, or Delta tables). This allows you to parameterize your quality rules and inject values at **load time** or **save time** from engine-level defaults and/or via the `variables` parameter in `load_checks` or `save_checks`.

### Syntax and Scope

Placeholders are defined using the `{{ variable_name }}` syntax. Variable substitution is supported in **all string values** within the check definitions, including:
- `name`
- `filter`
- `check` function arguments (`arguments`) and column names (`for_each_column`)
- any other top-level or nested string field

<Admonition type="warning" title="Do not use variable substitution for criticality">
The `criticality` field only accepts fixed values (`error` or `warn`). Do not use variable placeholders for `criticality` — the resolved value must be a valid criticality and substituting it defeats the purpose of having an explicit severity level in the check definition.
</Admonition>
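
To illustrate what substitution in all string values means, the sketch below walks a nested check definition and replaces `{{ name }}` placeholders wherever a string occurs. It is an illustration only, not the DQX implementation: `substitute_variables` is a hypothetical helper, and rendering values with `str()` and leaving unknown placeholders untouched are assumptions of this sketch.

```python
import re
from typing import Any

# Matches {{ name }} with optional whitespace inside the braces.
_PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def substitute_variables(value: Any, variables: dict[str, Any]) -> Any:
    """Hypothetical helper: replace {{ name }} in every nested string value."""
    if isinstance(value, str):
        def _replace(match: re.Match) -> str:
            name = match.group(1)
            if name not in variables:
                # Assumption: unknown placeholders are left as-is.
                return match.group(0)
            return str(variables[name])

        return _PLACEHOLDER.sub(_replace, value)
    if isinstance(value, dict):
        return {key: substitute_variables(item, variables) for key, item in value.items()}
    if isinstance(value, list):
        return [substitute_variables(item, variables) for item in value]
    # Non-string scalars (numbers, booleans, None) are returned unchanged.
    return value


check = {
    "criticality": "error",
    "name": "temp_range_{{ region }}",
    "check": {
        "function": "is_in_range",
        "arguments": {"column": "temperature", "min_limit": "{{ min_temp }}"},
    },
    "filter": "region = '{{ region }}'",
}
resolved = substitute_variables(check, {"min_temp": 0, "region": "EMEA"})
```

In this sketch a substituted number becomes the string `"0"`, because the placeholder sits inside a quoted string value.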

### Resolution

Variables are resolved when checks are loaded or saved via the engine. To resolve variables, pass a dictionary to the `variables` parameter of `load_checks` or `save_checks`. You can choose whether to provide variables when loading or when saving checks.

<Admonition type="tip" title="Resolving variables at save time">
When using `save_checks` with variables, placeholders are resolved **before** computing rule fingerprints and persisting. This ensures that stored checks and their fingerprints reflect the actual resolved check logic. Without resolving at save time, fingerprints would be computed on unresolved `{{ }}` placeholders, causing a mismatch between the fingerprints stored in the checks table and those recorded in the summary metrics and per-row detailed results tables.
</Admonition>
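
The fingerprint mismatch described in the tip above can be seen with any content-based hash. The sketch below is an assumption-laden illustration: `fingerprint` is a hypothetical function that hashes a canonical JSON rendering with SHA-256, and the way DQX actually computes fingerprints is not shown here and may differ.

```python
import hashlib
import json
from typing import Any


def fingerprint(check: dict[str, Any]) -> str:
    """Hypothetical fingerprint: SHA-256 over a canonical JSON rendering."""
    canonical = json.dumps(check, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


unresolved = {
    "criticality": "error",
    "check": {
        "function": "is_in_range",
        "arguments": {"column": "temperature", "max_limit": "{{ max_temp }}"},
    },
}
resolved = {
    "criticality": "error",
    "check": {
        "function": "is_in_range",
        "arguments": {"column": "temperature", "max_limit": "100"},
    },
}

# The placeholder text and the resolved value hash to different fingerprints.
mismatch = fingerprint(unresolved) != fingerprint(resolved)
```

Any hash of the check content behaves this way, which is why a fingerprint taken before resolution cannot match one computed from the resolved check.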

<Admonition type="info" title="Note">
Variable substitution is only available when defining checks declaratively (as dictionaries or in files/tables). It is not supported when using DQX classes (e.g., `DQRowRule`) directly.
</Admonition>

```python
import yaml
from databricks.labs.dqx.engine import DQEngine
from databricks.labs.dqx.config import FileChecksStorageConfig, TableChecksStorageConfig
from databricks.sdk import WorkspaceClient

dq_engine = DQEngine(WorkspaceClient())

# Define checks with variable placeholders
checks = yaml.safe_load("""
- criticality: error
  check:
    function: is_in_range
    arguments:
      column: temperature
      min_limit: "{{ min_temp }}"
      max_limit: "{{ max_temp }}"
  filter: "region = '{{ region }}'"
""")

variables = {
    "min_temp": 0,
    "max_temp": 100,
    "region": "EMEA",
}

# Load checks from file with variable resolution
resolved_checks = dq_engine.load_checks(
    config=FileChecksStorageConfig(location="checks.yml"),
    variables=variables,
)

# Or resolve variables when saving checks (ensures fingerprints are consistent)
dq_engine.save_checks(
    checks=checks,
    config=TableChecksStorageConfig(location="catalog.schema.checks_table"),
    variables=variables,
)
```

## Default Variables

In addition to specifying variables during the load or save process, you can define engine-level defaults using the `ExtraParams` class. These defaults are automatically applied whenever checks are loaded or saved, unless overridden by the per-call `variables` parameter.

For technical details and configuration examples, see [Default Variables](/docs/guide/additional_configuration#defining-default-variables-for-substitution) in the Additional Configuration guide.

## Validating syntax of quality checks

You can validate the syntax of checks loaded from a storage system or checks defined programmatically before applying them.
6 changes: 6 additions & 0 deletions docs/dqx/docs/guide/quality_checks_storage.mdx
Expand Up @@ -180,6 +180,12 @@ If you create checks as a list of DQRule objects, you can convert them using the
# also works for absolute and relative workspace paths if invoked from Databricks notebook or job
checks: list[dict] = dq_engine.load_checks(config=FileChecksStorageConfig(location="checks.yml"))

# load checks from a local file with variable substitution
checks: list[dict] = dq_engine.load_checks(
    FileChecksStorageConfig(location="checks.yml"),
    variables={"threshold": 100, "column_name": "total_amount"}
)

# load checks from arbitrary workspace location using absolute path
checks: list[dict] = dq_engine.load_checks(config=WorkspaceFileChecksStorageConfig(location="/Shared/App1/checks.yml"))
