Commit f39b210

Allow bypassing data quality checks via a config key (#1598)
* Fix preëxisting issues in test_sklearn_plot_extensions
* Add `hamilton.data_quality.disable_checks` config key

## Context

`@check_output` and `@check_output_custom` decorators let users attach data validators to function outputs. At graph-construction time, each decorated node is expanded into a subgraph:

- `{name}_raw`: runs the actual function
- `{name}_{validator}`: one node per validator (tagged `hamilton.data_quality.contains_dq_results`)
- `{name}`: aggregates validation results and returns the raw output

This flag is useful for two main reasons:

1. Validation may be crucial during development, but it carries a real runtime cost in production, where the pipeline is already trusted. Previously there was no way to turn it off short of removing the decorators from the source code.
2. Graph visualizations get crowded with validation nodes. Setting `hamilton.data_quality.disable_checks` to `True` at construction time keeps the `_raw` and validator nodes out of the graph, so any visualization of that driver's graph shows only the "business logic" nodes.

## What was changed

### `hamilton/function_modifiers/validation.py`

The key is defined once as the module-level constant `DISABLE_DATA_QUALITY_CHECKS_CONFIG_KEY = "hamilton.data_quality.disable_checks"`.

`BaseDataValidationDecorator` (parent of both `check_output` and `check_output_custom`) now:

1. Overrides `optional_config()` to declare `{"hamilton.data_quality.disable_checks": False}`. This registers the key with Hamilton's config-filtering pipeline so it is threaded through to `transform_node` automatically; no driver plumbing is needed.
2. Adds an early-return guard at the top of `transform_node`:

   ```python
   if config.get(DISABLE_DATA_QUALITY_CHECKS_CONFIG_KEY, False):
       return [node_]
   ```

   When the flag is set, the node is returned unchanged: no `_raw` node, no validator nodes, and no expansion. Validation then adds no runtime cost, because the extra nodes are never created.

### `hamilton/driver.py`

`Builder` gains a convenience method:

```python
def with_data_quality_disabled(self) -> Self:
    return self.with_config({DISABLE_DATA_QUALITY_CHECKS_CONFIG_KEY: True})
```

This is a thin wrapper over `with_config` that is discoverable by IDE autocomplete and explicit about intent. Because `with_config` does a dict `.update()`, a later call like `.with_config({"hamilton.data_quality.disable_checks": False})` will re-enable validation, which is the expected last-write-wins behavior.

## Usage

**Via the `Builder` convenience method (recommended):**

```python
dr = (
    hamilton.driver.Builder()
    .with_modules(my_pipeline)
    .with_data_quality_disabled()
    .build()
)
```

**Via `with_config` directly (equivalent, useful when config is assembled dynamically):**

```python
dr = (
    hamilton.driver.Builder()
    .with_modules(my_pipeline)
    .with_config({"hamilton.data_quality.disable_checks": True})
    .build()
)
```

**Legacy `Driver` constructor:**

```python
dr = hamilton.driver.Driver(
    {"hamilton.data_quality.disable_checks": True},
    my_pipeline,
    adapter=DefaultAdapter(),
)
```

## Tests added

| File | Test | What it covers |
|------|------|----------------|
| `tests/function_modifiers/test_validation.py` | `test_check_output_disabled_via_config_returns_original_node` | `check_output_custom` returns the original node unchanged when the flag is set |
| `tests/function_modifiers/test_validation.py` | `test_check_output_builtin_disabled_via_config_returns_original_node` | `check_output` (built-in validators) also respects the flag |
| `tests/test_end_to_end.py` | `test_builder_with_data_quality_disabled_removes_validator_nodes` | No DQ-tagged nodes appear in `list_available_variables()` when disabled |
| `tests/test_end_to_end.py` | `test_builder_with_data_quality_disabled_still_executes_correctly` | Driver executes correctly and returns the function's real output when disabled |
| `tests/test_end_to_end.py` | `test_disable_data_quality_checks_config_key_works_directly` | Raw `with_config` path (no convenience method) also suppresses validator nodes |

All 13 pre-existing validation tests continue to pass.

## Design notes

- **Graph-construction time, not execution time.** Disabling at construction eliminates the extra nodes entirely. An execution-time approach (e.g., a lifecycle adapter) would still pay graph construction and scheduling overhead, and would be harder to reason about.
- **Config key, not a subclass.** A `NoValidationBuilder` subclass was considered and rejected: this is a single boolean flag, not a fundamentally different execution model. The existing `Builder` method pattern (`with_config`, `with_adapters`, …) is the right weight here. The `Builder.with_data_quality_disabled()` convenience method is offered as a discoverable alias. If maintainers prefer, the method alone (without the raw config key) or the raw key alone (without the method) is equally viable.
- **`optional_config()` is the correct hook.** Hamilton's `resolve_config` / `filter_config` machinery passes only declared config keys to each decorator. Registering the key via `optional_config()` means the flag defaults to `False` for all existing drivers that never set it, so nothing breaks.

## Checklist

- [x] PR has an informative and human-readable title (this will be pulled into the release notes)
- [x] Changes are limited to a single goal (no scope creep)
- [x] Code passed the pre-commit check & code is left cleaner/nicer than when first encountered.
- [x] Any _change_ in functionality is tested
- [x] New functions are documented (with a description, list of inputs, and expected output)
- [x] Placeholder code is flagged / future TODOs are captured in comments
- [x] Project documentation has been updated if adding/changing functionality.
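
To make the expansion and the early-return guard concrete, here is a toy model of the node shapes described in the Context section. It is only a sketch: `ToyNode` and `expand` are hypothetical stand-ins written for illustration and are not Hamilton's `node.Node` or `transform_node`. Only the config key and the tag name are taken from this commit.

```python
from dataclasses import dataclass, field

# Key and tag strings as they appear in this commit's diff.
DISABLE_KEY = "hamilton.data_quality.disable_checks"
DQ_TAG = "hamilton.data_quality.contains_dq_results"


@dataclass
class ToyNode:
    """Hypothetical stand-in for a graph node (not Hamilton's node.Node)."""

    name: str
    tags: dict = field(default_factory=dict)


def expand(node: ToyNode, validators: list[str], config: dict) -> list[ToyNode]:
    """Mimic the shape of the expansion: raw node, one node per validator, final node."""
    if config.get(DISABLE_KEY, False):
        return [node]  # early return: the same object, nothing added
    raw = ToyNode(f"{node.name}_raw")
    checks = [ToyNode(f"{node.name}_{v}", {DQ_TAG: True}) for v in validators]
    final = ToyNode(node.name)
    return [raw, *checks, final]


node = ToyNode("spend")
enabled = expand(node, ["range", "dtype"], {})
disabled = expand(node, ["range", "dtype"], {DISABLE_KEY: True})
print([n.name for n in enabled])   # four nodes when checks are on
print([n.name for n in disabled])  # only the original node when checks are off
```

The disabled case returns the very same object that was passed in, which is the property the new unit tests assert with `subdag[0] is node_`.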
1 parent 7644a30 commit f39b210

7 files changed

Lines changed: 167 additions & 6 deletions

docs/how-tos/run-data-quality-checks.rst

Lines changed: 43 additions & 0 deletions
@@ -16,4 +16,47 @@ Async validators
 
 For validation logic that requires async operations (e.g., async database queries or API calls), use ``AsyncDataValidator`` or ``AsyncBaseDefaultValidator`` from ``hamilton.data_quality.base``. These define ``async def validate()`` and work with ``AsyncDriver``. You can mix sync and async validators in a single ``@check_output_custom`` call.
 
+Disabling validators at runtime
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Validators are useful during development but may be unnecessary overhead in a trusted production pipeline. You can disable all ``@check_output`` and ``@check_output_custom`` validators at graph-construction time, so no extra nodes are ever created:
+
+.. code-block:: python
+
+    dr = (
+        hamilton.driver.Builder()
+        .with_modules(my_pipeline)
+        .with_data_quality_disabled()
+        .build()
+    )
+
+This is equivalent to passing ``{"hamilton.data_quality.disable_checks": True}`` via ``.with_config()``, which is useful when the flag is controlled dynamically (e.g., from an environment variable):
+
+.. code-block:: python
+
+    import os
+
+    dr = (
+        hamilton.driver.Builder()
+        .with_modules(my_pipeline)
+        .with_config({"hamilton.data_quality.disable_checks": os.getenv("DISABLE_DQ", "false") == "true"})
+        .build()
+    )
+
+Because the flag is resolved at graph-construction time, disabled drivers carry zero runtime overhead from validation — no validator nodes are created at all.
+
+A second use case is graph visualization. Each decorated function normally expands into several nodes (``{name}_raw``, one per validator, and the final ``{name}`` node), which can clutter a visualization when you want to communicate pipeline structure rather than validation wiring. Building a driver with ``with_data_quality_disabled()`` gives a clean visualization with only the business-logic nodes:
+
+.. code-block:: python
+
+    dr_viz = (
+        hamilton.driver.Builder()
+        .with_modules(my_pipeline)
+        .with_data_quality_disabled()
+        .build()
+    )
+    dr_viz.display_all_functions("pipeline.png")
+
+Note that this requires a separate driver instance from the one used for execution if you still want validations to run.
+
 See the :doc:`check_output reference <../reference/decorators/check_output>` and `data quality writeup <https://github.com/apache/hamilton/blob/main/writeups/data_quality.md>`_ for details and examples.
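
Reviewer note on the `DISABLE_DQ` example in the docs diff above: the comparison `os.getenv("DISABLE_DQ", "false") == "true"` is case-sensitive, so `DISABLE_DQ=True` or `DISABLE_DQ=1` would leave validation enabled. A more tolerant parse might look like the sketch below. The helper name `env_flag` and the accepted spellings are assumptions made for illustration; neither is part of Hamilton or of this commit.

```python
import os

# Spellings treated as "on"; this set is an assumption, adjust to taste.
TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name: str, default: bool = False) -> bool:
    """Return True when the environment variable holds a common truthy spelling."""
    raw = os.environ.get(name)
    if raw is None:
        return default  # variable not set at all
    return raw.strip().lower() in TRUTHY


os.environ["DISABLE_DQ"] = "True"  # set here only so the example is self-contained
config = {"hamilton.data_quality.disable_checks": env_flag("DISABLE_DQ")}
print(config)
```

The resulting `config` dict is what would be handed to `.with_config(...)` in the documented builder chain.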

docs/reference/decorators/check_output.rst

Lines changed: 19 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -32,6 +32,25 @@ from ``hamilton.data_quality.base`` as your base class instead of the sync varia
3232
See `data_quality <https://github.com/apache/hamilton/blob/main/data\_quality.md>`_ for more information on
3333
available validators and how to build custom ones.
3434

35+
Disabling validators
36+
~~~~~~~~~~~~~~~~~~~~
37+
38+
All ``@check_output`` and ``@check_output_custom`` validators can be disabled at graph-construction
39+
time using ``Builder.with_data_quality_disabled()``:
40+
41+
.. code-block:: python
42+
43+
dr = (
44+
hamilton.driver.Builder()
45+
.with_modules(my_pipeline)
46+
.with_data_quality_disabled()
47+
.build()
48+
)
49+
50+
This eliminates all validator nodes from the graph — no ``_raw`` or validator nodes are created, so
51+
there is zero runtime cost. It is equivalent to ``.with_config({"hamilton.data_quality.disable_checks": True})``.
52+
See :doc:`../drivers/Driver` for full ``Builder`` documentation.
53+
3554
Note we also have a plugins that allow for validation with the pandera and pydantic libraries. There are two ways to access these:
3655

3756
1. ``@check_output(schema=pandera_schema)`` or ``@check_output(model=pydantic_model)``

hamilton/driver.py

Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -45,6 +45,7 @@
4545
from hamilton.caching.stores.base import MetadataStore, ResultStore
4646
from hamilton.dev_utils import deprecation
4747
from hamilton.execution import executors, graph_functions, grouping, state
48+
from hamilton.function_modifiers.validation import DISABLE_DATA_QUALITY_CHECKS_CONFIG_KEY
4849
from hamilton.graph_types import HamiltonNode
4950
from hamilton.io import materialization
5051
from hamilton.io.materialization import ExtractorFactory, MaterializerFactory
@@ -1843,6 +1844,19 @@ def with_adapters(self, *adapters: lifecycle_base.LifecycleAdapter) -> Self:
18431844
self.adapters.extend(adapters)
18441845
return self
18451846

1847+
def with_data_quality_disabled(self) -> Self:
1848+
"""Disables all ``@check_output`` / ``@check_output_custom`` validators at graph-construction
1849+
time. No validator nodes are created, so there is zero runtime cost.
1850+
1851+
This is equivalent to ``.with_config({DISABLE_DATA_QUALITY_CHECKS_CONFIG_KEY: True})``.
1852+
Note that a subsequent ``.with_config({DISABLE_DATA_QUALITY_CHECKS_CONFIG_KEY: False})``
1853+
will re-enable validation, since ``with_config`` always wins on the last write.
1854+
1855+
:return: self
1856+
"""
1857+
1858+
return self.with_config({DISABLE_DATA_QUALITY_CHECKS_CONFIG_KEY: True})
1859+
18461860
def with_materializers(self, *materializers: ExtractorFactory | MaterializerFactory) -> Self:
18471861
"""Add materializer nodes to the `Driver`
18481862
The generated nodes can be referenced by name in `.execute()`
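
Reviewer note on the last-write-wins remark in the docstring above: the ordering behaviour follows directly from merging config with `dict.update()`, as the commit message states. The miniature below illustrates it. `ToyBuilder` is a hypothetical stand-in written for this note, not Hamilton's `Builder`; only the key string comes from the diff.

```python
KEY = "hamilton.data_quality.disable_checks"  # key string from this commit


class ToyBuilder:
    """Hypothetical miniature of a builder whose config is merged with dict.update()."""

    def __init__(self) -> None:
        self.config: dict = {}

    def with_config(self, config: dict) -> "ToyBuilder":
        self.config.update(config)  # later keys overwrite earlier ones
        return self

    def with_data_quality_disabled(self) -> "ToyBuilder":
        return self.with_config({KEY: True})


# Disabling first and then passing False re-enables validation.
re_enabled = ToyBuilder().with_data_quality_disabled().with_config({KEY: False})
# Passing False first and disabling afterwards leaves validation off.
still_off = ToyBuilder().with_config({KEY: False}).with_data_quality_disabled()
print(re_enabled.config[KEY], still_off.config[KEY])
```

In practice this means the position of `.with_data_quality_disabled()` in a builder chain matters whenever another `.with_config(...)` call also sets the same key.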

hamilton/function_modifiers/validation.py

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -30,6 +30,7 @@
3030

3131
IS_DATA_VALIDATOR_TAG = "hamilton.data_quality.contains_dq_results"
3232
DATA_VALIDATOR_ORIGINAL_OUTPUT_TAG = "hamilton.data_quality.source_node"
33+
DISABLE_DATA_QUALITY_CHECKS_CONFIG_KEY = "hamilton.data_quality.disable_checks"
3334

3435

3536
class BaseDataValidationDecorator(base.NodeTransformer):
@@ -42,9 +43,14 @@ def get_validators(self, node_to_validate: node.Node) -> list[dq_base.DataValida
4243
"""
4344
pass
4445

46+
def optional_config(self) -> dict[str, Any]:
47+
return {DISABLE_DATA_QUALITY_CHECKS_CONFIG_KEY: False}
48+
4549
def transform_node(
4650
self, node_: node.Node, config: dict[str, Any], fn: Callable
4751
) -> Collection[node.Node]:
52+
if config.get(DISABLE_DATA_QUALITY_CHECKS_CONFIG_KEY, False):
53+
return [node_]
4854
raw_node = node.Node(
4955
name=node_.name
5056
+ "_raw", # TODO -- make this unique -- this will break with multiple validation decorators, which we *don't* want
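
Reviewer note on why `optional_config()` makes this change backward compatible: per the commit message, each decorator only sees the config keys it declares, with the declared default filled in when the driver never sets the key. The toy function below models that behaviour as described. It is an assumption-laden sketch, not Hamilton's actual `filter_config` or `resolve_config`, and its signature is invented for illustration.

```python
KEY = "hamilton.data_quality.disable_checks"  # key string from this commit


def toy_filter_config(config: dict, optional: dict) -> dict:
    """Keep only declared keys, using the declared default when a key is absent.

    Hypothetical model of the behaviour described in the commit message.
    """
    return {key: config.get(key, default) for key, default in optional.items()}


declared = {KEY: False}  # what optional_config() returns in this diff

# An existing driver that never heard of the flag: the default applies.
legacy_view = toy_filter_config({"some_other_setting": 42}, declared)
# A driver that sets the flag: the value is passed through, unrelated keys are dropped.
disabled_view = toy_filter_config({KEY: True, "some_other_setting": 42}, declared)
print(legacy_view, disabled_view)
```

Under this model the guard `config.get(KEY, False)` in `transform_node` sees `False` for every pre-existing driver, so the expansion path is unchanged unless the flag is set explicitly.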

tests/function_modifiers/test_validation.py

Lines changed: 38 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -313,6 +313,44 @@ def fn(input: int) -> int:
313313
assert result_fail.passes is False
314314

315315

316+
def test_check_output_disabled_via_config_returns_original_node():
317+
"""With hamilton.data_quality.disable_checks=True, transform_node returns the original node unchanged."""
318+
decorator = check_output_custom(
319+
SampleDataValidator2(dataset_length=1, importance="warn"),
320+
SampleDataValidator3(dtype=np.int64, importance="warn"),
321+
)
322+
323+
def fn(input: pd.Series) -> pd.Series:
324+
return input
325+
326+
node_ = node.Node.from_fn(fn)
327+
subdag = decorator.transform_node(
328+
node_, config={"hamilton.data_quality.disable_checks": True}, fn=fn
329+
)
330+
assert len(subdag) == 1
331+
assert subdag[0] is node_
332+
333+
334+
def test_check_output_builtin_disabled_via_config_returns_original_node():
335+
"""check_output (not custom) also respects hamilton.data_quality.disable_checks."""
336+
decorator = check_output(
337+
importance="warn",
338+
default_validator_candidates=DUMMY_VALIDATORS_FOR_TESTING,
339+
dataset_length=1,
340+
dtype=np.int64,
341+
)
342+
343+
def fn(input: pd.Series) -> pd.Series:
344+
return input
345+
346+
node_ = node.Node.from_fn(fn)
347+
subdag = decorator.transform_node(
348+
node_, config={"hamilton.data_quality.disable_checks": True}, fn=fn
349+
)
350+
assert len(subdag) == 1
351+
assert subdag[0] is node_
352+
353+
316354
def test_sync_wrapper_guards_against_unawaited_coroutine():
317355
"""Sync wrapper should raise TypeError if validator accidentally returns a coroutine."""
318356

tests/plugins/test_sklearn_plot_extensions.py

Lines changed: 5 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -15,6 +15,7 @@
1515
# specific language governing permissions and limitations
1616
# under the License.
1717

18+
import inspect
1819
import pathlib
1920

2021
import numpy as np
@@ -141,12 +142,10 @@ def decision_boundary_display() -> DecisionBoundaryDisplay:
141142
grid = np.vstack([feature_1.ravel(), feature_2.ravel()]).T
142143
tree = DecisionTreeClassifier().fit(iris.data[:, :2], iris.target)
143144
y_pred = np.reshape(tree.predict(grid), feature_1.shape)
144-
kwargs = dict(xx0=feature_1, xx1=feature_2, response=y_pred)
145-
# sklearn 1.8+ requires n_classes
146-
sig = inspection.DecisionBoundaryDisplay.__init__
147-
if "n_classes" in sig.__code__.co_varnames:
148-
kwargs["n_classes"] = 3
149-
decision_curve = inspection.DecisionBoundaryDisplay(**kwargs)
145+
dbd_kwargs = dict(xx0=feature_1, xx1=feature_2, response=y_pred)
146+
if "n_classes" in inspect.signature(inspection.DecisionBoundaryDisplay.__init__).parameters:
147+
dbd_kwargs["n_classes"] = len(np.unique(iris.target))
148+
decision_curve = inspection.DecisionBoundaryDisplay(**dbd_kwargs)
150149
return decision_curve
151150

152151
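
Reviewer note on the switch from `__code__.co_varnames` to `inspect.signature` in the hunk above: `co_varnames` lists a function's local variables as well as its parameters, so a membership test against it can report a name that is not actually accepted as an argument. The sketch below shows the difference on a hypothetical class. `ToyDisplay` is invented for illustration and has nothing to do with scikit-learn's `DecisionBoundaryDisplay`.

```python
import inspect


class ToyDisplay:
    """Hypothetical class in which n_classes is a local variable, not a parameter."""

    def __init__(self, xx0, xx1, response):
        n_classes = 3  # local only; callers cannot pass it
        self.shape = (xx0, xx1, response, n_classes)


init = ToyDisplay.__init__
# co_varnames holds parameters followed by locals, so the name shows up here.
in_varnames = "n_classes" in init.__code__.co_varnames
# inspect.signature reports only the parameters the callable accepts.
in_signature = "n_classes" in inspect.signature(init).parameters
print(in_varnames, in_signature)
```

With the old check, a class shaped like `ToyDisplay` would have been handed an unexpected `n_classes` keyword; the signature-based check avoids that false positive.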

tests/test_end_to_end.py

Lines changed: 42 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -600,3 +600,45 @@ def test_driver_v2_inputs_can_be_none():
600600
with pytest.raises(ValueError):
601601
# validate that None doesn't cause issues
602602
dr.execute(["e"], inputs=None)
603+
604+
605+
def test_builder_with_data_quality_disabled_removes_validator_nodes():
606+
"""with_data_quality_disabled() eliminates validator nodes from the graph entirely."""
607+
dr = (
608+
driver.Builder()
609+
.with_modules(tests.resources.data_quality)
610+
.with_data_quality_disabled()
611+
.build()
612+
)
613+
all_vars = dr.list_available_variables()
614+
dq_nodes = [
615+
var for var in all_vars if var.tags.get("hamilton.data_quality.contains_dq_results", False)
616+
]
617+
assert len(dq_nodes) == 0
618+
619+
620+
def test_builder_with_data_quality_disabled_still_executes_correctly():
621+
"""Driver built with data quality disabled returns correct output without raising."""
622+
dr = (
623+
driver.Builder()
624+
.with_modules(tests.resources.data_quality)
625+
.with_data_quality_disabled()
626+
.build()
627+
)
628+
result = dr.execute(["data_might_be_in_range"], inputs={"data_quality_should_fail": True})
629+
assert list(result["data_might_be_in_range"]) == [10.0]
630+
631+
632+
def test_disable_data_quality_checks_config_key_works_directly():
633+
"""hamilton.data_quality.disable_checks can also be passed via with_config directly."""
634+
dr = (
635+
driver.Builder()
636+
.with_modules(tests.resources.data_quality)
637+
.with_config({"hamilton.data_quality.disable_checks": True})
638+
.build()
639+
)
640+
all_vars = dr.list_available_variables()
641+
dq_nodes = [
642+
var for var in all_vars if var.tags.get("hamilton.data_quality.contains_dq_results", False)
643+
]
644+
assert len(dq_nodes) == 0
