Commit b41a1de

Hore01 authored and skrawcz committed
docs(testing): add testing guide and runnable example (#1044)
Hamilton's DAG model is the backbone of clean ETL, and tests are how that backbone stays honest. This adds a canonical guide and a runnable example that cover the four scenarios called out in #1044:

1. Unit-testing plain Hamilton functions
2. Unit-testing functions that use decorators (@tag, @parameterize, @extract_columns) -- both by calling the underlying callable and by building a Driver to verify the decorator wiring
3. Integration-testing the DAG with `Builder().with_modules(...).build()`, including `inputs=` and `overrides=` for short-circuiting upstream nodes
4. Driving an in-memory module via `ad_hoc_utils.create_temporary_module` for self-contained tests (e.g. of custom materializers)

The docs page at `docs/how-tos/test-hamilton-code.rst` uses `.. literalinclude::` to pull every snippet from `examples/testing/`, so the guide and the example cannot drift out of sync.

Closes #1044

Signed-off-by: Olajumoke Akinremi <106763970+Hore01@users.noreply.github.com>
1 parent e14969c commit b41a1de

11 files changed

Lines changed: 678 additions & 0 deletions

docs/how-tos/index.rst

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@ directory. If there's an example you want but don't see, reach out or open an is
    ml-training
    llm-workflows
    run-data-quality-checks
+   test-hamilton-code
    use-hamilton-for-lineage
    scale-up
    microservice

docs/how-tos/test-hamilton-code.rst

Lines changed: 163 additions & 0 deletions
@@ -0,0 +1,163 @@
..
   Licensed to the Apache Software Foundation (ASF) under one
   or more contributor license agreements. See the NOTICE file
   distributed with this work for additional information
   regarding copyright ownership. The ASF licenses this file
   to you under the Apache License, Version 2.0 (the
   "License"); you may not use this file except in compliance
   with the License. You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

   Unless required by applicable law or agreed to in writing,
   software distributed under the License is distributed on an
   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   KIND, either express or implied. See the License for the
   specific language governing permissions and limitations
   under the License.

==============================
Testing Apache Hamilton code
==============================

A common question on `Slack <https://join.slack.com/t/hamilton-opensource/shared_invite/zt-2niepkra8-DGKGf_tTYhXuJWBTXtIs4g>`_
is "how do I test my Hamilton functions?" -- often with a worry that decorators
will get in the way. The good news: a Hamilton function is just a Python
function, so the standard ``pytest`` patterns you already know apply directly.

This guide walks through four cases, in order of increasing scope:

1. Unit-testing a plain function.
2. Unit-testing a decorated function.
3. Integration-testing the full DAG with the ``Driver``, including
   ``inputs=`` and ``overrides=``.
4. Driving an in-memory module for self-contained tests (e.g. of custom
   materializers).

The complete runnable code lives in
`examples/testing <https://github.com/apache/hamilton/tree/main/examples/testing>`_.
Every code block on this page is a ``literalinclude`` from that folder, so the
docs and the example can never drift out of sync.

Prerequisites
-------------

Install the example's dependencies and run it:

.. code-block:: bash

   cd examples/testing
   pip install -r requirements.txt
   pytest

You should see all 13 tests pass.

1. Unit-testing plain functions
-------------------------------

Hamilton encourages you to put your transformation logic in ordinary modules
that don't import the Driver. That makes them trivial to unit-test:

.. literalinclude:: ../../examples/testing/my_functions.py
   :language: python
   :lines: 18-
   :caption: ``examples/testing/my_functions.py``

Tests are just calls to the function:

.. literalinclude:: ../../examples/testing/test_my_functions.py
   :language: python
   :lines: 18-
   :caption: ``examples/testing/test_my_functions.py``

Notes
^^^^^

* No Driver is required. You import the module under test and call its
  functions like any other Python code.
* ``pytest.mark.parametrize`` is a clean way to cover edge cases without
  copy-pasting test bodies.
* Use ``pd.testing.assert_series_equal`` (or ``assert_frame_equal``) for
  pandas outputs -- it gives readable diffs on failure.
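
As a rough illustration of the notes above, here is a minimal sketch of a
plain-function test module. The functions ``spend_per_signup`` and
``conversion_rate`` are hypothetical stand-ins invented for this sketch; they
are not the contents of ``my_functions.py``. Only ``pytest.mark.parametrize``
and ``pd.testing.assert_series_equal`` come from the notes themselves.

```python
import pandas as pd
import pytest


# Hypothetical dataflow functions (assumed for illustration only;
# they are not taken from examples/testing/my_functions.py).
def spend_per_signup(spend: pd.Series, signups: pd.Series) -> pd.Series:
    """Marketing spend divided by signups, element-wise."""
    return spend / signups


def conversion_rate(signups: int, visits: int) -> float:
    """Fraction of visits that became signups; 0.0 when there were no visits."""
    return signups / visits if visits else 0.0


def test_spend_per_signup():
    # No Driver involved: import (or define) the function and call it.
    actual = spend_per_signup(pd.Series([10.0, 20.0]), pd.Series([2.0, 5.0]))
    pd.testing.assert_series_equal(actual, pd.Series([5.0, 4.0]))


@pytest.mark.parametrize(
    ("signups", "visits", "expected"),
    [(10, 100, 0.1), (0, 50, 0.0), (5, 0, 0.0)],
)
def test_conversion_rate(signups, visits, expected):
    # One test body covers the normal case and both zero edge cases.
    assert conversion_rate(signups, visits) == pytest.approx(expected)
```

Run it with ``pytest`` like any other test file; nothing Hamilton-specific is
imported.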

2. Unit-testing decorated functions
-----------------------------------

Hamilton's function modifiers (``@tag``, ``@parameterize``, ``@extract_columns``,
...) tell Hamilton how to wire the function into the DAG. They do **not**
change what the function does when you call it directly. You can therefore
mix two complementary techniques:

A. Call the underlying function in a unit test (cheap, fast).
B. Build a Driver and assert on the expanded DAG, to verify the wiring (slower,
   but the only way to catch decorator misuse).

The decorated module:

.. literalinclude:: ../../examples/testing/decorated_functions.py
   :language: python
   :lines: 18-
   :caption: ``examples/testing/decorated_functions.py``

The tests:

.. literalinclude:: ../../examples/testing/test_decorated_functions.py
   :language: python
   :lines: 18-
   :caption: ``examples/testing/test_decorated_functions.py``

3. Integration-testing the DAG
------------------------------

For end-to-end tests, build a Driver from the module(s) under test and call
``execute(...)`` with controlled inputs.

Two arguments are especially useful:

* ``inputs=`` injects test data at the **inputs** of the DAG -- the parameter
  names that aren't produced by any function.
* ``overrides=`` short-circuits an **intermediate** node by pinning its value.
  This is the integration-test sweet spot: instead of fabricating realistic
  raw inputs and re-deriving every intermediate, hand the DAG a known value
  for ``spend`` (or any other node) and assert on the *downstream* logic.

.. literalinclude:: ../../examples/testing/test_driver.py
   :language: python
   :lines: 18-
   :caption: ``examples/testing/test_driver.py``

Tip: ``Driver`` exposes a number of inspection methods --
``what_is_upstream_of``, ``what_is_downstream_of``, ``list_available_variables``
-- that are handy for asserting on graph shape, not just values.

4. In-memory modules for self-contained tests
---------------------------------------------

Sometimes you want a test that defines its own tiny Hamilton module inline
-- to exercise a custom materializer, regression-test a data-quality bug,
or demonstrate a pattern in a doctest. You don't need to create a new
``.py`` file; ``hamilton.ad_hoc_utils.create_temporary_module`` packages
inline-defined functions into a real module that the Driver can consume:

.. literalinclude:: ../../examples/testing/test_ad_hoc_module.py
   :language: python
   :lines: 18-
   :caption: ``examples/testing/test_ad_hoc_module.py``

This is also how Hamilton itself tests several of its built-in materializers,
so it scales up to fairly involved scenarios. See
`tests/test_ad_hoc_utils.py <https://github.com/apache/hamilton/blob/main/tests/test_ad_hoc_utils.py>`_
in the Hamilton source for more usage examples.

Where to go from here
---------------------

* Read the :doc:`/concepts/best-practices/code-organization` page -- the
  module structure it recommends is the same one that makes tests easy to
  write.
* Browse the
  `Hamilton test suite <https://github.com/apache/hamilton/tree/main/tests>`_
  for ideas; the same patterns work for user code.
* Have a testing pattern that isn't covered here? Share it on
  `Slack <https://join.slack.com/t/hamilton-opensource/shared_invite/zt-2niepkra8-DGKGf_tTYhXuJWBTXtIs4g>`_
  -- we'd love to add it.

examples/testing/README.md

Lines changed: 79 additions & 0 deletions
@@ -0,0 +1,79 @@
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Testing Apache Hamilton code

This is the runnable companion to the
[Testing Hamilton code](https://hamilton.apache.org/how-tos/test-hamilton-code/)
how-to. It shows that Hamilton functions are normal Python -- so the standard
`pytest` patterns you already know apply, including when decorators are
involved.

The example covers the four cases from issue
[#1044](https://github.com/apache/hamilton/issues/1044):

1. **Unit-testing plain functions** -- `test_my_functions.py`
2. **Unit-testing decorated functions** -- `test_decorated_functions.py`
3. **Integration-testing the DAG with `inputs=` and `overrides=`** -- `test_driver.py`
4. **In-memory modules with `ad_hoc_utils.create_temporary_module`** -- `test_ad_hoc_module.py`

## File organization

| File | Purpose |
| ---- | ------- |
| `my_functions.py` | A small marketing dataflow (no decorators). |
| `decorated_functions.py` | The same style of dataflow, using `@tag`, `@parameterize` and `@extract_columns`. |
| `test_my_functions.py` | Unit tests that import and call functions directly. |
| `test_decorated_functions.py` | Unit + driver-level tests for the decorated module. |
| `test_driver.py` | End-to-end tests using `Builder().with_modules(...).build()` plus `inputs=` and `overrides=`. |
| `test_ad_hoc_module.py` | Builds a module from inline-defined functions for self-contained tests. |
| `conftest.py` | Adds this folder to `sys.path` so `import my_functions` works under pytest. |

## Running the tests

```bash
pip install -r requirements.txt
pytest
```

You should see all tests pass. Each test file is independently runnable:

```bash
pytest test_my_functions.py -v
pytest test_driver.py -v
```

## What to take away

* A Hamilton function is just a Python function. Testing it does **not**
  require building a Driver.
* Decorators (`@tag`, `@parameterize`, `@extract_columns`, ...) leave the
  underlying callable intact. Direct function calls still work; the decorator
  changes how Hamilton wires the function into the DAG, not what the function
  computes.
* For integration tests, `Builder().with_modules(...).build()` is the canonical
  entry point. Use `inputs=` to inject test data at the DAG inputs and
  `overrides=` to short-circuit intermediate nodes when you want to assert on
  downstream logic in isolation.
* Need to test inline -- e.g. for a regression test or a custom materializer
  -- without a `.py` file on disk? Use
  `hamilton.ad_hoc_utils.create_temporary_module`.

If you have questions, or need help with this example,
join us on [Slack](https://join.slack.com/t/hamilton-opensource/shared_invite/zt-2niepkra8-DGKGf_tTYhXuJWBTXtIs4g).

examples/testing/conftest.py

Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

"""Make the example modules importable when running ``pytest`` from this dir.

Hamilton needs to import your dataflow module by name. Adding this folder to
``sys.path`` lets the example tests do ``import my_functions`` directly,
mirroring how a real project would lay out its code.
"""

import os
import sys

sys.path.insert(0, os.path.dirname(__file__))

examples/testing/decorated_functions.py

Lines changed: 55 additions & 0 deletions
@@ -0,0 +1,55 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

"""Functions that use Hamilton decorators.

Decorators are a common source of confusion when testing. The point of this
module is to show that decorators do not get in the way of unit testing -- the
function below the decorator is still a plain Python callable, so you can call
it directly from a test. To test what the decorator *expands to*, drive the
function through a Driver instead (see ``test_decorated_functions.py``).
"""

import pandas as pd

from hamilton.function_modifiers import extract_columns, parameterize, source, tag, value


@tag(owner="growth-team", pii="false")
def total_signups(signups: pd.Series) -> int:
    """Sum of signups across the time window."""
    return int(signups.sum())


@parameterize(
    spend_in_thousands={"raw_value": source("spend"), "divisor": value(1000.0)},
    signups_in_hundreds={"raw_value": source("signups"), "divisor": value(100.0)},
)
def scaled(raw_value: pd.Series, divisor: float) -> pd.Series:
    """Scale a series by a constant divisor.

    `@parameterize` produces one node per entry above. The function itself is
    still a normal callable, so a unit test can call ``scaled(some_series, 1000)``
    directly without a Driver.
    """
    return raw_value / divisor


@extract_columns("scaled_spend", "scaled_signups")
def scaled_features(spend_in_thousands: pd.Series, signups_in_hundreds: pd.Series) -> pd.DataFrame:
    """Bundle the two scaled series into a frame, then expose each column as a node."""
    return pd.DataFrame({"scaled_spend": spend_in_thousands, "scaled_signups": signups_in_hundreds})
