Commit f954448

docs: add howto guide for defining schemas
1 parent aa40a6a commit f954448

2 files changed

Lines changed: 157 additions & 0 deletions

docs/howto/define-schemas.rst

Lines changed: 156 additions & 0 deletions
@@ -0,0 +1,156 @@
Define Schemas
==============

:doc:`Transforms </concepts/transforms>` can define schemas to validate task
data at each step of the pipeline. Taskgraph uses `msgspec`_ under the hood and
provides a :class:`~taskgraph.util.schema.Schema` base class that integrates
with
:meth:`TransformSequence.add_validate() <taskgraph.transforms.base.TransformSequence.add_validate>`.

There are two ways to define a schema: the **class based** approach and the
**dict based** approach. Both produce equivalent results; which one you prefer
is a matter of style.

.. _msgspec: https://jcristharif.com/msgspec/


Class Based Schemas
-------------------

Subclass :class:`~taskgraph.util.schema.Schema` and declare fields as class
attributes with type annotations:

.. code-block:: python

   from typing import Optional
   from taskgraph.transforms.base import TransformSequence
   from taskgraph.util.schema import Schema

   class MySubConfig(Schema):
       total_num: int
       fields: list[str] = []

   class MySchema(Schema, forbid_unknown_fields=False):
       config: Optional[MySubConfig] = None

   transforms = TransformSequence()
   transforms.add_validate(MySchema)

A few things to note:

- Field names use ``snake_case`` in Python but are **automatically renamed** to
  ``kebab-case`` in YAML, so ``total_num`` in Python matches ``total-num`` in
  YAML.
- ``Optional[T]`` fields default to ``None`` unless you supply an explicit
  default.
- Fields without a default are **required**.
- ``forbid_unknown_fields=True`` (the default) causes validation to fail if the
  task data contains keys that are not declared in the schema. Set it to
  ``False`` on outer schemas so that fields belonging to later transforms are
  not rejected.


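To make the renaming and unknown-field rules above concrete, here is a minimal
sketch that uses only the standard library. It is not how Taskgraph or msgspec
implement validation: ``to_kebab`` and ``check_keys`` are hypothetical helpers
that only mimic the behaviour described in the list, applied to the fields of
``MySubConfig``.

.. code-block:: python

   def to_kebab(name: str) -> str:
       """Mimic the snake_case to kebab-case renaming described above."""
       return name.replace("_", "-")


   def check_keys(data: dict, fields: dict, forbid_unknown_fields: bool = True) -> list:
       """Return a list of problems found in ``data``.

       ``fields`` maps Python field names to True when the field is required.
       """
       expected = {to_kebab(name): required for name, required in fields.items()}
       problems = [
           f"missing required key: {key}"
           for key, required in expected.items()
           if required and key not in data
       ]
       if forbid_unknown_fields:
           problems += [f"unknown key: {key}" for key in data if key not in expected]
       return problems


   # Mirrors MySubConfig: ``total_num`` is required, ``fields`` has a default.
   sub_config_fields = {"total_num": True, "fields": False}

   # YAML-style data uses the kebab-case key, so nothing is reported.
   ok = check_keys({"total-num": 3}, sub_config_fields)

   # The snake_case spelling is both "missing" and "unknown" in this sketch.
   snake = check_keys({"total_num": 3}, sub_config_fields)

   # With forbid_unknown_fields=False, extra keys are tolerated.
   lenient = check_keys(
       {"total-num": 3, "extra": 1}, sub_config_fields, forbid_unknown_fields=False
   )

In this sketch ``ok`` and ``lenient`` are empty lists, while ``snake`` reports
both a missing ``total-num`` key and an unknown ``total_num`` key.
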
Dict Based Schemas
------------------

Call :meth:`Schema.from_dict() <taskgraph.util.schema.Schema.from_dict>` with a
dictionary mapping field names to a ``type`` or a ``(type, default)`` tuple:

.. code-block:: python

   from taskgraph.transforms.base import TransformSequence
   from taskgraph.util.schema import Schema

   MySchema = Schema.from_dict(
       {
           "config": Schema.from_dict(
               {
                   "total-num": int,
                   # A (type, default) tuple supplies an explicit default.
                   "fields": (list[str], []),
               },
               optional=True,
           ),
       },
       forbid_unknown_fields=False,
   )

   transforms = TransformSequence()
   transforms.add_validate(MySchema)

This example is equivalent to the class based example above. One advantage of
the dict based approach is that you can write keys in **kebab-case** directly.

Field specifications follow these rules:

- A bare type (e.g. ``str``) means the field is required.
- ``Optional[T]`` means the field is optional and defaults to ``None``.
- A ``(type, default)`` tuple supplies an explicit default, e.g.
  ``(list[str], [])``.

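The three rules above can be pictured as a small classification step. The
following is only an illustrative sketch built on the standard ``typing``
module; ``describe_spec`` and ``_MISSING`` are hypothetical names, and this is
not the code ``Schema.from_dict`` actually runs.

.. code-block:: python

   from typing import Optional, Union, get_args, get_origin

   _MISSING = object()  # sentinel meaning "no default was given"


   def describe_spec(spec):
       """Return ``(type, required, default)`` for a field specification."""
       if isinstance(spec, tuple):
           # ``(type, default)``: an explicit default makes the field optional.
           field_type, default = spec
           return field_type, False, default
       if get_origin(spec) is Union and type(None) in get_args(spec):
           # ``Optional[T]`` is ``Union[T, None]``: optional, defaults to None.
           return spec, False, None
       # A bare type: the field is required and has no default.
       return spec, True, _MISSING


   required = describe_spec(str)
   optional = describe_spec(Optional[str])
   defaulted = describe_spec((list[str], []))

Here ``required`` is flagged as required with no default, ``optional`` carries
a default of ``None``, and ``defaulted`` carries the empty list from the tuple.
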
Keyword arguments to ``from_dict`` are forwarded to ``msgspec.defstruct``.
The most commonly used ones are ``name`` (for better error messages) and
``forbid_unknown_fields``.

.. note::
   ``Schema.from_dict`` does **not** apply ``rename="kebab"`` automatically,
   because you can write the kebab-case names directly as dict keys. Keys are
   used as written: underscores stay as underscores, and keys containing
   dashes are accepted as kebab-case field names.


Nesting Schemas
---------------

Both approaches support nesting:

.. code-block:: python

   from typing import Optional
   from taskgraph.util.schema import Schema

   # Class-based nesting
   class Inner(Schema):
       value: str

   class Outer(Schema, forbid_unknown_fields=False, kw_only=True):
       inner: Optional[Inner] = None

   # Dict-based nesting
   Outer = Schema.from_dict(
       {
           "inner": Schema.from_dict({"value": str}, optional=True),
       },
       forbid_unknown_fields=False,
   )

Pass ``optional=True`` to ``from_dict`` to make the whole nested schema
optional. This is necessary because function calls are not allowed in type
annotations, so ``Optional[Schema.from_dict(...)]`` is not a valid annotation.


Mutually Exclusive Fields
-------------------------

Use the ``exclusive`` keyword to declare groups of fields where at most one
may be set at a time:

.. code-block:: python

   from typing import Optional
   from taskgraph.util.schema import Schema

   # Class-based
   class MySchema(Schema, exclusive=[["field_a", "field_b"]]):
       field_a: Optional[str] = None
       field_b: Optional[str] = None

   # Dict-based
   MySchema = Schema.from_dict(
       {
           "field-a": Optional[str],
           "field-b": Optional[str],
       },
       exclusive=[["field_a", "field_b"]],
   )

``exclusive`` takes a list of groups, where each group is a list of field
names (Python ``snake_case``). A validation error is raised if more than one
field in a group is set.

.. note::
   When using ``exclusive`` with the dict-based approach, refer to fields by
   their Python attribute names (``snake_case``), not their YAML keys.
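
The "at most one per group" rule can be sketched with plain Python. This is an
illustration only, not Taskgraph's implementation: ``check_exclusive`` is a
hypothetical helper, and it assumes that a field counts as "set" when its value
is not ``None``.

.. code-block:: python

   def check_exclusive(values: dict, exclusive: list) -> list:
       """Return, per violated group, the fields that are set together.

       Assumption: a field is "set" when its value is not None.
       """
       violated = []
       for group in exclusive:
           set_fields = [name for name in group if values.get(name) is not None]
           if len(set_fields) > 1:
               violated.append(set_fields)
       return violated


   # Groups use Python attribute names (snake_case), as in the examples above.
   groups = [["field_a", "field_b"]]

   none_set = check_exclusive({"field_a": None, "field_b": None}, groups)
   one_set = check_exclusive({"field_a": "x", "field_b": None}, groups)
   both_set = check_exclusive({"field_a": "x", "field_b": "y"}, groups)

Only ``both_set`` reports a violation, since it is the only case where two
fields of the same group are set at once.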

docs/howto/index.rst

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ A collection of how-to guides.
 run-locally
 debugging
 bootstrap-taskgraph
+define-schemas
 resolve-keyed-by
 use-fetches
 docker
