|
| 1 | +Define Schemas |
| 2 | +============== |
| 3 | + |
| 4 | +:doc:`Transforms </concepts/transforms>` can define schemas to validate task data at each step of the |
| 5 | +pipeline. Taskgraph uses `msgspec`_ under the hood, and provides a |
| 6 | +:class:`~taskgraph.util.schema.Schema` base class that integrates with |
| 7 | +:meth:`TransformSequence.add_validate() <taskgraph.transforms.base.TransformSequence.add_validate>`. |
| 8 | + |
| 9 | +There are two ways to define a schema: the **class based** approach and the |
| 10 | +**dict based** approach. Both produce equivalent results; which one you prefer |
| 11 | +is a matter of style. |
| 12 | + |
| 13 | +.. _msgspec: https://jcristharif.com/msgspec/ |
| 14 | + |
| 15 | + |
| 16 | +Class Based Schemas |
| 17 | +------------------- |
| 18 | + |
| 19 | +Subclass :class:`~taskgraph.util.schema.Schema` and declare fields as class |
| 20 | +attributes with type annotations: |
| 21 | + |
| 22 | +.. code-block:: python |
| 23 | +
|
| 24 | +    from typing import Optional |
| 25 | +    from taskgraph.transforms.base import TransformSequence |
| 26 | +    from taskgraph.util.schema import Schema |
| 27 | +
|
| 28 | +    class MySubConfig(Schema): |
| 29 | +        total_num: int |
| 30 | +        fields: list[str] = [] |
| 31 | +
|
| 32 | +    class MySchema(Schema, forbid_unknown_fields=False): |
| 33 | +        config: Optional[MySubConfig] = None |
| 34 | +
|
| 35 | +    transforms = TransformSequence() |
| 36 | +    transforms.add_validate(MySchema) |
| 37 | +
|
| 38 | +A few things to note: |
| 39 | + |
| 40 | +- Field names use ``snake_case`` in Python but are **automatically renamed to |
| 41 | + ``kebab-case``** in YAML. So ``total_num`` in Python matches |
| 42 | + ``total-num`` in YAML. |
| 43 | +- ``Optional[T]`` fields default to ``None`` unless you supply an explicit |
| 44 | + default. |
| 45 | +- Fields without a default are **required**. |
| 46 | +- ``forbid_unknown_fields=True`` (the default) causes validation to fail if the |
| 47 | + task data contains keys that are not declared in the schema. Set it to |
| 48 | + ``False`` on outer schemas so that fields belonging to later transforms are |
| 49 | + not rejected. |
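The renaming and unknown-field rules above can be illustrated with a small standard-library sketch. It is not how Taskgraph or msgspec implement them; ``to_kebab`` and ``unknown_keys`` are hypothetical helpers, and the task dictionary is made up for the example.

```python
def to_kebab(name: str) -> str:
    """Convert a snake_case attribute name to its kebab-case YAML key."""
    return name.replace("_", "-")


def unknown_keys(declared: list[str], task: dict) -> set[str]:
    """Return the task keys that no declared field accounts for after renaming."""
    allowed = {to_kebab(name) for name in declared}
    return set(task) - allowed


# Hypothetical task data: two declared fields plus one key owned by a later transform.
task = {"total-num": 3, "fields": ["a"], "worker-type": "b-linux"}
extra = unknown_keys(["total_num", "fields"], task)
```

With ``forbid_unknown_fields=True`` a non-empty ``extra`` would correspond to a validation failure; with ``False`` the extra keys are tolerated.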
| 50 | + |
| 51 | + |
| 52 | +Dict Based Schemas |
| 53 | +------------------ |
| 54 | + |
| 55 | +Call :meth:`Schema.from_dict() <taskgraph.util.schema.Schema.from_dict>` with a |
| 56 | +dictionary mapping field names to ``type`` or ``(type, default)`` tuples: |
| 57 | + |
| 58 | +.. code-block:: python |
| 59 | +
|
| 60 | +    from typing import Optional, Union |
| 61 | +    from taskgraph.transforms.base import TransformSequence |
| 62 | +    from taskgraph.util.schema import Schema |
| 63 | +
|
| 64 | +    MySchema = Schema.from_dict( |
| 65 | +        { |
| 66 | +            "config": Schema.from_dict( |
| 67 | +                { |
| 68 | +                    "total-num": int, |
| 69 | +                    "fields": (list[str], []), |
| 70 | +                }, |
| 71 | +                optional=True, |
| 72 | +            ), |
| 73 | +        }, |
| 74 | +        forbid_unknown_fields=False, |
| 75 | +    ) |
| 76 | +
|
| 77 | +    transforms = TransformSequence() |
| 78 | +    transforms.add_validate(MySchema) |
| 79 | +
|
| 80 | +This example is equivalent to the class based example above. One advantage of the dict based |
| 81 | +approach is that you can write keys in **kebab-case** directly. |
| 82 | + |
| 83 | +Field specifications follow these rules: |
| 84 | + |
| 85 | +- A bare type (e.g. ``str``) means the field is required. |
| 86 | +- ``Optional[T]`` means the field is optional and defaults to ``None``. |
| 87 | +- A ``(type, default)`` tuple supplies an explicit default, e.g. |
| 88 | + ``(list[str], [])``. |
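The three rules above can be sketched as a small function that splits a field specification into a type and a default. This is only an illustration built on the standard ``typing`` module, not the logic ``Schema.from_dict`` actually uses; ``split_spec`` and the ``REQUIRED`` sentinel are hypothetical names.

```python
from typing import Any, Optional, Union, get_args, get_origin

REQUIRED = object()  # sentinel meaning "this field has no default"


def split_spec(spec: Any) -> tuple[Any, Any]:
    """Return (type, default) for a single field specification."""
    if isinstance(spec, tuple):
        # (type, default): the caller supplied an explicit default
        field_type, default = spec
        return field_type, default
    if get_origin(spec) is Union and type(None) in get_args(spec):
        # Optional[T] is Union[T, None]: optional, defaults to None
        return spec, None
    # Bare type: the field is required
    return spec, REQUIRED


required = split_spec(str)
optional = split_spec(Optional[str])
defaulted = split_spec((list[str], []))
```

Checking for a tuple first matters here, because a ``(type, default)`` pair may itself contain an ``Optional[T]`` type with a non-``None`` default.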
| 89 | + |
| 90 | +Keyword arguments to ``from_dict`` are forwarded to ``msgspec.defstruct``. |
| 91 | +The most commonly used ones are ``name`` (for better error messages) and |
| 92 | +``forbid_unknown_fields``. |
| 93 | + |
| 94 | +.. note:: |
| 95 | + ``Schema.from_dict`` does **not** apply ``rename="kebab"`` automatically, |
| 96 | + because you can express the kebab-case names directly in the dict keys. |
| 97 | + Underscores in dict keys stay as underscores and dashes become valid |
| 98 | + kebab-case field names. |
| 99 | + |
| 100 | + |
| 101 | +Nesting Schemas |
| 102 | +--------------- |
| 103 | + |
| 104 | +Both approaches support nesting: |
| 105 | + |
| 106 | +.. code-block:: python |
| 107 | +
|
| 108 | +    # Class-based nesting |
| 109 | +    class Inner(Schema): |
| 110 | +        value: str |
| 111 | +
|
| 112 | +    class Outer(Schema, forbid_unknown_fields=False, kw_only=True): |
| 113 | +        inner: Optional[Inner] = None |
| 114 | +
|
| 115 | +    # Dict-based nesting |
| 116 | +    Outer = Schema.from_dict( |
| 117 | +        { |
| 118 | +            "inner": Schema.from_dict({"value": str}, optional=True), |
| 119 | +        }, |
| 120 | +        forbid_unknown_fields=False, |
| 121 | +    ) |
| 122 | +
|
| 123 | +Pass ``optional=True`` to ``from_dict`` to make the whole nested schema |
| 124 | +optional. This is necessary because type checkers reject function calls inside |
| 125 | +type annotations, so ``Optional[Schema.from_dict(...)]`` is not a valid type expression. |
| 126 | + |
| 127 | + |
| 128 | +Mutually Exclusive Fields |
| 129 | +------------------------- |
| 130 | + |
| 131 | +Use the ``exclusive`` keyword to declare groups of fields where at most one |
| 132 | +may be set at a time: |
| 133 | + |
| 134 | +.. code-block:: python |
| 135 | +
|
| 136 | +    # Class-based |
| 137 | +    class MySchema(Schema, exclusive=[["field_a", "field_b"]]): |
| 138 | +        field_a: Optional[str] = None |
| 139 | +        field_b: Optional[str] = None |
| 140 | +
|
| 141 | +    # Dict-based |
| 142 | +    MySchema = Schema.from_dict( |
| 143 | +        { |
| 144 | +            "field-a": Optional[str], |
| 145 | +            "field-b": Optional[str], |
| 146 | +        }, |
| 147 | +        exclusive=[["field_a", "field_b"]], |
| 148 | +    ) |
| 149 | +
|
| 150 | +``exclusive`` takes a list of groups, where each group is a list of field |
| 151 | +names (Python ``snake_case``). A validation error is raised if more than one |
| 152 | +field in a group is set. |
| 153 | + |
| 154 | +.. note:: |
| 155 | + When using ``exclusive`` with the dict-based approach, refer to fields by |
| 156 | + their Python attribute names (``snake_case``), not their YAML keys. |
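To make the "at most one field per group" rule concrete, here is a minimal standard-library sketch of such a check. It assumes a field counts as "set" when its value is not ``None``, which the text above does not spell out; ``check_exclusive`` is a hypothetical helper, not part of Taskgraph.

```python
def check_exclusive(values: dict, groups: list[list[str]]) -> None:
    """Raise ValueError if more than one field in any group is set."""
    for group in groups:
        # Assumption: "set" means present with a value other than None.
        set_fields = [name for name in group if values.get(name) is not None]
        if len(set_fields) > 1:
            raise ValueError(
                "mutually exclusive fields set together: " + ", ".join(set_fields)
            )


groups = [["field_a", "field_b"]]

# Zero or one field set in the group: accepted.
check_exclusive({"field_a": "x", "field_b": None}, groups)

# Both fields set: rejected.
try:
    check_exclusive({"field_a": "x", "field_b": "y"}, groups)
    rejected = False
except ValueError:
    rejected = True
```

Note that a group with no fields set passes the check, matching the "at most one" wording rather than "exactly one".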