Using the project structure from the previous tutorial, write your first task.
The task task_create_random_data is defined in
src/my_project/task_data_preparation.py and generates a data set stored in
bld/data.pkl.
The task_ prefix for modules and task functions is important so that pytask
automatically discovers them.
my_project
│
├───.pytask
│
├───bld
│ └────data.pkl
│
├───src
│ └───my_project
│ ├────__init__.py
│ ├────config.py
│ └────task_data_preparation.py
│
├───pytask.lock
│
└───pyproject.toml
Generally, a task is a function whose name starts with task_. Tasks produce outputs
and the most common output is a file which we will focus on throughout the tutorials.
The following interfaces are different ways to specify the products of a task which is necessary for pytask to correctly run a workflow. The interfaces are ordered from most (left) to least recommended (right).
!!! important
You cannot mix different interfaces for the same task. Choose only one.
=== "Annotated"
The task accepts the argument `path` that points to the file where the data set will be
stored. The path is passed to the task via the default value, `BLD / "data.pkl"`. To
indicate that this file is a product we add some metadata to the argument.
The type hint `Annotated[Path, Product]` uses
[`typing.Annotated`](https://docs.python.org/3/library/typing.html#typing.Annotated)
syntax. The first entry specifies the argument type
([`pathlib.Path`](https://docs.python.org/3/library/pathlib.html#pathlib.Path)), and the
second entry ([`pytask.Product`](../api/utilities_and_typing.md#pytask.Product)) marks
this argument as a product.
```{ .python .annotate hl_lines="2 12" title="task_data_preparation.py" }
--8<-- "docs_src/tutorials/write_a_task_py310.py"
```
1. `Annotated[Path, Product]` marks the argument as a task product.
!!! tip
If you want to refresh your knowledge about type hints, read
[this guide](../type_hints.md).
=== "produces"
Tasks can use `produces` as an argument name. Every value, or in this case path, passed
to this argument is automatically treated as a task product. Here, the path is given by
the default value of the argument.
```{ .python .annotate hl_lines="8" title="task_data_preparation.py" }
--8<-- "docs_src/tutorials/write_a_task_produces.py"
```
1. Using the argument name `produces` marks this path as a task product.
Now, execute pytask to collect tasks in the current and subsequent directories.
--8<-- "docs/source/_static/md/write-a-task.md"
After the task succeeds, pytask writes pytask.lock next to pyproject.toml. Keep this
file under version control so later builds can detect unchanged tasks.
Use the @task decorator to mark a function as
a task regardless of its function name. You can optionally pass a new name for the task.
Otherwise, pytask uses the function name.
from pytask import task
# The id will be ".../task_data_preparation.py::create_random_data".
@task
def create_random_data(): ...
# The id will be ".../task_data_preparation.py::create_data".
@task(name="create_data")
def create_random_data(): ...Use the configuration value
task_files if you prefer a
different naming scheme for the task modules. task_*.py is the default. You can
specify one or multiple patterns to collect tasks from other files.