@@ -32,27 +32,35 @@ are growing over time and you run into these problems.
3232## Solution
3333
3434The main idea for the solution is quickly explained. We will, first, formalize
35- dimensions into objects and, secondly, combine them in one object such that we only have
36- to iterate over instances of this object in a single loop.
37-
38- We will start by defining the dimensions using {class}` ~typing.NamedTuple ` or
35+ dimensions into objects using {class}`~typing.NamedTuple` or
3936{func}`~dataclasses.dataclass`.
4037
41- Then, we will define the object that holds both pieces of information together and for
42- the lack of a better name, we will call it an experiment.
38+ Secondly, we will combine dimensions into multi-dimensional objects such that we only have
39+ to iterate over instances of these objects in a single loop. For lack of a better name,
40+ we will call such an object an experiment.
41+
42+ Lastly, we will also use the {class}`~pytask.DataCatalog` so that we do not have to
43+ define paths ourselves.
4344
44- ``` {literalinclude} ../../../docs_src/how_to_guides/bp_complex_task_repetitions/experiment.py
45+ ``` {seealso}
46+ If you have not learned about the {class}`~pytask.DataCatalog` yet, start with the
47+ {doc}`tutorial <../tutorials/using_a_data_catalog>` and continue with the
48+ {doc}`how-to guide <the_data_catalog>`.
49+ ```
50+
51+ ``` {literalinclude} ../../../docs_src/how_to_guides/bp_complex_task_repetitions/config.py
4552---
4653caption: config.py
4754---
4855```
4956
5057There are some things to be said.
5158
52- - The names on each dimension need to be unique and ensure that by combining them for
53- the name of the experiment, we get a unique and descriptive id.
54- - Dimensions might need more attributes than just a name, like paths, or other arguments
55- for the task. Add them.
59+ - The `.name` attributes on each dimension need to return unique names, so that
60+ combining them for the name of the experiment yields a unique and descriptive
61+ id.
62+ - Dimensions might need more attributes than just a name, like paths, keys for the data
63+ catalog, or other arguments for the task.
5664
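As a rough illustration of the two points above, here is a minimal sketch of what such data structures could look like. It is not the content of the included `config.py`: the `Dataset` class, the dataset names, and the way the experiment name is assembled are assumptions made for this example, and the data catalog is left out to keep the sketch self-contained.

```python
from typing import NamedTuple


class Dataset(NamedTuple):
    """A dimension: one dataset (hypothetical for this sketch)."""

    name: str


class Model(NamedTuple):
    """A dimension: one model that is fitted to a dataset."""

    name: str


class Experiment(NamedTuple):
    """A multi-dimensional object combining one dataset and one model."""

    dataset: Dataset
    model: Model

    @property
    def name(self) -> str:
        # Combine the unique names of the dimensions into one id.
        return f"{self.model.name}-{self.dataset.name}"


# The dataset names are placeholders chosen for this sketch.
DATASETS = [Dataset("a"), Dataset("b")]
MODELS = [Model("ols"), Model("logit"), Model("linear_prob")]

# One flat list to iterate over instead of two nested loops.
EXPERIMENTS = [Experiment(dataset, model) for dataset in DATASETS for model in MODELS]

# Unique names per dimension give unique names per experiment.
names = [experiment.name for experiment in EXPERIMENTS]
assert len(names) == len(set(names))
```

Attributes such as paths or data catalog keys could be added as further fields or properties on the same classes.
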
5765Next, we will use these newly defined data structures and see how our tasks change when
5866we use them.
@@ -63,21 +71,55 @@ caption: task_example.py
6371---
6472```
6573
66- As you see, we replaced
74+ As you see, we lost a level of indentation and we moved all the generation of names and
75+ paths into the dimensions and multi-dimensional objects.
6776
68- ## Using the ` DataCatalog `
77+ ## Adding another level
6978
70- ## Adding another dimension
79+ Extending a dimension by another level is usually done quickly. For example, if we have
80+ another model that we want to fit to the data, we extend `MODELS`, which
81+ automatically leads to all downstream tasks being created.
7182
72- ## Adding another level
83+ ``` {code-block} python
84+ ---
85+ caption: config.py
86+ ---
87+ ...
88+ MODELS = [Model("ols"), Model("logit"), Model("linear_prob"), Model("new_model")]
89+ ...
90+ ```
91+
92+ Of course, you might need to alter `task_fit_model` because the task needs to handle the
93+ new model as well as the others. This is where it pays off to use high-level
94+ interfaces in your code that handle all of the models with a simple
95+ `fitted_model = fit_model(data=data, model_name=model_name)` call and that also return
96+ fitted models that are similar objects.
7397
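One possible shape for such a high-level interface is a small dispatch table, sketched below. Everything except the `fit_model(data=..., model_name=...)` call signature is an assumption: the fitter functions are placeholders that do not estimate anything, and the returned dictionaries merely stand in for fitted model objects.

```python
from typing import Any, Callable


def _fit_ols(data: list[float]) -> dict[str, Any]:
    # Placeholder: a real implementation would estimate the model here.
    return {"model_name": "ols", "n_obs": len(data)}


def _fit_logit(data: list[float]) -> dict[str, Any]:
    # Placeholder with the same return structure as the other fitters.
    return {"model_name": "logit", "n_obs": len(data)}


# Hypothetical registry mapping model names to fitting functions.
_FITTERS: dict[str, Callable[[list[float]], dict[str, Any]]] = {
    "ols": _fit_ols,
    "logit": _fit_logit,
}


def fit_model(data: list[float], model_name: str) -> dict[str, Any]:
    """Fit the model registered under ``model_name`` to ``data``."""
    try:
        fitter = _FITTERS[model_name]
    except KeyError:
        raise ValueError(f"Unknown model: {model_name!r}") from None
    return fitter(data)


fitted_model = fit_model(data=[1.0, 2.0, 3.0], model_name="ols")
```

With a design like this, supporting a new model means registering one more fitter, while the task that calls `fit_model` stays unchanged.
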
7498## Executing a subset
7599
76- ## Grouping and aggregating
100+ What if you want to execute a subset of tasks, for example, all tasks related to a model
101+ or a dataset?
102+
103+ When you are using the `.name` attributes of the dimensions and multi-dimensional
104+ objects like in the example above, you ensure that the names of dimensions are included
105+ in the names of all downstream tasks.
106+
107+ Thus, you can simply call pytask with the following expression to execute all tasks
108+ related to the logit model.
109+
110+ ``` console
111+ pytask -k logit
112+ ```
113+
114+ ``` {seealso}
115+ Expressions and markers for selecting tasks are explained in
116+ {doc}`../tutorials/selecting_tasks`.
117+ ```
77118
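To see why names built from the dimensions make this selection work, the following sketch mimics only the simplest case, a plain substring match on task names. It does not use pytask, and the task names and dataset names are made up for illustration; the actual `-k` option accepts richer expressions, as described in the tutorial linked above.

```python
# Hypothetical task names that embed the names of the dimensions.
model_names = ["ols", "logit", "linear_prob"]
dataset_names = ["a", "b"]
task_names = [
    f"task_fit_model[{model}-{dataset}]"
    for model in model_names
    for dataset in dataset_names
]


def select(names: list[str], keyword: str) -> list[str]:
    """Keep the names that contain the keyword as a substring."""
    return [name for name in names if keyword in name]


selected = select(task_names, "logit")
assert selected == ["task_fit_model[logit-a]", "task_fit_model[logit-b]"]
```

Because the match is on substrings, a name that is contained in another name, for example `logit` inside a hypothetical `multinomial_logit`, would select both. This is one more reason to choose distinct, descriptive names for each dimension.
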
78119## Extending repetitions
79120
80- Some parametrized tasks are costly to run - costly in terms of computing power, memory,
81- or time. Users often extend repetitions triggering all repetitions to be rerun. Thus,
82- use the {func}` @pytask.mark.persist <pytask.mark.persist> ` decorator, which is explained
83- in more detail in this {doc}` tutorial <../tutorials/making_tasks_persist> ` .
121+ Some repeated tasks are costly to run - costly in terms of computing power, memory, or
122+ runtime. If you change a task module, you might accidentally trigger all other tasks in
123+ the module to be rerun. To avoid this, use the {func}`@pytask.mark.persist <pytask.mark.persist>`
124+ decorator, which is explained in more detail in this
125+ {doc}`tutorial <../tutorials/making_tasks_persist>`.