# Complex task repetitions

{doc}`Task repetitions <../tutorials/repeating_tasks_with_different_inputs>` are amazing
if you want to execute many tasks without repeating yourself in code.

But in any bigger project, repetitions can become hard to maintain because there are
multiple layers or dimensions of repetition.

Here you find some tips on how to set up your project such that adding new dimensions
and extending existing ones becomes much easier.

## Example

You can write multiple loops around a task function, where each loop stands for a
different dimension. A dimension might represent different datasets or the model
specifications used to analyze them, as in the following example. The task arguments
are derived from the dimensions.

```{literalinclude} ../../../docs_src/how_to_guides/bp_complex_task_repetitions/example.py
---
caption: task_example.py
---
```

There is nothing wrong with using nested loops for simpler projects. But projects often
grow over time, and then you run into the following problems.

- When you add a new task, you need to duplicate the nested loops in another module.
- When you add a dimension, you need to touch multiple files in your project and add
  another loop and level of indentation.

## Solution

The main idea of the solution is quickly explained. First, we formalize dimensions into
objects. Second, we combine them into one object such that we only have to iterate over
instances of this object in a single loop.

We will start by defining the dimensions using {class}`~typing.NamedTuple` or
{func}`~dataclasses.dataclass`.
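
As a minimal sketch, assuming two hypothetical dimensions, datasets and model
specifications, the dimensions could be defined as follows. The class names `Dataset` and
`Model` and their values are illustrative and not taken from the included files.

```python
from dataclasses import dataclass
from typing import NamedTuple


# Hypothetical dimension defined with a NamedTuple: immutable and lightweight.
class Dataset(NamedTuple):
    name: str


# Hypothetical dimension defined with a frozen dataclass: also immutable and easy to
# extend with further attributes later.
@dataclass(frozen=True)
class Model:
    name: str


# Each dimension is simply a list of its possible values.
DATASETS = [Dataset("a"), Dataset("b")]
MODELS = [Model("ols"), Model("logit")]
```

Both variants work here; a frozen dataclass is convenient once a dimension needs default
values or methods.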

Then, we define the object that holds the information of both dimensions together. For
lack of a better name, we call it an experiment.

```{literalinclude} ../../../docs_src/how_to_guides/bp_complex_task_repetitions/experiment.py
---
caption: config.py
---
```
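
A minimal sketch of such an experiment, assuming hypothetical `Dataset` and `Model`
dimensions, might look like the following. It uses {func}`itertools.product` to build
all combinations of the dimensions, so a task module needs only a single loop over
`EXPERIMENTS`. This is one possible way to write it, not necessarily the contents of
`config.py`.

```python
import itertools
from typing import NamedTuple


class Dataset(NamedTuple):
    name: str


class Model(NamedTuple):
    name: str


class Experiment(NamedTuple):
    dataset: Dataset
    model: Model

    @property
    def name(self) -> str:
        # Combine the names of all dimensions into one descriptive id.
        return f"{self.dataset.name}-{self.model.name}"


DATASETS = [Dataset("a"), Dataset("b")]
MODELS = [Model("ols"), Model("logit")]

# itertools.product replaces the nested loops: one Experiment per combination.
EXPERIMENTS = [
    Experiment(dataset, model) for dataset, model in itertools.product(DATASETS, MODELS)
]
```

Adding a dimension then means adding a field to `Experiment`, a part to its name, and an
argument to `itertools.product`, instead of another nested loop in every task module.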

There are a few things to note.

- The names of each dimension need to be unique. This ensures that combining them into
  the name of the experiment yields a unique and descriptive id.
- Dimensions might need more attributes than just a name, like paths or other arguments
  for the task. Add them.
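
Both remarks can be sketched in code. In the following example, which uses hypothetical
classes, attributes, and paths, each dimension carries an extra attribute besides its
name, the experiment derives an output path from its unique name, and a small helper,
`check_unique_names`, fails early if two experiments end up with the same name.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Dataset:
    name: str
    path: Path  # hypothetical extra attribute: where the raw data lives


@dataclass(frozen=True)
class Model:
    name: str
    formula: str  # hypothetical extra argument passed on to the task


@dataclass(frozen=True)
class Experiment:
    dataset: Dataset
    model: Model

    @property
    def name(self) -> str:
        return f"{self.dataset.name}-{self.model.name}"

    @property
    def path(self) -> Path:
        # Derive a unique output location from the unique name.
        return Path("bld") / f"{self.name}.pkl"


def check_unique_names(experiments: list[Experiment]) -> None:
    """Raise if two experiments share a name, since the name serves as their id."""
    names = [experiment.name for experiment in experiments]
    duplicates = sorted({name for name in names if names.count(name) > 1})
    if duplicates:
        raise ValueError(f"Duplicate experiment names: {duplicates}")


DATASETS = [Dataset("a", Path("data/a.csv")), Dataset("b", Path("data/b.csv"))]
MODELS = [Model("ols", "y ~ x"), Model("ols2", "y ~ x + z")]
EXPERIMENTS = [Experiment(d, m) for d, m in itertools.product(DATASETS, MODELS)]
check_unique_names(EXPERIMENTS)
```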

Next, we use these newly defined data structures and see how our tasks change.

```{literalinclude} ../../../docs_src/how_to_guides/bp_complex_task_repetitions/example_improved.py
---
caption: task_example.py
---
```
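
To show the shape of the single loop in isolation, here is a minimal plain-Python sketch
with hypothetical names. It collects, for each experiment, the task id and the keyword
arguments a task would receive. In pytask, such values would typically be passed to the
`@task(id=..., kwargs=...)` decorator inside the loop; the argument names
`path_to_data`, `model_name`, and `path_to_result` are assumptions for illustration.

```python
from __future__ import annotations

import itertools
from pathlib import Path
from typing import Any, NamedTuple


class Dataset(NamedTuple):
    name: str


class Model(NamedTuple):
    name: str


class Experiment(NamedTuple):
    dataset: Dataset
    model: Model

    @property
    def name(self) -> str:
        return f"{self.dataset.name}-{self.model.name}"


EXPERIMENTS = [
    Experiment(d, m)
    for d, m in itertools.product([Dataset("a"), Dataset("b")], [Model("ols")])
]

# One flat loop, regardless of how many dimensions the experiments combine.
# Each entry maps a task id to the (hypothetical) arguments of a fitting task.
task_arguments: dict[str, dict[str, Any]] = {}
for experiment in EXPERIMENTS:
    task_arguments[experiment.name] = {
        "path_to_data": Path("data") / f"{experiment.dataset.name}.csv",
        "model_name": experiment.model.name,
        "path_to_result": Path("bld") / f"{experiment.name}.pkl",
    }
```

A task in another module can reuse `EXPERIMENTS` in the same way instead of copying the
nested loops, which addresses the first of the problems listed earlier.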

As you see, we replaced

## Using the `DataCatalog`

## Adding another dimension

## Adding another level

## Executing a subset

## Grouping and aggregating

## Extending repetitions

Some parametrized tasks are costly to run, whether in terms of computing power, memory,
or time. Users often extend repetitions, which triggers all repetitions to be rerun. To
avoid this, use the {func}`@pytask.mark.persist <pytask.mark.persist>` decorator, which
is explained in more detail in this {doc}`tutorial <../tutorials/making_tasks_persist>`.
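
The effect of persisting can be illustrated in plain Python. The following is a
simplified sketch of the idea, not pytask's implementation: a hypothetical `run_missing`
helper executes only those repetitions whose product does not exist yet, so extending
the list of repetitions does not recompute the existing ones. With pytask, you would use
the `persist` mark instead of writing such a helper.

```python
from __future__ import annotations

import tempfile
from pathlib import Path
from typing import Callable


def run_missing(products: list[Path], compute: Callable[[Path], None]) -> list[Path]:
    """Run ``compute`` only for products that do not exist yet.

    Simplified illustration: pytask additionally keeps track of the state of
    dependencies and products, which this sketch ignores.
    """
    executed = []
    for product in products:
        if not product.exists():
            compute(product)
            executed.append(product)
    return executed


def expensive(path: Path) -> None:
    # Stand-in for a costly computation that writes the product.
    path.write_text("result")


with tempfile.TemporaryDirectory() as tmp:
    # Initial run with two repetitions.
    first_run = run_missing([Path(tmp, f"seed_{i}.txt") for i in range(2)], expensive)
    # Extended to three repetitions: only the new one is computed.
    second_run = run_missing([Path(tmp, f"seed_{i}.txt") for i in range(3)], expensive)
```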