| 1 | +# Portability |
| 2 | + |
| 3 | +This guide explains what you need to do to move a pytask project between machines and |
| 4 | +why the lockfile is central to that process. |
| 5 | + |
| 6 | +```{seealso} |
| 7 | +The lockfile format and behavior are documented in the |
| 8 | +[reference guide](../reference_guides/lockfile.md). |
| 9 | +``` |
| 10 | + |
| 11 | +## How to port a project |
| 12 | + |
| 13 | +Use this checklist when you move a project to another machine or environment. |
| 14 | + |
| 15 | +1. **Update state once on the source machine.** |
| 16 | + |
| 17 | + Run a normal build so `pytask.lock` is up to date: |
| 18 | + |
| 19 | + ```console |
| 20 | + $ pytask build |
| 21 | + ``` |
| 22 | + |
| 23 | + If you already have a recent lockfile and up-to-date outputs, you can skip this step. |
| 24 | + |
| 25 | +1. **Ship the right files.** |
| 26 | + |
| 27 | + Commit `pytask.lock` to your repository and move it with the project. In practice, |
| 28 | + you should move: |
| 29 | + |
| 30 | + - the project files tracked in version control (source, configuration, data inputs |
| 31 | + and `pytask.lock`) |
| 32 | + - the build artifacts you want to reuse (often in `bld/` if you follow the tutorial |
| 33 | + layout) |
| 34 | + - the `.pytask` folder, if you use the data catalog and it manages some of your |
| 35 | + files |
| 36 | + |
| 37 | +1. **Mirror files outside the project.** |
| 38 | + |
| 39 | + If the project depends on files outside the project root (the folder containing |
| 40 | + `pyproject.toml`), make sure the same relative layout exists on the target |
| 41 | + machine. |
| 42 | + |
| 43 | +1. **Run pytask on the target machine.** |
| 44 | + |
| 45 | + When states match, tasks are skipped. When they differ, tasks run and the lockfile is |
| 46 | + updated. |
| 47 | + |
| 48 | +## What makes a project portable |
| 49 | + |
| 50 | +There are two things that must stay stable across machines: |
| 51 | + |
| 52 | +First, task and node IDs must be stable. An ID is the unique identifier that ties a task |
| 53 | +or node to an entry in `pytask.lock`. pytask builds these IDs from project-relative |
| 54 | +paths anchored at the project root, so most users do not need to do anything. If you |
| 55 | +implement custom nodes, make sure their IDs remain project-relative and stable across |
| 56 | +machines. |
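As an illustration of what "project-relative" means here, the sketch below derives an identifier from a path and a project root. The helper `portable_id` is hypothetical and is not pytask's implementation; it only shows why an ID anchored at the root comes out the same on two machines with different checkout locations.

```python
from pathlib import PurePath, PurePosixPath


def portable_id(path: PurePath, root: PurePath) -> str:
    """Return `path` relative to `root` with forward slashes.

    Hypothetical helper for illustration only, not part of pytask.
    Raises ValueError if `path` is not located under `root`.
    """
    return path.relative_to(root).as_posix()


# The same file checked out in two different locations.
id_laptop = portable_id(
    PurePosixPath("/home/alice/proj/src/task_data.py"),
    PurePosixPath("/home/alice/proj"),
)
id_ci = portable_id(
    PurePosixPath("/mnt/ci/checkout/src/task_data.py"),
    PurePosixPath("/mnt/ci/checkout"),
)
```

Both calls yield the same string because the machine-specific prefix is dropped before the ID is formed. An ID built from the absolute path would differ between the two machines and would not match the existing lockfile entry.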
| 57 | + |
| 58 | +Second, state values must be portable. The lockfile stores opaque state strings from |
| 59 | +`PNode.state()` and `PTask.state()`, and pytask uses them to decide whether a task is up |
| 60 | +to date. Content hashes are portable; timestamps or absolute paths are not. This mostly |
| 61 | +matters when you define custom nodes or custom hash functions. |
| 62 | + |
| 63 | +## Tips for stable state values |
| 64 | + |
| 65 | +- Prefer file content hashes over timestamps for custom nodes. |
| 66 | +- For `PythonNode` values that are not natively stable, provide a custom hash function. |
| 67 | +- Avoid machine-specific paths or timestamps in custom `state()` implementations. |
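A minimal sketch of the first tip, assuming a custom node whose `state()` method delegates to a helper like the one below. The function name `content_state` is an assumption made for this example and is not part of pytask's API.

```python
import hashlib
from pathlib import Path
from typing import Optional


def content_state(path: Path) -> Optional[str]:
    """Return a SHA-256 hex digest of the file's bytes, or None if it is missing.

    The digest depends only on the file content, not on the modification
    time or the absolute location of the file.
    """
    if not path.exists():
        return None
    return hashlib.sha256(path.read_bytes()).hexdigest()
```

Two copies of a file with identical bytes produce the same value even if they live in different directories or were written at different times, which is the property a state value needs to survive a move between machines.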
| 68 | + |
| 69 | +```{seealso} |
| 70 | +For custom nodes, see [Writing custom nodes](writing_custom_nodes.md). |
| 71 | +For hashing guidance, see |
| 72 | +[Hashing inputs of tasks](hashing_inputs_of_tasks.md). |
| 73 | +``` |
| 74 | + |
| 75 | +## Cleaning up the lockfile |
| 76 | + |
| 77 | +`pytask.lock` is updated incrementally. Entries are only replaced when the corresponding |
| 78 | +tasks run. If tasks are removed or renamed, their old entries remain as stale data and |
| 79 | +are ignored. |
| 80 | + |
| 81 | +To clean up stale entries without deleting the file, run: |
| 82 | + |
| 83 | +```console |
| 84 | +$ pytask build --clean-lockfile |
| 85 | +``` |
| 86 | + |
| 87 | +This rewrites the lockfile after a successful build with only the currently collected |
| 88 | +tasks and their current state values. |