docs/concepts/plans.md
41 additions & 28 deletions
@@ -4,44 +4,55 @@ A plan is a set of changes that summarizes the difference between the local stat
4
4
5
5
During plan creation:
6
6
7
-
*the local state of the SQLMesh project is compared against the state of a target environment. The difference computed is what constitutes a plan.
8
-
*users may be prompted to categorize changes (refer to [change categories](#change-categories)) to existing models in order for SQLMesh to devise a backfill strategy for models that have been affected indirectly (by being downstream dependencies of updated models). By default, SQLMesh attempts to categorize changes automatically, but this behavior can be changed through [configuration](../reference/configuration.md#auto_categorize_changes).
9
-
*each plan requires a date range to which it will be applied. If not specified, the date range is derived automatically based on model definitions and the target environment.
7
+
* The local state of the SQLMesh project is compared to the state of a target environment. The difference between the two, together with the actions needed to synchronize the environment with the local state, constitutes a plan.
8
+
* Users may be prompted to [categorize changes](#change-categories) to existing models so SQLMesh can determine what actions to take for indirectly affected models (the downstream models that depend on the updated models). By default, SQLMesh attempts to categorize changes automatically, but this behavior can be changed through [configuration](../reference/configuration.md#auto_categorize_changes).
9
+
* Each plan requires a date range to which it will be applied. If not specified, the date range is derived automatically based on model definitions and the target environment.
10
10
11
-
The benefit of having a plan is that all changes can be reviewed and verified before they are applied to the data warehouse. A typical plan contains a combination of the following:
11
+
The benefit of plans is that all changes can be reviewed and verified before they are applied to the data warehouse and any computations are performed. A typical plan contains a combination of the following:
12
12
13
-
* A list of added models.
14
-
* A list of removed models.
15
-
* A list of directly modified models and a text diff of changes that have been made.
16
-
* A list of indirectly modified models.
17
-
* Missing data intervals for affected models.
18
-
* A date range that will be affected by the plan application.
13
+
* A list of added models
14
+
* A list of removed models
15
+
* A list of directly modified models and a text diff of changes that have been made
16
+
* A list of indirectly modified models
17
+
* Missing data intervals for affected models
18
+
* A date range that will be affected by the plan application
19
19
20
20
To create a new plan, run the following command:
21
21
```bash
22
-
sqlmesh plan
22
+
sqlmesh plan [environment name]
23
23
```
24
+
25
+
If no environment name is specified, the plan is generated for the `prod` environment.
26
+
24
27
## Change categories
25
-
Categories only need to be provided for models that have been modified directly. The categorization of indirectly modified downstream models is inferred based on upstream decisions. If more than one upstream dependency of an indirectly modified model has been modified and they have conflicting categories, the most conservative category (breaking) is assigned to this model.
28
+
Categories only need to be provided for models that have been modified directly. The categorization of indirectly modified downstream models is inferred based on the types of changes to the directly modified models.
29
+
30
+
If more than one upstream dependency of an indirectly modified model has been modified and they have conflicting categories, the most conservative category (breaking) is assigned to this model.
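The conflict rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described here, not SQLMesh's implementation. The `Category` enum, its ordering, and the `infer_category` function are hypothetical names introduced for this example.

```python
from enum import IntEnum


class Category(IntEnum):
    # Hypothetical ordering for this sketch: a higher value is more conservative.
    NON_BREAKING = 1
    BREAKING = 2


def infer_category(upstream_categories):
    """Pick the category for an indirectly modified model.

    `upstream_categories` holds the categories chosen for the directly
    modified models upstream of it. When they conflict, the most
    conservative one (breaking) wins.
    """
    if not upstream_categories:
        raise ValueError("an indirectly modified model needs at least one modified upstream model")
    return max(upstream_categories)
```

With one breaking and one non-breaking upstream change, the sketch resolves the conflict to `Category.BREAKING`.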
26
31
27
32
### Breaking change
28
-
If a directly modified model change is categorized as breaking, then it will be backfilled along with all its downstream dependencies. In general, this is the safest option to choose because it guarantees all downstream dependencies will reflect the change. However, it is a more expensive option because it involves additional data reprocessing, which has a runtime cost associated with it (refer to [backfilling](#backfilling)). Choose this option when a change has been made to a model's logic that has a functional impact on its downstream dependencies.
33
+
If a directly modified model change is categorized as breaking, it and its downstream dependencies will be backfilled.
34
+
35
+
In general, this is the safest option because it guarantees all downstream dependencies will reflect the change. However, it is a more expensive option because it involves additional data reprocessing, which has a runtime cost associated with it (refer to [backfilling](#backfilling)).
36
+
37
+
Choose this option when a change has been made to a model's logic that has a functional impact on its downstream dependencies. For example, adding or modifying a model's `WHERE` clause is a breaking change because downstream models contain rows that would now be filtered out.
29
38
30
39
### Non-breaking change
31
-
A directly-modified model that is classified as non-breaking will be backfilled, but its downstream dependencies will not. This is a common choice in scenarios such as an addition of a new column, an action which doesn't affect downstream models as new columns can't be used by downstream models without modifying them directly.
40
+
A directly modified model that is classified as non-breaking will be backfilled, but its downstream dependencies will not.
41
+
42
+
This is a common choice when, for example, a new column is added. Adding a column doesn't affect downstream models because they can't use the new column until they are modified directly to select it. If any downstream models contain a `select *` from the model, SQLMesh attempts to infer breaking status on a best-effort basis. We recommend explicitly specifying a query's columns to avoid unnecessary recomputation.
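To make the difference between the two categories concrete, the following Python sketch computes which models would be backfilled for a given set of categorized changes. It is a simplified illustration under stated assumptions, not SQLMesh's code. The `models_to_backfill` function, the `downstream` mapping, and the plain string category labels are all hypothetical.

```python
from collections import deque


def models_to_backfill(downstream, categories):
    """Return the set of models that need backfilling.

    downstream: dict mapping a model name to its direct downstream models.
    categories: dict mapping each directly modified model to
                "breaking" or "non-breaking".
    """
    # Every directly modified model is backfilled, whatever its category.
    to_backfill = set(categories)
    # Only breaking changes cascade to downstream dependencies.
    queue = deque(m for m, c in categories.items() if c == "breaking")
    visited = set()
    while queue:
        model = queue.popleft()
        if model in visited:
            continue
        visited.add(model)
        to_backfill.add(model)
        queue.extend(downstream.get(model, []))
    return to_backfill


# Hypothetical project: a -> b -> c and x -> y.
graph = {"a": ["b"], "b": ["c"], "x": ["y"]}
changes = {"a": "breaking", "x": "non-breaking"}
result = models_to_backfill(graph, changes)
```

Here `result` contains `a`, `b`, `c`, and `x`. Model `y` is left out because the change to `x` was categorized as non-breaking.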
32
43
33
44
## Plan application
34
-
Once a plan has been created and reviewed, it should then be applied to a target [environment](environments.md) in order for the changes that are part of it to take effect.
45
+
Once a plan has been created and reviewed, it is then applied to the target [environment](environments.md) in order for its changes to take effect.
35
46
36
-
Every time a model is changed as part of a plan, a new variant of this model gets created behind the scenes (a [snapshot](architecture/snapshots.md)) with a unique [fingerprint](architecture/snapshots.md#fingerprints) assigned to it. In turn, each model variant gets a separate physical location for data (i.e. table). Data between different variants of the same model is never shared (except for the [forward-only](#forward-only-plans)case).
47
+
Every time a model is changed as part of a plan, a new variant of this model gets created behind the scenes (a [snapshot](architecture/snapshots.md) with a unique [fingerprint](architecture/snapshots.md#fingerprints) assigned to it). In turn, each model variant's data is stored in a separate physical table. Data between different variants of the same model is never shared, except for [forward-only](#forward-only-plans) plans.
37
48
38
-
When a plan is applied to an environment, that environment gets associated with a collection of model variants that are part of that plan. In other words, each environment is a collection of references to model variants and the physical tables associated with them.
49
+
When a plan is applied to an environment, the environment gets associated with the set of model variants that are part of that plan. In other words, each environment is a collection of references to model variants and the physical tables associated with them.
39
50
40
51

41
52
42
53
*Each model variant gets its own physical table while environments only contain references to these tables.*
43
54
44
-
This unique approach to understanding and applying changes is what enables SQLMesh's Virtual Environments. This technology allows SQLMesh to ensure complete isolation between environments while allowing it to share physical data assets between environments when appropriate and safe to do so. Additionally, since each model change is captured in a separate physical table, reverting to a previous version becomes a simple and quick operation (refer to [Virtual Update](#virtual-update)) as long as its physical table hasn't been garbage collected by the janitor process. SQLMesh makes it easy to be correct, and really hard to accidentally and irreversibly break things.
55
+
This unique approach to understanding and applying changes is what enables SQLMesh's Virtual Environments. This technology allows SQLMesh to ensure complete isolation between environments while allowing it to share physical data assets between environments when appropriate and safe to do so. Additionally, since each model change is captured in a separate physical table, reverting to a previous version becomes a simple and quick operation (refer to [Virtual Update](#virtual-update)) as long as its physical table hasn't been garbage collected by the janitor process. SQLMesh makes it easy to be correct and really hard to accidentally and irreversibly break things.
45
56
46
57
### Backfilling
47
58
Despite all the benefits, the approach described above is not without trade-offs. When a new model version has just been created, the physical table assigned to it is empty. Therefore, SQLMesh needs to re-apply the logic of the new model version to the entire date range of this model in order to populate the new version's physical table. This process is called backfilling.
@@ -59,22 +70,24 @@ We will be iterating on terminology to better capture the nuances of each type i
59
70
Note for incremental models: despite the fact that backfilling can happen incrementally (see `batch_size` parameter on models), there is an extra cost associated with this operation due to additional runtime involved. If the runtime cost is a concern, a [forward-only plan](#forward-only-plans) can be used instead.
60
71
61
72
### Virtual Update
62
-
Another benefit of the aforementioned approach is that data for a new model version can be fully pre-built while still in a development environment. This means that all changes and their downstream dependencies can be fully previewed before they get promoted to the production environment. Therefore, the process of promoting a change to production is reduced to reference swapping. If during plan creation no data gaps have been detected and only references to new model versions need to be updated, then such update is referred to as a Virtual Update. Virtual Updates impose no additional runtime overhead or cost.
73
+
Another benefit of the SQLMesh approach is that data for a new model version can be fully pre-built while still in a development environment. That way all changes and their downstream dependencies can be fully previewed before they are promoted to the production environment.
74
+
75
+
With this approach, the process of promoting a change to production is reduced to reference swapping. If during plan creation no data gaps have been detected and only references to new model versions need to be updated, then the update is referred to as a Virtual Update. Virtual Updates impose no additional runtime overhead or cost.
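The idea of reference swapping can be illustrated with a small Python sketch. It models an environment as a mapping from model names to physical table names, which is an assumption made for this example only. The `virtual_update` function and the table names are hypothetical and do not reflect SQLMesh's internal naming or API.

```python
def virtual_update(source_env, target_env):
    """Return a copy of `target_env` whose references point at the
    model variants already built in `source_env`.

    Only references change; no physical table is created or recomputed.
    """
    updated = dict(target_env)  # leave the original mapping untouched
    updated.update(source_env)  # swap in references to the new variants
    return updated


# Hypothetical physical table names keyed by model name.
dev = {"db.model_a": "db__model_a__fingerprint_2"}
prod = {
    "db.model_a": "db__model_a__fingerprint_1",
    "db.model_b": "db__model_b__fingerprint_1",
}
new_prod = virtual_update(dev, prod)
```

After the swap, `new_prod` refers to the table that was already populated in the development environment for `db.model_a`, while `db.model_b` keeps its existing reference. The sketch does not handle removed models.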
63
76
64
77
## Forward-only plans
65
78
Sometimes the runtime cost associated with rebuilding an entire physical table is too high and outweighs the benefits a separate table provides. This is when a forward-only plan comes in handy.
66
79
67
-
When a forward-only plan is applied, all of the contained model changes will not get separate physical tables assigned to them. Instead, physical tables of previous model versions are reused. The benefit of such a plan is that no backfilling is required, so there is no runtime overhead and hence no cost. The drawback is that reverting to a previous version is no longer as straightforward, and requires a combination of additional forward-only changes and restatements (refer to [restatement plans](#restatement-plans)).
80
+
When a forward-only plan is applied to the `prod` environment, none of the plan's changed models will have new physical tables created for them. Instead, physical tables from previous model versions are reused. The benefit of this is that no backfilling is required, so there is no runtime overhead or cost. The drawback is that reverting to a previous version is no longer simple and requires a combination of additional forward-only changes and [restatements](#restatement-plans).
68
81
69
-
Also note that once a forward-only change is applied to production, all development environments that referred to the previous versions of the updated models will be impacted.
82
+
Note that once a forward-only change is applied to `prod`, all development environments that referred to the previous versions of the updated models will be impacted.
70
83
71
-
To preserve isolation between environments during development, SQLMesh creates temporary physical tables for forward-only model versions and uses them for evaluation in development environments. However, the implication of this is that only a limited change preview is available in the development environment before the change makes it to production. The date range of the preview is provided as part of plan creation.
84
+
A core component of the development process is to execute code and verify its behavior. To enable this while preserving isolation between environments, `sqlmesh plan [environment name]` evaluates code in non-`prod` environments by creating temporary physical tables for forward-only model versions. This means that only a limited preview of changes is available in the development environment before the change is promoted to `prod`. The date range of the preview is provided as part of the plan creation command.
72
85
73
-
Note that all changes made as part of a forward-only plan automatically get a **forward-only** category assigned to them. These types of changes can't be mixed together with breaking and non-breaking changes (refer to [change categories](#change-categories)) as part of the same plan.
86
+
Note that all changes made as part of a forward-only plan automatically get a **forward-only** category assigned to them. These types of changes can't be mixed together with [breaking and non-breaking changes](#change-categories) within the same plan.
74
87
75
-
To create a forward-only plan, the `--forward-only` option has to be added to the `plan` command:
88
+
To create a forward-only plan, add the `--forward-only` option to the `plan` command:
76
89
```bash
77
-
sqlmesh plan --forward-only
90
+
sqlmesh plan [environment name] --forward-only
78
91
```
79
92
80
93
### Effective date
@@ -89,10 +102,10 @@ There are cases when models need to be re-evaluated for a given time range, even
89
102
90
103
For this reason, the `plan` command supports the `--restate-model` option, which allows users to specify one or more names of a model to be reprocessed. Each name can also refer to an external table defined outside SQLMesh.
91
104
92
-
Application of such a plan will trigger a cascading backfill for all specified models (excluding external tables), as well as all models downstream from them. The plan's date range in this case determines data intervals that will be affected. For example:
105
+
Applying a restatement plan triggers a cascading backfill for all specified models (other than external tables), as well as all models downstream from them. The plan's date range determines the data intervals that will be affected.
106
+
107
+
For example, this command creates a plan that restates the model `db.model_a` and all its downstream dependencies, as well as all models that refer to the `external.table_a` table and their downstream dependencies:
93
108
94
109
```bash
95
110
sqlmesh plan --restate-model db.model_a --restate-model external.table_a
96
111
```
97
-
98
-
The command above creates a plan that restates the model `db.model_a` and all its downstream dependencies, as well as all models that refer to the `external.table_a` table and their downstream dependencies.