doc/developer/design/20260209_simplified_rollout_triggers_and_crd.md
Lines changed: 41 additions & 41 deletions
@@ -19,7 +19,7 @@ Additionally, the current system is difficult to automate when faced with evicti

 1. **Automatic rollout detection**: The system should automatically detect when a rollout is needed based on spec changes, without requiring users to manually set a UUID.

-2. **Seamless version migration**: Existing v1alpha1 resources should continue to work, with automatic conversion to v1alpha2 as needed.
+2. **Seamless version migration**: Existing v1alpha1 resources should continue to work, with automatic conversion to v1 as needed.

 3. **Terraform compatibility**: Configuration must not fight with infrastructure as code tools such as Terraform.

@@ -34,9 +34,9 @@ Additionally, the current system is difficult to automate when faced with evicti

 ## Solution Proposal

-### 1. New CRD Version: v1alpha2
+### 1. New CRD Version: v1

-Introduce a new `v1alpha2` version of the Materialize CRD with the following changes:
+Introduce a new `v1` version of the Materialize CRD with the following changes:

 **Spec changes:**
 - Remove `requestRollout` (`Uuid`) - Rollouts are now triggered automatically when the spec hash changes.
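
As a rough illustration of the hash-triggered rollout described in this hunk, the sketch below shows one way the check could look. It is not the orchestratord implementation. The struct shapes, the `extra_args` field, and the function names are hypothetical, and `DefaultHasher` is only a stand-in for whatever stable digest the real code uses. Only the status field names (`lastCompletedRolloutHash`, `requestedRolloutHash`) and `environmentd_image_ref` come from the design doc.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical subset of the v1 spec; the real CRD has many more fields.
#[derive(Hash)]
pub struct MaterializeSpec {
    pub environmentd_image_ref: String,
    /// Hypothetical extra field, present only to show that every spec field feeds the hash.
    pub extra_args: Vec<String>,
}

/// Hypothetical subset of the v1 status, using the field names from the design doc.
#[derive(Default)]
pub struct MaterializeStatus {
    pub last_completed_rollout_hash: Option<String>,
    pub requested_rollout_hash: Option<String>,
}

/// Hash the spec. Assumption: DefaultHasher is a placeholder; its output is not
/// guaranteed to be stable across Rust releases, so it is unsuitable for stored state.
pub fn spec_hash(spec: &MaterializeSpec) -> String {
    let mut hasher = DefaultHasher::new();
    spec.hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Record the calculated hash as the requested one and report whether it
/// differs from the last completed rollout, i.e. whether a rollout is needed.
pub fn reconcile(spec: &MaterializeSpec, status: &mut MaterializeStatus) -> bool {
    let calculated = spec_hash(spec);
    let needs_rollout =
        status.last_completed_rollout_hash.as_deref() != Some(calculated.as_str());
    status.requested_rollout_hash = Some(calculated);
    needs_rollout
}

/// After a successful rollout, the requested hash becomes the completed one.
pub fn mark_rollout_complete(status: &mut MaterializeStatus) {
    status.last_completed_rollout_hash = status.requested_rollout_hash.clone();
}
```

With this shape, reapplying an unchanged spec is a no-op, while any change to a hashed field (for example the image reference) makes `reconcile` report that a rollout is needed, with no user-supplied UUID involved.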
@@ -122,14 +122,14 @@ A new HTTPS webhook server handles CRD version conversion:
 **Endpoint:** `POST /convert`

 **Supported conversions:**
-- v1alpha1 -> v1alpha2
-- v1alpha2 -> v1alpha1\*
+- v1alpha1 -> v1
+- v1 -> v1alpha1\*

 \*The API server seemed to want this, I don't know why. We can't reconcile these, so going back never makes sense.
@@ -144,52 +144,52 @@ A new HTTPS webhook server handles CRD version conversion:
 - If we are already in "promoting" status, we should unconditionally complete the promotion for the current rollout rather than destroying and replacing it.
 This may trigger an additional rollout this one time, but I don't know any way around that. I think this is acceptable given the user is doing something very weird by updating orchestratord mid-rollout.

-###### v1alpha2 to v1alpha1:
+###### v1 to v1alpha1:

-We need to include the `lastCompletedRolloutHash` from v1alpha2 in v1alpha1 as well. This is required for round tripping from v1alpha2 -> v1alpha1 -> v1alpha2,
-which may happen if a user applies a v1alpha1 change over a v1alpha2 object.
+We need to include the `lastCompletedRolloutHash` from v1 in v1alpha1 as well. This is required for round tripping from v1 -> v1alpha1 -> v1,
+which may happen if a user applies a v1alpha1 change over a v1 object.

-In the case there is an existing `lastCompletedRolloutHash`, it should be kept as-is through the round trip. As we never reconcile with v1alpha1, it should only change at v1alpha2, so this should be safe.
+In the case there is an existing `lastCompletedRolloutHash`, it should be kept as-is through the round trip. As we never reconcile with v1alpha1, it should only change at v1, so this should be safe.

-No attempt is made to support v1alpha1 beyond giving a valid v1alpha1 structure and supporting round tripping to v1alpha2. Fields that do not exist in v1alpha2 may have their nil value.
+No attempt is made to support v1alpha1 beyond giving a valid v1alpha1 structure and supporting round tripping to v1. Fields that do not exist in v1 may have their nil value.

 ##### Example round trips

-In these examples, we assume that orchestratord's attempt to update the stored version succeeds and that reconciliation is triggered after this update. This is only to simplify this document, and is not necessary for correctness. If orchestratord's attempt to update the stored version fails, or the reconciliation is triggered first, the conversion webhook is simply called at that time and we will reconcile the same v1alpha2 object.
+In these examples, we assume that orchestratord's attempt to update the stored version succeeds and that reconciliation is triggered after this update. This is only to simplify this document, and is not necessary for correctness. If orchestratord's attempt to update the stored version fails, or the reconciliation is triggered first, the conversion webhook is simply called at that time and we will reconcile the same v1 object.

 ###### Simplest case
 1. There is a stored v1alpha1 Materialize resource, not actively rolling out, with both `status.lastCompletedRolloutRequest` and `spec.requestRollout` matching.
-1. Orchestratord gets updated to a version with v1alpha2 support.
-1. Orchestratord lists existing v1alpha1 resources on startup, in order to upgrade them to v1alpha2.
-1. The API server calls the conversion webhook, which returns a v1alpha2 resource. In this case, it would have `status.lastCompletedRolloutHash` and `status.requestedRolloutHash` set to the same calculated hash after conversion.
-1. Orchestratord calls `replace` to store the resource as v1alpha2.
-1. Orchestratord gets notified of the new v1alpha2 resource, but determines there is nothing to do.
+1. Orchestratord gets updated to a version with v1 support.
+1. Orchestratord lists existing v1alpha1 resources on startup, in order to upgrade them to v1.
+1. The API server calls the conversion webhook, which returns a v1 resource. In this case, it would have `status.lastCompletedRolloutHash` and `status.requestedRolloutHash` set to the same calculated hash after conversion.
+1. Orchestratord calls `replace` to store the resource as v1.
+1. Orchestratord gets notified of the new v1 resource, but determines there is nothing to do.

-At this point, the stored version is v1alpha2, and no rollout is triggered.
+At this point, the stored version is v1, and no rollout is triggered.

 1. The user then applies a v1alpha1 resource. It contains some change that affects the hash (ie: `spec.environmentd_image_ref`). It may or may not include `spec.requestRollout`, that doesn't matter.
-1. Before storing this change, the API server calls the conversion webhook, which returns a v1alpha2 resource. In this case, it should not contain a status, as the user applied v1alpha1 resource did not contain a status (TODO verify this).
-1. Orchestratord gets notified of the new v1alpha2 resource, which contains the old status not yet updated after the applied v1alpha1 resource. This means the `status.lastCompletedRolloutHash` and `status.requestedRolloutHash` still match each other, but do not match the calculated hash.
+1. Before storing this change, the API server calls the conversion webhook, which returns a v1 resource. In this case, it should not contain a status, as the user applied v1alpha1 resource did not contain a status (TODO verify this).
+1. Orchestratord gets notified of the new v1 resource, which contains the old status not yet updated after the applied v1alpha1 resource. This means the `status.lastCompletedRolloutHash` and `status.requestedRolloutHash` still match each other, but do not match the calculated hash.
 1. Orchestratord reconciles like normal, calculating a new `status.requestedRolloutHash` and triggering a rollout since it is different.

-If the user had instead applied a v1alpha2 resource instead, no conversion would be needed and orchestratord would reconcile it directly.
+If the user had instead applied a v1 resource instead, no conversion would be needed and orchestratord would reconcile it directly.

 ###### Existing v1alpha1 resource is mid-upgrade, but not promoting
 1. There is a stored v1alpha1 Materialize resource, actively rolling out, with `status.lastCompletedRolloutRequest` and `spec.requestRollout` not matching. It is not in "promoting" status.
-1. Orchestratord gets updated to a version with v1alpha2 support.
-1. Orchestratord lists existing v1alpha1 resources on startup, in order to upgrade them to v1alpha2.
-1. The API server calls the conversion webhook, which returns a v1alpha2 resource. In this case, it would have `status.lastCompletedRolloutHash` set to `None` and `status.requestedRolloutHash` set to the calculated hash after conversion.
-1. Orchestratord calls `replace` to store the resource as v1alpha2.
-1. Orchestratord gets notified of the new v1alpha2 resource.
+1. Orchestratord gets updated to a version with v1 support.
+1. Orchestratord lists existing v1alpha1 resources on startup, in order to upgrade them to v1.
+1. The API server calls the conversion webhook, which returns a v1 resource. In this case, it would have `status.lastCompletedRolloutHash` set to `None` and `status.requestedRolloutHash` set to the calculated hash after conversion.
+1. Orchestratord calls `replace` to store the resource as v1.
+1. Orchestratord gets notified of the new v1 resource.
 1. Orchestratord reconciles like normal, continuing the existing rollout and overwriting any objects that are different. This is the same behavior it would have with current orchestratord and v1alpha1.

 ###### Existing v1alpha1 resource is mid-upgrade and already promoting
 1. There is a stored v1alpha1 Materialize resource, actively rolling out, with `status.lastCompletedRolloutRequest` and `spec.requestRollout` not matching. It is in "promoting" status.
-1. Orchestratord gets updated to a version with v1alpha2 support.
-1. Orchestratord lists existing v1alpha1 resources on startup, in order to upgrade them to v1alpha2.
-1. The API server calls the conversion webhook, which returns a v1alpha2 resource. In this case, it would have `status.lastCompletedRolloutHash` set to `None` and `status.requestedRolloutHash` set to the calculated hash after conversion.
-1. Orchestratord calls `replace` to store the resource as v1alpha2.
-1. Orchestratord gets notified of the new v1alpha2 resource.
+1. Orchestratord gets updated to a version with v1 support.
+1. Orchestratord lists existing v1alpha1 resources on startup, in order to upgrade them to v1.
+1. The API server calls the conversion webhook, which returns a v1 resource. In this case, it would have `status.lastCompletedRolloutHash` set to `None` and `status.requestedRolloutHash` set to the calculated hash after conversion.
+1. Orchestratord calls `replace` to store the resource as v1.
+1. Orchestratord gets notified of the new v1 resource.
 1. Orchestratord reconciles like normal. Critically, it unconditionally continues with promotion rather than overwriting any objects.
 1. After promotion is successful, the updated status triggers a new rollout. (TODO verify that this works if we have a `status.requestedRolloutHash` set in the initial conversion)

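
The status handling in the round-trip examples above can be sketched roughly as follows. This is a simplified model under stated assumptions, not the actual webhook code. The struct shapes and function names are hypothetical, and the v1alpha1 UUID fields are modelled as plain strings. It further assumes that the webhook always sets `requestedRolloutHash` to the calculated hash, and that the nil value for the removed UUID fields is the all-zero UUID.

```rust
/// Assumed nil value for v1alpha1 fields that no longer exist in v1.
pub const NIL_UUID: &str = "00000000-0000-0000-0000-000000000000";

/// Rollout-related v1alpha1 fields (hypothetical, trimmed-down shape).
#[derive(Clone, Debug, PartialEq)]
pub struct V1Alpha1Rollout {
    /// spec.requestRollout
    pub request_rollout: String,
    /// status.lastCompletedRolloutRequest
    pub last_completed_rollout_request: String,
    /// status.lastCompletedRolloutHash, carried only to support round tripping
    pub last_completed_rollout_hash: Option<String>,
}

/// Rollout-related v1 status fields (hypothetical, trimmed-down shape).
#[derive(Clone, Debug, PartialEq)]
pub struct V1Rollout {
    pub last_completed_rollout_hash: Option<String>,
    pub requested_rollout_hash: Option<String>,
}

/// v1alpha1 -> v1. `calculated_hash` is the hash of the converted spec.
pub fn to_v1(old: &V1Alpha1Rollout, calculated_hash: &str) -> V1Rollout {
    let requested = Some(calculated_hash.to_string());
    let last_completed = match &old.last_completed_rollout_hash {
        // Round trip from v1: keep the existing hash as-is.
        Some(hash) => Some(hash.clone()),
        // Not rolling out: the completed rollout matches the current spec.
        None if old.request_rollout == old.last_completed_rollout_request => requested.clone(),
        // Mid-rollout: nothing has completed for this spec yet.
        None => None,
    };
    V1Rollout {
        last_completed_rollout_hash: last_completed,
        requested_rollout_hash: requested,
    }
}

/// v1 -> v1alpha1. Only produces a valid structure that can be converted back.
/// Limitation of this sketch: a v1 object with no completed hash would look
/// "not rolling out" after this conversion, because both UUIDs become nil.
pub fn to_v1alpha1(new: &V1Rollout) -> V1Alpha1Rollout {
    V1Alpha1Rollout {
        request_rollout: NIL_UUID.to_string(),
        last_completed_rollout_request: NIL_UUID.to_string(),
        last_completed_rollout_hash: new.last_completed_rollout_hash.clone(),
    }
}
```

In this model, a settled v1alpha1 resource converts to matching hashes (no rollout), a mid-rollout resource converts to `None` plus a requested hash, and a v1 -> v1alpha1 -> v1 round trip leaves `lastCompletedRolloutHash` untouched while the requested hash reflects the newly applied spec.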
@@ -216,8 +216,8 @@ Orchestratord will also get readiness probes so nothing tries to call this webho
 ### 5. CRD Registration

 The CRD is registered with:
-- Both v1alpha1 and v1alpha2 versions
-- v1alpha2 as the stored version
+- Both v1alpha1 and v1 versions
+- v1 as the stored version
 - Webhook conversion configuration pointing to the operator service
### 6. Replace all Materialize resources to update their stored versions

-We have set v1alpha2 as the stored version, but that doesn't update existing resources. Those are only updated when they are reapplied.
+We have set v1 as the stored version, but that doesn't update existing resources. Those are only updated when they are reapplied.

 During orchestratord startup, after waiting for the CRD to be established, we need to loop through all Materialize resources and `replace` them.

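
A minimal sketch of the startup loop from this section follows, assuming failures should only produce warnings and never block startup. The `MaterializeApi` trait and `FakeApi` type are hypothetical stand-ins for a Kubernetes client, not a real client API, and the sketch is synchronous for brevity where real code would most likely be async.

```rust
/// Hypothetical abstraction over the Kubernetes client; not a real client API.
pub trait MaterializeApi {
    /// List the names of all Materialize resources.
    fn list_names(&self) -> Result<Vec<String>, String>;
    /// Read a resource and write it back so it is stored in the current stored version.
    fn replace(&self, name: &str) -> Result<(), String>;
}

/// Re-store every Materialize resource. Failures are logged as warnings and
/// the names that failed are returned; the loop never aborts early.
pub fn replace_all_best_effort(api: &dyn MaterializeApi) -> Vec<String> {
    let names = match api.list_names() {
        Ok(names) => names,
        Err(err) => {
            eprintln!("warning: could not list Materialize resources: {err}");
            return Vec::new();
        }
    };
    let mut failed = Vec::new();
    for name in names {
        if let Err(err) = api.replace(&name) {
            eprintln!("warning: failed to replace {name}: {err}");
            failed.push(name);
        }
    }
    failed
}

/// In-memory fake used only to demonstrate the loop.
pub struct FakeApi {
    pub names: Vec<String>,
    pub failing: Vec<String>,
}

impl MaterializeApi for FakeApi {
    fn list_names(&self) -> Result<Vec<String>, String> {
        Ok(self.names.clone())
    }

    fn replace(&self, name: &str) -> Result<(), String> {
        if self.failing.iter().any(|n| n == name) {
            Err("conflict".to_string())
        } else {
            Ok(())
        }
    }
}
```

Returning the failed names instead of an error keeps the loop non-fatal: a later orchestratord restart can simply run it again.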
@@ -250,22 +250,22 @@ If it is possible to determine the stored version of these resources, we should
 I think it is OK for this to be best-effort, and only warn in case of failure.
 For backward compatibility reasons, we're going to have to support the old version for some time.
 Orchestratord is likely to get restarted/upgraded multiple times in that period, so it can try again.
-If the user ever writes an updated CR, it will also be stored in v1alpha2, so it isn't critical that this work immediately.
+If the user ever writes an updated CR, it will also be stored in v1, so it isn't critical that this work immediately.

 ## Known testing required

 Our existing nightly orchestratord tests cover a lot, but we'll need to extend them to work with multiple CRD versions.

-- Upgrades from existing v1alpha1 environments by applying v1alpha1 CR. (this is basically what we have now, but we need to not break it with the orchestratord changes to reconcile v1alpha2 after conversion)
-- Upgrades from existing v1alpha1 environments by applying v1alpha2 CR.
-- Upgrades from existing v1alpha2 environments by applying v1alpha1 CR.
-- Upgrades from existing v1alpha2 environments by applying v1alpha2 CR.
+- Upgrades from existing v1alpha1 environments by applying v1alpha1 CR. (this is basically what we have now, but we need to not break it with the orchestratord changes to reconcile v1 after conversion)
+- Upgrades from existing v1alpha1 environments by applying v1 CR.
+- Upgrades from existing v1 environments by applying v1alpha1 CR.
+- Upgrades from existing v1 environments by applying v1 CR.
 - Upgrade from existing v1alpha1 environment that is mid-rollout not in "promoting" status.
 - Upgrade from existing v1alpha1 environment that is mid-rollout in "promoting" status.
 - Upgrades with a previous rollout already in progress.
 - Upgrades triggered by annotation.
-- Deploy of latest Materialize image versions using v1alpha2 CR.
-- Deploy of older Materialize image versions using v1alpha2 CR.
+- Deploy of latest Materialize image versions using v1 CR.
+- Deploy of older Materialize image versions using v1 CR.