Commit e90772c

committed
fix comments
1 parent 9a2b110 commit e90772c

2 files changed

Lines changed: 277 additions & 258 deletions

docs/concepts/StagedUpdateRun/README.md

Lines changed: 277 additions & 2 deletions
@@ -120,6 +120,281 @@ An updateRun executes in two phases. During the initialization phase, the contro
120120

121121
In the execution phase, the controller processes each stage sequentially, updates clusters within each stage one at a time, and enforces completion of after-stage tasks. It then executes a final delete stage to clean up resources from unscheduled clusters. The updateRun succeeds when all stages complete successfully. However, it fails if an execution-affecting event occurs, for example, the target `ClusterResourcePlacement` being deleted or member cluster changes triggering new scheduling. In such cases, error details are recorded in the updateRun status. Remember that once initialized, an updateRun operates on its strategy snapshot, making it immune to subsequent strategy modifications.
122122

123+
## Understand ClusterStagedUpdateRun status
124+
125+
Let's take a close look at the status of a completed `ClusterStagedUpdateRun`. It shows the rollout status of every cluster and stage.
126+
127+
```bash
128+
$ kubectl describe crsur example-run
129+
...
130+
Status:
131+
Conditions:
132+
Last Transition Time: 2025-03-12T23:21:39Z
133+
Message: ClusterStagedUpdateRun initialized successfully
134+
Observed Generation: 1
135+
Reason: UpdateRunInitializedSuccessfully
136+
Status: True
137+
Type: Initialized
138+
Last Transition Time: 2025-03-12T23:21:39Z
139+
Message:
140+
Observed Generation: 1
141+
Reason: UpdateRunStarted
142+
Status: True
143+
Type: Progressing
144+
Last Transition Time: 2025-03-12T23:26:15Z
145+
Message:
146+
Observed Generation: 1
147+
Reason: UpdateRunSucceeded
148+
Status: True
149+
Type: Succeeded
150+
Deletion Stage Status:
151+
Clusters:
152+
Conditions:
153+
Last Transition Time: 2025-03-12T23:26:15Z
154+
Message:
155+
Observed Generation: 1
156+
Reason: StageUpdatingStarted
157+
Status: True
158+
Type: Progressing
159+
Last Transition Time: 2025-03-12T23:26:15Z
160+
Message:
161+
Observed Generation: 1
162+
Reason: StageUpdatingSucceeded
163+
Status: True
164+
Type: Succeeded
165+
End Time: 2025-03-12T23:26:15Z
166+
Stage Name: kubernetes-fleet.io/deleteStage
167+
Start Time: 2025-03-12T23:26:15Z
168+
Policy Observed Cluster Count: 2
169+
Policy Snapshot Index Used: 0
170+
Staged Update Strategy Snapshot:
171+
Stages:
172+
After Stage Tasks:
173+
Type: Approval
174+
Wait Time: 0s
175+
Type: TimedWait
176+
Wait Time: 1m0s
177+
Label Selector:
178+
Match Labels:
179+
Environment: staging
180+
Name: staging
181+
After Stage Tasks:
182+
Type: Approval
183+
Wait Time: 0s
184+
Label Selector:
185+
Match Labels:
186+
Environment: canary
187+
Name: canary
188+
Sorting Label Key: name
189+
After Stage Tasks:
190+
Type: TimedWait
191+
Wait Time: 1m0s
192+
Type: Approval
193+
Wait Time: 0s
194+
Label Selector:
195+
Match Labels:
196+
Environment: production
197+
Name: production
198+
Sorting Label Key: order
199+
Stages Status:
200+
After Stage Task Status:
201+
Approval Request Name: example-run-staging
202+
Conditions:
203+
Last Transition Time: 2025-03-12T23:21:54Z
204+
Message:
205+
Observed Generation: 1
206+
Reason: AfterStageTaskApprovalRequestCreated
207+
Status: True
208+
Type: ApprovalRequestCreated
209+
Last Transition Time: 2025-03-12T23:22:55Z
210+
Message:
211+
Observed Generation: 1
212+
Reason: AfterStageTaskApprovalRequestApproved
213+
Status: True
214+
Type: ApprovalRequestApproved
215+
Type: Approval
216+
Conditions:
217+
Last Transition Time: 2025-03-12T23:22:54Z
218+
Message:
219+
Observed Generation: 1
220+
Reason: AfterStageTaskWaitTimeElapsed
221+
Status: True
222+
Type: WaitTimeElapsed
223+
Type: TimedWait
224+
Clusters:
225+
Cluster Name: member1
226+
Conditions:
227+
Last Transition Time: 2025-03-12T23:21:39Z
228+
Message:
229+
Observed Generation: 1
230+
Reason: ClusterUpdatingStarted
231+
Status: True
232+
Type: Started
233+
Last Transition Time: 2025-03-12T23:21:54Z
234+
Message:
235+
Observed Generation: 1
236+
Reason: ClusterUpdatingSucceeded
237+
Status: True
238+
Type: Succeeded
239+
Conditions:
240+
Last Transition Time: 2025-03-12T23:21:54Z
241+
Message:
242+
Observed Generation: 1
243+
Reason: StageUpdatingWaiting
244+
Status: False
245+
Type: Progressing
246+
Last Transition Time: 2025-03-12T23:22:55Z
247+
Message:
248+
Observed Generation: 1
249+
Reason: StageUpdatingSucceeded
250+
Status: True
251+
Type: Succeeded
252+
End Time: 2025-03-12T23:22:55Z
253+
Stage Name: staging
254+
Start Time: 2025-03-12T23:21:39Z
255+
After Stage Task Status:
256+
Approval Request Name: example-run-canary
257+
Conditions:
258+
Last Transition Time: 2025-03-12T23:23:10Z
259+
Message:
260+
Observed Generation: 1
261+
Reason: AfterStageTaskApprovalRequestCreated
262+
Status: True
263+
Type: ApprovalRequestCreated
264+
Last Transition Time: 2025-03-12T23:25:15Z
265+
Message:
266+
Observed Generation: 1
267+
Reason: AfterStageTaskApprovalRequestApproved
268+
Status: True
269+
Type: ApprovalRequestApproved
270+
Type: Approval
271+
Clusters:
272+
Cluster Name: member2
273+
Conditions:
274+
Last Transition Time: 2025-03-12T23:22:55Z
275+
Message:
276+
Observed Generation: 1
277+
Reason: ClusterUpdatingStarted
278+
Status: True
279+
Type: Started
280+
Last Transition Time: 2025-03-12T23:23:10Z
281+
Message:
282+
Observed Generation: 1
283+
Reason: ClusterUpdatingSucceeded
284+
Status: True
285+
Type: Succeeded
286+
Conditions:
287+
Last Transition Time: 2025-03-12T23:23:10Z
288+
Message:
289+
Observed Generation: 1
290+
Reason: StageUpdatingWaiting
291+
Status: False
292+
Type: Progressing
293+
Last Transition Time: 2025-03-12T23:25:15Z
294+
Message:
295+
Observed Generation: 1
296+
Reason: StageUpdatingSucceeded
297+
Status: True
298+
Type: Succeeded
299+
End Time: 2025-03-12T23:25:15Z
300+
Stage Name: canary
301+
Start Time: 2025-03-12T23:22:55Z
302+
After Stage Task Status:
303+
Conditions:
304+
Last Transition Time: 2025-03-12T23:26:15Z
305+
Message:
306+
Observed Generation: 1
307+
Reason: AfterStageTaskWaitTimeElapsed
308+
Status: True
309+
Type: WaitTimeElapsed
310+
Type: TimedWait
311+
Approval Request Name: example-run-production
312+
Conditions:
313+
Last Transition Time: 2025-03-12T23:25:15Z
314+
Message:
315+
Observed Generation: 1
316+
Reason: AfterStageTaskApprovalRequestCreated
317+
Status: True
318+
Type: ApprovalRequestCreated
319+
Last Transition Time: 2025-03-12T23:25:25Z
320+
Message:
321+
Observed Generation: 1
322+
Reason: AfterStageTaskApprovalRequestApproved
323+
Status: True
324+
Type: ApprovalRequestApproved
325+
Type: Approval
326+
Clusters:
327+
Conditions:
328+
Last Transition Time: 2025-03-12T23:25:15Z
329+
Message:
330+
Observed Generation: 1
331+
Reason: StageUpdatingWaiting
332+
Status: False
333+
Type: Progressing
334+
Last Transition Time: 2025-03-12T23:26:15Z
335+
Message:
336+
Observed Generation: 1
337+
Reason: StageUpdatingSucceeded
338+
Status: True
339+
Type: Succeeded
340+
End Time: 2025-03-12T23:26:15Z
341+
Stage Name: production
342+
Events: <none>
343+
```
344+
345+
### UpdateRun overall status
346+
347+
At the very top, `Status.Conditions` gives the overall status of the updateRun. An updateRun runs in two phases: initialization and execution.
348+
During initialization, the controller performs a one-time setup where it captures a snapshot of the updateRun strategy, collects scheduled and to-be-deleted `ClusterResourceBindings`,
349+
generates the cluster update sequence, and records all this information in the updateRun status.
350+
The `Initialized` condition with reason `UpdateRunInitializedSuccessfully` indicates that initialization succeeded.
351+
352+
After initialization, the controller starts executing the updateRun. The `Progressing` condition with reason `UpdateRunStarted` indicates that execution has started.
353+
354+
After all clusters are updated and all after-stage tasks are completed, every stage is finished. The `Succeeded` condition is then set to `True` with reason `UpdateRunSucceeded`, indicating that the updateRun has succeeded.
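
The mapping from these three conditions to a coarse phase can be sketched as follows. This is a minimal illustration, not the controller's logic: it assumes the conditions are read from JSON output (where the keys are the lowercase `type` and `status`), and the function name `overall_phase` and the phase labels it returns are hypothetical.

```python
from typing import Dict, List


def overall_phase(conditions: List[Dict[str, str]]) -> str:
    """Map the top-level updateRun conditions to a coarse phase label.

    The labels returned here are illustrative only; they are not
    values produced by the controller.
    """
    # Index conditions by type so each one can be looked up directly.
    by_type = {c["type"]: c for c in conditions}

    def is_true(cond_type: str) -> bool:
        return by_type.get(cond_type, {}).get("status") == "True"

    # Check the most advanced condition first.
    if is_true("Succeeded"):
        return "Succeeded"
    if is_true("Progressing"):
        return "Executing"
    if is_true("Initialized"):
        return "Initialized"
    return "Pending"


# Conditions copied from the sample status above.
example = [
    {"type": "Initialized", "status": "True", "reason": "UpdateRunInitializedSuccessfully"},
    {"type": "Progressing", "status": "True", "reason": "UpdateRunStarted"},
    {"type": "Succeeded", "status": "True", "reason": "UpdateRunSucceeded"},
]
print(overall_phase(example))  # prints "Succeeded"
```

Failure handling is deliberately left out of this sketch, because the sample status shows only a successful run.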
355+
356+
### Fields recorded in the updateRun status during initialization
357+
358+
During initialization, the controller records the following fields in the updateRun status:
359+
- `PolicySnapshotIndexUsed`: the index of the policy snapshot used for the updateRun; it should be the latest one.
360+
- `PolicyObservedClusterCount`: the number of clusters selected by the scheduling policy.
361+
- `StagedUpdateStrategySnapshot`: the snapshot of the updateRun strategy, which ensures any strategy changes will not affect executing updateRuns.
362+
363+
### Stages and clusters status
364+
365+
The `Stages Status` section displays the status of each stage and cluster. As shown in the strategy snapshot, the updateRun has three stages: `staging`, `canary`, and `production`. During initialization, the controller generates the rollout plan, classifies the scheduled clusters
366+
into these three stages, and records the plan in the updateRun status. As the execution progresses, the controller updates the status of each stage and cluster. Take the `staging` stage as an example: `member1` is included in this stage. The `Started` condition with reason `ClusterUpdatingStarted` indicates that the cluster update has started, and the `Succeeded` condition with reason `ClusterUpdatingSucceeded` shows that the cluster was updated successfully.
367+
368+
After all clusters in a stage are updated, the controller executes the specified after-stage tasks. Stage `staging` has two after-stage tasks: `Approval` and `TimedWait`. The `Approval` task requires the admin to manually approve a `ClusterApprovalRequest` generated by the controller. The name of the `ClusterApprovalRequest`, here `example-run-staging`, is also included in the status. The `ApprovalRequestCreated` condition (reason `AfterStageTaskApprovalRequestCreated`) indicates that the approval request has been created, and the `ApprovalRequestApproved` condition (reason `AfterStageTaskApprovalRequestApproved`) indicates that it has been approved. The `TimedWait` task pauses the rollout until the specified wait time has elapsed; in this case, the wait time is 1 minute. The `WaitTimeElapsed` condition (reason `AfterStageTaskWaitTimeElapsed`) indicates that the wait time has elapsed and the rollout can proceed to the next stage.
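
To see which after-stage tasks are still blocking a stage, check each task for its completing condition. The sketch below assumes the task statuses are read as JSON-style dictionaries with lowercase `type`, `status`, and `conditions` keys; the helper names `task_done` and `pending_tasks` are hypothetical, and the condition types come from the sample status above.

```python
from typing import Dict, List

# Condition type that marks each after-stage task type as finished,
# as seen in the sample status above.
DONE_CONDITION = {
    "Approval": "ApprovalRequestApproved",
    "TimedWait": "WaitTimeElapsed",
}


def task_done(task: Dict) -> bool:
    """Return True if the task carries its completing condition with status True."""
    wanted = DONE_CONDITION.get(task["type"])
    return any(
        c["type"] == wanted and c["status"] == "True"
        for c in task.get("conditions", [])
    )


def pending_tasks(tasks: List[Dict]) -> List[str]:
    """List the types of after-stage tasks that have not completed yet."""
    return [t["type"] for t in tasks if not task_done(t)]


# Hypothetical mid-rollout snapshot of the `staging` stage: the wait time
# has elapsed, but the approval request is created and not yet approved.
staging_tasks = [
    {
        "type": "Approval",
        "conditions": [{"type": "ApprovalRequestCreated", "status": "True"}],
    },
    {
        "type": "TimedWait",
        "conditions": [{"type": "WaitTimeElapsed", "status": "True"}],
    },
]
print(pending_tasks(staging_tasks))  # prints ['Approval']
```

An empty result means every after-stage task has passed, and the stage can be marked as succeeded.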
369+
370+
Each stage also has its own conditions. When a stage starts, the `Progressing` condition is set to `True`. When all the cluster updates complete, the `Progressing` condition is set to `False` with reason `StageUpdatingWaiting` as shown above. It means the stage is waiting for
371+
after-stage tasks to pass.
372+
As a result, the `lastTransitionTime` of the `Progressing` condition also serves as the start time of the wait when the stage has a `TimedWait` task. When all after-stage tasks pass, the `Succeeded` condition is set to `True`. Each stage status also has `Start Time` and `End Time` fields, which make the timeline easier to read.
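
The timing can be checked by hand against the sample status. The following sketch (the function name `wait_deadline` is hypothetical) adds the `staging` wait time of 1 minute to the time at which the stage's `Progressing` condition switched to `False`:

```python
from datetime import datetime, timedelta, timezone


def wait_deadline(waiting_since: str, wait: timedelta) -> datetime:
    """Return the earliest time at which a TimedWait task can elapse."""
    # Timestamps in the status are UTC, e.g. 2025-03-12T23:21:54Z.
    start = datetime.strptime(waiting_since, "%Y-%m-%dT%H:%M:%SZ").replace(
        tzinfo=timezone.utc
    )
    return start + wait


# `staging`: Progressing became False (StageUpdatingWaiting) at 23:21:54Z.
deadline = wait_deadline("2025-03-12T23:21:54Z", timedelta(minutes=1))
print(deadline.isoformat())  # prints 2025-03-12T23:22:54+00:00
```

The result matches the `lastTransitionTime` of the `WaitTimeElapsed` condition in the `staging` stage (`2025-03-12T23:22:54Z`). The same arithmetic holds for `production`, where the wait starts at `23:25:15Z` and elapses at `23:26:15Z`.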
373+
374+
There's also a `Deletion Stage Status` section, which displays the status of the deletion stage. The deletion stage is the last stage of the updateRun. It deletes resources from the unscheduled clusters. Its status has the same structure as that of a normal update stage, except that there are no after-stage tasks.
375+
376+
Note that all these conditions have `lastTransitionTime` set to the time when the controller updates the status. These timestamps help you debug and check
377+
the progress of the updateRun.
378+
379+
## Relationship between ClusterStagedUpdateRun and ClusterResourcePlacement
380+
381+
A `ClusterStagedUpdateRun` serves as the trigger mechanism for rolling out a `ClusterResourcePlacement`. The key points of this relationship are:
382+
* The `ClusterResourcePlacement` remains in a scheduled state without being deployed until a corresponding `ClusterStagedUpdateRun` is created.
383+
* During rollout, the `ClusterResourcePlacement` status is continuously updated with detailed information from each target cluster.
384+
* While a `ClusterStagedUpdateRun` only indicates whether updates have started and completed for each member cluster (as described in the [previous section](#understand-clusterstagedupdaterun-status)), the `ClusterResourcePlacement` provides comprehensive details including:
385+
* Success/failure of resource creation
386+
* Application of overrides
387+
* Specific error messages
388+
389+
When troubleshooting a stalled updateRun, examining the `ClusterResourcePlacement` status offers valuable diagnostic information that can help identify the root cause.
390+
For comprehensive troubleshooting steps, refer to the [troubleshooting guide](../../troubleshooting/updaterun.md).
391+
392+
## Concurrent updateRuns
393+
394+
Multiple concurrent `ClusterStagedUpdateRun`s can be created for the same `ClusterResourcePlacement`, allowing fleet administrators to pipeline the rollout of different resource versions. However, to maintain consistency across the fleet and prevent member clusters from running different resource versions simultaneously, we enforce a key constraint: all concurrent `ClusterStagedUpdateRun`s must use identical `ClusterStagedUpdateStrategy` settings.
395+
396+
This strategy consistency requirement is validated during the initialization phase of each updateRun. This validation ensures predictable rollout behavior and prevents configuration drift across your cluster fleet, even when multiple updates are in progress.
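
Conceptually, the constraint amounts to an equality check across the strategy snapshots of all concurrent updateRuns. The sketch below only illustrates that idea under the assumption that each snapshot is available as a plain dictionary; it is not the controller's actual validation code, and `strategies_consistent` is a hypothetical name.

```python
from typing import Dict, List


def strategies_consistent(snapshots: List[Dict]) -> bool:
    """Return True when every strategy snapshot equals the first one.

    An empty list or a single snapshot is trivially consistent.
    """
    return all(s == snapshots[0] for s in snapshots[1:])


# Two hypothetical concurrent updateRuns sharing the same strategy.
strategy = {
    "stages": [
        {"name": "staging", "labelSelector": {"matchLabels": {"environment": "staging"}}},
        {"name": "canary", "labelSelector": {"matchLabels": {"environment": "canary"}}},
    ]
}
same = [strategy, dict(strategy)]

# A third run whose strategy drops the canary stage breaks the constraint.
different = same + [{"stages": strategy["stages"][:1]}]

print(strategies_consistent(same))       # prints True
print(strategies_consistent(different))  # prints False
```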
397+
123398
## Next Steps
124-
* Learn how to [rollout CRP resources with Staged Update Run](../../howtos/updaterun.md)
125-
* Learn how to [troubleshoot a Staged Update Run](../../troubleshooting/updaterun.md)
399+
* Learn how to [roll out and roll back CRP resources with Staged Update Run](../../howtos/updaterun.md)
400+
* Learn how to [troubleshoot a Staged Update Run](../../troubleshooting/updaterun.md)
