File: docs/concepts/StagedUpdateRun/README.md (277 additions, 2 deletions)
@@ -120,6 +120,281 @@ An updateRun executes in two phases. During the initialization phase, the contro
120
120
121
121
In the execution phase, the controller processes each stage sequentially, updates clusters within each stage one at a time, and enforces completion of after-stage tasks. It then executes a final delete stage to clean up resources from unscheduled clusters. The updateRun succeeds when all stages complete successfully. However, it fails if any execution-affecting event occurs, for example, if the target ClusterResourcePlacement is deleted or member cluster changes trigger new scheduling. In such cases, error details are recorded in the updateRun status. Remember that once initialized, an updateRun operates on its strategy snapshot, making it immune to subsequent strategy modifications.
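
The execution order described above can be sketched as a small Python simulation. This is only an illustration under stated assumptions, not the controller's code: `execute_update_run`, the `stages` layout, and both callbacks are hypothetical names introduced here.

```python
from typing import Callable, Dict, List


def execute_update_run(
    stages: List[Dict],
    clusters_to_delete: List[str],
    update_cluster: Callable[[str], None],
    after_stage_tasks_pass: Callable[[str], bool],
) -> List[str]:
    """Return an ordered log of what a (hypothetical) updateRun would do."""
    log: List[str] = []
    for stage in stages:  # stages are processed strictly in order
        for cluster in stage["clusters"]:  # clusters are updated one at a time
            update_cluster(cluster)
            log.append(f"updated {cluster} in {stage['name']}")
        if not after_stage_tasks_pass(stage["name"]):
            # The next stage may not start until the after-stage tasks pass.
            log.append(f"waiting on after-stage tasks for {stage['name']}")
            return log
    for cluster in clusters_to_delete:  # final delete stage, no after-stage tasks
        log.append(f"deleted resources from {cluster}")
    return log
```

Returning early when the after-stage tasks have not passed keeps the sketch simple; a real controller would instead re-check on a later reconcile.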
122
122
123
+
## Understand ClusterStagedUpdateRun status
124
+
125
+
Let's take a deep look into the status of a completed `ClusterStagedUpdateRun`. It displays details about the rollout status for every cluster and stage.
At the very top, `Status.Conditions` gives the overall status of the updateRun. The execution of an updateRun consists of two phases: initialization and execution.
348
+
During initialization, the controller performs a one-time setup where it captures a snapshot of the updateRun strategy, collects scheduled and to-be-deleted `ClusterResourceBindings`,
349
+
generates the cluster update sequence, and records all this information in the updateRun status.
350
+
The `UpdateRunInitializedSuccessfully` condition indicates the initialization is successful.
351
+
352
+
After initialization, the controller starts executing the updateRun. The `UpdateRunStarted` condition indicates the execution has started.
353
+
354
+
After all clusters are updated, all after-stage tasks are completed, and thus all stages are finished, the `UpdateRunSucceeded` condition is set to `True`, indicating the updateRun has succeeded.
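
As a rough sketch of how these three conditions can be read together, the Python function below maps them to a coarse phase. It assumes the conditions have been flattened into a hypothetical dictionary of condition name to boolean; the real status holds richer condition objects, and `summarize_update_run` is a name invented for this example.

```python
from typing import Dict


def summarize_update_run(conditions: Dict[str, bool]) -> str:
    """Map the overall conditions (flattened to name -> bool) to a coarse phase."""
    # Check the latest milestone first, because earlier conditions stay set.
    if conditions.get("UpdateRunSucceeded"):
        return "succeeded"
    if conditions.get("UpdateRunStarted"):
        return "executing"
    if conditions.get("UpdateRunInitializedSuccessfully"):
        return "initialized"
    return "not initialized"
```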
355
+
356
+
### Fields recorded in the updateRun status during initialization
357
+
358
+
During initialization, the controller records the following fields in the updateRun status:
359
+
- `PolicySnapshotIndexUsed`: the index of the policy snapshot used for the updateRun; it should be the latest one.
360
+
- `PolicyObservedClusterCount`: the number of clusters selected by the scheduling policy.
361
+
- `StagedUpdateStrategySnapshot`: the snapshot of the updateRun strategy, which ensures any strategy changes will not affect executing updateRuns.
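
Why a snapshot protects a running updateRun can be illustrated with a deep copy. The sketch below is a simplified Python model, not the actual API types: `UpdateRunSketch` and the shape of the `strategy` dictionary are assumptions made for the example.

```python
import copy


class UpdateRunSketch:
    """Hypothetical model of the fields recorded at initialization."""

    def __init__(self, strategy: dict, policy_snapshot_index: int, cluster_count: int):
        self.status = {
            "PolicySnapshotIndexUsed": policy_snapshot_index,
            "PolicyObservedClusterCount": cluster_count,
            # Deep copy, so later edits to the live strategy do not leak in.
            "StagedUpdateStrategySnapshot": copy.deepcopy(strategy),
        }


strategy = {"stages": [{"name": "staging"}, {"name": "canary"}, {"name": "production"}]}
run = UpdateRunSketch(strategy, policy_snapshot_index=1, cluster_count=3)

# Modifying the strategy after initialization leaves the snapshot untouched.
strategy["stages"].pop()
```

After the `pop()` call, `strategy` has two stages while the snapshot in `run.status` still lists all three.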
362
+
363
+
### Stages and clusters status
364
+
365
+
The `Stages Status` section displays the status of each stage and cluster. As shown in the strategy snapshot, the updateRun has three stages: `staging`, `canary`, and `production`. During initialization, the controller generates the rollout plan, classifies the scheduled clusters
366
+
into these three stages, and records the plan in the updateRun status. As the execution progresses, the controller updates the status of each stage and cluster. Take the `staging` stage as an example: `member1` is included in this stage. The `ClusterUpdatingStarted` condition indicates the cluster is being updated, and the `ClusterUpdatingSucceeded` condition shows the cluster is updated successfully.
367
+
368
+
After all clusters are updated in a stage, the controller executes the specified after-stage tasks. Stage `staging` has two after-stage tasks: `Approval` and `TimedWait`. The `Approval` task requires the admin to manually approve a `ClusterApprovalRequest` generated by the controller. The name of the `ClusterApprovalRequest`, `example-run-staging`, is also included in the status. The `AfterStageTaskApprovalRequestCreated` condition indicates the approval request is created, and the `AfterStageTaskApprovalRequestApproved` condition indicates the approval request has been approved. The `TimedWait` task suspends the rollout until the specified wait time has elapsed; in this case, the wait time is 1 minute. The `AfterStageTaskWaitTimeElapsed` condition indicates the wait time has elapsed and the rollout can proceed to the next stage.
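
The gating effect of the two tasks can be sketched in Python as follows. This is a minimal illustration assuming a stage that has both an `Approval` and a `TimedWait` task; `after_stage_tasks_passed` and its parameters are hypothetical names, not part of the controller.

```python
from datetime import datetime, timedelta


def after_stage_tasks_passed(
    approved: bool,
    wait_started: datetime,
    wait_time: timedelta,
    now: datetime,
) -> bool:
    """Both tasks must pass: the request is approved and the wait has elapsed."""
    wait_elapsed = now - wait_started >= wait_time
    return approved and wait_elapsed
```

With a 1-minute wait, the function returns `False` until both the approval is given and at least 60 seconds have passed since the wait started.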
369
+
370
+
Each stage also has its own conditions. When a stage starts, the `Progressing` condition is set to `True`. When all the cluster updates complete, the `Progressing` condition is set to `False` with reason `StageUpdatingWaiting` as shown above. It means the stage is waiting for
371
+
after-stage tasks to pass.
372
+
Thus, the `lastTransitionTime` of the `Progressing` condition also serves as the start time of the wait when there is a `TimedWait` task. When all after-stage tasks pass, the `Succeeded` condition is set to `True`. Each stage status also has `Start Time` and `End Time` fields, which make the timeline easier to read.
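
The stage-level conditions described here can be read with a small helper like the Python sketch below. It assumes the `Progressing` and `Succeeded` conditions have been reduced to plain values; `stage_phase` and its parameters are hypothetical and only mirror the behavior stated in the text.

```python
from typing import Optional


def stage_phase(
    progressing: Optional[bool],
    progressing_reason: str = "",
    succeeded: bool = False,
) -> str:
    """Interpret a stage's conditions; `progressing=None` means the condition is absent."""
    if succeeded:
        return "succeeded"
    if progressing is True:
        return "updating clusters"
    if progressing is False and progressing_reason == "StageUpdatingWaiting":
        return "waiting for after-stage tasks"
    return "not started"
```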
373
+
374
+
There's also a `Deletion Stage Status` section, which displays the status of the deletion stage. The deletion stage is the last stage of the updateRun. It deletes resources from the unscheduled clusters. The status is largely the same as that of a normal update stage, except that there are no after-stage tasks.
375
+
376
+
Note that all these conditions have `lastTransitionTime` set to the time when the controller updates the status. This timestamp can help you debug and check
377
+
the progress of the updateRun.
378
+
379
+
## Relationship between ClusterStagedUpdateRun and ClusterResourcePlacement
380
+
381
+
A `ClusterStagedUpdateRun` serves as the trigger mechanism for rolling out a `ClusterResourcePlacement`. The key points of this relationship are:
382
+
* The `ClusterResourcePlacement` remains in a scheduled state without being deployed until a corresponding `ClusterStagedUpdateRun` is created.
383
+
* During rollout, the `ClusterResourcePlacement` status is continuously updated with detailed information from each target cluster.
384
+
* While a `ClusterStagedUpdateRun` only indicates whether updates have started and completed for each member cluster (as described in the [previous section](#understand-clusterstagedupdaterun-status)), the `ClusterResourcePlacement` provides comprehensive details including:
385
+
* Success/failure of resource creation
386
+
* Application of overrides
387
+
* Specific error messages
388
+
389
+
When troubleshooting a stalled updateRun, examining the `ClusterResourcePlacement` status offers valuable diagnostic information that can help identify the root cause.
390
+
For comprehensive troubleshooting steps, refer to the [troubleshooting guide](../../troubleshooting/updaterun.md).
391
+
392
+
## Concurrent updateRuns
393
+
394
+
Multiple concurrent `ClusterStagedUpdateRun`s can be created for the same `ClusterResourcePlacement`, allowing fleet administrators to pipeline the rollout of different resource versions. However, to maintain consistency across the fleet and prevent member clusters from running different resource versions simultaneously, we enforce a key constraint: all concurrent `ClusterStagedUpdateRun`s must use identical `ClusterStagedUpdateStrategy` settings.
395
+
396
+
This strategy consistency requirement is validated during the initialization phase of each updateRun. This validation ensures predictable rollout behavior and prevents configuration drift across your cluster fleet, even when multiple updates are in progress.
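
The consistency check can be pictured with the Python sketch below. How the controller actually compares strategies is not described here, so plain equality on dictionaries is an assumption, and `validate_strategy_consistency` is a hypothetical name.

```python
from typing import List


def validate_strategy_consistency(new_strategy: dict, concurrent_strategies: List[dict]) -> None:
    """Raise ValueError if the new run's strategy differs from any concurrent run's."""
    for index, existing in enumerate(concurrent_strategies):
        if existing != new_strategy:
            raise ValueError(
                f"strategy differs from concurrent updateRun #{index}; "
                "all concurrent updateRuns must use identical strategy settings"
            )
```

Running the check once, at initialization, matches the text above: a run that passes it keeps executing against its own snapshot afterwards.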
397
+
123
398
## Next Steps
124
-
* Learn how to [rollout CRP resources with Staged Update Run](../../howtos/updaterun.md)
125
-
* Learn how to [troubleshoot a Staged Update Run](../../troubleshooting/updaterun.md)
399
+
* Learn how to [roll out and roll back CRP resources with Staged Update Run](../../howtos/updaterun.md)
400
+
* Learn how to [troubleshoot a Staged Update Run](../../troubleshooting/updaterun.md)