# How-to Guide: Taking Over Pre-existing Resources in Fleet

This guide explains how to set up Fleet's takeover experience, which lets
developers and admins choose what happens when Fleet encounters a pre-existing resource.
This occurs most often in the Fleet adoption scenario, where a cluster has just joined a fleet and
Fleet finds that the resources to be placed onto the new member cluster via the CRP API are
already running there.

A common concern in this scenario is that the running (pre-existing) set of
resources might have configuration differences from their equivalents on the hub cluster.
For example, on the hub cluster one might have a namespace `work` that hosts a deployment
`web-server` running the image `rpd-stars:latest`, while on the member cluster the same
namespace hosts a deployment of the same name that runs the image `umbrella-biolab:latest`.
If Fleet applies the resource template from the hub cluster, unexpected service interruptions
might occur.

To address this concern, Fleet introduces a new field, `whenToTakeOver`, in the apply
strategy. Three options are available:

* `Always`: this is the default option 😑. With this setting, Fleet takes over a
pre-existing resource as soon as it encounters it. Fleet applies the corresponding
resource template from the hub cluster, and any value differences in the managed fields
are overwritten. This is consistent with Fleet's behavior before the new takeover experience
was added.
* `IfNoDiff`: this is a new option ✨ provided by the takeover mechanism. With this setting,
Fleet checks for configuration differences when it finds a pre-existing resource and
takes over the resource (applies the resource template) only if no configuration
differences are found. Consider using this option for a safer adoption journey.
* `Never`: this is another new option ✨ provided by the takeover mechanism. With this setting,
Fleet ignores pre-existing resources and performs no apply operation; this is reported as
an apply error. Use this option if you would like to check for the presence of pre-existing
resources without taking any action.
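
The choice each option makes can be summarized as a small decision function. The Bash sketch below is a hypothetical model written for this guide, not Fleet's implementation: `takeover_outcome` is a made-up name, and it assumes the object already exists on the member cluster and is not yet owned by Fleet. The `NotTakenOver` and `FailedToTakeOver` strings mirror the condition reasons in the sample status output later in this guide, while `TakenOver` is simply a label chosen here for the success case.

```sh
#!/usr/bin/env bash
# Hypothetical model of the three whenToTakeOver options; NOT Fleet's code.
# Assumes the object already exists on the member cluster and is not owned
# by Fleet yet.
#
# Usage: takeover_outcome <Always|IfNoDiff|Never> <yes|no>
#   The second argument says whether configuration differences were found.
takeover_outcome() {
  local when_to_take_over=${1:-} has_diff=${2:-}
  case "$when_to_take_over" in
    Always)
      # Apply the hub template right away; managed fields get overwritten.
      echo "TakenOver"
      ;;
    IfNoDiff)
      if [ "$has_diff" = "no" ]; then
        echo "TakenOver"
      else
        # Takeover is blocked and reported as an apply error with diff details.
        echo "FailedToTakeOver"
      fi
      ;;
    Never)
      # The object is left alone and reported as an apply error.
      echo "NotTakenOver"
      ;;
    *)
      echo "unknown whenToTakeOver value: $when_to_take_over" >&2
      return 1
      ;;
  esac
}
```

For example, `takeover_outcome IfNoDiff yes` prints `FailedToTakeOver`, which matches what the walkthrough below runs into once the namespace labels differ.
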

> Before you begin
>
> The new takeover experience is currently in preview.
>
> Note that the APIs for the new experience are only available in the Fleet v1beta1 API, not the v1 API. If you do not see the new APIs in command outputs, verify that you are explicitly requesting the v1beta1 API objects, as opposed to the v1 API objects (the default).
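
To check which versions of the Fleet placement API your hub cluster serves, you can list its API versions. This is a sketch that assumes the `hub-admin` kubeconfig context used later in this guide:

```sh
# Point kubectl at the hub cluster.
kubectl config use-context hub-admin
# List the served API versions and keep only the Fleet placement group.
kubectl api-versions | grep placement.kubernetes-fleet.io
```

If the output does not include `placement.kubernetes-fleet.io/v1beta1`, the APIs used in this guide are not available on that cluster.
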

## How Fleet can be used to safely take over pre-existing resources

The steps below explain how the takeover experience works. The commands assume that you have
a fleet of two member clusters, `member-1` and `member-2`, plus a hub cluster, and that your
kubeconfig has the contexts `hub-admin` and `member-2-admin`:

* Switch to the second member cluster, and create a namespace, `work-2`, with labels:

    ```sh
    kubectl config use-context member-2-admin
    kubectl create ns work-2
    kubectl label ns work-2 app=work-2
    kubectl label ns work-2 owner=wesker
    ```

* Switch to the hub cluster, and create the same namespace, but with a slightly different set of labels:

    ```sh
    kubectl config use-context hub-admin
    kubectl create ns work-2
    kubectl label ns work-2 app=work-2
    kubectl label ns work-2 owner=redfield
    ```

* Create a CRP object that places the namespace on all member clusters:

    ```sh
    cat <<EOF | kubectl apply -f -
    # The YAML configuration of the CRP object.
    apiVersion: placement.kubernetes-fleet.io/v1beta1
    kind: ClusterResourcePlacement
    metadata:
      name: work-2
    spec:
      resourceSelectors:
        - group: ""
          kind: Namespace
          version: v1
          # Select all namespaces with the label app=work-2.
          labelSelector:
            matchLabels:
              app: work-2
      policy:
        placementType: PickAll
      strategy:
        # For simplicity, the CRP is configured to roll out changes to
        # all member clusters at once. This setup is not recommended for
        # production use.
        type: RollingUpdate
        rollingUpdate:
          maxUnavailable: 100%
          unavailablePeriodSeconds: 1
        applyStrategy:
          whenToTakeOver: Never
    EOF
    ```

* Give Fleet a few seconds to handle the placement. Then check the status of the CRP object; you should see that the placement has failed with an apply error:

    ```sh
    kubectl get clusterresourceplacement.v1beta1.placement.kubernetes-fleet.io work-2 -o jsonpath='{.status.placementStatuses[?(@.clusterName=="member-2")].conditions[?(@.type=="Applied")]}' | jq
    # The command above uses a JSONPath expression to query the Applied condition
    # for member-2 directly, and uses the jq utility to pretty-print the output JSON.
    #
    # jq might not be available in your environment. You may have to install it
    # separately, or omit it from the command.
    #
    # If the output is empty, the status might not have been populated yet.
    # You can switch the output type from jsonpath to yaml (-o yaml) to see
    # the full object.
    ```

    The output should look like this:

    ```json
    {
      "lastTransitionTime": "...",
      "message": "...",
      "observedGeneration": ...,
      "reason": "NotAllWorkHaveBeenApplied",
      "status": "False",
      "type": "Applied"
    }
    ```

* Take a look at the `failedPlacements` field in the placement status for error details:

    ```sh
    kubectl get clusterresourceplacement.v1beta1.placement.kubernetes-fleet.io work-2 -o jsonpath='{.status.placementStatuses[?(@.clusterName=="member-2")].failedPlacements}' | jq
    # The command above uses a JSONPath expression to query the failed placement
    # details for member-2 directly, and uses the jq utility to pretty-print the
    # output JSON.
    #
    # jq might not be available in your environment. You may have to install it
    # separately, or omit it from the command.
    #
    # If the output is empty, the status might not have been populated yet.
    # You can switch the output type from jsonpath to yaml (-o yaml) to see
    # the full object.
    ```

    The output should look like this:

    ```json
    [
      {
        "condition": {
          "lastTransitionTime": "...",
          "message": "Failed to applied the manifest (error: no ownership of the object in the member cluster; takeover is needed)",
          "reason": "NotTakenOver",
          "status": "False",
          "type": "Applied"
        },
        "kind": "Namespace",
        "name": "work-2",
        "version": "v1"
      }
    ]
    ```

    Fleet finds that the namespace `work-2` already exists on the member cluster and
    is not owned by Fleet. Since the takeover policy is set to `Never`, Fleet does not assume
    ownership of the namespace; no apply is performed, and an apply error is raised
    instead.

* Next, update the CRP object and set the `whenToTakeOver` field to `IfNoDiff`:

    ```sh
    cat <<EOF | kubectl apply -f -
    # The YAML configuration of the CRP object.
    apiVersion: placement.kubernetes-fleet.io/v1beta1
    kind: ClusterResourcePlacement
    metadata:
      name: work-2
    spec:
      resourceSelectors:
        - group: ""
          kind: Namespace
          version: v1
          # Select all namespaces with the label app=work-2.
          labelSelector:
            matchLabels:
              app: work-2
      policy:
        placementType: PickAll
      strategy:
        # For simplicity, the CRP is configured to roll out changes to
        # all member clusters at once. This setup is not recommended for
        # production use.
        type: RollingUpdate
        rollingUpdate:
          maxUnavailable: 100%
          unavailablePeriodSeconds: 1
        applyStrategy:
          whenToTakeOver: IfNoDiff
    EOF
    ```

* Give Fleet a few seconds to handle the placement. Check the status of the CRP object; you should see that the apply operation still fails:

    ```sh
    kubectl get clusterresourceplacement.v1beta1.placement.kubernetes-fleet.io work-2
    ```

* Check the error details reported in the `failedPlacements` field again:

    ```sh
    kubectl get clusterresourceplacement.v1beta1.placement.kubernetes-fleet.io work-2 -o jsonpath='{.status.placementStatuses[?(@.clusterName=="member-2")].failedPlacements}' | jq
    # The command above uses a JSONPath expression to query the failed placement
    # details for member-2 directly, and uses the jq utility to pretty-print the
    # output JSON.
    #
    # jq might not be available in your environment. You may have to install it
    # separately, or omit it from the command.
    #
    # If the output is empty, the status might not have been populated yet.
    # You can switch the output type from jsonpath to yaml (-o yaml) to see
    # the full object.
    ```

    The output has changed:

    ```json
    [
      {
        "condition": {
          "lastTransitionTime": "...",
          "message": "Failed to applied the manifest (error: cannot take over object: configuration differences are found between the manifest object and the corresponding object in the member cluster)",
          "reason": "FailedToTakeOver",
          "status": "False",
          "type": "Applied"
        },
        "kind": "Namespace",
        "name": "work-2",
        "version": "v1"
      }
    ]
    ```

    Now, with the takeover policy set to `IfNoDiff`, Fleet can assume ownership of pre-existing
    resources; however, because a configuration difference has been found between the hub cluster
    and the member cluster, takeover is blocked.

* As with drift detection, Fleet also reports details about the configuration differences it has found:

    ```sh
    kubectl get clusterresourceplacement.v1beta1.placement.kubernetes-fleet.io work-2 -o jsonpath='{.status.placementStatuses[?(@.clusterName=="member-2")].diffedPlacements}' | jq
    # The command above uses a JSONPath expression to query the diff details
    # for member-2 directly, and uses the jq utility to pretty-print the output
    # JSON.
    #
    # jq might not be available in your environment. You may have to install it
    # separately, or omit it from the command.
    #
    # If the output is empty, the status might not have been populated yet.
    # You can switch the output type from jsonpath to yaml (-o yaml) to see
    # the full object.
    ```

    The output should look like this:

    ```json
    [
      {
        "firstDiffedObservedTime": "...",
        "group": "",
        "version": "v1",
        "kind": "Namespace",
        "name": "work-2",
        "observationTime": "...",
        "observedDiffs": [
          {
            "path": "/metadata/labels/owner",
            "valueInHub": "redfield",
            "valueInMember": "wesker"
          }
        ],
        "targetClusterObservedGeneration": 0
      }
    ]
    ```

    Fleet reports the following information about a configuration difference:

    * `group`, `kind`, `version` and `name`: the resource that has configuration differences.
    * `observationTime`: the timestamp when the current diff details were collected.
    * `firstDiffedObservedTime`: the timestamp when the current diff was first observed.
    * `observedDiffs`: the diff details, specifically:
        * `path`: a JSON Pointer (RFC 6901) that points to the diff'd field;
        * `valueInHub`: the value at that path as seen from the hub cluster resource template
          (the desired state). If this value is absent, the field does not exist in the resource template.
        * `valueInMember`: the value at that path as seen from the member cluster resource
          (the current state). If this value is absent, the field does not exist in the current state.
    * `targetClusterObservedGeneration`: the generation of the member cluster resource.

* To resolve the configuration difference, consider one of the following options:

    * Set the `whenToTakeOver` setting back to `Always`, which instructs Fleet to take over the resource right away and overwrite the configuration differences in the managed fields;
    * Edit the diff'd field directly on the member cluster side, so that its value is consistent with that on the hub cluster; Fleet periodically re-evaluates diffs and should take over the resource soon after; or
    * Delete the resource from the member cluster. Fleet will then re-apply the resource template and re-create the resource.

    This guide takes the first option, setting the `whenToTakeOver` field to
    `Always`:

    ```sh
    cat <<EOF | kubectl apply -f -
    # The YAML configuration of the CRP object.
    apiVersion: placement.kubernetes-fleet.io/v1beta1
    kind: ClusterResourcePlacement
    metadata:
      name: work-2
    spec:
      resourceSelectors:
        - group: ""
          kind: Namespace
          version: v1
          # Select all namespaces with the label app=work-2.
          labelSelector:
            matchLabels:
              app: work-2
      policy:
        placementType: PickAll
      strategy:
        # For simplicity, the CRP is configured to roll out changes to
        # all member clusters at once. This setup is not recommended for
        # production use.
        type: RollingUpdate
        rollingUpdate:
          maxUnavailable: 100%
          unavailablePeriodSeconds: 1
        applyStrategy:
          whenToTakeOver: Always
    EOF
    ```

* Check the CRP status; within a few seconds, Fleet should report that all objects have been applied.

    ```sh
    kubectl get clusterresourceplacement.v1beta1.placement.kubernetes-fleet.io work-2
    ```

    If you switch to the member cluster `member-2` now, you should see that the object looks
    exactly the same as the resource template kept on the hub cluster; the `owner` label has been
    overwritten.
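
The `path` values reported under `observedDiffs` are JSON Pointers as defined in RFC 6901: each `/`-separated token names one step into the object, and a literal `/` or `~` inside a key is escaped as `~1` or `~0`. This matters for label and annotation keys such as `app.kubernetes.io/name`, which RFC 6901 would encode as `/metadata/labels/app.kubernetes.io~1name`. The Bash function below is a minimal sketch (the name `decode_json_pointer` is hypothetical, not part of Fleet or kubectl) that splits a pointer into its decoded tokens:

```sh
#!/usr/bin/env bash
# Print the reference tokens of a JSON Pointer (RFC 6901), one per line,
# with the "~1" and "~0" escapes decoded. Hypothetical helper for reading
# the "path" values in Fleet's diff details.
decode_json_pointer() {
  local pointer=$1 rest token
  local slash='/' tilde='~'
  # The empty pointer refers to the whole document and has no tokens.
  if [ -z "$pointer" ]; then
    return 0
  fi
  # Any non-empty pointer must start with "/".
  if [ "${pointer:0:1}" != "/" ]; then
    echo "invalid JSON Pointer: $pointer" >&2
    return 1
  fi
  rest=${pointer#/}
  while true; do
    # Take everything up to the next "/" as the current token.
    token=${rest%%/*}
    # RFC 6901 order: decode "~1" to "/" first, then "~0" to "~".
    token=${token//"~1"/$slash}
    token=${token//"~0"/$tilde}
    printf '%s\n' "$token"
    # Stop once no "/" separator is left.
    if [ "$rest" = "${rest#*/}" ]; then
      break
    fi
    rest=${rest#*/}
  done
}
```

For example, `decode_json_pointer /metadata/labels/owner` prints `metadata`, `labels` and `owner` on separate lines. The decoding order follows RFC 6901: `~1` is replaced before `~0`, so `~01` decodes to `~1` rather than `/`.
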

> Important
>
> When Fleet fails to take over an object, the pre-existing resource will not be put under Fleet's management: any change one makes on the hub cluster side will have no effect on the pre-existing resource. If you choose to delete the resource template, or remove the CRP object, Fleet will not attempt to delete the pre-existing resource.

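
To bring such a resource under Fleet's management without switching to `Always`, you can use the second remediation option from the walkthrough: reconcile the diff'd field on the member cluster and let Fleet re-evaluate. A sketch for the `work-2` example, assuming `whenToTakeOver` is still `IfNoDiff` (that is, you skipped the switch to `Always`) and the `member-2-admin` context used in this guide:

```sh
# Work against the member cluster that holds the pre-existing namespace.
kubectl config use-context member-2-admin
# Overwrite the diverging label so that it matches the hub cluster template.
kubectl label ns work-2 owner=redfield --overwrite
# Confirm the new value of the label.
kubectl get ns work-2 -o jsonpath='{.metadata.labels.owner}'
```

Once Fleet re-evaluates the diff, the CRP status on the hub cluster should report the namespace as applied.
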
## Takeover and comparison options

Fleet provides a `comparisonOption` setting that allows you to fine-tune how Fleet calculates
configuration differences between a resource template created on the hub cluster and the
corresponding pre-existing resource on a member cluster.
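
The setting is expected to sit in the apply strategy alongside `whenToTakeOver`. As a sketch, the merge patch below switches the `work-2` CRP from the walkthrough to `IfNoDiff` with full comparison. It assumes the field is spelled `comparisonOption`, as in the table below, and that it accepts the value `FullComparison`; verify both against the v1beta1 API reference for your Fleet version. A JSON merge patch (`--type merge`) is used because custom resources such as the CRP do not support strategic merge patches.

```sh
# Run against the hub cluster.
kubectl config use-context hub-admin
# Merge-patch the apply strategy of the existing CRP object.
kubectl patch clusterresourceplacement.v1beta1.placement.kubernetes-fleet.io work-2 \
  --type merge \
  -p '{"spec":{"strategy":{"applyStrategy":{"whenToTakeOver":"IfNoDiff","comparisonOption":"FullComparison"}}}}'
```
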

> Note
>
> The `comparisonOption` setting also controls how Fleet detects drifts. See the how-to guide on drift detection for more information.

If `PartialComparison` is used, Fleet only reports configuration differences in the managed
fields, i.e., fields that are explicitly specified in the resource template; the presence of additional
fields on the member cluster side will not stop Fleet from taking over the pre-existing resource.
In contrast, with `FullComparison`, Fleet only takes over a pre-existing resource if it looks
exactly the same as its hub cluster counterpart.
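
To make the difference between the two modes concrete, the Bash sketch below compares two flat label maps the way the paragraph above describes. It is a hypothetical illustration, not Fleet's diff algorithm: `diff_labels` and the extra `team` label used below are made up for this example, it assumes label keys contain no `/` or `~` (which RFC 6901 would require to be escaped in the reported path), and it needs Bash 4.3 or later for associative arrays and namerefs.

```sh
#!/usr/bin/env bash
# Hypothetical sketch (not Fleet code) of PartialComparison vs FullComparison,
# applied to two flat label maps passed by name (Bash 4.3+ namerefs).
#
# Usage: diff_labels <PartialComparison|FullComparison> <hub_map> <member_map>
diff_labels() {
  local mode=$1
  local -n _hub=$2 _member=$3
  local key
  # Managed fields (set in the hub template) are compared in both modes.
  for key in "${!_hub[@]}"; do
    if [ "${_member[$key]-<absent>}" != "${_hub[$key]}" ]; then
      printf '/metadata/labels/%s hub=%s member=%s\n' \
        "$key" "${_hub[$key]}" "${_member[$key]-<absent>}"
    fi
  done
  # Fields that exist only on the member cluster are unmanaged; only
  # FullComparison reports them.
  if [ "$mode" = "FullComparison" ]; then
    for key in "${!_member[@]}"; do
      if [ -z "${_hub[$key]+set}" ]; then
        printf '/metadata/labels/%s hub=<absent> member=%s\n' \
          "$key" "${_member[$key]}"
      fi
    done
  fi
}
```

With hub template labels `app=work-2, owner=redfield` and member labels `app=work-2, owner=wesker, team=umbrella`, `PartialComparison` reports only the `owner` path, whereas `FullComparison` also reports `/metadata/labels/team`. Under `IfNoDiff`, the extra unmanaged label would therefore block takeover only with full comparison.
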

The table below summarizes how the two settings combine and the resulting outcomes:

| `whenToTakeOver` setting | `comparisonOption` setting | Configuration difference scenario | Outcome |
| -------- | ------- | -------- | ------- |
| `IfNoDiff` | `PartialComparison` | There exists a value difference in a managed field between a pre-existing resource on a member cluster and the hub cluster resource template. | Fleet will report an apply error in the status, plus the diff details. |
| `IfNoDiff` | `PartialComparison` | The pre-existing resource has a field that is absent from the hub cluster resource template. | Fleet will take over the resource; the configuration difference in the unmanaged field will be left untouched. |
| `IfNoDiff` | `FullComparison` | **A difference has been found in a field, managed or not.** | Fleet will report an apply error in the status, plus the diff details. |
| `Always` | any option | A difference has been found in a field, managed or not. | Fleet will take over the resource; differences in managed fields will be overwritten, while those in unmanaged fields will be left untouched. |
| `Never` | any option | A pre-existing resource is found, with or without differences. | Fleet will not take over the resource and will report an apply error in the status. |