Commit 01a17b3: docs: add the takeover how-to doc (#967)
1 parent 14b1ffa commit 01a17b3

1 file changed: docs/howtos/takeover.md (368 additions, 0 deletions)
# How-to Guide: Enabling the Takeover Experience in Fleet

This guide explains how to set up Fleet's takeover experience, which allows
developers and admins to choose what happens when Fleet encounters a pre-existing resource.
This occurs most often in the Fleet adoption scenario, where a cluster has just joined a
fleet and the system finds that the resources to place onto the new member cluster via the
CRP API are already running there.

A concern commonly associated with this scenario is that the running (pre-existing) set of
resources might have configuration differences from their equivalents on the hub cluster.
For example, the hub cluster might have a namespace `work` that hosts a deployment
`web-server` running the image `rpd-stars:latest`, while on the member cluster the same
namespace holds a deployment of the same name but with the image `umbrella-biolab:latest`.
If Fleet applies the resource template from the hub cluster, unexpected service interruptions
might occur.

To address this concern, Fleet introduces a new field, `whenToTakeOver`, in the apply
strategy. Three options are available:

* `Always`: this is the default option 😑. With this setting, Fleet will take over a
pre-existing resource as soon as it encounters it. Fleet will apply the corresponding
resource template from the hub cluster, and any value differences in the managed fields
will be overwritten. This is consistent with the behavior before the new takeover
experience was added.
* `IfNoDiff`: this is a new option ✨ provided by the takeover mechanism. With this setting,
Fleet will check for configuration differences when it finds a pre-existing resource and
will only take over the resource (apply the resource template) if no configuration
differences are found. Consider using this option for a safer adoption journey.
* `Never`: this is another new option ✨ provided by the takeover mechanism. With this
setting, Fleet will ignore pre-existing resources: no apply op will be performed, and this
will be reported as an apply error. Use this option if you would like to check for the
presence of pre-existing resources without taking any action on them.
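The decision logic behind the three options can be sketched as a small function. This is a minimal illustrative model, not Fleet's implementation; the outcome strings echo the `reason` values (`NotTakenOver`, `FailedToTakeOver`) that appear in the status examples later in this guide.

```python
# Illustrative sketch (not Fleet's actual implementation): how the three
# whenToTakeOver settings decide the fate of a pre-existing resource that
# Fleet does not yet own.

def handle_preexisting_resource(when_to_take_over: str, has_diff: bool) -> str:
    """Return the outcome for a pre-existing, not-yet-owned resource."""
    if when_to_take_over == "Always":
        # Take over unconditionally; value differences in managed fields
        # get overwritten by the hub resource template.
        return "taken over"
    if when_to_take_over == "IfNoDiff":
        # Take over only when no configuration differences are found.
        return "taken over" if not has_diff else "apply error: FailedToTakeOver"
    # "Never": leave the resource alone and surface an apply error.
    return "apply error: NotTakenOver"
```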

> Before you begin
>
> The new takeover experience is currently in preview.
>
> Note that the APIs for the new experience are only available in the Fleet v1beta1 API, not the v1 API. If you do not see the new APIs in command outputs, verify that you are explicitly requesting the v1beta1 API objects, as opposed to the v1 API objects (the default).

## How Fleet can be used to safely take over pre-existing resources

The steps below explain how the takeover experience functions. The code assumes that you have
a fleet of two clusters, `member-1` and `member-2`:

* Switch to the second member cluster, and create a namespace, `work-2`, with labels:

  ```sh
  kubectl config use-context member-2-admin
  kubectl create ns work-2
  kubectl label ns work-2 app=work-2
  kubectl label ns work-2 owner=wesker
  ```

* Switch to the hub cluster, and create the same namespace, but with a slightly different set of labels:

  ```sh
  kubectl config use-context hub-admin
  kubectl create ns work-2
  kubectl label ns work-2 app=work-2
  kubectl label ns work-2 owner=redfield
  ```

* Create a CRP object that places the namespace to all member clusters:

  ```sh
  cat <<EOF | kubectl apply -f -
  # The YAML configuration of the CRP object.
  apiVersion: placement.kubernetes-fleet.io/v1beta1
  kind: ClusterResourcePlacement
  metadata:
    name: work-2
  spec:
    resourceSelectors:
      - group: ""
        kind: Namespace
        version: v1
        # Select all namespaces with the label app=work-2.
        labelSelector:
          matchLabels:
            app: work-2
    policy:
      placementType: PickAll
    strategy:
      # For simplicity reasons, the CRP is configured to roll out changes to
      # all member clusters at once. This is not a setup recommended for
      # production use.
      type: RollingUpdate
      rollingUpdate:
        maxUnavailable: 100%
        unavailablePeriodSeconds: 1
      applyStrategy:
        whenToTakeOver: Never
  EOF
  ```

* Give Fleet a few seconds to handle the placement. Check the status of the CRP object; you should see a failure that complains about an apply error:

  ```sh
  kubectl get clusterresourceplacement.v1beta1.placement.kubernetes-fleet.io work-2 -o jsonpath='{.status.placementStatuses[?(@.clusterName=="member-2")].conditions[?(@.type=="Applied")]}' | jq
  # The command above uses a JSON path to query the Applied condition directly
  # and uses the jq utility to pretty-print the output JSON.
  #
  # jq might not be available in your environment. You may have to install it
  # separately, or omit it from the command.
  #
  # If the output is empty, the status might not have been populated properly
  # yet. You can switch the output type from jsonpath to yaml to see the full
  # object.
  ```

  The output should look like this:

  ```json
  {
    "lastTransitionTime": "...",
    "message": "...",
    "observedGeneration": ...,
    "reason": "NotAllWorkHaveBeenApplied",
    "status": "False",
    "type": "Applied"
  }
  ```

* Take a look at the `failedPlacements` field in the placement status for error details:

  ```sh
  kubectl get clusterresourceplacement.v1beta1.placement.kubernetes-fleet.io work-2 -o jsonpath='{.status.placementStatuses[?(@.clusterName=="member-2")].failedPlacements}' | jq
  # The command above uses a JSON path to query the error details directly
  # and uses the jq utility to pretty-print the output JSON.
  #
  # jq might not be available in your environment. You may have to install it
  # separately, or omit it from the command.
  #
  # If the output is empty, the status might not have been populated properly
  # yet. You can switch the output type from jsonpath to yaml to see the full
  # object.
  ```

  The output should look like this:

  ```json
  [
    {
      "condition": {
        "lastTransitionTime": "...",
        "message": "Failed to applied the manifest (error: no ownership of the object in the member cluster; takeover is needed)",
        "reason": "NotTakenOver",
        "status": "False",
        "type": "Applied"
      },
      "kind": "Namespace",
      "name": "work-2",
      "version": "v1"
    }
  ]
  ```

  Fleet finds that the namespace `work-2` already exists on the member cluster and is not
  owned by Fleet. Since the takeover policy is set to `Never`, Fleet will not assume
  ownership of the namespace; no apply op is performed, and an apply error is raised
  instead.

* Next, update the CRP object and set the `whenToTakeOver` field to `IfNoDiff`:

  ```sh
  cat <<EOF | kubectl apply -f -
  # The YAML configuration of the CRP object.
  apiVersion: placement.kubernetes-fleet.io/v1beta1
  kind: ClusterResourcePlacement
  metadata:
    name: work-2
  spec:
    resourceSelectors:
      - group: ""
        kind: Namespace
        version: v1
        # Select all namespaces with the label app=work-2.
        labelSelector:
          matchLabels:
            app: work-2
    policy:
      placementType: PickAll
    strategy:
      # For simplicity reasons, the CRP is configured to roll out changes to
      # all member clusters at once. This is not a setup recommended for
      # production use.
      type: RollingUpdate
      rollingUpdate:
        maxUnavailable: 100%
        unavailablePeriodSeconds: 1
      applyStrategy:
        whenToTakeOver: IfNoDiff
  EOF
  ```

* Give Fleet a few seconds to handle the placement. Check the status of the CRP object; you should see that the apply op still fails:

  ```sh
  kubectl get clusterresourceplacement.v1beta1.placement.kubernetes-fleet.io work-2
  ```

* Verify the error details reported in the `failedPlacements` field once more:

  ```sh
  kubectl get clusterresourceplacement.v1beta1.placement.kubernetes-fleet.io work-2 -o jsonpath='{.status.placementStatuses[?(@.clusterName=="member-2")].failedPlacements}' | jq
  # The command above uses a JSON path to query the error details directly
  # and uses the jq utility to pretty-print the output JSON.
  #
  # jq might not be available in your environment. You may have to install it
  # separately, or omit it from the command.
  #
  # If the output is empty, the status might not have been populated properly
  # yet. You can switch the output type from jsonpath to yaml to see the full
  # object.
  ```

  The output has changed:

  ```json
  [
    {
      "condition": {
        "lastTransitionTime": "...",
        "message": "Failed to applied the manifest (error: cannot take over object: configuration differences are found between the manifest object and the corresponding object in the member cluster)",
        "reason": "FailedToTakeOver",
        "status": "False",
        "type": "Applied"
      },
      "kind": "Namespace",
      "name": "work-2",
      "version": "v1"
    }
  ]
  ```

  Now, with the takeover policy set to `IfNoDiff`, Fleet can assume ownership of pre-existing
  resources; however, since a configuration difference has been found between the hub cluster
  and the member cluster, the takeover is blocked.

* Similar to the drift detection mechanism, Fleet also reports details about the
configuration differences it has found:

  ```sh
  kubectl get clusterresourceplacement.v1beta1.placement.kubernetes-fleet.io work-2 -o jsonpath='{.status.placementStatuses[?(@.clusterName=="member-2")].diffedPlacements}' | jq
  # The command above uses a JSON path to query the diff details directly
  # and uses the jq utility to pretty-print the output JSON.
  #
  # jq might not be available in your environment. You may have to install it
  # separately, or omit it from the command.
  #
  # If the output is empty, the status might not have been populated properly
  # yet. You can switch the output type from jsonpath to yaml to see the full
  # object.
  ```

  The output should look like this:

  ```json
  [
    {
      "firstDiffedObservedTime": "...",
      "group": "",
      "version": "v1",
      "kind": "Namespace",
      "name": "work-2",
      "observationTime": "...",
      "observedDiffs": [
        {
          "path": "/metadata/labels/owner",
          "valueInHub": "redfield",
          "valueInMember": "wesker"
        }
      ],
      "targetClusterObservedGeneration": 0
    }
  ]
  ```

  Fleet will report the following information about a configuration difference:

  * `group`, `kind`, `version` and `name`: the resource that has configuration differences.
  * `observationTime`: the timestamp when the current diff detail was collected.
  * `firstDiffedObservedTime`: the timestamp when the current diff was first observed.
  * `observedDiffs`: the diff details, specifically:
    * `path`: a JSON Pointer (RFC 6901) that points to the diff'd field;
    * `valueInHub`: the value at the JSON Pointer path as seen from the hub cluster resource template (the desired state). If this value is absent, the field does not exist in the resource template.
    * `valueInMember`: the value at the JSON Pointer path as seen from the member cluster resource (the current state). If this value is absent, the field does not exist in the current state.
  * `targetClusterObservedGeneration`: the generation of the member cluster resource.

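  The `observedDiffs` entries can be thought of as the result of a recursive walk over the fields specified in the hub resource template. Below is a minimal sketch in Python (not Fleet's actual implementation) that reproduces the diff reported above under partial comparison:

  ```python
  # Illustrative sketch (not Fleet's actual implementation): compute
  # observedDiffs-style entries for the fields present in the hub resource
  # template, using JSON Pointer (RFC 6901) paths.

  def observed_diffs(hub: dict, member: dict, path: str = "") -> list[dict]:
      """Compare only the fields specified in the hub template (partial comparison)."""
      diffs = []
      for key, hub_value in hub.items():
          child = f"{path}/{key}"
          if isinstance(hub_value, dict) and isinstance(member.get(key), dict):
              # Recurse into nested objects that exist on both sides.
              diffs.extend(observed_diffs(hub_value, member[key], child))
          elif member.get(key) != hub_value:
              diffs.append({
                  "path": child,
                  "valueInHub": hub_value,
                  "valueInMember": member.get(key),
              })
      return diffs

  # The work-2 namespace from this guide, reduced to its labels:
  hub_ns = {"metadata": {"labels": {"app": "work-2", "owner": "redfield"}}}
  member_ns = {"metadata": {"labels": {"app": "work-2", "owner": "wesker"}}}
  # observed_diffs(hub_ns, member_ns) yields a single entry for
  # /metadata/labels/owner with valueInHub "redfield" and valueInMember "wesker".
  ```

  Note that extra fields present only on the member cluster are never visited, which is why partial comparison ignores unmanaged fields.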
* To fix the configuration difference, consider one of the following options:

  * Switch the `whenToTakeOver` setting back to `Always`, which will instruct Fleet to take over the resource right away and overwrite the configuration differences in managed fields;
  * Edit the diff'd field directly on the member cluster side, so that the value is consistent with that on the hub cluster; Fleet will periodically re-evaluate diffs and should take over the resource soon after; or
  * Delete the resource from the member cluster. Fleet will then re-apply the resource template and re-create the resource.

  Here the guide takes the first option, setting the `whenToTakeOver` field to `Always`:

  ```sh
  cat <<EOF | kubectl apply -f -
  # The YAML configuration of the CRP object.
  apiVersion: placement.kubernetes-fleet.io/v1beta1
  kind: ClusterResourcePlacement
  metadata:
    name: work-2
  spec:
    resourceSelectors:
      - group: ""
        kind: Namespace
        version: v1
        # Select all namespaces with the label app=work-2.
        labelSelector:
          matchLabels:
            app: work-2
    policy:
      placementType: PickAll
    strategy:
      # For simplicity reasons, the CRP is configured to roll out changes to
      # all member clusters at once. This is not a setup recommended for
      # production use.
      type: RollingUpdate
      rollingUpdate:
        maxUnavailable: 100%
        unavailablePeriodSeconds: 1
      applyStrategy:
        whenToTakeOver: Always
  EOF
  ```

* Check the CRP status; in a few seconds, Fleet will report that all objects have been applied:

  ```sh
  kubectl get clusterresourceplacement.v1beta1.placement.kubernetes-fleet.io work-2
  ```

  If you switch to the member cluster `member-2` now, you should see that the object looks
  exactly the same as the resource template kept on the hub cluster; the owner label has been
  overwritten.

> Important
>
> When Fleet fails to take over an object, the pre-existing resource will not be put under Fleet's management: any change one makes on the hub cluster side will have no effect on the pre-existing resource. If you choose to delete the resource template, or remove the CRP object, Fleet will not attempt to delete the pre-existing resource.

## Takeover and comparison options

Fleet provides a `comparisonOptions` setting that allows you to fine-tune how Fleet calculates
configuration differences between a resource template created on the hub cluster and the
corresponding pre-existing resource on a member cluster.

> Note
>
> The `comparisonOptions` setting also controls how Fleet detects drifts. See the how-to guide on drift detection for more information.

If `partialComparison` is used, Fleet will only report configuration differences in the managed
fields, i.e., fields that are explicitly specified in the resource template; the presence of
additional fields on the member cluster side will not stop Fleet from taking over the
pre-existing resource. By contrast, with `fullComparison`, Fleet will only take over a
pre-existing resource if it looks exactly the same as its hub cluster counterpart.

Below is the synergy table that summarizes the combos and their respective effects:

| `whenToTakeOver` setting | `comparisonOption` setting | Configuration difference scenario | Outcome |
| -------- | ------- | -------- | ------- |
| `IfNoDiff` | `partialComparison` | There exists a value difference in a managed field between a pre-existing resource on a member cluster and the hub cluster resource template. | Fleet will report an apply error in the status, plus the diff details. |
| `IfNoDiff` | `partialComparison` | The pre-existing resource has a field that is absent from the hub cluster resource template. | Fleet will take over the resource; the configuration difference in the unmanaged field will be left untouched. |
| `IfNoDiff` | `fullComparison` | **A difference has been found on a field, managed or not.** | Fleet will report an apply error in the status, plus the diff details. |
| `Always` | any option | A difference has been found on a field, managed or not. | Fleet will take over the resource; configuration differences in unmanaged fields will be left untouched. |
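
The table above can be condensed into a small decision function. This is a minimal illustrative sketch, not Fleet's implementation; each diff is modeled as a path tagged with whether it falls on a managed field:

```python
# Illustrative sketch (not Fleet's actual implementation): the
# whenToTakeOver x comparisonOption synergy table as a decision function.
# A diff is a (path, is_managed_field) pair; managed fields are those
# explicitly specified in the hub resource template.

def takeover_outcome(when_to_take_over: str, comparison_option: str,
                     diffs: list[tuple[str, bool]]) -> str:
    if when_to_take_over == "Always":
        # Take over right away; only managed fields are overwritten, and
        # differences in unmanaged fields are left untouched.
        return "taken over"
    # IfNoDiff: under partialComparison only managed-field diffs block the
    # takeover; under fullComparison, any diff blocks it.
    blocking = [
        path for path, managed in diffs
        if comparison_option == "fullComparison" or managed
    ]
    if blocking:
        return "apply error (diff details reported)"
    return "taken over"
```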
