110 changes: 52 additions & 58 deletions public/kubernetes-guides/advanced-guides/upgrading-kubernetes.mdx
Expand Up @@ -27,15 +27,27 @@ Kubernetes upgrades are non-disruptive from Talos, but Kubelet upgrades may caus

To trigger a Kubernetes upgrade, issue a command specifying the version of Kubernetes to upgrade to, such as:

`talosctl --nodes <controlplane node> upgrade-k8s --to ${k8s_release}`
<CodeBlock lang="sh">
{
`talosctl --nodes <control-plane-ip> upgrade-k8s --to <new-kubernetes-version>`
}
</CodeBlock>


Note that the `--nodes` parameter specifies the control plane node to send the API call to, but all members of the cluster will be upgraded.

To check what will be upgraded, run `talosctl upgrade-k8s` with the `--dry-run` flag:

<CodeBlock lang="sh">
{`
$ talosctl --nodes <controlplane node> upgrade-k8s --to ${k8s_release} --dry-run
talosctl --nodes <control-plane-ip> upgrade-k8s --to ${k8s_release} --dry-run
`}
</CodeBlock>

You should see an output similar to this:

<CodeBlock lang="sh">
{`
WARNING: found resources which are going to be deprecated/migrated in the version ${k8s_release}
RESOURCE COUNT
validatingwebhookconfigurations.v1beta1.admissionregistration.k8s.io 4
Expand Down Expand Up @@ -71,30 +83,11 @@ updating manifests
> apply manifest ClusterRoleBinding system-bootstrap-approve-node-client-csr
> apply skipped in dry run
<snip>
`}
</CodeBlock>

To upgrade Kubernetes from `${k8s_prev_release}` to v`${k8s_release}` run:

<CodeBlock lang="sh">
{`
$ talosctl --nodes <controlplane node> upgrade-k8s --to ${k8s_release}
automatically detected the lowest Kubernetes version ${k8s_prev_release}
checking for resource APIs to be deprecated in version ${k8s_release}
discovered controlplane nodes ["172.20.0.2" "172.20.0.3" "172.20.0.4"]
discovered worker nodes ["172.20.0.5" "172.20.0.6"]
updating "kube-apiserver" to version "${k8s_release}"
> "172.20.0.2": starting update
> update kube-apiserver: ${k8s_prev_release} -> ${k8s_release}
> "172.20.0.2": machine configuration patched
> "172.20.0.2": waiting for API server state pod update
< "172.20.0.2": successfully updated
> "172.20.0.3": starting update
> update kube-apiserver: ${k8s_prev_release} -> ${k8s_release}
<snip>
`}
</CodeBlock>


This command runs in several phases:

1. Images for new Kubernetes components are pre-pulled to the nodes to minimize downtime and test for image availability.
Expand All @@ -111,7 +104,7 @@ This command runs in several phases:

If the command fails for any reason, it can be safely restarted and will continue the upgrade process from the point of failure.
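
Because the command can be restarted safely, it can be wrapped in a simple retry loop. The following is a minimal sketch, not part of `talosctl`: `retry` is a hypothetical helper name, and the attempt count and delay are arbitrary assumptions. In practice, inspect the failure before retrying blindly.

```bash
#!/usr/bin/env bash
# retry is a hypothetical helper, not part of talosctl.
# Usage: retry <max-attempts> <delay-seconds> <command> [args...]
# It reruns the command until it succeeds or the attempt limit is reached.
retry() {
  local max="$1" delay="$2"
  shift 2
  local n=1
  until "$@"; do
    if [ "$n" -ge "$max" ]; then
      echo "giving up after ${n} attempts: $*" >&2
      return 1
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}
```

For example, `retry 3 30 talosctl --nodes <control-plane-ip> upgrade-k8s --to <new-kubernetes-version>` would attempt the upgrade up to three times, pausing 30 seconds between attempts.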

> Note: When using custom/overridden Kubernetes component images, use flags `--*-image` to override the default image names.
<Note>When using custom or overridden Kubernetes component images, use the `--*-image` flags to override the default image names.</Note>

### Kubernetes manifest synchronization

Expand All @@ -122,7 +115,7 @@ If any services were deleted or disabled, the resources associated with them wil
For example, if [kube-proxy](https://docs.siderolabs.com/talos/latest/reference/configuration/v1alpha1/config#proxy) was disabled in the machine configuration, the resources associated with it would be deleted at this stage.
Pruning can be disabled by passing the `--manifests-no-prune` flag.

> Note: Pruning is supported from Talos and talosctl v1.13 onwards.
<Note>Pruning is supported from Talos and talosctl v1.13 onwards.</Note>

From Talos v1.13 onwards, all Kubernetes manifests are applied via [Kubernetes Server-Side Apply](https://kubernetes.io/docs/reference/using-api/server-side-apply/).
Talos forces ownership of all fields it applies, even if they have a different field manager.
Expand All @@ -131,7 +124,7 @@ If you wish to manage a resource previously applied by Talos, you need to take t
1. Remove the inline manifest or extra manifest entry from the machine configuration.
After this step you can run `talosctl upgrade-k8s --dry-run --to <in-cluster-k8s-version>`.
Resources affected will be marked for pruning in the output.
2. Remove the resource entries from the talos tracking inventory configmap data block:
2. Remove the resource entries from the Talos tracking inventory ConfigMap data block by running:

```bash
kubectl edit cm --namespace kube-system talos-bootstrap-manifests-inventory
Expand Down Expand Up @@ -164,16 +157,15 @@ In order to edit the control plane, you need a working `kubectl` config.
If you don't already have one, you can get one by running:

```bash
talosctl --nodes <controlplane node> kubeconfig
talosctl --nodes <control-plane-ip> kubeconfig
```

### API server

Patch the machine configuration using the `talosctl patch` command:

```bash
$ talosctl -n <CONTROL_PLANE_IP_1> patch mc --mode=no-reboot -p '[{"op": "replace", "path": "/cluster/apiServer/image", "value": "registry.k8s.io/kube-apiserver:v${k8s_release}"}]'
patched mc at the node 172.20.0.2
talosctl -n <control-plane-ip> patch mc --mode=no-reboot -p '[{"op": "replace", "path": "/cluster/apiServer/image", "value": "registry.k8s.io/kube-apiserver:v${k8s_release}"}]'
```

The JSON patch might need to be adjusted if the current machine configuration is missing the `.cluster.apiServer.image` key.
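
The adjustment usually comes down to the JSON Patch operation: under RFC 6902, `replace` fails when the target key does not exist, while `add` creates the key or overwrites an existing one. A minimal sketch, assuming a hypothetical helper named `image_patch` (not part of `talosctl`) that only builds the patch string:

```bash
#!/usr/bin/env bash
# image_patch is a hypothetical helper, not part of talosctl.
# It prints a one-element JSON patch for the given operation, path and image.
# Arguments are inserted verbatim, so they must not contain double quotes.
image_patch() {
  local op="$1" path="$2" image="$3"
  printf '[{"op": "%s", "path": "%s", "value": "%s"}]\n' "$op" "$path" "$image"
}
```

For example, `talosctl -n <control-plane-ip> patch mc --mode=no-reboot -p "$(image_patch add /cluster/apiServer/image "registry.k8s.io/kube-apiserver:<new-kubernetes-version>")"` sets the image whether or not the `image` key already exists, provided the parent object `/cluster/apiServer` is present.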
Expand All @@ -183,7 +175,12 @@ Also the machine configuration can be edited manually with `talosctl -n <IP> ed
Capture the new version of `kube-apiserver` config with:

```bash
$ talosctl -n <CONTROL_PLANE_IP_1> get apiserverconfig -o yaml
talosctl -n <control-plane-ip> get apiserverconfig -o yaml
```

You should see an output similar to this:

```yaml
node: 172.20.0.2
metadata:
namespace: controlplane
Expand Down Expand Up @@ -217,16 +214,14 @@ In this example, the new version is `5`.
Wait for the new pod definition to propagate to the API server state (replace `talos-default-controlplane-1` with the node name):

```bash
$ kubectl get pod -n kube-system -l k8s-app=kube-apiserver --field-selector spec.nodeName=talos-default-controlplane-1 -o jsonpath='{.items[0].metadata.annotations.talos\.dev/config\-version}'
kubectl get pod -n kube-system -l k8s-app=kube-apiserver --field-selector spec.nodeName=talos-default-controlplane-1 -o jsonpath='{.items[0].metadata.annotations.talos\.dev/config\-version}'
5
```
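
Rather than rerunning the query by hand, the wait can be scripted as a polling loop. This is a rough sketch under stated assumptions: `wait_for_config_version` is a hypothetical helper, and the default of 60 attempts with a 5-second delay is arbitrary. It reuses the `kubectl` query shown above unchanged.

```bash
#!/usr/bin/env bash
# wait_for_config_version is a hypothetical helper, not part of Talos.
# Usage: wait_for_config_version <k8s-app> <node-name> <version> [attempts] [delay]
# It polls the pod annotation until it equals the expected config version.
wait_for_config_version() {
  local app="$1" node="$2" expected="$3"
  local attempts="${4:-60}" delay="${5:-5}"
  local current="" i=1
  while [ "$i" -le "$attempts" ]; do
    # A failing query (for example, the pod is briefly missing) counts as "not yet".
    current="$(kubectl get pod -n kube-system -l "k8s-app=${app}" \
      --field-selector "spec.nodeName=${node}" \
      -o jsonpath='{.items[0].metadata.annotations.talos\.dev/config\-version}' 2>/dev/null)" || current=""
    if [ "$current" = "$expected" ]; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out waiting for ${app} on ${node} to reach config version ${expected}" >&2
  return 1
}
```

For the example above, `wait_for_config_version kube-apiserver talos-default-controlplane-1 5` returns once the annotation reads `5`, or fails after the attempts run out.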

Check that the pod is running:

```bash
$ kubectl get pod -n kube-system -l k8s-app=kube-apiserver --field-selector spec.nodeName=talos-default-controlplane-1
NAME READY STATUS RESTARTS AGE
kube-apiserver-talos-default-controlplane-1 1/1 Running 0 16m
kubectl get pod -n kube-system -l k8s-app=kube-apiserver --field-selector spec.nodeName=talos-default-controlplane-1
```

Repeat this process for every control plane node, verifying that state propagated successfully between each node update.
Expand All @@ -236,16 +231,20 @@ Repeat this process for every control plane node, verifying that state got propa
Patch the machine configuration using the `talosctl patch` command:

```bash
$ talosctl -n <CONTROL_PLANE_IP_1> patch mc --mode=no-reboot -p '[{"op": "replace", "path": "/cluster/controllerManager/image", "value": "registry.k8s.io/kube-controller-manager:v${k8s_release}"}]'
patched mc at the node 172.20.0.2
talosctl -n <control-plane-ip> patch mc --mode=no-reboot -p '[{"op": "replace", "path": "/cluster/controllerManager/image", "value": "registry.k8s.io/kube-controller-manager:v${k8s_release}"}]'
```

The JSON patch might need to be adjusted if the current machine configuration is missing the `.cluster.controllerManager.image` key.

Capture the new version of `kube-controller-manager` config with:

```bash
$ talosctl -n <CONTROL_PLANE_IP_1> get controllermanagerconfig -o yaml
talosctl -n <control-plane-ip> get controllermanagerconfig -o yaml
```

You should see an output similar to this:

```yaml
node: 172.20.0.2
metadata:
namespace: controlplane
Expand All @@ -270,22 +269,19 @@ spec:
cpu: ""
memory: ""
limits: {}
```
```

In this example, the new version is `3`.
Wait for the new pod definition to propagate to the API server state (replace `talos-default-controlplane-1` with the node name):

```bash
$ kubectl get pod -n kube-system -l k8s-app=kube-controller-manager --field-selector spec.nodeName=talos-default-controlplane-1 -o jsonpath='{.items[0].metadata.annotations.talos\.dev/config\-version}'
3
kubectl get pod -n kube-system -l k8s-app=kube-controller-manager --field-selector spec.nodeName=talos-default-controlplane-1 -o jsonpath='{.items[0].metadata.annotations.talos\.dev/config\-version}'
```

Check that the pod is running:

```bash
$ kubectl get pod -n kube-system -l k8s-app=kube-controller-manager --field-selector spec.nodeName=talos-default-controlplane-1
NAME READY STATUS RESTARTS AGE
kube-controller-manager-talos-default-controlplane-1 1/1 Running 0 35m
kubectl get pod -n kube-system -l k8s-app=kube-controller-manager --field-selector spec.nodeName=talos-default-controlplane-1
```

Repeat this process for every control plane node, verifying that state propagated successfully between each node update.
Expand All @@ -295,16 +291,20 @@ Repeat this process for every control plane node, verifying that state propagate
Patch the machine configuration using the `talosctl patch` command:

```bash
$ talosctl -n <CONTROL_PLANE_IP_1> patch mc --mode=no-reboot -p '[{"op": "replace", "path": "/cluster/scheduler/image", "value": "registry.k8s.io/kube-scheduler:v${k8s_release}"}]'
patched mc at the node 172.20.0.2
talosctl -n <control-plane-ip> patch mc --mode=no-reboot -p '[{"op": "replace", "path": "/cluster/scheduler/image", "value": "registry.k8s.io/kube-scheduler:v${k8s_release}"}]'
```

The JSON patch might need to be adjusted if the current machine configuration is missing the `.cluster.scheduler.image` key.

Capture the new version of `kube-scheduler` config with:

```bash
$ talosctl -n <CONTROL_PLANE_IP_1> get schedulerconfig -o yaml
talosctl -n <control-plane-ip> get schedulerconfig -o yaml
```

You should see an output similar to this:

```yaml
node: 172.20.0.2
metadata:
namespace: controlplane
Expand All @@ -328,21 +328,18 @@ spec:
limits: {}
config: {}
```

In this example, the new version is `3`.

Wait for the new pod definition to propagate to the API server state (replace `talos-default-controlplane-1` with the node name):

```bash
$ kubectl get pod -n kube-system -l k8s-app=kube-scheduler --field-selector spec.nodeName=talos-default-controlplane-1 -o jsonpath='{.items[0].metadata.annotations.talos\.dev/config\-version}'
3
kubectl get pod -n kube-system -l k8s-app=kube-scheduler --field-selector spec.nodeName=talos-default-controlplane-1 -o jsonpath='{.items[0].metadata.annotations.talos\.dev/config\-version}'
```

Check that the pod is running:

```bash
$ kubectl get pod -n kube-system -l k8s-app=kube-scheduler --field-selector spec.nodeName=talos-default-controlplane-1
NAME READY STATUS RESTARTS AGE
kube-scheduler-talos-default-controlplane-1 1/1 Running 0 39m
kubectl get pod -n kube-system -l k8s-app=kube-scheduler --field-selector spec.nodeName=talos-default-controlplane-1
```

Repeat this process for every control plane node, verifying that state propagated successfully between each node update.
Expand Down Expand Up @@ -397,7 +394,7 @@ kubectl edit daemonsets -n kube-system kube-proxy
Bootstrap manifests can be retrieved in a format which works for `kubectl` with the following command:

```bash
talosctl -n <controlplane IP> get manifests -o yaml | yq eval-all '.spec | .[] | splitDoc' - > manifests.yaml
talosctl -n <control-plane-ip> get manifests -o yaml | yq eval-all '.spec | .[] | splitDoc' - > manifests.yaml
```

Diff the manifests with the cluster:
Expand All @@ -412,21 +409,18 @@ Apply the manifests:
kubectl apply -f manifests.yaml
```

> Note: if some bootstrap resources were removed, they have to be removed from the cluster manually.
<Note>If some bootstrap resources were removed, they have to be removed from the cluster manually.</Note>

### kubelet

For every node, patch the machine configuration with the new kubelet version, then wait for the kubelet to restart with the new version:

```bash
$ talosctl -n <IP> patch mc --mode=no-reboot -p '[{"op": "replace", "path": "/machine/kubelet/image", "value": "ghcr.io/siderolabs/kubelet:v${k8s_release}"}]'
patched mc at the node 172.20.0.2
talosctl -n <node-ip> patch mc --mode=no-reboot -p '[{"op": "replace", "path": "/machine/kubelet/image", "value": "ghcr.io/siderolabs/kubelet:v${k8s_release}"}]'
```

Once `kubelet` restarts with the new configuration, confirm the upgrade with `kubectl get nodes <name>`:

```bash
$ kubectl get nodes talos-default-controlplane-1
NAME STATUS ROLES AGE VERSION
talos-default-controlplane-1 Ready control-plane 123m v${k8s_release}
kubectl get nodes talos-default-controlplane-1
```
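
When upgrading many nodes, the version check can be scripted instead of read from the table by eye. The sketch below assumes two hypothetical helpers, `node_kubelet_version` and `kubelet_upgraded`; they read the `.status.nodeInfo.kubeletVersion` field that the node reports to the API server.

```bash
#!/usr/bin/env bash
# node_kubelet_version is a hypothetical helper, not part of Talos.
# It prints the kubelet version reported in the status of the given node.
node_kubelet_version() {
  kubectl get node "$1" -o jsonpath='{.status.nodeInfo.kubeletVersion}'
}

# kubelet_upgraded (also hypothetical) succeeds only when the node reports
# exactly the expected version string, including the leading "v".
kubelet_upgraded() {
  [ "$(node_kubelet_version "$1")" = "$2" ]
}
```

For example, `kubelet_upgraded talos-default-controlplane-1 <new-kubernetes-version>` exits with status 0 once the node reports the target version.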