Commit 8519974

OSDOCS-17761: node replacement procedure updates
1 parent 1a5bb5f commit 8519974

8 files changed

Lines changed: 37 additions & 8 deletions

modules/nodes-add-new-etcd-member.adoc

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -6,6 +6,7 @@
66
[id="add-new-etcd-member_{context}"]
77
= Adding the new etcd member
88

9+
[role="_abstract"]
910
Finish adding the new control plane node by adding the new etcd member to the cluster.
1011

1112
.Procedure

modules/nodes-create-new-control-plane-node.adoc

Lines changed: 3 additions & 1 deletion
@@ -6,6 +6,7 @@
66
[id="create-new-machine_{context}"]
77
= Creating the new control plane node
88

9+
[role="_abstract"]
910
Begin creating the new control plane node by creating a `BareMetalHost` object and node.
1011

1112
.Procedure
@@ -136,5 +137,6 @@ $ coreos-installer iso customize rhcos-live.86_64.iso \
136137
Replace `<device_path>` with the path to the target device on which the ISO will be generated.
137138

138139
. Boot the new control plane node with the customized {op-system} live ISO.
140+
The node will automatically reboot twice before the pending Certificate Signing Requests (CSRs) appear.
139141

140-
. Approve the Certificate Signing Requests (CSR) to join the new node to the cluster.
142+
. Approve the CSRs to join the new node to the cluster.
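The CSR approval step above can be sketched as a small helper. This is a minimal sketch, not part of the commit: the function name `pending_csr_names` is hypothetical, and it assumes the default tabular output of `oc get csr`, where `CONDITION` is the last column and unapproved requests show `Pending`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): print the names of CSRs that
# are still pending, given the tabular output of `oc get csr` on stdin.
# Assumption: NAME is the first column and CONDITION is the last column.
pending_csr_names() {
  # Skip the header row, then keep rows whose last field is exactly "Pending".
  awk 'NR > 1 && $NF == "Pending" { print $1 }'
}

# Example usage against a live cluster (left commented so the block runs
# without cluster access):
#   oc get csr | pending_csr_names
#   oc adm certificate approve <csr_name>
```

Review each pending request before approving it. Because the node reboots twice before the CSRs appear, an empty result shortly after boot is not necessarily an error.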

modules/nodes-delete-machine-unhealthy-etcd.adoc

Lines changed: 2 additions & 1 deletion
@@ -6,6 +6,7 @@
66
[id="deleting-machine_{context}"]
77
= Deleting the machine of the unhealthy etcd member
88

9+
[role="_abstract"]
910
Finish removing the failed control plane node by deleting the machine of the unhealthy etcd member.
1011

1112
.Procedure
@@ -62,7 +63,7 @@ $ oc get machines -n openshift-machine-api -o wide
6263
NAME PHASE TYPE REGION ZONE AGE NODE PROVIDERID STATE
6364
examplecluster-control-plane-0 Running 3h11m openshift-control-plane-0 baremetalhost:///openshift-machine-api/openshift-control-plane-0/da1ebe11-3ff2-41c5-b099-0aa41222964e externally provisioned
6465
examplecluster-control-plane-1 Running 3h11m openshift-control-plane-1 baremetalhost:///openshift-machine-api/openshift-control-plane-1/d9f9acbc-329c-475e-8d81-03b20280a3e1 externally provisioned
65-
examplecluster-control-plane-2 Running 3h11m openshift-control-plane-2 baremetalhost:///openshift-machine-api/openshift-control-plane-2/3354bdac-61d8-410f-be5b-6a395b056135 externally provisioned
66+
examplecluster-control-plane-2 Failed 3h11m openshift-control-plane-2 baremetalhost:///openshift-machine-api/openshift-control-plane-2/3354bdac-61d8-410f-be5b-6a395b056135 externally provisioned
6667
examplecluster-compute-0 Running 165m openshift-compute-0 baremetalhost:///openshift-machine-api/openshift-compute-0/3d685b81-7410-4bb3-80ec-13a31858241f provisioned
6768
examplecluster-compute-1 Running 165m openshift-compute-1 baremetalhost:///openshift-machine-api/openshift-compute-1/0fdae6eb-2066-4241-91dc-e7ea72ab13b9 provisioned
6869
----

modules/nodes-link-node-machine-bmh.adoc

Lines changed: 24 additions & 6 deletions
@@ -6,11 +6,12 @@
66
[id="linking-node-machine-bmh_{context}"]
77
= Linking the node, bare metal host, and machine together
88

9+
[role="_abstract"]
910
Continue creating the new control plane node by creating a machine and then linking it with the new `BareMetalHost` object and node.
1011

1112
.Procedure
1213

13-
. Get the `providerID` for control plane nodes by running the following command:
14+
. Get the `providerID` for the replaced node by running the following command:
1415
+
1516
[source,terminal]
1617
----
@@ -25,7 +26,7 @@ baremetalhost:///openshift-machine-api/master-01/58fb60bd-b2a6-4ff3-a88d-208c33a
2526
baremetalhost:///openshift-machine-api/master-02/dc5a94f3-625b-43f6-ab5a-7cc4fc79f105
2627
----
2728

28-
. Get cluster information for labels by running the following command:
29+
. Get the `cluster-api-cluster` label by running the following command:
2930
+
3031
[source,terminal]
3132
----
@@ -40,10 +41,11 @@ $ oc get machine -n openshift-machine-api \
4041
NAME PHASE TYPE REGION ZONE AGE CLUSTER-API-CLUSTER
4142
ci-op-jcp3s7wx-ng5sd-master-0 Running 10h ci-op-jcp3s7wx-ng5sd
4243
ci-op-jcp3s7wx-ng5sd-master-1 Running 10h ci-op-jcp3s7wx-ng5sd
43-
ci-op-jcp3s7wx-ng5sd-master-2 Running 10h ci-op-jcp3s7wx-ng5sd
4444
----
4545

46-
. Create a `Machine` object for the new control plane node by creating a yaml file similar to the following:
46+
. Create a `Machine` object for the new control plane node:
47+
48+
.. Create a YAML file similar to the following:
4749
+
4850
[source,yaml]
4951
----
@@ -75,13 +77,18 @@ spec:
7577
name: master-user-data-managed
7678
----
7779
+
78-
--
7980
where:
8081

8182
`<new_control_plane_machine>`:: Specifies the name of the new machine, which can be the same as the previously deleted machine name.
8283
`<cluster_api_cluster>`:: Specifies the `CLUSTER-API-CLUSTER` value for the other control plane machines, shown in the output of the previous step.
8384
`<provider_id>`:: Specifies the `providerID` value of the new bare metal host, shown in the output of an earlier step.
84-
--
85+
86+
.. Apply the YAML file by running the following command:
87+
+
88+
[source,terminal]
89+
----
90+
$ oc apply -f <machine_object_yaml_file>
91+
----
8592
+
8693
The following warning is expected:
8794
+
@@ -100,6 +107,17 @@ $ NEW_NODE_NAME=<new_node_name>
100107
----
101108
+
102109
Replace `<new_node_name>` with the name of the new control plane node.
110+
+
111+
[NOTE]
112+
====
113+
The name of the new node might be different than the name of the node you are replacing.
114+
You can check the name of the new node by running the following command:
115+
116+
[source,terminal]
117+
----
118+
$ oc get nodes
119+
----
120+
====
103121
104122
.. Define the `NEW_MACHINE_NAME` variable by running the following command:
105123
+
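
The `providerID` values shown in this procedure follow the pattern `baremetalhost:///<namespace>/<host_name>/<uid>`. As a minimal sketch for checking a value before pasting it into the `Machine` YAML, the helper below splits a `providerID` into its three parts. The function name `parse_provider_id` is hypothetical, and the pattern is inferred only from the example output in this diff.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the commit): split a bare-metal providerID
# of the assumed form baremetalhost:///<namespace>/<host_name>/<uid>
# into "<namespace> <host_name> <uid>" on stdout.
parse_provider_id() {
  local provider_id="$1"
  local prefix="baremetalhost:///"

  # Reject values that do not start with the expected scheme.
  if [[ "$provider_id" != "$prefix"* ]]; then
    echo "unexpected providerID format: $provider_id" >&2
    return 1
  fi

  # Strip the scheme, then split the remainder on "/".
  local rest="${provider_id#"$prefix"}"
  local namespace host uid
  IFS='/' read -r namespace host uid <<< "$rest"

  # All three parts must be present.
  if [[ -z "$namespace" || -z "$host" || -z "$uid" ]]; then
    echo "incomplete providerID: $provider_id" >&2
    return 1
  fi

  printf '%s %s %s\n' "$namespace" "$host" "$uid"
}

# Example usage with a value from the output shown in this procedure:
#   parse_provider_id 'baremetalhost:///openshift-machine-api/master-02/dc5a94f3-625b-43f6-ab5a-7cc4fc79f105'
```

A non-zero exit status signals that the value does not match the assumed pattern, which can catch a truncated copy before the `Machine` object is applied.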

modules/nodes-remove-unhealthy-etcd-member.adoc

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@
66
[id="removing-etcd-member_{context}"]
77
= Removing the unhealthy etcd member
88

9+
[role="_abstract"]
910
Begin removing the failed control plane node by first removing the unhealthy etcd member.
1011

1112
.Procedure

modules/nodes-replace-control-plane-prereqs.adoc

Lines changed: 4 additions & 0 deletions
@@ -6,6 +6,10 @@
66
[id="prerequisites_{context}"]
77
= Prerequisites
88

9+
[role="_abstract"]
10+
You must meet the following prerequisites to replace a failed bare-metal control plane node using this method.
11+
12+
913
* You have identified the unhealthy bare metal etcd member.
1014
* You have verified that either the machine is not running or the node is not ready.
1115
* You have access to the cluster as a user with the `cluster-admin` role.

modules/nodes-verify-failed-node-deleted.adoc

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@
66
[id="verify-machine-deleted_{context}"]
77
= Verifying that the failed node was deleted
88

9+
[role="_abstract"]
910
Before proceeding to create a replacement control plane node, verify that the failed node was successfully deleted.
1011

1112
.Procedure

nodes/nodes/nodes-nodes-replace-control-plane.adoc

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@ include::_attributes/common-attributes.adoc[]
66

77
toc::[]
88

9+
[role="_abstract"]
910
If a control plane node on your bare-metal cluster has failed and cannot be recovered, but you installed your cluster without providing baseboard management controller (BMC) credentials, you must take extra steps in order to replace the failed node with a new one.
1011

1112
// Prerequisites
