diff --git a/docs/en/install/global_dr.mdx b/docs/en/install/global_dr.mdx
index 24972c20..6bf5ba58 100644
--- a/docs/en/install/global_dr.mdx
+++ b/docs/en/install/global_dr.mdx
@@ -132,32 +132,61 @@ Refer to the following documentation to complete installation:
 
 ### Step 3: Enable etcd Synchronization \{#etcd_sync}
 
-1. When applicable, configure the load balancer to forward port `2379` to control plane nodes of the corresponding cluster. ONLY TCP mode is supported; forwarding on L7 is not supported.
+1. Before installing the plugin, create the `etcd-sync-active-cluster-token` Secret in the standby global cluster under the `cpaas-system` namespace. The Secret must store a bearer token for accessing the active global cluster API server under the data key `token`.
+
+   ```bash
+   # Run this command on the standby cluster.
+   ACTIVE_CLUSTER_TOKEN=''
+   kubectl -n cpaas-system create secret generic etcd-sync-active-cluster-token \
+     --from-literal=token="${ACTIVE_CLUSTER_TOKEN}" \
+     --dry-run=client -o yaml | kubectl apply -f -
+   ```
+
+2. When applicable, configure the load balancer to forward port `2379` to control plane nodes of the corresponding cluster. Only TCP mode is supported; L7 forwarding is not.
 
    :::info
    Port forwarding through a load balancer is not required. If direct access from the standby cluster to the active global cluster is available, specify the etcd addresses via **Active Global Cluster ETCD Endpoints**.
   :::
 
-2. Access the **standby global cluster** Web Console using its VIP, and switch to **Administrator** view;
-3. Navigate to **Marketplace > Cluster Plugins**, select the `global` cluster;
-4. Find ** etcd Synchronizer**, click **Install**, configure parameters:
+3. Access the **standby global cluster** Web Console using its VIP and switch to **Administrator** view.
+4. Navigate to **Marketplace > Cluster Plugins** and select the `global` cluster.
+5. Find **etcd Synchronizer**, click **Install**, and configure these parameters:
+
+   * Set **Active Global Cluster VIP** to the VIP of the active global cluster.
+   * When not forwarding port `2379` through the load balancer, configure **Active Global Cluster ETCD Endpoints** correctly.
+   * Set **Standby Cluster ETCD Endpoints** to the standby cluster etcd address. Use the default value unless the local etcd service is exposed through a different endpoint.
+   * Set **Active Global Cluster Token Secret** to `etcd-sync-active-cluster-token`.
+   * Use the default value of **Data Check Interval**.
+   * Leave **Print detail logs** disabled unless troubleshooting.
 
- * When not forwarding port `2379` through load balancer, its required to configure **Active Global Cluster ETCD Endpoints** correctly;
- * Use the default value of **Data Check Interval**;
- * Leave **Print detail logs** switch disabled unless troubleshooting.
 
+During installation, the system runs the `etcd-sync-bootstrap` Job before the `etcd-sync` Deployment starts. The plugin installation continues only after the Job prepares `remote-etcd-ca`, `remote-etcd-issuer`, and `remote-etcd-client`.
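+
+If you prefer to block until the bootstrap Job finishes instead of polling it, `kubectl wait` can do this (a minimal sketch; the 10-minute timeout is an illustrative choice, not a value required by the plugin):
+
+```bash
+# Block until the etcd-sync-bootstrap Job reports the Complete condition.
+# The timeout below is illustrative; adjust it to your environment.
+kubectl wait --for=condition=complete job/etcd-sync-bootstrap \
+  -n cpaas-system --timeout=10m
+```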
+
+Verify the bootstrap Job and runtime resources:
+
+```bash
+kubectl get job -n cpaas-system etcd-sync-bootstrap
+kubectl logs -n cpaas-system job/etcd-sync-bootstrap
+kubectl get secret -n cpaas-system remote-etcd-ca
+kubectl get issuer -n cpaas-system remote-etcd-issuer
+kubectl get certificate -n cpaas-system remote-etcd-client
+kubectl get secret -n cpaas-system remote-etcd-client
+```
 
-Verify the sync Pod is running on the standby cluster:
+Verify the sync Pods are running on the standby cluster and identify the current leader:
 
 ```bash
 kubectl get po -n cpaas-system -l app=etcd-sync
-kubectl logs -n cpaas-system $(kubectl get po -n cpaas-system -l app=etcd-sync --no-headers | head -1) | grep -i "Start Sync update"
+kubectl get lease -n cpaas-system etcd-sync-mirror
+leader_pod=$(kubectl get lease -n cpaas-system etcd-sync-mirror -o jsonpath='{.spec.holderIdentity}')
+kubectl logs -n cpaas-system "$leader_pod" | grep -E "Acquired leader lease|Start Sync update"
 ```
 
-Once “Start Sync update” appears, recreate one of the pods to re-trigger sync of resources with ownerReference dependencies:
+If resources with `ownerReference` dependencies need to be resynchronized, recreate the current leader Pod after `Start Sync update` appears:
 
 ```bash
-kubectl delete po -n cpaas-system $(kubectl get po -n cpaas-system -l app=etcd-sync --no-headers | head -1)
+leader_pod=$(kubectl get lease -n cpaas-system etcd-sync-mirror -o jsonpath='{.spec.holderIdentity}')
+kubectl delete po -n cpaas-system "$leader_pod"
 ```
 
 Check sync status:
 
@@ -166,16 +195,16 @@ Check sync status:
 mirror_svc=$(kubectl get svc -n cpaas-system etcd-sync-monitor -o jsonpath='{.spec.clusterIP}')
 ipv6_regex="^[0-9a-fA-F:]+$"
 if [[ $mirror_svc =~ $ipv6_regex ]]; then
-  export mirror_new_svc="[$mirror_svc]"
+  mirror_host="[$mirror_svc]"
 else
-  export mirror_new_svc=$mirror_svc
+  mirror_host="$mirror_svc"
 fi
-curl $mirror_new_svc/check
+curl -g "http://${mirror_host}/check"
 ```
 
 **Output explanation:**
 
-* `LOCAL ETCD missed keys`: Keys exist in the Primary but are missing from the standby. Often caused by GC due to resource order during sync. Restart one etcd-sync Pod to fix;
+* `LOCAL ETCD missed keys`: Keys exist in the primary (active) global cluster but are missing from the standby, often because garbage collection removed resources that were synced out of dependency order. Restart the current etcd-sync leader Pod to fix this.
 * `LOCAL ETCD surplus keys`: Extra keys exist only in the standby cluster. Confirm with ops team before deleting these keys from the standby.
 
 If the following components are installed, restart their services:
 
@@ -258,7 +287,14 @@ If the following components are installed, restart their services:
 Regularly check sync status on the standby cluster:
 
 ```bash
-curl $(kubectl get svc -n cpaas-system etcd-sync-monitor -o jsonpath='{.spec.clusterIP}')/check
+mirror_svc=$(kubectl get svc -n cpaas-system etcd-sync-monitor -o jsonpath='{.spec.clusterIP}')
+ipv6_regex="^[0-9a-fA-F:]+$"
+if [[ $mirror_svc =~ $ipv6_regex ]]; then
+  mirror_host="[$mirror_svc]"
+else
+  mirror_host="$mirror_svc"
+fi
+curl -g "http://${mirror_host}/check"
 ```
 
 If any keys are missing or surplus, follow the instructions in the output to resolve them.
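+
+To run this check on a schedule, a wrapper along these lines can be used (a sketch, assuming the `/check` output mentions `missed keys` or `surplus keys` only when drift exists; adjust the pattern to the output you actually observe):
+
+```bash
+#!/usr/bin/env bash
+# Hypothetical periodic wrapper around the etcd-sync-monitor /check endpoint.
+set -euo pipefail
+
+mirror_svc=$(kubectl get svc -n cpaas-system etcd-sync-monitor -o jsonpath='{.spec.clusterIP}')
+if [[ $mirror_svc =~ ^[0-9a-fA-F:]+$ ]]; then
+  mirror_host="[$mirror_svc]"   # bracket IPv6 addresses for the URL
+else
+  mirror_host="$mirror_svc"
+fi
+
+result=$(curl -gfsS "http://${mirror_host}/check")
+# Exit non-zero so cron or a Kubernetes CronJob surfaces the drift.
+if grep -qiE "missed keys|surplus keys" <<<"$result"; then
+  echo "etcd-sync drift detected:" >&2
+  echo "$result" >&2
+  exit 1
+fi
+```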
diff --git a/docs/en/upgrade/upgrade_global_cluster.mdx b/docs/en/upgrade/upgrade_global_cluster.mdx
index 04f3512e..d4a61203 100644
--- a/docs/en/upgrade/upgrade_global_cluster.mdx
+++ b/docs/en/upgrade/upgrade_global_cluster.mdx
@@ -315,6 +315,16 @@ After the standby global cluster has reached the desired version, run the remain
 
 Before reinstalling the plugin, verify that port `2379` is forwarded correctly from both global-cluster VIPs to their control plane nodes when that forwarding mode is used. Port forwarding through a load balancer is not required if the standby global cluster can access the active global cluster directly.
 
+Also create or update the `etcd-sync-active-cluster-token` Secret in `cpaas-system` before reinstalling. The Secret must store a bearer token for accessing the active global cluster API server under the data key `token`. Reference this Secret via the **Active Global Cluster Token Secret** parameter. Legacy plain-token configuration remains available only as a compatibility fallback and is not the recommended operational path.
+
+```bash
+# Run this command on the standby cluster.
+ACTIVE_CLUSTER_TOKEN=''
+kubectl -n cpaas-system create secret generic etcd-sync-active-cluster-token \
+  --from-literal=token="${ACTIVE_CLUSTER_TOKEN}" \
+  --dry-run=client -o yaml | kubectl apply -f -
+```
+
 To reinstall the plugin:
 
 1. Access the **standby global cluster** Web Console through its VIP and switch to **Administrator** view.
@@ -323,23 +333,40 @@ To reinstall the plugin:
 
 When you configure the plugin:
 
+- Set **Active Global Cluster VIP** to the VIP of the active global cluster.
 - When port `2379` is not forwarded through a load balancer, set **Active Global Cluster ETCD Endpoints** correctly.
+- Set **Standby Cluster ETCD Endpoints** to the standby cluster etcd address. Use the default value unless the local etcd service is exposed through a different endpoint.
+- Set **Active Global Cluster Token Secret** to `etcd-sync-active-cluster-token`.
 - Use the default value of **Data Check Interval**.
 - Leave **Print detail logs** disabled unless you are troubleshooting.
 
-Verify the sync Pod is running on the standby global cluster:
+During reinstallation, the system runs the `etcd-sync-bootstrap` Job before the `etcd-sync` Deployment starts. The release continues only after the Job prepares `remote-etcd-ca`, `remote-etcd-issuer`, and `remote-etcd-client`.
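+
+If the bootstrap Job fails with authentication errors, one way to confirm that the stored token is still accepted is to call the active global cluster API server with it directly (a sketch; `6443` is assumed to be the API server port, and `<active-global-vip>` is a placeholder for your active global cluster VIP):
+
+```bash
+# Read the token back from the Secret and probe the active API server.
+# HTTP 401 means the token was rejected; 200 or 403 means it authenticated.
+token=$(kubectl -n cpaas-system get secret etcd-sync-active-cluster-token \
+  -o jsonpath='{.data.token}' | base64 -d)
+curl -ks -o /dev/null -w '%{http_code}\n' \
+  -H "Authorization: Bearer ${token}" \
+  "https://<active-global-vip>:6443/apis"
+```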
+
+Verify the bootstrap Job and runtime resources:
+
+```bash
+kubectl get job -n cpaas-system etcd-sync-bootstrap
+kubectl logs -n cpaas-system job/etcd-sync-bootstrap
+kubectl get secret -n cpaas-system remote-etcd-ca
+kubectl get issuer -n cpaas-system remote-etcd-issuer
+kubectl get certificate -n cpaas-system remote-etcd-client
+kubectl get secret -n cpaas-system remote-etcd-client
+```
+
+Verify the sync Pods are running on the standby global cluster and identify the current leader:
 
 ```bash
 kubectl get po -n cpaas-system -l app=etcd-sync
-etcd_sync_pod=$(kubectl get po -n cpaas-system -l app=etcd-sync -o jsonpath='{.items[0].metadata.name}')
-kubectl logs -n cpaas-system "$etcd_sync_pod" | grep -i "Start Sync update"
+kubectl get lease -n cpaas-system etcd-sync-mirror
+leader_pod=$(kubectl get lease -n cpaas-system etcd-sync-mirror -o jsonpath='{.spec.holderIdentity}')
+kubectl logs -n cpaas-system "$leader_pod" | grep -E "Acquired leader lease|Start Sync update"
 ```
 
-Once `Start Sync update` appears, recreate one of the Pods to trigger synchronization of resources with ownerReference dependencies:
+If resources with `ownerReference` dependencies need to be resynchronized, recreate the current leader Pod after `Start Sync update` appears:
 
 ```bash
-etcd_sync_pod=$(kubectl get po -n cpaas-system -l app=etcd-sync -o jsonpath='{.items[0].metadata.name}')
-kubectl delete po -n cpaas-system "$etcd_sync_pod"
+leader_pod=$(kubectl get lease -n cpaas-system etcd-sync-mirror -o jsonpath='{.spec.holderIdentity}')
+kubectl delete po -n cpaas-system "$leader_pod"
 ```
 
 Check sync status:
 
@@ -357,9 +384,13 @@
 curl -g "http://${mirror_host}/check"
 ```
 
 Output interpretation:
 
-- `LOCAL ETCD missed keys`: Keys exist in the primary global cluster but are missing from the standby. This often resolves after restarting one `etcd-sync` Pod.
+- `LOCAL ETCD missed keys`: Keys exist in the primary global cluster but are missing from the standby. This often resolves after restarting the current `etcd-sync` leader Pod.
 - `LOCAL ETCD surplus keys`: Keys exist in the standby global cluster but not in the primary. Review these with your operations team before deleting them.
 
+After verification succeeds, remove any remaining legacy plain-token configuration from the plugin settings or release values.
+
+If the active-cluster token or the remote `etcd-ca` changes later, run the plugin upgrade or reinstall workflow again so that the `etcd-sync-bootstrap` hook refreshes the runtime credentials and certificates.
+
 ## Related Documentation