fix(lb): prevent empty backend pools on IPAM failures#213
Conversation
543753c to
51afd0e
Compare
|
Hello, thanks for your contribution. There was already a fix in #175 to handle internal IPs being missing during node initialization. It requires the |
| targetIPs = extractNodesInternalIps(nodes) | ||
|
|
||
| // Fallback: if no internal IPs and we're in private network mode, try IPAM directly | ||
| if len(targetIPs) == 0 && (l.pnID != "" || len(pnIDs) > 0) { |
There was a problem hiding this comment.
extractNodesInternalIps() return all private ips, but it can contains ips from pns not from pnIDs.
so in the case of this annotation, I think it would be better to not use extractNodesInternalIps ?
There was a problem hiding this comment.
Thanks for the comment - that makes sense.
I have updated the PR to make it work without the function
Thank you! Yes, we start the kubelet with said flag |
|
Is this PR still needed if we update this scaleway-cloud-controller-manager/scaleway/instances.go Lines 238 to 239 in 8cbe826 to be: return addresses, fmt.Errorf("unable to query ipam for node %s: %v", server.Name, err)And also to be: return nil, fmt.Errorf("error getting private network addresses for node %s: %v", server.Name, err)That would prevent Nodes to be initialized without InternalIPs when there is a transient IPAM failure. If possible, I'd like to avoid putting additional pressure on the IPAM API. |
7d1db9a to
73ea234
Compare
Prevent nodes from being initialized without InternalIPs when IPAM queries fail transiently. By returning errors instead of only logging warnings, the node keeps its uninitialized taint and Kubernetes retries until IPAM is available. Additionally, use internal IPs when the service annotation scw-loadbalancer-pn-ids is set (not just the global PN_ID env var), and refuse to clear existing LB backends when no replacement IPs are found. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
73ea234 to
6eeac07
Compare
|
@andreaswachs I pushed some changes based on the above comment. Please test this and let us know the results. Once verified, this should be good to merge. |
I have just manually tested this to work:
|
* feat(instance): avoid missing Node Refs for Private Network Instances without pnNIC Signed-off-by: Steffen Karlsson <steffen.karlsson@gmail.com> * fix(instances,lb): propagate IPAM errors and harden backend pool updates (scaleway#213) Prevent nodes from being initialized without InternalIPs when IPAM queries fail transiently. By returning errors instead of only logging warnings, the node keeps its uninitialized taint and Kubernetes retries until IPAM is available. Additionally, use internal IPs when the service annotation scw-loadbalancer-pn-ids is set (not just the global PN_ID env var), and refuse to clear existing LB backends when no replacement IPs are found. Co-authored-by: Nicklas Frahm <nfm@corti.ai> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> * feat: env to disable tags sync from ccm * fix(examples): update container image for CCM Signed-off-by: Nicklas Frahm <nfm@corti.ai> * Update Scaleway CCM image version Signed-off-by: Andreas Wachs <awa@corti.ai> * Update Scaleway CCM image version to v0.36.0-kommodity.3 Signed-off-by: Andreas Wachs <awa@corti.ai> * WIP Signed-off-by: Andreas Wachs <awa@corti.ai> * Update k8s-scaleway-ccm-latest.yml Signed-off-by: Andreas Wachs <awa@corti.ai> --------- Signed-off-by: Steffen Karlsson <steffen.karlsson@gmail.com> Signed-off-by: Nicklas Frahm <nfm@corti.ai> Signed-off-by: Andreas Wachs <awa@corti.ai> Co-authored-by: Steffen Karlsson <steffen.karlsson@gmail.com> Co-authored-by: Jérémy THERIN <jtherin@users.noreply.github.com> Co-authored-by: Nicklas Frahm <nfm@corti.ai> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Jeremy THERIN <jtherin@scaleway.com>
* feat(instance): avoid missing Node Refs for Private Network Instances without pnNIC Signed-off-by: Steffen Karlsson <steffen.karlsson@gmail.com> * fix(instances,lb): propagate IPAM errors and harden backend pool updates (scaleway#213) Prevent nodes from being initialized without InternalIPs when IPAM queries fail transiently. By returning errors instead of only logging warnings, the node keeps its uninitialized taint and Kubernetes retries until IPAM is available. Additionally, use internal IPs when the service annotation scw-loadbalancer-pn-ids is set (not just the global PN_ID env var), and refuse to clear existing LB backends when no replacement IPs are found. Co-authored-by: Nicklas Frahm <nfm@corti.ai> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> * feat: env to disable tags sync from ccm * fix(examples): update container image for CCM Signed-off-by: Nicklas Frahm <nfm@corti.ai> * Update Scaleway CCM image version Signed-off-by: Andreas Wachs <awa@corti.ai> * Update Scaleway CCM image version to v0.36.0-kommodity.3 Signed-off-by: Andreas Wachs <awa@corti.ai> * WIP Signed-off-by: Andreas Wachs <awa@corti.ai> * Update k8s-scaleway-ccm-latest.yml Signed-off-by: Andreas Wachs <awa@corti.ai> --------- Signed-off-by: Steffen Karlsson <steffen.karlsson@gmail.com> Signed-off-by: Nicklas Frahm <nfm@corti.ai> Signed-off-by: Andreas Wachs <awa@corti.ai> Co-authored-by: Steffen Karlsson <steffen.karlsson@gmail.com> Co-authored-by: Jérémy THERIN <jtherin@users.noreply.github.com> Co-authored-by: Nicklas Frahm <nfm@corti.ai> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Jeremy THERIN <jtherin@scaleway.com>
* feat(instance): avoid missing Node Refs for Private Network Instances without pnNIC Signed-off-by: Steffen Karlsson <steffen.karlsson@gmail.com> * fix(instances,lb): propagate IPAM errors and harden backend pool updates (scaleway#213) Prevent nodes from being initialized without InternalIPs when IPAM queries fail transiently. By returning errors instead of only logging warnings, the node keeps its uninitialized taint and Kubernetes retries until IPAM is available. Additionally, use internal IPs when the service annotation scw-loadbalancer-pn-ids is set (not just the global PN_ID env var), and refuse to clear existing LB backends when no replacement IPs are found. Co-authored-by: Nicklas Frahm <nfm@corti.ai> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> * feat: env to disable tags sync from ccm * fix(examples): update container image for CCM Signed-off-by: Nicklas Frahm <nfm@corti.ai> * Update Scaleway CCM image version Signed-off-by: Andreas Wachs <awa@corti.ai> * Update Scaleway CCM image version to v0.36.0-kommodity.3 Signed-off-by: Andreas Wachs <awa@corti.ai> * WIP Signed-off-by: Andreas Wachs <awa@corti.ai> * Update k8s-scaleway-ccm-latest.yml Signed-off-by: Andreas Wachs <awa@corti.ai> --------- Signed-off-by: Steffen Karlsson <steffen.karlsson@gmail.com> Signed-off-by: Nicklas Frahm <nfm@corti.ai> Signed-off-by: Andreas Wachs <awa@corti.ai> Co-authored-by: Steffen Karlsson <steffen.karlsson@gmail.com> Co-authored-by: Jérémy THERIN <jtherin@users.noreply.github.com> Co-authored-by: Nicklas Frahm <nfm@corti.ai> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Jeremy THERIN <jtherin@scaleway.com>
* feat(instance): avoid missing Node Refs for Private Network Instances without pnNIC Signed-off-by: Steffen Karlsson <steffen.karlsson@gmail.com> * fix(instances,lb): propagate IPAM errors and harden backend pool updates (scaleway#213) Prevent nodes from being initialized without InternalIPs when IPAM queries fail transiently. By returning errors instead of only logging warnings, the node keeps its uninitialized taint and Kubernetes retries until IPAM is available. Additionally, use internal IPs when the service annotation scw-loadbalancer-pn-ids is set (not just the global PN_ID env var), and refuse to clear existing LB backends when no replacement IPs are found. Co-authored-by: Nicklas Frahm <nfm@corti.ai> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> * feat: env to disable tags sync from ccm * fix(examples): update container image for CCM Signed-off-by: Nicklas Frahm <nfm@corti.ai> * Update Scaleway CCM image version Signed-off-by: Andreas Wachs <awa@corti.ai> * Update Scaleway CCM image version to v0.36.0-kommodity.3 Signed-off-by: Andreas Wachs <awa@corti.ai> * WIP Signed-off-by: Andreas Wachs <awa@corti.ai> * Update k8s-scaleway-ccm-latest.yml Signed-off-by: Andreas Wachs <awa@corti.ai> --------- Signed-off-by: Steffen Karlsson <steffen.karlsson@gmail.com> Signed-off-by: Nicklas Frahm <nfm@corti.ai> Signed-off-by: Andreas Wachs <awa@corti.ai> Co-authored-by: Steffen Karlsson <steffen.karlsson@gmail.com> Co-authored-by: Jérémy THERIN <jtherin@users.noreply.github.com> Co-authored-by: Nicklas Frahm <nfm@corti.ai> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Jeremy THERIN <jtherin@scaleway.com>
Summary
Prevents load balancer backend pools from being emptied when IPAM data is unavailable during node initialization.
Changes:
scw-loadbalancer-pn-idsis set (not just globalPN_IDenv var)This has been manually tested to work by: