Skip to content

Commit 123ffe4

Browse files
committed
[occm] Tag load balancers with cluster identity to prevent name collisions
OCCM constructs Octavia load balancer names as kube_service_<cluster-name>_<namespace>_<service>. When two Kubernetes clusters share the same OpenStack project and use the same --cluster-name (default "kubernetes"), services with identical namespace/name produce identical load balancer names. Octavia does not enforce uniqueness on load balancer names, so OCCM's first-time name-based lookup can adopt and overwrite a load balancer that actually belongs to a different cluster (see issues #2241, #2571, #2624). This commit adds a stable Kubernetes cluster identifier - the UID of the kube-system namespace - as a load balancer tag of the form kube_cluster_id_<uid>. getLoadbalancerByName now ignores load balancers that carry a cluster-id tag for a different cluster and falls back to the legacy behaviour for load balancers without any cluster-id tag, so existing deployments keep working unchanged. Pre-existing load balancers gain the new tag during the next reconciliation. The cluster UID is read once at controller-manager start-up via the kube-system namespace; failure to read it (RBAC denial, missing namespace) is non-fatal and disables the safeguard, falling back to legacy name-based lookup. The cloud-controller-manager ClusterRole and the helm chart gain "get" on namespaces. Made-with: Cursor
1 parent 23db95b commit 123ffe4

7 files changed

Lines changed: 369 additions & 8 deletions

File tree

charts/openstack-cloud-controller-manager/templates/clusterrole.yaml

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -96,3 +96,9 @@ rules:
9696
- list
9797
- get
9898
- watch
99+
- apiGroups:
100+
- ""
101+
resources:
102+
- namespaces
103+
verbs:
104+
- get

docs/openstack-cloud-controller-manager/expose-applications-using-loadbalancer-type-service.md

Lines changed: 50 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,7 @@
1111
- [Restrict Access For LoadBalancer Service](#restrict-access-for-loadbalancer-service)
1212
- [Use PROXY protocol to preserve client IP](#use-proxy-protocol-to-preserve-client-ip)
1313
- [Sharing load balancer with multiple Services](#sharing-load-balancer-with-multiple-services)
14+
- [Running multiple Kubernetes clusters in one OpenStack project](#running-multiple-kubernetes-clusters-in-one-openstack-project)
1415
- [IPv4 / IPv6 dual-stack services](#ipv4--ipv6-dual-stack-services)
1516

1617
<!-- END doctoc generated TOC please keep comment here to allow auto update -->
@@ -627,6 +628,55 @@ $ openstack loadbalancer listener list --loadbalancer 2b224530-9414-4302-8163-5a
627628

628629
The load balancer will be deleted after `service-2` is deleted.
629630

631+
### Running multiple Kubernetes clusters in one OpenStack project
632+
633+
OCCM names each load balancer it creates as `kube_service_<cluster-name>_<namespace>_<service>`,
634+
where `<cluster-name>` comes from the kube-controller-manager `--cluster-name` flag and
635+
defaults to `kubernetes`. When two Kubernetes clusters share the same OpenStack project
636+
and use the same `--cluster-name`, namespace and service name, the resulting load balancer
637+
names collide. OpenStack does not require load balancer names to be unique, so previously
638+
the second cluster could "adopt" the load balancer that the first cluster owned and start
639+
overwriting its configuration. The recommended way to avoid this remains to set a unique
640+
`--cluster-name` on every Kubernetes cluster (see issues
641+
[#2241](https://github.com/kubernetes/cloud-provider-openstack/issues/2241),
642+
[#2571](https://github.com/kubernetes/cloud-provider-openstack/issues/2571),
643+
[#2624](https://github.com/kubernetes/cloud-provider-openstack/issues/2624)).
644+
645+
To make OCCM resilient even when that recommendation isn't followed, OCCM also tags
646+
every load balancer it creates with the UID of the cluster's `kube-system` namespace,
647+
which is a stable and unique identifier per Kubernetes cluster. The tag has the form
648+
`kube_cluster_id_<uid>`. When OCCM looks up a load balancer by name on the first
649+
reconcile of a Service, it ignores load balancers that carry a `kube_cluster_id_*` tag
650+
for a *different* cluster and creates a new one instead. Load balancers that don't
651+
carry any `kube_cluster_id_*` tag (legacy load balancers, or load balancers created by
652+
external tooling) keep their previous behaviour for backward compatibility, and gain
653+
the cluster-id tag on the next reconcile.
654+
655+
OCCM reads the `kube-system` namespace UID once at start-up; this requires the
656+
`get` verb on the `namespaces` resource (already part of the standard cloud-controller
657+
RBAC). If the lookup fails (for example because of a custom RBAC restriction) OCCM logs
658+
a warning and continues without the safeguard, falling back to the legacy name-based
659+
behaviour.
660+
661+
After successful reconcile, the load balancer tags will look similar to:
662+
663+
```shell
664+
$ openstack loadbalancer show <lb-id> -c name -c tags
665+
+-------+--------------------------------------------------------------------------+
666+
| Field | Value |
667+
+-------+--------------------------------------------------------------------------+
668+
| name | kube_service_kubernetes_default_service-1 |
669+
| tags | kube_service_kubernetes_default_service-1, |
670+
| | kube_cluster_id_11111111-2222-3333-4444-555555555555 |
671+
+-------+--------------------------------------------------------------------------+
672+
```
673+
674+
> NOTE: This safeguard only protects new (or freshly reconciled) load balancers. It
675+
> does not retroactively resolve an already-stolen load balancer between two running
676+
> clusters. If you suspect cross-cluster collisions, set unique `--cluster-name` on
677+
> each cluster and let OCCM rename the existing resources (see
678+
> [PR #2552](https://github.com/kubernetes/cloud-provider-openstack/pull/2552)).
679+
630680
### IPv4 / IPv6 dual-stack services
631681
Since Kubernetes 1.20, Kubernetes clusters can run in dual-stack mode,
632682
which allows simultaneous usage of both IPv4 and IPv6 addresses in the cluster.

manifests/controller-manager/cloud-controller-manager-roles.yaml

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -93,6 +93,12 @@ items:
9393
- list
9494
- get
9595
- watch
96+
- apiGroups:
97+
- ""
98+
resources:
99+
- namespaces
100+
verbs:
101+
- get
96102
- apiVersion: rbac.authorization.k8s.io/v1
97103
kind: ClusterRole
98104
metadata:

pkg/openstack/loadbalancer.go

Lines changed: 105 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -105,8 +105,31 @@ const (
105105
poolFormat = poolPrefix + "%d_%s"
106106
monitorPrefix = "monitor_"
107107
monitorFormat = monitorPrefix + "%d_%s"
108+
109+
// clusterIDTagPrefix is the prefix used for the load balancer tag that
110+
// carries a stable Kubernetes cluster identifier (the kube-system
111+
// namespace UID). Together with the existing servicePrefix tag it allows
112+
// OCCM to disambiguate load balancers when multiple Kubernetes clusters
113+
// share the same OpenStack project and a service name happens to collide.
114+
clusterIDTagPrefix = "kube_cluster_id_"
115+
116+
// eventLBStolen is emitted when getLoadbalancerByName finds a load
117+
// balancer with a matching name that belongs to a different Kubernetes
118+
// cluster (different cluster-id tag). The lookup is treated as NotFound
119+
// so OCCM creates a new load balancer instead of stealing an existing one.
120+
eventLBStolen = "LoadBalancerNameCollision"
108121
)
109122

123+
// clusterIDTag formats the Octavia load balancer tag carrying the cluster
124+
// identifier. It returns the empty string when uid is empty so callers can
125+
// safely append the result without a separate nil-check.
126+
func clusterIDTag(uid string) string {
127+
if uid == "" {
128+
return ""
129+
}
130+
return clusterIDTagPrefix + uid
131+
}
132+
110133
// LbaasV2 is a LoadBalancer implementation based on Octavia
111134
type LbaasV2 struct {
112135
LoadBalancer
@@ -153,8 +176,17 @@ type listenerKey struct {
153176
Port int
154177
}
155178

156-
// getLoadbalancerByName get the load balancer which is in valid status by the given name/legacy name.
157-
func getLoadbalancerByName(ctx context.Context, client *gophercloud.ServiceClient, name string, legacyName string) (*loadbalancers.LoadBalancer, error) {
179+
// getLoadbalancerByName gets the load balancer which is in valid status by the given name/legacy name.
180+
//
181+
// When clusterUID is non-empty, the returned load balancer must either carry
182+
// the matching clusterIDTagPrefix tag for that UID, or carry no clusterIDTagPrefix
183+
// tag at all (legacy load balancer that pre-dates the tag). Load balancers
184+
// whose name matches but that carry a different cluster-id tag belong to
185+
// another Kubernetes cluster sharing the OpenStack project; they are ignored
186+
// and the lookup returns ErrNotFound, which causes OCCM to create a new load
187+
// balancer instead of accidentally adopting (and overwriting) one that is
188+
// owned by a different cluster.
189+
func getLoadbalancerByName(ctx context.Context, client *gophercloud.ServiceClient, name string, legacyName string, clusterUID string) (*loadbalancers.LoadBalancer, error) {
158190
var validLBs []loadbalancers.LoadBalancer
159191

160192
opts := loadbalancers.ListOpts{
@@ -187,16 +219,71 @@ func getLoadbalancerByName(ctx context.Context, client *gophercloud.ServiceClien
187219
}
188220
}
189221

222+
validLBs, foreignFound := filterLoadBalancersByClusterID(validLBs, clusterUID)
223+
190224
if len(validLBs) > 1 {
191225
return nil, cpoerrors.ErrMultipleResults
192226
}
193227
if len(validLBs) == 0 {
228+
if foreignFound {
229+
klog.Warningf("Found a load balancer named %q in OpenStack but it belongs to a different Kubernetes cluster "+
230+
"(no %s%s tag); ignoring it. A new load balancer will be created.", name, clusterIDTagPrefix, clusterUID)
231+
}
194232
return nil, cpoerrors.ErrNotFound
195233
}
196234

197235
return &validLBs[0], nil
198236
}
199237

238+
// filterLoadBalancersByClusterID returns the subset of lbs that may belong to
239+
// the cluster identified by clusterUID. The selection rules are:
240+
//
241+
// - If clusterUID is empty, the filter is a no-op (the caller has no cluster
242+
// identity to match on).
243+
// - Load balancers carrying the matching clusterIDTagPrefix+clusterUID tag
244+
// are kept (strongly owned by this cluster).
245+
// - If none of the load balancers carries any clusterIDTagPrefix tag, all of
246+
// them are kept. This preserves the legacy behaviour for load balancers
247+
// created before the tag was introduced or by tools that don't set it.
248+
// - Otherwise (every candidate carries a foreign clusterIDTagPrefix tag) all
249+
// load balancers are dropped. The second return value is true in that case
250+
// to let the caller emit a more specific log line / event.
251+
func filterLoadBalancersByClusterID(lbs []loadbalancers.LoadBalancer, clusterUID string) ([]loadbalancers.LoadBalancer, bool) {
252+
if clusterUID == "" || len(lbs) == 0 {
253+
return lbs, false
254+
}
255+
wantTag := clusterIDTag(clusterUID)
256+
var owned []loadbalancers.LoadBalancer
257+
taggedAny := false
258+
for _, lb := range lbs {
259+
hasTag := false
260+
match := false
261+
for _, tag := range lb.Tags {
262+
if strings.HasPrefix(tag, clusterIDTagPrefix) {
263+
hasTag = true
264+
if tag == wantTag {
265+
match = true
266+
}
267+
}
268+
}
269+
if match {
270+
owned = append(owned, lb)
271+
}
272+
if hasTag {
273+
taggedAny = true
274+
}
275+
}
276+
if len(owned) > 0 {
277+
return owned, false
278+
}
279+
if !taggedAny {
280+
// Legacy load balancers without any cluster-id tag, behave as before.
281+
return lbs, false
282+
}
283+
// All candidates are tagged for some other cluster.
284+
return nil, true
285+
}
286+
200287
func popListener(existingListeners []listeners.Listener, id string) []listeners.Listener {
201288
newListeners := []listeners.Listener{}
202289
for _, existingListener := range existingListeners {
@@ -236,6 +323,9 @@ func (lbaas *LbaasV2) createOctaviaLoadBalancer(ctx context.Context, name, clust
236323

237324
if svcConf.supportLBTags {
238325
createOpts.Tags = []string{svcConf.lbName}
326+
if tag := clusterIDTag(lbaas.clusterUID); tag != "" {
327+
createOpts.Tags = append(createOpts.Tags, tag)
328+
}
239329
}
240330

241331
if svcConf.flavorID != "" {
@@ -339,7 +429,7 @@ func (lbaas *LbaasV2) GetLoadBalancer(ctx context.Context, clusterName string, s
339429
if lbID != "" {
340430
loadbalancer, err = openstackutil.GetLoadbalancerByID(ctx, lbaas.lb, lbID)
341431
} else {
342-
loadbalancer, err = getLoadbalancerByName(ctx, lbaas.lb, name, legacyName)
432+
loadbalancer, err = getLoadbalancerByName(ctx, lbaas.lb, name, legacyName, lbaas.clusterUID)
343433
}
344434
if err != nil && cpoerrors.IsNotFound(err) {
345435
return nil, false, nil
@@ -1737,7 +1827,7 @@ func (lbaas *LbaasV2) ensureOctaviaLoadBalancer(ctx context.Context, clusterName
17371827
}
17381828
} else {
17391829
legacyName := lbaas.getLoadBalancerLegacyName(service)
1740-
loadbalancer, err = getLoadbalancerByName(ctx, lbaas.lb, lbName, legacyName)
1830+
loadbalancer, err = getLoadbalancerByName(ctx, lbaas.lb, lbName, legacyName, lbaas.clusterUID)
17411831
if err != nil {
17421832
if err != cpoerrors.ErrNotFound {
17431833
return nil, fmt.Errorf("error getting loadbalancer for Service %s: %v", serviceName, err)
@@ -1826,11 +1916,19 @@ func (lbaas *LbaasV2) ensureOctaviaLoadBalancer(ctx context.Context, clusterName
18261916
// save address into the annotation
18271917
lbaas.updateServiceAnnotation(service, ServiceAnnotationLoadBalancerAddress, addr)
18281918

1829-
// add LB name to load balancer tags.
1919+
// add LB name and cluster-identity tags to the load balancer.
18301920
if svcConf.supportLBTags {
18311921
lbTags := loadbalancer.Tags
1922+
changed := false
18321923
if !slices.Contains(lbTags, lbName) {
18331924
lbTags = append(lbTags, lbName)
1925+
changed = true
1926+
}
1927+
if tag := clusterIDTag(lbaas.clusterUID); tag != "" && !slices.Contains(lbTags, tag) {
1928+
lbTags = append(lbTags, tag)
1929+
changed = true
1930+
}
1931+
if changed {
18341932
klog.InfoS("Updating load balancer tags", "lbID", loadbalancer.ID, "tags", lbTags)
18351933
if err := openstackutil.UpdateLoadBalancerTags(ctx, lbaas.lb, loadbalancer.ID, lbTags); err != nil {
18361934
return nil, err
@@ -1912,7 +2010,7 @@ func (lbaas *LbaasV2) updateOctaviaLoadBalancer(ctx context.Context, clusterName
19122010
// This is a Service created before shared LB is supported.
19132011
name := lbaas.GetLoadBalancerName(ctx, clusterName, service)
19142012
legacyName := lbaas.getLoadBalancerLegacyName(service)
1915-
loadbalancer, err = getLoadbalancerByName(ctx, lbaas.lb, name, legacyName)
2013+
loadbalancer, err = getLoadbalancerByName(ctx, lbaas.lb, name, legacyName, lbaas.clusterUID)
19162014
if err != nil {
19172015
return err
19182016
}
@@ -2100,7 +2198,7 @@ func (lbaas *LbaasV2) ensureLoadBalancerDeleted(ctx context.Context, clusterName
21002198
loadbalancer, err = openstackutil.GetLoadbalancerByID(ctx, lbaas.lb, svcConf.lbID)
21012199
} else {
21022200
// This may happen when this Service creation was failed previously.
2103-
loadbalancer, err = getLoadbalancerByName(ctx, lbaas.lb, lbName, legacyName)
2201+
loadbalancer, err = getLoadbalancerByName(ctx, lbaas.lb, lbName, legacyName, lbaas.clusterUID)
21042202
}
21052203
if err != nil && !cpoerrors.IsNotFound(err) {
21062204
return err

0 commit comments

Comments
 (0)