[occm] Cross-cluster load balancer name collision when multiple Kubernetes clusters share an OpenStack project

/kind bug

**What happened**:

OCCM identifies an existing Octavia load balancer for a Service by name on
the first reconcile (`getLoadbalancerByName` in `pkg/openstack/loadbalancer.go`).
The name is built from `kube_service_<cluster-name>_<namespace>_<service>`,
where `<cluster-name>` defaults to `kubernetes`. When two Kubernetes clusters
share the same OpenStack project and use the same `--cluster-name` (which
happens by default with many distributions, such as kubeadm and RKE2),
Services with identical namespace and name produce identical load balancer
names.
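A minimal Go sketch of the collision, assuming the name is assembled exactly as described above. `lbName` is a hypothetical helper written for illustration. It is not OCCM's own function, and it ignores any sanitizing or length limits the real code might apply.

```go
package main

import "fmt"

// lbName is a hypothetical helper that mirrors the
// kube_service_<cluster-name>_<namespace>_<service> format described above.
// It is not the function OCCM actually uses.
func lbName(clusterName, namespace, service string) string {
	return fmt.Sprintf("kube_service_%s_%s_%s", clusterName, namespace, service)
}

func main() {
	// Two independent clusters, both left on the default --cluster-name.
	a := lbName("kubernetes", "default", "web")
	b := lbName("kubernetes", "default", "web")
	fmt.Println(a)      // kube_service_kubernetes_default_web
	fmt.Println(a == b) // true: nothing in the name distinguishes the clusters
}
```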

Octavia does not require load balancer names to be unique within a project,
so OCCM in cluster B silently adopts cluster A's load balancer: it sets the
`loadbalancer.openstack.org/load-balancer-id` annotation on its own Service
and starts reconciling cluster A's load balancer (rewriting its listeners,
members, floating IP, etc.). Cluster A then loses its load balancer.

This is the same root cause discussed in the closed issues #2241, #2571 and
#2624. The accepted upstream guidance is "use a unique `--cluster-name`",
which is correct but does not *defend* against the failure mode at all:
two operators independently bootstrapping clusters in the same tenant will
keep hitting it.

**What you expected to happen**:

OCCM should never adopt a load balancer that is owned by a different
Kubernetes cluster, even when names collide. A unique `--cluster-name`
should be a recommendation, not the only safety mechanism.

**How to reproduce it (as minimally and precisely as possible)**:

1. Create two Kubernetes clusters (cluster A and cluster B) in the same
   OpenStack project. Both run OCCM with the default
   `--cluster-name=kubernetes` (or any matching value).
2. On cluster A: `kubectl create deployment web --image nginx --port 80 && kubectl expose deployment web --type LoadBalancer --port 6666 --target-port 80`.
   An Octavia LB named `kube_service_kubernetes_default_web` is created
   in OpenStack.
3. On cluster B: same commands, exposing the service on port 8888.
4. Observe that no second load balancer is created. Instead, OCCM in
   cluster B locates cluster A's LB by name, annotates its own Service
   with the same `load-balancer-id`, and rewrites the LB to point at
   cluster B's nodes on port 8888. Cluster A's Service is now broken.

**Anything else we need to know?**:

I'd like to propose adding a stable cluster identifier (the UID of the
`kube-system` namespace) as a load balancer tag of the form
`kube_cluster_id_<uid>`. The lookup would treat a load balancer with a
foreign `kube_cluster_id_*` tag as not-found instead of adopting it, and
fall back to the legacy behaviour for load balancers that don't carry any
`kube_cluster_id_*` tag (existing deployments and externally-created LBs).

This is a strictly additive, backward-compatible change that defends
against the failure mode without forcing operators to coordinate
`--cluster-name` values. I have a working implementation and will open a
PR shortly that links this issue.
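On the creation side, "strictly additive" could look like the Go sketch below. The cluster tag is appended to whatever tags the load balancer already has, nothing is removed, and reapplying it is a no-op. `withClusterTag` is a hypothetical helper, and `team-x` is a placeholder for any pre-existing tag.

```go
package main

import "fmt"

const clusterIDTagPrefix = "kube_cluster_id_"

// withClusterTag returns a copy of tags with this cluster's ID tag appended,
// leaving existing tags untouched. Calling it on an already-tagged list
// returns an equal list.
func withClusterTag(tags []string, clusterID string) []string {
	tag := clusterIDTagPrefix + clusterID
	// Copy first so the caller's slice (and its backing array) is never modified.
	out := append([]string(nil), tags...)
	for _, t := range out {
		if t == tag {
			return out // already tagged: nothing to add
		}
	}
	return append(out, tag)
}

func main() {
	// "team-x" stands in for any tag that already exists on the load balancer.
	tags := withClusterTag([]string{"team-x"}, "uid-a")
	fmt.Println(tags)                               // [team-x kube_cluster_id_uid-a]
	fmt.Println(len(withClusterTag(tags, "uid-a"))) // 2
}
```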

**Environment**:
- openstack-cloud-controller-manager version: master (reproduced against
  v1.33.0 as well)
- OpenStack version: any with Octavia tags support (>= API v2.5 / Stein)
- Others: N/A
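Because tagging depends on the Octavia API version, an implementation would presumably check it before relying on tags. Below is a minimal Go sketch. It assumes the version arrives as a string like `v2.5` and uses the v2.5 threshold stated above. `supportsLBTags` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// supportsLBTags reports whether a "vMAJOR.MINOR" Octavia API version is at
// least v2.5, the threshold noted above for tag support.
func supportsLBTags(version string) (bool, error) {
	majorStr, minorStr, ok := strings.Cut(strings.TrimPrefix(version, "v"), ".")
	if !ok {
		return false, fmt.Errorf("unexpected version %q", version)
	}
	major, err := strconv.Atoi(majorStr)
	if err != nil {
		return false, fmt.Errorf("unexpected version %q: %w", version, err)
	}
	minor, err := strconv.Atoi(minorStr)
	if err != nil {
		return false, fmt.Errorf("unexpected version %q: %w", version, err)
	}
	// Compare numerically: a plain string comparison would rank "v2.10" below "v2.5".
	return major > 2 || (major == 2 && minor >= 5), nil
}

func main() {
	for _, v := range []string{"v2.4", "v2.5", "v2.10"} {
		ok, err := supportsLBTags(v)
		fmt.Println(v, ok, err)
	}
}
```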

[occm] Cross-cluster load balancer name collision when multiple Kubernetes clusters share an OpenStack project #3102