[occm] Support instance discovery via system UUID instead of hostname-only matching #3093

@fvrk4n

Description

Problem

The current getServerByName() implementation in pkg/openstack/instances.go discovers OpenStack instances by matching the Kubernetes node name against Nova server names using an exact regex match:

opts := servers.ListOpts{
    Name: fmt.Sprintf("^%s$", regexp.QuoteMeta(name)),
}
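
To make the exact-match behaviour concrete, the sketch below rebuilds the same anchored pattern and evaluates it locally. Nova applies the name filter server-side, so compiling it with Go's regexp package is only an approximation. The helper names and the suffixed server name are hypothetical, chosen for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// namePattern mirrors the filter built in the snippet above: the node name is
// escaped and anchored, so only an identical server name can satisfy it.
func namePattern(nodeName string) string {
	return fmt.Sprintf("^%s$", regexp.QuoteMeta(nodeName))
}

// matchesServer reports whether serverName satisfies the anchored pattern.
// This is a local approximation of the server-side filter (assumption).
func matchesServer(nodeName, serverName string) bool {
	return regexp.MustCompile(namePattern(nodeName)).MatchString(serverName)
}

func main() {
	node := "cpe-central-worker-1"
	// Identical names match; a name with an extra suffix does not.
	fmt.Println(matchesServer(node, "cpe-central-worker-1"))
	fmt.Println(matchesServer(node, "cpe-central-worker-1-x7k2"))
}
```

Any difference between the two names, including a generated suffix, produces an empty result list rather than an error.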

This lookup silently returns nothing when the Kubernetes node name does not exactly match the OpenStack instance name, which is common with:

  • RKE2/K3s: hostnames set via --node-name or OS hostname may differ from Terraform resource names
  • Autoscaling: instance names include random suffixes that don't match kubelet hostnames
  • Enterprise naming: organizations may have different naming conventions for infra vs K8s

When discovery fails, OCCM sets no providerID, applies no zone/region labels, and logs no error, which makes the failure extremely hard to diagnose.

Proposed Solution

Kubernetes nodes already expose the SMBIOS system UUID via node.status.nodeInfo.systemUUID. OpenStack sets the Nova instance UUID as the guest's SMBIOS product UUID, making this a reliable 1:1 mapping.
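
Because systemUUID arrives as a free-form string, it may be worth normalizing and validating it before spending an API call on it. The following is a minimal sketch under that assumption; normalizeSystemUUID is a hypothetical helper, not an existing OCCM function.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// canonicalUUID matches the lowercase 8-4-4-4-12 hexadecimal form.
var canonicalUUID = regexp.MustCompile(
	`^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$`)

// normalizeSystemUUID (hypothetical helper) trims and lowercases a raw
// systemUUID and rejects values that are not in canonical UUID form.
func normalizeSystemUUID(raw string) (string, error) {
	id := strings.ToLower(strings.TrimSpace(raw))
	if !canonicalUUID.MatchString(id) {
		return "", fmt.Errorf("system UUID %q is not a canonical UUID", raw)
	}
	return id, nil
}

func main() {
	// An uppercase value with trailing whitespace is accepted and normalized.
	id, err := normalizeSystemUUID("0E6D1A84-B37C-40D2-BB69-76F9B22FD5BB\n")
	fmt.Println(id, err)

	// A malformed value is rejected so the caller can skip the UUID lookup.
	_, err = normalizeSystemUUID("not-a-uuid")
	fmt.Println(err)
}
```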

The getInstance() function should try multiple discovery strategies in order:

func (i *InstancesV2) getInstance(ctx context.Context, node *v1.Node) (*servers.Server, error) {
    // 1. If providerID already set, use it (existing behavior)
    if node.Spec.ProviderID != "" {
        return getServerByProviderID(...)
    }

    // 2. Try system UUID — Nova servers.Get(uuid) — most reliable
    if uuid := node.Status.NodeInfo.SystemUUID; uuid != "" {
        srv, err := servers.Get(ctx, i.compute, strings.ToLower(uuid)).Extract()
        if err == nil {
            return srv, nil
        }
        klog.V(4).Infof("Failed to find instance by system UUID %s: %v, falling back to name match", uuid, err)
    }

    // 3. Fallback to name match (existing behavior)
    return getServerByName(ctx, i.compute, node.Name)
}

This is:

  • Zero-config: No new cloud.conf options needed
  • Backward-compatible: Falls back to name match if UUID lookup fails
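  • Reliable: Nova exposes the instance UUID to the guest as its SMBIOS system UUID (bare-metal instances may be an exception, in which case the name fallback still applies)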
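
The UUID-then-name order proposed above can be exercised without an OpenStack cloud. The sketch below replaces the Nova client with a hypothetical in-memory lookup (the Server, lookup, and fakeNova types and the server name tf-worker-1 are all invented for illustration) and omits the providerID step. It is a model of the fallback logic, not the OCCM implementation.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Server is a hypothetical stand-in for the Nova server type.
type Server struct {
	ID   string
	Name string
}

// lookup is a hypothetical abstraction over the two Nova queries.
type lookup interface {
	ByID(id string) (*Server, error)
	ByName(name string) (*Server, error)
}

var errNotFound = errors.New("server not found")

// fakeNova is an in-memory lookup used only for this illustration.
type fakeNova struct{ servers []Server }

func (f fakeNova) ByID(id string) (*Server, error) {
	for i := range f.servers {
		if f.servers[i].ID == id {
			return &f.servers[i], nil
		}
	}
	return nil, errNotFound
}

func (f fakeNova) ByName(name string) (*Server, error) {
	for i := range f.servers {
		if f.servers[i].Name == name {
			return &f.servers[i], nil
		}
	}
	return nil, errNotFound
}

// discover tries the system UUID first and falls back to the name match.
// The second return value names the strategy that succeeded.
func discover(api lookup, nodeName, systemUUID string) (*Server, string, error) {
	if systemUUID != "" {
		if srv, err := api.ByID(strings.ToLower(systemUUID)); err == nil {
			return srv, "uuid", nil
		}
	}
	srv, err := api.ByName(nodeName)
	if err != nil {
		return nil, "", fmt.Errorf("node %q: no instance by UUID or name: %w", nodeName, err)
	}
	return srv, "name", nil
}

func main() {
	nova := fakeNova{servers: []Server{
		{ID: "a3634fe4-e3c5-486d-918e-97df55ad476c", Name: "tf-worker-1"},
	}}
	// The node name differs from the server name, so only the UUID can match.
	srv, how, err := discover(nova, "cpe-central-worker-1", "A3634FE4-E3C5-486D-918E-97DF55AD476C")
	if err != nil {
		fmt.Println("discovery failed:", err)
		return
	}
	fmt.Println(srv.Name, "found via", how)
}
```

Like the proposal, this sketch falls back on any UUID lookup error. A real implementation might prefer to fall back only on "not found" and surface other API errors.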

Evidence

On an RKE2 cluster where node hostnames don't match Nova server names, systemUUID correctly maps to the OpenStack instance:

$ kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.systemUUID}{"\n"}{end}'
cpe-central-master-1    0e6d1a84-b37c-40d2-bb69-76f9b22fd5bb
cpe-central-master-2    989ec61a-37e2-4f7a-9927-4833f55be53e
cpe-central-master-3    24038e8a-fa44-45dc-9bcf-8e7ba04593eb
cpe-central-worker-1    a3634fe4-e3c5-486d-918e-97df55ad476c
cpe-central-worker-2    c9410041-4b6e-4e17-8b8f-b7ffbf9525bc
cpe-central-worker-3    4c0fbb6a-ff80-4c5e-b783-46abccb04677

All UUIDs match their corresponding Nova instance IDs.

Current Workarounds

  • Ensure K8s node hostnames exactly match OpenStack instance names (fragile, not always possible)
  • Set --provider-id=openstack:///UUID on kubelet at boot time (requires infra-level changes per node)

Both workarounds require manual coordination between infra provisioning and K8s node registration, which the UUID-based discovery would eliminate entirely.
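
For reference, the providerID value used in the second workaround can be derived mechanically from the instance UUID. The sketch below handles only the openstack:///UUID form quoted above. The helper names are hypothetical, and any other providerID layout is deliberately rejected rather than guessed at.

```go
package main

import (
	"fmt"
	"strings"
)

// providerIDPrefix follows the openstack:///UUID form quoted in the workaround.
const providerIDPrefix = "openstack:///"

// providerIDFor (hypothetical helper) builds a providerID from an instance UUID.
func providerIDFor(instanceUUID string) string {
	return providerIDPrefix + strings.ToLower(instanceUUID)
}

// instanceUUIDFrom (hypothetical helper) extracts the UUID from a providerID
// of the form above. It reports false for any other layout.
func instanceUUIDFrom(providerID string) (string, bool) {
	if !strings.HasPrefix(providerID, providerIDPrefix) {
		return "", false
	}
	id := strings.TrimPrefix(providerID, providerIDPrefix)
	if id == "" || strings.Contains(id, "/") {
		return "", false
	}
	return id, true
}

func main() {
	pid := providerIDFor("C9410041-4B6E-4E17-8B8F-B7FFBF9525BC")
	fmt.Println(pid)

	id, ok := instanceUUIDFrom(pid)
	fmt.Println(id, ok)
}
```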

Additional Context

The silent failure when name matching fails (no error logged, providerID simply never set) makes debugging this issue very difficult. Even with --v=2, there is no indication that instance discovery failed. At minimum, a warning log when getServerByName returns no results would help operators diagnose mismatches.
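
As a rough illustration of the minimum fix, the sketch below builds the warning text from the number of servers a name lookup returned. nameLookupWarning is a hypothetical helper and the message wording is only a suggestion. In OCCM the message would presumably be emitted through klog rather than printed.

```go
package main

import "fmt"

// nameLookupWarning (hypothetical helper) returns a warning for name lookups
// that found zero or several servers. The boolean is false when exactly one
// server matched and nothing needs to be logged.
func nameLookupWarning(nodeName string, matches int) (string, bool) {
	switch {
	case matches == 0:
		return fmt.Sprintf(
			"no OpenStack server named %q; providerID and zone/region labels will not be set",
			nodeName), true
	case matches > 1:
		return fmt.Sprintf(
			"%d OpenStack servers named %q; name-based discovery is ambiguous",
			matches, nodeName), true
	default:
		return "", false
	}
}

func main() {
	if msg, warn := nameLookupWarning("cpe-central-worker-1", 0); warn {
		fmt.Println("WARNING:", msg)
	}
}
```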
