Skip to content

Commit 66f54ea

Browse files
stuggiclaude
andcommitted
Fix MinorUpdateOVNDataplane race when OVN image unchanged between versions
Commit 63fd705 removed the nodeset.IsReady() check from DataplaneNodesetsOVNControllerImagesMatch to fix the minor update workflow getting stuck when unrelated deployments were running. That check was too strict — it blocked when any deployment was in progress, even if the OVN update had already completed. However, the remaining pure image comparison is too loose. When the OVN controller image does not change between two OpenStack versions, the nodeset already has the matching image from the previous update. The OpenStackVersion controller then sets MinorUpdateOVNDataplane=True immediately, before the edpm-ovn-update deployment finishes. This causes the subsequent minor update steps (controlplane update, edpm services update) to proceed while the OVN dataplane deployment is still running — resulting in both dataplane deployments running concurrently. Fix this with two mechanisms: 1. IsDataplaneDeploymentRunningForServiceType: a new generic function that lists all in-progress OpenStackDataPlaneDeployment resources, resolves which services each deploys (from ServicesOverride or the nodeset's service list), and checks the service's EDPMServiceType to identify deployments of a given type (e.g. "ovn"). EDPMServiceType is a fixed identifier on the service spec, independent of the service or deployment resource name. 2. Saved condition state tracking: when the OVN image is unchanged between the deployed and target versions, image comparison alone cannot distinguish "already updated" from "not yet updated." To handle this, we use the saved condition state (captured before Init resets conditions each reconciliation) to track whether a running OVN deployment has been observed during this update cycle: - Running OVN deployment seen → set condition False(RequestedReason) - No running deployment + previous condition was RequestedReason → deployment completed, proceed - No running deployment + previous condition was NOT RequestedReason → deployment not created yet, keep waiting The OpenStackVersion controller watches nodesets, so when a deployment starts (nodeset becomes not-Ready) and completes (nodeset becomes Ready), reconciliation is triggered, ensuring we observe the running deployment at least once. When the OVN image differs between versions, the existing image match check is sufficient — the nodeset's ContainerImages are only updated on successful deployment completion, so a match proves the deployment ran. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Martin Schuppert <mschuppert@redhat.com>
1 parent 4e1f4fb commit 66f54ea

2 files changed

Lines changed: 150 additions & 0 deletions

File tree

internal/controller/core/openstackversion_controller.go

Lines changed: 69 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -291,6 +291,75 @@ func (r *OpenStackVersionReconciler) Reconcile(ctx context.Context, req ctrl.Req
291291
Log.Info("Waiting on OVN Dataplane updates to complete")
292292
return ctrl.Result{}, nil
293293
}
294+
295+
// When the OVN controller image is the same between the deployed
296+
// version and the target version, the image comparison above always
297+
// passes because the nodeset already has the matching image from
298+
// the previous update. In this case we need additional checks to
299+
// confirm the OVN dataplane deployment for this update cycle has
300+
// actually completed.
301+
//
302+
// We use the saved condition state (from before Init reset) to
303+
// track whether we have observed a running OVN deployment during
304+
// this update cycle:
305+
// - If we see a running OVN deployment now: set condition False
306+
// (RequestedReason) to record that we observed one
307+
// - If no running OVN deployment AND the previous condition was
308+
// False/RequestedReason: the deployment we saw previously has
309+
// completed → proceed (fall through to set True)
310+
// - If no running OVN deployment AND the previous condition was
311+
// NOT False/RequestedReason (e.g. still Unknown from Init):
312+
// we haven't seen a deployment yet → keep waiting
313+
//
314+
// When the image differs between versions, the image match alone
315+
// is sufficient proof that a deployment updated it, since the
316+
// nodeset's ContainerImages are only set on successful completion.
317+
deployedDefaults, hasDeployedDefaults := instance.Status.ContainerImageVersionDefaults[*instance.Status.DeployedVersion]
318+
if hasDeployedDefaults &&
319+
deployedDefaults.OvnControllerImage != nil &&
320+
instance.Status.ContainerImages.OvnControllerImage != nil &&
321+
*deployedDefaults.OvnControllerImage == *instance.Status.ContainerImages.OvnControllerImage {
322+
323+
ovnDeploymentRunning, err := openstack.IsDataplaneDeploymentRunningForServiceType(
324+
ctx, versionHelper, instance.Namespace, dataplaneNodesets, "ovn")
325+
if err != nil {
326+
return ctrl.Result{}, err
327+
}
328+
329+
if ovnDeploymentRunning {
330+
// OVN deployment is actively running — record this in
331+
// the condition so we can detect its completion later.
332+
instance.Status.Conditions.Set(condition.FalseCondition(
333+
corev1beta1.OpenStackVersionMinorUpdateOVNDataplane,
334+
condition.RequestedReason,
335+
condition.SeverityInfo,
336+
corev1beta1.OpenStackVersionMinorUpdateReadyRunningMessage))
337+
Log.Info("Waiting on OVN Dataplane deployment to complete (OVN image unchanged between versions)")
338+
return ctrl.Result{}, nil
339+
}
340+
341+
// No OVN deployment running. Check the saved condition state
342+
// from the previous reconciliation to determine if we ever
343+
// observed one running during this update cycle.
344+
prevOvnDataplaneCond := savedConditions.Get(corev1beta1.OpenStackVersionMinorUpdateOVNDataplane)
345+
if prevOvnDataplaneCond == nil ||
346+
prevOvnDataplaneCond.Reason != condition.RequestedReason {
347+
// We have never observed a running OVN deployment in
348+
// this update cycle — the deployment has not been
349+
// created yet. Keep waiting.
350+
instance.Status.Conditions.Set(condition.FalseCondition(
351+
corev1beta1.OpenStackVersionMinorUpdateOVNDataplane,
352+
condition.InitReason,
353+
condition.SeverityInfo,
354+
corev1beta1.OpenStackVersionMinorUpdateReadyRunningMessage))
355+
Log.Info("Waiting for OVN Dataplane deployment to be created (OVN image unchanged between versions)")
356+
return ctrl.Result{}, nil
357+
}
358+
// Previously saw a running OVN deployment (condition was
359+
// False/RequestedReason), now no OVN deployment is running
360+
// → the deployment has completed. Fall through to set True.
361+
Log.Info("OVN Dataplane deployment completed (OVN image unchanged between versions)")
362+
}
294363
}
295364
instance.Status.Conditions.MarkTrue(
296365
corev1beta1.OpenStackVersionMinorUpdateOVNDataplane,

internal/openstack/dataplane.go

Lines changed: 81 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -7,6 +7,8 @@ import (
77
corev1beta1 "github.com/openstack-k8s-operators/openstack-operator/api/core/v1beta1"
88

99
dataplanev1 "github.com/openstack-k8s-operators/openstack-operator/api/dataplane/v1beta1"
10+
k8s_errors "k8s.io/apimachinery/pkg/api/errors"
11+
"k8s.io/apimachinery/pkg/types"
1012
"sigs.k8s.io/controller-runtime/pkg/client"
1113
)
1214

@@ -58,6 +60,85 @@ func DataplaneNodesetsOVNControllerImagesMatch(version *corev1beta1.OpenStackVer
5860
return true
5961
}
6062

63+
// IsDataplaneDeploymentRunningForServiceType checks whether any in-progress
64+
// OpenStackDataPlaneDeployment is deploying a service with the given
65+
// EDPMServiceType (e.g. "ovn"). It resolves which services each deployment
66+
// runs (from ServicesOverride or the nodeset's service list) and inspects
67+
// the service's EDPMServiceType to determine if it matches.
68+
func IsDataplaneDeploymentRunningForServiceType(
69+
ctx context.Context,
70+
h *helper.Helper,
71+
namespace string,
72+
dataplaneNodesets *dataplanev1.OpenStackDataPlaneNodeSetList,
73+
serviceType string,
74+
) (bool, error) {
75+
// List all deployments in the namespace
76+
deployments := &dataplanev1.OpenStackDataPlaneDeploymentList{}
77+
opts := []client.ListOption{
78+
client.InNamespace(namespace),
79+
}
80+
err := h.GetClient().List(ctx, deployments, opts...)
81+
if err != nil {
82+
return false, err
83+
}
84+
85+
// Build a map of nodeset name -> nodeset for quick lookup
86+
nodesetMap := make(map[string]*dataplanev1.OpenStackDataPlaneNodeSet, len(dataplaneNodesets.Items))
87+
for i := range dataplaneNodesets.Items {
88+
nodesetMap[dataplaneNodesets.Items[i].Name] = &dataplaneNodesets.Items[i]
89+
}
90+
91+
// Cache service lookups to avoid repeated API calls
92+
serviceCache := make(map[string]*dataplanev1.OpenStackDataPlaneService)
93+
94+
for _, deployment := range deployments.Items {
95+
// Skip completed deployments
96+
if deployment.Status.Deployed {
97+
continue
98+
}
99+
100+
// Determine which services this deployment runs for each of its nodesets
101+
for _, nodesetName := range deployment.Spec.NodeSets {
102+
nodeset, exists := nodesetMap[nodesetName]
103+
if !exists || len(nodeset.Spec.Nodes) == 0 {
104+
continue
105+
}
106+
107+
var services []string
108+
if len(deployment.Spec.ServicesOverride) != 0 {
109+
services = deployment.Spec.ServicesOverride
110+
} else {
111+
services = nodeset.Spec.Services
112+
}
113+
114+
for _, serviceName := range services {
115+
svc, cached := serviceCache[serviceName]
116+
if !cached {
117+
foundService := &dataplanev1.OpenStackDataPlaneService{}
118+
err := h.GetClient().Get(ctx, types.NamespacedName{
119+
Name: serviceName,
120+
Namespace: namespace,
121+
}, foundService)
122+
if err != nil {
123+
if k8s_errors.IsNotFound(err) {
124+
continue
125+
}
126+
return false, err
127+
}
128+
svc = foundService
129+
serviceCache[serviceName] = svc
130+
}
131+
132+
if svc.Spec.EDPMServiceType == serviceType {
133+
return true, nil
134+
}
135+
}
136+
}
137+
}
138+
139+
return false, nil
140+
}
141+
61142
// DataplaneNodesetsDeployed returns true if all nodesets are deployed with the latest version
62143
func DataplaneNodesetsDeployed(version *corev1beta1.OpenStackVersion, dataplaneNodesets *dataplanev1.OpenStackDataPlaneNodeSetList) bool {
63144
for _, nodeset := range dataplaneNodesets.Items {

0 commit comments

Comments
 (0)