
[DNM] [update] fix: stop control plane check during split variant post-services tests#3978

Draft
Valkyrie00 wants to merge 1 commit into
openstack-k8s-operators:mainfrom
Valkyrie00:fix/OSPCIX-1381-stop-bg-tests-before-split-intermediate

Conversation

@Valkyrie00 Valkyrie00 commented Jun 3, 2026

Context

This is a targeted fix to unblock the job which is currently failing due to a race condition between the continuous control plane check and tobiko tests.

A more comprehensive solution, such as adding a pause/resume mechanism to the control plane check scripts or reworking how background tests interact with the split update variant, could be evaluated as a follow-up.

Problem

Some jobs fail because tobiko tests encounter VMs in BUILD state during the post-services test phase.

The root cause is a race condition: the continuous control plane check (cifmw_update_control_plane_check) runs workload_launch.sh in an infinite loop, creating and tearing down VMs every few seconds.

In the split variant, tests run between the services update and system update phases while this loop is still active.
Tobiko lists all servers, finds a VM mid-creation, and fails:
novaclient.exceptions.Conflict: Cannot 'reboot' instance ... while it is in vm_state building (HTTP 409)

Proposed solution

Stop the control plane check before running the post-services tests in update_variant_split.yml, then restart it so it monitors the system update phase as well.
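As a rough illustration of the stop/restart pattern described above, here is a minimal POSIX shell sketch. It is not the role's actual implementation: `start_check`, `stop_check`, `check_running`, and `PID_FILE` are hypothetical names, and the `sleep` loop is only a stand-in for the real background check.

```shell
#!/bin/sh
# Hypothetical sketch only: these helper names and the PID file are
# assumptions for illustration, not taken from the real role or scripts.

PID_FILE="${PID_FILE:-${TMPDIR:-/tmp}/cp-check-demo.$$.pid}"

# Start a stand-in background loop and record its PID.
start_check() {
    ( while true; do sleep 1; done ) >/dev/null 2>&1 &
    echo "$!" > "$PID_FILE"
}

# Stop the loop recorded in PID_FILE and wait for it to be reaped.
stop_check() {
    pid="$(cat "$PID_FILE")"
    kill "$pid" 2>/dev/null || true
    wait "$pid" 2>/dev/null || true
    rm -f "$PID_FILE"
}

# Succeed only while the recorded loop is still alive.
check_running() {
    [ -f "$PID_FILE" ] && kill -0 "$(cat "$PID_FILE")" 2>/dev/null
}
```

Under these assumptions, the split variant would call `stop_check` before the post-services tests and `start_check` again afterward, so no transient resources are created while the tests list servers.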

The ping test is not stopped because it only runs ping against an existing VM and does not create any new OpenStack resources.

Each control plane check run produces PID-scoped log files (control-plane-test-<PID>.log), so the stop/restart cycle creates two separate runs, each validated independently, with no log collisions.
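The PID-scoped naming can be sketched as follows. Only the `control-plane-test-<PID>.log` pattern comes from the description above; `run_once`, `LOG_DIR`, and the log contents are hypothetical, and each "run" here is just a short-lived child shell rather than the real check.

```shell
#!/bin/sh
# Hypothetical sketch: run_once and LOG_DIR are illustrative assumptions.
# Only the control-plane-test-<PID>.log naming pattern is from the PR text.

LOG_DIR="${LOG_DIR:-${TMPDIR:-/tmp}/cp-check-logs.$$}"

# Simulate one check run. The child shell expands $$ to its own PID,
# so each run writes to its own log file.
run_once() {
    mkdir -p "$LOG_DIR"
    sh -c 'echo "run started" > "$1/control-plane-test-$$.log"' _ "$LOG_DIR"
}
```

Calling `run_once` twice, as a stop/restart cycle would, leaves two log files side by side instead of one overwritten file, which is what allows each run to be validated on its own.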

Closes: OSPCIX-1381
Assisted-By: Cursor

…tests

In the split update variant, the continuous control plane check
runs in the background creating and deleting VMs in a loop. When
post-services tests (tobiko) execute between the services and system
update phases, they discover transient VMs in BUILD state and fail
with HTTP 409 conflicts.

Stop the control plane check before running the post-services tests
and restart it afterward so it continues monitoring during the system
update phase. The ping test is left running since it only pings an
existing VM and does not create new resources.

Closes: OSPCIX-1381
Signed-off-by: Vito Castellano <vcastell@redhat.com>

openshift-ci Bot commented Jun 3, 2026

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

openshift-ci Bot commented Jun 3, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign sathlan for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@Valkyrie00 Valkyrie00 changed the title from "[DNM] [update] Stop control plane check during split variant post-services tests" to "[DNM] [update] fix: stop control plane check during split variant post-services tests" on Jun 4, 2026