
NO-JIRA: fix machine sync e2e test flake #612

Open
stefanonardo wants to merge 1 commit into openshift:main from stefanonardo:fix-machine-sync-infra-diff

Conversation

@stefanonardo

@stefanonardo stefanonardo commented Jun 26, 2026


Summary

  • Wait for the CAPI Machine to reach Running phase before capturing the infrastructure machine UID and asserting stability
  • The test was flaky because it started the stability check before provisioning completed. When the MAPI machine controller set providerID, the sync controller saw a spec diff on the infra machine and performed a legitimate delete and recreate, which changed the UID within the Consistently window

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Tests
    • Improved end-to-end validation to wait for machine startup before checking sync status and stability, making the test more reliable.

Wait for CAPI Machine to reach Running phase before asserting infra
machine UID stability, so the providerID-triggered recreate has
already happened.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@openshift-merge-bot

Contributor

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after the lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will use /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: LGTM mode

@openshift-ci-robot


@stefanonardo: This pull request explicitly references no jira issue.


openshift-ci-robot added the jira/valid-reference label on Jun 26, 2026
openshift-ci Bot requested review from RadekManak and nrb on June 26, 2026 12:08

@coderabbitai

coderabbitai Bot commented Jun 26, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 490b4a3f-5142-42ec-8616-e220fde86e41

📥 Commits

Reviewing files that changed from the base of the PR and between 925d57a and fc943a7.

📒 Files selected for processing (1)
  • e2e/machine_sync_test.go

Walkthrough

The e2e machine sync test now waits for the CAPI Machine to reach Running before validating the infrastructure machine and sync-related conditions.

Changes

Machine sync e2e timing

| Layer / File(s) | Summary |
|---|---|
| Wait for CAPI Machine Running (e2e/machine_sync_test.go) | The test creates a minimal clusterv1.Machine reference, calls verifyMachineRunning, and then continues with infrastructure-machine lookup and stability checks. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Suggested labels

approved, lgtm, verified

Suggested reviewers

  • racheljpg
  • RadekManak
🚥 Pre-merge checks | ✅ 14 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Microshift Test Compatibility | ⚠️ Warning | The test has no MicroShift skip/tag and uses config.openshift.io plus Machine API resources, which MicroShift doesn’t serve. | Add an [apigroup:machine.openshift.io] or [Skipped:MicroShift] tag, or guard with exutil.IsMicroShiftCluster() and g.Skip(). |

✅ Passed checks (14 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly matches the main change: fixing the machine sync e2e test flake. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Stable And Deterministic Test Names | ✅ Passed | PASS: e2e/machine_sync_test.go uses only static Ginkgo titles ('Machine Sync', one It); the new wait logic adds no dynamic data to test names. |
| Test Structure And Quality | ✅ Passed | The test has one focused behavior, uses DeferCleanup for resources, and adds explicit timeouts via verifyMachineRunning and Consistently. |
| Single Node Openshift (Sno) Test Compatibility | ✅ Passed | No multi-node or HA assumptions found; the test only waits for a Machine to run and checks UID stability, with no SNO-unsafe logic. |
| Topology-Aware Scheduling Compatibility | ✅ Passed | Only e2e/machine_sync_test.go changed; it waits for Machine Running and adds no manifests/controllers or scheduling constraints. |
| Ote Binary Stdout Contract | ✅ Passed | The only change is a new wait inside an It body; no main/init/TestMain/BeforeSuite stdout writes were added, and suite setup already uses stderr. |
| Ipv6 And Disconnected Network Test Compatibility | ✅ Passed | The new wait only polls cluster resources; I found no hardcoded IPv4s, IP-family assumptions, or public-internet calls in the changed test or helper. |
| No-Weak-Crypto | ✅ Passed | Only adds a wait in an e2e test; no MD5/SHA1/DES/RC4/3DES/Blowfish/ECB, custom crypto, or secret/token comparisons appear in the change. |
| Container-Privileges | ✅ Passed | PASS: The PR only changes e2e/machine_sync_test.go, and it contains no container/K8s securityContext or privilege flags. |
| No-Sensitive-Data-In-Logs | ✅ Passed | The PR only adds a wait/assertion in a test; no new logging or sensitive-data exposure is present. |

@mdbooth

mdbooth commented Jun 26, 2026

Contributor

/approve
/lgtm

openshift-ci Bot added the lgtm label on Jun 26, 2026

@openshift-merge-bot

Contributor

Scheduling tests matching the pipeline_run_if_changed or not excluded by pipeline_skip_if_only_changed parameters:
/test e2e-aws-capi-disconnected-techpreview
/test e2e-aws-capi-techpreview
/test e2e-aws-capi-techpreview-post-install
/test e2e-aws-ovn-techpreview
/test e2e-aws-ovn-techpreview-upgrade
/test e2e-azure-capi-techpreview
/test e2e-azure-ovn-techpreview
/test e2e-azure-ovn-techpreview-upgrade
/test e2e-gcp-capi-techpreview
/test e2e-gcp-ovn-techpreview
/test e2e-metal3-capi-techpreview
/test e2e-openstack-capi-techpreview
/test e2e-vsphere-capi-techpreview
/test regression-clusterinfra-aws-ipi-techpreview-capi

@openshift-ci

openshift-ci Bot commented Jun 26, 2026

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: mdbooth


openshift-ci Bot added the approved label on Jun 26, 2026

@openshift-ci

openshift-ci Bot commented Jun 26, 2026

Contributor

@stefanonardo: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/e2e-gcp-ovn-techpreview | fc943a7 | link | true | /test e2e-gcp-ovn-techpreview |
| ci/prow/e2e-aws-capi-techpreview-post-install | fc943a7 | link | true | /test e2e-aws-capi-techpreview-post-install |
| ci/prow/e2e-metal3-capi-techpreview | fc943a7 | link | false | /test e2e-metal3-capi-techpreview |



Labels

  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • jira/valid-reference: Indicates that this PR references a valid Jira ticket of any type.
  • lgtm: Indicates that a PR is ready to be merged.
