net, tests, stuntime: Wait for VMI affinity reconciliation before migration#5347
net, tests, stuntime: Wait for VMI affinity reconciliation before migration#5347Anatw wants to merge 2 commits into
Conversation
|
Warning Review limit reached
Next review available in: 51 minutes Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available. How can I continue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews. How do review limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window. Please refer docs for additional details. Review details⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (4)
📝 WalkthroughWalkthroughThis PR adds a VM helper that waits for VMI affinity to match the VM template, then updates L2 bridge and localnet migration stuntime tests to wait for that state, verify post-migration node placement, and measure stuntime afterward. ChangesAffinity-aware migration stuntime checks
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related issues
Possibly related PRs
Suggested reviewers
Caution Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional.
❌ Failed checks (1 error)
✅ Passed checks (4 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Warning Review ran into problems🔥 ProblemsLinked repositories: Your configuration references 1 linked repositories, but your current plan allows 0. Analyzed ``, skipped Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Report bugs in Issues Welcome! 🎉This pull request will be automatically processed with the following features: 🔄 Automatic Actions
📋 Available CommandsPR Status Management
Review & Approval
Testing & Validation
Container Operations
Cherry-pick Operations
Branch Management
Label Management
✅ Merge RequirementsThis PR will be automatically approved when the following conditions are met:
📊 Review ProcessApprovers and ReviewersApprovers:
Reviewers:
Available Labels
AI Features
Security Checks
💡 Tips
For more information, please refer to the project documentation or contact the maintainers. |
|
/wip |
|
/build-and-push-container |
|
New container for quay.io/openshift-cnv/openshift-virtualization-tests:pr-5347 published |
d503c68 to
31df3e8
Compare
|
/wip cancel |
|
/build-and-push-container |
|
New container for quay.io/openshift-cnv/openshift-virtualization-tests:pr-5347 published |
|
/verified Tested on bm03-tlv2 openshift-virtualization-tests-runner/5829/ |
Test execution plan request detailsCRITICAL: You MUST post an inline review comment on the first changed line of the first file. As an expert software testing engineer, analyze all modified files in this PR and create a targeted test execution plan. Analysis Requirements:
Output rules: Your deliverable: Test Execution Plan
Use these formats:
Real test commands (MANDATORY when changes affect session/runtime code): When the affected code runs at session/collection time (conftest fixtures, pytest plugins,
Example output for a session-startup fix: Guidelines:
Hardware-Related Checks (SR-IOV, GPU, DPDK): When PR modifies fixtures for hardware-specific resources:
CRITICAL WORKFLOW COMPLETION RULES: When responding to this test execution plan request, you MUST follow these rules EXACTLY:
ABSOLUTE PROHIBITIONS (violating these creates empty/meaningless reviews):
Remember: The pull request review is visible to users. Additional comments are redundant noise. CRITICAL — Verification rules for smoke and gating decisions: |
|
Tip For best results, initiate chat on the files or code changes. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
set -euo pipefail
repo="RedHatQE/openshift-virtualization-tests"
pr="5347"
echo "== PR files =="
gh pr view "$pr" --repo "$repo" --json files --jq '.files[].path'
echo
echo "== First changed file and patch head =="
first_file=$(gh pr view "$pr" --repo "$repo" --json files --jq '.files[0].path')
echo "$first_file"
gh pr diff "$pr" --repo "$repo" --patch | awk -v f="$first_file" '
$0 ~ ("^diff --git a/" f " b/" f "$") {infile=1; print; next}
infile && /^diff --git / {exit}
infile {print}
' | sed -n '1,120p'
echo
echo "== Modified symbols in libs/utilities =="
git diff --unified=0 HEAD~1 -- libs utilities || true
echo
echo "== Outline of changed library file =="
ast-grep outline libs/vm/vm.py --view expanded | sed -n '1,220p'
echo
echo "== Direct callers / imports of wait_for_vmi_affinity in tests and conftest =="
rg -n -C2 'wait_for_vmi_affinity|set_template_affinity' tests
echo
echo "== Smoke-marked tests =="
rg -n '`@pytest`\.mark\.smoke' tests || true
echo
echo "== Gating-marked tests =="
rg -n '`@pytest`\.mark\.gating' tests || true
echo
echo "== Relevant changed test files with markers and test definitions =="
for f in tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py tests/network/localnet/migration_stuntime/test_migration_stuntime.py; do
echo "--- $f ---"
sed -n '1,260p' "$f" | nl -ba | sed -n '1,260p'
doneLength of output: 3439 🏁 Script executed: #!/bin/bash
set -euo pipefail
repo="RedHatQE/openshift-virtualization-tests"
pr="5347"
echo "== PR head sha =="
gh pr view "$pr" --repo "$repo" --json headRefOid --jq '.headRefOid'
echo
echo "== Changed files and additions/deletions =="
gh pr view "$pr" --repo "$repo" --json files --jq '.files[] | [.path, .additions, .deletions] | `@tsv`'
echo
echo "== Current line numbers around first changed hunk in libs/vm/vm.py =="
nl -ba libs/vm/vm.py | sed -n '1,190p'
echo
echo "== Diff for libs/vm/vm.py =="
git diff --unified=3 HEAD~1 -- libs/vm/vm.py | sed -n '1,220p'
echo
echo "== Diff for tests/network/libs/stuntime.py =="
git diff --unified=3 HEAD~1 -- tests/network/libs/stuntime.py | sed -n '1,220p'
echo
echo "== Usages of wait_for_vmi_affinity =="
rg -n -C2 '\bwait_for_vmi_affinity\s*\(' tests libs
echo
echo "== Usages of measure_stuntime and migrate_vm_and_verify =="
rg -n -C2 '\bmeasure_stuntime\s*\(|\bmigrate_vm_and_verify\s*\(' tests/network
echo
echo "== Markers in changed test files =="
for f in tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py tests/network/localnet/migration_stuntime/test_migration_stuntime.py; do
echo "--- $f markers ---"
rg -n '`@pytest`\.mark\.(smoke|gating|tier[0-9]+|polarion|parametrize)' "$f" || true
done
echo
echo "== Callers of tests/network/libs/stuntime.py helpers =="
rg -n -C2 'from tests\.network\.libs\.stuntime import|import tests\.network\.libs\.stuntime|measure_stuntime|verify_ping_stability|continuous_ping' tests | sed -n '1,260p'
echo
echo "== Smoke-marked file paths =="
rg -l '`@pytest`\.mark\.smoke' tests | sed -n '1,120p'
echo
echo "== Gating-marked file paths =="
rg -l '`@pytest`\.mark\.gating' tests | sed -n '1,120p'Length of output: 2441 🏁 Script executed: #!/bin/bash
set -euo pipefail
repo="RedHatQE/openshift-virtualization-tests"
pr="5347"
echo "== libs/vm/vm.py around first hunk =="
cat -n libs/vm/vm.py | sed -n '1,180p'
echo
echo "== Diff libs/vm/vm.py =="
git diff --unified=3 HEAD~1 -- libs/vm/vm.py | sed -n '1,220p'
echo
echo "== Diff tests/network/libs/stuntime.py =="
git diff --unified=3 HEAD~1 -- tests/network/libs/stuntime.py | sed -n '1,220p'
echo
echo "== Diff changed test files =="
for f in tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py tests/network/localnet/migration_stuntime/test_migration_stuntime.py; do
echo "--- $f ---"
git diff --unified=3 HEAD~1 -- "$f" | sed -n '1,260p'
done
echo
echo "== wait_for_vmi_affinity callers =="
rg -n -C2 '\bwait_for_vmi_affinity\s*\(' tests libs || true
echo
echo "== measure_stuntime callers =="
rg -n -C2 '\bmeasure_stuntime\s*\(' tests || true
echo
echo "== migration_stuntime test definitions and markers =="
for f in tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py tests/network/localnet/migration_stuntime/test_migration_stuntime.py; do
echo "--- $f markers ---"
rg -n '`@pytest`\.mark\.(smoke|gating|tier[0-9]+|polarion|parametrize)' "$f" || true
echo "--- $f test defs ---"
rg -n '^\s*def test_' "$f" || true
done
echo
echo "== smoke-marked files =="
rg -l '`@pytest`\.mark\.smoke' tests || true
echo
echo "== gating-marked files =="
rg -l '`@pytest`\.mark\.gating' tests || true
echo
echo "== parent conftest references to modified symbol =="
rg -n -C2 'wait_for_vmi_affinity|set_template_affinity' tests/**/conftest.py tests/conftest.py || true
echo
echo "== First changed line in first file from PR patch =="
gh api "repos/$repo/pulls/$pr/files?per_page=100" --jq '.[] | select(.filename=="libs/vm/vm.py") | .patch' \
| python - <<'PY'
import sys,re
patch=sys.stdin.read().splitlines()
new_line=None
cur=None
for line in patch:
m=re.match(r'^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@', line)
if m:
cur=int(m.group(1))
continue
if cur is None:
continue
if line.startswith('+') and not line.startswith('+++'):
new_line=cur
print(new_line)
break
if line.startswith('-') and not line.startswith('---'):
continue
cur+=1
PYLength of output: 42844 🏁 Script executed: #!/bin/bash
set -euo pipefail
repo="RedHatQE/openshift-virtualization-tests"
pr="5347"
commit_id="c61964437a716526e0309d0d8e6d8481a50a28cb"
path="libs/vm/vm.py"
line="13"
body="$(cat <<'EOF'
**Test Execution Plan**
- **Run smoke tests: False** — no verified dependency path from any `@pytest.mark.smoke` test or parent `conftest.py` to `libs/vm/vm.py::BaseVirtualMachine.wait_for_vmi_affinity()` or `tests/network/libs/stuntime.py::assert_affinity_after_migration()`.
- **Run gating tests: False** — no verified dependency path from any `@pytest.mark.gating` test to the changed symbols.
- **Affected tests to run**
- `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_off_server_node` (tier3)
- `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_to_server_node` (tier3)
- `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_off_client_node` (tier3)
- `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_to_client_node` (tier3)
- `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_off_server_node` (tier3)
- `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_to_server_node` (tier3)
- `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_off_client_node` (tier3)
- `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_to_client_node` (tier3)
These are the concrete affected paths because the changed tests now call `BaseVirtualMachine.wait_for_vmi_affinity()` directly after `set_template_affinity()`, and they conditionally call `tests/network/libs/stuntime.py::assert_affinity_after_migration()` after `migrate_vm_and_verify()`.
**Real tests (cluster required)**
Error path (the fix):
`pytest -s -o log_cli=true -m tier3 --jira tests/network/localnet/migration_stuntime/test_migration_stuntime.py -k 'test_client_migrates_off_server_node or test_server_migrates_off_client_node'`
Expected: after `set_template_affinity()`, the VMI affinity converges before migration, the migrated VM lands with matching anti-affinity rules, and stuntime stays within `STUNTIME_THRESHOLD_SECONDS`.
Happy path (regression):
`pytest -s -o log_cli=true -m tier3 --jira tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py -k 'test_client_migrates_to_server_node or test_server_migrates_to_client_node'`
Expected: affinity-targeted migrations still succeed, the target pod reflects the expected affinity type when `CNV-90576` is open, and stuntime stays within `STUNTIME_THRESHOLD_SECONDS`.
EOF
)"
post_inline() {
gh api \
--method POST \
"repos/$repo/pulls/$pr/comments" \
-f body="$body" \
-f commit_id="$commit_id" \
-f path="$path" \
-F line="$line" \
-f side="RIGHT" >/dev/null
}
post_fallback() {
gh pr comment "$pr" --repo "$repo" --body "$body" >/dev/null
}
for attempt in 1 2 3; do
if post_inline; then
exit 0
fi
sleep 2
done
post_fallbackLength of output: 3857 🏁 Script executed: #!/bin/bash
set -euo pipefail
repo="RedHatQE/openshift-virtualization-tests"
pr="5347"
commit_id="c61964437a716526e0309d0d8e6d8481a50a28cb"
path="libs/vm/vm.py"
line="13"
body="$(cat <<'EOF'
**Test Execution Plan**
- **Run smoke tests: False** — no verified dependency path from any `@pytest.mark.smoke` test or parent `conftest.py` to `libs/vm/vm.py::BaseVirtualMachine.wait_for_vmi_affinity()` or `tests/network/libs/stuntime.py::assert_affinity_after_migration()`.
- **Run gating tests: False** — no verified dependency path from any `@pytest.mark.gating` test to the changed symbols.
- **Affected tests to run**
- `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_off_server_node` (tier3)
- `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_to_server_node` (tier3)
- `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_off_client_node` (tier3)
- `tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_to_client_node` (tier3)
- `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_off_server_node` (tier3)
- `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_client_migrates_to_server_node` (tier3)
- `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_off_client_node` (tier3)
- `tests/network/localnet/migration_stuntime/test_migration_stuntime.py::TestMigrationStuntime::test_server_migrates_to_client_node` (tier3)
These are the concrete affected paths because the changed tests now call `BaseVirtualMachine.wait_for_vmi_affinity()` directly after `set_template_affinity()`, and they conditionally call `tests/network/libs/stuntime.py::assert_affinity_after_migration()` after `migrate_vm_and_verify()`.
**Real tests (cluster required)**
Error path (the fix):
`pytest -s -o log_cli=true -m tier3 --jira tests/network/localnet/migration_stuntime/test_migration_stuntime.py -k 'test_client_migrates_off_server_node or test_server_migrates_off_client_node'`
Expected: after `set_template_affinity()`, the VMI affinity converges before migration, the migrated VM lands with matching anti-affinity rules, and stuntime stays within `STUNTIME_THRESHOLD_SECONDS`.
Happy path (regression):
`pytest -s -o log_cli=true -m tier3 --jira tests/network/l2_bridge/migration_stuntime/test_migration_stuntime.py -k 'test_client_migrates_to_server_node or test_server_migrates_to_client_node'`
Expected: affinity-targeted migrations still succeed, the target pod reflects the expected affinity type when `CNV-90576` is open, and stuntime stays within `STUNTIME_THRESHOLD_SECONDS`.
EOF
)"
echo "== Authenticated viewer =="
gh api user --jq '.login'
echo
echo "== Repo permissions =="
gh api "repos/$repo" --jq '{permissions: .permissions, private: .private, full_name: .full_name}'
echo
echo "== Attempt inline review comment via REST =="
set +e
gh api \
--method POST \
"repos/$repo/pulls/$pr/comments" \
-f body="$body" \
-f commit_id="$commit_id" \
-f path="$path" \
-F line="$line" \
-f side="RIGHT"
rc_inline=$?
set -e
echo "inline_rc=$rc_inline"
echo
echo "== Attempt fallback PR issue comment via REST =="
set +e
gh api \
--method POST \
"repos/$repo/issues/$pr/comments" \
-f body="$body"
rc_issue=$?
set -e
echo "issue_comment_rc=$rc_issue"
exit 0Length of output: 4168
CRITICAL: I could not complete the required inline PR comment workflow because the GitHub integration has no comment-write access on this PR. All attempts failed with
Because the blocker is permissions, not analysis quality, here is the exact comment body to post on the first changed line of Test Execution Plan
These are the concrete affected paths because the changed tests now call Real tests (cluster required) Error path (the fix): Happy path (regression): |
| _POD_AFFINITY_TYPE: Final[str] = "podAffinity" | ||
| _POD_ANTI_AFFINITY_TYPE: Final[str] = "podAntiAffinity" |
There was a problem hiding this comment.
These 2 are imported outside this module, please remove the underscore prefix.
virt-controller and VMIM controller reconcile independently — if migration is triggered before the VM controller propagates the template affinity to the VMI, the target pod is created with stale scheduling rules. Wait for the VMI to reflect the updated affinity before migrating, and assert post-migration node placement matches the expected affinity. Assisted-by: Claude <noreply@anthropic.com> Signed-off-by: Anat Wax <awax@redhat.com>
c619644 to
56a1e16
Compare
|
Change: Remove trailing underscore from constants. |
| """ | ||
| template_affinity = self._spec.template.spec.affinity | ||
| expected_affinity = ( | ||
| asdict(obj=template_affinity, dict_factory=self._filter_out_none_values) if template_affinity else None |
There was a problem hiding this comment.
We expect to have affinity=None in some VMs? We don't always use some default template?
There was a problem hiding this comment.
Yes, not all VMs have a default affinity in their template - for example, the stuntime server VM is created without any affinity, it's only added during the tests.
set_template_affinity accepts Affinity | None (None clears affinity), so wait_for_vmi_affinity handles both cases to stay consistent with that signature.
| expected_affinity = ( | ||
| asdict(obj=template_affinity, dict_factory=self._filter_out_none_values) if template_affinity else None | ||
| ) | ||
| for sample in TimeoutSampler( |
There was a problem hiding this comment.
Consider using a helper with @retry
| has_stuntime_rules = any( | ||
| expr.get("key", "").startswith("stuntime.") | ||
| for rule in pod_affinity[stale_type].get("requiredDuringSchedulingIgnoredDuringExecution", []) | ||
| for expr in rule.get("labelSelector", {}).get("matchExpressions", []) | ||
| ) |
There was a problem hiding this comment.
The rules are a bit unreadable
Consider splitting it to vars
|
/retest all Auto-triggered: Files in this PR were modified by merged PR #5285. Overlapping fileslibs/vm/vm.py |
What this PR does / why we need it:
When a VM template affinity is updated and migration is triggered immediately after, virt-controller and the VMIM controller race: the VMIM controller may read the VMI before the VM controller has reconciled the template change, creating the migration target pod with stale affinity rules. The VM ends up on the wrong node.
This PR adds
wait_for_vmi_affinity()toBaseVirtualMachine, which polls the VMI until its affinity matches the VM template — ensuring the VM controller has reconciled before migration is triggered. All stuntime tests (L2 bridge + localnet) now call it afterset_template_affinity.Additionally, temporary post-migration affinity assertions are added (gated behind
is_jira_open("CNV-90576")) to detect the race if it still occurs despite the wait. These assertions will be removed once CNV-90576 is resolved.The first commit removes a redundant
set_template_affinitycall fromtest_server_migrates_between_non_client_nodes— the preceding test already sets the same affinity and@pytest.mark.incrementalpreserves it.Which issue(s) this PR fixes:
Special notes for reviewer:
jira-ticket:
NONE
Summary by CodeRabbit