test: add debugging for raid tests#625
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughThis PR adds a debug task that collects system and RAID diagnostics and updates four RAID test playbooks to wrap test sequences in Ansible block/rescue handlers that include the debug task and then fail the test on error. ChangesTest failure diagnostics
🚥 Pre-merge checks | ✅ 4 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Codecov Report✅ All modified and coverable lines are covered by tests.
Additional details and impacted files@@ Coverage Diff @@
## main #625 +/- ##
==========================================
- Coverage 16.54% 10.16% -6.39%
==========================================
Files 2 9 +7
Lines 284 2056 +1772
Branches 79 0 -79
==========================================
+ Hits 47 209 +162
- Misses 237 1847 +1610
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/tasks/debug.yml`:
- Around line 2-41: The playbook's debug tasks can abort on undefined variables
or command failures; update the Play "Gather debug information" (shell) to avoid
set -e (remove the -e flag or add "set +e" after setup) or make each diagnostic
command non-fatal (append "|| true" or wrap commands so failures don't exit),
and keep changed_when: false; update the RAID-related when checks to guard
against undefined vars by using "is defined" or defaults (e.g., replace
conditions referencing storage_test_volume.type and storage_test_volume._device
with checks like "storage_test_volume is defined and storage_test_volume.type ==
'raid' and (storage_test_volume._device is defined and
storage_test_volume._device | length > 0)" or use
"storage_test_volume._raw_device | default('') | length > 0" if you intend to
check _raw_device), fix the mismatch by making the when inspect the same field
used by the command (use _raw_device when checking before calling mdadm with
_raw_device), and make the mdadm command tasks and the /dev/md/... detail task
non-fatal by adding ignore_errors: true or failed_when: false so diagnostics run
best-effort.
In `@tests/tests_create_raid_pool_then_remove.yml`:
- Around line 204-219: The task title says "Create a RAID1 lvm raid device" but
the storage_pools entry sets raid_level: raid0, causing misleading logs; update
either the task name or the storage_pools configuration so they match—e.g.,
change raid_level to raid1 in the storage_pools > volumes entry (or rename the
task to RAID0) and ensure the raid_disks and expected behavior in the
surrounding tasks reference the same RAID level (check the storage_pools,
volumes, raid_disks, and raid_level symbols).
In `@tests/tests_raid_volume_options.yml`:
- Around line 18-25: The task title "Create a RAID0 device mounted on \"{{
mount_location }}\"" is inconsistent with the vars where
storage_volumes[0].raid_level is "raid1"; update either the task name or the
raid_level so they match (e.g., change the task title to mention RAID1 or set
storage_volumes[0].raid_level to "raid0") and ensure the referenced variable
names (storage_volumes, name: test1, raid_level) remain correct.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: a316fac1-fac0-43da-b725-64cf05c866bf
📒 Files selected for processing (5)
tests/tasks/debug.ymltests/tests_create_raid_pool_then_remove.ymltests/tests_create_raid_volume_then_remove.ymltests/tests_raid_pool_options.ymltests/tests_raid_volume_options.yml
e793a16 to
7bd50a1
Compare
|
[citest] |
7bd50a1 to
463b0c8
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@tests/tasks/debug.yml`:
- Line 6: The diagnostic script currently uses `set -euxo pipefail`, causing
early exit when commands like `mdadm --detail "$dev"` or `pvs`/`lvs` fail; make
the task non-fatal so subsequent diagnostics still run by either adding
`failed_when: false` to this Ansible task in tests/tasks/debug.yml or by
suppressing failures for the specific commands (e.g., append `|| true` or
temporarily disable `-e` for `mdadm --detail "$dev"` and the `pvs`/`lvs`
invocations) so failures are tolerated and the task completes on a best-effort
basis.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 9cd8f9ca-56d9-4efe-b62b-c7f3f7026c0b
📒 Files selected for processing (5)
tests/tasks/debug.ymltests/tests_create_raid_pool_then_remove.ymltests/tests_create_raid_volume_then_remove.ymltests/tests_raid_pool_options.ymltests/tests_raid_volume_options.yml
🚧 Files skipped from review as they are similar to previous changes (4)
- tests/tests_create_raid_volume_then_remove.yml
- tests/tests_raid_volume_options.yml
- tests/tests_raid_pool_options.yml
- tests/tests_create_raid_pool_then_remove.yml
|
[citest] |
4 similar comments
|
[citest] |
|
[citest] |
|
[citest] |
|
[citest] |
e7c2341 to
102d95e
Compare
|
This is a bug somewhere in blivet. The problem is we don't recognize the LUKS device is actually unlocked so we don't close it before trying to stop and remove the MD array. I am not sure why blivet doesn't know the LUKS is opened (in some cases), I have some theories but no fix yet. |
The raid tests are the ones which most often fail with flakes. Add
some debugging to try to find out why they fail.
Signed-off-by: Rich Megginson rmeggins@redhat.com
Summary by CodeRabbit