test: add debugging for raid tests by richm · Pull Request #625 · linux-system-roles/storage

richm · 2026-06-04T13:21:43Z

The raid tests are the ones which most often fail with flakes. Add
some debugging to try to find out why they fail.

Signed-off-by: Rich Megginson rmeggins@redhat.com

Summary by CodeRabbit

Tests
- Added a debug task to capture detailed system/storage/RAID diagnostics when tests fail, ensuring required tools are present.
- Wrapped RAID test sequences in structured blocks with rescue handling so failures trigger debug capture and then fail explicitly.
- Reorganized RAID lifecycle and option tests (create/verify/idempotence/remove) for clearer ordering and more reliable, diagnosable outcomes.

coderabbitai · 2026-06-04T13:21:56Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

This PR adds a debug task that collects system and RAID diagnostics and updates four RAID test playbooks to wrap test sequences in Ansible block/rescue handlers that include the debug task and then fail the test on error.

Changes

Test failure diagnostics

Layer / File(s)	Summary
Debug information collection task `tests/tasks/debug.yml`	New Ansible task module that collects system debug information via a strict Bash script executing storage commands (lsblk, mdadm, dmsetup, lvm, udevadm), enumerates /dev/md* and runs `mdadm --detail`, outputs selected `lvs`/`pvs` fields and appends logs; task uses `changed_when: false`.
Apply debug rescue handlers to test playbooks `tests/tests_create_raid_pool_then_remove.yml`, `tests/tests_create_raid_volume_then_remove.yml`, `tests/tests_raid_pool_options.yml`, `tests/tests_raid_volume_options.yml`	Four RAID test playbooks are wrapped in named Ansible `block`s with `rescue` sections that include `tests/tasks/debug.yml` and then call `fail:` on error; task ordering around RAID create/verify/remove flows was adjusted accordingly.

🚥 Pre-merge checks | ✅ 4 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name	Status	Explanation	Resolution
Description Format	⚠️ Warning	PR description lacks required template sections. Missing "Enhancement:", "Reason:", and "Result:" headers as defined in .github/pull_request_template.md.	Reformat the PR description to follow the template with Enhancement, Reason, Result, and optional Issue Tracker Tickets sections.
Description check	❓ Inconclusive	The PR description explains the purpose and reason (RAID tests fail frequently) but doesn't follow the template structure with all required sections.	Consider following the template with Enhancement, Reason, and Result sections for consistency with repository standards.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title follows the Conventional Commits format with type 'test' and a clear description about adding debugging for raid tests.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

codecov · 2026-06-04T13:23:39Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 10.16%. Comparing base (59fd1c6) to head (102d95e).
⚠️ Report is 147 commits behind head on main.

❗ There is a different number of reports uploaded between BASE (59fd1c6) and HEAD (102d95e). Click for more details.

HEAD has 1 upload less than BASE

Flag BASE (59fd1c6) HEAD (102d95e)

sanity 1 0

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #625      +/-   ##
==========================================
- Coverage   16.54%   10.16%   -6.39%     
==========================================
  Files           2        9       +7     
  Lines         284     2056    +1772     
  Branches       79        0      -79     
==========================================
+ Hits           47      209     +162     
- Misses        237     1847    +1610

Flag	Coverage Δ
sanity	`?`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/tasks/debug.yml`:
- Around line 2-41: The playbook's debug tasks can abort on undefined variables
or command failures; update the Play "Gather debug information" (shell) to avoid
set -e (remove the -e flag or add "set +e" after setup) or make each diagnostic
command non-fatal (append "|| true" or wrap commands so failures don't exit),
and keep changed_when: false; update the RAID-related when checks to guard
against undefined vars by using "is defined" or defaults (e.g., replace
conditions referencing storage_test_volume.type and storage_test_volume._device
with checks like "storage_test_volume is defined and storage_test_volume.type ==
'raid' and (storage_test_volume._device is defined and
storage_test_volume._device | length > 0)" or use
"storage_test_volume._raw_device | default('') | length > 0" if you intend to
check _raw_device), fix the mismatch by making the when inspect the same field
used by the command (use _raw_device when checking before calling mdadm with
_raw_device), and make the mdadm command tasks and the /dev/md/... detail task
non-fatal by adding ignore_errors: true or failed_when: false so diagnostics run
best-effort.

In `@tests/tests_create_raid_pool_then_remove.yml`:
- Around line 204-219: The task title says "Create a RAID1 lvm raid device" but
the storage_pools entry sets raid_level: raid0, causing misleading logs; update
either the task name or the storage_pools configuration so they match—e.g.,
change raid_level to raid1 in the storage_pools > volumes entry (or rename the
task to RAID0) and ensure the raid_disks and expected behavior in the
surrounding tasks reference the same RAID level (check the storage_pools,
volumes, raid_disks, and raid_level symbols).

In `@tests/tests_raid_volume_options.yml`:
- Around line 18-25: The task title "Create a RAID0 device mounted on \"{{
mount_location }}\"" is inconsistent with the vars where
storage_volumes[0].raid_level is "raid1"; update either the task name or the
raid_level so they match (e.g., change the task title to mention RAID1 or set
storage_volumes[0].raid_level to "raid0") and ensure the referenced variable
names (storage_volumes, name: test1, raid_level) remain correct.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: a316fac1-fac0-43da-b725-64cf05c866bf

📥 Commits

Reviewing files that changed from the base of the PR and between 4dc4779 and a5a51c0.

📒 Files selected for processing (5)

tests/tasks/debug.yml
tests/tests_create_raid_pool_then_remove.yml
tests/tests_create_raid_volume_then_remove.yml
tests/tests_raid_pool_options.yml
tests/tests_raid_volume_options.yml

richm · 2026-06-04T14:35:21Z

[citest]

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@tests/tasks/debug.yml`:
- Line 6: The diagnostic script currently uses `set -euxo pipefail`, causing
early exit when commands like `mdadm --detail "$dev"` or `pvs`/`lvs` fail; make
the task non-fatal so subsequent diagnostics still run by either adding
`failed_when: false` to this Ansible task in tests/tasks/debug.yml or by
suppressing failures for the specific commands (e.g., append `|| true` or
temporarily disable `-e` for `mdadm --detail "$dev"` and the `pvs`/`lvs`
invocations) so failures are tolerated and the task completes on a best-effort
basis.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 9cd8f9ca-56d9-4efe-b62b-c7f3f7026c0b

📥 Commits

Reviewing files that changed from the base of the PR and between 7bd50a1 and 463b0c8.

📒 Files selected for processing (5)

tests/tasks/debug.yml
tests/tests_create_raid_pool_then_remove.yml
tests/tests_create_raid_volume_then_remove.yml
tests/tests_raid_pool_options.yml
tests/tests_raid_volume_options.yml

🚧 Files skipped from review as they are similar to previous changes (4)

tests/tests_create_raid_volume_then_remove.yml
tests/tests_raid_volume_options.yml
tests/tests_raid_pool_options.yml
tests/tests_create_raid_pool_then_remove.yml

richm · 2026-06-04T21:20:26Z

[citest]

richm · 2026-06-05T13:30:52Z

[citest]

richm · 2026-06-05T18:27:28Z

[citest]

richm · 2026-06-05T21:39:12Z

[citest]

richm · 2026-06-05T22:55:22Z

[citest]

The raid tests are the ones which most often fail with flakes. Add some debugging to try to find out why they fail. Signed-off-by: Rich Megginson <rmeggins@redhat.com>

vojtechtrefny · 2026-06-24T08:53:50Z

This is a bug somewhere in blivet. The problem is we don't recognize the LUKS device is actually unlocked so we don't close it before trying to stop and remove the MD array. I am not sure why blivet doesn't know the LUKS is opened (in some cases), I have some theories but no fix yet.

coderabbitai Bot reviewed Jun 4, 2026

View reviewed changes

Comment thread tests/tasks/debug.yml Outdated

Comment thread tests/tests_create_raid_pool_then_remove.yml Outdated

Comment thread tests/tests_raid_volume_options.yml Outdated

richm force-pushed the test-catch-raid-errors branch 2 times, most recently from e793a16 to 7bd50a1 Compare June 4, 2026 14:03

richm force-pushed the test-catch-raid-errors branch from 7bd50a1 to 463b0c8 Compare June 4, 2026 20:53

coderabbitai Bot reviewed Jun 4, 2026

View reviewed changes

Comment thread tests/tasks/debug.yml Outdated

richm added 5 commits June 23, 2026 17:34

test: add debugging for raid tests

77ade4a

The raid tests are the ones which most often fail with flakes. Add some debugging to try to find out why they fail. Signed-off-by: Rich Megginson <rmeggins@redhat.com>

more

f368728

more

96fca3f

more

da9bdff

more

102d95e

richm force-pushed the test-catch-raid-errors branch from e7c2341 to 102d95e Compare June 23, 2026 23:34

Conversation

richm commented Jun 4, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

❌ Failed checks (1 warning, 1 inconclusive)

Uh oh!

codecov Bot commented Jun 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

richm commented Jun 4, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

richm commented Jun 4, 2026

Uh oh!

richm commented Jun 5, 2026

Uh oh!

richm commented Jun 5, 2026

Uh oh!

richm commented Jun 5, 2026

Uh oh!

richm commented Jun 5, 2026

Uh oh!

vojtechtrefny commented Jun 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

richm commented Jun 4, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 4, 2026 •

edited

Loading

codecov Bot commented Jun 4, 2026 •

edited

Loading

vojtechtrefny commented Jun 24, 2026 •

edited

Loading