fix(satellite): fire late-add metadata/self-heal only for unattached desired volumes (BUG-048 deadlock) by kvaps · Pull Request #168 · cozystack/blockstor

Andrei Kvapil (kvaps) · 2026-06-16T07:21:29Z

Summary

PR #164 (9866305, BUG-048 concurrent late vd-add fix) widened the ensurePerVolumeMetadata trigger gate in pkg/satellite/reconciler.go:

before (f3515045a): if !diskless && hasLateAddedVolume(resPath, dr) && !flip
after (fix(rest,store,satellite): converge concurrent late vd-adds (BUG-048) #164): if !diskless && dr.GetMetadataCreated() && !flip

dr.GetMetadataCreated() is true for the lifetime of every post-activation RD, so the per-volume metadata pass — and, alongside it on every diskful reconcile, the cluster-wide late-add self-heals — now fire on every reconcile of a converged/resizing diskful resource.

ensurePerVolumeMetadata runs a per-volume drbdadm dump-md probe, which is an md_buffer consumer. During a vd s resize DRBD holds md_buffer for the whole cluster-wide size change (change_cluster_wide_device_size / drbd_determine_dev_size). The per-reconcile dump-md then perpetually loses the cluster-wide state-change arbitration: the resize never completes, md_buffer is never released, and the resource deadlocks cluster-wide, reboot-proof (forensic repro: size-change retry count 10,790 vs the normal 2-20, on a zfs-thin 2-diskful + 1-diskless-client RD; clean at f3515045a). The cluster-wide self-heals (MintLateAddSource = disconnect + new-current-uuid --clear-bitmap + adjust; InvalidateVolume) compound the contention when the resized volume transiently dips below UpToDate.

CI was fully green on #164 (unit + integration + every E2E lane incl. the existing vd-resize cell) yet shipped this. Green CI proved nothing — so this PR's fix is inseparable from a regression test that fails on #164 and passes on the fix.

The fix

Narrow the gate to the race-free, deadlock-safe truth: fire the per-volume metadata pass only when some desired volume is NOT present-and-attached in the kernel (absent from the device list, or present but Diskless / no-metadata).

Rationale (DRBD invariant): a volume that is attached already has a valid DRBD-9 superblock — the kernel cannot attach a lower disk without one — so dump-md on it is pointless and is the md_buffer consumer that contends with the resize. The new predicate:

fires on a genuine late-add (the new volume is unattached / comes up Diskless) → BUG-048 fix(rest,store,satellite): converge concurrent late vd-adds (BUG-048) #164 preserved, and the existing TestApplyLateAddedVolumePerVolumeMetadataRunsDespitePreRenderedRes still passes;
skips a converged steady-state (all desired volumes attached) → no dump-md;
skips during a resize (all volumes attached, only the grown region resyncing) → no dump-md, no md_buffer contention → no deadlock.

Implementation:

pkg/drbd/drbdsetup_show.go — new Adm.AttachedVolumes: parses drbdsetup status <res> --json into the set of local volume numbers that are attached with metadata (disk-state UpToDate / Inconsistent / Consistent / Outdated). Conservative toward firing: empty set on any probe/parse failure or unloaded slot, so a cold start still runs the (idempotent) metadata pass — the pre-fix(rest,store,satellite): converge concurrent late vd-adds (BUG-048) #164 behaviour.
pkg/satellite/reconciler.go — hasUnattachedDesiredVolume predicate backing the gate (shouldEnsurePerVolumeMetadata).
pkg/satellite/reconciler.go — defense-in-depth: the BUG-048 late-add self-heals (maybeLateAddPromote / maybeKickLateAddResync) are gated out of any reconcile pass that just issued drbdadm resize. Their kernel-truth probes already veto a normal converged resize, but a cluster-wide size change can transiently dip a volume below UpToDate, and their recovery actions are themselves cluster-wide state-change / md_buffer operations that would reintroduce the deadlock.

The REST/idempotency layer (pkg/rest/volume_definitions.go) from #164 is not touched — that part is correct.

Regression test (fails on #164, passes on fix)

pkg/satellite/reconciler_bug048_resize_deadlock_test.go drives a full rec.Apply() and observes the mock-recorded command stream:

converged steady-state (all desired volumes attached UpToDate) → no drbdadm dump-md, no cluster-wide self-heal;
resize-in-flight (drbdadm resize issued, grown volume transiently Inconsistent) → same: no dump-md, no self-heal;
BUG-048 preserved: a desired volume NOT attached (genuine late-add) → the per-volume metadata pass DOES fire (dump-md + create-md for the missing volume).

Proven against the pre-fix dr.GetMetadataCreated() gate (temporarily restored locally):

pre-fix FAIL — drbdadm dump-md <rd>/0 and /1 fired on the converged reconcile; on the resize the self-heal disconnect <rd> + drbdsetup new-current-uuid --clear-bitmap 1001 fired mid-resize (MintLateAddSource minting a source on the transiently-Inconsistent volume during a resize — the catastrophic contention);
with the fix PASS — no dump-md, no self-heal in steady-state/resize; the late-add bookend fires the metadata pass for the unattached volume.

pkg/drbd/attached_volumes_test.go unit-pins the AttachedVolumes parse (only metadata-bearing disk-states count; Diskless / transient / absent excluded; empty on no kernel slot).

CLI-bug-fix protocol cells

The fix touches DRBD resize / satellite, so per the project protocol:

L6 tests/e2e/cli-matrix/bug-048-resize-deadlock-zfs-thin.sh — builds the forensic shape (zfs-thin, 2 diskful + 1 diskless-client), drives 3 sequential grows under reconcile pressure, asserts each grow completes (size lands + diskful replicas re-reach UpToDate within a bounded window) with no deadlock surface (StandAlone / suspended).
L7 tests/operator-harness/replay/bug048-resize.yaml — codifies the operator sequence (rd c → vd c 1G → 2 diskful + 1 diskless-client → grow 1G→2G→3G) with vd_size_kib + all_uptodate + replica_diskless convergence asserts.

Honest scope: the full kernel md_buffer deadlock needs scale/contention to escalate (forensic repro ~10k size-change retries), so the deterministic CI guard for this bug is the L1 unit regression; the L6/L7 cells are best-effort stand reproduction the orchestrator runs on a stand under load (a hang/timeout there is the deadlock).

Validation

go build ./... — clean.
golangci-lint run — 0 issues.
go test ./... — green (29 packages).
go test -race ./pkg/satellite/... ./pkg/drbd/... ./internal/controller/... ./pkg/rest/... ./pkg/store/... — no DATA RACE. (A pre-existing pkg/store/k8s optimistic-lock contention flake in patch_bug_201_test.go surfaces only under -race; that package is byte-identical to main and untouched by this PR.)
Integration (envtest 1.34.1) — go test -tags=integration ./tests/integration/... 127 PASS / 0 FAIL, incl. TestGroupKWFLateVD / TestGroupEVDCreateListDelete / TestGroupKWFHappyPath (BUG-048 stays fixed). The existing TestApplyLateAddedVolumePerVolumeMetadataRunsDespitePreRenderedRes and the maybe_late_add_promote / needs_late_add_promote tests stay green.

Summary by CodeRabbit

Release Notes

Bug Fixes
- Fixed a DRBD resize deadlock that could occur during volume expansion operations, improving system stability during cluster resizing.
- Enhanced metadata handling and volume attachment detection for more reliable DRBD operations.
Tests
- Added comprehensive test coverage for DRBD resize scenarios and metadata handling to prevent regression.

…desired volumes (BUG-048 deadlock) PR #164 widened the ensurePerVolumeMetadata trigger gate from the .res-count hasLateAddedVolume() to the unconditional dr.GetMetadataCreated(), so every post-activation diskful reconcile ran a per-volume drbdadm dump-md (an md_buffer consumer) plus the cluster- wide late-add self-heals. During a vd s resize DRBD holds md_buffer for the whole cluster-wide size change (change_cluster_wide_device_size / drbd_determine_dev_size); the per-reconcile md probe + self-heals then perpetually lose the cluster-wide state-change arbitration, the resize never completes, md_buffer is never released, and the resource deadlocks cluster-wide, reboot-proof. Narrow the gate to the race-free, deadlock-safe truth: fire the per- volume metadata pass ONLY when some desired volume is NOT present-and- attached in the kernel (absent, or present but Diskless / no-metadata). An attached volume already has a valid DRBD-9 superblock (the kernel cannot attach a lower disk without one), so dump-md on it is pointless and is the md_buffer consumer that contends with the resize. This still fires for a genuine late-add (the new volume is unattached) so BUG-048 #164 and the pre-rendered-.res regression are preserved, but skips a converged steady-state and a resize-in-flight (all volumes attached). Add Adm.AttachedVolumes (parses drbdsetup status --json into the set of attached, metadata-bearing local volume numbers) as the kernel-truth backing for the new hasUnattachedDesiredVolume predicate. Also gate the late-add promote / resync-kick self-heals OUT of a reconcile that just issued drbdadm resize (defense-in-depth: their recovery actions are themselves cluster-wide state-change / md_buffer operations). Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

…dump-md/self-heal on attached volumes) L1 regression for the BUG-048 resize deadlock. The satellite test drives a full rec.Apply() and observes the mock-recorded command stream: - converged steady-state (all desired volumes attached UpToDate) → NO drbdadm dump-md, NO cluster-wide self-heal (disconnect / new-current-uuid --clear-bitmap / invalidate); - resize-in-flight (drbdadm resize issued, grown volume transiently Inconsistent) → same: no dump-md, no self-heal; - BUG-048 preserved: a desired volume NOT attached in the kernel (genuine late-add) → the per-volume metadata pass DOES fire (dump-md + create-md for the missing volume). Proven to FAIL on the pre-fix dr.GetMetadataCreated() gate (dump-md fired on the converged reconcile; MintLateAddSource disconnect + new-current-uuid fired mid-resize) and PASS with the fix. drbd AttachedVolumes unit pins the kernel-truth parse: only metadata-bearing disk-states (UpToDate/Inconsistent/Consistent/Outdated) count as attached; Diskless / transient / absent volumes do not; empty on no kernel slot (fail-toward-firing). Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

… (zfs-thin 2-diskful+1-diskless) CLI-bug-fix protocol cells for the BUG-048 resize deadlock (the fix touches DRBD resize / satellite paths). L6 cli-matrix bug-048-resize-deadlock-zfs-thin.sh: build the forensic shape (zfs-thin, 2 diskful + 1 diskless-client), drive 3 sequential grows under reconcile pressure, and assert each grow COMPLETES (size lands + diskful replicas re-reach UpToDate within a bounded window) with NO deadlock surface (StandAlone / suspended / a diskful volume stuck below UpToDate). With the bug the first grow stalls in a size-change retry storm and the assert times out. L7 replay bug048-resize.yaml: codify the operator sequence (rd c -> vd c 1G -> 2 diskful + 1 diskless-client -> grow 1G->2G->3G) with vd_size_kib + all_uptodate + replica_diskless convergence asserts; the grow timeouts are deadlock discriminators, not convergence targets. Honest scope: the full kernel md_buffer deadlock needs scale/contention to escalate, so the DETERMINISTIC CI guard for this bug is the L1 unit regression; these cells are best-effort stand reproduction the orchestrator runs on a stand under load. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

coderabbitai · 2026-06-16T07:21:40Z

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 77452d96-4827-498b-ad40-fbeb1c293464

📥 Commits

Reviewing files that changed from the base of the PR and between d364a44 and 77f374d.

📒 Files selected for processing (6)

pkg/drbd/attached_volumes_test.go
pkg/drbd/drbdsetup_show.go
pkg/satellite/reconciler.go
pkg/satellite/reconciler_bug048_resize_deadlock_test.go
tests/e2e/cli-matrix/bug-048-resize-deadlock-zfs-thin.sh
tests/operator-harness/replay/bug048-resize.yaml

📝 Walkthrough

Walkthrough

Fixes a DRBD md_buffer deadlock (BUG-048) triggered during volume resize. Adds Adm.AttachedVolumes to probe kernel disk-state via drbdsetup status --json. The reconciler's per-volume metadata gate now uses kernel-attached state instead of .res volume counts, and late-add self-heals are suppressed during resize passes. Adds unit, regression, e2e, and operator-harness replay tests.

Changes

BUG-048 DRBD Resize Deadlock Fix

Layer / File(s)	Summary
`AttachedVolumes` implementation `pkg/drbd/drbdsetup_show.go`	Adds `Adm.AttachedVolumes` which runs `drbdsetup status --json` and returns a set of volume numbers whose `disk-state` carries metadata (`UpToDate`, `Inconsistent`, `Consistent`, `Outdated`); `attachedVolumeSet` performs the disk-state filter.
`AttachedVolumes` unit tests `pkg/drbd/attached_volumes_test.go`	Three tests via fake executor: only metadata-bearing disk states included, absent volumes excluded, empty kernel output returns nil.
Reconciler metadata gate and helpers `pkg/satellite/reconciler.go`	Adds `hasUnattachedDesiredVolume` and `shouldEnsurePerVolumeMetadata` helpers; replaces the prior `!diskless && MetadataCreated && !isDisklessToDiskfulFlip` condition in `applyDRBD` with a call to `shouldEnsurePerVolumeMetadata`, gating metadata creation on kernel-attached state.
Reconciler resizing guard in self-heal pipeline `pkg/satellite/reconciler.go`	Threads `resizing` boolean into `finishDRBDApply` and `maybePromoteSelfHeals`; wraps `maybeLateAddPromote` and `maybeKickLateAddResync` in `if !resizing` to suppress late-add self-heals during resize passes.
Reconciler BUG-048 regression tests `pkg/satellite/reconciler_bug048_resize_deadlock_test.go`	Three regression tests (steady-state, resize-in-flight, true late-add) with fixture helpers and `assertNoDumpMd` / `assertNoLateAddSelfHeal` assertion helpers covering the full behavior matrix.
E2E and operator-harness replay tests `tests/e2e/cli-matrix/bug-048-resize-deadlock-zfs-thin.sh`, `tests/operator-harness/replay/bug048-resize.yaml`	ZFS-thin e2e script performs three sequential grows on a live cluster with deadlock surface checks after each; YAML replay spec provisions replicas and runs two resize steps with `all_uptodate` and diskless-not-wedged assertions.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

cozystack/blockstor#120: Modifies the same finishDRBDApply/maybePromoteSelfHeals code path in pkg/satellite/reconciler.go to add the maybeSoloPromote logic, which is the same pipeline extended by this PR's resizing guard.

Poem

🐇 Hop hop, the deadlock is gone at last,
The kernel whispers which volumes are fast,
No metadata probes when resize is near,
Late-add self-heals wait till the coast is clear,
Three grows tested, no md_buffer in sight —
BUG-048 banished to the night! ✨

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/bug-048-resize-deadlock

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

gemini-code-assist

Code Review

This pull request addresses a cluster-wide resize deadlock (BUG-048) by optimizing how per-volume metadata is reconciled. It introduces a gate that prevents unnecessary 'drbdadm dump-md' probes on already-attached volumes, which previously contended with in-flight resizes for the md_buffer. The implementation adds 'AttachedVolumes' to probe kernel state and 'shouldEnsurePerVolumeMetadata' to safely gate the metadata pass. The review identified potential nil pointer dereferences when accessing 'r.cfg.Adm' and suggested a minor CLI flag adjustment for consistency with existing code, all of which are addressed in the provided feedback.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

gemini-code-assist · 2026-06-16T07:24:08Z

+func (r *Reconciler) hasUnattachedDesiredVolume(ctx context.Context, dr *intent.DesiredResource) bool {
+	attached := r.cfg.Adm.AttachedVolumes(ctx, dr.GetName())
+
+	for _, vol := range dr.GetVolumes() {
+		if _, ok := attached[vol.GetVolumeNumber()]; !ok {
+			return true
+		}
+	}
+
+	return false
+}


The hasUnattachedDesiredVolume method accesses r.cfg.Adm directly. Since r.cfg.Adm can be nil in certain configurations (such as storage-only setups or specific unit tests), this will cause a nil pointer dereference panic. Adding a defensive nil guard prevents this.

Suggested change

func (r *Reconciler) hasUnattachedDesiredVolume(ctx context.Context, dr *intent.DesiredResource) bool {

attached := r.cfg.Adm.AttachedVolumes(ctx, dr.GetName())

for _, vol := range dr.GetVolumes() {

if _, ok := attached[vol.GetVolumeNumber()]; !ok {

return true

}

}

return false

}

func (r *Reconciler) hasUnattachedDesiredVolume(ctx context.Context, dr *intent.DesiredResource) bool {

if r.cfg.Adm == nil {

return true

}

attached := r.cfg.Adm.AttachedVolumes(ctx, dr.GetName())

for _, vol := range dr.GetVolumes() {

if _, ok := attached[vol.GetVolumeNumber()]; !ok {

return true

}

}

return false

}

gemini-code-assist · 2026-06-16T07:24:08Z

+func (r *Reconciler) shouldEnsurePerVolumeMetadata(ctx context.Context, dr *intent.DesiredResource, diskless bool) bool {
+	return !diskless &&
+		dr.GetMetadataCreated() &&
+		r.hasUnattachedDesiredVolume(ctx, dr) &&
+		!r.isDisklessToDiskfulFlip(ctx, dr, diskless)
+}


The shouldEnsurePerVolumeMetadata method should guard against a nil r.cfg.Adm before calling hasUnattachedDesiredVolume to avoid unnecessary execution and potential panics.

Suggested change

func (r *Reconciler) shouldEnsurePerVolumeMetadata(ctx context.Context, dr *intent.DesiredResource, diskless bool) bool {

return !diskless &&

dr.GetMetadataCreated() &&

r.hasUnattachedDesiredVolume(ctx, dr) &&

!r.isDisklessToDiskfulFlip(ctx, dr, diskless)

}

func (r *Reconciler) shouldEnsurePerVolumeMetadata(ctx context.Context, dr *intent.DesiredResource, diskless bool) bool {

return !diskless &&

r.cfg.Adm != nil &&

dr.GetMetadataCreated() &&

r.hasUnattachedDesiredVolume(ctx, dr) &&

!r.isDisklessToDiskfulFlip(ctx, dr, diskless)

}

gemini-code-assist · 2026-06-16T07:24:08Z

+func (a *Adm) AttachedVolumes(ctx context.Context, resource string) map[int32]struct{} {
+	out, err := a.exec.Run(ctx, "drbdsetup", "status", resource, "--json")
+	if err != nil {
+		return nil
+	}


For consistency with the Show method in the same file (line 221), consider using "-j" instead of "--json" and placing the option before the positional resource argument. Some older or stricter CLI parsers can be sensitive to option ordering (placing options after positional arguments).

Suggested change

func (a *Adm) AttachedVolumes(ctx context.Context, resource string) map[int32]struct{} {

out, err := a.exec.Run(ctx, "drbdsetup", "status", resource, "--json")

if err != nil {

return nil

}

func (a *Adm) AttachedVolumes(ctx context.Context, resource string) map[int32]struct{} {

out, err := a.exec.Run(ctx, "drbdsetup", "status", "-j", resource)

if err != nil {

return nil

}

BUG-048 concurrent late-vd fix (#164 converge) + resize-deadlock guard (#168), release-gate GO + completed 24h ZFS-thick burn-in. Also documents the #167 auto-release pipeline that now cuts the GitHub Release on tag. Signed-off-by: Andrei Kvapil <kvapss@gmail.com> Co-authored-by: Claude <noreply@anthropic.com>

Andrei Kvapil (kvaps) and others added 3 commits June 16, 2026 09:46

gemini-code-assist Bot reviewed Jun 16, 2026

View reviewed changes

Andrei Kvapil (kvaps) marked this pull request as ready for review June 16, 2026 08:37

Andrei Kvapil (kvaps) merged commit 825b6e5 into main Jun 16, 2026
26 of 27 checks passed

Andrei Kvapil (kvaps) mentioned this pull request Jun 17, 2026

chore(release): v0.1.14 #169

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix(satellite): fire late-add metadata/self-heal only for unattached desired volumes (BUG-048 deadlock)#168

fix(satellite): fire late-add metadata/self-heal only for unattached desired volumes (BUG-048 deadlock)#168
Andrei Kvapil (kvaps) merged 3 commits into
mainfrom
fix/bug-048-resize-deadlock

Andrei Kvapil (kvaps) commented Jun 16, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Jun 16, 2026 •

edited

Loading

Review failed

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Jun 16, 2026

Uh oh!

gemini-code-assist Bot Jun 16, 2026

Uh oh!

gemini-code-assist Bot Jun 16, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

Andrei Kvapil (kvaps) commented Jun 16, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

The fix

Regression test (fails on #164, passes on fix)

CLI-bug-fix protocol cells

Validation

Summary by CodeRabbit

Release Notes

Uh oh!

coderabbitai Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Poem

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot Jun 16, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Andrei Kvapil (kvaps) commented Jun 16, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 16, 2026 •

edited

Loading