glusterd/snapshot (ZFS): refuse snapshot delete when a dependent clone exists#4690
Open
ThalesBarretto wants to merge 1 commit into
Open
glusterd/snapshot (ZFS): refuse snapshot delete when a dependent clone exists#4690ThalesBarretto wants to merge 1 commit into
ThalesBarretto wants to merge 1 commit into
Conversation
…e exists On a ZFS-backed volume, snapshot clone/restore build the backend with `zfs clone` and never `zfs promote`, so the clone keeps origin == <snapshot>. A later `snapshot delete` of that snapshot runs a plain `zfs destroy`, which ZFS refuses while a dependent clone exists; glusterd logs the failure but the return code is overwritten by the directory-cleanup rmdir, so it reports success, drops the snapshot object, and leaves the backend ZFS snapshot orphaned. Refuse the delete in prevalidate, before any state change, when the backend snapshot still has a dependent clone. Add an optional `dependents` method to the snapshot-provider vtable; the ZFS provider implements it via `zfs get -H -o value clones`. Providers that do not track dependents (LVM) leave the slot NULL and are unchanged. The check is conservative: it blocks only on a positively-confirmed dependent and tolerates every inconclusive result (snapshot already gone, zfs not runnable, empty output), so it never blocks a legitimate delete. Fixes: gluster#4689 Signed-off-by: Thales Antunes de Oliveira Barretto <thales.barretto.git@gmail.com>
This was referenced Jun 24, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes: #4689
Summary
On a ZFS-backed volume,
gluster snapshot clone(andsnapshot restore) build the backend withzfs cloneand neverzfs promote, so the clone keepsorigin == <snapshot>. A latersnapshot deleteof that source snapshot runs a plainzfs destroy(no-R), which ZFS correctlyrefuses while a dependent clone exists. glusterd logs that failure but does not propagate it — the
return code is overwritten by the subsequent directory-cleanup
sys_rmdir— so the CLI reportssnap removed successfully(exit 0), glusterd drops the snapshot object, and the backend ZFS snapshot isleft behind, now untracked. See #4689 for the full root-cause walk.
This change makes the delete fail before any state change, with a clear, actionable error, instead of
silently reporting success.
Fix
dependentsmethod to the snapshot-provider vtablestruct glusterd_snap_ops.Providers that do not track dependents leave the slot NULL (LVM is unchanged — designated-init).
glusterd_zfs_snapshot_dependents): reconstruct the snapshot handle<dataset>@<volname>_<brick_num>(mirroringglusterd_zfs_snapshot_remove) and runzfs get -H -o value clones <snap_device>.glusterd_snapshot_remove_prevalidate): walk the snap's volumes →local bricks (
MY_UUID) and refuse the op (op_errstr+op_errno, return −1) when a dependent cloneis confirmed.
Why prevalidate (not commit-phase error propagation):
glusterd_snapshot_remove_commitmarks the snapfor decommission before the brick-remove runs, so failing at commit time would leave a
half-decommissioned snapshot. Prevalidate refuses before any state is written. (The cluster-wide op-sm
lock serializes snapshot ops, so there is no prevalidate→commit TOCTOU.)
Conservative by design:
*has_dependentis set true only on a positively-confirmed, non-emptyclonesvalue. Every inconclusive result — dataset unresolvable, snapshot already gone,zfsnotrunnable, output empty or
-— returns "no dependent", so the check can never become a new way to blocka legitimate delete and preserves the existing tolerant "already gone" behavior.
Testing (GlusterFS 11.2 + ZFS 2.4.2, single node)
Built on-node and verified with a three-way control (swap only
glusterd.so):glusterd.sodeleteof a clone-held snapshotThe unpatched-rebuild control reproduces identically to stock, so the patched result is attributable to
the source change. Positive controls on the patched build:
(so only the dependent case is blocked).
The patch applies cleanly to both
develand the 11.2 source.Known limitations / disclosures (deliberately out of scope here)
snapshot delete allbecomes fail-fast at a clone-held snapshot. The CLI already issues one deleteper snapshot and aborts on the first failure (
gf_cli_snapshot_for_delete, by design — "Fail theoperation if deleting one of the snapshots is failed"). With this fix, a clone-held snapshot makes its
delete fail in prevalidate, so
delete allstops there: snapshots processed before it are deleted,the clone-held one and any after it remain (tested). This is the same fail-fast behavior as for any
un-deletable snapshot — and strictly better than the previous silent-orphan-and-continue. Making
delete allskip-and-continue past a refused snapshot is a possible follow-up, intentionally not donehere to keep the change minimal.
gluster volume delete <clone>does not free the backendZFS clone dataset (verified); the dependent clone has to be removed at the backend (e.g.
zfs destroy <clone_dataset>) before the snapshot can be deleted. This is a related gap (the clonevolume's backend dataset outliving
volume delete) worth tracking separately; the error message here isdeliberately worded as "remove the dependent clone" rather than prescribing
volume delete.on a live LVM brick.
peer confirming a dependent aborts the op. Verified on a single brick; the per-brick logic is identical
to the (multi-brick-correct) remove path.
glusterd_snap_removedirectly, bypassing thisdelete prevalidate; they are neither covered nor changed by this fix. Verified: soft-limit
auto-delete of a clone-held oldest snapshot still drops it from glusterd and orphans the backend ZFS
snapshot, hitting the same swallow in
glusterd_snapshot_remove. The shared root cause is that swallow(every
glusterd_snap_removecaller is exposed); a focused follow-up could guard the sink or makeauto-delete skip clone-held snapshots (reusing this
dependents()method). Kept out of this PR becauseit changes auto-delete's "always frees space" contract and warrants its own design + testing.
zfs getin prevalidate, so a hung pool blocks thedelete in prevalidate. This is the same synchronous-
zfsexposure glusterd already has in thesesnapshot ops (
zfs list/zfs destroy), just surfaced earlier — not a new failure class.glusterd_zfs_datasetuse-after-scope fix):dependents()callsglusterd_zfs_datasetwith the current signature and consumes the result immediately (the establishedpattern in create/remove/restore). If glusterd/snapshot: fix use-after-scope in glusterd_zfs_dataset() (CWE-562) #4688 lands first, rebase
dependents()onto the caller-ownedbuffer signature.
Note
This is not data loss (no
-Ris issued); it is a correctness bug (success reported for a failedoperation) plus an invisible, accumulating consumer of pool space.
zfs promoteis not a safealternative (it re-parents the live brick onto the snapshot-clone, after which a
zfs destroy -Rof theclone would also destroy the live brick) — see #4689.