glusterd/snapshot (ZFS): refuse snapshot delete when a dependent clone exists by ThalesBarretto · Pull Request #4690 · gluster/glusterfs

ThalesBarretto · 2026-06-24T18:11:16Z

Summary

On a ZFS-backed volume, gluster snapshot clone (and snapshot restore) build the backend with
zfs clone and never zfs promote, so the clone keeps origin == <snapshot>. A later
snapshot delete of that source snapshot runs a plain zfs destroy (no -R), which ZFS correctly
refuses while a dependent clone exists. glusterd logs that failure but does not propagate it — the
return code is overwritten by the subsequent directory-cleanup sys_rmdir — so the CLI reports
snap removed successfully (exit 0), glusterd drops the snapshot object, and the backend ZFS snapshot is
left behind, now untracked. See #4689 for the full root-cause walk.

This change makes the delete fail before any state change, with a clear, actionable error, instead of
silently reporting success.

Fix

Add an optional dependents method to the snapshot-provider vtable struct glusterd_snap_ops.
Providers that do not track dependents leave the slot NULL (LVM is unchanged — designated-init).
Implement it for the ZFS provider (glusterd_zfs_snapshot_dependents): reconstruct the snapshot handle
<dataset>@<volname>_<brick_num> (mirroring glusterd_zfs_snapshot_remove) and run
zfs get -H -o value clones <snap_device>.
Gate the delete prevalidate (glusterd_snapshot_remove_prevalidate): walk the snap's volumes →
local bricks (MY_UUID) and refuse the op (op_errstr + op_errno, return −1) when a dependent clone
is confirmed.

Why prevalidate (not commit-phase error propagation): glusterd_snapshot_remove_commit marks the snap
for decommission before the brick-remove runs, so failing at commit time would leave a
half-decommissioned snapshot. Prevalidate refuses before any state is written. (The cluster-wide op-sm
lock serializes snapshot ops, so there is no prevalidate→commit TOCTOU.)

Conservative by design: *has_dependent is set true only on a positively-confirmed, non-empty
clones value. Every inconclusive result — dataset unresolvable, snapshot already gone, zfs not
runnable, output empty or - — returns "no dependent", so the check can never become a new way to block
a legitimate delete and preserves the existing tolerant "already gone" behavior.

Testing (GlusterFS 11.2 + ZFS 2.4.2, single node)

Built on-node and verified with a three-way control (swap only glusterd.so):

`glusterd.so`	`delete` of a clone-held snapshot	backend snapshot
stock	reports success (exit 0), snap dropped	orphaned
rebuilt, UNPATCHED (control)	reports success (exit 0), snap dropped	orphaned — rebuild alone doesn't change it
rebuilt, PATCHED	refused (exit 1), snap retained	retained

The unpatched-rebuild control reproduces identically to stock, so the patched result is attributable to
the source change. Positive controls on the patched build:

delete-with-dependent → refused, snap + backend retained, error names the clone (no state change);
independent snapshot (no clone) → still deletes and frees its backend snapshot (no regression);
after the dependency is removed → the snapshot deletes and the backend snapshot is genuinely gone
(so only the dependent case is blocked).

The patch applies cleanly to both devel and the 11.2 source.

Known limitations / disclosures (deliberately out of scope here)

snapshot delete all becomes fail-fast at a clone-held snapshot. The CLI already issues one delete
per snapshot and aborts on the first failure (gf_cli_snapshot_for_delete, by design — "Fail the
operation if deleting one of the snapshots is failed"). With this fix, a clone-held snapshot makes its
delete fail in prevalidate, so delete all stops there: snapshots processed before it are deleted,
the clone-held one and any after it remain (tested). This is the same fail-fast behavior as for any
un-deletable snapshot — and strictly better than the previous silent-orphan-and-continue. Making
delete all skip-and-continue past a refused snapshot is a possible follow-up, intentionally not done
here to keep the change minimal.
Operator remedy needs backend action. gluster volume delete <clone> does not free the backend
ZFS clone dataset (verified); the dependent clone has to be removed at the backend (e.g.
zfs destroy <clone_dataset>) before the snapshot can be deleted. This is a related gap (the clone
volume's backend dataset outliving volume delete) worth tracking separately; the error message here is
deliberately worded as "remove the dependent clone" rather than prescribing volume delete.
LVM: unchanged (NULL vtable slot; the LVM TU compiles clean against the new struct). Not exercised
on a live LVM brick.
Multi-brick / multi-peer: the check is per-brick and runs on each peer for its local bricks; any
peer confirming a dependent aborts the op. Verified on a single brick; the per-brick logic is identical
to the (multi-brick-correct) remove path.
auto-delete / restart-GC reach the removal sink via glusterd_snap_remove directly, bypassing this
delete prevalidate; they are neither covered nor changed by this fix. Verified: soft-limit
auto-delete of a clone-held oldest snapshot still drops it from glusterd and orphans the backend ZFS
snapshot, hitting the same swallow in glusterd_snapshot_remove. The shared root cause is that swallow
(every glusterd_snap_remove caller is exposed); a focused follow-up could guard the sink or make
auto-delete skip clone-held snapshots (reusing this dependents() method). Kept out of this PR because
it changes auto-delete's "always frees space" contract and warrants its own design + testing.
Synchronous backend probe: the check shells zfs get in prevalidate, so a hung pool blocks the
delete in prevalidate. This is the same synchronous-zfs exposure glusterd already has in these
snapshot ops (zfs list/zfs destroy), just surfaced earlier — not a new failure class.
Interaction with glusterd/snapshot: fix use-after-scope in glusterd_zfs_dataset() (CWE-562) #4688 (the glusterd_zfs_dataset use-after-scope fix): dependents() calls
glusterd_zfs_dataset with the current signature and consumes the result immediately (the established
pattern in create/remove/restore). If glusterd/snapshot: fix use-after-scope in glusterd_zfs_dataset() (CWE-562) #4688 lands first, rebase dependents() onto the caller-owned
buffer signature.

Note

This is not data loss (no -R is issued); it is a correctness bug (success reported for a failed
operation) plus an invisible, accumulating consumer of pool space. zfs promote is not a safe
alternative (it re-parents the live brick onto the snapshot-clone, after which a zfs destroy -R of the
clone would also destroy the live brick) — see #4689.

…e exists On a ZFS-backed volume, snapshot clone/restore build the backend with `zfs clone` and never `zfs promote`, so the clone keeps origin == <snapshot>. A later `snapshot delete` of that snapshot runs a plain `zfs destroy`, which ZFS refuses while a dependent clone exists; glusterd logs the failure but the return code is overwritten by the directory-cleanup rmdir, so it reports success, drops the snapshot object, and leaves the backend ZFS snapshot orphaned. Refuse the delete in prevalidate, before any state change, when the backend snapshot still has a dependent clone. Add an optional `dependents` method to the snapshot-provider vtable; the ZFS provider implements it via `zfs get -H -o value clones`. Providers that do not track dependents (LVM) leave the slot NULL and are unchanged. The check is conservative: it blocks only on a positively-confirmed dependent and tolerates every inconclusive result (snapshot already gone, zfs not runnable, empty output), so it never blocks a legitimate delete. Fixes: gluster#4689 Signed-off-by: Thales Antunes de Oliveira Barretto <thales.barretto.git@gmail.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

glusterd/snapshot (ZFS): refuse snapshot delete when a dependent clone exists#4690

glusterd/snapshot (ZFS): refuse snapshot delete when a dependent clone exists#4690
ThalesBarretto wants to merge 1 commit into
gluster:develfrom
ThalesBarretto:snap-zfs-restore-clone-orphan

ThalesBarretto commented Jun 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

ThalesBarretto commented Jun 24, 2026

Summary

Fix

Testing (GlusterFS 11.2 + ZFS 2.4.2, single node)

Known limitations / disclosures (deliberately out of scope here)

Note

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant