Skip to content

Commit c193457

Browse files
kvapsclaude
andauthored
fix(satellite,rest): close last-UpToDate-delete race on Secondary SyncSource (BUG-045) (#159)
* fix(satellite): stamp diskState=UpToDate for Secondary SyncSource (BUG-045) A diskful replica that is Secondary and acts as the SyncSource for a freshly-added SyncTarget peer could end up with a blank .status.volumes[].diskState in the CRD projection. On a Secondary SyncSource drbd-9 does not re-emit a local `device` frame carrying `disk:UpToDate` (the local disk-state did not transition), so when the volume-cache entry is (re)created by the peer-device `replication: SyncSource` frame it carries an empty DiskState and the projection leaves diskState blank. That blank diskState is the load-bearing input the U130 last-copy delete guard reads, so it could false-allow deleting the only UpToDate source while a peer is mid-sync, stranding the SyncTarget with no source (a data-availability loss). Fix: when the observer sees a peer-device `replication:SyncSource` frame, stamp the local volume's DiskState=UpToDate. A SyncSource feeds a peer's resync only from an UpToDate local disk, so this is a hard DRBD invariant. The stamp is idempotent and never downgrades a richer local-frame observation (mergeVolumeInto only upgrades to a non-empty value, and UpToDate is the terminal disk-state). Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com> * fix(rest): make U130 last-copy delete guard fail safe (BUG-045) The U130 guard decided whether deleting the last UpToDate diskful replica would strand a mid-sync peer solely from the CRD `diskState` projection. When the satellite observer left a live source's diskState blank (the Secondary SyncSource race), the guard concluded the target was "not a source" and allowed the destructive delete, stranding the SyncTarget with no UpToDate source. Harden the guard to fail safe — when it cannot positively confirm that another UpToDate copy would survive, it refuses: - Treat a SyncSource replication-state as kernel ground truth that a replica holds an UpToDate copy (a SyncSource only feeds a resync from an UpToDate local disk), so a lagging/blank diskState no longer hides a real source. This applies to both the target (do not conclude "not a source") and the siblings (a SyncSource sibling counts as a surviving source, avoiding over-refusal of legitimate deletes). - Treat an empty/unknown diskState on a diskful target conservatively: while a peer is mid-sync, refuse the last-copy delete rather than trust an unstamped projection. A false refusal only asks the operator to wait or pass `?force=true`; a false allow is unrecoverable. Legitimate last-copy deletes (no peer mid-sync) and the `?force=true` override are unaffected. Co-Authored-By: Claude <noreply@anthropic.com> Signed-off-by: Andrei Kvapil <kvapss@gmail.com> --------- Signed-off-by: Andrei Kvapil <kvapss@gmail.com> Co-authored-by: Claude <noreply@anthropic.com>
1 parent d1d583f commit c193457

4 files changed

Lines changed: 458 additions & 12 deletions

File tree

pkg/rest/resource_delete_last_uptodate_u130.go

Lines changed: 103 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -65,20 +65,44 @@ import (
6565
// true when the guard MUST fire and the delete must be refused.
6666
//
6767
// `siblings` is the full replica set of the RD (including `target`).
68+
//
69+
// This is a data-availability guard, so it FAILS SAFE: when it cannot
70+
// positively confirm that another UpToDate copy would survive the
71+
// delete, it refuses. The load-bearing safety property is "never drop
72+
// the last source of a syncing peer" — a false refusal merely asks the
73+
// operator to wait or pass `?force=true`, whereas a false allow strands
74+
// the SyncTarget with no UpToDate source (unrecoverable data loss).
75+
//
76+
// BUG-045 hardening: the decision must NOT rest solely on the
77+
// possibly-lagging CRD `diskState` projection. The satellite observer
78+
// can leave a live source's `diskState` blank (a Secondary SyncSource
79+
// whose local `device` frame's `disk:UpToDate` was not re-stamped),
80+
// while the SyncSource replication-state — emitted on the reliably-
81+
// delivered peer-device frame — is present. The guard therefore treats
82+
// a SyncSource replication-state as kernel ground-truth evidence that a
83+
// replica holds an UpToDate copy, and treats an unknown/empty diskState
84+
// on a diskful replica conservatively rather than as "not a source".
6885
func resourceMidSyncDeleteRefusal(target *apiv1.Resource, siblings []apiv1.Resource) bool {
6986
// A diskless / tiebreaker replica carries no UpToDate backing
7087
// storage; removing it never strands a SyncTarget's only source.
7188
if !resourceIsDiskful(target) {
7289
return false
7390
}
7491

75-
if !resourceIsUpToDate(target) {
76-
// The target itself is not an UpToDate source — removing it
77-
// cannot remove "the last UpToDate source". Out of scope.
92+
// The target must be a plausible last source. A diskful replica
93+
// that the kernel confirms is UpToDate or SyncSource clearly is.
94+
// A diskful replica whose diskState is empty/unknown is ALSO treated
95+
// as a possible source here (fail-safe): the BUG-045 race leaves a
96+
// real source's diskState blank, and concluding "not a source" on an
97+
// unknown projection is exactly the false-allow that strands the
98+
// SyncTarget. Only a target the kernel positively reports as NOT a
99+
// complete copy (Inconsistent / SyncTarget / Diskless / Outdated /
100+
// Failed) is out of scope.
101+
if !resourceIsPossibleUpToDateSource(target) {
78102
return false
79103
}
80104

81-
otherUpToDateDiskful := 0
105+
otherConfirmedSources := 0
82106
otherMidSyncDiskful := 0
83107

84108
for i := range siblings {
@@ -92,25 +116,91 @@ func resourceMidSyncDeleteRefusal(target *apiv1.Resource, siblings []apiv1.Resou
92116
}
93117

94118
switch {
95-
case resourceIsUpToDate(sib):
96-
otherUpToDateDiskful++
119+
case resourceIsConfirmedUpToDateSource(sib):
120+
// Kernel-confirmed surviving source: UpToDate diskState or a
121+
// SyncSource replication-state. The SyncTarget keeps a valid
122+
// source after the delete.
123+
otherConfirmedSources++
97124
case resourceIsMidSync(sib):
98125
otherMidSyncDiskful++
99126
}
100127
}
101128

102-
// Safe: another UpToDate diskful survives — the SyncTarget keeps a
103-
// valid source after the delete, or there is simply nothing in
104-
// flight to strand.
105-
if otherUpToDateDiskful > 0 {
129+
// Safe: a kernel-confirmed UpToDate/SyncSource diskful survives —
130+
// the SyncTarget keeps a valid source after the delete, or there is
131+
// simply nothing in flight to strand.
132+
if otherConfirmedSources > 0 {
106133
return false
107134
}
108135

109-
// Refuse only when the target is the sole UpToDate source AND a
110-
// diskful peer is mid-sync that would be stranded by the delete.
136+
// Refuse when the target is the sole surviving source AND a diskful
137+
// peer is mid-sync that would be stranded by the delete.
111138
return otherMidSyncDiskful > 0
112139
}
113140

141+
// resourceIsConfirmedUpToDateSource reports whether the kernel
142+
// positively confirms the replica holds a complete, current copy that
143+
// can serve a resync: its local disk_state is UpToDate, OR it reports a
144+
// SyncSource replication-state toward a peer. The SyncSource signal is
145+
// kernel ground truth — DRBD only lets a node feed a peer's resync from
146+
// an UpToDate local disk — so it stands in for a lagging/blank diskState
147+
// projection (BUG-045).
148+
func resourceIsConfirmedUpToDateSource(res *apiv1.Resource) bool {
149+
return resourceIsUpToDate(res) || resourceIsSyncSource(res)
150+
}
151+
152+
// resourceIsPossibleUpToDateSource reports whether the replica might be
153+
// the last UpToDate source and so must keep the guard armed. It is the
154+
// fail-safe widening of resourceIsConfirmedUpToDateSource: in addition
155+
// to kernel-confirmed sources it also returns true for a diskful replica
156+
// whose diskState is empty/unknown (the BUG-045 unstamped-projection
157+
// case). Only a replica the kernel positively reports as NOT a complete
158+
// copy is excluded.
159+
func resourceIsPossibleUpToDateSource(res *apiv1.Resource) bool {
160+
if resourceIsConfirmedUpToDateSource(res) {
161+
return true
162+
}
163+
164+
// An empty/unknown diskState on a diskful replica is treated
165+
// conservatively as "might be a source": refusing in doubt is the
166+
// fail-safe choice for a last-copy delete. A replica the kernel
167+
// positively reports mid-sync or not-a-copy is excluded by
168+
// resourceHasKnownDiskState being true with a non-UpToDate value.
169+
return !resourceHasKnownDiskState(res)
170+
}
171+
172+
// resourceHasKnownDiskState reports whether any of the replica's volumes
173+
// carries a non-empty kernel-observed disk_state. False means the
174+
// observer has not (yet) projected a disk_state for the replica — the
175+
// BUG-045 unstamped-projection window — which the last-copy guard treats
176+
// conservatively.
177+
func resourceHasKnownDiskState(res *apiv1.Resource) bool {
178+
for i := range res.Volumes {
179+
if res.Volumes[i].State.DiskState != "" {
180+
return true
181+
}
182+
}
183+
184+
return false
185+
}
186+
187+
// resourceIsSyncSource reports whether the replica reports a SyncSource
188+
// replication-state toward any peer on any volume. A SyncSource is, by
189+
// DRBD invariant, holding an UpToDate local copy (it is actively feeding
190+
// a peer's resync), so this is authoritative evidence the replica is a
191+
// valid source even when the diskState projection lags or is blank.
192+
func resourceIsSyncSource(res *apiv1.Resource) bool {
193+
for i := range res.Volumes {
194+
for _, rep := range res.Volumes[i].State.ReplicationStates {
195+
if rep.ReplicationState == drbdReplStateSyncSource {
196+
return true
197+
}
198+
}
199+
}
200+
201+
return false
202+
}
203+
114204
// resourceIsDiskful reports whether a replica carries real backing
115205
// storage (not DISKLESS, not TIE_BREAKER).
116206
func resourceIsDiskful(res *apiv1.Resource) bool {
@@ -169,6 +259,7 @@ const (
169259
drbdDiskStateInconsistent = "Inconsistent"
170260
drbdDiskStateSyncTarget = "SyncTarget"
171261
drbdReplStateSyncTarget = "SyncTarget"
262+
drbdReplStateSyncSource = "SyncSource"
172263
)
173264

174265
// refuseLastUpToDateDiskfulMidSyncDelete is the U130 gate invoked from

pkg/rest/resource_delete_last_uptodate_u130_test.go

Lines changed: 181 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -65,6 +65,29 @@ func disklessReplica(rd, node string, flag string) *apiv1.Resource {
6565
}
6666
}
6767

68+
// syncSourceUnstampedReplica is the BUG-045 shape: a diskful replica
69+
// that is actively feeding a peer's resync (SyncSource replication-state
70+
// toward `peer`) but whose local diskState projection is blank — the
71+
// satellite observer's device-frame `disk:UpToDate` was not re-stamped
72+
// on the Secondary SyncSource. The kernel ground truth (SyncSource) says
73+
// it IS an UpToDate source; the guard must trust that, not the empty
74+
// diskState.
75+
func syncSourceUnstampedReplica(rd, node, peer string) *apiv1.Resource {
76+
return &apiv1.Resource{
77+
Name: rd,
78+
NodeName: node,
79+
Volumes: []apiv1.Volume{{
80+
VolumeNumber: 0,
81+
State: apiv1.VolumeState{
82+
DiskState: "", // BUG-045: projection left it blank
83+
ReplicationStates: map[string]apiv1.ReplicationState{
84+
peer: {ReplicationState: drbdReplStateSyncSource},
85+
},
86+
},
87+
}},
88+
}
89+
}
90+
6891
// TestU130RejectsDeleteOfLastUpToDateWhilePeerSyncing is the core
6992
// repro: 1 diskful UpToDate source + 1 diskful SyncTarget (mid-sync) +
7093
// 1 diskless Primary. Deleting the UpToDate source must be refused.
@@ -161,6 +184,119 @@ func TestU130RejectsViaPeerReplicationStateSyncTarget(t *testing.T) {
161184
}
162185
}
163186

187+
// TestU130RejectsSyncSourceTargetWithBlankDiskState is the BUG-045
188+
// repro: the deleted target is the SOURCE replica (a Secondary feeding
189+
// the resync) whose CRD diskState projection is BLANK because the
190+
// satellite observer never re-stamped `disk:UpToDate` on the Secondary
191+
// SyncSource. Its SyncSource replication-state toward the mid-sync peer
192+
// is kernel ground truth that it holds an UpToDate copy. The guard must
193+
// NOT trust the blank diskState and conclude "not a source" — it must
194+
// refuse the delete that would strand the SyncTarget.
195+
func TestU130RejectsSyncSourceTargetWithBlankDiskState(t *testing.T) {
196+
st := store.NewInMemory()
197+
ctx := t.Context()
198+
rd := "pvc-u130-bug045"
199+
200+
seedU130RD(t, st, rd)
201+
202+
// n1: the source being deleted — Secondary SyncSource, blank
203+
// diskState (the BUG-045 unstamped-projection window).
204+
if err := st.Resources().Create(ctx, syncSourceUnstampedReplica(rd, "n1", "n2")); err != nil {
205+
t.Fatalf("seed n1: %v", err)
206+
}
207+
// n2: freshly-added second diskful, still catching up.
208+
if err := st.Resources().Create(ctx, diskfulReplica(rd, "n2", drbdDiskStateSyncTarget)); err != nil {
209+
t.Fatalf("seed n2: %v", err)
210+
}
211+
// n3: diskless Primary/InUse holding the resource open.
212+
if err := st.Resources().Create(ctx, disklessReplica(rd, "n3", apiv1.ResourceFlagDiskless)); err != nil {
213+
t.Fatalf("seed n3: %v", err)
214+
}
215+
216+
base, stop := startServerWithStore(t, st)
217+
defer stop()
218+
219+
resp := httpDelete(t, base+"/v1/resource-definitions/"+rd+"/resources/n1")
220+
defer func() { _ = resp.Body.Close() }()
221+
222+
if resp.StatusCode != http.StatusConflict {
223+
t.Fatalf("status: got %d, want 409 (BUG-045: blank-diskState SyncSource must still be refused)", resp.StatusCode)
224+
}
225+
226+
if _, err := st.Resources().Get(ctx, rd, "n1"); err != nil {
227+
t.Errorf("n1 must survive the refused delete; got err=%v", err)
228+
}
229+
}
230+
231+
// TestU130RejectsBlankDiskStateTargetMidSync pins the fail-safe widening
232+
// independent of any replication-state signal on the target: a diskful
233+
// target with a completely UNKNOWN diskState (no SyncSource token either
234+
// — e.g. the projection has not landed any field yet) while a diskful
235+
// peer is mid-sync must be refused. Refusing in doubt is the load-bearing
236+
// data-safety property; a false allow strands the SyncTarget.
237+
func TestU130RejectsBlankDiskStateTargetMidSync(t *testing.T) {
238+
st := store.NewInMemory()
239+
ctx := t.Context()
240+
rd := "pvc-u130-blank"
241+
242+
seedU130RD(t, st, rd)
243+
244+
// n1: diskful target with an entirely blank state surface.
245+
if err := st.Resources().Create(ctx, diskfulReplica(rd, "n1", "")); err != nil {
246+
t.Fatalf("seed n1: %v", err)
247+
}
248+
if err := st.Resources().Create(ctx, diskfulReplica(rd, "n2", drbdDiskStateSyncTarget)); err != nil {
249+
t.Fatalf("seed n2: %v", err)
250+
}
251+
252+
base, stop := startServerWithStore(t, st)
253+
defer stop()
254+
255+
resp := httpDelete(t, base+"/v1/resource-definitions/"+rd+"/resources/n1")
256+
defer func() { _ = resp.Body.Close() }()
257+
258+
if resp.StatusCode != http.StatusConflict {
259+
t.Fatalf("status: got %d, want 409 (fail-safe: blank-diskState target mid-sync must be refused)", resp.StatusCode)
260+
}
261+
}
262+
263+
// TestU130AllowsDeleteWhenSyncSourceSiblingSurvives pins that a SyncSource
264+
// SIBLING with a blank diskState counts as a surviving UpToDate source:
265+
// deleting one UpToDate diskful is allowed because the SyncSource peer
266+
// keeps feeding the SyncTarget. Without crediting the SyncSource sibling
267+
// the guard would over-refuse a legitimate delete (must-not-regress).
268+
func TestU130AllowsDeleteWhenSyncSourceSiblingSurvives(t *testing.T) {
269+
st := store.NewInMemory()
270+
ctx := t.Context()
271+
rd := "pvc-u130-srcsib"
272+
273+
seedU130RD(t, st, rd)
274+
275+
// n1: the delete target — plainly UpToDate.
276+
if err := st.Resources().Create(ctx, diskfulReplica(rd, "n1", drbdDiskStateUpToDate)); err != nil {
277+
t.Fatalf("seed n1: %v", err)
278+
}
279+
// n2: a SyncSource sibling (blank diskState) — kernel-confirmed
280+
// surviving source that keeps feeding n3.
281+
if err := st.Resources().Create(ctx, syncSourceUnstampedReplica(rd, "n2", "n3")); err != nil {
282+
t.Fatalf("seed n2: %v", err)
283+
}
284+
// n3: the SyncTarget being fed by n2.
285+
if err := st.Resources().Create(ctx, diskfulReplica(rd, "n3", drbdDiskStateSyncTarget)); err != nil {
286+
t.Fatalf("seed n3: %v", err)
287+
}
288+
289+
base, stop := startServerWithStore(t, st)
290+
defer stop()
291+
292+
resp := httpDelete(t, base+"/v1/resource-definitions/"+rd+"/resources/n1")
293+
defer func() { _ = resp.Body.Close() }()
294+
295+
if resp.StatusCode != http.StatusOK {
296+
t.Fatalf("status: got %d, want 200 (SyncSource sibling survives as source)", resp.StatusCode)
297+
}
298+
}
299+
164300
// TestU130AllowsDeleteWhenNoPeerSyncing pins bug-hunt #6: when NO peer
165301
// is mid-sync, deleting the last diskful is allowed (matches upstream).
166302
// Here n1 is the only diskful; n2 is a plain diskless (not syncing).
@@ -346,6 +482,51 @@ func TestU130PredicateUnit(t *testing.T) {
346482
},
347483
refuse: false,
348484
},
485+
{
486+
// BUG-045: the source's diskState is blank but it is a
487+
// SyncSource feeding the mid-sync peer — must refuse.
488+
name: "syncsource-target-blank-diskstate",
489+
target: syncSourceUnstampedReplica(rd, "n1", "n2"),
490+
siblings: []*apiv1.Resource{
491+
syncSourceUnstampedReplica(rd, "n1", "n2"),
492+
diskfulReplica(rd, "n2", drbdDiskStateSyncTarget),
493+
},
494+
refuse: true,
495+
},
496+
{
497+
// Fail-safe: an entirely blank-state diskful target while a
498+
// peer is mid-sync — refuse in doubt.
499+
name: "blank-diskstate-target-midsync",
500+
target: diskfulReplica(rd, "n1", ""),
501+
siblings: []*apiv1.Resource{
502+
diskfulReplica(rd, "n1", ""),
503+
diskfulReplica(rd, "n2", drbdDiskStateSyncTarget),
504+
},
505+
refuse: true,
506+
},
507+
{
508+
// A blank-state diskful target with NO peer mid-sync is the
509+
// genuine last-copy case — still allowed (no regression).
510+
name: "blank-diskstate-target-no-sync",
511+
target: diskfulReplica(rd, "n1", ""),
512+
siblings: []*apiv1.Resource{
513+
diskfulReplica(rd, "n1", ""),
514+
disklessReplica(rd, "n2", apiv1.ResourceFlagDiskless),
515+
},
516+
refuse: false,
517+
},
518+
{
519+
// A SyncSource sibling (blank diskState) is a confirmed
520+
// surviving source — dropping one UpToDate copy is allowed.
521+
name: "syncsource-sibling-survives",
522+
target: diskfulReplica(rd, "n1", drbdDiskStateUpToDate),
523+
siblings: []*apiv1.Resource{
524+
diskfulReplica(rd, "n1", drbdDiskStateUpToDate),
525+
syncSourceUnstampedReplica(rd, "n2", "n3"),
526+
diskfulReplica(rd, "n3", drbdDiskStateSyncTarget),
527+
},
528+
refuse: false,
529+
},
349530
{
350531
name: "diskless-target",
351532
target: disklessReplica(rd, "n3", apiv1.ResourceFlagDiskless),

0 commit comments

Comments
 (0)