Commit dd9dd4e

stuggi and claude committed
[b/r] Document Data Mover WaitForFirstConsumer limitation and workaround
Document known Velero issue where Data Mover restores deadlock with WaitForFirstConsumer StorageClasses (LVM/topolvm). Add workaround using temporary dummy pods to trigger PVC binding, with node targeting for balanced storage distribution.

Upstream issues: velero#7561, velero#8044, velero#9343

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 2b76977 commit dd9dd4e

1 file changed

Lines changed: 88 additions & 0 deletions

docs/dev/backup-restore/restore/README.md

@@ -164,6 +164,94 @@ oc get volumesnapshotcontent
oc describe pvc <pvc-name> -n openstack
```

### Data Mover restore stuck with WaitForFirstConsumer storage (LVM)

**Known issue:** When the OADP Data Mover (`snapshotMoveData: true`) restores to a
StorageClass with `volumeBindingMode: WaitForFirstConsumer` (e.g., LVM/topolvm),
the PVC restore at order 00 deadlocks. The data mover waits for the PVC to be
consumed by a pod before it downloads data, but with WaitForFirstConsumer the PVC
does not bind until a pod references it. Because PVCs are restored before any
workload pods exist, neither side can make progress.
176+
**Symptoms:**
177+
- Restore stuck in `WaitingForPluginOperations` phase
178+
- DataDownload CRs in `Accepted` or `<none>` phase, never progressing
179+
- PVCs in `Pending` state with event: "waiting for first consumer to be created"
180+
- Node-agent logs: `error to wait target PVC consumed ... context deadline exceeded`
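
The DataDownload symptom above can be checked with a small filter. This is a sketch under assumptions: `stalled_datadownloads` is a hypothetical helper name, and it treats any DataDownload in `Accepted` or `<none>` as a candidate. A DataDownload can sit in `Accepted` briefly during a healthy restore, so a match only indicates the deadlock if it persists.

```bash
# Hypothetical helper: print DataDownloads that have not started transferring.
# Reads "NAME PHASE" rows on stdin; custom-columns renders an unset phase
# as the literal string <none>.
stalled_datadownloads() {
  awk '$2 == "Accepted" || $2 == "<none>" {print $1}'
}

# Usage against a live cluster (requires a logged-in oc session):
#   oc get datadownloads -n openshift-adp --no-headers \
#     -o custom-columns=NAME:.metadata.name,PHASE:.status.phase \
#     | stalled_datadownloads
```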

**Upstream issues:**
- [velero#7561](https://github.com/vmware-tanzu/velero/issues/7561): WaitForFirstConsumer incompatibility
- [velero#8044](https://github.com/vmware-tanzu/velero/issues/8044): Enhancement proposal
- [velero#9343](https://github.com/vmware-tanzu/velero/issues/9343): Topology-aware storage
187+
**Workaround:** Create temporary pods that reference the pending PVCs to trigger
188+
binding. The data mover will then proceed with the download.
189+
190+
```bash
# 1. List pending PVCs
oc get pvc -n openstack --no-headers | awk '$2 == "Pending" {print $1}'

# 2. List available nodes
oc get nodes -l node-role.kubernetes.io/worker --no-headers -o custom-columns=NAME:.metadata.name

# 3. Create a dummy pod for each PVC, targeting a specific node.
# Distribute PVCs across nodes for balanced storage usage.
# With LVM, the PVC will be provisioned on the node the pod targets.
create_dummy_pod() {
  local pvc_name=$1
  local node_name=$2
  local ns=${3:-openstack}
  local pod_name="pvc-consumer-${pvc_name}"
  # Truncate pod name to 63 chars (DNS label length)
  pod_name="${pod_name:0:63}"
  # Pin the pod with a nodeSelector instead of spec.nodeName: nodeName bypasses
  # the scheduler, which leaves WaitForFirstConsumer PVCs Pending. Assumes the
  # kubernetes.io/hostname label equals the node name.
  cat <<EOF | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: ${pod_name}
  namespace: ${ns}
spec:
  nodeSelector:
    kubernetes.io/hostname: ${node_name}
  containers:
  - name: pause
    image: registry.k8s.io/pause:3.9
    volumeMounts:
    - name: data
      mountPath: /mnt/data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: ${pvc_name}
EOF
  echo "Created pod ${pod_name} on ${node_name} for PVC ${pvc_name}"
}

# Example: distribute pending PVCs round-robin across all worker nodes
NODES=($(oc get nodes -l node-role.kubernetes.io/worker --no-headers -o custom-columns=NAME:.metadata.name))
PVCS=($(oc get pvc -n openstack --no-headers | awk '$2 == "Pending" {print $1}'))
for i in "${!PVCS[@]}"; do
  node_idx=$((i % ${#NODES[@]}))
  create_dummy_pod "${PVCS[$i]}" "${NODES[$node_idx]}"
done

# 4. Wait for PVCs to bind
oc get pvc -n openstack -w

# 5. Wait for DataDownloads to complete
oc get datadownloads -n openshift-adp -o custom-columns=NAME:.metadata.name,PHASE:.status.phase,BYTES:.status.progress.totalBytes

# 6. Delete dummy pods after all DataDownloads are Completed
#    (run in the same shell so PVCS still holds the names from step 3)
for pvc in "${PVCS[@]}"; do
  pod_name="pvc-consumer-${pvc}"
  pod_name="${pod_name:0:63}"
  oc delete pod "${pod_name}" -n openstack --ignore-not-found
done
```
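
Step 5 above is a one-shot listing, so deciding when it is safe to run step 6 is left to the operator. One way to automate that check is sketched here. The helper name `all_datadownloads_completed` is hypothetical, and the sketch assumes every DataDownload in the namespace belongs to the restore in progress. It does not handle a `Failed` phase, which would keep the polling loop running until interrupted.

```bash
# Hypothetical helper: succeed only if stdin contains at least one
# "NAME PHASE" row and every row is in the Completed phase.
all_datadownloads_completed() {
  awk '$2 != "Completed" {bad = 1} END {exit (NR == 0 || bad)}'
}

# Usage against a live cluster (requires a logged-in oc session):
#   until oc get datadownloads -n openshift-adp --no-headers \
#       -o custom-columns=NAME:.metadata.name,PHASE:.status.phase \
#       | all_datadownloads_completed; do
#     sleep 30
#   done
```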
250+
251+
**Note:** This workaround is not needed when restoring from local CSI snapshots
252+
(without data mover). It only affects cross-cluster or disaster recovery restores
253+
where PVC data is downloaded from the BackupStorageLocation (S3/MinIO).
254+
167255
### Database restore issues
168256

169257
```bash
