Commit 30a54d0

jmsperu and claude authored and committed
NAS backup: resume paused VM on backup failure and fix missing exit
When a NAS backup job fails (e.g. due to backup storage being full or I/O errors), the VM may remain indefinitely paused because:

1. The cleanup() function never checks or resumes the VM's paused state that was set by virsh backup-begin during the push backup operation.
2. The 'Failed' case in the backup job monitoring loop calls cleanup() but lacks an 'exit' statement, causing an infinite loop where the script repeatedly detects the failed job and calls cleanup().
3. Similarly, backup_stopped_vm() calls cleanup() on qemu-img convert failure but does not exit, allowing the loop to continue with subsequent disks despite the failure.

This fix:

- Adds VM state detection and resume to cleanup(), ensuring the VM is always resumed if found in a paused state during error handling
- Adds missing 'exit 1' after cleanup() in the Failed backup job case to prevent the infinite monitoring loop
- Adds missing 'exit 1' after cleanup() in backup_stopped_vm() on qemu-img convert failure

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
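
To illustrate point 2 of the message, here is a minimal, self-contained sketch of a polling loop whose 'Failed' branch calls cleanup with and without a following exit. It is not the code from nasbackup.sh: job_status, the stub cleanup, monitor, and the max_polls bound are all hypothetical names introduced only for this demonstration. The bound exists so the variant without exit terminates here; without it the loop would presumably spin forever, as the message describes.

```shell
# Hypothetical stubs: these are NOT the real nasbackup.sh functions.

# Always reports a terminal failure, as a failed backup job would on every poll.
job_status() { echo "Failed"; }

# Records that cleanup ran; the real cleanup unmounts and removes files.
cleanup() { echo "cleanup called"; }

# Poll the job state. $1 = "yes" to exit after cleanup (the fixed behaviour),
# anything else to fall through and keep polling (the old behaviour).
monitor() {
  local exit_on_failure=$1 max_polls=3 polls=0
  while [ "$polls" -lt "$max_polls" ]; do
    polls=$((polls + 1))
    case "$(job_status)" in
      Completed)
        break ;;
      Failed)
        echo "Virsh backup job failed"
        cleanup
        if [ "$exit_on_failure" = "yes" ]; then
          exit 1
        fi ;;
    esac
  done
}
```

Running `(monitor no)` in a subshell calls cleanup on every poll until the artificial bound is hit, while `(monitor yes)` calls it once and ends the subshell with status 1.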
1 parent 93239e0 commit 30a54d0

1 file changed, +16 -1 lines changed

scripts/vm/hypervisor/kvm/nasbackup.sh

Lines changed: 16 additions & 1 deletion
@@ -142,7 +142,8 @@ backup_running_vm() {
         break ;;
       Failed)
         echo "Virsh backup job failed"
-        cleanup ;;
+        cleanup
+        exit 1 ;;
     esac
     sleep 5
   done
@@ -178,6 +179,7 @@ backup_stopped_vm() {
     if ! qemu-img convert -O qcow2 "$disk" "$output" > "$logFile" 2> >(cat >&2); then
       echo "qemu-img convert failed for $disk $output"
       cleanup
+      exit 1
     fi
     name="datadisk"
   done
@@ -222,6 +224,19 @@ mount_operation() {
 cleanup() {
   local status=0
 
+  # Resume the VM if it was paused during backup to prevent it from
+  # remaining indefinitely paused when the backup job fails (e.g. due
+  # to storage full or I/O errors on the backup target)
+  local vm_state
+  vm_state=$(virsh -c qemu:///system domstate "$VM" 2>/dev/null)
+  if [[ "$vm_state" == "paused" ]]; then
+    log -ne "Resuming paused VM $VM during backup cleanup"
+    if ! virsh -c qemu:///system resume "$VM" > /dev/null 2>&1; then
+      echo "Failed to resume VM $VM"
+      status=1
+    fi
+  fi
+
   rm -rf "$dest" || { echo "Failed to delete $dest"; status=1; }
   umount "$mount_point" || { echo "Failed to unmount $mount_point"; status=1; }
   rmdir "$mount_point" || { echo "Failed to remove mount point $mount_point"; status=1; }
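
The resume logic added to cleanup() can be tried without a hypervisor using a rough sketch like the one below. It assumes a shell function named virsh that stands in for the real CLI and only understands the `domstate` and `resume` subcommands used in the diff. The VM_STATE variable, the resume_if_paused helper, and the demo-vm name are hypothetical and exist only for this example.

```shell
# Stub replacing the real virsh CLI so the sketch runs without libvirt.
# VM_STATE is a hypothetical variable that only this stub reads and writes.
VM_STATE="paused"
virsh() {
  # Drop the "-c <uri>" connection arguments, as passed in the diff.
  [ "$1" = "-c" ] && shift 2
  case "$1" in
    domstate) echo "$VM_STATE" ;;
    resume)   VM_STATE="running" ;;
  esac
}

# Resume the domain only if it reports "paused".
# Returns non-zero when the resume command itself fails.
resume_if_paused() {
  local vm=$1 state
  state=$(virsh -c qemu:///system domstate "$vm" 2>/dev/null)
  if [ "$state" = "paused" ]; then
    echo "Resuming paused VM $vm"
    virsh -c qemu:///system resume "$vm" >/dev/null 2>&1 || return 1
  fi
  return 0
}
```

Calling `resume_if_paused demo-vm` while the stub reports "paused" flips the stubbed state to "running"; a second call finds nothing to do and prints nothing. Keeping the check inside the cleanup path, as the commit does, means every error exit that goes through cleanup also attempts the resume.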
