Fix KVM incremental volume snapshot creation #12666

Open
JoaoJandre wants to merge 2 commits into apache:4.22 from scclouds:fix-incremental-snapshot-creation

Conversation

@JoaoJandre
Contributor

Description

During the creation of incremental snapshots, Apache CloudStack (ACS) sends an asynchronous command to Libvirt to back up the volume, then waits for Libvirt to signal that the command has completed before continuing with the snapshot process. Sporadically, however, Libvirt signals completion before the operating system has actually released the write lock on the snapshot file. When this happens, the subsequent attempt by ACS to rebase the snapshot fails with an error.

This PR changes the rebase so that if ACS encounters a lock error while rebasing the snapshot, another attempt is made after 60 seconds.
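
The retry described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class `RebaseRetrySketch`, the method `runWithSingleRetry`, and the convention that an attempt returns `null` on success or an error message on failure are all assumptions made for the example. In the scenario described here, the delay would be 60 seconds and the predicate would recognise the lock error.

```java
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Hypothetical sketch of a "retry once after a delay" helper; not the PR's implementation. */
final class RebaseRetrySketch {
    private RebaseRetrySketch() {}

    /**
     * Runs the attempt. If it fails with a retryable error, waits delayMillis and
     * runs it exactly one more time. Assumed convention: the attempt returns null
     * on success, or an error message on failure.
     *
     * @return null on success, otherwise the error message of the last attempt
     */
    static String runWithSingleRetry(Supplier<String> attempt,
                                     Predicate<String> isRetryable,
                                     long delayMillis) {
        String error = attempt.get();
        if (error == null || !isRetryable.test(error)) {
            // Success, or a failure that waiting will not fix: report it as-is.
            return error;
        }
        try {
            // Give the operating system time to release the lock.
            Thread.sleep(delayMillis);
        } catch (InterruptedException e) {
            // Preserve the interrupt flag and give up on the retry.
            Thread.currentThread().interrupt();
            return error;
        }
        return attempt.get();
    }
}
```

A caller would pass the rebase invocation as the `attempt` and treat a non-null return value as a failed snapshot step.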

Types of changes

  • Breaking change (fix or feature that would cause existing functionality to change)
  • New feature (non-breaking change which adds functionality)
  • Bug fix (non-breaking change which fixes an issue)
  • Enhancement (improves an existing feature and functionality)
  • Cleanup (Code refactoring and cleanup, that may add test cases)
  • Build/CI
  • Test (unit or integration test code)

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • Major
  • Minor

Bug Severity

  • BLOCKER
  • Critical
  • Major
  • Minor
  • Trivial

Screenshots (if appropriate):

How Has This Been Tested?

This issue is extremely hard to reproduce, so I could not test the retry path directly. I did verify that the normal incremental snapshot workflow still works as expected.

Member

@weizhouapache weizhouapache left a comment

code lgtm

@codecov

codecov bot commented Feb 19, 2026

Codecov Report

❌ Patch coverage is 5.26316% with 18 lines in your changes missing coverage. Please review.
✅ Project coverage is 17.60%. Comparing base (5caf6cd) to head (5d0f4d9).
⚠️ Report is 51 commits behind head on 4.22.

Files with missing lines | Patch % | Lines
...ud/hypervisor/kvm/storage/KVMStorageProcessor.java | 0.00% | 18 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               4.22   #12666      +/-   ##
============================================
- Coverage     17.60%   17.60%   -0.01%     
- Complexity    15659    15678      +19     
============================================
  Files          5917     5918       +1     
  Lines        531394   531775     +381     
  Branches      64970    65025      +55     
============================================
+ Hits          93575    93637      +62     
- Misses       427269   427571     +302     
- Partials      10550    10567      +17     
Flag | Coverage Δ
uitests | 3.70% <ø> (-0.01%) ⬇️
unittests | 18.67% <5.26%> (-0.01%) ⬇️

Contributor

Copilot AI left a comment

Pull request overview

This PR aims to reduce sporadic failures during KVM incremental volume snapshot creation by retrying qemu-img rebase when libvirt/qemu-img reports a transient image lock.

Changes:

  • Detect the specific “image is in use” lock error during snapshot rebase.
  • Add a one-time retry of the rebase after a 60-second delay.
  • Change non-lock rebase failures to throw and abort the snapshot workflow (previously logged and continued).
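
The three behaviours listed above amount to a small decision rule, sketched below. Everything here is illustrative: `RebaseOutcomeSketch`, `Outcome`, and `classify` are hypothetical names, and `ASSUMED_LOCK_MARKER` is only an assumed substring for the lock error. The exact text matched by the PR is not shown in this conversation.

```java
/** Hypothetical sketch of the decision rule summarised above; not the PR's code. */
final class RebaseOutcomeSketch {
    private RebaseOutcomeSketch() {}

    enum Outcome { SUCCESS, RETRY, ABORT }

    // ASSUMPTION: a substring taken to identify the image lock error.
    // The string actually checked by the PR may differ.
    static final String ASSUMED_LOCK_MARKER = "Failed to get \"write\" lock";

    /**
     * @param errorOutput  null if the rebase succeeded, otherwise its error output
     * @param attemptsMade number of rebase attempts made so far (1 after the first)
     */
    static Outcome classify(String errorOutput, int attemptsMade) {
        if (errorOutput == null) {
            return Outcome.SUCCESS;
        }
        boolean lockError = errorOutput.contains(ASSUMED_LOCK_MARKER);
        // Only a lock error on the first attempt earns the single retry.
        if (lockError && attemptsMade < 2) {
            return Outcome.RETRY;
        }
        // Any other failure, or a lock error after the retry, aborts the workflow.
        return Outcome.ABORT;
    }
}
```

In this sketch, `ABORT` corresponds to throwing and stopping the snapshot workflow rather than logging the failure and continuing.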

@sureshanaparti
Contributor

@JoaoJandre can you re-target this to 4.22 branch?

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@JoaoJandre JoaoJandre force-pushed the fix-incremental-snapshot-creation branch from 4dce326 to 1768b5b Compare February 19, 2026 16:21
@sureshanaparti
Contributor

@blueorangutan package

@DaanHoogland
Contributor

@blueorangutan package

@blueorangutan

@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

@blueorangutan

Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 17424

@DaanHoogland
Contributor

@blueorangutan test

@blueorangutan

@DaanHoogland a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

@blueorangutan

[SF] Trillian Build Failed (tid-15841)

Contributor

@sureshanaparti sureshanaparti left a comment

clgtm

@sureshanaparti
Contributor

@blueorangutan test

@blueorangutan

@sureshanaparti a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests

Projects

Status: In Progress

6 participants