Skip to content

Implement Better VM revert using VDI revert#7132

Merged
psafont merged 10 commits into
xapi-project:masterfrom
psafont:dev/pau/vm-revert-master
Jun 18, 2026
Merged

Implement Better VM revert using VDI revert#7132
psafont merged 10 commits into
xapi-project:masterfrom
psafont:dev/pau/vm-revert-master

Conversation

@psafont

@psafont psafont commented Jun 17, 2026

Copy link
Copy Markdown
Member

This PR contains the changes needed to implement the proposal shown in https://github.com/xapi-project/xen-api/blob/master/doc/content/design/snapshot-revert.md

Please note that the storage backends need to implement VDI_REVERT in order to benefit from this change.

The fallback logic to the previous method complicates the implementation quite a bit, although it's made a bit more manageable using sets.
The complexity comes from the current infrastructure to clone and delete, which mixes VBDs and VDIs, and that two VMs are related: the 'live' or 'destination' VM and the snapshot VM.

Note that only disks are affected by the change, and both CDs and suspend VDIs are treated the same way as before.

There's one change in user-visible behaviour: now the UUID of the VDIs do not change when reverting if VDI.revert is used.

This PR is a port to master of #7116

psafont and others added 8 commits June 17, 2026 10:38
Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech>
(cherry picked from commit 7f931c1)
This is for both the live VDI (revert_from) and the snapshot (revert_to)

These correspond to the VM operations `reverting` and `revert`,
respectively.

Only snapshots backed by an SR that allows clones are allowed to be
reverted to. This is because SRs supporting revert is not enough to use
vdi revert without clone: having an incorrect snapshot_of field will
also cause the use of clone. With this we can ensure that if a bad edge
case is met the system has a working fallback.

Reverting to "live" snapshot VDIs is allowed, this happens when
reverting to a checkpoint.

Currently reverting requires the cloning feature because it's not
possible to discriminate between backends that support vdi revert and
ones that done. This will be possible once all of them do

Test that VDIs...
* can be reverted to a snapshot.
* can be reverted to an attached snapshot.
* cannot be reverted to a leaf VDI.
* cannot be reverted to an attached leaf.

Co-authored-by: John Else <john.else@citrix.com>
Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech>
(cherry picked from commit 639bc3d)
This operation allows storage backends to implement VDI revert natively,
as well as advertise it. The operation is unused by xapi for the time
being.

Co-authored-by: David Scott <dave.scott@eu.citrix.com>
Co-authored-by: John Else <john.else@citrix.com>
Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech>
(cherry picked from commit c368ca8)
The API function is needed to be able to forward it across the pool and
to block other operations on the VDI while it's running.

It's unused for the time being.

Co-authored-by: John Else <john.else@citrix.com>
Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech>
(cherry picked from commit 6c95e34)
The code now tries to call VDI.revert on all the disk VDIs, and if that
fails, it falls back to the original revert method that destroys VM
disks, clones snapshot disks and then fixes up metadata.

This means that the successfully reverted disks must be excluded from
the original method. This is done by introducing sets of VDIs and VBDs
and using set logic on them.

The CD disks and the suspend VDI are treated like before: destroy +
clone.

The code is more convoluted that I would have liked because of existing
clone infrastructure mixes VBDs with VDIs, so they need to be converted
back and forth to be able to run the previous method correctly,
especially for updating the snapshot_of field of the cloned snapshot
disks.

Quicktests encodes the difference in behaviour from the original method
to the native one: now VDI UUIDs are not changed when reverting.

Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech>
(cherry picked from commit d1ae2a7)
Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech>
(cherry picked from commit 97823b4)
In case where there's an error while reverting on the Storage Backend,
the state of the SR might become unstable, and a recovery using the
clone method for reverting might introduce bigger issues. Instead stop
the revert immediately.

The cases where ignoring the errors is acceptable is when either the SR
does not advertise support for VDI_REVERT, and when the SR does not
implement revert for a particular VDI. for example, if the SR implements
the revert for only some of the formats it implements.

Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech>
(cherry picked from commit f887de1)
This needed a fix for the datamodel_lifecycle and the datamodel hash

The minor schema version does not need to be changed since it was
already bumped in the corresponding cherry-pick that bumped it. I
elected to do that instead of keeping it the same.

Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech>
@psafont psafont force-pushed the dev/pau/vm-revert-master branch from 8b663ad to 1480fbd Compare June 17, 2026 09:38
Comment thread ocaml/idl/datamodel_lifecycle.ml Outdated
Comment thread ocaml/xapi/xapi_vdi.ml
Comment thread ocaml/xapi/xapi_vm_snapshot.ml
@gthvn1

gthvn1 commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Also I have another question not related directly to your PR. While looking in XAPI the code related to clone, I found that there is a rpu_allowed_vm_operations in xapi/xapi_globs.ml. And there is a rpu_allowed_vdi_operations. Could the VDI revert be added to the list? Or is there a good reason not to?

@psafont

psafont commented Jun 17, 2026

Copy link
Copy Markdown
Member Author

Also I have another question not related directly to your PR. While looking in TOOLSTACK the code related to clone, I found that there is a rpu_allowed_vm_operations in xapi/xapi_globs.ml. And there is a rpu_allowed_vdi_operations. Could the VDI revert be added to the list? Or is there a good reason not to?

I'm not sure it's safe to do since different hosts will have different versions of SM, this could be opened up at a later date if it ends up being a problem

psafont added 2 commits June 17, 2026 16:30
This shows the current code leaks the VBDs

Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech>
(cherry picked from commit 38cfbc5)
Because all the snapshot's CD VBDs are cloned in all cases, the reverted
VM's CD VBDs must be destroyed in all cases, add them unconditionally to
the VBDs being destroyed.

Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech>
(cherry picked from commit 20e8b3b)
@psafont psafont force-pushed the dev/pau/vm-revert-master branch from 53dffb0 to f326892 Compare June 17, 2026 15:30
@psafont psafont requested a review from gthvn1 June 17, 2026 15:31
@psafont psafont added this pull request to the merge queue Jun 18, 2026
Merged via the queue into xapi-project:master with commit 362b992 Jun 18, 2026
16 checks passed
@psafont psafont deleted the dev/pau/vm-revert-master branch June 18, 2026 13:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants