Implement Better VM revert using VDI revert#7132
Conversation
Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech> (cherry picked from commit 7f931c1)
This is for both the live VDI (revert_from) and the snapshot (revert_to) These correspond to the VM operations `reverting` and `revert`, respectively. Only snapshots backed by an SR that allows clones are allowed to be reverted to. This is because SRs supporting revert is not enough to use vdi revert without clone: having an incorrect snapshot_of field will also cause the use of clone. With this we can ensure that if a bad edge case is met the system has a working fallback. Reverting to "live" snapshot VDIs is allowed, this happens when reverting to a checkpoint. Currently reverting requires the cloning feature because it's not possible to discriminate between backends that support vdi revert and ones that done. This will be possible once all of them do Test that VDIs... * can be reverted to a snapshot. * can be reverted to an attached snapshot. * cannot be reverted to a leaf VDI. * cannot be reverted to an attached leaf. Co-authored-by: John Else <john.else@citrix.com> Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech> (cherry picked from commit 639bc3d)
This operation allows storage backends to implement VDI revert natively, as well as advertise it. The operation is unused by xapi for the time being. Co-authored-by: David Scott <dave.scott@eu.citrix.com> Co-authored-by: John Else <john.else@citrix.com> Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech> (cherry picked from commit c368ca8)
The API function is needed to be able to forward it across the pool and to block other operations on the VDI while it's running. It's unused for the time being. Co-authored-by: John Else <john.else@citrix.com> Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech> (cherry picked from commit 6c95e34)
The code now tries to call VDI.revert on all the disk VDIs, and if that fails, it falls back to the original revert method that destroys VM disks, clones snapshot disks and then fixes up metadata. This means that the successfully reverted disks must be excluded from the original method. This is done by introducing sets of VDIs and VBDs and using set logic on them. The CD disks and the suspend VDI are treated like before: destroy + clone. The code is more convoluted that I would have liked because of existing clone infrastructure mixes VBDs with VDIs, so they need to be converted back and forth to be able to run the previous method correctly, especially for updating the snapshot_of field of the cloned snapshot disks. Quicktests encodes the difference in behaviour from the original method to the native one: now VDI UUIDs are not changed when reverting. Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech> (cherry picked from commit d1ae2a7)
Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech> (cherry picked from commit 97823b4)
In case where there's an error while reverting on the Storage Backend, the state of the SR might become unstable, and a recovery using the clone method for reverting might introduce bigger issues. Instead stop the revert immediately. The cases where ignoring the errors is acceptable is when either the SR does not advertise support for VDI_REVERT, and when the SR does not implement revert for a particular VDI. for example, if the SR implements the revert for only some of the formats it implements. Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech> (cherry picked from commit f887de1)
This needed a fix for the datamodel_lifecycle and the datamodel hash The minor schema version does not need to be changed since it was already bumped in the corresponding cherry-pick that bumped it. I elected to do that instead of keeping it the same. Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech>
8b663ad to
1480fbd
Compare
|
Also I have another question not related directly to your PR. While looking in XAPI the code related to clone, I found that there is a |
I'm not sure it's safe to do since different hosts will have different versions of SM, this could be opened up at a later date if it ends up being a problem |
This shows the current code leaks the VBDs Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech> (cherry picked from commit 38cfbc5)
Because all the snapshot's CD VBDs are cloned in all cases, the reverted VM's CD VBDs must be destroyed in all cases, add them unconditionally to the VBDs being destroyed. Signed-off-by: Pau Ruiz Safont <pau.safont@vates.tech> (cherry picked from commit 20e8b3b)
53dffb0 to
f326892
Compare
This PR contains the changes needed to implement the proposal shown in https://github.com/xapi-project/xen-api/blob/master/doc/content/design/snapshot-revert.md
Please note that the storage backends need to implement VDI_REVERT in order to benefit from this change.
The fallback logic to the previous method complicates the implementation quite a bit, although it's made a bit more manageable using sets.
The complexity comes from the current infrastructure to clone and delete, which mixes VBDs and VDIs, and that two VMs are related: the 'live' or 'destination' VM and the snapshot VM.
Note that only disks are affected by the change, and both CDs and suspend VDIs are treated the same way as before.
There's one change in user-visible behaviour: now the UUID of the VDIs do not change when reverting if VDI.revert is used.
This PR is a port to master of #7116