message_forwarding: Don't block cross-pool migrations for unbootable VMs #7062
For context, we've seen cases where a VM with high memory requirements, which couldn't start on the source pool, failed during cross-pool migration to a pool where its memory requirements were satisfied. A workaround was to copy the VM instead of migrating it. The issue is that xapi wants to perform the cross-pool migration from a host that could boot the VM, so that if the user decides to start the VM in the middle of the migration, the sender host stays the same. If xapi can't find a host that could boot the VM, however, it rejects the cross-pool migration with `NO_HOSTS_AVAILABLE`.

This PR allows cross-pool migration in such cases, settling for any host that could act as a sender (one that sees the VM's SRs), since the user can't interrupt the migration by starting up a VM that can't boot anywhere in the source pool.
```ocaml
->
  (* If non-live VM can't start anywhere in the pool, migrate it
     anyway from any host that can see its SRs *)
  forward_to_access_srs ~local_fn ~__context ~vm ~remote_fn
```
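To make the fallback concrete, here is a minimal, self-contained OCaml sketch of the sender-selection order described above. It is a toy model, not xapi code: the `host` and `vm` records, `suitable_host`, `access_srs_host`, `pick_sender` and the `No_hosts_available` exception are all hypothetical stand-ins for `forward_to_suitable_host`, `forward_to_access_srs` and the `NO_HOSTS_AVAILABLE` API error, and "can boot" is reduced to a single free-memory comparison. The guard on the handler mirrors the extra non-live/cross-pool check mentioned in the review thread.

```ocaml
(* Hypothetical toy model of sender-host selection; none of these names
   are xapi's real API. *)
exception No_hosts_available

type host = {name: string; free_memory: int; sees_vm_srs: bool}

type vm = {vm_name: string; memory: int; live: bool}

(* Stand-in for forward_to_suitable_host: a host that could boot the VM,
   modelled here as "sees the SRs and has enough free memory". *)
let suitable_host hosts vm =
  match
    List.find_opt (fun h -> h.sees_vm_srs && h.free_memory >= vm.memory) hosts
  with
  | Some h -> h
  | None -> raise No_hosts_available

(* Stand-in for forward_to_access_srs: any host that can see the VM's SRs. *)
let access_srs_host hosts =
  match List.find_opt (fun h -> h.sees_vm_srs) hosts with
  | Some h -> h
  | None -> raise No_hosts_available

(* Prefer a host that can boot the VM. Only when none exists, and the
   migration is both non-live and cross-pool, settle for an SR-visible host;
   otherwise the exception propagates unchanged. *)
let pick_sender ~cross_pool hosts vm =
  try suitable_host hosts vm
  with No_hosts_available when (not vm.live) && cross_pool ->
    access_srs_host hosts

let () =
  let small = {name= "small"; free_memory= 8; sees_vm_srs= true} in
  let big_vm = {vm_name= "big"; memory= 64; live= false} in
  let sender = pick_sender ~cross_pool:true [small] big_vm in
  Printf.printf "%s migrates from %s\n" big_vm.vm_name sender.name
```

With a single small host, the 64-unit non-live VM can't boot anywhere, so the cross-pool call falls back to `small`, while the same call with `~cross_pool:false` still raises `No_hosts_available`. Using a `when` guard on the handler keeps the original blocking behaviour for every case the PR doesn't intend to relax.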
Would this work for the intra-pool migration as well?
As I understand it, the `` `No_live `` case covers both intra-pool and inter-pool migration.
I don't think you can migrate a non-live VM within the pool: I get `VM_BAD_POWER_STATE`. I've added an additional check just to be safe, though.
I can't see the difference between inter-pool and intra-pool migration for a non-live VM: an intra-pool non-live VM migration fails with `VM_BAD_POWER_STATE`, but an inter-pool non-live VM migration succeeds. 🤔
Moreover, I'm not sure why we always try `forward_to_suitable_host` first if the non-live VM can't be started during the migration at all.
This is because the design of cross-pool migration states that a VM can be started while the migration is in progress, so the origin pool must move the VM to a host that can start it before migrating. This PR relaxes that requirement by not blocking the migration if the VM cannot be started at all.
forward_to_suitable_host picks a host that can boot up the VM, erroring out with NO_HOSTS_AVAILABLE otherwise. In the case of a non-live VM that can't boot anywhere in the source pool (because it requires a larger amount of memory, for example), this would block its migration to a suitable pool altogether.

Catch the error and launch the migration from any host that can see the VM's SRs instead of blocking it.

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
Force-pushed from 80f2c8b to 1971042.
```ocaml
      ~vgpu_map
  in
  let host = List.assoc Xapi_vm_migrate._host dest |> Ref.of_string in
  let cross_pool = not (Db.is_valid_ref __context host) in
```
Why is this the right predicate for deciding between cross-pool and intra-pool migration? It's surprising.
xapi_vm_migrate does a similar check:
```ocaml
let migration_type ~__context ~remote =
  try
    ignore (Db.Host.get_uuid ~__context ~self:remote.dest_host) ;
    debug "This is an intra-pool migration" ;
    `intra_pool
  with _ ->
    debug "This is a cross-pool migration" ;
    `cross_pool
```
Both checks rely on the fact that the remote host wouldn't be in this pool's database.
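Both predicates boil down to "is this host's ref known to the local database?". The OCaml sketch below models that with a hash table standing in for the pool database; `local_db`, `cross_pool_by_ref`, this `migration_type` and the ref strings are hypothetical, and unlike the real `migration_type` it catches only `Not_found` rather than every exception. It shows that the ref-validity test and the lookup-raises test classify the same hosts.

```ocaml
(* Hypothetical model: the local pool database as a table from host refs
   to UUIDs. A host from another pool simply has no entry here. *)
let local_db : (string, string) Hashtbl.t = Hashtbl.create 8

(* Analogue of [not (Db.is_valid_ref __context host)]. *)
let cross_pool_by_ref host = not (Hashtbl.mem local_db host)

(* Analogue of [migration_type]: if looking the host up fails,
   it must belong to another pool. *)
let migration_type host =
  try
    ignore (Hashtbl.find local_db host) ;
    `intra_pool
  with Not_found -> `cross_pool

let () =
  Hashtbl.replace local_db "OpaqueRef:local" "uuid-local" ;
  List.iter
    (fun host ->
      let by_ref = cross_pool_by_ref host in
      let by_lookup = migration_type host = `cross_pool in
      Printf.printf "%s: cross_pool=%b (lookup agrees: %b)\n" host by_ref
        (by_ref = by_lookup))
    ["OpaqueRef:local"; "OpaqueRef:remote"]
```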