Skip to content

message_forwarding: Don't block cross-pool migrations for unbootable VMs #7062

Merged
last-genius merged 1 commit into xapi-project:master from last-genius:asv/crosspoolfix
May 14, 2026

@last-genius

Contributor

forward_to_suitable_host picks a host that can boot the VM and fails with NO_HOSTS_AVAILABLE if there is none. For a non-live VM that can't boot anywhere in the source pool (for example, because it needs more memory than any source host can provide), this blocked its migration to a suitable pool altogether.

Catch the error and launch the migration from any host that can see the VM's SRs instead of blocking it.
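The fallback can be pictured as "try the strict choice, catch the failure, use the relaxed choice". The following is a minimal, self-contained OCaml sketch of that shape, not the xapi code: the `host` record, its `can_boot_vm`/`sees_vm_srs` fields, the local `No_hosts_available` exception and these list-based versions of `forward_to_suitable_host`/`forward_to_access_srs` are hypothetical stand-ins. The real functions take `local_fn`/`remote_fn` and forward the API call instead of returning a host.

```ocaml
(* Hypothetical stand-in for xapi's NO_HOSTS_AVAILABLE API error *)
exception No_hosts_available

(* Hypothetical host model: only the two properties this PR cares about *)
type host = {name: string; can_boot_vm: bool; sees_vm_srs: bool}

(* Strict choice: a host that could boot the VM, or fail *)
let forward_to_suitable_host hosts =
  match List.find_opt (fun h -> h.can_boot_vm) hosts with
  | Some h -> h
  | None -> raise No_hosts_available

(* Relaxed choice: any host that sees the VM's SRs can act as the sender *)
let forward_to_access_srs hosts =
  match List.find_opt (fun h -> h.sees_vm_srs) hosts with
  | Some h -> h
  | None -> raise No_hosts_available

(* Prefer a bootable host; if none exists, fall back instead of failing *)
let pick_sender hosts =
  try forward_to_suitable_host hosts
  with No_hosts_available -> forward_to_access_srs hosts

let () =
  (* A pool where the VM is too large to boot anywhere *)
  let pool =
    [ {name= "host1"; can_boot_vm= false; sees_vm_srs= true}
    ; {name= "host2"; can_boot_vm= false; sees_vm_srs= true} ]
  in
  Printf.printf "sender: %s\n" (pick_sender pool).name
```

Here no host can boot the VM, so the program prints `sender: host1`, the first host that sees the SRs. The relaxed path still raises when no host can see the SRs at all.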

@last-genius

last-genius commented May 12, 2026

Contributor Author

For context, we've seen a VM with high memory requirements that couldn't start in the source pool fail a cross-pool migration to a pool that could satisfy those requirements. The workaround was to copy the VM instead of migrating it.

The cause is that xapi runs a cross-pool migration from a host that could boot the VM, so that if the user starts the VM in the middle of the migration, the sender host stays the same. If xapi can't find such a host, however, it rejects the cross-pool migration with NO_HOSTS_AVAILABLE, meaning that no host in the source pool could boot the VM. This is counter-intuitive from the user's perspective.

This PR allows cross-pool migration in such cases, settling for any host that can act as the sender (one that sees the VM's SRs). Since the VM can't boot anywhere in the source pool, the user can't interrupt the migration by starting it.

->
(* If non-live VM can't start anywhere in the pool, migrate it
anyway from any host that can see its SRs *)
forward_to_access_srs ~local_fn ~__context ~vm ~remote_fn

Member

Would this work for intra-pool migration as well?
As I understand it, the `No_live` case covers both intra-pool and inter-pool migration.

Contributor Author

I don't think you can migrate a non-live VM within the pool: I get VM_BAD_POWER_STATE. I've added an additional check just to be safe, though.

Member

I don't understand the difference between inter-pool and intra-pool migration for a non-live VM: an intra-pool non-live migration fails with VM_BAD_POWER_STATE, but an inter-pool non-live migration succeeds. 🤔

Moreover, I'm not sure why we always try forward_to_suitable_host first if the non-live VM won't be started during the migration at all.

Member

This is because the design for cross-pool migration states that a VM can be started while the migration is in progress, so the origin pool must move the VM to a host that can start it before migrating. This PR relaxes that requirement so that the migration is not blocked when the VM cannot be started anywhere.
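The sender-selection policy that comes out of this thread can be summarised as a small decision function. This is a hypothetical sketch, not xapi code: `sender_choice` and `choose_sender` are illustrative names, and it assumes, following the extra check the author mentions, that the fallback applies only to cross-pool migrations, so an intra-pool request without a bootable host keeps the original error.

```ocaml
(* Hypothetical labels for the outcomes discussed in the review *)
type sender_choice =
  | Bootable_host  (* preferred: the VM could be started there mid-migration *)
  | Any_sr_host  (* fallback: the sender only needs to see the VM's SRs *)
  | Reject  (* the caller gets NO_HOSTS_AVAILABLE, as before this PR *)

(* Assumption: the relaxed path is gated on the migration being cross-pool *)
let choose_sender ~cross_pool ~bootable_host_exists =
  if bootable_host_exists then Bootable_host
  else if cross_pool then Any_sr_host
  else Reject

let to_string = function
  | Bootable_host -> "bootable host"
  | Any_sr_host -> "any host seeing the VM's SRs"
  | Reject -> "NO_HOSTS_AVAILABLE"

let () =
  List.iter
    (fun (cross_pool, bootable_host_exists) ->
      Printf.printf "cross_pool=%b bootable=%b -> %s\n" cross_pool
        bootable_host_exists
        (to_string (choose_sender ~cross_pool ~bootable_host_exists)))
    [(true, true); (true, false); (false, false)]
```

Checking for a bootable host first keeps the original design guarantee whenever it can be met; the relaxed branch is only reached when that guarantee is impossible anyway.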

Signed-off-by: Andrii Sultanov <andriy.sultanov@vates.tech>
~vgpu_map
in
let host = List.assoc Xapi_vm_migrate._host dest |> Ref.of_string in
let cross_pool = not (Db.is_valid_ref __context host) in

Contributor

Why is this the right predicate for deciding between cross-pool and intra-pool migration? It's surprising.

Contributor Author

xapi_vm_migrate does a similar check:

let migration_type ~__context ~remote =
  try
    ignore (Db.Host.get_uuid ~__context ~self:remote.dest_host) ;
    debug "This is an intra-pool migration" ;
    `intra_pool
  with _ ->
    debug "This is a cross-pool migration" ;
    `cross_pool

Both checks rely on the fact that a remote host isn't in this pool's database.
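The membership idea can be tried outside xapi with a hashtable standing in for the pool database. This is a hypothetical sketch: `local_hosts` and `get_uuid` replace `Db.Host.get_uuid`, and unlike the original `with _ ->` it catches only `Not_found`, so an unrelated failure would not be misreported as a cross-pool migration.

```ocaml
(* Hypothetical stand-in for this pool's database: host ref -> host uuid *)
let local_hosts : (string, string) Hashtbl.t = Hashtbl.create 16

(* Like a database lookup, raises Not_found for refs this pool doesn't know *)
let get_uuid ~self = Hashtbl.find local_hosts self

(* A destination host unknown to the local database must belong to
   another pool, so the migration is classified as cross-pool *)
let migration_type ~dest_host =
  try
    ignore (get_uuid ~self:dest_host) ;
    `intra_pool
  with Not_found -> `cross_pool

let () =
  Hashtbl.replace local_hosts "OpaqueRef:host-a" "uuid-a" ;
  let describe = function
    | `intra_pool -> "intra-pool"
    | `cross_pool -> "cross-pool"
  in
  print_endline (describe (migration_type ~dest_host:"OpaqueRef:host-a")) ;
  print_endline (describe (migration_type ~dest_host:"OpaqueRef:remote"))
```

The first lookup finds the host and prints `intra-pool`; the second ref is absent and prints `cross-pool`.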

@last-genius last-genius added this pull request to the merge queue May 14, 2026
Merged via the queue into xapi-project:master with commit 88ce554 May 14, 2026
16 checks passed
psafont pushed a commit to psafont/xen-api that referenced this pull request May 19, 2026
4 participants