
chore(vm): add system migration policy #2152

Closed
LopatinDmitr wants to merge 90 commits into release-1.6-dev from chore/vm/add-system-migration-policy

Contributor

@LopatinDmitr LopatinDmitr commented Mar 25, 2026

Description

Add a system-level live migration policy override sourced from ModuleConfig/virtualization annotation virtualization.deckhouse.io/system-migration-policy.

The controller now reads this annotation at startup and, when valid, applies it globally in live migration policy calculation.

What is the expected result?

  1. Set annotation on ModuleConfig/virtualization:
    virtualization.deckhouse.io/system-migration-policy: <valid policy>.
  2. Restart/rollout virtualization-controller.
  3. Run VMOP migration/eviction.
  4. Confirm effective migration configuration follows the system policy override (VM spec and VMOP force do not override it).
  5. If annotation is missing or invalid, behavior remains unchanged.
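The override rule in the steps above can be sketched in Go. This is only an illustration, not the controller's actual code: the function `resolvePolicy`, the `validPolicies` allow-list, and the placeholder policy names `PolicyA`/`PolicyB` are assumptions made for the example; only the annotation key comes from this description. The `fallback` argument stands for whatever policy the existing calculation (VM spec, VMOP force) would have produced.

```go
package main

import "fmt"

// Annotation key as given in the PR description.
const systemPolicyAnnotation = "virtualization.deckhouse.io/system-migration-policy"

// resolvePolicy returns the annotation value when it is present and listed in
// validPolicies (a hypothetical allow-list). Otherwise it returns fallback
// unchanged, which models "behavior remains unchanged".
func resolvePolicy(annotations map[string]string, validPolicies map[string]bool, fallback string) string {
	if v, ok := annotations[systemPolicyAnnotation]; ok && validPolicies[v] {
		return v
	}
	return fallback
}

func main() {
	// Placeholder policy names, not the module's real values.
	valid := map[string]bool{"PolicyA": true, "PolicyB": true}

	withOverride := map[string]string{systemPolicyAnnotation: "PolicyA"}
	invalid := map[string]string{systemPolicyAnnotation: "bogus"}

	fmt.Println(resolvePolicy(withOverride, valid, "PolicyB")) // override wins
	fmt.Println(resolvePolicy(invalid, valid, "PolicyB"))      // invalid value ignored
	fmt.Println(resolvePolicy(nil, valid, "PolicyB"))          // annotation missing
}
```

Because the description says the annotation is read at startup, a real implementation would presumably call something like this once and cache the result, which is why step 2 requires a controller restart.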

Checklist

  • The code is covered by unit tests.
  • e2e tests passed.
  • Documentation updated according to the changes.
  • Changes were tested in the Kubernetes cluster manually.

Changelog entries

section: core
type: chore
summary: "Add a system live migration policy override via ModuleConfig annotation for VMOP/live migration policy calculation."
impact_level: low

Isteb4k and others added 30 commits March 2, 2026 14:12
docs: add release notes for v1.6.0

---------

Signed-off-by: Isteb4k <dmitry.rakitin@flant.com>
Signed-off-by: Vladislav Panfilov <vladislav.panfilov@flant.com>
Co-authored-by: Vladislav Panfilov <vladislav.panfilov@flant.com>
Update due 1.6.0.

---------

Signed-off-by: Vladislav Panfilov <vladislav.panfilov@flant.com>
Signed-off-by: Pavel Tishkov <pavel.tishkov@flant.com>
Co-authored-by: Pavel Tishkov <pavel.tishkov@flant.com>
Improved test/dvp-static-cluster/scripts/gen-kubeconfig.sh kubeconfig generation flow and error handling.

Refactored retry logic to avoid redundant checks of the same kubeconfig and made retries explicit at generation level.
Added robust failure handling and clearer exit behavior:
strict bash mode (set -Eeuo pipefail)
centralized error-exit helper (exit_with_error)
signal/error traps with meaningful exit codes and diagnostics


---------

Signed-off-by: Nikita Korolev <nikita.korolev@flant.com>
Re-generate changelog v1.6.0

Signed-off-by: deckhouse-BOaTswain <89150800+deckhouse-boatswain@users.noreply.github.com>
Co-authored-by: Isteb4k <Isteb4k@users.noreply.github.com>
Description
Reduced the SSH command timeout to 5 seconds. UntilSSHReady now waits 60 seconds in the PowerState test.

Why do we need it, and what problem does it solve?
Previously, only one connection attempt was ever made: the SSH timeout was 30 seconds and the Eventually timeout was also 30 seconds, so SSH effectively knocked only once.

---------

Signed-off-by: Nikita Korolev <nikita.korolev@flant.com>
Description
Describe nodes when a test fails.
Also increase the timeout for UntilVMAgentReady in VirtualMachineConfiguration.
The error output of the UntilVMAgentReady function is now clearer.

---------

Signed-off-by: Nikita Korolev <nikita.korolev@flant.com>
Co-authored-by: Roman Sysoev <36233932+hardcoretime@users.noreply.github.com>
Signed-off-by: Nikita Korolev <nikita.korolev@flant.com>
…2053)

Description
Change the image from Alpine to Ubuntu due to problems with the lsblk utility output

----------------

Signed-off-by: Dmitry Lopatin <dmitry.lopatin@flant.com>
Signed-off-by: Nikita Korolev <nikita.korolev@flant.com>
#2046)

Signed-off-by: Valeriy Khorunzhin <valeriy.khorunzhin@flant.com>
- Ensure upload layer errors are not missed.
- Wrap "DVCR is out of space" error.

Signed-off-by: Roman Sysoev <roman.sysoev@flant.com>
- Use kube-api-rewriter machinery from the external repo
- Only KubeVirt and CDI rules are needed here.

Signed-off-by: Ivan Mikheykin <ivan.mikheykin@flant.com>
* chore(module): add SecurityPolicyException resources

- Add exceptions for all Pods that require more permissions than provided by the PSS Restricted:
  - ds/virt-handler
  - ds/virtualization-dra
  - ds/vm-route-forge
- Add a dev note about SecurityPolicyExceptions.

Signed-off-by: Ivan Mikheykin <ivan.mikheykin@flant.com>

---------

Signed-off-by: Ivan Mikheykin <ivan.mikheykin@flant.com>
Description
Add an e2e test for USBDevices and NodeUSBDevice. The test attaches a USBDevice to a VM, writes data to it, migrates the virtual machine, and checks the written data.

---------

Signed-off-by: Daniil Antoshin <daniil.antoshin@flant.com>
Co-authored-by: Roman Sysoev <36233932+hardcoretime@users.noreply.github.com>
…34 version of k8s (#2059)

Add k8s cluster version configuration for e2e tests on nested clusters; the Ceph cluster uses k8s version 1.34.

---------

Signed-off-by: Nikita Korolev <nikita.korolev@flant.com>
Signed-off-by: Ivan Mikheykin <ivan.mikheykin@flant.com>
#2048)

Signed-off-by: Valeriy Khorunzhin <valeriy.khorunzhin@flant.com>
Description
This PR updates E2E test network and image configuration to make VM connectivity and migration scenarios more deterministic.

Key changes:
- Configure test ClusterNetwork to existing VLAN 4006 (cn-4006-for-e2e-test) instead of VLAN 1003 (cn-1003-for-e2e-test).
- Normalize image usage in E2E tests (switch most cases from perf image to stable Alpine UEFI image where needed).
- Fix object builders for Ubuntu resources (VI/CVI/VD) to use Ubuntu image URL instead of Alpine BIOS URL.
- Add dedicated constructors for Alpine BIOS/UEFI images in object helpers.
Update additional network interfaces test:
- move additional IPs to per-test-case params,
- pass explicit IPs into connectivity checks,
- adjust Alpine cloud-init service startup commands.


---------

Signed-off-by: Nikita Korolev <nikita.korolev@flant.com>
fix dvcr gc rbac rules

Signed-off-by: Yaroslav Borbat <yaroslav.borbat@flant.com>
Signed-off-by: Valeriy Khorunzhin <valeriy.khorunzhin@flant.com>
* docs: add metadata.name length limit

Signed-off-by: Vladislav Panfilov <vladislav.panfilov@flant.com>

* docs: update metadata.name length limits

Signed-off-by: Vladislav Panfilov <vladislav.panfilov@flant.com>

---------

Signed-off-by: Vladislav Panfilov <vladislav.panfilov@flant.com>
Signed-off-by: Maksim Fedotov <maksim.fedotov@flant.com>
Description
The virtualization-setup-dummy-hcd node group configuration used in end-to-end clusters now only supports Debian OS.
A bug in the node group configuration has been fixed; the missing linux-modules-extra-$KERNEL_VERSION package has been added.
Removed exit 1 from the script in the node group configuration so that cluster bootstrap continues.

---------

Signed-off-by: Nikita Korolev <nikita.korolev@flant.com>
Add release notes v1.5.2.

---------

Signed-off-by: Isteb4k <dmitry.rakitin@flant.com>
Signed-off-by: Vladislav Panfilov <97229646+prismagod@users.noreply.github.com>
Co-authored-by: Vladislav Panfilov <97229646+prismagod@users.noreply.github.com>
Re-generate changelog v1.5.2

Signed-off-by: deckhouse-BOaTswain <89150800+deckhouse-boatswain@users.noreply.github.com>
Co-authored-by: nevermarine <nevermarine@users.noreply.github.com>
Fix a lowercase RFC 1123 error for vd ImporterNetworkPolicy

---------

Signed-off-by: Isteb4k <dmitry.rakitin@flant.com>
Signed-off-by: Dmitry Lopatin <dmitry.lopatin@flant.com>
…1950)

Add webhook validation that rejects migration operations for VMs with local storage in CE edition

---------

Signed-off-by: Daniil Loktev <lokt.daniil@gmail.com>
Fix VM status (Running condition) to properly display CSI driver and volume attachment errors
Fix misleading errors related to SDN ("waiting for SDN module") (NetworkReady condition)

When a VM fails to start due to CSI driver issues (e.g., missing CSI driver, volume attachment failures), the Console UI shows an incorrect error message: "Cannot determine the status of additional interfaces, waiting for a response from the SDN module", which comes from the NetworkReady condition. The actual CSI error was hidden in pod events and not surfaced in the VM status.
---------

Signed-off-by: Daniil Loktev <lokt.daniil@gmail.com>
Signed-off-by: Daniil Loktev <70405899+loktev-d@users.noreply.github.com>
…dation issue (#2063)

Temporarily revert VM/VMS dashboard location due to validation issue

Signed-off-by: Pavel Tishkov <pavel.tishkov@flant.com>
yaroslavborbat and others added 14 commits March 23, 2026 18:04
chore(module): update module requirements

Signed-off-by: Yaroslav Borbat <yaroslav.borbat@flant.com>
Description
Fixes semver parsing for module version requirements with two-component versions.

Added NormalizeSemVerRange() function in tools/moduleversions/internal/version/normalize.go that automatically converts two-component versions to three-component format:

~1.74 → ~1.74.0
^1.74 → ^1.74.0
>=1.74 <2.0 → >=1.74.0 <2.0.0
The normalization is applied before passing the range string to semver.ParseRange() in the requirements checker.
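A minimal sketch of this kind of normalization is shown below. It is not the code from tools/moduleversions/internal/version/normalize.go; the lowercase name `normalizeSemVerRange`, the token-based approach, and the regular expression are assumptions chosen to reproduce the three examples above.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// twoComponent matches an optional comparison operator followed by exactly
// MAJOR.MINOR, for example "~1.74" or "<2.0". Three-component versions and
// bare operators do not match and are left untouched.
var twoComponent = regexp.MustCompile(`^([~^<>=!]*)(\d+)\.(\d+)$`)

// normalizeSemVerRange appends ".0" to every two-component version in a
// whitespace-separated range string.
func normalizeSemVerRange(r string) string {
	tokens := strings.Fields(r)
	for i, tok := range tokens {
		if m := twoComponent.FindStringSubmatch(tok); m != nil {
			tokens[i] = m[1] + m[2] + "." + m[3] + ".0"
		}
	}
	return strings.Join(tokens, " ")
}

func main() {
	for _, in := range []string{"~1.74", "^1.74", ">=1.74 <2.0", ">= 1.74.2"} {
		fmt.Printf("%q -> %q\n", in, normalizeSemVerRange(in))
	}
}
```

Splitting on whitespace keeps the sketch simple, but it also collapses repeated spaces; a real implementation may prefer to rewrite versions in place.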

Also fix requirements for deckhouse back to ">= 1.74.2"
---------

Signed-off-by: Nikita Korolev <nikita.korolev@flant.com>
…iver. (#2087)

fix(dra): enable NRI (Node Resource Interface) hook in the DRA USB driver

Signed-off-by: Yaroslav Borbat <yaroslav.borbat@flant.com>
Add release notes for v1.6.2

---------

Signed-off-by: Isteb4k <dmitry.rakitin@flant.com>
Signed-off-by: Vladislav Panfilov <97229646+prismagod@users.noreply.github.com>
Co-authored-by: Vladislav Panfilov <97229646+prismagod@users.noreply.github.com>
…oval (#2124)

Signed-off-by: Dmitry Lopatin <dmitry.lopatin@flant.com>
Description
Removal of e2e tests with Ceph storage from the nightly pipeline. Removed:

e2e-ceph job from e2e-matrix.yml
e2e-reusable-pipeline.yml file (was used for Ceph)
All Ceph manifests and scripts in test/dvp-static-cluster/storage/ceph/

---------------

Signed-off-by: Nikita Korolev <nikita.korolev@flant.com>
test(e2e): skip vd snapshot wait when CSI snapshots lag behind

Signed-off-by: Dmitry Lopatin <dmitry.lopatin@flant.com>
…pvc (#2115)

fix(api): improve storage class validation messages for VD and VI on PVC

- Enhance error messages during storage class validation for VD and VI on PVC.
- Add unit tests for VirtualDisk storage class validation, separating CE and EE test sets.

Signed-off-by: Dmitry Lopatin <dmitry.lopatin@flant.com>
Signed-off-by: Dmitry Lopatin <dmitry.lopatin@flant.com>
…2116)

- Prevent changing VirtualDisk storage class to an arbitrary value while migration is in progress by allowing only rollback to the source PVC storage class. Add validator tests for forbidden A->B->C changes and allowed rollback to A.
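The rule can be sketched as a small check. This is an illustration under assumptions, not the validator from the commit: the function name, its parameters, and the error text are invented here, and `source`, `current`, and `requested` stand for the source PVC's storage class, the class currently being migrated to, and the newly submitted value.

```go
package main

import (
	"errors"
	"fmt"
)

// errChangeForbidden is a hypothetical error used only in this sketch.
var errChangeForbidden = errors.New("storage class can only be rolled back to the source while migration is in progress")

// validateStorageClassChange allows any change when no migration is running.
// During a migration it accepts only a no-op (requested == current) or a
// rollback to the source storage class.
func validateStorageClassChange(migrating bool, source, current, requested string) error {
	if !migrating || requested == current || requested == source {
		return nil
	}
	return errChangeForbidden
}

func main() {
	// A->B is in progress; switching to C is rejected.
	fmt.Println(validateStorageClassChange(true, "A", "B", "C"))
	// A->B is in progress; rolling back to A is accepted.
	fmt.Println(validateStorageClassChange(true, "A", "B", "A"))
}
```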

Signed-off-by: Dmitry Lopatin <dmitry.lopatin@flant.com>
… in Go files. (#2140)

Description
stylecheck is deprecated in golangci-lint v2. The correct linter name for nolint directives is staticcheck. Using deprecated linter names causes warnings or errors.

Change //nolint:stylecheck,nolintlint  →  //nolint:staticcheck,nolintlint

-----------------

Signed-off-by: Nikita Korolev <nikita.korolev@flant.com>
)

refactor(hooks): discover kube feature gates via metrics endpoint
---------
Signed-off-by: Yaroslav Borbat <yaroslav.borbat@flant.com>
Signed-off-by: Yaroslav Borbat <yaroslav.borbat@flant.com>
…#2144)

* fix(vmop): prevent Maintenance mode from getting stuck during restore

Return reconcile.Result instead of nil to properly complete the reconciliation
loop when snapshot steps exit early (exit maintenance step, waiting disk ready step).

Signed-off-by: Daniil Antoshin <daniil.antoshin@flant.com>

* fix(vmop): set maintenance condition to false instead of early return

Signed-off-by: Daniil Antoshin <daniil.antoshin@flant.com>

---------

Signed-off-by: Daniil Antoshin <daniil.antoshin@flant.com>
@LopatinDmitr LopatinDmitr added this to the v1.8.0 milestone Mar 25, 2026
@LopatinDmitr LopatinDmitr force-pushed the chore/vm/add-system-migration-policy branch from d899db1 to 641625d Compare March 25, 2026 12:39
Signed-off-by: Dmitry Lopatin <dmitry.lopatin@flant.com>
@LopatinDmitr LopatinDmitr force-pushed the chore/vm/add-system-migration-policy branch from 641625d to b2e9698 Compare March 25, 2026 13:44
@LopatinDmitr LopatinDmitr changed the base branch from main to release-1.6-dev March 25, 2026 14:27
@LopatinDmitr LopatinDmitr deleted the chore/vm/add-system-migration-policy branch March 25, 2026 15:20