cozystack
diff --git a/CHANGELOG.rst b/CHANGELOG.rst
Lines changed: 28 additions & 0 deletions
diff --git a/CLAUDE.md b/CLAUDE.md
Lines changed: 1 addition & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 32 additions & 47 deletions
diff --git a/examples/rhel/prepare-rhel.yml b/examples/rhel/prepare-rhel.yml
Lines changed: 75 additions & 0 deletions
@@ -19,6 +19,34 @@ Unreleased
   ``isp-full-generic`` platform variant when nodes lack a native load
   balancer (cloud VMs, bare metal).
 
+Unreleased
+==========
+
+Bugfixes
+--------
+
+- Prepare playbooks now enable
+  ``device_ownership_from_security_context`` on the containerd CRI
+  plugin (k3s drop-in
+  ``config-v3.toml.d/10-cozystack-cri.toml``). KubeVirt's CDI importer
+  writes disk images into raw block volumes as a non-root pod, which
+  requires containerd to chown the block device to the pod's
+  SecurityContext; k3s disables this by default. Without it the
+  importer failed with ``blockdev: cannot open /dev/cdi-block-volume:
+  Permission denied``, the ``DataVolume`` hung in ``ImportInProgress``,
+  and VMs referencing the disk stayed ``Pending``. Gated behind
+  ``cozystack_enable_kubevirt``; drop-in directory overridable via
+  ``cozystack_k3s_containerd_dropin_dir`` (relocates the file only — the
+  content is hardcoded for containerd 2.x / config version 3 as shipped
+  by current k3s; a containerd 1.x cluster needs a hand-written
+  ``config.toml.d`` drop-in instead).
+  Setting ``cozystack_enable_kubevirt`` to ``false`` removes a
+  previously written drop-in so the host state matches the toggle, and
+  the restart handler only restarts a k3s unit that is actually present
+  (a genuine restart failure now fails the play instead of being
+  silently ignored).
+
+
 v1.4.0
 ======
 
 
@@ -144,3 +144,4 @@ The host only needs the kernel modules and, for KVM, a working `/dev/kvm`.
 - **`br_netfilter` missing**: `net.bridge.bridge-nf-call-*` sysctls fail
   with "No such file or directory". Load the module before applying the
   sysctl.
+- **containerd `device_ownership_from_security_context` disabled**: k3s ships it off; without the `config-v3.toml.d/10-cozystack-cri.toml` drop-in, KubeVirt's non-root CDI importer cannot open a raw block volume (`blockdev: cannot open /dev/cdi-block-volume: Permission denied`), the DataVolume hangs in `ImportInProgress`, and VMs that reference the disk stay Pending. Apply when KubeVirt is enabled (gated on `cozystack_enable_kubevirt`).
@@ -12,9 +12,7 @@ Supported targets:
 
 Cloud-image users **must** set `cozystack_flush_iptables: true` for multi-master k3s to bootstrap — Ubuntu cloud images ship with `REJECT icmp-host-prohibited` in INPUT that blocks etcd peer port 2380 between nodes. See **Node Prerequisites → Known limitations** below.
 
-Deploys the Cozystack operator and Platform Package using the
-`kubernetes.core.helm` module with automatic Helm and helm-diff
-installation.
+Deploys the Cozystack operator and Platform Package using the `kubernetes.core.helm` module with automatic Helm and helm-diff installation.
 
 ## Prerequisites
 
@@ -30,9 +28,7 @@ ansible-galaxy collection install --requirements-file requirements.yml
 
 - SSH access to the target nodes
 
-The role automatically installs Helm and the
-[helm-diff](https://github.com/databus23/helm-diff) plugin
-on the control-plane node. No manual Helm installation is needed.
+The role automatically installs Helm and the [helm-diff](https://github.com/databus23/helm-diff) plugin on the control-plane node. No manual Helm installation is needed.
 
 ### Node Prerequisites
 
@@ -168,11 +164,25 @@ tun
 kvm_intel  # or kvm_amd depending on the CPU
 ```
 
+#### Enabled by default: containerd device ownership for CDI block imports
+
+When KubeVirt is enabled, the prepare playbook drops a containerd CRI config that sets `device_ownership_from_security_context = true`. KubeVirt's CDI (Containerized Data Importer) writes VM disk images into raw **block** volumes from a non-root importer pod; containerd only chowns the block device to the pod's `SecurityContext` UID/GID when this option is on, and k3s ships it disabled. Without it the importer fails with `blockdev: cannot open /dev/cdi-block-volume: Permission denied`, the `DataVolume` is stuck in `ImportInProgress`, and every VM that references the disk stays `Pending` — one of the silent "VMs stuck in Pending" failure modes called out above.
+
+Written as a drop-in that containerd merges on top of k3s's generated `config.toml`:
+
+```text
+/var/lib/rancher/k3s/agent/etc/containerd/config-v3.toml.d/10-cozystack-cri.toml
+```
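+
+For reference, the drop-in that the example playbooks write is short. It declares config version 3 and sets the single CRI runtime option:
+
+```toml
+version = 3
+
+[plugins.'io.containerd.cri.v1.runtime']
+  device_ownership_from_security_context = true
+```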
+
+`config-v3.toml.d` and the `io.containerd.cri.v1.runtime` plugin table are the containerd 2.x (config version 3) paths shipped by current k3s (the example inventories pin `k3s_version: v1.36.1+k3s1`), and the drop-in content is hardcoded for that — `version = 3` and the v3 table. `cozystack_k3s_containerd_dropin_dir` only relocates the file; it does not rewrite the content. So on a containerd 1.x cluster (older k3s) this drop-in does not apply as-is — write your own under `config.toml.d/` with `version = 2` and the `io.containerd.grpc.v1.cri` table. The drop-in is read at first k3s start in the full pipeline; on a re-run against a running cluster a handler restarts k3s so the change takes effect.
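+
+For the containerd 1.x case, a hand-written drop-in might look like the sketch below. This is an assumption-laden sketch, not something the example playbooks write. It assumes your older k3s merges `*.toml` files from `/var/lib/rancher/k3s/agent/etc/containerd/config.toml.d/`, and the file name `10-cozystack-cri.toml` is reused here only for illustration:
+
+```toml
+# Assumed path: /var/lib/rancher/k3s/agent/etc/containerd/config.toml.d/10-cozystack-cri.toml
+# containerd 1.x schema: config version 2 and the io.containerd.grpc.v1.cri plugin table
+version = 2
+
+[plugins."io.containerd.grpc.v1.cri"]
+  device_ownership_from_security_context = true
+```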
+
+k3s also exposes a native `--nonroot-devices` flag (valid on both server and agent) that sets the same containerd option. This collection uses the config drop-in instead because it applies uniformly to every node in the `cluster` group — including agent/worker nodes, for which the example playbooks do not wire `extra_agent_args` — and because it can be applied to an already-running cluster, which an install-time k3s flag cannot.
+
+The restart handler fires only when the drop-in is first created, its content changes, or it is removed because `cozystack_enable_kubevirt` was set to `false`; idempotent re-runs leave k3s untouched. When it does fire, `systemctl restart k3s` (or `k3s-agent`) briefly disrupts the control plane and the workloads on that host, so schedule such a change for a maintenance window.
+
 #### Known limitations
 
-ZFS support depends on the OS ecosystem and kernel flavor. The prepare
-playbooks skip ZFS automation gracefully in these cases and emit an
-informational notice:
+ZFS support depends on the OS ecosystem and kernel flavor. The prepare playbooks skip ZFS automation gracefully in these cases and emit an informational notice:
 
 | OS / kernel | ZFS automation | Reason |
 | --- | --- | --- |
@@ -213,9 +223,7 @@ Enable and start:
 
 #### iptables (cloud providers)
 
-Cloud providers (OCI, AWS, GCP) may ship images with restrictive iptables
-INPUT rules that block inter-node Kubernetes traffic (API 6443, kubelet 10250,
-etcd 2379-2380) even when security groups allow it.
+Cloud providers (OCI, AWS, GCP) may ship images with restrictive iptables INPUT rules that block inter-node Kubernetes traffic (API 6443, kubelet 10250, etcd 2379-2380) even when security groups allow it.
 
 Fix: flush the INPUT chain and set policy to ACCEPT before deploying k3s.
 
@@ -249,11 +257,7 @@ cluster-cidr: 10.42.0.0/16
 service-cidr: 10.43.0.0/16
 ```
 
-These CIDRs are the k3s defaults. The example prepare playbooks
-(e.g., `examples/ubuntu/prepare-ubuntu.yml`) set them via the
-`server_config_yaml` variable used by `k3s.orchestration`. The role
-variables `cozystack_pod_cidr` and `cozystack_svc_cidr` must match —
-they default to the same values.
+These CIDRs are the k3s defaults. The example prepare playbooks (e.g., `examples/ubuntu/prepare-ubuntu.yml`) set them via the `server_config_yaml` variable used by `k3s.orchestration`. The role variables `cozystack_pod_cidr` and `cozystack_svc_cidr` must match — they default to the same values.
 
 ## Installation
 
@@ -273,8 +277,7 @@ collections:
 
 ## Quick start
 
-1. Create your environment (pick your distro — see `examples/ubuntu/`,
-   `examples/rhel/`, or `examples/suse/`):
+1. Create your environment (pick your distro — see `examples/ubuntu/`, `examples/rhel/`, or `examples/suse/`):
 
 ```text
 my-env/
@@ -314,9 +317,7 @@ Both stages are handled automatically by the `cozystack` role.
 
 ## Role: cozystack.installer.cozystack
 
-Installs Cozystack via the official `cozy-installer` Helm chart using
-the `kubernetes.core.helm` module with automatic Helm and helm-diff
-installation.
+Installs Cozystack via the official `cozy-installer` Helm chart using the `kubernetes.core.helm` module with automatic Helm and helm-diff installation.
 
 Runs on `server[0]` only.
 
@@ -353,14 +354,13 @@ Runs on `server[0]` only.
 
 ### Example playbook variables
 
-These variables are consumed only by the example prepare playbooks in
-`examples/*/`, not by the role itself. Set them as inventory host/group
-vars to opt out of the corresponding prepare step:
+These variables are consumed only by the example prepare playbooks in `examples/*/`, not by the role itself. Set them as inventory host/group vars to opt out of the corresponding prepare step:
 
 | Variable | Default | Description |
 | --- | --- | --- |
 | `cozystack_enable_zfs` | `true` | Example playbooks: install ZFS userspace and load the module. Set `false` to skip. |
-| `cozystack_enable_kubevirt` | `true` | Example playbooks: load KubeVirt kernel modules. Set `false` to skip. |
+| `cozystack_enable_kubevirt` | `true` | Example playbooks: load KubeVirt kernel modules **and** install the containerd `device_ownership_from_security_context` drop-in for CDI block imports. Set `false` to skip the module loading and to remove the drop-in if an earlier run wrote it. |
+| `cozystack_k3s_containerd_dropin_dir` | `/var/lib/rancher/k3s/agent/etc/containerd/config-v3.toml.d` | Example playbooks: directory for the containerd CRI drop-in (gated on `cozystack_enable_kubevirt`). Only relocates the file — the drop-in content is hardcoded for containerd 2.x (config v3); a containerd 1.x cluster needs a hand-written `config.toml.d` drop-in instead. |
 | `cozystack_flush_iptables` | `false` | Example playbooks: flush the iptables INPUT chain before k3s installs. Set `true` on Ubuntu/Debian cloud images (OCI/AWS/GCP) where the default INPUT chain ends with `REJECT icmp-host-prohibited` and blocks k3s inter-node ports 2380/6443. |
 | `cozystack_zfs_release_rpm_extra` | `{}` | `examples/rhel/` only: merged on top of the built-in `cozystack_zfs_release_rpm_by_major` dict, so you can add (or override) a single EL-major → OpenZFS release RPM entry from inventory without wiping the base dict. Example: `{"10": "https://zfsonlinux.org/epel/zfs-release-X-Y.el10.noarch.rpm"}` once upstream ships one. |
 | `cozystack_enable_drbd_dkms` | `true` | `examples/ubuntu/` only: install `drbd-dkms` from the LINBIT PPA on Ubuntu LTS 22.04 / 24.04 hosts so DRBD's kernel module is signed via dkms+shim under Secure Boot. Set `false` on Talos hosts (Talos ships pre-signed DRBD modules in extensions) or where Secure Boot is disabled and the in-cluster compile path is preferred. The toggle stops *future* installs but does NOT undo a prior install — manually `apt purge drbd-dkms` and remove the LINBIT entry from `/etc/apt/sources.list.d/` if you flipped to `false` after a successful run. |
@@ -371,8 +371,7 @@ vars to opt out of the corresponding prepare step:
 
 This collection is designed to work alongside [k3s.orchestration](https://github.com/k3s-io/k3s-ansible). The inventory structure (groups: `cluster`, `server`, `agent`) is fully compatible.
 
-Example full pipeline (`site.yml`) — see `examples/ubuntu/`, `examples/rhel/`,
-or `examples/suse/`:
+Example full pipeline (`site.yml`) — see `examples/ubuntu/`, `examples/rhel/`, or `examples/suse/`:
 
 ```yaml
 - name: Prepare nodes
@@ -393,12 +392,9 @@ On cloud providers with NAT (OCI, AWS, GCP), nodes have internal IPs different f
 
 ### Multi-master setup (kube-ovn RAFT)
 
-Kube-ovn requires `MASTER_NODES` — a comma-separated list of all
-control-plane node IPs for OVN RAFT consensus. By default, the role
-auto-detects these IPs from the `server` inventory group host keys.
+Kube-ovn requires `MASTER_NODES` — a comma-separated list of all control-plane node IPs for OVN RAFT consensus. By default, the role auto-detects these IPs from the `server` inventory group host keys.
 
-This works when host keys are internal IPs (the recommended inventory
-pattern):
+This works when host keys are internal IPs (the recommended inventory pattern):
 
 ```yaml
 server:
@@ -409,30 +405,19 @@ server:
       ansible_host: 203.0.113.11
 ```
 
-If your inventory uses hostnames or non-IP host keys, set
-`cozystack_master_nodes` explicitly:
+If your inventory uses hostnames or non-IP host keys, set `cozystack_master_nodes` explicitly:
 
 ```yaml
 cozystack_master_nodes: "10.0.0.10,10.0.0.11,10.0.0.12"
 ```
 
 ### Automatic Helm installation
 
-The role installs Helm and the
-[helm-diff](https://github.com/databus23/helm-diff) plugin on the
-target node automatically. The `helm-diff` plugin enables true
-idempotency — repeated runs report no changes when the release is
-already up to date.
+The role installs Helm and the [helm-diff](https://github.com/databus23/helm-diff) plugin on the target node automatically. The `helm-diff` plugin enables true idempotency — repeated runs report no changes when the release is already up to date.
 
 ### Customizing variables
 
-The example prepare playbooks define internal variables (like
-`cozystack_k3s_server_args`) in the play `vars` section. User-facing
-variables such as `cozystack_k3s_extra_args` and
-`cozystack_flush_iptables` should be set **in the inventory**, not in
-the playbook. Ansible play `vars` take precedence over inventory
-variables, so defining them in both places causes the inventory values
-to be silently ignored.
+The example prepare playbooks define internal variables (like `cozystack_k3s_server_args`) in the play `vars` section. User-facing variables such as `cozystack_k3s_extra_args` and `cozystack_flush_iptables` should be set **in the inventory**, not in the playbook. Ansible play `vars` take precedence over inventory variables, so defining them in both places causes the inventory values to be silently ignored.
 
 ### Idempotency
 
 
@@ -122,6 +122,29 @@
         state: restarted
       failed_when: false  # tolerated: same reason as the enable task below
 
+    # Refresh service facts on the same notify topic so the restart
+    # handler below sees the current unit set. Defined first, so it runs
+    # first (handlers fire in definition order, not notify order).
+    - name: Refresh service facts before k3s restart
+      ansible.builtin.service_facts:
+      listen: Restart k3s to apply containerd config
+
+    - name: Restart k3s to apply containerd config
+      ansible.builtin.systemd:
+        name: "{{ item }}"
+        state: restarted
+      loop:
+        - k3s
+        - k3s-agent
+      # Restart only the unit that exists on this node: a server runs
+      # k3s, an agent runs k3s-agent, and on a full-pipeline run neither
+      # exists yet when prepare runs (the drop-in is read at first k3s
+      # start instead). service_facts keys systemd units with the
+      # .service suffix. A unit that IS present but fails to restart
+      # still fails the play — a malformed drop-in or a k3s that will not
+      # come back is surfaced, not masked by failed_when: false.
+      when: (item ~ '.service') in ansible_facts.services
+
   tasks:
     - name: Create k3s_cluster group for k3s.orchestration
       ansible.builtin.group_by:
@@ -188,6 +211,58 @@
              | map(attribute='item')
              | list }}
 
+    # CDI (Containerized Data Importer) streams VM disk images into raw
+    # block volumes from a NON-root importer pod. containerd only chowns
+    # the block device to the pod's SecurityContext UID/GID when
+    # device_ownership_from_security_context is enabled on the CRI
+    # plugin, and k3s ships it disabled. Without it the importer dies
+    # with "blockdev: cannot open /dev/cdi-block-volume: Permission
+    # denied", the DataVolume hangs in ImportInProgress, and every VM
+    # that references the disk stays Pending.
+    #
+    # The drop-in is merged by containerd on top of k3s's generated
+    # config.toml via the config-v3.toml.d import glob — read at first
+    # k3s start (full pipeline) or applied by the handler on re-runs
+    # against a running cluster. config-v3.toml.d and
+    # io.containerd.cri.v1.runtime are the containerd 2.x (config
+    # version 3) paths shipped by current k3s, and the content is
+    # hardcoded for that schema. cozystack_k3s_containerd_dropin_dir
+    # only relocates the file (e.g. a non-default k3s data-dir); it does
+    # not rewrite the content, so a containerd 1.x cluster needs a
+    # hand-written config.toml.d drop-in (version = 2,
+    # io.containerd.grpc.v1.cri) instead.
+    - name: Ensure k3s containerd config drop-in directory exists
+      ansible.builtin.file:
+        path: "{{ cozystack_k3s_containerd_dropin_dir | default('/var/lib/rancher/k3s/agent/etc/containerd/config-v3.toml.d') }}"
+        state: directory
+        mode: "0755"
+      when: cozystack_enable_kubevirt | default(true) | bool
+
+    - name: Enable device_ownership_from_security_context for CDI block imports
+      ansible.builtin.copy:
+        dest: "{{ cozystack_k3s_containerd_dropin_dir | default('/var/lib/rancher/k3s/agent/etc/containerd/config-v3.toml.d') }}/10-cozystack-cri.toml"
+        mode: "0644"
+        content: |
+          version = 3
+
+          [plugins.'io.containerd.cri.v1.runtime']
+            device_ownership_from_security_context = true
+      when: cozystack_enable_kubevirt | default(true) | bool
+      notify: Restart k3s to apply containerd config
+
+    # Reverse the drop-in when KubeVirt is turned off: a host that
+    # carried 10-cozystack-cri.toml from an earlier enabled run would
+    # otherwise keep device_ownership_from_security_context on, so the
+    # host state no longer matches the toggle. Removal notifies the
+    # restart handler so a running cluster drops the setting too. (No-op
+    # when the file was never written — file: absent reports unchanged.)
+    - name: Remove containerd CDI drop-in when KubeVirt is disabled
+      ansible.builtin.file:
+        path: "{{ cozystack_k3s_containerd_dropin_dir | default('/var/lib/rancher/k3s/agent/etc/containerd/config-v3.toml.d') }}/10-cozystack-cri.toml"
+        state: absent
+      when: not (cozystack_enable_kubevirt | default(true) | bool)
+      notify: Restart k3s to apply containerd config
+
     - name: Ensure multipath drop-in directory exists
       ansible.builtin.file:
         path: /etc/multipath/conf.d