runtime: Add annotation-based block device mounting#40
Draft
antoine-gaillard wants to merge 36 commits into
01bc551 to
f6b741c
Compare
zaymat
reviewed
Jan 22, 2026
b3e0ea1 to
d8481bc
Compare
f6b741c to
cbad5c6
Compare
zaymat
reviewed
Feb 19, 2026
zaymat
left a comment
overall lgtm. The logic to check we're not mounting devices outside of what's defined in the pod spec seems ok.
Small nit on optimization.
Now let's do the Rust part 😁
var storages []*grpc.Storage
devicesToRemove := make(map[string]bool)

for devicePath, mountConfig := range blockMounts {
small optimization: if you instead go through the container's devices and check whether there is an entry for dev.ContainerPath in the blockMounts map, you go through each device only once.
With the current approach, you have (# devices) * (# auto mounts) complexity.
d8481bc to
371561b
Compare
microVM sandbox resources are computed from pod sandbox annotations. In particular, the number of vCPUs is calculated as the CPU quota divided by the CPU period. However, on clusters where CFS quotas are disabled, or if the pod doesn't specify any limit, the computed size is 0. When using resource hot plugging, the value will then be the size of the CPU set, which doesn't impact the performance of the microVM pod. But when using static sandbox management, the computed value will be 0 and the microVM will be dramatically undersized. This change takes CPU shares into account while computing the number of vCPUs, and defaults to CPU shares / 1024 when the CPU quota and/or period is zero.
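The fallback described in this commit message can be sketched as follows. This is only an illustration: `vcpusFor` is a hypothetical helper, not the function changed by the commit, and rounding up to a whole vCPU is an assumption.

```go
package main

import (
	"fmt"
	"math"
)

// vcpusFor is a hypothetical helper illustrating the sizing rule.
// quota and period are the CFS values (quota may be 0 or negative when
// no limit is set); shares is the cgroup CPU shares value.
func vcpusFor(quota int64, period uint64, shares uint64) uint32 {
	if quota > 0 && period > 0 {
		// Normal case: quota / period, rounded up (rounding is an assumption).
		return uint32(math.Ceil(float64(quota) / float64(period)))
	}
	// Fallback: no usable quota or period, so size from shares,
	// where 1024 shares correspond to one CPU.
	return uint32(math.Ceil(float64(shares) / 1024))
}

func main() {
	fmt.Println("with quota:   ", vcpusFor(200000, 100000, 0))
	fmt.Println("without quota:", vcpusFor(0, 100000, 2048))
}
```

With static sandbox management, a pod that only sets CPU requests (shares) would then get a non-zero vCPU count instead of 0.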
Co-authored-by: Maxime VISONNEAU <maxime.visonneau@gmail.com>
- Add scratch-based Dockerfile for kata data volume
- Move Dockerfile to docker/ subdir and fix config file handling
- Fix Dockerfile to extract only essential kata files
- Add containerd runtime dropin configuration files
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
An early call to close the stdin channel caused stdout & stderr to be closed as well. This change waits for stdout & stderr to finish properly by reading the whole buffer before closing everything. In addition, it fixes a race condition that made it impossible to run multiple execs until the previous one was over: the lock is now taken only where it is necessary, without locking exec processes.
Fixes kata-containers#10387
Signed-off-by: Maxime Bertin <mbertin@luccasoftware.com>
Co-authored-by: Maxime Bertin <mbertin@luccasoftware.com>
The WORKFLOW_TOKEN no longer exists, so artefact uploads fail. Use the built-in token instead.
Signed-off-by: Hadrien Patte <hadrien.patte@datadoghq.com>
Add support for [`netkit`](https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?id=22360fad5889cbefe1eca695b0cc0273ab280b56) network devices similarly to how `veth` devices are currently handled.
Signed-off-by: Hadrien Patte <hadrien.patte@datadoghq.com>
Netkit devices in L3 mode have no MAC address and require IP routing instead of L2 bridging. Since L3 routing is not currently implemented, reject these devices early with a clear error message directing users to use netkit L2 mode or veth devices instead.
Signed-off-by: Hadrien Patte <hadrien.patte@datadoghq.com>
371561b to
4ebe9de
Compare
Add support for mounting block devices (volumeDevices) as filesystems
inside the guest VM via annotation. This allows CSI block mode PVCs to
be automatically mounted by kata-agent, eliminating the need for
privileged containers.
Annotation format:
io.katacontainers.volume.block-mounts: '{"<devicePath>": {"mount": "<path>", "fstype": "<fs>"}}'
Example:
io.katacontainers.volume.block-mounts: '{"/dev/xvda": {"mount": "/data", "fstype": "ext4"}}'
Supported options:
- mount: destination path in container (required)
- fstype: filesystem type - ext4, xfs, btrfs (default: ext4)
- options: mount options array (default: ["rw"])
- fsGroup: optional gid for filesystem group ownership
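As an illustration of the annotation format and the defaults listed above, the value could be decoded roughly as follows. This is a sketch under assumptions: `blockMountConfig` and `parseBlockMounts` are hypothetical names, and the validation rules are inferred from the option list rather than taken from the actual implementation.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// blockMountConfig mirrors the documented options (hypothetical type name).
type blockMountConfig struct {
	Mount   string   `json:"mount"`
	FSType  string   `json:"fstype,omitempty"`
	Options []string `json:"options,omitempty"`
	FSGroup *int64   `json:"fsGroup,omitempty"`
}

// parseBlockMounts decodes the annotation value and applies the defaults:
// fstype "ext4" and options ["rw"]. The map key is the device path.
func parseBlockMounts(annotation string) (map[string]blockMountConfig, error) {
	mounts := make(map[string]blockMountConfig)
	if err := json.Unmarshal([]byte(annotation), &mounts); err != nil {
		return nil, fmt.Errorf("invalid block-mounts annotation: %w", err)
	}
	for dev, cfg := range mounts {
		if cfg.Mount == "" {
			return nil, fmt.Errorf("device %s: \"mount\" is required", dev)
		}
		if cfg.FSType == "" {
			cfg.FSType = "ext4"
		}
		switch cfg.FSType {
		case "ext4", "xfs", "btrfs":
		default:
			return nil, fmt.Errorf("device %s: unsupported fstype %q", dev, cfg.FSType)
		}
		if len(cfg.Options) == 0 {
			cfg.Options = []string{"rw"}
		}
		mounts[dev] = cfg // write the defaulted copy back
	}
	return mounts, nil
}

func main() {
	mounts, err := parseBlockMounts(`{"/dev/xvda": {"mount": "/data"}}`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", mounts["/dev/xvda"])
}
```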
Implementation:
- Delegates driver/source selection to handleBlockVolume() for DeviceBlock
to ensure proper struct-based detection (e.g., blockDrive.Pmem takes
precedence over config)
- Extracts PCIPath directly for VhostUserBlk devices
- removeDevicesFromOCISpec is a plain function (no receiver needed)
- Comprehensive test coverage including pmem device test proving struct-based
detection works (nvdimm driver despite VirtioBlock config)
Fixes netkit_endpoint_test.go compilation by using proper constructors:
- Use PciPathFromString() instead of struct literal
- Use CcwDeviceFrom() instead of struct literal
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
…ount
Fresh ephemeral EBS volumes arrive without a filesystem. The kata-agent fails with EINVAL when trying to mount an unformatted device, and the guest rootfs does not ship mkfs or blkid. Format on the host side in createAnnotationBlockStorages using the host device path from BlockDrive.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
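The host-side decision "format only if the device has no filesystem yet" can be sketched as below. Everything here is an assumption for illustration: `formatCommand` and the `prober` callback are hypothetical, the sketch only builds the command line instead of running it, and the actual logic in createAnnotationBlockStorages may differ.

```go
package main

import "fmt"

// prober reports the filesystem type found on a host device, or "" when
// the device is unformatted. How the probe is done is left out on purpose.
type prober func(device string) (string, error)

// formatCommand returns the mkfs command line to run for an unformatted
// device, or nil when the device already carries a filesystem.
func formatCommand(device, fstype string, probe prober) ([]string, error) {
	existing, err := probe(device)
	if err != nil {
		return nil, fmt.Errorf("probing %s: %w", device, err)
	}
	if existing != "" {
		return nil, nil // already formatted: never reformat
	}
	switch fstype {
	case "ext4", "xfs", "btrfs":
		// mkfs.ext4, mkfs.xfs and mkfs.btrfs each take the device as argument.
		return []string{"mkfs." + fstype, device}, nil
	default:
		return nil, fmt.Errorf("unsupported fstype %q", fstype)
	}
}

func main() {
	unformatted := func(string) (string, error) { return "", nil }
	cmd, err := formatCommand("/dev/nvme1n1", "ext4", unformatted)
	fmt.Println(cmd, err)
}
```

Keeping the probe injectable makes the decision testable without touching a real block device.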
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Iterate container devices once with map lookup instead of nested loops. Detect duplicate container devices and report unmatched annotation keys with device paths.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
a81694e to
4861c33
Compare
4ebe9de to
77e6f1b
Compare
177fa81 to
af4c8f5
Compare
Problem: CSI block mode PVCs are passed as raw device nodes. Containers must mount them manually, requiring `privileged: true` or `CAP_SYS_ADMIN`. Additionally, new volumes without a snapshot (e.g. fresh EBS) arrive unformatted, and the guest rootfs doesn't ship `mkfs`/`blkid`.
Solution: New annotation instructs kata-agent to mount the device: