|
| 1 | +# Block device write-zeroes |
| 2 | + |
| 3 | +Firecracker supports the `VIRTIO_BLK_F_WRITE_ZEROES` feature, which allows the |
| 4 | +guest to ask the device to zero a range of sectors without transferring a buffer |
| 5 | +of zeros over the virtqueue. Common consumers are `mkfs` (clearing inode tables |
| 6 | +and journals), filesystem snapshots, the initial wipe of encrypted volumes, and |
| 7 | +`blkdiscard -z` (the `BLKZEROOUT` ioctl) from userspace. |
| 8 | + |
| 9 | +## How it works |
| 10 | + |
| 11 | +For all non-read-only block devices, Firecracker automatically advertises the |
| 12 | +`VIRTIO_BLK_F_WRITE_ZEROES` feature to the guest driver. No API configuration |
| 13 | +is required: write-zeroes support is always on for writable drives. |
| 14 | + |
| 15 | +Each `VIRTIO_BLK_T_WRITE_ZEROES` request carries a 16-byte segment with a |
| 16 | +`flags` field. Bit 0 (`VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP`) tells the device |
| 17 | +whether it may also deallocate the underlying backing-file blocks. Firecracker |
| 18 | +advertises `write_zeroes_may_unmap=1`, so guests are free to set this flag. |
| 19 | + |
| 20 | +Firecracker translates the guest's UNMAP bit into a `fallocate(2)` mode on the |
| 21 | +backing file: |
| 22 | + |
| 23 | +| UNMAP | fallocate mode | Effect | |
| 24 | +|-------|---------------------------------------------|---------------------------------------| |
| 25 | +| 0 | `FALLOC_FL_ZERO_RANGE \| FALLOC_FL_KEEP_SIZE` | zeros in place, no deallocation | |
| 26 | +| 1 | `FALLOC_FL_PUNCH_HOLE \| FALLOC_FL_KEEP_SIZE` | zeros + deallocate (sparse holes) | |
| 27 | + |
| 28 | +The virtio spec requires that when UNMAP is clear the device MUST NOT |
| 29 | +deallocate sectors, so that path uses `ZERO_RANGE` and never `PUNCH_HOLE`. When |
| 30 | +UNMAP is set the device MAY deallocate, and a range punched with `PUNCH_HOLE` |
| 31 | +reads back as zeros on every filesystem that supports the mode. |
| 32 | + |
| 33 | +## Host requirements |
| 34 | + |
| 35 | +The backing file must reside on a filesystem that supports the corresponding |
| 36 | +`fallocate` mode: |
| 37 | + |
| 38 | +- `FALLOC_FL_PUNCH_HOLE` (UNMAP=1) is widely supported: ext4, xfs, btrfs, tmpfs. |
| 39 | +- `FALLOC_FL_ZERO_RANGE` (UNMAP=0) is supported on ext4, xfs, btrfs; on tmpfs |
| 40 | + it requires Linux 6.8+. Other filesystems may not support it. |
| 41 | + |
| 42 | +If `fallocate` returns `EOPNOTSUPP` for either mode, Firecracker logs a one-time |
| 43 | +warning and replies with `VIRTIO_BLK_S_UNSUPP`. The Linux virtio-blk driver |
| 44 | +propagates that status through the block layer, which stops issuing |
| 45 | +write-zeroes requests, so later guest zeroing falls back to plain |
| 46 | +`REQ_OP_WRITE` requests carrying zero-filled buffers. Firecracker also answers any |
| 47 | +later write-zeroes request with `VIRTIO_BLK_S_UNSUPP` for the rest of the device's |
| 48 | +lifetime, without making additional `fallocate` calls. |
| 49 | + |
| 50 | +The `EOPNOTSUPP` cache is shared across UNMAP=0 and UNMAP=1 paths: a single |
| 51 | +fallback flag disables both. This is conservative: on a filesystem that |
| 52 | +supports `PUNCH_HOLE` but not `ZERO_RANGE`, UNMAP=1 requests are rejected |
| 53 | +once an UNMAP=0 request fails. The trade-off matches the discard fallback design |
| 54 | +and keeps the host-side state to a single flag. |
| 55 | + |
| 56 | +## Limitations |
| 57 | + |
| 58 | +- Write-zeroes is only available for non-read-only block devices. |
| 59 | +- At most one segment per request is supported (`max_write_zeroes_seg = 1`). |
| 60 | +- Only bit 0 (UNMAP) of the segment flags is allowed; non-zero reserved bits |
| 61 | + are rejected with an I/O error. |
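
The request handling described in this document can be modelled in a few lines. The following is a minimal Python sketch, not Firecracker's implementation: the `WriteZeroesModel` class, the injected `fallocate` callable, the order of the checks, and the choice of `VIRTIO_BLK_S_IOERR` for errors other than `EOPNOTSUPP` are all assumptions made for illustration. The constants are the values from `<linux/falloc.h>` and the virtio-blk specification.

```python
import errno
import struct

# Values from <linux/falloc.h> and the virtio-blk specification.
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02
FALLOC_FL_ZERO_RANGE = 0x10
VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP = 0x1
VIRTIO_BLK_S_OK, VIRTIO_BLK_S_IOERR, VIRTIO_BLK_S_UNSUPP = 0, 1, 2
SECTOR_SIZE = 512

# 16-byte segment: le64 sector, le32 num_sectors, le32 flags.
SEGMENT = struct.Struct("<QII")


def fallocate_mode(flags):
    """Map the segment flags to a fallocate(2) mode, as in the table above."""
    if flags & VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP:
        return FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
    return FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE


class WriteZeroesModel:
    """Hypothetical model of the device-side handling (illustration only)."""

    def __init__(self, fallocate):
        # `fallocate(mode, offset, length)` is injected so the model can be
        # exercised without a real backing file; it raises OSError on failure.
        self._fallocate = fallocate
        self.unsupported = False  # single latch shared by both UNMAP paths

    def handle(self, segment):
        sector, num_sectors, flags = SEGMENT.unpack(segment)
        if flags & ~VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP:
            return VIRTIO_BLK_S_IOERR  # reserved flag bits are rejected
        if self.unsupported:
            return VIRTIO_BLK_S_UNSUPP  # short-circuit: no fallocate call
        try:
            self._fallocate(
                fallocate_mode(flags),
                sector * SECTOR_SIZE,
                num_sectors * SECTOR_SIZE,
            )
        except OSError as err:
            if err.errno == errno.EOPNOTSUPP:
                self.unsupported = True  # latch for the device's lifetime
                return VIRTIO_BLK_S_UNSUPP
            return VIRTIO_BLK_S_IOERR  # assumption: other errors map to IOERR
        return VIRTIO_BLK_S_OK
```

Injecting the `fallocate` callable keeps the sketch independent of the host filesystem. A stand-in that raises `OSError(errno.EOPNOTSUPP, ...)` only for `FALLOC_FL_ZERO_RANGE` is enough to observe the conservative behaviour described above, where a failed UNMAP=0 request also disables later UNMAP=1 requests.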