Commit 6207bc8

doc(block): document VIRTIO_BLK_F_WRITE_ZEROES support
Add docs/api_requests/block-write-zeroes.md describing:

- automatic advertisement on writable devices
- UNMAP=0 → FALLOC_FL_ZERO_RANGE (zeros in place)
- UNMAP=1 → FALLOC_FL_PUNCH_HOLE (zeros + deallocate)
- host filesystem requirements
- EOPNOTSUPP fallback (silent VIRTIO_BLK_S_UNSUPP, shared cache)
- known limitations

Remove the "write_zeroes is not supported" line from block-discard.md now that
the feature is implemented.

Signed-off-by: Nikita Kalyazin <nikita.kalyazin@e2b.dev>
1 parent 85a059d commit 6207bc8

2 files changed

Lines changed: 61 additions & 2 deletions

docs/api_requests/block-discard.md

Lines changed: 0 additions & 2 deletions
@@ -40,5 +40,3 @@ discard requests with `VIRTIO_BLK_S_UNSUPP` immediately — no additional
 - At most one discard segment per request is supported (`max_discard_seg = 1`).
 - The discard segment flags field must be zero; non-zero flags are rejected with
   an I/O error.
-- The `write_zeroes` variant of the feature (`VIRTIO_BLK_T_WRITE_ZEROES`) is not
-  supported.

docs/api_requests/block-write-zeroes.md

Lines changed: 61 additions & 0 deletions
@@ -0,0 +1,61 @@
# Block device write-zeroes

Firecracker supports the `VIRTIO_BLK_F_WRITE_ZEROES` feature, which allows the
guest to ask the device to zero a range of sectors without transferring a buffer
of zeros over the virtqueue. Common consumers are `mkfs` (clearing inode tables
and journals), filesystem snapshots, encrypted-volume initial wipe, and
`blkdiscard -z` / `blkzeroout` from userspace.

## How it works

For all non-read-only block devices, Firecracker automatically advertises the
`VIRTIO_BLK_F_WRITE_ZEROES` feature to the guest driver. No API configuration
is required; write-zeroes support is always on for writable drives.

Each `VIRTIO_BLK_T_WRITE_ZEROES` request carries a 16-byte segment with a
`flags` field. Bit 0 (`VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP`) tells the device
whether it may also deallocate the underlying backing-file blocks. Firecracker
advertises `write_zeroes_may_unmap=1`, so guests are free to set this flag.
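
To make the segment layout concrete, here is a minimal Python sketch that decodes one 16-byte segment. It assumes the little-endian layout given by the virtio specification (`le64 sector`, `le32 num_sectors`, `le32 flags`). The names `SEGMENT` and `parse_segment` are hypothetical and chosen for illustration; this is not Firecracker's code.

```python
import struct

# Layout per the virtio spec: le64 sector, le32 num_sectors, le32 flags.
SEGMENT = struct.Struct("<QII")  # 8 + 4 + 4 = 16 bytes

VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP = 1 << 0  # bit 0 of the flags field


def parse_segment(buf: bytes):
    """Decode one write-zeroes segment into (sector, num_sectors, unmap).

    Hypothetical helper for illustration only.
    """
    if len(buf) != SEGMENT.size:
        raise ValueError(f"expected {SEGMENT.size} bytes, got {len(buf)}")
    sector, num_sectors, flags = SEGMENT.unpack(buf)
    unmap = bool(flags & VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP)
    return sector, num_sectors, unmap
```

For example, `parse_segment(struct.pack("<QII", 8, 16, 1))` describes a request to zero 16 sectors starting at sector 8 with UNMAP set.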

Firecracker translates the guest's UNMAP bit into a `fallocate(2)` mode on the
backing file:

| UNMAP | fallocate mode                                | Effect                            |
|-------|-----------------------------------------------|-----------------------------------|
| 0     | `FALLOC_FL_ZERO_RANGE \| FALLOC_FL_KEEP_SIZE` | zeros in place, no deallocation   |
| 1     | `FALLOC_FL_PUNCH_HOLE \| FALLOC_FL_KEEP_SIZE` | zeros + deallocate (sparse holes) |

The virtio spec requires that when UNMAP is clear the device MUST NOT
deallocate sectors (so `ZERO_RANGE` is mandatory for that path); when UNMAP
is set, the device MAY deallocate, and `PUNCH_HOLE` reads as zeros on every
filesystem that supports it.
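
The mapping in the table can be sketched in a few lines of Python. The flag values are the ones defined in Linux's `<linux/falloc.h>`. The 512-byte sector size and the helper name `fallocate_args` are assumptions made for this illustration, not a description of Firecracker's internals.

```python
# Flag values from <linux/falloc.h>.
FALLOC_FL_KEEP_SIZE = 0x01
FALLOC_FL_PUNCH_HOLE = 0x02
FALLOC_FL_ZERO_RANGE = 0x10

SECTOR_SIZE = 512  # assumption: virtio-blk sectors are 512 bytes


def fallocate_args(sector: int, num_sectors: int, unmap: bool):
    """Return (mode, offset, length) for the fallocate(2) call.

    Hypothetical helper: UNMAP=1 punches a hole, UNMAP=0 zeros in place.
    KEEP_SIZE is set in both cases so the backing file never grows.
    """
    base = FALLOC_FL_PUNCH_HOLE if unmap else FALLOC_FL_ZERO_RANGE
    mode = base | FALLOC_FL_KEEP_SIZE
    return mode, sector * SECTOR_SIZE, num_sectors * SECTOR_SIZE
```

The sketch stops at computing the arguments because the call itself depends on the host filesystem.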

## Host requirements

The backing file must reside on a filesystem that supports the corresponding
`fallocate` mode:

- `FALLOC_FL_PUNCH_HOLE` (UNMAP=1) is widely supported: ext4, xfs, btrfs, tmpfs.
- `FALLOC_FL_ZERO_RANGE` (UNMAP=0) is supported on ext4, xfs, btrfs; on tmpfs
  it requires Linux 6.8+. Other filesystems may not support it.

If `fallocate` returns `EOPNOTSUPP` for either mode, Firecracker logs a one-time
warning and replies with `VIRTIO_BLK_S_UNSUPP`. The Linux virtio-blk driver
propagates that status through the block layer and stops issuing further
write-zeroes requests, so subsequent zeroing in the guest falls back to plain
`REQ_OP_WRITE` traffic. From then on, Firecracker short-circuits any
write-zeroes request that still arrives (for example, one that was already in
flight) with `VIRTIO_BLK_S_UNSUPP` for the rest of the device's lifetime,
without making additional `fallocate` calls.

The EOPNOTSUPP cache is shared across the UNMAP=0 and UNMAP=1 paths: a single
fallback flag disables both. This is conservative, since a filesystem that
supports `PUNCH_HOLE` but not `ZERO_RANGE` will see UNMAP=1 requests rejected
once an UNMAP=0 request fails, but it matches the discard fallback design
and avoids subtle host-side state.
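
The shared fallback flag can be illustrated with a small Python model. The class name `WriteZeroesHandler`, the injected `do_fallocate` callable, and the mapping of other errors to `VIRTIO_BLK_S_IOERR` are assumptions made for this sketch; only the status values come from the virtio specification.

```python
import errno

# Status codes from the virtio-blk specification.
VIRTIO_BLK_S_OK = 0
VIRTIO_BLK_S_IOERR = 1
VIRTIO_BLK_S_UNSUPP = 2


class WriteZeroesHandler:
    """Hypothetical model of the one-way EOPNOTSUPP cache."""

    def __init__(self, do_fallocate):
        # do_fallocate(unmap) stands in for the real fallocate(2) call and
        # raises OSError on failure.
        self._do_fallocate = do_fallocate
        self._unsupported = False  # single flag shared by both UNMAP paths
        self.fallocate_calls = 0

    def handle(self, unmap: bool) -> int:
        if self._unsupported:
            # Short-circuit: no further fallocate calls once the flag is set.
            return VIRTIO_BLK_S_UNSUPP
        self.fallocate_calls += 1
        try:
            self._do_fallocate(unmap)
        except OSError as err:
            if err.errno == errno.EOPNOTSUPP:
                self._unsupported = True
                return VIRTIO_BLK_S_UNSUPP
            # Assumption: any other failure is reported as an I/O error.
            return VIRTIO_BLK_S_IOERR
        return VIRTIO_BLK_S_OK
```

With a stand-in filesystem that supports hole punching but not zero-range, the first UNMAP=0 failure also disables later UNMAP=1 requests. That is the conservative behaviour described above.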

## Limitations

- Write-zeroes is only available for non-read-only block devices.
- At most one segment per request is supported (`max_write_zeroes_seg = 1`).
- Only bit 0 (UNMAP) of the segment flags may be set; requests with any
  reserved bit set are rejected with an I/O error.
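
The segment-count and flag limitations amount to a pair of checks on each request. The Python sketch below is illustrative only: `check_request` is a hypothetical helper, and it returns a description of the violated limitation rather than a virtio status, because the status used for over-long requests is not stated here.

```python
from typing import Optional, Sequence

MAX_WRITE_ZEROES_SEG = 1           # advertised as max_write_zeroes_seg
UNMAP_FLAG = 1 << 0                # the only flag bit a guest may set
RESERVED_MASK = 0xFFFFFFFF & ~UNMAP_FLAG


def check_request(segment_flags: Sequence[int]) -> Optional[str]:
    """Return None if the request is acceptable, else the violated limit.

    `segment_flags` holds the 32-bit flags field of each segment in the
    request (hypothetical representation for this sketch).
    """
    if len(segment_flags) > MAX_WRITE_ZEROES_SEG:
        return "too many segments"
    for flags in segment_flags:
        if flags & RESERVED_MASK:
            return "reserved flag bits set"
    return None
```

A request with flags `0` or `1` passes, while flags `2` (a reserved bit) or a two-segment request is reported as invalid.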
