36 changes: 18 additions & 18 deletions docs/api_requests/block-write-zeroes.md
@@ -9,8 +9,8 @@ and journals), filesystem snapshots, encrypted-volume initial wipe, and
## How it works

For all non-read-only block devices, Firecracker automatically advertises the
`VIRTIO_BLK_F_WRITE_ZEROES` feature to the guest driver. No API configuration
is required — write-zeroes support is always-on for writable drives.
`VIRTIO_BLK_F_WRITE_ZEROES` feature to the guest driver. No API configuration is
required — write-zeroes support is always-on for writable drives.

Each `VIRTIO_BLK_T_WRITE_ZEROES` request carries a 16-byte segment with a
`flags` field. Bit 0 (`VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP`) tells the device
@@ -20,24 +20,24 @@ advertises `write_zeroes_may_unmap=1`, so guests are free to set this flag.
Firecracker translates the guest's UNMAP bit into a `fallocate(2)` mode on the
backing file:

| UNMAP | fallocate mode | Effect |
|-------|---------------------------------------------|---------------------------------------|
| 0 | `FALLOC_FL_ZERO_RANGE \| FALLOC_FL_KEEP_SIZE` | zeros in place, no deallocation |
| 1 | `FALLOC_FL_PUNCH_HOLE \| FALLOC_FL_KEEP_SIZE` | zeros + deallocate (sparse holes) |
| UNMAP | fallocate mode | Effect |
| ----- | --------------------------------------------- | --------------------------------- |
| 0 | `FALLOC_FL_ZERO_RANGE \| FALLOC_FL_KEEP_SIZE` | zeros in place, no deallocation |
| 1 | `FALLOC_FL_PUNCH_HOLE \| FALLOC_FL_KEEP_SIZE` | zeros + deallocate (sparse holes) |

The virtio spec requires that when UNMAP is clear the device MUST NOT
deallocate sectors (so `ZERO_RANGE` is mandatory for that path); when UNMAP
is set, the device MAY deallocate, and `PUNCH_HOLE` reads as zeros on every
filesystem that supports it.
The virtio spec requires that when UNMAP is clear the device MUST NOT deallocate
sectors (so `ZERO_RANGE` is mandatory for that path); when UNMAP is set, the
device MAY deallocate, and `PUNCH_HOLE` reads as zeros on every filesystem that
supports it.
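
The mapping in the table above can be sketched as a small Rust function. This is a minimal illustration, not Firecracker's actual code: `write_zeroes_mode` is a hypothetical name, and the flag constants are written out by hand (using the values from `<linux/falloc.h>`) instead of being imported from the `libc` crate, so the snippet has no dependencies.

```rust
// Flag values as defined in the Linux UAPI header <linux/falloc.h>.
const FALLOC_FL_KEEP_SIZE: i32 = 0x01;
const FALLOC_FL_PUNCH_HOLE: i32 = 0x02;
const FALLOC_FL_ZERO_RANGE: i32 = 0x10;

/// Hypothetical helper: choose the fallocate(2) mode for a write-zeroes
/// request based on the guest's UNMAP bit.
pub fn write_zeroes_mode(unmap: bool) -> i32 {
    if unmap {
        // UNMAP=1: the device MAY deallocate, so punching a hole is allowed.
        FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE
    } else {
        // UNMAP=0: the device MUST NOT deallocate, so zero the range in place.
        FALLOC_FL_ZERO_RANGE | FALLOC_FL_KEEP_SIZE
    }
}
```

Both modes include `FALLOC_FL_KEEP_SIZE`, so neither path changes the size of the backing file.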

## Host requirements

The backing file must reside on a filesystem that supports the corresponding
`fallocate` mode:

- `FALLOC_FL_PUNCH_HOLE` (UNMAP=1) is widely supported: ext4, xfs, btrfs, tmpfs.
- `FALLOC_FL_ZERO_RANGE` (UNMAP=0) is supported on ext4, xfs, btrfs; on tmpfs
it requires Linux 6.8+. Other filesystems may not support it.
- `FALLOC_FL_ZERO_RANGE` (UNMAP=0) is supported on ext4, xfs, btrfs; on tmpfs it
requires Linux 6.8+. Other filesystems may not support it.

If `fallocate` returns `EOPNOTSUPP` for either mode, Firecracker logs a one-time
warning and replies with `VIRTIO_BLK_S_UNSUPP`. The Linux virtio-blk driver
@@ -48,14 +48,14 @@ requests with `VIRTIO_BLK_S_UNSUPP` for the rest of the device's lifetime — no
additional `fallocate` calls are made.

The EOPNOTSUPP cache is shared across UNMAP=0 and UNMAP=1 paths: a single
fallback flag disables both. This is conservative — a filesystem that
supports `PUNCH_HOLE` but not `ZERO_RANGE` will see UNMAP=1 requests rejected
once an UNMAP=0 request fails — but it matches the discard fallback design
and avoids subtle host-side state.
fallback flag disables both. This is conservative — a filesystem that supports
`PUNCH_HOLE` but not `ZERO_RANGE` will see UNMAP=1 requests rejected once an
UNMAP=0 request fails — but it matches the discard fallback design and avoids
subtle host-side state.
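
The shared fallback described above can be modelled roughly as follows. This is a sketch under stated assumptions: the type and method names (`WriteZeroesFallback`, `handle`, `FallocateOutcome`) are hypothetical, the `fallocate` call is replaced by a caller-supplied closure, and mapping other errors to an I/O error status is an assumption not taken from the text.

```rust
/// Status returned to the guest (simplified, hypothetical).
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum Status {
    Ok,
    Unsupp,
    IoErr,
}

/// Result of one fallocate attempt, as reported by the caller's closure.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum FallocateOutcome {
    Done,
    NotSupported, // stands in for EOPNOTSUPP
    OtherError,
}

/// Hypothetical model of the single flag shared by UNMAP=0 and UNMAP=1.
#[derive(Default)]
pub struct WriteZeroesFallback {
    unsupported: bool,
    pub fallocate_calls: u32,
}

impl WriteZeroesFallback {
    pub fn handle<F>(&mut self, unmap: bool, mut fallocate: F) -> Status
    where
        F: FnMut(bool) -> FallocateOutcome,
    {
        // Once either mode has failed with EOPNOTSUPP, short-circuit every
        // later request without touching the backing file again.
        if self.unsupported {
            return Status::Unsupp;
        }
        self.fallocate_calls += 1;
        match fallocate(unmap) {
            FallocateOutcome::Done => Status::Ok,
            FallocateOutcome::NotSupported => {
                self.unsupported = true; // a real device would log a one-time warning here
                Status::Unsupp
            }
            // Assumption: any other failure is reported as an I/O error.
            FallocateOutcome::OtherError => Status::IoErr,
        }
    }
}
```

With a filesystem that supports `PUNCH_HOLE` but not `ZERO_RANGE`, an UNMAP=1 request succeeds until the first UNMAP=0 request fails; after that, both kinds of request are rejected and no further `fallocate` calls are made.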

## Limitations

- Write-zeroes is only available for non-read-only block devices.
- At most one segment per request is supported (`max_write_zeroes_seg = 1`).
- Only bit 0 (UNMAP) of the segment flags is allowed; non-zero reserved bits
are rejected with an I/O error.
- Only bit 0 (UNMAP) of the segment flags is allowed; non-zero reserved bits are
rejected with an I/O error.
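
The flag check in the last limitation can be illustrated with a small parser for the 16-byte segment. This is a sketch, not Firecracker's implementation: `Segment`, `SegmentError`, and `parse_segment` are hypothetical names, and the field layout (little-endian `sector: u64`, `num_sectors: u32`, `flags: u32`) follows the virtio specification's `virtio_blk_discard_write_zeroes` structure.

```rust
const VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP: u32 = 1 << 0;

/// Hypothetical decoded form of one write-zeroes segment.
#[derive(Debug, PartialEq, Eq)]
pub struct Segment {
    pub sector: u64,
    pub num_sectors: u32,
    pub unmap: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum SegmentError {
    WrongLength(usize),
    ReservedFlags(u32),
}

/// Decode a 16-byte segment and reject any flag bit other than UNMAP.
pub fn parse_segment(buf: &[u8]) -> Result<Segment, SegmentError> {
    if buf.len() != 16 {
        return Err(SegmentError::WrongLength(buf.len()));
    }
    let mut sector = [0u8; 8];
    sector.copy_from_slice(&buf[0..8]);
    let mut num_sectors = [0u8; 4];
    num_sectors.copy_from_slice(&buf[8..12]);
    let mut flags = [0u8; 4];
    flags.copy_from_slice(&buf[12..16]);
    let flags = u32::from_le_bytes(flags);

    // Only bit 0 is defined; any other set bit is reserved.
    if flags & !VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP != 0 {
        return Err(SegmentError::ReservedFlags(flags));
    }
    Ok(Segment {
        sector: u64::from_le_bytes(sector),
        num_sectors: u32::from_le_bytes(num_sectors),
        unmap: flags & VIRTIO_BLK_WRITE_ZEROES_FLAG_UNMAP != 0,
    })
}
```

A caller would translate `SegmentError::ReservedFlags` into the I/O error status mentioned above; how the real device reports a malformed length is not stated in the text.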
31 changes: 15 additions & 16 deletions docs/ballooning.md
@@ -468,22 +468,21 @@ your scenario.
#### `VIRTIO_BALLOON_F_HINT_WAIT_ON_ACK`

Whenever `free_page_hinting` is enabled, Firecracker also advertises
`VIRTIO_BALLOON_F_HINT_WAIT_ON_ACK` (bit 6). When negotiated, the guest
driver waits for the device to signal-used each hint buffer before
pushing the corresponding page onto its internal free list — closing
the data-loss race described in the warning above without any host-side
protocol change.

The bit only takes effect on guests whose kernel carries the supporting
patch (Jack Thomson's `virtio_balloon: Support wait on ACK for hinting`,
not yet upstream as of this writing). On unsupported guests the driver
self-clears the bit during `validate`, so the advertise is ignored and
hinting falls back to the unsynchronised behaviour. There is no separate
configuration knob — opting into `free_page_hinting` is sufficient.

Note that the per-buffer round trip introduces extra wait time per hint
cycle on supported guests; the safety/perf trade-off is intentional and
documented at the kernel-patch level.
`VIRTIO_BALLOON_F_HINT_WAIT_ON_ACK` (bit 6). When negotiated, the guest driver
waits for the device to signal-used each hint buffer before pushing the
corresponding page onto its internal free list — closing the data-loss race
described in the warning above without any host-side protocol change.

The bit only takes effect on guests whose kernel carries the supporting patch
(Jack Thomson's `virtio_balloon: Support wait on ACK for hinting`, not yet
upstream as of this writing). On unsupported guests the driver self-clears the
bit during `validate`, so the advertisement is ignored and hinting falls back to the
unsynchronised behaviour. There is no separate configuration knob — opting into
`free_page_hinting` is sufficient.

Note that the per-buffer round trip introduces extra wait time per hint cycle on
supported guests; the safety/perf trade-off is intentional and documented at the
kernel-patch level.

## Balloon Caveats

3 changes: 3 additions & 0 deletions src/firecracker/src/api_server/request/memory_info.rs
@@ -1,3 +1,6 @@
// Copyright 2026 Amazon.com, Inc. or its affiliates. All Rights Reserved.
// SPDX-License-Identifier: Apache-2.0

use micro_http::Method;
use vmm::rpc_interface::VmmAction;

21 changes: 20 additions & 1 deletion src/firecracker/src/api_server/request/snapshot.rs
@@ -119,6 +119,8 @@ fn parse_put_snapshot_load(body: &Body) -> Result<ParsedRequest, RequestError> {
resume_vm: snapshot_config.resume_vm,
network_overrides: snapshot_config.network_overrides,
clock_realtime: snapshot_config.clock_realtime,
#[cfg(feature = "gdb")]
gdb_socket_path: snapshot_config.gdb_socket_path,
};

// Construct the `ParsedRequest` object.
@@ -198,6 +200,8 @@ mod tests {
resume_vm: false,
network_overrides: vec![],
clock_realtime: false,
#[cfg(feature = "gdb")]
gdb_socket_path: None,
};
let mut parsed_request = parse_put_snapshot(&Body::new(body), Some("load")).unwrap();
assert!(
@@ -230,6 +234,8 @@ mod tests {
resume_vm: false,
network_overrides: vec![],
clock_realtime: false,
#[cfg(feature = "gdb")]
gdb_socket_path: None,
};
let mut parsed_request = parse_put_snapshot(&Body::new(body), Some("load")).unwrap();
assert!(
@@ -262,6 +268,8 @@ mod tests {
resume_vm: true,
network_overrides: vec![],
clock_realtime: false,
#[cfg(feature = "gdb")]
gdb_socket_path: None,
};
let mut parsed_request = parse_put_snapshot(&Body::new(body), Some("load")).unwrap();
assert!(
@@ -303,6 +311,8 @@ mod tests {
host_dev_name: String::from("vmtap2"),
}],
clock_realtime: false,
#[cfg(feature = "gdb")]
gdb_socket_path: None,
};
let mut parsed_request = parse_put_snapshot(&Body::new(body), Some("load")).unwrap();
assert!(
@@ -332,6 +342,8 @@ mod tests {
resume_vm: true,
network_overrides: vec![],
clock_realtime: false,
#[cfg(feature = "gdb")]
gdb_socket_path: None,
};
let parsed_request = parse_put_snapshot(&Body::new(body), Some("load")).unwrap();
assert_eq!(
@@ -435,9 +447,16 @@ mod tests {
resume_vm: false,
network_overrides: vec![],
clock_realtime: false,
#[cfg(feature = "gdb")]
gdb_socket_path: None,
};
let mut parsed_request = parse_put_snapshot(&Body::new(body), Some("load")).unwrap();
assert!(parsed_request.parsing_info().take_deprecation_message().is_none());
assert!(
parsed_request
.parsing_info()
.take_deprecation_message()
.is_none()
);
assert_eq!(
vmm_action_from_request(parsed_request),
VmmAction::LoadSnapshot(expected_config)
7 changes: 7 additions & 0 deletions src/firecracker/swagger/firecracker.yaml
@@ -1758,6 +1758,13 @@ definitions:
elapsed since the snapshot was taken. When false (default), kvmclock resumes
from where it was at snapshot time. This option may be extended to other clock
sources and CPU architectures in the future."
gdb_socket_path:
type: string
description:
"Only available when Firecracker is built with the `gdb` feature. When set,
start the GDB server on this unix socket for the restored guest, for
source-level debugging of the guest kernel. Debug builds only; not for
production."


TokenBucket:
7 changes: 7 additions & 0 deletions src/snapshot-editor/src/info.rs
@@ -66,6 +66,13 @@ fn info_vcpu_states(snapshot: &Snapshot<MicrovmState>) -> Result<(), InfoVmState
for (i, state) in snapshot.data.vcpu_states.iter().enumerate() {
println!("vcpu {i}:");
println!("{state:#?}");
// The derived Debug of `saved_msrs` only shows the kvm_msrs headers, not
// the entries (a FAM array). Print index/data so tooling can read MSR
// values (e.g. LSTAR, to recover the KASLR slide from a snapshot).
#[cfg(target_arch = "x86_64")]
for entry in state.saved_msrs.iter().flat_map(|m| m.as_slice()) {
println!(" msr index={:#x} data={:#x}", entry.index, entry.data);
}
}
Ok(())
}
2 changes: 1 addition & 1 deletion src/vmm/src/arch/aarch64/gic/mod.rs
@@ -7,9 +7,9 @@ mod regs;

use gicv2::GICv2;
use gicv3::GICv3;
pub use gicv3::regs::its_regs::ItsRegisterState;
use kvm_ioctls::{DeviceFd, VmFd};
pub use regs::{GicRegState, GicState, GicVcpuState, VgicSysRegsState};
pub use gicv3::regs::its_regs::ItsRegisterState;

use super::layout;

61 changes: 58 additions & 3 deletions src/vmm/src/builder.rs
@@ -133,6 +133,18 @@ impl std::convert::From<linux_loader::cmdline::Error> for StartMicrovmError {
}
}

/// Resolves the GDB unix socket path. An explicit `machine-config.gdb_socket_path`
/// takes precedence; otherwise fall back to the `FIRECRACKER_GDB_SOCKET` environment
/// variable. The env fallback lets tooling that launches Firecracker (e.g. the e2b
/// orchestrator / resume-build, which inherit the environment) enable GDB without
/// setting machine-config.
#[cfg(feature = "gdb")]
fn resolve_gdb_socket_path(configured: &Option<String>) -> Option<String> {
configured
.clone()
.or_else(|| std::env::var("FIRECRACKER_GDB_SOCKET").ok())
}

/// Builds and starts a microVM based on the current Firecracker VmResources configuration.
///
/// The built microVM and all the created vCPUs start off in the paused state.
@@ -343,9 +355,16 @@ pub fn build_microvm_for_boot(
.map_err(VmmError::VcpuStart)?;

#[cfg(feature = "gdb")]
if let Some(gdb_socket_path) = &vm_resources.machine_config.gdb_socket_path {
gdb::gdb_thread(vmm.clone(), gdb_rx, entry_point.entry_addr, gdb_socket_path)
.map_err(StartMicrovmError::GdbServer)?;
if let Some(gdb_socket_path) =
resolve_gdb_socket_path(&vm_resources.machine_config.gdb_socket_path)
{
gdb::gdb_thread(
vmm.clone(),
gdb_rx,
entry_point.entry_addr,
&gdb_socket_path,
)
.map_err(StartMicrovmError::GdbServer)?;
} else {
debug!("No GDB socket provided; not starting GDB server.");
}
@@ -528,6 +547,31 @@ pub fn build_microvm_from_snapshot(
page_size: vm_resources.machine_config.huge_pages.page_size(),
};

// GDB debug support for restored microVMs (x86_64 only). Mirror the boot
// path: attach the debug-event channel to every restored vCPU before they
// start, then start the GDB server thread once the vCPUs are running. The
// server arms a hardware breakpoint at the restored instruction pointer so
// GDB takes control at the resume point on the first continue.
//
// Only wire the channel up when a GDB socket is actually configured: with no
// socket, no server thread drains the receiver, so a vCPU debug event would
// `send` on a dropped receiver and panic. Gating the attach keeps the channel
// paired with its consumer (and leaves the vCPUs' gdb_event as None otherwise).
#[cfg(all(feature = "gdb", target_arch = "x86_64"))]
let gdb_socket_path =
resolve_gdb_socket_path(&vm_resources.machine_config.gdb_socket_path);

#[cfg(all(feature = "gdb", target_arch = "x86_64"))]
let gdb_rx = if gdb_socket_path.is_some() {
let (gdb_tx, gdb_rx) = mpsc::channel();
vcpus
.iter_mut()
.for_each(|vcpu| vcpu.attach_debug_info(gdb_tx.clone()));
Some(gdb_rx)
} else {
None
};

// Move vcpus to their own threads and start their state machine in the 'Paused' state.
vmm.start_vcpus(
vcpus,
@@ -540,6 +584,17 @@ pub fn build_microvm_from_snapshot(
let vmm = Arc::new(Mutex::new(vmm));
event_manager.add_subscriber(vmm.clone());

#[cfg(all(feature = "gdb", target_arch = "x86_64"))]
if let Some(gdb_socket_path) = gdb_socket_path {
// On restore the vCPUs resume at their saved RIP; arm the entry
// breakpoint there so GDB stops at the resume point.
let entry_addr = GuestAddress(microvm_state.vcpu_states[0].regs.rip);
gdb::gdb_thread(vmm.clone(), gdb_rx.unwrap(), entry_addr, &gdb_socket_path)
.map_err(StartMicrovmError::GdbServer)?;
} else {
debug!("No GDB socket provided; not starting GDB server.");
}

// Load seccomp filters for the VMM thread.
// Keep this as the last step of the building process.
crate::seccomp::apply_filter(
12 changes: 6 additions & 6 deletions src/vmm/src/devices/virtio/balloon/device.rs
@@ -21,12 +21,12 @@ use super::{
MIB_TO_4K_PAGES, STATS_INDEX, VIRTIO_BALLOON_F_DEFLATE_ON_OOM,
VIRTIO_BALLOON_F_FREE_PAGE_HINTING, VIRTIO_BALLOON_F_FREE_PAGE_REPORTING,
VIRTIO_BALLOON_F_HINT_WAIT_ON_ACK, VIRTIO_BALLOON_F_STATS_VQ, VIRTIO_BALLOON_PFN_SHIFT,
VIRTIO_BALLOON_S_ALLOC_STALL,
VIRTIO_BALLOON_S_ASYNC_RECLAIM, VIRTIO_BALLOON_S_ASYNC_SCAN, VIRTIO_BALLOON_S_AVAIL,
VIRTIO_BALLOON_S_CACHES, VIRTIO_BALLOON_S_DIRECT_RECLAIM, VIRTIO_BALLOON_S_DIRECT_SCAN,
VIRTIO_BALLOON_S_HTLB_PGALLOC, VIRTIO_BALLOON_S_HTLB_PGFAIL, VIRTIO_BALLOON_S_MAJFLT,
VIRTIO_BALLOON_S_MEMFREE, VIRTIO_BALLOON_S_MEMTOT, VIRTIO_BALLOON_S_MINFLT,
VIRTIO_BALLOON_S_OOM_KILL, VIRTIO_BALLOON_S_SWAP_IN, VIRTIO_BALLOON_S_SWAP_OUT,
VIRTIO_BALLOON_S_ALLOC_STALL, VIRTIO_BALLOON_S_ASYNC_RECLAIM, VIRTIO_BALLOON_S_ASYNC_SCAN,
VIRTIO_BALLOON_S_AVAIL, VIRTIO_BALLOON_S_CACHES, VIRTIO_BALLOON_S_DIRECT_RECLAIM,
VIRTIO_BALLOON_S_DIRECT_SCAN, VIRTIO_BALLOON_S_HTLB_PGALLOC, VIRTIO_BALLOON_S_HTLB_PGFAIL,
VIRTIO_BALLOON_S_MAJFLT, VIRTIO_BALLOON_S_MEMFREE, VIRTIO_BALLOON_S_MEMTOT,
VIRTIO_BALLOON_S_MINFLT, VIRTIO_BALLOON_S_OOM_KILL, VIRTIO_BALLOON_S_SWAP_IN,
VIRTIO_BALLOON_S_SWAP_OUT,
};
use crate::devices::virtio::balloon::BalloonError;
use crate::devices::virtio::device::ActiveState;
3 changes: 2 additions & 1 deletion src/vmm/src/devices/virtio/block/virtio/device.rs
@@ -197,7 +197,8 @@ pub struct ConfigSpace {
pub max_write_zeroes_seg: u32, // offset 52
pub write_zeroes_may_unmap: u8, // offset 56
pub(crate) _unused1: [u8; 3], // offset 57 (spec field — virtio_blk_config.unused1)
pub(crate) _pad: [u8; 4], // offset 60 (Rust alignment padding to 64; spec ends at 60)
pub(crate) _pad: [u8; 4], /* offset 60 (Rust alignment padding to 64; spec ends
* at 60) */
}
const _: () = assert!(std::mem::size_of::<ConfigSpace>() == 64);
// Compile-time guards against accidental layout drift. The byte offsets here
7 changes: 1 addition & 6 deletions src/vmm/src/devices/virtio/block/virtio/io/sync_io.rs
@@ -101,12 +101,7 @@ impl SyncFileEngine {
}
}

pub fn write_zeroes(
&mut self,
offset: u64,
len: u64,
unmap: bool,
) -> Result<(), SyncIoError> {
pub fn write_zeroes(&mut self, offset: u64, len: u64, unmap: bool) -> Result<(), SyncIoError> {
// UNMAP=1 reuses PUNCH_HOLE (the spec lets the device deallocate);
// UNMAP=0 must zero in place without deallocating, so use ZERO_RANGE.
let mode = if unmap {