Merged
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -50,6 +50,12 @@ and this project adheres to
balloon statistics descriptor length to prevent a guest-controlled oversized
descriptor from temporarily stalling the VMM event loop. Only affects microVMs
with `stats_polling_interval_s > 0`.
- [#5809](https://github.com/firecracker-microvm/firecracker/pull/5809): Fixed a
bug on host Linux >= 5.16 where x86_64 guests using the `kvm-clock` clock
source saw the monotonic clock jump on restore by the wall-clock time elapsed
since the snapshot was taken. Users of `kvm-clock` who want to explicitly
advance the clock with `KVM_CLOCK_REALTIME` can opt back in with the new
`clock_realtime` flag in the `LoadSnapshot` API.

## [1.15.0]

6 changes: 5 additions & 1 deletion docs/design.md
@@ -118,7 +118,11 @@ and/or creating their own custom CPU templates.

#### Clocksources available to guests

Firecracker only exposes kvm-clock to customers.
Firecracker exposes the following clock sources to guests:

- x86_64: kvm-clock and tsc. Linux guests >= 5.10 pick tsc by default if it is
stable.
- aarch64: arch_sys_counter
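
To check which of these clock sources a Linux guest actually selected, one option is to read the kernel's sysfs entry from inside the guest. The sketch below assumes the standard Linux path `/sys/devices/system/clocksource/clocksource0/current_clocksource`. The helper names `read_clocksource` and `current_clocksource` are hypothetical and are not part of Firecracker.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Standard Linux sysfs entry naming the clock source currently in use.
const CURRENT_CLOCKSOURCE: &str =
    "/sys/devices/system/clocksource/clocksource0/current_clocksource";

/// Hypothetical helper: returns the clock source name stored at `path`,
/// with surrounding whitespace (including the trailing newline) removed.
fn read_clocksource(path: &Path) -> io::Result<String> {
    Ok(fs::read_to_string(path)?.trim().to_string())
}

/// Hypothetical helper: reads the clock source of the system this runs on.
/// Intended to be called from inside the guest.
fn current_clocksource() -> io::Result<String> {
    read_clocksource(Path::new(CURRENT_CLOCKSOURCE))
}
```

Taking the path as a parameter keeps the reader testable on hosts where the sysfs entry is absent or not readable.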

### I/O: Storage, Networking and Rate Limiting

5 changes: 5 additions & 0 deletions docs/snapshotting/snapshot-support.md
@@ -493,6 +493,11 @@ resumed with the guest OS wall-clock continuing from the moment of the snapshot
creation. For this reason, the wall-clock should be updated to the current time,
on the guest-side. More details on how you could do this can be found at a
[related FAQ](../../FAQ.md#my-guest-wall-clock-is-drifting-how-can-i-fix-it).
When using `kvm-clock` as the clock source on `x86_64`, you can optionally set
`clock_realtime: true` in the `LoadSnapshot` request to advance the guest clock
at restore time (this requires host Linux >= 5.16). Note that this may cause
issues within the guest, as the clock will appear to jump suddenly.
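
As an illustration, the sketch below builds a `LoadSnapshot` request body that opts in to `clock_realtime`. The `resume_vm` and `clock_realtime` fields appear in this change. The `snapshot_path` and `mem_backend` field names are assumed from the existing snapshot API and should be checked against the swagger definition. The helper names and file paths are hypothetical.

```rust
/// Escapes backslashes and double quotes so a path cannot break the JSON
/// string literal. Control characters are not handled in this sketch.
fn json_escape(s: &str) -> String {
    s.replace('\\', "\\\\").replace('"', "\\\"")
}

/// Hypothetical helper: builds the JSON body for `PUT /snapshot/load`.
/// `snapshot_path` and `mem_backend` are assumed field names.
fn load_snapshot_body(snapshot_path: &str, mem_file_path: &str, clock_realtime: bool) -> String {
    format!(
        r#"{{
  "snapshot_path": "{}",
  "mem_backend": {{ "backend_path": "{}", "backend_type": "File" }},
  "resume_vm": true,
  "clock_realtime": {}
}}"#,
        json_escape(snapshot_path),
        json_escape(mem_file_path),
        clock_realtime
    )
}
```

Calling `load_snapshot_body("./snapshot_file", "./mem_file", true)` yields a body with `"clock_realtime": true`. Omitting the field altogether is equivalent to `false`, because the config field is marked `#[serde(default)]`.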

## Provisioning host disk space for snapshots

6 changes: 6 additions & 0 deletions src/firecracker/src/api_server/request/snapshot.rs
@@ -111,6 +111,7 @@ fn parse_put_snapshot_load(body: &Body) -> Result<ParsedRequest, RequestError> {
resume_vm: snapshot_config.resume_vm,
network_overrides: snapshot_config.network_overrides,
vsock_override: snapshot_config.vsock_override,
clock_realtime: snapshot_config.clock_realtime,
};

// Construct the `ParsedRequest` object.
@@ -189,6 +190,7 @@ mod tests {
resume_vm: false,
network_overrides: vec![],
vsock_override: None,
clock_realtime: false,
};
let mut parsed_request = parse_put_snapshot(&Body::new(body), Some("load")).unwrap();
assert!(
@@ -220,6 +222,7 @@ mod tests {
resume_vm: false,
network_overrides: vec![],
vsock_override: None,
clock_realtime: false,
};
let mut parsed_request = parse_put_snapshot(&Body::new(body), Some("load")).unwrap();
assert!(
@@ -251,6 +254,7 @@ mod tests {
resume_vm: true,
network_overrides: vec![],
vsock_override: None,
clock_realtime: false,
};
let mut parsed_request = parse_put_snapshot(&Body::new(body), Some("load")).unwrap();
assert!(
@@ -291,6 +295,7 @@ mod tests {
host_dev_name: String::from("vmtap2"),
}],
vsock_override: None,
clock_realtime: false,
};
let mut parsed_request = parse_put_snapshot(&Body::new(body), Some("load")).unwrap();
assert!(
@@ -319,6 +324,7 @@ mod tests {
resume_vm: true,
network_overrides: vec![],
vsock_override: None,
clock_realtime: false,
};
let parsed_request = parse_put_snapshot(&Body::new(body), Some("load")).unwrap();
assert_eq!(
10 changes: 9 additions & 1 deletion src/firecracker/swagger/firecracker.yaml
@@ -1602,7 +1602,7 @@ definitions:
type: string
description:
The new path for the backing Unix Domain Socket.

SnapshotLoadParams:
type: object
description:
@@ -1650,6 +1650,14 @@ definitions:
for restoring a snapshot with a different socket path than the one used
when the snapshot was created. For example, when the original socket path
is no longer available or when deploying to a different environment.
clock_realtime:
type: boolean
description:
"[x86_64 only] When set to true, passes KVM_CLOCK_REALTIME to
KVM_SET_CLOCK on restore, advancing kvmclock by the wall-clock time
elapsed since the snapshot was taken. When false (default), kvmclock resumes
from where it was at snapshot time. This option may be extended to other clock
sources and CPU architectures in the future."


TokenBucket:
83 changes: 68 additions & 15 deletions src/vmm/src/arch/x86_64/vm.rs
@@ -5,7 +5,7 @@ use std::fmt;
use std::sync::{Arc, Mutex};

use kvm_bindings::{
KVM_CLOCK_TSC_STABLE, KVM_IRQCHIP_IOAPIC, KVM_IRQCHIP_PIC_MASTER, KVM_IRQCHIP_PIC_SLAVE,
KVM_CLOCK_REALTIME, KVM_IRQCHIP_IOAPIC, KVM_IRQCHIP_PIC_MASTER, KVM_IRQCHIP_PIC_SLAVE,
KVM_PIT_SPEAKER_DUMMY, MsrList, kvm_clock_data, kvm_irqchip, kvm_pit_config, kvm_pit_state2,
};
use kvm_ioctls::Cap;
@@ -30,6 +30,8 @@ pub enum ArchVmError {
SetPit2(kvm_ioctls::Error),
/// Set clock error: {0}
SetClock(kvm_ioctls::Error),
/// clock_realtime requested but not present in the snapshot state
ClockRealtimeNotInState,
/// Set IrqChipPicMaster error: {0}
SetIrqChipPicMaster(kvm_ioctls::Error),
/// Set IrqChipPicSlave error: {0}
@@ -127,13 +129,25 @@ impl ArchVm {
/// - [`kvm_ioctls::VmFd::set_irqchip`] errors.
/// - [`kvm_ioctls::VmFd::set_irqchip`] errors.
/// - [`kvm_ioctls::VmFd::set_irqchip`] errors.
pub fn restore_state(&mut self, state: &VmState) -> Result<(), ArchVmError> {
pub fn restore_state(
&mut self,
state: &VmState,
clock_realtime: bool,
) -> Result<(), ArchVmError> {
self.fd()
.set_pit2(&state.pitstate)
.map_err(ArchVmError::SetPit2)?;
self.fd()
.set_clock(&state.clock)
.map_err(ArchVmError::SetClock)?;
let mut clock = state.clock;
clock.flags = if clock_realtime {
// The saved clock state must carry KVM_CLOCK_REALTIME data for this to work.
if clock.flags & KVM_CLOCK_REALTIME == 0 {
return Err(ArchVmError::ClockRealtimeNotInState);
}
KVM_CLOCK_REALTIME
} else {
0
};
self.fd().set_clock(&clock).map_err(ArchVmError::SetClock)?;
self.fd()
.set_irqchip(&state.pic_master)
.map_err(ArchVmError::SetIrqChipPicMaster)?;
@@ -167,9 +181,7 @@ impl ArchVm {
pub fn save_state(&self) -> Result<VmState, ArchVmError> {
let pitstate = self.fd().get_pit2().map_err(ArchVmError::VmGetPit2)?;

let mut clock = self.fd().get_clock().map_err(ArchVmError::VmGetClock)?;
// This bit is not accepted in SET_CLOCK, clear it.
clock.flags &= !KVM_CLOCK_TSC_STABLE;
let clock = self.fd().get_clock().map_err(ArchVmError::VmGetClock)?;

let mut pic_master = kvm_irqchip {
chip_id: KVM_IRQCHIP_PIC_MASTER,
@@ -248,10 +260,13 @@ impl fmt::Debug for VmState {
#[cfg(test)]
mod tests {
use kvm_bindings::{
KVM_CLOCK_TSC_STABLE, KVM_IRQCHIP_IOAPIC, KVM_IRQCHIP_PIC_MASTER, KVM_IRQCHIP_PIC_SLAVE,
KVM_CLOCK_REALTIME, KVM_IRQCHIP_IOAPIC, KVM_IRQCHIP_PIC_MASTER, KVM_IRQCHIP_PIC_SLAVE,
KVM_PIT_SPEAKER_DUMMY,
};
use kvm_ioctls::Cap;
use std::time::SystemTime;

use crate::arch::ArchVmError;
use crate::vstate::vm::VmState;
use crate::vstate::vm::tests::{setup_vm, setup_vm_with_memory};

@@ -270,15 +285,53 @@ mod tests {
vm_state.pitstate.flags | KVM_PIT_SPEAKER_DUMMY,
KVM_PIT_SPEAKER_DUMMY
);
assert_eq!(vm_state.clock.flags & KVM_CLOCK_TSC_STABLE, 0);
assert_eq!(vm_state.pic_master.chip_id, KVM_IRQCHIP_PIC_MASTER);
assert_eq!(vm_state.pic_slave.chip_id, KVM_IRQCHIP_PIC_SLAVE);
assert_eq!(vm_state.ioapic.chip_id, KVM_IRQCHIP_IOAPIC);

let (_, mut vm) = setup_vm_with_memory(0x1000);
vm.setup_irqchip().unwrap();

vm.restore_state(&vm_state).unwrap();
vm.restore_state(&vm_state, false).unwrap();
}

#[cfg(target_arch = "x86_64")]
#[test]
fn test_vm_save_restore_state_kvm_clock_realtime() {
let (kvm, vm) = setup_vm_with_memory(0x1000);
vm.setup_irqchip().unwrap();

let clock_realtime_supported =
kvm.fd.check_extension_int(Cap::AdjustClock).cast_unsigned() & KVM_CLOCK_REALTIME != 0;

// mock a state without realtime information
let mut vm_state = vm.save_state().unwrap();
vm_state.clock.flags &= !KVM_CLOCK_REALTIME;

let (_, mut vm) = setup_vm_with_memory(0x1000);
vm.setup_irqchip().unwrap();

let res = vm.restore_state(&vm_state, true);
assert!(res == Err(ArchVmError::ClockRealtimeNotInState));

// mock a state with realtime information
vm_state.clock.flags |= KVM_CLOCK_REALTIME;
vm_state.clock.realtime = SystemTime::now()
.duration_since(SystemTime::UNIX_EPOCH)
.unwrap()
.as_nanos()
.try_into()
.unwrap();

let (_, mut vm) = setup_vm_with_memory(0x1000);
vm.setup_irqchip().unwrap();

let res = vm.restore_state(&vm_state, true);
if clock_realtime_supported {
res.unwrap()
} else {
assert!(matches!(res, Err(ArchVmError::SetClock(err)) if err.errno() == libc::EINVAL))
}
}

#[cfg(target_arch = "x86_64")]
@@ -296,18 +349,18 @@ mod tests {
// Try to restore an invalid PIC Master chip ID
let orig_master_chip_id = vm_state.pic_master.chip_id;
vm_state.pic_master.chip_id = KVM_NR_IRQCHIPS;
vm.restore_state(&vm_state).unwrap_err();
vm.restore_state(&vm_state, false).unwrap_err();
vm_state.pic_master.chip_id = orig_master_chip_id;

// Try to restore an invalid PIC Slave chip ID
let orig_slave_chip_id = vm_state.pic_slave.chip_id;
vm_state.pic_slave.chip_id = KVM_NR_IRQCHIPS;
vm.restore_state(&vm_state).unwrap_err();
vm.restore_state(&vm_state, false).unwrap_err();
vm_state.pic_slave.chip_id = orig_slave_chip_id;

// Try to restore an invalid IOAPIC chip ID
vm_state.ioapic.chip_id = KVM_NR_IRQCHIPS;
vm.restore_state(&vm_state).unwrap_err();
vm.restore_state(&vm_state, false).unwrap_err();
}

#[cfg(target_arch = "x86_64")]
@@ -321,6 +374,6 @@ mod tests {
let serialized_data = bitcode::serialize(&state).unwrap();
let restored_state: VmState = bitcode::deserialize(&serialized_data).unwrap();

vm.restore_state(&restored_state).unwrap();
vm.restore_state(&restored_state, false).unwrap();
}
}
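
The flag handling in the new `restore_state` can be read as a small pure function. The sketch below isolates it under stated assumptions: the constant values are taken from the Linux KVM UAPI header (`linux/kvm.h`), and `ClockFlagsError` and `clock_flags_for_restore` are hypothetical names for illustration, not identifiers from this change.

```rust
/// Flag values as defined in the Linux KVM UAPI header (`linux/kvm.h`).
const KVM_CLOCK_TSC_STABLE: u32 = 2;
const KVM_CLOCK_REALTIME: u32 = 1 << 2;

/// Hypothetical error type standing in for `ArchVmError::ClockRealtimeNotInState`.
#[derive(Debug, PartialEq, Eq)]
enum ClockFlagsError {
    RealtimeNotInState,
}

/// Chooses the flags passed to `KVM_SET_CLOCK`, given the flags saved in the
/// snapshot and whether the caller asked for `clock_realtime`.
fn clock_flags_for_restore(
    saved_flags: u32,
    clock_realtime: bool,
) -> Result<u32, ClockFlagsError> {
    if !clock_realtime {
        // Default: drop every saved flag, so kvmclock resumes where it stopped.
        return Ok(0);
    }
    if saved_flags & KVM_CLOCK_REALTIME == 0 {
        // The snapshot carries no realtime reading to advance from.
        return Err(ClockFlagsError::RealtimeNotInState);
    }
    // Opt-in: pass only the realtime flag; other saved flags are dropped.
    Ok(KVM_CLOCK_REALTIME)
}
```

Because the flags are rewritten on restore in both branches, `save_state` presumably no longer needs to clear `KVM_CLOCK_TSC_STABLE` itself, which would explain why the diff removes that step.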
8 changes: 7 additions & 1 deletion src/vmm/src/builder.rs
@@ -423,6 +423,8 @@ pub enum BuildMicrovmFromSnapshotError {
SeccompFiltersInternal(#[from] crate::seccomp::InstallationError),
/// Failed to restore devices: {0}
RestoreDevices(#[from] DeviceManagerPersistError),
/// clock_realtime is not supported on aarch64.
UnsupportedClockRealtime,
}

/// Builds and starts a microVM based on the provided MicrovmState.
@@ -438,6 +440,7 @@ pub fn build_microvm_from_snapshot(
uffd: Option<Uffd>,
seccomp_filters: &BpfThreadMap,
vm_resources: &mut VmResources,
clock_realtime: bool,
) -> Result<Arc<Mutex<Vmm>>, BuildMicrovmFromSnapshotError> {
// Build Vmm.
debug!("event_start: build microvm from snapshot");
@@ -479,14 +482,17 @@ pub fn build_microvm_from_snapshot(

#[cfg(target_arch = "aarch64")]
{
if clock_realtime {
return Err(BuildMicrovmFromSnapshotError::UnsupportedClockRealtime);
}
let mpidrs = construct_kvm_mpidrs(&microvm_state.vcpu_states);
// Restore kvm vm state.
vm.restore_state(&mpidrs, &microvm_state.vm_state)?;
}

// Restore kvm vm state.
#[cfg(target_arch = "x86_64")]
vm.restore_state(&microvm_state.vm_state)?;
vm.restore_state(&microvm_state.vm_state, clock_realtime)?;

// Restore the boot source config paths.
vm_resources.boot_source.config = microvm_state.vm_info.boot_source;
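
The architecture gate in this hunk rejects `clock_realtime` on aarch64 at compile time via `#[cfg(target_arch = "aarch64")]`. The sketch below expresses the same rule as a runtime check so it can be exercised on any host. `UnsupportedClockRealtime` mirrors the new error variant, while `check_clock_realtime` and its `arch` parameter are hypothetical.

```rust
/// Stand-in for `BuildMicrovmFromSnapshotError::UnsupportedClockRealtime`.
#[derive(Debug, PartialEq, Eq)]
struct UnsupportedClockRealtime;

/// Hypothetical helper: accepts `clock_realtime` only on x86_64.
/// `arch` is expected to be a value such as `std::env::consts::ARCH`.
fn check_clock_realtime(arch: &str, clock_realtime: bool) -> Result<(), UnsupportedClockRealtime> {
    if clock_realtime && arch != "x86_64" {
        return Err(UnsupportedClockRealtime);
    }
    Ok(())
}
```

A caller could pass `std::env::consts::ARCH` as the first argument. The compile-time `cfg` approach in the diff avoids even that runtime comparison.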
1 change: 1 addition & 0 deletions src/vmm/src/persist.rs
@@ -466,6 +466,7 @@ pub fn restore_from_snapshot(
uffd,
seccomp_filters,
vm_resources,
params.clock_realtime,
)
.map_err(RestoreFromSnapshotError::Build)
}
1 change: 1 addition & 0 deletions src/vmm/src/rpc_interface.rs
@@ -1286,6 +1286,7 @@ mod tests {
resume_vm: false,
network_overrides: vec![],
vsock_override: None,
clock_realtime: false,
},
)));
check_unsupported(runtime_request(VmmAction::SetEntropyDevice(
7 changes: 7 additions & 0 deletions src/vmm/src/vmm_config/snapshot.rs
@@ -81,6 +81,10 @@ pub struct LoadSnapshotParams {
pub network_overrides: Vec<NetworkOverride>,
/// When set, the vsock backend UDS path will be overridden
pub vsock_override: Option<VsockOverride>,
/// [x86_64 only] When set to true, passes `KVM_CLOCK_REALTIME` to `KVM_SET_CLOCK` on restore,
/// advancing kvmclock by the wall-clock time elapsed since the snapshot was taken. When false
/// (default), kvmclock resumes from where it was at snapshot time.
pub clock_realtime: bool,
}

/// Stores the configuration for loading a snapshot that is provided by the user.
@@ -113,6 +117,9 @@ pub struct LoadSnapshotConfig {
/// Whether or not to override the vsock backend UDS path.
#[serde(skip_serializing_if = "Option::is_none")]
pub vsock_override: Option<VsockOverride>,
/// [x86_64 only] When set to true, passes `KVM_CLOCK_REALTIME` to `KVM_SET_CLOCK` on restore.
#[serde(default)]
pub clock_realtime: bool,
}

/// Stores the configuration used for managing snapshot memory.
2 changes: 1 addition & 1 deletion src/vmm/src/vstate/vm.rs
@@ -866,7 +866,7 @@ pub(crate) mod tests {
let serialized_data = bitcode::serialize(&state).unwrap();

let restored_state: VmState = bitcode::deserialize(&serialized_data).unwrap();
vm.restore_state(&restored_state).unwrap();
vm.restore_state(&restored_state, false).unwrap();

let mut resource_allocator = vm.resource_allocator();
let gsi_new = resource_allocator.allocate_gsi_msi(1).unwrap()[0];
2 changes: 2 additions & 0 deletions src/vmm/tests/integration_tests.rs
@@ -296,6 +296,7 @@ fn verify_load_snapshot(snapshot_file: TempFile, memory_file: TempFile) {
resume_vm: true,
network_overrides: vec![],
vsock_override: None,
clock_realtime: false,
}))
.unwrap();

@@ -381,6 +382,7 @@ fn verify_load_snap_disallowed_after_boot_resources(res: VmmAction, res_name: &s
resume_vm: false,
network_overrides: vec![],
vsock_override: None,
clock_realtime: false,
});
let err = preboot_api_controller.handle_preboot_request(req);
assert!(