Skip to content

Commit 0e18583

Browse files
committed
fix(kvm-clock): do not jump monotonic clock on restore
Firecracker has never advanced the clock on restore for any of the supported clocksources. Since Linux 5.16, the KVM_CLOCK_REALTIME has been passed to kvm-clock, causing the monotonic time in the guest to jump when using kvm-clock as clock source. Despite being unexpected and not what Firecracker should do, we recognize this may be a valid usecase so this patch adds a way to configure it, keeping the default to the expected documented behaviour. This patch adds a new API flag to LoadSnapshot, clock_realtime, that advances the clock on restore when set (default is False). Rather than the clock flags being decided at snapshot time, the restore path ignores those flags and decides what to do depending on the clock_realtime flag. This is because the other available flag (KVM_CLOCK_TSC_STABLE) cannot even be passed to `set_clock`, meaning the only valid flag is KVM_CLOCK_REALTIME. The name of the flag was kept generic as we may add this behaviour for the other clock sources in the future, if the need arises. Signed-off-by: Riccardo Mancini <mancio@amazon.com>
1 parent 15f2ad8 commit 0e18583

13 files changed

Lines changed: 181 additions & 21 deletions

File tree

CHANGELOG.md

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -50,6 +50,11 @@ and this project adheres to
5050
balloon statistics descriptor length to prevent a guest-controlled oversized
5151
descriptor from temporarily stalling the VMM event loop. Only affects microVMs
5252
with `stats_polling_interval_s > 0`.
53+
- [#5809](https://github.com/firecracker-microvm/firecracker/pull/5809): Fixed a
54+
bug on host Linux >= 5.16 for x86_64 guests using the `kvm-clock` clock source
55+
causing the monotonic clock to jump on restore. Users that want to keep the
56+
previous behaviour can opt back in using the new `clock_realtime` flag in
57+
`LoadSnapshot` API.
5358

5459
## [1.15.0]
5560

docs/snapshotting/snapshot-support.md

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -493,6 +493,10 @@ resumed with the guest OS wall-clock continuing from the moment of the snapshot
493493
creation. For this reason, the wall-clock should be updated to the current time,
494494
on the guest-side. More details on how you could do this can be found at a
495495
[related FAQ](../../FAQ.md#my-guest-wall-clock-is-drifting-how-can-i-fix-it).
496+
When using `kvm-clock` as clock source on `x86_64`, it's possible to optionally
497+
set the `clock_realtime: true` in the `LoadSnapshot` request to advance the
498+
clock source on the guest at restore time. Note that this may cause issues
499+
within the guest as the clock source will appear to suddenly jump.
496500
497501
## Provisioning host disk space for snapshots
498502

src/firecracker/src/api_server/request/snapshot.rs

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -111,6 +111,7 @@ fn parse_put_snapshot_load(body: &Body) -> Result<ParsedRequest, RequestError> {
111111
resume_vm: snapshot_config.resume_vm,
112112
network_overrides: snapshot_config.network_overrides,
113113
vsock_override: snapshot_config.vsock_override,
114+
clock_realtime: snapshot_config.clock_realtime,
114115
};
115116

116117
// Construct the `ParsedRequest` object.
@@ -189,6 +190,7 @@ mod tests {
189190
resume_vm: false,
190191
network_overrides: vec![],
191192
vsock_override: None,
193+
clock_realtime: false,
192194
};
193195
let mut parsed_request = parse_put_snapshot(&Body::new(body), Some("load")).unwrap();
194196
assert!(
@@ -220,6 +222,7 @@ mod tests {
220222
resume_vm: false,
221223
network_overrides: vec![],
222224
vsock_override: None,
225+
clock_realtime: false,
223226
};
224227
let mut parsed_request = parse_put_snapshot(&Body::new(body), Some("load")).unwrap();
225228
assert!(
@@ -251,6 +254,7 @@ mod tests {
251254
resume_vm: true,
252255
network_overrides: vec![],
253256
vsock_override: None,
257+
clock_realtime: false,
254258
};
255259
let mut parsed_request = parse_put_snapshot(&Body::new(body), Some("load")).unwrap();
256260
assert!(
@@ -291,6 +295,7 @@ mod tests {
291295
host_dev_name: String::from("vmtap2"),
292296
}],
293297
vsock_override: None,
298+
clock_realtime: false,
294299
};
295300
let mut parsed_request = parse_put_snapshot(&Body::new(body), Some("load")).unwrap();
296301
assert!(
@@ -319,6 +324,7 @@ mod tests {
319324
resume_vm: true,
320325
network_overrides: vec![],
321326
vsock_override: None,
327+
clock_realtime: false,
322328
};
323329
let parsed_request = parse_put_snapshot(&Body::new(body), Some("load")).unwrap();
324330
assert_eq!(

src/firecracker/swagger/firecracker.yaml

Lines changed: 8 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1602,7 +1602,7 @@ definitions:
16021602
type: string
16031603
description:
16041604
The new path for the backing Unix Domain Socket.
1605-
1605+
16061606
SnapshotLoadParams:
16071607
type: object
16081608
description:
@@ -1650,6 +1650,13 @@ definitions:
16501650
for restoring a snapshot with a different socket path than the one used
16511651
when the snapshot was created. For example, when the original socket path
16521652
is no longer available or when deploying to a different environment.
1653+
clock_realtime:
1654+
type: boolean
1655+
description:
1656+
"[x86_64 only] When set to true, passes KVM_CLOCK_REALTIME to
1657+
KVM_SET_CLOCK on restore, advancing kvmclock by the wall-clock time
1658+
elapsed since the snapshot was taken. When false (default), kvmclock resumes
1659+
from where it was at snapshot time."
16531660

16541661

16551662
TokenBucket:

src/vmm/src/arch/x86_64/vm.rs

Lines changed: 32 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@ use std::fmt;
55
use std::sync::{Arc, Mutex};
66

77
use kvm_bindings::{
8-
KVM_CLOCK_TSC_STABLE, KVM_IRQCHIP_IOAPIC, KVM_IRQCHIP_PIC_MASTER, KVM_IRQCHIP_PIC_SLAVE,
8+
KVM_CLOCK_REALTIME, KVM_IRQCHIP_IOAPIC, KVM_IRQCHIP_PIC_MASTER, KVM_IRQCHIP_PIC_SLAVE,
99
KVM_PIT_SPEAKER_DUMMY, MsrList, kvm_clock_data, kvm_irqchip, kvm_pit_config, kvm_pit_state2,
1010
};
1111
use kvm_ioctls::Cap;
@@ -127,13 +127,21 @@ impl ArchVm {
127127
/// - [`kvm_ioctls::VmFd::set_irqchip`] errors.
128128
/// - [`kvm_ioctls::VmFd::set_irqchip`] errors.
129129
/// - [`kvm_ioctls::VmFd::set_irqchip`] errors.
130-
pub fn restore_state(&mut self, state: &VmState) -> Result<(), ArchVmError> {
130+
pub fn restore_state(
131+
&mut self,
132+
state: &VmState,
133+
clock_realtime: bool,
134+
) -> Result<(), ArchVmError> {
131135
self.fd()
132136
.set_pit2(&state.pitstate)
133137
.map_err(ArchVmError::SetPit2)?;
134-
self.fd()
135-
.set_clock(&state.clock)
136-
.map_err(ArchVmError::SetClock)?;
138+
let mut clock = state.clock;
139+
clock.flags = if clock_realtime {
140+
KVM_CLOCK_REALTIME
141+
} else {
142+
0
143+
};
144+
self.fd().set_clock(&clock).map_err(ArchVmError::SetClock)?;
137145
self.fd()
138146
.set_irqchip(&state.pic_master)
139147
.map_err(ArchVmError::SetIrqChipPicMaster)?;
@@ -167,9 +175,7 @@ impl ArchVm {
167175
pub fn save_state(&self) -> Result<VmState, ArchVmError> {
168176
let pitstate = self.fd().get_pit2().map_err(ArchVmError::VmGetPit2)?;
169177

170-
let mut clock = self.fd().get_clock().map_err(ArchVmError::VmGetClock)?;
171-
// This bit is not accepted in SET_CLOCK, clear it.
172-
clock.flags &= !KVM_CLOCK_TSC_STABLE;
178+
let clock = self.fd().get_clock().map_err(ArchVmError::VmGetClock)?;
173179

174180
let mut pic_master = kvm_irqchip {
175181
chip_id: KVM_IRQCHIP_PIC_MASTER,
@@ -248,8 +254,7 @@ impl fmt::Debug for VmState {
248254
#[cfg(test)]
249255
mod tests {
250256
use kvm_bindings::{
251-
KVM_CLOCK_TSC_STABLE, KVM_IRQCHIP_IOAPIC, KVM_IRQCHIP_PIC_MASTER, KVM_IRQCHIP_PIC_SLAVE,
252-
KVM_PIT_SPEAKER_DUMMY,
257+
KVM_IRQCHIP_IOAPIC, KVM_IRQCHIP_PIC_MASTER, KVM_IRQCHIP_PIC_SLAVE, KVM_PIT_SPEAKER_DUMMY,
253258
};
254259

255260
use crate::vstate::vm::VmState;
@@ -270,15 +275,26 @@ mod tests {
270275
vm_state.pitstate.flags | KVM_PIT_SPEAKER_DUMMY,
271276
KVM_PIT_SPEAKER_DUMMY
272277
);
273-
assert_eq!(vm_state.clock.flags & KVM_CLOCK_TSC_STABLE, 0);
274278
assert_eq!(vm_state.pic_master.chip_id, KVM_IRQCHIP_PIC_MASTER);
275279
assert_eq!(vm_state.pic_slave.chip_id, KVM_IRQCHIP_PIC_SLAVE);
276280
assert_eq!(vm_state.ioapic.chip_id, KVM_IRQCHIP_IOAPIC);
277281

278282
let (_, mut vm) = setup_vm_with_memory(0x1000);
279283
vm.setup_irqchip().unwrap();
280284

281-
vm.restore_state(&vm_state).unwrap();
285+
vm.restore_state(&vm_state, false).unwrap();
286+
}
287+
288+
#[cfg(target_arch = "x86_64")]
289+
#[test]
290+
fn test_vm_save_restore_state_kvm_clock_realtime() {
291+
let (_, vm) = setup_vm_with_memory(0x1000);
292+
vm.setup_irqchip().unwrap();
293+
let vm_state = vm.save_state().unwrap();
294+
295+
let (_, mut vm) = setup_vm_with_memory(0x1000);
296+
vm.setup_irqchip().unwrap();
297+
vm.restore_state(&vm_state, true).unwrap();
282298
}
283299

284300
#[cfg(target_arch = "x86_64")]
@@ -296,18 +312,18 @@ mod tests {
296312
// Try to restore an invalid PIC Master chip ID
297313
let orig_master_chip_id = vm_state.pic_master.chip_id;
298314
vm_state.pic_master.chip_id = KVM_NR_IRQCHIPS;
299-
vm.restore_state(&vm_state).unwrap_err();
315+
vm.restore_state(&vm_state, false).unwrap_err();
300316
vm_state.pic_master.chip_id = orig_master_chip_id;
301317

302318
// Try to restore an invalid PIC Slave chip ID
303319
let orig_slave_chip_id = vm_state.pic_slave.chip_id;
304320
vm_state.pic_slave.chip_id = KVM_NR_IRQCHIPS;
305-
vm.restore_state(&vm_state).unwrap_err();
321+
vm.restore_state(&vm_state, false).unwrap_err();
306322
vm_state.pic_slave.chip_id = orig_slave_chip_id;
307323

308324
// Try to restore an invalid IOPIC chip ID
309325
vm_state.ioapic.chip_id = KVM_NR_IRQCHIPS;
310-
vm.restore_state(&vm_state).unwrap_err();
326+
vm.restore_state(&vm_state, false).unwrap_err();
311327
}
312328

313329
#[cfg(target_arch = "x86_64")]
@@ -321,6 +337,6 @@ mod tests {
321337
let serialized_data = bitcode::serialize(&state).unwrap();
322338
let restored_state: VmState = bitcode::deserialize(&serialized_data).unwrap();
323339

324-
vm.restore_state(&restored_state).unwrap();
340+
vm.restore_state(&restored_state, false).unwrap();
325341
}
326342
}

src/vmm/src/builder.rs

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -438,6 +438,7 @@ pub fn build_microvm_from_snapshot(
438438
uffd: Option<Uffd>,
439439
seccomp_filters: &BpfThreadMap,
440440
vm_resources: &mut VmResources,
441+
clock_realtime: bool,
441442
) -> Result<Arc<Mutex<Vmm>>, BuildMicrovmFromSnapshotError> {
442443
// Build Vmm.
443444
debug!("event_start: build microvm from snapshot");
@@ -480,13 +481,15 @@ pub fn build_microvm_from_snapshot(
480481
#[cfg(target_arch = "aarch64")]
481482
{
482483
let mpidrs = construct_kvm_mpidrs(&microvm_state.vcpu_states);
484+
// Ignore unused_variables check for this x86_64-specific flag
485+
let _ = clock_realtime;
483486
// Restore kvm vm state.
484487
vm.restore_state(&mpidrs, &microvm_state.vm_state)?;
485488
}
486489

487490
// Restore kvm vm state.
488491
#[cfg(target_arch = "x86_64")]
489-
vm.restore_state(&microvm_state.vm_state)?;
492+
vm.restore_state(&microvm_state.vm_state, clock_realtime)?;
490493

491494
// Restore the boot source config paths.
492495
vm_resources.boot_source.config = microvm_state.vm_info.boot_source;

src/vmm/src/persist.rs

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -466,6 +466,7 @@ pub fn restore_from_snapshot(
466466
uffd,
467467
seccomp_filters,
468468
vm_resources,
469+
params.clock_realtime,
469470
)
470471
.map_err(RestoreFromSnapshotError::Build)
471472
}

src/vmm/src/rpc_interface.rs

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1286,6 +1286,7 @@ mod tests {
12861286
resume_vm: false,
12871287
network_overrides: vec![],
12881288
vsock_override: None,
1289+
clock_realtime: false,
12891290
},
12901291
)));
12911292
check_unsupported(runtime_request(VmmAction::SetEntropyDevice(

src/vmm/src/vmm_config/snapshot.rs

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -81,6 +81,11 @@ pub struct LoadSnapshotParams {
8181
pub network_overrides: Vec<NetworkOverride>,
8282
/// When set, the vsock backend UDS path will be overridden
8383
pub vsock_override: Option<VsockOverride>,
84+
/// [x86_64 only] When set to true, passes `KVM_CLOCK_REALTIME` to `KVM_SET_CLOCK` on restore,
85+
/// advancing kvmclock by the wall-clock time elapsed since the snapshot was taken. This is
86+
/// intended for live-migration scenarios. When false (default), kvmclock resumes from where it
87+
/// was at snapshot time.
88+
pub clock_realtime: bool,
8489
}
8590

8691
/// Stores the configuration for loading a snapshot that is provided by the user.
@@ -113,6 +118,9 @@ pub struct LoadSnapshotConfig {
113118
/// Whether or not to override the vsock backend UDS path.
114119
#[serde(skip_serializing_if = "Option::is_none")]
115120
pub vsock_override: Option<VsockOverride>,
121+
/// [x86_64 only] When set to true, passes `KVM_CLOCK_REALTIME` to `KVM_SET_CLOCK` on restore.
122+
#[serde(default)]
123+
pub clock_realtime: bool,
116124
}
117125

118126
/// Stores the configuration used for managing snapshot memory.

src/vmm/src/vstate/vm.rs

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -879,7 +879,7 @@ pub(crate) mod tests {
879879
let serialized_data = bitcode::serialize(&state).unwrap();
880880

881881
let restored_state: VmState = bitcode::deserialize(&serialized_data).unwrap();
882-
vm.restore_state(&restored_state).unwrap();
882+
vm.restore_state(&restored_state, false).unwrap();
883883

884884
let mut resource_allocator = vm.resource_allocator();
885885
let gsi_new = resource_allocator.allocate_gsi_msi(1).unwrap()[0];

0 commit comments

Comments
 (0)