Skip to content

Commit 25e487f

Browse files
committed
fix(kvm-clock): do not jump monotonic clock on restore
Firecracker has never advanced the clock on restore for any of the supported clocksources. Since Linux 5.16, the KVM_CLOCK_REALTIME has been passed to kvm-clock, causing the monotonic time in the guest to jump when using kvm-clock as clock source. Despite being unexpected and not what Firecracker should do, we recognize this may be a valid usecase so this patch adds a way to configure it, keeping the default to the expected documented behaviour. This patch adds a new API flag to LoadSnapshot, clock_realtime, that advances the clock on restore when set (default is False). Rather than the clock flags being decided at snapshot time, the restore path ignores those flags and decides what to do depending on the clock_realtime flag. This is because the other available flag (KVM_CLOCK_TSC_STABLE) cannot even be passed to `set_clock`, meaning the only valid flag is KVM_CLOCK_REALTIME. The name of the flag was kept generic as we may add this behaviour for the other clock sources in the future, if the need arises. Signed-off-by: Riccardo Mancini <mancio@amazon.com>
1 parent 15f2ad8 commit 25e487f

File tree

13 files changed

+225
-21
lines changed

13 files changed

+225
-21
lines changed

CHANGELOG.md

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -50,6 +50,11 @@ and this project adheres to
5050
balloon statistics descriptor length to prevent a guest-controlled oversized
5151
descriptor from temporarily stalling the VMM event loop. Only affects microVMs
5252
with `stats_polling_interval_s > 0`.
53+
- [#5809](https://github.com/firecracker-microvm/firecracker/pull/5809): Fixed a
54+
bug on host Linux >= 5.16 for x86_64 guests using the `kvm-clock` clock source
55+
causing the monotonic clock to jump on restore. Users that want to keep the
56+
previous behaviour can opt back in using the new `clock_realtime` flag in
57+
`LoadSnapshot` API.
5358

5459
## [1.15.0]
5560

docs/snapshotting/snapshot-support.md

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -493,6 +493,10 @@ resumed with the guest OS wall-clock continuing from the moment of the snapshot
493493
creation. For this reason, the wall-clock should be updated to the current time,
494494
on the guest-side. More details on how you could do this can be found at a
495495
[related FAQ](../../FAQ.md#my-guest-wall-clock-is-drifting-how-can-i-fix-it).
496+
When using `kvm-clock` as clock source on `x86_64`, it's possible to optionally
497+
set the `clock_realtime: true` in the `LoadSnapshot` request to advance the
498+
clock source on the guest at restore time. Note that this may cause issues
499+
within the guest as the clock source will appear to suddenly jump.
496500
497501
## Provisioning host disk space for snapshots
498502

src/firecracker/src/api_server/request/snapshot.rs

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -111,6 +111,7 @@ fn parse_put_snapshot_load(body: &Body) -> Result<ParsedRequest, RequestError> {
111111
resume_vm: snapshot_config.resume_vm,
112112
network_overrides: snapshot_config.network_overrides,
113113
vsock_override: snapshot_config.vsock_override,
114+
clock_realtime: snapshot_config.clock_realtime,
114115
};
115116

116117
// Construct the `ParsedRequest` object.
@@ -189,6 +190,7 @@ mod tests {
189190
resume_vm: false,
190191
network_overrides: vec![],
191192
vsock_override: None,
193+
clock_realtime: false,
192194
};
193195
let mut parsed_request = parse_put_snapshot(&Body::new(body), Some("load")).unwrap();
194196
assert!(
@@ -220,6 +222,7 @@ mod tests {
220222
resume_vm: false,
221223
network_overrides: vec![],
222224
vsock_override: None,
225+
clock_realtime: false,
223226
};
224227
let mut parsed_request = parse_put_snapshot(&Body::new(body), Some("load")).unwrap();
225228
assert!(
@@ -251,6 +254,7 @@ mod tests {
251254
resume_vm: true,
252255
network_overrides: vec![],
253256
vsock_override: None,
257+
clock_realtime: false,
254258
};
255259
let mut parsed_request = parse_put_snapshot(&Body::new(body), Some("load")).unwrap();
256260
assert!(
@@ -291,6 +295,7 @@ mod tests {
291295
host_dev_name: String::from("vmtap2"),
292296
}],
293297
vsock_override: None,
298+
clock_realtime: false,
294299
};
295300
let mut parsed_request = parse_put_snapshot(&Body::new(body), Some("load")).unwrap();
296301
assert!(
@@ -319,6 +324,7 @@ mod tests {
319324
resume_vm: true,
320325
network_overrides: vec![],
321326
vsock_override: None,
327+
clock_realtime: false,
322328
};
323329
let parsed_request = parse_put_snapshot(&Body::new(body), Some("load")).unwrap();
324330
assert_eq!(

src/firecracker/swagger/firecracker.yaml

Lines changed: 8 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1602,7 +1602,7 @@ definitions:
16021602
type: string
16031603
description:
16041604
The new path for the backing Unix Domain Socket.
1605-
1605+
16061606
SnapshotLoadParams:
16071607
type: object
16081608
description:
@@ -1650,6 +1650,13 @@ definitions:
16501650
for restoring a snapshot with a different socket path than the one used
16511651
when the snapshot was created. For example, when the original socket path
16521652
is no longer available or when deploying to a different environment.
1653+
clock_realtime:
1654+
type: boolean
1655+
description:
1656+
"[x86_64 only] When set to true, passes KVM_CLOCK_REALTIME to
1657+
KVM_SET_CLOCK on restore, advancing kvmclock by the wall-clock time
1658+
elapsed since the snapshot was taken. When false (default), kvmclock resumes
1659+
from where it was at snapshot time."
16531660

16541661

16551662
TokenBucket:

src/vmm/src/arch/x86_64/vm.rs

Lines changed: 72 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@ use std::fmt;
55
use std::sync::{Arc, Mutex};
66

77
use kvm_bindings::{
8-
KVM_CLOCK_TSC_STABLE, KVM_IRQCHIP_IOAPIC, KVM_IRQCHIP_PIC_MASTER, KVM_IRQCHIP_PIC_SLAVE,
8+
KVM_CLOCK_REALTIME, KVM_IRQCHIP_IOAPIC, KVM_IRQCHIP_PIC_MASTER, KVM_IRQCHIP_PIC_SLAVE,
99
KVM_PIT_SPEAKER_DUMMY, MsrList, kvm_clock_data, kvm_irqchip, kvm_pit_config, kvm_pit_state2,
1010
};
1111
use kvm_ioctls::Cap;
@@ -29,7 +29,13 @@ pub enum ArchVmError {
2929
/// Set PIT2 error: {0}
3030
SetPit2(kvm_ioctls::Error),
3131
/// Set clock error: {0}
32+
GetClock(kvm_ioctls::Error),
33+
/// Set clock error: {0}
3234
SetClock(kvm_ioctls::Error),
35+
/// KVM_CLOCK_REALTIME is unsupported by the current kernel
36+
UnsupportedClockRealtime,
37+
/// clock_realtime requested but not present in the snapshot state
38+
ClockRealtimeNotInState,
3339
/// Set IrqChipPicMaster error: {0}
3440
SetIrqChipPicMaster(kvm_ioctls::Error),
3541
/// Set IrqChipPicSlave error: {0}
@@ -117,6 +123,14 @@ impl ArchVm {
117123
Ok(())
118124
}
119125

126+
fn check_kvm_clock_realtime_supported(&self) -> Result<(), ArchVmError> {
127+
let clock = self.fd().get_clock().map_err(ArchVmError::GetClock)?;
128+
if clock.flags & KVM_CLOCK_REALTIME == 0 {
129+
return Err(ArchVmError::UnsupportedClockRealtime);
130+
}
131+
Ok(())
132+
}
133+
120134
/// Restores the KVM VM state.
121135
///
122136
/// # Errors
@@ -127,13 +141,27 @@ impl ArchVm {
127141
/// - [`kvm_ioctls::VmFd::set_irqchip`] errors.
128142
/// - [`kvm_ioctls::VmFd::set_irqchip`] errors.
129143
/// - [`kvm_ioctls::VmFd::set_irqchip`] errors.
130-
pub fn restore_state(&mut self, state: &VmState) -> Result<(), ArchVmError> {
144+
pub fn restore_state(
145+
&mut self,
146+
state: &VmState,
147+
clock_realtime: bool,
148+
) -> Result<(), ArchVmError> {
131149
self.fd()
132150
.set_pit2(&state.pitstate)
133151
.map_err(ArchVmError::SetPit2)?;
134-
self.fd()
135-
.set_clock(&state.clock)
136-
.map_err(ArchVmError::SetClock)?;
152+
let mut clock = state.clock;
153+
clock.flags = if clock_realtime {
154+
// clock_realtime requires a host Linux >=5.16
155+
self.check_kvm_clock_realtime_supported()?;
156+
// clock_realtime needs to be present in the snapshot
157+
if clock.flags & KVM_CLOCK_REALTIME == 0 {
158+
return Err(ArchVmError::ClockRealtimeNotInState);
159+
}
160+
KVM_CLOCK_REALTIME
161+
} else {
162+
0
163+
};
164+
self.fd().set_clock(&clock).map_err(ArchVmError::SetClock)?;
137165
self.fd()
138166
.set_irqchip(&state.pic_master)
139167
.map_err(ArchVmError::SetIrqChipPicMaster)?;
@@ -167,9 +195,7 @@ impl ArchVm {
167195
pub fn save_state(&self) -> Result<VmState, ArchVmError> {
168196
let pitstate = self.fd().get_pit2().map_err(ArchVmError::VmGetPit2)?;
169197

170-
let mut clock = self.fd().get_clock().map_err(ArchVmError::VmGetClock)?;
171-
// This bit is not accepted in SET_CLOCK, clear it.
172-
clock.flags &= !KVM_CLOCK_TSC_STABLE;
198+
let clock = self.fd().get_clock().map_err(ArchVmError::VmGetClock)?;
173199

174200
let mut pic_master = kvm_irqchip {
175201
chip_id: KVM_IRQCHIP_PIC_MASTER,
@@ -248,8 +274,7 @@ impl fmt::Debug for VmState {
248274
#[cfg(test)]
249275
mod tests {
250276
use kvm_bindings::{
251-
KVM_CLOCK_TSC_STABLE, KVM_IRQCHIP_IOAPIC, KVM_IRQCHIP_PIC_MASTER, KVM_IRQCHIP_PIC_SLAVE,
252-
KVM_PIT_SPEAKER_DUMMY,
277+
KVM_IRQCHIP_IOAPIC, KVM_IRQCHIP_PIC_MASTER, KVM_IRQCHIP_PIC_SLAVE, KVM_PIT_SPEAKER_DUMMY,
253278
};
254279

255280
use crate::vstate::vm::VmState;
@@ -270,15 +295,46 @@ mod tests {
270295
vm_state.pitstate.flags | KVM_PIT_SPEAKER_DUMMY,
271296
KVM_PIT_SPEAKER_DUMMY
272297
);
273-
assert_eq!(vm_state.clock.flags & KVM_CLOCK_TSC_STABLE, 0);
274298
assert_eq!(vm_state.pic_master.chip_id, KVM_IRQCHIP_PIC_MASTER);
275299
assert_eq!(vm_state.pic_slave.chip_id, KVM_IRQCHIP_PIC_SLAVE);
276300
assert_eq!(vm_state.ioapic.chip_id, KVM_IRQCHIP_IOAPIC);
277301

278302
let (_, mut vm) = setup_vm_with_memory(0x1000);
279303
vm.setup_irqchip().unwrap();
280304

281-
vm.restore_state(&vm_state).unwrap();
305+
vm.restore_state(&vm_state, false).unwrap();
306+
}
307+
308+
#[cfg(target_arch = "x86_64")]
309+
#[test]
310+
fn test_vm_save_restore_state_kvm_clock_realtime() {
311+
let (_, vm) = setup_vm_with_memory(0x1000);
312+
vm.setup_irqchip().unwrap();
313+
let vm_state = vm.save_state().unwrap();
314+
315+
let (_, mut vm) = setup_vm_with_memory(0x1000);
316+
vm.setup_irqchip().unwrap();
317+
318+
let res = vm.restore_state(&vm_state, true);
319+
if vm.check_kvm_clock_realtime_supported().is_err() {
320+
assert!(res == Err(crate::arch::ArchVmError::UnsupportedClockRealtime))
321+
} else {
322+
res.unwrap()
323+
}
324+
325+
// mock a state without realtime information
326+
let mut vm_state = vm_state;
327+
vm_state.clock.flags &= !kvm_bindings::KVM_CLOCK_REALTIME;
328+
329+
let (_, mut vm) = setup_vm_with_memory(0x1000);
330+
vm.setup_irqchip().unwrap();
331+
332+
let res = vm.restore_state(&vm_state, true);
333+
if vm.check_kvm_clock_realtime_supported().is_err() {
334+
assert!(res == Err(crate::arch::ArchVmError::UnsupportedClockRealtime))
335+
} else {
336+
assert!(res == Err(crate::arch::ArchVmError::ClockRealtimeNotInState))
337+
}
282338
}
283339

284340
#[cfg(target_arch = "x86_64")]
@@ -296,18 +352,18 @@ mod tests {
296352
// Try to restore an invalid PIC Master chip ID
297353
let orig_master_chip_id = vm_state.pic_master.chip_id;
298354
vm_state.pic_master.chip_id = KVM_NR_IRQCHIPS;
299-
vm.restore_state(&vm_state).unwrap_err();
355+
vm.restore_state(&vm_state, false).unwrap_err();
300356
vm_state.pic_master.chip_id = orig_master_chip_id;
301357

302358
// Try to restore an invalid PIC Slave chip ID
303359
let orig_slave_chip_id = vm_state.pic_slave.chip_id;
304360
vm_state.pic_slave.chip_id = KVM_NR_IRQCHIPS;
305-
vm.restore_state(&vm_state).unwrap_err();
361+
vm.restore_state(&vm_state, false).unwrap_err();
306362
vm_state.pic_slave.chip_id = orig_slave_chip_id;
307363

308364
// Try to restore an invalid IOPIC chip ID
309365
vm_state.ioapic.chip_id = KVM_NR_IRQCHIPS;
310-
vm.restore_state(&vm_state).unwrap_err();
366+
vm.restore_state(&vm_state, false).unwrap_err();
311367
}
312368

313369
#[cfg(target_arch = "x86_64")]
@@ -321,6 +377,6 @@ mod tests {
321377
let serialized_data = bitcode::serialize(&state).unwrap();
322378
let restored_state: VmState = bitcode::deserialize(&serialized_data).unwrap();
323379

324-
vm.restore_state(&restored_state).unwrap();
380+
vm.restore_state(&restored_state, false).unwrap();
325381
}
326382
}

src/vmm/src/builder.rs

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -438,6 +438,7 @@ pub fn build_microvm_from_snapshot(
438438
uffd: Option<Uffd>,
439439
seccomp_filters: &BpfThreadMap,
440440
vm_resources: &mut VmResources,
441+
clock_realtime: bool,
441442
) -> Result<Arc<Mutex<Vmm>>, BuildMicrovmFromSnapshotError> {
442443
// Build Vmm.
443444
debug!("event_start: build microvm from snapshot");
@@ -480,13 +481,15 @@ pub fn build_microvm_from_snapshot(
480481
#[cfg(target_arch = "aarch64")]
481482
{
482483
let mpidrs = construct_kvm_mpidrs(&microvm_state.vcpu_states);
484+
// Ignore unused_variables check for this x86_64-specific flag
485+
let _ = clock_realtime;
483486
// Restore kvm vm state.
484487
vm.restore_state(&mpidrs, &microvm_state.vm_state)?;
485488
}
486489

487490
// Restore kvm vm state.
488491
#[cfg(target_arch = "x86_64")]
489-
vm.restore_state(&microvm_state.vm_state)?;
492+
vm.restore_state(&microvm_state.vm_state, clock_realtime)?;
490493

491494
// Restore the boot source config paths.
492495
vm_resources.boot_source.config = microvm_state.vm_info.boot_source;

src/vmm/src/persist.rs

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -466,6 +466,7 @@ pub fn restore_from_snapshot(
466466
uffd,
467467
seccomp_filters,
468468
vm_resources,
469+
params.clock_realtime,
469470
)
470471
.map_err(RestoreFromSnapshotError::Build)
471472
}

src/vmm/src/rpc_interface.rs

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1286,6 +1286,7 @@ mod tests {
12861286
resume_vm: false,
12871287
network_overrides: vec![],
12881288
vsock_override: None,
1289+
clock_realtime: false,
12891290
},
12901291
)));
12911292
check_unsupported(runtime_request(VmmAction::SetEntropyDevice(

src/vmm/src/vmm_config/snapshot.rs

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -81,6 +81,11 @@ pub struct LoadSnapshotParams {
8181
pub network_overrides: Vec<NetworkOverride>,
8282
/// When set, the vsock backend UDS path will be overridden
8383
pub vsock_override: Option<VsockOverride>,
84+
/// [x86_64 only] When set to true, passes `KVM_CLOCK_REALTIME` to `KVM_SET_CLOCK` on restore,
85+
/// advancing kvmclock by the wall-clock time elapsed since the snapshot was taken. This is
86+
/// intended for live-migration scenarios. When false (default), kvmclock resumes from where it
87+
/// was at snapshot time.
88+
pub clock_realtime: bool,
8489
}
8590

8691
/// Stores the configuration for loading a snapshot that is provided by the user.
@@ -113,6 +118,9 @@ pub struct LoadSnapshotConfig {
113118
/// Whether or not to override the vsock backend UDS path.
114119
#[serde(skip_serializing_if = "Option::is_none")]
115120
pub vsock_override: Option<VsockOverride>,
121+
/// [x86_64 only] When set to true, passes `KVM_CLOCK_REALTIME` to `KVM_SET_CLOCK` on restore.
122+
#[serde(default)]
123+
pub clock_realtime: bool,
116124
}
117125

118126
/// Stores the configuration used for managing snapshot memory.

src/vmm/src/vstate/vm.rs

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -879,7 +879,7 @@ pub(crate) mod tests {
879879
let serialized_data = bitcode::serialize(&state).unwrap();
880880

881881
let restored_state: VmState = bitcode::deserialize(&serialized_data).unwrap();
882-
vm.restore_state(&restored_state).unwrap();
882+
vm.restore_state(&restored_state, false).unwrap();
883883

884884
let mut resource_allocator = vm.resource_allocator();
885885
let gsi_new = resource_allocator.allocate_gsi_msi(1).unwrap()[0];

0 commit comments

Comments
 (0)