Commit 83e99fa

fix(tests): avoid soft lockup in test_serial_active_tx_snapshot
The test was hanging because `cat /dev/zero > /dev/ttyS0` on a multi-vCPU guest triggers a soft lockup in the 8250 IRQ handler. The root cause is that FC's serial emulation re-asserts IIR (THRE pending) on every THR write, causing the guest's serial8250_interrupt() loop to run up to PASS_LIMIT=512 iterations per interrupt with IRQs disabled (4-12ms). When writer and IRQ handler are on different CPUs, the writer continuously re-asserts IIR between the handler's reads, so the loop never exits naturally.

Fix by:
- Using SSH as the control channel instead of serial (which is unusable during the flood)
- Using 4 vCPUs with explicit IRQ affinity pinning: all IRQs to CPU0 (keeping SSH/virtio-net responsive), serial IRQ to CPU1 (isolating the lockup), writer pinned to CPU3 (cross-CPU to exercise the worst case)
- After snapshot/restore, killing the writer via SSH then verifying serial console functionality

This preserves the original test intent (snapshot with active serial TX on a multi-vCPU guest) while working around the known device emulation bug.

Signed-off-by: Jack Thomson <jackabt@amazon.com>
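
The loop behaviour described in the commit message can be illustrated with a toy model. This is a minimal sketch and not the kernel's code: the function `handler_passes` and its parameter `writes_between_reads` are hypothetical names, and the only value taken from the commit message is `PASS_LIMIT = 512`. It assumes the handler clears the pending THRE condition on each pass and that any THR write landing before the next IIR read re-asserts it.

```python
PASS_LIMIT = 512  # per-interrupt iteration cap quoted in the commit message


def handler_passes(writes_between_reads: int) -> int:
    """Count loop passes for one simulated interrupt (toy model only).

    writes_between_reads is a hypothetical knob: how many THR writes a
    writer on another CPU lands between two IIR reads by the handler.
    """
    pending = True  # the interrupt fired, so THRE is pending on entry
    passes = 0
    while pending and passes < PASS_LIMIT:
        passes += 1
        pending = False  # the handler services the THRE condition
        if writes_between_reads > 0:
            # Assumed emulation behaviour: every THR write re-asserts IIR,
            # so the handler sees work again on its next read.
            pending = True
    return passes


if __name__ == "__main__":
    print("idle writer:", handler_passes(0))
    print("flooding writer:", handler_passes(1))
```

With no concurrent writer the model leaves the loop after a single pass. With a writer that always gets one write in between reads, the loop only stops when it hits the cap, which matches the "never exits naturally" behaviour the commit message attributes to the cross-CPU case.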
1 parent e7e0efe commit 83e99fa

1 file changed

Lines changed: 29 additions & 16 deletions

tests/integration_tests/functional/test_serial_io.py

@@ -66,25 +66,33 @@ def test_serial_active_tx_snapshot(uvm_plain, microvm_factory):
     microvm.help.enable_console()
     microvm.spawn(serial_out_path=None)
     microvm.basic_config(
-        vcpu_count=2,
+        vcpu_count=4,
         mem_size_mib=256,
     )
+    microvm.add_net_iface()
     serial = Serial(microvm)
     serial.open()
     microvm.start()
 
-    # looking for the # prompt at the end
-    serial.rx(microvm.distro.shell_prompt)
-
-    # Start an unbounded serial transmission from inside the guest such that
-    # there will be an active transmission at the point of pausing the VM to
-    # take the snapshot. This will saturate the TX buffer of the UART and it
-    # might make the guest driver enable TX interrupts.
-    serial.tx("cat /dev/zero")
-    # Give the guest time to start the transmission
-    time.sleep(1)
+    # Pin all IRQs to CPU0 first, then move serial to CPU1. This ensures the
+    # serial flood lockup (which starves the IRQ-handling CPU) is isolated to
+    # CPU1, while SSH (virtio-net) stays on CPU0 and remains reachable.
+    # Pin the writer (cat) to CPU3 so it's on a different CPU than the serial
+    # IRQ handler — this is the cross-CPU configuration that triggers the
+    # lockup (writer re-asserts IIR from CPU3 while handler loops on CPU1).
+    microvm.ssh.check_output(
+        "for irq in $(ls /proc/irq/ | grep -E '^[0-9]+$'); do"
+        " echo 1 > /proc/irq/$irq/smp_affinity 2>/dev/null || true;"
+        " done;"
+        " SER_IRQ=$(awk '/ttyS0/{print $1}' /proc/interrupts | tr -d :);"
+        " echo 2 > /proc/irq/$SER_IRQ/smp_affinity;"
+        " nohup taskset -c 3 cat /dev/zero > /dev/ttyS0 2>/dev/null &"
+    )
+    # Give the guest time to saturate the TX buffer
+    time.sleep(2)
 
-    # Create snapshot.
+    # Create snapshot — FC pause works even during the serial flood because
+    # vCPU threads exit KVM_RUN between IO exits.
     snapshot = microvm.snapshot_full()
     # Kill base microVM.
     microvm.kill()
@@ -94,12 +102,17 @@ def test_serial_active_tx_snapshot(uvm_plain, microvm_factory):
     vm.help.enable_console()
     vm.spawn(serial_out_path=None)
     vm.restore_from_snapshot(snapshot, resume=True)
+
+    # The restored VM resumes the cat flood. Kill it via SSH — the serial IRQ
+    # affinity persists from the snapshot so only CPU1 is affected and SSH on
+    # CPU0 remains reachable.
+    vm.ssh.check_output("pkill -9 cat || true")
+    time.sleep(0.5)
+
+    # Verify the serial console is functional after stopping the flood
     serial = Serial(vm)
     serial.open()
-
-    # Send Ctrl-C to the guest to stop the ongoing transmission and regain the shell
-    serial.tx("\x03", end="")
-    # looking for the # prompt at the end
+    serial.tx("")
     serial.rx(vm.distro.shell_prompt)
     serial.tx("pwd")
     res = serial.rx("#")
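
The values written to `smp_affinity` in the diff (`1` for CPU0, `2` for CPU1) are hexadecimal CPU bitmasks, whereas `taskset -c 3` takes a CPU index. A minimal sketch of the mask arithmetic follows; the helper name `smp_affinity_mask` is hypothetical and not part of the test suite, and it ignores the comma-separated grouping the kernel uses for masks wider than 32 bits.

```python
def smp_affinity_mask(cpus):
    """Return the hex bitmask string for a set of CPU indices.

    Hypothetical helper for illustration: bit N of the mask selects CPU N,
    which is the format /proc/irq/<n>/smp_affinity accepts for small
    CPU counts.
    """
    mask = 0
    for cpu in cpus:
        if cpu < 0:
            raise ValueError(f"invalid CPU index: {cpu}")
        mask |= 1 << cpu  # set the bit that corresponds to this CPU
    return format(mask, "x")


if __name__ == "__main__":
    print("CPU0 only:", smp_affinity_mask([0]))
    print("CPU1 only:", smp_affinity_mask([1]))
    print("CPU3 only:", smp_affinity_mask([3]))
```

Under this encoding, pinning everything to CPU0 and then the serial IRQ to CPU1 corresponds to the `echo 1` and `echo 2` commands in the test, and a mask equivalent of the writer's `taskset -c 3` would be `8`.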
