Commit 823ea57

kalyazin and claude committed
fix(gdb): drain stale debug events before resuming vcpus
When more than one vcpu hits a breakpoint while the VM runs, each sends a debug event and parks itself in the paused emulation state. The gdb event loop reports the first and force-pauses the rest, but their already-queued debug events are never consumed. On the next resume those stale events remain, so a following `wait_for_stop_reason` dequeues one and processes it against a vcpu that has since resumed: it marks a running vcpu as paused, desyncing the pause/resume handshake until the vcpu threads exit and the event channel disconnects. Under a sustained multi-vcpu breakpoint storm this surfaces as a fatal `GdbQueueError` ("Remote connection closed" on the client).

Drain the debug-event queue at the start of `resume_all_vcpus`. Every vcpu is paused there, so none can emit an event and anything queued is provably stale; dropping it is safe and keeps `vcpu_state` in sync with the vcpus.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Nikita Kalyazin <nikita.kalyazin@e2b.dev>
1 parent 1af7f3e commit 823ea57

1 file changed

Lines changed: 9 additions & 0 deletions

src/vmm/src/gdb/target.rs

@@ -219,6 +219,15 @@ impl FirecrackerTarget {
     /// Resumes execution of all paused Vcpus, update them with current kvm debug info
     /// and resumes
     fn resume_all_vcpus(&mut self) -> Result<(), GdbTargetError> {
+        // Every vcpu is paused at this point (all-stop), so it is blocked in its
+        // emulation loop and cannot emit a debug event. Any event still queued is
+        // therefore stale: a sibling that also hit the breakpoint before we stopped
+        // the VM. Drain these now — if left queued, the next `wait_for_stop_reason`
+        // would process a stale event against a vcpu that has since resumed, marking
+        // a running vcpu as paused and desyncing the pause/resume handshake until the
+        // vcpu threads exit (surfacing as a fatal GdbQueueError).
+        while self.gdb_event.try_recv().is_ok() {}
+
         for idx in 0..self.vcpu_state.len() {
             self.update_vcpu_kvm_debug(idx, &self.hw_breakpoints)?;
         }
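
The drain pattern in this change can be tried in isolation. The following is a minimal sketch, assuming `gdb_event` behaves like a `std::sync::mpsc::Receiver<usize>` carrying vcpu indices. That type, the `VcpuState` enum, and the `drain_stale` helper are illustrative names and are not taken from the Firecracker source. It models two vcpus queuing a debug event each, the event loop consuming only the first, and the drain discarding the leftover before resume.

```rust
use std::sync::mpsc::{channel, Receiver, TryRecvError};

/// Hypothetical stand-in for the per-vcpu state tracked by the gdb target.
#[derive(Debug, Clone, Copy, PartialEq)]
enum VcpuState {
    Running,
    Paused,
}

/// Discards every queued event and returns how many were dropped.
/// Unlike `while rx.try_recv().is_ok() {}`, this reports a disconnected
/// channel to the caller instead of treating it like an empty queue.
fn drain_stale(rx: &Receiver<usize>) -> Result<usize, TryRecvError> {
    let mut dropped = 0;
    loop {
        match rx.try_recv() {
            Ok(_) => dropped += 1,
            Err(TryRecvError::Empty) => return Ok(dropped),
            Err(e @ TryRecvError::Disconnected) => return Err(e),
        }
    }
}

fn main() {
    // Assumed event type: the index of the vcpu that hit a breakpoint.
    let (tx, rx) = channel::<usize>();
    let mut state = [VcpuState::Running; 2];

    // Both vcpus hit the breakpoint and each queues a debug event.
    tx.send(0).unwrap();
    tx.send(1).unwrap();

    // The event loop consumes the first event and pauses every vcpu.
    let first = rx.recv().unwrap();
    state.iter_mut().for_each(|s| *s = VcpuState::Paused);
    assert_eq!(first, 0);

    // On resume: drain while everything is still paused, then mark running.
    let dropped = drain_stale(&rx).unwrap();
    state.iter_mut().for_each(|s| *s = VcpuState::Running);

    // The event from vcpu 1 is gone, so a later wait cannot dequeue it.
    assert_eq!(dropped, 1);
    assert_eq!(rx.try_recv(), Err(TryRecvError::Empty));
    println!("dropped {dropped} stale event(s); states: {state:?}");
}
```

The one-line loop in the diff stops on any `try_recv` error, so an empty queue and a disconnected channel look the same at that point. The sketch separates the two cases only to make that distinction visible; whether it matters in the real target depends on how a later receive handles a disconnected channel, which this example does not model.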
