Skip to content

virtio-serial driver bugfixes#817

Open
Martinits wants to merge 2 commits into
intel:mainfrom
Martinits:main
Open

virtio-serial driver bugfixes#817
Martinits wants to merge 2 commits into
intel:mainfrom
Martinits:main

Conversation

@Martinits
Copy link
Copy Markdown

Fix 2 migtd virtio-serial driver bugs:

  • Missing control msgs since qemu may put multiple control msg but the driver pops one each time.
  • A late PORT_OPEN event may cause a PortNotAvailable when port.open(), try to drain control queue when PortNotAvailable and retry open.

recv_control() previously popped only one used entry from the control RX
queue per call. When QEMU delivers multiple control events (e.g. DeviceAdd,
PortOpen), the remaining events could be left pending and never handled,
leading to missing RX prefill / vq empty failures.

Change recv_control() to drain all pending control messages after the
initial wait, ensuring the full control sequence is processed.

Signed-off-by: Wenbo Li <liwenbo.liwenbo@bytedance.com>
@Martinits Martinits requested review from jyao1 and sgrams as code owners April 28, 2026 07:24
@jyao1
Copy link
Copy Markdown
Contributor

jyao1 commented May 5, 2026

could you please fix CI failure?

VirtioSerialPort::open() may return PortNotAvailable because the PORT_OPEN
control event arrives after the initial control setup. On each
PortNotAvailable, non-blockingly drain pending control messages for at
most 100 times.

Signed-off-by: Wenbo Li <liwenbo.liwenbo@bytedance.com>
@Martinits
Copy link
Copy Markdown
Author

@jyao1 CI failures fixed.

@sgrams
Copy link
Copy Markdown
Contributor

sgrams commented Jun 8, 2026

Two small things to address before we can merge this:

  1. recv_control_nonblock is missing the dma_allocation.get -> safe_len = min(vq_buf.len, record.dma_size) clamp that recv_control performs. Both paths consume a host-controlled vq_buf.len; the clamp is a deliberate defense against an overlong len. Worth mirroring so both helpers are equally hardened.
  2. Retry-exhaustion path silently returns Ok(port) - if all 100 retries fail with PortNotAvailable, the for loop falls through and setup_transport returns an unopened port. The failure then surfaces at the first send/recv with no breadcrumb. Track success with a bool and return Err(VirtioSerialError::PortNotAvailable(VIRTIO_SERIAL_PORT_ID)) + a log::error! on exhaustion.

@Martinits can you please help to address it? It should be a 3-line change, see below:

For 1)

            for vq_buf in &h2g {
                if let Some(record) = self.dma_allocation.get(&vq_buf.addr) {
                    let safe_len = core::cmp::min(vq_buf.len as usize, record.dma_size);
                    let control_msg = unsafe {
                        core::slice::from_raw_parts(vq_buf.addr as *const u8, safe_len)
                    };
                    self.handle_control_msg(control_msg)?;
                    self.free_dma_memory(vq_buf.addr)
                        .ok_or(VirtioSerialError::OutOfResource)?;
                }
            }

For 2)

        let port = VirtioSerialPort::new(VIRTIO_SERIAL_PORT_ID);
        let mut opened = false;
        for _ in 0..VIRTIO_SERIAL_OPEN_RETRY_TIMES {
            match port.open() {
                Ok(()) => {
                    opened = true;
                    break;
                }
                Err(VirtioSerialError::PortNotAvailable(_)) => {
                    // `PORT_OPEN` can arrive after init_control; pump control queue during retry.
                    // If we drained pending control msgs, try open again immediately to avoid
                    // waiting for the next retry tick.
                    Timer::after(VIRTIO_SERIAL_OPEN_RETRY_DELAY).await;
                    VirtioSerial::try_poll_control()?;
                }
                Err(e) => return Err(e.into()),
            }
        }
        if !opened {
            log::error!(
                "virtio-serial port {} not available after {} retries ({}ms total)\n",
                VIRTIO_SERIAL_PORT_ID,
                VIRTIO_SERIAL_OPEN_RETRY_TIMES,
                VIRTIO_SERIAL_OPEN_RETRY_TIMES * VIRTIO_SERIAL_OPEN_RETRY_DELAY.as_millis() as usize,
            );
            return Err(VirtioSerialError::PortNotAvailable(VIRTIO_SERIAL_PORT_ID).into());
        }
        return Ok(port);

Copy link
Copy Markdown
Contributor

@sgrams sgrams left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my previous comment

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants