Skip to content

Bug: tsi_connect commits to S_INET too early with SOCK_DGRAM, causing routed but unreachable UDP packets to be emitted in guest network #132

@chaserhkj

Description

@chaserhkj

The current implementation of tsi_connect roughly goes like (ref)

static int tsi_connect(struct socket *sock, struct sockaddr *addr, int addr_len, int flags)
{
    // ...
    if (isocket) {
        err = isocket->ops->connect(isocket, addr, addr_len,
                                    flags & ~O_NONBLOCK);
        if (err == 0 || err == -EALREADY) {
            tsk->status = S_INET;
            goto release;
        } else if (err == -EINPROGRESS) {
            tsk->status = S_INET;
            goto release;
        }
    }
}

However this creates a problem with SOCK_DGRAM, upon connect with a DGRAM socket, isocket only checks if the target destination IP is routed but does not check if it is actually reachable (contrary to SOCK_STREAM where TCP handshake is performed thus confirming reachability).

This creates a problem for an edge case where if the guest configured its routing table with an IP that is reachable for the host, the guest kernel will always try to send UDP packets to that IP on the guest side.

Particularly with libkrun setting a dummy interface with IP 10.0.0.1/8 in past versions, and with k8s and podman using default pod address from this same range, this can make krun pods in these environment not able to route to other pods, and potentially problems with DNS as well.

Although this was fixed in this commit by replacing the dummy IP range with a less-used one. IMO This is still just a workaround not a proper fix. Under exotic network setups and edge cases, this might still trigger with overlapping network blocks.

I think the proper fix is to make DGRAM connects stay in S_HYBRID, copy the address to tsk->sendto_addr and defer the personality decision to sendmsg time, like this:

static int tsi_connect(struct socket *sock, struct sockaddr *addr, int addr_len, int flags)
{
    // ...
    if (isocket) {
        if (sock->type == SOCK_DGRAM) {
            // Store the address for later use in sendmsg
            if (!tsk->sendto_addr) {
                tsk->sendto_addr = kmalloc(addr_len, GFP_KERNEL);
                tsk->sendto_addr_len = addr_len;
            }
            memcpy(tsk->sendto_addr, addr, addr_len);
            goto release;  // stay in current state (S_HYBRID)
        }
        err = isocket->ops->connect(isocket, addr, addr_len,
                                    flags & ~O_NONBLOCK);
        // normal commit to S_INET for SOCK_STREAM ...
    }
}

I went for an issue because I am not sure how to contribute to the code in this repo since they are just a patchset and directly modifying a patch feels wrong... If there is a kernel source tree branch or mailing list you can point me to I would be more than happy to submit a patch for this there.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions