Skip to content

net/tcp: close write-worker fd on async write error (fix #3711)#3874

Open
dondetir wants to merge 1 commit intoOpenSIPS:3.4from
dondetir:fix/tcp-write-worker-fd-leak-3711
Open

net/tcp: close write-worker fd on async write error (fix #3711)#3874
dondetir wants to merge 1 commit intoOpenSIPS:3.4from
dondetir:fix/tcp-write-worker-fd-leak-3711

Conversation

@dondetir
Copy link
Copy Markdown
Contributor

Problem

Issue #3711 reports fd leaks when TLS connections fail to establish. PR #3634 fixed the immediate failure path (n < 0 — cert verify error returned synchronously), but a second path still leaks: the async handshake timeout path (n = 0).

Root cause

When tls_async_connect returns 0 (handshake timer fired), the connection is queued via ASYNC_WRITE_GENW and handed to the TCP write worker. The worker receives fd s via SCM_RIGHTS — an independent fd copy — then calls tls_async_write, which fails and returns -1.

In handle_io's IO_WATCH_WRITE handler (net/net_tcp_proc.c), the resp < 0 branch breaks before reaching the close(s) that success/pending paths hit:

tcpconn_release_error(con, 1,"Write error");
break; // ← jumps past close(s) below
// ...
close(s); // only success/pending paths reach here

tcpconn_release_error() signals main to close main's copy. The worker's copy s is never closed → one fd leaked per failed async write.

Fix

Single line, cherry-pick of master commit bffcf00:

tcpconn_release_error(con, 1,"Write error");
close(s); /* we always close the socket received for writing */
break;

Verification on 3.4.14

Mock server accepts TCP but withholds TLS data (forces tls_async_handshake_timeout to fire):

A server presenting an untrusted cert (immediate SSL error) hits the n < 0 path covered by #3634 and does not reproduce this leak. The mock server must hold TCP open without completing TLS to trigger the timeout path.

Scope : Master already has this via bffcf00. Backport to 3.4. Same fix likely needed in 3.5.x/3.6.x.

In the IO_WATCH_WRITE handler, when the stream write handler returns
<0 (e.g. TLS async write fails after the 2s handshake timeout), the
error branch breaks out of the switch before reaching the close(s)
call that the success/pending paths hit at line 327.

The fd s is delivered to the write worker via SCM_RIGHTS and is an
independent copy in the worker's fd table — it must be explicitly
closed on every exit path.  tcpconn_release_error() signals main to
close main's copy; close(s) closes the worker's copy.  Without it
one fd leaks per failed async write, confirmed: 7 leaked fds from
51 TLS OPTIONS (handshake timeout path) vs. 0 with the fix.

Cherry-pick of bffcf00 from master.

Fixes: OpenSIPS#3711
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant