Skip to content

Stop stack-capture UAF in ConnectionPoolImpl timeout timer#222

Merged
gophergogo merged 2 commits intomainfrom
fix/pool-pending-connection-stack-capture
Apr 21, 2026
Merged

Stop stack-capture UAF in ConnectionPoolImpl timeout timer#222
gophergogo merged 2 commits intomainfrom
fix/pool-pending-connection-stack-capture

Conversation

@gophergogo
Copy link
Copy Markdown
Collaborator

Summary

  • ConnectionPoolImpl::createNewConnection kept its PendingConnection on the stack and handed the address to the timeout-timer lambda before std::move-pushing the entry onto pending_connections_. Once the function returned, the lambda's capture dangled — so when the timer fired, the callback dereferenced freed memory. Fix: emplace an empty slot on the list first and bind the capture to the stable list-element address.
  • onConnectionTimeout previously used std::remove_if + list::erase, which moves elements around to compact them — including the entry whose timer lambda is currently executing on top of itself. Switched to std::list::remove_if (member) so the node is unlinked in place, identified by address rather than unique_ptr equality.
  • New real-IO regression test ConnectionPoolPendingTimeoutTest.TimeoutFiresCleanly drives newConnection against TEST-NET-1 (192.0.2.1:1, RFC 5737) with a 150 ms pool timeout, so the timer wins the race vs. the kernel SYN retransmit budget. Against the pre-fix code it segfaults (exit 139); with the fix it delivers onPoolFailure(Timeout) in ~150 ms.

Test plan

  • test_connection_pool_pending_timeout — new test passes with fix (152 ms)
  • Same test against pre-fix source: SIGSEGV (exit 139) — confirms the test actually catches the regression
  • test_connection_manager_impl — pass
  • test_connection_impl — pass
  • test_connection_manager_real_io — pass (7/7)
  • test_connection_pool_callbacks — pass (13/13)
  • test_connection_manager_callback_fixes — pass (14/14)

@gophergogo gophergogo changed the title Stop stack-capture UAF in ConnectionPoolImpl timeout timer Stop stack-capture UAF in ConnectionPoolImpl timeout timer (#221) Apr 21, 2026
gophergogo added 2 commits April 20, 2026 20:26
)

ConnectionPoolImpl::createNewConnection constructed PendingConnection on
the caller's stack, handed its address to the timeout-timer lambda, and
only then std::move-pushed the entry onto pending_connections_. The
captured pointer dangled as soon as createNewConnection returned, so
the timer callback touched freed memory when it fired.

Emplace an empty slot on pending_connections_ first and bind the timer
capture to that list element. std::list guarantees per-element address
stability across push/erase of other elements, which is the invariant
the capture relies on.

onConnectionTimeout previously compacted the list via std::remove_if
plus list::erase. The non-member remove_if moves elements to shuffle
the erased one to the end — which would move the entry whose timer
lambda is currently executing on top of itself. Switch to
std::list::remove_if (member) so the node is unlinked in place, and
identify it by address rather than unique_ptr equality to avoid
accidentally matching an unrelated entry.
Drive the pool's timeout path with real IO: bind to TEST-NET-1
(192.0.2.1:1, RFC 5737 non-routable) and set a short 150ms
connection_timeout so the timer fires long before the kernel's SYN
retransmit budget gives up. On the pre-fix code this segfaults in the
timer callback (the lambda dereferences a destroyed stack slot); with
the fix applied the dispatcher delivers onPoolFailure(Timeout) cleanly
in ~150ms.

Use RealIoTestBase so pool construction, newConnection, and teardown
all run inside the dispatcher thread — the only context libevent and
ConnectionImpl will accept for creating file events and timers.
@caleb2h caleb2h changed the title Stop stack-capture UAF in ConnectionPoolImpl timeout timer (#221) Stop stack-capture UAF in ConnectionPoolImpl timeout timer Apr 21, 2026
@gophergogo gophergogo force-pushed the fix/pool-pending-connection-stack-capture branch from 2131456 to 5b792e3 Compare April 21, 2026 03:32
@gophergogo gophergogo merged commit 7205ba5 into main Apr 21, 2026
1 check passed
gophergogo pushed a commit that referenced this pull request Apr 21, 2026
)

ConnectionPoolImpl::createNewConnection constructed PendingConnection on
the caller's stack, handed its address to the timeout-timer lambda, and
only then std::move-pushed the entry onto pending_connections_. The
captured pointer dangled as soon as createNewConnection returned, so
the timer callback touched freed memory when it fired.

Emplace an empty slot on pending_connections_ first and bind the timer
capture to that list element. std::list guarantees per-element address
stability across push/erase of other elements, which is the invariant
the capture relies on.

onConnectionTimeout previously compacted the list via std::remove_if
plus list::erase. The non-member remove_if moves elements to shuffle
the erased one to the end — which would move the entry whose timer
lambda is currently executing on top of itself. Switch to
std::list::remove_if (member) so the node is unlinked in place, and
identify it by address rather than unique_ptr equality to avoid
accidentally matching an unrelated entry.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants