Fix connection lifetime and init-path crashes#212
Merged
Conversation
added 7 commits
April 19, 2026 14:13
…me (#212) The previous implementation ran the callback synchronously when called from the dispatcher thread, which breaks deferredDelete: the drain runs while the caller's callback is still on the stack, destroying objects mid-iteration and causing use-after-free in connection close handlers. Route on-thread scheduling through a 0ms timer so libevent fires the drain after the current handler returns. Adds test_deferred_delete_ordering covering the contract.
) onConnectionEvent(RemoteClose|LocalClose) runs inside the connection's own callback loop. Destroying the connection synchronously triggered a use-after-free once the loop resumed. The previous workaround wrapped the unique_ptr in a shared_ptr and posted it — now replaced with the dispatcher's deferred-delete queue, which Connection already supports via DeferredDeletable. Also adds a dispatcher-thread assert at the top of onConnectionEvent so any off-thread synthesis of connection events trips immediately instead of silently corrupting state.
startBackgroundTasks assigned timers to local unique_ptrs that dropped at end of scope, so session cleanup and resource-update ticks never fired. Each fire also constructed a fresh timer to re-arm, which both leaked state and destroyed the currently-firing timer from inside its own callback. Hold both timers as McpServer members, and on each fire re-arm the same timer via enableTimer. stopBackgroundTasks now disables them explicitly so a pending fire can't run after shutdown begins.
McpServer registered itself on every connection and relied on a global current_connection_ pointer to identify which connection fired a close event. That pointer was stale whenever more than one connection was active, so session teardown ran against the wrong session (or none). Introduce ConnectionLifecycleCallbacks — a tiny adapter bound to a single (server, connection) pair. Each connection gets its own instance, held in lifecycle_callbacks_ until close. The close path now knows exactly which connection is dying and tears down its session directly. Both the connection and the adapter are handed to the dispatcher's deferred-delete queue on close, since we are inside the adapter's own onEvent() when we drop them. Adds a dispatcher-thread assert at the entry so any off-thread caller trips immediately.
…212) The detached worker that waits for the initialize response previously also wrote server_capabilities_, initialized_, and kicked protocol_state_machine_->handleEvent(INITIALIZED) directly. Those fields are read from the dispatcher thread, so the worker's writes were a data race, and the state-machine event was being handled off the dispatcher thread that owns its timers. Restructure so the worker only blocks and parses, then posts the state commit and promise resolution back to main_dispatcher_. The promise resolves after the post so get()-returning callers observe the published state.
Client connections carry SSE responses whose body is an open-ended stream of events — quiet periods between events are expected and valid. The 60s body timeout was firing on these, and the state machine's executeTransition() then crashed on a still-live stream. Set body_timeout to 0 (disabled) for client mode; keep 60s on server mode where request bodies are expected to complete promptly.
Covers three invariants the preceding commits rely on: - Nested deferredDelete: a destructor that itself calls deferredDelete must not lose the inner object — it drains on a later pass. Guards the real-world pattern where a connection's destructor releases sub-objects that also need deferral. - body_timeout == 0 disables the body timer entirely. The client-mode HttpCodecFilter sets this so SSE response streams with long idle gaps between events don't trigger a false timeout. - Periodic timer re-arm on the same TimerPtr: the pattern McpServer's background tasks use. Confirms repeated fires and prompt cessation on disableTimer(), replacing the previous scheme that built a fresh timer inside each fire.
f434d04 to
b4b6f5d
Compare
added 3 commits
April 19, 2026 14:47
McpConnectionManager::onConnectionEvent now asserts isThreadSafe(), so calling it from the test thread aborts. Spin up a background dispatcher thread in the fixture and route the two existing OnConnectionEventPreventsDuplicate* cases through a runOnDispatcher helper. Adds OnCloseRunsCleanlyOnDispatcherThread, which walks Connected -> RemoteClose -> LocalClose on the dispatcher, and RepeatedEventSequencesOnDispatcherThreadDoNotCrash, which replays the sequence five times to catch any lifetime bug that only surfaces under repeated close events.
Adds LifecycleAdapterDefersSelfAndPeerFromOwnCallback, a primitive-level test that models McpServer::ConnectionLifecycleCallbacks: an adapter which, from inside its own onEvent, hands both itself and the owning peer connection to deferredDelete. Confirms neither destructs synchronously inside that callback — doing so would use-after-free the ConnectionCallbacks iterator in production — and that both drain on a later dispatcher pass.
The state-machine test already proves body_timeout == 0 disables the
body timer. What it does not prove is that HttpCodecFilter in client
mode is actually wired to pass 0. The client-mode fix is a single
ternary in the filter's constructor; a regression (e.g. someone
reverting to a flat 60s) would bring back the SSE crash silently.
Add a narrow introspection accessor:
- HttpCodecStateMachine::config() — read-only view of the
as-configured values (immutable after construction).
- HttpCodecFilter::bodyTimeout() — forwards to the state machine's
configured body_timeout.
Use it in two new tests: server mode retains 60s, client mode is 0.
caleb2h
approved these changes
Apr 19, 2026
caleb2h
pushed a commit
that referenced
this pull request
Apr 19, 2026
…me (#212) The previous implementation ran the callback synchronously when called from the dispatcher thread, which breaks deferredDelete: the drain runs while the caller's callback is still on the stack, destroying objects mid-iteration and causing use-after-free in connection close handlers. Route on-thread scheduling through a 0ms timer so libevent fires the drain after the current handler returns. Adds test_deferred_delete_ordering covering the contract.
caleb2h
pushed a commit
that referenced
this pull request
Apr 19, 2026
) onConnectionEvent(RemoteClose|LocalClose) runs inside the connection's own callback loop. Destroying the connection synchronously triggered a use-after-free once the loop resumed. The previous workaround wrapped the unique_ptr in a shared_ptr and posted it — now replaced with the dispatcher's deferred-delete queue, which Connection already supports via DeferredDeletable. Also adds a dispatcher-thread assert at the top of onConnectionEvent so any off-thread synthesis of connection events trips immediately instead of silently corrupting state.
caleb2h
pushed a commit
that referenced
this pull request
Apr 19, 2026
startBackgroundTasks assigned timers to local unique_ptrs that dropped at end of scope, so session cleanup and resource-update ticks never fired. Each fire also constructed a fresh timer to re-arm, which both leaked state and destroyed the currently-firing timer from inside its own callback. Hold both timers as McpServer members, and on each fire re-arm the same timer via enableTimer. stopBackgroundTasks now disables them explicitly so a pending fire can't run after shutdown begins.
caleb2h
pushed a commit
that referenced
this pull request
Apr 19, 2026
McpServer registered itself on every connection and relied on a global current_connection_ pointer to identify which connection fired a close event. That pointer was stale whenever more than one connection was active, so session teardown ran against the wrong session (or none). Introduce ConnectionLifecycleCallbacks — a tiny adapter bound to a single (server, connection) pair. Each connection gets its own instance, held in lifecycle_callbacks_ until close. The close path now knows exactly which connection is dying and tears down its session directly. Both the connection and the adapter are handed to the dispatcher's deferred-delete queue on close, since we are inside the adapter's own onEvent() when we drop them. Adds a dispatcher-thread assert at the entry so any off-thread caller trips immediately.
caleb2h
pushed a commit
that referenced
this pull request
Apr 19, 2026
…212) The detached worker that waits for the initialize response previously also wrote server_capabilities_, initialized_, and kicked protocol_state_machine_->handleEvent(INITIALIZED) directly. Those fields are read from the dispatcher thread, so the worker's writes were a data race, and the state-machine event was being handled off the dispatcher thread that owns its timers. Restructure so the worker only blocks and parses, then posts the state commit and promise resolution back to main_dispatcher_. The promise resolves after the post so get()-returning callers observe the published state.
caleb2h
pushed a commit
that referenced
this pull request
Apr 19, 2026
Client connections carry SSE responses whose body is an open-ended stream of events — quiet periods between events are expected and valid. The 60s body timeout was firing on these, and the state machine's executeTransition() then crashed on a still-live stream. Set body_timeout to 0 (disabled) for client mode; keep 60s on server mode where request bodies are expected to complete promptly.
caleb2h
pushed a commit
that referenced
this pull request
Apr 19, 2026
Covers three invariants the preceding commits rely on: - Nested deferredDelete: a destructor that itself calls deferredDelete must not lose the inner object — it drains on a later pass. Guards the real-world pattern where a connection's destructor releases sub-objects that also need deferral. - body_timeout == 0 disables the body timer entirely. The client-mode HttpCodecFilter sets this so SSE response streams with long idle gaps between events don't trigger a false timeout. - Periodic timer re-arm on the same TimerPtr: the pattern McpServer's background tasks use. Confirms repeated fires and prompt cessation on disableTimer(), replacing the previous scheme that built a fresh timer inside each fire.
caleb2h
pushed a commit
that referenced
this pull request
Apr 19, 2026
McpConnectionManager::onConnectionEvent now asserts isThreadSafe(), so calling it from the test thread aborts. Spin up a background dispatcher thread in the fixture and route the two existing OnConnectionEventPreventsDuplicate* cases through a runOnDispatcher helper. Adds OnCloseRunsCleanlyOnDispatcherThread, which walks Connected -> RemoteClose -> LocalClose on the dispatcher, and RepeatedEventSequencesOnDispatcherThreadDoNotCrash, which replays the sequence five times to catch any lifetime bug that only surfaces under repeated close events.
caleb2h
pushed a commit
that referenced
this pull request
Apr 19, 2026
Adds LifecycleAdapterDefersSelfAndPeerFromOwnCallback, a primitive-level test that models McpServer::ConnectionLifecycleCallbacks: an adapter which, from inside its own onEvent, hands both itself and the owning peer connection to deferredDelete. Confirms neither destructs synchronously inside that callback — doing so would use-after-free the ConnectionCallbacks iterator in production — and that both drain on a later dispatcher pass.
gophergogo
pushed a commit
that referenced
this pull request
Apr 20, 2026
McpServerConfig already carries http_sse_path, http_rpc_path, and the new external_url (added earlier in this stack), but the fallback "default HTTP/SSE filter chain factory" construction path was still building the factory with no path arguments — so the SSE server transport was hardcoded to /sse and /mcp with no external URL, regardless of what the operator configured. Forward all three fields through the ctor. Also set http_path to http_rpc_path explicitly so a server that uses this fallback path surfaces a consistent RPC endpoint; http_host stays "localhost" because it's the client-mode Host header and irrelevant in server mode. Note: upstream's version of this change also rewrote the active_connections_ cleanup on close to post-based deferral, but the PR #212 crash-fix stack already replaced that code with Dispatcher::deferredDelete, which is the correct primitive here and handles the use-after-free window that post() only partially covers. Intentionally skipping that part of the upstream diff.
This was referenced Apr 20, 2026
gophergogo
pushed a commit
that referenced
this pull request
Apr 20, 2026
) McpServerConfig already carries http_sse_path, http_rpc_path, and the new external_url (added earlier in this stack), but the fallback "default HTTP/SSE filter chain factory" construction path was still building the factory with no path arguments — so the SSE server transport was hardcoded to /sse and /mcp with no external URL, regardless of what the operator configured. Forward all three fields through the ctor. Also set http_path to http_rpc_path explicitly so a server that uses this fallback path surfaces a consistent RPC endpoint; http_host stays "localhost" because it's the client-mode Host header and irrelevant in server mode. Note: upstream's version of this change also rewrote the active_connections_ cleanup on close to post-based deferral, but the PR #212 crash-fix stack already replaced that code with Dispatcher::deferredDelete, which is the correct primitive here and handles the use-after-free window that post() only partially covers. Intentionally skipping that part of the upstream diff.
gophergogo
pushed a commit
that referenced
this pull request
Apr 20, 2026
) McpServerConfig already carries http_sse_path, http_rpc_path, and the new external_url (added earlier in this stack), but the fallback "default HTTP/SSE filter chain factory" construction path was still building the factory with no path arguments — so the SSE server transport was hardcoded to /sse and /mcp with no external URL, regardless of what the operator configured. Forward all three fields through the ctor. Also set http_path to http_rpc_path explicitly so a server that uses this fallback path surfaces a consistent RPC endpoint; http_host stays "localhost" because it's the client-mode Host header and irrelevant in server mode. Note: upstream's version of this change also rewrote the active_connections_ cleanup on close to post-based deferral, but the PR #212 crash-fix stack already replaced that code with Dispatcher::deferredDelete, which is the correct primitive here and handles the use-after-free window that post() only partially covers. Intentionally skipping that part of the upstream diff.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Six independent crash/ordering fixes, split one-per-commit for review. Each addresses a real production crash or hang observed in PR #211's larger bundle; this PR lands the crash fixes cleanly without the SSE-transport / libcurl work, which will follow as separate PRs.
Commits
scheduleCallbackCurrentIterationto defer past caller's stack frame — the primitive was running the callback synchronously when called on the dispatcher thread, breaking thedeferredDeletecontract. Route on-thread scheduling through a 0ms timer. Addstest_deferred_delete_orderingcovering the contract.Dispatcher::deferredDelete— replaces theshared_ptr<ConnectionPtr>+post()workaround with the correct primitive. Adds a dispatcher-thread assert on entry.McpServer— timers were localunique_ptrs that dropped at end of scope, so ticks never fired. Hold as members; re-arm viaenableTimeron the same timer.current_connection_fallback. Each connection gets its ownConnectionLifecycleCallbacks, handed todeferredDeletealongside the connection on close.initializeProtocolstate commit back to the dispatcher thread — worker thread was mutatingserver_capabilities_,initialized_, and the protocol state machine off-thread. Now parses on the worker and commits state viamain_dispatcher_->post.HttpCodecStateMachine::executeTransition(). Server mode keeps the bounded timeout.Test plan
test_deferred_delete_ordering(new) — 4 cases: synchronous-deferralcontract, batched drain, nested-during-drain, lifecycle-adapter
self+peer pattern
test_timer_lifetime(new) — re-arm-same-TimerPtr contract used byMcpServer background tasks
test_http_codec_state_machine— addsBodyTimeoutDisabledWhenZeroproving body_timeout==0 disables the body timer
test_http_codec_filter— addsHttpCodecFilterBodyTimeoutTest×2, confirming client mode wires body_timeout=0 and server mode
retains 60s
test_mcp_connection_manager— two new cases(
OnCloseRunsCleanlyOnDispatcherThread,RepeatedEventSequencesOnDispatcherThreadDoNotCrash) plus the twoexisting
OnConnectionEventPreventsDuplicate*cases refactored todrive through the dispatcher to satisfy the new
isThreadSafe()assert
ctestsuite: 118/118 passingAnd in Out-of-scope, add a line clarifying the remaining integration-level gaps that PR D picks up:
close path and background timer lifetime (tracked for PR D — need full
ApplicationBase bootstrap; adapter/timer contracts already covered at
primitive level in this PR)