RUBY-3364 fix connection pool thread starvation under oversubscription#3028

Merged
comandeo-mongo merged 8 commits into mongodb:master from comandeo-mongo:3364 on Apr 28, 2026

Conversation

@comandeo-mongo
Contributor

Summary

Fixes thread starvation in Mongo::Server::ConnectionPool when worker threads
significantly outnumber the pool size.

Root cause: mutex barging at the checkout predicate gate. A thread returning
from do_check_in re-enters retrieve_and_connect_connection while
unavailable_connections < max_size is still true, bypasses the wait loop
entirely, and takes the freshly-released connection before any thread blocked
in @size_cv.wait can re-acquire @lock. With 200 threads and a 5-connection
pool, 5 threads form a closed cycle and the other 195 starve for the entire
run (each completes 2 ops and hits the wait timeout once).

This is not non-FIFO condition variable behavior — MRI's ::ConditionVariable
signals waiters in insertion order. The barging happens at the mutex
re-acquisition boundary, not inside the CV. The full diagnosis and the
rationale for the minimal fix (vs. a larger per-waiter-CV redesign) is in
the design doc that accompanies this change.

Fix

Introduce per-gate waiter counters (@size_waiters, @max_connecting_waiters)
and require any thread that arrives with a non-zero counter to enter the
wait loop at least once, regardless of whether the predicate is currently
satisfied. Once a thread has served one wait cycle it is eligible on the
next iteration.

Preserves the existing architecture and synchronization primitives; no
changes to do_check_in signaling or @connection_requests counting.
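The gate pattern above can be sketched in isolation. This is a minimal, self-contained illustration on a generic capacity-limited resource, not the driver's actual pool code: `FairGate`, `@waiters`, and the method names are made up, and the predicate-guarded signal after a successful acquire reflects the refinement discussed later in this review.

```ruby
# Hedged sketch of the waiter-counter anti-barging gate. Illustrative names
# only; the real pool tracks unavailable_connections against max_size.
class FairGate
  def initialize(capacity)
    @capacity = capacity
    @in_use   = 0
    @waiters  = 0
    @lock     = Mutex.new
    @cv       = ConditionVariable.new
  end

  def acquire
    @lock.synchronize do
      # Anti-barging: a thread arriving while others are queued must serve
      # at least one wait cycle, even if capacity is currently free.
      must_wait = @waiters > 0
      until @in_use < @capacity && !must_wait
        @waiters += 1
        begin
          @cv.wait(@lock)
        ensure
          @waiters -= 1
        end
        must_wait = false # one cycle served; eligible on the next iteration
      end
      @in_use += 1
      # Predicate-guarded hand-off: wake the next queued waiter only if its
      # predicate can succeed, so it is not stranded awaiting an external
      # release (the lost-wakeup scenario raised in review).
      @cv.signal if @waiters > 0 && @in_use < @capacity
    end
  end

  def release
    @lock.synchronize do
      @in_use -= 1
      @cv.signal
    end
  end
end
```

Under oversubscription, each thread that checks a resource back in and immediately re-enters `acquire` now queues behind existing waiters instead of barging past them.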

Validation

workload                                  | before                          | after
200 threads, pool=5, 10s queries, 30s run | min=2, max=17340, ~390 timeouts | min=584, max=585, 0 timeouts

All existing suites pass against a local replica set:

  • spec/mongo/server/connection_pool_spec.rb + connection_pool/ subdir: 114 examples, 0 failures, 3 pending
  • spec/spec_tests/cmap_spec.rb: 226 examples, 0 failures
  • spec/integration/connection_pool_populator_spec.rb + connection_spec.rb: 23 examples, 0 failures, 10 pending
  • bundle exec rubocop lib/mongo/server/connection_pool.rb spec/mongo/server/connection_pool_fairness_spec.rb: clean

Changes

  • lib/mongo/server/connection_pool.rb: add @size_waiters / @max_connecting_waiters counters; gate both checkout waits on them
  • spec/mongo/server/connection_pool_fairness_spec.rb: new regression test asserting per-thread service-count ratio and zero checkout timeouts under oversubscription. Verified to fail on the pre-fix code (ratio=0.0, min=1, max=~4000) and pass on the fixed code (ratio >= 0.99).
  • profile/connection_pool_fairness.rb: fairness diagnostic harness kept under profile/ alongside driver_bench. Used to reproduce RUBY-3364 and to validate the fix; not run in default CI.

Test Plan

  • CI green on all Evergreen variants
  • Reviewer runs profile/connection_pool_fairness.rb locally and confirms even distribution
  • Consider whether the ratio >= 0.5 threshold in the regression spec is appropriate for all CI variants (local observation is >= 0.99; 0.5 was chosen to absorb CI noise)

The profile/connection_pool_fairness.rb harness drives the pool under high
thread:connection oversubscription and reports per-thread service counts and
wait-time distribution. It was used to diagnose thread starvation under
contention and is kept as a permanent diagnostic tool under profile/
alongside driver_bench.

Tunable via THREADS, POOL_SIZE, DURATION_SEC, WAIT_TIMEOUT env vars.
Requires a running cluster at MONGODB_URI (defaults to local replica set).
@comandeo-mongo comandeo-mongo marked this pull request as ready for review April 23, 2026 13:57
@comandeo-mongo comandeo-mongo requested a review from a team as a code owner April 23, 2026 13:57
@comandeo-mongo comandeo-mongo requested review from Copilot and jamis April 23, 2026 13:57
Copilot AI left a comment

Pull request overview

Addresses RUBY-3364 by improving fairness in Mongo::Server::ConnectionPool under heavy thread oversubscription, aiming to prevent long-lived starvation cycles.

Changes:

  • Add per-gate waiter counters (@size_waiters, @max_connecting_waiters) and incorporate them into checkout gating logic.
  • Add a new RSpec regression test to detect starvation/unfair service distribution under oversubscription.
  • Add a profiling harness to reproduce and measure pool fairness characteristics outside CI.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

File Description
lib/mongo/server/connection_pool.rb Introduces waiter counters and modifies checkout gate loops to reduce mutex barging/starvation.
spec/mongo/server/connection_pool_fairness_spec.rb Adds a regression spec asserting no checkout timeouts and roughly even per-thread service.
profile/connection_pool_fairness.rb Adds a standalone profiling script to measure fairness and wait-time distributions.


Comment on lines +1317 to +1323

    until (max_size == 0 || unavailable_connections < max_size) && !must_wait
      wait = deadline - Utils.monotonic_time
      raise_check_out_timeout!(connection_global_id) if wait <= 0
-     @size_cv.wait(wait)
+     @size_waiters += 1
+     begin
+       @size_cv.wait(wait)
+     ensure
Copilot AI Apr 23, 2026

Forcing a @size_cv.wait solely because @size_waiters > 0 can create a lost-wakeup/spurious-timeout scenario: a thread may enter the wait even though unavailable_connections < max_size is already true, and if no further @size_cv.signal occurs (e.g., multiple check-ins happened before this thread began waiting, leaving capacity available but no subsequent signals), it will sleep until deadline and raise a checkout timeout despite capacity being available. This fairness mechanism likely needs a handoff/ticket-style turnstile (or some form of waiter-to-waiter signaling) so a “queue tax” wait does not depend on an external signal when the predicate is already satisfied.

Contributor Author

Good catch. This is a real lost-wakeup: a newcomer entering during the wake-up window (observing a stale @size_waiters > 0 before the waking waiter has decremented it) can be parked on @size_cv with capacity available, and no further signal is guaranteed.

Fixed in b7b50f2: after a successful checkout updates @checked_out_connections, signal @size_cv if there is a waiter AND the gate predicate is satisfied for them. The predicate guard matters — an unconditional signal would wake a waiter whose predicate still fails, causing it to re-queue itself at the back of the FIFO and break the wait-queue-fairness CMAP spec test.

Reproduction test added in fa55ebd (white-box, instruments the pool CV to force the race window open). Verified it fails on the prior commit and passes after the fix.

Comment on lines 1364 to +1379

@@ -1353,10 +1375,16 @@ def wait_for_connection(connection_global_id, deadline)
        decrement_connection_requests_and_signal
        raise_check_out_timeout!(connection_global_id)
      end
-     @max_connecting_cv.wait(wait)
+     @max_connecting_waiters += 1
+     begin
Copilot AI Apr 23, 2026


The same must_wait pattern on @max_connecting_cv can block even when a connection is already available (@available_connections.any?), and can similarly lead to a spurious timeout if the thread starts waiting after the last relevant signal and no further signals occur. Consider ensuring the fairness mechanism does not require an additional CV signal when the gate predicate is already satisfied (e.g., via an explicit FIFO/turnstile handoff among waiters).

Contributor Author

Same fix applied to @max_connecting_cv in wait_for_connection. Commit b7b50f2.

Predicate-guarded signal (@available_connections.any? || @pending_connections.length < @max_connecting) avoids the FIFO-reordering side effect that would otherwise surface on the CMAP wait-queue-fairness spec test.

Comment on lines +88 to +91
# Drain queue into array.
rows = []
rows << results.pop until results.empty?

Copilot AI Apr 23, 2026

This harness enqueues a full per-operation row [tid, t1, t2, dur, err] into results for the entire run, then drains it into rows. Under fast operations (or longer durations), this can grow to a very large number of Ruby objects and can OOM the process, making the diagnostic unreliable. Consider aggregating statistics online (counts/histograms per thread) and only optionally storing full rows behind an env flag or sampling rate.

Contributor Author

Noted. Deferring to a follow-up. At the default workload (200 threads × 30s ≈ 120k rows, ~24 MB of Ruby objects) the harness is fine; it would need online aggregation for multi-minute runs. Since this script lives under profile/ and is run by humans for diagnostic purposes rather than in CI, the memory ceiling is acceptable for now. I will add a header note.
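The online-aggregation alternative deferred above could look roughly like this. It is a hedged sketch, not the harness's actual code: `record_op`, the `Stats` struct, and the power-of-two millisecond bucketing are all illustrative choices.

```ruby
# Hypothetical per-thread online aggregation: O(threads) memory instead of
# one row per operation. In a threaded harness each worker should own its
# Stats entry (or updates should be mutex-guarded).
Stats = Struct.new(:count, :errors, :buckets)

stats = Hash.new { |h, tid| h[tid] = Stats.new(0, 0, Hash.new(0)) }

record_op = lambda do |tid, duration_s, error|
  s = stats[tid]
  s.count  += 1
  s.errors += 1 if error
  # Power-of-two millisecond buckets bound the histogram size regardless
  # of run length; sub-millisecond durations fall into bucket 0.
  ms = [duration_s * 1000.0, 1.0].max
  s.buckets[Math.log2(ms).ceil] += 1
end
```

With this shape, a 30-second run and a multi-hour run cost the same memory; only full-row capture would need a sampling flag.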

Comment thread: lib/mongo/server/connection_pool.rb (outdated)

      # satisfied. Without this, re-entering threads (those that just
      # checked a connection back in) barge past existing waiters and
      # the 195 blocked threads in a 200:5 scenario never wake.
      must_wait = @size_waiters > 0
Copilot AI Apr 23, 2026

The new must_wait gating can block even when max_size == 0 (unlimited pool). With max_size == 0, the size gate should never wait; however the current condition still waits whenever @size_waiters > 0, which can introduce unnecessary blocking and even checkout timeouts in an otherwise-unbounded pool. Consider bypassing the must_wait logic entirely when max_size == 0 (or folding it into must_wait so it is always false in this case).

Suggested change
-     must_wait = @size_waiters > 0
+     # Do not apply this gate to unlimited pools, where max_size == 0
+     # means there is no size slot to wait for.
+     must_wait = max_size != 0 && @size_waiters > 0

Contributor Author

Applied: must_wait = max_size != 0 && @size_waiters > 0. Commit b7b50f2.

max_connecting does not have an unlimited sentinel (it is always a positive integer), so no parallel change is needed for the second gate.
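The folded condition is small enough to pin down as a pure function. A hedged sketch, with the helper name invented here (the driver inlines the expression rather than defining such a method):

```ruby
# Illustrative helper: the unlimited-pool sentinel folded into the
# anti-barging gate. max_size == 0 means "no size limit", so the size
# gate must never force a wait no matter how many waiters are queued.
def must_wait?(max_size, size_waiters)
  max_size != 0 && size_waiters > 0
end
```

This keeps the unbounded-pool fast path free of the queue tax while preserving FIFO behavior for bounded pools.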

@comandeo-mongo comandeo-mongo marked this pull request as draft April 23, 2026 14:16
Copilot review on PR mongodb#3028 identified two correctness bugs in the initial
anti-barging fix:

1. Lost wakeup (comments #1/#2): When a successful waiter leaves the
   size gate, it does not signal the next waiter. A newcomer that
   arrived during the wake-up window (observing a stale size_waiters>0)
   can be parked on size_cv with capacity already available, and will
   not be woken by any subsequent event unless an external check-in
   happens. Symptom: newcomer waits for the full wait_queue_timeout
   despite capacity being free, even under light load.

   Fix: after a successful checkout (once @checked_out_connections is
   updated), signal size_cv if there are waiters AND the gate predicate
   is satisfied for them. The predicate guard matters: signaling
   unconditionally would wake a waiter whose predicate still fails,
   causing it to requeue itself at the back of the FIFO and break the
   wait-queue-fairness CMAP spec test. The same pattern is mirrored in
   wait_for_connection for @max_connecting_cv.

2. Unlimited pool (comment mongodb#4): must_wait fired even when max_size==0,
   introducing pointless blocking for unbounded pools where there is
   no size constraint to wait on. Skip the gate in that case.

Validation:
- New white-box regression test reproduces the lost wakeup
  deterministically by instrumenting size_cv.wait to release the lock
  during the wake-up window.
- spec/spec_tests/cmap_spec.rb wait-queue-fairness case (CMAP unified
  test): 16 examples, 0 failures.
- spec/mongo/server/connection_pool_spec.rb and
  spec/mongo/server/connection_pool/: 115 examples, 0 failures,
  3 pending.
- spec/spec_tests/cmap_spec.rb (full): 226 examples, 0 failures.
- profile/connection_pool_fairness.rb at 200:5 ratio: 0 timeouts,
  even distribution.
Deterministically reproduces the lost-wakeup scenario Copilot identified
on PR mongodb#3028 by monkey-patching size_cv.wait on the pool's own CV
instance to release the mutex for 300ms after the first wait returns.
That window lets a newcomer enter while @size_waiters is still stale
(not yet decremented), queue up on size_cv, and verify it is woken by
the baton-pass signal rather than waiting for wait_queue_timeout.

Without the fix: newcomer latency == wait_queue_timeout (test fails).
With the fix:    newcomer latency << wait_queue_timeout (test passes).
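The monkey-patch technique this commit describes can be sketched in isolation. This is a hedged, self-contained illustration of instrumenting one ConditionVariable instance to hold a wake-up race window open; the locals and the 50ms window are made up and do not come from the spec file.

```ruby
# Widen the wake-up race window: after this CV's first wait returns (the
# waiter has been signaled but has not yet acted), briefly release the
# mutex so a "newcomer" thread can run and observe stale waiter state.
lock = Mutex.new
cv   = ConditionVariable.new
original_wait = cv.method(:wait)
window_opened = false

cv.define_singleton_method(:wait) do |mutex, timeout = nil|
  result = original_wait.call(mutex, timeout)
  unless window_opened
    window_opened = true
    mutex.unlock
    sleep 0.05 # the artificially held-open race window
    mutex.lock
  end
  result
end

waiter = Thread.new { lock.synchronize { cv.wait(lock, 1) } }
sleep 0.01 # give the waiter time to park on the CV
lock.synchronize { cv.signal }
waiter.join
```

Because only this one CV instance is patched, the technique can be applied to a live pool object in a spec without touching library code.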
Copilot review on PR mongodb#3028 flagged that the harness accumulates every
operation into memory. At default settings the memory footprint is
modest, but long runs could grow unbounded. Documented in the header
with guidance to switch to online aggregation for multi-minute runs.
@comandeo-mongo comandeo-mongo marked this pull request as ready for review April 23, 2026 15:37
@comandeo-mongo comandeo-mongo merged commit 88ead99 into mongodb:master Apr 28, 2026
198 checks passed
@comandeo-mongo comandeo-mongo deleted the 3364 branch April 28, 2026 06:49