RUBY-3364 fix connection pool thread starvation under oversubscription #3028
comandeo-mongo merged 8 commits into mongodb:master from
Conversation
Drives the pool under high thread:connection oversubscription and reports per-thread service counts and wait-time distribution. Used to diagnose thread starvation under contention. Kept as a permanent diagnostic tool under profile/ alongside driver_bench. Tunable via THREADS, POOL_SIZE, DURATION_SEC, WAIT_TIMEOUT env vars. Requires a running cluster at MONGODB_URI (defaults to local replica set).
Pull request overview
Addresses RUBY-3364 by improving fairness in Mongo::Server::ConnectionPool under heavy thread oversubscription, aiming to prevent long-lived starvation cycles.
Changes:
- Add per-gate waiter counters (`@size_waiters`, `@max_connecting_waiters`) and incorporate them into checkout gating logic.
- Add a new RSpec regression test to detect starvation/unfair service distribution under oversubscription.
- Add a profiling harness to reproduce and measure pool fairness characteristics outside CI.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| lib/mongo/server/connection_pool.rb | Introduces waiter counters and modifies checkout gate loops to reduce mutex barging/starvation. |
| spec/mongo/server/connection_pool_fairness_spec.rb | Adds a regression spec asserting no checkout timeouts and roughly even per-thread service. |
| profile/connection_pool_fairness.rb | Adds a standalone profiling script to measure fairness and wait-time distributions. |
```diff
 until (max_size == 0 || unavailable_connections < max_size) && !must_wait
   wait = deadline - Utils.monotonic_time
   raise_check_out_timeout!(connection_global_id) if wait <= 0
-  @size_cv.wait(wait)
+  @size_waiters += 1
+  begin
+    @size_cv.wait(wait)
+  ensure
```
Forcing a @size_cv.wait solely because @size_waiters > 0 can create a lost-wakeup/spurious-timeout scenario: a thread may enter the wait even though unavailable_connections < max_size is already true, and if no further @size_cv.signal occurs (e.g., multiple check-ins happened before this thread began waiting, leaving capacity available but no subsequent signals), it will sleep until deadline and raise a checkout timeout despite capacity being available. This fairness mechanism likely needs a handoff/ticket-style turnstile (or some form of waiter-to-waiter signaling) so a “queue tax” wait does not depend on an external signal when the predicate is already satisfied.
Good catch. This is a real lost-wakeup: a newcomer entering during the wake-up window (observing a stale @size_waiters > 0 before the waking waiter has decremented it) can be parked on @size_cv with capacity available, and no further signal is guaranteed.
Fixed in b7b50f2: after a successful checkout updates @checked_out_connections, signal @size_cv if there is a waiter AND the gate predicate is satisfied for them. The predicate guard matters — an unconditional signal would wake a waiter whose predicate still fails, causing it to re-queue itself at the back of the FIFO and break the wait-queue-fairness CMAP spec test.
Reproduction test added in fa55ebd (white-box, instruments the pool CV to force the race window open). Verified it fails on the prior commit and passes after the fix.
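For reference, a minimal sketch of the baton-pass described above, assuming the pool's `@lock` is held at the point the signal is issued. The helper name is hypothetical; in the PR the logic lives inline in the checkout path.

```ruby
# Hypothetical helper condensing the fix in b7b50f2: after a successful
# checkout updates @checked_out_connections, pass the baton to the next
# size-gate waiter only if its predicate would already hold, so it does
# not wake, fail the predicate, and requeue at the back of the FIFO.
def signal_size_gate_if_satisfied
  return if @size_waiters.zero?
  return unless max_size.zero? || unavailable_connections < max_size

  @size_cv.signal
end
```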
```diff
@@ -1353,10 +1375,16 @@ def wait_for_connection(connection_global_id, deadline)
     decrement_connection_requests_and_signal
     raise_check_out_timeout!(connection_global_id)
   end
-  @max_connecting_cv.wait(wait)
+  @max_connecting_waiters += 1
+  begin
```
The same must_wait pattern on @max_connecting_cv can block even when a connection is already available (@available_connections.any?), and can similarly lead to a spurious timeout if the thread starts waiting after the last relevant signal and no further signals occur. Consider ensuring the fairness mechanism does not require an additional CV signal when the gate predicate is already satisfied (e.g., via an explicit FIFO/turnstile handoff among waiters).
Same fix applied to @max_connecting_cv in wait_for_connection. Commit b7b50f2.
Predicate-guarded signal (@available_connections.any? || @pending_connections.length < @max_connecting) avoids the FIFO-reordering side effect that would otherwise surface on the CMAP wait-queue-fairness spec test.
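The analogous guard for the connecting gate, using the predicate quoted above (again, the helper name is hypothetical and the caller is assumed to hold `@lock`):

```ruby
# Hypothetical helper: wake a max_connecting waiter only when the gate
# predicate already holds for it, mirroring the size-gate baton pass.
def signal_max_connecting_gate_if_satisfied
  return if @max_connecting_waiters.zero?
  return unless @available_connections.any? ||
                @pending_connections.length < @max_connecting

  @max_connecting_cv.signal
end
```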
```ruby
# Drain queue into array.
rows = []
rows << results.pop until results.empty?
```
This harness enqueues a full per-operation row [tid, t1, t2, dur, err] into results for the entire run, then drains it into rows. Under fast operations (or longer durations), this can grow to a very large number of Ruby objects and can OOM the process, making the diagnostic unreliable. Consider aggregating statistics online (counts/histograms per thread) and only optionally storing full rows behind an env flag or sampling rate.
Noted. Deferring to a follow-up. At the default workload (200 threads × 30s ≈ 120k rows, ~24 MB of Ruby objects) the harness is fine; it would need online aggregation for multi-minute runs. Since this script lives under profile/ and is run by humans for diagnostic purposes rather than in CI, the memory ceiling is acceptable for now. I will add a header note.
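One possible shape for that follow-up, sketched here with illustrative bucket edges (not part of this PR): keep O(threads) state by aggregating per thread online instead of queueing one row per operation.

```ruby
# Illustrative online aggregation: each worker owns a Stats instance
# (no locking needed) and results are merged after the threads join,
# replacing the shared `results` queue. Bucket edges are arbitrary.
BUCKETS = [0.001, 0.01, 0.1, 1.0, 10.0].freeze # seconds

Stats = Struct.new(:count, :errors, :total, :max, :histogram) do
  def record(duration, error)
    self.count += 1
    self.errors += 1 if error
    self.total += duration
    self.max = duration if duration > max
    histogram[BUCKETS.index { |edge| duration <= edge } || BUCKETS.size] += 1
  end

  def self.fresh
    new(0, 0, 0.0, 0.0, Array.new(BUCKETS.size + 1, 0))
  end
end
```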
```ruby
# satisfied. Without this, re-entering threads (those that just
# checked a connection back in) barge past existing waiters and
# the 195 blocked threads in a 200:5 scenario never wake.
must_wait = @size_waiters > 0
```
The new must_wait gating can block even when max_size == 0 (unlimited pool). With max_size == 0, the size gate should never wait; however the current condition still waits whenever @size_waiters > 0, which can introduce unnecessary blocking and even checkout timeouts in an otherwise-unbounded pool. Consider bypassing the must_wait logic entirely when max_size == 0 (or folding it into must_wait so it is always false in this case).
```diff
-must_wait = @size_waiters > 0
+# Do not apply this gate to unlimited pools, where max_size == 0
+# means there is no size slot to wait for.
+must_wait = max_size != 0 && @size_waiters > 0
```
Applied: must_wait = max_size != 0 && @size_waiters > 0. Commit b7b50f2.
max_connecting does not have an unlimited sentinel (it is always a positive integer), so no parallel change is needed for the second gate.
Copilot review on PR mongodb#3028 identified two correctness bugs in the initial anti-barging fix:

1. Lost wakeup (comments #1/#2): When a successful waiter leaves the size gate, it does not signal the next waiter. A newcomer that arrived during the wake-up window (observing a stale size_waiters > 0 before the waking waiter has decremented it) can be parked on size_cv with capacity already available, and will not be woken by any subsequent event unless an external check-in happens. Symptom: the newcomer waits for the full wait_queue_timeout despite capacity being free, even under light load.

   Fix: after a successful checkout (once @checked_out_connections is updated), signal size_cv if there are waiters AND the gate predicate is satisfied for them. The predicate guard matters: signaling unconditionally would wake a waiter whose predicate still fails, causing it to requeue itself at the back of the FIFO and break the wait-queue-fairness CMAP spec test. The same pattern is mirrored in wait_for_connection for @max_connecting_cv.

2. Unlimited pool (comment mongodb#4): must_wait fired even when max_size == 0, introducing pointless blocking for unbounded pools where there is no size constraint to wait on. Skip the gate in that case.

Validation:
- New white-box regression test reproduces the lost wakeup deterministically by instrumenting size_cv.wait to release the lock during the wake-up window.
- spec/spec_tests/cmap_spec.rb wait-queue-fairness case (CMAP unified test): 16 examples, 0 failures.
- spec/mongo/server/connection_pool_spec.rb and spec/mongo/server/connection_pool/: 115 examples, 0 failures, 3 pending.
- spec/spec_tests/cmap_spec.rb (full): 226 examples, 0 failures.
- profile/connection_pool_fairness.rb at 200:5 ratio: 0 timeouts, even distribution.
Deterministically reproduces the lost-wakeup scenario Copilot identified on PR mongodb#3028 by monkey-patching size_cv.wait on the pool's own CV instance to release the mutex for 300 ms after the first wait returns. That window lets a newcomer enter while @size_waiters is still stale (not yet decremented), queue up on size_cv, and verify it is woken by the baton-pass signal rather than waiting for wait_queue_timeout.

Without the fix: newcomer latency == wait_queue_timeout (test fails). With the fix: newcomer latency << wait_queue_timeout (test passes).
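A sketch of that instrumentation technique, with the pool internals accessed by the instance-variable names used elsewhere in this PR (the actual spec code differs):

```ruby
# Open the race window: after the first wait returns, drop the pool
# lock for 300 ms so a newcomer can observe the stale @size_waiters
# count and park on size_cv before the waking waiter decrements it.
size_cv = pool.instance_variable_get(:@size_cv)
lock    = pool.instance_variable_get(:@lock)

original_wait = size_cv.method(:wait)
window_opened = false

size_cv.define_singleton_method(:wait) do |*args|
  result = original_wait.call(*args)
  unless window_opened
    window_opened = true
    lock.unlock # release the mutex the waiter just re-acquired
    sleep 0.3
    lock.lock
  end
  result
end
```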
Copilot review on PR mongodb#3028 flagged that the harness accumulates every operation into memory. At default settings the memory footprint is modest, but long runs could grow unbounded. Documented in the header with guidance to switch to online aggregation for multi-minute runs.
Summary
Fixes thread starvation in `Mongo::Server::ConnectionPool` when worker threads significantly outnumber the pool size.
Root cause: mutex barging at the checkout predicate gate. A thread returning from `do_check_in` re-enters `retrieve_and_connect_connection` while `unavailable_connections < max_size` is still true, bypasses the wait loop entirely, and takes the freshly released connection before any thread blocked in `@size_cv.wait` can re-acquire `@lock`. With 200 threads and a 5-connection pool, 5 threads form a closed cycle and the other 195 starve for the entire run (each completes 2 ops and hits the wait timeout once).
This is not non-FIFO condition variable behavior: MRI's `::ConditionVariable` signals waiters in insertion order. The barging happens at the mutex re-acquisition boundary, not inside the CV. The full diagnosis and the rationale for the minimal fix (vs. a larger per-waiter-CV redesign) are in the design doc that accompanies this change.
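To make the mechanism concrete, here is a toy illustration (not driver code) of barging at the mutex re-acquisition boundary:

```ruby
# cv.signal makes a waiter runnable, but the waiter still has to win
# the mutex; a thread that just released a slot and immediately takes
# again can grab it first, re-parking the waiter indefinitely.
lock  = Mutex.new
cv    = ConditionVariable.new
slots = 1

take = lambda do
  lock.synchronize do
    cv.wait(lock) while slots.zero? # re-checks predicate after waking
    slots -= 1
  end
end

give_back = lambda do
  lock.synchronize do
    slots += 1
    cv.signal # waiter is runnable but holds nothing yet
  end
  # Calling take.call right here lets this thread see slots == 1 and
  # take the slot before the signaled waiter re-acquires lock: the
  # closed 5-thread cycle described above.
end
```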
Fix
Introduce per-gate waiter counters (`@size_waiters`, `@max_connecting_waiters`) and require any thread that arrives with a non-zero counter to enter the wait loop at least once, regardless of whether the predicate is currently satisfied. Once a thread has served one wait cycle it is eligible on the next iteration.
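Condensed, the size gate now has this shape (simplified from the diffs above; the `max_connecting` gate mirrors it):

```ruby
# Simplified size-gate loop after this change. A thread arriving while
# others wait (must_wait) takes one wait cycle even if capacity looks
# free, so it cannot barge past the FIFO of existing waiters.
must_wait = max_size != 0 && @size_waiters > 0

until (max_size == 0 || unavailable_connections < max_size) && !must_wait
  wait = deadline - Utils.monotonic_time
  raise_check_out_timeout!(connection_global_id) if wait <= 0

  @size_waiters += 1
  begin
    @size_cv.wait(wait)
  ensure
    @size_waiters -= 1
  end

  must_wait = false # one served wait cycle makes this thread eligible
end
```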
Preserves the existing architecture and synchronization primitives; no changes to `do_check_in` signaling or `@connection_requests` counting.

Validation
All existing suites pass against a local replica set:
- `spec/mongo/server/connection_pool_spec.rb` + `connection_pool/` subdir: 114 examples, 0 failures, 3 pending
- `spec/spec_tests/cmap_spec.rb`: 226 examples, 0 failures
- `spec/integration/connection_pool_populator_spec.rb` + `connection_spec.rb`: 23 examples, 0 failures, 10 pending
- `bundle exec rubocop lib/mongo/server/connection_pool.rb spec/mongo/server/connection_pool_fairness_spec.rb`: clean

Changes
- `lib/mongo/server/connection_pool.rb`: add `@size_waiters`/`@max_connecting_waiters` counters; gate both checkout waits on them
- `spec/mongo/server/connection_pool_fairness_spec.rb`: new regression test asserting per-thread service-count ratio and zero checkout timeouts under oversubscription. Verified to fail on the pre-fix code (ratio = 0.0, min = 1, max ≈ 4000) and pass on the fixed code (ratio >= 0.99).
- `profile/connection_pool_fairness.rb`: fairness diagnostic harness kept under `profile/` alongside `driver_bench`. Used to reproduce RUBY-3364 and to validate the fix; not run in default CI.

Test Plan
- Run `profile/connection_pool_fairness.rb` locally and confirm even distribution
- Confirm the `ratio >= 0.5` threshold in the regression spec is appropriate for all CI variants (local observation is `>= 0.99`; 0.5 was chosen to absorb CI noise)