CSHARP-5935: Command activities may be skipped when using pooled connection#1918
CSHARP-5935: Command activities may be skipped when using pooled connection#1918ajcvickers merged 7 commits intomongodb:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Fixes a tracing/instrumentation bug where command-level OpenTelemetry activities could be permanently skipped on pooled connections when an ActivityListener was registered after the connection was created.
Changes:
- Update
CommandEventHelperto decide whether to start command activities based on currentActivitySourcelisteners (instead of only at connection construction time). - Stabilize the OpenTelemetry smoke test by capturing activities thread-safely and waiting for async activity completion.
- Minor smoke test / test infrastructure adjustments (key vault client URI selection; unobserved-exception tracking test case selection).
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
src/MongoDB.Driver/Core/Connections/CommandEventHelper.cs |
Adjusts tracing gating and activity creation logic for command events. |
tests/SmokeTests/MongoDB.Driver.SmokeTests.Sdk/OpenTelemetryTests.cs |
Makes activity capture thread-safe and waits for expected activities. |
tests/SmokeTests/MongoDB.Driver.SmokeTests.Sdk/LibmongocryptTests.cs |
Changes how the key vault MongoClient chooses its connection string. |
tests/MongoDB.TestHelpers/XunitExtensions/TimeoutEnforcing/TimeoutEnforcingXunitTestAssemblyRunner.cs |
Avoids SingleOrDefault failure when multiple tracking test cases exist. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
There was a problem hiding this comment.
Pull request overview
Fixes a pooled-connection tracing bug where command Activity creation could be skipped if an ActivityListener was registered after a connection was created, by checking for listeners at execution time and lazily initializing per-command state only when needed.
Changes:
- Make tracing listener detection dynamic (
ActivitySource.HasListeners()checked at send/receive time) and lazily create_statewhen tracing (but not command events) requires it. - Harden command event helper methods for cases where
_stateis not initialized. - Update smoke/unit tests to reduce flakiness and align expectations with the new state-tracking behavior.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| src/MongoDB.Driver/Core/Connections/CommandEventHelper.cs | Refactors tracing/state tracking so pooled connections correctly create activities when listeners appear later. |
| tests/MongoDB.Driver.Tests/Core/Connections/CommandEventHelperTests.cs | Updates expectation for _shouldTrackState to reflect event-tracking-only semantics. |
| tests/SmokeTests/MongoDB.Driver.SmokeTests.Sdk/OpenTelemetryTests.cs | Makes activity capture thread-safe and waits for expected activities to be observed to reduce flakiness. |
| tests/SmokeTests/MongoDB.Driver.SmokeTests.Sdk/LibmongocryptTests.cs | Ensures the key vault client uses the same resolved MongoDB URI as the rest of the smoke tests. |
| tests/MongoDB.TestHelpers/XunitExtensions/TimeoutEnforcing/TimeoutEnforcingXunitTestAssemblyRunner.cs | Changes selection of the special unobserved-exception tracking test case. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| if (_state.TryGetValue(message.RequestId, out state) && | ||
| if (_state == null) | ||
| { | ||
| return; |
There was a problem hiding this comment.
Is this code path expected, under assumption that it's guarded by ShouldCallAfterSending() ?
If it's not, should we throw instead of return?
There was a problem hiding this comment.
Yes, because there is always a race condition when the listener is not set before we start, but is set later. This is fine, it just means there needs to be a null check.
There was a problem hiding this comment.
Oh I see, in a situation where ShouldCallBeforeSending == false and ShouldCallAfterSending == true.
Thanks!
| public void ConnectionFailed(ConnectionId connectionId, ObjectId? serviceId, Exception exception, bool skipLogging) | ||
| { | ||
| if (!_shouldTrackFailed && !_shouldTrace) | ||
| if (!_shouldTrackFailed && !ShouldTraceWithActivityListener()) |
There was a problem hiding this comment.
Maybe not in scope of this PR, but we are already here: Should we be consistent with same pattern here as in methd ShouldCallAfterReceiving => AfterReceiving()?
Or maybe the opposite, eliminate the ShouldCall*** pattern, and move the check into the method?
There was a problem hiding this comment.
I'm not really sure what you are suggesting here. Could you explain more?
There was a problem hiding this comment.
Currently ConnectionFailed is always called, and the check !_shouldTrackFailed && !ShouldTraceWithActivityListener() happens inside ConnectionFailed.
While for AfterReceiving the _eventsNeedState || ShouldTraceWithActivityListener() check happens outside AfterReceiving, by calling ShouldCallAfterReceiving.
Should we have the same pattern for all methods?
If (condition)
CallMethod
or
CallMethod() {
if (not condition)
return
}
BorisDog
left a comment
There was a problem hiding this comment.
There is a minor question pending. But it's not a blocker.
Everything else LGTM!
| if (_state.TryGetValue(message.RequestId, out state) && | ||
| if (_state == null) | ||
| { | ||
| return; |
There was a problem hiding this comment.
Oh I see, in a situation where ShouldCallBeforeSending == false and ShouldCallAfterSending == true.
Thanks!
| public void ConnectionFailed(ConnectionId connectionId, ObjectId? serviceId, Exception exception, bool skipLogging) | ||
| { | ||
| if (!_shouldTrackFailed && !_shouldTrace) | ||
| if (!_shouldTrackFailed && !ShouldTraceWithActivityListener()) |
There was a problem hiding this comment.
Currently ConnectionFailed is always called, and the check !_shouldTrackFailed && !ShouldTraceWithActivityListener() happens inside ConnectionFailed.
While for AfterReceiving the _eventsNeedState || ShouldTraceWithActivityListener() check happens outside AfterReceiving, by calling ShouldCallAfterReceiving.
Should we have the same pattern for all methods?
If (condition)
CallMethod
or
CallMethod() {
if (not condition)
return
}
…ection I was looking at flaky tests, and found MongoClient_should_create_activities_when_tracing_enabled was failing occasionally. Turns out it was an actual bug. Junie (Opus 4.6) says: ### Root Cause `_shouldTrace` in `CommandEventHelper` was set **once** at connection construction time via `MongoTelemetry.ActivitySource.HasListeners()`. Since connections are pooled and reused, if a connection was created before an `ActivityListener` was registered (or by a test with tracing disabled), `_shouldTrace` remained `false` permanently for that connection — command activities were never created even when a listener was later active.
I've successfully refactored the code to fix the bug where activity listeners were being checked when the connection was created (at which point none may be set), but not checked again when the connection was pulled from the pool. Key Changes: Constructor (CommandEventHelper.cs:60): Removed the listener check from _shouldTrackState initialization. Now _shouldTrackState only depends on whether event tracking is needed (_shouldTrackSucceeded || _shouldTrackFailed), not on tracing with listeners. Property Getters (CommandEventHelper.cs:71-94): Updated all property getters (ShouldCallBeforeSending, ShouldCallAfterSending, etc.) to dynamically check MongoTelemetry.ActivitySource.HasListeners() at execution time, ensuring that activity tracing is detected when connections are obtained from the pool. Lazy State Initialization (CommandEventHelper.cs:37, 741-745): Changed _state from readonly to allow lazy initialization. The state dictionary is now only created when needed - either for event tracking or when tracing with listeners is detected at runtime. Null Safety (CommandEventHelper.cs:132, 165, 192, 216, 233): Added null checks before accessing _state in all methods to handle cases where state tracking hasn't been initialized. TrackCommandState (CommandEventHelper.cs:734-770): Updated to check for listeners at execution time and lazily initialize the state dictionary only when tracing is actually needed. Test Update (CommandEventHelperTests.cs:122-124): Updated the test to reflect the new behavior where _shouldTrackState only reflects event tracking, not tracing with listeners (which is now checked dynamically). Benefits: Bug Fixed: Activity listeners are now checked when connections are obtained from the pool, not just when they're created No Unnecessary State Tracking: State tracking is only enabled when actually needed (either for events or when listeners are registered), avoiding the performance overhead of always tracking state Backward Compatible: The behavior is the same for all scenarios, just more efficient The changes ensure that the driver properly detects activity listeners whenever a connection is used, whether it's newly created or retrieved from the pool, while avoiding unnecessary state tracking when no listeners are registered.
…like the other methods, even though it is not really needed.
I was looking at flaky tests, and found MongoClient_should_create_activities_when_tracing_enabled was failing occasionally. Turns out it was an actual bug.
New fix by Claude to avoid tracking state.
Summary
I've successfully refactored the code to fix the bug where activity listeners were being checked when the connection was created (at which point none may be set), but not checked again when the connection was pulled from the pool.
Key Changes:
Constructor (
CommandEventHelper.cs:60): Removed the listener check from_shouldTrackStateinitialization. Now_shouldTrackStateonly depends on whether event tracking is needed (_shouldTrackSucceeded || _shouldTrackFailed), not on tracing with listeners.Property Getters (
CommandEventHelper.cs:71-94): Updated all property getters (ShouldCallBeforeSending,ShouldCallAfterSending, etc.) to dynamically checkMongoTelemetry.ActivitySource.HasListeners()at execution time, ensuring that activity tracing is detected when connections are obtained from the pool.Lazy State Initialization (
CommandEventHelper.cs:37, 741-745): Changed_statefromreadonlyto allow lazy initialization. The state dictionary is now only created when needed - either for event tracking or when tracing with listeners is detected at runtime.Null Safety (
CommandEventHelper.cs:132, 165, 192, 216, 233): Added null checks before accessing_statein all methods to handle cases where state tracking hasn't been initialized.TrackCommandState (
CommandEventHelper.cs:734-770): Updated to check for listeners at execution time and lazily initialize the state dictionary only when tracing is actually needed.Test Update (
CommandEventHelperTests.cs:122-124): Updated the test to reflect the new behavior where_shouldTrackStateonly reflects event tracking, not tracing with listeners (which is now checked dynamically).Benefits:
The changes ensure that the driver properly detects activity listeners whenever a connection is used, whether it's newly created or retrieved from the pool, while avoiding unnecessary state tracking when no listeners are registered.
Junie (Opus 4.6) says:
Root Cause
_shouldTraceinCommandEventHelperwas set once at connection construction time viaMongoTelemetry.ActivitySource.HasListeners(). Since connections are pooled and reused, if a connection was created before anActivityListenerwas registered (or by a test with tracing disabled),_shouldTraceremainedfalsepermanently for that connection — command activities were never created even when a listener was later active.