Commit 9a101de

DRIVERS-3404 - Server selection deprioritization only for overload er… (#1900)
1 parent 99704fa commit 9a101de

3 files changed

Lines changed: 93 additions & 6 deletions

source/retryable-reads/retryable-reads.md

Lines changed: 15 additions & 3 deletions
@@ -207,8 +207,14 @@ capture this original retryable error. Drivers should then proceed with selectin

 ###### 3a. Selecting the server for retry

-The server address on which the operation failed MUST be provided to the server selection mechanism as a member of the
-deprioritized server address list.
+For sharded clusters, the server address on which the operation failed MUST be provided to the server selection
+mechanism as a member of the deprioritized server address list.
+
+For all other topologies, the server address on which the operation failed MUST be provided to the server selection
+mechanism as a member of the deprioritized server address list only if the error is labelled with
+`SystemOverloadedError`. All other retryable errors MUST NOT cause the server address to be added to the deprioritized
+server address list. This requirement preserves the existing behavior of retryable reads for non-overload errors and
+avoids unintended consequences for operations utilizing primaryPreferred and secondaryPreferred read preferences.

 If the driver cannot select a server for a retry attempt or the newly selected server does not support retryable reads,
 retrying is not possible and drivers MUST raise the previous retryable error. In both cases, the caller is able to infer
@@ -295,7 +301,11 @@ function executeRetryableRead(command, session) {
       } else {
         // If a previous attempt was made, deprioritize the previous server address
         // where the command failed.
-        deprioritizedServers.push(previousServer.address);
+        // Sharded clusters deprioritize on all retryable errors.
+        // Other topologies only deprioritize on overload errors.
+        if (previousServer.isSharded || previousError.hasLabel("SystemOverloadedError")) {
+          deprioritizedServers.push(previousServer.address);
+        }
         server = selectServer(deprioritizedServers);
       }
     } catch (ServerSelectionException exception) {
@@ -558,6 +568,8 @@ any customers experiencing degraded performance can simply disable `retryableRea

 ## Changelog

+- 2026-02-19: Clarified that server deprioritization on replica sets only occurs for `SystemOverloadedError` errors.
+
 - 2025-12-08: Clarified that server deprioritization during retries must use a list of server addresses.

 - 2024-04-30: Migrated from reStructuredText to Markdown.
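
The deprioritization rule added to `retryable-reads.md` in this commit can be sketched as a small predicate. This is a minimal illustration, not driver code: `shouldDeprioritize`, `nextDeprioritizedServers`, the `topologyType` strings, and the `errorLabels` array on the error object are hypothetical names chosen for the example. Only the `SystemOverloadedError` label and the sharded/non-sharded split come from the diff.

```javascript
// Hypothetical helper: decides whether the address of the server on which
// the operation failed joins the deprioritized server address list.
function shouldDeprioritize(topologyType, error) {
  // Sharded clusters deprioritize on every retryable error.
  if (topologyType === "Sharded") {
    return true;
  }
  // All other topologies deprioritize only when the error carries the
  // SystemOverloadedError label.
  return error.errorLabels.includes("SystemOverloadedError");
}

// Returns the list to hand to server selection for the retry attempt.
// The input list is left unmodified.
function nextDeprioritizedServers(deprioritized, topologyType, failedAddress, error) {
  return shouldDeprioritize(topologyType, error)
    ? [...deprioritized, failedAddress]
    : deprioritized;
}
```

Under these assumptions, a replica set read that fails with only a `RetryableError` label leaves the list unchanged, so a `primaryPreferred` retry can still target the primary.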

source/retryable-reads/tests/README.md

Lines changed: 66 additions & 0 deletions
@@ -123,8 +123,74 @@ This test MUST be executed against a sharded cluster that supports `retryReads=t

 7. Disable the fail point on `s0`.

+### 3. Retrying Reads in a Replica Set
+
+These tests will be used to ensure drivers properly retry reads against a replica set.
+
+#### 3.1 Retryable Reads Caused by Overload Errors Are Retried on a Different Replicaset Server When One is Available
+
+This test MUST be executed against a MongoDB 4.4+ replica set that has at least one secondary, supports
+`retryReads=true`, and has enabled the `configureFailPoint` command with the `errorLabels` option.
+
+1. Create a client `client` with `retryReads=true`, `readPreference=primaryPreferred`, and command event monitoring
+   enabled.
+
+2. Configure the following fail point for `client`:
+
+   ```javascript
+   {
+       configureFailPoint: "failCommand",
+       mode: { times: 1 },
+       data: {
+           failCommands: ["find"],
+           errorLabels: ["RetryableError", "SystemOverloadedError"],
+           errorCode: 6
+       }
+   }
+   ```
+
+3. Reset the command event monitor to clear the failpoint command from its stored events.
+
+4. Execute a `find` command with `client`.
+
+5. Assert that one failed command event and one successful command event occurred.
+
+6. Assert that both events occurred on different servers.
+
+#### 3.2 Retryable Reads Caused by Non-Overload Errors Are Retried on the Same Replicaset Server
+
+This test MUST be executed against a MongoDB 4.4+ replica set that has at least one secondary, supports
+`retryReads=true`, and has enabled the `configureFailPoint` command with the `errorLabels` option.
+
+1. Create a client `client` with `retryReads=true`, `readPreference=primaryPreferred`, and command event monitoring
+   enabled.
+
+2. Configure the following fail point for `client`:
+
+   ```javascript
+   {
+       configureFailPoint: "failCommand",
+       mode: { times: 1 },
+       data: {
+           failCommands: ["find"],
+           errorLabels: ["RetryableError"],
+           errorCode: 6
+       }
+   }
+   ```
+
+3. Reset the command event monitor to clear the failpoint command from its stored events.
+
+4. Execute a `find` command with `client`.
+
+5. Assert that one failed command event and one successful command event occurred.
+
+6. Assert that both events occurred on the same server.
+
 ## Changelog

+- 2026-02-19: Add prose tests for retrying against a replica set.
+
 - 2024-04-30: Migrated from reStructuredText to Markdown.

 - 2024-03-06: Convert legacy retryable reads tests to unified format.
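
Steps 5 and 6 of prose tests 3.1 and 3.2 differ only in whether the retry is expected to land on the same server. A rough sketch of that shared assertion follows. The event shape (`type` and `address` fields) and the `checkRetryEvents` helper are assumptions made for illustration; real drivers expose command monitoring events through their own APIs.

```javascript
// Hypothetical event shape: { type: "failed" | "succeeded", address: "host:port" }.
// Returns true when exactly one failed and one succeeded command event were
// captured and their addresses match the expectation for the test.
function checkRetryEvents(events, expectSameServer) {
  const failed = events.filter((e) => e.type === "failed");
  const succeeded = events.filter((e) => e.type === "succeeded");
  if (failed.length !== 1 || succeeded.length !== 1) {
    return false;
  }
  const sameServer = failed[0].address === succeeded[0].address;
  return sameServer === expectSameServer;
}
```

Test 3.1 (overload error) would call this with `expectSameServer` set to `false`, and test 3.2 (non-overload error) with `true`.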

source/retryable-writes/retryable-writes.md

Lines changed: 12 additions & 3 deletions
@@ -317,8 +317,11 @@ Drivers MUST then retry the operation as many times as necessary until any one o

 - CSOT is not enabled and one retry was attempted.

-For each retry attempt, drivers MUST select a writable server. The server address on which the operation failed MUST be
-provided to the server selection mechanism as a member of the deprioritized server address list.
+For each retry attempt, drivers MUST select a writable server. For sharded clusters, the server address on which the
+operation failed MUST be provided to the server selection mechanism as a member of the deprioritized server address
+list. For all other topologies, the server address on which the operation failed MUST be provided to the server
+selection mechanism as a member of the deprioritized server address list only if the error is labelled with
+`SystemOverloadedError`. This requirement preserves the existing behavior of retryable writes for non-overload errors.

 If the driver cannot select a server for a retry attempt or the selected server does not support retryable writes,
 retrying is not possible and drivers MUST raise the retryable error from the previous attempt. In both cases, the caller
@@ -436,8 +439,12 @@ function executeRetryableWrite(command, session) {
      * If we cannot select a writable server, do not proceed with retrying and
      * throw the previous error. The caller can then infer that an attempt was
      * made and failed. */
+    // Sharded clusters deprioritize on all retryable errors.
+    // Other topologies only deprioritize on overload errors.
     try {
-      deprioritizedServers.push(server.address);
+      if (server.isSharded || previousError.hasLabel("SystemOverloadedError")) {
+        deprioritizedServers.push(server.address);
+      }
       server = selectServer("writable", deprioritizedServers);
     } catch (Exception ignoredError) {
       throw previousError;
@@ -693,6 +700,8 @@ retryWrites is not true would be inconsistent with the server and potentially co

 ## Changelog

+- 2026-02-19: Clarified that server deprioritization on replica sets only occurs for `SystemOverloadedError` errors.
+
 - 2026-01-14: Clarify which error to return when more than one error with the `NoWritesPerformed` label is encountered.

 - 2025-12-08: Clarified that server deprioritization during retries must use a list of server addresses.
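
To see how a deprioritized server address list might interact with writable-server selection, consider the toy selector below. It is a sketch under stated assumptions, not the server selection algorithm: the `selectWritableServer` name and the `{ address, writable }` server shape are hypothetical, picking the first candidate stands in for the real selection logic, and the fallback to the full candidate list when every candidate is deprioritized is an assumption of this example rather than something stated in the diff.

```javascript
// Hypothetical server shape: { address: "host:port", writable: boolean }.
function selectWritableServer(servers, deprioritizedServers) {
  const writable = servers.filter((s) => s.writable);
  // Prefer servers that are not on the deprioritized list.
  const preferred = writable.filter(
    (s) => !deprioritizedServers.includes(s.address)
  );
  // Assumption of this sketch: if every writable server is deprioritized,
  // fall back to the full writable set instead of failing.
  const candidates = preferred.length > 0 ? preferred : writable;
  if (candidates.length === 0) {
    throw new Error("no writable server available");
  }
  // Simplification: take the first candidate.
  return candidates[0];
}
```

With two writable mongos routers and the failed one deprioritized, the retry goes to the other router. With a single writable primary, this sketch returns the primary again whether or not it is deprioritized.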
