Commit 1b15cd2

DRIVERS-3427: Finalize client backpressure implementation for phase 1 rollout (#1919)
1 parent 6650c98 commit 1b15cd2

24 files changed

Lines changed: 424 additions & 2404 deletions

source/client-backpressure/client-backpressure.md

Lines changed: 62 additions & 90 deletions
Original file line number | Diff line number | Diff line change
@@ -122,7 +122,9 @@ rules:
122122
2. A retry attempt will only be permitted if:
123123
1. The error is a retryable overload error.
124124
2. We have not reached `MAX_RETRIES`.
125-
- The value of `MAX_RETRIES` is 5 and non-configurable.
125+
- The default value of `MAX_RETRIES` is 2. Drivers MUST expose `maxAdaptiveRetries` as a configurable option for
126+
this maximum. In the future, this default value or the default behavior of the driver may change without being
127+
considered a breaking change.
126128
- This intentionally changes the behavior of CSOT which otherwise would retry an unlimited number of times within
127129
the timeout to avoid retry storms.
128130
3. (CSOT-only): There is still time for a retry attempt according to the
@@ -133,20 +135,17 @@ rules:
133135
- To retry `runCommand`, both [retryWrites](../retryable-writes/retryable-writes.md#retrywrites) and
134136
[retryReads](../retryable-reads/retryable-reads.md#retryreads) MUST be enabled. See
135137
[Why must both `retryWrites` and `retryReads` be enabled to retry runCommand?](client-backpressure.md#why-must-both-retrywrites-and-retryreads-be-enabled-to-retry-runcommand)
136-
3. If the request is eligible for retry (as outlined in step 2 above and step 4 in the
137-
[adaptive retry requirements](client-backpressure.md#adaptive-retry-requirements) below), the client MUST apply
138-
exponential backoff according to the following formula:
139-
`backoff = jitter * min(MAX_BACKOFF, BASE_BACKOFF * 2^(attempt - 1))`
138+
3. If the request is eligible for retry (as outlined in step 2 above), the client MUST apply exponential backoff
139+
according to the following formula: `backoff = jitter * min(MAX_BACKOFF, BASE_BACKOFF * 2^(attempt - 1))`
140140
- `jitter` is a random jitter value between 0 and 1.
141141
- `BASE_BACKOFF` is constant 100ms.
142142
- `MAX_BACKOFF` is 10000ms.
143-
- This results in delays of 100ms, 200ms, 400ms, 800ms, and 1600ms before accounting for jitter.
144-
4. If the request is eligible for retry (as outlined in step 2 above and step 4 in the
145-
[adaptive retry requirements](client-backpressure.md#adaptive-retry-requirements) below), the client MUST add the
146-
previously used server's address to the list of deprioritized server addresses for
147-
[server selection](../server-selection/server-selection.md).
148-
5. If the request is eligible for retry (as outlined in step 2 above and step 4 in the
149-
[adaptive retry requirements](client-backpressure.md#adaptive-retry-requirements) below) and is a retryable write:
143+
- This results in delays of 100ms and 200ms before accounting for jitter.
144+
4. If the request is eligible for retry (as outlined in step 2 above) and `enableOverloadRetargeting` is enabled, the
145+
client MUST add the previously used server's address to the list of deprioritized server addresses for
146+
[server selection](../server-selection/server-selection.md). Drivers MUST expose `enableOverloadRetargeting` as a
147+
configurable boolean option that defaults to `false`.
148+
5. If the request is eligible for retry (as outlined in step 2 above) and is a retryable write:
150149
1. If the command is a part of a transaction, the instructions for command modification on retry for commands in
151150
transactions MUST be followed, as outlined in the
152151
[transactions](../transactions/transactions.md#interaction-with-retryable-writes) specification.
@@ -159,26 +158,12 @@ rules:
159158
specifications.
160159
- For the purposes of error propagation, `runCommand` is considered a write.
161160

162-
##### Adaptive retry requirements
163-
164-
If adaptive retries are enabled, the following rules MUST also be obeyed:
165-
166-
1. If the command succeeds on the first attempt, drivers MUST deposit `RETRY_TOKEN_RETURN_RATE` tokens.
167-
- The value is 0.1 and non-configurable.
168-
2. If the command succeeds on a retry attempt, drivers MUST deposit `RETRY_TOKEN_RETURN_RATE`+1 tokens.
169-
3. If a retry attempt fails with an error that is not an overload error, drivers MUST deposit 1 token.
170-
- An error that does not contain the `SystemOverloadedError` error label indicates that the server is healthy enough
171-
to handle requests. For the purposes of retry budget tracking, this counts as a success.
172-
4. A retry attempt will only be permitted if a token can be consumed from the token bucket.
173-
5. A retry attempt consumes 1 token from the token bucket.
174-
175161
#### Interaction with Other Retry Policies
176162

177163
The retry policy in this specification is separate from the other retry policies defined in the
178164
[retryable reads](../retryable-reads/retryable-reads.md) and [retryable writes](../retryable-writes/retryable-writes.md)
179165
specifications. Drivers MUST ensure:
180166

181-
- Only overload errors consume tokens from the token bucket before retrying.
182167
- When a failed attempt is retried, backoff MUST be applied if and only if the error is an overload error.
183168
- If an overload error is encountered:
184169
- Regardless of whether CSOT is enabled or not, the maximum number of retries for any retry policy becomes
@@ -198,8 +183,7 @@ included, such as error handling with `NoWritesPerformed` labels.
198183
BASE_BACKOFF = 0.1 # 100ms
199184
MAX_BACKOFF = 10 # 10000ms
200185

201-
RETRY_TOKEN_RETURN_RATE = 0.1
202-
MAX_RETRIES = 5
186+
MAX_RETRIES = 2
203187

204188
def execute_command_retryable(command, ...):
205189
deprioritized_servers = []
@@ -211,23 +195,13 @@ def execute_command_retryable(command, ...):
211195
server = select_server(deprioritized_servers)
212196
connection = server.getConnection()
213197
res = execute_command(connection, command)
214-
if adaptive_retry:
215-
# Deposit tokens into the bucket on success.
216-
tokens = RETRY_TOKEN_RETURN_RATE
217-
if attempt > 0:
218-
tokens += 1
219-
token_bucket.deposit(tokens)
220198
return res
221199
except PyMongoError as exc:
222200
is_retryable = (is_retryable_write(command, exc)
223201
or is_retryable_read(command, exc)
224202
or (exc.contains_error_label("RetryableError") and exc.contains_error_label("SystemOverloadedError")))
225203
is_overload = exc.contains_error_label("SystemOverloadedError")
226204

227-
# if a retry fails with an error which is not an overload error, deposit 1 token
228-
if adaptive_retry and attempt > 0 and not is_overload:
229-
token_bucket.deposit(1)
230-
231205
# Raise if the error is non-retryable.
232206
if not is_retryable:
233207
raise
@@ -238,8 +212,10 @@ def execute_command_retryable(command, ...):
238212

239213
if attempt > allowed_retries:
240214
raise
241-
242-
deprioritized_servers.append(server.address)
215+
216+
# enableOverloadRetargeting is true
217+
if overload_retargeting:
218+
deprioritized_servers.append(server.address)
243219

244220
if is_overload:
245221
jitter = random.random() # Random float between [0.0, 1.0).
@@ -250,59 +226,9 @@ def execute_command_retryable(command, ...):
250226
if time.monotonic() + backoff > _csot.get_deadline():
251227
raise
252228

253-
if adaptive_retry and not token_bucket.consume(1):
254-
raise
255-
256229
time.sleep(backoff)
257230
```
258231

259-
### Token Bucket
260-
261-
The overload retry policy introduces an opt-in per-client [token bucket](https://en.wikipedia.org/wiki/Token_bucket) to
262-
limit overload error retry attempts. Although the server rejects excess commands as quickly as possible, doing so costs
263-
CPU and creates extra contention on the connection pool which can eventually negatively affect goodput. To reduce this
264-
risk, the token bucket will limit retry attempts during a prolonged overload.
265-
266-
The token bucket MUST be disabled by default and can be enabled through the
267-
[adaptiveRetries=True](../uri-options/uri-options.md) connection and client options.
268-
269-
The token bucket starts at its maximum capacity of 1000 for consistency with the server.
270-
271-
Each MongoClient instance MUST have its own token bucket. When adaptive retries are enabled, the token bucket MUST be
272-
created when the MongoClient is initialized and exist for the lifetime of the MongoClient. Drivers MUST ensure the token
273-
bucket implementation is thread-safe as it may be accessed concurrently by multiple operations.
274-
275-
#### Pseudocode
276-
277-
The token bucket is implemented via a thread safe counter. For languages without atomics, this can be implemented via a
278-
lock, for example:
279-
280-
```python
281-
DEFAULT_RETRY_TOKEN_CAPACITY = 1000
282-
class TokenBucket:
283-
"""A token bucket implementation for rate limiting."""
284-
def __init__(
285-
self,
286-
capacity: float = DEFAULT_RETRY_TOKEN_CAPACITY,
287-
):
288-
self.lock = Lock()
289-
self.capacity = capacity
290-
self.tokens = capacity
291-
292-
def consume(self, n: float) -> bool:
293-
"""Consume n tokens from the bucket if available."""
294-
with self.lock:
295-
if self.tokens >= n:
296-
self.tokens -= n
297-
return True
298-
return False
299-
300-
def deposit(self, n: float) -> None:
301-
"""Deposit n tokens back into the bucket."""
302-
with self.lock:
303-
self.tokens = min(self.capacity, self.tokens + n)
304-
```
305-
306232
#### Handshake changes
307233

308234
Drivers conforming to this spec MUST add `"backpressure": True` to the
@@ -466,8 +392,54 @@ Additionally, both `retryReads` and `retryWrites` are enabled by default, so for
466392
retried. This approach also prevents accidentally retrying a read command when only `retryWrites` is enabled, or
467393
retrying a write command when only `retryReads` is enabled.
468394

395+
### Why make `maxAdaptiveRetries` configurable?
396+
397+
Modelling and the underpinning theory for backpressure shows that the n-retries approach (retry up to N times on
398+
overload errors without a token bucket) can introduce retry storms as overload increases. However, the specifics of the
399+
workload and cluster serving that workload significantly impacts the threshold at which retry volume becomes an
400+
additional burden rather than a throughput improvement. Some applications and clusters may be very tolerant of many
401+
additional retries, while others may want to break out of the loop much earlier.
402+
403+
The selection of 2 as a default attempts to broadly pick a sensible default for most users that will on average be a
404+
benefit rather than a negative during overload. However, savvy users, the users expected to be most affected by overload
405+
and have the most insight into the specifics of their workload and cluster, will likely find that tweaking this value on
406+
a per-workload basis produces better results. Additionally, there are situations where disabling overload retries
407+
entirely is optimal, such as non-critical workloads against a cluster shared with critical workloads. Without a knob,
408+
those situations will cause users to either have a strictly worse experience with a new driver, or force them to
409+
downgrade to an older driver to avoid the issue. These are two strong motivations to add a knob for
410+
`maxAdaptiveRetries`.
411+
412+
### Why make `enableOverloadRetargeting` configurable?
413+
414+
The current contract we've made with users utilizing `primaryPreferred` is that reads will only go to a secondary if the
415+
primary is unavailable. The documentation does not explicitly define unavailable, but in practice that means the primary
416+
is unselectable. Overload retargeting makes the primary unselectable for a retry operation if it returned an overload
417+
error on a previous attempt. This materially changes how often secondary reads occur. Since secondary reads can result
418+
in stale data, enabling overload retargeting increases the chance that users of `primaryPreferred` will get stale data
419+
when they did not previously. This is a potentially significant change in expected behavior. Therefore, overload
420+
retargeting is disabled by default with a knob to enable it.
421+
422+
Overload retargeting significantly increases availability during overload, but it does increase the risk of getting
423+
stale data when used with `primaryPreferred`. Users of `primaryPreferred` may widely end up preferring that behavior. If
424+
that is the case, overload retargeting may be enabled by default in the future.
425+
426+
`secondaryPreferred` does not have this same staleness issue, but it still materially changes what the preference means
427+
from "almost always secondary" to "sometimes primary".
428+
429+
Note that for sharded clusters, drivers always attempt to retarget across `mongos` instances on all retryable errors,
430+
including overload errors, regardless of how `enableOverloadRetargeting` is set. `mongos` has a separate flag to
431+
retarget overload errors within shards that is independent of the driver's configuration.
432+
433+
Alternative design considered: Overload retargeting could have been implemented as a read preference option rather than
434+
a client-level option. This would allow more granular control: enabling retargeting for  specific operations, databases,
435+
or collections. However, a read preference option would require server changes to recognize the new field, and would add
436+
another dimension to read preference selection that users need to reason about. A client-level binary setting is simpler
437+
to understand and configure.
438+
469439
## Changelog
470440

441+
- 2026-03-30: Introduce phase 1 support without token buckets.
442+
471443
- 2026-02-20: Disable token buckets by default.
472444

473445
- 2026-01-09: Initial version.
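
Reviewer note: the backoff rule in the `client-backpressure.md` changes above can be checked with a small sketch. This is an illustration only, not driver code. The constants come from the spec text (`BASE_BACKOFF` 100ms, `MAX_BACKOFF` 10000ms, new default `MAX_RETRIES` of 2); the helper names `backoff_ceiling` and `backoff_with_jitter` are hypothetical and do not appear in the spec.

```python
import random

BASE_BACKOFF = 0.1  # seconds (100ms), from the spec text
MAX_BACKOFF = 10.0  # seconds (10000ms), from the spec text
MAX_RETRIES = 2     # new default introduced by this commit


def backoff_ceiling(attempt: int) -> float:
    """Delay before retry number `attempt` (1-based), before jitter is applied."""
    return min(MAX_BACKOFF, BASE_BACKOFF * 2 ** (attempt - 1))


def backoff_with_jitter(attempt: int, rng: random.Random) -> float:
    """Apply the spec formula: jitter * min(MAX_BACKOFF, BASE_BACKOFF * 2^(attempt - 1))."""
    jitter = rng.random()  # random float in [0.0, 1.0)
    return jitter * backoff_ceiling(attempt)


# Pre-jitter delays for the default retry budget.
ceilings = [backoff_ceiling(a) for a in range(1, MAX_RETRIES + 1)]
```

With the default of 2 retries, the pre-jitter delays are 100ms and 200ms, which matches the updated bullet in the diff and the 0.3-second total used by the revised timing test. The `MAX_BACKOFF` cap only matters if a user raises `maxAdaptiveRetries` well beyond the default.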

source/client-backpressure/tests/README.md

Lines changed: 20 additions & 29 deletions
@@ -14,7 +14,8 @@ be manually implemented by each driver.
1414

1515
#### Test 1: Operation Retry Uses Exponential Backoff
1616

17-
Drivers should test that retries do not occur immediately when a SystemOverloadedError is encountered.
17+
Drivers should test that retries do not occur immediately when a SystemOverloadedError is encountered. This test MUST be
18+
executed against a MongoDB 4.4+ server that has enabled the `configureFailPoint` command with the `errorLabels` option.
1819

1920
1. Let `client` be a `MongoClient`
2021
2. Let `collection` be a collection
@@ -49,29 +50,19 @@ Drivers should test that retries do not occur immediately when a SystemOverloade
4950

5051
5. Execute step 3 again.
5152

52-
6. Compare the two time between the two runs.
53+
6. Compare the time between the two runs.
5354

5455
```python
55-
assertTrue(with_backoff_time - no_backoff_time >= 2.1)
56+
assertTrue(absolute_value(with_backoff_time - (no_backoff_time + 0.3 seconds)) < 0.3 seconds)
5657
```
5758

58-
The sum of 5 backoffs is 3.1 seconds. There is a 1-second window to account for potential variance between the two
59-
runs.
60-
61-
#### Test 2: Token Bucket Capacity is Enforced
62-
63-
Drivers should test that retry token buckets are created at their maximum capacity and that that capacity is enforced.
64-
65-
1. Let `client` be a `MongoClient` with `adaptiveRetries=True`.
66-
2. Assert that the client's retry token bucket is at full capacity and that the capacity is
67-
`DEFAULT_RETRY_TOKEN_CAPACITY`.
68-
3. Using `client`, execute a successful `ping` command.
69-
4. Assert that the successful command did not increase the number of tokens in the bucket above
70-
`DEFAULT_RETRY_TOKEN_CAPACITY`.
59+
The sum of 2 backoffs is 0.3 seconds. There is a 0.3-second window to account for potential variance between the
60+
two runs.
7161

7262
#### Test 3: Overload Errors are Retried a Maximum of MAX_RETRIES times
7363

74-
Drivers should test that without adaptive retries enabled, overload errors are retried a maximum of five times.
64+
Drivers should test that overload errors are retried a maximum of MAX_RETRIES times. This test MUST be executed against
65+
a MongoDB 4.4+ server that has enabled the `configureFailPoint` command with the `errorLabels` option.
7566

7667
1. Let `client` be a `MongoClient` with command event monitoring enabled.
7768

@@ -95,24 +86,24 @@ Drivers should test that without adaptive retries enabled, overload errors are r
9586

9687
5. Assert that the raised error contains both the `RetryableError` and `SystemOverloadedError` error labels.
9788

98-
6. Assert that the total number of started commands is MAX_RETRIES + 1 (6).
89+
6. Assert that the total number of started commands is MAX_RETRIES + 1 (3).
9990

100-
#### Test 4: Adaptive Retries are Limited by Token Bucket Tokens
91+
#### Test 4: Overload Errors are Retried a Maximum of maxAdaptiveRetries times when configured
10192

102-
Drivers should test that when enabled, adaptive retries are limited by the number of tokens in the bucket.
93+
Drivers should test that overload errors are retried a maximum of `maxAdaptiveRetries` times, when configured. This test
94+
MUST be executed against a MongoDB 4.4+ server that has enabled the `configureFailPoint` command with the `errorLabels`
95+
option.
10396

104-
1. Let `client` be a `MongoClient` with `adaptiveRetries=True` and command event monitoring enabled.
97+
1. Let `client` be a `MongoClient` with `maxAdaptiveRetries=1` and command event monitoring enabled.
10598

106-
2. Set `client`'s retry token bucket to have 2 tokens.
107-
108-
3. Let `coll` be a collection.
99+
2. Let `coll` be a collection.
109100

110-
4. Configure the following failpoint:
101+
3. Configure the following failpoint:
111102

112103
```javascript
113104
{
114105
configureFailPoint: 'failCommand',
115-
mode: {times: 3},
106+
mode: 'alwaysOn',
116107
data: {
117108
failCommands: ['find'],
118109
errorCode: 462, // IngressRequestRateLimitExceeded
@@ -121,8 +112,8 @@ Drivers should test that when enabled, adaptive retries are limited by the numbe
121112
}
122113
```
123114

124-
5. Perform a find operation with `coll` that fails.
115+
4. Perform a find operation with `coll` that fails.
125116

126-
6. Assert that the raised error contains both the `RetryableError` and `SystemOverloadedError` error labels.
117+
5. Assert that the raised error contains both the `RetryableError` and `SystemOverloadedError` error labels.
127118

128-
7. Assert that the total number of started commands is 3: one for the initial attempt and two for the retries.
119+
6. Assert that the total number of started commands is `maxAdaptiveRetries` + 1 (2).
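
Reviewer note: the attempt counts asserted in Tests 3 and 4 above follow directly from the retry loop. The sketch below simulates that loop without a server, assuming every attempt fails with an overload error (as the `alwaysOn` failpoint arranges). `OverloadError`, `run_with_overload_retries`, and `count_started_commands` are hypothetical names used only for illustration; jitter and CSOT deadline checks are omitted.

```python
class OverloadError(Exception):
    """Hypothetical stand-in for an error carrying both the RetryableError
    and SystemOverloadedError labels."""


def run_with_overload_retries(operation, max_adaptive_retries=2, sleep=lambda s: None):
    """Retry `operation` on OverloadError up to `max_adaptive_retries` times."""
    attempt = 0
    while True:
        try:
            return operation()
        except OverloadError:
            attempt += 1
            if attempt > max_adaptive_retries:
                raise
            # Pre-jitter backoff; jitter is left out so the delays are deterministic.
            sleep(min(10.0, 0.1 * 2 ** (attempt - 1)))


def count_started_commands(max_adaptive_retries):
    """Count attempts made when every attempt fails with an overload error."""
    started = []

    def always_overloaded():
        started.append("find")
        raise OverloadError()

    try:
        run_with_overload_retries(always_overloaded, max_adaptive_retries)
    except OverloadError:
        pass
    return len(started)
```

Under these assumptions, `count_started_commands(2)` gives the 3 started commands expected by Test 3 and `count_started_commands(1)` gives the 2 expected by Test 4: one initial attempt plus the permitted retries.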

source/client-backpressure/tests/backpressure-connection-checkin.json

Lines changed: 0 additions & 18 deletions
@@ -85,24 +85,6 @@
8585
"client": "client",
8686
"eventType": "cmap",
8787
"events": [
88-
{
89-
"connectionCheckedOutEvent": {}
90-
},
91-
{
92-
"connectionCheckedInEvent": {}
93-
},
94-
{
95-
"connectionCheckedOutEvent": {}
96-
},
97-
{
98-
"connectionCheckedInEvent": {}
99-
},
100-
{
101-
"connectionCheckedOutEvent": {}
102-
},
103-
{
104-
"connectionCheckedInEvent": {}
105-
},
10688
{
10789
"connectionCheckedOutEvent": {}
10890
},

source/client-backpressure/tests/backpressure-connection-checkin.yml

Lines changed: 0 additions & 7 deletions
@@ -58,10 +58,3 @@ tests:
5858
- connectionCheckedInEvent: {}
5959
- connectionCheckedOutEvent: {}
6060
- connectionCheckedInEvent: {}
61-
- connectionCheckedOutEvent: {}
62-
- connectionCheckedInEvent: {}
63-
- connectionCheckedOutEvent: {}
64-
- connectionCheckedInEvent: {}
65-
- connectionCheckedOutEvent: {}
66-
- connectionCheckedInEvent: {}
67-
