Merged
152 changes: 62 additions & 90 deletions source/client-backpressure/client-backpressure.md
@@ -122,7 +122,9 @@ rules:
2. A retry attempt will only be permitted if:
1. The error is a retryable overload error.
2. We have not reached `MAX_RETRIES`.
- The value of `MAX_RETRIES` is 5 and non-configurable.
- The default value of `MAX_RETRIES` is 2. Drivers MUST expose `maxAdaptiveRetries` as a configurable option for
this maximum. In the future, this default value or the default behavior of the driver may change without being
considered a breaking change.
- To avoid retry storms, this intentionally changes the behavior of CSOT, which would otherwise retry an unlimited
number of times within the timeout.
3. (CSOT-only): There is still time for a retry attempt according to the
@@ -133,20 +135,17 @@ rules:
- To retry `runCommand`, both [retryWrites](../retryable-writes/retryable-writes.md#retrywrites) and
[retryReads](../retryable-reads/retryable-reads.md#retryreads) MUST be enabled. See
[Why must both `retryWrites` and `retryReads` be enabled to retry runCommand?](client-backpressure.md#why-must-both-retrywrites-and-retryreads-be-enabled-to-retry-runcommand)
3. If the request is eligible for retry (as outlined in step 2 above and step 4 in the
[adaptive retry requirements](client-backpressure.md#adaptive-retry-requirements) below), the client MUST apply
exponential backoff according to the following formula:
`backoff = jitter * min(MAX_BACKOFF, BASE_BACKOFF * 2^(attempt - 1))`
3. If the request is eligible for retry (as outlined in step 2 above), the client MUST apply exponential backoff
according to the following formula: `backoff = jitter * min(MAX_BACKOFF, BASE_BACKOFF * 2^(attempt - 1))`
- `jitter` is a random jitter value between 0 and 1.
- `BASE_BACKOFF` is constant 100ms.
- `MAX_BACKOFF` is 10000ms.
- This results in delays of 100ms, 200ms, 400ms, 800ms, and 1600ms before accounting for jitter.
4. If the request is eligible for retry (as outlined in step 2 above and step 4 in the
[adaptive retry requirements](client-backpressure.md#adaptive-retry-requirements) below), the client MUST add the
previously used server's address to the list of deprioritized server addresses for
[server selection](../server-selection/server-selection.md).
5. If the request is eligible for retry (as outlined in step 2 above and step 4 in the
[adaptive retry requirements](client-backpressure.md#adaptive-retry-requirements) below) and is a retryable write:
- This results in delays of 100ms and 200ms before accounting for jitter.
4. If the request is eligible for retry (as outlined in step 2 above) and `enableOverloadRetargeting` is enabled, the
client MUST add the previously used server's address to the list of deprioritized server addresses for
[server selection](../server-selection/server-selection.md). Drivers MUST expose `enableOverloadRetargeting` as a
configurable boolean option that defaults to `false`.
5. If the request is eligible for retry (as outlined in step 2 above) and is a retryable write:
1. If the command is a part of a transaction, the instructions for command modification on retry for commands in
transactions MUST be followed, as outlined in the
[transactions](../transactions/transactions.md#interaction-with-retryable-writes) specification.
@@ -159,26 +158,12 @@
specifications.
- For the purposes of error propagation, `runCommand` is considered a write.
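
The backoff formula in step 3 can be sketched in Python. The constant names follow the spec; `compute_backoff` is a
hypothetical helper name, not driver API:

```python
import random

# Spec constants, expressed in seconds.
BASE_BACKOFF = 0.1   # 100ms
MAX_BACKOFF = 10.0   # 10000ms

def compute_backoff(attempt: int) -> float:
    """Backoff delay in seconds before retry number `attempt` (1-based)."""
    jitter = random.random()  # uniform random value in [0.0, 1.0)
    return jitter * min(MAX_BACKOFF, BASE_BACKOFF * 2 ** (attempt - 1))
```

With the default `MAX_RETRIES` of 2, the un-jittered delays are 100ms and 200ms, matching the values listed above.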

##### Adaptive retry requirements

If adaptive retries are enabled, the following rules MUST also be obeyed:

1. If the command succeeds on the first attempt, drivers MUST deposit `RETRY_TOKEN_RETURN_RATE` tokens.
- The value is 0.1 and non-configurable.
2. If the command succeeds on a retry attempt, drivers MUST deposit `RETRY_TOKEN_RETURN_RATE`+1 tokens.
3. If a retry attempt fails with an error that is not an overload error, drivers MUST deposit 1 token.
- An error that does not contain the `SystemOverloadedError` error label indicates that the server is healthy enough
to handle requests. For the purposes of retry budget tracking, this counts as a success.
4. A retry attempt will only be permitted if a token can be consumed from the token bucket.
5. A retry attempt consumes 1 token from the token bucket.

#### Interaction with Other Retry Policies

The retry policy in this specification is separate from the other retry policies defined in the
[retryable reads](../retryable-reads/retryable-reads.md) and [retryable writes](../retryable-writes/retryable-writes.md)
specifications. Drivers MUST ensure:

- Only overload errors consume tokens from the token bucket before retrying.
- When a failed attempt is retried, backoff MUST be applied if and only if the error is an overload error.
- If an overload error is encountered:
- Regardless of whether CSOT is enabled or not, the maximum number of retries for any retry policy becomes
@@ -198,8 +183,7 @@ included, such as error handling with `NoWritesPerformed` labels.
BASE_BACKOFF = 0.1 # 100ms
MAX_BACKOFF = 10 # 10000ms

RETRY_TOKEN_RETURN_RATE = 0.1
MAX_RETRIES = 5
MAX_RETRIES = 2

def execute_command_retryable(command, ...):
deprioritized_servers = []
@@ -211,23 +195,13 @@ def execute_command_retryable(command, ...):
server = select_server(deprioritized_servers)
connection = server.getConnection()
res = execute_command(connection, command)
if adaptive_retry:
# Deposit tokens into the bucket on success.
tokens = RETRY_TOKEN_RETURN_RATE
if attempt > 0:
tokens += 1
token_bucket.deposit(tokens)
return res
except PyMongoError as exc:
is_retryable = (is_retryable_write(command, exc)
or is_retryable_read(command, exc)
or (exc.contains_error_label("RetryableError") and exc.contains_error_label("SystemOverloadedError")))
is_overload = exc.contains_error_label("SystemOverloadedError")

# if a retry fails with an error which is not an overload error, deposit 1 token
if adaptive_retry and attempt > 0 and not is_overload:
token_bucket.deposit(1)

# Raise if the error is non-retryable.
if not is_retryable:
raise
@@ -238,8 +212,10 @@ def execute_command_retryable(command, ...):

if attempt > allowed_retries:
raise

deprioritized_servers.append(server.address)

# enableOverloadRetargeting is true
if overload_retargeting:
deprioritized_servers.append(server.address)

if is_overload:
jitter = random.random() # Random float between [0.0, 1.0).
@@ -250,59 +226,9 @@
if time.monotonic() + backoff > _csot.get_deadline():
raise

if adaptive_retry and not token_bucket.consume(1):
raise

time.sleep(backoff)
```

### Token Bucket

The overload retry policy introduces an opt-in, per-client [token bucket](https://en.wikipedia.org/wiki/Token_bucket) to
limit overload error retry attempts. Although the server rejects excess commands as quickly as possible, doing so costs
CPU and creates extra contention on the connection pool, which can eventually degrade goodput. To reduce this risk, the
token bucket limits retry attempts during a prolonged overload.

The token bucket MUST be disabled by default and can be enabled through the
[adaptiveRetries=True](../uri-options/uri-options.md) connection and client options.

The token bucket starts at its maximum capacity of 1000 for consistency with the server.

Each MongoClient instance MUST have its own token bucket. When adaptive retries are enabled, the token bucket MUST be
created when the MongoClient is initialized and exist for the lifetime of the MongoClient. Drivers MUST ensure the token
bucket implementation is thread-safe as it may be accessed concurrently by multiple operations.

#### Pseudocode

The token bucket is implemented via a thread-safe counter. For languages without atomics, this can be implemented via a
lock, for example:

```python
DEFAULT_RETRY_TOKEN_CAPACITY = 1000
class TokenBucket:
"""A token bucket implementation for rate limiting."""
def __init__(
self,
capacity: float = DEFAULT_RETRY_TOKEN_CAPACITY,
):
self.lock = Lock()
self.capacity = capacity
self.tokens = capacity

def consume(self, n: float) -> bool:
"""Consume n tokens from the bucket if available."""
with self.lock:
if self.tokens >= n:
self.tokens -= n
return True
return False

def deposit(self, n: float) -> None:
"""Deposit n tokens back into the bucket."""
with self.lock:
self.tokens = min(self.capacity, self.tokens + n)
```
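
As an illustration of the deposit/consume rules, the walkthrough below restates the bucket and exercises it with a
deliberately tiny capacity of 2 (a hypothetical value chosen only to make exhaustion visible; the real default is 1000):

```python
from threading import Lock

class TokenBucket:
    """Mirror of the spec's thread-safe bucket, for the walkthrough below."""
    def __init__(self, capacity: float = 1000):
        self.lock = Lock()
        self.capacity = capacity
        self.tokens = capacity

    def consume(self, n: float) -> bool:
        with self.lock:
            if self.tokens >= n:
                self.tokens -= n
                return True
            return False

    def deposit(self, n: float) -> None:
        with self.lock:
            self.tokens = min(self.capacity, self.tokens + n)

bucket = TokenBucket(capacity=2)
assert bucket.consume(1)      # first retry attempt: permitted
assert bucket.consume(1)      # second retry attempt: permitted, bucket now empty
assert not bucket.consume(1)  # third retry attempt: denied

bucket.deposit(0.1)           # a success on a first attempt returns 0.1 tokens
assert not bucket.consume(1)  # still below one whole token, retry denied
bucket.deposit(1 + 0.1)       # a success on a retry returns 1.1 tokens
assert bucket.consume(1)      # retries permitted again
```

Deposits are capped at the bucket's capacity, so sustained success does not accumulate an unbounded retry budget.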

#### Handshake changes

Drivers conforming to this spec MUST add `"backpressure": True` to the
@@ -466,8 +392,54 @@ Additionally, both `retryReads` and `retryWrites` are enabled by default, so for
retried. This approach also prevents accidentally retrying a read command when only `retryWrites` is enabled, or
retrying a write command when only `retryReads` is enabled.

### Why make `maxAdaptiveRetries` configurable?

Modelling and the underpinning theory for backpressure show that the n-retries approach (retrying up to N times on
overload errors without a token bucket) can introduce retry storms as overload increases. However, the specifics of the
workload and the cluster serving it significantly impact the threshold at which retry volume becomes an additional
burden rather than a throughput improvement. Some applications and clusters may tolerate many additional retries, while
others may want to break out of the loop much earlier.

The default of 2 is intended to be broadly sensible for most users, providing a net benefit on average during overload.
However, savvy users, who are expected to be most affected by overload and to have the most insight into the specifics
of their workload and cluster, will likely find that tuning this value on a per-workload basis produces better results.
Additionally, there are situations where disabling overload retries entirely is optimal, such as a non-critical workload
running against a cluster shared with critical workloads. Without a knob, users in those situations would either have a
strictly worse experience with a new driver or be forced to downgrade to an older driver to avoid the issue. These are
two strong motivations to make `maxAdaptiveRetries` configurable.

### Why make `enableOverloadRetargeting` configurable?

The current contract we've made with users utilizing `primaryPreferred` is that reads will only go to a secondary if the
primary is unavailable. The documentation does not explicitly define unavailable, but in practice that means the primary
is unselectable. Overload retargeting makes the primary unselectable for a retry operation if it returned an overload
error on a previous attempt. This materially changes how often secondary reads occur. Since secondary reads can result
in stale data, enabling overload retargeting increases the chance that users of `primaryPreferred` will get stale data
when they did not previously. This is a potentially significant change in expected behavior. Therefore, overload
retargeting is disabled by default with a knob to enable it.

Overload retargeting significantly increases availability during overload, but it also increases the risk of reading
stale data when used with `primaryPreferred`. Many users of `primaryPreferred` may end up preferring that trade-off; if
so, overload retargeting may be enabled by default in the future.

`secondaryPreferred` does not have this same staleness issue, but it still materially changes what the preference means
from "almost always secondary" to "sometimes primary".

Note that for sharded clusters, drivers always attempt to retarget across `mongos` instances on all retryable errors,
including overload errors, regardless of how `enableOverloadRetargeting` is set. `mongos` has a separate flag to
retarget overload errors within shards that is independent of the driver's configuration.

Alternative design considered: overload retargeting could have been implemented as a read preference option rather than
a client-level option. This would allow more granular control: enabling retargeting for specific operations, databases,
or collections. However, a read preference option would require server changes to recognize the new field and would add
another dimension of read preference selection for users to reason about. A client-level boolean setting is simpler to
understand and configure.

## Changelog

- 2026-03-30: Introduce phase 1 support without token buckets.

- 2026-02-20: Disable token buckets by default.

- 2026-01-09: Initial version.
49 changes: 20 additions & 29 deletions source/client-backpressure/tests/README.md
@@ -14,7 +14,8 @@ be manually implemented by each driver.

#### Test 1: Operation Retry Uses Exponential Backoff

Drivers should test that retries do not occur immediately when a SystemOverloadedError is encountered.
Drivers should test that retries do not occur immediately when a SystemOverloadedError is encountered. This test MUST be
executed against a MongoDB 4.4+ server that has enabled the `configureFailPoint` command with the `errorLabels` option.

1. Let `client` be a `MongoClient`
2. Let `collection` be a collection
@@ -49,29 +50,19 @@ Drivers should test that retries do not occur immediately when a SystemOverloade

5. Execute step 3 again.

6. Compare the two time between the two runs.
6. Compare the time between the two runs.

```python
assertTrue(with_backoff_time - no_backoff_time >= 2.1)
assertTrue(absolute_value(with_backoff_time - (no_backoff_time + 0.3 seconds)) < 0.3 seconds)
```

The sum of 5 backoffs is 3.1 seconds. There is a 1-second window to account for potential variance between the two
runs.

#### Test 2: Token Bucket Capacity is Enforced

Drivers should test that retry token buckets are created at their maximum capacity and that that capacity is enforced.

1. Let `client` be a `MongoClient` with `adaptiveRetries=True`.
2. Assert that the client's retry token bucket is at full capacity and that the capacity is
`DEFAULT_RETRY_TOKEN_CAPACITY`.
3. Using `client`, execute a successful `ping` command.
4. Assert that the successful command did not increase the number of tokens in the bucket above
`DEFAULT_RETRY_TOKEN_CAPACITY`.
The sum of 2 backoffs is 0.3 seconds. There is a 0.3-second window to account for potential variance between the
two runs.
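
For clarity, the 0.3-second figure comes from summing the two maximum (jitter = 1) backoffs:

```python
BASE_BACKOFF = 0.1  # seconds

# Maximum un-jittered delays for the two retries: 100ms then 200ms.
total = sum(BASE_BACKOFF * 2 ** (attempt - 1) for attempt in (1, 2))
assert abs(total - 0.3) < 1e-9
```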

#### Test 3: Overload Errors are Retried a Maximum of MAX_RETRIES times

Drivers should test that without adaptive retries enabled, overload errors are retried a maximum of five times.
Drivers should test that overload errors are retried a maximum of MAX_RETRIES times. This test MUST be executed against
a MongoDB 4.4+ server that has enabled the `configureFailPoint` command with the `errorLabels` option.

1. Let `client` be a `MongoClient` with command event monitoring enabled.

@@ -95,24 +86,24 @@

5. Assert that the raised error contains both the `RetryableError` and `SystemOverloadedError` error labels.

6. Assert that the total number of started commands is MAX_RETRIES + 1 (6).
6. Assert that the total number of started commands is MAX_RETRIES + 1 (3).

#### Test 4: Adaptive Retries are Limited by Token Bucket Tokens
#### Test 4: Overload Errors are Retried a Maximum of maxAdaptiveRetries times when configured

Drivers should test that when enabled, adaptive retries are limited by the number of tokens in the bucket.
Drivers should test that overload errors are retried a maximum of `maxAdaptiveRetries` times, when configured. This test
MUST be executed against a MongoDB 4.4+ server that has enabled the `configureFailPoint` command with the `errorLabels`
option.

1. Let `client` be a `MongoClient` with `adaptiveRetries=True` and command event monitoring enabled.
1. Let `client` be a `MongoClient` with `maxAdaptiveRetries=1` and command event monitoring enabled.

2. Set `client`'s retry token bucket to have 2 tokens.

3. Let `coll` be a collection.
2. Let `coll` be a collection.

4. Configure the following failpoint:
3. Configure the following failpoint:

```javascript
{
configureFailPoint: 'failCommand',
mode: {times: 3},
mode: 'alwaysOn',
data: {
failCommands: ['find'],
errorCode: 462, // IngressRequestRateLimitExceeded
@@ -121,8 +112,8 @@
}
```

5. Perform a find operation with `coll` that fails.
4. Perform a find operation with `coll` that fails.

6. Assert that the raised error contains both the `RetryableError` and `SystemOverloadedError` error labels.
5. Assert that the raised error contains both the `RetryableError` and `SystemOverloadedError` error labels.

7. Assert that the total number of started commands is 3: one for the initial attempt and two for the retries.
6. Assert that the total number of started commands is `maxAdaptiveRetries` + 1 (2).
@@ -85,24 +85,6 @@
"client": "client",
"eventType": "cmap",
"events": [
{
"connectionCheckedOutEvent": {}
},
{
"connectionCheckedInEvent": {}
},
{
"connectionCheckedOutEvent": {}
},
{
"connectionCheckedInEvent": {}
},
{
"connectionCheckedOutEvent": {}
},
{
"connectionCheckedInEvent": {}
},
{
"connectionCheckedOutEvent": {}
},
@@ -58,10 +58,3 @@ tests:
- connectionCheckedInEvent: {}
- connectionCheckedOutEvent: {}
- connectionCheckedInEvent: {}
- connectionCheckedOutEvent: {}
- connectionCheckedInEvent: {}
- connectionCheckedOutEvent: {}
- connectionCheckedInEvent: {}
- connectionCheckedOutEvent: {}
- connectionCheckedInEvent: {}
