CDRIVER-6092 CDRIVER-6262 CDRIVER-6268 implement exponential backoff and jitter in retry loops#2240

Merged
connorsmacd merged 60 commits into mongodb:master from connorsmacd:exponential-retry-backoff.CDRIVER-6092 on Mar 10, 2026

Collaborator

@connorsmacd connorsmacd commented Mar 4, 2026

Reference

CDRIVER-6092
CDRIVER-6262
CDRIVER-6268

CDRIVER-6092 is the primary task. CDRIVER-6262 and CDRIVER-6268 fix/augment some of the unified tests introduced in this PR.

Summary

This PR implements the exponential retry backoff algorithm defined by DRIVERS-3239.

Refactor existing retry "gotos" into loops

The existing implementation of retryable reads and writes uses a simple goto label for iteration. To better align with the client backpressure implementation, I decided traditional loops would be more suitable. Therefore, I chose to first refactor the existing implementation of retryable reads and writes to use loops without making any backpressure-related changes. These changes are found in the following commits:

Token bucket (mongoc_token_bucket_t)

Client backpressure uses a token bucket as a simple rate limiting mechanism. The token bucket is only used when adaptiveRetries=True is specified as a URI option. Note that the C implementation of the token bucket closely matches the pseudocode found in the spec.

Common exponential retry backoff implementation for both retryable commands and withTransaction

#2198 introduced a very similar exponential retry backoff algorithm for withTransaction. Both algorithms use the same math to compute durations. However, retryable commands use a different growth factor, initial backoff duration, and maximum backoff duration.

For the purposes of code reuse, I introduced mongoc_retry_backoff_generator_t (renaming suggestions are welcome). This component encapsulates retry iteration behind a generator-like interface. Instead of using hardcoded values for the maximum retry attempt, mongoc_retry_backoff_generator_t computes these values programmatically. See: Refactor with_transaction to use backoff generator.

Note

I was initially considering putting these changes into a separate PR that only affected withTransaction, but I felt it was hard to contextualize the changes without seeing the retryable command algorithm as well.

First pass implementation

Before attempting to introduce a more generic retryable command interface, I first implemented exponential retry backoff in each existing retry loop. The first passes can be found in the following commits:

Important

The above implementations were based on a previous version of the spec where the token bucket was used unconditionally. With the most recent version of the spec, the token bucket is only used if the adaptiveRetries=True URI option is passed.

Generic retryable command interface

To increase code reuse and better align with the spec, I refactored the implementation described above to use a generic retry loop component called mongoc_retryable_cmd_t. The function mongoc_retryable_cmd_run models, as closely as possible, the pseudocode found in the spec. The retry loops differ from each other in how the command is executed and how the retry server is selected. To account for this, mongoc_retryable_cmd_t has a v-table-like construct that allows each implementation to customize these steps of the algorithm.
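
The v-table idea can be sketched as follows: a generic loop owns the retry policy, while function pointers supply the two steps that vary between call sites. Every name here (`example_retryable_cmd_*`, `example_flaky_*`) is hypothetical, and the sketch omits backoff sleeping, token bucket checks, and error labels; it is not the mongoc_retryable_cmd_t interface or the spec's full algorithm.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { EXAMPLE_OK, EXAMPLE_RETRYABLE_ERROR, EXAMPLE_FATAL_ERROR } example_result_t;

/* The steps that differ between retry loops (hypothetical). */
typedef struct {
   example_result_t (*execute) (void *ctx); /* run the command once */
   bool (*select_retry_server) (void *ctx); /* pick a server for the retry */
} example_retryable_cmd_vtable_t;

typedef struct {
   const example_retryable_cmd_vtable_t *vtable;
   void *ctx;       /* caller-owned state passed to each callback */
   int max_retries; /* retries allowed after the initial attempt */
} example_retryable_cmd_t;

/* Generic loop: the first attempt, then retries while the error is retryable. */
static example_result_t
example_retryable_cmd_run (example_retryable_cmd_t *cmd)
{
   assert (cmd != NULL && cmd->vtable != NULL);
   assert (cmd->vtable->execute != NULL && cmd->vtable->select_retry_server != NULL);

   example_result_t result = cmd->vtable->execute (cmd->ctx);
   for (int retries = 0; result == EXAMPLE_RETRYABLE_ERROR && retries < cmd->max_retries; retries++) {
      /* A real loop would wait for the backoff duration here. */
      if (!cmd->vtable->select_retry_server (cmd->ctx)) {
         break; /* no server available: report the last error */
      }
      result = cmd->vtable->execute (cmd->ctx);
   }
   return result;
}

/* Example implementation: fails a fixed number of times, then succeeds. */
typedef struct {
   int failures_left;
   int executions;
   int selections;
} example_flaky_ctx_t;

static example_result_t
example_flaky_execute (void *ctx)
{
   example_flaky_ctx_t *flaky = ctx;
   flaky->executions++;
   if (flaky->failures_left > 0) {
      flaky->failures_left--;
      return EXAMPLE_RETRYABLE_ERROR;
   }
   return EXAMPLE_OK;
}

static bool
example_flaky_select (void *ctx)
{
   example_flaky_ctx_t *flaky = ctx;
   flaky->selections++;
   return true;
}

static const example_retryable_cmd_vtable_t example_flaky_vtable = {example_flaky_execute, example_flaky_select};
```

With this shape, a read, a write, or a cursor operation would each provide its own `execute` and `select_retry_server` callbacks and context type, while the loop is written once.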

Tip

For reviewers, it may be helpful to compare the commits from the first pass implementation to understand how the various parts of the loops were broken up to use mongoc_retryable_cmd_t.

Other changes

@connorsmacd connorsmacd force-pushed the exponential-retry-backoff.CDRIVER-6092 branch from b65a467 to dae4720 Compare March 4, 2026 19:07
@connorsmacd connorsmacd force-pushed the exponential-retry-backoff.CDRIVER-6092 branch from dae4720 to 2407ba1 Compare March 4, 2026 19:09
Collaborator

@kevinAlbs kevinAlbs left a comment

Sorry for the goofs in the suggested test. Posting drafted feedback since I expect the test failures might be a source of confusion. Will continue reviewing.

Comment thread src/libmongoc/src/mongoc/mongoc-uri.h
Comment thread src/libmongoc/src/mongoc/mongoc-retryable-cmd.c
Comment thread src/libmongoc/tests/test-client-backpressure.c Outdated
Comment thread src/libmongoc/tests/test-client-backpressure.c Outdated
@kevinAlbs kevinAlbs self-requested a review March 6, 2026 18:20
Collaborator

@kevinAlbs kevinAlbs left a comment

Only substantial remaining comment is suggestion to move token_bucket to the mongoc_topology_t.

Comment thread src/libmongoc/src/mongoc/mongoc-uri.h
Comment thread src/libmongoc/tests/test-client-backpressure.c Outdated
Comment thread src/libmongoc/tests/test-client-backpressure.c Outdated
Comment thread src/libmongoc/src/mongoc/mongoc-client-private.h Outdated
Comment thread src/libmongoc/src/mongoc/mongoc-client-session.c Outdated
Comment thread src/libmongoc/src/mongoc/mongoc-cluster.c Outdated
Comment thread src/libmongoc/src/mongoc/mongoc-client.c
@connorsmacd connorsmacd changed the title from "CDRIVER-6092 CDRIVER-6262 implement exponential backoff and jitter in retry loops" to "CDRIVER-6092 CDRIVER-6262 CDRIVER-6268 implement exponential backoff and jitter in retry loops" Mar 10, 2026
@connorsmacd connorsmacd requested a review from kevinAlbs March 10, 2026 19:28
Collaborator

@kevinAlbs kevinAlbs left a comment

LGTM!

@connorsmacd connorsmacd merged commit ef48769 into mongodb:master Mar 10, 2026
46 of 48 checks passed
2 participants