Commit a567847

kushagraThapar (Kushagra Thapar) and Copilot authored
[Cosmos] Clarify retry_write behavior for 408/500/503 in ErrorCodesAndRetries.md (#46497)
* Cosmos: clarify retry_write behavior for 408/500/503 in ErrorCodesAndRetries

  The doc previously stated that the Python SDK does not retry write operations on 408 (Request Timeout), 500 (Internal Server Error), or 503 (Service Unavailable). This is only true by default. When the client is initialized with RetryOptions where retry_write > 0, the SDK will retry the write on the next preferred region, if any is available:

  - 408: handled by _TimeoutFailoverRetryPolicy.is_operation_retryable
  - 500: same cross-region failover applies for writes when retry_write > 0
  - 503: enforced by _ServiceResponseRetryPolicy

  The 503 row also previously said 'for all Operations' which masked the read vs. write distinction. Split into explicit read and write rows.

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* Docs: address Copilot review feedback on 408/500/503 retry rows

  - 408: clarify default-vs-retry_write behavior; the previous wording said server-side timeouts are not retried for writes while also stating retry_write triggers _TimeoutFailoverRetryPolicy retries, which conflict. Reword to separate default behavior from retry_write-enabled behavior.
  - 500: 'retry_write' is configured as a CosmosClient kwarg, not via RetryOptions (which only controls throttling/backoff). Update wording.
  - 503: handled by _ServiceUnavailableRetryPolicy (not _ServiceResponseRetryPolicy) and is NOT gated by retry_write -- writes are retried for 503 in the next preferred region regardless. Correct the row.
  - Replace HTML entity '&gt;' inside backticks with literal '>' so it renders correctly in inline code spans.

  Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: Kushagra Thapar <kushagrathapar@microsoft.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
1 parent aedf90d commit a567847

1 file changed

Lines changed: 3 additions & 3 deletions

sdk/cosmos/azure-cosmos/docs/ErrorCodesAndRetries.md
Original file line number | Diff line number | Diff line change
@@ -8,14 +8,14 @@ The Cosmos DB Python SDK has several default policies that will deal with retryi
8 8
| 401 | For all operations: </br><ul><li> This is an unauthorized exception due to invalid auth tokens being used for the request. The client does NOT retry requests when this exception is encountered.</li></ul> |
9 9
| 403 | <ul><li>For Substatus 3 (Write Forbidden) and Substatus 1008 (Database Account Not Found): </br><ul><li>This exception occurs when a geo-replicated database account runs into writable/readable location changes (say, after a failover).</li><li>This exception can occur regardless of the Consistency level set for the account. </li><li>The client refreshes it's location endpoints and retries requests when the user has enabled endpoint discovery in their client (default behavior).</li></ul></li><li>For all other cases: </br><ul><li> The client does NOT retry requests when this exception is encountered. </li> |
10 10
| 404/1002 | <ul><li>For write operations: </br><ul><li>If multiple write locations are enabled for the account, the SDK will fetch the write endpoints and retry once per each of these. </li><li>The client refreshes it's location endpoints and retries requests when the user has enabled endpoint discovery in their client (default behavior).</li><li>If the account does not have multiple write locations enabled, the SDK will retry only once in the account primary region. </li></ul></li><li>For read operations: </br><ul><li>If multiple write locations are enabled for the account, the SDK will fetch the read endpoints and retry once per each of these. </li><li>The client refreshes it's location endpoints and retries requests when the user has enabled endpoint discovery in their client (default behavior).</li><li>If the account does not have multiple write locations enabled, the SDK will retry only once in the account primary region. </li> |
11 -
| 408 | <ul><li>For Write Operations: <br><ul><li>Timeout exceptions can be encountered by both the client as well as the server. Server-side timeout exceptions are not retried for write operations as it is not possible to determine if the write was in fact successfully committed on the server. For a client-generated timeout exception, either the request was sent over the wire to the server by the client and the network request timeout exceeded while waiting for a response, or the request was not sent over the wire to the server which resulted in a client-generated timeout. The client does NOT retry for either.</li></ul><li>For Query and Point Read Operations:</br><ul><li>The SDK will retry on the next preferred region, if any is available.</li></ul> </li></ul> |
11 +
| 408 | <ul><li>For Write Operations: <br><ul><li>Timeout exceptions can be encountered by both the client as well as the server. For client-generated timeouts, either the request was sent over the wire and timed out while waiting for a response, or the request was not sent over the wire before the client timed out. For service-side timeouts, it may not be possible to determine whether the write was successfully committed. By default, the client does NOT retry 408 write operations.</li><li>If the client is initialized with `retry_write > 0` (a `CosmosClient` keyword argument), `_TimeoutFailoverRetryPolicy` can retry write operations on the next preferred region, if any is available.</li></ul><li>For Query and Point Read Operations:</br><ul><li>The SDK will retry on the next preferred region, if any is available.</li></ul> </li></ul> |
12 12
| 409 | <ul><li>For Write Operations: </br><ul><li>This exception occurs when an attempt is made by the application to Create/Insert an Item that already exists.</li><li>This exception can occur regardless of the Consistency level set for the account. </li><li>This exception can occur for write operations when an attempt is made to create an existing item or when a unique key constraint violation occurs. </li><li>The client does NOT retry on Conflict exceptions </li></ul></li><li>For Query and Point Read Operations: </br><ul><li>N/A as this exception is only encountered for Create/Insert operations. </li></ul></li> |
13 13
| 410/1002 | <ul><li>For all operations: </br><ul><li>This exception occurs when a partition is split (or merged in the future) and no longer exists, and can occur regardless of the Consistency level set for the account.</li><li>The SDK will refresh its partition key range cache and trigger a single retry, fetching the new ranges from the gateway once it finds an empty cache. </li> |
14 14
| 412 | <ul><li>For Write Operations: </br><ul><li>This exception is encountered when the etag that is sent to the server for validation prior to updating an Item, does not match the etag of the Item on the server. </li><li>The client does NOT retry this operation locally or against any of the remote regions for the account as retries would not help alleviate the etag mismatch. </li><li>The application would need to trigger a retry by first reading the Item, fetching the latest etag and issuing the Upsert/Replace operation. </br><ul><li>This operation can continue to fail with the same exception when multiple updates are executed concurrently for the same Item. </li><li>An upper bound on the number of retries before handing off the Item to a dead letter queue should be implemented by the application. </li></ul></li></ul></li><li>For Query and point read Operations: </br><ul><li>N/A as this exception is only encountered for Create/Insert/Replace/Upsert operations. </li></ul></li></ul> |
15 15
| 429 | For all Operations: </br><ul><li>By default, the client retries the request for a maximum of 9 times (or for a maximum of 30 seconds, whichever limit is reached first). </li><li>The client can also be initialized with a custom retry policy, which overrides the two limits mentioned above. </li><li>After all the retries are exhausted, the client bubbles up the exception to the application. </li><li>**For a multi-region account**, the client does NOT retry the request against a remote region for the account. </li><li>When the application receives a Request Rate too large exception (429), the application would need to instrument its own retry logic and dead letter queues. </li></ul> |
16 16
| 449 | <ul><li>For Write Operations: </br><ul><li>This exception is encountered when a resource is concurrently updated on the server, which can happen due to concurrent writes, user triggered while conflicts are concurrently being resolved etc. </li><li>Only one update can be executed at a time per item. The other concurrent requests will fail with a Concurrent Execution Exception (449). </li><li>The client does NOT retry requests that failed with a 449. </li></ul></li><li>For Query and point read Operations: </br><ul><li>N/A as this exception is only encountered for Create/Insert/Replace/Upsert operations. </li></ul></li></ul> |
17 -
| 500 | <ul><li>For Write Operations: </br><ul><li>The client does NOT retry write requests. </li></ul></li><li>For Read Operations: </br><ul><li>The request will be retried by the SDK on the next preferred regions. </li></ul></li></ul> |
18 -
| 503 | When a Service Unavailable exception is encountered, for all Operations: </br><ul><li>The request will be retried by the SDK on the next preferred regions. |
17 +
| 500 | <ul><li>For Write Operations: </br><ul><li>By default, the client does NOT retry write requests.</li><li>If the client is initialized with `retry_write > 0` (a `CosmosClient` keyword argument), the SDK will retry the write on the next preferred region, if any is available.</li></ul></li><li>For Read Operations: </br><ul><li>The request will be retried by the SDK on the next preferred regions. </li></ul></li></ul> |
18 +
| 503 | When a Service Unavailable exception is encountered: </br><ul><li>For Read Operations: the request will be retried by the SDK on the next preferred region.</li><li>For Write Operations: the request will also be retried by the SDK on the next preferred region, if any is available. 503 handling is performed by `_ServiceUnavailableRetryPolicy` and is not gated by `retry_write`.</li></ul> |
19 19

20 20
### Connection Issues Retry Flow And Marking Unavailable
21 21
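
The cross-region retry rules that the updated 408, 500, and 503 rows describe can be summarized as a small decision function. This is a minimal sketch of the documented behavior only, not SDK code: the function name `retries_on_next_region` and the `has_next_region` parameter are hypothetical, and treating "no further preferred region" as "no retry" for every row is an assumption based on the "if any is available" wording. Only `retry_write` corresponds to a name used in the diff.

```python
def retries_on_next_region(
    status_code: int,
    is_write: bool,
    retry_write: int = 0,
    has_next_region: bool = True,
) -> bool:
    """Model the 408/500/503 rows of ErrorCodesAndRetries.md after this commit.

    Hypothetical helper for illustration; it does not call the SDK.
    """
    if status_code not in (408, 500, 503):
        raise ValueError("only the 408, 500, and 503 rows are modeled here")

    # Assumption: without another preferred region there is nowhere to fail over.
    if not has_next_region:
        return False

    # Reads (queries and point reads) are retried on the next preferred region.
    if not is_write:
        return True

    # 503 writes are retried regardless of retry_write.
    if status_code == 503:
        return True

    # 408 and 500 writes are retried only when retry_write > 0.
    return retry_write > 0


if __name__ == "__main__":
    for status in (408, 500, 503):
        for write in (False, True):
            for rw in (0, 1):
                kind = "write" if write else "read"
                print(status, kind, f"retry_write={rw}",
                      retries_on_next_region(status, write, rw))
```

The default of `retry_write=0` mirrors the "by default, the client does NOT retry" wording for 408 and 500 writes; the 503 branch is checked first so that it stays independent of `retry_write`, as the corrected row states.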
