[NO REVIEW] Fix 400/1024 error when container is re-created with same name #49131
xinlian12 wants to merge 3 commits into Azure:main
Reset request context (resolvedCollectionRid, forceNameCacheRefresh, INTENDED_COLLECTION_RID_HEADER) in StaleResourceRetryPolicy before retry so the retry re-resolves the collection and sends the correct RID.
Fixes Azure#49097
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
This pull request addresses a Cosmos DB retry bug where requests can keep sending a stale x-ms-cosmos-intended-collection-rid header after a container is deleted and re-created with the same name (notably impacting gateway mode and 400/1024 retries). It resets request context state after refreshing the collection cache so the retry re-resolves the container RID and updates headers correctly.
Changes:
- Reset `RxDocumentServiceRequest` retry state in `StaleResourceRetryPolicy` (force name cache refresh, clear resolved collection RID, remove intended-collection-rid header).
- Add a parameterized unit test validating request-context/header reset behavior for both 410/1000 and 400/1024 scenarios.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/StaleResourceRetryPolicy.java | Resets request context and intended collection RID header after cache refresh to ensure retry re-resolves the container. |
| sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/implementation/StaleResourceExceptionRetryPolicyTest.java | Adds parameterized unit coverage ensuring request context/header/session cleanup occurs on stale-container retry paths. |
    // Reset request context so the retry re-resolves the collection
    // and sends the updated intended-collection-rid header.
    if (this.request != null) {
        this.request.forceNameCacheRefresh = true;
        this.request.requestContext.resolvedCollectionRid = null;
Do we still need `this.request.forceNameCacheRefresh = true;`? The collection cache is already refreshed above.
Why not just set `this.request.requestContext.resolvedCollectionRid = refreshedCollectionRid`?
…eRefresh
- Set resolvedCollectionRid to refreshedCollectionRid instead of null
- Remove unnecessary forceNameCacheRefresh (cache already refreshed)
- Add requestContext null guard to prevent NPE
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
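The revised approach in this commit can be sketched roughly as follows. This is a minimal illustration and not the SDK code: `RevisedResetSketch`, `Request`, `RequestContext`, and `applyRefreshedRid` are hypothetical stand-ins, and it assumes the intended-collection-rid header is still removed as in the first commit.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-ins for the SDK request types (illustration only).
final class RevisedResetSketch {
    static final String INTENDED_COLLECTION_RID_HEADER = "x-ms-cosmos-intended-collection-rid";

    static final class RequestContext {
        String resolvedCollectionRid;
    }

    static final class Request {
        final Map<String, String> headers = new HashMap<>();
        RequestContext requestContext; // may be null, hence the guard below
    }

    /**
     * Points the request at the refreshed collection RID instead of clearing it.
     * No forceNameCacheRefresh flag is set, because the cache was already refreshed.
     */
    static void applyRefreshedRid(Request request, String refreshedCollectionRid) {
        if (request == null) {
            return;
        }
        // Null guard on requestContext to avoid an NPE.
        if (request.requestContext != null) {
            request.requestContext.resolvedCollectionRid = refreshedCollectionRid;
        }
        // Assumption: the stale header is still dropped so the retry can set the new one.
        request.headers.remove(INTENDED_COLLECTION_RID_HEADER);
    }
}
```

Setting the RID directly avoids a second name-cache lookup on the retry, which is the point raised in the review comment.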
Remove INTENDED_COLLECTION_RID_HEADER in RenameCollectionAwareClientRetryPolicy (404/1002 READ_SESSION_NOT_AVAILABLE) to prevent stale RID on retry.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Problem
When a container is deleted and re-created with the same name, subsequent read/write/query operations fail with:
(HTTP 400, sub-status 1024 / INCORRECT_CONTAINER_RID_SUB_STATUS)
This affects gateway mode because the stale intended-collection-rid header persists through retry.
Root Cause
`StaleResourceRetryPolicy` correctly handles 400/1024 by refreshing the collection cache and clearing session tokens, but it did not reset the request context before the retry:
- `request.requestContext.resolvedCollectionRid` still held the old collection RID
- `request.forceNameCacheRefresh` was still `false`
- The `x-ms-cosmos-intended-collection-rid` HTTP header still carried the old RID
On retry, `RxGatewayStoreModel.addIntendedCollectionRid()` checks whether the header is already set and skips updating it, so the stale RID is sent again, causing the same 400/1024 error.
Fix
After refreshing the collection cache in `StaleResourceRetryPolicy.shouldRetry()`, reset:
- `resolvedCollectionRid` → `null`
- `forceNameCacheRefresh` → `true`
- `INTENDED_COLLECTION_RID_HEADER` → removed
This ensures the retry re-resolves the collection and sends the correct (new) RID.
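To make the root cause and the fix concrete, here is a small self-contained sketch. It is not the SDK implementation: `StaleRidSketch`, `Request`, `resetForRetry`, and `ridSentOnRetry` are hypothetical names, and `addIntendedCollectionRid` only models the "set the header if it is absent" behavior described above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the retry path (illustration only).
final class StaleRidSketch {
    static final String INTENDED_COLLECTION_RID_HEADER = "x-ms-cosmos-intended-collection-rid";

    /** Holds only the request state relevant to this bug. */
    static final class Request {
        final Map<String, String> headers = new HashMap<>();
        String resolvedCollectionRid;
        boolean forceNameCacheRefresh;
    }

    /** Models the described behavior: the header is written only when not already set. */
    static void addIntendedCollectionRid(Request request, String currentRid) {
        if (!request.headers.containsKey(INTENDED_COLLECTION_RID_HEADER)) {
            request.headers.put(INTENDED_COLLECTION_RID_HEADER, currentRid);
        }
    }

    /** Models the reset applied after the collection cache refresh, before the retry. */
    static void resetForRetry(Request request) {
        if (request == null) {
            return;
        }
        request.forceNameCacheRefresh = true;
        request.resolvedCollectionRid = null;
        request.headers.remove(INTENDED_COLLECTION_RID_HEADER);
    }

    /** Returns the RID header value that the retry would send. */
    static String ridSentOnRetry(boolean applyReset) {
        Request request = new Request();
        addIntendedCollectionRid(request, "oldRid"); // first attempt, before re-creation
        request.resolvedCollectionRid = "oldRid";
        if (applyReset) {
            resetForRetry(request);
        }
        addIntendedCollectionRid(request, "newRid"); // retry, after re-creation
        return request.headers.get(INTENDED_COLLECTION_RID_HEADER);
    }
}
```

In this model, `StaleRidSketch.ridSentOnRetry(false)` returns `"oldRid"` (the stale header survives the retry) and `StaleRidSketch.ridSentOnRetry(true)` returns `"newRid"`.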
Testing
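Added parameterized unit test `requestContextResetOnRetry` covering both error codes (410/1000 and 400/1024). The test verifies that after `shouldRetry()`:
- `resolvedCollectionRid` is cleared to `null`
- `forceNameCacheRefresh` is set to `true`
- `INTENDED_COLLECTION_RID_HEADER` is removed
Fixes #49097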
shouldRetry():resolvedCollectionRidis cleared tonullforceNameCacheRefreshis set totrueINTENDED_COLLECTION_RID_HEADERis removedFixes #49097