Harden Azure OpenAI embedding retry for 429 throttling #1114
Merged
BenjaminMichaelis merged 5 commits on May 16, 2026
…ackoff retry
- Add RetryOptions configuration model with configurable backoff parameters
- Implement retry logic with exponential backoff + jitter for transient Azure OpenAI errors
- Honor Retry-After header from 429 responses
- Wrap embedding generation calls with automatic retry wrapper
- Ensure batch processing can recover from transient failures
- Wire configuration via options pattern with safe defaults
- Add comprehensive logging for retry attempts and final failures

Fixes issue where transient 429 errors from text-embedding-3-small-v1 would fail entire embedding batch. Now retries with exponential backoff (max 5 attempts by default) before failing with clear error context.
- Switch to ASP.NET-style nested options path AIOptions:EmbeddingRetry
- Rename retry options model to avoid Azure.Core RetryOptions ambiguity
- Add data annotations and runtime validation for retry configuration
- Handle ClientResultException transient status codes (429/5xx/408)
- Parse and honor Retry-After header when present
- Use LoggerMessage source-generated logging instead of CA1848 suppression
- Use Random.Shared for thread-safe jitter in singleton service
- Preserve caller cancellation semantics (no retry/wrap on requested cancel)
- Use CancellationToken.None for staging cleanup to avoid masking root failures
- Cap exponential delay with MaxDelayMs to avoid overflow
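For readers following the commit notes, the capped backoff with jitter could look roughly like the sketch below. It is not the code from this PR: `BackoffSketch.ComputeDelay` and its parameter names are hypothetical, the additive jitter is an assumption, and the numbers in the usage note are illustrative rather than the PR's defaults.

```csharp
using System;

public static class BackoffSketch
{
    // Hypothetical helper: delay to wait before retry number `retryIndex` (0-based).
    public static TimeSpan ComputeDelay(
        int retryIndex, int baseDelayMs, int maxDelayMs,
        double backoffMultiplier, double maxJitterFraction, Random random)
    {
        if (baseDelayMs <= 0)
            return TimeSpan.Zero;

        // Grow in floating point so a large exponent saturates at the cap
        // instead of overflowing an integer.
        double exponential = baseDelayMs * Math.Pow(backoffMultiplier, retryIndex);
        double capped = Math.Min(exponential, maxDelayMs);

        // Assumption: jitter is additive, up to maxJitterFraction of the capped delay.
        double jitter = capped * maxJitterFraction * random.NextDouble();

        // Cap again so jitter can never push the wait past maxDelayMs.
        return TimeSpan.FromMilliseconds(Math.Min(capped + jitter, maxDelayMs));
    }
}
```

With a 1000 ms base, a multiplier of 2.0, and a 30000 ms cap, the un-jittered delays grow as 1 s, 2 s, 4 s, 8 s and then stay at 30 s. In a singleton service, `Random.Shared` can be passed as the `random` argument.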
Pull request overview
Adds resilient retry handling for transient Azure OpenAI errors (429, 408, 5xx, timeouts) in EmbeddingService, configurable via a new EmbeddingRetryOptions bound at AIOptions:EmbeddingRetry. Final retry exhaustion is logged via LoggerMessage source generators and surfaced as a clear terminal exception, while caller cancellation is preserved and staging cleanup is hardened to not mask the original error.
Changes:
- Introduce `EmbeddingRetryOptions` (with data annotations + `Validate()`) and bind/register it in `ServiceCollectionExtensions` for both DI entry points.
- Wrap `GenerateEmbeddingAsync` and the per-batch embedding call in `ExecuteWithRetryAsync` (exponential backoff + jitter + Retry-After), with source-generated logging and a non-cancellable staging-collection cleanup on failure.
- Add `EmbeddingRetry` defaults to `appsettings.json` and ignore `build_output.txt` in `.gitignore`.
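A minimal sketch of a retry wrapper in the spirit of the one listed above, assuming the transient check and the delay policy are supplied as delegates. The signature is hypothetical and is not the PR's actual `ExecuteWithRetryAsync`; logging is omitted.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class RetrySketch
{
    // Hypothetical signature: maxRetries counts retries, not total attempts.
    public static async Task<T> ExecuteWithRetryAsync<T>(
        Func<CancellationToken, Task<T>> operation,
        Func<Exception, bool> isTransient,
        Func<int, TimeSpan> delayForRetry,
        int maxRetries,
        CancellationToken cancellationToken)
    {
        for (int retry = 0; ; retry++)
        {
            // Caller cancellation is surfaced as-is, never retried or wrapped.
            cancellationToken.ThrowIfCancellationRequested();
            try
            {
                return await operation(cancellationToken).ConfigureAwait(false);
            }
            catch (Exception ex) when (
                !cancellationToken.IsCancellationRequested
                && retry < maxRetries
                && isTransient(ex))
            {
                // Transient failure with retries left: wait, then loop again.
                await Task.Delay(delayForRetry(retry), cancellationToken).ConfigureAwait(false);
            }
            // When the filter is false (non-transient, cancelled, or retries
            // exhausted) the original exception propagates unchanged.
        }
    }
}
```

Using an exception filter rather than catch-and-rethrow keeps the original stack trace intact once retries run out.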
Reviewed changes
Copilot reviewed 4 out of 5 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| EssentialCSharp.Chat.Shared/Services/EmbeddingService.cs | Adds retry pipeline, transient-error detection, Retry-After parsing, LoggerMessage methods, and switches staging cleanup to CancellationToken.None. |
| EssentialCSharp.Chat.Shared/Models/EmbeddingRetryOptions.cs | New options class with [Range] annotations, SectionPath constant, and imperative Validate(). |
| EssentialCSharp.Chat.Shared/Extensions/ServiceCollectionExtensions.cs | Registers and binds EmbeddingRetryOptions in both AddAzureOpenAIServices overloads with data-annotation + custom validation. |
| EssentialCSharp.Web/appsettings.json | Adds default AIOptions:EmbeddingRetry configuration block. |
| .gitignore | Ignores build_output.txt. |
Comments suppressed due to low confidence (1)
EssentialCSharp.Chat.Shared/Models/EmbeddingRetryOptions.cs:82
`Validate()` only enforces lower-bound (and partial cross-field) checks, but the `[Range]` attributes on the properties define upper bounds (e.g., `MaxRetries` capped at 20, `BaseDelayMs`/`MaxDelayMs` at 600000, `BackoffMultiplier` at 10.0). Because the registration calls `ValidateDataAnnotations()` and `.Validate(o => { o.Validate(); return true; })`, the imperative `Validate()` will accept values (e.g., `MaxRetries = 1000`, `BackoffMultiplier = 100.0`) that the data-annotation validator rejects, leading to inconsistent behavior depending on which validation path is exercised (and when called directly from the secondary constructor / `ValidateRetryOptions`, the data annotations are never applied). Consider mirroring the `[Range]` upper bounds in `Validate()` so both paths agree.
```csharp
public void Validate()
{
    if (MaxRetries < 0)
        throw new InvalidOperationException("MaxRetries must be non-negative.");
    if (BaseDelayMs < 0)
        throw new InvalidOperationException("BaseDelayMs must be non-negative.");
    if (MaxDelayMs <= 0)
        throw new InvalidOperationException("MaxDelayMs must be positive.");
    if (BaseDelayMs > MaxDelayMs)
        throw new InvalidOperationException("BaseDelayMs must be less than or equal to MaxDelayMs.");
    if (BackoffMultiplier < 1.0)
        throw new InvalidOperationException("BackoffMultiplier must be >= 1.0.");
    if (MaxJitterFraction < 0.0 || MaxJitterFraction > 1.0)
        throw new InvalidOperationException("MaxJitterFraction must be between 0.0 and 1.0.");
}
```
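One way to make the two validation paths agree might look like the sketch below. `EmbeddingRetryOptionsSketch` is a hypothetical stand-in for the real options class: the property types and default values are assumptions for illustration, and only the upper bounds (20, 600000, 10.0) come from the `[Range]` values quoted in this comment.

```csharp
using System;

// Hypothetical stand-in for EmbeddingRetryOptions; defaults are illustrative only.
public sealed class EmbeddingRetryOptionsSketch
{
    public int MaxRetries { get; set; } = 5;
    public int BaseDelayMs { get; set; } = 1000;
    public int MaxDelayMs { get; set; } = 30000;
    public double BackoffMultiplier { get; set; } = 2.0;
    public double MaxJitterFraction { get; set; } = 0.2;

    public void Validate()
    {
        if (MaxRetries < 0 || MaxRetries > 20)
            throw new InvalidOperationException("MaxRetries must be between 0 and 20.");
        if (BaseDelayMs < 0 || BaseDelayMs > 600000)
            throw new InvalidOperationException("BaseDelayMs must be between 0 and 600000.");
        if (MaxDelayMs <= 0 || MaxDelayMs > 600000)
            throw new InvalidOperationException("MaxDelayMs must be between 1 and 600000.");
        if (BaseDelayMs > MaxDelayMs)
            throw new InvalidOperationException("BaseDelayMs must be less than or equal to MaxDelayMs.");
        // Written as a negated range test so that double.NaN is rejected too.
        if (!(BackoffMultiplier >= 1.0 && BackoffMultiplier <= 10.0))
            throw new InvalidOperationException("BackoffMultiplier must be between 1.0 and 10.0.");
        if (!(MaxJitterFraction >= 0.0 && MaxJitterFraction <= 1.0))
            throw new InvalidOperationException("MaxJitterFraction must be between 0.0 and 1.0.");
    }
}
```

With this shape, `MaxRetries = 1000` or `BackoffMultiplier = 100.0` fails in `Validate()` as well, so callers that bypass the data-annotation validator get the same answer.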
- Clarify MaxRetries XML docs as retries (not total attempts)
- Clamp server Retry-After delays to MaxDelayMs
- Rethrow original transient exception after retry exhaustion
- Remove unnecessary string interpolation marker
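The Retry-After clamping in this commit could be sketched as follows, assuming the raw header value has already been read from the failed response. `RetryAfterSketch.ParseRetryAfter` is a hypothetical helper, not the PR's method; it accepts the two forms HTTP allows for the header, a number of seconds or an HTTP date.

```csharp
using System;
using System.Globalization;

public static class RetryAfterSketch
{
    // Hypothetical helper: returns null when the header is absent or unparseable,
    // so the caller can fall back to its own exponential backoff.
    public static TimeSpan? ParseRetryAfter(string headerValue, DateTimeOffset now, int maxDelayMs)
    {
        if (string.IsNullOrWhiteSpace(headerValue))
            return null;

        string value = headerValue.Trim();
        TimeSpan delay;

        if (int.TryParse(value, NumberStyles.None, CultureInfo.InvariantCulture, out int seconds))
        {
            // delta-seconds form, e.g. "7"
            delay = TimeSpan.FromSeconds(seconds);
        }
        else if (DateTimeOffset.TryParseExact(value, "r", CultureInfo.InvariantCulture,
                     DateTimeStyles.None, out DateTimeOffset when))
        {
            // HTTP-date form, e.g. "Sat, 16 May 2026 12:00:10 GMT"
            delay = when - now;
        }
        else
        {
            return null;
        }

        // A date in the past means "retry now"; a long server hint is clamped to the cap.
        if (delay < TimeSpan.Zero)
            delay = TimeSpan.Zero;
        TimeSpan max = TimeSpan.FromMilliseconds(maxDelayMs);
        return delay > max ? max : delay;
    }
}
```

Passing `now` in explicitly, instead of reading the clock inside the helper, keeps the date branch deterministic under test.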
Why
Embedding generation could fail fast on transient Azure OpenAI throttling (HTTP 429), which interrupted full vector rebuild runs. We need resilient retry behavior that respects service guidance while still failing clearly when retries are exhausted.
What changed
- Added retry handling in `EmbeddingService` for transient failures, including `ClientResultException` status-based detection (429/5xx/408), exponential backoff, jitter, and Retry-After header support when present.
- Switched to source-generated `LoggerMessage` methods to match project logging conventions.

Configuration and ASP.NET conventions
- Added `EmbeddingRetryOptions` and bound it via standard options binding at `AIOptions:EmbeddingRetry`.
- Added `EmbeddingRetry` defaults in `EssentialCSharp.Web/appsettings.json`.
- Added a `MaxDelayMs` cap to prevent delay overflow/unbounded waits.

Additional cleanup
- Removed `build_output.txt` and added it to `.gitignore`.
- Removed the old `RetryOptions` model after renaming to avoid ambiguity with `Azure.Core.RetryOptions`.

Validation
- Built `EssentialCSharp.Chat.Shared/EssentialCSharp.Chat.Common.csproj` successfully after changes.
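The status-based detection described in this summary (429/5xx/408) reduces to a small predicate. The sketch below assumes the HTTP status code has already been read from the exception; `TransientStatusSketch` is a hypothetical name, not a type from this PR.

```csharp
public static class TransientStatusSketch
{
    // Treat throttling (429), request timeout (408), and any 5xx as retryable;
    // everything else (e.g. 400, 401, 404) fails immediately.
    public static bool IsTransientStatus(int status) =>
        status == 408 || status == 429 || (status >= 500 && status <= 599);
}
```

A predicate like this could serve as the transient check in a retry wrapper, so client errors such as a bad request are never retried.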