Commit 920c021

Add adaptive embedding throughput shaping for Azure 429 limits (#1115)
## Why

The previous retry-only fix still failed under sustained S0 throttling: large embedding requests kept exhausting retries at the same payload size. We need throughput shaping so rebuilds can continue progressing under rate limits instead of stalling at repeated 429 exhaustion.

## What changed

- Added adaptive batch downshifting in embedding rebuilds:
  - starts at configured max batch size
  - on 429/RateLimitReached, splits throttled batches and retries smaller sub-batches
  - reuses the smaller successful size for subsequent requests in the same run
  - fails clearly if batch size 1 still exhausts retries
- Added explicit request pacing controls:
  - `AIOptions:EmbeddingRetry:MaxEmbeddingBatchSize` (default 2048)
  - `AIOptions:EmbeddingRetry:MinInterRequestDelayMs` (default 250)
  - embedding requests are serialized and paced between calls to reduce sustained RPM pressure
- Hardened Retry-After parsing:
  - supports `retry-after`, `retry-after-ms`, `x-ms-retry-after-ms`
  - supports extracting `retry after N seconds` from exception message text
- Added coarse progress logging during rebuilds (not per call):
  - logs start configuration
  - logs progress at 10% milestones when total count is known
  - falls back to every 500 chunks when total count is unknown
  - includes current adaptive batch size in progress messages

## Validation

- `dotnet build EssentialCSharp.Chat.Shared/EssentialCSharp.Chat.Common.csproj -c Release --nologo`
- `dotnet test EssentialCSharp.Chat.Tests/EssentialCSharp.Chat.Tests.csproj -c Release --no-restore -v q`

Both passed.
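The downshifting behavior described under "What changed" can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `AdaptiveBatching`, `SendAll`, and `RateLimitExhaustedException` are hypothetical names, and halving the batch on each throttle is an assumed splitting strategy (the commit message only says throttled batches are split into smaller sub-batches).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a 429/RateLimitReached failure whose
// retries were already exhausted at the current payload size.
public sealed class RateLimitExhaustedException : Exception { }

public static class AdaptiveBatching
{
    // Sends every item through `send`, shrinking the batch size whenever
    // `send` reports throttling. Returns the batch size in effect at the end.
    public static int SendAll<T>(
        IReadOnlyList<T> items,
        int maxBatchSize,
        Action<IReadOnlyList<T>> send)
    {
        int batchSize = maxBatchSize; // start at the configured maximum
        int index = 0;

        while (index < items.Count)
        {
            int count = Math.Min(batchSize, items.Count - index);
            List<T> batch = items.Skip(index).Take(count).ToList();
            try
            {
                send(batch);
                index += count; // success: keep reusing the current size
            }
            catch (RateLimitExhaustedException)
            {
                if (count == 1)
                {
                    // Nothing smaller to try, so fail clearly.
                    throw new InvalidOperationException(
                        "Rate limited even at batch size 1.");
                }

                // Assumed strategy: halve the throttled batch and retry.
                batchSize = Math.Max(1, count / 2);
            }
        }

        return batchSize;
    }
}
```

With ten items, a maximum of 8, and a sender that rejects anything larger than 2, this sketch tries 8, then 4, then settles on 2 for the rest of the run.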
1 parent 5b3721a commit 920c021

4 files changed

Lines changed: 332 additions & 26 deletions

EssentialCSharp.Chat.Shared/Extensions/ServiceCollectionExtensions.cs

Lines changed: 2 additions & 2 deletions
@@ -100,8 +100,8 @@ public static IServiceCollection AddAzureOpenAIServices(
         // Configure AI options from configuration
         services.Configure<AIOptions>(configuration.GetSection("AIOptions"));
 
-        // Configure retry options from configuration section
-        // Environment variables like EmbeddingRetry:MaxRetries will override defaults
+        // Configure retry options from configuration section.
+        // Environment variables can override via AIOptions__EmbeddingRetry__*.
         services.AddOptions<EmbeddingRetryOptions>()
             .Bind(configuration.GetSection(EmbeddingRetryOptions.SectionPath))
             .ValidateDataAnnotations()
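
The corrected comment relies on the .NET configuration convention that a double underscore in an environment variable name maps to the `:` section separator. A minimal sketch of that mapping follows; it assumes the `Microsoft.Extensions.Configuration` and `Microsoft.Extensions.Configuration.EnvironmentVariables` packages are referenced, and `EnvOverrideDemo` is a hypothetical name used only for illustration.

```csharp
using System;
using Microsoft.Extensions.Configuration;

public static class EnvOverrideDemo
{
    public static string ReadBatchSize()
    {
        // Simulates setting the variable in the shell before startup.
        Environment.SetEnvironmentVariable(
            "AIOptions__EmbeddingRetry__MaxEmbeddingBatchSize", "256");

        IConfigurationRoot configuration = new ConfigurationBuilder()
            .AddEnvironmentVariables()
            .Build();

        // "__" in the variable name is exposed as ":" in the configuration key.
        return configuration["AIOptions:EmbeddingRetry:MaxEmbeddingBatchSize"];
    }
}
```

This is also why the old comment was misleading: a key such as `EmbeddingRetry:MaxRetries` without the `AIOptions` prefix would not land in the section named in the commit message.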

EssentialCSharp.Chat.Shared/Models/EmbeddingRetryOptions.cs

Lines changed: 23 additions & 0 deletions
@@ -34,6 +34,20 @@ public sealed class EmbeddingRetryOptions
     [Range(1, 600000)]
     public int MaxDelayMs { get; set; } = 60000;
 
+    /// <summary>
+    /// Maximum embedding request payload size sent per API call.
+    /// The service may adaptively downshift below this value when throttled.
+    /// </summary>
+    [Range(1, 2048)]
+    public int MaxEmbeddingBatchSize { get; set; } = 2048;
+
+    /// <summary>
+    /// Minimum delay between embedding API requests in milliseconds.
+    /// This adds request pacing to reduce sustained rate-limit pressure.
+    /// </summary>
+    [Range(0, 600000)]
+    public int MinInterRequestDelayMs { get; set; } = 250;
+
     /// <summary>
     /// Exponential backoff multiplier. Each retry delay is multiplied by this value.
     /// For example, with baseDelay=1000ms and multiplier=2.0:
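
To see what the two new `[Range]` attributes enforce once `ValidateDataAnnotations()` is wired up, the sketch below runs the same data-annotation check by hand. `PacingOptionsSketch` and `OptionsCheck` are hypothetical stand-ins that mirror only the two properties added in this hunk; they are not types from the repository.

```csharp
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

// Hypothetical mirror of the two properties added in this hunk.
public sealed class PacingOptionsSketch
{
    [Range(1, 2048)]
    public int MaxEmbeddingBatchSize { get; set; } = 2048;

    [Range(0, 600000)]
    public int MinInterRequestDelayMs { get; set; } = 250;
}

public static class OptionsCheck
{
    // Returns one ValidationResult per property that violates its [Range].
    public static List<ValidationResult> Check(PacingOptionsSketch options)
    {
        var results = new List<ValidationResult>();

        // validateAllProperties must be true, otherwise only [Required]
        // attributes are evaluated and [Range] is skipped.
        Validator.TryValidateObject(
            options,
            new ValidationContext(options),
            results,
            validateAllProperties: true);

        return results;
    }
}
```

Under these assumptions the defaults pass, while a `MaxEmbeddingBatchSize` of 4096 or a negative `MinInterRequestDelayMs` each produce one validation error.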
@@ -74,6 +88,15 @@ public void Validate()
         if (BaseDelayMs > MaxDelayMs)
             throw new InvalidOperationException("BaseDelayMs must be less than or equal to MaxDelayMs.");
 
+        if (MaxEmbeddingBatchSize <= 0)
+            throw new InvalidOperationException("MaxEmbeddingBatchSize must be positive.");
+
+        if (MaxEmbeddingBatchSize > 2048)
+            throw new InvalidOperationException("MaxEmbeddingBatchSize cannot exceed Azure embedding API limit (2048).");
+
+        if (MinInterRequestDelayMs < 0)
+            throw new InvalidOperationException("MinInterRequestDelayMs must be non-negative.");
+
         if (BackoffMultiplier < 1.0)
             throw new InvalidOperationException("BackoffMultiplier must be >= 1.0.");
 
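
The service code that consumes `MinInterRequestDelayMs` is not part of the hunks shown here. As a rough sketch of what "serialized and paced" requests could look like, the hypothetical `RequestPacer` below lets one request run at a time and waits out the remaining delay before the next one. Measuring the delay from the end of the previous request is an assumption; the commit may measure it differently.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical pacer: one request at a time, with a minimum gap between them.
public sealed class RequestPacer
{
    private readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1);
    private readonly Stopwatch _sinceLast = new Stopwatch();
    private readonly TimeSpan _minDelay;

    public RequestPacer(int minInterRequestDelayMs)
    {
        _minDelay = TimeSpan.FromMilliseconds(minInterRequestDelayMs);
    }

    public async Task<T> RunAsync<T>(Func<Task<T>> request)
    {
        await _gate.WaitAsync(); // serialize callers
        try
        {
            // The stopwatch only runs after a first request has finished.
            if (_sinceLast.IsRunning)
            {
                TimeSpan remaining = _minDelay - _sinceLast.Elapsed;
                if (remaining > TimeSpan.Zero)
                    await Task.Delay(remaining);
            }

            return await request();
        }
        finally
        {
            _sinceLast.Restart(); // gap is measured from the end of this request
            _gate.Release();
        }
    }
}
```

A delay of 0, which the validation above allows, makes the pacer a plain serializer with no added wait.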
