Commit f3a2722

🤖 fix: avoid extra Anthropic cache breakpoints with explicit TTL (#3112)
## Summary

This PR fixes a direct Anthropic regression where explicitly setting `anthropic.cacheTtl` caused Mux to emit one extra cache-control breakpoint, pushing tool-enabled requests over Anthropic's four-breakpoint limit.

## Background

Mux already applies Anthropic prompt caching through manual cache markers on the cached system prompt, conversation tail, and last tool. When `buildProviderOptions()` also emitted top-level `anthropic.cacheControl`, the Anthropic SDK serialized an additional top-level `cache_control` block on direct requests. That produced the user-visible failure: `A maximum of 4 blocks with cache_control may be provided. Found 5.`

## Implementation

The fix stops emitting top-level Anthropic `cacheControl` from `buildProviderOptions()` while preserving the existing manual cache-marker flow. To guard against future regressions, the PR also adds a helper that counts Anthropic cache breakpoints in shaped request payloads, plus tests that pin the intended breakpoint budget. A targeted StreamManager regression test verifies that explicit `1h` TTL values still propagate through the manual cache path even without the top-level provider option.

## Validation

- `bun test src/common/utils/ai/providerOptions.test.ts src/node/services/providerModelFactory.test.ts src/common/utils/ai/cacheStrategy.test.ts src/node/services/streamManager.test.ts`
- `nix shell nixpkgs#hadolint -c make static-check`
- Dogfooded in an isolated `make dev-server-sandbox` instance using env-backed direct Anthropic credentials:
  - selected Anthropic in onboarding
  - set prompt cache TTL to `1 hour`
  - added the current repo as the first project
  - opened an Exec workspace and sent a tool-using request
  - verified the request completed successfully without the previous `Found 5` Anthropic error
  - verified the UI showed prompt-cache read/create stats for the successful request

## Risks

The main regression risk is Anthropic request shaping across direct and routed paths.
This change is intentionally narrow: it removes the redundant top-level direct-provider cache marker while keeping the existing manual cache markers intact, and adds tests at both the provider-options layer and the final shaped-request layer.

## Pains

`make static-check` requires `hadolint`, which was not installed in the workspace environment. I ran it through `nix shell nixpkgs#hadolint -c make static-check` so the full required local validation still passed.

---

<details>
<summary>📋 Implementation Plan</summary>

# Fix plan: direct Anthropic cache-marker duplication when explicit cache TTL is set

## Recommendation

**Recommended approach: keep Mux's existing 3 manual Anthropic cache breakpoints, and stop emitting the extra top-level Anthropic `cacheControl` field from `buildProviderOptions()`.**

- **Net product-code LoC estimate:** **+20 to +55**
- Why this is the best fit:
  - It removes the only repo-visible behavior change that happens **only when `anthropic.cacheTtl` is explicitly set**.
  - It preserves the current manual breakpoint strategy already documented in `src/common/utils/ai/cacheStrategy.ts`:
    1. cached system prompt
    2. cached conversation tail / last message
    3. cached last tool
  - It avoids a wider refactor across `messagePipeline.ts`, `streamManager.ts`, and `providerModelFactory.ts` unless follow-up cleanup is still desired after the regression is fixed.

<details>
<summary>Evidence supporting the root-cause diagnosis</summary>

- The user hit Anthropic's runtime error: **"A maximum of 4 blocks with cache_control may be provided. Found 5."** on a **direct Anthropic** request.
- The repo already applies **3 manual Anthropic cache breakpoints** across these files:
  - `src/common/utils/ai/cacheStrategy.ts`
    - `createCachedSystemMessage()`
    - `applyCacheControl()`
    - `applyCacheControlToTools()`
  - `src/node/services/messagePipeline.ts` applies `applyCacheControl()` after message transforms.
  - `src/node/services/streamManager.ts` prepends the cached system message and marks the last tool.
- `src/common/utils/ai/cacheStrategy.ts` explicitly documents Anthropic's **4-breakpoint limit** and says the intended design is to use **3 total**.
- `src/common/utils/ai/providerOptions.ts` is the one place that adds an **extra top-level** Anthropic `cacheControl` field, and it does so **only when `muxProviderOptions.anthropic.cacheTtl` is explicitly set**.
- `src/node/services/aiService.ts` already passes the explicit TTL separately into both:
  - `prepareMessagesForProvider(...)` (`anthropicCacheTtl` argument)
  - `streamManager.startStream(...)` (`anthropicCacheTtlOverride` argument)
- That means the explicit TTL already reaches the manual cache-marker path **without needing** top-level `providerOptions.anthropic.cacheControl`.
- So the most conservative repo-backed explanation is:
  - **unset TTL** -> manual 3-breakpoint path
  - **explicit TTL** -> same manual 3-breakpoint path **plus** an extra top-level Anthropic cache-control path
  - Anthropic rejects the resulting request once the effective marker count reaches 5.

</details>

## Alternate approach (not recommended for the first fix)

**Centralize all Anthropic cache injection in `src/node/services/providerModelFactory.ts` and remove the higher-level cache-marker transforms.**

- **Net product-code LoC estimate:** **-40 to -110**
- Upside: one source of truth for the wire payload.
- Downside: materially larger behavior change, touches more call sites, and increases regression surface for system prompts, tools, retries, and gateway routing.
- Recommendation: defer this unless the surgical fix fails to cover another hidden duplication path.

## Implementation plan

### Phase 1 — Remove the redundant top-level Anthropic cache-control path

**Files/symbols**

- `src/common/utils/ai/providerOptions.ts`
- `src/common/utils/ai/providerOptions.test.ts`

**Changes**

1. Update `buildProviderOptions()` so Anthropic models **do not emit** top-level `anthropic.cacheControl`, even when `muxProviderOptions.anthropic.cacheTtl` is set to `"5m"` or `"1h"`.
2. Keep the rest of the Anthropic provider options intact:
   - `thinking`
   - `effort`
   - `disableParallelToolUse`
   - `sendReasoning`
3. Add a short code comment documenting why the top-level field is intentionally omitted:
   - explicit Anthropic TTL is already threaded through Mux's manual cache-marker helpers
   - sending an extra top-level cache-control field can create duplicate cache breakpoints and violate Anthropic's 4-breakpoint limit

**Quality gate after Phase 1**

- Update `src/common/utils/ai/providerOptions.test.ts` to assert that explicit Anthropic TTL **no longer appears** in top-level provider options.
- Cover both:
  - standard Anthropic models
  - effort/adaptive-thinking Anthropic models (for example Opus 4.6 / Sonnet 4.6 cases already exercised in this test file)

### Phase 2 — Add a narrow regression guard at the wire-shaping layer

**Files/symbols**

- `src/node/services/providerModelFactory.ts`
- `src/node/services/providerModelFactory.test.ts`

**Changes**

1. Extract or add a small pure helper near `wrapFetchWithAnthropicCacheControl()` that can **count Anthropic cache breakpoints in the final request body**.
2. Count all cache-bearing locations relevant to this repo's current shaping strategy, including:
   - cached system blocks/messages
   - cached tools
   - cached last-message content parts
   - gateway-style `providerOptions.anthropic.cacheControl` message markers if present
3. Reuse that helper in tests, and optionally add a defensive runtime assertion or warning right before sending the mutated request body.
   - Goal: fail loudly in development/tests if a future change pushes the request above Anthropic's limit again.
   - Keep the runtime behavior minimal; do not expand this into a broad fallback/rewrite mechanism in the first fix.
**Quality gate after Phase 2**

- Add direct-provider regression coverage in `src/node/services/providerModelFactory.test.ts` that builds a representative Anthropic request shape with:
  - cached system prompt
  - cached last tool
  - cached last message
  - explicit `cacheTtl: "1h"`
- Assert that the final shaped request stays at **<= 4** breakpoints, and preferably at the intended **3**.

### Phase 3 — Verify TTL still propagates through the manual cache-marker path

**Files/symbols**

- `src/node/services/aiService.ts`
- `src/node/services/messagePipeline.ts`
- `src/node/services/streamManager.ts`
- `src/common/utils/ai/cacheStrategy.ts`
- existing tests in:
  - `src/common/utils/ai/cacheStrategy.test.ts`
  - `src/node/services/streamManager.test.ts` or `src/node/services/aiService.test.ts` (only if a small targeted regression test is needed)

**Changes**

1. Leave the existing manual cache-marker plumbing intact for the first fix.
2. Add or update one targeted regression test proving that explicit Anthropic TTL still reaches the manual cache path even after top-level `cacheControl` is removed.
   - Best case: reuse an existing unit seam rather than adding a new integration harness.
   - Only expand into `aiService` / `streamManager` tests if `providerOptions` + `providerModelFactory` tests are not enough to pin the behavior down.
3. Preserve the documented 3-breakpoint strategy in `cacheStrategy.ts`; do not refactor that layer yet.

**Quality gate after Phase 3**

- Confirm the test suite still proves:
  - system prompt caching works
  - last tool caching works
  - last message caching works
  - explicit TTL values (`"1h"`) are preserved on the manual path

## Acceptance criteria

- Direct Anthropic requests with explicit `anthropic.cacheTtl: "1h"` no longer exceed Anthropic's 4-breakpoint limit.
- The final direct-provider request shape remains at the intended **3 manual cache breakpoints** unless a future Anthropic-specific feature intentionally adds another.
- Explicit TTL still applies to the existing manual cache markers; removing top-level `providerOptions.anthropic.cacheControl` must **not** silently disable 1-hour prompt caching.
- Anthropic models without explicit TTL continue to use the existing manual cache-marker strategy.
- The change does not introduce a regression for gateway-routed Anthropic models.

## Validation plan

1. **Targeted unit tests**
   - `bun test src/common/utils/ai/providerOptions.test.ts`
   - `bun test src/node/services/providerModelFactory.test.ts`
   - `bun test src/common/utils/ai/cacheStrategy.test.ts`
2. **Focused service regression test**
   - run the smallest relevant additional test file only if Phase 3 adds coverage in `aiService.test.ts` or `streamManager.test.ts`
3. **Static validation**
   - `make typecheck`
   - `make lint` if the touched files introduce new lint exposure
4. **Optional integration check**
   - If Anthropic credentials are available in the environment, run a narrow Anthropic integration exercise after the unit tests pass.
   - Prefer a direct-provider reproduction with explicit `cacheTtl: "1h"` and at least one tool-enabled request.

## Dogfooding plan

**Goal:** reproduce the original failure mode on the app path the user actually hit, then verify the fix with evidence a reviewer can inspect.

### Setup

- Configure the **direct Anthropic provider** (not mux-gateway).
- Use an Anthropic model that supports the affected prompt-caching path.
- Enable explicit `anthropic.cacheTtl: "1h"`.
- Use **Exec** mode or any other tool-enabled flow that exercises tool definitions in the request.

### Repro / verification flow

1. Start the app in a local dev session.
2. Select the direct Anthropic provider and confirm `cacheTtl` is `"1h"`.
3. Run a simple tool-eligible request in Exec mode.
4. Verify the request completes **without** the Anthropic API error about **5 cache-control blocks**.
5. If a debug request snapshot or local debug logging is available, verify the final outgoing Anthropic payload is at **<= 4** breakpoints.

### Evidence to capture

- **Screenshot 1:** provider/model settings showing direct Anthropic + explicit `1h` TTL
- **Screenshot 2:** successful Exec/tool-enabled response where the old error no longer appears
- **Screenshot 3 (if available):** debug snapshot or log evidence showing the final cache-breakpoint count
- **Video recording:** a short end-to-end repro/verification run covering provider selection, request submission, and successful completion

### Suggested tooling for verification

- In exec mode, use the repo's normal desktop/dev workflow to reproduce the conversation flow.
- If automation is helpful during implementation review, use the desktop/browser automation tools available in exec mode to drive the app and capture screenshots/video artifacts.

## Risks / non-goals

- **Non-goal for this fix:** full cache-system centralization across `messagePipeline`, `streamManager`, and `providerModelFactory`.
- **Risk:** if another hidden Anthropic SDK path also materializes extra cache markers, removing top-level `cacheControl` may not be sufficient by itself.
  - Mitigation: add the wire-level breakpoint counter test in Phase 2 so the final payload shape is asserted directly.
- **Risk:** some tests may currently treat top-level `anthropic.cacheControl` as the source of truth for TTL propagation.
  - Mitigation: update those tests to assert the new invariant: TTL is carried by the manual cache-marker path, not by top-level provider options.

</details>

---

_Generated with `mux` • Model: `openai:gpt-5.4` • Thinking: `xhigh` • Cost: `$12.62`_
<!-- mux-attribution: model=openai:gpt-5.4 thinking=xhigh costs=12.62 -->
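The before/after behavior of `buildProviderOptions()` for Anthropic can be illustrated with a minimal sketch. These object literals mirror the updated expectations in `providerOptions.test.ts` in this commit; `AnthropicOptions` and `CacheControl` are names invented here for the sketch, not the repo's real types.

```typescript
// Illustrative only: shape of the Anthropic provider-options object.
interface CacheControl {
  type: "ephemeral";
  ttl: "5m" | "1h";
}

interface AnthropicOptions {
  disableParallelToolUse: boolean;
  sendReasoning: boolean;
  cacheControl?: CacheControl;
}

// Before the fix: an explicit anthropic.cacheTtl leaked into top-level
// provider options and became an extra cache_control block on the wire.
const beforeFix: AnthropicOptions = {
  disableParallelToolUse: false,
  sendReasoning: true,
  cacheControl: { type: "ephemeral", ttl: "1h" },
};

// After the fix: the TTL rides only on the manual cache markers, so the
// top-level field is omitted entirely.
const afterFix: AnthropicOptions = {
  disableParallelToolUse: false,
  sendReasoning: true,
};

console.log("cacheControl" in beforeFix, "cacheControl" in afterFix); // true false
```

The diffs below show the real change: the `cacheControl` spread is deleted from both Anthropic option-building branches.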
1 parent 78f6c78 commit f3a2722

File tree

5 files changed: +261, -26 lines changed

src/common/utils/ai/providerOptions.test.ts

Lines changed: 6 additions & 13 deletions
```diff
@@ -235,7 +235,7 @@ describe("buildProviderOptions - Anthropic", () => {
   });
 
   describe("Anthropic cache TTL overrides", () => {
-    test("should include cacheControl ttl when configured", () => {
+    test("should omit top-level cacheControl even when cache TTL is configured", () => {
       const result = buildProviderOptions(
         "anthropic:claude-sonnet-4-5",
         "off",
@@ -250,15 +250,11 @@ describe("buildProviderOptions - Anthropic", () => {
         anthropic: {
           disableParallelToolUse: false,
           sendReasoning: true,
-          cacheControl: {
-            type: "ephemeral",
-            ttl: "1h",
-          },
         },
       });
     });
 
-    test("should include cacheControl ttl for Opus 4.6 effort models", () => {
+    test("should preserve Opus 4.6 reasoning options without top-level cacheControl", () => {
       const result = buildProviderOptions(
         "anthropic:claude-opus-4-6",
         "medium",
@@ -276,18 +272,14 @@ describe("buildProviderOptions - Anthropic", () => {
           thinking: {
             type: "adaptive",
           },
-          cacheControl: {
-            type: "ephemeral",
-            ttl: "5m",
-          },
           effort: "medium",
         },
       });
     });
   });
 
   describe("disableBetaFeatures", () => {
-    test("should omit cacheControl when disableBetaFeatures is true even with cacheTtl set", () => {
+    test("should keep omitting top-level cacheControl when disableBetaFeatures is true", () => {
       const result = buildProviderOptions(
         "anthropic:claude-sonnet-4-5",
         "medium",
@@ -303,7 +295,7 @@ describe("buildProviderOptions - Anthropic", () => {
       expect(anthropic.sendReasoning).toBe(true);
     });
 
-    test("should include cacheControl normally when disableBetaFeatures is false", () => {
+    test("should keep omitting top-level cacheControl when disableBetaFeatures is false", () => {
       const result = buildProviderOptions(
         "anthropic:claude-sonnet-4-5",
         "medium",
@@ -315,7 +307,8 @@ describe("buildProviderOptions - Anthropic", () => {
       );
       const anthropic = (result as Record<string, unknown>).anthropic as Record<string, unknown>;
 
-      expect(anthropic.cacheControl).toEqual({ type: "ephemeral", ttl: "1h" });
+      expect(anthropic.cacheControl).toBeUndefined();
+      expect(anthropic.sendReasoning).toBe(true);
     });
   });
 });
```

src/common/utils/ai/providerOptions.ts

Lines changed: 5 additions & 5 deletions
```diff
@@ -254,9 +254,11 @@ export function buildProviderOptions(
 
   // Build Anthropic-specific options
   if (formatProvider === "anthropic") {
-    const disableBeta = muxProviderOptions?.anthropic?.disableBetaFeatures === true;
-    const cacheTtl = disableBeta ? undefined : muxProviderOptions?.anthropic?.cacheTtl;
-    const cacheControl = cacheTtl ? { type: "ephemeral" as const, ttl: cacheTtl } : undefined;
+    // Anthropic prompt caching is already applied on Mux's manual cache markers
+    // (cached system message, conversation tail, last tool) deeper in the
+    // request pipeline. Do not also send top-level cacheControl here: the SDK
+    // serializes it to a top-level cache_control block, which adds an extra
+    // breakpoint on direct Anthropic requests.
 
     // Opus 4.5+ and Sonnet 4.6 use the effort parameter for reasoning control.
     // Opus 4.6 / Sonnet 4.6 use adaptive thinking (model decides when/how much to think).
@@ -291,7 +293,6 @@ export function buildProviderOptions(
         disableParallelToolUse: false,
         sendReasoning: true,
         ...(thinking && { thinking }),
-        ...(cacheControl && { cacheControl }),
         effort: effortLevel,
       },
     } satisfies { anthropic: AnthropicProviderOptions };
@@ -310,7 +311,6 @@ export function buildProviderOptions(
       anthropic: {
         disableParallelToolUse: false, // Always enable concurrent tool execution
         sendReasoning: true, // Include reasoning traces in requests sent to the model
-        ...(cacheControl && { cacheControl }),
         // Conditionally add thinking configuration (non-Opus 4.5 models)
         ...(budgetTokens > 0 && {
           thinking: {
```
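With the top-level field gone, the explicit TTL reaches the wire only through the manual marker helpers. Those helpers (`createCachedSystemMessage`, `applyCacheControl`, `applyCacheControlToTools` in `src/common/utils/ai/cacheStrategy.ts`) are not part of this diff, so the sketch below uses an invented stand-in, `markLastContentPart`, purely to illustrate how a TTL can be carried on a content-part marker instead of a top-level provider option.

```typescript
// Hypothetical sketch of a manual cache-marker helper (not the repo's real code).
type CacheTtl = "5m" | "1h";

interface ContentPart {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral"; ttl: CacheTtl };
}

function markLastContentPart(parts: ContentPart[], ttl: CacheTtl): ContentPart[] {
  // Mark only the final part: an Anthropic cache breakpoint caches the
  // entire prefix up to that point, so one marker covers the conversation tail.
  return parts.map((part, index) =>
    index === parts.length - 1
      ? { ...part, cache_control: { type: "ephemeral", ttl } }
      : part
  );
}

const marked = markLastContentPart(
  [
    { type: "text", text: "earlier context" },
    { type: "text", text: "latest user message" },
  ],
  "1h"
);

console.log(marked[0].cache_control, marked[1].cache_control?.ttl); // undefined "1h"
```

This is why removing the top-level `cacheControl` does not silently disable 1-hour caching: the TTL is threaded into the marker itself.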

src/node/services/providerModelFactory.test.ts

Lines changed: 76 additions & 0 deletions
```diff
@@ -8,6 +8,7 @@ import {
   ProviderModelFactory,
   buildAIProviderRequestHeaders,
   classifyCopilotInitiator,
+  countAnthropicCacheBreakpoints,
   modelCostsIncluded,
   MUX_AI_PROVIDER_USER_AGENT,
   resolveAIProviderHeaderSource,
@@ -500,6 +501,81 @@ describe("classifyCopilotInitiator", () => {
   });
 });
 
+describe("countAnthropicCacheBreakpoints", () => {
+  it("counts the intended three manual Anthropic cache breakpoints for direct requests", () => {
+    const requestBody = {
+      model: "claude-sonnet-4-5",
+      system: [
+        {
+          type: "text",
+          text: "You are a helpful assistant",
+          cache_control: { type: "ephemeral", ttl: "1h" },
+        },
+      ],
+      messages: [
+        {
+          role: "user",
+          content: [
+            { type: "text", text: "hello" },
+            {
+              type: "text",
+              text: "world",
+              cache_control: { type: "ephemeral", ttl: "1h" },
+            },
+          ],
+        },
+      ],
+      tools: [
+        {
+          name: "read_file",
+          input_schema: { type: "object" },
+        },
+        {
+          name: "bash",
+          input_schema: { type: "object" },
+          cache_control: { type: "ephemeral", ttl: "1h" },
+        },
+      ],
+    };
+
+    expect(countAnthropicCacheBreakpoints(requestBody)).toBe(3);
+  });
+
+  it("treats a top-level Anthropic cache_control block as an extra breakpoint", () => {
+    const requestBody = {
+      cache_control: { type: "ephemeral", ttl: "1h" },
+      system: [
+        {
+          type: "text",
+          text: "You are a helpful assistant",
+          cache_control: { type: "ephemeral", ttl: "1h" },
+        },
+      ],
+      messages: [
+        {
+          role: "user",
+          content: [
+            {
+              type: "text",
+              text: "world",
+              cache_control: { type: "ephemeral", ttl: "1h" },
+            },
+          ],
+        },
+      ],
+      tools: [
+        {
+          name: "bash",
+          input_schema: { type: "object" },
+          cache_control: { type: "ephemeral", ttl: "1h" },
+        },
+      ],
+    };
+
+    expect(countAnthropicCacheBreakpoints(requestBody)).toBe(4);
+  });
+});
+
 describe("resolveAIProviderHeaderSource", () => {
   it("uses Request headers when init.headers is not provided", () => {
     const input = new Request("https://example.com", {
```
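The plan's Phase 2 also floats an optional runtime guard ("fail loudly in development/tests") that this commit does not ship. A hedged sketch of what such a guard could look like, with invented names (`ANTHROPIC_CACHE_BREAKPOINT_LIMIT`, `assertCacheBreakpointBudget`) and the counter passed in as a parameter:

```typescript
// Hypothetical sketch only: a dev-time assertion placed just before the
// shaped request body is sent. Not part of this commit.
const ANTHROPIC_CACHE_BREAKPOINT_LIMIT = 4;

function assertCacheBreakpointBudget(
  requestBody: unknown,
  countBreakpoints: (body: unknown) => number
): void {
  const breakpoints = countBreakpoints(requestBody);
  if (breakpoints > ANTHROPIC_CACHE_BREAKPOINT_LIMIT) {
    // Anthropic would reject this payload with
    // "A maximum of 4 blocks with cache_control may be provided."
    throw new Error(
      `Anthropic cache breakpoints over budget: ${breakpoints} > ${ANTHROPIC_CACHE_BREAKPOINT_LIMIT}`
    );
  }
}

// With the fix in place a shaped payload carries 3 breakpoints and passes:
assertCacheBreakpointBudget({}, () => 3);
```

Keeping the guard behind the existing `countAnthropicCacheBreakpoints` helper would match the plan's intent of minimal runtime behavior, with the tests doing the real enforcement.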

src/node/services/providerModelFactory.ts

Lines changed: 63 additions & 8 deletions
```diff
@@ -206,11 +206,68 @@ function mergeAnthropicCacheControl(
   return merged;
 }
 
+function isRecord(value: unknown): value is Record<string, unknown> {
+  return typeof value === "object" && value !== null;
+}
+
+function hasAnthropicProviderCacheControl(value: unknown): boolean {
+  if (!isRecord(value)) {
+    return false;
+  }
+
+  const anthropicOptions = value.anthropic;
+  return isRecord(anthropicOptions) && isRecord(anthropicOptions.cacheControl);
+}
+
+/**
+ * Count Anthropic prompt-cache breakpoints in a shaped request payload.
+ *
+ * This intentionally counts both raw `cache_control` blocks and untransformed
+ * `providerOptions.anthropic.cacheControl` markers so tests can guard the
+ * budget across direct and gateway request-shaping paths.
+ */
+export function countAnthropicCacheBreakpoints(requestBody: unknown): number {
+  const pending: unknown[] = [requestBody];
+  let count = 0;
+
+  while (pending.length > 0) {
+    const current = pending.pop();
+    if (current == null) {
+      continue;
+    }
+
+    if (Array.isArray(current)) {
+      for (const value of current) {
+        pending.push(value);
+      }
+      continue;
+    }
+
+    if (!isRecord(current)) {
+      continue;
+    }
+
+    if (isRecord(current.cache_control)) {
+      count += 1;
+    }
+
+    if (hasAnthropicProviderCacheControl(current.providerOptions)) {
+      count += 1;
+    }
+
+    for (const value of Object.values(current)) {
+      pending.push(value);
+    }
+  }
+
+  return count;
+}
+
 /**
- * Wrap fetch to inject Anthropic cache_control directly into the request body.
- * The AI SDK's providerOptions.anthropic.cacheControl doesn't get translated
- * to raw cache_control for tools or message content parts, so we inject it
- * at the HTTP level.
+ * Wrap fetch to normalize Anthropic cache_control directly on the final request body.
+ *
+ * This keeps routed Anthropic payloads aligned with Mux's manual cache markers
+ * and lets a higher-level cacheTtl override win at the last wire-shaping step.
  *
  * Injects cache_control on:
  * 1. Last tool (caches all tool definitions)
@@ -782,8 +839,7 @@ export class ProviderModelFactory {
 
     // Lazy-load Anthropic provider to reduce startup time
     const { createAnthropic } = await PROVIDER_REGISTRY.anthropic();
-    // Wrap fetch to inject cache_control on tools and messages
-    // (SDK doesn't translate providerOptions to cache_control for these)
+    // Wrap fetch to normalize cache_control on the final Anthropic payload.
     // Use getProviderFetch to preserve any user-configured custom fetch (e.g., proxies)
     const baseFetch = getProviderFetch(providerConfig);
     const disableBeta = muxProviderOptions?.anthropic?.disableBetaFeatures === true;
@@ -1375,8 +1431,7 @@ export class ProviderModelFactory {
     const { couponCode } = creds;
 
     const { createGateway } = await PROVIDER_REGISTRY["mux-gateway"]();
-    // For Anthropic models via gateway, wrap fetch to inject cache_control on tools
-    // (gateway provider doesn't process providerOptions.anthropic.cacheControl)
+    // For Anthropic models via gateway, normalize cache_control on the final payload.
     // Use getProviderFetch to preserve any user-configured custom fetch (e.g., proxies)
     const baseFetch = getProviderFetch(providerConfig);
     const isAnthropicModel = modelId.startsWith("anthropic/");
```
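The new `countAnthropicCacheBreakpoints` helper from the diff above is small enough to run standalone. Here it is as a self-contained sketch, exercised against a sample payload invented to match the fixture shape in the new tests (three manual markers: cached system prompt, cached last message part, cached last tool):

```typescript
// Self-contained copy of the helper added in providerModelFactory.ts.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

function hasAnthropicProviderCacheControl(value: unknown): boolean {
  if (!isRecord(value)) return false;
  const anthropicOptions = value.anthropic;
  return isRecord(anthropicOptions) && isRecord(anthropicOptions.cacheControl);
}

function countAnthropicCacheBreakpoints(requestBody: unknown): number {
  const pending: unknown[] = [requestBody];
  let count = 0;
  while (pending.length > 0) {
    const current = pending.pop();
    if (current == null) continue;
    if (Array.isArray(current)) {
      pending.push(...current);
      continue;
    }
    if (!isRecord(current)) continue;
    // Raw wire marker on this node.
    if (isRecord(current.cache_control)) count += 1;
    // Gateway-style marker that has not been translated yet.
    if (hasAnthropicProviderCacheControl(current.providerOptions)) count += 1;
    pending.push(...Object.values(current));
  }
  return count;
}

// Sample payload (invented) with the intended three breakpoints.
const sampleBody = {
  system: [{ type: "text", text: "sys", cache_control: { type: "ephemeral", ttl: "1h" } }],
  messages: [
    { role: "user", content: [{ type: "text", text: "hi", cache_control: { type: "ephemeral", ttl: "1h" } }] },
  ],
  tools: [{ name: "bash", input_schema: { type: "object" }, cache_control: { type: "ephemeral", ttl: "1h" } }],
};

console.log(countAnthropicCacheBreakpoints(sampleBody)); // 3
```

Adding a top-level `cache_control` block to the same payload, as the pre-fix regression did, bumps the count to 4 and leaves no headroom for the other manual markers on a tool-enabled request.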
