Commit f3a2722

🤖 fix: avoid extra Anthropic cache breakpoints with explicit TTL (#3112)
## Summary

This PR fixes a direct Anthropic regression where explicitly setting `anthropic.cacheTtl` caused Mux to emit one extra cache-control breakpoint, pushing tool-enabled requests over Anthropic's four-breakpoint limit.

## Background

Mux already applies Anthropic prompt caching through manual cache markers on the cached system prompt, conversation tail, and last tool. When `buildProviderOptions()` also emitted top-level `anthropic.cacheControl`, the Anthropic SDK serialized an additional top-level `cache_control` block on direct requests. That produced the user-visible failure: `A maximum of 4 blocks with cache_control may be provided. Found 5.`

## Implementation

The fix stops emitting top-level Anthropic `cacheControl` from `buildProviderOptions()` while preserving the existing manual cache-marker flow. To guard against future regressions, the PR also adds a helper that counts Anthropic cache breakpoints in shaped request payloads, plus tests that pin the intended breakpoint budget. A targeted StreamManager regression test verifies that explicit `1h` TTL values still propagate through the manual cache path even without the top-level provider option.

## Validation

- `bun test src/common/utils/ai/providerOptions.test.ts src/node/services/providerModelFactory.test.ts src/common/utils/ai/cacheStrategy.test.ts src/node/services/streamManager.test.ts`
- `nix shell nixpkgs#hadolint -c make static-check`
- Dogfooded in an isolated `make dev-server-sandbox` instance using env-backed direct Anthropic credentials:
  - selected Anthropic in onboarding
  - set prompt cache TTL to `1 hour`
  - added the current repo as the first project
  - opened an Exec workspace and sent a tool-using request
  - verified the request completed successfully without the previous `Found 5` Anthropic error
  - verified the UI showed prompt-cache read/create stats for the successful request

## Risks

The main regression risk is Anthropic request shaping across direct and routed paths.
This change is intentionally narrow: it removes the redundant top-level direct-provider cache marker while keeping the existing manual cache markers intact, and adds tests at both the provider-options layer and the final shaped-request layer.

## Pains

`make static-check` requires `hadolint`, which was not installed in the workspace environment. I ran it through `nix shell nixpkgs#hadolint -c make static-check` so the full required local validation still passed.

---

<details>
<summary>📋 Implementation Plan</summary>

# Fix plan: direct Anthropic cache-marker duplication when explicit cache TTL is set

## Recommendation

**Recommended approach: keep Mux's existing 3 manual Anthropic cache breakpoints, and stop emitting the extra top-level Anthropic `cacheControl` field from `buildProviderOptions()`.**

- **Net product-code LoC estimate:** **+20 to +55**
- Why this is the best fit:
  - It removes the only repo-visible behavior change that happens **only when `anthropic.cacheTtl` is explicitly set**.
  - It preserves the current manual breakpoint strategy already documented in `src/common/utils/ai/cacheStrategy.ts`:
    1. cached system prompt
    2. cached conversation tail / last message
    3. cached last tool
  - It avoids a wider refactor across `messagePipeline.ts`, `streamManager.ts`, and `providerModelFactory.ts` unless follow-up cleanup is still desired after the regression is fixed.

<details>
<summary>Evidence supporting the root-cause diagnosis</summary>

- The user hit Anthropic's runtime error: **"A maximum of 4 blocks with cache_control may be provided. Found 5."** on a **direct Anthropic** request.
- The repo already applies **3 manual Anthropic cache breakpoints** across these files:
  - `src/common/utils/ai/cacheStrategy.ts`
    - `createCachedSystemMessage()`
    - `applyCacheControl()`
    - `applyCacheControlToTools()`
  - `src/node/services/messagePipeline.ts` applies `applyCacheControl()` after message transforms.
  - `src/node/services/streamManager.ts` prepends the cached system message and marks the last tool.
- `src/common/utils/ai/cacheStrategy.ts` explicitly documents Anthropic's **4-breakpoint limit** and says the intended design is to use **3 total**.
- `src/common/utils/ai/providerOptions.ts` is the one place that adds an **extra top-level** Anthropic `cacheControl` field, and it does so **only when `muxProviderOptions.anthropic.cacheTtl` is explicitly set**.
- `src/node/services/aiService.ts` already passes the explicit TTL separately into both:
  - `prepareMessagesForProvider(...)` (`anthropicCacheTtl` argument)
  - `streamManager.startStream(...)` (`anthropicCacheTtlOverride` argument)
- That means the explicit TTL already reaches the manual cache-marker path **without needing** top-level `providerOptions.anthropic.cacheControl`.
- So the most conservative repo-backed explanation is:
  - **unset TTL** -> manual 3-breakpoint path
  - **explicit TTL** -> same manual 3-breakpoint path **plus** an extra top-level Anthropic cache-control path
  - Anthropic rejects the resulting request once the effective marker count reaches 5.

</details>

## Alternate approach (not recommended for the first fix)

**Centralize all Anthropic cache injection in `src/node/services/providerModelFactory.ts` and remove the higher-level cache-marker transforms.**

- **Net product-code LoC estimate:** **-40 to -110**
- Upside: one source of truth for the wire payload.
- Downside: materially larger behavior change, touches more call sites, and increases regression surface for system prompts, tools, retries, and gateway routing.
- Recommendation: defer this unless the surgical fix fails to cover another hidden duplication path.

## Implementation plan

### Phase 1 — Remove the redundant top-level Anthropic cache-control path

**Files/symbols**

- `src/common/utils/ai/providerOptions.ts`
- `src/common/utils/ai/providerOptions.test.ts`

**Changes**

1. Update `buildProviderOptions()` so Anthropic models **do not emit** top-level `anthropic.cacheControl`, even when `muxProviderOptions.anthropic.cacheTtl` is set to `"5m"` or `"1h"`.
2. Keep the rest of the Anthropic provider options intact:
   - `thinking`
   - `effort`
   - `disableParallelToolUse`
   - `sendReasoning`
3. Add a short code comment documenting why the top-level field is intentionally omitted:
   - explicit Anthropic TTL is already threaded through Mux's manual cache-marker helpers
   - sending an extra top-level cache-control field can create duplicate cache breakpoints and violate Anthropic's 4-breakpoint limit

**Quality gate after Phase 1**

- Update `src/common/utils/ai/providerOptions.test.ts` to assert that explicit Anthropic TTL **no longer appears** in top-level provider options.
- Cover both:
  - standard Anthropic models
  - effort/adaptive-thinking Anthropic models (for example Opus 4.6 / Sonnet 4.6 cases already exercised in this test file)

### Phase 2 — Add a narrow regression guard at the wire-shaping layer

**Files/symbols**

- `src/node/services/providerModelFactory.ts`
- `src/node/services/providerModelFactory.test.ts`

**Changes**

1. Extract or add a small pure helper near `wrapFetchWithAnthropicCacheControl()` that can **count Anthropic cache breakpoints in the final request body**.
2. Count all cache-bearing locations relevant to this repo's current shaping strategy, including:
   - cached system blocks/messages
   - cached tools
   - cached last-message content parts
   - gateway-style `providerOptions.anthropic.cacheControl` message markers if present
3. Reuse that helper in tests, and optionally add a defensive runtime assertion or warning right before sending the mutated request body.
   - Goal: fail loudly in development/tests if a future change pushes the request above Anthropic's limit again.
   - Keep the runtime behavior minimal; do not expand this into a broad fallback/rewrite mechanism in the first fix.
**Quality gate after Phase 2**

- Add direct-provider regression coverage in `src/node/services/providerModelFactory.test.ts` that builds a representative Anthropic request shape with:
  - cached system prompt
  - cached last tool
  - cached last message
  - explicit `cacheTtl: "1h"`
- Assert that the final shaped request stays at **<= 4** breakpoints, and preferably at the intended **3**.

### Phase 3 — Verify TTL still propagates through the manual cache-marker path

**Files/symbols**

- `src/node/services/aiService.ts`
- `src/node/services/messagePipeline.ts`
- `src/node/services/streamManager.ts`
- `src/common/utils/ai/cacheStrategy.ts`
- existing tests in:
  - `src/common/utils/ai/cacheStrategy.test.ts`
  - `src/node/services/streamManager.test.ts` or `src/node/services/aiService.test.ts` (only if a small targeted regression test is needed)

**Changes**

1. Leave the existing manual cache-marker plumbing intact for the first fix.
2. Add or update one targeted regression test proving that explicit Anthropic TTL still reaches the manual cache path even after top-level `cacheControl` is removed.
   - Best case: reuse an existing unit seam rather than adding a new integration harness.
   - Only expand into `aiService` / `streamManager` tests if `providerOptions` + `providerModelFactory` tests are not enough to pin the behavior down.
3. Preserve the documented 3-breakpoint strategy in `cacheStrategy.ts`; do not refactor that layer yet.

**Quality gate after Phase 3**

- Confirm the test suite still proves:
  - system prompt caching works
  - last tool caching works
  - last message caching works
  - explicit TTL values (`"1h"`) are preserved on the manual path

## Acceptance criteria

- Direct Anthropic requests with explicit `anthropic.cacheTtl: "1h"` no longer exceed Anthropic's 4-breakpoint limit.
- The final direct-provider request shape remains at the intended **3 manual cache breakpoints** unless a future Anthropic-specific feature intentionally adds another.
- Explicit TTL still applies to the existing manual cache markers; removing top-level `providerOptions.anthropic.cacheControl` must **not** silently disable 1-hour prompt caching.
- Anthropic models without explicit TTL continue to use the existing manual cache-marker strategy.
- The change does not introduce a regression for gateway-routed Anthropic models.

## Validation plan

1. **Targeted unit tests**
   - `bun test src/common/utils/ai/providerOptions.test.ts`
   - `bun test src/node/services/providerModelFactory.test.ts`
   - `bun test src/common/utils/ai/cacheStrategy.test.ts`
2. **Focused service regression test**
   - run the smallest relevant additional test file only if Phase 3 adds coverage in `aiService.test.ts` or `streamManager.test.ts`
3. **Static validation**
   - `make typecheck`
   - `make lint` if the touched files introduce new lint exposure
4. **Optional integration check**
   - If Anthropic credentials are available in the environment, run a narrow Anthropic integration exercise after the unit tests pass.
   - Prefer a direct-provider reproduction with explicit `cacheTtl: "1h"` and at least one tool-enabled request.

## Dogfooding plan

**Goal:** reproduce the original failure mode on the app path the user actually hit, then verify the fix with evidence a reviewer can inspect.

### Setup

- Configure the **direct Anthropic provider** (not mux-gateway).
- Use an Anthropic model that supports the affected prompt-caching path.
- Enable explicit `anthropic.cacheTtl: "1h"`.
- Use **Exec** mode or any other tool-enabled flow that exercises tool definitions in the request.

### Repro / verification flow

1. Start the app in a local dev session.
2. Select the direct Anthropic provider and confirm `cacheTtl` is `"1h"`.
3. Run a simple tool-eligible request in Exec mode.
4. Verify the request completes **without** the Anthropic API error about **5 cache-control blocks**.
5. If a debug request snapshot or local debug logging is available, verify the final outgoing Anthropic payload is at **<= 4** breakpoints.

### Evidence to capture

- **Screenshot 1:** provider/model settings showing direct Anthropic + explicit `1h` TTL
- **Screenshot 2:** successful Exec/tool-enabled response where the old error no longer appears
- **Screenshot 3 (if available):** debug snapshot or log evidence showing the final cache-breakpoint count
- **Video recording:** a short end-to-end repro/verification run covering provider selection, request submission, and successful completion

### Suggested tooling for verification

- In exec mode, use the repo's normal desktop/dev workflow to reproduce the conversation flow.
- If automation is helpful during implementation review, use the desktop/browser automation tools available in exec mode to drive the app and capture screenshots/video artifacts.

## Risks / non-goals

- **Non-goal for this fix:** full cache-system centralization across `messagePipeline`, `streamManager`, and `providerModelFactory`.
- **Risk:** if another hidden Anthropic SDK path also materializes extra cache markers, removing top-level `cacheControl` may not be sufficient by itself.
  - Mitigation: add the wire-level breakpoint counter test in Phase 2 so the final payload shape is asserted directly.
- **Risk:** some tests may currently treat top-level `anthropic.cacheControl` as the source of truth for TTL propagation.
  - Mitigation: update those tests to assert the new invariant: TTL is carried by the manual cache-marker path, not by top-level provider options.

</details>

---

_Generated with `mux` • Model: `openai:gpt-5.4` • Thinking: `xhigh` • Cost: `$12.62`_
<!-- mux-attribution: model=openai:gpt-5.4 thinking=xhigh costs=12.62 -->
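The before/after behavior of `buildProviderOptions()` for Anthropic can be illustrated with a minimal sketch. These object literals mirror the updated expectations in `providerOptions.test.ts` in this commit; `AnthropicOptions` and `CacheControl` are names invented here for the sketch, not the repo's real types.

```typescript
// Illustrative only: shape of the Anthropic provider-options object.
interface CacheControl {
  type: "ephemeral";
  ttl: "5m" | "1h";
}

interface AnthropicOptions {
  disableParallelToolUse: boolean;
  sendReasoning: boolean;
  cacheControl?: CacheControl;
}

// Before the fix: an explicit anthropic.cacheTtl leaked into top-level
// provider options and became an extra cache_control block on the wire.
const beforeFix: AnthropicOptions = {
  disableParallelToolUse: false,
  sendReasoning: true,
  cacheControl: { type: "ephemeral", ttl: "1h" },
};

// After the fix: the TTL rides only on the manual cache markers, so the
// top-level field is omitted entirely.
const afterFix: AnthropicOptions = {
  disableParallelToolUse: false,
  sendReasoning: true,
};

console.log("cacheControl" in beforeFix, "cacheControl" in afterFix); // true false
```

The diffs below show the real change: the `cacheControl` spread is deleted from both Anthropic option-building branches.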
1 parent 78f6c78 commit f3a2722

File tree

5 files changed: +261, -26 lines changed

src/common/utils/ai/providerOptions.test.ts

Lines changed: 6 additions & 13 deletions
```diff
@@ -235,7 +235,7 @@ describe("buildProviderOptions - Anthropic", () => {
   });
 
   describe("Anthropic cache TTL overrides", () => {
-    test("should include cacheControl ttl when configured", () => {
+    test("should omit top-level cacheControl even when cache TTL is configured", () => {
       const result = buildProviderOptions(
         "anthropic:claude-sonnet-4-5",
         "off",
@@ -250,15 +250,11 @@ describe("buildProviderOptions - Anthropic", () => {
         anthropic: {
           disableParallelToolUse: false,
           sendReasoning: true,
-          cacheControl: {
-            type: "ephemeral",
-            ttl: "1h",
-          },
         },
       });
     });
 
-    test("should include cacheControl ttl for Opus 4.6 effort models", () => {
+    test("should preserve Opus 4.6 reasoning options without top-level cacheControl", () => {
       const result = buildProviderOptions(
         "anthropic:claude-opus-4-6",
         "medium",
@@ -276,18 +272,14 @@ describe("buildProviderOptions - Anthropic", () => {
           thinking: {
             type: "adaptive",
           },
-          cacheControl: {
-            type: "ephemeral",
-            ttl: "5m",
-          },
           effort: "medium",
         },
       });
     });
   });
 
   describe("disableBetaFeatures", () => {
-    test("should omit cacheControl when disableBetaFeatures is true even with cacheTtl set", () => {
+    test("should keep omitting top-level cacheControl when disableBetaFeatures is true", () => {
       const result = buildProviderOptions(
         "anthropic:claude-sonnet-4-5",
         "medium",
@@ -303,7 +295,7 @@ describe("buildProviderOptions - Anthropic", () => {
       expect(anthropic.sendReasoning).toBe(true);
     });
 
-    test("should include cacheControl normally when disableBetaFeatures is false", () => {
+    test("should keep omitting top-level cacheControl when disableBetaFeatures is false", () => {
       const result = buildProviderOptions(
         "anthropic:claude-sonnet-4-5",
         "medium",
@@ -315,7 +307,8 @@ describe("buildProviderOptions - Anthropic", () => {
       );
       const anthropic = (result as Record<string, unknown>).anthropic as Record<string, unknown>;
 
-      expect(anthropic.cacheControl).toEqual({ type: "ephemeral", ttl: "1h" });
+      expect(anthropic.cacheControl).toBeUndefined();
+      expect(anthropic.sendReasoning).toBe(true);
     });
   });
 });
```

src/common/utils/ai/providerOptions.ts

Lines changed: 5 additions & 5 deletions
```diff
@@ -254,9 +254,11 @@ export function buildProviderOptions(
 
   // Build Anthropic-specific options
   if (formatProvider === "anthropic") {
-    const disableBeta = muxProviderOptions?.anthropic?.disableBetaFeatures === true;
-    const cacheTtl = disableBeta ? undefined : muxProviderOptions?.anthropic?.cacheTtl;
-    const cacheControl = cacheTtl ? { type: "ephemeral" as const, ttl: cacheTtl } : undefined;
+    // Anthropic prompt caching is already applied on Mux's manual cache markers
+    // (cached system message, conversation tail, last tool) deeper in the
+    // request pipeline. Do not also send top-level cacheControl here: the SDK
+    // serializes it to a top-level cache_control block, which adds an extra
+    // breakpoint on direct Anthropic requests.
 
     // Opus 4.5+ and Sonnet 4.6 use the effort parameter for reasoning control.
     // Opus 4.6 / Sonnet 4.6 use adaptive thinking (model decides when/how much to think).
@@ -291,7 +293,6 @@ export function buildProviderOptions(
         disableParallelToolUse: false,
         sendReasoning: true,
         ...(thinking && { thinking }),
-        ...(cacheControl && { cacheControl }),
         effort: effortLevel,
       },
     } satisfies { anthropic: AnthropicProviderOptions };
@@ -310,7 +311,6 @@ export function buildProviderOptions(
       anthropic: {
         disableParallelToolUse: false, // Always enable concurrent tool execution
         sendReasoning: true, // Include reasoning traces in requests sent to the model
-        ...(cacheControl && { cacheControl }),
         // Conditionally add thinking configuration (non-Opus 4.5 models)
         ...(budgetTokens > 0 && {
           thinking: {
```
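With the top-level field gone, the explicit TTL reaches the wire only through the manual marker helpers. Those helpers (`createCachedSystemMessage`, `applyCacheControl`, `applyCacheControlToTools` in `src/common/utils/ai/cacheStrategy.ts`) are not part of this diff, so the sketch below uses an invented stand-in, `markLastContentPart`, purely to illustrate how a TTL can be carried on a content-part marker instead of a top-level provider option.

```typescript
// Hypothetical sketch of a manual cache-marker helper (not the repo's real code).
type CacheTtl = "5m" | "1h";

interface ContentPart {
  type: "text";
  text: string;
  cache_control?: { type: "ephemeral"; ttl: CacheTtl };
}

function markLastContentPart(parts: ContentPart[], ttl: CacheTtl): ContentPart[] {
  // Mark only the final part: an Anthropic cache breakpoint caches the
  // entire prefix up to that point, so one marker covers the conversation tail.
  return parts.map((part, index) =>
    index === parts.length - 1
      ? { ...part, cache_control: { type: "ephemeral", ttl } }
      : part
  );
}

const marked = markLastContentPart(
  [
    { type: "text", text: "earlier context" },
    { type: "text", text: "latest user message" },
  ],
  "1h"
);

console.log(marked[0].cache_control, marked[1].cache_control?.ttl); // undefined "1h"
```

This is why removing the top-level `cacheControl` does not silently disable 1-hour caching: the TTL is threaded into the marker itself.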

src/node/services/providerModelFactory.test.ts

Lines changed: 76 additions & 0 deletions
```diff
@@ -8,6 +8,7 @@ import {
   ProviderModelFactory,
   buildAIProviderRequestHeaders,
   classifyCopilotInitiator,
+  countAnthropicCacheBreakpoints,
   modelCostsIncluded,
   MUX_AI_PROVIDER_USER_AGENT,
   resolveAIProviderHeaderSource,
@@ -500,6 +501,81 @@ describe("classifyCopilotInitiator", () => {
   });
 });
 
+describe("countAnthropicCacheBreakpoints", () => {
+  it("counts the intended three manual Anthropic cache breakpoints for direct requests", () => {
+    const requestBody = {
+      model: "claude-sonnet-4-5",
+      system: [
+        {
+          type: "text",
+          text: "You are a helpful assistant",
+          cache_control: { type: "ephemeral", ttl: "1h" },
+        },
+      ],
+      messages: [
+        {
+          role: "user",
+          content: [
+            { type: "text", text: "hello" },
+            {
+              type: "text",
+              text: "world",
+              cache_control: { type: "ephemeral", ttl: "1h" },
+            },
+          ],
+        },
+      ],
+      tools: [
+        {
+          name: "read_file",
+          input_schema: { type: "object" },
+        },
+        {
+          name: "bash",
+          input_schema: { type: "object" },
+          cache_control: { type: "ephemeral", ttl: "1h" },
+        },
+      ],
+    };
+
+    expect(countAnthropicCacheBreakpoints(requestBody)).toBe(3);
+  });
+
+  it("treats a top-level Anthropic cache_control block as an extra breakpoint", () => {
+    const requestBody = {
+      cache_control: { type: "ephemeral", ttl: "1h" },
+      system: [
+        {
+          type: "text",
+          text: "You are a helpful assistant",
+          cache_control: { type: "ephemeral", ttl: "1h" },
+        },
+      ],
+      messages: [
+        {
+          role: "user",
+          content: [
+            {
+              type: "text",
+              text: "world",
+              cache_control: { type: "ephemeral", ttl: "1h" },
+            },
+          ],
+        },
+      ],
+      tools: [
+        {
+          name: "bash",
+          input_schema: { type: "object" },
+          cache_control: { type: "ephemeral", ttl: "1h" },
+        },
+      ],
+    };
+
+    expect(countAnthropicCacheBreakpoints(requestBody)).toBe(4);
+  });
+});
+
 describe("resolveAIProviderHeaderSource", () => {
   it("uses Request headers when init.headers is not provided", () => {
     const input = new Request("https://example.com", {
```
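The plan's Phase 2 also floats an optional runtime guard ("fail loudly in development/tests") that this commit does not ship. A hedged sketch of what such a guard could look like, with invented names (`ANTHROPIC_CACHE_BREAKPOINT_LIMIT`, `assertCacheBreakpointBudget`) and the counter passed in as a parameter:

```typescript
// Hypothetical sketch only: a dev-time assertion placed just before the
// shaped request body is sent. Not part of this commit.
const ANTHROPIC_CACHE_BREAKPOINT_LIMIT = 4;

function assertCacheBreakpointBudget(
  requestBody: unknown,
  countBreakpoints: (body: unknown) => number
): void {
  const breakpoints = countBreakpoints(requestBody);
  if (breakpoints > ANTHROPIC_CACHE_BREAKPOINT_LIMIT) {
    // Anthropic would reject this payload with
    // "A maximum of 4 blocks with cache_control may be provided."
    throw new Error(
      `Anthropic cache breakpoints over budget: ${breakpoints} > ${ANTHROPIC_CACHE_BREAKPOINT_LIMIT}`
    );
  }
}

// With the fix in place a shaped payload carries 3 breakpoints and passes:
assertCacheBreakpointBudget({}, () => 3);
```

Keeping the guard behind the existing `countAnthropicCacheBreakpoints` helper would match the plan's intent of minimal runtime behavior, with the tests doing the real enforcement.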

src/node/services/providerModelFactory.ts

Lines changed: 63 additions & 8 deletions
```diff
@@ -206,11 +206,68 @@ function mergeAnthropicCacheControl(
   return merged;
 }
 
+function isRecord(value: unknown): value is Record<string, unknown> {
+  return typeof value === "object" && value !== null;
+}
+
+function hasAnthropicProviderCacheControl(value: unknown): boolean {
+  if (!isRecord(value)) {
+    return false;
+  }
+
+  const anthropicOptions = value.anthropic;
+  return isRecord(anthropicOptions) && isRecord(anthropicOptions.cacheControl);
+}
+
+/**
+ * Count Anthropic prompt-cache breakpoints in a shaped request payload.
+ *
+ * This intentionally counts both raw `cache_control` blocks and untransformed
+ * `providerOptions.anthropic.cacheControl` markers so tests can guard the
+ * budget across direct and gateway request-shaping paths.
+ */
+export function countAnthropicCacheBreakpoints(requestBody: unknown): number {
+  const pending: unknown[] = [requestBody];
+  let count = 0;
+
+  while (pending.length > 0) {
+    const current = pending.pop();
+    if (current == null) {
+      continue;
+    }
+
+    if (Array.isArray(current)) {
+      for (const value of current) {
+        pending.push(value);
+      }
+      continue;
+    }
+
+    if (!isRecord(current)) {
+      continue;
+    }
+
+    if (isRecord(current.cache_control)) {
+      count += 1;
+    }
+
+    if (hasAnthropicProviderCacheControl(current.providerOptions)) {
+      count += 1;
+    }
+
+    for (const value of Object.values(current)) {
+      pending.push(value);
+    }
+  }
+
+  return count;
+}
+
 /**
- * Wrap fetch to inject Anthropic cache_control directly into the request body.
- * The AI SDK's providerOptions.anthropic.cacheControl doesn't get translated
- * to raw cache_control for tools or message content parts, so we inject it
- * at the HTTP level.
+ * Wrap fetch to normalize Anthropic cache_control directly on the final request body.
+ *
+ * This keeps routed Anthropic payloads aligned with Mux's manual cache markers
+ * and lets a higher-level cacheTtl override win at the last wire-shaping step.
  *
  * Injects cache_control on:
  * 1. Last tool (caches all tool definitions)
@@ -782,8 +839,7 @@ export class ProviderModelFactory {
 
     // Lazy-load Anthropic provider to reduce startup time
     const { createAnthropic } = await PROVIDER_REGISTRY.anthropic();
-    // Wrap fetch to inject cache_control on tools and messages
-    // (SDK doesn't translate providerOptions to cache_control for these)
+    // Wrap fetch to normalize cache_control on the final Anthropic payload.
     // Use getProviderFetch to preserve any user-configured custom fetch (e.g., proxies)
     const baseFetch = getProviderFetch(providerConfig);
     const disableBeta = muxProviderOptions?.anthropic?.disableBetaFeatures === true;
@@ -1375,8 +1431,7 @@ export class ProviderModelFactory {
     const { couponCode } = creds;
 
     const { createGateway } = await PROVIDER_REGISTRY["mux-gateway"]();
-    // For Anthropic models via gateway, wrap fetch to inject cache_control on tools
-    // (gateway provider doesn't process providerOptions.anthropic.cacheControl)
+    // For Anthropic models via gateway, normalize cache_control on the final payload.
     // Use getProviderFetch to preserve any user-configured custom fetch (e.g., proxies)
     const baseFetch = getProviderFetch(providerConfig);
     const isAnthropicModel = modelId.startsWith("anthropic/");
```
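The new `countAnthropicCacheBreakpoints` helper from the diff above is small enough to run standalone. Here it is as a self-contained sketch, exercised against a sample payload invented to match the fixture shape in the new tests (three manual markers: cached system prompt, cached last message part, cached last tool):

```typescript
// Self-contained copy of the helper added in providerModelFactory.ts.
function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

function hasAnthropicProviderCacheControl(value: unknown): boolean {
  if (!isRecord(value)) return false;
  const anthropicOptions = value.anthropic;
  return isRecord(anthropicOptions) && isRecord(anthropicOptions.cacheControl);
}

function countAnthropicCacheBreakpoints(requestBody: unknown): number {
  const pending: unknown[] = [requestBody];
  let count = 0;
  while (pending.length > 0) {
    const current = pending.pop();
    if (current == null) continue;
    if (Array.isArray(current)) {
      pending.push(...current);
      continue;
    }
    if (!isRecord(current)) continue;
    // Raw wire marker on this node.
    if (isRecord(current.cache_control)) count += 1;
    // Gateway-style marker that has not been translated yet.
    if (hasAnthropicProviderCacheControl(current.providerOptions)) count += 1;
    pending.push(...Object.values(current));
  }
  return count;
}

// Sample payload (invented) with the intended three breakpoints.
const sampleBody = {
  system: [{ type: "text", text: "sys", cache_control: { type: "ephemeral", ttl: "1h" } }],
  messages: [
    { role: "user", content: [{ type: "text", text: "hi", cache_control: { type: "ephemeral", ttl: "1h" } }] },
  ],
  tools: [{ name: "bash", input_schema: { type: "object" }, cache_control: { type: "ephemeral", ttl: "1h" } }],
};

console.log(countAnthropicCacheBreakpoints(sampleBody)); // 3
```

Adding a top-level `cache_control` block to the same payload, as the pre-fix regression did, bumps the count to 4 and leaves no headroom for the other manual markers on a tool-enabled request.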
