
Commit 9755c21

Add Anthropic cache anchor to survive wide-iteration cache misses
1 parent b7a4baf

3 files changed

Lines changed: 629 additions & 19 deletions


CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -6,6 +6,7 @@ All notable changes to Sofos are documented in this file.

### Fixed

- **Anthropic prefix cache survives wide multi-tool iterations.** Anthropic's prompt cache walks back at most 20 blocks from each `cache_control` marker when looking for a cached prefix. Without a secondary anchor, a single iteration that added more than ~20 blocks (for example, parallel tool calls returning many ToolUse / ToolResult blocks at once) pushed the previously cached position out of the rolling marker's lookback window. The lookup then cold-missed and the entire prefix was re-billed at full price. `ConversationHistory::maintain_cache_anchor` now tracks a stateful anchor index that stays put across turns and only advances once the distance between the anchor and the rolling marker crosses 18 blocks. `request_builder` stamps a secondary `cache_control` at that position, so the anchor's own lookup still gets an exact hit on wide turns. The blocks between the anchor and the new rolling marker are still re-billed at full price on wide turns; that is a fundamental limit of Anthropic's 20-block window, not a Sofos bug. See `project_anthropic_20_block_lookback.md`.
- **Cost no longer grows exponentially across iterations.** Two bugs introduced with the 800k context budget kept invalidating the provider prompt cache, so each agent-loop iteration re-billed the entire prefix at full price (~$1–2 first pass → ~$15 second → ~$50 third). Both are fixed:
1. **Stable `prompt_cache_key`.** Every OpenAI Responses request now carries a `prompt_cache_key` pinned to the REPL's `session_state.session_id`, so consecutive requests share a prompt-cache shard.
2. **No more mid-loop suffix mutation.** The "phase-1 compaction" introduced in 0.2.5 truncated tool-result bodies in older messages between iterations, which evicted the cached prefix on every iteration. Removed; the existing front-trim in `trim_if_needed` already handles overflow without touching the cached suffix, and `/compact` still handles structural summarization.
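The anchor behavior in the first entry can be modeled in a few lines. The Rust sketch below is a simplified simulation, not Sofos's implementation. `AnchorState`, `maintain`, and `lookup` are hypothetical names, and block positions are indices into a flat, append-only block list. The window is modeled as the marker plus the 19 blocks before it. The sketch also assumes that the anchor advances onto the previous request's rolling position, because that request already wrote that prefix to the cache.

```rust
use std::collections::BTreeSet;

/// Approximate lookback window per `cache_control` marker (modeled as the
/// marker's own position plus the 19 blocks before it).
const LOOKBACK_BLOCKS: usize = 20;
/// Anchor-to-rolling distance after which the anchor advances (18 blocks).
const ADVANCE_THRESHOLD: usize = 18;

/// Hypothetical model of the state behind `maintain_cache_anchor`.
/// Positions are indices into a flat, append-only list of content blocks.
#[derive(Debug, Default)]
pub struct AnchorState {
    anchor: Option<usize>,
    prev_rolling: Option<usize>,
}

impl AnchorState {
    /// Call once per request with `rolling`, the index of the last block
    /// (where the rolling marker goes). Returns where to put the secondary marker.
    pub fn maintain(&mut self, rolling: usize) -> Option<usize> {
        let too_far = match self.anchor {
            None => true,
            Some(a) => rolling.saturating_sub(a) > ADVANCE_THRESHOLD,
        };
        if too_far {
            // Assumption: move onto the previous request's rolling position,
            // whose prefix that request already wrote to the cache.
            if let Some(prev) = self.prev_rolling {
                self.anchor = Some(prev);
            }
        }
        self.prev_rolling = Some(rolling);
        self.anchor
    }
}

/// Simulated lookup from `marker`: the longest cached prefix whose end lies
/// inside the lookback window, or `None` for a cold miss.
pub fn lookup(marker: usize, cached: &BTreeSet<usize>) -> Option<usize> {
    let lowest = marker.saturating_sub(LOOKBACK_BLOCKS - 1);
    cached.range(lowest..=marker).next_back().copied()
}
```

Consider a case where the anchor sits at block 9 and the previous request ended at block 20. A turn that pushes the rolling marker to block 50 finds nothing cached between blocks 31 and 50. The anchor, however, advances to block 20 and gets an exact hit. A narrow turn (rolling marker at block 20 with the anchor at 9) leaves the anchor where it is.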

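Both cost fixes rely on the same property: a prefix cache can only reuse work up to the first message that differs between consecutive requests. The std-only sketch below models that property. It does not show the real OpenAI Responses payload, only the `prompt_cache_key` field the entry names. `Request`, `build_request`, and `shared_prefix_len` are illustrative names. The session id is assumed to be a plain string standing in for `session_state.session_id`.

```rust
/// Hypothetical, simplified request: only the fields that matter for caching.
#[derive(Clone, Debug, PartialEq)]
pub struct Request {
    /// Stand-in for the Responses API `prompt_cache_key` field.
    pub prompt_cache_key: String,
    /// Serialized conversation items, oldest first.
    pub messages: Vec<String>,
}

/// Build one agent-loop request. The key is pinned to the session id
/// (assumed here to be a plain string), so it is identical for every
/// request in the session.
pub fn build_request(session_id: &str, history: &[String]) -> Request {
    Request {
        prompt_cache_key: session_id.to_string(),
        messages: history.to_vec(),
    }
}

/// How many leading messages two consecutive requests share verbatim.
/// A prefix cache can reuse work only up to this point.
pub fn shared_prefix_len(prev: &Request, next: &Request) -> usize {
    prev.messages
        .iter()
        .zip(&next.messages)
        .take_while(|(a, b)| a == b)
        .count()
}
```

Appending a message keeps the whole previous request as a shared prefix. Truncating an older tool result, which is what the removed phase-1 compaction did, cuts the shared prefix at that message. Everything after that point is then billed again.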