Replies: 4 comments, 3 replies

- Gap 1 — I don't think this is actually a problem (1 reply)
- Gap 3 — No 1-hour TTL option, done! (0 replies)
- Gap 2 — The cache argument doesn't hold up (1 reply)
- Gap 4 — Not a problem in ECA's current design (1 reply)
## How Anthropic prompt caching works

Before comparing implementations, these mechanics apply to both:

- **Exact prefix matching, KV-cache based.** Each block's hash encodes all preceding blocks. A cache hit at block N guarantees blocks 0 through N−1 are byte-for-byte identical. Even one changed token is a miss.
- **Processing order is a strict hierarchy: tools → system → messages.** Changing tools busts the system and message caches. Changing the system prompt busts the message cache. Each layer only caches if the layers above it are stable.
- **Up to 4 explicit `cache_control` breakpoints per request.** Each is independently invalidatable. Breakpoints themselves are free — you only pay for writes (1.25× base input price for 5-min TTL, 2× for 1-hour TTL) and reads (0.1× base). A single cache read already breaks even on a 5-min write.
- **The critical 20-block lookback window.** When a request has a `cache_control` breakpoint, Anthropic automatically walks backward up to ~20 content blocks looking for any prior write it can hit. Content more than 20 blocks before your breakpoint is invisible to cache lookup unless you add an additional explicit breakpoint before it.
- **Cache TTL resets on every hit.** The 5-min clock resets each time the cache is read. A 1-hour TTL (`"ttl": "1h"`) is available at higher write cost.
- **Tool-call pairs each consume 2 blocks.** Each `tool_call` (assistant turn) + `tool_call_output` (user turn) pair counts as 2 content blocks toward the 20-block window. 10 tool calls = 20 blocks consumed, pushing earlier history out of the automatic lookback range.
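The layer/breakpoint layout above can be sketched as a Messages API payload. This is a minimal illustration, not code from either project: the model name, tool definitions, and prompt contents are placeholders.

```python
# Sketch: where the four cache_control breakpoints sit across the
# tools -> system -> messages hierarchy. All contents are placeholders.
request = {
    "model": "claude-sonnet-4-5",
    "max_tokens": 1024,
    # Layer 1: tools. A breakpoint on the LAST tool caches the whole array.
    "tools": [
        {"name": "read_file", "description": "Read a file",
         "input_schema": {"type": "object"}},
        {"name": "run_shell", "description": "Run a command",
         "input_schema": {"type": "object"},
         "cache_control": {"type": "ephemeral"}},               # breakpoint 1
    ],
    # Layer 2: system prompt, split into a stable and a per-session block.
    "system": [
        {"type": "text", "text": "<static instructions>",
         "cache_control": {"type": "ephemeral", "ttl": "1h"}},  # breakpoint 2
        {"type": "text", "text": "<per-session context>",
         "cache_control": {"type": "ephemeral"}},               # breakpoint 3
    ],
    # Layer 3: messages. Breakpoint on the last block of the last user turn.
    "messages": [
        {"role": "user", "content": [
            {"type": "text", "text": "Fix the failing test",
             "cache_control": {"type": "ephemeral"}},           # breakpoint 4
        ]},
    ],
}

# At most 4 explicit breakpoints are allowed per request.
n_breakpoints = (
    sum("cache_control" in t for t in request["tools"])
    + sum("cache_control" in b for b in request["system"])
    + sum("cache_control" in b for m in request["messages"] for b in m["content"])
)
assert n_breakpoints <= 4
```

Note the 1-hour TTL only on the static system block: it is the layer most likely to survive long pauses, while the rolling message breakpoint stays on the cheaper 5-min write.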
## Caching strategy comparison

| Layer | pi | ECA |
| --- | --- | --- |
| System prompt | `cache_control` on the whole block | `static` (rarely changes) + `dynamic` (per-session context); `cache_control` on each |
| Tools | `cache_control` on last tool definition, caching the full tools array | `cache_control` on last tool definition via `add-cache-to-last-tool`; same approach |
| Message history (normal turn, no compaction) | `cache_control` on the last content block of the last user message | `cache_control` on the last content block of the last user message via `add-cache-to-last-message` |
| Long conversations (>20 message blocks) | — | single breakpoint on the last message; earlier blocks fall outside the lookback window (Gap 1) |
| Compaction | `[compaction_summary_user_msg] + [kept_msgs: firstKeptEntryId … compaction] + [post-compact turns]`; kept messages start at `firstKeptEntryId` and overlap with the prior session's cached prefix; `isSplitTurn` detection prevents the cut from landing inside a `user → assistant → tool_result` turn | `[compact_marker (invisible)] + [summary_user_msg] + [post-compact turns]`; `compact_marker` is stripped from the LLM's view |
| Cache TTL | 5-min (short) by default; `PI_CACHE_RETENTION=long` → 1-hour TTL on `api.anthropic.com` | 5-min only (Gap 3) |
| Resume / fork from before a compaction point | `navigateTree` moves the leaf pointer; `buildSessionContext` walks the new path with no compaction entry → LLM receives clean raw history to that point; Anthropic may have this prefix cached from the original session | `/fork` copies full raw `:messages`; `messages-after-last-compact-marker` still applies → if a compact marker exists in the fork, the LLM still sees only summary + post-compact messages |

## Summary of ECA gaps
### Gap 1 — Long conversations exceed the 20-block lookback (quick win)

An agentic session with 10+ tool calls generates 20+ blocks from tool pairs alone. With a single breakpoint at the last user message, earlier turns fall outside the lookback window and are never cached.

Example: a 30-block conversation with the breakpoint on block 30. Next turn: the breakpoint moves to block 31, lookback reaches back to block 11 — blocks 1–10 are invisible and never cached.
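The arithmetic in this example can be checked with a small helper. The exactly-20-blocks lookback model here is an illustrative assumption, not Anthropic's implementation:

```python
def cacheable_blocks(breakpoints: list[int], lookback: int = 20) -> set[int]:
    """Blocks reachable by cache lookup (1-indexed): each breakpoint
    covers itself plus up to `lookback` blocks immediately before it."""
    covered: set[int] = set()
    for bp in breakpoints:
        covered.update(range(max(1, bp - lookback), bp + 1))
    return covered

# 31-block conversation, single breakpoint on block 31:
single = cacheable_blocks([31])
assert min(single) == 11          # blocks 1-10 never enter cache lookup

# A second breakpoint ~20 blocks earlier closes the hole:
double = cacheable_blocks([11, 31])
assert double == set(range(1, 32))  # full coverage
```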
Fix: Place a second `cache_control` breakpoint ~20 blocks before the final one. This extends cache coverage across the full conversation with minimal code change (one additional marker placement in `add-cache-to-last-message` or its caller).

### Gap 2 — Compaction breaks the message-layer cache entirely
After `/compact`, the message prefix sent to the LLM starts with a freshly generated summary that Anthropic has never seen. This is a guaranteed cache miss. ECA then builds a new cached prefix from scratch over subsequent turns.

Pi avoids this by keeping a trailing window of real messages (from `firstKeptEntryId`) after the summary. These messages overlap with the prior session's cached prefix, so Anthropic can hit the kept portion on the first post-compact turn.

Fix: Preserve the last N tokens (e.g. 8–16k) of real messages after the summary in `compact-side-effect!`. Update `messages-after-last-compact-marker` to return `[summary] [kept_msgs...] [new_turns...]`. The kept window is stable across several subsequent turns and accumulates cache hits naturally. The tombstone marker introduced in commit `69011cf` already keeps full display history available, so this is purely a change to what gets passed to the LLM.
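A minimal sketch of the kept-window selection. The message dicts, `type` field, and token counter here are hypothetical, not ECA's actual data model; the forward walk off tool-output boundaries anticipates the Gap 4 constraint below:

```python
# Hypothetical message shape: {"role": ..., "type": ..., ...} where
# type == "tool_call_output" marks the user-side half of a tool pair.

def select_kept_window(messages: list[dict], budget_tokens: int,
                       count_tokens) -> list[dict]:
    """Pick a trailing window of real messages (~budget_tokens) to keep
    after the compaction summary, starting on a clean boundary."""
    # Walk backward, accumulating tokens until the budget is spent.
    start = len(messages)
    used = 0
    while start > 0 and used + count_tokens(messages[start - 1]) <= budget_tokens:
        used += count_tokens(messages[start - 1])
        start -= 1
    # Never start inside a tool-call pair: walk forward until the first
    # kept message is a user message that is not a tool_call_output.
    while start < len(messages) and not (
        messages[start]["role"] == "user"
        and messages[start].get("type") != "tool_call_output"
    ):
        start += 1
    return messages[start:]

# The prefix sent to the LLM then becomes:
#   [summary_user_msg] + kept window + [new turns...]
```

Because the kept window is copied verbatim from the pre-compact history, its blocks are byte-identical to the prior session's prefix, which is what makes the post-compact cache hit possible.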
### Gap 3 — No 1-hour TTL option
Every coding session involves pauses: reading output, thinking, context-switching. With a 5-min
TTL, any pause longer than 5 minutes expires the cache and the next turn pays full re-write cost.
Fix: Add a config option (e.g. `anthropic.cacheRetention: "1h"`) that switches the `cache_control` TTL to `{"type": "ephemeral", "ttl": "1h"}`. The optimal strategy would apply the 1-hour TTL to the stable layers (tools, static system) — which are reused across many turns and sessions — and keep 5-min on the rolling message layer, which changes every turn anyway.
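A sketch of that per-layer choice, assuming the proposed (not yet existing) `anthropic.cacheRetention` option and hypothetical layer names:

```python
def cache_control_for(layer: str, retention: str = "5m") -> dict:
    """Pick a cache_control block per layer. The 1-hour TTL costs 2x to
    write (vs 1.25x for 5-min), so it only pays off on stable layers
    that are reused across many turns and sessions."""
    stable_layers = ("tools", "system-static")
    if retention == "1h" and layer in stable_layers:
        return {"type": "ephemeral", "ttl": "1h"}
    return {"type": "ephemeral"}  # default 5-min TTL

assert cache_control_for("tools", "1h") == {"type": "ephemeral", "ttl": "1h"}
assert cache_control_for("messages", "1h") == {"type": "ephemeral"}
```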
### Gap 4 — Compact boundary must not split a tool-call pair (prerequisite for Gap 2)

When implementing the kept window (Gap 2), the start of the kept window must land on a clean message boundary — a `user` message that is not a `tool_call_output`. Starting inside a `tool_call`/`tool_call_output` pair would produce an invalid Anthropic request (mismatched `tool_use_id`).

Fix: When selecting the first kept message in compact, walk forward until the boundary is a non-tool-output user message. This mirrors pi's `isSplitTurn` detection.

## Priority
- `cache_control` placement
- `compact-side-effect!` + `messages-after-last-compact-marker`
- `anthropic.clj`

## References
- `7754a05`: static/dynamic system prompt split
- `69011cf`: tombstone compact markers
- `packages/ai/src/providers/anthropic.ts`: `convertMessages`, `getCacheControl`
- `packages/coding-agent/src/core/session-manager.ts`: `buildSessionContext`, `firstKeptEntryId`
- `packages/coding-agent/src/core/compaction/compaction.ts`: `prepareCompaction`, `findCutPoint`