
perf(providers): cache conversation history on Anthropic requests#1044

Open
kamushadenes wants to merge 16 commits into nextlevelbuilder:dev from kamushadenes/cache-messages-rolling

Conversation

@kamushadenes

Summary

Adds a rolling cache_control breakpoint on the last message in every Anthropic request. Without this, conversation history is sent uncached every turn — the dominant cost line on long agent sessions.

System prompt and the last tool definition were already cached; messages were not, contradicting the description in #1042. Anthropic permits up to 4 breakpoints, so we have room.

Refs #1042.

Evidence (28h on a deployment routed via provider_type=anthropic_native)

| Agent | Fresh input | cache_read | cache_create | Real hit ratio | Calls |
|-------|-------------|------------|--------------|------------------|-------|
| chloe | 3.84M | 2.86M | 1.18M | 36% (2.86/7.88) | 110 |
| bones | 239k | 586k | 253k | 71% | 32 |

Chloe's 36% comes purely from system + tools. Each turn sends 21–57k tokens of conversation history at full input rate. With this PR, that history rolls forward as cache_read on every subsequent turn within the 5-min TTL.

Estimated impact for Chloe: ~$443/mo → ~$130–180/mo (60–70% reduction).

Changes

internal/providers/anthropic_request.go:

  • New applyCacheControlToLastMessage() helper called once in buildRequestBody before the body map is assembled.
  • Handles all three content shapes used in the function:
    • string (typical text-only user message): converted to a single text block to attach cache_control. Anthropic accepts both shapes.
    • []map[string]any (multi-modal user, tool_result, assistant text+tool_use): cache_control on the last block.
    • []json.RawMessage (assistant raw blocks preserving thinking signatures): re-marshals the last block; decode failures skip silently to avoid corrupting the request.

internal/providers/anthropic_message_cache_test.go:

  • 6 unit tests covering all three content shapes, empty messages, tool_result blocks, and the invalid-JSON edge case for raw assistant content.

Trade-off considered

cache_create is 1.25x input on the new tail each turn, while cache_read is 0.10x on everything before. Net positive for any session with > 1 turn of reuse within the 5-minute TTL. For ephemeral one-shot delegations, the loss is bounded by the size of a single user message (a few hundred to a few thousand tokens) — well below the savings on the cached prefix.

A second rotating breakpoint (last 2 user-role messages) would extend cache survival across more concurrent contexts. Held off here to keep the change minimal; happy to add if reviewers prefer.

Test plan

  • go test ./internal/providers/ — 562 passed
  • go vet ./internal/providers/ — clean
  • go build ./... — PG build passes
  • go build -tags sqliteonly ./... — SQLite build passes
  • 6 new unit tests covering string content, block array, tool_result, raw assistant blocks, invalid JSON, empty slice
  • Production validation pending: monitor usage_snapshots.cache_read_tokens for 24–48h after deploy

Out of scope

This does not address the OpenAI-compat path issue described in #1042 (openai_config.go:99 hardcoded CacheControl: false). That still requires Option 1 or Option 2 from the issue body. The fix here only helps agents bound to a native Anthropic adapter (provider_type=anthropic or anthropic_native via OpenRouter).

kamushadenes and others added 16 commits April 24, 2026 17:52
WhatsApp LID-format chat IDs contain @ (e.g. 551152861098:5@s.whatsapp.net)
which is invalid in Docker container names, causing sandbox creation to fail
for any WhatsApp-triggered agent session.

Add @ to the sanitizeKey replacer alongside the existing : / . and space
characters. Adds a test case with a realistic WhatsApp LID key.

Fixes nextlevelbuilder#1029

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The stateless flag was supposed to prevent session history accumulation,
but the reset logic was inverted: non-stateless jobs got reset while
stateless jobs were skipped. Since the agent loop always persists messages
to the session key regardless of the stateless flag, stateless cron jobs
accumulated unbounded history across runs.

Fix by unconditionally resetting the session before every cron execution.
This is consistent with nextlevelbuilder#294 (which added the reset for non-stateless
jobs) and ensures stateless jobs actually start fresh each run.

Fixes the session accumulation observed in production, related to nextlevelbuilder#1029.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
When an LLM wraps a credentialed CLI in a shell chain like
"which gh && gh pr list", lookupCredentialedBinary only checks the first
binary ("which") and misses "gh". The command falls through to regular
exec, running without credential injection — the CLI reports "not
authenticated" and the agent gives up.

Add detectCredentialedBinaryInChain() which scans all segments of a
shell-operator command for registered credentialed binaries. When found,
returns an actionable error telling the LLM to call the CLI directly
without shell operators, instead of silently falling through.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ction

When LLMs wrap credentialed CLIs in shell chains (e.g. "which gh && gh pr list"),
the credentialed exec gate only checks the first binary and misses the CLI
deeper in the chain. The command falls through to regular exec without
credential injection.

Add a per-CLI `allow_chain_exec` boolean (default: false) that controls
behavior when this is detected:

- **false (default)**: return an actionable error telling the LLM to call
  the CLI directly without shell operators (safe, no token leak)
- **true**: inject all matching credential env vars into the full command
  chain and execute via shell (convenient but tokens visible to all
  commands in the chain)

Changes:
- Migration 000057: add `allow_chain_exec` column to `secure_cli_binaries`
- Store: SecureCLIBinary struct + PG/SQLite CRUD (select, insert, update, scan)
- HTTP API: create/update request structs + allowlist
- Exec logic: handleCredentialedChain() with two-mode dispatch
- Credential context: per-CLI note when chain exec is enabled
- Web UI: toggle switch in CLI credential settings form
- i18n: English labels + hint text

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Add a rolling cache_control breakpoint on the last message in every Anthropic
request body. Without this, conversation history was sent uncached every
turn, dominating cost on long agent sessions — observed 36% effective cache
hit on a 187-message Slack thread that should have been ~80%.

System prompt and the last tool definition were already cached; messages
were not. Anthropic allows up to 4 breakpoints, leaving 2 free for messages.

Handles all three content shapes used in `buildRequestBody`:
- Plain string (typical text-only user message): converted to a single
  text block so cache_control can be attached. Anthropic accepts both shapes.
- []map[string]any (multi-modal user, tool_result, assistant text+tool_use):
  cache_control attaches to the last block.
- []json.RawMessage (assistant raw blocks preserving thinking signatures):
  last block is re-marshaled with cache_control; decode failures are skipped
  silently to avoid corrupting the request body.

Refs nextlevelbuilder#1042.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>