fix(opencode): make OpenRouter prompt cache 1h TTL opt-in via env #30190

Open
rndmcnlly wants to merge 2 commits into anomalyco:dev from rndmcnlly:fix/openrouter-cache-ttl

@rndmcnlly

Issue for this PR

Closes #16848

(Re-submitting after #16850 was auto-closed by the stale bot.)

Type of change

  • Bug fix

What does this PR do?

OpenRouter's prompt cache defaults to a 5-minute TTL. Adds ttl: "1h" to the cacheControl provider option already set on OpenRouter messages, so caches survive normal pauses (review, edits, thinking) instead of getting rewritten every few minutes.

The earlier attempt in #16850 used a top-level prompt_cache_ttl parameter, which OpenRouter silently ignores. The per-message cache_control.ttl is the only mechanism that actually extends the TTL.
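
As a rough illustration of the difference between the two mechanisms, the sketch below builds a request body that carries the TTL on the message content rather than at the top level. The type names and the `withCacheControl` helper are hypothetical and are not opencode's or OpenRouter's actual definitions; only the `cache_control { type: "ephemeral", ttl: "1h" }` shape and the model id come from this PR.

```typescript
// Hypothetical types for illustration only.
type CacheTTL = "5m" | "1h"
type CacheControl = { type: "ephemeral"; ttl?: CacheTTL }
type TextPart = { type: "text"; text: string; cache_control?: CacheControl }
type ChatMessage = { role: "system" | "user" | "assistant"; content: TextPart[] }

// Attach a cache breakpoint to the last content part of a message.
// Omitting `ttl` leaves the provider's default (5 minutes) in effect.
function withCacheControl(message: ChatMessage, ttl?: CacheTTL): ChatMessage {
  const cacheControl: CacheControl = ttl ? { type: "ephemeral", ttl } : { type: "ephemeral" }
  const last = message.content.length - 1
  const content = message.content.map((part, i) =>
    i === last ? { ...part, cache_control: cacheControl } : part,
  )
  return { ...message, content }
}

const system: ChatMessage = {
  role: "system",
  content: [{ type: "text", text: "<long, stable system prompt>" }],
}

// The TTL travels with the message content. Note that there is no top-level
// `prompt_cache_ttl` field, which the probe in this PR found to be ignored.
const body = {
  model: "anthropic/claude-opus-4.8",
  messages: [withCacheControl(system, "1h")],
}

console.log(JSON.stringify(body, null, 2))
```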

How did you verify your code works?

Ran a three-arm probe against anthropic/claude-opus-4.8, with a 6-minute gap past the 5-minute default. Approach B (cache_control.ttl: "1h") was the only one that wrote to cache and got a hit on R2 (21,635 cached tokens). Approaches A (prompt_cache_ttl=3600) and C (baseline) wrote nothing.

Script: https://gist.github.com/rndmcnlly/6680b63aeb26b5ea9ffbb56dab3030dc (uv + OPENROUTER_API_KEY, ~7 min runtime).

Checklist

  • I have tested my changes locally
  • I have not included unrelated changes in this PR

…trol

Use the documented cache_control { type: "ephemeral", ttl: "1h" } on
message content for OpenRouter. The default 5-minute TTL causes constant
cache misses and rewrites during normal usage gaps (reviewing output,
editing files, thinking). Extending to 1 hour significantly reduces
redundant cache write costs on Anthropic models via OpenRouter.

Empirically verified: only the per-message cache_control.ttl approach
survives past 5 minutes; the undocumented top-level prompt_cache_ttl
parameter is silently ignored.

Closes anomalyco#16848
@ualtinok
Contributor

ualtinok commented Jun 1, 2026

I'm currently using 1h cache via https://github.com/cortexkit/anthropic-auth plugin and I can confirm that it reduces my consumption by ~38% in the last 15 days.

Replace the unconditional ttl: "1h" with an env-controlled resolver that
defaults to the cheaper 5m TTL, matching Claude Code's prompt-caching
semantics. 1h cache writes are billed at a higher rate (~2x base input vs
~1.25x for 5m), so it only pays off when a session resumes after >5m but
within the hour; imposing it on every request taxes autonomous/walk-away
runs that finish fast or idle past the hour.

Knobs (Anthropic-scoped, since cache_control.ttl is an Anthropic extension):
- OPENCODE_ANTHROPIC_PROMPT_CACHING_1H=1 opts into 1h
- OPENCODE_ANTHROPIC_FORCE_PROMPT_CACHING_5M=1 forces 5m (overrides the above)

Applied to both openrouter and openaiCompatible gateways. anthropic-native
and bedrock paths are unchanged.
@rndmcnlly
Author

Revised: 1h TTL is now opt-in via env, defaulting to 5m

Pushed 317c1c0, which changes the design from an unconditional ttl: "1h" to an env-controlled resolver that defaults to the cheaper 5-minute TTL. The new knobs are Anthropic-scoped (since cache_control.ttl is an Anthropic extension):

  • OPENCODE_ANTHROPIC_PROMPT_CACHING_1H=1 opts into the 1h TTL
  • OPENCODE_ANTHROPIC_FORCE_PROMPT_CACHING_5M=1 forces 5m (overrides the opt-in)
  • unset -> 5m default

Applied to both the openrouter and openaiCompatible gateways. The anthropic-native and bedrock paths are unchanged.
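
The precedence described above can be sketched as a small resolver. This is a minimal sketch and not the code in 317c1c0: the function name `resolveCacheTTL` is hypothetical, and treating only the literal value `"1"` as "set" is an assumption (the real parsing may accept other truthy values).

```typescript
type CacheTTL = "5m" | "1h"
type Env = Record<string, string | undefined>

// Assumption: only the literal string "1" counts as enabled.
function isSet(value: string | undefined): boolean {
  return value === "1"
}

// Hypothetical resolver: force-5m wins over the 1h opt-in; unset falls back to 5m.
function resolveCacheTTL(env: Env): CacheTTL {
  if (isSet(env["OPENCODE_ANTHROPIC_FORCE_PROMPT_CACHING_5M"])) return "5m"
  if (isSet(env["OPENCODE_ANTHROPIC_PROMPT_CACHING_1H"])) return "1h"
  return "5m"
}

const unset = resolveCacheTTL({})
const optIn = resolveCacheTTL({ OPENCODE_ANTHROPIC_PROMPT_CACHING_1H: "1" })
const forced = resolveCacheTTL({
  OPENCODE_ANTHROPIC_PROMPT_CACHING_1H: "1",
  OPENCODE_ANTHROPIC_FORCE_PROMPT_CACHING_5M: "1",
})
console.log({ unset, optIn, forced })
```

In a Node or Bun process the same function could be called as `resolveCacheTTL(process.env)`; taking the environment as a parameter keeps the sketch easy to test.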

Why opt-in, by analogy to Claude Code

Claude Code models this exact decision with three orthogonal env vars and, crucially, makes 1h opt-in rather than default:

| Variable | Effect |
| --- | --- |
| ENABLE_PROMPT_CACHING_1H=1 | Request 1h TTL (opt-in) |
| FORCE_PROMPT_CACHING_5M=1 | Force 5m even when 1h would apply (overrides the above) |
| DISABLE_PROMPT_CACHING[_OPUS/_SONNET/_HAIKU]=1 | Disable caching entirely |

Their stated reasoning (Claude Code env-vars, How Claude Code uses prompt caching):

On an API key, Bedrock, Vertex, Foundry, or Claude Platform on AWS, you pay the per-token rates, so the TTL stays at the cheaper five minutes by default. To opt into the one-hour TTL, set ENABLE_PROMPT_CACHING_1H=1.

On a Claude subscription, Claude Code requests the one-hour TTL automatically. Usage is included in your plan rather than billed per token, so the longer TTL costs you nothing extra.

The split is along the billing model: per-token paths default to 5m; only subscription paths (where 1h is plan-included and effectively free) auto-enable 1h. OpenRouter is a per-token path, so 5m-default-with-opt-in is the matching behavior.

The cost math the previous version ignored

1h cache writes are billed at a higher rate than 5m writes (Anthropic prompt caching, corroborated by Cadence and Prism):

| TTL | Write multiplier | Read multiplier |
| --- | --- | --- |
| 5m | 1.25x | 0.1x |
| 1h | 2.0x | 0.1x |

So 1h only pays off when a session resumes after a >5m gap but within the hour (one 1h write at 2.0x beats two consecutive 5m writes at 2 x 1.25 = 2.5x). For autonomous/walk-away runs that finish fast, or idle past the hour, the unconditional 1h premium was pure waste. See AgentPatterns: Extended Prompt Cache TTL for the break-even derivation:

Interactive review sessions are the canonical fit; autonomous loops and walk-away sessions are not. ... ENABLE_PROMPT_CACHING_1H paints every breakpoint with the 1-hour premium, including small dynamic blocks where the cost rarely pays back.
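
The break-even arithmetic can be checked with a small model. This is a simplified sketch under stated assumptions, not the linked derivation: costs are in multiples of the base input price for one cached prefix, every gap is longer than 5 minutes but shorter than an hour, each resume under the 1h TTL is a cache hit, and reads that occur within 5 minutes are ignored because they cost the same under both TTLs. The function names are hypothetical.

```typescript
// Multipliers from the table above (relative to base input price).
const WRITE_5M = 1.25
const WRITE_1H = 2.0
const READ = 0.1

// Cost with the 5m TTL: the initial write, plus a full rewrite after every
// long gap, because the cache has expired by the time the session resumes.
function cost5m(longGaps: number): number {
  return WRITE_5M * (1 + longGaps)
}

// Cost with the 1h TTL: one premium write, then a cheap read on each resume
// (assumes each resume lands within the hour and hits the cache).
function cost1h(longGaps: number): number {
  return WRITE_1H + READ * longGaps
}

for (const gaps of [0, 1, 2, 5]) {
  const winner = cost1h(gaps) < cost5m(gaps) ? "1h" : "5m"
  console.log(`long gaps=${gaps}: cheaper TTL is ${winner}`)
}
```

With zero long gaps (a run that finishes fast and never resumes) the 1h write is strictly more expensive, which is the autonomous/walk-away case; with one or more long gaps the 1h TTL comes out ahead under these assumptions.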

The original commit message's motivation (cache misses during review/edit/think gaps) is real and valid; this revision preserves it as an opt-in for exactly that workflow without taxing everyone else.

How other harnesses handle it

  • aider doesn't expose a 1h TTL at all; it keeps the cheap 5m cache warm with keepalive pings (--cache-keepalive-pings N), and caching itself is opt-in via --cache-prompts (aider caching docs).
  • Anthropic Agent SDK / raw API uses per-breakpoint cache_control.ttl: "1h" for finer control (1h blocks must precede 5m blocks in the same request).
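
The ordering constraint in the last bullet (1h blocks before 5m blocks) can be expressed as a simple check. This is an illustrative sketch only: `hasValidTTLOrder` is a hypothetical helper, and treating a breakpoint with no explicit `ttl` as 5m is an assumption based on the 5-minute default described in this PR.

```typescript
type CacheTTL = "5m" | "1h"

// Takes the ttl of each cache breakpoint in request order.
// A breakpoint with no explicit ttl is treated as the 5m default (assumption).
// Returns false if any 1h breakpoint appears after a 5m breakpoint.
function hasValidTTLOrder(breakpoints: Array<CacheTTL | undefined>): boolean {
  let seen5m = false
  for (const ttl of breakpoints) {
    if ((ttl ?? "5m") === "5m") {
      seen5m = true
    } else if (seen5m) {
      return false
    }
  }
  return true
}

console.log(hasValidTTLOrder(["1h", "1h", "5m"])) // 1h blocks first: valid
console.log(hasValidTTLOrder(["5m", "1h"])) // 1h after 5m: invalid
```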

Tests

Reverted the assertion that expected the hardcoded ttl: "1h" (default is now 5m), and added two tests: one verifying the 1h opt-in lands on both gateways while leaving anthropic-native untouched, and one verifying force-5m overrides the opt-in. bun test test/provider/transform.test.ts -> 240 pass, bun typecheck clean.

@rndmcnlly rndmcnlly changed the title from "fix(opencode): set 1h prompt cache TTL for OpenRouter" to "fix(opencode): make OpenRouter prompt cache 1h TTL opt-in via env" on Jun 1, 2026
@ualtinok
Contributor

ualtinok commented Jun 1, 2026

@rndmcnlly I would advise making the 1h cache depend, either by default or through a configuration option, on whether the agent is a primary agent or a subagent. For ephemeral subagents, it's better to use 5m instead of 1h.

@rndmcnlly
Author

@ualtinok Thanks for both the field data and the design instinct here. The ~38% reduction over 15 days lands right next to the ~41% my cache simulation estimated in #16848 (comment), so two independent methods are converging on "this is a large pile of money."

Subagents are a clean discriminator. Ephemeral subagents rarely resume activity after a 5m gap because there's no human sitting and thinking, so they'd mostly just eat the 2.0x write premium with no payback. I dug into the code and the signal is already present at request-prep time (agent.mode === "subagent", and equivalently a non-null parentSessionID), so it's implementable without heuristics.
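
A sketch of how that discriminator could gate the TTL is below. This is not part of the PR, which deliberately defers the question: the `AgentInfo` and `SessionInfo` shapes and the function names are hypothetical, and only the `agent.mode === "subagent"` and non-null `parentSessionID` signals come from the discussion.

```typescript
type CacheTTL = "5m" | "1h"

// Hypothetical minimal shapes; the real opencode types carry many more fields.
type AgentInfo = { mode: string }
type SessionInfo = { parentSessionID?: string | null }

// Either signal marks the request as coming from a subagent.
function isSubagentRequest(agent: AgentInfo, session: SessionInfo): boolean {
  return agent.mode === "subagent" || session.parentSessionID != null
}

// Possible policy: even when the user has opted into 1h, keep ephemeral
// subagents on the cheaper 5m TTL, since they rarely resume after a long gap.
function ttlForRequest(optedInto1h: boolean, agent: AgentInfo, session: SessionInfo): CacheTTL {
  if (!optedInto1h) return "5m"
  return isSubagentRequest(agent, session) ? "5m" : "1h"
}

console.log(ttlForRequest(true, { mode: "primary" }, {}))
console.log(ttlForRequest(true, { mode: "subagent" }, { parentSessionID: "parent-session" }))
```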

Where I'm still uncertain is the right long-term control surface for it: a static subagent-detection heuristic baked into defaults, the current env-var approach, a config knob, or some combination. That deserves its own discussion after we stop the bleeding rather than something to settle inline here.

My proposal: land this PR as-is. It's the conservative move: opt-in, defaults to the cheaper 5m, and matches Claude Code's per-token semantics. It only benefits users paying close enough attention to spot the env var, which is exactly the population that can evaluate the tradeoff for their workflow. That stops a major source of waste for power users today without changing anyone's defaults. Then, once it's merged, I'll open a follow-up issue on extending these benefits to a less-technical audience.

Development

Successfully merging this pull request may close these issues.

[FEATURE]: Set prompt_cache_ttl for OpenRouter provider