fix(opencode): make OpenRouter prompt cache 1h TTL opt-in via env by rndmcnlly · Pull Request #30190 · anomalyco/opencode

rndmcnlly · 2026-06-01T05:48:56Z

Issue for this PR

(Re-submitting after #16850 was auto-closed by the stale bot.)

Type of change

Bug fix

What does this PR do?

OpenRouter's prompt cache defaults to a 5-minute TTL. Adds ttl: "1h" to the cacheControl provider option already set on OpenRouter messages, so caches survive normal pauses (review, edits, thinking) instead of getting rewritten every few minutes.

The earlier attempt in #16850 used a top-level prompt_cache_ttl parameter, which OpenRouter silently ignores. The per-message cache_control.ttl is the only mechanism that actually extends the TTL.
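A minimal sketch of the two request shapes being contrasted, assuming an OpenAI-style chat payload with text content parts. Only the `cache_control { type: "ephemeral", ttl: "1h" }` shape comes from this PR; the helper names (`cachedTextPart`, `buildRequestBody`) and the prompt strings are hypothetical, and this is not opencode's actual transform code.

```typescript
// Wire-level illustration only; names below are hypothetical.
type CacheTtl = "5m" | "1h";

interface CacheControl {
  type: "ephemeral";
  ttl?: CacheTtl;
}

interface TextPart {
  type: "text";
  text: string;
  cache_control?: CacheControl;
}

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: TextPart[];
}

// Attach the per-message cache marker that the PR found to be effective.
function cachedTextPart(text: string, ttl: CacheTtl): TextPart {
  return { type: "text", text, cache_control: { type: "ephemeral", ttl } };
}

function buildRequestBody(model: string, systemPrompt: string, userText: string) {
  const messages: ChatMessage[] = [
    // The long, stable prefix carries the cache breakpoint.
    { role: "system", content: [cachedTextPart(systemPrompt, "1h")] },
    // The short, changing turn is left unmarked.
    { role: "user", content: [{ type: "text", text: userText }] },
  ];
  // Deliberately no top-level prompt_cache_ttl: the PR reports it is silently ignored.
  return { model, messages };
}

const body = buildRequestBody(
  "anthropic/claude-opus-4.8",
  "long, stable system prompt goes here",
  "hello",
);
```

The TTL lives on the content part rather than on the request, which is why a top-level parameter has nothing to attach to.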

How did you verify your code works?

Ran a three-arm probe against anthropic/claude-opus-4.8, with a 6-minute gap between requests so the second request lands past the 5-minute default. Approach B (cache_control.ttl: "1h") was the only one that wrote to cache and got a hit on the second request, R2 (21,635 cached tokens). Approaches A (prompt_cache_ttl=3600) and C (baseline) wrote nothing.

Script: https://gist.github.com/rndmcnlly/6680b63aeb26b5ea9ffbb56dab3030dc (uv + OPENROUTER_API_KEY, ~7 min runtime).

Checklist

I have tested my changes locally
I have not included unrelated changes in this PR

…trol

Use the documented cache_control { type: "ephemeral", ttl: "1h" } on message content for OpenRouter. The default 5-minute TTL causes constant cache misses and rewrites during normal usage gaps (reviewing output, editing files, thinking). Extending to 1 hour significantly reduces redundant cache write costs on Anthropic models via OpenRouter.

Empirically verified: only the per-message cache_control.ttl approach survives past 5 minutes; the undocumented top-level prompt_cache_ttl parameter is silently ignored.

Closes anomalyco#16848

ualtinok · 2026-06-01T05:59:00Z

I'm currently using the 1h cache via the https://github.com/cortexkit/anthropic-auth plugin, and I can confirm that it has reduced my consumption by ~38% over the last 15 days.

Replace the unconditional ttl: "1h" with an env-controlled resolver that defaults to the cheaper 5m TTL, matching Claude Code's prompt-caching semantics.

1h cache writes are billed at a higher rate (~2x base input vs ~1.25x for 5m), so it only pays off when a session resumes after >5m but within the hour; imposing it on every request taxes autonomous/walk-away runs that finish fast or idle past the hour.

Knobs (Anthropic-scoped, since cache_control.ttl is an Anthropic extension):
- OPENCODE_ANTHROPIC_PROMPT_CACHING_1H=1 opts into 1h
- OPENCODE_ANTHROPIC_FORCE_PROMPT_CACHING_5M=1 forces 5m (overrides the above)

Applied to both openrouter and openaiCompatible gateways. anthropic-native and bedrock paths are unchanged.

rndmcnlly · 2026-06-01T06:12:24Z

Revised: 1h TTL is now opt-in via env, defaulting to 5m

Pushed 317c1c0, which changes the design from an unconditional ttl: "1h" to an env-controlled resolver that defaults to the cheaper 5-minute TTL. The new knobs are Anthropic-scoped (since cache_control.ttl is an Anthropic extension):

OPENCODE_ANTHROPIC_PROMPT_CACHING_1H=1 opts into the 1h TTL
OPENCODE_ANTHROPIC_FORCE_PROMPT_CACHING_5M=1 forces 5m (overrides the opt-in)
unset -> 5m default

Applied to both the openrouter and openaiCompatible gateways. The anthropic-native and bedrock paths are unchanged.
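The precedence of the two knobs can be sketched as a small pure function. This is an illustration rather than the code in 317c1c0: the function name `resolveAnthropicCacheTtl`, the `Env` type, and the rule that only the exact value "1" counts as enabled are all assumptions; only the two variable names and the force-over-opt-in ordering come from the description above.

```typescript
type CacheTtl = "5m" | "1h";

// Passing the environment in explicitly keeps the sketch testable.
type Env = Record<string, string | undefined>;

// Assumption: a knob is "on" only when set to exactly "1".
function isOn(env: Env, name: string): boolean {
  return env[name] === "1";
}

// Hypothetical resolver: force-5m wins, then the 1h opt-in, then the 5m default.
function resolveAnthropicCacheTtl(env: Env): CacheTtl {
  if (isOn(env, "OPENCODE_ANTHROPIC_FORCE_PROMPT_CACHING_5M")) return "5m";
  if (isOn(env, "OPENCODE_ANTHROPIC_PROMPT_CACHING_1H")) return "1h";
  return "5m";
}

const ttlWhenUnset = resolveAnthropicCacheTtl({});
const ttlWhenOptedIn = resolveAnthropicCacheTtl({
  OPENCODE_ANTHROPIC_PROMPT_CACHING_1H: "1",
});
const ttlWhenForced = resolveAnthropicCacheTtl({
  OPENCODE_ANTHROPIC_PROMPT_CACHING_1H: "1",
  OPENCODE_ANTHROPIC_FORCE_PROMPT_CACHING_5M: "1",
});
```

Checking the force knob first is what makes it an override: setting both variables still yields 5m.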

Why opt-in, by analogy to Claude Code

Claude Code models this exact decision with three orthogonal env vars and, crucially, makes 1h opt-in rather than default:

| Variable | Effect |
| --- | --- |
| `ENABLE_PROMPT_CACHING_1H=1` | Request 1h TTL (opt-in) |
| `FORCE_PROMPT_CACHING_5M=1` | Force 5m even when 1h would apply (overrides the above) |
| `DISABLE_PROMPT_CACHING[_OPUS/_SONNET/_HAIKU]=1` | Disable caching entirely |

Their stated reasoning (Claude Code env-vars, How Claude Code uses prompt caching):

On an API key, Bedrock, Vertex, Foundry, or Claude Platform on AWS, you pay the per-token rates, so the TTL stays at the cheaper five minutes by default. To opt into the one-hour TTL, set ENABLE_PROMPT_CACHING_1H=1.

On a Claude subscription, Claude Code requests the one-hour TTL automatically. Usage is included in your plan rather than billed per token, so the longer TTL costs you nothing extra.

The split is along the billing model: per-token paths default to 5m; only subscription paths (where 1h is plan-included and effectively free) auto-enable 1h. OpenRouter is a per-token path, so 5m-default-with-opt-in is the matching behavior.

The cost math the previous version ignored

1h cache writes are billed at a higher rate than 5m writes (Anthropic prompt caching, corroborated by Cadence and Prism):

| TTL | Write multiplier | Read multiplier |
| --- | --- | --- |
| 5m | 1.25x | 0.1x |
| 1h | 2.0x | 0.1x |

So 1h only pays off when a session resumes after a >5m gap but within the hour (one 1h write at 2.0x beats two consecutive 5m writes at 2 x 1.25 = 2.5x). For autonomous/walk-away runs that finish fast, or idle past the hour, the unconditional 1h premium was pure waste. See AgentPatterns: Extended Prompt Cache TTL for the break-even derivation:

Interactive review sessions are the canonical fit; autonomous loops and walk-away sessions are not. ... ENABLE_PROMPT_CACHING_1H paints every breakpoint with the 1-hour premium, including small dynamic blocks where the cost rarely pays back.

The original commit message's motivation (cache misses during review/edit/think gaps) is real and valid; this revision preserves it as an opt-in for exactly that workflow without taxing everyone else.
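The break-even argument can be checked with a toy model. The sketch below prices two requests that share one cached prefix, in multiples of the base input price, using the multipliers from the table above. The function name `twoRequestCost` is hypothetical, and the model is simplified: it assumes the entry is still readable when the gap is at most the TTL, and it counts the 0.1x read that the 2.0x vs 2.5x comparison above leaves out.

```typescript
type CacheTtl = "5m" | "1h";

// Multipliers relative to the base input price, as quoted in this PR.
const WRITE_MULTIPLIER: Record<CacheTtl, number> = { "5m": 1.25, "1h": 2.0 };
const READ_MULTIPLIER = 0.1;
const TTL_MINUTES: Record<CacheTtl, number> = { "5m": 5, "1h": 60 };

// Cost of the shared prefix across two requests separated by gapMinutes.
// The first request always writes; the second reads if the entry is still
// alive (simplifying assumption: alive when gap <= TTL), else writes again.
function twoRequestCost(ttl: CacheTtl, gapMinutes: number): number {
  const first = WRITE_MULTIPLIER[ttl];
  const second =
    gapMinutes <= TTL_MINUTES[ttl] ? READ_MULTIPLIER : WRITE_MULTIPLIER[ttl];
  return first + second;
}

// Three representative gaps: quick follow-up, review pause, walk-away.
const gaps = [2, 6, 90];
const comparison = gaps.map((gap) => ({
  gap,
  fiveMinute: twoRequestCost("5m", gap),
  oneHour: twoRequestCost("1h", gap),
}));
```

Under these assumptions the 1h TTL is cheaper only for the 6-minute gap (about 2.1x vs 2.5x); for the 2-minute gap (about 1.35x vs 2.1x) and the 90-minute gap (2.5x vs 4.0x) the 5m TTL wins, which matches the opt-in rationale.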

How other harnesses handle it

aider doesn't expose a 1h TTL at all; it keeps the cheap 5m cache warm with keepalive pings (--cache-keepalive-pings N), and caching itself is opt-in via --cache-prompts (aider caching docs).
Anthropic Agent SDK / raw API uses per-breakpoint cache_control.ttl: "1h" for finer control (1h blocks must precede 5m blocks in the same request).
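The ordering constraint mentioned for per-breakpoint TTLs (1h blocks before 5m blocks) is easy to check mechanically. A minimal sketch, assuming the breakpoints of one request are listed in prompt order; the function name `firstOrderingViolation` is hypothetical and not part of any SDK.

```typescript
type CacheTtl = "5m" | "1h";

// Returns the index of the first 1h breakpoint that follows a 5m breakpoint,
// or -1 when every 1h breakpoint precedes every 5m breakpoint.
function firstOrderingViolation(ttls: CacheTtl[]): number {
  let seenFiveMinute = false;
  for (let i = 0; i < ttls.length; i++) {
    if (ttls[i] === "5m") {
      seenFiveMinute = true;
    } else if (seenFiveMinute) {
      return i; // a 1h block appears after a 5m block
    }
  }
  return -1;
}

const validOrder = firstOrderingViolation(["1h", "1h", "5m"]);
const invalidOrder = firstOrderingViolation(["1h", "5m", "1h"]);
```

A gateway that applies one TTL to every breakpoint, as this PR does, satisfies the constraint trivially; the check only matters once TTLs are mixed within a request.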

Tests

Reverted the assertion that expected the hardcoded ttl: "1h" (default is now 5m), and added two tests: one verifying the 1h opt-in lands on both gateways while leaving anthropic-native untouched, and one verifying force-5m overrides the opt-in. bun test test/provider/transform.test.ts -> 240 pass, bun typecheck clean.

ualtinok · 2026-06-01T10:38:01Z

@rndmcnlly I would suggest, either by default or via configuration, choosing the cache TTL based on whether the agent is a primary agent or a subagent. For ephemeral subagents, 5m is a better fit than 1h.

rndmcnlly · 2026-06-01T23:46:38Z

@ualtinok Thanks for both the field data and the design instinct here. The ~38% reduction over 15 days lands right next to the ~41% my cache simulation estimated in #16848 (comment), so two independent methods are converging on "this is a large pile of money."

Subagents are a clean discriminator. Ephemeral subagents rarely resume activity after a 5m gap because there's no human sitting and thinking, so they'd mostly just eat the 2.0x write premium with no payback. I dug into the code and the signal is already present at request-prep time (agent.mode === "subagent", and equivalently a non-null parentSessionID), so it's implementable without heuristics.
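For discussion purposes, the subagent discriminator could look something like the sketch below. It is not part of this PR: the `RequestContext` shape and the function names are hypothetical, and only the two signals (a mode of "subagent", or a non-null parentSessionID) are taken from the comment above.

```typescript
type CacheTtl = "5m" | "1h";

// Hypothetical view of what is known at request-prep time.
interface RequestContext {
  agentMode: string; // "subagent" marks an ephemeral subagent
  parentSessionID: string | null; // non-null also implies a subagent session
  oneHourOptIn: boolean; // whether the user opted into the 1h TTL
}

function isSubagent(ctx: RequestContext): boolean {
  return ctx.agentMode === "subagent" || ctx.parentSessionID !== null;
}

// Only primary sessions that opted in pay the 1h write premium.
function chooseTtl(ctx: RequestContext): CacheTtl {
  if (!ctx.oneHourOptIn) return "5m";
  return isSubagent(ctx) ? "5m" : "1h";
}

const primaryTtl = chooseTtl({
  agentMode: "primary",
  parentSessionID: null,
  oneHourOptIn: true,
});
const subagentTtl = chooseTtl({
  agentMode: "subagent",
  parentSessionID: "parent-session",
  oneHourOptIn: true,
});
```

Keeping the opt-in check first means this refinement could only ever narrow who gets the 1h TTL, never widen it.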

Where I'm still uncertain is the right long-term control surface for it: a static subagent-detection heuristic baked into defaults, the current env-var approach, a config knob, or some combination. That deserves its own discussion after we stop the bleeding rather than something to settle inline here.

My proposal: land this PR as-is. It's the conservative move: opt-in, defaults to the cheaper 5m, and matches Claude Code's per-token semantics. It only benefits users paying close enough attention to spot the env var, which is exactly the population that can evaluate the tradeoff for their workflow. That stops a major source of waste for power users today without changing anyone's defaults. Then, once it's merged, I'll open a follow-up issue on extending these benefits to a less-technical audience.

rndmcnlly mentioned this pull request Jun 1, 2026

fix(opencode): set 1h prompt cache TTL for OpenRouter #16850

Closed

6 tasks

rndmcnlly changed the title ~~fix(opencode): set 1h prompt cache TTL for OpenRouter~~ fix(opencode): make OpenRouter prompt cache 1h TTL opt-in via env Jun 1, 2026

rndmcnlly wants to merge 2 commits into anomalyco:dev from rndmcnlly:fix/openrouter-cache-ttl
