TL;DR
Mark the 266-line design.md and 199-line execute_analyze.md as ephemeral-cached system blocks on every SDK call. Expected reduction: 30–50% input tokens once warm.
Why this matters
Every claude -p invocation today re-sends the full methodology prompt uncached. Across a 5-iteration campaign that's:
- design.md (~266 lines × 5 iterations) + execute_analyze.md (~199 lines × 5 iterations) = ~2300 lines of methodology re-tokenized 10 times.
- Plus per-iteration handoff (~150 lines, growing) and principles.json (up to 26 entries on
mech-design-enforcement).
The Anthropic prompt cache has a 5-minute TTL. Within a single phase the cache should hit; designer→executor transitions are typically < 5 minutes.
What's already shipped
Proposed approach
- After Agentic Strategy Evolution: a three-loop methodology for optimizing multi-layer policy spaces #1 lands, restructure the prompt assembly so the methodology is a
system block with cache_control: {"type": "ephemeral"} and the per-iteration context (handoff, principles) is in the user message (uncached).
- Verify cache hits via SDK
usage.cache_read_input_tokens and emit it to llm_metrics.jsonl.
- Add a
nous cost --cache-stats flag showing cache hit rate per campaign.
Acceptance criteria
Notes for implementers
- Keep the user message as the pivot for cache-busting per iteration; never bury per-iter content inside the cached system block.
- The SDK uses an array of system blocks; only mark the methodology ones as cached.
Part of #120.
TL;DR
Mark the 266-line
design.mdand 199-lineexecute_analyze.mdas ephemeral-cached system blocks on every SDK call. Expected reduction: 30–50% input tokens once warm.Why this matters
Every
claude -pinvocation today re-sends the full methodology prompt uncached. Across a 5-iteration campaign that's:mech-design-enforcement).The Anthropic prompt cache has a 5-minute TTL. Within a single phase the cache should hit; designer→executor transitions are typically < 5 minutes.
What's already shipped
Proposed approach
systemblock withcache_control: {"type": "ephemeral"}and the per-iteration context (handoff, principles) is in the user message (uncached).usage.cache_read_input_tokensand emit it tollm_metrics.jsonl.nous cost --cache-statsflag showing cache hit rate per campaign.Acceptance criteria
cache_read_input_tokens.nous cost --cache-statsexists and is documented.Notes for implementers
Part of #120.