Skip to content

perf: cache static methodology prompts via SDK system blocks (cache_control: ephemeral) #122

@sriumcp

Description

@sriumcp

TL;DR

Mark the 266-line design.md and 199-line execute_analyze.md as ephemeral-cached system blocks on every SDK call. Expected reduction: 30–50% input tokens once warm.

Why this matters

Every claude -p invocation today re-sends the full methodology prompt uncached. Across a 5-iteration campaign that's:

  • design.md (~266 lines × 5 iterations) + execute_analyze.md (~199 lines × 5 iterations) = ~2300 lines of methodology re-tokenized 10 times.
  • Plus per-iteration handoff (~150 lines, growing) and principles.json (up to 26 entries on mech-design-enforcement).

The Anthropic prompt cache has a 5-minute TTL. Within a single phase the cache should hit; designer→executor transitions are typically < 5 minutes.

What's already shipped

Proposed approach

  1. After Agentic Strategy Evolution: a three-loop methodology for optimizing multi-layer policy spaces #1 lands, restructure the prompt assembly so the methodology is a system block with cache_control: {"type": "ephemeral"} and the per-iteration context (handoff, principles) is in the user message (uncached).
  2. Verify cache hits via SDK usage.cache_read_input_tokens and emit it to llm_metrics.jsonl.
  3. Add a nous cost --cache-stats flag showing cache hit rate per campaign.

Acceptance criteria

  • After two successive phase calls within 5 minutes, the second shows non-zero cache_read_input_tokens.
  • On a representative 5-iteration campaign, total input tokens decrease by ≥ 25% vs the pre-change baseline.
  • nous cost --cache-stats exists and is documented.

Notes for implementers

  • Keep the user message as the pivot for cache-busting per iteration; never bury per-iter content inside the cached system block.
  • The SDK uses an array of system blocks; only mark the methodology ones as cached.

Part of #120.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions