perf: cache static methodology prompts via SDK system blocks (`cache_control: ephemeral`)

## TL;DR
Mark the 266-line `design.md` and 199-line `execute_analyze.md` as ephemeral-cached system blocks on every SDK call. Expected reduction: 30–50% input tokens once warm.

## Why this matters
Every `claude -p` invocation today re-sends the full methodology prompt uncached. Across a 5-iteration campaign that's:

- design.md (~266 lines × 5 iterations) + execute_analyze.md (~199 lines × 5 iterations) = ~2300 lines of methodology re-tokenized 10 times.
- Plus per-iteration handoff (~150 lines, growing) and principles.json (up to 26 entries on `mech-design-enforcement`).

The Anthropic prompt cache has a 5-minute TTL. Within a single phase the cache should hit; designer→executor transitions are typically < 5 minutes.

## What's already shipped
- #52 introduced compact handoff (variable context). This issue handles the *static* part — the methodology prompts themselves.
- #41 added token/cost tracking. This will let us measure the cache hit rate empirically.

## Proposed approach
1. After #1 lands, restructure the prompt assembly so the methodology is a `system` block with `cache_control: {"type": "ephemeral"}` and the per-iteration context (handoff, principles) is in the user message (uncached).
2. Verify cache hits via SDK `usage.cache_read_input_tokens` and emit it to `llm_metrics.jsonl`.
3. Add a `nous cost --cache-stats` flag showing cache hit rate per campaign.

## Acceptance criteria
- [ ] After two successive phase calls within 5 minutes, the second shows non-zero `cache_read_input_tokens`.
- [ ] On a representative 5-iteration campaign, total input tokens decrease by ≥ 25% vs the pre-change baseline.
- [ ] `nous cost --cache-stats` exists and is documented.

## Notes for implementers
- Keep the user message as the pivot for cache-busting per iteration; never bury per-iter content inside the cached system block.
- The SDK uses an array of system blocks; only mark the methodology ones as cached.

---
*Part of #120.*

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf: cache static methodology prompts via SDK system blocks (`cache_control: ephemeral`) #122

TL;DR

Why this matters

What's already shipped

Proposed approach

Acceptance criteria

Notes for implementers

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

perf: cache static methodology prompts via SDK system blocks (cache_control: ephemeral) #122

Description

TL;DR

Why this matters

What's already shipped

Proposed approach

Acceptance criteria

Notes for implementers

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions

perf: cache static methodology prompts via SDK system blocks (`cache_control: ephemeral`) #122