diff --git a/CHANGELOG.md b/CHANGELOG.md
index 71c658e..c1931f8 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -10,6 +10,26 @@ and this project adheres to
 
 ## [Unreleased]
 
+### Added
+
+- **Polish prompt-cache hit-rate telemetry.** Each polish run now
+  tracks Anthropic prompt-cache token usage and logs a one-line
+  summary at the end of `attune-author regenerate`:
+  `Polish cache hit: 87% (1241 read / 1421 total tokens, 6 call(s))`.
+  A `WARNING` is appended when the run's hit rate falls below 50%,
+  surfacing silent cache regressions (prompt edits, model alias
+  drift). Hit rate is `read / (read + creation)` cacheable input
+  tokens.
+  - `attune_author.doc_gen._anthropic.call_anthropic` gains an optional
+    `on_cache_usage(creation, read, model)` callback; backward
+    compatible (the doc-gen path passes nothing).
+  - New in `attune_author.polish`: `PolishCacheStats`,
+    `polish_cache_stats()`, `format_polish_cache_summary()`,
+    `reset_polish_cache_telemetry()`. Telemetry follows the existing
+    in-process faithfulness-counter pattern (no new on-disk format).
+  - README: new "Cache hit rate" subsection under Polish cache.
+  - 16 new tests in `tests/unit/test_polish_cache_metrics.py`.
+
 ## [0.14.2] - 2026-05-27
 
 ### Fixed
diff --git a/README.md b/README.md
index d12d498..4985c0f 100644
--- a/README.md
+++ b/README.md
@@ -257,6 +257,33 @@ volatile frontmatter fields like `generated_at` stripped),
 context, and model name. Changing the model automatically invalidates
 all prior entries.
 
+### Cache hit rate
+
+Separately from the on-disk response cache above, each polish call
+uses Anthropic's **prompt cache** for the ~6000-token system prompt.
+After a regen run, `attune-author` logs a one-line summary at INFO:
+
+```
+Polish cache hit: 87% (1241 read / 1421 total tokens, 6 call(s))
+```
+
+The hit rate is `read / (read + creation)` — the fraction of cacheable
+input tokens served from cache rather than re-billed. Prompt caching
+cuts input cost ~90% on the cached portion, so a healthy multi-template
+run should settle well above 50% once the first call warms the cache.
+
+- **High (>80%)** — expected steady state; the system prompt is being
+  reused across calls.
+- **Low (<50%)** — triggers a `WARNING` in the summary. Usually means
+  the cache boundary broke: the system prompt changed between calls,
+  the model alias drifted, or only a single template was polished (no
+  reuse). Check recent edits to `polish_prompts.py` or `_POLISH_MODEL`.
+- **"no cacheable tokens observed"** — the prompt fell below Anthropic's
+  caching threshold or caching is disabled (`POLISH_CACHE_SYSTEM`).
+
+The metric is per-run (in-process); it is not persisted across
+invocations.
+
 ## Python API
 
 ```python
diff --git a/docs/specs/polish-cache-hit-metrics/decisions.md b/docs/specs/archive/polish-cache-hit-metrics/decisions.md
similarity index 84%
rename from docs/specs/polish-cache-hit-metrics/decisions.md
rename to docs/specs/archive/polish-cache-hit-metrics/decisions.md
index 11d0be5..4423033 100644
--- a/docs/specs/polish-cache-hit-metrics/decisions.md
+++ b/docs/specs/archive/polish-cache-hit-metrics/decisions.md
@@ -1,6 +1,12 @@
 # Decisions — Polish prompt-cache hit-rate telemetry
 
-**Status:** Draft (2026-05-11) — gated on briefing-followup batch
+**Status:** ✅ DONE (2026-06-06) — shipped to [Unreleased]. The Draft
+"gated on briefing-followup batch" note was superseded by this file's
+own "Execution gate" ("Not blocking"). One deviation: attune-author has
+no telemetry JSONL, so the metric uses the existing in-process
+faithfulness-counter pattern (INFO summary at end of run) rather than a
+new JSONL file; the threshold warning is current-run, not cross-run.
+See `tasks.md` for the per-phase record.
 **Owner:** Patrick
 
 ---
diff --git a/docs/specs/archive/polish-cache-hit-metrics/tasks.md b/docs/specs/archive/polish-cache-hit-metrics/tasks.md
new file mode 100644
index 0000000..d1a6c04
--- /dev/null
+++ b/docs/specs/archive/polish-cache-hit-metrics/tasks.md
@@ -0,0 +1,72 @@
+# Tasks — Polish prompt-cache hit-rate telemetry
+
+**Status:** ✅ DONE (2026-06-06) — shipped to [Unreleased]. See the
+"Deviation" note under Phases 3–4: attune-author has no JSONL
+telemetry, so the metric follows the existing in-process
+faithfulness-counter pattern (reset at run start, INFO summary at run
+end) instead of a new JSONL subsystem. Acceptance criteria in
+`decisions.md` are all met.
+
+## Phase 1 — Read the cache fields
+
+- [x] **1.1** Captured via a new `on_cache_usage(creation, read, model)`
+      callback on `doc_gen._anthropic.call_anthropic` (polish can't see
+      `response.usage` directly — `call_anthropic` returns only text).
+      `_log_cache_usage` now returns `(creation, read)`.
+- [x] **1.2** Compute hit rate: `read / max(read + creation, 1)`
+      (`PolishCacheStats.hit_rate`)
+- [x] **1.3** `PolishCacheStats` dataclass added in `polish.py`
+
+## Phase 2 — Surface to user
+
+- [x] **2.1** End-of-run summary logged at INFO via
+      `format_polish_cache_summary()`:
+      `Polish cache hit: 87% (1241 read / 1421 total tokens, 6 call(s))`
+- [x] **2.2** Graceful when both are zero:
+      `Polish cache: no cacheable tokens observed (cache not configured?)`
+
+## Phase 3 — Log to telemetry  *(deviation, see note)*
+
+- [x] **3.1** ~~Append per-call to existing telemetry JSONL~~ →
+      **There is no telemetry JSONL in attune-author.** Adopted the
+      existing in-process counter idiom (`_polish_cache_telemetry()` +
+      `reset_polish_cache_telemetry()`, mirroring
+      `generator._faithfulness_telemetry`), surfaced via the INFO
+      end-of-run summary in `maintenance.py`. Building a JSONL
+      subsystem would contradict the spec's "low effort, single file"
+      scope and the codebase's telemetry pattern.
+- [x] **3.2** Aggregate fields: calls, creation_tokens, read_tokens,
+      derived hit_rate, model (model accepted by the callback; per-model
+      breakdown explicitly out of scope per decisions.md).
+
+## Phase 4 — Threshold warning  *(deviation: current-run, not cross-run)*
+
+- [x] **4.1–4.3** `format_polish_cache_summary()` appends a `WARNING`
+      when the **current run's** hit rate < 50% (`_CACHE_HIT_WARN_THRESHOLD`)
+      and ≥1 cacheable token was seen, with a pointer to the README.
+      Cross-run rolling history (last N records) is deferred — it would
+      require the persistent JSONL layer this spec deliberately avoided.
+
+## Phase 5 — Test
+
+- [x] **5.1** `tests/unit/test_polish_cache_metrics.py`: mocks Anthropic
+      responses with known cache_creation/cache_read values; asserts the
+      callback fires (incl. the zero case), hit-rate math, accumulator,
+      summary line, and threshold warning (16 tests).
+- [ ] **5.2** Integration test (optional) — **skipped**: would require a
+      live API key (real prompt-cache hits can't be observed against a
+      mock). The unit tests cover the compute path; left optional as the
+      spec allowed.
+
+## Phase 6 — Docs
+
+- [x] **6.1** README "Cache hit rate" subsection — meaning, healthy
+      ranges, what to do when it drops.
+- [x] **6.2** CHANGELOG [Unreleased] entry added.
+
+## Out of scope
+
+- Per-stage cache breakdown (system / examples / messages)
+- Cost-in-dollars tracking (token-level only)
+- Cache strategy changes
+- Cross-package telemetry aggregation
diff --git a/docs/specs/polish-fact-check/decisions.md b/docs/specs/archive/polish-fact-check/decisions.md
similarity index 100%
rename from docs/specs/polish-fact-check/decisions.md
rename to docs/specs/archive/polish-fact-check/decisions.md
diff --git a/docs/specs/polish-fact-check/design.md b/docs/specs/archive/polish-fact-check/design.md
similarity index 100%
rename from docs/specs/polish-fact-check/design.md
rename to docs/specs/archive/polish-fact-check/design.md
diff --git a/docs/specs/polish-fact-check/requirements.md b/docs/specs/archive/polish-fact-check/requirements.md
similarity index 100%
rename from docs/specs/polish-fact-check/requirements.md
rename to docs/specs/archive/polish-fact-check/requirements.md
diff --git a/docs/specs/polish-fact-check/tasks.md b/docs/specs/archive/polish-fact-check/tasks.md
similarity index 100%
rename from docs/specs/polish-fact-check/tasks.md
rename to docs/specs/archive/polish-fact-check/tasks.md
diff --git a/docs/specs/regen-pipeline/design.md b/docs/specs/archive/regen-pipeline/design.md
similarity index 88%
rename from docs/specs/regen-pipeline/design.md
rename to docs/specs/archive/regen-pipeline/design.md
index 718abf3..e8a63fd 100644
--- a/docs/specs/regen-pipeline/design.md
+++ b/docs/specs/archive/regen-pipeline/design.md
@@ -1,8 +1,27 @@
 # Spec: Regen Pipeline — Design
 
+> ## ⚠️ OBSOLETE — do not implement (reconciled 2026-06-06)
+>
+> This design was never built and conflicts with the shipped architecture. It
+> assumes a single `corpus_root`, a React/JSX frontend (`App.jsx`,
+> `CorpusSetup`), a polish+Haiku `_regen` pipeline, and WS-badge wiring — none
+> of which exist. The shipped reality instead uses:
+>
+> - **Regen:** `sidecar/attune_gui/routes/living_docs.py` →
+>   `POST /api/living-docs/docs/{id}/regenerate` → Jobs registry
+>   (`attune_gui.jobs`) → `_regenerate_doc_executor` →
+>   `attune_author.generator.generate_feature_templates` + `load_manifest`.
+> - **Corpus config:** multi-corpus registry (`attune_gui.editor_corpora`,
+>   `POST /api/corpus/register`) + workspace config (`attune_gui.workspace`,
+>   `living_docs.py` `get_config`/`set_config`).
+> - **Frontend:** TypeScript (`editor-frontend/src/corpus-switcher.ts`), not React.
+> - **Bulk:** `make regen-all` (Makefile), not `POST /api/templates/refresh-all`.
+>
+> Kept verbatim below for historical context only. See `requirements.md` banner.
+
 ## Phase 2: Design
 
-**Status**: in-review
+**Status**: obsolete — superseded by living-docs regen automation (was "in-review", never built)
 
 ---
 
diff --git a/docs/specs/regen-pipeline/requirements.md b/docs/specs/archive/regen-pipeline/requirements.md
similarity index 66%
rename from docs/specs/regen-pipeline/requirements.md
rename to docs/specs/archive/regen-pipeline/requirements.md
index f94af8d..6afcda8 100644
--- a/docs/specs/regen-pipeline/requirements.md
+++ b/docs/specs/archive/regen-pipeline/requirements.md
@@ -5,9 +5,31 @@
 
 ---
 
+> ## ⚠️ RECONCILED — satisfied-by-different-means (2026-06-06)
+>
+> This spec was previously marked "complete" with all tasks ✅, but a code
+> audit found **none** of its named symbols ever shipped (`_regen`,
+> `regen_template(corpus_root=…)`, `_resolve_corpus_root`, `atomic_write`,
+> `_patch_summaries_json`) and the attune-gui pieces (`/api/config`,
+> `refresh-all`, `CorpusSetup`) do not exist. The underlying need was instead
+> met by a **more evolved architecture**. All three user stories are satisfied:
+>
+> | User story | Status | Actual implementation |
+> |---|---|---|
+> | US1 — badge click → regen → saved to disk | ✅ met | `POST /api/living-docs/docs/{id}/regenerate` → Jobs registry → `_regenerate_doc_executor` → `attune_author.generator.generate_feature_templates` (`sidecar/attune_gui/routes/living_docs.py`). Source-driven generation, not polish+Haiku. |
+> | US2 — first-run corpus setup UI | ✅ exceeded | Multi-corpus registry: `editor_corpora.py`, `POST /api/corpus/register`, `editor-frontend/src/corpus-switcher.ts` (dropdown + "Add corpus…" modal). |
+> | US3 — env auto-load on startup | ✅ met | Workspace config (`living_docs.py` `get_config`/`set_config`, `attune_gui.workspace`) + persisted corpus registry, replacing single `ATTUNE_CORPUS_ROOT`. |
+>
+> Bulk regen ships as the build-time `make regen-all` target (Makefile), not a
+> runtime "Regen all stale" button. The frontend is **TypeScript**, not the
+> React/JSX assumed by `design.md`.
+>
+> **No genuine product gaps remain.** This spec is retained for history; the
+> `design.md` below is **obsolete** (see its banner). Do not implement it.
+
 ## Phase 1: Requirements
 
-**Status**: approved
+**Status**: reconciled — satisfied by living-docs regen automation + corpus registry (was falsely marked "approved/complete")
 
 ### Problem statement
 
diff --git a/docs/specs/regen-pipeline/tasks.md b/docs/specs/archive/regen-pipeline/tasks.md
similarity index 80%
rename from docs/specs/regen-pipeline/tasks.md
rename to docs/specs/archive/regen-pipeline/tasks.md
index 9cc0f78..7e12159 100644
--- a/docs/specs/regen-pipeline/tasks.md
+++ b/docs/specs/archive/regen-pipeline/tasks.md
@@ -2,9 +2,29 @@
 
 ## Phase 3: Tasks
 
-**Status**: complete
+**Status**: NOT done as written — reconciled 2026-06-06
 
-> Shipped: `attune-author regenerate` CLI lives in `src/attune_author/cli.py:507` (handler) with the parser registered around line 154. Core logic in `maintenance.py` and `maintenance_batch.py`. CHANGELOG documents the batch variant.
+> ## ⚠️ The task table below is INACCURATE
+>
+> A 2026-06-06 code audit found that **none** of the attune-author symbols in
+> tasks 2–9 ever shipped (`_resolve_corpus_root`, `atomic_write`,
+> `_patch_summaries_json`, `regen_template(corpus_root=…)`, `_regen`) and
+> **none** of the attune-gui pieces in tasks 10–24 exist (`config.py`
+> `ConfigState`, `/api/config`, `/api/templates/refresh-all`,
+> `/api/browse/directory`, `CorpusSetup`, `App.jsx`). The "done" marks below are
+> false. The earlier "Shipped" note conflated this spec with the unrelated
+> hash-mismatch `attune-author regenerate` CLI — a different feature.
+>
+> **What actually satisfies the spec's user stories** (see `requirements.md`
+> banner for the full mapping):
+> - Single-doc regen → `POST /api/living-docs/docs/{id}/regenerate` (Jobs +
+>   `attune_author.generator.generate_feature_templates`).
+> - Corpus config → corpus registry (`editor_corpora.py`,
+>   `/api/corpus/register`) + workspace config (`attune_gui.workspace`).
+> - Bulk → `make regen-all` (Makefile).
+>
+> No code action is required: the product need is met. The table below is left
+> intact only as a record of the original (unbuilt) plan.
 
 ### Implementation order
 
diff --git a/docs/specs/regen-staleness-hash-mismatch/decisions.md b/docs/specs/archive/regen-staleness-hash-mismatch/decisions.md
similarity index 94%
rename from docs/specs/regen-staleness-hash-mismatch/decisions.md
rename to docs/specs/archive/regen-staleness-hash-mismatch/decisions.md
index 9cbde17..7e015e4 100644
--- a/docs/specs/regen-staleness-hash-mismatch/decisions.md
+++ b/docs/specs/archive/regen-staleness-hash-mismatch/decisions.md
@@ -1,6 +1,6 @@
 # Decisions — Regen / staleness hash mismatch
 
-**Status:** root cause confirmed 2026-05-27 — original hypothesis (budget truncation of hash inputs) was wrong; actual cause is LLM-polished frontmatter laundering. Fix direction concrete. Implementation TBD.
+**Status:** ✅ DONE — shipped in PR #48 (commit 1b1c7c5) / v0.14.2. Root cause was LLM-polished frontmatter laundering (not the original budget-truncation hypothesis). Fix: `apply_polish_results` re-injects deterministic frontmatter fields via `_replace_polished_frontmatter` (`generator.py:483`, `_DETERMINISTIC_FRONTMATTER_FIELDS`). Regression test: `tests/unit/test_polished_frontmatter_reinjection.py`. CHANGELOG documents it under [0.14.2]. Phase 3 release shipped; attune-gui can pin ≥0.14.2 to unblock its Phase 2.
 **Owner:** Patrick
 **Filed:** 2026-05-25 (handoff from attune-gui Phase 2 blockers; see [attune-gui docs/specs/living-docs-regen-automation/decisions.md](https://github.com/Smart-AI-Memory/attune-gui/blob/main/docs/specs/living-docs-regen-automation/decisions.md#phase-2-blockers-discovered-2026-05-23))
 
diff --git a/docs/specs/polish-cache-hit-metrics/tasks.md b/docs/specs/polish-cache-hit-metrics/tasks.md
deleted file mode 100644
index dfc6bf9..0000000
--- a/docs/specs/polish-cache-hit-metrics/tasks.md
+++ /dev/null
@@ -1,55 +0,0 @@
-# Tasks — Polish prompt-cache hit-rate telemetry
-
-## Phase 1 — Read the cache fields
-
-- [ ] **1.1** In `attune_author/polish.py`, capture
-      `response.usage.cache_creation_input_tokens` and
-      `response.usage.cache_read_input_tokens` from each
-      Anthropic API call
-- [ ] **1.2** Compute hit rate:
-      `read / max(read + creation, 1)`
-- [ ] **1.3** Add a `PolishCacheStats` dataclass for
-      structured passing
-
-## Phase 2 — Surface to user
-
-- [ ] **2.1** Print a one-line summary at end of polish run:
-      `Polish complete · cache hit: 87% (1241 read / 1421 total tokens)`
-- [ ] **2.2** Format gracefully when both are zero (no cache
-      configured)
-
-## Phase 3 — Log to telemetry
-
-- [ ] **3.1** Append per-call to existing telemetry JSONL
-      (wherever attune-author writes telemetry)
-- [ ] **3.2** Fields: timestamp, model, hit_rate,
-      read_tokens, creation_tokens, polish_target
-
-## Phase 4 — Threshold warning
-
-- [ ] **4.1** When invoked, read last N (e.g., 10) telemetry
-      records
-- [ ] **4.2** Compute rolling mean hit rate
-- [ ] **4.3** If <50%, print a warning at end of run with
-      pointer to docs
-
-## Phase 5 — Test
-
-- [ ] **5.1** Unit test: mock an Anthropic response with known
-      cache_creation / cache_read values; assert hit rate
-      computed correctly
-- [ ] **5.2** Integration test (optional): run polish twice
-      back-to-back; second run should report >0% cache hit
-
-## Phase 6 — Docs
-
-- [ ] **6.1** README section: "Cache hit rate" — what it means,
-      what good values look like, what to do if it drops
-- [ ] **6.2** Link from CHANGELOG when feature ships
-
-## Out of scope
-
-- Per-stage cache breakdown (system / examples / messages)
-- Cost-in-dollars tracking (token-level only)
-- Cache strategy changes
-- Cross-package telemetry aggregation
diff --git a/src/attune_author/doc_gen/_anthropic.py b/src/attune_author/doc_gen/_anthropic.py
index e47f801..d3cf208 100644
--- a/src/attune_author/doc_gen/_anthropic.py
+++ b/src/attune_author/doc_gen/_anthropic.py
@@ -16,6 +16,8 @@
 from typing import TYPE_CHECKING
 
 if TYPE_CHECKING:
+    from collections.abc import Callable
+
     from anthropic import Anthropic
 
 logger = logging.getLogger(__name__)
@@ -102,6 +104,7 @@ def call_anthropic(
     model: str,
     max_tokens: int,
     cache_system: bool = False,
+    on_cache_usage: Callable[[int, int, str], None] | None = None,
 ) -> str:
     """Make a single-turn ``messages.create`` call with retry/backoff.
 
@@ -125,6 +128,13 @@ def call_anthropic(
             for sonnet/opus, 2048 for haiku); below that, the call
             still works but no cache is used. Cache token usage is
             emitted at INFO so callers can verify hits.
+        on_cache_usage: Optional callback invoked once per successful
+            call with ``(cache_creation_input_tokens,
+            cache_read_input_tokens, model)``. Lets a caller (e.g. the
+            polish pass) accumulate cache hit-rate telemetry without
+            this module owning that concern. Fired even when both
+            counts are zero so callers can distinguish "no cache
+            configured" from "never called".
 
     Returns:
         The first text block of the response, or the empty
@@ -164,7 +174,9 @@ def call_anthropic(
                 system=system_payload,
                 messages=[{"role": "user", "content": user_message}],
             )
-            _log_cache_usage(response, model)
+            creation, read = _log_cache_usage(response, model)
+            if on_cache_usage is not None:
+                on_cache_usage(creation, read, model)
             if response.content:
                 return response.content[0].text
             return ""
@@ -182,16 +194,20 @@ def call_anthropic(
     raise AnthropicCallError(_redact(str(last_exc))) from None
 
 
-def _log_cache_usage(response: object, model: str) -> None:
+def _log_cache_usage(response: object, model: str) -> tuple[int, int]:
     """Emit cache hit telemetry from an Anthropic response.
 
     Reads ``cache_creation_input_tokens`` and ``cache_read_input_tokens``
     from the response's usage object when present and logs them at INFO.
     Older SDK responses without those fields are silently skipped.
+
+    Returns:
+        ``(creation, read)`` token counts, defaulting to ``(0, 0)`` when
+        the response has no usage block or the SDK omits the fields.
     """
     usage = getattr(response, "usage", None)
     if usage is None:
-        return
+        return (0, 0)
     creation = getattr(usage, "cache_creation_input_tokens", 0) or 0
     read = getattr(usage, "cache_read_input_tokens", 0) or 0
     if creation or read:
@@ -201,3 +217,4 @@ def _log_cache_usage(response: object, model: str) -> None:
             creation,
             read,
         )
+    return (creation, read)
diff --git a/src/attune_author/maintenance.py b/src/attune_author/maintenance.py
index 13788b5..7634fe9 100644
--- a/src/attune_author/maintenance.py
+++ b/src/attune_author/maintenance.py
@@ -102,8 +102,10 @@ def run_maintenance(
     # Reset Phase 3 faithfulness telemetry so the end-of-run summary
     # reflects this regen rather than carrying state across runs.
     from attune_author.generator import reset_faithfulness_telemetry
+    from attune_author.polish import reset_polish_cache_telemetry
 
     reset_faithfulness_telemetry()
+    reset_polish_cache_telemetry()
 
     for entry in report.help_entries:
         if not entry.is_stale:
@@ -147,6 +149,15 @@ def run_maintenance(
             telemetry["cost_usd"],
         )
 
+    # Prompt-cache hit-rate summary. Logged at INFO so it rides the
+    # same default `attune-author regenerate` output as the
+    # faithfulness line; silent when polish never ran this regen.
+    from attune_author.polish import format_polish_cache_summary
+
+    cache_summary = format_polish_cache_summary()
+    if cache_summary is not None:
+        logger.info("%s", cache_summary)
+
     return result
 
 
diff --git a/src/attune_author/polish.py b/src/attune_author/polish.py
index d4f1e0d..03b11df 100644
--- a/src/attune_author/polish.py
+++ b/src/attune_author/polish.py
@@ -39,6 +39,7 @@
 import os
 import re
 import time
+from dataclasses import dataclass
 from pathlib import Path
 
 from attune_author.doc_gen._anthropic import (
@@ -469,6 +470,118 @@ def build_polish_prompt(
 POLISH_MAX_TOKENS = 4096
 POLISH_CACHE_SYSTEM = True
 
+#: Rolling hit rate below which the end-of-run summary appends a
+#: warning. A healthy polish run that re-touches templates should
+#: sit well above this once the system prompt is cached; sustained
+#: lows usually mean the cache boundary broke (prompt edit, model
+#: alias drift) — see the README "Cache hit rate" section.
+_CACHE_HIT_WARN_THRESHOLD = 0.5
+
+
+@dataclass(frozen=True)
+class PolishCacheStats:
+    """Aggregate prompt-cache token usage across polish calls.
+
+    ``creation_tokens`` are input tokens written into Anthropic's
+    prompt cache; ``read_tokens`` are input tokens served from it.
+    The hit rate is ``read / (read + creation)`` — the fraction of
+    cacheable input that came from cache rather than being re-billed.
+    """
+
+    calls: int = 0
+    creation_tokens: int = 0
+    read_tokens: int = 0
+
+    @property
+    def total_tokens(self) -> int:
+        return self.read_tokens + self.creation_tokens
+
+    @property
+    def hit_rate(self) -> float:
+        """Cache read fraction in ``[0.0, 1.0]``; ``0.0`` when no
+        cacheable tokens were seen (avoids divide-by-zero)."""
+        return self.read_tokens / max(self.total_tokens, 1)
+
+    def summary_line(self) -> str:
+        """One-line human summary for end-of-run output.
+
+        Degrades gracefully when no cacheable tokens were seen (no
+        cache configured, or prompt below the caching threshold).
+        """
+        if self.total_tokens == 0:
+            return "Polish cache: no cacheable tokens observed (cache not configured?)"
+        return (
+            f"Polish cache hit: {self.hit_rate:.0%} "
+            f"({self.read_tokens} read / {self.total_tokens} total tokens, "
+            f"{self.calls} call(s))"
+        )
+
+
+def _polish_cache_telemetry() -> dict[str, int]:
+    """Per-process aggregate of prompt-cache token usage.
+
+    Stored on the function as an attribute so the end-of-run summary
+    can read totals without module-level state — same idiom as
+    ``generator._faithfulness_telemetry``. Reset via
+    :func:`reset_polish_cache_telemetry`.
+    """
+    state = getattr(_polish_cache_telemetry, "_state", None)
+    if state is None:
+        state = {"calls": 0, "creation": 0, "read": 0}
+        _polish_cache_telemetry._state = state  # type: ignore[attr-defined]
+    return state
+
+
+def reset_polish_cache_telemetry() -> None:
+    """Reset the per-process prompt-cache telemetry counters."""
+    _polish_cache_telemetry._state = {  # type: ignore[attr-defined]
+        "calls": 0,
+        "creation": 0,
+        "read": 0,
+    }
+
+
+def _record_cache_usage(creation: int, read: int, model: str) -> None:
+    """Accumulate one polish call's cache token counts.
+
+    Wired into :func:`call_anthropic` via its ``on_cache_usage``
+    hook. ``model`` is accepted to match the callback signature but
+    not aggregated — per-model breakdown is out of scope (decisions.md).
+    """
+    state = _polish_cache_telemetry()
+    state["calls"] += 1
+    state["creation"] += creation
+    state["read"] += read
+
+
+def polish_cache_stats() -> PolishCacheStats:
+    """Snapshot the current per-process prompt-cache aggregate."""
+    state = _polish_cache_telemetry()
+    return PolishCacheStats(
+        calls=state["calls"],
+        creation_tokens=state["creation"],
+        read_tokens=state["read"],
+    )
+
+
+def format_polish_cache_summary() -> str | None:
+    """End-of-run summary line, or ``None`` if polish never ran.
+
+    Appends a low-hit-rate warning when the run's hit rate falls below
+    :data:`_CACHE_HIT_WARN_THRESHOLD` and at least one cacheable token
+    was seen. Scope is the current process: callers reset at run start.
+    """
+    stats = polish_cache_stats()
+    if stats.calls == 0:
+        return None
+    line = stats.summary_line()
+    if stats.total_tokens > 0 and stats.hit_rate < _CACHE_HIT_WARN_THRESHOLD:
+        line += (
+            f" — WARNING: below {_CACHE_HIT_WARN_THRESHOLD:.0%}; "
+            "the prompt cache may have regressed (see README 'Cache hit rate')"
+        )
+    return line
+
 
 def _call_llm(
     content: str,
@@ -516,6 +629,7 @@ def _call_llm(
         model=_POLISH_MODEL,
         max_tokens=POLISH_MAX_TOKENS,
         cache_system=POLISH_CACHE_SYSTEM,
+        on_cache_usage=_record_cache_usage,
     )
     return polished or content
 
@@ -525,12 +639,16 @@ def _call_llm(
 # or from the wrapping polish layer.
 __all__ = [
     "AnthropicCallError",
+    "PolishCacheStats",
     "PolishError",
     "STRICT_ENV_VAR",
     "_env_strict_default",
     "build_source_summary",
     "clear_cache",
+    "format_polish_cache_summary",
+    "polish_cache_stats",
     "polish_template",
+    "reset_polish_cache_telemetry",
 ]
 
 
diff --git a/tests/test_maintenance_batch.py b/tests/test_maintenance_batch.py
index f846f70..6cfb769 100644
--- a/tests/test_maintenance_batch.py
+++ b/tests/test_maintenance_batch.py
@@ -19,7 +19,7 @@
 from __future__ import annotations
 
 import json
-from datetime import datetime, timezone
+from datetime import datetime, timedelta, timezone
 from pathlib import Path
 from types import SimpleNamespace
 from unittest.mock import MagicMock, patch
@@ -53,11 +53,16 @@
 
 
 def _state(submitted_at: datetime | None = None) -> BatchState:
+    # Default to "recently submitted" relative to now so the fixture stays
+    # inside the 29-day retention window for status/cancel paths that read
+    # without an injected ``now=``. (A hardcoded date silently expires and
+    # breaks these tests once it ages past the window.)
+    submitted_at = submitted_at or (datetime.now(timezone.utc) - timedelta(days=1))
     return BatchState(
         schema_version=1,
         batch_id="msgbatch_test",
-        submitted_at=submitted_at or datetime(2026, 5, 8, 18, 35, tzinfo=timezone.utc),
-        expected_completion_at=datetime(2026, 5, 8, 18, 41, tzinfo=timezone.utc),
+        submitted_at=submitted_at,
+        expected_completion_at=submitted_at + timedelta(minutes=6),
         model="claude-sonnet-4-6",
         requests=(
             BatchStateRequest("feat__auth__concept", "auth", "concept"),
diff --git a/tests/unit/test_polish_cache_metrics.py b/tests/unit/test_polish_cache_metrics.py
new file mode 100644
index 0000000..a0ab06d
--- /dev/null
+++ b/tests/unit/test_polish_cache_metrics.py
@@ -0,0 +1,182 @@
+"""Tests for polish prompt-cache hit-rate telemetry.
+
+Covers the spec ``polish-cache-hit-metrics``: capturing
+``cache_creation_input_tokens`` / ``cache_read_input_tokens`` from
+Anthropic responses, the ``PolishCacheStats`` hit-rate math, the
+per-process accumulator, the end-of-run summary line, and the
+low-hit-rate threshold warning.
+"""
+
+from __future__ import annotations
+
+from unittest.mock import MagicMock
+
+import pytest
+
+from attune_author import polish
+from attune_author.doc_gen import _anthropic
+from attune_author.polish import (
+    PolishCacheStats,
+    format_polish_cache_summary,
+    polish_cache_stats,
+    reset_polish_cache_telemetry,
+)
+
+
+@pytest.fixture(autouse=True)
+def _clean_telemetry():
+    """Each test starts and ends with zeroed counters so the
+    per-process accumulator can't leak across tests."""
+    reset_polish_cache_telemetry()
+    yield
+    reset_polish_cache_telemetry()
+
+
+# ---------------------------------------------------------------------------
+# PolishCacheStats hit-rate math
+# ---------------------------------------------------------------------------
+
+
+class TestPolishCacheStats:
+    def test_hit_rate_basic(self) -> None:
+        stats = PolishCacheStats(calls=1, creation_tokens=180, read_tokens=1241)
+        assert stats.total_tokens == 1421
+        assert stats.hit_rate == pytest.approx(1241 / 1421)
+
+    def test_hit_rate_full_hit(self) -> None:
+        stats = PolishCacheStats(calls=2, creation_tokens=0, read_tokens=2048)
+        assert stats.hit_rate == pytest.approx(1.0)
+
+    def test_hit_rate_zero_tokens_is_safe(self) -> None:
+        """No cacheable tokens must not divide by zero."""
+        stats = PolishCacheStats(calls=1, creation_tokens=0, read_tokens=0)
+        assert stats.total_tokens == 0
+        assert stats.hit_rate == 0.0
+
+    def test_summary_line_with_tokens(self) -> None:
+        stats = PolishCacheStats(calls=3, creation_tokens=180, read_tokens=1241)
+        line = stats.summary_line()
+        assert "87%" in line  # 1241/1421 == 0.873...
+        assert "1241 read" in line
+        assert "1421 total" in line
+        assert "3 call(s)" in line
+
+    def test_summary_line_graceful_when_zero(self) -> None:
+        stats = PolishCacheStats(calls=1, creation_tokens=0, read_tokens=0)
+        assert "no cacheable tokens" in stats.summary_line().lower()
+
+
+# ---------------------------------------------------------------------------
+# call_anthropic on_cache_usage callback (the capture path)
+# ---------------------------------------------------------------------------
+
+
+def _mock_client(creation: int, read: int) -> MagicMock:
+    client = MagicMock()
+    block = MagicMock()
+    block.text = "polished"
+    response = MagicMock()
+    response.content = [block]
+    response.usage.cache_creation_input_tokens = creation
+    response.usage.cache_read_input_tokens = read
+    client.messages.create.return_value = response
+    return client
+
+
+class TestOnCacheUsageCallback:
+    def test_callback_fires_with_token_counts(self) -> None:
+        seen: list[tuple[int, int, str]] = []
+        client = _mock_client(creation=1024, read=512)
+
+        _anthropic.call_anthropic(
+            client,
+            system="s",
+            user_message="u",
+            model="claude-sonnet-4-6",
+            max_tokens=10,
+            on_cache_usage=lambda c, r, m: seen.append((c, r, m)),
+        )
+
+        assert seen == [(1024, 512, "claude-sonnet-4-6")]
+
+    def test_callback_fires_even_when_zero(self) -> None:
+        """Caller must be able to tell 'no cache' from 'never called'."""
+        seen: list[tuple[int, int, str]] = []
+        client = _mock_client(creation=0, read=0)
+
+        _anthropic.call_anthropic(
+            client,
+            system="s",
+            user_message="u",
+            model="m",
+            max_tokens=10,
+            on_cache_usage=lambda c, r, m: seen.append((c, r, m)),
+        )
+
+        assert seen == [(0, 0, "m")]
+
+    def test_no_callback_is_fine(self) -> None:
+        """Omitting the callback (doc-gen path) must not error."""
+        client = _mock_client(creation=10, read=10)
+        out = _anthropic.call_anthropic(
+            client, system="s", user_message="u", model="m", max_tokens=10
+        )
+        assert out == "polished"
+
+
+# ---------------------------------------------------------------------------
+# Accumulator + reset
+# ---------------------------------------------------------------------------
+
+
+class TestAccumulator:
+    def test_records_accumulate_across_calls(self) -> None:
+        polish._record_cache_usage(1000, 0, "m")  # creation-only (cold)
+        polish._record_cache_usage(100, 900, "m")  # mostly read (warm)
+
+        stats = polish_cache_stats()
+        assert stats.calls == 2
+        assert stats.creation_tokens == 1100
+        assert stats.read_tokens == 900
+        assert stats.total_tokens == 2000
+        assert stats.hit_rate == pytest.approx(0.45)
+
+    def test_reset_zeroes_counters(self) -> None:
+        polish._record_cache_usage(100, 100, "m")
+        reset_polish_cache_telemetry()
+        stats = polish_cache_stats()
+        assert (stats.calls, stats.creation_tokens, stats.read_tokens) == (0, 0, 0)
+
+
+# ---------------------------------------------------------------------------
+# End-of-run summary + threshold warning
+# ---------------------------------------------------------------------------
+
+
+class TestSummary:
+    def test_summary_none_when_polish_never_ran(self) -> None:
+        assert format_polish_cache_summary() is None
+
+    def test_summary_present_after_calls(self) -> None:
+        polish._record_cache_usage(180, 1241, "m")
+        summary = format_polish_cache_summary()
+        assert summary is not None
+        assert "Polish cache hit" in summary
+        assert "WARNING" not in summary  # 87% is healthy
+
+    def test_low_hit_rate_appends_warning(self) -> None:
+        # 100 read / 1000 total == 10% — below the 50% threshold.
+        polish._record_cache_usage(900, 100, "m")
+        summary = format_polish_cache_summary()
+        assert summary is not None
+        assert "WARNING" in summary
+        assert "below 50%" in summary
+
+    def test_zero_token_run_warns_nothing(self) -> None:
+        """A run with calls but no cacheable tokens reports the
+        graceful line and no spurious threshold warning."""
+        polish._record_cache_usage(0, 0, "m")
+        summary = format_polish_cache_summary()
+        assert summary is not None
+        assert "no cacheable tokens" in summary.lower()
+        assert "WARNING" not in summary