Smart-AI-Memory · silversurfer562 · Jun 7, 2026 · Jun 6, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -10,6 +10,26 @@ and this project adheres to
 
 ## [Unreleased]
 
+### Added
+
+- **Polish prompt-cache hit-rate telemetry.** Each polish run now
+  tracks Anthropic prompt-cache token usage and logs a one-line
+  summary at the end of `attune-author regenerate`:
+  `Polish cache hit: 87% (1241 read / 1421 total tokens, 6 call(s))`.
+  A `WARNING` is appended when the run's hit rate falls below 50%,
+  surfacing silent cache regressions (prompt edits, model alias
+  drift). Hit rate is `read / (read + creation)`, computed over
+  cacheable input tokens only.
+  - `attune_author.doc_gen._anthropic.call_anthropic` gains an optional
+    `on_cache_usage(creation, read, model)` callback; backward
+    compatible (the doc-gen path passes nothing).
+  - New in `attune_author.polish`: `PolishCacheStats`,
+    `polish_cache_stats()`, `format_polish_cache_summary()`,
+    `reset_polish_cache_telemetry()`. Telemetry follows the existing
+    in-process faithfulness-counter pattern (no new on-disk format).
+  - README: new "Cache hit rate" subsection under Polish cache.
+  - 16 new tests in `tests/unit/test_polish_cache_metrics.py`.
+
 ## [0.14.2] - 2026-05-27
 
 ### Fixed
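
As a worked check of the sample summary line in the entry above, assuming "total" means `read + creation` as the entry defines it, the implied creation count is 1421 - 1241 = 180 tokens. The variable names below are illustrative and are not taken from the package.

```python
# Worked arithmetic for the sample summary line.
read_tokens = 1241
total_tokens = 1421
creation_tokens = total_tokens - read_tokens  # cacheable tokens written, not read

hit_rate = read_tokens / (read_tokens + creation_tokens)
warn = hit_rate < 0.50  # the 50% warning threshold named in the entry

print(f"{hit_rate:.0%}", creation_tokens, warn)
```

Under that reading, the sample run sits comfortably above the warning threshold.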

diff --git a/README.md b/README.md
@@ -257,6 +257,33 @@ volatile frontmatter fields like `generated_at` stripped),
 context, and model name. Changing the model automatically invalidates
 all prior entries.
 
+### Cache hit rate
+
+Separately from the on-disk response cache above, each polish call
+uses Anthropic's **prompt cache** for the ~6000-token system prompt.
+After a regen run, `attune-author` logs a one-line summary at INFO:
+
+```
+Polish cache hit: 87% (1241 read / 1421 total tokens, 6 call(s))
+```
+
+The hit rate is `read / (read + creation)`: the fraction of cacheable
+input tokens read from cache rather than written to it. Cache reads
+cost ~90% less than uncached input, so a healthy multi-template run
+should settle well above 50% once the first call warms the cache.
+
+- **High (>80%)** — expected steady state; the system prompt is being
+  reused across calls.
+- **Low (<50%)** — triggers a `WARNING` in the summary. Usually means
+  the cache boundary broke: the system prompt changed between calls,
+  the model alias drifted, or only a single template was polished (no
+  reuse). Check recent edits to `polish_prompts.py` or `_POLISH_MODEL`.
+- **"no cacheable tokens observed"** — the prompt fell below Anthropic's
+  caching threshold or caching is disabled (`POLISH_CACHE_SYSTEM`).
+
+The metric is per-run (in-process); it is not persisted across
+invocations.
+
 ## Python API
 
 ```python

diff --git a/...ecs/polish-cache-hit-metrics/decisions.md → ...ive/polish-cache-hit-metrics/decisions.md b/...ecs/polish-cache-hit-metrics/decisions.md → ...ive/polish-cache-hit-metrics/decisions.md
@@ -1,6 +1,12 @@
 # Decisions — Polish prompt-cache hit-rate telemetry
 
-**Status:** Draft (2026-05-11) — gated on briefing-followup batch
+**Status:** ✅ DONE (2026-06-06) — shipped to [Unreleased]. The Draft
+"gated on briefing-followup batch" note was superseded by this file's
+own "Execution gate" ("Not blocking"). One deviation: attune-author has
+no telemetry JSONL, so the metric uses the existing in-process
+faithfulness-counter pattern (INFO summary at end of run) rather than a
+new JSONL file; the threshold warning is current-run, not cross-run.
+See `tasks.md` for the per-phase record.
 **Owner:** Patrick
 
 ---

diff --git a/docs/specs/archive/polish-cache-hit-metrics/tasks.md b/docs/specs/archive/polish-cache-hit-metrics/tasks.md
@@ -0,0 +1,72 @@
+# Tasks — Polish prompt-cache hit-rate telemetry
+
+**Status:** ✅ DONE (2026-06-06) — shipped to [Unreleased]. See the
+"Deviation" note under Phases 3–4: attune-author has no JSONL
+telemetry, so the metric follows the existing in-process
+faithfulness-counter pattern (reset at run start, INFO summary at run
+end) instead of a new JSONL subsystem. Acceptance criteria in
+`decisions.md` are all met.
+
+## Phase 1 — Read the cache fields
+
+- [x] **1.1** Captured via a new `on_cache_usage(creation, read, model)`
+      callback on `doc_gen._anthropic.call_anthropic` (polish can't see
+      `response.usage` directly — `call_anthropic` returns only text).
+      `_log_cache_usage` now returns `(creation, read)`.
+- [x] **1.2** Compute hit rate: `read / max(read + creation, 1)`
+      (`PolishCacheStats.hit_rate`)
+- [x] **1.3** `PolishCacheStats` dataclass added in `polish.py`
+
+## Phase 2 — Surface to user
+
+- [x] **2.1** End-of-run summary logged at INFO via
+      `format_polish_cache_summary()`:
+      `Polish cache hit: 87% (1241 read / 1421 total tokens, 6 call(s))`
+- [x] **2.2** Graceful when both are zero:
+      `Polish cache: no cacheable tokens observed (cache not configured?)`
+
+## Phase 3 — Log to telemetry  *(deviation, see note)*
+
+- [x] **3.1** ~~Append per-call to existing telemetry JSONL~~ →
+      **There is no telemetry JSONL in attune-author.** Adopted the
+      existing in-process counter idiom (`_polish_cache_telemetry()` +
+      `reset_polish_cache_telemetry()`, mirroring
+      `generator._faithfulness_telemetry`), surfaced via the INFO
+      end-of-run summary in `maintenance.py`. Building a JSONL
+      subsystem would contradict the spec's "low effort, single file"
+      scope and the codebase's telemetry pattern.
+- [x] **3.2** Aggregate fields: calls, creation_tokens, read_tokens,
+      derived hit_rate, model (model accepted by the callback; per-model
+      breakdown explicitly out of scope per decisions.md).
+
+## Phase 4 — Threshold warning  *(deviation: current-run, not cross-run)*
+
+- [x] **4.1–4.3** `format_polish_cache_summary()` appends a `WARNING`
+      when the **current run's** hit rate < 50% (`_CACHE_HIT_WARN_THRESHOLD`)
+      and ≥1 cacheable token was seen, with a pointer to the README.
+      Cross-run rolling history (last N records) is deferred — it would
+      require the persistent JSONL layer this spec deliberately avoided.
+
+## Phase 5 — Test
+
+- [x] **5.1** `tests/unit/test_polish_cache_metrics.py`: mocks Anthropic
+      responses with known cache_creation/cache_read values; asserts the
+      callback fires (incl. the zero case), hit-rate math, accumulator,
+      summary line, and threshold warning (16 tests).
+- [ ] **5.2** Integration test (optional) — **skipped**: would require a
+      live API key (real prompt-cache hits can't be observed against a
+      mock). The unit tests cover the compute path; left optional as the
+      spec allowed.
+
+## Phase 6 — Docs
+
+- [x] **6.1** README "Cache hit rate" subsection — meaning, healthy
+      ranges, what to do when it drops.
+- [x] **6.2** CHANGELOG [Unreleased] entry added.
+
+## Out of scope
+
+- Per-stage cache breakdown (system / examples / messages)
+- Cost-in-dollars tracking (token-level only)
+- Cache strategy changes
+- Cross-package telemetry aggregation
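
The accumulate-then-summarize flow in Phases 1 to 4 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `polish.py`: `CacheStats`, `record`, `format_summary`, and the warning wording are hypothetical stand-ins, and only the two summary strings and the `max(..., 1)` guard come from the task list.

```python
from dataclasses import dataclass

WARN_THRESHOLD = 0.50  # assumed value, mirroring the 50% cut-off in Phase 4


@dataclass
class CacheStats:
    """Hypothetical stand-in for PolishCacheStats; fields follow task 3.2."""

    calls: int = 0
    creation_tokens: int = 0
    read_tokens: int = 0

    def record(self, creation: int, read: int, model: str) -> None:
        # Same argument order as the on_cache_usage callback.
        # The model is accepted but not broken down per model.
        self.calls += 1
        self.creation_tokens += creation
        self.read_tokens += read

    @property
    def total_tokens(self) -> int:
        return self.read_tokens + self.creation_tokens

    @property
    def hit_rate(self) -> float:
        # max(..., 1) avoids dividing by zero when nothing was cacheable.
        return self.read_tokens / max(self.total_tokens, 1)


def format_summary(stats: CacheStats) -> str:
    if stats.total_tokens == 0:
        return "Polish cache: no cacheable tokens observed (cache not configured?)"
    line = (
        f"Polish cache hit: {stats.hit_rate:.0%} "
        f"({stats.read_tokens} read / {stats.total_tokens} total tokens, "
        f"{stats.calls} call(s))"
    )
    if stats.hit_rate < WARN_THRESHOLD:
        line += " WARNING: hit rate below 50%"  # illustrative wording
    return line


stats = CacheStats()
stats.record(180, 0, "example-model")      # first call writes the cache
for _ in range(5):
    stats.record(0, 180, "example-model")  # later calls read it back
print(format_summary(stats))
```

Because the first call can only create cache entries, a single-call run reports 0% and trips the warning, which matches the "only a single template was polished" case in the README.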
diff --git a/docs/specs/polish-fact-check/decisions.md → ...cs/archive/polish-fact-check/decisions.md b/docs/specs/polish-fact-check/decisions.md → ...cs/archive/polish-fact-check/decisions.md
diff --git a/docs/specs/polish-fact-check/design.md → ...specs/archive/polish-fact-check/design.md b/docs/specs/polish-fact-check/design.md → ...specs/archive/polish-fact-check/design.md
diff --git a/docs/specs/polish-fact-check/requirements.md → ...archive/polish-fact-check/requirements.md b/docs/specs/polish-fact-check/requirements.md → ...archive/polish-fact-check/requirements.md
diff --git a/docs/specs/polish-fact-check/tasks.md → .../specs/archive/polish-fact-check/tasks.md b/docs/specs/polish-fact-check/tasks.md → .../specs/archive/polish-fact-check/tasks.md
diff --git a/docs/specs/regen-pipeline/design.md → docs/specs/archive/regen-pipeline/design.md b/docs/specs/regen-pipeline/design.md → docs/specs/archive/regen-pipeline/design.md
@@ -1,8 +1,27 @@
 # Spec: Regen Pipeline — Design
 
+> ## ⚠️ OBSOLETE — do not implement (reconciled 2026-06-06)
+>
+> This design was never built and conflicts with the shipped architecture. It
+> assumes a single `corpus_root`, a React/JSX frontend (`App.jsx`,
+> `CorpusSetup`), a polish+Haiku `_regen` pipeline, and WS-badge wiring — none
+> of which exist. The shipped reality instead uses:
+>
+> - **Regen:** `sidecar/attune_gui/routes/living_docs.py` →
+>   `POST /api/living-docs/docs/{id}/regenerate` → Jobs registry
+>   (`attune_gui.jobs`) → `_regenerate_doc_executor` →
+>   `attune_author.generator.generate_feature_templates` + `load_manifest`.
+> - **Corpus config:** multi-corpus registry (`attune_gui.editor_corpora`,
+>   `POST /api/corpus/register`) + workspace config (`attune_gui.workspace`,
+>   `living_docs.py` `get_config`/`set_config`).
+> - **Frontend:** TypeScript (`editor-frontend/src/corpus-switcher.ts`), not React.
+> - **Bulk:** `make regen-all` (Makefile), not `POST /api/templates/refresh-all`.
+>
+> Kept verbatim below for historical context only. See `requirements.md` banner.
+
 ## Phase 2: Design
 
-**Status**: in-review
+**Status**: obsolete — superseded by living-docs regen automation (was "in-review", never built)
 
 ---
 

diff --git a/docs/specs/regen-pipeline/requirements.md → ...cs/archive/regen-pipeline/requirements.md b/docs/specs/regen-pipeline/requirements.md → ...cs/archive/regen-pipeline/requirements.md
@@ -5,9 +5,31 @@
 
 ---
 
+> ## ⚠️ RECONCILED — satisfied-by-different-means (2026-06-06)
+>
+> This spec was previously marked "complete" with all tasks ✅, but a code
+> audit found **none** of its named symbols ever shipped (`_regen`,
+> `regen_template(corpus_root=…)`, `_resolve_corpus_root`, `atomic_write`,
+> `_patch_summaries_json`) and the attune-gui pieces (`/api/config`,
+> `refresh-all`, `CorpusSetup`) do not exist. The underlying need was instead
+> met by a **more evolved architecture**. All three user stories are satisfied:
+>
+> | User story | Status | Actual implementation |
+> |---|---|---|
+> | US1 — badge click → regen → saved to disk | ✅ met | `POST /api/living-docs/docs/{id}/regenerate` → Jobs registry → `_regenerate_doc_executor` → `attune_author.generator.generate_feature_templates` (`sidecar/attune_gui/routes/living_docs.py`). Source-driven generation, not polish+Haiku. |
+> | US2 — first-run corpus setup UI | ✅ exceeded | Multi-corpus registry: `editor_corpora.py`, `POST /api/corpus/register`, `editor-frontend/src/corpus-switcher.ts` (dropdown + "Add corpus…" modal). |
+> | US3 — env auto-load on startup | ✅ met | Workspace config (`living_docs.py` `get_config`/`set_config`, `attune_gui.workspace`) + persisted corpus registry, replacing single `ATTUNE_CORPUS_ROOT`. |
+>
+> Bulk regen ships as the build-time `make regen-all` target (Makefile), not a
+> runtime "Regen all stale" button. The frontend is **TypeScript**, not the
+> React/JSX assumed by `design.md`.
+>
+> **No genuine product gaps remain.** This spec is retained for history; the
+> `design.md` below is **obsolete** (see its banner). Do not implement it.
+
 ## Phase 1: Requirements
 
-**Status**: approved
+**Status**: reconciled — satisfied by living-docs regen automation + corpus registry (was falsely marked "approved/complete")
 
 ### Problem statement
 

diff --git a/docs/specs/regen-pipeline/tasks.md → docs/specs/archive/regen-pipeline/tasks.md b/docs/specs/regen-pipeline/tasks.md → docs/specs/archive/regen-pipeline/tasks.md
@@ -2,9 +2,29 @@
 
 ## Phase 3: Tasks
 
-**Status**: complete
+**Status**: NOT done as written — reconciled 2026-06-06
 
-> Shipped: `attune-author regenerate` CLI lives in `src/attune_author/cli.py:507` (handler) with the parser registered around line 154. Core logic in `maintenance.py` and `maintenance_batch.py`. CHANGELOG documents the batch variant.
+> ## ⚠️ The task table below is INACCURATE
+>
+> A 2026-06-06 code audit found that **none** of the attune-author symbols in
+> tasks 2–9 ever shipped (`_resolve_corpus_root`, `atomic_write`,
+> `_patch_summaries_json`, `regen_template(corpus_root=…)`, `_regen`) and
+> **none** of the attune-gui pieces in tasks 10–24 exist (`config.py`
+> `ConfigState`, `/api/config`, `/api/templates/refresh-all`,
+> `/api/browse/directory`, `CorpusSetup`, `App.jsx`). The "done" marks below are
+> false. The earlier "Shipped" note conflated this spec with the unrelated
+> hash-mismatch `attune-author regenerate` CLI — a different feature.
+>
+> **What actually satisfies the spec's user stories** (see `requirements.md`
+> banner for the full mapping):
+> - Single-doc regen → `POST /api/living-docs/docs/{id}/regenerate` (Jobs +
+>   `attune_author.generator.generate_feature_templates`).
+> - Corpus config → corpus registry (`editor_corpora.py`,
+>   `/api/corpus/register`) + workspace config (`attune_gui.workspace`).
+> - Bulk → `make regen-all` (Makefile).
+>
+> No code action is required: the product need is met. The table below is left
+> intact only as a record of the original (unbuilt) plan.
 
 ### Implementation order
 

diff --git a/...egen-staleness-hash-mismatch/decisions.md → ...egen-staleness-hash-mismatch/decisions.md b/...egen-staleness-hash-mismatch/decisions.md → ...egen-staleness-hash-mismatch/decisions.md
@@ -1,6 +1,6 @@
 # Decisions — Regen / staleness hash mismatch
 
-**Status:** root cause confirmed 2026-05-27 — original hypothesis (budget truncation of hash inputs) was wrong; actual cause is LLM-polished frontmatter laundering. Fix direction concrete. Implementation TBD.
+**Status:** ✅ DONE — shipped in PR #48 (commit 1b1c7c5) / v0.14.2. Root cause was LLM-polished frontmatter laundering (not the original budget-truncation hypothesis). Fix: `apply_polish_results` re-injects deterministic frontmatter fields via `_replace_polished_frontmatter` (`generator.py:483`, `_DETERMINISTIC_FRONTMATTER_FIELDS`). Regression test: `tests/unit/test_polished_frontmatter_reinjection.py`. CHANGELOG documents it under [0.14.2]. Phase 3 release shipped; attune-gui can pin ≥0.14.2 to unblock its Phase 2.
 **Owner:** Patrick
 **Filed:** 2026-05-25 (handoff from attune-gui Phase 2 blockers; see [attune-gui docs/specs/living-docs-regen-automation/decisions.md](https://github.com/Smart-AI-Memory/attune-gui/blob/main/docs/specs/living-docs-regen-automation/decisions.md#phase-2-blockers-discovered-2026-05-23))
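
The fix direction named in the status line, re-injecting deterministic frontmatter after the polish pass, can be illustrated with a small sketch. It is not the code in `generator.py`: the function names, the `source_hash` and `feature` field names, and the flat `key: value` frontmatter parsing are all assumptions made for the example.

```python
DETERMINISTIC_FIELDS = ("source_hash", "feature")  # hypothetical field names


def split_frontmatter(text: str) -> tuple[list[str], str]:
    """Split a '---' delimited frontmatter block from the body."""
    lines = text.split("\n")
    if not lines or lines[0] != "---":
        return [], text
    try:
        end = lines.index("---", 1)
    except ValueError:
        return [], text
    return lines[1:end], "\n".join(lines[end + 1:])


def reinject_frontmatter(original: str, polished: str) -> str:
    """Overwrite deterministic fields in the polished text with the originals."""
    orig_fm, _ = split_frontmatter(original)
    pol_fm, pol_body = split_frontmatter(polished)

    # Collect the trusted lines from the pre-polish document.
    trusted: dict[str, str] = {}
    for line in orig_fm:
        key, sep, _ = line.partition(":")
        if sep and key.strip() in DETERMINISTIC_FIELDS:
            trusted[key.strip()] = line

    # Keep the polished ordering, swapping in trusted lines where keys match.
    merged = []
    for line in pol_fm:
        key = line.partition(":")[0].strip()
        merged.append(trusted.pop(key, line))
    merged.extend(trusted.values())  # fields the model dropped entirely
    return "\n".join(["---", *merged, "---", pol_body])


original = "---\nsource_hash: abc123\ntitle: Cache\n---\nDraft body.\n"
polished = "---\nsource_hash: zzz999\ntitle: Cache, polished\n---\nPolished body.\n"
result = reinject_frontmatter(original, polished)
print(result)
```

The polished title and body survive, while the hash-bearing field is restored from the pre-polish document, so a later staleness check would compare against the value the generator wrote.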
 

diff --git a/docs/specs/polish-cache-hit-metrics/tasks.md b/docs/specs/polish-cache-hit-metrics/tasks.md
diff --git a/src/attune_author/doc_gen/_anthropic.py b/src/attune_author/doc_gen/_anthropic.py
@@ -16,6 +16,8 @@
 from typing import TYPE_CHECKING
 
 if TYPE_CHECKING:
+    from collections.abc import Callable
+
     from anthropic import Anthropic
 
 logger = logging.getLogger(__name__)
@@ -102,6 +104,7 @@ def call_anthropic(
     model: str,
     max_tokens: int,
     cache_system: bool = False,
+    on_cache_usage: Callable[[int, int, str], None] | None = None,
 ) -> str:
     """Make a single-turn ``messages.create`` call with retry/backoff.
 
@@ -125,6 +128,13 @@ def call_anthropic(
             for sonnet/opus, 2048 for haiku); below that, the call
             still works but no cache is used. Cache token usage is
             emitted at INFO so callers can verify hits.
+        on_cache_usage: Optional callback invoked once per successful
+            call with ``(cache_creation_input_tokens,
+            cache_read_input_tokens, model)``. Lets a caller (e.g. the
+            polish pass) accumulate cache hit-rate telemetry without
+            this module owning that concern. Fired even when both
+            counts are zero so callers can distinguish "no cache
+            configured" from "never called".
 
     Returns:
         The first text block of the response, or the empty
@@ -164,7 +174,9 @@ def call_anthropic(
                 system=system_payload,
                 messages=[{"role": "user", "content": user_message}],
             )
-            _log_cache_usage(response, model)
+            creation, read = _log_cache_usage(response, model)
+            if on_cache_usage is not None:
+                on_cache_usage(creation, read, model)
             if response.content:
                 return response.content[0].text
             return ""
@@ -182,16 +194,20 @@ def call_anthropic(
     raise AnthropicCallError(_redact(str(last_exc))) from None
 
 
-def _log_cache_usage(response: object, model: str) -> None:
+def _log_cache_usage(response: object, model: str) -> tuple[int, int]:
     """Emit cache hit telemetry from an Anthropic response.
 
     Reads ``cache_creation_input_tokens`` and ``cache_read_input_tokens``
     from the response's usage object when present and logs them at INFO.
     Older SDK responses without those fields are silently skipped.
+
+    Returns:
+        ``(creation, read)`` token counts, defaulting to ``(0, 0)`` when
+        the response has no usage block or the SDK omits the fields.
     """
     usage = getattr(response, "usage", None)
     if usage is None:
-        return
+        return (0, 0)
     creation = getattr(usage, "cache_creation_input_tokens", 0) or 0
     read = getattr(usage, "cache_read_input_tokens", 0) or 0
     if creation or read:
@@ -201,3 +217,4 @@ def _log_cache_usage(response: object, model: str) -> None:
             creation,
             read,
         )
+    return (creation, read)
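
A caller-side sketch of the callback contract may help when reading this diff. It assumes nothing beyond what the hunks show: `fake_call` is a hypothetical stand-in for `call_anthropic` that skips the network call, and the response objects are plain `SimpleNamespace` fakes rather than SDK types.

```python
from __future__ import annotations

from collections.abc import Callable
from types import SimpleNamespace


def read_cache_usage(response: object) -> tuple[int, int]:
    # Same defensive reads as the diff: missing or None fields count as 0.
    usage = getattr(response, "usage", None)
    if usage is None:
        return (0, 0)
    creation = getattr(usage, "cache_creation_input_tokens", 0) or 0
    read = getattr(usage, "cache_read_input_tokens", 0) or 0
    return (creation, read)


def fake_call(
    response: object,
    model: str,
    on_cache_usage: Callable[[int, int, str], None] | None = None,
) -> None:
    """Hypothetical stand-in for call_anthropic, without the API request."""
    creation, read = read_cache_usage(response)
    if on_cache_usage is not None:
        on_cache_usage(creation, read, model)  # fires even for (0, 0)


seen: list[tuple[int, int, str]] = []


def collect(creation: int, read: int, model: str) -> None:
    seen.append((creation, read, model))


warm = SimpleNamespace(
    usage=SimpleNamespace(cache_creation_input_tokens=0, cache_read_input_tokens=1200)
)
old_sdk = SimpleNamespace(usage=SimpleNamespace())  # usage present, fields absent
no_usage = SimpleNamespace()                        # no usage block at all

for response in (warm, old_sdk, no_usage):
    fake_call(response, "example-model", collect)
fake_call(warm, "example-model")  # omitting the callback is a no-op

print(seen)
```

Firing on `(0, 0)` is what lets an accumulator tell "called, but nothing cacheable" apart from "never called", as the new docstring describes.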