Smart-AI-Memory
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 20 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 27 additions & 0 deletions
diff --git a/docs/specs/polish-cache-hit-metrics/decisions.md b/docs/specs/archive/polish-cache-hit-metrics/decisions.md
docs/specs/polish-cache-hit-metrics/decisions.md renamed to docs/specs/archive/polish-cache-hit-metrics/decisions.md
Lines changed: 7 additions & 1 deletion
diff --git a/docs/specs/archive/polish-cache-hit-metrics/tasks.md b/docs/specs/archive/polish-cache-hit-metrics/tasks.md
Lines changed: 72 additions & 0 deletions
diff --git a/docs/specs/polish-fact-check/decisions.md b/docs/specs/archive/polish-fact-check/decisions.md
docs/specs/polish-fact-check/decisions.md renamed to docs/specs/archive/polish-fact-check/decisions.md
diff --git a/docs/specs/polish-fact-check/design.md b/docs/specs/archive/polish-fact-check/design.md
docs/specs/polish-fact-check/design.md renamed to docs/specs/archive/polish-fact-check/design.md
diff --git a/docs/specs/polish-fact-check/requirements.md b/docs/specs/archive/polish-fact-check/requirements.md
docs/specs/polish-fact-check/requirements.md renamed to docs/specs/archive/polish-fact-check/requirements.md
diff --git a/docs/specs/polish-fact-check/tasks.md b/docs/specs/archive/polish-fact-check/tasks.md
docs/specs/polish-fact-check/tasks.md renamed to docs/specs/archive/polish-fact-check/tasks.md
diff --git a/docs/specs/regen-pipeline/design.md b/docs/specs/archive/regen-pipeline/design.md
docs/specs/regen-pipeline/design.md renamed to docs/specs/archive/regen-pipeline/design.md
Lines changed: 20 additions & 1 deletion
diff --git a/docs/specs/regen-pipeline/requirements.md b/docs/specs/archive/regen-pipeline/requirements.md
docs/specs/regen-pipeline/requirements.md renamed to docs/specs/archive/regen-pipeline/requirements.md
Lines changed: 23 additions & 1 deletion
@@ -10,6 +10,26 @@ and this project adheres to
 
 ## [Unreleased]
 
+### Added
+
+- **Polish prompt-cache hit-rate telemetry.** Each polish run now
+  tracks Anthropic prompt-cache token usage and logs a one-line
+  summary at the end of `attune-author regenerate`:
+  `Polish cache hit: 87% (1241 read / 1421 total tokens, 6 call(s))`.
+  A `WARNING` is appended when the run's hit rate falls below 50%,
+  surfacing silent cache regressions (prompt edits, model alias
+  drift). Hit rate is `read / (read + creation)`, computed over
+  cacheable input tokens.
+  - `attune_author.doc_gen._anthropic.call_anthropic` gains an optional
+    `on_cache_usage(creation, read, model)` callback; backward
+    compatible (the doc-gen path passes nothing).
+  - New in `attune_author.polish`: `PolishCacheStats`,
+    `polish_cache_stats()`, `format_polish_cache_summary()`,
+    `reset_polish_cache_telemetry()`. Telemetry follows the existing
+    in-process faithfulness-counter pattern (no new on-disk format).
+  - README: new "Cache hit rate" subsection under Polish cache.
+  - 16 new tests in `tests/unit/test_polish_cache_metrics.py`.
+
 ## [0.14.2] - 2026-05-27
 
 ### Fixed
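
Reviewer sketch for the CHANGELOG entry above: one way an optional `on_cache_usage(creation, read, model)` callback can stay backward compatible. This is not the project's code. `call_model`, `fake_response`, and `record` are hypothetical stand-ins, and the response is faked with `SimpleNamespace` so the example runs without an API key. Only the two usage field names follow Anthropic's Messages API.

```python
from types import SimpleNamespace
from typing import Callable, Optional

# Hypothetical callback type: (creation_tokens, read_tokens, model) -> None.
CacheUsageCallback = Callable[[int, int, str], None]


def call_model(
    response: SimpleNamespace,
    model: str,
    on_cache_usage: Optional[CacheUsageCallback] = None,
) -> str:
    """Stand-in for a text-returning wrapper; `response` fakes an API reply."""
    usage = response.usage
    # Missing or None usage fields are treated as zero tokens.
    creation = getattr(usage, "cache_creation_input_tokens", 0) or 0
    read = getattr(usage, "cache_read_input_tokens", 0) or 0
    if on_cache_usage is not None:
        # Fires on every call, including the all-zero case.
        on_cache_usage(creation, read, model)
    return response.text


def fake_response(creation: int, read: int) -> SimpleNamespace:
    """Build a fake reply carrying only the fields this sketch reads."""
    usage = SimpleNamespace(
        cache_creation_input_tokens=creation,
        cache_read_input_tokens=read,
    )
    return SimpleNamespace(usage=usage, text="polished text")


seen = []  # collected (creation, read, model) tuples


def record(creation: int, read: int, model: str) -> None:
    seen.append((creation, read, model))


# First call writes the cache, second call reads it back.
call_model(fake_response(180, 0), "example-model", on_cache_usage=record)
call_model(fake_response(0, 180), "example-model", on_cache_usage=record)
# A caller that passes no callback is unaffected.
call_model(fake_response(0, 0), "example-model")
```

Because the callback defaults to `None`, call sites that never pass it behave exactly as before, which is the property the entry describes for the doc-gen path.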
 
@@ -257,6 +257,33 @@ volatile frontmatter fields like `generated_at` stripped),
 context, and model name. Changing the model automatically invalidates
 all prior entries.
 
+### Cache hit rate
+
+Separately from the on-disk response cache above, each polish call
+uses Anthropic's **prompt cache** for the ~6000-token system prompt.
+After a regen run, `attune-author` logs a one-line summary at INFO:
+
+```
+Polish cache hit: 87% (1241 read / 1421 total tokens, 6 call(s))
+```
+
+The hit rate is `read / (read + creation)`: the fraction of cacheable
+input tokens read from the cache rather than written to it. Prompt caching
+cuts input cost ~90% on the cached portion, so a healthy multi-template
+run should settle well above 50% once the first call warms the cache.
+
+- **High (>80%)** — expected steady state; the system prompt is being
+  reused across calls.
+- **Low (<50%)** — triggers a `WARNING` in the summary. Usually means
+  the cache boundary broke: the system prompt changed between calls,
+  the model alias drifted, or only a single template was polished (no
+  reuse). Check recent edits to `polish_prompts.py` or `_POLISH_MODEL`.
+- **"no cacheable tokens observed"** — the prompt fell below Anthropic's
+  caching threshold or caching is disabled (`POLISH_CACHE_SYSTEM`).
+
+The metric is per-run (in-process); it is not persisted across
+invocations.
+
 ## Python API
 
 ```python
 
@@ -1,6 +1,12 @@
 # Decisions — Polish prompt-cache hit-rate telemetry
 
-**Status:** Draft (2026-05-11) — gated on briefing-followup batch
+**Status:** ✅ DONE (2026-06-06) — shipped to [Unreleased]. The Draft
+"gated on briefing-followup batch" note was superseded by this file's
+own "Execution gate" ("Not blocking"). One deviation: attune-author has
+no telemetry JSONL, so the metric uses the existing in-process
+faithfulness-counter pattern (INFO summary at end of run) rather than a
+new JSONL file; the threshold warning is current-run, not cross-run.
+See `tasks.md` for the per-phase record.
 **Owner:** Patrick
 
 ---
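
Reviewer sketch of the in-process counter pattern the status note refers to: reset at run start, accumulate per call, log one INFO line at run end, nothing written to disk. All names here (`_telemetry`, `reset_telemetry`, `record_usage`, `run`, the logger name) are hypothetical and only illustrate the lifecycle; the real helpers live in `attune_author.polish` and may differ.

```python
import logging

logger = logging.getLogger("polish_cache_demo")  # hypothetical logger name

# Module-level accumulator: lives only for the current process.
_telemetry = {"calls": 0, "creation": 0, "read": 0}


def reset_telemetry() -> None:
    """Zero the counters; intended to be called at the start of each run."""
    for key in _telemetry:
        _telemetry[key] = 0


def record_usage(creation: int, read: int) -> None:
    """Add one call's cache token counts to the running totals."""
    _telemetry["calls"] += 1
    _telemetry["creation"] += creation
    _telemetry["read"] += read


def run(call_usages):
    """Simulate one run: reset, accumulate per-call usage, log a summary."""
    reset_telemetry()
    for creation, read in call_usages:
        record_usage(creation, read)
    snapshot = dict(_telemetry)  # copy so later runs cannot mutate the result
    logger.info("cache usage this run: %s", snapshot)
    return snapshot


first = run([(180, 0), (0, 180)])
second = run([(0, 50)])  # starts from zero again: nothing carries over
```

The second run reporting only its own call is what makes the threshold warning current-run rather than cross-run.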
 
@@ -0,0 +1,72 @@
+# Tasks — Polish prompt-cache hit-rate telemetry
+
+**Status:** ✅ DONE (2026-06-06) — shipped to [Unreleased]. See the
+"Deviation" note under Phases 3–4: attune-author has no JSONL
+telemetry, so the metric follows the existing in-process
+faithfulness-counter pattern (reset at run start, INFO summary at run
+end) instead of a new JSONL subsystem. Acceptance criteria in
+`decisions.md` are all met.
+
+## Phase 1 — Read the cache fields
+
+- [x] **1.1** Captured via a new `on_cache_usage(creation, read, model)`
+      callback on `doc_gen._anthropic.call_anthropic` (polish can't see
+      `response.usage` directly — `call_anthropic` returns only text).
+      `_log_cache_usage` now returns `(creation, read)`.
+- [x] **1.2** Compute hit rate: `read / max(read + creation, 1)`
+      (`PolishCacheStats.hit_rate`)
+- [x] **1.3** `PolishCacheStats` dataclass added in `polish.py`
+
+## Phase 2 — Surface to user
+
+- [x] **2.1** End-of-run summary logged at INFO via
+      `format_polish_cache_summary()`:
+      `Polish cache hit: 87% (1241 read / 1421 total tokens, 6 call(s))`
+- [x] **2.2** Graceful when both are zero:
+      `Polish cache: no cacheable tokens observed (cache not configured?)`
+
+## Phase 3 — Log to telemetry  *(deviation, see note)*
+
+- [x] **3.1** ~~Append per-call to existing telemetry JSONL~~ →
+      **There is no telemetry JSONL in attune-author.** Adopted the
+      existing in-process counter idiom (`_polish_cache_telemetry()` +
+      `reset_polish_cache_telemetry()`, mirroring
+      `generator._faithfulness_telemetry`), surfaced via the INFO
+      end-of-run summary in `maintenance.py`. Building a JSONL
+      subsystem would contradict the spec's "low effort, single file"
+      scope and the codebase's telemetry pattern.
+- [x] **3.2** Aggregate fields: calls, creation_tokens, read_tokens,
+      derived hit_rate, model (model accepted by the callback; per-model
+      breakdown explicitly out of scope per decisions.md).
+
+## Phase 4 — Threshold warning  *(deviation: current-run, not cross-run)*
+
+- [x] **4.1–4.3** `format_polish_cache_summary()` appends a `WARNING`
+      when the **current run's** hit rate < 50% (`_CACHE_HIT_WARN_THRESHOLD`)
+      and ≥1 cacheable token was seen, with a pointer to the README.
+      Cross-run rolling history (last N records) is deferred — it would
+      require the persistent JSONL layer this spec deliberately avoided.
+
+## Phase 5 — Test
+
+- [x] **5.1** `tests/unit/test_polish_cache_metrics.py`: mocks Anthropic
+      responses with known cache_creation/cache_read values; asserts the
+      callback fires (incl. the zero case), hit-rate math, accumulator,
+      summary line, and threshold warning (16 tests).
+- [ ] **5.2** Integration test (optional) — **skipped**: would require a
+      live API key (real prompt-cache hits can't be observed against a
+      mock). The unit tests cover the compute path; left optional as the
+      spec allowed.
+
+## Phase 6 — Docs
+
+- [x] **6.1** README "Cache hit rate" subsection — meaning, healthy
+      ranges, what to do when it drops.
+- [x] **6.2** CHANGELOG [Unreleased] entry added.
+
+## Out of scope
+
+- Per-stage cache breakdown (system / examples / messages)
+- Cost-in-dollars tracking (token-level only)
+- Cache strategy changes
+- Cross-package telemetry aggregation
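
Reviewer sketch of Phases 1.2, 2, and 4 together, assuming a stats object shaped like the aggregate fields in 3.2. `CacheStats`, `format_summary`, and `WARN_THRESHOLD` are hypothetical stand-ins for `PolishCacheStats`, `format_polish_cache_summary()`, and `_CACHE_HIT_WARN_THRESHOLD`. The warning wording is invented for illustration, and the real summary may round differently.

```python
from dataclasses import dataclass

WARN_THRESHOLD = 0.50  # assumed to mirror _CACHE_HIT_WARN_THRESHOLD


@dataclass
class CacheStats:
    calls: int = 0
    creation_tokens: int = 0
    read_tokens: int = 0

    @property
    def total_tokens(self) -> int:
        return self.read_tokens + self.creation_tokens

    @property
    def hit_rate(self) -> float:
        # max(..., 1) avoids ZeroDivisionError when no cacheable tokens were seen.
        return self.read_tokens / max(self.total_tokens, 1)


def format_summary(stats: CacheStats) -> str:
    """Render the one-line end-of-run summary, with a warning when low."""
    if stats.total_tokens == 0:
        return "Polish cache: no cacheable tokens observed (cache not configured?)"
    line = (
        f"Polish cache hit: {stats.hit_rate:.0%} "
        f"({stats.read_tokens} read / {stats.total_tokens} total tokens, "
        f"{stats.calls} call(s))"
    )
    if stats.hit_rate < WARN_THRESHOLD:
        # Illustrative wording only; the shipped message is not shown in this diff.
        line += " WARNING: hit rate below 50%"
    return line


healthy = format_summary(CacheStats(calls=6, creation_tokens=180, read_tokens=1241))
low = format_summary(CacheStats(calls=1, creation_tokens=900, read_tokens=100))
empty = format_summary(CacheStats())
```

With 1241 read tokens and 180 creation tokens the total is 1421 and the rate is about 0.873, which matches the `87%` example line in 2.1.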
@@ -1,8 +1,27 @@
 # Spec: Regen Pipeline — Design
 
+> ## ⚠️ OBSOLETE — do not implement (reconciled 2026-06-06)
+>
+> This design was never built and conflicts with the shipped architecture. It
+> assumes a single `corpus_root`, a React/JSX frontend (`App.jsx`,
+> `CorpusSetup`), a polish+Haiku `_regen` pipeline, and WS-badge wiring — none
+> of which exist. The shipped reality instead uses:
+>
+> - **Regen:** `sidecar/attune_gui/routes/living_docs.py` →
+>   `POST /api/living-docs/docs/{id}/regenerate` → Jobs registry
+>   (`attune_gui.jobs`) → `_regenerate_doc_executor` →
+>   `attune_author.generator.generate_feature_templates` + `load_manifest`.
+> - **Corpus config:** multi-corpus registry (`attune_gui.editor_corpora`,
+>   `POST /api/corpus/register`) + workspace config (`attune_gui.workspace`,
+>   `living_docs.py` `get_config`/`set_config`).
+> - **Frontend:** TypeScript (`editor-frontend/src/corpus-switcher.ts`), not React.
+> - **Bulk:** `make regen-all` (Makefile), not `POST /api/templates/refresh-all`.
+>
+> Kept verbatim below for historical context only. See `requirements.md` banner.
+
 ## Phase 2: Design
 
-**Status**: in-review
+**Status**: obsolete — superseded by living-docs regen automation (was "in-review", never built)
 
 ---
 
 
@@ -5,9 +5,31 @@
 
 ---
 
+> ## ⚠️ RECONCILED — satisfied-by-different-means (2026-06-06)
+>
+> This spec was previously marked "complete" with all tasks ✅, but a code
+> audit found **none** of its named symbols ever shipped (`_regen`,
+> `regen_template(corpus_root=…)`, `_resolve_corpus_root`, `atomic_write`,
+> `_patch_summaries_json`) and the attune-gui pieces (`/api/config`,
+> `refresh-all`, `CorpusSetup`) do not exist. The underlying need was instead
+> met by a **more evolved architecture**. All three user stories are satisfied:
+>
+> | User story | Status | Actual implementation |
+> |---|---|---|
+> | US1 — badge click → regen → saved to disk | ✅ met | `POST /api/living-docs/docs/{id}/regenerate` → Jobs registry → `_regenerate_doc_executor` → `attune_author.generator.generate_feature_templates` (`sidecar/attune_gui/routes/living_docs.py`). Source-driven generation, not polish+Haiku. |
+> | US2 — first-run corpus setup UI | ✅ exceeded | Multi-corpus registry: `editor_corpora.py`, `POST /api/corpus/register`, `editor-frontend/src/corpus-switcher.ts` (dropdown + "Add corpus…" modal). |
+> | US3 — env auto-load on startup | ✅ met | Workspace config (`living_docs.py` `get_config`/`set_config`, `attune_gui.workspace`) + persisted corpus registry, replacing single `ATTUNE_CORPUS_ROOT`. |
+>
+> Bulk regen ships as the build-time `make regen-all` target (Makefile), not a
+> runtime "Regen all stale" button. The frontend is **TypeScript**, not the
+> React/JSX assumed by `design.md`.
+>
+> **No genuine product gaps remain.** This spec is retained for history; the
+> `design.md` below is **obsolete** (see its banner). Do not implement it.
+
 ## Phase 1: Requirements
 
-**Status**: approved
+**Status**: reconciled — satisfied by living-docs regen automation + corpus registry (was falsely marked "approved/complete")
 
 ### Problem statement