Commit e4ce8d3

Polish cache-hit telemetry + spec reconcile & archive (#52)
* docs(specs): reconcile regen-pipeline — mark satisfied-by-different-means

Audit found regen-pipeline was marked "complete" with all 24 tasks checked, but none of its named symbols ever shipped in either repo (attune-author: _regen, regen_template(corpus_root=...), _resolve_corpus_root, atomic_write, _patch_summaries_json; attune-gui: /api/config, /api/templates/refresh-all, /api/browse/directory, CorpusSetup, App.jsx). A bogus "Shipped" note had conflated it with the unrelated hash-mismatch regenerate CLI.

The 3 user stories are all satisfied by a more evolved architecture:
- regen: POST /api/living-docs/docs/{id}/regenerate (Jobs + generate_feature_templates)
- corpus config: multi-corpus registry + workspace config
- bulk: make regen-all

No genuine product gaps remain. This commit corrects the spec docs:
- requirements.md: status -> reconciled, with user-story->reality map
- design.md: marked obsolete (assumes React/JSX + single corpus_root)
- tasks.md: flags the false done-marks and corrects the Shipped note

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* docs(specs): mark regen-staleness-hash-mismatch DONE (shipped in #48/0.14.2)

Status said "Implementation TBD" but the fix shipped in PR #48 (commit 1b1c7c5), released in 0.14.2: apply_polish_results now re-injects deterministic frontmatter via _replace_polished_frontmatter, with regression test tests/unit/test_polished_frontmatter_reinjection.py and a CHANGELOG entry. Status corrected to reflect shipped reality.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* feat(polish): prompt-cache hit-rate telemetry (spec polish-cache-hit-metrics)

Each polish run now tracks Anthropic prompt-cache token usage and logs a one-line summary at the end of `attune-author regenerate`:

    Polish cache hit: 87% (1241 read / 1421 total tokens, 6 call(s))

A WARNING is appended when the run's hit rate < 50% (with >=1 cacheable token), surfacing silent cache regressions (prompt edits, model alias drift).
Hit rate = read / (read + creation) cacheable input tokens.

Implementation:
- doc_gen/_anthropic.call_anthropic gains an optional on_cache_usage(creation, read, model) callback; _log_cache_usage now returns (creation, read). Backward compatible — doc-gen passes nothing.
- polish.py: PolishCacheStats dataclass, in-process accumulator (_polish_cache_telemetry / reset_polish_cache_telemetry, mirroring generator._faithfulness_telemetry), polish_cache_stats(), and format_polish_cache_summary(). _call_llm wires the callback.
- maintenance.py: reset at run start, log summary at run end alongside the faithfulness summary.

Deviation from the written spec: attune-author has no telemetry JSONL, so the metric follows the existing in-process faithfulness-counter pattern instead of a new JSONL subsystem; the threshold warning is current-run, not cross-run. Acceptance criteria in decisions.md all met.

Tests: 16 new in tests/unit/test_polish_cache_metrics.py (callback firing incl. zero case, hit-rate math, accumulator, summary, warning).

Docs: README "Cache hit rate" subsection; CHANGELOG [Unreleased]. Spec docs updated to DONE with the deviation noted.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* chore(help): regenerate polish/staleness templates after cache-metrics change

Auto-regenerated by the pre-commit help-freshness hook following the polish prompt-cache telemetry work (d4af5a3).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* chore(specs): archive completed/superseded specs

Move terminal specs into docs/specs/archive/ so they stop inflating the active count: polish-fact-check (v0.14.0), polish-cache-hit-metrics (done), regen-staleness-hash-mismatch (#48/0.14.2), regen-pipeline (superseded). skill-export-evangelism kept active (open).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* Revert "chore(help): regenerate polish/staleness templates after cache-metrics change"

This reverts commit b9edfc4.

* test(batch): make batch-state fixture date relative to now

The _state() helper hardcoded submitted_at=2026-05-08, which silently expired past the 29-day retention window on 2026-06-06 and broke the status/cancel tests (they read batch state without an injected now=). Default to now-1day so the fixture stays inside the window. Fixes the 3 date-bomb failures in test_maintenance_batch.py (TestStatus/TestCancel).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
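The date-bomb fix in the last commit of the message can be illustrated with a minimal sketch. The diff for `test_maintenance_batch.py` is not shown on this page, so `make_state`, `is_expired`, and the dict layout below are hypothetical stand-ins, not the project's `_state()` helper. Only the 29-day window and the "default to now minus one day" idea come from the commit message.

```python
from datetime import datetime, timedelta, timezone

# Retention window quoted in the commit message.
RETENTION_DAYS = 29


def make_state(submitted_at=None, now=None):
    """Hypothetical fixture builder: defaults submitted_at to one day before `now`."""
    now = now or datetime.now(timezone.utc)
    if submitted_at is None:
        # Relative default: the fixture can never age out of the window.
        submitted_at = now - timedelta(days=1)
    return {"batch_id": "batch_example", "submitted_at": submitted_at.isoformat()}


def is_expired(state, now=None):
    """Hypothetical retention check: True once the state is older than the window."""
    now = now or datetime.now(timezone.utc)
    submitted = datetime.fromisoformat(state["submitted_at"])
    return now - submitted > timedelta(days=RETENTION_DAYS)
```

With a hardcoded `submitted_at`, a test that calls `is_expired(state)` without injecting `now` starts failing once the wall clock passes the window. The relative default avoids that without requiring every test to pass `now=`.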
1 parent 01db78a commit e4ce8d3

18 files changed

Lines changed: 531 additions & 67 deletions

CHANGELOG.md

Lines changed: 20 additions & 0 deletions
@@ -10,6 +10,26 @@ and this project adheres to

 ## [Unreleased]

+### Added
+
+- **Polish prompt-cache hit-rate telemetry.** Each polish run now
+tracks Anthropic prompt-cache token usage and logs a one-line
+summary at the end of `attune-author regenerate`:
+`Polish cache hit: 87% (1241 read / 1421 total tokens, 6 call(s))`.
+A `WARNING` is appended when the run's hit rate falls below 50%,
+surfacing silent cache regressions (prompt edits, model alias
+drift). Hit rate is `read / (read + creation)` cacheable input
+tokens.
+- `attune_author.doc_gen._anthropic.call_anthropic` gains an optional
+`on_cache_usage(creation, read, model)` callback; backward
+compatible (the doc-gen path passes nothing).
+- New in `attune_author.polish`: `PolishCacheStats`,
+`polish_cache_stats()`, `format_polish_cache_summary()`,
+`reset_polish_cache_telemetry()`. Telemetry follows the existing
+in-process faithfulness-counter pattern (no new on-disk format).
+- README: new "Cache hit rate" subsection under Polish cache.
+- 16 new tests in `tests/unit/test_polish_cache_metrics.py`.
+
 ## [0.14.2] - 2026-05-27

 ### Fixed
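The optional-callback design described in this CHANGELOG entry can be sketched as follows. This is an illustration under stated assumptions, not the shipped code: `call_model` and `CacheTotals` are hypothetical stand-ins for `call_anthropic` and the in-process accumulator, and the `usage` dict stands in for the API response's usage block. The two key names are the cache fields Anthropic's Messages API reports.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Signature taken from the CHANGELOG: on_cache_usage(creation, read, model).
CacheUsageCallback = Callable[[int, int, str], None]


@dataclass
class CacheTotals:
    """Hypothetical in-process accumulator, reset by creating a new instance."""
    calls: int = 0
    creation_tokens: int = 0
    read_tokens: int = 0

    def record(self, creation: int, read: int, model: str) -> None:
        # `model` is accepted but not broken down per model in this sketch.
        self.calls += 1
        self.creation_tokens += creation
        self.read_tokens += read


def call_model(prompt: str, usage: dict, model: str = "example-model",
               on_cache_usage: Optional[CacheUsageCallback] = None) -> str:
    """Hypothetical stand-in for an LLM call that returns only text."""
    creation = int(usage.get("cache_creation_input_tokens") or 0)
    read = int(usage.get("cache_read_input_tokens") or 0)
    # Callers that pass no callback see no behavior change.
    if on_cache_usage is not None:
        on_cache_usage(creation, read, model)
    return f"polished: {prompt}"


totals = CacheTotals()
call_model("first", {"cache_creation_input_tokens": 180}, on_cache_usage=totals.record)
call_model("second", {"cache_read_input_tokens": 1241}, on_cache_usage=totals.record)
```

Because the call returns only text, a callback lets the caller observe token usage without changing the return type. That is one plausible reading of why the entry calls the change backward compatible.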

README.md

Lines changed: 27 additions & 0 deletions
@@ -257,6 +257,33 @@ volatile frontmatter fields like `generated_at` stripped),
 context, and model name. Changing the model automatically invalidates
 all prior entries.

+### Cache hit rate
+
+Separately from the on-disk response cache above, each polish call
+uses Anthropic's **prompt cache** for the ~6000-token system prompt.
+After a regen run, `attune-author` logs a one-line summary at INFO:
+
+```
+Polish cache hit: 87% (1241 read / 1421 total tokens, 6 call(s))
+```
+
+The hit rate is `read / (read + creation)` — the fraction of cacheable
+input tokens served from cache rather than re-billed. Prompt caching
+cuts input cost ~90% on the cached portion, so a healthy multi-template
+run should settle well above 50% once the first call warms the cache.
+
+- **High (>80%)** — expected steady state; the system prompt is being
+reused across calls.
+- **Low (<50%)** — triggers a `WARNING` in the summary. Usually means
+the cache boundary broke: the system prompt changed between calls,
+the model alias drifted, or only a single template was polished (no
+reuse). Check recent edits to `polish_prompts.py` or `_POLISH_MODEL`.
+- **"no cacheable tokens observed"** — the prompt fell below Anthropic's
+caching threshold or caching is disabled (`POLISH_CACHE_SYSTEM`).
+
+The metric is per-run (in-process); it is not persisted across
+invocations.
+
 ## Python API

 ```python

docs/specs/polish-cache-hit-metrics/decisions.md renamed to docs/specs/archive/polish-cache-hit-metrics/decisions.md

Lines changed: 7 additions & 1 deletion
@@ -1,6 +1,12 @@
 # Decisions — Polish prompt-cache hit-rate telemetry

-**Status:** Draft (2026-05-11) — gated on briefing-followup batch
+**Status:** ✅ DONE (2026-06-06) — shipped to [Unreleased]. The Draft
+"gated on briefing-followup batch" note was superseded by this file's
+own "Execution gate" ("Not blocking"). One deviation: attune-author has
+no telemetry JSONL, so the metric uses the existing in-process
+faithfulness-counter pattern (INFO summary at end of run) rather than a
+new JSONL file; the threshold warning is current-run, not cross-run.
+See `tasks.md` for the per-phase record.
 **Owner:** Patrick

 ---
Lines changed: 72 additions & 0 deletions
@@ -0,0 +1,72 @@
+# Tasks — Polish prompt-cache hit-rate telemetry
+
+**Status:** ✅ DONE (2026-06-06) — shipped to [Unreleased]. See the
+"Deviation" note under Phases 3–4: attune-author has no JSONL
+telemetry, so the metric follows the existing in-process
+faithfulness-counter pattern (reset at run start, INFO summary at run
+end) instead of a new JSONL subsystem. Acceptance criteria in
+`decisions.md` are all met.
+
+## Phase 1 — Read the cache fields
+
+- [x] **1.1** Captured via a new `on_cache_usage(creation, read, model)`
+callback on `doc_gen._anthropic.call_anthropic` (polish can't see
+`response.usage` directly — `call_anthropic` returns only text).
+`_log_cache_usage` now returns `(creation, read)`.
+- [x] **1.2** Compute hit rate: `read / max(read + creation, 1)`
+(`PolishCacheStats.hit_rate`)
+- [x] **1.3** `PolishCacheStats` dataclass added in `polish.py`
+
+## Phase 2 — Surface to user
+
+- [x] **2.1** End-of-run summary logged at INFO via
+`format_polish_cache_summary()`:
+`Polish cache hit: 87% (1241 read / 1421 total tokens, 6 call(s))`
+- [x] **2.2** Graceful when both are zero:
+`Polish cache: no cacheable tokens observed (cache not configured?)`
+
+## Phase 3 — Log to telemetry *(deviation, see note)*
+
+- [x] **3.1** ~~Append per-call to existing telemetry JSONL~~
+**There is no telemetry JSONL in attune-author.** Adopted the
+existing in-process counter idiom (`_polish_cache_telemetry()` +
+`reset_polish_cache_telemetry()`, mirroring
+`generator._faithfulness_telemetry`), surfaced via the INFO
+end-of-run summary in `maintenance.py`. Building a JSONL
+subsystem would contradict the spec's "low effort, single file"
+scope and the codebase's telemetry pattern.
+- [x] **3.2** Aggregate fields: calls, creation_tokens, read_tokens,
+derived hit_rate, model (model accepted by the callback; per-model
+breakdown explicitly out of scope per decisions.md).
+
+## Phase 4 — Threshold warning *(deviation: current-run, not cross-run)*
+
+- [x] **4.1–4.3** `format_polish_cache_summary()` appends a `WARNING`
+when the **current run's** hit rate < 50% (`_CACHE_HIT_WARN_THRESHOLD`)
+and ≥1 cacheable token was seen, with a pointer to the README.
+Cross-run rolling history (last N records) is deferred — it would
+require the persistent JSONL layer this spec deliberately avoided.
+
+## Phase 5 — Test
+
+- [x] **5.1** `tests/unit/test_polish_cache_metrics.py`: mocks Anthropic
+responses with known cache_creation/cache_read values; asserts the
+callback fires (incl. the zero case), hit-rate math, accumulator,
+summary line, and threshold warning (16 tests).
+- [ ] **5.2** Integration test (optional) — **skipped**: would require a
+live API key (real prompt-cache hits can't be observed against a
+mock). The unit tests cover the compute path; left optional as the
+spec allowed.
+
+## Phase 6 — Docs
+
+- [x] **6.1** README "Cache hit rate" subsection — meaning, healthy
+ranges, what to do when it drops.
+- [x] **6.2** CHANGELOG [Unreleased] entry added.
+
+## Out of scope
+
+- Per-stage cache breakdown (system / examples / messages)
+- Cost-in-dollars tracking (token-level only)
+- Cache strategy changes
+- Cross-package telemetry aggregation
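Phases 1, 2 and 4 of the task list pin down the hit-rate formula, the two summary strings, and the 50% threshold, so the compute path can be sketched directly. The `polish.py` diff is not shown on this page, so `CacheStats`, `format_summary`, and the warning wording are assumptions rather than the shipped `PolishCacheStats` and `format_polish_cache_summary()`.

```python
from dataclasses import dataclass

# 50% threshold from Phase 4; the shipped constant is `_CACHE_HIT_WARN_THRESHOLD`.
WARN_THRESHOLD = 0.50


@dataclass(frozen=True)
class CacheStats:
    """Hypothetical per-run totals (fields follow Phase 3.2)."""
    calls: int
    creation_tokens: int
    read_tokens: int

    @property
    def total_tokens(self) -> int:
        return self.read_tokens + self.creation_tokens

    @property
    def hit_rate(self) -> float:
        # Phase 1.2: read / max(read + creation, 1) avoids dividing by zero.
        return self.read_tokens / max(self.total_tokens, 1)


def format_summary(stats: CacheStats) -> str:
    """Build the one-line summary; the warning text is an assumed wording."""
    if stats.total_tokens == 0:
        # Phase 2.2: graceful zero case.
        return "Polish cache: no cacheable tokens observed (cache not configured?)"
    line = (f"Polish cache hit: {stats.hit_rate:.0%} "
            f"({stats.read_tokens} read / {stats.total_tokens} total tokens, "
            f"{stats.calls} call(s))")
    if stats.hit_rate < WARN_THRESHOLD:
        line += " WARNING: hit rate below 50%"
    return line


print(format_summary(CacheStats(calls=6, creation_tokens=180, read_tokens=1241)))
# Polish cache hit: 87% (1241 read / 1421 total tokens, 6 call(s))
```

The 180 creation tokens are inferred from the documented example (1421 total minus 1241 read). Checking the zero case before the threshold keeps a run with no cacheable tokens from being reported as a 0% hit rate with a warning.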
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.

docs/specs/regen-pipeline/design.md renamed to docs/specs/archive/regen-pipeline/design.md

Lines changed: 20 additions & 1 deletion
@@ -1,8 +1,27 @@
 # Spec: Regen Pipeline — Design

+> ## ⚠️ OBSOLETE — do not implement (reconciled 2026-06-06)
+>
+> This design was never built and conflicts with the shipped architecture. It
+> assumes a single `corpus_root`, a React/JSX frontend (`App.jsx`,
+> `CorpusSetup`), a polish+Haiku `_regen` pipeline, and WS-badge wiring — none
+> of which exist. The shipped reality instead uses:
+>
+> - **Regen:** `sidecar/attune_gui/routes/living_docs.py`
+> `POST /api/living-docs/docs/{id}/regenerate` → Jobs registry
+> (`attune_gui.jobs`) → `_regenerate_doc_executor`
+> `attune_author.generator.generate_feature_templates` + `load_manifest`.
+> - **Corpus config:** multi-corpus registry (`attune_gui.editor_corpora`,
+> `POST /api/corpus/register`) + workspace config (`attune_gui.workspace`,
+> `living_docs.py` `get_config`/`set_config`).
+> - **Frontend:** TypeScript (`editor-frontend/src/corpus-switcher.ts`), not React.
+> - **Bulk:** `make regen-all` (Makefile), not `POST /api/templates/refresh-all`.
+>
+> Kept verbatim below for historical context only. See `requirements.md` banner.
+
 ## Phase 2: Design

-**Status**: in-review
+**Status**: obsolete — superseded by living-docs regen automation (was "in-review", never built)

 ---


docs/specs/regen-pipeline/requirements.md renamed to docs/specs/archive/regen-pipeline/requirements.md

Lines changed: 23 additions & 1 deletion
@@ -5,9 +5,31 @@

 ---

+> ## ⚠️ RECONCILED — satisfied-by-different-means (2026-06-06)
+>
+> This spec was previously marked "complete" with all tasks ✅, but a code
+> audit found **none** of its named symbols ever shipped (`_regen`,
+> `regen_template(corpus_root=…)`, `_resolve_corpus_root`, `atomic_write`,
+> `_patch_summaries_json`) and the attune-gui pieces (`/api/config`,
+> `refresh-all`, `CorpusSetup`) do not exist. The underlying need was instead
+> met by a **more evolved architecture**. All three user stories are satisfied:
+>
+> | User story | Status | Actual implementation |
+> |---|---|---|
+> | US1 — badge click → regen → saved to disk | ✅ met | `POST /api/living-docs/docs/{id}/regenerate` → Jobs registry → `_regenerate_doc_executor` → `attune_author.generator.generate_feature_templates` (`sidecar/attune_gui/routes/living_docs.py`). Source-driven generation, not polish+Haiku. |
+> | US2 — first-run corpus setup UI | ✅ exceeded | Multi-corpus registry: `editor_corpora.py`, `POST /api/corpus/register`, `editor-frontend/src/corpus-switcher.ts` (dropdown + "Add corpus…" modal). |
+> | US3 — env auto-load on startup | ✅ met | Workspace config (`living_docs.py` `get_config`/`set_config`, `attune_gui.workspace`) + persisted corpus registry, replacing single `ATTUNE_CORPUS_ROOT`. |
+>
+> Bulk regen ships as the build-time `make regen-all` target (Makefile), not a
+> runtime "Regen all stale" button. The frontend is **TypeScript**, not the
+> React/JSX assumed by `design.md`.
+>
+> **No genuine product gaps remain.** This spec is retained for history; the
+> `design.md` below is **obsolete** (see its banner). Do not implement it.
+
 ## Phase 1: Requirements

-**Status**: approved
+**Status**: reconciled — satisfied by living-docs regen automation + corpus registry (was falsely marked "approved/complete")

 ### Problem statement

