From e9ea6719e34f1405347b9794602adf141276e383 Mon Sep 17 00:00:00 2001 From: GeneAI Date: Sat, 6 Jun 2026 19:05:16 -0400 Subject: [PATCH 1/7] =?UTF-8?q?docs(specs):=20reconcile=20regen-pipeline?= =?UTF-8?q?=20=E2=80=94=20mark=20satisfied-by-different-means?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Audit found regen-pipeline was marked "complete" with all 24 tasks checked, but none of its named symbols ever shipped in either repo (attune-author: _regen, regen_template(corpus_root=...), _resolve_corpus_root, atomic_write, _patch_summaries_json; attune-gui: /api/config, /api/templates/refresh-all, /api/browse/directory, CorpusSetup, App.jsx). A bogus "Shipped" note had conflated it with the unrelated hash-mismatch regenerate CLI. The 3 user stories are all satisfied by a more evolved architecture: - regen: POST /api/living-docs/docs/{id}/regenerate (Jobs + generate_feature_templates) - corpus config: multi-corpus registry + workspace config - bulk: make regen-all No genuine product gaps remain. This commit corrects the spec docs: - requirements.md: status -> reconciled, with user-story->reality map - design.md: marked obsolete (assumes React/JSX + single corpus_root) - tasks.md: flags the false done-marks and corrects the Shipped note Co-Authored-By: Claude Opus 4.8 --- docs/specs/regen-pipeline/design.md | 21 +++++++++++++++++++- docs/specs/regen-pipeline/requirements.md | 24 ++++++++++++++++++++++- docs/specs/regen-pipeline/tasks.md | 24 +++++++++++++++++++++-- 3 files changed, 65 insertions(+), 4 deletions(-) diff --git a/docs/specs/regen-pipeline/design.md b/docs/specs/regen-pipeline/design.md index 718abf3..e8a63fd 100644 --- a/docs/specs/regen-pipeline/design.md +++ b/docs/specs/regen-pipeline/design.md @@ -1,8 +1,27 @@ # Spec: Regen Pipeline — Design +> ## ⚠️ OBSOLETE — do not implement (reconciled 2026-06-06) +> +> This design was never built and conflicts with the shipped architecture. 
It +> assumes a single `corpus_root`, a React/JSX frontend (`App.jsx`, +> `CorpusSetup`), a polish+Haiku `_regen` pipeline, and WS-badge wiring — none +> of which exist. The shipped reality instead uses: +> +> - **Regen:** `sidecar/attune_gui/routes/living_docs.py` → +> `POST /api/living-docs/docs/{id}/regenerate` → Jobs registry +> (`attune_gui.jobs`) → `_regenerate_doc_executor` → +> `attune_author.generator.generate_feature_templates` + `load_manifest`. +> - **Corpus config:** multi-corpus registry (`attune_gui.editor_corpora`, +> `POST /api/corpus/register`) + workspace config (`attune_gui.workspace`, +> `living_docs.py` `get_config`/`set_config`). +> - **Frontend:** TypeScript (`editor-frontend/src/corpus-switcher.ts`), not React. +> - **Bulk:** `make regen-all` (Makefile), not `POST /api/templates/refresh-all`. +> +> Kept verbatim below for historical context only. See `requirements.md` banner. + ## Phase 2: Design -**Status**: in-review +**Status**: obsolete — superseded by living-docs regen automation (was "in-review", never built) --- diff --git a/docs/specs/regen-pipeline/requirements.md b/docs/specs/regen-pipeline/requirements.md index f94af8d..6afcda8 100644 --- a/docs/specs/regen-pipeline/requirements.md +++ b/docs/specs/regen-pipeline/requirements.md @@ -5,9 +5,31 @@ --- +> ## ⚠️ RECONCILED — satisfied-by-different-means (2026-06-06) +> +> This spec was previously marked "complete" with all tasks ✅, but a code +> audit found **none** of its named symbols ever shipped (`_regen`, +> `regen_template(corpus_root=…)`, `_resolve_corpus_root`, `atomic_write`, +> `_patch_summaries_json`) and the attune-gui pieces (`/api/config`, +> `refresh-all`, `CorpusSetup`) do not exist. The underlying need was instead +> met by a **more evolved architecture**. 
All three user stories are satisfied: +> +> | User story | Status | Actual implementation | +> |---|---|---| +> | US1 — badge click → regen → saved to disk | ✅ met | `POST /api/living-docs/docs/{id}/regenerate` → Jobs registry → `_regenerate_doc_executor` → `attune_author.generator.generate_feature_templates` (`sidecar/attune_gui/routes/living_docs.py`). Source-driven generation, not polish+Haiku. | +> | US2 — first-run corpus setup UI | ✅ exceeded | Multi-corpus registry: `editor_corpora.py`, `POST /api/corpus/register`, `editor-frontend/src/corpus-switcher.ts` (dropdown + "Add corpus…" modal). | +> | US3 — env auto-load on startup | ✅ met | Workspace config (`living_docs.py` `get_config`/`set_config`, `attune_gui.workspace`) + persisted corpus registry, replacing single `ATTUNE_CORPUS_ROOT`. | +> +> Bulk regen ships as the build-time `make regen-all` target (Makefile), not a +> runtime "Regen all stale" button. The frontend is **TypeScript**, not the +> React/JSX assumed by `design.md`. +> +> **No genuine product gaps remain.** This spec is retained for history; the +> `design.md` below is **obsolete** (see its banner). Do not implement it. + ## Phase 1: Requirements -**Status**: approved +**Status**: reconciled — satisfied by living-docs regen automation + corpus registry (was falsely marked "approved/complete") ### Problem statement diff --git a/docs/specs/regen-pipeline/tasks.md b/docs/specs/regen-pipeline/tasks.md index 9cc0f78..7e12159 100644 --- a/docs/specs/regen-pipeline/tasks.md +++ b/docs/specs/regen-pipeline/tasks.md @@ -2,9 +2,29 @@ ## Phase 3: Tasks -**Status**: complete +**Status**: NOT done as written — reconciled 2026-06-06 -> Shipped: `attune-author regenerate` CLI lives in `src/attune_author/cli.py:507` (handler) with the parser registered around line 154. Core logic in `maintenance.py` and `maintenance_batch.py`. CHANGELOG documents the batch variant. 
+> ## ⚠️ The task table below is INACCURATE +> +> A 2026-06-06 code audit found that **none** of the attune-author symbols in +> tasks 2–9 ever shipped (`_resolve_corpus_root`, `atomic_write`, +> `_patch_summaries_json`, `regen_template(corpus_root=…)`, `_regen`) and +> **none** of the attune-gui pieces in tasks 10–24 exist (`config.py` +> `ConfigState`, `/api/config`, `/api/templates/refresh-all`, +> `/api/browse/directory`, `CorpusSetup`, `App.jsx`). The "done" marks below are +> false. The earlier "Shipped" note conflated this spec with the unrelated +> hash-mismatch `attune-author regenerate` CLI — a different feature. +> +> **What actually satisfies the spec's user stories** (see `requirements.md` +> banner for the full mapping): +> - Single-doc regen → `POST /api/living-docs/docs/{id}/regenerate` (Jobs + +> `attune_author.generator.generate_feature_templates`). +> - Corpus config → corpus registry (`editor_corpora.py`, +> `/api/corpus/register`) + workspace config (`attune_gui.workspace`). +> - Bulk → `make regen-all` (Makefile). +> +> No code action is required: the product need is met. The table below is left +> intact only as a record of the original (unbuilt) plan. ### Implementation order From e1eef7b29d74c37b5beb5edc676c9ab66cd4d828 Mon Sep 17 00:00:00 2001 From: GeneAI Date: Sat, 6 Jun 2026 19:08:06 -0400 Subject: [PATCH 2/7] docs(specs): mark regen-staleness-hash-mismatch DONE (shipped in #48/0.14.2) Status said "Implementation TBD" but the fix shipped in PR #48 (commit 1b1c7c5), released in 0.14.2: apply_polish_results now re-injects deterministic frontmatter via _replace_polished_frontmatter, with regression test tests/unit/test_polished_frontmatter_reinjection.py and a CHANGELOG entry. Status corrected to reflect shipped reality. 
Co-Authored-By: Claude Opus 4.8 --- docs/specs/regen-staleness-hash-mismatch/decisions.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/specs/regen-staleness-hash-mismatch/decisions.md b/docs/specs/regen-staleness-hash-mismatch/decisions.md index 9cbde17..7e015e4 100644 --- a/docs/specs/regen-staleness-hash-mismatch/decisions.md +++ b/docs/specs/regen-staleness-hash-mismatch/decisions.md @@ -1,6 +1,6 @@ # Decisions — Regen / staleness hash mismatch -**Status:** root cause confirmed 2026-05-27 — original hypothesis (budget truncation of hash inputs) was wrong; actual cause is LLM-polished frontmatter laundering. Fix direction concrete. Implementation TBD. +**Status:** ✅ DONE — shipped in PR #48 (commit 1b1c7c5) / v0.14.2. Root cause was LLM-polished frontmatter laundering (not the original budget-truncation hypothesis). Fix: `apply_polish_results` re-injects deterministic frontmatter fields via `_replace_polished_frontmatter` (`generator.py:483`, `_DETERMINISTIC_FRONTMATTER_FIELDS`). Regression test: `tests/unit/test_polished_frontmatter_reinjection.py`. CHANGELOG documents it under [0.14.2]. Phase 3 release shipped; attune-gui can pin ≥0.14.2 to unblock its Phase 2. 
**Owner:** Patrick **Filed:** 2026-05-25 (handoff from attune-gui Phase 2 blockers; see [attune-gui docs/specs/living-docs-regen-automation/decisions.md](https://github.com/Smart-AI-Memory/attune-gui/blob/main/docs/specs/living-docs-regen-automation/decisions.md#phase-2-blockers-discovered-2026-05-23)) From d4af5a33b05eff61bb3def7890c68c84424ae396 Mon Sep 17 00:00:00 2001 From: GeneAI Date: Sat, 6 Jun 2026 19:19:46 -0400 Subject: [PATCH 3/7] feat(polish): prompt-cache hit-rate telemetry (spec polish-cache-hit-metrics) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Each polish run now tracks Anthropic prompt-cache token usage and logs a one-line summary at the end of `attune-author regenerate`: Polish cache hit: 87% (1241 read / 1421 total tokens, 6 call(s)) A WARNING is appended when the run's hit rate < 50% (with >=1 cacheable token), surfacing silent cache regressions (prompt edits, model alias drift). Hit rate = read / (read + creation) cacheable input tokens. Implementation: - doc_gen/_anthropic.call_anthropic gains an optional on_cache_usage(creation, read, model) callback; _log_cache_usage now returns (creation, read). Backward compatible — doc-gen passes nothing. - polish.py: PolishCacheStats dataclass, in-process accumulator (_polish_cache_telemetry / reset_polish_cache_telemetry, mirroring generator._faithfulness_telemetry), polish_cache_stats(), and format_polish_cache_summary(). _call_llm wires the callback. - maintenance.py: reset at run start, log summary at run end alongside the faithfulness summary. Deviation from the written spec: attune-author has no telemetry JSONL, so the metric follows the existing in-process faithfulness-counter pattern instead of a new JSONL subsystem; the threshold warning is current-run, not cross-run. Acceptance criteria in decisions.md all met. Tests: 14 new in tests/unit/test_polish_cache_metrics.py (callback firing incl. zero case, hit-rate math, accumulator, summary, warning).
Docs: README "Cache hit rate" subsection; CHANGELOG [Unreleased]. Spec docs updated to DONE with the deviation noted. Co-Authored-By: Claude Opus 4.8 --- CHANGELOG.md | 20 ++ README.md | 27 +++ .../polish-cache-hit-metrics/decisions.md | 8 +- docs/specs/polish-cache-hit-metrics/tasks.md | 79 +++++--- src/attune_author/doc_gen/_anthropic.py | 23 ++- src/attune_author/maintenance.py | 11 ++ src/attune_author/polish.py | 118 ++++++++++++ tests/unit/test_polish_cache_metrics.py | 182 ++++++++++++++++++ 8 files changed, 433 insertions(+), 35 deletions(-) create mode 100644 tests/unit/test_polish_cache_metrics.py diff --git a/CHANGELOG.md b/CHANGELOG.md index 71c658e..c1931f8 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -10,6 +10,26 @@ and this project adheres to ## [Unreleased] +### Added + +- **Polish prompt-cache hit-rate telemetry.** Each polish run now + tracks Anthropic prompt-cache token usage and logs a one-line + summary at the end of `attune-author regenerate`: + `Polish cache hit: 87% (1241 read / 1421 total tokens, 6 call(s))`. + A `WARNING` is appended when the run's hit rate falls below 50%, + surfacing silent cache regressions (prompt edits, model alias + drift). Hit rate is `read / (read + creation)` cacheable input + tokens. + - `attune_author.doc_gen._anthropic.call_anthropic` gains an optional + `on_cache_usage(creation, read, model)` callback; backward + compatible (the doc-gen path passes nothing). + - New in `attune_author.polish`: `PolishCacheStats`, + `polish_cache_stats()`, `format_polish_cache_summary()`, + `reset_polish_cache_telemetry()`. Telemetry follows the existing + in-process faithfulness-counter pattern (no new on-disk format). + - README: new "Cache hit rate" subsection under Polish cache. + - 14 new tests in `tests/unit/test_polish_cache_metrics.py`.
+ ## [0.14.2] - 2026-05-27 ### Fixed diff --git a/README.md b/README.md index d12d498..4985c0f 100644 --- a/README.md +++ b/README.md @@ -257,6 +257,33 @@ volatile frontmatter fields like `generated_at` stripped), context, and model name. Changing the model automatically invalidates all prior entries. +### Cache hit rate + +Separately from the on-disk response cache above, each polish call +uses Anthropic's **prompt cache** for the ~6000-token system prompt. +After a regen run, `attune-author` logs a one-line summary at INFO: + +``` +Polish cache hit: 87% (1241 read / 1421 total tokens, 6 call(s)) +``` + +The hit rate is `read / (read + creation)` — the fraction of cacheable +input tokens served from cache rather than re-billed. Prompt caching +cuts input cost ~90% on the cached portion, so a healthy multi-template +run should settle well above 50% once the first call warms the cache. + +- **High (>80%)** — expected steady state; the system prompt is being + reused across calls. +- **Low (<50%)** — triggers a `WARNING` in the summary. Usually means + the cache boundary broke: the system prompt changed between calls, + the model alias drifted, or only a single template was polished (no + reuse). Check recent edits to `polish_prompts.py` or `_POLISH_MODEL`. +- **"no cacheable tokens observed"** — the prompt fell below Anthropic's + caching threshold or caching is disabled (`POLISH_CACHE_SYSTEM`). + +The metric is per-run (in-process); it is not persisted across +invocations. + ## Python API ```python diff --git a/docs/specs/polish-cache-hit-metrics/decisions.md b/docs/specs/polish-cache-hit-metrics/decisions.md index 11d0be5..4423033 100644 --- a/docs/specs/polish-cache-hit-metrics/decisions.md +++ b/docs/specs/polish-cache-hit-metrics/decisions.md @@ -1,6 +1,12 @@ # Decisions — Polish prompt-cache hit-rate telemetry -**Status:** Draft (2026-05-11) — gated on briefing-followup batch +**Status:** ✅ DONE (2026-06-06) — shipped to [Unreleased]. 
The Draft +"gated on briefing-followup batch" note was superseded by this file's +own "Execution gate" ("Not blocking"). One deviation: attune-author has +no telemetry JSONL, so the metric uses the existing in-process +faithfulness-counter pattern (INFO summary at end of run) rather than a +new JSONL file; the threshold warning is current-run, not cross-run. +See `tasks.md` for the per-phase record. **Owner:** Patrick --- diff --git a/docs/specs/polish-cache-hit-metrics/tasks.md b/docs/specs/polish-cache-hit-metrics/tasks.md index dfc6bf9..d1a6c04 100644 --- a/docs/specs/polish-cache-hit-metrics/tasks.md +++ b/docs/specs/polish-cache-hit-metrics/tasks.md @@ -1,51 +1,68 @@ # Tasks — Polish prompt-cache hit-rate telemetry +**Status:** ✅ DONE (2026-06-06) — shipped to [Unreleased]. See the +"Deviation" note under Phases 3–4: attune-author has no JSONL +telemetry, so the metric follows the existing in-process +faithfulness-counter pattern (reset at run start, INFO summary at run +end) instead of a new JSONL subsystem. Acceptance criteria in +`decisions.md` are all met. + ## Phase 1 — Read the cache fields -- [ ] **1.1** In `attune_author/polish.py`, capture - `response.usage.cache_creation_input_tokens` and - `response.usage.cache_read_input_tokens` from each - Anthropic API call -- [ ] **1.2** Compute hit rate: - `read / max(read + creation, 1)` -- [ ] **1.3** Add a `PolishCacheStats` dataclass for - structured passing +- [x] **1.1** Captured via a new `on_cache_usage(creation, read, model)` + callback on `doc_gen._anthropic.call_anthropic` (polish can't see + `response.usage` directly — `call_anthropic` returns only text). + `_log_cache_usage` now returns `(creation, read)`. 
+- [x] **1.2** Compute hit rate: `read / max(read + creation, 1)` + (`PolishCacheStats.hit_rate`) +- [x] **1.3** `PolishCacheStats` dataclass added in `polish.py` ## Phase 2 — Surface to user -- [ ] **2.1** Print a one-line summary at end of polish run: - `Polish complete · cache hit: 87% (1241 read / 1421 total tokens)` -- [ ] **2.2** Format gracefully when both are zero (no cache - configured) +- [x] **2.1** End-of-run summary logged at INFO via + `format_polish_cache_summary()`: + `Polish cache hit: 87% (1241 read / 1421 total tokens, 6 call(s))` +- [x] **2.2** Graceful when both are zero: + `Polish cache: no cacheable tokens observed (cache not configured?)` -## Phase 3 — Log to telemetry +## Phase 3 — Log to telemetry *(deviation, see note)* -- [ ] **3.1** Append per-call to existing telemetry JSONL - (wherever attune-author writes telemetry) -- [ ] **3.2** Fields: timestamp, model, hit_rate, - read_tokens, creation_tokens, polish_target +- [x] **3.1** ~~Append per-call to existing telemetry JSONL~~ → + **There is no telemetry JSONL in attune-author.** Adopted the + existing in-process counter idiom (`_polish_cache_telemetry()` + + `reset_polish_cache_telemetry()`, mirroring + `generator._faithfulness_telemetry`), surfaced via the INFO + end-of-run summary in `maintenance.py`. Building a JSONL + subsystem would contradict the spec's "low effort, single file" + scope and the codebase's telemetry pattern. +- [x] **3.2** Aggregate fields: calls, creation_tokens, read_tokens, + derived hit_rate, model (model accepted by the callback; per-model + breakdown explicitly out of scope per decisions.md). 
-## Phase 4 — Threshold warning +## Phase 4 — Threshold warning *(deviation: current-run, not cross-run)* -- [ ] **4.1** When invoked, read last N (e.g., 10) telemetry - records -- [ ] **4.2** Compute rolling mean hit rate -- [ ] **4.3** If <50%, print a warning at end of run with - pointer to docs +- [x] **4.1–4.3** `format_polish_cache_summary()` appends a `WARNING` + when the **current run's** hit rate < 50% (`_CACHE_HIT_WARN_THRESHOLD`) + and ≥1 cacheable token was seen, with a pointer to the README. + Cross-run rolling history (last N records) is deferred — it would + require the persistent JSONL layer this spec deliberately avoided. ## Phase 5 — Test -- [ ] **5.1** Unit test: mock an Anthropic response with known - cache_creation / cache_read values; assert hit rate - computed correctly -- [ ] **5.2** Integration test (optional): run polish twice - back-to-back; second run should report >0% cache hit +- [x] **5.1** `tests/unit/test_polish_cache_metrics.py`: mocks Anthropic + responses with known cache_creation/cache_read values; asserts the + callback fires (incl. the zero case), hit-rate math, accumulator, + summary line, and threshold warning (14 tests). +- [ ] **5.2** Integration test (optional) — **skipped**: would require a + live API key (real prompt-cache hits can't be observed against a + mock). The unit tests cover the compute path; left optional as the + spec allowed. ## Phase 6 — Docs -- [ ] **6.1** README section: "Cache hit rate" — what it means, - what good values look like, what to do if it drops -- [ ] **6.2** Link from CHANGELOG when feature ships +- [x] **6.1** README "Cache hit rate" subsection — meaning, healthy + ranges, what to do when it drops. +- [x] **6.2** CHANGELOG [Unreleased] entry added.
## Out of scope diff --git a/src/attune_author/doc_gen/_anthropic.py b/src/attune_author/doc_gen/_anthropic.py index e47f801..d3cf208 100644 --- a/src/attune_author/doc_gen/_anthropic.py +++ b/src/attune_author/doc_gen/_anthropic.py @@ -16,6 +16,8 @@ from typing import TYPE_CHECKING if TYPE_CHECKING: + from collections.abc import Callable + from anthropic import Anthropic logger = logging.getLogger(__name__) @@ -102,6 +104,7 @@ def call_anthropic( model: str, max_tokens: int, cache_system: bool = False, + on_cache_usage: Callable[[int, int, str], None] | None = None, ) -> str: """Make a single-turn ``messages.create`` call with retry/backoff. @@ -125,6 +128,13 @@ def call_anthropic( for sonnet/opus, 2048 for haiku); below that, the call still works but no cache is used. Cache token usage is emitted at INFO so callers can verify hits. + on_cache_usage: Optional callback invoked once per successful + call with ``(cache_creation_input_tokens, + cache_read_input_tokens, model)``. Lets a caller (e.g. the + polish pass) accumulate cache hit-rate telemetry without + this module owning that concern. Fired even when both + counts are zero so callers can distinguish "no cache + configured" from "never called". Returns: The first text block of the response, or the empty @@ -164,7 +174,9 @@ def call_anthropic( system=system_payload, messages=[{"role": "user", "content": user_message}], ) - _log_cache_usage(response, model) + creation, read = _log_cache_usage(response, model) + if on_cache_usage is not None: + on_cache_usage(creation, read, model) if response.content: return response.content[0].text return "" @@ -182,16 +194,20 @@ def call_anthropic( raise AnthropicCallError(_redact(str(last_exc))) from None -def _log_cache_usage(response: object, model: str) -> None: +def _log_cache_usage(response: object, model: str) -> tuple[int, int]: """Emit cache hit telemetry from an Anthropic response. 
Reads ``cache_creation_input_tokens`` and ``cache_read_input_tokens`` from the response's usage object when present and logs them at INFO. Older SDK responses without those fields are silently skipped. + + Returns: + ``(creation, read)`` token counts, defaulting to ``(0, 0)`` when + the response has no usage block or the SDK omits the fields. """ usage = getattr(response, "usage", None) if usage is None: - return + return (0, 0) creation = getattr(usage, "cache_creation_input_tokens", 0) or 0 read = getattr(usage, "cache_read_input_tokens", 0) or 0 if creation or read: @@ -201,3 +217,4 @@ def _log_cache_usage(response: object, model: str) -> None: creation, read, ) + return (creation, read) diff --git a/src/attune_author/maintenance.py b/src/attune_author/maintenance.py index 13788b5..7634fe9 100644 --- a/src/attune_author/maintenance.py +++ b/src/attune_author/maintenance.py @@ -102,8 +102,10 @@ def run_maintenance( # Reset Phase 3 faithfulness telemetry so the end-of-run summary # reflects this regen rather than carrying state across runs. from attune_author.generator import reset_faithfulness_telemetry + from attune_author.polish import reset_polish_cache_telemetry reset_faithfulness_telemetry() + reset_polish_cache_telemetry() for entry in report.help_entries: if not entry.is_stale: @@ -147,6 +149,15 @@ def run_maintenance( telemetry["cost_usd"], ) + # Prompt-cache hit-rate summary. Logged at INFO so it rides the + # same default `attune-author regenerate` output as the + # faithfulness line; silent when polish never ran this regen. 
+ from attune_author.polish import format_polish_cache_summary + + cache_summary = format_polish_cache_summary() + if cache_summary is not None: + logger.info("%s", cache_summary) + return result diff --git a/src/attune_author/polish.py b/src/attune_author/polish.py index d4f1e0d..03b11df 100644 --- a/src/attune_author/polish.py +++ b/src/attune_author/polish.py @@ -39,6 +39,7 @@ import os import re import time +from dataclasses import dataclass from pathlib import Path from attune_author.doc_gen._anthropic import ( @@ -469,6 +470,118 @@ def build_polish_prompt( POLISH_MAX_TOKENS = 4096 POLISH_CACHE_SYSTEM = True +#: Per-run hit rate below which the end-of-run summary appends a +#: warning. A healthy polish run that re-touches templates should +#: sit well above this once the system prompt is cached; a low +#: rate usually means the cache boundary broke (prompt edit, model +#: alias drift) — see the README "Cache hit rate" section. +_CACHE_HIT_WARN_THRESHOLD = 0.5 + + +@dataclass(frozen=True) +class PolishCacheStats: + """Aggregate prompt-cache token usage across polish calls. + + ``creation_tokens`` are input tokens written into Anthropic's + prompt cache; ``read_tokens`` are input tokens served from it. + The hit rate is ``read / (read + creation)`` — the fraction of + cacheable input that came from cache rather than being re-billed. + """ + + calls: int = 0 + creation_tokens: int = 0 + read_tokens: int = 0 + + @property + def total_tokens(self) -> int: + return self.read_tokens + self.creation_tokens + + @property + def hit_rate(self) -> float: + """Cache read fraction in ``[0.0, 1.0]``; ``0.0`` when no + cacheable tokens were seen (avoids divide-by-zero).""" + return self.read_tokens / max(self.total_tokens, 1) + + def summary_line(self) -> str: + """One-line human summary for end-of-run output. + + Degrades gracefully when no cacheable tokens were seen (no + cache configured, or prompt below the caching threshold).
+ """ + if self.total_tokens == 0: + return "Polish cache: no cacheable tokens observed (cache not configured?)" + return ( + f"Polish cache hit: {self.hit_rate:.0%} " + f"({self.read_tokens} read / {self.total_tokens} total tokens, " + f"{self.calls} call(s))" + ) + + +def _polish_cache_telemetry() -> dict[str, int]: + """Per-process aggregate of prompt-cache token usage. + + Stored on the function as an attribute so the end-of-run summary + can read totals without module-level state — same idiom as + ``generator._faithfulness_telemetry``. Reset via + :func:`reset_polish_cache_telemetry`. + """ + state = getattr(_polish_cache_telemetry, "_state", None) + if state is None: + state = {"calls": 0, "creation": 0, "read": 0} + _polish_cache_telemetry._state = state # type: ignore[attr-defined] + return state + + +def reset_polish_cache_telemetry() -> None: + """Reset the per-process prompt-cache telemetry counters.""" + _polish_cache_telemetry._state = { # type: ignore[attr-defined] + "calls": 0, + "creation": 0, + "read": 0, + } + + +def _record_cache_usage(creation: int, read: int, model: str) -> None: + """Accumulate one polish call's cache token counts. + + Wired into :func:`call_anthropic` via its ``on_cache_usage`` + hook. ``model`` is accepted to match the callback signature but + not aggregated — per-model breakdown is out of scope (decisions.md). + """ + state = _polish_cache_telemetry() + state["calls"] += 1 + state["creation"] += creation + state["read"] += read + + +def polish_cache_stats() -> PolishCacheStats: + """Snapshot the current per-process prompt-cache aggregate.""" + state = _polish_cache_telemetry() + return PolishCacheStats( + calls=state["calls"], + creation_tokens=state["creation"], + read_tokens=state["read"], + ) + + +def format_polish_cache_summary() -> str | None: + """End-of-run summary line, or ``None`` if polish never ran. 
+ + Appends a low-hit-rate warning when the run's hit rate falls below + :data:`_CACHE_HIT_WARN_THRESHOLD` and at least one cacheable token + was seen. Scope is the current process: callers reset at run start. + """ + stats = polish_cache_stats() + if stats.calls == 0: + return None + line = stats.summary_line() + if stats.total_tokens > 0 and stats.hit_rate < _CACHE_HIT_WARN_THRESHOLD: + line += ( + f" — WARNING: below {_CACHE_HIT_WARN_THRESHOLD:.0%}; " + "the prompt cache may have regressed (see README 'Cache hit rate')" + ) + return line + def _call_llm( content: str, @@ -516,6 +629,7 @@ def _call_llm( model=_POLISH_MODEL, max_tokens=POLISH_MAX_TOKENS, cache_system=POLISH_CACHE_SYSTEM, + on_cache_usage=_record_cache_usage, ) return polished or content @@ -525,12 +639,16 @@ def _call_llm( # or from the wrapping polish layer. __all__ = [ "AnthropicCallError", + "PolishCacheStats", "PolishError", "STRICT_ENV_VAR", "_env_strict_default", "build_source_summary", "clear_cache", + "format_polish_cache_summary", + "polish_cache_stats", "polish_template", + "reset_polish_cache_telemetry", ] diff --git a/tests/unit/test_polish_cache_metrics.py b/tests/unit/test_polish_cache_metrics.py new file mode 100644 index 0000000..a0ab06d --- /dev/null +++ b/tests/unit/test_polish_cache_metrics.py @@ -0,0 +1,182 @@ +"""Tests for polish prompt-cache hit-rate telemetry. + +Covers the spec ``polish-cache-hit-metrics``: capturing +``cache_creation_input_tokens`` / ``cache_read_input_tokens`` from +Anthropic responses, the ``PolishCacheStats`` hit-rate math, the +per-process accumulator, the end-of-run summary line, and the +low-hit-rate threshold warning. 
+"""
+
+from __future__ import annotations
+
+from unittest.mock import MagicMock
+
+import pytest
+
+from attune_author import polish
+from attune_author.doc_gen import _anthropic
+from attune_author.polish import (
+    PolishCacheStats,
+    format_polish_cache_summary,
+    polish_cache_stats,
+    reset_polish_cache_telemetry,
+)
+
+
+@pytest.fixture(autouse=True)
+def _clean_telemetry():
+    """Each test starts and ends with zeroed counters so the
+    per-process accumulator can't leak across tests."""
+    reset_polish_cache_telemetry()
+    yield
+    reset_polish_cache_telemetry()
+
+
+# ---------------------------------------------------------------------------
+# PolishCacheStats hit-rate math
+# ---------------------------------------------------------------------------
+
+
+class TestPolishCacheStats:
+    def test_hit_rate_basic(self) -> None:
+        stats = PolishCacheStats(calls=1, creation_tokens=180, read_tokens=1241)
+        assert stats.total_tokens == 1421
+        assert stats.hit_rate == pytest.approx(1241 / 1421)
+
+    def test_hit_rate_full_hit(self) -> None:
+        stats = PolishCacheStats(calls=2, creation_tokens=0, read_tokens=2048)
+        assert stats.hit_rate == pytest.approx(1.0)
+
+    def test_hit_rate_zero_tokens_is_safe(self) -> None:
+        """No cacheable tokens must not divide by zero."""
+        stats = PolishCacheStats(calls=1, creation_tokens=0, read_tokens=0)
+        assert stats.total_tokens == 0
+        assert stats.hit_rate == 0.0
+
+    def test_summary_line_with_tokens(self) -> None:
+        stats = PolishCacheStats(calls=3, creation_tokens=180, read_tokens=1241)
+        line = stats.summary_line()
+        assert "87%" in line  # 1241/1421 == 0.873...
+        assert "1241 read" in line
+        assert "1421 total" in line
+        assert "3 call(s)" in line
+
+    def test_summary_line_graceful_when_zero(self) -> None:
+        stats = PolishCacheStats(calls=1, creation_tokens=0, read_tokens=0)
+        assert "no cacheable tokens" in stats.summary_line().lower()
+
+
+# ---------------------------------------------------------------------------
+# call_anthropic on_cache_usage callback (the capture path)
+# ---------------------------------------------------------------------------
+
+
+def _mock_client(creation: int, read: int) -> MagicMock:
+    client = MagicMock()
+    block = MagicMock()
+    block.text = "polished"
+    response = MagicMock()
+    response.content = [block]
+    response.usage.cache_creation_input_tokens = creation
+    response.usage.cache_read_input_tokens = read
+    client.messages.create.return_value = response
+    return client
+
+
+class TestOnCacheUsageCallback:
+    def test_callback_fires_with_token_counts(self) -> None:
+        seen: list[tuple[int, int, str]] = []
+        client = _mock_client(creation=1024, read=512)
+
+        _anthropic.call_anthropic(
+            client,
+            system="s",
+            user_message="u",
+            model="claude-sonnet-4-6",
+            max_tokens=10,
+            on_cache_usage=lambda c, r, m: seen.append((c, r, m)),
+        )
+
+        assert seen == [(1024, 512, "claude-sonnet-4-6")]
+
+    def test_callback_fires_even_when_zero(self) -> None:
+        """Caller must be able to tell 'no cache' from 'never called'."""
+        seen: list[tuple[int, int, str]] = []
+        client = _mock_client(creation=0, read=0)
+
+        _anthropic.call_anthropic(
+            client,
+            system="s",
+            user_message="u",
+            model="m",
+            max_tokens=10,
+            on_cache_usage=lambda c, r, m: seen.append((c, r, m)),
+        )
+
+        assert seen == [(0, 0, "m")]
+
+    def test_no_callback_is_fine(self) -> None:
+        """Omitting the callback (doc-gen path) must not error."""
+        client = _mock_client(creation=10, read=10)
+        out = _anthropic.call_anthropic(
+            client, system="s", user_message="u", model="m", max_tokens=10
+        )
+        assert out == "polished"
+
+
+# ---------------------------------------------------------------------------
+# Accumulator + reset
+# ---------------------------------------------------------------------------
+
+
+class TestAccumulator:
+    def test_records_accumulate_across_calls(self) -> None:
+        polish._record_cache_usage(1000, 0, "m")  # creation-only (cold)
+        polish._record_cache_usage(100, 900, "m")  # mostly read (warm)
+
+        stats = polish_cache_stats()
+        assert stats.calls == 2
+        assert stats.creation_tokens == 1100
+        assert stats.read_tokens == 900
+        assert stats.total_tokens == 2000
+        assert stats.hit_rate == pytest.approx(0.45)
+
+    def test_reset_zeroes_counters(self) -> None:
+        polish._record_cache_usage(100, 100, "m")
+        reset_polish_cache_telemetry()
+        stats = polish_cache_stats()
+        assert (stats.calls, stats.creation_tokens, stats.read_tokens) == (0, 0, 0)
+
+
+# ---------------------------------------------------------------------------
+# End-of-run summary + threshold warning
+# ---------------------------------------------------------------------------
+
+
+class TestSummary:
+    def test_summary_none_when_polish_never_ran(self) -> None:
+        assert format_polish_cache_summary() is None
+
+    def test_summary_present_after_calls(self) -> None:
+        polish._record_cache_usage(180, 1241, "m")
+        summary = format_polish_cache_summary()
+        assert summary is not None
+        assert "Polish cache hit" in summary
+        assert "WARNING" not in summary  # 87% is healthy
+
+    def test_low_hit_rate_appends_warning(self) -> None:
+        # 100 read / 1000 total == 10% — below the 50% threshold.
+        polish._record_cache_usage(900, 100, "m")
+        summary = format_polish_cache_summary()
+        assert summary is not None
+        assert "WARNING" in summary
+        assert "below 50%" in summary
+
+    def test_zero_token_run_warns_nothing(self) -> None:
+        """A run with calls but no cacheable tokens reports the
+        graceful line and no spurious threshold warning."""
+        polish._record_cache_usage(0, 0, "m")
+        summary = format_polish_cache_summary()
+        assert summary is not None
+        assert "no cacheable tokens" in summary.lower()
+        assert "WARNING" not in summary

From b9edfc43bf3db3ac7b317a1e82db6ae64d338dc7 Mon Sep 17 00:00:00 2001
From: GeneAI
Date: Sat, 6 Jun 2026 21:13:25 -0400
Subject: [PATCH 4/7] chore(help): regenerate polish/staleness templates after cache-metrics change

Auto-regenerated by the pre-commit help-freshness hook following the
polish prompt-cache telemetry work (d4af5a3).

Co-Authored-By: Claude Opus 4.8
---
 .help/templates/polish/concept.md          |  49 +++---
 .help/templates/polish/reference.md        |  43 +++---
 .help/templates/polish/task.md             |  66 ++++----
 .../staleness-and-maintenance/concept.md   |  44 +++---
 .../staleness-and-maintenance/reference.md |  60 ++++----
 .../staleness-and-maintenance/task.md      | 143 ++++++------------
 6 files changed, 173 insertions(+), 232 deletions(-)

diff --git a/.help/templates/polish/concept.md b/.help/templates/polish/concept.md
index cdd421e..a153b62 100644
--- a/.help/templates/polish/concept.md
+++ b/.help/templates/polish/concept.md
@@ -1,47 +1,36 @@
 ---
-type: concept
 feature: polish
 depth: concept
-generated_at: 2026-04-26T19:46:57.016050+00:00
-source_hash: c3c5a14decb406edb1b2d8ca09a6adb5d3bf68908f60cdaf9a9ea6ba0df1471d
+generated_at: 2026-06-06T23:19:48.555770+00:00
+source_hash: 79da77a01c4b4a11716e33f5673ee64882fe6354c51b6cf999aee80d9dbe4b7e
 status: generated
 ---
 
 # Polish
 
-## What
+## How it works
 
-Polish is an LLM-powered editing pass that transforms auto-generated help templates into clear, readable documentation that follows Google's style
guide. +Improve generated template quality with an LLM rewrite pass that uses per-type system prompts and source-grounded summaries +. -When the help system generates a template from source code, it produces functional but mechanical content. The polish pass rewrites this draft using template-type-specific prompts and source code summaries to ensure the output reads naturally while staying technically accurate. +The main building blocks are: -## Why +- **`PolishError`** — Raised when the polish pass fails in strict mode. +- **`PolishCacheStats`** — Aggregate prompt-cache token usage across polish calls. -Raw generated templates suffer from three quality problems: +Under the hood, this feature spans 2 source +files covering: -1. **Formulaic language** — Phrases like "manages core functionality" and "provides key capabilities" appear in every draft regardless of what the code actually does -2. **Poor structure** — Auto-generated sections follow a rigid pattern that doesn't adapt to the specific content being documented -3. **Missing context** — The generator knows what functions exist but not why they matter or how they fit together +- Per-type system prompts and anti-patterns for the polish pass. -Polish addresses these by applying human writing standards through AI, producing templates that read as if written by a technical writer who understands both the code and the audience. +## What connects to it -## Core components +This feature relates to: polish, llm, anthropic, quality. -The polish system has three main parts: +Other parts of the codebase interact with +polish through these interfaces: -**Template-specific prompts** — Each of the 11 template types gets its own system prompt with targeted guidance. Concept templates focus on mental models and noun-phrase headings. Task templates emphasize step-by-step clarity. Reference templates prioritize completeness and lookup efficiency. 
- -**Source summaries** — Rather than sending raw code to the LLM, polish builds concise summaries highlighting public classes, functions, module purposes, and key constants. This keeps the context focused and prevents hallucination. - -**Error handling** — The `PolishError` exception captures polish failures in strict mode, allowing the system to fall back to unpolished content rather than blocking generation entirely. - -## Quality safeguards - -Polish operates under strict constraints to prevent content drift: - -- Preserves YAML frontmatter exactly as generated -- Maintains the h1 title and section structure intent -- Uses only information present in the source summary -- Returns pure markdown with no additional commentary - -The `STRICT_ENV_VAR` setting controls whether polish failures stop generation or allow fallback to draft content. +| Interface | Purpose | File | +|-----------|---------|------| +| `PolishError` | Raised when the polish pass fails in strict mode. | `src/attune_author/polish.py` | +| `PolishCacheStats` | Aggregate prompt-cache token usage across polish calls. | `src/attune_author/polish.py` | diff --git a/.help/templates/polish/reference.md b/.help/templates/polish/reference.md index c308163..4f273af 100644 --- a/.help/templates/polish/reference.md +++ b/.help/templates/polish/reference.md @@ -1,43 +1,38 @@ --- -type: reference feature: polish depth: reference -generated_at: 2026-04-26T19:47:19.826246+00:00 -source_hash: c3c5a14decb406edb1b2d8ca09a6adb5d3bf68908f60cdaf9a9ea6ba0df1471d +generated_at: 2026-06-06T23:19:48.567369+00:00 +source_hash: 79da77a01c4b4a11716e33f5673ee64882fe6354c51b6cf999aee80d9dbe4b7e status: generated --- # Polish reference -Polish generated help templates using an LLM to improve readability, structure, and adherence to Google's developer documentation style guide. 
- ## Classes -| Class | Description | -|-------|-------------| -| `PolishError` | Raised when the polish pass fails in strict mode | +| Class | Description | File | +|-------|-------------|------| +| `PolishError` | Raised when the polish pass fails in strict mode. | `src/attune_author/polish.py` | +| `PolishCacheStats` | Aggregate prompt-cache token usage across polish calls. | `src/attune_author/polish.py` | ## Functions -| Function | Parameters | Returns | Description | -|----------|------------|---------|-------------| -| `polish_template()` | `content: str, feature_name: str, source_summary: str, template_type: str = "generic", strict: bool \| None = None, augmented_context: str \| None = None` | `str` | Polish a generated template using an LLM | -| `build_source_summary()` | `public_classes: list[dict[str, str]], public_functions: list[dict[str, str]], module_docstrings: list[str], file_count: int, function_signatures: list[dict[str, str]] \| None = None, class_signatures: list[dict[str, str]] \| None = None, module_constants: list[dict[str, object]] \| None = None` | `str` | Build a concise source summary for the polish prompt | -| `get_system_prompt()` | `template_type: str` | `str` | Build the system prompt for a given template kind | - -### Raises +| Function | Description | File | +|----------|-------------|------| +| `clear_cache()` | Delete every entry in the polish cache directory. | `src/attune_author/polish.py` | +| `polish_template()` | Polish a generated template using an LLM. | `src/attune_author/polish.py` | +| `build_polish_prompt()` | Build the (system_prompt, user_message) pair for a polish call. | `src/attune_author/polish.py` | +| `reset_polish_cache_telemetry()` | Reset the per-process prompt-cache telemetry counters. | `src/attune_author/polish.py` | +| `polish_cache_stats()` | Snapshot the current per-process prompt-cache aggregate. 
| `src/attune_author/polish.py` | +| `format_polish_cache_summary()` | End-of-run summary line, or ``None`` if polish never ran. | `src/attune_author/polish.py` | +| `build_source_summary()` | Build a concise source summary for the polish prompt. | `src/attune_author/polish.py` | +| `get_system_prompt()` | Build the system prompt for a given template kind. | `src/attune_author/polish_prompts.py` | -| Function | Exception | Message | -|----------|-----------|---------| -| `polish_template()` | `PolishError` | `'Polish pass failed for {...} (type={...}): {...}'` | -## Constants +## Source files -| Constant | Values | Description | -|----------|--------|-------------| -| `STRICT_ENV_VAR` | `'ATTUNE_AUTHOR_STRICT_POLISH'` | Environment variable name for enabling strict mode | -| `_FALSY` | `{'0', 'false', 'no', 'off'}` | String values that disable strict mode | -| `_BASE_RULES` | System prompt base rules | Core polishing instructions applied to all template types | +- `src/attune_author/polish.py` +- `src/attune_author/polish_prompts.py` ## Tags diff --git a/.help/templates/polish/task.md b/.help/templates/polish/task.md index 5521a83..d5f334a 100644 --- a/.help/templates/polish/task.md +++ b/.help/templates/polish/task.md @@ -1,47 +1,59 @@ --- -type: task feature: polish depth: task -generated_at: 2026-04-26T19:47:10.811578+00:00 -source_hash: c3c5a14decb406edb1b2d8ca09a6adb5d3bf68908f60cdaf9a9ea6ba0df1471d +generated_at: 2026-06-06T23:19:48.562428+00:00 +source_hash: 79da77a01c4b4a11716e33f5673ee64882fe6354c51b6cf999aee80d9dbe4b7e status: generated --- # Work with polish -Use the polish module when you need to improve auto-generated template quality through LLM rewriting that applies type-specific style rules and source-grounded accuracy checks. +Use polish when you need to improve generated template quality with an llm rewrite pass that uses per-type system prompts and source-grounded summaries +. 
## Prerequisites - Access to the project source code -- Understanding of template types and their style conventions -- Familiarity with the polish module structure +- Familiarity with the files under src/attune_author/polish.py ## Steps -1. **Identify the polish function you need to modify.** - The module separates concerns into three main functions: - - `polish_template()` — Orchestrates the LLM rewriting process - - `build_source_summary()` — Creates concise source descriptions for prompt context - - `get_system_prompt()` — Retrieves type-specific style rules +1. **Understand the current behavior.** + Read the entry points to see what polish + does today before making changes. + The primary functions are: + - `clear_cache()` in `src/attune_author/polish.py` — Delete every entry in the polish cache directory. + - `polish_template()` in `src/attune_author/polish.py` — Polish a generated template using an LLM. + - `build_polish_prompt()` in `src/attune_author/polish.py` — Build the (system_prompt, user_message) pair for a polish call. + - `reset_polish_cache_telemetry()` in `src/attune_author/polish.py` — Reset the per-process prompt-cache telemetry counters. + - `polish_cache_stats()` in `src/attune_author/polish.py` — Snapshot the current per-process prompt-cache aggregate. +2. **Locate the right function to change.** + Each function has a single responsibility. Read its + docstring, parameters, and return type to confirm it + owns the behavior you need to modify. + +3. **Make your change.** + Follow existing patterns in the file — naming + conventions, error handling style, and logging. + +4. **Run the related tests.** + This catches regressions before they reach other + developers. Target with `pytest -k "polish"`. -2. **Review the function's current implementation.** - Read the docstring, parameter types, and return values to understand the function's scope and constraints. - -3. 
**Implement your changes following the module patterns.** - Maintain the existing error handling style, use the `PolishError` for polish failures, and preserve the source-grounded accuracy approach. - -4. **Test your changes with the polish test suite.** - Run `pytest -k "polish"` to verify your modifications don't break existing functionality. +## Key files -## Verify success +- `src/attune_author/polish.py` +- `src/attune_author/polish_prompts.py` -Your changes work correctly when: -- The polish test suite passes without errors -- Generated templates maintain their factual accuracy while improving in readability -- Type-specific style rules are properly applied based on the template kind +## Common modifications -## Key files +Functions you are most likely to modify: -- `src/attune_author/polish.py` — Core polish functions -- `src/attune_author/polish_prompts.py` — Type-specific system prompts +- `clear_cache()` in `src/attune_author/polish.py` +- `polish_template()` in `src/attune_author/polish.py` +- `build_polish_prompt()` in `src/attune_author/polish.py` +- `reset_polish_cache_telemetry()` in `src/attune_author/polish.py` +- `polish_cache_stats()` in `src/attune_author/polish.py` +- `format_polish_cache_summary()` in `src/attune_author/polish.py` +- `build_source_summary()` in `src/attune_author/polish.py` +- `get_system_prompt()` in `src/attune_author/polish_prompts.py` diff --git a/.help/templates/staleness-and-maintenance/concept.md b/.help/templates/staleness-and-maintenance/concept.md index b0e569e..490f03c 100644 --- a/.help/templates/staleness-and-maintenance/concept.md +++ b/.help/templates/staleness-and-maintenance/concept.md @@ -1,9 +1,8 @@ --- -type: concept feature: staleness-and-maintenance depth: concept -generated_at: 2026-04-26T19:47:57.095143+00:00 -source_hash: 196e1038a7194fe466fe8c96559cc4197bb18833f5afc123452ec132dd9007b6 +generated_at: 2026-06-06T23:19:48.572962+00:00 +source_hash: 
a32e9d9904602f0f282f0bf02f119e350efd6c8b4ecb73c04564917b6ae65f69 status: generated --- @@ -11,30 +10,31 @@ status: generated ## How it works -Staleness detection identifies when generated help templates are out of sync with their source code. When you modify functions, classes, or files that generated templates reference, those templates become stale and need regeneration to reflect current behavior. +Detect when generated templates are out of date with their source files and regenerate stale ones +. -The system tracks this through source hashes — cryptographic fingerprints of the code that generated each template. When source files change, their hashes change, marking dependent templates as stale. +The main building blocks are: -## Core components +- **`FeatureStaleness`** — Staleness status for one feature's ``.help/`` templates. +- **`DocStaleness`** — Staleness status for one project doc file in ``docs/``. +- **`StalenessReport`** — Combined staleness report across help templates and project docs. +- **`MaintenanceResult`** — Result of a help maintenance run. -**MaintenanceResult** captures what happened during a maintenance run. It tracks which features were stale, which got regenerated successfully, which were skipped because they require manual updates, and which failed during regeneration. +Under the hood, this feature spans 2 source +files covering: -**Staleness detection** compares current source hashes against the hashes stored in template frontmatter. Templates with mismatched hashes are marked stale and queued for regeneration. +- Help maintenance logic for commit hooks and manual refresh. -**Automated maintenance** runs either manually through `run_maintenance()` or automatically via the post-commit hook. The hook examines recent git changes and regenerates only templates affected by those changes. +## What connects to it -## When staleness matters +This feature relates to: freshness, hashing, regeneration. 
-Templates become stale in three scenarios: +Other parts of the codebase interact with +staleness and maintenance through these interfaces: -1. **Function signatures change** — adding parameters, changing return types, or modifying docstrings -2. **Class structure evolves** — new methods, field additions, or inheritance changes -3. **Module organization shifts** — moving files, renaming modules, or changing import paths - -The maintenance system prevents documentation drift by catching these changes before templates mislead users. - -## Hook integration - -The post-commit hook automatically runs maintenance after each commit. It examines `get_changed_files()` to identify what changed, then regenerates only the templates that depend on those files. This keeps help content fresh without manual intervention. - -For manual maintenance, `run_maintenance()` can target specific features or scan the entire help directory. The `dry_run` option shows what would be regenerated without making changes. +| Interface | Purpose | File | +|-----------|---------|------| +| `FeatureStaleness` | Staleness status for one feature's ``.help/`` templates. | `src/attune_author/staleness.py` | +| `DocStaleness` | Staleness status for one project doc file in ``docs/``. | `src/attune_author/staleness.py` | +| `StalenessReport` | Combined staleness report across help templates and project docs. | `src/attune_author/staleness.py` | +| `MaintenanceResult` | Result of a help maintenance run. 
| `src/attune_author/maintenance.py` | diff --git a/.help/templates/staleness-and-maintenance/reference.md b/.help/templates/staleness-and-maintenance/reference.md index 0c64f9a..a36be94 100644 --- a/.help/templates/staleness-and-maintenance/reference.md +++ b/.help/templates/staleness-and-maintenance/reference.md @@ -1,49 +1,43 @@ --- -type: reference feature: staleness-and-maintenance depth: reference -generated_at: 2026-04-26T19:48:20.345159+00:00 -source_hash: 196e1038a7194fe466fe8c96559cc4197bb18833f5afc123452ec132dd9007b6 +generated_at: 2026-06-06T23:19:48.582350+00:00 +source_hash: a32e9d9904602f0f282f0bf02f119e350efd6c8b4ecb73c04564917b6ae65f69 status: generated --- -# Staleness and maintenance reference - -Detect outdated help templates and regenerate them automatically. Check which templates need updating based on source code changes and run maintenance operations to keep documentation fresh. +# Staleness And Maintenance reference ## Classes -| Class | Description | -|-------|-------------| -| `MaintenanceResult` | Result of a help maintenance run | - -### MaintenanceResult fields +| Class | Description | File | +|-------|-------------|------| +| `FeatureStaleness` | Staleness status for one feature's ``.help/`` templates. | `src/attune_author/staleness.py` | +| `DocStaleness` | Staleness status for one project doc file in ``docs/``. | `src/attune_author/staleness.py` | +| `StalenessReport` | Combined staleness report across help templates and project docs. | `src/attune_author/staleness.py` | +| `MaintenanceResult` | Result of a help maintenance run. 
| `src/attune_author/maintenance.py` | -| Field | Type | Default | -|-------|------|---------| -| `staleness` | `StalenessReport` | | -| `regenerated` | `list[GenerationResult]` | `field(default_factory=list)` | -| `skipped_manual` | `list[str]` | `field(default_factory=list)` | -| `failed` | `list[str]` | `field(default_factory=list)` | +## Functions -### MaintenanceResult properties +| Function | Description | File | +|----------|-------------|------| +| `compute_semantic_hash()` | Compute a semantic SHA-256 hash of a feature's Python source files. | `src/attune_author/staleness.py` | +| `compute_source_hash()` | Compute SHA-256 hash of a feature's source files. | `src/attune_author/staleness.py` | +| `parse_doc_footer()` | Parse an attune-generated HTML comment footer. | `src/attune_author/staleness.py` | +| `build_doc_footer()` | Build an attune-generated HTML comment footer line. | `src/attune_author/staleness.py` | +| `check_staleness()` | Check staleness across help templates and project docs. | `src/attune_author/staleness.py` | +| `check_workspace_staleness()` | Check staleness for a workspace using the conventional ``.help/`` layout. | `src/attune_author/staleness.py` | +| `run_maintenance()` | Run help maintenance — check staleness and regenerate. | `src/attune_author/maintenance.py` | +| `get_changed_files()` | Get files changed in the most recent commit. | `src/attune_author/maintenance.py` | +| `run_hook()` | Post-commit hook entry point. | `src/attune_author/maintenance.py` | +| `format_status_report()` | Format a staleness report for display. 
| `src/attune_author/maintenance.py` | -| Property | Type | Description | -|----------|------|-------------| -| `stale_count` | `int` | Number of stale features detected | -| `regenerated_count` | `int` | Number of features regenerated | -## Functions +## Source files -| Function | Parameters | Returns | Description | -|----------|------------|---------|-------------| -| `run_maintenance` | `help_dir: str \| Path, project_root: str \| Path, features: list[str] \| None = None, dry_run: bool = False` | `MaintenanceResult` | Run help maintenance — check staleness and regenerate | -| `get_changed_files` | `project_root: str \| Path` | `list[str]` | Get files changed in the most recent commit | -| `run_hook` | `help_dir: str \| Path, project_root: str \| Path` | `MaintenanceResult \| None` | Post-commit hook entry point | -| `format_status_report` | `report: StalenessReport, help_dir: str \| Path \| None = None` | `str` | Format a staleness report for display | +- `src/attune_author/staleness.py` +- `src/attune_author/maintenance.py` -## Constants +## Tags -| Constant | Values | -|----------|--------| -| `__all__` | `['DocStaleness', 'FeatureStaleness', 'StalenessReport', '_read_frontmatter_value', 'build_doc_footer', 'check_staleness', 'compute_source_hash', 'parse_doc_footer']` | +`freshness`, `hashing`, `regeneration` diff --git a/.help/templates/staleness-and-maintenance/task.md b/.help/templates/staleness-and-maintenance/task.md index 7eb79cd..8a0a931 100644 --- a/.help/templates/staleness-and-maintenance/task.md +++ b/.help/templates/staleness-and-maintenance/task.md @@ -1,108 +1,59 @@ --- -type: task feature: staleness-and-maintenance depth: task -generated_at: 2026-04-26T19:48:08.669237+00:00 -source_hash: 196e1038a7194fe466fe8c96559cc4197bb18833f5afc123452ec132dd9007b6 +generated_at: 2026-06-06T23:19:48.577723+00:00 +source_hash: a32e9d9904602f0f282f0bf02f119e350efd6c8b4ecb73c04564917b6ae65f69 status: generated --- # Work with staleness and maintenance -Use 
staleness and maintenance when you need to detect outdated generated templates and regenerate them automatically to keep your help system current with source code changes. +Use staleness and maintenance when you need to detect when generated templates are out of date with their source files and regenerate stale ones +. ## Prerequisites - Access to the project source code -- Understanding of how templates are generated from source files - -## Check for stale templates - -1. **Import the maintenance module** - ```python - from attune_author.maintenance import run_maintenance - ``` - -2. **Run staleness detection** - ```python - result = run_maintenance( - help_dir="docs/help", - project_root=".", - dry_run=True # Check only, don't regenerate - ) - ``` - -3. **Review the staleness report** - ```python - print(f"Found {result.stale_count} stale templates") - print(result.staleness) # Detailed report - ``` - -## Regenerate outdated templates - -1. **Run maintenance with regeneration enabled** - ```python - result = run_maintenance( - help_dir="docs/help", - project_root=".", - dry_run=False # Actually regenerate - ) - ``` - -2. **Target specific features** (optional) - ```python - result = run_maintenance( - help_dir="docs/help", - project_root=".", - features=["authentication", "error-handling"] - ) - ``` - -## Set up automatic maintenance - -1. **Configure the post-commit hook** - ```python - from attune_author.maintenance import run_hook - - # In your .git/hooks/post-commit script - result = run_hook( - help_dir="docs/help", - project_root="." - ) - ``` - -2. **Handle hook results** - ```python - if result and result.stale_count > 0: - print(f"Regenerated {result.regenerated_count} templates") - ``` - -## Format status reports - -1. **Generate a readable status report** - ```python - from attune_author.maintenance import format_status_report - - report = format_status_report( - result.staleness, - help_dir="docs/help" - ) - print(report) - ``` - -2. 
**Check what files changed recently** - ```python - from attune_author.maintenance import get_changed_files - - changed = get_changed_files(".") - print(f"Changed files: {changed}") - ``` - -## Verification - -You've successfully set up staleness and maintenance when: - -- `run_maintenance()` returns a `MaintenanceResult` with accurate stale counts -- Dry runs identify stale templates without modifying files -- Regeneration updates only the templates that need refreshing -- Status reports clearly show which templates were updated and why +- Familiarity with the files under src/attune_author/staleness.py + +## Steps + +1. **Understand the current behavior.** + Read the entry points to see what staleness and maintenance + does today before making changes. + The primary functions are: + - `compute_semantic_hash()` in `src/attune_author/staleness.py` — Compute a semantic SHA-256 hash of a feature's Python source files. + - `compute_source_hash()` in `src/attune_author/staleness.py` — Compute SHA-256 hash of a feature's source files. + - `parse_doc_footer()` in `src/attune_author/staleness.py` — Parse an attune-generated HTML comment footer. + - `build_doc_footer()` in `src/attune_author/staleness.py` — Build an attune-generated HTML comment footer line. + - `check_staleness()` in `src/attune_author/staleness.py` — Check staleness across help templates and project docs. +2. **Locate the right function to change.** + Each function has a single responsibility. Read its + docstring, parameters, and return type to confirm it + owns the behavior you need to modify. + +3. **Make your change.** + Follow existing patterns in the file — naming + conventions, error handling style, and logging. + +4. **Run the related tests.** + This catches regressions before they reach other + developers. Target with `pytest -k "staleness-and-maintenance"`. 
+ +## Key files + +- `src/attune_author/staleness.py` +- `src/attune_author/maintenance.py` + +## Common modifications + +Functions you are most likely to modify: + +- `compute_semantic_hash()` in `src/attune_author/staleness.py` +- `compute_source_hash()` in `src/attune_author/staleness.py` +- `parse_doc_footer()` in `src/attune_author/staleness.py` +- `build_doc_footer()` in `src/attune_author/staleness.py` +- `check_staleness()` in `src/attune_author/staleness.py` +- `check_workspace_staleness()` in `src/attune_author/staleness.py` +- `run_maintenance()` in `src/attune_author/maintenance.py` +- `get_changed_files()` in `src/attune_author/maintenance.py` From c3d46eda03085db2faf70b94e8a9dfacb7bac6ff Mon Sep 17 00:00:00 2001 From: GeneAI Date: Sat, 6 Jun 2026 21:13:26 -0400 Subject: [PATCH 5/7] chore(specs): archive completed/superseded specs Move terminal specs into docs/specs/archive/ so they stop inflating the active count: polish-fact-check (v0.14.0), polish-cache-hit-metrics (done), regen-staleness-hash-mismatch (#48/0.14.2), regen-pipeline (superseded). skill-export-evangelism kept active (open). 
Co-Authored-By: Claude Opus 4.8
---
 docs/specs/{ => archive}/polish-cache-hit-metrics/decisions.md | 0
 docs/specs/{ => archive}/polish-cache-hit-metrics/tasks.md     | 0
 docs/specs/{ => archive}/polish-fact-check/decisions.md        | 0
 docs/specs/{ => archive}/polish-fact-check/design.md           | 0
 docs/specs/{ => archive}/polish-fact-check/requirements.md     | 0
 docs/specs/{ => archive}/polish-fact-check/tasks.md            | 0
 docs/specs/{ => archive}/regen-pipeline/design.md              | 0
 docs/specs/{ => archive}/regen-pipeline/requirements.md        | 0
 docs/specs/{ => archive}/regen-pipeline/tasks.md               | 0
 .../{ => archive}/regen-staleness-hash-mismatch/decisions.md   | 0
 10 files changed, 0 insertions(+), 0 deletions(-)
 rename docs/specs/{ => archive}/polish-cache-hit-metrics/decisions.md (100%)
 rename docs/specs/{ => archive}/polish-cache-hit-metrics/tasks.md (100%)
 rename docs/specs/{ => archive}/polish-fact-check/decisions.md (100%)
 rename docs/specs/{ => archive}/polish-fact-check/design.md (100%)
 rename docs/specs/{ => archive}/polish-fact-check/requirements.md (100%)
 rename docs/specs/{ => archive}/polish-fact-check/tasks.md (100%)
 rename docs/specs/{ => archive}/regen-pipeline/design.md (100%)
 rename docs/specs/{ => archive}/regen-pipeline/requirements.md (100%)
 rename docs/specs/{ => archive}/regen-pipeline/tasks.md (100%)
 rename docs/specs/{ => archive}/regen-staleness-hash-mismatch/decisions.md (100%)

diff --git a/docs/specs/polish-cache-hit-metrics/decisions.md b/docs/specs/archive/polish-cache-hit-metrics/decisions.md
similarity index 100%
rename from docs/specs/polish-cache-hit-metrics/decisions.md
rename to docs/specs/archive/polish-cache-hit-metrics/decisions.md
diff --git a/docs/specs/polish-cache-hit-metrics/tasks.md b/docs/specs/archive/polish-cache-hit-metrics/tasks.md
similarity index 100%
rename from docs/specs/polish-cache-hit-metrics/tasks.md
rename to docs/specs/archive/polish-cache-hit-metrics/tasks.md
diff --git a/docs/specs/polish-fact-check/decisions.md b/docs/specs/archive/polish-fact-check/decisions.md
similarity index 100%
rename from docs/specs/polish-fact-check/decisions.md
rename to docs/specs/archive/polish-fact-check/decisions.md
diff --git a/docs/specs/polish-fact-check/design.md b/docs/specs/archive/polish-fact-check/design.md
similarity index 100%
rename from docs/specs/polish-fact-check/design.md
rename to docs/specs/archive/polish-fact-check/design.md
diff --git a/docs/specs/polish-fact-check/requirements.md b/docs/specs/archive/polish-fact-check/requirements.md
similarity index 100%
rename from docs/specs/polish-fact-check/requirements.md
rename to docs/specs/archive/polish-fact-check/requirements.md
diff --git a/docs/specs/polish-fact-check/tasks.md b/docs/specs/archive/polish-fact-check/tasks.md
similarity index 100%
rename from docs/specs/polish-fact-check/tasks.md
rename to docs/specs/archive/polish-fact-check/tasks.md
diff --git a/docs/specs/regen-pipeline/design.md b/docs/specs/archive/regen-pipeline/design.md
similarity index 100%
rename from docs/specs/regen-pipeline/design.md
rename to docs/specs/archive/regen-pipeline/design.md
diff --git a/docs/specs/regen-pipeline/requirements.md b/docs/specs/archive/regen-pipeline/requirements.md
similarity index 100%
rename from docs/specs/regen-pipeline/requirements.md
rename to docs/specs/archive/regen-pipeline/requirements.md
diff --git a/docs/specs/regen-pipeline/tasks.md b/docs/specs/archive/regen-pipeline/tasks.md
similarity index 100%
rename from docs/specs/regen-pipeline/tasks.md
rename to docs/specs/archive/regen-pipeline/tasks.md
diff --git a/docs/specs/regen-staleness-hash-mismatch/decisions.md b/docs/specs/archive/regen-staleness-hash-mismatch/decisions.md
similarity index 100%
rename from docs/specs/regen-staleness-hash-mismatch/decisions.md
rename to docs/specs/archive/regen-staleness-hash-mismatch/decisions.md

From fd51e29f979182a6cce5a33ea0e3e0e298958414 Mon Sep 17 00:00:00 2001
From: GeneAI
Date: Sat, 6 Jun 2026 21:26:02 -0400
Subject:
[PATCH 6/7] Revert "chore(help): regenerate polish/staleness templates after cache-metrics change" This reverts commit b9edfc43bf3db3ac7b317a1e82db6ae64d338dc7. --- .help/templates/polish/concept.md | 49 +++--- .help/templates/polish/reference.md | 43 +++--- .help/templates/polish/task.md | 66 ++++---- .../staleness-and-maintenance/concept.md | 44 +++--- .../staleness-and-maintenance/reference.md | 60 ++++---- .../staleness-and-maintenance/task.md | 143 ++++++++++++------ 6 files changed, 232 insertions(+), 173 deletions(-) diff --git a/.help/templates/polish/concept.md b/.help/templates/polish/concept.md index a153b62..cdd421e 100644 --- a/.help/templates/polish/concept.md +++ b/.help/templates/polish/concept.md @@ -1,36 +1,47 @@ --- +type: concept feature: polish depth: concept -generated_at: 2026-06-06T23:19:48.555770+00:00 -source_hash: 79da77a01c4b4a11716e33f5673ee64882fe6354c51b6cf999aee80d9dbe4b7e +generated_at: 2026-04-26T19:46:57.016050+00:00 +source_hash: c3c5a14decb406edb1b2d8ca09a6adb5d3bf68908f60cdaf9a9ea6ba0df1471d status: generated --- # Polish -## How it works +## What -Improve generated template quality with an LLM rewrite pass that uses per-type system prompts and source-grounded summaries -. +Polish is an LLM-powered editing pass that transforms auto-generated help templates into clear, readable documentation that follows Google's style guide. -The main building blocks are: +When the help system generates a template from source code, it produces functional but mechanical content. The polish pass rewrites this draft using template-type-specific prompts and source code summaries to ensure the output reads naturally while staying technically accurate. -- **`PolishError`** — Raised when the polish pass fails in strict mode. -- **`PolishCacheStats`** — Aggregate prompt-cache token usage across polish calls. 
+## Why
 
-Under the hood, this feature spans 2 source
-files covering:
+Raw generated templates suffer from three quality problems:
 
-- Per-type system prompts and anti-patterns for the polish pass.
+1. **Formulaic language** — Phrases like "manages core functionality" and "provides key capabilities" appear in every draft regardless of what the code actually does
+2. **Poor structure** — Auto-generated sections follow a rigid pattern that doesn't adapt to the specific content being documented
+3. **Missing context** — The generator knows what functions exist but not why they matter or how they fit together
 
-## What connects to it
+Polish addresses these by applying human writing standards through AI, producing templates that read as if written by a technical writer who understands both the code and the audience.
 
-This feature relates to: polish, llm, anthropic, quality.
+## Core components
 
-Other parts of the codebase interact with
-polish through these interfaces:
+The polish system has three main parts:
 
-| Interface | Purpose | File |
-|-----------|---------|------|
-| `PolishError` | Raised when the polish pass fails in strict mode. | `src/attune_author/polish.py` |
-| `PolishCacheStats` | Aggregate prompt-cache token usage across polish calls. | `src/attune_author/polish.py` |
+**Template-specific prompts** — Each of the 11 template types gets its own system prompt with targeted guidance. Concept templates focus on mental models and noun-phrase headings. Task templates emphasize step-by-step clarity. Reference templates prioritize completeness and lookup efficiency.
+
+**Source summaries** — Rather than sending raw code to the LLM, polish builds concise summaries highlighting public classes, functions, module purposes, and key constants. This keeps the context focused and prevents hallucination.
+
+**Error handling** — The `PolishError` exception captures polish failures in strict mode, allowing the system to fall back to unpolished content rather than blocking generation entirely.
+
+## Quality safeguards
+
+Polish operates under strict constraints to prevent content drift:
+
+- Preserves YAML frontmatter exactly as generated
+- Maintains the h1 title and section structure intent
+- Uses only information present in the source summary
+- Returns pure markdown with no additional commentary
+
+The `STRICT_ENV_VAR` setting controls whether polish failures stop generation or allow fallback to draft content.
diff --git a/.help/templates/polish/reference.md b/.help/templates/polish/reference.md
index 4f273af..c308163 100644
--- a/.help/templates/polish/reference.md
+++ b/.help/templates/polish/reference.md
@@ -1,38 +1,43 @@
 ---
+type: reference
 feature: polish
 depth: reference
-generated_at: 2026-06-06T23:19:48.567369+00:00
-source_hash: 79da77a01c4b4a11716e33f5673ee64882fe6354c51b6cf999aee80d9dbe4b7e
+generated_at: 2026-04-26T19:47:19.826246+00:00
+source_hash: c3c5a14decb406edb1b2d8ca09a6adb5d3bf68908f60cdaf9a9ea6ba0df1471d
 status: generated
 ---
 
 # Polish reference
 
+Polish generated help templates using an LLM to improve readability, structure, and adherence to Google's developer documentation style guide.
+
 ## Classes
 
-| Class | Description | File |
-|-------|-------------|------|
-| `PolishError` | Raised when the polish pass fails in strict mode. | `src/attune_author/polish.py` |
-| `PolishCacheStats` | Aggregate prompt-cache token usage across polish calls. | `src/attune_author/polish.py` |
+| Class | Description |
+|-------|-------------|
+| `PolishError` | Raised when the polish pass fails in strict mode |
 
 ## Functions
 
-| Function | Description | File |
-|----------|-------------|------|
-| `clear_cache()` | Delete every entry in the polish cache directory. | `src/attune_author/polish.py` |
-| `polish_template()` | Polish a generated template using an LLM. | `src/attune_author/polish.py` |
-| `build_polish_prompt()` | Build the (system_prompt, user_message) pair for a polish call. | `src/attune_author/polish.py` |
-| `reset_polish_cache_telemetry()` | Reset the per-process prompt-cache telemetry counters. | `src/attune_author/polish.py` |
-| `polish_cache_stats()` | Snapshot the current per-process prompt-cache aggregate. | `src/attune_author/polish.py` |
-| `format_polish_cache_summary()` | End-of-run summary line, or ``None`` if polish never ran. | `src/attune_author/polish.py` |
-| `build_source_summary()` | Build a concise source summary for the polish prompt. | `src/attune_author/polish.py` |
-| `get_system_prompt()` | Build the system prompt for a given template kind. | `src/attune_author/polish_prompts.py` |
+| Function | Parameters | Returns | Description |
+|----------|------------|---------|-------------|
+| `polish_template()` | `content: str, feature_name: str, source_summary: str, template_type: str = "generic", strict: bool \| None = None, augmented_context: str \| None = None` | `str` | Polish a generated template using an LLM |
+| `build_source_summary()` | `public_classes: list[dict[str, str]], public_functions: list[dict[str, str]], module_docstrings: list[str], file_count: int, function_signatures: list[dict[str, str]] \| None = None, class_signatures: list[dict[str, str]] \| None = None, module_constants: list[dict[str, object]] \| None = None` | `str` | Build a concise source summary for the polish prompt |
+| `get_system_prompt()` | `template_type: str` | `str` | Build the system prompt for a given template kind |
+
+### Raises
+| Function | Exception | Message |
+|----------|-----------|---------|
+| `polish_template()` | `PolishError` | `'Polish pass failed for {...} (type={...}): {...}'` |
 
 
-## Source files
+## Constants
 
-- `src/attune_author/polish.py`
-- `src/attune_author/polish_prompts.py`
+| Constant | Values | Description |
+|----------|--------|-------------|
+| `STRICT_ENV_VAR` | `'ATTUNE_AUTHOR_STRICT_POLISH'` | Environment variable name for enabling strict mode |
+| `_FALSY` | `{'0', 'false', 'no', 'off'}` | String values that disable strict mode |
+| `_BASE_RULES` | System prompt base rules | Core polishing instructions applied to all template types |
 
 ## Tags
 
diff --git a/.help/templates/polish/task.md b/.help/templates/polish/task.md
index d5f334a..5521a83 100644
--- a/.help/templates/polish/task.md
+++ b/.help/templates/polish/task.md
@@ -1,59 +1,47 @@
 ---
+type: task
 feature: polish
 depth: task
-generated_at: 2026-06-06T23:19:48.562428+00:00
-source_hash: 79da77a01c4b4a11716e33f5673ee64882fe6354c51b6cf999aee80d9dbe4b7e
+generated_at: 2026-04-26T19:47:10.811578+00:00
+source_hash: c3c5a14decb406edb1b2d8ca09a6adb5d3bf68908f60cdaf9a9ea6ba0df1471d
 status: generated
 ---
 
 # Work with polish
 
-Use polish when you need to improve generated template quality with an llm rewrite pass that uses per-type system prompts and source-grounded summaries
-.
+Use the polish module when you need to improve auto-generated template quality through LLM rewriting that applies type-specific style rules and source-grounded accuracy checks.
 
 ## Prerequisites
 
 - Access to the project source code
-- Familiarity with the files under src/attune_author/polish.py
+- Understanding of template types and their style conventions
+- Familiarity with the polish module structure
 
 ## Steps
 
-1. **Understand the current behavior.**
-   Read the entry points to see what polish
-   does today before making changes.
-   The primary functions are:
-   - `clear_cache()` in `src/attune_author/polish.py` — Delete every entry in the polish cache directory.
-   - `polish_template()` in `src/attune_author/polish.py` — Polish a generated template using an LLM.
-   - `build_polish_prompt()` in `src/attune_author/polish.py` — Build the (system_prompt, user_message) pair for a polish call.
-   - `reset_polish_cache_telemetry()` in `src/attune_author/polish.py` — Reset the per-process prompt-cache telemetry counters.
-   - `polish_cache_stats()` in `src/attune_author/polish.py` — Snapshot the current per-process prompt-cache aggregate.
-2. **Locate the right function to change.**
-   Each function has a single responsibility. Read its
-   docstring, parameters, and return type to confirm it
-   owns the behavior you need to modify.
-
-3. **Make your change.**
-   Follow existing patterns in the file — naming
-   conventions, error handling style, and logging.
-
-4. **Run the related tests.**
-   This catches regressions before they reach other
-   developers. Target with `pytest -k "polish"`.
+1. **Identify the polish function you need to modify.**
+   The module separates concerns into three main functions:
+   - `polish_template()` — Orchestrates the LLM rewriting process
+   - `build_source_summary()` — Creates concise source descriptions for prompt context
+   - `get_system_prompt()` — Retrieves type-specific style rules
 
-## Key files
+2. **Review the function's current implementation.**
+   Read the docstring, parameter types, and return values to understand the function's scope and constraints.
+
+3. **Implement your changes following the module patterns.**
+   Maintain the existing error handling style, use the `PolishError` for polish failures, and preserve the source-grounded accuracy approach.
 
-- `src/attune_author/polish.py`
-- `src/attune_author/polish_prompts.py`
+4. **Test your changes with the polish test suite.**
+   Run `pytest -k "polish"` to verify your modifications don't break existing functionality.
-## Common modifications +## Verify success -Functions you are most likely to modify: +Your changes work correctly when: +- The polish test suite passes without errors +- Generated templates maintain their factual accuracy while improving in readability +- Type-specific style rules are properly applied based on the template kind + +## Key files -- `clear_cache()` in `src/attune_author/polish.py` -- `polish_template()` in `src/attune_author/polish.py` -- `build_polish_prompt()` in `src/attune_author/polish.py` -- `reset_polish_cache_telemetry()` in `src/attune_author/polish.py` -- `polish_cache_stats()` in `src/attune_author/polish.py` -- `format_polish_cache_summary()` in `src/attune_author/polish.py` -- `build_source_summary()` in `src/attune_author/polish.py` -- `get_system_prompt()` in `src/attune_author/polish_prompts.py` +- `src/attune_author/polish.py` — Core polish functions +- `src/attune_author/polish_prompts.py` — Type-specific system prompts diff --git a/.help/templates/staleness-and-maintenance/concept.md b/.help/templates/staleness-and-maintenance/concept.md index 490f03c..b0e569e 100644 --- a/.help/templates/staleness-and-maintenance/concept.md +++ b/.help/templates/staleness-and-maintenance/concept.md @@ -1,8 +1,9 @@ --- +type: concept feature: staleness-and-maintenance depth: concept -generated_at: 2026-06-06T23:19:48.572962+00:00 -source_hash: a32e9d9904602f0f282f0bf02f119e350efd6c8b4ecb73c04564917b6ae65f69 +generated_at: 2026-04-26T19:47:57.095143+00:00 +source_hash: 196e1038a7194fe466fe8c96559cc4197bb18833f5afc123452ec132dd9007b6 status: generated --- @@ -10,31 +11,30 @@ status: generated ## How it works -Detect when generated templates are out of date with their source files and regenerate stale ones -. +Staleness detection identifies when generated help templates are out of sync with their source code. 
When you modify functions, classes, or files that generated templates reference, those templates become stale and need regeneration to reflect current behavior. -The main building blocks are: +The system tracks this through source hashes — cryptographic fingerprints of the code that generated each template. When source files change, their hashes change, marking dependent templates as stale. -- **`FeatureStaleness`** — Staleness status for one feature's ``.help/`` templates. -- **`DocStaleness`** — Staleness status for one project doc file in ``docs/``. -- **`StalenessReport`** — Combined staleness report across help templates and project docs. -- **`MaintenanceResult`** — Result of a help maintenance run. +## Core components -Under the hood, this feature spans 2 source -files covering: +**MaintenanceResult** captures what happened during a maintenance run. It tracks which features were stale, which got regenerated successfully, which were skipped because they require manual updates, and which failed during regeneration. -- Help maintenance logic for commit hooks and manual refresh. +**Staleness detection** compares current source hashes against the hashes stored in template frontmatter. Templates with mismatched hashes are marked stale and queued for regeneration. -## What connects to it +**Automated maintenance** runs either manually through `run_maintenance()` or automatically via the post-commit hook. The hook examines recent git changes and regenerates only templates affected by those changes. -This feature relates to: freshness, hashing, regeneration. +## When staleness matters -Other parts of the codebase interact with -staleness and maintenance through these interfaces: +Templates become stale in three scenarios: -| Interface | Purpose | File | -|-----------|---------|------| -| `FeatureStaleness` | Staleness status for one feature's ``.help/`` templates. 
| `src/attune_author/staleness.py` | -| `DocStaleness` | Staleness status for one project doc file in ``docs/``. | `src/attune_author/staleness.py` | -| `StalenessReport` | Combined staleness report across help templates and project docs. | `src/attune_author/staleness.py` | -| `MaintenanceResult` | Result of a help maintenance run. | `src/attune_author/maintenance.py` | +1. **Function signatures change** — adding parameters, changing return types, or modifying docstrings +2. **Class structure evolves** — new methods, field additions, or inheritance changes +3. **Module organization shifts** — moving files, renaming modules, or changing import paths + +The maintenance system prevents documentation drift by catching these changes before templates mislead users. + +## Hook integration + +The post-commit hook automatically runs maintenance after each commit. It examines `get_changed_files()` to identify what changed, then regenerates only the templates that depend on those files. This keeps help content fresh without manual intervention. + +For manual maintenance, `run_maintenance()` can target specific features or scan the entire help directory. The `dry_run` option shows what would be regenerated without making changes. 
diff --git a/.help/templates/staleness-and-maintenance/reference.md b/.help/templates/staleness-and-maintenance/reference.md
index a36be94..0c64f9a 100644
--- a/.help/templates/staleness-and-maintenance/reference.md
+++ b/.help/templates/staleness-and-maintenance/reference.md
@@ -1,43 +1,49 @@
 ---
+type: reference
 feature: staleness-and-maintenance
 depth: reference
-generated_at: 2026-06-06T23:19:48.582350+00:00
-source_hash: a32e9d9904602f0f282f0bf02f119e350efd6c8b4ecb73c04564917b6ae65f69
+generated_at: 2026-04-26T19:48:20.345159+00:00
+source_hash: 196e1038a7194fe466fe8c96559cc4197bb18833f5afc123452ec132dd9007b6
 status: generated
 ---
 
-# Staleness And Maintenance reference
+# Staleness and maintenance reference
+
+Detect outdated help templates and regenerate them automatically. Check which templates need updating based on source code changes and run maintenance operations to keep documentation fresh.
 
 ## Classes
 
-| Class | Description | File |
-|-------|-------------|------|
-| `FeatureStaleness` | Staleness status for one feature's ``.help/`` templates. | `src/attune_author/staleness.py` |
-| `DocStaleness` | Staleness status for one project doc file in ``docs/``. | `src/attune_author/staleness.py` |
-| `StalenessReport` | Combined staleness report across help templates and project docs. | `src/attune_author/staleness.py` |
-| `MaintenanceResult` | Result of a help maintenance run. | `src/attune_author/maintenance.py` |
+| Class | Description |
+|-------|-------------|
+| `MaintenanceResult` | Result of a help maintenance run |
 
-## Functions
+### MaintenanceResult fields
 
-| Function | Description | File |
-|----------|-------------|------|
-| `compute_semantic_hash()` | Compute a semantic SHA-256 hash of a feature's Python source files. | `src/attune_author/staleness.py` |
-| `compute_source_hash()` | Compute SHA-256 hash of a feature's source files. | `src/attune_author/staleness.py` |
-| `parse_doc_footer()` | Parse an attune-generated HTML comment footer. | `src/attune_author/staleness.py` |
-| `build_doc_footer()` | Build an attune-generated HTML comment footer line. | `src/attune_author/staleness.py` |
-| `check_staleness()` | Check staleness across help templates and project docs. | `src/attune_author/staleness.py` |
-| `check_workspace_staleness()` | Check staleness for a workspace using the conventional ``.help/`` layout. | `src/attune_author/staleness.py` |
-| `run_maintenance()` | Run help maintenance — check staleness and regenerate. | `src/attune_author/maintenance.py` |
-| `get_changed_files()` | Get files changed in the most recent commit. | `src/attune_author/maintenance.py` |
-| `run_hook()` | Post-commit hook entry point. | `src/attune_author/maintenance.py` |
-| `format_status_report()` | Format a staleness report for display. | `src/attune_author/maintenance.py` |
+| Field | Type | Default |
+|-------|------|---------|
+| `staleness` | `StalenessReport` | |
+| `regenerated` | `list[GenerationResult]` | `field(default_factory=list)` |
+| `skipped_manual` | `list[str]` | `field(default_factory=list)` |
+| `failed` | `list[str]` | `field(default_factory=list)` |
 
+### MaintenanceResult properties
 
-## Source files
+| Property | Type | Description |
+|----------|------|-------------|
+| `stale_count` | `int` | Number of stale features detected |
+| `regenerated_count` | `int` | Number of features regenerated |
+
+## Functions
 
-- `src/attune_author/staleness.py`
-- `src/attune_author/maintenance.py`
+| Function | Parameters | Returns | Description |
+|----------|------------|---------|-------------|
+| `run_maintenance` | `help_dir: str \| Path, project_root: str \| Path, features: list[str] \| None = None, dry_run: bool = False` | `MaintenanceResult` | Run help maintenance — check staleness and regenerate |
+| `get_changed_files` | `project_root: str \| Path` | `list[str]` | Get files changed in the most recent commit |
+| `run_hook` | `help_dir: str \| Path, project_root: str \| Path` | `MaintenanceResult \| None` | Post-commit hook entry point |
+| `format_status_report` | `report: StalenessReport, help_dir: str \| Path \| None = None` | `str` | Format a staleness report for display |
 
-## Tags
+## Constants
 
-`freshness`, `hashing`, `regeneration`
+| Constant | Values |
+|----------|--------|
+| `__all__` | `['DocStaleness', 'FeatureStaleness', 'StalenessReport', '_read_frontmatter_value', 'build_doc_footer', 'check_staleness', 'compute_source_hash', 'parse_doc_footer']` |
diff --git a/.help/templates/staleness-and-maintenance/task.md b/.help/templates/staleness-and-maintenance/task.md
index 8a0a931..7eb79cd 100644
--- a/.help/templates/staleness-and-maintenance/task.md
+++ b/.help/templates/staleness-and-maintenance/task.md
@@ -1,59 +1,108 @@
 ---
+type: task
 feature: staleness-and-maintenance
 depth: task
-generated_at: 2026-06-06T23:19:48.577723+00:00
-source_hash: a32e9d9904602f0f282f0bf02f119e350efd6c8b4ecb73c04564917b6ae65f69
+generated_at: 2026-04-26T19:48:08.669237+00:00
+source_hash: 196e1038a7194fe466fe8c96559cc4197bb18833f5afc123452ec132dd9007b6
 status: generated
 ---
 
 # Work with staleness and maintenance
 
-Use staleness and maintenance when you need to detect when generated templates are out of date with their source files and regenerate stale ones
-.
+Use staleness and maintenance when you need to detect outdated generated templates and regenerate them automatically to keep your help system current with source code changes.
 
 ## Prerequisites
 
 - Access to the project source code
-- Familiarity with the files under src/attune_author/staleness.py
-
-## Steps
-
-1. **Understand the current behavior.**
-   Read the entry points to see what staleness and maintenance
-   does today before making changes.
-   The primary functions are:
-   - `compute_semantic_hash()` in `src/attune_author/staleness.py` — Compute a semantic SHA-256 hash of a feature's Python source files.
-   - `compute_source_hash()` in `src/attune_author/staleness.py` — Compute SHA-256 hash of a feature's source files.
-   - `parse_doc_footer()` in `src/attune_author/staleness.py` — Parse an attune-generated HTML comment footer.
-   - `build_doc_footer()` in `src/attune_author/staleness.py` — Build an attune-generated HTML comment footer line.
-   - `check_staleness()` in `src/attune_author/staleness.py` — Check staleness across help templates and project docs.
-2. **Locate the right function to change.**
-   Each function has a single responsibility. Read its
-   docstring, parameters, and return type to confirm it
-   owns the behavior you need to modify.
-
-3. **Make your change.**
-   Follow existing patterns in the file — naming
-   conventions, error handling style, and logging.
-
-4. **Run the related tests.**
-   This catches regressions before they reach other
-   developers. Target with `pytest -k "staleness-and-maintenance"`.
-
-## Key files
-
-- `src/attune_author/staleness.py`
-- `src/attune_author/maintenance.py`
-
-## Common modifications
-
-Functions you are most likely to modify:
-
-- `compute_semantic_hash()` in `src/attune_author/staleness.py`
-- `compute_source_hash()` in `src/attune_author/staleness.py`
-- `parse_doc_footer()` in `src/attune_author/staleness.py`
-- `build_doc_footer()` in `src/attune_author/staleness.py`
-- `check_staleness()` in `src/attune_author/staleness.py`
-- `check_workspace_staleness()` in `src/attune_author/staleness.py`
-- `run_maintenance()` in `src/attune_author/maintenance.py`
-- `get_changed_files()` in `src/attune_author/maintenance.py`
+- Understanding of how templates are generated from source files
+
+## Check for stale templates
+
+1. **Import the maintenance module**
+   ```python
+   from attune_author.maintenance import run_maintenance
+   ```
+
+2. **Run staleness detection**
+   ```python
+   result = run_maintenance(
+       help_dir="docs/help",
+       project_root=".",
+       dry_run=True # Check only, don't regenerate
+   )
+   ```
+
+3. **Review the staleness report**
+   ```python
+   print(f"Found {result.stale_count} stale templates")
+   print(result.staleness) # Detailed report
+   ```
+
+## Regenerate outdated templates
+
+1. **Run maintenance with regeneration enabled**
+   ```python
+   result = run_maintenance(
+       help_dir="docs/help",
+       project_root=".",
+       dry_run=False # Actually regenerate
+   )
+   ```
+
+2. **Target specific features** (optional)
+   ```python
+   result = run_maintenance(
+       help_dir="docs/help",
+       project_root=".",
+       features=["authentication", "error-handling"]
+   )
+   ```
+
+## Set up automatic maintenance
+
+1. **Configure the post-commit hook**
+   ```python
+   from attune_author.maintenance import run_hook
+
+   # In your .git/hooks/post-commit script
+   result = run_hook(
+       help_dir="docs/help",
+       project_root="."
+   )
+   ```
+
+2. **Handle hook results**
+   ```python
+   if result and result.stale_count > 0:
+       print(f"Regenerated {result.regenerated_count} templates")
+   ```
+
+## Format status reports
+
+1. **Generate a readable status report**
+   ```python
+   from attune_author.maintenance import format_status_report
+
+   report = format_status_report(
+       result.staleness,
+       help_dir="docs/help"
+   )
+   print(report)
+   ```
+
+2. **Check what files changed recently**
+   ```python
+   from attune_author.maintenance import get_changed_files
+
+   changed = get_changed_files(".")
+   print(f"Changed files: {changed}")
+   ```
+
+## Verification
+
+You've successfully set up staleness and maintenance when:
+
+- `run_maintenance()` returns a `MaintenanceResult` with accurate stale counts
+- Dry runs identify stale templates without modifying files
+- Regeneration updates only the templates that need refreshing
+- Status reports clearly show which templates were updated and why

From dada5f4da6f103b1e05c9937a39aacd4216de610 Mon Sep 17 00:00:00 2001
From: GeneAI
Date: Sat, 6 Jun 2026 21:27:51 -0400
Subject: [PATCH 7/7] test(batch): make batch-state fixture date relative to now

The _state() helper hardcoded submitted_at=2026-05-08, which silently expired past the 29-day retention window on 2026-06-06 and broke the status/cancel tests (they read batch state without an injected now=). Default to now-1day so the fixture stays inside the window.

Fixes the 3 date-bomb failures in test_maintenance_batch.py (TestStatus/TestCancel).

Co-Authored-By: Claude Opus 4.8
---
 tests/test_maintenance_batch.py | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/tests/test_maintenance_batch.py b/tests/test_maintenance_batch.py
index f846f70..6cfb769 100644
--- a/tests/test_maintenance_batch.py
+++ b/tests/test_maintenance_batch.py
@@ -19,7 +19,7 @@
 from __future__ import annotations
 
 import json
-from datetime import datetime, timezone
+from datetime import datetime, timedelta, timezone
 from pathlib import Path
 from types import SimpleNamespace
 from unittest.mock import MagicMock, patch
@@ -53,11 +53,16 @@
 
 
 def _state(submitted_at: datetime | None = None) -> BatchState:
+    # Default to "recently submitted" relative to now so the fixture stays
+    # inside the 29-day retention window for status/cancel paths that read
+    # without an injected ``now=``. (A hardcoded date silently expires and
+    # breaks these tests once it ages past the window.)
+    submitted_at = submitted_at or (datetime.now(timezone.utc) - timedelta(days=1))
     return BatchState(
         schema_version=1,
         batch_id="msgbatch_test",
-        submitted_at=submitted_at or datetime(2026, 5, 8, 18, 35, tzinfo=timezone.utc),
-        expected_completion_at=datetime(2026, 5, 8, 18, 41, tzinfo=timezone.utc),
+        submitted_at=submitted_at,
+        expected_completion_at=submitted_at + timedelta(minutes=6),
         model="claude-sonnet-4-6",
         requests=(
             BatchStateRequest("feat__auth__concept", "auth", "concept"),