Commit f32efba

release: v0.6.0 — RAG singleton, retry/backoff, parallel polish, on-disk polish cache (#7)
A four-pronged perf and resilience pass on the polish/RAG path, plus a
new "cache" CLI subcommand for the on-disk cache it introduces.

rag_hook: process-level RagPipeline singleton
---------------------------------------------
RagPipeline construction loads the corpus, which is heavy enough that
doing it once per template kind (15+ times in --all-kinds runs) was
visibly slow. _get_pipeline() now caches the pipeline behind a
threading.Lock with double-checked locking, so cost is paid once per
process. tests/conftest.py resets the singleton between tests so
existing patches still intercept construction.

doc_gen/_anthropic: retry with exponential backoff
--------------------------------------------------
call_anthropic now distinguishes retryable (429, 529,
APIConnectionError) from non-retryable SDK errors and retries the
former up to 3 times with 1s/2s/4s backoff. Non-retryable errors raise
immediately. Credential redaction and __cause__ stripping are
preserved.

generator: parallel polish
--------------------------
generate_feature_templates is now a three-phase pipeline: render
(sequential, fast), polish (concurrent via ThreadPoolExecutor, max 4
workers), write (sequential, ordered). Saturates LLM-bound wall time
for --all-kinds runs while staying under Anthropic rate limits.

polish: on-disk cache with mtime TTL prune + clear
--------------------------------------------------
polish_template now consults a sha256-keyed on-disk cache before
calling the LLM. Key includes content + source_summary + template_type
+ system_prompt + augmented_context + model so any input change
invalidates the entry. Default location is ~/.attune/polish_cache/
(overridable via env).

_cache_get bumps mtime on hit so the prune sweeper treats hot entries
as hot even on noatime mounts. _cache_prune deletes entries older than
the TTL (default 30d, env-tunable, 0 disables) and runs lazily
piggybacked on _cache_put.

clear_cache() is exposed for manual nukes; the new "attune-author
cache clear" subcommand calls it.

Tests
-----
- tests/test_polish_cache.py (new, 12 tests): hit/miss, mtime bump on
  hit, model in key, prune by mtime, TTL=0 disables, invalid TTL falls
  back, clear_cache, polish_template skips LLM on cache hit.
- tests/test_anthropic_retry.py (new, 9 tests): retries on 429 / 529 /
  APIConnectionError, exponential schedule, gives up after
  _MAX_RETRIES, non-retryable raises immediately, credential
  redaction, __cause__ stripped.

Full suite: 518 passed, 37 skipped (was 497, +21 new tests).

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
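The rag_hook change is described above, but its diff is not part of this excerpt. As a rough illustration of double-checked locking only, here is a minimal sketch. The `RagPipeline` class below is a hypothetical stand-in that just counts constructions, and `get_pipeline` / `reset_pipeline` are illustrative names, not the project's actual functions.

```python
import threading
from typing import Optional


class RagPipeline:
    """Hypothetical stand-in for the real pipeline; counts constructions."""

    instances = 0

    def __init__(self) -> None:
        RagPipeline.instances += 1


_pipeline: Optional[RagPipeline] = None
_pipeline_lock = threading.Lock()


def get_pipeline() -> RagPipeline:
    """Return the shared pipeline, building it at most once per process."""
    global _pipeline
    if _pipeline is None:  # fast path: skip the lock once initialized
        with _pipeline_lock:
            if _pipeline is None:  # re-check: another thread may have won
                _pipeline = RagPipeline()
    return _pipeline


def reset_pipeline() -> None:
    """Drop the cached instance, as a test fixture might do between tests."""
    global _pipeline
    with _pipeline_lock:
        _pipeline = None
```

The second `is None` check inside the lock is what keeps two threads that both miss the fast path from each constructing a pipeline.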
1 parent 8b100db commit f32efba

10 files changed

Lines changed: 823 additions & 43 deletions

pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
 
 [project]
 name = "attune-author"
-version = "0.5.1"
+version = "0.6.0"
 description = "Documentation authoring and maintenance for the attune ecosystem — generate, maintain, and validate help content with AI assistance."
 readme = {file = "README.md", content-type = "text/markdown"}
 requires-python = ">=3.10"

src/attune_author/cli.py

Lines changed: 36 additions & 0 deletions
@@ -169,6 +169,27 @@ def _build_parser() -> argparse.ArgumentParser:
         help="Report stale features without regenerating.",
     )
 
+    p_cache = sub.add_parser(
+        "cache",
+        help="Manage the on-disk polish cache",
+        description=(
+            "Inspect and clear the on-disk LLM polish cache used by the "
+            "generator. Entries are pruned automatically by mtime (default "
+            "TTL 30 days, configurable via ATTUNE_AUTHOR_POLISH_CACHE_TTL_SECONDS); "
+            "this command exposes a manual nuke."
+        ),
+    )
+    cache_sub = p_cache.add_subparsers(dest="cache_command", help="Cache subcommands")
+    cache_sub.add_parser(
+        "clear",
+        help="Delete every cached polish entry",
+        description=(
+            "Remove all entries from the polish cache directory. Useful "
+            "after a prompt change in attune-author itself, or to reclaim "
+            "disk space without waiting for the TTL sweep."
+        ),
+    )
+
     p_docs = sub.add_parser(
         "docs",
         help="Generate docs from source (requires [ai])",
@@ -220,6 +241,7 @@ def _dispatch(args: argparse.Namespace, parser: argparse.ArgumentParser) -> int:
         "generate": _cmd_generate,
         "regenerate": _cmd_regenerate,
         "docs": _cmd_docs,
+        "cache": _cmd_cache,
     }
     handler = handlers.get(args.command)
     if handler is None:
@@ -403,6 +425,20 @@ def _cmd_regenerate(args: argparse.Namespace) -> int:
     return 0
 
 
+def _cmd_cache(args: argparse.Namespace) -> int:
+    """Handle the cache command and its subcommands."""
+    from attune_author.polish import _cache_dir, clear_cache
+
+    if args.cache_command == "clear":
+        deleted = clear_cache()
+        cache_path = _cache_dir()
+        print(f"Cleared {deleted} entries from {cache_path}")
+        return 0
+
+    print("Usage: attune-author cache clear", file=sys.stderr)
+    return 1
+
+
 def _cmd_docs(args: argparse.Namespace) -> int:
     """Handle the docs command."""
     if not args.target:
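The `cache clear` subcommand above manages a cache whose implementation (in `attune_author.polish`) does not appear in this excerpt. Going only by the commit message (sha256 key over every input, mtime bump on hit, TTL prune by mtime, TTL of 0 disables), a minimal sketch might look like the following. The function names, the `.txt` suffix, and the length-prefixed hashing are assumptions for illustration, not the project's actual code.

```python
import hashlib
import os
import time
from pathlib import Path
from typing import Optional


def cache_key(*parts: str) -> str:
    """Hash every input so that changing any one of them yields a new key."""
    digest = hashlib.sha256()
    for part in parts:
        data = part.encode("utf-8")
        # Length-prefix each part so ("ab", "c") and ("a", "bc") differ.
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()


def cache_get(cache_dir: Path, key: str) -> Optional[str]:
    """Return the cached text, bumping mtime so hot entries stay fresh."""
    path = cache_dir / f"{key}.txt"
    try:
        text = path.read_text(encoding="utf-8")
    except FileNotFoundError:
        return None
    os.utime(path, None)  # sets atime and mtime to the current time
    return text


def cache_put(cache_dir: Path, key: str, text: str) -> None:
    """Store one entry, creating the cache directory on first use."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    (cache_dir / f"{key}.txt").write_text(text, encoding="utf-8")


def cache_prune(cache_dir: Path, ttl_seconds: float) -> int:
    """Delete entries older than the TTL; a TTL of 0 disables pruning."""
    if ttl_seconds <= 0 or not cache_dir.is_dir():
        return 0
    cutoff = time.time() - ttl_seconds
    deleted = 0
    for path in cache_dir.glob("*.txt"):
        if path.stat().st_mtime < cutoff:
            path.unlink()
            deleted += 1
    return deleted
```

Pruning on mtime rather than atime means the sweep still behaves sensibly on filesystems mounted with `noatime`, which is the reason the commit message gives for bumping mtime on every hit.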

src/attune_author/doc_gen/_anthropic.py

Lines changed: 57 additions & 23 deletions
@@ -12,13 +12,17 @@
 import logging
 import os
 import re
+import time
 from typing import TYPE_CHECKING
 
 if TYPE_CHECKING:
     from anthropic import Anthropic
 
 logger = logging.getLogger(__name__)
 
+_MAX_RETRIES = 3
+_RETRY_BASE_DELAY = 1.0  # seconds; doubles each attempt
+
 #: Source-content character budgets per doc-gen stage. Tuned so
 #: the outline and review stages see enough code for accuracy
 #: without dominating the prompt context, while the write stage
@@ -55,6 +59,19 @@ def _redact(text: str) -> str:
     return _KEY_PATTERN.sub(_REDACTED, text)
 
 
+def _is_retryable(exc: Exception) -> bool:
+    """Return True for transient Anthropic errors that are safe to retry."""
+    try:
+        from anthropic import APIConnectionError, APIStatusError
+    except ImportError:
+        return False
+    if isinstance(exc, APIConnectionError):
+        return True
+    if isinstance(exc, APIStatusError):
+        return exc.status_code in (429, 529)
+    return False
+
+
 def get_client(api_key: str | None = None) -> Anthropic:
     """Instantiate an Anthropic client.
 
@@ -85,11 +102,11 @@ def call_anthropic(
     model: str,
     max_tokens: int,
 ) -> str:
-    """Make a single-turn ``messages.create`` call.
+    """Make a single-turn ``messages.create`` call with retry/backoff.
 
-    Wraps the SDK call so every caller shares identical error
-    handling, message shape, and response unwrapping. Any
-    exception raised by the SDK is re-raised as
+    Retries up to ``_MAX_RETRIES`` times on transient errors (rate
+    limits and overload responses). Non-transient SDK errors fail
+    immediately. All exceptions are re-raised as
     :class:`AnthropicCallError` with a redacted message and an
     empty ``__cause__`` chain to guarantee credential material
     cannot leak through ``str(exc.__cause__)``.
@@ -106,23 +123,40 @@ def call_anthropic(
         string if the response carried no content.
 
     Raises:
-        AnthropicCallError: On any SDK or transport failure.
+        AnthropicCallError: On any SDK or transport failure after
+            retries are exhausted.
     """
-    try:
-        response = client.messages.create(
-            model=model,
-            max_tokens=max_tokens,
-            system=system,
-            messages=[{"role": "user", "content": user_message}],
-        )
-    except Exception as exc:  # noqa: BLE001
-        # INTENTIONAL: every SDK exception type funnels through
-        # one redaction pass so credential material can't leak
-        # into logs, error surfaces, or upstream exception
-        # chains. `from None` strips __cause__ so callers
-        # inspecting the chain only ever see the redacted form.
-        raise AnthropicCallError(_redact(str(exc))) from None
-
-    if response.content:
-        return response.content[0].text
-    return ""
+    last_exc: Exception | None = None
+    for attempt in range(_MAX_RETRIES + 1):
+        if attempt:
+            delay = _RETRY_BASE_DELAY * (2 ** (attempt - 1))
+            logger.warning(
+                "Anthropic call failed (attempt %d/%d), retrying in %.1fs: %s",
+                attempt,
+                _MAX_RETRIES,
+                delay,
+                _redact(str(last_exc)),
+            )
+            time.sleep(delay)
+        try:
+            response = client.messages.create(
+                model=model,
+                max_tokens=max_tokens,
+                system=system,
+                messages=[{"role": "user", "content": user_message}],
+            )
+            if response.content:
+                return response.content[0].text
+            return ""
+        except Exception as exc:  # noqa: BLE001
+            # INTENTIONAL: every SDK exception type funnels through
+            # one redaction pass so credential material can't leak
+            # into logs, error surfaces, or upstream exception
+            # chains. `from None` strips __cause__ so callers
+            # inspecting the chain only ever see the redacted form.
+            if _is_retryable(exc):
+                last_exc = exc
+                continue
+            raise AnthropicCallError(_redact(str(exc))) from None
+
+    raise AnthropicCallError(_redact(str(last_exc))) from None
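To see the backoff schedule in isolation, the loop above can be reduced to a small SDK-free sketch. `TransientError`, `GaveUpError`, and `call_with_retry` are hypothetical names introduced here for illustration; the injectable `sleep` parameter is also an assumption, added so the delays can be observed without waiting.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure (429, 529, connection)."""


class GaveUpError(Exception):
    """Raised once every retry has been used up."""


def call_with_retry(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying transient failures with doubling delays."""
    last_exc: Optional[Exception] = None
    # One initial attempt plus max_retries retries.
    for attempt in range(max_retries + 1):
        if attempt:
            # Delay before retry n is base_delay * 2**(n - 1): 1s, 2s, 4s.
            sleep(base_delay * (2 ** (attempt - 1)))
        try:
            return fn()
        except TransientError as exc:
            last_exc = exc
    # `from None` drops the exception chain, mirroring the diff above.
    raise GaveUpError(str(last_exc)) from None
```

With the defaults this makes at most four calls in total and sleeps for 1, 2, and 4 seconds between them, which matches the "up to 3 times with 1s/2s/4s backoff" wording in the commit message.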

src/attune_author/generator.py

Lines changed: 60 additions & 9 deletions
@@ -14,6 +14,7 @@
 
 import ast
 import logging
+from concurrent.futures import ThreadPoolExecutor, as_completed
 from dataclasses import dataclass, field
 from datetime import datetime, timezone
 from pathlib import Path
@@ -25,6 +26,55 @@
 
 logger = logging.getLogger(__name__)
 
+#: Cap on concurrent LLM calls during the parallel polish phase.
+#: Sized to comfortably fit under Anthropic's per-minute rate
+#: limits while still saturating the LLM-bound wall time of a
+#: typical ``regenerate --all-kinds`` run.
+_POLISH_MAX_WORKERS = 4
+
+
+def _parallel_polish(
+    pending: list[tuple[str, str, Path]],
+    feature: object,
+    source_info: object,
+    use_rag: bool,
+) -> dict[str, tuple[str, Path]]:
+    """Polish a batch of rendered templates concurrently.
+
+    Args:
+        pending: List of (depth, rendered_content, out_path) tuples.
+        feature: Feature being documented (read-only, thread-safe).
+        source_info: Extracted source info (read-only, thread-safe).
+        use_rag: Whether to use RAG grounding during polish.
+
+    Returns:
+        Mapping of depth -> (polished_content, out_path). Raises
+        the first exception encountered (propagated from the future).
+    """
+
+    def _task(depth: str, content: str, out_path: Path) -> tuple[str, str, Path]:
+        polished = _maybe_polish(
+            content,
+            feature,  # type: ignore[arg-type]
+            source_info,  # type: ignore[arg-type]
+            template_type=depth,
+            use_rag=use_rag,
+        )
+        return depth, polished, out_path
+
+    results: dict[str, tuple[str, Path]] = {}
+    workers = max(1, min(len(pending), _POLISH_MAX_WORKERS))
+    with ThreadPoolExecutor(max_workers=workers) as executor:
+        futures = {
+            executor.submit(_task, depth, content, out_path): depth
+            for depth, content, out_path in pending
+        }
+        for future in as_completed(futures):
+            depth, polished, out_path = future.result()
+            results[depth] = (polished, out_path)
+    return results
+
+
 #: Core progressive-depth template kinds. These form the
 #: progressive disclosure path that attune-help renders:
 #: concept → task → reference. They are generated by
@@ -234,6 +284,9 @@ def generate_feature_templates(
             ", ".join(feature.doc_paths[1:]),
         )
 
+    # Phase 1: render all templates (fast Jinja2, sequential).
+    # Determines which depths are active and builds the rendered skeleton.
+    pending: list[tuple[str, str, Path]] = []
     for depth in target_depths:
         if depth not in _ALL_TEMPLATE_NAMES:
             logger.warning("Unknown template kind '%s', skipping", depth)
@@ -278,17 +331,15 @@ def generate_feature_templates(
             source_hash=source_hash,
             source_info=source_info,
         )
+        pending.append((depth, content, out_path))
 
-        # LLM polish pass — improves writing quality
-        content = _maybe_polish(
-            content,
-            feature,
-            source_info,
-            template_type=depth,
-            use_rag=use_rag,
-        )
+    # Phase 2: LLM polish — run all depths concurrently.
+    polished = _parallel_polish(pending, feature, source_info, use_rag)
 
-        out_path.write_text(content, encoding="utf-8")
+    # Phase 3: write results in original depth order.
+    for depth, content, out_path in pending:
+        final_content, _ = polished[depth]
+        out_path.write_text(final_content, encoding="utf-8")
         result.templates.append(
             GeneratedTemplate(
                 feature=feature.name,
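The render / polish / write split in the hunks above relies on one property: results are collected in completion order but consumed in submission order. A stripped-down sketch of that pattern follows, with a caller-supplied `polish` callable standing in for the LLM call. `polish_all` and its signature are hypothetical, not part of the project.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Tuple


def polish_all(
    pending: List[Tuple[str, str]],
    polish: Callable[[str, str], str],
    max_workers: int = 4,
) -> List[Tuple[str, str]]:
    """Polish (kind, text) pairs concurrently; return them in input order."""
    if not pending:
        # ThreadPoolExecutor raises ValueError for max_workers=0.
        return []
    results: Dict[str, str] = {}
    workers = min(len(pending), max_workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(polish, kind, text): kind for kind, text in pending}
        for future in as_completed(futures):
            # .result() re-raises any exception from the worker thread.
            results[futures[future]] = future.result()
    # Re-walk the original list so output order ignores completion order.
    return [(kind, results[kind]) for kind, _ in pending]
```

Keying the results by kind and then iterating over `pending` again is what keeps the write phase deterministic even when a later template finishes polishing first.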
