Smart-AI-Memory
diff --git a/CHANGELOG.md b/CHANGELOG.md
Lines changed: 32 additions & 0 deletions
diff --git a/README.md b/README.md
Lines changed: 47 additions & 2 deletions
diff --git a/docs/specs/polish-fact-check/decisions.md b/docs/specs/polish-fact-check/decisions.md
Lines changed: 32 additions & 0 deletions
diff --git a/docs/specs/polish-fact-check/tasks.md b/docs/specs/polish-fact-check/tasks.md
Lines changed: 19 additions & 17 deletions
diff --git a/src/attune_author/generator.py b/src/attune_author/generator.py
Lines changed: 36 additions & 0 deletions
diff --git a/src/attune_author/ground_truth/__init__.py b/src/attune_author/ground_truth/__init__.py
Lines changed: 116 additions & 0 deletions
@@ -13,6 +13,38 @@ and this project adheres to
 Work in progress for the next release. Add entries here as
 changes land, not at tag time.
 
+### Added
+
+- **Polish fact-check Phase 2 — ground-truth context
+  injection.** Builds three sentinel-tagged blocks
+  (`<cli_help>`, `<public_api>`, `<dataclasses>`) and injects
+  them into the polish prompt above the existing source
+  summary, with a short anchoring clause appended to the
+  system prompt instructing the model to only reference names
+  that appear verbatim in those blocks. Goal: prevent the
+  six hallucination shapes documented in attune-ai PR #351's
+  ops-dashboard editorial pass rather than catching them after
+  the fact (Phase 1's job).
+  - New package: `src/attune_author/ground_truth/` with
+    `cli_help.py` (subprocess + LRU cache), `public_api.py`
+    (AST walk for `__all__` + function/class signatures),
+    `dataclass_refs.py` (AST walk for `@dataclass` field
+    names/types), `budget.py` (5KB cap with documented drop
+    order: dataclasses → public_api → cli_help), and
+    `config.py` (`[tool.attune-author.context-injection]`
+    schema).
+  - New `Feature.cli_command` field on the manifest model;
+    legacy manifests without this field continue to load. Save
+    omits the field when `None`.
+  - `build_polish_prompt` (used by both the synchronous and
+    batch paths) accepts a new `include_ground_truth_anchor`
+    flag; when True, the anchoring clause is appended to the
+    system prompt and the prompt cache key shifts accordingly.
+  - 60 new tests under `tests/unit/ground_truth/`.
+  - Spec: `docs/specs/polish-fact-check/`. Phase 3
+    (faithfulness judge integration) and Phase 4 (tutorial
+    code-fence mypy) remain on the roadmap.
+
 ## [0.13.0] - 2026-05-15
 
 > **Note**: skipping `0.12.0`. The internal `release/v0.12.0`
 
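Review note: the CHANGELOG entry describes `cli_help.py` as a subprocess call behind an LRU cache. The pattern might look roughly like the sketch below. `capture_help_sketch` is a hypothetical name, not the package's `extract_cli_help`. The 10-second timeout is taken from the task table, while returning an empty string on any failure is an assumption about how a missing or broken CLI would be handled.

```python
import subprocess
from functools import lru_cache


@lru_cache(maxsize=None)
def capture_help_sketch(executable: str, subcommand: str, cwd: str) -> str:
    """Run ``<executable> <subcommand> --help`` once per argument triple.

    All arguments are plain strings so they are hashable cache keys.
    Returns "" when the CLI is missing, times out, or exits non-zero
    (an assumption made for this sketch).
    """
    try:
        result = subprocess.run(
            [executable, subcommand, "--help"],
            cwd=cwd,
            capture_output=True,
            text=True,
            timeout=10,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # OSError covers a missing executable (FileNotFoundError).
        return ""
    if result.returncode != 0:
        return ""
    return result.stdout.strip()
```

Because `lru_cache` also memoizes the empty result, a CLI that is installed later in the same process would not be picked up without calling `capture_help_sketch.cache_clear()`.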
@@ -114,8 +114,53 @@ check_numeric_refs = true
 
 This is Phase 1 of the [polish-fact-check
 spec](docs/specs/polish-fact-check/). Phase 2 (ground-truth
-context injection), Phase 3 (faithfulness judge), and Phase 4
-(tutorial static check) are tracked in `tasks.md`.
+context injection) shipped alongside it. Phase 3 (faithfulness
+judge) and Phase 4 (tutorial static check) remain on the
+roadmap.
+
+## Ground-truth context (polish-prompt injection)
+
+Phase 2 of the polish-fact-check spec changes what the model
+sees during the polish pass: three sentinel-tagged blocks
+carrying authoritative surface details are injected into the
+user message before the source summary, and a short anchoring
+clause is appended to the system prompt instructing the model
+to only reference names that appear verbatim in those blocks.
+
+The three blocks:
+
+- `<cli_help>`: captured `<cli> <subcommand> --help` output.
+  Driven by an optional `cli_command:` field on each feature in
+  `features.yaml` (e.g., `cli_command: ops` for a feature whose
+  primary UX is `attune ops`). Absence skips this block.
+- `<public_api>`: AST-extracted `__all__` lists plus signatures
+  for every public function and class in the feature's source
+  files.
+- `<dataclasses>`: AST-extracted field names + type annotations
+  for every public `@dataclass` in the feature's source files.
+
+The combined blocks are capped at 5 KB (5120 bytes) by default.
+When the budget is exceeded, blocks are dropped in this order:
+dataclasses, public_api, cli_help. The CLI help, the most
+authoritative anchor, is therefore the last to go.
+
+Configure via `pyproject.toml`:
+
+```toml
+[tool.attune-author.context-injection]
+enabled = true
+inject_cli_help = true
+inject_public_api = true
+inject_dataclasses = true
+budget_bytes = 5120
+cli_executable = "attune"
+```
+
+The goal is to prevent the six hallucination shapes documented
+in attune-ai PR #351's ops-dashboard editorial pass (among them
+invented CLI flags, fabricated private-module imports, wrong
+route paths, and hallucinated counts) at the prompt layer, rather
+than relying solely on the post-generation fact-check to catch them.
 
 ## Polish cache
 
 
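Review note: the README text above fixes the drop order (dataclasses, then public_api, then cli_help) but does not show the mechanics. One way to read it is sketched below. `enforce_budget_sketch` and `DROP_ORDER` are hypothetical names. Counting only the UTF-8 size of the block bodies, and returning an empty list when even `<cli_help>` alone does not fit, are assumptions; the real `enforce_budget` may count tags or apply a fallback instead.

```python
import logging

logger = logging.getLogger(__name__)

# Drop order as documented: least authoritative block first.
DROP_ORDER = ("dataclasses", "public_api", "cli_help")


def enforce_budget_sketch(
    blocks: list[tuple[str, str]], budget_bytes: int
) -> list[tuple[str, str]]:
    """Drop whole blocks in DROP_ORDER until the bodies fit the budget."""
    kept = list(blocks)

    def total_bytes(items: list[tuple[str, str]]) -> int:
        return sum(len(body.encode("utf-8")) for _tag, body in items)

    for tag in DROP_ORDER:
        if total_bytes(kept) <= budget_bytes:
            break
        if any(t == tag for t, _body in kept):
            # Log and continue; exceeding the budget is never an error.
            logger.warning("ground-truth budget exceeded; dropping <%s>", tag)
            kept = [(t, body) for t, body in kept if t != tag]
    return kept
```

With three 100-byte blocks and a 250-byte budget, only `dataclasses` is dropped; with a 150-byte budget, `public_api` goes as well and `cli_help` alone survives.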
@@ -41,3 +41,35 @@ To be filled in during Phase 3 implementation:
 
 - 2026-05-14 — Initial decisions captured during spec draft. Patrick
   approved.
+- 2026-05-16 — Phase 2 shipped. New decisions captured during
+  implementation:
+  - **Composition with RAG context**: ground-truth context is
+    prepended to the RAG hook's existing `augmented_context` rather
+    than replacing it. Rationale: the two carry orthogonal information
+    (RAG retrieves similar templates; ground-truth pins names) so
+    keeping both maximizes prompt utility within the budget.
+  - **Anchor clause as system-prompt suffix**: the
+    `ANCHORING_CLAUSE` appends to the existing per-template-type
+    system prompt rather than replacing or wrapping it. Rationale:
+    minimises drift from the existing polish system prompts, which are
+    already large (~6KB) and cache-friendly; the suffix is short and
+    behaviorally additive.
+  - **Cache-key participation**: when the anchor clause is added,
+    the system prompt changes — and the polish-cache key already
+    includes the system prompt, so existing cached entries are
+    invalidated cleanly without bespoke cache-key plumbing.
+  - **CLI flags deferred (task 2.8)**: config-driven defaults via
+    `[tool.attune-author.context-injection]` in `pyproject.toml`
+    were sufficient for the first iteration. CLI flags can be added
+    in a follow-up alongside Phase 3's `--faithfulness-threshold`
+    flag.
+  - **Live-LLM acceptance gate deferred**: task 2.10 splits into
+    a unit-level part (assert that the sentinel blocks reach the
+    user message and the anchor clause reaches the system prompt;
+    done) and a live-LLM part (polish ops-dashboard with Phase 2 on
+    and Phase 1 off, and confirm that 0/3 high-severity errors
+    recur). The live-LLM part stays gated behind real-API-key
+    availability.
+  - **Cost-delta measurement deferred to Phase 3**: when the
+    faithfulness judge ships, it will require its own real-LLM
+    calibration run. Folding the cost-delta measurement into that
+    run avoids two separate real-LLM cycles.
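Review note: the "Cache-key participation" decision relies on the system prompt already being part of the polish-cache key. The diff does not show how that key is derived, so the sketch below only illustrates the property. `cache_key_sketch` and `ANCHOR_SUFFIX` are hypothetical, and SHA-256 over length-prefixed parts is an assumed scheme, not necessarily what the polish cache does.

```python
import hashlib

# Placeholder text standing in for the real ANCHORING_CLAUSE.
ANCHOR_SUFFIX = "\n\nOnly reference names that appear verbatim in the blocks."


def cache_key_sketch(system_prompt: str, user_message: str) -> str:
    """Derive a cache key that covers both prompt parts."""
    digest = hashlib.sha256()
    for part in (system_prompt, user_message):
        data = part.encode("utf-8")
        # Length-prefix each part so ("ab", "c") and ("a", "bc") differ.
        digest.update(len(data).to_bytes(8, "big"))
        digest.update(data)
    return digest.hexdigest()


# Appending the suffix changes the system prompt, hence the key, so
# entries cached without the anchor are simply never hit again.
key_without_anchor = cache_key_sketch("You polish docs.", "draft text")
key_with_anchor = cache_key_sketch("You polish docs." + ANCHOR_SUFFIX, "draft text")
```

Under this scheme no extra invalidation logic is needed: old entries stay in the cache but become unreachable once the anchor is enabled.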
@@ -78,26 +78,28 @@ code).
 
 | # | Task | Layer | Status | Notes |
 |---|------|-------|--------|-------|
-| 2.1 | Add `cli_command` field to `Feature` (the manifest model) | attune-author | todo | Optional; absence skips CLI-help injection |
-| 2.2 | Implement `ground_truth.extract_cli_help(cli_cmd, subcommand, project_root)` | attune-author | todo | `subprocess.run(...)` with timeout; cache per (cmd, subcommand) pair |
-| 2.3 | Implement `ground_truth.extract_public_api(source_paths)` | attune-author | todo | AST-walk for `__all__` + non-underscore-prefixed defs |
-| 2.4 | Implement `ground_truth.extract_dataclasses(source_paths)` | attune-author | todo | AST-walk for `@dataclass`; collect field names + type strings |
-| 2.5 | Add `<cli_help>`, `<public_api>`, `<dataclasses>` sentinel blocks to polish prompt builder | attune-author | todo | Match existing context-block format |
-| 2.6 | Add system-prompt anchoring clause | attune-author | todo | "Ground-truth context blocks contain surface details — names you use must appear verbatim" |
-| 2.7 | Implement 5KB context budget enforcement with drop order | attune-author | todo | Log warning on drop; never fail |
-| 2.8 | Add `[tool.attune-author.context-injection]` config + CLI flags | attune-author | todo | Defaults: all three sources on, 5KB budget |
-| 2.9 | Test: ground-truth extractors produce expected output on ops-dashboard source | attune-author | todo | Snapshot tests |
-| 2.10 | Test: polishing ops-dashboard with Phase 2 on, Phase 1 off recurs 0/3 high-severity errors | attune-author | todo | The acceptance gate from `design.md` |
-| 2.11 | Test: budget enforcement drops sources in documented order | attune-author | todo | Artificial 1KB cap forces drops |
-| 2.12 | Cost-delta measurement: 3-feature regression set with vs without Phase 2 | attune-author | todo | Record in CHANGELOG; should be < 10% |
-| 2.13 | Update CHANGELOG + README | attune-author | todo | |
+| 2.1 | Add `cli_command` field to `Feature` (the manifest model) | attune-author | **done** | Optional; load/save preserve; defaults None |
+| 2.2 | Implement `ground_truth.extract_cli_help(cli_cmd, subcommand, project_root)` | attune-author | **done** | `subprocess.run(...)` with 10s timeout; `@lru_cache` per (exe, sub, cwd) |
+| 2.3 | Implement `ground_truth.extract_public_api(source_paths)` | attune-author | **done** | AST walk: `__all__` + public function/class signatures (incl. method bodies) |
+| 2.4 | Implement `ground_truth.extract_dataclasses(source_paths)` | attune-author | **done** | AST walk: `@dataclass` decorator + AnnAssign field collection. Module named `dataclass_refs` to avoid stdlib shadowing |
+| 2.5 | Add `<cli_help>`, `<public_api>`, `<dataclasses>` sentinel blocks to polish prompt builder | attune-author | **done** | Composed in `ground_truth.build_context`; prepended to RAG context when both exist |
+| 2.6 | Add system-prompt anchoring clause | attune-author | **done** | `ANCHORING_CLAUSE` exposed; appended via new `include_ground_truth_anchor` flag on `polish_template`/`build_polish_prompt`. Cache key shifts accordingly. |
+| 2.7 | Implement 5KB context budget enforcement with drop order | attune-author | **done** | `ground_truth.budget.enforce_budget`; drops dataclasses → public_api → cli_help; logs warning per drop |
+| 2.8 | Add `[tool.attune-author.context-injection]` config + CLI flags | attune-author | **done** | Config schema landed (enabled, per-source toggles, budget, executable); CLI flags deferred (config-driven defaults sufficient for first iteration) |
+| 2.9 | Test: ground-truth extractors produce expected output on ops-dashboard source | attune-author | **done** | 25 tests across `test_public_api.py` + `test_dataclass_refs.py` |
+| 2.10 | Test: polishing ops-dashboard with Phase 2 on, Phase 1 off recurs 0/3 high-severity errors | attune-author | **partial** | Unit-level: `test_polish_integration.py` asserts the sentinel blocks reach the user message and the anchor clause reaches the system prompt. Live-LLM acceptance run gated to a follow-up once an `ANTHROPIC_API_KEY` lane is available. |
+| 2.11 | Test: budget enforcement drops sources in documented order | attune-author | **done** | 8 tests in `test_budget.py` covering drop order, fallback, log emission |
+| 2.12 | Cost-delta measurement: 3-feature regression set with vs without Phase 2 | attune-author | deferred | Requires real-LLM run; defer to Phase 3 calibration when judge cost is also measured |
+| 2.13 | Update CHANGELOG + README | attune-author | **done** | CHANGELOG entry under Unreleased. README addition in same PR. |
 
 ### Phase 2 exit checklist
 
-- [ ] Tasks 2.1–2.13 done
-- [ ] 0/3 high-severity ops-dashboard errors recur in Phase-2-only polish
-- [ ] Cost delta < 10%
-- [ ] Spec status updated
+- [x] Tasks 2.1–2.9, 2.11, 2.13 done; 2.10 partial (unit-level only), 2.12 deferred (60 new tests)
+- [x] Spec status updated
+- [ ] Live acceptance: 0/3 high-severity ops-dashboard errors recur in
+      Phase-2-only polish (requires real-LLM run — gated to a follow-up
+      task once `ANTHROPIC_API_KEY` is available in a CI lane)
+- [ ] Cost delta < 10% (deferred to Phase 3 calibration run)
 
 ---
 
 
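Review note: task 2.3 describes the public-API extractor as an AST walk over `__all__` plus public function and class signatures. A reduced sketch of that idea follows. `sketch_public_api` is a hypothetical name that takes source text rather than paths, looks only at module-level definitions, and renders each entry as a plain string; the real `extract_public_api` output format is not shown in this diff.

```python
import ast


def sketch_public_api(source: str) -> list[str]:
    """List ``__all__`` and public top-level def/class names in ``source``."""
    tree = ast.parse(source)
    lines: list[str] = []
    for node in tree.body:
        # Module-level ``__all__ = [...]`` made of literal strings.
        if isinstance(node, ast.Assign) and any(
            isinstance(t, ast.Name) and t.id == "__all__" for t in node.targets
        ):
            try:
                names = ast.literal_eval(node.value)
            except ValueError:
                continue  # computed __all__: skip rather than guess
            lines.append(f"__all__ = {sorted(names)}")
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if not node.name.startswith("_"):
                # ast.unparse renders the argument list back to source.
                lines.append(f"def {node.name}({ast.unparse(node.args)})")
        elif isinstance(node, ast.ClassDef) and not node.name.startswith("_"):
            lines.append(f"class {node.name}")
    return lines


EXAMPLE_SOURCE = '''
__all__ = ["run"]

def run(name, retries=3):
    return name

def _hidden():
    pass

class Report:
    pass
'''

api_lines = sketch_public_api(EXAMPLE_SOURCE)
```

Because the source is parsed and never imported, the extractor cannot trigger side effects in the consumer's modules.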
@@ -39,6 +39,8 @@ def _parallel_polish(
     feature: object,
     source_info: object,
     use_rag: bool,
+    matched_files: list[str] | None = None,
+    project_root: Path | None = None,
 ) -> dict[str, tuple[str, Path]]:
     """Polish a batch of rendered templates concurrently.
 
@@ -47,6 +49,12 @@ def _parallel_polish(
         feature: Feature being documented (read-only, thread-safe).
         source_info: Extracted source info (read-only, thread-safe).
         use_rag: Whether to use RAG grounding during polish.
+        matched_files: Source file paths (relative to ``project_root``)
+            for the feature, used by Phase 2 ground-truth context
+            injection. ``None`` skips that injection.
+        project_root: Consumer project root; required when
+            ``matched_files`` is supplied. Used to resolve relative
+            paths and to run the consumer's CLI for ``--help``.
 
     Returns:
         Mapping of depth -> (polished_content, out_path). Raises
@@ -60,6 +68,8 @@ def _task(depth: str, content: str, out_path: Path) -> tuple[str, str, Path]:
             source_info,  # type: ignore[arg-type]
             template_type=depth,
             use_rag=use_rag,
+            matched_files=matched_files,
+            project_root=project_root,
         )
         return depth, polished, out_path
 
@@ -307,6 +317,8 @@ def generate_feature_templates(
         prep.feature,
         prep.source_info,
         prep.use_rag,
+        matched_files=list(prep.matched_files),
+        project_root=Path(project_root),
     )
     polished_text: dict[str, str] = {depth: text for depth, (text, _path) in polished.items()}
 
@@ -512,6 +524,8 @@ def _maybe_polish(
     source_info: _SourceInfo,
     template_type: str = "generic",
     use_rag: bool = True,
+    matched_files: list[str] | None = None,
+    project_root: Path | None = None,
 ) -> str:
     """Run the LLM polish pass on rendered template content.
 
@@ -564,12 +578,34 @@ def _maybe_polish(
             template_type,
         )
 
+    # Phase 2 ground-truth context injection. The block carries
+    # authoritative surface details (CLI --help, public API, dataclass
+    # fields). Composed BEFORE the RAG block so the model reads the
+    # ground truth first; the anchor clause is added to the system
+    # prompt only when this block is actually present.
+    ground_truth_text: str | None = None
+    if matched_files and project_root is not None:
+        from attune_author.ground_truth import build_context as build_ground_truth
+
+        absolute_sources = [project_root / rel_path for rel_path in matched_files]
+        ground_truth_text = build_ground_truth(
+            feature,
+            absolute_sources,
+            project_root=project_root,
+        )
+
+    if ground_truth_text and augmented_context:
+        augmented_context = ground_truth_text + "\n" + augmented_context
+    elif ground_truth_text:
+        augmented_context = ground_truth_text
+
     return polish_template(
         content,
         feature.name,
         summary,
         template_type=template_type,
         augmented_context=augmented_context,
+        include_ground_truth_anchor=bool(ground_truth_text),
     )
 
 
 
@@ -0,0 +1,116 @@
+"""Ground-truth context injection for the polish pass.
+
+Phase 2 of the polish-fact-check spec
+(``docs/specs/polish-fact-check``). Builds and injects authoritative
+surface details (CLI ``--help`` output, public API signatures,
+dataclass fields) into the polish prompt so the LLM has to anchor
+on real names instead of inventing them.
+
+The fact-check pass (Phase 1) catches mistakes after the fact.
+This phase prevents them by changing what the model sees.
+"""
+
+from __future__ import annotations
+
+import logging
+from pathlib import Path
+
+from attune_author.manifest import Feature
+
+from .budget import enforce_budget
+from .cli_help import extract_cli_help
+from .config import GroundTruthConfig, load_config
+from .dataclass_refs import extract_dataclasses
+from .public_api import extract_public_api
+
+logger = logging.getLogger(__name__)
+
+
+#: System-prompt clause appended (by callers that inject context)
+#: instructing the model to anchor on the ground-truth blocks. The
+#: text is intentionally short and concrete — the existing polish
+#: system prompts are already long, so we keep this addition tight.
+ANCHORING_CLAUSE = (
+    "\n\nThe user message contains <cli_help>, <public_api>, and "
+    "<dataclasses> blocks with ground-truth surface details for this "
+    "feature. When you reference a CLI flag, public function, import "
+    "path, or dataclass field, it MUST appear verbatim in one of those "
+    "blocks. If you need to describe something not in the ground "
+    "truth, describe the behavior without inventing a specific name."
+)
+
+
+def build_context(
+    feature: Feature,
+    source_paths: list[Path],
+    *,
+    project_root: Path,
+    config: GroundTruthConfig | None = None,
+) -> str | None:
+    """Build a ground-truth context string for the polish prompt.
+
+    Args:
+        feature: The feature being documented. ``feature.cli_command``
+            drives the CLI ``--help`` block; absence skips that block.
+        source_paths: Source ``.py`` files matched by ``feature.files``.
+            Used for ``__all__``, public-API signatures, and dataclass
+            extraction.
+        project_root: Consumer project root. Used to load the config
+            when ``config`` is ``None`` and to invoke the consumer's
+            CLI for ``--help``.
+        config: Optional explicit config; ``None`` loads from the
+            project's ``pyproject.toml``.
+
+    Returns:
+        A context string with sentinel-tagged blocks, ready to pass as
+        ``augmented_context=`` to :func:`attune_author.polish.polish_template`.
+        Returns ``None`` if context injection is disabled in the config
+        or no enabled source yielded any extractable surface.
+    """
+    cfg = config if config is not None else load_config(project_root)
+    if not cfg.enabled:
+        return None
+
+    blocks: list[tuple[str, str]] = []
+
+    cli_help_text = ""
+    if cfg.inject_cli_help and feature.cli_command:
+        cli_help_text = extract_cli_help(
+            cfg.cli_executable,
+            feature.cli_command,
+            project_root=project_root,
+        )
+        if cli_help_text:
+            blocks.append(("cli_help", cli_help_text))
+
+    public_api_text = ""
+    if cfg.inject_public_api:
+        public_api_text = extract_public_api(source_paths)
+        if public_api_text:
+            blocks.append(("public_api", public_api_text))
+
+    dataclass_text = ""
+    if cfg.inject_dataclasses:
+        dataclass_text = extract_dataclasses(source_paths)
+        if dataclass_text:
+            blocks.append(("dataclasses", dataclass_text))
+
+    if not blocks:
+        return None
+
+    blocks = enforce_budget(blocks, cfg.budget_bytes)
+
+    parts: list[str] = ["## Ground-truth context\n"]
+    for tag, body in blocks:
+        parts.append(f"<{tag}>\n{body.rstrip()}\n</{tag}>\n")
+    return "\n".join(parts) + "\n"
+
+
+__all__ = [
+    "ANCHORING_CLAUSE",
+    "GroundTruthConfig",
+    "build_context",
+    "enforce_budget",
+    "extract_cli_help",
+    "extract_dataclasses",
+    "extract_public_api",
+    "load_config",
+]
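Review note: `build_context` delegates dataclass extraction to `dataclass_refs.extract_dataclasses`, which the task table describes as an AST walk over the `@dataclass` decorator plus `AnnAssign` field collection. That module is not part of this diff, so the sketch below is a guess at the technique. `sketch_dataclass_fields` and `_is_dataclass_decorator` are hypothetical names, and matching decorators by name alone (without resolving imports) is an assumption.

```python
import ast


def _is_dataclass_decorator(node: ast.expr) -> bool:
    """Match ``@dataclass``, ``@dataclass(...)`` and ``@dataclasses.dataclass``."""
    if isinstance(node, ast.Call):
        node = node.func
    if isinstance(node, ast.Name):
        return node.id == "dataclass"
    if isinstance(node, ast.Attribute):
        return node.attr == "dataclass"
    return False


def sketch_dataclass_fields(source: str) -> dict[str, list[tuple[str, str]]]:
    """Map each public dataclass name to its (field, annotation) pairs."""
    found: dict[str, list[tuple[str, str]]] = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef) or node.name.startswith("_"):
            continue
        if not any(_is_dataclass_decorator(d) for d in node.decorator_list):
            continue
        # Annotated assignments in the class body are the dataclass fields.
        found[node.name] = [
            (stmt.target.id, ast.unparse(stmt.annotation))
            for stmt in node.body
            if isinstance(stmt, ast.AnnAssign) and isinstance(stmt.target, ast.Name)
        ]
    return found


EXAMPLE_SOURCE = '''
from dataclasses import dataclass

@dataclass(frozen=True)
class Job:
    name: str
    retries: int = 3

class Plain:
    x: int = 0

@dataclass
class _Private:
    y: int
'''

job_fields = sketch_dataclass_fields(EXAMPLE_SOURCE)
```

Annotations are kept as source strings via `ast.unparse`, so the prompt shows the types as the author wrote them and nothing has to be imported or evaluated.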