feat(polish): ground-truth context injection (Phase 2 of polish-fact-check) by silversurfer562 · Pull Request #35 · Smart-AI-Memory/attune-author

silversurfer562 · 2026-05-16T06:51:52Z

Summary

Ship Phase 2 of the polish-fact-check spec: inject ground-truth context (CLI --help, public API signatures, dataclass fields) into the polish prompt so the model has authoritative surface details to anchor on, rather than relying solely on Phase 1's post-generation fact-check.
New src/attune_author/ground_truth/ package with three extractors (subprocess + LRU-cached CLI help, AST-walked public API, AST-walked dataclasses), a budget enforcer (5KB cap, documented drop order), and [tool.attune-author.context-injection] config.
New optional Feature.cli_command field on the manifest model — drives the <cli_help> block. Legacy manifests load cleanly; save omits the field when None.
Wired into both the synchronous (generator._maybe_polish) and batch (maintenance_batch._collect_polish_prompts) polish paths via a new include_ground_truth_anchor flag on polish_template / build_polish_prompt. The flag also shifts the prompt-cache key when set so stale cached entries invalidate cleanly without bespoke plumbing.
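The cache-key behaviour described in the last item can be pictured with a small sketch. It assumes the cache key is a hash over the system prompt and the user message, and that the flag appends an anchor clause to the system prompt. `sketch_build_polish_prompt`, `cache_key`, and the clause wording are hypothetical stand-ins for illustration, not the project's actual code.

```python
import hashlib

# Hypothetical anchor wording; the real clause lives in the project's polish templates.
ANCHOR_CLAUSE = "Only reference names that appear verbatim in the ground-truth blocks."


def sketch_build_polish_prompt(base_system, user_message, include_ground_truth_anchor=False):
    """Return (system, user); the anchor clause is appended as a system-prompt suffix."""
    system = base_system
    if include_ground_truth_anchor:
        system = f"{base_system}\n\n{ANCHOR_CLAUSE}"
    return system, user_message


def cache_key(system, user):
    """Hash both prompt parts (assumed scheme), so a changed system prompt yields a new key."""
    digest = hashlib.sha256()
    digest.update(system.encode("utf-8"))
    digest.update(b"\x00")  # separator so ("ab", "c") and ("a", "bc") hash differently
    digest.update(user.encode("utf-8"))
    return digest.hexdigest()


plain_key = cache_key(*sketch_build_polish_prompt("You polish docs.", "Draft text"))
anchored_key = cache_key(
    *sketch_build_polish_prompt("You polish docs.", "Draft text", include_ground_truth_anchor=True)
)
```

Because the anchor clause changes the hashed system prompt, entries cached before the flag was enabled are never looked up again, which is one way to get invalidation "without bespoke plumbing".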

Motivation

attune-ai PR #351 documented six distinct hallucination shapes that escaped the polish pass on the ops-dashboard feature: invented CLI flags (--allow-run), fabricated private-module imports (from attune.ops._readers ...), wrong route paths, a hallucinated 498 templates count, missing security callouts, and broken cross-references. Phase 1 catches these after generation; Phase 2 prevents them by changing what the model sees during generation.

Test plan

Unit tests: 60 new tests under tests/unit/ground_truth/ covering each extractor, budget drop order, config schema, build_context shape, manifest round-trip, and polish-prompt integration (anchor clause + sentinel-tag embedding in the user message)
Full attune-author suite: pytest -q reports 896 passed, 37 pre-existing skips
Touched-file regression: pytest tests/test_generator.py tests/test_polish*.py tests/test_manifest.py tests/test_maintenance_batch.py all green
ruff check clean across all touched files
Live-LLM acceptance (gated to a follow-up once an ANTHROPIC_API_KEY lane is available): polish ops-dashboard with Phase 2 on + Phase 1 off and confirm 0 of the 3 high-severity error shapes (CLI flag, private import, route path) recur. Captured as an open Phase 2 exit-checklist item in tasks.md.
Cost-delta < 10% (deferred): folded into Phase 3's calibration run to avoid two separate real-LLM cycles. Captured in decisions.md.
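The CLI-flag part of the deferred live-LLM acceptance check could be automated roughly as follows. This is a sketch under assumptions: `invented_flags` is a hypothetical helper, the help text is made up for illustration, and the regex recognises only lowercase long options.

```python
import re

# Long options such as --port or --allow-run; the lookbehind skips runs like "---".
FLAG_RE = re.compile(r"(?<![\w-])--[a-z][a-z0-9-]*")


def invented_flags(polished_text, cli_help_text):
    """Return flags used in the polished text that never appear in the --help output."""
    known = set(FLAG_RE.findall(cli_help_text))
    used = set(FLAG_RE.findall(polished_text))
    return sorted(used - known)


# Illustrative inputs only; --allow-run is the invented flag cited in the Motivation section.
help_text = "usage: tool [--port PORT] [--verbose]"
polished = "Start it with `tool --port 8080 --allow-run`."
suspicious = invented_flags(polished, help_text)
```

An empty result for every polished feature would correspond to the "0 recurrences" criterion for the CLI-flag error shape. Private imports and route paths would need their own checks.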

Notes for review

uv.lock is intentionally NOT in this PR — pre-existing drift (lockfile still recorded attune-author 0.6.1) surfaced during the editable reinstall; addressing it separately keeps this PR scoped per the lockfile-drift lesson.
The CLI-flag part of task 2.8 (--inject-cli-help / --no-context-injection) is deferred: env-driven defaults ([tool.attune-author.context-injection]) were sufficient for v1, and a follow-up will bundle the flags with Phase 3's --faithfulness-threshold flag.

🤖 Generated with Claude Code

…check)

Phase 2 of the polish-fact-check spec changes what the model sees during the polish pass rather than catching mistakes after the fact (Phase 1's job). Three sentinel-tagged blocks carrying authoritative surface details are injected into the user message, and a short anchoring clause is appended to the system prompt instructing the model to only reference names that appear verbatim in those blocks.

Goal: prevent the six hallucination shapes documented in attune-ai PR #351's ops-dashboard editorial pass (invented CLI flags, fabricated _readers/_models imports, wrong route paths, hallucinated counts) at the prompt layer.

New package: src/attune_author/ground_truth/
- cli_help.py: subprocess (10s timeout) + LRU cache per (exe,sub,cwd)
- public_api.py: AST walk: __all__ + public function/class signatures
- dataclass_refs.py: AST walk: @dataclass field names + type strings (named to avoid shadowing the stdlib module)
- budget.py: 5KB cap; drop order: dataclasses, then public_api, then cli_help
- config.py: [tool.attune-author.context-injection] schema

Wiring:
- Feature.cli_command optional field on the manifest model (legacy manifests round-trip; save omits the field when None).
- build_polish_prompt (used by both sync and batch paths) gains include_ground_truth_anchor. Cache key shifts when set so old cached entries invalidate cleanly without bespoke plumbing.
- generator._maybe_polish and maintenance_batch._collect_polish_prompts each build the ground-truth string once per feature and prepend it to the RAG hook's existing augmented_context.

Tests: 60 new tests under tests/unit/ground_truth/ covering each extractor, budget drop order, config loading, build_context shape, and the polish-prompt integration (anchor clause + sentinel-tag embedding). Full suite: 896 passed, 37 pre-existing skips.

Decisions captured during impl (see decisions.md):
- Compose with RAG instead of replacing it.
- Anchor clause as system-prompt suffix.
- Cache-key participation via system-prompt change.
- CLI flags + live-LLM acceptance + cost-delta deferred to follow-ups.

Spec: docs/specs/polish-fact-check/

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
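The budget rule in the commit message (5KB cap; dataclasses dropped first, then public_api, then cli_help) can be sketched as below. This assumes whole blocks are dropped rather than truncated, which the description does not confirm. `enforce_budget` and the `<public_api>` / `<dataclasses>` tag names are hypothetical.

```python
MAX_BYTES = 5 * 1024  # 5KB cap from the PR description

# Blocks are dropped in this order until the combined context fits the cap.
DROP_ORDER = ("dataclasses", "public_api", "cli_help")


def enforce_budget(blocks, max_bytes=MAX_BYTES):
    """Return a copy of `blocks` with whole blocks removed until the UTF-8 size fits."""
    kept = dict(blocks)

    def size():
        return sum(len(text.encode("utf-8")) for text in kept.values())

    for name in DROP_ORDER:
        if size() <= max_bytes:
            break
        kept.pop(name, None)
    return kept


# Three blocks of roughly 2KB each: about 6KB in total, so one block must go.
blocks = {
    "cli_help": "<cli_help>" + "x" * 2000 + "</cli_help>",
    "public_api": "<public_api>" + "y" * 2000 + "</public_api>",
    "dataclasses": "<dataclasses>" + "z" * 2000 + "</dataclasses>",
}
trimmed = enforce_budget(blocks)
```

Here only the dataclasses block is dropped, so the CLI help, which is last in the drop order, survives longest when space is tight.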

silversurfer562 merged commit d5c060b into main May 16, 2026
12 checks passed

silversurfer562 deleted the feat/polish-fact-check-phase-2 branch May 16, 2026 07:57

silversurfer562 mentioned this pull request May 22, 2026

release: v0.14.0 — polish-fact-check Phases 2-4 + workspace_staleness helper #40

Merged

4 tasks

silversurfer562 merged 1 commit into main from feat/polish-fact-check-phase-2
