feat(polish): ground-truth context injection (Phase 2 of polish-fact-check) #35
Merged
…check)

Phase 2 of the polish-fact-check spec changes what the model sees during the polish pass rather than catching mistakes after the fact (Phase 1's job). Three sentinel-tagged blocks carrying authoritative surface details are injected into the user message, and a short anchoring clause is appended to the system prompt instructing the model to reference only names that appear verbatim in those blocks.

Goal: prevent the six hallucination shapes documented in attune-ai PR #351's ops-dashboard editorial pass (invented CLI flags, fabricated _readers/_models imports, wrong route paths, hallucinated counts) at the prompt layer.

New package: src/attune_author/ground_truth/
- cli_help.py        subprocess (10s timeout) + LRU cache per (exe, sub, cwd)
- public_api.py      AST walk: __all__ + public function/class signatures
- dataclass_refs.py  AST walk: @dataclass field names + type strings (named to avoid shadowing the stdlib module)
- budget.py          5KB cap; drop order dataclasses -> public_api -> cli_help
- config.py          [tool.attune-author.context-injection] schema

Wiring:
- Feature.cli_command optional field on the manifest model (legacy manifests round-trip; save omits the field when None).
- build_polish_prompt (used by both sync and batch paths) gains include_ground_truth_anchor. Cache key shifts when set so old cached entries invalidate cleanly without bespoke plumbing.
- generator._maybe_polish and maintenance_batch._collect_polish_prompts each build the ground-truth string once per feature and prepend it to the RAG hook's existing augmented_context.

Tests: 60 new tests under tests/unit/ground_truth/ covering each extractor, budget drop order, config loading, build_context shape, and the polish-prompt integration (anchor clause + sentinel-tag embedding). Full suite: 896 passed, 37 pre-existing skips.

Decisions captured during impl (see decisions.md):
- Compose with RAG instead of replacing it.
- Anchor clause as system-prompt suffix.
- Cache-key participation via system-prompt change.
- CLI flags + live-LLM acceptance + cost-delta deferred to follow-ups.

Spec: docs/specs/polish-fact-check/

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
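The budget step listed for budget.py (5KB cap, drop order dataclasses, then public_api, then cli_help) could be sketched roughly as follows. This is an illustration only, not the code in this PR: the names `enforce_budget`, `MAX_BYTES`, and `DROP_ORDER` are hypothetical, and the sketch assumes that "5KB" means 5120 UTF-8 bytes and that whole blocks are dropped rather than truncated.

```python
# Hypothetical sketch of a size budget with a fixed drop order.
# Assumptions (not confirmed by the PR): 5KB == 5 * 1024 UTF-8 bytes,
# and an over-budget context loses whole blocks instead of being truncated.

MAX_BYTES = 5 * 1024

# Least valuable block first, mirroring the drop order in the commit message.
DROP_ORDER = ("dataclasses", "public_api", "cli_help")


def enforce_budget(blocks: dict[str, str], max_bytes: int = MAX_BYTES) -> dict[str, str]:
    """Return a copy of `blocks` that fits within `max_bytes`.

    Blocks are removed one at a time in DROP_ORDER until the combined
    UTF-8 size is within budget. Unknown block names are never dropped.
    """
    kept = dict(blocks)

    def total_size() -> int:
        return sum(len(text.encode("utf-8")) for text in kept.values())

    for name in DROP_ORDER:
        if total_size() <= max_bytes:
            break
        kept.pop(name, None)  # no-op if this block was never extracted
    return kept
```

With a 3000-byte cli_help block, a 2000-byte public_api block, and a 1000-byte dataclasses block, only the dataclasses block is removed, because the remaining 5000 bytes fit under the cap.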
This was referenced May 16, 2026
Summary
- Inject three sentinel-tagged ground-truth blocks (CLI --help, public API signatures, dataclass fields) into the polish prompt so the model has authoritative surface details to anchor on, rather than relying solely on Phase 1's post-generation fact-check.
- New src/attune_author/ground_truth/ package with three extractors (subprocess + LRU-cached CLI help, AST-walked public API, AST-walked dataclasses), a budget enforcer (5KB cap, documented drop order), and [tool.attune-author.context-injection] config.
- New optional Feature.cli_command field on the manifest model, which drives the <cli_help> block. Legacy manifests load cleanly; save omits the field when None.
- Wired into both the sync (generator._maybe_polish) and batch (maintenance_batch._collect_polish_prompts) polish paths via a new include_ground_truth_anchor flag on polish_template / build_polish_prompt. The flag also shifts the prompt-cache key when set, so stale cached entries invalidate cleanly without bespoke plumbing.

Motivation
attune-ai PR #351 documented six distinct hallucination shapes that escaped the polish pass on the ops-dashboard feature: invented CLI flags (--allow-run), fabricated private-module imports (from attune.ops._readers ...), wrong route paths, a hallucinated "498 templates" count, missing security callouts, and broken cross-references. Phase 1 catches these after generation; Phase 2 prevents them by changing what the model sees during generation.

Test plan
- 60 new tests under tests/unit/ground_truth/ covering each extractor, budget drop order, config schema, build_context shape, manifest round-trip, and polish-prompt integration (anchor clause + sentinel-tag embedding in the user message)
- Full suite: pytest -q reports 896 passed, 37 pre-existing skips
- pytest tests/test_generator.py tests/test_polish*.py tests/test_manifest.py tests/test_maintenance_batch.py all green
- ruff check clean across all touched files
- Live-LLM acceptance (deferred until an ANTHROPIC_API_KEY lane is available): polish ops-dashboard with Phase 2 on + Phase 1 off and confirm 0 of the 3 high-severity error shapes (CLI flag, private import, route path) recur. Captured as an open Phase 2 exit-checklist item in tasks.md.
- … decisions.md.

Notes for review
- uv.lock is intentionally NOT in this PR: pre-existing drift (lockfile still recorded attune-author 0.6.1) surfaced during the editable reinstall; addressing it separately keeps this PR scoped per the lockfile-drift lesson.
- CLI flag surface (--inject-cli-help / --no-context-injection) is deferred: env-driven defaults ([tool.attune-author.context-injection]) were sufficient for v1, and a follow-up will bundle them with Phase 3's --faithfulness-threshold flag.

🤖 Generated with Claude Code
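For reviewers unfamiliar with the AST-walk approach named in the Summary, here is a minimal sketch of how dataclass field names and type strings can be read from source without importing it. It is not the code in dataclass_refs.py: `extract_dataclass_fields`, `_is_dataclass_decorator`, and the `SAMPLE` module are hypothetical, and the sketch assumes Python 3.9+ for `ast.unparse`.

```python
# Hypothetical sketch: collect @dataclass field names and annotation strings
# by walking the AST, so the target module is never imported or executed.
import ast

SAMPLE = '''
import dataclasses
from dataclasses import dataclass

@dataclass
class Example:
    name: str
    command: str | None = None

@dataclasses.dataclass(frozen=True)
class Frozen:
    count: int

class Plain:
    x: int = 0
'''


def _is_dataclass_decorator(node: ast.expr) -> bool:
    """Match @dataclass, @dataclass(...), and @dataclasses.dataclass(...)."""
    target = node.func if isinstance(node, ast.Call) else node
    if isinstance(target, ast.Name):
        return target.id == "dataclass"
    if isinstance(target, ast.Attribute):
        return target.attr == "dataclass"
    return False


def extract_dataclass_fields(source: str) -> dict[str, list[tuple[str, str]]]:
    """Map each dataclass name to its (field name, annotation string) pairs."""
    result: dict[str, list[tuple[str, str]]] = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.ClassDef):
            continue
        if not any(_is_dataclass_decorator(d) for d in node.decorator_list):
            continue
        fields = []
        for stmt in node.body:
            # Annotated assignments (`name: type` or `name: type = default`)
            # are treated as fields; ClassVar filtering is omitted here.
            if isinstance(stmt, ast.AnnAssign) and isinstance(stmt.target, ast.Name):
                fields.append((stmt.target.id, ast.unparse(stmt.annotation)))
        result[node.name] = fields
    return result


fields_by_class = extract_dataclass_fields(SAMPLE)
```

Matching on the decorator name alone is a deliberate simplification: it does not resolve aliases such as `from dataclasses import dataclass as dc`, which a production extractor might need to handle.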