
feat(polish): ground-truth context injection (Phase 2 of polish-fact-check) #35

Merged
silversurfer562 merged 1 commit into main from feat/polish-fact-check-phase-2 on May 16, 2026


@silversurfer562

Summary

  • Ship Phase 2 of the polish-fact-check spec: inject ground-truth context (CLI --help, public API signatures, dataclass fields) into the polish prompt so the model has authoritative surface details to anchor on, rather than relying solely on Phase 1's post-generation fact-check.
  • New src/attune_author/ground_truth/ package with three extractors (subprocess + LRU-cached CLI help, AST-walked public API, AST-walked dataclasses), a budget enforcer (5KB cap, documented drop order), and [tool.attune-author.context-injection] config.
  • New optional Feature.cli_command field on the manifest model — drives the <cli_help> block. Legacy manifests load cleanly; save omits the field when None.
  • Wired into both the synchronous (generator._maybe_polish) and batch (maintenance_batch._collect_polish_prompts) polish paths via a new include_ground_truth_anchor flag on polish_template / build_polish_prompt. The flag also shifts the prompt-cache key when set so stale cached entries invalidate cleanly without bespoke plumbing.

Motivation

attune-ai PR #351 documented six distinct hallucination shapes that escaped the polish pass on the ops-dashboard feature: invented CLI flags (--allow-run), fabricated private-module imports (from attune.ops._readers ...), wrong route paths, a hallucinated count of 498 templates, missing security callouts, and broken cross-references. Phase 1 catches these after generation; Phase 2 prevents them by changing what the model sees during generation.

Test plan

  • Unit tests: 60 new tests under tests/unit/ground_truth/ covering each extractor, budget drop order, config schema, build_context shape, manifest round-trip, and polish-prompt integration (anchor clause + sentinel-tag embedding in the user message)
  • Full attune-author suite: pytest -q reports 896 passed, 37 pre-existing skips
  • Touched-file regression: pytest tests/test_generator.py tests/test_polish*.py tests/test_manifest.py tests/test_maintenance_batch.py all green
  • ruff check clean across all touched files
  • Live-LLM acceptance (gated to a follow-up once an ANTHROPIC_API_KEY lane is available): polish ops-dashboard with Phase 2 on and Phase 1 off, and confirm that none of the three high-severity error shapes (CLI flag, private import, route path) recurs. Captured as an open Phase 2 exit-checklist item in tasks.md.
  • Cost-delta < 10% (deferred): folded into Phase 3's calibration run to avoid two separate real-LLM cycles. Captured in decisions.md.

Notes for review

  • uv.lock is intentionally NOT in this PR — pre-existing drift (lockfile still recorded attune-author 0.6.1) surfaced during the editable reinstall; addressing it separately keeps this PR scoped per the lockfile-drift lesson.
  • The CLI-flag part of task 2.8 (--inject-cli-help / --no-context-injection) is deferred — env-driven defaults ([tool.attune-author.context-injection]) were sufficient for v1 and a follow-up will bundle them with Phase 3's --faithfulness-threshold flag.

🤖 Generated with Claude Code


Phase 2 of the polish-fact-check spec changes what the model sees
during the polish pass rather than catching mistakes after the fact
(Phase 1's job). Three sentinel-tagged blocks carrying authoritative
surface details are injected into the user message, and a short
anchoring clause is appended to the system prompt instructing the
model to only reference names that appear verbatim in those blocks.
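A minimal sketch of that prompt shape, assuming three sentinel tags (<cli_help> is named in the PR; <public_api> and <dataclass_refs> are assumed here) and an illustrative anchor sentence. The helper names and the clause wording are hypothetical, not the shipped template text.

```python
# Hypothetical anchor wording; the real clause is in the polish template.
ANCHOR_CLAUSE = (
    "Only reference CLI flags, imports, functions, classes, and fields whose "
    "names appear verbatim in the <cli_help>, <public_api>, or <dataclass_refs> "
    "blocks. If a detail is not there, omit it rather than guess."
)


def wrap(tag: str, body: str) -> str:
    """Wrap one ground-truth source in a sentinel tag pair."""
    return f"<{tag}>\n{body.strip()}\n</{tag}>"


def build_ground_truth(blocks: dict[str, str]) -> str:
    """Join non-empty blocks in the caller's order; empty sources are skipped."""
    return "\n\n".join(wrap(tag, body) for tag, body in blocks.items() if body.strip())


def build_messages(base_system: str, draft: str, ground_truth: str) -> tuple[str, str]:
    """Return (system, user): clause appended to system, blocks prepended to user."""
    if not ground_truth:
        return base_system, draft  # no anchor clause without anything to anchor on
    return f"{base_system}\n\n{ANCHOR_CLAUSE}", f"{ground_truth}\n\n{draft}"
```

Skipping the clause when no blocks were extracted avoids telling the model to anchor on material that isn't there.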

Goal: prevent the six hallucination shapes documented in attune-ai
PR #351's ops-dashboard editorial pass (invented CLI flags,
fabricated _readers/_models imports, wrong route paths, hallucinated
counts) at the prompt layer.

New package: src/attune_author/ground_truth/
  - cli_help.py        subprocess (10s timeout) + LRU cache per (exe, sub, cwd)
  - public_api.py      AST walk: __all__ + public function/class signatures
  - dataclass_refs.py  AST walk: @dataclass field names + type strings
                       (named to avoid shadowing the stdlib module)
  - budget.py          5KB cap; drops dataclasses first, then public_api, then cli_help
  - config.py          [tool.attune-author.context-injection] schema

Wiring:
  - Feature.cli_command optional field on the manifest model
    (legacy manifests round-trip; save omits the field when None).
  - build_polish_prompt (used by both sync and batch paths) gains
    include_ground_truth_anchor. Cache key shifts when set so old
    cached entries invalidate cleanly without bespoke plumbing.
  - generator._maybe_polish and maintenance_batch._collect_polish_prompts
    each build the ground-truth string once per feature and prepend it
    to the RAG hook's existing augmented_context.

Tests: 60 new tests under tests/unit/ground_truth/ covering each
extractor, budget drop order, config loading, build_context shape,
and the polish-prompt integration (anchor clause + sentinel-tag
embedding). Full suite: 896 passed, 37 pre-existing skips.
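The budget drop order those tests exercise can be sketched as dropping whole blocks, lowest-value first, until the total fits. This assumes blocks are dropped whole rather than truncated, that the cap counts UTF-8 bytes of block bodies only (tag overhead ignored), and that 5KB means 5 * 1024 bytes; `DROP_ORDER` and `enforce_budget` are hypothetical names.

```python
# Dropped first to last: dataclasses, then public API, then CLI help.
DROP_ORDER = ("dataclass_refs", "public_api", "cli_help")
MAX_BYTES = 5 * 1024  # assumes "5KB" means 5 * 1024 bytes


def _size(blocks: dict[str, str]) -> int:
    return sum(len(body.encode("utf-8")) for body in blocks.values())


def enforce_budget(blocks: dict[str, str], max_bytes: int = MAX_BYTES) -> dict[str, str]:
    """Drop whole blocks in DROP_ORDER until the total fits within max_bytes.

    If even cli_help alone is over budget, it is dropped too and the
    result is empty; blocks not named in DROP_ORDER are never dropped.
    """
    kept = dict(blocks)
    for name in DROP_ORDER:
        if _size(kept) <= max_bytes:
            break
        kept.pop(name, None)
    return kept
```

Keeping cli_help longest matches the motivation: invented CLI flags were one of the high-severity shapes.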

Decisions captured during impl (see decisions.md):
  - Compose with RAG instead of replacing it.
  - Anchor clause as system-prompt suffix.
  - Cache-key participation via system-prompt change.
  - CLI flags + live-LLM acceptance + cost-delta deferred to follow-ups.

Spec: docs/specs/polish-fact-check/

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
