feat(polish): ground-truth context injection (Phase 2 of polish-fact-check) (#35)
Phase 2 of the polish-fact-check spec changes what the model sees
during the polish pass rather than catching mistakes after the fact
(Phase 1's job). Three sentinel-tagged blocks carrying authoritative
surface details are injected into the user message, and a short
anchoring clause is appended to the system prompt instructing the
model to only reference names that appear verbatim in those blocks.
Goal: prevent the six hallucination shapes documented in attune-ai
PR #351's ops-dashboard editorial pass (invented CLI flags,
fabricated _readers/_models imports, wrong route paths, hallucinated
counts) at the prompt layer.
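A minimal sketch of how the sentinel-tagged blocks and the anchoring clause might fit together. The tag names (`cli_help`, `public_api`, `dataclasses`) come from the task table in this change. The function names, the clause wording, and the prompt layout are assumptions for illustration, not the contents of `ground_truth.build_context`.

```python
from typing import Optional, Tuple

# Hypothetical wording; the real ANCHORING_CLAUSE text is not shown in this commit.
ANCHORING_CLAUSE_SKETCH = (
    "Only reference names that appear verbatim inside the "
    "<cli_help>, <public_api>, or <dataclasses> blocks."
)


def build_context_sketch(
    cli_help: Optional[str] = None,
    public_api: Optional[str] = None,
    dataclasses: Optional[str] = None,
) -> str:
    """Wrap each non-empty source in its sentinel tag and join the blocks."""
    sources = [
        ("cli_help", cli_help),
        ("public_api", public_api),
        ("dataclasses", dataclasses),
    ]
    blocks = [
        f"<{tag}>\n{body.strip()}\n</{tag}>"
        for tag, body in sources
        if body and body.strip()
    ]
    return "\n\n".join(blocks)


def compose_prompts(
    system_prompt: str, rag_context: str, ground_truth: str
) -> Tuple[str, str]:
    """Prepend ground truth to the RAG context and suffix the anchor clause."""
    if not ground_truth:
        # Nothing extracted: leave both prompts untouched.
        return system_prompt, rag_context
    user_context = "\n\n".join(part for part in (ground_truth, rag_context) if part)
    return f"{system_prompt}\n\n{ANCHORING_CLAUSE_SKETCH}", user_context
```

Skipping empty sources keeps the anchoring clause from pointing the model at a block that carries no names.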
New package: src/attune_author/ground_truth/
- cli_help.py        subprocess (10s timeout) + LRU cache per (exe, sub, cwd)
- public_api.py      AST walk: __all__ + public function/class signatures
- dataclass_refs.py  AST walk: @dataclass field names + type strings
                     (named to avoid shadowing the stdlib module)
- budget.py          5KB cap; drop order dataclasses → public_api → cli_help
- config.py          [tool.attune-author.context-injection] schema
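The `cli_help.py` entry above describes a subprocess call with a 10s timeout behind an LRU cache keyed on (exe, sub, cwd). A rough sketch of that pattern follows. The function name `fetch_cli_help`, the use of a `--help` argument, and the empty-string failure value are assumptions, not the module's actual interface.

```python
import subprocess
import sys
from functools import lru_cache
from typing import Optional


@lru_cache(maxsize=64)
def fetch_cli_help(
    executable: str, subcommand: Optional[str], cwd: Optional[str]
) -> str:
    """Return `<executable> [subcommand] --help` output, or "" on failure.

    lru_cache keys on the (executable, subcommand, cwd) argument tuple, so a
    repeated lookup for the same feature does not spawn a second process.
    """
    argv = [executable]
    if subcommand:
        argv.append(subcommand)
    argv.append("--help")  # assumption: the CLI exposes help via --help
    try:
        result = subprocess.run(
            argv,
            cwd=cwd,
            capture_output=True,
            text=True,
            timeout=10,  # the 10s cap named in the commit message
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Missing executable or a hung process: contribute no context.
        return ""
    return result.stdout if result.returncode == 0 else ""


if __name__ == "__main__":
    # The Python interpreter itself is used here only as a stand-in CLI.
    print(fetch_cli_help(sys.executable, None, None)[:200])
```

In this sketch failures are cached as well, so a missing executable is probed only once per process.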
Wiring:
- Feature.cli_command optional field on the manifest model
(legacy manifests round-trip; save omits the field when None).
- build_polish_prompt (used by both sync and batch paths) gains
include_ground_truth_anchor. Cache key shifts when set so old
cached entries invalidate cleanly without bespoke plumbing.
- generator._maybe_polish and maintenance_batch._collect_polish_prompts
each build the ground-truth string once per feature and prepend it
to the RAG hook's existing augmented_context.
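The cache-key behaviour described for `include_ground_truth_anchor` can be illustrated with a short sketch. It assumes the cache key is a hash over the system prompt and user message, which this commit message implies but does not spell out. `build_prompt`, `cache_key`, and the `ANCHOR` text are hypothetical names, not the project's API.

```python
import hashlib
from typing import Tuple

# Hypothetical clause text, used only to show the key changing.
ANCHOR = "Only reference names that appear verbatim in the ground-truth blocks."


def build_prompt(
    base_system: str, user_message: str, include_ground_truth_anchor: bool = False
) -> Tuple[str, str]:
    """Return (system, user); the anchor is a suffix on the system prompt."""
    if include_ground_truth_anchor:
        return f"{base_system}\n\n{ANCHOR}", user_message
    return base_system, user_message


def cache_key(system: str, user_message: str) -> str:
    """Hash both prompt parts into one hex digest."""
    digest = hashlib.sha256()
    for part in (system, user_message):
        encoded = part.encode("utf-8")
        # Length-prefix each part so ("ab", "c") and ("a", "bc") hash differently.
        digest.update(len(encoded).to_bytes(8, "big"))
        digest.update(encoded)
    return digest.hexdigest()
```

Because the anchor changes the system prompt, entries cached before the flag was set miss on lookup, and no separate invalidation step is needed.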
Tests: 60 new tests under tests/unit/ground_truth/ covering each
extractor, budget drop order, config loading, build_context shape,
and the polish-prompt integration (anchor clause + sentinel-tag
embedding). Full suite: 896 passed, 37 pre-existing skips.
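The budget drop order exercised by these tests can be sketched as follows. This is an illustration only: `enforce_budget_sketch`, the dict-of-sections input, and the reading of "5KB" as 5 * 1024 bytes are assumptions, and the fallback behaviour that the budget tests cover is not modelled.

```python
import logging
from typing import Dict

logger = logging.getLogger("ground_truth.budget.sketch")

BUDGET_BYTES = 5 * 1024  # assumption: "5KB" read as 5120 bytes
DROP_ORDER = ("dataclasses", "public_api", "cli_help")  # documented drop order


def enforce_budget_sketch(
    sections: Dict[str, str], budget: int = BUDGET_BYTES
) -> Dict[str, str]:
    """Drop whole sections in DROP_ORDER until the UTF-8 size fits the budget."""
    kept = dict(sections)  # copy so the caller's mapping is not mutated

    def total() -> int:
        return sum(len(text.encode("utf-8")) for text in kept.values())

    for name in DROP_ORDER:
        if total() <= budget:
            break
        if name in kept:
            logger.warning("ground-truth budget exceeded; dropping %s", name)
            del kept[name]
    return kept
```

Dropping whole sections, rather than truncating them, means a block that survives still holds a complete set of names for the anchoring clause to refer to.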
Decisions captured during impl (see decisions.md):
- Compose with RAG instead of replacing it.
- Anchor clause as system-prompt suffix.
- Cache-key participation via system-prompt change.
- CLI flags + live-LLM acceptance + cost-delta deferred to follow-ups.
Spec: docs/specs/polish-fact-check/
Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
| 2.4 | Implement `ground_truth.extract_dataclasses(source_paths)` | attune-author | **done** | AST walk: `@dataclass` decorator + AnnAssign field collection. Module named `dataclass_refs` to avoid stdlib shadowing |
+| 2.5 | Add `<cli_help>`, `<public_api>`, `<dataclasses>` sentinel blocks to polish prompt builder | attune-author | **done** | Composed in `ground_truth.build_context`; prepended to RAG context when both exist |
+| 2.6 | Add system-prompt anchoring clause | attune-author | **done** | `ANCHORING_CLAUSE` exposed; appended via new `include_ground_truth_anchor` flag on `polish_template`/`build_polish_prompt`. Cache key shifts accordingly. |
+| 2.7 | Implement 5KB context budget enforcement with drop order | attune-author | **done** | `ground_truth.budget.enforce_budget`; drops dataclasses → public_api → cli_help; logs warning per drop |
+| 2.8 | Add `[tool.attune-author.context-injection]` config + CLI flags | attune-author | **done** | Config schema landed (enabled, per-source toggles, budget, executable); CLI flag deferred (env-driven defaults sufficient for first iteration) |
+| 2.9 | Test: ground-truth extractors produce expected output on ops-dashboard source | attune-author | **done** | 25 tests across `test_public_api.py` + `test_dataclass_refs.py` |
+| 2.10 | Test: polishing ops-dashboard with Phase 2 on, Phase 1 off recurs 0/3 high-severity errors | attune-author | **partial** | Unit-level: `test_polish_integration.py` asserts the sentinel blocks reach the user message and the anchor clause reaches the system prompt. Live-LLM acceptance run gated to a follow-up once an `ANTHROPIC_API_KEY` lane is available. |
+| 2.11 | Test: budget enforcement drops sources in documented order | attune-author | **done** | 8 tests in `test_budget.py` covering drop order, fallback, log emission |
+| 2.12 | Cost-delta measurement: 3-feature regression set with vs without Phase 2 | attune-author | deferred | Requires real-LLM run; defer to Phase 3 calibration when judge cost is also measured |
+| 2.13 | Update CHANGELOG + README | attune-author | **done** | CHANGELOG entry under Unreleased. README addition in same PR. |

### Phase 2 exit checklist

-- [ ] Tasks 2.1–2.13 done
-- [ ] 0/3 high-severity ops-dashboard errors recur in Phase-2-only polish
-- [ ] Cost delta < 10%
-- [ ] Spec status updated
+- [x] Tasks 2.1–2.11, 2.13 done (60 new tests)
+- [x] Spec status updated
+- [ ] Live acceptance: 0/3 high-severity ops-dashboard errors recur in
+  Phase-2-only polish (requires real-LLM run; gated to a follow-up
+  task once `ANTHROPIC_API_KEY` is available in a CI lane)