Merge pull request #491 from raifdmueller/fix/phase2-code-evidence-citations

rdmueller · web-flow · commit 00aa614f2b4b · 2026-05-17T13:11:37.000+02:00
fix(socratic-recovery): make synthesized docs self-contained, cite code not Q-IDs
diff --git a/docs/brownfield-workflow.adoc b/docs/brownfield-workflow.adoc
@@ -163,7 +163,7 @@ The LLM synthesizes the answered questions plus the code evidence from Phase 1 i
 * *arc42* with all 12 chapters from Q-3 branch
 * *Nygard ADRs* with Pugh Matrix from Q-3.9 branch
 
-Every claim references a Question ID and marks team-provided information with `(team answer)`. This dual traceability (code evidence + team input) is the key difference from a simple reverse-engineering prompt.
+Code-derived claims carry the `file:line` evidence from their `[ANSWERED]` leaf — a reference to the code, the only durable artifact; team-provided information is marked `(team answer)`. The Question Tree is temporary scaffolding, so Q-IDs are not written into the final documents — during synthesis every claim is traced back to a leaf as a build-time check. This dual traceability (code evidence + team input) is the key difference from a simple reverse-engineering prompt.
 
 === Establish Baseline Tests
 
@@ -261,7 +261,7 @@ Stable code that nobody touches does not need specs.
 |{empty}--
 
 |Theory Recovery (Phase 2)
-|`Synthesize documentation from the Question Tree and team answers. Every claim references a Q-ID. Mark team input with (team answer).`
+|`Synthesize self-contained documentation from the Question Tree and team answers. Cite file:line evidence for code-derived claims, mark team input with (team answer), keep deferred questions as explicit gaps. Q-IDs stay out of the output.`
 |link:#/spec-driven-development[Spec-Driven Workflow]
 
 |Baseline Tests
diff --git a/docs/brownfield-workflow.de.adoc b/docs/brownfield-workflow.de.adoc
@@ -161,7 +161,7 @@ Das LLM synthetisiert die beantworteten Fragen plus die Code-Evidenz aus Phase 1
 * *arc42* mit allen 12 Kapiteln aus dem Q3-Ast
 * *Nygard-ADRs* mit Pugh-Matrix aus dem Q3.9-Ast
 
-Jede Aussage referenziert eine Question-ID und markiert teamgegebene Information mit `(team answer)`. Diese doppelte Rückverfolgbarkeit (Code-Evidenz + Team-Input) ist der entscheidende Unterschied zu einem einfachen Reverse-Engineering-Prompt.
+Code-basierte Aussagen tragen die `file:line`-Evidenz aus ihrem `[ANSWERED]`-Leaf — eine Referenz auf den Code, das einzige dauerhafte Artefakt; teamgegebene Information wird mit `(team answer)` markiert. Der Question Tree ist temporäres Gerüst, daher landen Q-IDs nicht in den finalen Dokumenten — beim Synthetisieren wird jede Aussage als Build-Time-Prüfung auf ein Leaf zurückgeführt. Diese doppelte Rückverfolgbarkeit (Code-Evidenz + Team-Input) ist der entscheidende Unterschied zu einem einfachen Reverse-Engineering-Prompt.
 
 === Basis-Tests aufbauen
 
@@ -259,7 +259,7 @@ Stabiler Code, den niemand anfasst, braucht keine Specs.
 |{empty}--
 
 |Theory Recovery (Phase 2)
-|`Synthesize documentation from the Question Tree and team answers. Every claim references a Q-ID. Mark team input with (team answer).`
+|`Synthesize self-contained documentation from the Question Tree and team answers. Cite file:line evidence for code-derived claims, mark team input with (team answer), keep deferred questions as explicit gaps. Q-IDs stay out of the output.`
 |link:#/spec-driven-development[Spec-Driven Workflow]
 
 |Basis-Tests
diff --git a/docs/socratic-recovery-skill.adoc b/docs/socratic-recovery-skill.adoc
@@ -22,7 +22,7 @@ Outputs two AsciiDoc files: `QUESTION_TREE.adoc` (full reasoning trace) and `OPE
 
 === Phase 2 — Synthesize documentation
 
-The skill takes the answered tree and produces a PRD, Cockburn use cases, an arc42 architecture document, and Nygard ADRs with Pugh matrices. Every claim cites a Q-ID; team-supplied facts are marked `(team answer)`.
+The skill takes the answered tree and produces a PRD, Cockburn use cases, an arc42 architecture document, and Nygard ADRs with Pugh matrices. Code-derived claims cite the `file:line` evidence from their `[ANSWERED]` leaf, and team-supplied facts are marked `(team answer)`. The Question Tree is temporary scaffolding, so Q-IDs stay out of the final documents.
 
 == When to use it
 
@@ -87,7 +87,8 @@ https://github.com/LLM-Coding/Semantic-Anchors/tree/main/skill/socratic-code-the
 
 The skill enforces a two-phase workflow: build a Question Tree first
 ([ANSWERED] with code evidence vs [OPEN] with role), let the team answer
-the OPEN leaves, then synthesize documentation with full Q-ID traceability.
+the OPEN leaves, then synthesize self-contained documentation that traces
+every claim to code evidence or a team answer.
 ----
 
 === link:https://github.com/google-gemini/gemini-cli[Gemini CLI]
@@ -105,7 +106,9 @@ https://github.com/LLM-Coding/Semantic-Anchors/tree/main/skill/socratic-code-the
 Build a Question Tree before writing any documentation. Mark each leaf
 [ANSWERED] (with file:line evidence) or [OPEN] (with Category and Ask role).
 Synthesize docs from the answered tree only after the team has filled in
-the OPEN leaves. Cite Q-IDs in every claim.
+the OPEN leaves. The docs must be self-contained: cite file:line evidence
+for code-derived claims, mark team input with (team answer). Q-IDs stay
+out of the output.
 ----
 
 === link:https://docs.cursor.com/[Cursor]
@@ -138,8 +141,9 @@ Recovery workflow at
 https://github.com/LLM-Coding/Semantic-Anchors/tree/main/skill/socratic-code-theory-recovery
 
 Two phases: first a Question Tree separating code-derivable facts from
-open questions routed by role; second, synthesis with Q-ID traceability
-after the team fills the gaps.
+open questions routed by role; second, synthesis into self-contained
+documentation — code-evidenced or team-answered — after the team fills
+the gaps.
 ----
 
 === link:https://kiro.dev/[Amazon Kiro]
diff --git a/docs/socratic-recovery-skill.de.adoc b/docs/socratic-recovery-skill.de.adoc
@@ -22,7 +22,7 @@ Output sind zwei AsciiDoc-Dateien: `QUESTION_TREE.adoc` (vollständige Begründu
 
 === Phase 2 — Dokumentation synthetisieren
 
-Der Skill nimmt den beantworteten Baum und erzeugt ein PRD, Cockburn Use Cases, eine arc42-Architekturbeschreibung und Nygard-ADRs mit Pugh-Matrix. Jede Aussage zitiert eine Q-ID; team-gegebene Fakten sind mit `(team answer)` markiert.
+Der Skill nimmt den beantworteten Baum und erzeugt ein PRD, Cockburn Use Cases, eine arc42-Architekturbeschreibung und Nygard-ADRs mit Pugh-Matrix. Code-basierte Aussagen zitieren die `file:line`-Evidenz aus ihrem `[ANSWERED]`-Leaf, team-gegebene Fakten sind mit `(team answer)` markiert. Der Question Tree ist temporäres Gerüst, daher landen Q-IDs nicht in den finalen Dokumenten.
 
 == Wann zu verwenden
 
@@ -87,7 +87,8 @@ https://github.com/LLM-Coding/Semantic-Anchors/tree/main/skill/socratic-code-the
 
 The skill enforces a two-phase workflow: build a Question Tree first
 ([ANSWERED] with code evidence vs [OPEN] with role), let the team answer
-the OPEN leaves, then synthesize documentation with full Q-ID traceability.
+the OPEN leaves, then synthesize self-contained documentation that traces
+every claim to code evidence or a team answer.
 ----
 
 === link:https://github.com/google-gemini/gemini-cli[Gemini CLI]
@@ -105,7 +106,9 @@ https://github.com/LLM-Coding/Semantic-Anchors/tree/main/skill/socratic-code-the
 Build a Question Tree before writing any documentation. Mark each leaf
 [ANSWERED] (with file:line evidence) or [OPEN] (with Category and Ask role).
 Synthesize docs from the answered tree only after the team has filled in
-the OPEN leaves. Cite Q-IDs in every claim.
+the OPEN leaves. The docs must be self-contained: cite file:line evidence
+for code-derived claims, mark team input with (team answer). Q-IDs stay
+out of the output.
 ----
 
 === link:https://docs.cursor.com/[Cursor]
@@ -138,8 +141,9 @@ Recovery workflow at
 https://github.com/LLM-Coding/Semantic-Anchors/tree/main/skill/socratic-code-theory-recovery
 
 Two phases: first a Question Tree separating code-derivable facts from
-open questions routed by role; second, synthesis with Q-ID traceability
-after the team fills the gaps.
+open questions routed by role; second, synthesis into self-contained
+documentation — code-evidenced or team-answered — after the team fills
+the gaps.
 ----
 
 === link:https://kiro.dev/[Amazon Kiro]
diff --git a/plugins/semantic-anchors/skills/socratic-code-theory-recovery/SKILL.md b/plugins/semantic-anchors/skills/socratic-code-theory-recovery/SKILL.md
diff --git a/plugins/semantic-anchors/skills/socratic-code-theory-recovery/prompts/phase-2-synthesize.md b/plugins/semantic-anchors/skills/socratic-code-theory-recovery/prompts/phase-2-synthesize.md
diff --git a/plugins/semantic-anchors/skills/socratic-code-theory-recovery/references/output-schema.md b/plugins/semantic-anchors/skills/socratic-code-theory-recovery/references/output-schema.md
diff --git a/skill/socratic-code-theory-recovery/SKILL.md b/skill/socratic-code-theory-recovery/SKILL.md
@@ -65,7 +65,7 @@ The fix: model the gaps explicitly. Every question about the system is either `[
                   ┌────────────────────────────────┐
    Phase 2        │  Answered tree ──► Docs         │
                   │  PRD · Cockburn UCs · arc42 ·   │
-                  │  Nygard ADRs (every claim Q-ID) │
+                  │  Nygard ADRs (claims cite code) │
                   └────────────────────────────────┘
 ```
 
@@ -104,7 +104,7 @@ Use [prompts/phase-2-synthesize.md](prompts/phase-2-synthesize.md). The Phase 2
 - **arc42** with all 12 chapters from the Q3 branch
 - **Nygard ADRs** with Pugh Matrix from the Q3.9 branch
 
-Every claim references a Q-ID. Team-supplied information is marked `(team answer)`. This dual traceability — code evidence plus team input — is the difference from a simple reverse-engineering prompt that fills in gaps silently.
+Code-derived claims cite the `file:line` evidence from their `[ANSWERED]` leaf — a reference to the code, the only durable, canonical artifact. Team-supplied information is marked `(team answer)`. The Question Tree is temporary scaffolding, so its Q-IDs are not written into the final documents; during synthesis every claim is still traced back to a leaf as a build-time check. This dual traceability — code evidence plus team input — is the difference from a simple reverse-engineering prompt that fills in gaps silently.
 
 ## What the LLM can and cannot recover
 
diff --git a/skill/socratic-code-theory-recovery/prompts/phase-2-synthesize.md b/skill/socratic-code-theory-recovery/prompts/phase-2-synthesize.md
@@ -50,12 +50,23 @@ Produce four artifacts:
    - Anchor: ADR according to Nygard
 
 Rules for traceability:
-- Every paragraph references the Q-IDs that support it, in square brackets:
-  "The system uses Hexagonal Architecture [Q3.5]."
-- Team-supplied facts get an inline marker: "Sessions expire after 24 hours
-  (team answer, Q3.4.2)."
+- The synthesized documentation must be self-contained. The Question Tree
+  is temporary scaffolding — it is renumbered on every re-run — so Q-IDs
+  must NOT appear in the output. While synthesizing, trace every claim
+  back to a leaf: each claim must come from an [ANSWERED] leaf or an
+  answered [OPEN] leaf. This tracing is a build-time check, not something
+  written into the documents.
+- A claim backed by an [ANSWERED] leaf cites the code evidence from that
+  leaf — the reference to the code, the only durable, canonical artifact:
+  "The system uses Hexagonal Architecture [src/app/Ports.java,
+  src/adapter/JpaOrderRepository.java:30]."
+  Copy the Evidence line verbatim from the leaf; do not invent, shorten,
+  or re-derive file paths. A leaf with no Evidence line is not [ANSWERED]
+  and must not be cited as fact.
+- Team-supplied facts have no code evidence — mark them (team answer):
+  "Sessions expire after 24 hours (team answer)."
 - Deferred questions stay as explicit gaps: "Quality-goal priorities are
-  deferred (Q4.1.deferred) and must be resolved before the next release."
+  deferred and must be resolved before the next release."
 - Do not introduce facts that do not appear in QUESTION_TREE.adoc or
   OPEN_QUESTIONS.adoc. If a Section feels under-specified, leave it
   under-specified — that is signal, not a defect.
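
The traceability rules above lend themselves to a mechanical build-time check. The following is a minimal sketch, not part of the skill: `check_output`, `normalize`, and the `leaf_evidence` set are hypothetical names, the Q-ID pattern is inferred only from the examples in this prompt (`Q3.5`, `Q3.4.2`, `Q4.1.deferred`), and the sketch assumes the Evidence lines have already been extracted from `QUESTION_TREE.adoc` by some other step.

```python
import re

# Assumed Q-ID shape, inferred from the examples in the rules above.
Q_ID = re.compile(r"\bQ-?\d+(?:\.[A-Za-z0-9]+)*\b")
# Any bracketed span; treated as a citation only if it contains a dot (heuristic).
BRACKETED = re.compile(r"\[([^\[\]]+)\]")


def normalize(evidence: str) -> str:
    """Collapse whitespace so a citation wrapped across lines compares equal."""
    return " ".join(evidence.split())


def check_output(doc: str, leaf_evidence: set[str]) -> list[str]:
    """Return a list of rule violations found in the synthesized text."""
    problems = []
    # Rule: Q-IDs must not appear in the output.
    for match in Q_ID.finditer(doc):
        problems.append(f"Q-ID leaked into output: {match.group(0)}")
    # Rule: cited evidence is copied verbatim from an [ANSWERED] leaf.
    known = {normalize(e) for e in leaf_evidence}
    for match in BRACKETED.finditer(doc):
        cited = normalize(match.group(1))
        if "." in cited and cited not in known:
            problems.append(f"citation not found on any leaf: [{cited}]")
    return problems
```

Called with the synthesized text and the set of Evidence lines, an empty result means neither rule was violated. A shortened path such as `[JpaOrderRepository.java:30]` is reported, because it no longer matches the leaf verbatim.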
diff --git a/skill/socratic-code-theory-recovery/references/output-schema.md b/skill/socratic-code-theory-recovery/references/output-schema.md
@@ -141,13 +141,19 @@ _(write here)_
 
 ## Phase 2 traceability
 
-After Phase 2, every paragraph in the synthesized documentation cites at least one Q-ID:
+The synthesized documentation must be self-contained. The Question Tree is temporary scaffolding — it is renumbered on every re-run — so its Q-IDs are NOT carried into the final documents. During Phase 2, every claim is traced back to a leaf as a build-time check; what gets *written* is the durable reference only:
 
 ```
-The system uses Hexagonal Architecture [Q3.9.HexagonalArchitecture]. Sessions
-expire after 24 hours (team answer, Q3.8.Security.SessionLifetime).
-Quality-goal priorities are deferred (Q4.0.deferred) and must be resolved
+The system uses Hexagonal Architecture [src/app/Ports.java,
+src/adapter/JpaOrderRepository.java:30]. Sessions expire after 24 hours
+(team answer). Quality-goal priorities are deferred and must be resolved
 before the next release.
 ```
 
-This is the auditable trace from documentation back to either code evidence or a team answer. Anything without a Q-ID is invention.
+The three forms are deliberate:
+
+- `[file:line, ...]` — code-derived fact. Copied verbatim from the `Evidence` line of the `[ANSWERED]` leaf; it points at the code, the only canonical, persistent artifact.
+- `(team answer)` — team-supplied fact. No code evidence exists; the marker tells the reader a human asserted this and it must be re-verified with a human, not derived from code.
+- `deferred` — a known gap, stated explicitly, not a fact.
+
+This is the auditable trace: a code-derived claim without its `file:line` evidence is incomplete; a fact that is neither code-evidenced nor marked `(team answer)` is invention.
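
The three forms can be told apart with simple pattern matching, which is one way an audit of the synthesized documents might be automated. This is a rough sketch under stated assumptions: `classify` and the regular expression are illustrative and not part of the schema, and the heuristic only recognizes bracketed evidence that contains a file name with a letter-led extension.

```python
import re

# Heuristic: a bracketed span containing something like "Name.ext" or "Name.ext:30".
CODE_EVIDENCE = re.compile(r"\[[^\[\]]*\w\.[A-Za-z]\w*(?::\d+)?[^\[\]]*\]")
TEAM_MARKER = "(team answer)"
DEFERRED = re.compile(r"\bdeferred\b")


def classify(claim: str) -> str:
    """Map one claim to the form it uses, or flag it as unsupported."""
    if CODE_EVIDENCE.search(claim):
        return "code-derived"
    if TEAM_MARKER in claim:
        return "team answer"
    if DEFERRED.search(claim):
        return "deferred"
    # Neither code-evidenced, team-marked, nor an explicit gap.
    return "invention"


claims = [
    "The system uses Hexagonal Architecture [src/app/Ports.java,\n"
    "src/adapter/JpaOrderRepository.java:30].",
    "Sessions expire after 24 hours (team answer).",
    "Quality-goal priorities are deferred and must be resolved before the next release.",
]
forms = [classify(c) for c in claims]
```

A claim that falls through to `"invention"` is the case the schema warns about: a statement presented as fact with no code evidence and no `(team answer)` marker.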