Commit 00aa614

Merge pull request #491 from raifdmueller/fix/phase2-code-evidence-citations
fix(socratic-recovery): make synthesized docs self-contained, cite code not Q-IDs
2 parents e9b2066 + 7e79cef commit 00aa614

10 files changed

Lines changed: 80 additions & 38 deletions


docs/brownfield-workflow.adoc

Lines changed: 2 additions & 2 deletions
@@ -163,7 +163,7 @@ The LLM synthesizes the answered questions plus the code evidence from Phase 1 i
 * *arc42* with all 12 chapters from Q-3 branch
 * *Nygard ADRs* with Pugh Matrix from Q-3.9 branch

-Every claim references a Question ID and marks team-provided information with `(team answer)`. This dual traceability (code evidence + team input) is the key difference from a simple reverse-engineering prompt.
+Code-derived claims carry the `file:line` evidence from their `[ANSWERED]` leaf — a reference to the code, the only durable artifact; team-provided information is marked `(team answer)`. The Question Tree is temporary scaffolding, so Q-IDs are not written into the final documents — during synthesis every claim is traced back to a leaf as a build-time check. This dual traceability (code evidence + team input) is the key difference from a simple reverse-engineering prompt.

 === Establish Baseline Tests

@@ -261,7 +261,7 @@ Stable code that nobody touches does not need specs.
 |{empty}--

 |Theory Recovery (Phase 2)
-|`Synthesize documentation from the Question Tree and team answers. Every claim references a Q-ID. Mark team input with (team answer).`
+|`Synthesize self-contained documentation from the Question Tree and team answers. Cite file:line evidence for code-derived claims, mark team input with (team answer), keep deferred questions as explicit gaps. Q-IDs stay out of the output.`
 |link:#/spec-driven-development[Spec-Driven Workflow]

 |Baseline Tests

docs/brownfield-workflow.de.adoc

Lines changed: 2 additions & 2 deletions
@@ -161,7 +161,7 @@ Das LLM synthetisiert die beantworteten Fragen plus die Code-Evidenz aus Phase 1
 * *arc42* mit allen 12 Kapiteln aus dem Q3-Ast
 * *Nygard-ADRs* mit Pugh-Matrix aus dem Q3.9-Ast

-Jede Aussage referenziert eine Question-ID und markiert teamgegebene Information mit `(team answer)`. Diese doppelte Rückverfolgbarkeit (Code-Evidenz + Team-Input) ist der entscheidende Unterschied zu einem einfachen Reverse-Engineering-Prompt.
+Code-basierte Aussagen tragen die `file:line`-Evidenz aus ihrem `[ANSWERED]`-Leaf — eine Referenz auf den Code, das einzige dauerhafte Artefakt; teamgegebene Information wird mit `(team answer)` markiert. Der Question Tree ist temporäres Gerüst, daher landen Q-IDs nicht in den finalen Dokumenten — beim Synthetisieren wird jede Aussage als Build-Time-Prüfung auf ein Leaf zurückgeführt. Diese doppelte Rückverfolgbarkeit (Code-Evidenz + Team-Input) ist der entscheidende Unterschied zu einem einfachen Reverse-Engineering-Prompt.

 === Basis-Tests aufbauen

@@ -259,7 +259,7 @@ Stabiler Code, den niemand anfasst, braucht keine Specs.
 |{empty}--

 |Theory Recovery (Phase 2)
-|`Synthesize documentation from the Question Tree and team answers. Every claim references a Q-ID. Mark team input with (team answer).`
+|`Synthesize self-contained documentation from the Question Tree and team answers. Cite file:line evidence for code-derived claims, mark team input with (team answer), keep deferred questions as explicit gaps. Q-IDs stay out of the output.`
 |link:#/spec-driven-development[Spec-Driven Workflow]

 |Basis-Tests

docs/socratic-recovery-skill.adoc

Lines changed: 9 additions & 5 deletions
@@ -22,7 +22,7 @@ Outputs two AsciiDoc files: `QUESTION_TREE.adoc` (full reasoning trace) and `OPE

 === Phase 2 — Synthesize documentation

-The skill takes the answered tree and produces a PRD, Cockburn use cases, an arc42 architecture document, and Nygard ADRs with Pugh matrices. Every claim cites a Q-ID; team-supplied facts are marked `(team answer)`.
+The skill takes the answered tree and produces a PRD, Cockburn use cases, an arc42 architecture document, and Nygard ADRs with Pugh matrices. Code-derived claims cite the `file:line` evidence from their `[ANSWERED]` leaf, and team-supplied facts are marked `(team answer)`. The Question Tree is temporary scaffolding, so Q-IDs stay out of the final documents.

 == When to use it

@@ -87,7 +87,8 @@ https://github.com/LLM-Coding/Semantic-Anchors/tree/main/skill/socratic-code-the

 The skill enforces a two-phase workflow: build a Question Tree first
 ([ANSWERED] with code evidence vs [OPEN] with role), let the team answer
-the OPEN leaves, then synthesize documentation with full Q-ID traceability.
+the OPEN leaves, then synthesize self-contained documentation that traces
+every claim to code evidence or a team answer.
 ----

 === link:https://github.com/google-gemini/gemini-cli[Gemini CLI]
@@ -105,7 +106,9 @@ https://github.com/LLM-Coding/Semantic-Anchors/tree/main/skill/socratic-code-the
 Build a Question Tree before writing any documentation. Mark each leaf
 [ANSWERED] (with file:line evidence) or [OPEN] (with Category and Ask role).
 Synthesize docs from the answered tree only after the team has filled in
-the OPEN leaves. Cite Q-IDs in every claim.
+the OPEN leaves. The docs must be self-contained: cite file:line evidence
+for code-derived claims, mark team input with (team answer). Q-IDs stay
+out of the output.
 ----

 === link:https://docs.cursor.com/[Cursor]
@@ -138,8 +141,9 @@ Recovery workflow at
 https://github.com/LLM-Coding/Semantic-Anchors/tree/main/skill/socratic-code-theory-recovery

 Two phases: first a Question Tree separating code-derivable facts from
-open questions routed by role; second, synthesis with Q-ID traceability
-after the team fills the gaps.
+open questions routed by role; second, synthesis into self-contained
+documentation — code-evidenced or team-answered — after the team fills
+the gaps.
 ----

 === link:https://kiro.dev/[Amazon Kiro]

docs/socratic-recovery-skill.de.adoc

Lines changed: 9 additions & 5 deletions
@@ -22,7 +22,7 @@ Output sind zwei AsciiDoc-Dateien: `QUESTION_TREE.adoc` (vollständige Begründu

 === Phase 2 — Dokumentation synthetisieren

-Der Skill nimmt den beantworteten Baum und erzeugt ein PRD, Cockburn Use Cases, eine arc42-Architekturbeschreibung und Nygard-ADRs mit Pugh-Matrix. Jede Aussage zitiert eine Q-ID; team-gegebene Fakten sind mit `(team answer)` markiert.
+Der Skill nimmt den beantworteten Baum und erzeugt ein PRD, Cockburn Use Cases, eine arc42-Architekturbeschreibung und Nygard-ADRs mit Pugh-Matrix. Code-basierte Aussagen zitieren die `file:line`-Evidenz aus ihrem `[ANSWERED]`-Leaf, team-gegebene Fakten sind mit `(team answer)` markiert. Der Question Tree ist temporäres Gerüst, daher landen Q-IDs nicht in den finalen Dokumenten.

 == Wann zu verwenden

@@ -87,7 +87,8 @@ https://github.com/LLM-Coding/Semantic-Anchors/tree/main/skill/socratic-code-the

 The skill enforces a two-phase workflow: build a Question Tree first
 ([ANSWERED] with code evidence vs [OPEN] with role), let the team answer
-the OPEN leaves, then synthesize documentation with full Q-ID traceability.
+the OPEN leaves, then synthesize self-contained documentation that traces
+every claim to code evidence or a team answer.
 ----

 === link:https://github.com/google-gemini/gemini-cli[Gemini CLI]
@@ -105,7 +106,9 @@ https://github.com/LLM-Coding/Semantic-Anchors/tree/main/skill/socratic-code-the
 Build a Question Tree before writing any documentation. Mark each leaf
 [ANSWERED] (with file:line evidence) or [OPEN] (with Category and Ask role).
 Synthesize docs from the answered tree only after the team has filled in
-the OPEN leaves. Cite Q-IDs in every claim.
+the OPEN leaves. The docs must be self-contained: cite file:line evidence
+for code-derived claims, mark team input with (team answer). Q-IDs stay
+out of the output.
 ----

 === link:https://docs.cursor.com/[Cursor]
@@ -138,8 +141,9 @@ Recovery workflow at
 https://github.com/LLM-Coding/Semantic-Anchors/tree/main/skill/socratic-code-theory-recovery

 Two phases: first a Question Tree separating code-derivable facts from
-open questions routed by role; second, synthesis with Q-ID traceability
-after the team fills the gaps.
+open questions routed by role; second, synthesis into self-contained
+documentation — code-evidenced or team-answered — after the team fills
+the gaps.
 ----

 === link:https://kiro.dev/[Amazon Kiro]

plugins/semantic-anchors/skills/socratic-code-theory-recovery/SKILL.md

Lines changed: 2 additions & 2 deletions
Some generated files are not rendered by default.

plugins/semantic-anchors/skills/socratic-code-theory-recovery/prompts/phase-2-synthesize.md

Lines changed: 16 additions & 5 deletions
Some generated files are not rendered by default.

plugins/semantic-anchors/skills/socratic-code-theory-recovery/references/output-schema.md

Lines changed: 11 additions & 5 deletions
Some generated files are not rendered by default.

skill/socratic-code-theory-recovery/SKILL.md

Lines changed: 2 additions & 2 deletions
@@ -65,7 +65,7 @@ The fix: model the gaps explicitly. Every question about the system is either `[
 ┌────────────────────────────────┐
 Phase 2 │ Answered tree ──► Docs │
 │ PRD · Cockburn UCs · arc42 · │
-│ Nygard ADRs (every claim Q-ID) │
+│ Nygard ADRs (claims cite code) │
 └────────────────────────────────┘
 ```

@@ -104,7 +104,7 @@ Use [prompts/phase-2-synthesize.md](prompts/phase-2-synthesize.md). The Phase 2
 - **arc42** with all 12 chapters from the Q3 branch
 - **Nygard ADRs** with Pugh Matrix from the Q3.9 branch

-Every claim references a Q-ID. Team-supplied information is marked `(team answer)`. This dual traceability — code evidence plus team input — is the difference from a simple reverse-engineering prompt that fills in gaps silently.
+Code-derived claims cite the `file:line` evidence from their `[ANSWERED]` leaf — a reference to the code, the only durable, canonical artifact. Team-supplied information is marked `(team answer)`. The Question Tree is temporary scaffolding, so its Q-IDs are not written into the final documents; during synthesis every claim is still traced back to a leaf as a build-time check. This dual traceability — code evidence plus team input — is the difference from a simple reverse-engineering prompt that fills in gaps silently.

 ## What the LLM can and cannot recover

skill/socratic-code-theory-recovery/prompts/phase-2-synthesize.md

Lines changed: 16 additions & 5 deletions
@@ -50,12 +50,23 @@ Produce four artifacts:
 - Anchor: ADR according to Nygard

 Rules for traceability:
-- Every paragraph references the Q-IDs that support it, in square brackets:
-"The system uses Hexagonal Architecture [Q3.5]."
-- Team-supplied facts get an inline marker: "Sessions expire after 24 hours
-(team answer, Q3.4.2)."
+- The synthesized documentation must be self-contained. The Question Tree
+is temporary scaffolding — it is renumbered on every re-run — so Q-IDs
+must NOT appear in the output. While synthesizing, trace every claim
+back to a leaf: each claim must come from an [ANSWERED] leaf or an
+answered [OPEN] leaf. This tracing is a build-time check, not something
+written into the documents.
+- A claim backed by an [ANSWERED] leaf cites the code evidence from that
+leaf — the reference to the code, the only durable, canonical artifact:
+"The system uses Hexagonal Architecture [src/app/Ports.java,
+src/adapter/JpaOrderRepository.java:30]."
+Copy the Evidence line verbatim from the leaf; do not invent, shorten,
+or re-derive file paths. A leaf with no Evidence line is not [ANSWERED]
+and must not be cited as fact.
+- Team-supplied facts have no code evidence — mark them (team answer):
+"Sessions expire after 24 hours (team answer)."
 - Deferred questions stay as explicit gaps: "Quality-goal priorities are
-deferred (Q4.1.deferred) and must be resolved before the next release."
+deferred and must be resolved before the next release."
 - Do not introduce facts that do not appear in QUESTION_TREE.adoc or
 OPEN_QUESTIONS.adoc. If a Section feels under-specified, leave it
 under-specified — that is signal, not a defect.
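
The traceability rules in this hunk can be read as a small rendering function. The following is a minimal sketch only, not part of the commit: the `Leaf` record, its field names, the `"DEFERRED"` status value, and `render_claim` are hypothetical illustrations, and the skill itself expresses these rules as prompt text rather than code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Leaf:
    """Hypothetical in-memory form of one Question Tree leaf."""
    status: str                      # "ANSWERED", "OPEN", or "DEFERRED" (assumed values)
    statement: str                   # the sentence to write, without the final period
    evidence: Optional[str] = None   # Evidence line copied from the leaf, if any
    team_answer: bool = False        # True once the team has answered an OPEN leaf


def render_claim(leaf: Leaf) -> str:
    """Render one sentence for the synthesized docs. No Q-ID is ever emitted."""
    if leaf.status == "ANSWERED":
        # A leaf with no Evidence line is not [ANSWERED] and must not be cited as fact.
        if not leaf.evidence:
            raise ValueError("leaf without an Evidence line is not [ANSWERED]")
        # The Evidence line is copied verbatim: no shortening, no re-derived paths.
        return f"{leaf.statement} [{leaf.evidence}]."
    if leaf.status == "OPEN" and leaf.team_answer:
        # Team-supplied facts have no code evidence, only the marker.
        return f"{leaf.statement} (team answer)."
    if leaf.status == "DEFERRED":
        # Deferred questions stay as explicit gaps; the statement names the gap.
        return f"{leaf.statement}."
    raise ValueError("an unanswered [OPEN] leaf cannot be stated as fact")
```

Under these assumptions, `render_claim(Leaf("OPEN", "Sessions expire after 24 hours", team_answer=True))` yields the team-answer form quoted in the rules, while an `ANSWERED` leaf that lacks evidence raises instead of silently producing an uncited fact.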

skill/socratic-code-theory-recovery/references/output-schema.md

Lines changed: 11 additions & 5 deletions
@@ -141,13 +141,19 @@ _(write here)_

 ## Phase 2 traceability

-After Phase 2, every paragraph in the synthesized documentation cites at least one Q-ID:
+The synthesized documentation must be self-contained. The Question Tree is temporary scaffolding — it is renumbered on every re-run — so its Q-IDs are NOT carried into the final documents. During Phase 2, every claim is traced back to a leaf as a build-time check; what gets *written* is the durable reference only:

 ```
-The system uses Hexagonal Architecture [Q3.9.HexagonalArchitecture]. Sessions
-expire after 24 hours (team answer, Q3.8.Security.SessionLifetime).
-Quality-goal priorities are deferred (Q4.0.deferred) and must be resolved
+The system uses Hexagonal Architecture [src/app/Ports.java,
+src/adapter/JpaOrderRepository.java:30]. Sessions expire after 24 hours
+(team answer). Quality-goal priorities are deferred and must be resolved
 before the next release.
 ```

-This is the auditable trace from documentation back to either code evidence or a team answer. Anything without a Q-ID is invention.
+The three forms are deliberate:
+
+- `[file:line, ...]` — code-derived fact. Copied verbatim from the `Evidence` line of the `[ANSWERED]` leaf; it points at the code, the only canonical, persistent artifact.
+- `(team answer)` — team-supplied fact. No code evidence exists; the marker tells the reader a human asserted this and it must be re-verified with a human, not derived from code.
+- `deferred` — a known gap, stated explicitly, not a fact.
+
+This is the auditable trace: a code-derived claim without its `file:line` evidence is incomplete; a fact that is neither code-evidenced nor marked `(team answer)` is invention.
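
The three written forms lend themselves to a mechanical check on synthesized output. The sketch below is an illustration under stated assumptions, not something this commit ships: the regular expressions are guesses based only on the Q-ID and citation examples visible in this diff (`Q3.5`, `Q4.1.deferred`, `[src/app/Ports.java, ...:30]`), and the function names are hypothetical.

```python
import re
from typing import List

# Assumed Q-ID shape, inferred from the examples in this diff: Q3.5, Q-3.9, Q4.1.deferred.
Q_ID = re.compile(r"\bQ-?\d+(?:\.\w+)*")
# A bracketed group, and one path-like item inside it with an optional :line suffix.
BRACKET = re.compile(r"\[([^\[\]]+)\]")
PATH_ITEM = re.compile(r"[\w./-]+\.[A-Za-z]\w*(?::\d+)?")


def find_q_ids(text: str) -> List[str]:
    """Return every Q-ID-looking token; the synthesized docs should contain none."""
    return Q_ID.findall(text)


def has_code_citation(sentence: str) -> bool:
    """True if some bracket group consists only of path-like items."""
    for group in BRACKET.findall(sentence):
        items = [item.strip() for item in group.split(",")]
        if all(PATH_ITEM.fullmatch(item) for item in items):
            return True
    return False


def classify(sentence: str) -> str:
    """Sort a sentence into one of the written forms, or flag it."""
    if find_q_ids(sentence):
        return "q-id"       # scaffolding leaked into the output
    if has_code_citation(sentence):
        return "code"       # code-derived fact with file:line evidence
    if "(team answer)" in sentence:
        return "team"       # human-asserted fact
    if "deferred" in sentence:
        return "gap"        # explicit known gap, not a fact
    return "unmarked"       # neither evidenced nor marked
```

A sentence classified as `"unmarked"` corresponds to what the schema calls invention, and `"q-id"` flags the old citation style this commit removes. The patterns are deliberately loose: a source file whose name happens to look like a Q-ID would be misreported, so the result is a prompt for review rather than a verdict.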
