Commit 2acdc80

Merge pull request #464 from raifdmueller/main
Sync from fork: host Brownfield Experiment & Fair Comparison reports on-site
2 parents 4cd2da3 + a1ce7f6 commit 2acdc80

7 files changed

Lines changed: 850 additions & 2 deletions

docs/brownfield-experiment-report.adoc

Lines changed: 671 additions & 0 deletions
Large diffs are not rendered by default.

docs/brownfield-fair-comparison.adoc

Lines changed: 127 additions & 0 deletions
@@ -0,0 +1,127 @@
+= Fair Comparison: Three Approaches with Team Answers
+:toc: left
+:toclevels: 3
+:sectnums:
+:icons: font
+
+== Context
+
+The previous Two-Phase report had a validity problem: the Two-Phase approach received 11 team-answered Open Questions while Direct and Socratic did not. This made the comparison unfair.
+
+To fix this, we ran follow-up prompts on both the Direct and Socratic experiments, providing the same team answers. All three approaches now have identical information. The comparison below measures the value of the *structure* (template-based vs. question-tree vs. two-phase), not the value of the answers.
+
+== Results After Team Answers
+
+[cols="3,2,2,2,2",options="header"]
+|===
+| Metric | Original | Direct | Socratic | Two-Phase
+
+| Total lines (adoc) | 11,756 | 3,886 | 2,481 | 4,083
+| Compression vs. Original | 100% | 33% | 21% | 35%
+| ADRs | 5 | 7 | 3 | 5
+| ADR topics match Original | — | No | No | *Yes*
+| Quality goal priorities | Yes | Yes (6, expanded) | Yes (3, correct) | Yes (3, correct)
+| Performance budgets (Ch. 7) | Yes | Yes | Yes | Yes
+| Threat model (3 boundaries) | No (separate doc) | *Yes (inline)* | No | No
+| Team answer markers | 0 | 26 | 35 | 50
+| Q-ID traceability | 0 | 101 | 123 | 109
+| Open Questions remaining | — | 0 | 0 | 0
+| Competitive context | 4 mentions | 2 | 2 | 2
+|===
+
+All three approaches now have performance budgets, quality goal priorities, and zero remaining Open Questions. The differences are structural.
+
+== What Each Approach Does Best
+
+=== Direct: Broadest Coverage
+
+The Direct approach produced the most ADRs (7, including a new ADR-007 for the layout engine created from the team answer) and is the only version that documents the threat model with 3 explicit trust boundaries inline in Chapter 10. It has 101 Q-ID references despite not starting with a Question Tree — the follow-up prompt added them retroactively.
+
+The trade-off: 7 ADRs means 2 extra ADRs that weren't in the Original. The Direct approach *over-generates* when given information — it creates new artifacts rather than just integrating answers.
+
+=== Socratic: Most Efficient
+
+At 2,481 lines (21% of Original), the Socratic approach achieves the highest Q-ID density (123 references) and strong team-answer traceability (35 markers) with the least text. It is the most concise version that still covers all essential content.
+
+The trade-off: only 3 ADRs (the Question Tree identified fewer decision points), and no threat model documentation. The Socratic approach is *selective* — it documents only what the Question Tree covered, and the tree didn't branch into security narrative.
+
+=== Two-Phase: Highest Fidelity
+
+The Two-Phase approach is the only version where the ADR topics match the Original exactly (5 ADRs, correct subjects, correct status including ADR-004 Rejected). It has the most team-answer markers (50) and a resolution log in OPEN_QUESTIONS.adoc mapping each answer to its landing page.
+
+The trade-off: no threat model (same as Socratic), and 35% compression vs. Original is less efficient than Socratic's 21%.
+
+== Structural Differences That Persist
+
+Even with identical information, the three approaches produce structurally different output:
+
+[cols="2,2,2,2",options="header"]
+|===
+| Dimension | Direct | Socratic | Two-Phase
+
+| ADR generation | Over-generates (7) | Under-generates (3) | Matches Original (5)
+| Threat model | Included | Missing | Missing
+| Answer integration | Inline updates | Question Tree + inline | Resolution log + inline
+| Traceability style | Retroactive Q-IDs | Native Q-IDs | Native Q-IDs + OQ markers
+| Volume control | Medium (33%) | Tight (21%) | Medium (35%)
+|===
+
+=== Why ADR fidelity differs
+
+The Direct approach sees each team answer as an opportunity to create or expand an artifact. When it received OQ-022 (layout engine rationale), it created a new ADR-007. The Two-Phase approach, guided by OQ-4 ("which ADRs exist?"), already knew there were exactly 5 and stuck to them. The Socratic approach only created ADRs for decisions its Question Tree branched into.
+
+This is the core structural difference: *the Question Tree constrains the output*. Without it, the LLM follows its own judgment about what deserves an ADR. With it, the LLM follows the tree's decomposition.
+
+=== Why the threat model only appears in Direct
+
+The Direct approach received OQ-053 (threat model) as a standalone answer and integrated it into Chapter 10. The Socratic and Two-Phase approaches had equivalent information (OQ-7 / Q-4.7.2) but placed security coverage differently — in quality scenarios rather than as a dedicated threat-model section. This suggests the *placement* of security information is a prompt-design issue, not an information issue. All three have the same facts; only Direct has a named "Threat Model" section.
+
+== Lessons Learned
+
+=== The value of the Question Tree
+
+The Question Tree doesn't just improve honesty (Experiment 1c finding). It also *constrains output fidelity*. The Two-Phase approach matched the Original's ADR structure precisely because Phase 1 asked "which ADRs exist?" and the team answer locked in the 5 topics. Without this constraint, the Direct approach hallucinated 2 extra ADRs.
+
+=== Team answers close the same gaps regardless of approach
+
+All three approaches achieved:
+
+* Zero remaining Open Questions
+* Performance budgets in Chapter 7
+* Quality goal priorities in Chapter 1
+* Correct competitive context in PRD
+
+This confirms that the team answers, not the approach structure, determine information completeness. The structure determines *how well the information is organized and traceable*.
+
+=== Traceability is a function of process, not information
+
+[cols="2,1,1,1",options="header"]
+|===
+| Traceability type | Direct | Socratic | Two-Phase
+
+| Team answer markers | 26 | 35 | 50
+| Q-ID references | 101 | 123 | 109
+| Resolution log | No | No | Yes
+|===
+
+Two-Phase has the most team-answer markers because the Phase 2 prompt *required* marking every team-provided claim. Socratic has the most Q-IDs because the Question Tree *is* the documentation structure. Direct has fewer of both because traceability was added retroactively, not built into the process.
+
+== Recommendation
+
+[cols="3,2",options="header"]
+|===
+| Scenario | Recommended Approach
+
+| Quick documentation, no team access | Direct (broadest coverage from code alone)
+| Identifying knowledge gaps for team | Socratic Phase 1 (cheapest way to produce targeted questions)
+| Production-quality Brownfield docs | Two-Phase (highest ADR fidelity, best traceability)
+| Security-critical projects | Direct (only version with inline threat model)
+| Maximum conciseness | Socratic (21% of Original, all essentials covered)
+|===
+
+For most Brownfield projects preparing for the Dark Factory, the recommended workflow is:
+
+. *Socratic Phase 1* to identify the 10-15 questions the team must answer
+. *Team answers* the questions (routed by Ask role)
+. *Two-Phase Phase 2* to produce documentation with Q-ID traceability and team-answer markers
+. *Direct follow-up* for security-specific sections (threat model, trust boundaries) if needed
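
Reviewer note: the "Compression vs. Original" row and the "highest Q-ID density" claim in the Fair Comparison report can be rechecked from the line counts and Q-ID counts in its tables. The sketch below is only an illustration of that arithmetic; `counts`, `compressionPercent`, and `qidsPerThousandLines` are names chosen here and do not appear in the commit, and "density" is assumed to mean Q-ID references per 1,000 lines.

```javascript
// Figures copied from the report's results table (total lines and Q-ID references).
const counts = {
  original: { lines: 11756, qids: 0 },
  direct: { lines: 3886, qids: 101 },
  socratic: { lines: 2481, qids: 123 },
  twoPhase: { lines: 4083, qids: 109 },
}

// Size of a regenerated doc set as a rounded percentage of the Original.
function compressionPercent(lines, originalLines) {
  return Math.round((lines / originalLines) * 100)
}

// Assumed definition of density: Q-ID references per 1,000 lines, one decimal place.
function qidsPerThousandLines(qids, lines) {
  return Math.round((qids / lines) * 10000) / 10
}

const summary = Object.fromEntries(
  Object.entries(counts).map(([name, { lines, qids }]) => [
    name,
    {
      compression: compressionPercent(lines, counts.original.lines),
      qidDensity: qidsPerThousandLines(qids, lines),
    },
  ])
)

console.log(summary)
```

Under these assumptions the percentages come out as 33, 21, and 35, matching the table, and Socratic's density (about 49.6 per 1,000 lines) is roughly double that of Direct (26.0) and Two-Phase (26.7).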

docs/brownfield-workflow.adoc

Lines changed: 2 additions & 2 deletions
@@ -234,5 +234,5 @@ If the system cannot be built or started, you have a different problem -- fix th
 * Eric Evans, https://www.domainlanguage.com/ddd/[Domain-Driven Design] -- the foundational work on bounded contexts and strategic design.
 * Michael Feathers, _Working Effectively with Legacy Code_ -- techniques for establishing test coverage in systems without tests.
 * Peter Naur, "Programming as Theory Building" (1985) -- argues that programming is about building a mental model ("theory") that cannot be fully captured in documentation. Socratic Code Theory Recovery tests this claim in the context of LLM-generated code.
-* https://github.com/rdmueller/personalAssistant/blob/main/resources/brownfield-experiment-report.adoc[Brownfield Experiment Report] -- controlled experiment: delete documentation from a greenfield project, regenerate from code, compare. Full methodology and findings.
-* https://github.com/rdmueller/personalAssistant/blob/main/resources/brownfield-fair-comparison.adoc[Fair Comparison Report] -- three approaches (Direct, Socratic, Two-Phase) with identical team answers. Measures the structural value of the Question Tree.
+* link:#/brownfield-experiment-report[Brownfield Experiment Report] -- controlled experiment: delete documentation from a greenfield project, regenerate from code, compare. Full methodology and findings.
+* link:#/brownfield-fair-comparison[Fair Comparison Report] -- three approaches (Direct, Socratic, Two-Phase) with identical team answers. Measures the structural value of the Question Tree.

scripts/prerender-routes.js

Lines changed: 14 additions & 0 deletions
@@ -58,6 +58,20 @@ const ROUTES = [
     description:
       'Applying semantic anchors to brownfield codebases using a bounded-context approach.',
   },
+  {
+    path: '/brownfield-experiment-report',
+    fragment: 'docs/brownfield-experiment-report.html',
+    title: 'Brownfield Experiment 1a Report — Semantic Anchors',
+    description:
+      'Controlled experiment: delete documentation from a greenfield project, regenerate from code, compare. Methodology, findings, and the Brownfield Preparation Checklist.',
+  },
+  {
+    path: '/brownfield-fair-comparison',
+    fragment: 'docs/brownfield-fair-comparison.html',
+    title: 'Brownfield Fair Comparison — Semantic Anchors',
+    description:
+      'Three approaches (Direct, Socratic, Two-Phase) compared with identical team answers. Measures the structural value of the Question Tree, not the answers.',
+  },
   {
     path: '/contracts',
     fragment: 'docs/contracts.html',

scripts/render-docs.js

Lines changed: 10 additions & 0 deletions
@@ -93,6 +93,16 @@ renderFile(
   path.join(WEB_DOCS, 'brownfield-workflow.de.html')
 )
 
+renderFile(
+  path.join(ROOT, 'docs/brownfield-experiment-report.adoc'),
+  path.join(WEB_DOCS, 'brownfield-experiment-report.html')
+)
+
+renderFile(
+  path.join(ROOT, 'docs/brownfield-fair-comparison.adoc'),
+  path.join(WEB_DOCS, 'brownfield-fair-comparison.html')
+)
+
 renderFile(
   path.join(ROOT, 'docs/anchor-evaluations.adoc'),
   path.join(WEB_DOCS, 'anchor-evaluations.html')
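
Reviewer note: each new `fragment` in `scripts/prerender-routes.js` only resolves if `scripts/render-docs.js` produces a matching HTML file. The following is a minimal sketch of a consistency check between the two lists, not code from this commit. `findUnrenderedFragments` is a hypothetical helper, and it assumes fragments are addressed as `docs/<output file name>`, which holds for the entries shown in these hunks but is not confirmed for the rest of the repository.

```javascript
// Fragments from the two ROUTES entries added in scripts/prerender-routes.js.
const prerenderFragments = [
  'docs/brownfield-experiment-report.html',
  'docs/brownfield-fair-comparison.html',
]

// Output file names from the two renderFile() calls added in scripts/render-docs.js
// (there, each name is joined onto WEB_DOCS).
const renderedOutputs = [
  'brownfield-experiment-report.html',
  'brownfield-fair-comparison.html',
]

// Hypothetical helper: list the fragments that no renderFile() call produces.
// Assumes a fragment is always `docs/` followed by the output file name.
function findUnrenderedFragments(fragments, outputs) {
  const produced = new Set(outputs.map((name) => `docs/${name}`))
  return fragments.filter((fragment) => !produced.has(fragment))
}

const missing = findUnrenderedFragments(prerenderFragments, renderedOutputs)
console.log(missing.length === 0 ? 'all fragments rendered' : missing)
```

A check like this could run in CI so that a route added to one script without the other fails early rather than at page load.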

website/src/main.js

Lines changed: 24 additions & 0 deletions
@@ -149,6 +149,8 @@ function initApp() {
   addRoute('/spec-driven-development', renderWorkflowPage)
   addRoute('/workflow', () => navigate('/spec-driven-development', { replace: true }))
   addRoute('/brownfield', renderBrownfieldPage)
+  addRoute('/brownfield-experiment-report', renderBrownfieldExperimentReportPage)
+  addRoute('/brownfield-fair-comparison', renderBrownfieldFairComparisonPage)
   addRoute('/contracts', renderContractsPageHandler)
   addRoute('/evaluations', renderEvaluationsPage)
 
@@ -277,6 +279,24 @@ function renderBrownfieldPage() {
   loadDocContent('docs/brownfield-workflow.adoc')
 }
 
+function renderBrownfieldExperimentReportPage() {
+  const pageContent = document.getElementById('page-content')
+  if (!pageContent) return
+
+  pageContent.innerHTML = renderDocPage()
+  updateActiveNavLink()
+  loadDocContent('docs/brownfield-experiment-report.adoc')
+}
+
+function renderBrownfieldFairComparisonPage() {
+  const pageContent = document.getElementById('page-content')
+  if (!pageContent) return
+
+  pageContent.innerHTML = renderDocPage()
+  updateActiveNavLink()
+  loadDocContent('docs/brownfield-fair-comparison.adoc')
+}
+
 function renderContractsPageHandler() {
   const pageContent = document.getElementById('page-content')
   if (!pageContent) return
@@ -504,6 +524,10 @@ function handleLanguageChange() {
     loadDocContent('docs/spec-driven-workflow.adoc')
   } else if (currentRoute === '/brownfield') {
     loadDocContent('docs/brownfield-workflow.adoc')
+  } else if (currentRoute === '/brownfield-experiment-report') {
+    loadDocContent('docs/brownfield-experiment-report.adoc')
+  } else if (currentRoute === '/brownfield-fair-comparison') {
+    loadDocContent('docs/brownfield-fair-comparison.adoc')
   } else if (currentRoute === '/') {
     initCardGridVisualization()
   }
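
Reviewer note: the two new page handlers and the two new `handleLanguageChange` branches repeat the same steps with a different `.adoc` path. One possible follow-up, sketched below and not part of this commit, is a route-to-document table plus a small factory. `DOC_ROUTES`, `makeDocPageRenderer`, and the injected `deps` object are hypothetical names; the helpers are passed in as stubs so the snippet runs outside the browser, and their real signatures in `main.js` are assumed from the hunks above.

```javascript
// Hypothetical route-to-document table, using the paths that appear in this commit.
const DOC_ROUTES = {
  '/brownfield': 'docs/brownfield-workflow.adoc',
  '/brownfield-experiment-report': 'docs/brownfield-experiment-report.adoc',
  '/brownfield-fair-comparison': 'docs/brownfield-fair-comparison.adoc',
}

// Builds a page handler that performs the same steps as
// renderBrownfieldFairComparisonPage(). `deps` stands in for the
// module-level helpers in main.js (assumed shape).
function makeDocPageRenderer(docPath, deps) {
  return function renderDocRoute() {
    const pageContent = deps.getPageContent()
    if (!pageContent) return

    pageContent.innerHTML = deps.renderDocPage()
    deps.updateActiveNavLink()
    deps.loadDocContent(docPath)
  }
}

// Stub dependencies so the sketch can be exercised without a DOM.
const loadedDocs = []
const fakePage = { innerHTML: '' }
const deps = {
  getPageContent: () => fakePage,
  renderDocPage: () => '<article id="doc"></article>',
  updateActiveNavLink: () => {},
  loadDocContent: (docPath) => loadedDocs.push(docPath),
}

// One handler per route; in main.js each could be passed to addRoute(route, handler).
const handlers = Object.fromEntries(
  Object.entries(DOC_ROUTES).map(([route, docPath]) => [
    route,
    makeDocPageRenderer(docPath, deps),
  ])
)

handlers['/brownfield-fair-comparison']()
console.log(loadedDocs)
```

With such a table, the language-change branch could reduce to a single lookup of `DOC_ROUTES[currentRoute]`, so a new report page would need one table entry instead of a new function and a new `else if`.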

website/src/utils/router.js

Lines changed: 2 additions & 0 deletions
@@ -18,6 +18,8 @@ const ROUTE_TITLES = {
   '/contracts': 'Semantic Contracts — Semantic Anchors',
   '/spec-driven-development': 'Spec-Driven Development with Semantic Anchors',
   '/brownfield': 'Brownfield Workflow — Semantic Anchors',
+  '/brownfield-experiment-report': 'Brownfield Experiment 1a Report — Semantic Anchors',
+  '/brownfield-fair-comparison': 'Brownfield Fair Comparison — Semantic Anchors',
   '/evaluations': 'Evaluations — Semantic Anchors',
   '/contributing': 'Contributing — Semantic Anchors',
   '/changelog': 'Changelog — Semantic Anchors',
