Merged
671 changes: 671 additions & 0 deletions docs/brownfield-experiment-report.adoc

Large diffs are not rendered by default.

127 changes: 127 additions & 0 deletions docs/brownfield-fair-comparison.adoc
@@ -0,0 +1,127 @@
= Fair Comparison: Three Approaches with Team Answers
:toc: left
:toclevels: 3
:sectnums:
:icons: font

== Context

The previous Two-Phase report had a validity problem: the Two-Phase approach received 11 team-answered Open Questions while Direct and Socratic did not. This made the comparison unfair.

To fix this, we ran follow-up prompts on both the Direct and Socratic experiments, providing the same team answers. All three approaches now have identical information. The comparison below measures the value of the *structure* (template-based vs. question-tree vs. two-phase), not the value of the answers.

== Results After Team Answers

[cols="3,2,2,2,2",options="header"]
|===
| Metric | Original | Direct | Socratic | Two-Phase

| Total lines (adoc) | 11,756 | 3,886 | 2,481 | 4,083
| Compression vs. Original | 100% | 33% | 21% | 35%
| ADRs | 5 | 7 | 3 | 5
| ADR topics match Original | — | No | No | *Yes*
| Quality goal priorities | Yes | Yes (6, expanded) | Yes (3, correct) | Yes (3, correct)
| Performance budgets (Ch. 7) | Yes | Yes | Yes | Yes
| Threat model (3 boundaries) | No (separate doc) | *Yes (inline)* | No | No
| Team answer markers | 0 | 26 | 35 | 50
| Q-ID traceability | 0 | 101 | 123 | 109
| Open Questions remaining | — | 0 | 0 | 0
| Competitive context | 4 mentions | 2 | 2 | 2
|===

All three approaches now have performance budgets, quality goal priorities, and zero remaining Open Questions. The differences are structural.
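
The "Compression vs. Original" row can be recomputed directly from the line counts in the table, as a quick sanity check (integer rounding):

```javascript
// Sanity check for the "Compression vs. Original" row: each approach's
// line count as a rounded percentage of the Original's 11,756 lines.
const ORIGINAL_LINES = 11756
const pctOfOriginal = (lines) => Math.round((lines / ORIGINAL_LINES) * 100)

console.log(pctOfOriginal(3886)) // Direct    -> 33
console.log(pctOfOriginal(2481)) // Socratic  -> 21
console.log(pctOfOriginal(4083)) // Two-Phase -> 35
```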

== What Each Approach Does Best

=== Direct: Broadest Coverage

The Direct approach produced the most ADRs (7, including a new ADR-007 for the layout engine created from the team answer) and is the only version that documents the threat model with 3 explicit trust boundaries inline in Chapter 10. It has 101 Q-ID references despite not starting with a Question Tree — the follow-up prompt added them retroactively.

The trade-off: two of the seven ADRs have no counterpart in the Original. The Direct approach *over-generates* when given information: it creates new artifacts rather than just integrating answers.

=== Socratic: Most Efficient

At 2,481 lines (21% of Original), the Socratic approach achieves the highest Q-ID density (123 references) and strong team-answer traceability (35 markers) with the least text. It is the most concise version that still covers all essential content.

The trade-off: only 3 ADRs (the Question Tree identified fewer decision points), and no threat model documentation. The Socratic approach is *selective* — it documents only what the Question Tree covered, and the tree didn't branch into security narrative.

=== Two-Phase: Highest Fidelity

The Two-Phase approach is the only version where the ADR topics match the Original exactly (5 ADRs, correct subjects, correct status including ADR-004 Rejected). It has the most team-answer markers (50) and a resolution log in OPEN_QUESTIONS.adoc mapping each answer to its landing page.

The trade-off: no threat model (same as Socratic), and 35% compression vs. Original is less efficient than Socratic's 21%.

== Structural Differences That Persist

Even with identical information, the three approaches produce structurally different output:

[cols="2,2,2,2",options="header"]
|===
| Dimension | Direct | Socratic | Two-Phase

| ADR generation | Over-generates (7) | Under-generates (3) | Matches Original (5)
| Threat model | Included | Missing | Missing
| Answer integration | Inline updates | Question Tree + inline | Resolution log + inline
| Traceability style | Retroactive Q-IDs | Native Q-IDs | Native Q-IDs + OQ markers
| Volume control | Medium (33%) | Tight (21%) | Medium (35%)
|===

=== Why ADR fidelity differs

The Direct approach sees each team answer as an opportunity to create or expand an artifact. When it received OQ-022 (layout engine rationale), it created a new ADR-007. The Two-Phase approach, guided by OQ-4 ("which ADRs exist?"), already knew there were exactly 5 and stuck to them. The Socratic approach only created ADRs for decisions its Question Tree branched into.

This is the core structural difference: *the Question Tree constrains the output*. Without it, the LLM follows its own judgment about what deserves an ADR. With it, the LLM follows the tree's decomposition.

=== Why the threat model only appears in Direct

The Direct approach received OQ-053 (threat model) as a standalone answer and integrated it into Chapter 10. The Socratic and Two-Phase approaches had equivalent information (OQ-7 / Q-4.7.2) but placed security coverage differently — in quality scenarios rather than as a dedicated threat-model section. This suggests the *placement* of security information is a prompt-design issue, not an information issue. All three have the same facts; only Direct has a named "Threat Model" section.

== Lessons Learned

=== The value of the Question Tree

The Question Tree doesn't just improve honesty (Experiment 1c finding). It also *constrains output fidelity*. The Two-Phase approach matched the Original's ADR structure precisely because Phase 1 asked "which ADRs exist?" and the team answer locked in the 5 topics. Without this constraint, the Direct approach produced 2 extra ADRs with no counterpart in the Original.

=== Team answers close the same gaps regardless of approach

All three approaches achieved:

* Zero remaining Open Questions
* Performance budgets in Chapter 7
* Quality goal priorities in Chapter 1
* Correct competitive context in PRD

This confirms that the team answers, not the approach structure, determine information completeness. The structure determines *how well the information is organized and traceable*.

=== Traceability is a function of process, not information

[cols="2,1,1,1",options="header"]
|===
| Traceability type | Direct | Socratic | Two-Phase

| Team answer markers | 26 | 35 | 50
| Q-ID references | 101 | 123 | 109
| Resolution log | No | No | Yes
|===

Two-Phase has the most team-answer markers because the Phase 2 prompt *required* marking every team-provided claim. Socratic has the most Q-IDs because the Question Tree *is* the documentation structure. Direct has fewer of both because traceability was added retroactively, not built into the process.
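
As a rough illustration of how such counts can be gathered, a short script can scan the AsciiDoc sources for ID patterns. The patterns below are inferred from the IDs quoted in this report (Q-4.7.2, OQ-022); the repository's actual marker syntax may differ:

```javascript
// Illustrative only: counts Question-Tree IDs (Q-4.7.2 style) and
// Open-Question IDs (OQ-022 style) in an AsciiDoc source string.
// The ID formats are assumed from examples quoted in this report.
function countTraceability(adocText) {
  const qIds = adocText.match(/\bQ-\d+(?:\.\d+)*/g) || []
  const oqIds = adocText.match(/\bOQ-\d+/g) || []
  return { qIds: qIds.length, oqIds: oqIds.length }
}
```

Run over the three generated documents, this would reproduce the Q-ID row of the table above, assuming the markers actually use these formats.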

== Recommendation

[cols="3,2",options="header"]
|===
| Scenario | Recommended Approach

| Quick documentation, no team access | Direct (broadest coverage from code alone)
| Identifying knowledge gaps for team | Socratic Phase 1 (cheapest way to produce targeted questions)
| Production-quality Brownfield docs | Two-Phase (highest ADR fidelity, best traceability)
| Security-critical projects | Direct (only version with inline threat model)
| Maximum conciseness | Socratic (21% of Original, all essentials covered)
|===

For most Brownfield projects preparing for the Dark Factory, the recommended workflow is:

. *Socratic Phase 1* to identify the 10-15 questions the team must answer
. *Team answers* the questions (routed by Ask role)
. *Phase 2 of the Two-Phase approach* to produce documentation with Q-ID traceability and team-answer markers
. *Direct follow-up* for security-specific sections (threat model, trust boundaries) if needed
4 changes: 2 additions & 2 deletions docs/brownfield-workflow.adoc
@@ -234,5 +234,5 @@ If the system cannot be built or started, you have a different problem -- fix th
* Eric Evans, https://www.domainlanguage.com/ddd/[Domain-Driven Design] -- the foundational work on bounded contexts and strategic design.
* Michael Feathers, _Working Effectively with Legacy Code_ -- techniques for establishing test coverage in systems without tests.
* Peter Naur, "Programming as Theory Building" (1985) -- argues that programming is about building a mental model ("theory") that cannot be fully captured in documentation. Socratic Code Theory Recovery tests this claim in the context of LLM-generated code.
* https://github.com/rdmueller/personalAssistant/blob/main/resources/brownfield-experiment-report.adoc[Brownfield Experiment Report] -- controlled experiment: delete documentation from a greenfield project, regenerate from code, compare. Full methodology and findings.
* https://github.com/rdmueller/personalAssistant/blob/main/resources/brownfield-fair-comparison.adoc[Fair Comparison Report] -- three approaches (Direct, Socratic, Two-Phase) with identical team answers. Measures the structural value of the Question Tree.
* link:#/brownfield-experiment-report[Brownfield Experiment Report] -- controlled experiment: delete documentation from a greenfield project, regenerate from code, compare. Full methodology and findings.
* link:#/brownfield-fair-comparison[Fair Comparison Report] -- three approaches (Direct, Socratic, Two-Phase) with identical team answers. Measures the structural value of the Question Tree.
14 changes: 14 additions & 0 deletions scripts/prerender-routes.js
@@ -58,6 +58,20 @@ const ROUTES = [
description:
'Applying semantic anchors to brownfield codebases using a bounded-context approach.',
},
{
path: '/brownfield-experiment-report',
fragment: 'docs/brownfield-experiment-report.html',
title: 'Brownfield Experiment 1a Report — Semantic Anchors',
description:
'Controlled experiment: delete documentation from a greenfield project, regenerate from code, compare. Methodology, findings, and the Brownfield Preparation Checklist.',
},
{
path: '/brownfield-fair-comparison',
fragment: 'docs/brownfield-fair-comparison.html',
title: 'Brownfield Fair Comparison — Semantic Anchors',
description:
'Three approaches (Direct, Socratic, Two-Phase) compared with identical team answers. Measures the structural value of the Question Tree, not the answers.',
},
{
path: '/contracts',
fragment: 'docs/contracts.html',
10 changes: 10 additions & 0 deletions scripts/render-docs.js
@@ -93,6 +93,16 @@ renderFile(
path.join(WEB_DOCS, 'brownfield-workflow.de.html')
)

renderFile(
path.join(ROOT, 'docs/brownfield-experiment-report.adoc'),
path.join(WEB_DOCS, 'brownfield-experiment-report.html')
)

renderFile(
path.join(ROOT, 'docs/brownfield-fair-comparison.adoc'),
path.join(WEB_DOCS, 'brownfield-fair-comparison.html')
)

renderFile(
path.join(ROOT, 'docs/anchor-evaluations.adoc'),
path.join(WEB_DOCS, 'anchor-evaluations.html')
24 changes: 24 additions & 0 deletions website/src/main.js
@@ -149,6 +149,8 @@ function initApp() {
addRoute('/spec-driven-development', renderWorkflowPage)
addRoute('/workflow', () => navigate('/spec-driven-development', { replace: true }))
addRoute('/brownfield', renderBrownfieldPage)
addRoute('/brownfield-experiment-report', renderBrownfieldExperimentReportPage)
addRoute('/brownfield-fair-comparison', renderBrownfieldFairComparisonPage)
addRoute('/contracts', renderContractsPageHandler)
addRoute('/evaluations', renderEvaluationsPage)

@@ -277,6 +279,24 @@ function renderBrownfieldPage() {
loadDocContent('docs/brownfield-workflow.adoc')
}

function renderBrownfieldExperimentReportPage() {
const pageContent = document.getElementById('page-content')
if (!pageContent) return

pageContent.innerHTML = renderDocPage()
updateActiveNavLink()
loadDocContent('docs/brownfield-experiment-report.adoc')
}

function renderBrownfieldFairComparisonPage() {
const pageContent = document.getElementById('page-content')
if (!pageContent) return

pageContent.innerHTML = renderDocPage()
updateActiveNavLink()
loadDocContent('docs/brownfield-fair-comparison.adoc')
}

function renderContractsPageHandler() {
const pageContent = document.getElementById('page-content')
if (!pageContent) return
@@ -504,6 +524,10 @@ function handleLanguageChange() {
loadDocContent('docs/spec-driven-workflow.adoc')
} else if (currentRoute === '/brownfield') {
loadDocContent('docs/brownfield-workflow.adoc')
} else if (currentRoute === '/brownfield-experiment-report') {
loadDocContent('docs/brownfield-experiment-report.adoc')
} else if (currentRoute === '/brownfield-fair-comparison') {
loadDocContent('docs/brownfield-fair-comparison.adoc')
} else if (currentRoute === '/') {
initCardGridVisualization()
}
2 changes: 2 additions & 0 deletions website/src/utils/router.js
@@ -18,6 +18,8 @@ const ROUTE_TITLES = {
'/contracts': 'Semantic Contracts — Semantic Anchors',
'/spec-driven-development': 'Spec-Driven Development with Semantic Anchors',
'/brownfield': 'Brownfield Workflow — Semantic Anchors',
'/brownfield-experiment-report': 'Brownfield Experiment 1a Report — Semantic Anchors',
'/brownfield-fair-comparison': 'Brownfield Fair Comparison — Semantic Anchors',
'/evaluations': 'Evaluations — Semantic Anchors',
'/contributing': 'Contributing — Semantic Anchors',
'/changelog': 'Changelog — Semantic Anchors',