Context
We integrated MDD into a real product project on 2026-03-10 (based on the worktree-support version of `/mdd`) and have been using it daily for ~5 weeks on a test-management tool for SAP rollouts (React 19 + Express + Supabase, pnpm monorepo).
First: thank you. The workflow has genuinely changed how we build — docs-before-code stuck, and the `.mdd/docs/` tree is now the living spec. We also noticed your recent additions on 2026-04-12 (`scan` / `update` / `deprecate` / `reverse-engineer` / `graph` modes + the data flow analysis phase) and plan to pull those into our fork. The proposals below are orthogonal to those additions — they came out of daily use on the parts of the workflow that existed on 2026-03-10.
Happy to send focused PRs for any of the below if useful — all are additive, no breaking changes.
Tier 1 — TDD loop gates (the strongest case)
These three fix concrete fail-modes we hit repeatedly before adding them.
1. Red Gate — verify skeletons actually fail before implementing
Problem. Phase 4 (skeleton generation) → Phase 6 (implement) currently has no check that the generated skeletons are actually red. We hit two fail-modes:
- A skeleton with no real assertion (e.g., an empty `it` that doesn't call `expect.fail`) passes vacuously — implementation "succeeds" against nothing.
- Pre-existing code occasionally already satisfies a skeleton, so the test passes green from the start and the "TDD" phase is fiction.
Fix. Add a mandatory Phase 4b between skeletons and implementation: run the new test file(s), assert that every new test fails, report `🔴 Red Gate: N/N failing (expected)`. Don't proceed until red.
Our implementation (phase numbers follow our fork — see the PR offer below):

**Phase 3b — Red Gate (mandatory)**

Run all newly created test skeletons to confirm they fail:

```
pnpm --filter backend run test -- <test-file-path>
```

Every single skeleton test MUST fail. If any test passes unexpectedly, investigate — either the skeleton has no assertion or there's pre-existing code that already satisfies it. Fix the skeleton or adjust scope.

Report:

```
🔴 Red Gate: <N>/<N> tests failing (expected)
All skeletons confirmed RED — ready to implement.
```

Do NOT proceed to Phase 4 until the Red Gate passes.
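The gate can also be scripted rather than left to prompt discipline. A minimal sketch in TypeScript, assuming vitest with its Jest-compatible JSON reporter (`--reporter=json`) and Node 18+; the `backend` filter and script wiring are our project's, not part of `/mdd`:

```ts
// red-gate.ts - fail unless every test in the new file is red.
import { execSync } from "node:child_process";

function redGate(testFile: string): void {
  let raw = "";
  try {
    raw = execSync(
      `pnpm --filter backend exec vitest run --reporter=json ${testFile}`,
      { encoding: "utf8" },
    );
  } catch (err) {
    // vitest exits non-zero when tests fail - that is the expected path here.
    raw = (err as { stdout?: Buffer | string }).stdout?.toString() ?? "";
  }
  if (!raw) throw new Error("Red Gate: no reporter output captured");
  const report = JSON.parse(raw) as {
    numTotalTests: number;
    numFailedTests: number;
  };
  const { numTotalTests: total, numFailedTests: failed } = report;
  if (total === 0 || failed !== total) {
    throw new Error(
      `Red Gate violated: ${total - failed}/${total} tests passed unexpectedly`,
    );
  }
  console.log(`🔴 Red Gate: ${failed}/${total} tests failing (expected)`);
}

redGate(process.argv[2]!);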
2. Green Gate — iterative diagnosis-first loop with a hard cap
Problem. Phase 6 runs `pnpm typecheck` + `pnpm test:unit` once. On failure there's no recovery strategy, which in practice becomes an uncapped blind-retry loop: the agent re-runs tests, tries a guess, re-runs, tries again, runs out of context.
Fix. A spec-driven loop capped at 5 iterations, each enforcing a diagnosis step before the fix (a minimal control-flow sketch follows below):
- Run feature tests + `tsc --noEmit`
- Diagnose — exact error? Which implementation assumption was wrong? Known pattern (check `.claude/learnings.md`)? What is the one targeted fix?
- Fix — adjust the implementation, not the test. If the test seems wrong → re-read the doc. If the doc is wrong → pause, ask the user.
- Report — one-line root cause + one-line fix summary per iteration.
- Repeat until green.
Exit conditions:
- All feature tests + `tsc` green → proceed to regression check (full suite; failures count against the same 5-iteration budget).
- After 5 iterations still failing → stop, report remaining failures with diagnoses to the user. Do not keep trying.
This catches runaway loops and — importantly — forces the agent to state why each fix was chosen, which improves quality.
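For the control flow, a minimal sketch: the Diagnose step is a prompt instruction in the real workflow, so it appears here only as a comment, and the commands are our project's scripts.

```ts
// green-gate.ts - capped diagnose-first loop, assuming pnpm + vitest + tsc.
import { execSync } from "node:child_process";

interface CheckResult {
  green: boolean;
  output: string;
}

function runChecks(): CheckResult {
  try {
    const output = execSync("pnpm test:unit && pnpm exec tsc --noEmit", {
      encoding: "utf8",
    });
    return { green: true, output };
  } catch (err) {
    const e = err as { stdout?: Buffer | string };
    return { green: false, output: e.stdout?.toString() ?? String(err) };
  }
}

const MAX_ITERATIONS = 5;

for (let i = 1; i <= MAX_ITERATIONS; i++) {
  const result = runChecks();
  if (result.green) {
    console.log(`Green after iteration ${i} - proceed to regression check`);
    break;
  }
  // Diagnose BEFORE fixing: exact error, wrong assumption, known pattern,
  // one targeted fix - then report a one-line root cause + fix summary.
  console.log(`Iteration ${i} red:\n${result.output}`);
  if (i === MAX_ITERATIONS) {
    console.error("5 iterations exhausted - stop and report to the user");
  }
}
```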
3. Integration happy-path hard gate before reporting "done"
Problem. Phase 7 treats `unit tests green + typecheck clean` as "MDD Complete". This is the biggest gap we hit — several times the full unit suite was green, but against the real DB / real external API the feature didn't actually work (classic mock/prod divergence, or the UI page never actually rendered the new data). The "✅ MDD Complete" report created false confidence.
Fix. Phase 7 must include an explicit integration step:
- Backend features: trigger the full flow against the real DB / API (not just individual endpoints). Actively watch backend logs during the run. Any rate anomaly or unexpected error pattern is an immediate log-check trigger.
- Frontend features: click through the actual user flow in the browser, open the target page, visually verify the expected data appears.
- DB features: check the rows actually written/read via SQL, not "insert returned no error".
Key rule: external conditions ("API offline", "slow", "missing test data") are not an excuse to mark the feature done. They are hypotheses that need to be empirically proven before being accepted as blockers. Default assumption on any external failure: my code is wrong until proven otherwise.
If integration is not verified, the report is not "✅ MDD Complete" — it's "⏸️ MDD Blocked on condition X" with a concrete next action for the user.
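For the DB case, a sketch of what "check the rows actually written" can look like, assuming `@supabase/supabase-js`; the table and column names are hypothetical:

```ts
// verify-rows.ts - read back what the feature claims to have written.
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!,
);

// "Insert returned no error" is not evidence on its own - query the rows.
const { data, error } = await supabase
  .from("test_runs") // hypothetical table
  .select("id, status, created_at")
  .gte("created_at", new Date(Date.now() - 5 * 60_000).toISOString())
  .limit(10);

if (error) throw error;
if (!data?.length) {
  console.error("⏸️ MDD Blocked: no rows written in the last 5 minutes");
} else {
  console.log(`Verified ${data.length} row(s) actually landed in the DB`);
}
```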
Tier 2 — Structural improvements
4. Block structure instead of flat steps (Phase 5)
Currently Phase 5 presents a flat `Step 1 (Types), Step 2 (Handler), ...` list. We replaced that with blocks — a block is a unit of work that satisfies three qualitative criteria:
- Runnable end-state — after the block, code compiles, tests green, no half-open interfaces
- Commit-worthy scope — a clear "why" for a standalone commit message
- Own verification — a concrete test/check command that proves "done"
Each block is labeled `small` / `medium` / `large` as a sanity check. `large` requires a justification for why no split is sensible (typically: a shared type contract forces backend + frontend into one block). If the justification doesn't hold → split.
Each block also includes a handoff contract — the interface/assumption the next block depends on.
Benefit: better commit hygiene, better abort points for long features, much less "half-done" state mid-session. For very small features, the rule is "one block, don't over-structure".
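A block can be captured as a small descriptor so the criteria are explicit rather than implied. A hypothetical shape (all field names are ours, not part of `/mdd`):

```ts
// A hypothetical block descriptor - criterion 1 (runnable end-state)
// is what `verify` proves; the other fields cover criteria 2 and 3.
interface Block {
  name: string;
  size: "small" | "medium" | "large";
  why: string;                 // commit-worthy rationale (criterion 2)
  verify: string;              // command that proves "done" (criterion 3)
  handoff: string;             // interface/assumption the next block depends on
  largeJustification?: string; // required when size === "large"
}

const example: Block = {
  name: "backend: run-status endpoint",
  size: "medium",
  why: "Expose run status so the UI can poll progress",
  verify: "pnpm --filter backend run test -- run-status.test.ts",
  handoff: "GET /api/runs/:id returns { status: RunStatus }",
};
```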
5. Parallel agents with model-per-task routing
Currently the /mdd flow runs sequentially with an implicit single model. We get measurable speed-ups (and cost savings) by routing:
| Task type | Model | Rationale |
|---|---|---|
| Research, scanning, reading docs | `sonnet` | Fast, cost-effective, information gathering |
| Architecture, doc writing (Phase 2) | `opus` | Deep reasoning about system design |
| Planning, build plan (Phase 4) | `opus` | Dependency analysis, risk assessment |
| Test skeletons | `sonnet` | Structured template work |
| Simple implementation (types, routing) | `sonnet` | Pattern-following |
| Complex implementation (business logic, multi-file) | `opus` | Deep reasoning, cross-file consistency |
| Typecheck, test runs, verification | `sonnet` | Command running + reporting |
Rules for parallelization:
- Only parallelize tasks with no data dependencies between them
- Each agent gets a complete, self-contained prompt — all context it needs (file paths, feature description, project conventions)
- Agent results are collected and synthesized in the main conversation before presenting to the user
Concrete application: Phase 1 runs three Sonnet agents in parallel (`context-rules`, `context-features`, `context-codebase`). Phase 3 runs two Sonnet agents in parallel for unit + E2E skeletons.
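As a dispatch sketch, with `runAgent` standing in for whatever sub-agent mechanism the host tool provides (it is not a real `/mdd` API, and the prompts are abbreviated):

```ts
// Hypothetical fan-out for Phase 1 - three independent context scans.
type Model = "sonnet" | "opus";

async function runAgent(opts: { model: Model; prompt: string }): Promise<string> {
  // Placeholder body: the real call is the host tool's agent/Task facility.
  return `[${opts.model}] ${opts.prompt.slice(0, 40)}...`;
}

// No data dependencies between the three scans, so they run in parallel;
// each prompt must be complete and self-contained (paths, feature, conventions).
const [rules, features, codebase] = await Promise.all([
  runAgent({ model: "sonnet", prompt: "context-rules: read CLAUDE.md, configs ..." }),
  runAgent({ model: "sonnet", prompt: "context-features: summarize .mdd/docs/ ..." }),
  runAgent({ model: "sonnet", prompt: "context-codebase: map packages, entry points ..." }),
]);

// Synthesize in the main conversation before presenting to the user.
console.log([rules, features, codebase].join("\n"));
```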
6. Layered parallelization within implementation (Phase 6)
Within a block, we group steps into layers by dependency:
- Layer 1 (no dependencies): Types, shared interfaces
- Layer 2 (depends on Layer 1): Backend services, frontend components ← parallel agents here
- Layer 3 (depends on Layer 2): Route wiring, integration points
- Layer 4 (depends on all): Test implementation, final wiring
Sequential layers, parallel agents within a layer when multiple steps are independent. If only one step in a layer → execute directly in main conversation (no agent overhead).
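The scheduling rule fits in a few lines. A sketch with placeholder steps (the step bodies are illustrative, not real work):

```ts
// Sequential layers; parallelism only within a layer.
type Step = { name: string; run: () => Promise<void> };

const step = (name: string): Step => ({
  name,
  run: async () => console.log(`done: ${name}`), // placeholder work
});

const layers: Step[][] = [
  [step("types + shared interfaces")],                   // Layer 1
  [step("backend service"), step("frontend component")], // Layer 2
  [step("route wiring")],                                // Layer 3
  [step("test implementation"), step("final wiring")],   // Layer 4
];

for (const layer of layers) {
  if (layer.length === 1) {
    await layer[0].run(); // single step: run directly, no agent overhead
  } else {
    await Promise.all(layer.map((s) => s.run())); // independent steps: parallel
  }
}
```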
Tier 3 — Culture / ownership default
7. Ownership default when hitting external errors
Documented in our CLAUDE.md and referenced from Phase 7: "My code is wrong until proven otherwise. Not: the API has a problem."
Procedure on any anomaly (400/500/timeout/unexpectedly slow): (1) read backend logs, (2) run a minimal probe script against the real interface, (3) then form a root-cause hypothesis.
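The probe in step (2) stays deliberately tiny: one request, raw status and body, no client-library indirection. A sketch assuming Node 18+ (the URL is hypothetical):

```ts
// probe.ts - hit the real interface before forming a root-cause hypothesis.
const res = await fetch("https://api.example.com/v1/health", {
  signal: AbortSignal.timeout(5_000), // slow counts as an anomaly too
});
console.log(res.status, res.headers.get("content-type"));
console.log((await res.text()).slice(0, 500));
```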
Even when an external cause is empirically confirmed, the feature is not "done with an asterisk" — it's "blocked on condition X". Status is "in progress", not "done".
This costs ~1 extra minute per incident but eliminates a class of misattribution that otherwise sends the agent patching the wrong thing.
What we'd like to pull from you (for symmetry)
Not asking for anything here, just acknowledging — these are excellent and we'll be adopting them:
- Phase 2 data flow & impact analysis gate
- `scan` / `update` / `deprecate` / `reverse-engineer` / `graph` modes
- `.mdd/.startup.md` auto-context
- `last_synced` / `status` / `phase` frontmatter for drift tracking
- Tooling-task detection (skipping DB/API questions)
PR offer
If any of the above sounds useful, I'm happy to send focused PRs — one per proposal, small diffs, each additive with no breaking changes. Tier 1 would be three short PRs (~20–80 lines each). I can also port them to match your current Phase numbering (our fork is at your 2026-03-04 baseline, and the phases were renumbered by your 2026-04-12 additions).
Let me know which (if any) you'd want, and whether you'd prefer them as separate PRs or one combined.
Thanks again for building this — it's genuinely shifted how I think about feature work.