Context
We integrated MDD into a real product project on 2026-03-10 (based on the worktree-support version of `/mdd`) and have been using it daily for ~5 weeks on a test-management tool for SAP rollouts (React 19 + Express + Supabase, pnpm monorepo).
First: thank you. The workflow has genuinely changed how we build — docs-before-code stuck, and the `.mdd/docs/` tree is now the living spec. We also noticed your recent additions on 2026-04-12 (`scan` / `update` / `deprecate` / `reverse-engineer` / `graph` modes + the data flow analysis phase) and plan to pull those into our fork. The proposals below are orthogonal to those additions — they came out of daily use on the parts of the workflow that existed on 2026-03-10.
Happy to send focused PRs for any of the below if useful — all are additive, no breaking changes.
Tier 1 — TDD loop gates (the strongest case)
These three fix concrete fail-modes we hit repeatedly before adding them.
1. Red Gate — verify skeletons actually fail before implementing
Problem. Phase 4 (skeleton generation) → Phase 6 (implement) currently has no check that the generated skeletons are actually red. We hit two fail-modes:
- A skeleton with no real assertion (e.g., an empty `it` that doesn't call `expect.fail`) passes vacuously — implementation "succeeds" against nothing.
- Pre-existing code occasionally already satisfies a skeleton, so the test passes green from the start and the "TDD" phase is fiction.
Fix. Add a mandatory Phase 4b between skeletons and implementation: run the new test file(s), assert that every new test fails, report `🔴 Red Gate: N/N failing (expected)`. Don't proceed until red.
Our implementation (phase numbers follow our fork — see the PR offer below):

**Phase 3b — Red Gate (mandatory)**

Run all newly created test skeletons to confirm they fail:

```
pnpm --filter backend run test -- <test-file-path>
```

Every single skeleton test MUST fail. If any test passes unexpectedly, investigate — either the skeleton has no assertion or there's pre-existing code that already satisfies it. Fix the skeleton or adjust scope.

Report:

```
🔴 Red Gate: <N>/<N> tests failing (expected)
All skeletons confirmed RED — ready to implement.
```

Do NOT proceed to Phase 4 until the Red Gate passes.
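The gate can also be scripted rather than left to prompt discipline. A minimal sketch in TypeScript, assuming vitest with its Jest-compatible JSON reporter (`--reporter=json`) and Node 18+; the `backend` filter and script wiring are our project's, not part of `/mdd`:

```ts
// red-gate.ts - fail unless every test in the new file is red.
import { execSync } from "node:child_process";

function redGate(testFile: string): void {
  let raw = "";
  try {
    raw = execSync(
      `pnpm --filter backend exec vitest run --reporter=json ${testFile}`,
      { encoding: "utf8" },
    );
  } catch (err) {
    // vitest exits non-zero when tests fail - that is the expected path here.
    raw = (err as { stdout?: Buffer | string }).stdout?.toString() ?? "";
  }
  if (!raw) throw new Error("Red Gate: no reporter output captured");
  const report = JSON.parse(raw) as {
    numTotalTests: number;
    numFailedTests: number;
  };
  const { numTotalTests: total, numFailedTests: failed } = report;
  if (total === 0 || failed !== total) {
    throw new Error(
      `Red Gate violated: ${total - failed}/${total} tests passed unexpectedly`,
    );
  }
  console.log(`🔴 Red Gate: ${failed}/${total} tests failing (expected)`);
}

redGate(process.argv[2]!);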
2. Green Gate — iterative diagnosis-first loop with a hard cap
Problem. Phase 6 runs `pnpm typecheck` + `pnpm test:unit` once. On failure there's no recovery strategy, which in practice becomes an uncapped blind-retry loop: the agent re-runs tests, tries a guess, re-runs, tries again, runs out of context.
Fix. A spec-driven loop capped at 5 iterations, each enforcing a diagnosis step before the fix (a minimal control-flow sketch follows below):
- Run feature tests + `tsc --noEmit`
- Diagnose — exact error? Which implementation assumption was wrong? Known pattern (check `.claude/learnings.md`)? What is the one targeted fix?
- Fix — adjust the implementation, not the test. If the test seems wrong → re-read the doc. If the doc is wrong → pause, ask the user.
- Report — one-line root cause + one-line fix summary per iteration.
- Repeat until green.
Exit conditions:
- All feature tests + `tsc` green → proceed to regression check (full suite; failures count against the same 5-iteration budget).
- After 5 iterations still failing → stop, report remaining failures with diagnoses to the user. Do not keep trying.
This catches runaway loops and — importantly — forces the agent to state why each fix was chosen, which improves quality.
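For the control flow, a minimal sketch: the Diagnose step is a prompt instruction in the real workflow, so it appears here only as a comment, and the commands are our project's scripts.

```ts
// green-gate.ts - capped diagnose-first loop, assuming pnpm + vitest + tsc.
import { execSync } from "node:child_process";

interface CheckResult {
  green: boolean;
  output: string;
}

function runChecks(): CheckResult {
  try {
    const output = execSync("pnpm test:unit && pnpm exec tsc --noEmit", {
      encoding: "utf8",
    });
    return { green: true, output };
  } catch (err) {
    const e = err as { stdout?: Buffer | string };
    return { green: false, output: e.stdout?.toString() ?? String(err) };
  }
}

const MAX_ITERATIONS = 5;

for (let i = 1; i <= MAX_ITERATIONS; i++) {
  const result = runChecks();
  if (result.green) {
    console.log(`Green after iteration ${i} - proceed to regression check`);
    break;
  }
  // Diagnose BEFORE fixing: exact error, wrong assumption, known pattern,
  // one targeted fix - then report a one-line root cause + fix summary.
  console.log(`Iteration ${i} red:\n${result.output}`);
  if (i === MAX_ITERATIONS) {
    console.error("5 iterations exhausted - stop and report to the user");
  }
}
```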
3. Integration happy-path hard gate before reporting "done"
Problem. Phase 7 treats `unit tests green + typecheck clean` as "MDD Complete". This is the biggest gap we hit — several times the full unit suite was green, but against the real DB / real external API the feature didn't actually work (classic mock/prod divergence, or the UI page never actually rendered the new data). The "✅ MDD Complete" report created false confidence.
Fix. Phase 7 must include an explicit integration step:
- Backend features: trigger the full flow against the real DB / API (not just individual endpoints). Actively watch backend logs during the run. Any rate anomaly or unexpected error pattern is an immediate log-check trigger.
- Frontend features: click through the actual user flow in the browser, open the target page, visually verify the expected data appears.
- DB features: check the rows actually written/read via SQL, not "insert returned no error".
Key rule: external conditions ("API offline", "slow", "missing test data") are not an excuse to mark the feature done. They are hypotheses that need to be empirically proven before being accepted as blockers. Default assumption on any external failure: my code is wrong until proven otherwise.
If integration is not verified, the report is not "✅ MDD Complete" — it's "⏸️ MDD Blocked on condition X" with a concrete next action for the user.
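For the DB case, a sketch of what "check the rows actually written" can look like, assuming `@supabase/supabase-js`; the table and column names are hypothetical:

```ts
// verify-rows.ts - read back what the feature claims to have written.
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(
  process.env.SUPABASE_URL!,
  process.env.SUPABASE_SERVICE_ROLE_KEY!,
);

// "Insert returned no error" is not evidence on its own - query the rows.
const { data, error } = await supabase
  .from("test_runs") // hypothetical table
  .select("id, status, created_at")
  .gte("created_at", new Date(Date.now() - 5 * 60_000).toISOString())
  .limit(10);

if (error) throw error;
if (!data?.length) {
  console.error("⏸️ MDD Blocked: no rows written in the last 5 minutes");
} else {
  console.log(`Verified ${data.length} row(s) actually landed in the DB`);
}
```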
Tier 2 — Structural improvements
4. Block structure instead of flat steps (Phase 5)
Currently Phase 5 presents a flat `Step 1 (Types), Step 2 (Handler), ...` list. We replaced that with blocks — a block is a unit of work that satisfies three qualitative criteria:
- Runnable end-state — after the block, code compiles, tests green, no half-open interfaces
- Commit-worthy scope — a clear "why" for a standalone commit message
- Own verification — a concrete test/check command that proves "done"
Each block is labeled `small` / `medium` / `large` as a sanity check. `large` requires a justification for why no split is sensible (typically: a shared type contract forces backend + frontend into one block). If the justification doesn't hold → split.
Each block also includes a handoff contract — the interface/assumption the next block depends on.
Benefit: better commit hygiene, better abort points for long features, much less "half-done" state mid-session. For very small features, the rule is "one block, don't over-structure".
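A block can be captured as a small descriptor so the criteria are explicit rather than implied. A hypothetical shape (all field names are ours, not part of `/mdd`):

```ts
// A hypothetical block descriptor - criterion 1 (runnable end-state)
// is what `verify` proves; the other fields cover criteria 2 and 3.
interface Block {
  name: string;
  size: "small" | "medium" | "large";
  why: string;                 // commit-worthy rationale (criterion 2)
  verify: string;              // command that proves "done" (criterion 3)
  handoff: string;             // interface/assumption the next block depends on
  largeJustification?: string; // required when size === "large"
}

const example: Block = {
  name: "backend: run-status endpoint",
  size: "medium",
  why: "Expose run status so the UI can poll progress",
  verify: "pnpm --filter backend run test -- run-status.test.ts",
  handoff: "GET /api/runs/:id returns { status: RunStatus }",
};
```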
5. Parallel agents with model-per-task routing
Currently the /mdd flow runs sequentially with an implicit single model. We get measurable speed-ups (and cost savings) by routing:
| Task type | Model | Rationale |
|---|---|---|
| Research, scanning, reading docs | `sonnet` | Fast, cost-effective, information gathering |
| Architecture, doc writing (Phase 2) | `opus` | Deep reasoning about system design |
| Planning, build plan (Phase 4) | `opus` | Dependency analysis, risk assessment |
| Test skeletons | `sonnet` | Structured template work |
| Simple implementation (types, routing) | `sonnet` | Pattern-following |
| Complex implementation (business logic, multi-file) | `opus` | Deep reasoning, cross-file consistency |
| Typecheck, test runs, verification | `sonnet` | Command running + reporting |
Rules for parallelization:
- Only parallelize tasks with no data dependencies between them
- Each agent gets a complete, self-contained prompt — all context it needs (file paths, feature description, project conventions)
- Agent results are collected and synthesized in the main conversation before presenting to the user
Concrete application: Phase 1 runs three Sonnet agents in parallel (`context-rules`, `context-features`, `context-codebase`). Phase 3 runs two Sonnet agents in parallel for unit + E2E skeletons.
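As a dispatch sketch, with `runAgent` standing in for whatever sub-agent mechanism the host tool provides (it is not a real `/mdd` API, and the prompts are abbreviated):

```ts
// Hypothetical fan-out for Phase 1 - three independent context scans.
type Model = "sonnet" | "opus";

async function runAgent(opts: { model: Model; prompt: string }): Promise<string> {
  // Placeholder body: the real call is the host tool's agent/Task facility.
  return `[${opts.model}] ${opts.prompt.slice(0, 40)}...`;
}

// No data dependencies between the three scans, so they run in parallel;
// each prompt must be complete and self-contained (paths, feature, conventions).
const [rules, features, codebase] = await Promise.all([
  runAgent({ model: "sonnet", prompt: "context-rules: read CLAUDE.md, configs ..." }),
  runAgent({ model: "sonnet", prompt: "context-features: summarize .mdd/docs/ ..." }),
  runAgent({ model: "sonnet", prompt: "context-codebase: map packages, entry points ..." }),
]);

// Synthesize in the main conversation before presenting to the user.
console.log([rules, features, codebase].join("\n"));
```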
6. Layered parallelization within implementation (Phase 6)
Within a block, we group steps into layers by dependency:
- Layer 1 (no dependencies): Types, shared interfaces
- Layer 2 (depends on Layer 1): Backend services, frontend components ← parallel agents here
- Layer 3 (depends on Layer 2): Route wiring, integration points
- Layer 4 (depends on all): Test implementation, final wiring
Sequential layers, parallel agents within a layer when multiple steps are independent. If only one step in a layer → execute directly in main conversation (no agent overhead).
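The scheduling rule fits in a few lines. A sketch with placeholder steps (the step bodies are illustrative, not real work):

```ts
// Sequential layers; parallelism only within a layer.
type Step = { name: string; run: () => Promise<void> };

const step = (name: string): Step => ({
  name,
  run: async () => console.log(`done: ${name}`), // placeholder work
});

const layers: Step[][] = [
  [step("types + shared interfaces")],                   // Layer 1
  [step("backend service"), step("frontend component")], // Layer 2
  [step("route wiring")],                                // Layer 3
  [step("test implementation"), step("final wiring")],   // Layer 4
];

for (const layer of layers) {
  if (layer.length === 1) {
    await layer[0].run(); // single step: run directly, no agent overhead
  } else {
    await Promise.all(layer.map((s) => s.run())); // independent steps: parallel
  }
}
```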
Tier 3 — Culture / ownership default
7. Ownership default when hitting external errors
Documented in our CLAUDE.md and referenced from Phase 7: "My code is wrong until proven otherwise. Not: the API has a problem."
Procedure on any anomaly (400/500/timeout/unexpectedly slow): (1) read backend logs, (2) run a minimal probe script against the real interface, (3) then form a root-cause hypothesis.
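The probe in step (2) stays deliberately tiny: one request, raw status and body, no client-library indirection. A sketch assuming Node 18+ (the URL is hypothetical):

```ts
// probe.ts - hit the real interface before forming a root-cause hypothesis.
const res = await fetch("https://api.example.com/v1/health", {
  signal: AbortSignal.timeout(5_000), // slow counts as an anomaly too
});
console.log(res.status, res.headers.get("content-type"));
console.log((await res.text()).slice(0, 500));
```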
Even when an external cause is empirically confirmed, the feature is not "done with an asterisk" — it's "blocked on condition X". Status is "in progress", not "done".
This costs ~1 extra minute per incident but eliminates a class of misattribution that otherwise sends the agent patching the wrong thing.
What we'd like to pull from you (for symmetry)
Not asking for anything here, just acknowledging — these are excellent and we'll be adopting them:
- Phase 2 data flow & impact analysis gate
- `scan` / `update` / `deprecate` / `reverse-engineer` / `graph` modes
- `.mdd/.startup.md` auto-context
- `last_synced` / `status` / `phase` frontmatter for drift tracking
- Tooling-task detection (skipping DB/API questions)
PR offer
If any of the above sounds useful, I'm happy to send focused PRs — one per proposal, small diffs, each additive with no breaking changes. Tier 1 would be three short PRs (~20–80 lines each). I can also port them to match your current Phase numbering (our fork is at your 2026-03-04 baseline, and the phases were renumbered by your 2026-04-12 additions).
Let me know which (if any) you'd want, and whether you'd prefer them as separate PRs or one combined.
Thanks again for building this — it's genuinely shifted how I think about feature work.