docs: reframe no-mock rule — scope to stack/E2E tests, endorse mocks in unit tests

franklywatson · claude · franklywatson · commit ae7d68a5ff32 · 2026-04-08T10:58:09.000-07:00
Constitutional rule #1 changes from blanket "No mocking core system components" to "Real dependencies in E2E/integration and stack tests." Mocks are now explicitly endorsed for unit test isolation. Pattern 1.5 renamed from "No-Mock Philosophy" to "Real Dependencies in E2E/Integration and Stack Tests." All cross-references updated across 11 files. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
diff --git a/CLAUDE.md b/CLAUDE.md
@@ -100,7 +100,7 @@ See @README.md for detailed project overview.
 
 ## Constitutional Rules (Never Violate)
 
-1. **No mocking core system components** — Use real databases, real services, real blockchains
+1. **Real dependencies in E2E/integration and stack tests** — Use real databases, services, and caches; mocks are appropriate in unit tests
 2. **Evidence-based claims only** — "Tests pass" must show test output
 3. **Zero-defect tolerance** — Every error/warning must be addressed
 4. **Doc freshness mandatory** — Code changes require doc updates in same session
diff --git a/README.md b/README.md
@@ -45,7 +45,7 @@ Patterns are organized into five levels, each building on the previous. The leve
 
 **[L0: Foundation](docs/L0-foundation.md)** — Structure your codebase so an AI with zero prior context can navigate, understand, and contribute. Deep modules, progressive disclosure, conceptual file organization, CLAUDE.md as project constitution, unit tests as contract, documentation as system map, and aggressive cleanup. The "can a new starter figure this out?" test.
 
-**[L1: Closed Loop Design and Verification](docs/L1-feedback-loops.md)** — The level where agents stop guessing and start designing. Context harvesting gathers targeted evidence before implementation. Stack tests validate design intent end-to-end through the full application stack — no mocks, no partial integration, no ambiguous results. Full-loop assertion layering catches regressions at primary, secondary, and tertiary levels.
+**[L1: Closed Loop Design and Verification](docs/L1-feedback-loops.md)** — The level where agents stop guessing and start designing. Context harvesting gathers targeted evidence before implementation. Stack tests validate design intent end-to-end through the full application stack: real dependencies, no partial integration, no ambiguous results. Mocks remain appropriate in unit tests, where they provide isolation. Full-loop assertion layering catches regressions at primary, secondary, and tertiary levels.
 
 **[L2: Behavioral Guardrails](docs/L2-behavioral-guardrails.md)** — Rules written in prose are suggestions. Skills and hooks are enforcement. Overlay skills on top of base agent capabilities, chain them into a complete development lifecycle, and automate discipline through the tool layer.
 
diff --git a/docs/L0-foundation.md b/docs/L0-foundation.md
@@ -178,7 +178,7 @@ One-sentence summary of what this project does.
 Tree diagram showing major directories and their purposes.
 
 ## Constitutional Rules (Never Violate)
-1. No mocking core system components
+1. Real dependencies in E2E/integration and stack tests
 2. Evidence-based claims only
 3. Zero-defect tolerance
 
@@ -227,7 +227,7 @@ During a refactoring session:
 5. Tests document what the code actually does → docs stay accurate
 ```
 
-**Unit tests and stack tests are complementary, not competing concerns.** Stack tests ([L1](L1-feedback-loops.md#pattern-11--stack-tests)) validate end-to-end user journeys through the full system. Unit tests validate individual module contracts in isolation. A codebase needs both: stack tests catch integration failures; unit tests catch logic errors within modules. Dismissing a unit test failure while trusting stack test results means relying on partial feedback.
+**Unit tests and stack tests are complementary, not competing concerns.** Stack tests ([L1](L1-feedback-loops.md#pattern-11--stack-tests)) validate end-to-end user journeys through the full system with real dependencies ([Pattern 1.5](L1-patterns/1.5-no-mock-philosophy.md)). Unit tests validate individual module contracts in isolation, and mocks are appropriate and encouraged there: they separate module logic from infrastructure, which keeps unit tests fast, focused, and diagnostic. A codebase needs both: stack tests catch integration failures; unit tests catch logic errors within modules. Dismissing a unit test failure while trusting stack test results means relying on partial feedback.
 
 **Anti-Pattern**: Relying on documentation as the primary contract. Documentation drifts. Tests execute. Treating unit tests as subordinate to stack tests — both layers provide distinct, necessary signals. Writing tests for coverage metrics rather than diagnostic value.
 
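A minimal sketch of the unit-test side of this split, assuming a hypothetical `processPayment` function that receives its payment client as a parameter. `PaymentClient` is an illustrative slice shaped like the `mockStripe` objects in Pattern 1.5, not the Stripe SDK's real types, and the stubs are hand-rolled instead of `vi.fn()` so the example runs without a test framework:

```typescript
// Illustrative slice of a payment client, shaped like the mockStripe objects in Pattern 1.5.
// These are not the Stripe SDK's real type definitions.
interface PaymentClient {
  paymentIntents: {
    create(params: { amount: number; currency: string }): Promise<{ id: string; status: string }>;
  };
}

type PaymentResult =
  | { ok: true; intentId: string }
  | { ok: false; reason: 'declined' | 'unavailable' };

// Hypothetical module under test: the client is injected, so unit tests can stub it.
async function processPayment(client: PaymentClient, amountCents: number): Promise<PaymentResult> {
  try {
    const intent = await client.paymentIntents.create({ amount: amountCents, currency: 'usd' });
    // Simplified mapping: any status other than 'succeeded' counts as a decline here.
    return intent.status === 'succeeded'
      ? { ok: true, intentId: intent.id }
      : { ok: false, reason: 'declined' };
  } catch {
    // Any thrown error (network or API) maps to 'unavailable'.
    return { ok: false, reason: 'unavailable' };
  }
}

// Hand-rolled stubs: each one pins the dependency's behavior for a single contract case.
const succeeding: PaymentClient = {
  paymentIntents: { create: async () => ({ id: 'pi_test', status: 'succeeded' }) },
};
const declining: PaymentClient = {
  paymentIntents: { create: async () => ({ id: 'pi_test', status: 'requires_payment_method' }) },
};
const failing: PaymentClient = {
  paymentIntents: {
    create: async () => {
      throw new Error('simulated network failure');
    },
  },
};

async function main(): Promise<void> {
  for (const client of [succeeding, declining, failing]) {
    console.log(await processPayment(client, 1000));
  }
}

void main();
```

Because no database, network, or sandbox is involved, each contract case (success, decline, failure) runs in milliseconds and fails for exactly one reason. The same module would still be covered against real Stripe test mode by a stack test.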
diff --git a/docs/L1-feedback-loops.md b/docs/L1-feedback-loops.md
@@ -89,9 +89,9 @@ Tests ordered by dependency so that if test N fails, the agent knows tests 1 thr
 
 Four mechanisms ensuring tests never interfere: unique container names, dynamic port allocation, transient volumes, and per-test compose files. Aggressive cleanup prevents Docker resource exhaustion during concurrent test execution.
 
-### [Pattern 1.5 — No-Mock Philosophy](L1-patterns/1.5-no-mock-philosophy.md)
+### [Pattern 1.5 — Real Dependencies in E2E/Integration and Stack Tests](L1-patterns/1.5-no-mock-philosophy.md)
 
-Stack tests use real everything — real PostgreSQL, real Redis, real message queues. The only acceptable mocks are external services without test environments. If you own it, run it. If you can run it in Docker, run it in Docker.
+Stack tests and E2E/integration tests use real everything — real PostgreSQL, real Redis, real message queues. The only acceptable mocks in these tests are external services without test environments. If you own it, run it. If you can run it in Docker, run it in Docker. Mocks are appropriate and encouraged in unit tests, which validate module contracts in isolation.
 
 ### [Pattern 1.6 — Test Integrity Rules](L1-patterns/1.6-test-integrity.md)
 
diff --git a/docs/L1-patterns/1.1-stack-tests.md b/docs/L1-patterns/1.1-stack-tests.md
@@ -147,7 +147,7 @@ The bootstrap step (typically `02-bootstrap-test-data.stack.test.ts`) runs after
 - **[Pattern 1.2 — Full-Loop Assertion Layering](1.2-full-loop-assertions.md)**: How to structure assertions within stack tests
 - **[Pattern 1.3 — Sequential/Additive Test Design](1.3-sequential-design.md)**: How to order stack tests for maximum diagnostic value
 - **[Pattern 1.4 — Container Isolation](1.4-container-isolation.md)**: How to run stack tests concurrently without collision
-- **[Pattern 1.5 — No-Mock Philosophy](1.5-no-mock-philosophy.md)**: Why stack tests avoid mocks entirely
+- **[Pattern 1.5 — Real Dependencies in E2E/Integration and Stack Tests](1.5-no-mock-philosophy.md)**: Why stack tests use real dependencies — mocks are appropriate in unit tests, not in stack tests
 - **[L0 — Unit Tests as Contract](../L0-foundation.md#pattern-05--unit-tests-as-contract)**: Unit tests validate individual module contracts — stack tests validate system behavior
 
 ---
diff --git a/docs/L1-patterns/1.5-no-mock-philosophy.md b/docs/L1-patterns/1.5-no-mock-philosophy.md
@@ -1,62 +1,81 @@
-# Pattern 1.5 — No-Mock Philosophy
+# Pattern 1.5 — Real Dependencies in E2E/Integration and Stack Tests
 
 ## Problem
 
-Mock system components and you test your mocks, not your system. Mocks lie: they return perfect data, never timeout, never throw unexpected errors. Tests pass but production fails because real databases have latency, real APIs return errors, real caches miss. Mocking creates a fantasy system that doesn't exist.
+Mock system components in stack or E2E tests and you test your mocks, not your system. Mocks lie: they return perfect data, never time out, never throw unexpected errors. Tests pass but production fails because real databases have latency, real APIs return errors, real caches miss. Mocking in E2E and stack tests creates a fantasy system that doesn't exist.
+
+This principle applies **only to stack tests and E2E/integration tests**, which verify system behavior through the API. Unit tests are a different concern: they validate individual module contracts in isolation, and mocks supply that isolation without standing up infrastructure. See [Pattern 0.5 — Unit Tests as Contract](../L0-foundation.md#pattern-05--unit-tests-as-contract).
 
 ## Solution
 
-**Stack tests use real everything** — real PostgreSQL, real Redis, real RabbitMQ, real HTTP calls to other services. The only acceptable mocks are external services you don't control: third-party APIs where no sandbox exists, payment processors where test mode is unavailable.
+**Stack tests and E2E/integration tests use real everything** — real PostgreSQL, real Redis, real RabbitMQ, real HTTP calls to other services. The only acceptable mocks in these tests are external services you don't control: third-party APIs where no sandbox exists, payment processors where test mode is unavailable.
+
+The real-dependencies rule of thumb: if you own it, run it. If you can run it in Docker, run it in Docker. If you can't, that's a deployment dependency, not a testing concern.
 
-No-mock philosophy: if you own it, run it. If you can run it in Docker, run it in Docker. If you can't, that's a deployment dependency, not a testing concern.
+**Mocks in unit tests are appropriate and encouraged.** Unit tests validate module contracts — how a function handles edge cases, error paths, and boundary conditions. Mocks provide the isolation that makes unit tests fast, focused, and diagnostic. See [Pattern 0.5 — Unit Tests as Contract](../L0-foundation.md#pattern-05--unit-tests-as-contract) for the unit test perspective.
 
 ## In Practice
 
-What to mock vs. not mock:
+What to mock vs. not mock, by test level:
 
-| Component | Mock? | Reason |
-|-----------|-------|--------|
-| PostgreSQL, MySQL, MongoDB | No | Run in Docker — free, fast, realistic |
-| Redis, Memcached | No | Run in Docker — trivial setup |
-| RabbitMQ, Kafka | No | Run in Docker — handles real edge cases |
-| Internal microservices | No | Run the full stack — integration is what you're testing |
-| External APIs with sandbox (Stripe, Plaid) | No | Use sandbox — they provide it for this reason |
-| External APIs without sandbox | Yes | Mock the client, test error handling |
-| Time (for testing expiry) | Maybe | Use time-skewing libraries if system clock dependency is critical |
+| Component | Mock in Stack/E2E? | Mock in Unit? | Reason |
+|-----------|-------------------|---------------|--------|
+| PostgreSQL, MySQL, MongoDB | No | Yes | Stack: run in Docker. Unit: mock for isolation. |
+| Redis, Memcached | No | Yes | Stack: run in Docker. Unit: mock for isolation. |
+| RabbitMQ, Kafka | No | Yes | Stack: run in Docker. Unit: mock for isolation. |
+| Internal microservices | No | Yes | Stack: run the full stack. Unit: mock to test module logic. |
+| External APIs with sandbox (Stripe, Plaid) | No | Yes | Stack: use sandbox. Unit: mock for isolation. |
+| External APIs without sandbox | Yes | Yes | No test environment available at any level. |
+| Time (for testing expiry) | Maybe | Yes | Stack: time-skewing libraries. Unit: mock freely. |
 
-Example: Testing a payment flow
+Example: Testing a payment flow in a stack test
 
 ```typescript
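-// Good: Use Stripe testnet
-const stripe = new Stripe(process.env.STRIPE_TEST_KEY);
+// Good: Use Stripe test mode in stack tests
+const stripe = new Stripe(process.env.STRIPE_TEST_KEY!);
 const payment = await stripe.paymentIntents.create({
   amount: 1000,
   currency: 'usd',
-  // Real Stripe testnet handles edge cases: declined cards, network errors
+  // Real Stripe test mode surfaces edge cases such as declined test cards and API errors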
 });
 
-// Bad: Mock Stripe client
+// Bad: Mock Stripe client in a stack test
 const mockStripe = {
   paymentIntents: {
     create: () => ({ id: 'pi_mock', status: 'succeeded' })
-  };
-  // This passes tests but tells you nothing about real integration
+  },
+};
+// This passes tests but tells you nothing about real integration
+```
+
+Example: Mocking in a unit test is appropriate
+
+```typescript
+// Good: Mock the Stripe client in a unit test to isolate payment logic
+const mockStripe = {
+  paymentIntents: {
+    create: vi.fn().mockResolvedValue({ id: 'pi_test', status: 'succeeded' })
+  }
 };
+// Each unit test reconfigures the mock (e.g. mockRejectedValueOnce) to check how
+// processPayment handles declined cards and network errors, not just success
 ```
 
 ## Anti-Pattern
 
-**Don't** mock databases "because they're slow." PostgreSQL in Docker adds ~2 seconds to startup. Mock databases to test complex queries in unit tests, not in stack tests.
+**Don't** mock databases in stack tests "because they're slow." PostgreSQL in Docker adds ~2 seconds to startup. That's the cost of real confidence.
+
+**Don't** mock external services that provide test environments in stack tests. Stripe, Plaid, Twilio, etc. all provide test/sandbox modes. Use them — they catch real integration bugs.
 
-**Don't** mock external services that provide test environments. Stripe, Plaid, Twilio, etc. all provide test/sandbox modes. Use them — they catch real integration bugs.
+**Don't** mock for "determinism" in stack tests. Real systems are non-deterministic. You want tests to fail when race conditions exist, not hide them behind perfect mocks.
 
-**Don't** mock for "determinism." Real systems are non-deterministic. You want tests to fail when race conditions exist, not hide them behind perfect mocks.
+**Don't** avoid mocks in unit tests out of misplaced consistency. Unit tests and stack tests serve different purposes — mocks provide isolation in unit tests, real dependencies provide confidence in stack tests.
 
 ## Cross-References
 
-- **[Pattern 1.1 — Stack Tests](1.1-stack-tests.md)**: No-mocks is core to stack test philosophy
-- **[L0 — Unit Tests as Contract](../L0-foundation.md#pattern-05--unit-tests-as-contract)**: Unit tests validate individual module contracts in isolation — stack tests validate system behavior through the API
-- **[L2 — Deterministic Simulation](../L2-behavioral-guardrails.md)**: When you truly need determinism (e.g., replay tests), use simulation, not mocks
+- **[Pattern 1.1 — Stack Tests](1.1-stack-tests.md)**: Real dependencies are core to the stack test philosophy
+- **[L0 — Unit Tests as Contract](../L0-foundation.md#pattern-05--unit-tests-as-contract)**: Unit tests validate individual module contracts — mocks are appropriate for isolation
+- **[L2 — Constitutional Rules](../L2-behavioral-guardrails.md#pattern-24--constitutional-rules)**: The constitutional rule that enforces real dependencies, scoped to stack and E2E/integration tests
 
 ---
 
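The table can also be kept as data, so that a TDD or review skill could read the same policy the prose describes. A minimal sketch, assuming hypothetical component keys that collapse each table row into one entry and treating E2E/integration tests as sharing the stack-test column:

```typescript
// Two test levels from the reframed rule; E2E/integration tests share the stack-test column.
type TestLevel = 'stack' | 'unit';
type MockVerdict = 'no' | 'yes' | 'maybe';

// Illustrative component keys, one per table row.
type Component =
  | 'database'
  | 'cache'
  | 'messageBroker'
  | 'internalService'
  | 'externalApiWithSandbox'
  | 'externalApiWithoutSandbox'
  | 'time';

// Rows transcribed from the "What to mock vs. not mock" table.
const MOCK_POLICY: Record<Component, Record<TestLevel, MockVerdict>> = {
  database: { stack: 'no', unit: 'yes' }, // PostgreSQL, MySQL, MongoDB
  cache: { stack: 'no', unit: 'yes' }, // Redis, Memcached
  messageBroker: { stack: 'no', unit: 'yes' }, // RabbitMQ, Kafka
  internalService: { stack: 'no', unit: 'yes' },
  externalApiWithSandbox: { stack: 'no', unit: 'yes' }, // Stripe, Plaid
  externalApiWithoutSandbox: { stack: 'yes', unit: 'yes' },
  time: { stack: 'maybe', unit: 'yes' }, // stack: time-skewing libraries
};

// Components that must run for real at a given level (a "No" verdict).
function mustBeReal(level: TestLevel): Component[] {
  return (Object.keys(MOCK_POLICY) as Component[]).filter(
    (component) => MOCK_POLICY[component][level] === 'no',
  );
}

console.log('stack:', mustBeReal('stack'));
console.log('unit:', mustBeReal('unit'));
```

Typing the keys as a union means a misspelled component is a compile error, and the empty result for `'unit'` encodes the reframed rule directly: nothing is off-limits to mocks at the unit level.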
diff --git a/docs/L2-behavioral-guardrails.md b/docs/L2-behavioral-guardrails.md
@@ -54,8 +54,8 @@ Skills are the mechanism by which a project's constitution (@L0-foundation.md#pa
 
 A TDD skill for an ecommerce project might extend `superpowers:test-driven-development` and add:
 
-- No mocking database drivers (constitutional rule)
-- No mocking payment processor libraries (Stripe, PayPal) (constitutional rule)
+- No mocking database drivers in stack or E2E/integration tests (constitutional rule: real dependencies)
+- No mocking payment processor libraries (Stripe, PayPal) in stack or E2E/integration tests (constitutional rule: real dependencies)
 - Full-loop assertion requirements (project convention)
 - Hook to track test file changes (integration with workflow)
 
@@ -252,7 +252,7 @@ Constitutional rules are hard constraints declared in CLAUDE.md that never relax
 
 **Example constitutional rules from a production ecommerce platform:**
 
-- **Never mock core system components** — logger, payment processor libraries (Stripe, PayPal), database drivers, HTTP clients for first-party services. Use real components in stack tests.
+- **Real dependencies in E2E/integration and stack tests** — use the real logger, payment processor libraries (Stripe, PayPal), database drivers, and HTTP clients for first-party services in stack tests. Mocks are appropriate in unit tests.
 - **Full accounting for every state change** — every inventory change, every order, every transaction fee must be logged and queryable.
 - **Evidence-based claims only** — show command output before claiming done. "Tests pass" is not evidence; show the test output.
 - **Docker-first development** — no local OS execution. Everything runs in containers.
@@ -263,7 +263,7 @@ Constitutional rules are hard constraints declared in CLAUDE.md that never relax
 ```
 CLAUDE.md declares constitutional rules
     +-> plan+ includes rules in plan template (what must this plan respect?)
-            +-> tdd+ rejects mocked components in test generation
+            +-> tdd+ rejects mocked components in stack test generation
                     +-> review+ checks constitutional compliance (did this violate any rules?)
 ```
 
@@ -272,10 +272,10 @@ CLAUDE.md declares constitutional rules
 Constitutional rule enforcement in the skill chain:
 
 1. **plan+** reads CLAUDE.md and adds a "Constitutional compliance" section to each plan
-2. **tdd+** checks that proposed tests don't mock protected components
+2. **tdd+** checks that proposed stack tests don't mock protected components
 3. **review+** runs a checklist that includes "No constitutional rules violated"
 
-When the agent attempts to mock a database driver, the tdd+ skill blocks it with a reference to the constitutional rule.
+When the agent attempts to mock a database driver in a stack test, the tdd+ skill blocks it with a reference to the constitutional rule.
 
 ### Anti-Pattern
 
@@ -284,7 +284,7 @@ Writing "soft" rules with exceptions. Constitutional rules must have no escape h
 ### Cross-References
 
 - @L0-foundation.md#pattern-04-claude-md-as-project-constitution — CLAUDE.md format
-- @L1-patterns/1.5-no-mock-philosophy.md — Mock avoidance rationale
+- @L1-patterns/1.5-no-mock-philosophy.md — Real dependencies in E2E/integration and stack tests
 
 ### Reference Implementation
 
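The scoping that tdd+ now applies (block mocks of protected components in stack tests, allow them in unit tests) can be sketched as a static check. This is a hypothetical illustration, not the skill's actual implementation: it assumes stack tests follow the `*.stack.test.ts` naming used by `02-bootstrap-test-data.stack.test.ts`, that mocks are declared with Vitest/Jest-style `vi.mock`/`jest.mock` calls, and an illustrative protected-module list:

```typescript
// Hypothetical protected list; a real project would derive it from its CLAUDE.md rules.
const PROTECTED_MODULES: readonly string[] = ['pg', 'ioredis', 'stripe'];

// Matches vi.mock('x') or jest.mock("x") and captures the module specifier.
const MOCK_CALL = /\b(?:vi|jest)\.mock\(\s*['"]([^'"]+)['"]/;

interface Violation {
  file: string;
  specifier: string;
}

// Assumed naming convention: stack tests end in .stack.test.ts.
function isStackTest(file: string): boolean {
  return file.endsWith('.stack.test.ts');
}

function findProtectedMocks(
  file: string,
  source: string,
  protectedModules: readonly string[] = PROTECTED_MODULES,
): Violation[] {
  // Unit tests may mock anything; the rule is scoped to stack tests.
  if (!isStackTest(file)) return [];

  const violations: Violation[] = [];
  // Fresh global regex per call so lastIndex state never leaks between files.
  const pattern = new RegExp(MOCK_CALL.source, 'g');
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    const specifier = match[1];
    if (specifier !== undefined && protectedModules.includes(specifier)) {
      violations.push({ file, specifier });
    }
  }
  return violations;
}

const exampleSource = ["vi.mock('pg');", "vi.mock('./clock');"].join('\n');
console.log(findProtectedMocks('tests/03-orders.stack.test.ts', exampleSource));
console.log(findProtectedMocks('src/orders.test.ts', exampleSource));
```

A regex scan is deliberately crude: it cannot see fakes passed in through dependency injection, so a check like this would complement, not replace, the constitutional-compliance checklist that review+ runs.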
diff --git a/docs/cross-cutting/adoption-guide.md b/docs/cross-cutting/adoption-guide.md
@@ -19,7 +19,7 @@ Before choosing a path, assess your current state. Run through this checklist to
 
 ### Testing
 
-- [ ] Integration tests use real components (no mocks for owned services)
+- [ ] Stack tests and E2E/integration tests use real components (no mocks for owned services)
 - [ ] Tests run in isolated environments (no shared state)
 - [ ] Test failures provide clear diagnostic signals
 - [ ] Tests assert on side effects, not just responses
@@ -56,7 +56,7 @@ Document every assumption they had to make. Each assumption is a gap.
 | Level | Focus | Common Gaps | Quick Win |
 |-------|-------|-------------|-----------|
 | **L0 Foundation** | Structure, CLAUDE.md, doc freshness, cleanup | No CLAUDE.md, layer-based organization, stale docs | Write CLAUDE.md, restructure by domain |
-| **L1 Closed Loop Design** | Design-led verification with stack tests | Mock-heavy integration tests, shallow assertions | Add app-startup stack test |
+| **L1 Closed Loop Design** | Design-led verification with stack tests | Mock-heavy stack/E2E tests, shallow assertions | Add app-startup stack test |
 | **L2 Guardrails** | Skills, hooks, behavioral rules | No enforcement, agents make common errors | Add test-integrity skill |
 | **L3 Optimization** | Smart routing, structured search | Raw grep/cat commands, token waste | Set up jcodemunch indexing |
 | **L4 Standards & Measurement** | Evidence, drift detection, metrics | Claims without evidence, spec drift | Establish evidence standard |
@@ -81,13 +81,13 @@ Start with L0, the highest-impact, lowest-effort starting point. Then build upwa
 
 If your integration tests are the biggest pain point, start at L1.
 
-1. **L1** — Add stack tests for your most brittle integration test areas. Remove mocks for owned services. See [Pattern 1.1 — Stack Tests](../L1-feedback-loops.md#pattern-11--stack-tests) and [examples/stack-test/](../../examples/stack-test/).
+1. **L1** — Add stack tests for your most brittle integration test areas. Remove mocks for owned services in stack tests (mocks are fine in unit tests). See [Pattern 1.1 — Stack Tests](../L1-feedback-loops.md#pattern-11--stack-tests) and [examples/stack-test/](../../examples/stack-test/).
 2. **L0** — Once stack tests are working, structure the project to make them maintainable. Deep modules, CLAUDE.md, progressive disclosure.
 3. **L2-L4** — Continue upward as in Path A.
 
 ### Path C: Guardrails-First (Teams with Agent Errors)
 
-If agents are making consistent errors (wrong mocking, missing tests, ignoring rules), start at L2.
+If agents are making consistent errors (wrong mocking in stack tests, missing tests, ignoring rules), start at L2.
 
 1. **L2** — Add skills for your most violated rules, hooks to block destructive patterns. See [L2-behavioral-guardrails.md](../L2-behavioral-guardrails.md).
 2. **L0** — Structure the project so agents have clear context to work with.
diff --git a/docs/cross-cutting/anti-patterns.md b/docs/cross-cutting/anti-patterns.md
diff --git a/docs/cross-cutting/glossary.md b/docs/cross-cutting/glossary.md
diff --git a/docs/references/reference-telegram-trading-bot-case-study.md b/docs/references/reference-telegram-trading-bot-case-study.md