
Commit ae7d68a

franklywatson and claude committed
docs: reframe no-mock rule — scope to stack/E2E tests, endorse mocks in unit tests
Constitutional rule #1 changes from blanket "No mocking core system components" to "Real dependencies in E2E/integration and stack tests." Mocks are now explicitly endorsed for unit test isolation. Pattern 1.5 renamed from "No-Mock Philosophy" to "Real Dependencies in E2E/Integration and Stack Tests." All cross-references updated across 11 files.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 7d78d21 commit ae7d68a

11 files changed

Lines changed: 71 additions & 51 deletions

CLAUDE.md

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -100,7 +100,7 @@ See @README.md for detailed project overview.
100100

101101
## Constitutional Rules (Never Violate)
102102

103-
1. **No mocking core system components** — Use real databases, real services, real blockchains
103+
1. **Real dependencies in E2E/integration and stack tests** — Use real databases, services, and caches; mocks are appropriate in unit tests
104104
2. **Evidence-based claims only** — "Tests pass" must show test output
105105
3. **Zero-defect tolerance** — Every error/warning must be addressed
106106
4. **Doc freshness mandatory** — Code changes require doc updates in same session

README.md

Lines changed: 1 addition & 1 deletion
@@ -45,7 +45,7 @@ Patterns are organized into five levels, each building on the previous. The leve
4545

4646
**[L0: Foundation](docs/L0-foundation.md)** — Structure your codebase so an AI with zero prior context can navigate, understand, and contribute. Deep modules, progressive disclosure, conceptual file organization, CLAUDE.md as project constitution, unit tests as contract, documentation as system map, and aggressive cleanup. The "can a new starter figure this out?" test.
4747

48-
**[L1: Closed Loop Design and Verification](docs/L1-feedback-loops.md)** — The level where agents stop guessing and start designing. Context harvesting gathers targeted evidence before implementation. Stack tests validate design intent end-to-end through the full application stack — no mocks, no partial integration, no ambiguous results. Full-loop assertion layering catches regressions at primary, secondary, and tertiary levels.
48+
**[L1: Closed Loop Design and Verification](docs/L1-feedback-loops.md)** — The level where agents stop guessing and start designing. Context harvesting gathers targeted evidence before implementation. Stack tests validate design intent end-to-end through the full application stack — real dependencies (no mocks in stack tests), no partial integration, no ambiguous results. Mocks are appropriate in unit tests for isolation. Full-loop assertion layering catches regressions at primary, secondary, and tertiary levels.
4949

5050
**[L2: Behavioral Guardrails](docs/L2-behavioral-guardrails.md)** — Rules written in prose are suggestions. Skills and hooks are enforcement. Overlay skills on top of base agent capabilities, chain them into a complete development lifecycle, and automate discipline through the tool layer.
5151

docs/L0-foundation.md

Lines changed: 2 additions & 2 deletions
@@ -178,7 +178,7 @@ One-sentence summary of what this project does.
178178
Tree diagram showing major directories and their purposes.
179179
180180
## Constitutional Rules (Never Violate)
181-
1. No mocking core system components
181+
1. Real dependencies in E2E/integration and stack tests
182182
2. Evidence-based claims only
183183
3. Zero-defect tolerance
184184
@@ -227,7 +227,7 @@ During a refactoring session:
227227
5. Tests document what the code actually does → docs stay accurate
228228
```
229229

230-
**Unit tests and stack tests are complementary, not competing concerns.** Stack tests ([L1](L1-feedback-loops.md#pattern-11--stack-tests)) validate end-to-end user journeys through the full system. Unit tests validate individual module contracts in isolation. A codebase needs both: stack tests catch integration failures; unit tests catch logic errors within modules. Dismissing a unit test failure while trusting stack test results means relying on partial feedback.
230+
**Unit tests and stack tests are complementary, not competing concerns.** Stack tests ([L1](L1-feedback-loops.md#pattern-11--stack-tests)) validate end-to-end user journeys through the full system with real dependencies ([Pattern 1.5](L1-patterns/1.5-no-mock-philosophy.md)). Unit tests validate individual module contracts in isolation, where mocks provide the necessary isolation to test logic without standing up infrastructure. Mocks are appropriate and encouraged in unit tests — they enable fast, focused, diagnostic tests of module behavior. A codebase needs both: stack tests catch integration failures; unit tests catch logic errors within modules. Dismissing a unit test failure while trusting stack test results means relying on partial feedback.
231231

232232
**Anti-Pattern**: Relying on documentation as the primary contract. Documentation drifts. Tests execute. Treating unit tests as subordinate to stack tests — both layers provide distinct, necessary signals. Writing tests for coverage metrics rather than diagnostic value.
233233

docs/L1-feedback-loops.md

Lines changed: 2 additions & 2 deletions
@@ -89,9 +89,9 @@ Tests ordered by dependency so that if test N fails, the agent knows tests 1 thr
8989

9090
Four mechanisms ensuring tests never interfere: unique container names, dynamic port allocation, transient volumes, and per-test compose files. Aggressive cleanup prevents Docker resource exhaustion during concurrent test execution.
9191

92-
### [Pattern 1.5 — No-Mock Philosophy](L1-patterns/1.5-no-mock-philosophy.md)
92+
### [Pattern 1.5 — Real Dependencies in E2E/Integration and Stack Tests](L1-patterns/1.5-no-mock-philosophy.md)
9393

94-
Stack tests use real everything — real PostgreSQL, real Redis, real message queues. The only acceptable mocks are external services without test environments. If you own it, run it. If you can run it in Docker, run it in Docker.
94+
Stack tests and E2E/integration tests use real everything — real PostgreSQL, real Redis, real message queues. The only acceptable mocks in these tests are external services without test environments. If you own it, run it. If you can run it in Docker, run it in Docker. Mocks are appropriate and encouraged in unit tests, which validate module contracts in isolation.
9595

9696
### [Pattern 1.6 — Test Integrity Rules](L1-patterns/1.6-test-integrity.md)
9797

docs/L1-patterns/1.1-stack-tests.md

Lines changed: 1 addition & 1 deletion
@@ -147,7 +147,7 @@ The bootstrap step (typically `02-bootstrap-test-data.stack.test.ts`) runs after
147147
- **[Pattern 1.2 — Full-Loop Assertion Layering](1.2-full-loop-assertions.md)**: How to structure assertions within stack tests
148148
- **[Pattern 1.3 — Sequential/Additive Test Design](1.3-sequential-design.md)**: How to order stack tests for maximum diagnostic value
149149
- **[Pattern 1.4 — Container Isolation](1.4-container-isolation.md)**: How to run stack tests concurrently without collision
150-
- **[Pattern 1.5 — No-Mock Philosophy](1.5-no-mock-philosophy.md)**: Why stack tests avoid mocks entirely
150+
- **[Pattern 1.5 — Real Dependencies in E2E/Integration and Stack Tests](1.5-no-mock-philosophy.md)**: Why stack tests use real dependencies — mocks are appropriate in unit tests, not in stack tests
151151
- **[L0 — Unit Tests as Contract](../L0-foundation.md#pattern-05--unit-tests-as-contract)**: Unit tests validate individual module contracts — stack tests validate system behavior
152152

153153
---

docs/L1-patterns/1.5-no-mock-philosophy.md

Lines changed: 42 additions & 23 deletions
@@ -1,62 +1,81 @@
1-
# Pattern 1.5 — No-Mock Philosophy
1+
# Pattern 1.5 — Real Dependencies in E2E/Integration and Stack Tests
22

33
## Problem
44

5-
Mock system components and you test your mocks, not your system. Mocks lie: they return perfect data, never timeout, never throw unexpected errors. Tests pass but production fails because real databases have latency, real APIs return errors, real caches miss. Mocking creates a fantasy system that doesn't exist.
5+
Mock system components in end-to-end tests and you test your mocks, not your system. Mocks lie: they return perfect data, never time out, never throw unexpected errors. Tests pass, but production fails because real databases have latency, real APIs return errors, and real caches miss. Mocking in E2E and stack tests creates a fantasy system that doesn't exist.
6+
7+
This principle applies to **stack tests and E2E/integration tests only** — tests that verify system behavior through the API. Unit tests are a different concern: they validate individual module contracts in isolation, where mocks provide the necessary isolation to test logic without standing up infrastructure. See [Pattern 0.5 — Unit Tests as Contract](../L0-foundation.md#pattern-05--unit-tests-as-contract).
68

79
## Solution
810

9-
**Stack tests use real everything** — real PostgreSQL, real Redis, real RabbitMQ, real HTTP calls to other services. The only acceptable mocks are external services you don't control: third-party APIs where no sandbox exists, payment processors where test mode is unavailable.
11+
**Stack tests and E2E/integration tests use real everything** — real PostgreSQL, real Redis, real RabbitMQ, real HTTP calls to other services. The only acceptable mocks in these tests are external services you don't control: third-party APIs where no sandbox exists, payment processors where test mode is unavailable.
12+
13+
Real dependencies philosophy: if you own it, run it. If you can run it in Docker, run it in Docker. If you can't, that's a deployment dependency, not a testing concern.
1014

11-
No-mock philosophy: if you own it, run it. If you can run it in Docker, run it in Docker. If you can't, that's a deployment dependency, not a testing concern.
15+
**Mocks in unit tests are appropriate and encouraged.** Unit tests validate module contracts — how a function handles edge cases, error paths, and boundary conditions. Mocks provide the isolation that makes unit tests fast, focused, and diagnostic. See [Pattern 0.5 — Unit Tests as Contract](../L0-foundation.md#pattern-05--unit-tests-as-contract) for the unit test perspective.
1216

1317
## In Practice
1418

15-
What to mock vs. not mock:
19+
What to mock vs. not mock — this table applies to stack tests and E2E/integration tests:
1620

17-
| Component | Mock? | Reason |
18-
|-----------|-------|--------|
19-
| PostgreSQL, MySQL, MongoDB | No | Run in Docker — free, fast, realistic |
20-
| Redis, Memcached | No | Run in Docker — trivial setup |
21-
| RabbitMQ, Kafka | No | Run in Docker — handles real edge cases |
22-
| Internal microservices | No | Run the full stack — integration is what you're testing |
23-
| External APIs with sandbox (Stripe, Plaid) | No | Use sandbox — they provide it for this reason |
24-
| External APIs without sandbox | Yes | Mock the client, test error handling |
25-
| Time (for testing expiry) | Maybe | Use time-skewing libraries if system clock dependency is critical |
21+
| Component | Mock in Stack/E2E? | Mock in Unit? | Reason |
22+
|-----------|-------------------|---------------|--------|
23+
| PostgreSQL, MySQL, MongoDB | No | Yes | Stack: run in Docker. Unit: mock for isolation. |
24+
| Redis, Memcached | No | Yes | Stack: run in Docker. Unit: mock for isolation. |
25+
| RabbitMQ, Kafka | No | Yes | Stack: run in Docker. Unit: mock for isolation. |
26+
| Internal microservices | No | Yes | Stack: run the full stack. Unit: mock to test module logic. |
27+
| External APIs with sandbox (Stripe, Plaid) | No | Yes | Stack: use sandbox. Unit: mock for isolation. |
28+
| External APIs without sandbox | Yes | Yes | No test environment available at any level. |
29+
| Time (for testing expiry) | Maybe | Yes | Stack: time-skewing libraries. Unit: mock freely. |
2630

27-
Example: Testing a payment flow
31+
Example: Testing a payment flow in a stack test
2832

2933
```typescript
30-
// Good: Use Stripe testnet
34+
// Good: Use Stripe test mode in stack tests
3135
const stripe = new Stripe(process.env.STRIPE_TEST_KEY);
3236
const payment = await stripe.paymentIntents.create({
3337
amount: 1000,
3438
currency: 'usd',
3539
// Stripe test mode exercises real edge cases: declined cards, network errors
3640
});
3741

38-
// Bad: Mock Stripe client
42+
// Bad: Mock Stripe client in a stack test
3943
const mockStripe = {
4044
paymentIntents: {
4145
create: () => ({ id: 'pi_mock', status: 'succeeded' })
4246
}
4647
// This passes tests but tells you nothing about real integration
48+
};
49+
```
50+
51+
Example: Mocking in a unit test is appropriate
52+
53+
```typescript
54+
// Good: Mock the Stripe client in a unit test to isolate payment logic
55+
const mockStripe = {
56+
paymentIntents: {
57+
create: vi.fn().mockResolvedValue({ id: 'pi_test', status: 'succeeded' })
58+
}
4459
};
60+
// Unit test validates how processPayment handles the response,
61+
// including edge cases like declined cards and network errors
4562
```
4663

4764
## Anti-Pattern
4865

49-
**Don't** mock databases "because they're slow." PostgreSQL in Docker adds ~2 seconds to startup. Mock databases to test complex queries in unit tests, not in stack tests.
66+
**Don't** mock databases in stack tests "because they're slow." PostgreSQL in Docker adds ~2 seconds to startup. That's the cost of real confidence.
67+
68+
**Don't** mock external services that provide test environments in stack tests. Stripe, Plaid, Twilio, etc. all provide test/sandbox modes. Use them — they catch real integration bugs.
5069

51-
**Don't** mock external services that provide test environments. Stripe, Plaid, Twilio, etc. all provide test/sandbox modes. Use them — they catch real integration bugs.
70+
**Don't** mock for "determinism" in stack tests. Real systems are non-deterministic. You want tests to fail when race conditions exist, not hide them behind perfect mocks.
5271

53-
**Don't** mock for "determinism." Real systems are non-deterministic. You want tests to fail when race conditions exist, not hide them behind perfect mocks.
72+
**Don't** avoid mocks in unit tests out of misplaced consistency. Unit tests and stack tests serve different purposes — mocks provide isolation in unit tests, real dependencies provide confidence in stack tests.
5473

5574
## Cross-References
5675

57-
- **[Pattern 1.1 — Stack Tests](1.1-stack-tests.md)**: No-mocks is core to stack test philosophy
58-
- **[L0 — Unit Tests as Contract](../L0-foundation.md#pattern-05--unit-tests-as-contract)**: Unit tests validate individual module contracts in isolation — stack tests validate system behavior through the API
59-
- **[L2 — Deterministic Simulation](../L2-behavioral-guardrails.md)**: When you truly need determinism (e.g., replay tests), use simulation, not mocks
76+
- **[Pattern 1.1 — Stack Tests](1.1-stack-tests.md)**: Real dependencies are core to stack test philosophy
77+
- **[L0 — Unit Tests as Contract](../L0-foundation.md#pattern-05--unit-tests-as-contract)**: Unit tests validate individual module contracts — mocks are appropriate for isolation
78+
- **[L2 — Constitutional Rules](../L2-behavioral-guardrails.md#pattern-24--constitutional-rules)**: Constitutional rule enforces real dependencies in stack tests specifically
6079

6180
---
6281
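
The unit-test side of Pattern 1.5 can be illustrated with a minimal sketch, assuming a `processPayment` function that receives its payment client as a parameter. The commit names `processPayment` only in a comment and builds its mock with `vi.fn()`. The `PaymentClient` and `PaymentResult` types and the two hand-rolled mocks below are hypothetical names introduced for illustration, written without a test framework so the block stands alone.

```typescript
// Hypothetical types: not taken from the repository or from the Stripe SDK.
interface PaymentIntent {
  id: string;
  status: string;
}

interface PaymentClient {
  paymentIntents: {
    create(params: { amount: number; currency: string }): Promise<PaymentIntent>;
  };
}

type PaymentResult =
  | { ok: true; paymentId: string }
  | { ok: false; reason: string };

// Module under test. It depends only on the PaymentClient interface, so a
// unit test can inject a mock while a stack test injects a real client.
async function processPayment(
  client: PaymentClient,
  amount: number,
): Promise<PaymentResult> {
  // Reject bad input before touching the client.
  if (!Number.isInteger(amount) || amount <= 0) {
    return { ok: false, reason: "invalid amount" };
  }
  try {
    const intent = await client.paymentIntents.create({ amount, currency: "usd" });
    return intent.status === "succeeded"
      ? { ok: true, paymentId: intent.id }
      : { ok: false, reason: `unexpected status: ${intent.status}` };
  } catch (err) {
    // Convert client failures (declined card, network error) into a result.
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, reason: message };
  }
}

// Hand-rolled mock that always succeeds.
const succeedingClient: PaymentClient = {
  paymentIntents: {
    create: async () => ({ id: "pi_test", status: "succeeded" }),
  },
};

// Hand-rolled mock that simulates a declined card by throwing.
const decliningClient: PaymentClient = {
  paymentIntents: {
    create: async () => {
      throw new Error("card_declined");
    },
  },
};
```

Because `processPayment` takes the client as an argument, the same function could be exercised with a real client in a stack test and with mocks like these in unit tests, which is the split this commit describes.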

docs/L2-behavioral-guardrails.md

Lines changed: 7 additions & 7 deletions
@@ -54,8 +54,8 @@ Skills are the mechanism by which a project's constitution (@L0-foundation.md#pa
5454

5555
A TDD skill for an ecommerce project might extend `superpowers:test-driven-development` and add:
5656

57-
- No mocking database drivers (constitutional rule)
58-
- No mocking payment processor libraries (Stripe, PayPal) (constitutional rule)
57+
- Real dependencies in E2E/integration and stack tests — no mocking database drivers in stack tests (constitutional rule)
58+
- Real dependencies in E2E/integration and stack tests — no mocking payment processor libraries in stack tests (constitutional rule)
5959
- Full-loop assertion requirements (project convention)
6060
- Hook to track test file changes (integration with workflow)
6161

@@ -252,7 +252,7 @@ Constitutional rules are hard constraints declared in CLAUDE.md that never relax
252252

253253
**Example constitutional rules from a production ecommerce platform:**
254254

255-
- **Never mock core system components** — logger, payment processor libraries (Stripe, PayPal), database drivers, HTTP clients for first-party services. Use real components in stack tests.
255+
- **Real dependencies in E2E/integration and stack tests** — logger, payment processor libraries (Stripe, PayPal), database drivers, HTTP clients for first-party services. Use real components in stack tests. Mocks are appropriate in unit tests.
256256
- **Full accounting for every state change** — every inventory change, every order, every transaction fee must be logged and queryable.
257257
- **Evidence-based claims only** — show command output before claiming done. "Tests pass" is not evidence; show the test output.
258258
- **Docker-first development** — no local OS execution. Everything runs in containers.
@@ -263,7 +263,7 @@ Constitutional rules are hard constraints declared in CLAUDE.md that never relax
263263
```
264264
CLAUDE.md declares constitutional rules
265265
+-> plan+ includes rules in plan template (what must this plan respect?)
266-
+-> tdd+ rejects mocked components in test generation
266+
+-> tdd+ rejects mocked components in stack test generation
267267
+-> review+ checks constitutional compliance (did this violate any rules?)
268268
```
269269

@@ -272,10 +272,10 @@ CLAUDE.md declares constitutional rules
272272
Constitutional rule enforcement in the skill chain:
273273

274274
1. **plan+** reads CLAUDE.md and adds a "Constitutional compliance" section to each plan
275-
2. **tdd+** checks that proposed tests don't mock protected components
275+
2. **tdd+** checks that proposed stack tests don't mock protected components
276276
3. **review+** runs a checklist that includes "No constitutional rules violated"
277277

278-
When the agent attempts to mock a database driver, the tdd+ skill blocks it with a reference to the constitutional rule.
278+
When the agent attempts to mock a database driver in a stack test, the tdd+ skill blocks it with a reference to the constitutional rule.
279279

280280
### Anti-Pattern
281281

@@ -284,7 +284,7 @@ Writing "soft" rules with exceptions. Constitutional rules must have no escape h
284284
### Cross-References
285285

286286
- @L0-foundation.md#pattern-04-claude-md-as-project-constitution — CLAUDE.md format
287-
- @L1-patterns/1.5-no-mock-philosophy.md — Mock avoidance rationale
287+
- @L1-patterns/1.5-no-mock-philosophy.md — Real dependencies in E2E/integration and stack tests
288288

289289
### Reference Implementation
290290
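
The scoped check described in this file's diff (tdd+ rejects mocks in stack tests but not in unit tests) could be approximated as follows. This is a sketch under stated assumptions, not the tdd+ skill's actual logic: `findMockViolations`, `isStackTest`, and the regular expressions are hypothetical, and stack tests are assumed to be identified by the `.stack.test.ts` suffix seen in `02-bootstrap-test-data.stack.test.ts`.

```typescript
// Hypothetical sketch: a line-based scan, not a full parser.
interface MockViolation {
  line: number; // 1-based line number in the source file
  text: string; // trimmed source line containing the mock call
}

// Assumption: stack tests are named *.stack.test.ts (or .js/.tsx/.jsx).
function isStackTest(filename: string): boolean {
  return /\.stack\.test\.[jt]sx?$/.test(filename);
}

// Matches common Vitest/Jest mocking calls such as vi.mock(, vi.fn(,
// vi.spyOn(, jest.mock(, jest.fn(, jest.spyOn(, and the doMock variants.
const MOCK_CALL = /\b(?:vi|jest)\.(?:mock|fn|spyOn|doMock)\s*\(/;

function findMockViolations(filename: string, source: string): MockViolation[] {
  // Unit tests may mock freely, so only stack tests are scanned.
  if (!isStackTest(filename)) return [];

  const violations: MockViolation[] = [];
  source.split("\n").forEach((text, index) => {
    if (MOCK_CALL.test(text)) {
      violations.push({ line: index + 1, text: text.trim() });
    }
  });
  return violations;
}
```

A text scan like this would flag mock calls inside comments and miss hand-written stub objects, and it does not model the documented exception for external services without a sandbox, so it is better suited to prompting a review than to a hard block.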

docs/cross-cutting/adoption-guide.md

Lines changed: 4 additions & 4 deletions
@@ -19,7 +19,7 @@ Before choosing a path, assess your current state. Run through this checklist to
1919

2020
### Testing
2121

22-
- [ ] Integration tests use real components (no mocks for owned services)
22+
- [ ] Stack tests and E2E/integration tests use real components (no mocks for owned services)
2323
- [ ] Tests run in isolated environments (no shared state)
2424
- [ ] Test failures provide clear diagnostic signals
2525
- [ ] Tests assert on side effects, not just responses
@@ -56,7 +56,7 @@ Document every assumption they had to make. Each assumption is a gap.
5656
| Level | Focus | Common Gaps | Quick Win |
5757
|-------|-------|-------------|-----------|
5858
| **L0 Foundation** | Structure, CLAUDE.md, doc freshness, cleanup | No CLAUDE.md, layer-based organization, stale docs | Write CLAUDE.md, restructure by domain |
59-
| **L1 Closed Loop Design** | Design-led verification with stack tests | Mock-heavy integration tests, shallow assertions | Add app-startup stack test |
59+
| **L1 Closed Loop Design** | Design-led verification with stack tests | Mock-heavy stack/E2E tests, shallow assertions | Add app-startup stack test |
6060
| **L2 Guardrails** | Skills, hooks, behavioral rules | No enforcement, agents make common errors | Add test-integrity skill |
6161
| **L3 Optimization** | Smart routing, structured search | Raw grep/cat commands, token waste | Set up jcodemunch indexing |
6262
| **L4 Standards & Measurement** | Evidence, drift detection, metrics | Claims without evidence, spec drift | Establish evidence standard |
@@ -81,13 +81,13 @@ Start with L0, the highest-impact, lowest-effort starting point. Then build upwa
8181

8282
If your integration tests are the biggest pain point, start at L1.
8383

84-
1. **L1** — Add stack tests for your most brittle integration test areas. Remove mocks for owned services. See [Pattern 1.1 — Stack Tests](../L1-feedback-loops.md#pattern-11--stack-tests) and [examples/stack-test/](../../examples/stack-test/).
84+
1. **L1** — Add stack tests for your most brittle integration test areas. Remove mocks for owned services in stack tests (mocks are fine in unit tests). See [Pattern 1.1 — Stack Tests](../L1-feedback-loops.md#pattern-11--stack-tests) and [examples/stack-test/](../../examples/stack-test/).
8585
2. **L0** — Once stack tests are working, structure the project to make them maintainable. Deep modules, CLAUDE.md, progressive disclosure.
8686
3. **L2-L4** — Continue upward as in Path A.
8787

8888
### Path C: Guardrails-First (Teams with Agent Errors)
8989

90-
If agents are making consistent errors (wrong mocking, missing tests, ignoring rules), start at L2.
90+
If agents are making consistent errors (wrong mocking in stack tests, missing tests, ignoring rules), start at L2.
9191

9292
1. **L2** — Add skills for your most violated rules, hooks to block destructive patterns. See [L2-behavioral-guardrails.md](../L2-behavioral-guardrails.md).
9393
2. **L0** — Structure the project so agents have clear context to work with.
