Commit c01e210

feat(code-ops): add specification-implementation test separation rules
Introduce Rule 31 (code.md), Rule 10 (testing.md), and spec test sections in make_plan.md to enforce physical separation of specification tests and implementation tests. This prevents tautological testing, where tests mirror the implementation instead of independently verifying behavior against the specification, allowing bugs to ship undetected.

Key additions:
- Spec vs impl test file naming conventions (.spec.test / .impl.test)
- Immutable oracle rule: spec test failures mean the implementation is wrong, not the test
- Red-phase verification protocol and escalation procedures
- Mandatory spec test case tables in planning documents with source traceability to requirements, API contracts, and ambiguity register
1 parent 34fca77 commit c01e210

7 files changed

Lines changed: 594 additions & 20 deletions

docs/code.md

Lines changed: 41 additions & 0 deletions
@@ -402,6 +402,47 @@ These rules are **mandatory** and must be applied **strictly and consistently**
- api.errors.test.go
```

31. **🚨 Specification-Implementation Test Separation (🚨 NON-NEGOTIABLE)**

Every feature MUST have two distinct categories of tests, physically separated into different files. This rule prevents **tautological testing** — where tests mirror the implementation instead of independently verifying it against the specification, causing bugs to ship to production undetected.

**Two mandatory test categories:**

| Category | Source of Truth | File Convention | Purpose |
|----------|----------------|-----------------|---------|
| **Specification Tests** | Requirements, acceptance criteria, API contracts, RFCs | `[feature].spec.test.[ext]` | Verify the code does what the **specification** says |
| **Implementation Tests** | The code itself | `[feature].impl.test.[ext]` | Verify internals, edge cases, error paths, boundary conditions |

**Specification test rules:**
- Spec test expectations MUST be derived from specification documents (requirements, acceptance criteria, API contracts) — NEVER from reading the implementation code
- Spec tests are **immutable oracles** — if a spec test fails after implementation, the **implementation is wrong**, not the test
- The agent MUST NOT modify spec test expectations to make them match the implementation without explicit user approval
- When writing spec tests, the agent MAY read type definitions and function signatures (public API surface) but MUST NOT read implementation logic (function bodies, internal algorithms)
- Every spec test MUST include a traceability comment linking it to its source requirement or Ambiguity Register (AR) entry
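The traceability rule is mechanical enough to lint. A minimal sketch, assuming a hypothetical convention in which every `it(...)`/`test(...)` call in a spec file sits directly below a `// Source: ...` comment (for example `// Source: Req 2.1` or `// Source: AR #3`); `findUntracedTests` and the comment format are illustrative, not part of the rule:

```typescript
// Hypothetical lint for the traceability rule: every it()/test() call in a
// spec file must be preceded by a "// Source: ..." comment.
export interface UntracedTest {
  line: number; // 1-based line of the it()/test() call
  title: string; // the test title, for the report
}

// Assumed conventions: test calls start a line; titles are string literals.
const TEST_CALL = /^\s*(?:it|test)\(\s*(['"`])(.*?)\1/;
const SOURCE_COMMENT = /^\s*\/\/\s*Source:\s*\S+/;

export function findUntracedTests(source: string): UntracedTest[] {
  const lines = source.split("\n");
  const untraced: UntracedTest[] = [];
  for (let i = 0; i < lines.length; i++) {
    const match = TEST_CALL.exec(lines[i]);
    if (match === null) continue;
    // Skip blank lines upward to find the line directly above the test.
    let j = i - 1;
    while (j >= 0 && lines[j].trim() === "") j--;
    if (j < 0 || !SOURCE_COMMENT.test(lines[j])) {
      untraced.push({ line: i + 1, title: match[2] });
    }
  }
  return untraced;
}
```

For a file whose first test carries `// Source: Req 2.1` and whose second test has no comment, the function reports only the second test. A regex pass like this is deliberately shallow: it checks that a source is named, not that the named requirement actually supports the assertion.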

**File organization:**

```
tests/
├── auth/
│   ├── auth.login.spec.test.[ext]      # Specification tests — from requirements
│   └── auth.login.impl.test.[ext]      # Implementation tests — edge cases, internals
└── user/
    ├── user.creation.spec.test.[ext]   # Specification tests
    └── user.creation.impl.test.[ext]   # Implementation tests
```
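Because the category is encoded in the file name, the "no implementation-only features" prohibition can be checked from a file listing alone. A sketch, assuming paths follow `[feature].spec.test.[ext]` / `[feature].impl.test.[ext]` exactly; `classifyTestFile` and `featuresMissingSpecTests` are hypothetical helper names:

```typescript
// Hypothetical check for Rule 31's file naming: a feature with impl tests but
// no spec tests is flagged.
export type TestCategory = "spec" | "impl" | "other";

// "<any dirs>/<feature>.spec.test.<ext>" or "<any dirs>/<feature>.impl.test.<ext>"
const TEST_FILE = /^(?:.*\/)?(.+)\.(spec|impl)\.test\.[A-Za-z0-9]+$/;

export function classifyTestFile(path: string): { category: TestCategory; feature?: string } {
  const m = TEST_FILE.exec(path);
  if (m === null) return { category: "other" };
  return { category: m[2] === "spec" ? "spec" : "impl", feature: m[1] };
}

export function featuresMissingSpecTests(paths: string[]): string[] {
  const withSpec = new Set<string>();
  const withImpl = new Set<string>();
  for (const path of paths) {
    const { category, feature } = classifyTestFile(path);
    if (feature === undefined) continue; // not a spec/impl test file
    if (category === "spec") withSpec.add(feature);
    else withImpl.add(feature);
  }
  return Array.from(withImpl)
    .filter((feature) => !withSpec.has(feature))
    .sort();
}
```

Given `tests/auth/auth.login.spec.test.ts`, `tests/auth/auth.login.impl.test.ts`, and `tests/user/user.creation.impl.test.ts`, the check returns `["user.creation"]`. Keying on the file name alone assumes feature names are unique across directories.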

**Describe block labeling:** Within spec test files, use `describe('Specification: [Feature]', ...)` to make the test category unmistakable in test output.
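The label also makes mixed files detectable. A sketch under two assumptions: describe titles are plain string literals, and nested describe blocks inside a spec file need not repeat the prefix. A spec file must contain at least one `Specification:` block and an impl file must contain none; `checkDescribeLabels` is a hypothetical name:

```typescript
// Hypothetical label check: spec files need a describe('Specification: ...')
// block, and impl files must not contain one (that would mix the categories).
export function checkDescribeLabels(path: string, source: string): string[] {
  // Assumes describe titles are plain string literals.
  const describeTitle = /describe\(\s*(['"`])(.*?)\1/g;
  const specBlocks: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = describeTitle.exec(source)) !== null) {
    if (m[2].startsWith("Specification:")) specBlocks.push(m[2]);
  }

  const problems: string[] = [];
  if (/\.spec\.test\.[A-Za-z0-9]+$/.test(path) && specBlocks.length === 0) {
    problems.push(`${path}: no describe('Specification: ...') block`);
  }
  if (/\.impl\.test\.[A-Za-z0-9]+$/.test(path) && specBlocks.length > 0) {
    problems.push(`${path}: impl file contains spec block(s): ${specBlocks.join(", ")}`);
  }
  return problems;
}
```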

**🚫 PROHIBITED — The agent MUST NOT:**
- ❌ Write only implementation tests without specification tests
- ❌ Combine spec and impl tests in the same file
- ❌ Derive spec test expectations from running the code and observing output
- ❌ Modify spec test assertions to match a broken implementation
- ❌ Skip, disable, or weaken spec tests that fail after implementation
- ❌ Rationalize spec test failures as "the spec was wrong" without user approval

> **📖 See `testing.md` Rule 10 (Specification-First Testing Protocol)** for the full protocol including the red-phase verification, escalation procedures, and interaction with `make_plan`.

---

## 10. Security-First Development

docs/make_plan.md

Lines changed: 148 additions & 4 deletions
@@ -662,13 +662,60 @@ Choose based on estimated size — each document should be manageable within AI
- Integration tests: Key workflows covered
- E2E tests: Complete feature verification

## 🚨 Specification Test Cases (MANDATORY — NON-NEGOTIABLE)

> **These test cases are derived EXCLUSIVELY from requirements (`01-requirements.md`),
> component specifications (`03-XX-*.md`), API contracts, RFCs, and the Ambiguity Register
> (`00-ambiguity-register.md`). They define the expected behavior BEFORE any
> implementation exists.**
>
> **IMMUTABLE ORACLE RULE:** The agent MUST NOT modify these expectations to match the
> implementation. If the implementation does not match a spec test case, the implementation
> is wrong — not the test. See `testing.md` Rule 10 for the full protocol.
>
> **Every spec test case MUST include a source reference** tracing it to the requirement,
> spec document, or AR entry that defines the expected behavior.

### [Component/Feature 1]

| # | Input / Scenario | Expected Output / Behavior | Source |
|---|-----------------|---------------------------|--------|
| ST-1 | [Concrete input or action] | [Concrete expected output or behavior] | [Req X.X / AR #X / RFC §X] |
| ST-2 | [Concrete input or action] | [Concrete expected output or behavior] | [Req X.X / AR #X] |
| ST-3 | [Error/edge scenario] | [Expected error type and message] | [Req X.X / AR #X] |

### [Component/Feature 2]

| # | Input / Scenario | Expected Output / Behavior | Source |
|---|-----------------|---------------------------|--------|
| ST-4 | [Concrete input or action] | [Concrete expected output or behavior] | [Req X.X / AR #X] |
| ST-5 | [Concrete input or action] | [Concrete expected output or behavior] | [Req X.X / AR #X] |

> **⚠️ AUTHORING RULE:** When writing spec test cases, the plan author MUST derive
> expectations from the specification documents listed above. The author MUST NOT
> imagine or infer what the implementation will produce. If the expected output cannot
> be determined from the specification, this is an ambiguity: add it to the Ambiguity
> Register and resolve it with the user before defining the test case.
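Since the ST tables are plain Markdown, the "concrete input/output plus source" requirement can be checked before any test is written. A sketch, assuming rows shaped exactly like the template above and treating a cell that still holds a `[bracketed placeholder]` as not concrete; the accepted source formats (`Req`, `AR #`, `RFC`, and `03-NN-*.md` file names) are an assumed subset, and both function names are hypothetical:

```typescript
// Hypothetical validator for ST-case rows in 07-testing-strategy.md.
export interface StCase {
  id: string;
  input: string;
  expected: string;
  source: string;
}

// Parses "| ST-n | input | expected | source |"; returns null for other rows
// (header, separator, prose). Assumes cells contain no literal "|".
export function parseStRow(row: string): StCase | null {
  const cells = row.trim().split("|").slice(1, -1).map((cell) => cell.trim());
  if (cells.length !== 4 || !/^ST-\d+$/.test(cells[0])) return null;
  return { id: cells[0], input: cells[1], expected: cells[2], source: cells[3] };
}

// A cell still holding a template placeholder such as "[Concrete input or action]".
const PLACEHOLDER = /^\[.*\]$/;
// Assumed accepted source formats: "Req 2.1", "AR #3", "RFC 8414", "03-01-auth.md".
const SOURCE_REF = /\b(?:Req\s+\d+(?:\.\d+)*|AR\s+#\d+|RFC\s+\d+|03-\d+-[\w-]+\.md)/;

export function validateStCase(c: StCase): string[] {
  const issues: string[] = [];
  if (c.input === "" || PLACEHOLDER.test(c.input)) issues.push(`${c.id}: input is not concrete`);
  if (c.expected === "" || PLACEHOLDER.test(c.expected)) {
    issues.push(`${c.id}: expected output is not concrete`);
  }
  if (!SOURCE_REF.test(c.source)) issues.push(`${c.id}: no traceable source reference`);
  return issues;
}
```

Run over the template rows as-is, each ST-case reports three issues, which is the intended signal that the table was never filled in.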

## Test Categories

667-
### Unit Tests
### Specification Tests (from ST-cases above)

> Written BEFORE implementation. Filed as `[feature].spec.test.[ext]`.
> See `testing.md` Rule 10 and `code.md` Rule 31.

669-
| Test | Description | Priority |
670-
| ----------- | ---------------- | ------------ |
671-
| [Test name] | [What it tests] | High/Med/Low |
| Test File | ST Cases Covered | Component |
| --------- | ---------------- | --------- |
| `[feature].spec.test.[ext]` | ST-1, ST-2, ST-3 | [Component 1] |
| `[feature].spec.test.[ext]` | ST-4, ST-5 | [Component 2] |

### Implementation Tests (edge cases, internals)

> Written AFTER implementation. Filed as `[feature].impl.test.[ext]`.

| Test File | Description | Priority |
| --------- | ----------- | -------- |
| `[feature].impl.test.[ext]` | [Edge cases, boundary conditions, internal logic] | High/Med/Low |
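The "ST Cases Covered" column works as a coverage map from spec test files to ST-cases. A sketch of the cross-check, assuming that column has already been read into a `Record` from file name to ST ids; `checkStCoverage` is a hypothetical helper:

```typescript
// Hypothetical cross-check between defined ST-cases and the
// "ST Cases Covered" column of the spec test table.
export interface CoverageReport {
  uncovered: string[]; // defined ST-cases that no spec test file claims
  undefinedRefs: string[]; // ST ids claimed by a file but never defined
}

export function checkStCoverage(
  definedCases: string[],
  coverage: Record<string, string[]>, // spec test file -> ST ids it covers
): CoverageReport {
  const defined = new Set(definedCases);
  const covered = new Set<string>();
  for (const file of Object.keys(coverage)) {
    for (const id of coverage[file]) covered.add(id);
  }
  return {
    uncovered: definedCases.filter((id) => !covered.has(id)),
    undefinedRefs: Array.from(covered)
      .filter((id) => !defined.has(id))
      .sort(),
  };
}
```

With ST-1 through ST-5 defined and two files claiming `ST-1, ST-2, ST-3` and `ST-4, ST-6`, the report lists `ST-5` as uncovered and `ST-6` as an undefined reference.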

### Integration Tests

@@ -694,6 +741,12 @@ Choose based on estimated size — each document should be manageable within AI

## Verification Checklist

- [ ] All specification test cases (ST-*) defined with concrete input/output pairs
- [ ] Every ST case traces to a requirement, spec doc, or AR entry
- [ ] Specification tests written BEFORE implementation
- [ ] Specification tests verified to FAIL before implementation (red phase)
- [ ] All specification tests pass after implementation (green phase)
- [ ] Implementation tests written for edge cases and internals
- [ ] All unit tests pass
- [ ] All integration tests pass
- [ ] All E2E tests pass
@@ -729,6 +782,16 @@ Before finalizing plan documents, run this checklist:
- [ ] E2E tests planned
- [ ] Test coverage goals defined

**✅ Specification-First Testing (per `testing.md` Rule 10, `code.md` Rule 31) — 🚨 NON-NEGOTIABLE**
- [ ] `07-testing-strategy.md` contains the `🚨 Specification Test Cases` section with concrete ST-cases
- [ ] Every ST-case has concrete input → expected output pairs (not just test names/descriptions)
- [ ] Every ST-case traces to a requirement, spec document, RFC, or AR entry
- [ ] ST-case expectations are derived from specification documents, NOT from imagined implementation behavior
- [ ] `99-execution-plan.md` follows the three-phase task ordering: spec tests → implementation → impl tests
- [ ] Spec test tasks reference ST-cases from `07-testing-strategy.md`
- [ ] Spec test and impl test files use separate naming conventions (`*.spec.test.*` and `*.impl.test.*`)
- [ ] Red-phase verification task exists in execution plan (verify spec tests fail before implementation)

**✅ No Dead Code (per `code.md` rule 4)**
- [ ] No unused parameters (except interface contracts, overrides, and framework-required signatures)
- [ ] No unused functions, classes, or modules
@@ -858,6 +921,87 @@ For each task in order:
4. **Techdocs check (after each phase):** If `docs/index.md` exists with `techdocs: true` frontmatter and the just-completed phase introduced architectural changes (new components, data entities, API endpoints, integrations, or infrastructure), perform an incremental techdocs update (see `techdocs.md` Phase 6.1)
5. Continue until all tasks complete OR context window reaches 90%

> **🚨 SPECIFICATION-FIRST TASK ORDERING — NON-NEGOTIABLE 🚨**
>
> When executing implementation tasks for any feature, the agent MUST follow the three-phase task ordering defined below. This is enforced at the execution plan level — every generated `99-execution-plan.md` MUST structure feature phases in this order. See `testing.md` Rule 10 for the full Specification-First Testing Protocol.

---

## **🚨 CRITICAL: Specification-First Task Ordering in Execution Plans (NON-NEGOTIABLE) 🚨**

**Every feature implementation phase in `99-execution-plan.md` MUST follow this three-phase task structure.** This prevents tautological testing — where tests mirror the implementation instead of independently verifying it against the specification. See `testing.md` Rule 10 and `code.md` Rule 31.

### Mandatory Task Ordering Per Feature

```
Phase N: [Feature Name]

  Session N.1: Specification Tests (BEFORE implementation)
    N.1.1  Write specification tests from 07-testing-strategy.md ST-cases
           → File: [feature].spec.test.[ext]
           → Source: 07-testing-strategy.md ST-1 through ST-X
           → Agent MUST NOT read implementation logic when writing these tests
    N.1.2  Run spec tests — verify they FAIL (red phase)
           → Document any that pass pre-implementation with justification

  Session N.2: Implementation
    N.2.1  Implement [feature/component] per technical specification
           → File: [implementation files]
           → Reference: 03-XX-[component].md
    N.2.2  Run spec tests — verify they PASS (green phase)
           → If any spec test fails: STOP, fix implementation (NOT the test)

  Session N.3: Implementation Tests & Hardening
    N.3.1  Write implementation tests (edge cases, internals, error paths)
           → File: [feature].impl.test.[ext]
    N.3.2  Full verification (project's verify command)
```
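Steps N.1.2 and N.2.2 are gates over the same spec test results. A sketch, assuming results have been normalized into hypothetical `{ id, passed }` records (real test runners report in their own formats, so some adapter is needed). Both gates refuse an empty run, since zero spec tests can neither fail in the red phase nor demonstrate anything in the green phase:

```typescript
// Hypothetical red/green gates over normalized spec test results.
export interface TestResult {
  id: string; // e.g. "ST-1"
  passed: boolean;
}

// Red phase (N.1.2, before implementation): spec tests that already pass are
// suspects; each needs a documented justification before implementation starts.
export function redPhaseSuspects(results: TestResult[]): string[] {
  if (results.length === 0) throw new Error("no spec test results: the red phase would be vacuous");
  return results.filter((r) => r.passed).map((r) => r.id);
}

// Green phase (N.2.2, after implementation): any failure means STOP and fix the
// implementation, never the test. An empty run is not a pass.
export function greenPhaseGate(results: TestResult[]): { ok: boolean; failing: string[] } {
  const failing = results.filter((r) => !r.passed).map((r) => r.id);
  return { ok: results.length > 0 && failing.length === 0, failing };
}
```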

### Why This Ordering Is Non-Negotiable

| Step | Why It Matters |
|------|-----------------|
| **Spec tests BEFORE implementation** | Prevents agent from deriving test expectations from the code it just wrote |
| **Red phase verification** | Confirms the spec tests can fail, i.e., they exercise behavior that does not exist yet instead of passing vacuously |
| **Spec tests PASS after implementation** | Shows the implementation satisfies the specification as captured by the ST-cases |
| **Impl tests AFTER implementation** | These tests CAN be derived from the code (edge cases, internals) — but spec tests cannot |

### Enforcement Rules

**🚫 PROHIBITED — The agent MUST NOT:**

- ❌ Write implementation code before specification tests exist for that feature
- ❌ Skip the spec test phase ("we'll write tests after")
- ❌ Combine spec tests and implementation in the same task
- ❌ Write spec tests and implementation simultaneously
- ❌ Generate an execution plan where implementation tasks come before spec test tasks for the same feature

**✅ REQUIRED — Every generated `99-execution-plan.md` MUST:**

- ✅ Structure each feature phase with the three-session ordering above
- ✅ Include explicit spec test file references (`[feature].spec.test.[ext]`)
- ✅ Include explicit impl test file references (`[feature].impl.test.[ext]`)
- ✅ Reference the ST-cases from `07-testing-strategy.md` in spec test tasks
- ✅ Include red-phase verification as a distinct task

### Adaptation for Small Features

For small features where three separate sessions would be excessive, the agent MAY compress into a single session — but the **task ordering is still mandatory**:

```
Session N.1: [Feature Name]
  N.1.1  Write specification tests (from ST-cases)
  N.1.2  Verify spec tests fail (red phase)
  N.1.3  Implement feature
  N.1.4  Verify spec tests pass (green phase)
  N.1.5  Write implementation tests
  N.1.6  Full verification
```

The order `spec tests → red phase → implement → green phase → impl tests → verify` is NEVER negotiable, regardless of feature size.
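Because the order is fixed, a generated `99-execution-plan.md` can be checked mechanically, one feature phase at a time. A sketch, assuming each task has already been tagged with one of six hypothetical kinds (the tagging step is not shown): task kinds must never step backwards in the mandated sequence, and every kind must appear at least once:

```typescript
// Hypothetical ordering check for one feature phase of an execution plan.
export type TaskKind = "spec-tests" | "red-phase" | "implement" | "green-phase" | "impl-tests" | "verify";

// The mandated sequence: spec tests → red phase → implement → green phase → impl tests → verify.
const ORDER: TaskKind[] = ["spec-tests", "red-phase", "implement", "green-phase", "impl-tests", "verify"];

export interface PlanTask {
  id: string; // e.g. "N.1.1"
  kind: TaskKind;
}

export function checkTaskOrdering(tasks: PlanTask[]): string[] {
  const problems: string[] = [];
  let highest = -1; // rank of the latest stage reached so far
  let highestId = "";
  for (const t of tasks) {
    const rank = ORDER.indexOf(t.kind);
    if (rank < highest) {
      problems.push(`${t.id} (${t.kind}) comes after ${highestId} (${ORDER[highest]})`);
    } else {
      highest = rank;
      highestId = t.id;
    }
  }
  for (const kind of ORDER) {
    if (!tasks.some((t) => t.kind === kind)) problems.push(`missing ${kind} task`);
  }
  return problems;
}
```

This accepts both the three-session layout and the compressed single-session layout, since both produce the same kind sequence. A plan that lists `implement` before `spec-tests` is reported at the first out-of-order task.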

---

#### Step 3: Session Wrap-Up

1. ✅ Complete current task before stopping

docs/preflight.md

Lines changed: 27 additions & 0 deletions
@@ -524,6 +524,33 @@ For any finding in dimensions 2, 4, 5, 6, 11, or 13 — and for any finding that

Codebase reconnaissance (Step 2) should be thorough but proportional. For a plan that modifies 3 files, read those 3 files deeply plus their direct dependents. For a requirements document about a new subsystem, understand the overall architecture and the integration points. Do NOT attempt to read the entire codebase for a small, scoped artifact — that wastes context window. Focus on the code that the artifact actually touches or depends on.

### Rule 10: Same-Agent Bias Awareness — 🚨 NON-NEGOTIABLE

**The agent performing preflight MUST explicitly acknowledge and counteract the risk of same-agent bias.** When the same AI model created the artifact and reviews it, systematic blind spots are likely: the agent shares the same training biases, the same knowledge gaps, and the same reasoning patterns. A bug the agent missed during creation is exactly the kind of bug it is likely to miss again during review.

**Structural safeguards:**

1. **Fresh context preferred (disclosure required)** — If the agent created the artifact in the CURRENT session, it MUST note this at the top of the preflight report:
   ```
   ⚠️ SAME-SESSION REVIEW: This artifact was created in the current session.
   Same-agent bias risk is elevated. Consider running preflight in a new session
   for maximum review independence.
   ```

2. **Standard-first checking** — For any behavior that must conform to an external standard (RFC, protocol, specification, regulation), the agent MUST verify conformance by **citing the specific standard text**, not by reasoning from memory. If the agent cannot cite the standard, it MUST flag this as a limitation:
   ```
   ⚠️ Unable to verify conformance with [standard] — agent does not have
   access to the full standard text. Flag for human review.
   ```

3. **Adversarial question checklist** — Before concluding the 13-dimension scan, the agent MUST ask itself:
   - "What assumption did I make during creation that I might be unconsciously confirming now?"
   - "What external standard or convention might this violate that I'm not aware of?"
   - "What would a domain expert who disagrees with my approach flag as wrong?"

   If any of these questions surface concerns, add them as 🔵 OBSERVATION findings.

4. **User recommendation** — If the artifact is high-stakes (security-related, compliance-related, or architecturally foundational), the agent SHOULD recommend: *"Consider having a human domain expert review this artifact in addition to the automated preflight."*
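Safeguards 1, 2, and 4 are mechanical enough to template. A sketch, assuming a hypothetical `PreflightContext` whose flags the agent fills in; the warning texts are copied from the rule, and the function only assembles the report preamble. It does not perform the review, and it cannot automate the adversarial questions in safeguard 3:

```typescript
// Hypothetical preamble builder for a preflight report (Rule 10 safeguards 1, 2, 4).
export interface PreflightContext {
  createdInCurrentSession: boolean;
  // Standards the artifact must conform to -> whether the agent can cite the full text.
  standards: Record<string, boolean>;
  highStakes: boolean; // security, compliance, or architecturally foundational
}

export function preflightPreamble(ctx: PreflightContext): string[] {
  const lines: string[] = [];
  if (ctx.createdInCurrentSession) {
    lines.push(
      "⚠️ SAME-SESSION REVIEW: This artifact was created in the current session. " +
        "Same-agent bias risk is elevated. Consider running preflight in a new session " +
        "for maximum review independence.",
    );
  }
  for (const standard of Object.keys(ctx.standards)) {
    if (!ctx.standards[standard]) {
      lines.push(
        `⚠️ Unable to verify conformance with ${standard} — agent does not have ` +
          "access to the full standard text. Flag for human review.",
      );
    }
  }
  if (ctx.highStakes) {
    lines.push("Consider having a human domain expert review this artifact in addition to the automated preflight.");
  }
  return lines;
}
```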

---

## **Cross-References**

docs/requirements.md

Lines changed: 38 additions & 0 deletions
@@ -638,6 +638,44 @@ When writing each RD:
- **Complexity Estimates**: Tag each requirement section with estimated complexity (S/M/L/XL) to aid planning
- **Non-Functional RD**: Always create one dedicated RD for non-functional requirements (performance targets, security, scalability, accessibility, availability, backup/recovery). Users frequently forget these.

### 3.4B 🚨 Acceptance Criteria Specificity — NON-NEGOTIABLE

**Acceptance criteria MUST be specific enough that a developer who has never spoken to the user can write a correct test from the criterion alone.** This rule prevents the acceptance criteria tautology — where the agent writes vague criteria, then later writes tests that interpret the criteria however the implementation happens to work, creating a self-validating loop.

**Every acceptance criterion MUST meet ALL of these requirements:**

1. **Measurable outcome** — States a concrete, observable result (not "works correctly" or "handles errors properly")
2. **Specific values** — Includes exact numbers, formats, status codes, or field names where applicable
3. **Standard references** — When the behavior must conform to a standard (RFC, protocol, specification), the criterion MUST cite the specific standard and section (e.g., "per RFC 8414 §2" not "follows the OIDC spec")
4. **Boundary conditions** — States what happens at the edges (empty input, maximum length, zero items, expired tokens)
5. **Negative cases** — States what should NOT happen or what should be rejected

**Examples:**

```
❌ BAD: "The API returns a valid OIDC discovery document"
✅ GOOD: "GET /.well-known/openid-configuration returns a JSON document where
         the 'issuer' field exactly matches the Issuer URL used as the prefix of
         /.well-known/openid-configuration (per OpenID Connect Discovery 1.0 §4.3),
         and includes all REQUIRED fields (per OpenID Connect Discovery 1.0 §3):
         issuer, authorization_endpoint, token_endpoint, jwks_uri,
         response_types_supported, subject_types_supported,
         id_token_signing_alg_values_supported"

❌ BAD: "Users can reset their password"
✅ GOOD: "POST /auth/reset-password with a valid email returns 202 Accepted,
         sends an email with a one-time reset link that expires after 60 minutes,
         and the link cannot be reused after the password is changed"

❌ BAD: "The system handles invalid input gracefully"
✅ GOOD: "POST /api/users with a missing 'email' field returns 400 with
         { error: 'VALIDATION_ERROR', details: [{ field: 'email', message: '...' }] }.
         POST /api/users with an email longer than 254 characters returns 400."
```
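Two of the five requirements (measurable outcome and specific values) can be screened heuristically before review. A rough sketch, assuming English criteria and a hand-picked, non-exhaustive list of vague phrases; `lintCriterion` is a hypothetical name. It cannot judge boundary or negative cases, so it supplements the review rather than replacing it:

```typescript
// Hypothetical heuristic lint for acceptance criteria (requirements 1 and 2 only).
const VAGUE_PHRASES = [
  "works correctly",
  "handles errors properly",
  "gracefully",
  "as expected",
  "user-friendly",
  "appropriate",
];

export function lintCriterion(criterion: string): string[] {
  const text = criterion.toLowerCase();
  const issues: string[] = [];
  for (const phrase of VAGUE_PHRASES) {
    if (text.indexOf(phrase) >= 0) issues.push(`vague phrase: "${phrase}"`);
  }
  // A concrete criterion usually pins at least one value:
  // a number, a quoted literal, or a backticked identifier.
  if (!/\d|'[^']+'|"[^"]+"|`[^`]+`/.test(criterion)) {
    issues.push("no specific value (number, status code, field name, or literal)");
  }
  return issues;
}
```

Applied to the examples above, the third BAD criterion gets two findings (a vague phrase and no specific value), the second BAD criterion gets one, and the third GOOD criterion gets none.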

**If the user provides vague acceptance criteria** during review (Step 3.5), the agent MUST ask for specifics: *"This criterion says 'handles errors properly' — what specific error conditions should be handled, and what should the response look like for each?"*

**Traceability to tests:** When `make_plan` later derives test cases from these criteria, each spec test expectation MUST map directly to a specific acceptance criterion. If a criterion is too vague to produce a concrete test assertion, the criterion is defective — not the test.
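As an illustration of that mapping, the third GOOD criterion above translates directly into spec cases whose expectations quote the criterion. A sketch, assuming a hypothetical `validateCreateUser(body)` that returns `{ status, body }`; a stand-in implementation is included only so the block runs, and in practice it would live in a separate file the spec author does not read. The detail `message` is left unchecked because the criterion elides it (`'...'`):

```typescript
// Spec cases derived only from the third GOOD criterion, each with its source.
// validateCreateUser below is a hypothetical stand-in so the sketch runs.
export interface ValidationDetail {
  field: string;
  message: string;
}

export interface ApiResponse {
  status: number;
  body?: { error: string; details: ValidationDetail[] };
}

// Stand-in implementation; in a real project it lives in its own file.
export function validateCreateUser(body: { email?: string }): ApiResponse {
  const details: ValidationDetail[] = [];
  if (body.email === undefined) details.push({ field: "email", message: "email is required" });
  else if (body.email.length > 254) details.push({ field: "email", message: "email is too long" });
  // The success status is not specified by the criterion; 201 is an assumption.
  return details.length > 0 ? { status: 400, body: { error: "VALIDATION_ERROR", details } } : { status: 201 };
}

export interface SpecCase {
  id: string;
  source: string; // traceability to the acceptance criterion
  input: { email?: string };
  check: (res: ApiResponse) => boolean;
}

export const SPEC_CASES: SpecCase[] = [
  {
    id: "ST-1",
    source: "AC: missing 'email' field returns 400 VALIDATION_ERROR with an email detail",
    input: {},
    check: (r) =>
      r.status === 400 &&
      r.body !== undefined &&
      r.body.error === "VALIDATION_ERROR" &&
      r.body.details.some((d) => d.field === "email"),
  },
  {
    id: "ST-2",
    source: "AC: email longer than 254 characters returns 400",
    input: { email: "a".repeat(243) + "@example.com" }, // 255 characters
    check: (r) => r.status === 400,
  },
];

// Returns a message per failing case; an empty array means every listed case passed.
export function runSpecCases(cases: SpecCase[]): string[] {
  return cases
    .filter((c) => !c.check(validateCreateUser(c.input)))
    .map((c) => `${c.id} failed (${c.source})`);
}
```

The criterion is silent on what a 254-character email returns, so no case asserts it; under requirement 4 (boundary conditions), that gap would be raised with the user instead of guessed.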

### 3.5 Authoring Workflow

Write RDs one at a time, presenting each to the user for review:
