
Commit 0b8d6a9

Strengthen testing methodology requirements
1 parent 3acf7ba commit 0b8d6a9


3 files changed, 52 additions & 4 deletions


docs/templates/AGENTS.md

Lines changed: 22 additions & 3 deletions
@@ -173,10 +173,20 @@ Local `AGENTS.md` files may tighten these values, but they must not loosen them
 - task goal and scope
 - a detailed implementation plan with detailed ordered steps
 - constraints and risks
+- explicit test steps as part of the ordered plan, not as a later add-on
+- the test and verification strategy for each planned step
+- the testing methodology for the task: what flows will be tested, how they will be tested, and what quality bar the tests must meet
+- an explicit full-test baseline step after the plan is prepared
+- a tracked list of already failing tests, with one checklist item per failing test
+- root-cause notes and intended fix path for each failing test that must be addressed
 - a checklist with explicit done criteria for each step
 - ordered final validation skills and commands, with reason for each
 - Use the Ralph Loop for every non-trivial task:
 - plan in detail in `<slug>.plan.md` before coding or document edits
+- include test creation, test updates, and verification work in the ordered steps from the start
+- once the initial plan is ready, run the full relevant test suite to establish the real baseline
+- if tests are already failing, add each failing test back into `<slug>.plan.md` as a tracked item with its failure symptom, suspected cause, and fix status
+- work through failing tests one by one: reproduce, find the root cause, apply the fix, rerun, and update the plan file
 - include ordered final validation skills in the plan file, with reason for each skill
 - require each selected skill to produce a concrete action, artifact, or verification outcome
 - execute one planned step at a time
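The baseline-and-tracking steps in this hunk can be illustrated with a small sketch. It assumes the baseline run has already been reduced to a list of failing test names, each with a one-line symptom; the `FailingTest` record, the `plan_items` helper, and the example test names are hypothetical and not part of the template. Each failure becomes one checklist item that carries the symptom, suspected cause, and fix status the rule asks for.

```python
from dataclasses import dataclass


@dataclass
class FailingTest:
    """One test that already failed in the baseline run (hypothetical record shape)."""
    name: str
    symptom: str
    suspected_cause: str = "unknown"
    fix_status: str = "open"  # e.g. "open", "in progress", "fixed"


def plan_items(failures: list[FailingTest]) -> list[str]:
    """Render one Markdown checklist item per failing test for `<slug>.plan.md`."""
    items = []
    for f in failures:
        # Only a fixed test gets a ticked box; everything else stays open.
        box = "x" if f.fix_status == "fixed" else " "
        items.append(
            f"- [{box}] `{f.name}`: symptom: {f.symptom}; "
            f"suspected cause: {f.suspected_cause}; fix status: {f.fix_status}"
        )
    return items


if __name__ == "__main__":
    # Hypothetical baseline failures, for illustration only.
    baseline = [
        FailingTest("OrderApiTests.CreateRejectsEmptyCart", "HTTP 500 instead of 400"),
        FailingTest("InvoiceTests.TotalsRoundHalfEven", "off by 0.01",
                    suspected_cause="rounding mode", fix_status="fixed"),
    ]
    print("\n".join(plan_items(baseline)))
```

Regenerating the list after each rerun keeps the plan file in step with the "reproduce, fix, rerun, update" loop.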
@@ -190,6 +200,7 @@ Local `AGENTS.md` files may tighten these values, but they must not loosen them
 - broader required regressions
 - If `build` is separate from `test`, run `build` before `test`.
 - After tests pass, run `format`, then the final required verification commands.
+- The task is complete only when every planned checklist item is done and all relevant tests are green.
 - Summarize the change, risks, and verification before marking the task complete.
 
 ### Documentation
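The ordering rule in this hunk (build before test when they are separate, `format` only after tests pass, then the final verification commands) can be expressed as a short pipeline. This is a sketch under assumptions: the step names and the injected `run` callable are hypothetical stand-ins for the repo-defined commands in `AGENTS.md`, and a real runner would invoke those commands instead.

```python
from typing import Callable


def verification_order(build_is_separate: bool, final_commands: list[str]) -> list[str]:
    """Build before test (only when build is a separate step), format after tests,
    then the final required verification commands in the given order."""
    steps = ["build"] if build_is_separate else []
    steps += ["test", "format"]
    steps += final_commands
    return steps


def run_in_order(steps: list[str], run: Callable[[str], bool]) -> list[str]:
    """Run steps one at a time and stop at the first failure.

    Returns the steps that passed, so a failing `test` step means `format`
    and the final commands never run.
    """
    passed = []
    for step in steps:
        if not run(step):
            break
        passed.append(step)
    return passed
```

Stopping at the first failure mirrors "after tests pass, run `format`": formatting and final gates never run on a red suite.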
@@ -204,6 +215,11 @@ Local `AGENTS.md` files may tighten these values, but they must not loosen them
 - Public bootstrap templates are limited to root-level agent files. Authoring scaffolds for architecture, features, ADRs, and other workflows live in skills.
 - Update feature docs when behaviour changes.
 - Update ADRs when architecture, boundaries, or standards change.
+- For non-trivial work, the plan file, feature doc, or ADR MUST document the testing methodology:
+- what flows are covered
+- how they are tested
+- which commands prove them
+- what quality and coverage requirements must hold
 - Every feature doc under `docs/Features/` MUST contain at least one Mermaid diagram for the main behaviour or flow.
 - Every ADR under `docs/ADR/` MUST contain at least one Mermaid diagram for the decision, boundaries, or interactions.
 - Mermaid diagrams are mandatory in architecture docs, feature docs, and ADRs.
@@ -213,16 +229,19 @@ Local `AGENTS.md` files may tighten these values, but they must not loosen them
 
 - TDD is the default for new behaviour and bug fixes: write the failing test first, make it pass, then refactor.
 - Bug fixes start with a failing regression test that reproduces the issue.
-- Every behaviour change needs automated tests with meaningful assertions.
-- Tests must prove the user flow or caller-visible system flow, including the happy path and the most important failure or edge path.
+- Every behaviour change needs new or updated automated tests with meaningful assertions. New tests are mandatory for new behaviour and bug fixes.
+- Tests must prove the real user flow or caller-visible system flow, not only internal implementation details.
+- Tests should be as realistic as possible and exercise the system through real flows, contracts, and dependencies.
+- Tests must cover positive flows, negative flows, edge cases, and unexpected paths from multiple relevant angles when the behaviour can fail in different ways.
 - Prefer integration/API/UI tests over isolated unit tests when behaviour crosses boundaries.
 - Do not use mocks, fakes, stubs, or service doubles in verification.
 - Exercise internal and external dependencies through real containers, test instances, or sandbox environments that match the real contract.
 - Flaky tests are failures. Fix the cause.
 - Changed production code MUST reach at least 80% line coverage, and at least 70% branch coverage where branch coverage is available.
 - Critical flows and public contracts MUST reach at least 90% line coverage with explicit success and failure assertions.
-- Repository or module coverage must not decrease without an explicit written exception.
+- Repository or module coverage must not decrease without an explicit written exception. Coverage after the change must stay at least at the previous baseline or improve.
 - Coverage is for finding gaps, not gaming a number. Coverage numbers do not replace scenario coverage or user-flow verification.
+- The task is not done until the full relevant test suite is green, not only the newly added tests.
 - If the stack is `.NET`, document the active framework and runner model explicitly so agents do not mix VSTest and Microsoft.Testing.Platform assumptions.
 - If the stack is `.NET`, after changing production code run the repo-defined quality pass: format, build, analyze, focused tests, broader tests, coverage, and any configured extra gates such as architecture, security, or mutation checks.
 
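The numeric coverage minimums in this hunk reduce to a simple gate. In this sketch the percentages are assumed to come from whatever coverage tool the repo already uses, already filtered to changed production code and to critical flows; `Coverage` and `coverage_violations` are hypothetical names. Branch coverage is skipped when the tool does not report it, matching the "where branch coverage is available" wording. The gate checks numbers only: it cannot verify the "explicit success and failure assertions" requirement or scenario coverage.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Coverage:
    line: float                     # line coverage in percent (0-100)
    branch: Optional[float] = None  # None when the tool reports no branch data


def coverage_violations(changed: Coverage, critical: Optional[Coverage] = None) -> list[str]:
    """Check the minimums: 80% line and 70% branch for changed production code,
    90% line for critical flows and public contracts."""
    problems = []
    if changed.line < 80:
        problems.append(f"changed code line coverage {changed.line:.1f}% < 80%")
    if changed.branch is not None and changed.branch < 70:
        problems.append(f"changed code branch coverage {changed.branch:.1f}% < 70%")
    if critical is not None and critical.line < 90:
        problems.append(f"critical flow line coverage {critical.line:.1f}% < 90%")
    return problems
```

An empty list means the numeric gate passes; any message should block completion.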
skills/mcaf-adr-writing/references/adr-template.md

Lines changed: 14 additions & 0 deletions
@@ -141,6 +141,16 @@ This section is mandatory: describe how to prove the decision (tests + commands)
 - Data and reset strategy (seed data, migrations, rollback plan):
 - External dependencies (real / sandbox / test environment required):
 
+### Testing methodology
+
+- Core flows and invariants that MUST be proven:
+- Positive flows that MUST pass:
+- Negative / forbidden flows that MUST be rejected or fail safely:
+- Edge / boundary / unexpected flows that MUST be covered:
+- Required realism level (real dependencies, contracts, environments):
+- Coverage baseline requirement (must stay at least at the pre-change level or improve):
+- Pass criteria for considering the ADR implementation complete (all relevant tests green, new tests added, verification complete):
+
 ### Test commands
 
 - build: (paste from `AGENTS.md`)
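A filled-in ADR should leave none of the new Testing methodology fields blank. The sketch below assumes each field stays on a single `- Label: value` line, as in the template; `REQUIRED_PREFIXES`, `FIELD`, and `unfilled_fields` are hypothetical, and multi-line answers would need a richer parser.

```python
import re

# Label prefixes of the ADR "Testing methodology" fields, taken from the template.
REQUIRED_PREFIXES = [
    "Core flows and invariants",
    "Positive flows",
    "Negative / forbidden flows",
    "Edge / boundary / unexpected flows",
    "Required realism level",
    "Coverage baseline requirement",
    "Pass criteria",
]

# "- Label: value": the label is everything up to the first colon.
FIELD = re.compile(r"^- (.+?):\s*(.*)$")


def unfilled_fields(section: str) -> list[str]:
    """Return the required field prefixes that are missing or left empty."""
    values = {}
    for raw in section.splitlines():
        m = FIELD.match(raw.strip())
        if m:
            values[m.group(1)] = m.group(2).strip()
    missing = []
    for prefix in REQUIRED_PREFIXES:
        answer = next((v for label, v in values.items() if label.startswith(prefix)), "")
        if not answer:
            missing.append(prefix)
    return missing
```

Matching on label prefixes lets the long parenthetical hints in the template stay in place while authors fill in the answers.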
@@ -159,6 +169,7 @@ This section is mandatory: describe how to prove the decision (tests + commands)
 - Regression suites to run (must stay green):
 - Static analysis (tools/configs that must pass):
 - Monitoring during rollout (logs/metrics/alerts to watch):
+- Coverage comparison against baseline:
 
 ---
 
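For the new "Coverage comparison against baseline" entry, a per-module comparison is one way to show that coverage did not fall. This sketch assumes line-coverage percentages keyed by module name for the baseline run and the current run, plus a set of modules that have an explicit written exception; all names are hypothetical. A module missing from the current report is treated as 0% so that it is flagged rather than silently ignored.

```python
def coverage_regressions(
    baseline: dict[str, float],
    current: dict[str, float],
    exceptions: frozenset[str] = frozenset(),
) -> dict[str, tuple[float, float]]:
    """Return {module: (before, after)} for every module whose line coverage
    fell below its baseline and that has no explicit written exception."""
    regressions = {}
    for module, before in baseline.items():
        # Assumption: a module absent from the current report counts as 0%,
        # so it is flagged instead of silently dropped.
        after = current.get(module, 0.0)
        if after < before and module not in exceptions:
            regressions[module] = (before, after)
    return regressions
```

An empty result is what the ADR and feature checklists expect; each entry otherwise names the module and its before/after values for the written exception or the fix.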
@@ -184,4 +195,7 @@ This section is mandatory: describe how to prove the decision (tests + commands)
 - [ ] Status reflects real state (`Proposed`, `Accepted`, `Rejected`, `Superseded`).
 - [ ] Links to related features, tests, and ADRs are filled in.
 - [ ] Diagram section contains at least one Mermaid diagram.
+- [ ] Testing methodology is filled in with positive, negative, and edge flows plus pass criteria.
+- [ ] New or updated automated tests exist for the changed behaviour.
+- [ ] All relevant tests are green and coverage did not fall below baseline.
 - [ ] `docs/Architecture/Overview.md` updated if module boundaries or interactions changed.
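The checklist items are plain Markdown task-list entries, so the "done" condition (every item checked and all relevant tests green) can be checked mechanically. This is a sketch: `CHECKBOX`, `open_items`, and `is_done` are hypothetical helpers, and the regular expression only recognises `- [ ]`, `- [x]`, and `- [X]` items.

```python
import re

# Matches Markdown task-list items such as "- [ ] text" or "- [x] text".
CHECKBOX = re.compile(r"^\s*- \[([ xX])\] (.+)$")


def open_items(markdown: str) -> list[str]:
    """Return the text of every unchecked checklist item."""
    items = []
    for line in markdown.splitlines():
        m = CHECKBOX.match(line)
        if m and m.group(1) == " ":
            items.append(m.group(2))
    return items


def is_done(markdown: str, all_relevant_tests_green: bool) -> bool:
    """Done means no open checklist items and a green relevant test suite."""
    return all_relevant_tests_green and not open_items(markdown)
```

The same check applies to the plan file's checklist, since the task is complete only when every planned item is done and all relevant tests are green.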

skills/mcaf-feature-spec/references/feature-template.md

Lines changed: 16 additions & 1 deletion
@@ -111,6 +111,16 @@ This section is mandatory: describe how to test (scenarios + commands).
 - Data and reset strategy (seed data, fixtures, migration steps):
 - External dependencies (real / sandbox / test environment required):
 
+### Testing methodology
+
+- Main flows that MUST be proven end-to-end:
+- Positive flows that MUST pass:
+- Negative flows that MUST fail safely and predictably:
+- Edge / boundary / unexpected flows that MUST be covered:
+- Test realism requirements (real dependencies, contracts, environments):
+- Coverage baseline requirement (must stay at least at the pre-change level or improve):
+- Pass criteria for considering the task done (all relevant tests green, new tests added, verification complete):
+
 ### Test commands
 
 - build: (paste from `AGENTS.md`)
@@ -145,6 +155,7 @@ This section is mandatory: describe how to test (scenarios + commands).
 - UI / E2E tests:
 - Unit tests:
 - Static analysis:
+- Coverage comparison against baseline:
 
 ### Non-functional checks
 
@@ -161,8 +172,12 @@ Include this section only if it applies to this feature; otherwise remove it.
 - Behaviour matches rules and flows in this document.
 - Diagram section contains at least one Mermaid diagram that renders in the repo.
 - All test flows above are covered by automated tests (Integration / API / UI as applicable).
+- Testing methodology is written down and matches the implemented tests.
+- New or updated automated tests were added for the changed behaviour.
+- Positive, negative, and edge flows are all covered where applicable.
 - Static analysis passes with no new unresolved issues.
-- Test and build commands listed above run clean in local and CI environments.
+- Test and build commands listed above run clean in local and CI environments, and all relevant tests are green.
+- Coverage is at least at the pre-change baseline or better.
 - Documentation updated: this feature doc, related ADRs, Testing / API / Architecture docs, `AGENTS.md` if rules or patterns changed.
 - Feature flags / migrations rolled out or cleaned up.
 
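The new done criterion "Testing methodology is written down and matches the implemented tests" can be checked in both directions: declared flows with no test, and tested flows that the doc never mentions. The sketch assumes each automated test has already been mapped to the flow name it proves (for example through a naming convention or a test attribute); `methodology_drift` and that mapping are hypothetical.

```python
def methodology_drift(
    declared: dict[str, set[str]],
    test_flows: dict[str, str],
) -> tuple[dict[str, set[str]], set[str]]:
    """Compare documented flows with the flows the automated tests prove.

    declared:   category ("positive", "negative", "edge", ...) -> flow names from the doc
    test_flows: test name -> the flow name that test proves
    Returns (flows documented but untested, per category; flows tested but undocumented).
    """
    proven = set(test_flows.values())
    documented = set().union(*declared.values())
    untested = {cat: flows - proven for cat, flows in declared.items() if flows - proven}
    undocumented = proven - documented
    return untested, undocumented
```

Both results should be empty before the feature is marked done: the first exposes missing positive, negative, or edge tests, and the second exposes a methodology section that has fallen behind the code.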
