| 1 | +# Test Authoring Guide |
| 2 | + |
| 3 | +This is the enforcement-strength standard for all new ClawZero tests. |
| 4 | + |
| 5 | +## Non-Negotiable Standard |
| 6 | + |
| 7 | +Every test must enforce behavior, not just execution. |
| 8 | + |
| 9 | +- No weak assertions such as bare `is not None` checks, existence-only checks, or broad `in` checks when the exact behavior is known. |
| 10 | +- No tolerance paths that silently accept both `allow` and `block` unless the contract explicitly allows both and each branch has strict assertions. |
| 11 | +- No assumptions about engine behavior. Assertions must match documented runtime contracts and current policy semantics. |
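To make the contrast concrete, here is a minimal sketch of a weak test next to a strict one. `Decision`, `evaluate`, the sink names, and the reason codes are hypothetical stand-ins invented for this example; they are not the ClawZero API or its real codes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """Hypothetical stand-in for an engine decision record."""
    decision: str
    sink_type: str
    reason_code: str


def evaluate(sink_type: str, tainted: bool) -> Decision:
    """Toy policy for illustration only: tainted shell calls are blocked."""
    if sink_type == "shell.exec" and tainted:
        return Decision("block", sink_type, "TAINTED_SHELL")
    return Decision("allow", sink_type, "POLICY_ALLOW")


def test_weak_would_pass_for_any_outcome():
    # Anti-pattern: both checks hold whether the engine blocks or allows.
    result = evaluate("shell.exec", tainted=True)
    assert result is not None
    assert result.decision in ("allow", "block")


def test_strict_pins_the_contract():
    # Preferred: every field the contract fixes is compared exactly.
    result = evaluate("shell.exec", tainted=True)
    assert result == Decision("block", "shell.exec", "TAINTED_SHELL")
```

The weak test keeps passing if the policy regresses to `allow`; the strict test fails immediately, which is the property this standard asks for.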
| 12 | + |
| 13 | +## Required Assertions by Path |
| 14 | + |
| 15 | +### Block Path |
| 16 | + |
| 17 | +Use `pytest.raises(ExecutionBlocked)` and assert: |
| 18 | + |
| 19 | +- `decision.decision == "block"` |
| 20 | +- `decision.sink_type == expected_sink` |
| 21 | +- `decision.reason_code == expected_reason` (or membership in a documented bounded set, only where the contract requires it) |
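A block-path test could look roughly like the sketch below. It assumes the raised `ExecutionBlocked` exposes the decision on a `decision` attribute; that attribute, the local `Decision` and `ExecutionBlocked` stand-ins, `run_tool`, and the sink and reason values are all assumptions made for illustration, so adapt them to the real runtime contract.

```python
from dataclasses import dataclass

import pytest


@dataclass(frozen=True)
class Decision:
    """Hypothetical decision record with the three asserted fields."""
    decision: str
    sink_type: str
    reason_code: str


class ExecutionBlocked(Exception):
    """Local stand-in; assumed to carry the decision on `.decision`."""

    def __init__(self, decision: Decision):
        super().__init__(decision.reason_code)
        self.decision = decision


def run_tool(sink_type: str, tainted: bool) -> str:
    """Toy protected call: tainted input is blocked, clean input runs."""
    if tainted:
        raise ExecutionBlocked(Decision("block", sink_type, "TAINTED_INPUT"))
    return "ok"


def test_tainted_shell_call_is_blocked():
    with pytest.raises(ExecutionBlocked) as exc_info:
        run_tool("shell.exec", tainted=True)

    # Inspect the captured exception; raising alone is not enough.
    decision = exc_info.value.decision
    assert decision.decision == "block"
    assert decision.sink_type == "shell.exec"
    assert decision.reason_code == "TAINTED_INPUT"
```

The assertions sit outside the `with` block on purpose: code placed after the raising call inside the block never runs.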
| 22 | + |
| 23 | +### Allow / Annotate Path |
| 24 | + |
| 25 | +Assert all of: |
| 26 | + |
| 27 | +- Returned result semantics (exact expected payload/shape when deterministic) |
| 28 | +- Witness sink, decision, and reason code |
| 29 | +- Provenance contract fields when applicable (`taint_level`, markers, source chain) |
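As a sketch of the three groups above, the test below checks the result, the witness, and the provenance in turn. `run_allowed_tool`, the `(result, witness)` return shape, the witness key names, and every literal value are hypothetical and exist only to make the example runnable.

```python
from typing import Any


def run_allowed_tool(path: str) -> tuple[dict[str, Any], dict[str, Any]]:
    """Hypothetical wrapper returning (result, witness) for an allowed call."""
    result = {"path": path, "bytes_read": 12}
    witness = {
        "request_id": "req-001",
        "decision": "allow",
        "sink_type": "filesystem.read",
        "reason_code": "POLICY_ALLOW",
        "provenance": {
            "taint_level": "trusted",
            "markers": [],
            "source_chain": ["user_request"],
        },
    }
    return result, witness


def test_allow_path_checks_result_witness_and_provenance():
    result, witness = run_allowed_tool("/tmp/notes.txt")

    # 1. Result semantics: the payload is deterministic, so compare exactly.
    assert result == {"path": "/tmp/notes.txt", "bytes_read": 12}

    # 2. Witness sink, decision, and reason code.
    assert witness["decision"] == "allow"
    assert witness["sink_type"] == "filesystem.read"
    assert witness["reason_code"] == "POLICY_ALLOW"

    # 3. Provenance contract fields.
    assert witness["provenance"] == {
        "taint_level": "trusted",
        "markers": [],
        "source_chain": ["user_request"],
    }
```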
| 30 | + |
| 31 | +## Session / Chain Tests |
| 32 | + |
| 33 | +For multi-step/session tests, assert: |
| 34 | + |
| 35 | +- Chain detections include expected pattern(s) |
| 36 | +- Detection evidence references real request IDs from the executed chain |
| 37 | +- Threshold-sensitive behavior is validated against profile thresholds |
| 38 | +- Session report counts and persisted log contents match executed calls |
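The following sketch shows how those four checks might fit together. `ToySession`, its `read_threshold`, the `read_then_send` pattern name, and the in-memory log are invented for this example; a real test would read the threshold from the active profile and inspect the log the session actually persisted.

```python
from dataclasses import dataclass, field


@dataclass
class ToySession:
    """Hypothetical session recorder; not the ClawZero session API."""
    read_threshold: int = 2  # assumed profile threshold
    log: list = field(default_factory=list)

    def call(self, sink_type: str) -> str:
        request_id = f"req-{len(self.log) + 1}"
        self.log.append({"request_id": request_id, "sink_type": sink_type})
        return request_id

    def detections(self) -> list:
        # Flag enough reads followed by an outbound request.
        reads = []
        for entry in self.log:
            if entry["sink_type"] == "filesystem.read":
                reads.append(entry["request_id"])
            elif (entry["sink_type"] == "http.request"
                  and len(reads) >= self.read_threshold):
                evidence = reads + [entry["request_id"]]
                return [{"pattern": "read_then_send", "evidence": evidence}]
        return []

    def report(self) -> dict:
        return {"total_calls": len(self.log)}


def test_chain_detection_is_tied_to_executed_requests():
    session = ToySession(read_threshold=2)
    ids = [
        session.call("filesystem.read"),
        session.call("filesystem.read"),
        session.call("http.request"),
    ]

    detections = session.detections()
    assert [d["pattern"] for d in detections] == ["read_then_send"]
    # Evidence must name the request IDs this test actually produced.
    assert detections[0]["evidence"] == ids
    # Report counts and log contents must match the executed calls.
    assert session.report() == {"total_calls": 3}
    assert [e["request_id"] for e in session.log] == ids


def test_below_threshold_yields_no_detection():
    session = ToySession(read_threshold=2)
    session.call("filesystem.read")
    session.call("http.request")
    assert session.detections() == []
```

Pairing an at-threshold case with a below-threshold case is what makes the threshold itself part of the tested contract.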
| 39 | + |
| 40 | +## Witness Assertions |
| 41 | + |
| 42 | +When a test depends on witness artifacts, assert: |
| 43 | + |
| 44 | +- witness exists and is a dict |
| 45 | +- witness request linkage (`request_id`) |
| 46 | +- decision/sink/reason match expected enforcement outcome |
| 47 | +- provenance fields are validated against engine normalization rules |
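One way to keep these checks consistent across tests is a small shared helper, sketched below. The helper name and the witness key names (`decision`, `sink_type`, `reason_code`) are assumptions; only `request_id` is named in this guide. Provenance is deliberately left out because its expected values depend on the engine's normalization rules.

```python
def assert_witness(witness, *, request_id, decision, sink_type, reason_code):
    """Hypothetical helper: strict checks on a witness artifact."""
    # The witness must exist and be a dict before any field is read.
    assert isinstance(witness, dict), (
        f"expected dict witness, got {type(witness).__name__}"
    )
    # Linkage back to the request that produced it.
    assert witness["request_id"] == request_id
    # Enforcement outcome, compared exactly.
    assert witness["decision"] == decision
    assert witness["sink_type"] == sink_type
    assert witness["reason_code"] == reason_code


def test_witness_matches_blocked_request():
    # Placeholder witness; a real test would obtain it from the runtime.
    witness = {
        "request_id": "req-42",
        "decision": "block",
        "sink_type": "shell.exec",
        "reason_code": "TAINTED_INPUT",
    }
    assert_witness(
        witness,
        request_id="req-42",
        decision="block",
        sink_type="shell.exec",
        reason_code="TAINTED_INPUT",
    )
```

Keyword-only arguments force each call site to spell out the expected outcome, so a test cannot pass by omitting a field.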
| 48 | + |
| 49 | +## Generated Test Files |
| 50 | + |
| 51 | +Generated suites are held to the same bar as handwritten suites. |
| 52 | + |
| 53 | +- No exception-assertion no-ops |
| 54 | +- No count-inflation-only assertions |
| 55 | +- Same strict enforcement/result/witness/session checks as non-generated tests |
| 56 | + |
| 57 | +## Review Checklist (PR Gate) |
| 58 | + |
| 59 | +Before merge, reviewers should verify: |
| 60 | + |
| 61 | +- Weak assertion patterns are absent |
| 62 | +- Enforcement path(s) are explicitly required by assertions |
| 63 | +- Reason codes and sinks are validated, not implied |
| 64 | +- Tests are grounded in current contracts, not aspirational behavior |
| 65 | +- Local targeted run and CI are both green |
| 66 | + |
| 67 | +## Useful Contract Anchors |
| 68 | + |
| 69 | +- `tests/policy_matrix_data.py` |
| 70 | +- `tests/test_policy_matrix_generated.py` |
| 71 | +- `tests/runtime/test_engine_parity_contract.py` |
| 72 | +- `tests/test_witness_integrity_matrix.py` |