Commit 1e72e0c

docs: add generated test suite audit summary baseline
1 parent 3468d14 commit 1e72e0c

1 file changed

Lines changed: 131 additions & 0 deletions

docs/test-suite-audit-summary.md

@@ -0,0 +1,131 @@
# ClawZero Test Suite Audit Summary

Date: 2026-04-13

This document records the enforcement-strength baseline established by the generated-test audit pass across the following suites:

- `tests/attack_pack/test_attack_pack_expanded_generated.py`
- `tests/owasp/test_asi_2026_generated.py`
- `tests/test_policy_matrix_generated.py`
- `tests/compliance/test_eu_ai_act_generated.py`
- `tests/session/test_cross_session_isolation_generated.py`
- `tests/fuzzing/test_engine_fuzz_generated.py`

## What This Suite Guarantees

1. Deterministic decision contracts are asserted (not just decision class membership) for audited generated suites.
2. Witness artifacts are semantically validated (request linkage, decision/reason/sink/target/profile coherence), not just existence-checked.
3. Provenance normalization is tested explicitly (input-class and taint normalization behavior).
4. Coverage boundaries are explicit and machine-visible via intentional `pytest.skip(...)` gap markers.
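
The difference between an existence check and the semantic validation described in guarantee 2 can be sketched as follows. This is a minimal illustration, not the suites' actual helper: the `Request` and `Decision` classes, the witness field names, and `witness_problems` are all hypothetical stand-ins chosen to mirror the coherence dimensions listed above (request linkage, decision, reason, sink, target, profile).

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Request:  # hypothetical stand-in for the engine's request type
    request_id: str
    sink_type: str
    target: str


@dataclass(frozen=True)
class Decision:  # hypothetical stand-in for the engine's decision type
    decision: str
    reason_code: str
    effective_policy_profile: str


def witness_problems(request: Request, decision: Decision, witness: dict) -> list[str]:
    """Return every coherence problem found; an empty list means the witness is consistent."""
    # Assumed witness field names; each one must agree with the request or decision.
    expected = {
        "request_id": request.request_id,
        "decision": decision.decision,
        "reason_code": decision.reason_code,
        "sink_type": request.sink_type,
        "target": request.target,
        "policy_profile": decision.effective_policy_profile,
    }
    problems = []
    for field, want in expected.items():
        if field not in witness:
            problems.append(f"missing {field}")
        elif witness[field] != want:
            problems.append(f"{field}: expected {want!r}, got {witness[field]!r}")
    return problems
```

An existence check would pass any witness that is merely present. Under this sketch, a witness whose `reason_code` disagrees with the decision is reported even though the artifact exists.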

## Engine Contracts Discovered and Codified

1. `unknown` and `untrusted` both normalize to untrusted taint in witness/provenance contexts.
   Source evidence:
   - `tests/attack_pack/test_attack_pack_expanded_generated.py` (`_expected_witness_taint_level`)
   - `tests/owasp/test_asi_2026_generated.py` (`_expected_witness_taint_level`)
   - `tests/test_policy_matrix_generated.py` (`_expected_witness_taint_level`)
   - `tests/compliance/test_eu_ai_act_generated.py` (`_expected_witness_taint_level`)
   - `tests/fuzzing/test_engine_fuzz_generated.py` (`_expected_witness_taint_level`)

2. Input class resolution drives effective profile normalization.
   Behavior:
   - `dev_balanced` + normalized untrusted input => effective profile `dev_strict`.
   Source evidence:
   - `src/clawzero/runtime/engine.py` (`_resolve_input_class`, `_apply_input_class_overrides`, `_prepare_request`)
   - enforced in all audited generated suites via `effective_policy_profile` assertions.

3. Source-label invariance in policy matrix outcomes.
   Behavior:
   - For fixed `taint × sink × profile`, the source label is coverage metadata and does not change the decision contract.
   Source evidence:
   - `tests/test_policy_matrix_generated.py::test_policy_matrix_contract_source_dimension_is_explicitly_invariant`

4. Filesystem safety guards are layered and dominate permissive paths.
   Behavior:
   - Traversal/sensitive path patterns block via `PATH_BLOCKED` even where embedded policy might otherwise permit.
   Source evidence:
   - `src/clawzero/runtime/engine.py` (`_apply_filesystem_safety_guards`)
   - codified in `tests/fuzzing/test_engine_fuzz_generated.py` expected-decision logic.

5. Session taint precedence contract in `AgentSession`.
   Behavior:
   - `ActionDecision.trust_level` is read first by `AgentSession._taint_level`; forged session blobs only affect taint when `trust_level` is absent.
   Source evidence:
   - `src/clawzero/runtime/session.py` (`_taint_level`)
   - codified by contamination-path test in `tests/session/test_cross_session_isolation_generated.py`.

6. Session ownership is local and enforced in enriched annotations.
   Behavior:
   - Enriched decision `session_id` is always the active session; forged cross-session metadata does not rebind ownership.
   Source evidence:
   - `tests/session/test_cross_session_isolation_generated.py::test_cross_session_isolation_contamination_attempt_is_fail_closed`

7. EU/OWASP mapping in generated compliance suites is model-based, not full-framework coverage.
   Behavior:
   - Each modeled control/article maps to explicit sink + reason-code contracts with scope notes.
   Source evidence:
   - `tests/owasp/test_asi_2026_generated.py` (`ASI_CONTROL_MAPPING_CONTRACTS`, mapping completeness test)
   - `tests/compliance/test_eu_ai_act_generated.py` (`EUAI_CONTROL_MAPPING_CONTRACTS`, mapping completeness test)
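
Contracts 1, 2, 4, and 5 can be restated as small pure functions, which is roughly the shape an expected-value helper in a generated suite might take. The sketch below is illustrative only. The function bodies, the `SENSITIVE_MARKERS` patterns, the `taint_level` blob key, and the fallback behaviors marked as assumptions are not taken from `engine.py` or `session.py`; only the `unknown`/`untrusted` normalization, the `dev_balanced` to `dev_strict` rule, the `PATH_BLOCKED` reason code, and the `trust_level` precedence come from the contracts above.

```python
from __future__ import annotations

# Hypothetical traversal/sensitive patterns; the real guard's patterns are not listed here.
SENSITIVE_MARKERS = ("..", "/etc/passwd")


def expected_witness_taint_level(taint_level: str) -> str:
    """Contract 1: `unknown` and `untrusted` both map to `untrusted`."""
    return "untrusted" if taint_level in ("unknown", "untrusted") else taint_level


def expected_effective_profile(profile: str, taint_level: str) -> str:
    """Contract 2: `dev_balanced` plus normalized untrusted input => `dev_strict`."""
    if profile == "dev_balanced" and expected_witness_taint_level(taint_level) == "untrusted":
        return "dev_strict"
    return profile  # assumption: other combinations keep the requested profile


def expected_reason_code(target_path: str, policy_reason: str) -> str:
    """Contract 4: the path guard dominates whatever the policy would return."""
    if any(marker in target_path for marker in SENSITIVE_MARKERS):
        return "PATH_BLOCKED"
    return policy_reason


def session_taint_level(trust_level: str | None, session_blob: dict) -> str | None:
    """Contract 5: `trust_level` is read first; the session blob is only a fallback."""
    if trust_level is not None:
        return trust_level
    return session_blob.get("taint_level")  # assumed blob key
```

Keeping such helpers pure makes the expected decision for each generated case a deterministic function of its inputs, which is what allows the suites to assert exact contracts rather than decision-class membership.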

## Explicit Gap Markers Added (Intentional Boundaries)

1. Policy matrix allow-path boundary
   - Test: `tests/test_policy_matrix_generated.py::test_policy_matrix_gap_filesystem_read_allow_paths_not_covered`
   - Why: the file pins `filesystem.read` to the `/etc/passwd` block-path contract; allowlisted workspace read paths are intentionally out of scope for this matrix file.

2. ASI cross-category chaining boundary
   - Test: `tests/owasp/test_asi_2026_generated.py::test_asi_cross_category_taint_chain_coverage_gap_is_explicit`
   - Why: the suite validates per-control primary sink contracts only; cross-category taint-chain scenarios are intentionally not claimed here.

3. EU AI Act process-obligation boundary
   - Test: `tests/compliance/test_eu_ai_act_generated.py::test_eu_ai_act_gap_aug_2026_unmodeled_obligations_are_explicit`
   - Why: the suite models runtime sink enforcement, not process-heavy obligations (technical documentation evidence workflows, conformity/CE-marking workflows, broader incident/reporting process obligations).

4. Session isolation breach signaling boundary
   - Test: `tests/session/test_cross_session_isolation_generated.py::test_cross_session_isolation_gap_dedicated_breach_reason_code_not_implemented`
   - Why: the runtime currently fails closed via taint/escalation but does not emit a dedicated `ISOLATION_BREACH` reason code / alert channel.

5. Legacy-vs-extended fuzz dedup boundary
   - Test: `tests/fuzzing/test_engine_fuzz_generated.py::test_engine_fuzz_generated_gap_cross_suite_dedup_not_enforced`
   - Why: both fuzz suites are now strong, but formal dedup ownership boundaries are not yet machine-enforced.
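
A gap marker of the kind listed above is an ordinary test whose whole body is an intentional `pytest.skip(...)` call. The sketch below reuses the first test name from the list, but the skip message and the `GAP_PREFIX` convention are assumptions for illustration, not the text used in the repository.

```python
import pytest

GAP_PREFIX = "COVERAGE GAP"  # hypothetical message convention for gap markers


def test_policy_matrix_gap_filesystem_read_allow_paths_not_covered():
    """Intentional boundary: allowlisted workspace reads are out of scope for this file."""
    pytest.skip(
        f"{GAP_PREFIX}: filesystem.read is pinned to the /etc/passwd block-path "
        "contract; allowlisted workspace read paths are not covered here."
    )
```

Because the marker is collected like any other test, running `pytest -rs` lists it in the skip summary together with its reason, so the boundary stays visible in every test report.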

## Fuzz Suites: Overlap and Consolidation Recommendation

### Observed coverage relationship

- Legacy suite: `tests/fuzzing/test_engine_fuzz_generated.py`
  - 1000 cases
  - deterministic runtime contract assertions
  - 21 unique behavior signatures observed in audit analysis

- Extended suite: `tests/fuzzing/test_engine_fuzz_extended_generated.py`
  - 960 generic matrix cases + 48 targeted adversarial cases
  - targeted classes: prompt-injection boundaries, tool-chaining abuse, encoding policy escapes

- Relationship from audit analysis:
  - overlap exists but is partial
  - suites are not pure duplicates; each covers behaviors the other does not assert explicitly

### Recommended consolidation (do not execute silently)

1. Keep both suites but assign explicit ownership:
   - Legacy fuzz (`test_engine_fuzz_generated.py`): runtime contract and normalization/guard invariants.
   - Extended fuzz (`test_engine_fuzz_extended_generated.py`): adversarial scenario classes and exploit-shape behaviors.

2. Add a follow-up `fuzz-coverage-manifest` document/test that declares:
   - which behavior classes are owned by legacy vs extended
   - which classes are intentionally shared
   - what constitutes disallowed duplicate growth

3. Keep the current explicit gap marker until manifest enforcement exists.
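
One possible shape for the proposed `fuzz-coverage-manifest` check is sketched below. Since the manifest does not exist yet, everything here is hypothetical: the behavior-class names, the `legacy`/`extended`/`shared` owner labels, and the `manifest_violations` function are placeholders loosely derived from the ownership split in recommendation 1.

```python
from __future__ import annotations

# Hypothetical manifest: behavior class -> owning suite ("legacy", "extended", or "shared").
MANIFEST = {
    "normalization_invariants": "legacy",
    "filesystem_guard_invariants": "legacy",
    "prompt_injection_boundaries": "extended",
    "tool_chaining_abuse": "extended",
    "encoding_policy_escapes": "extended",
    "runtime_contract": "shared",
}


def manifest_violations(manifest: dict[str, str], observed: dict[str, set[str]]) -> list[str]:
    """Report classes a suite asserts without owning them, plus undeclared classes."""
    violations = []
    for suite, classes in sorted(observed.items()):
        for cls in sorted(classes):
            owner = manifest.get(cls)
            if owner is None:
                violations.append(f"{suite}: undeclared class {cls}")
            elif owner not in ("shared", suite):
                violations.append(f"{suite}: {cls} is owned by {owner}")
    return violations
```

Under this sketch, "disallowed duplicate growth" means a suite asserting a class that the manifest assigns to the other suite, and a class absent from the manifest counts as undeclared. A test that asserts `manifest_violations(...) == []` could then replace the current dedup gap marker.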

## Due-Diligence Interpretation

This audited suite is designed to be honest under review:

- It makes modeled contracts explicit.
- It does not imply coverage where none exists.
- It records intentional omissions as executable skip markers.
- It separates runtime-enforcement guarantees from broader process/compliance claims.
