|
| 1 | +# ClawZero Test Suite Audit Summary |
| 2 | + |
| 3 | +Date: 2026-04-13 |
| 4 | + |
| 5 | +This document records the enforcement-strength baseline established by the generated-test audit pass across the following suites: |
| 6 | + |
| 7 | +- `tests/attack_pack/test_attack_pack_expanded_generated.py` |
| 8 | +- `tests/owasp/test_asi_2026_generated.py` |
| 9 | +- `tests/test_policy_matrix_generated.py` |
| 10 | +- `tests/compliance/test_eu_ai_act_generated.py` |
| 11 | +- `tests/session/test_cross_session_isolation_generated.py` |
| 12 | +- `tests/fuzzing/test_engine_fuzz_generated.py` |
| 13 | + |
| 14 | +## What This Suite Guarantees |
| 15 | + |
| 16 | +1. Deterministic decision contracts are asserted (not just decision class membership) for audited generated suites. |
| 17 | +2. Witness artifacts are semantically validated (request linkage, decision/reason/sink/target/profile coherence), not just existence-checked. |
| 18 | +3. Provenance normalization is tested explicitly (input-class and taint normalization behavior). |
| 19 | +4. Coverage boundaries are explicit and machine-visible via intentional `pytest.skip(...)` gap markers. |
| 20 | + |
| 21 | +## Engine Contracts Discovered and Codified |
| 22 | + |
| 23 | +1. Inputs labeled `unknown` or `untrusted` are normalized to untrusted taint in witness and provenance contexts. |
| 24 | +Source evidence: |
| 25 | +- `tests/attack_pack/test_attack_pack_expanded_generated.py` (`_expected_witness_taint_level`) |
| 26 | +- `tests/owasp/test_asi_2026_generated.py` (`_expected_witness_taint_level`) |
| 27 | +- `tests/test_policy_matrix_generated.py` (`_expected_witness_taint_level`) |
| 28 | +- `tests/compliance/test_eu_ai_act_generated.py` (`_expected_witness_taint_level`) |
| 29 | +- `tests/fuzzing/test_engine_fuzz_generated.py` (`_expected_witness_taint_level`) |
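The normalization contract in item 1 can be sketched as follows. This is an illustration only: the function name `normalize_witness_taint`, the string representation of taint labels, and the pass-through behavior for other labels are assumptions, and the real `_expected_witness_taint_level` helpers may look different.

```python
# Hypothetical sketch of the `unknown`/`untrusted` normalization contract.
# Only the two labels below come from the audit summary; everything else is assumed.
UNTRUSTED_ALIASES = frozenset({"unknown", "untrusted"})


def normalize_witness_taint(label: str) -> str:
    """Collapse `unknown` and `untrusted` into the single `untrusted` taint level."""
    cleaned = label.strip().lower()
    if cleaned in UNTRUSTED_ALIASES:
        return "untrusted"
    # Assumption: any other label is passed through unchanged.
    return cleaned
```

A test written against this shape would compare the witness taint with `normalize_witness_taint(input_label)` rather than with the raw input label.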
| 30 | + |
| 31 | +2. Input class resolution drives effective profile normalization. |
| 32 | +Behavior: |
| 33 | +- `dev_balanced` + normalized untrusted input => effective profile `dev_strict`. |
| 34 | +Source evidence: |
| 35 | +- `src/clawzero/runtime/engine.py` (`_resolve_input_class`, `_apply_input_class_overrides`, `_prepare_request`) |
| 36 | +- enforced in all audited generated suites via `effective_policy_profile` assertions. |
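A minimal sketch of the profile override described in item 2, assuming profiles and taint levels are plain strings. The function name `effective_profile` is hypothetical, and only the `dev_balanced` to `dev_strict` rule is taken from this summary; all other combinations are assumed to pass through unchanged.

```python
def effective_profile(requested_profile: str, normalized_taint: str) -> str:
    """Return the profile the engine would effectively apply (illustrative only)."""
    # Documented rule: dev_balanced + normalized untrusted input => dev_strict.
    if requested_profile == "dev_balanced" and normalized_taint == "untrusted":
        return "dev_strict"
    # Assumption: no other combination is rewritten.
    return requested_profile


# The kind of check an `effective_policy_profile` assertion expresses.
assert effective_profile("dev_balanced", "untrusted") == "dev_strict"
```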
| 37 | + |
| 38 | +3. Source-label invariance in policy matrix outcomes. |
| 39 | +Behavior: |
| 40 | +- For a fixed `taint × sink × profile` combination, the source label is coverage metadata and does not change the decision contract. |
| 41 | +Source evidence: |
| 42 | +- `tests/test_policy_matrix_generated.py::test_policy_matrix_contract_source_dimension_is_explicitly_invariant` |
| 43 | + |
| 44 | +4. Filesystem safety guards are layered and dominate permissive paths. |
| 45 | +Behavior: |
| 46 | +- Traversal/sensitive path patterns block via `PATH_BLOCKED` even where embedded policy might otherwise permit. |
| 47 | +Source evidence: |
| 48 | +- `src/clawzero/runtime/engine.py` (`_apply_filesystem_safety_guards`) |
| 49 | +- codified in `tests/fuzzing/test_engine_fuzz_generated.py` expected-decision logic. |
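The layering in item 4 can be illustrated with a guard that runs after the policy decision and can only tighten it. This is a sketch under stated assumptions: the function name `guard_path`, the `"allow"`/`"block"` decision strings, and the prefix list are hypothetical, and the real `_apply_filesystem_safety_guards` pattern set is not shown in this summary.

```python
import posixpath

# Hypothetical sensitive prefixes, chosen so that /etc/passwd is covered.
SENSITIVE_PREFIXES = ("/etc/", "/root/.ssh/")


def guard_path(path: str, policy_decision: str):
    """Return (decision, reason_code); the guard dominates a permissive policy."""
    # A `..` segment anywhere in the raw path counts as traversal.
    has_traversal = ".." in path.replace("\\", "/").split("/")
    # Normalize before the prefix check so `/etc//passwd` is still matched.
    resolved = posixpath.normpath(path)
    is_sensitive = any(resolved.startswith(prefix) for prefix in SENSITIVE_PREFIXES)
    if has_traversal or is_sensitive:
        return ("block", "PATH_BLOCKED")
    # Otherwise the policy outcome stands and no guard reason is attached.
    return (policy_decision, None)
```

Because the guard is evaluated last, a policy result of `"allow"` never survives a traversal or sensitive-path match.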
| 50 | + |
| 51 | +5. Session taint precedence contract in `AgentSession`. |
| 52 | +Behavior: |
| 53 | +- `ActionDecision.trust_level` is read first by `AgentSession._taint_level`; forged session blobs only affect taint when `trust_level` is absent. |
| 54 | +Source evidence: |
| 55 | +- `src/clawzero/runtime/session.py` (`_taint_level`) |
| 56 | +- codified by contamination-path test in `tests/session/test_cross_session_isolation_generated.py`. |
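The precedence rule in item 5 can be sketched as below. The `DecisionSketch` class, the `session_blob` field, the `taint_level` key, and the fail-closed default are all assumptions made for illustration; only the rule that `trust_level` is read first comes from this summary.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DecisionSketch:
    """Hypothetical stand-in for `ActionDecision`; field names are assumed."""
    trust_level: Optional[str] = None
    session_blob: dict = field(default_factory=dict)


def taint_level(decision: DecisionSketch) -> str:
    """Read `trust_level` first; consult the session blob only when it is absent."""
    if decision.trust_level is not None:
        return decision.trust_level
    # Assumption: with no usable value anywhere, fall back to untrusted.
    return decision.session_blob.get("taint_level", "untrusted")
```

Under this shape, a forged blob claiming `trusted` has no effect on a decision that already carries `trust_level="untrusted"`.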
| 57 | + |
| 58 | +6. Session ownership is local and enforced in enriched annotations. |
| 59 | +Behavior: |
| 60 | +- Enriched decision `session_id` is always the active session; forged cross-session metadata does not rebind ownership. |
| 61 | +Source evidence: |
| 62 | +- `tests/session/test_cross_session_isolation_generated.py::test_cross_session_isolation_contamination_attempt_is_fail_closed` |
| 63 | + |
| 64 | +7. EU/OWASP mapping in generated compliance suites is model-based, not full-framework coverage. |
| 65 | +Behavior: |
| 66 | +- Each modeled control/article maps to explicit sink + reason-code contracts with scope notes. |
| 67 | +Source evidence: |
| 68 | +- `tests/owasp/test_asi_2026_generated.py` (`ASI_CONTROL_MAPPING_CONTRACTS`, mapping completeness test) |
| 69 | +- `tests/compliance/test_eu_ai_act_generated.py` (`EUAI_CONTROL_MAPPING_CONTRACTS`, mapping completeness test) |
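A mapping completeness check of the kind referenced in item 7 might look like the sketch below. The control ID, the dictionary layout, and the helper name `incomplete_controls` are hypothetical; the actual structure of `ASI_CONTROL_MAPPING_CONTRACTS` and `EUAI_CONTROL_MAPPING_CONTRACTS` is not shown in this summary.

```python
# Each modeled control is assumed to need these three fields.
REQUIRED_FIELDS = ("sink", "reason_code", "scope_note")

# Hypothetical mapping with a single placeholder control.
EXAMPLE_MAPPING = {
    "EXAMPLE-CONTROL-01": {
        "sink": "filesystem.read",
        "reason_code": "PATH_BLOCKED",
        "scope_note": "runtime sink enforcement only",
    },
}


def incomplete_controls(mapping: dict, modeled_controls: list) -> list:
    """List every modeled control that lacks a mapping or a required field."""
    problems = []
    for control in modeled_controls:
        contract = mapping.get(control)
        if contract is None:
            problems.append(f"{control}: no mapping")
            continue
        missing = [name for name in REQUIRED_FIELDS if not contract.get(name)]
        if missing:
            problems.append(f"{control}: missing {', '.join(missing)}")
    return problems
```

A completeness test would then assert that the returned list is empty for the full set of modeled controls.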
| 70 | + |
| 71 | +## Explicit Gap Markers Added (Intentional Boundaries) |
| 72 | + |
| 73 | +1. Policy matrix allow-path boundary |
| 74 | +- Test: `tests/test_policy_matrix_generated.py::test_policy_matrix_gap_filesystem_read_allow_paths_not_covered` |
| 75 | +- Why: the file pins `filesystem.read` to the `/etc/passwd` block-path contract; allowlisted workspace read paths are intentionally out of scope for this matrix file. |
| 76 | + |
| 77 | +2. ASI cross-category chaining boundary |
| 78 | +- Test: `tests/owasp/test_asi_2026_generated.py::test_asi_cross_category_taint_chain_coverage_gap_is_explicit` |
| 79 | +- Why: the suite validates per-control primary sink contracts only; cross-category taint-chain scenarios are intentionally not claimed here. |
| 80 | + |
| 81 | +3. EU AI Act process-obligation boundary |
| 82 | +- Test: `tests/compliance/test_eu_ai_act_generated.py::test_eu_ai_act_gap_aug_2026_unmodeled_obligations_are_explicit` |
| 83 | +- Why: the suite models runtime sink enforcement, not process-heavy obligations (technical documentation evidence workflows, conformity/CE-marking workflows, broader incident/reporting process obligations). |
| 84 | + |
| 85 | +4. Session isolation breach signaling boundary |
| 86 | +- Test: `tests/session/test_cross_session_isolation_generated.py::test_cross_session_isolation_gap_dedicated_breach_reason_code_not_implemented` |
| 87 | +- Why: the runtime currently fails closed via taint/escalation but does not emit a dedicated `ISOLATION_BREACH` reason code or alert channel. |
| 88 | + |
| 89 | +5. Legacy-vs-extended fuzz dedup boundary |
| 90 | +- Test: `tests/fuzzing/test_engine_fuzz_generated.py::test_engine_fuzz_generated_gap_cross_suite_dedup_not_enforced` |
| 91 | +- Why: both suites are now strong, but formal dedup ownership boundaries are not yet machine-enforced. |
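The gap markers listed above share one shape: a test whose only job is to skip with a stated reason, so the boundary shows up in every test report. A minimal sketch, assuming pytest is installed; the test name and reason text here are hypothetical and do not reproduce the wording used in the audited suites.

```python
import pytest


def test_example_gap_workspace_read_allow_paths_not_covered():
    """Executable gap marker: reported as SKIPPED, never as a silent pass."""
    # Hypothetical reason text describing the intentional boundary.
    pytest.skip("GAP: allowlisted workspace read paths are out of scope for this file")
```

Running `pytest -rs` prints the skip reasons in the short test summary, which is what makes the boundary machine-visible.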
| 92 | + |
| 93 | +## Fuzz Suites: Overlap and Consolidation Recommendation |
| 94 | + |
| 95 | +### Observed coverage relationship |
| 96 | + |
| 97 | +- Legacy suite: `tests/fuzzing/test_engine_fuzz_generated.py` |
| 98 | + - 1000 cases |
| 99 | + - deterministic runtime contract assertions |
| 100 | + - 21 unique behavior signatures observed in audit analysis |
| 101 | + |
| 102 | +- Extended suite: `tests/fuzzing/test_engine_fuzz_extended_generated.py` |
| 103 | + - 960 generic matrix cases + 48 targeted adversarial cases |
| 104 | + - targeted classes: prompt-injection boundaries, tool-chaining abuse, encoding policy escapes |
| 105 | + |
| 106 | +- Relationship from audit analysis: |
| 107 | + - overlap exists but is partial |
| 108 | + - suites are not pure duplicates; each covers behaviors the other does not assert explicitly |
| 109 | + |
| 110 | +### Recommended consolidation (do not execute silently) |
| 111 | + |
| 112 | +1. Keep both suites but assign explicit ownership: |
| 113 | +- Legacy fuzz (`test_engine_fuzz_generated.py`): runtime contract and normalization/guard invariants. |
| 114 | +- Extended fuzz (`test_engine_fuzz_extended_generated.py`): adversarial scenario classes and exploit-shape behaviors. |
| 115 | + |
| 116 | +2. Add a follow-up `fuzz-coverage-manifest` document/test that declares: |
| 117 | +- which behavior classes are owned by legacy vs extended |
| 118 | +- which classes are intentionally shared |
| 119 | +- what constitutes disallowed duplicate growth |
| 120 | + |
| 121 | +3. Keep the current explicit gap marker until manifest enforcement exists. |
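One possible shape for the proposed `fuzz-coverage-manifest` check is sketched below. The behavior-class names are paraphrased from the ownership split in recommendation 1, and the manifest layout and the helper `undeclared_overlap` are assumptions, not an existing artifact.

```python
# Hypothetical manifest: which suite owns which behavior class.
MANIFEST = {
    "owners": {
        "legacy": {"runtime_contract", "normalization", "guard_invariants"},
        "extended": {
            "prompt_injection_boundaries",
            "tool_chaining_abuse",
            "encoding_policy_escapes",
        },
    },
    # Classes that both suites may cover on purpose.
    "shared": set(),
}


def undeclared_overlap(manifest: dict) -> list:
    """Return behavior classes claimed by both suites but not declared shared."""
    legacy = manifest["owners"]["legacy"]
    extended = manifest["owners"]["extended"]
    return sorted((legacy & extended) - manifest["shared"])
```

A manifest test that asserts `undeclared_overlap(MANIFEST) == []` would let the current dedup gap marker be retired once it passes in CI.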
| 122 | + |
| 123 | +## Due-Diligence Interpretation |
| 124 | + |
| 125 | +This audited suite is designed to be honest under review: |
| 126 | + |
| 127 | +- It makes modeled contracts explicit. |
| 128 | +- It does not imply coverage where none exists. |
| 129 | +- It records intentional omissions as executable skip markers. |
| 130 | +- It separates runtime-enforcement guarantees from broader process/compliance claims. |
| 131 | + |