Merged
19 changes: 19 additions & 0 deletions docs/failure_taxonomy.md
@@ -30,3 +30,22 @@ Each registered label includes:
- `HIGH_CRITICAL_EVIDENCE_LOSS`

These preferred labels are operationally defined in the canonical registry, regardless of whether a given fixture family currently emits each one.

## Capability/security taxonomy expansion (registration-only)

The following labels are registered for future deterministic fixture/artifact hardening, with operational semantics anchored to explicit contracts and replay evidence:

- `CAPABILITY_BOUNDARY_LOSS`
  - deterministic focus: explicit boundary preservation loss in reconstructed replay state
  - expected evidence shape: missing boundary nodes/edges in capability-boundary contracts, fixtures, or artifacts
- `UNAUTHORIZED_CAPABILITY_PATH`
  - deterministic focus: explicit new capability/resource/tool path introduced in reconstruction
  - expected evidence shape: added boundary edges or nodes that create a new path not present in allowed baseline
- `APPROVAL_GATE_LOSS`
  - deterministic focus: required approval/validation/human-gate commitment missing before guarded action path
  - expected evidence shape: ordering/capability-boundary fixtures or artifacts showing absent gate precondition
- `POLICY_ENFORCEMENT_GAP`
  - deterministic focus: policy enforcement condition dropped while related action/dependency path remains present
  - expected evidence shape: policy/guard contract evidence showing missing enforcement constraint with surviving action path

Registration in this taxonomy does not itself change fixture expectations or generated artifacts. Any future fixture use of these labels must be backed by deterministic contracts or artifact evidence.
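
The evidence shapes for the two capability-boundary labels can be illustrated with a small sketch. This is not the repository's validator: it assumes a capability boundary can be modeled as a set of directed `(source, target)` edges, and the name `classify_boundary_diff` and the `Edge` alias are hypothetical. It also treats any added edge as a new path, which is a simplification.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical representation: one directed boundary edge as (source, target).
Edge = Tuple[str, str]


def classify_boundary_diff(
    baseline: FrozenSet[Edge], reconstructed: FrozenSet[Edge]
) -> List[str]:
    """Return taxonomy labels suggested by a plain set difference of edges."""
    labels: List[str] = []
    # Edges in the allowed baseline that did not survive reconstruction.
    if baseline - reconstructed:
        labels.append("CAPABILITY_BOUNDARY_LOSS")
    # Edges that appear only after reconstruction (simplification: any
    # added edge is treated as a new path).
    if reconstructed - baseline:
        labels.append("UNAUTHORIZED_CAPABILITY_PATH")
    return labels


# Illustrative data only; node names are made up for this example.
baseline = frozenset({("agent", "read_tool"), ("approval", "write_tool")})
reconstructed = frozenset({("agent", "read_tool"), ("agent", "write_tool")})
print(classify_boundary_diff(baseline, reconstructed))
# ['CAPABILITY_BOUNDARY_LOSS', 'UNAUTHORIZED_CAPABILITY_PATH']
```

Because the check is a pure set difference over explicit edges, it stays deterministic and makes no claim about runtime exploitability, in line with the `non_goal` wording of these labels.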
28 changes: 28 additions & 0 deletions src/validation/failure_taxonomy.py
@@ -140,6 +140,34 @@
        "severity_class": "critical",
        "non_goal": "Not a single-run runtime fault outside deterministic replay validation.",
    },
    "CAPABILITY_BOUNDARY_LOSS": {
        "operational_meaning": "Reconstructed replay state no longer preserves an explicit capability, resource, or tool boundary present in the original operational state.",
        "observable_trigger": "Capability-boundary replay contract, fixture expectation, or validator reports missing boundary nodes or boundary edges after reconstruction.",
        "contract_or_invariant_type": "capability_boundary",
        "severity_class": "safety",
        "non_goal": "Not a runtime exploitability claim, live access-control verdict, or external security-breach assertion.",
    },
    "UNAUTHORIZED_CAPABILITY_PATH": {
        "operational_meaning": "Reconstructed replay state introduces an explicit capability, tool, or resource path absent from the original allowed capability boundary.",
        "observable_trigger": "Capability-boundary replay contract, fixture expectation, or validator reports added boundary edges or capability nodes that create a new explicit path.",
        "contract_or_invariant_type": "capability_boundary",
        "severity_class": "safety",
        "non_goal": "Not an intent inference, exploitability judgment, or authorization conclusion derived from prose or unstated policy.",
    },
    "APPROVAL_GATE_LOSS": {
        "operational_meaning": "Replay reconstruction drops an explicit approval, validation, or human-gate commitment required before a guarded action.",
        "observable_trigger": "Fixture expectation, ordering artifact, capability-boundary artifact, or validator reports that a required approval or validation gate is missing before a guarded action path.",
        "contract_or_invariant_type": "governance_gate",
        "severity_class": "governance",
        "non_goal": "Not a requirement for live human-in-the-loop runtime behavior and not a clinical, legal, or production approval claim.",
    },
    "POLICY_ENFORCEMENT_GAP": {
        "operational_meaning": "Reconstructed replay state preserves an action or dependency while losing the explicit policy enforcement condition that constrained it.",
        "observable_trigger": "Fixture expectation, policy-order contract, capability-boundary artifact, or validator reports a missing policy or guard condition while the related action path remains present.",
        "contract_or_invariant_type": "policy_enforcement",
        "severity_class": "governance",
        "non_goal": "Not a live policy-engine bypass claim, external compliance assertion, or runtime exploitability determination.",
    },
Comment on lines +143 to +170

medium

The new severity_class values "safety" and "governance" are inconsistent with the existing values (critical, high, medium) used in the taxonomy. This could cause confusion as they seem to represent categories rather than severity levels.

To improve clarity and maintainability, please consider mapping these to the existing severity scale. For example:

  • CAPABILITY_BOUNDARY_LOSS and UNAUTHORIZED_CAPABILITY_PATH seem to be critical issues.
  • APPROVAL_GATE_LOSS and POLICY_ENFORCEMENT_GAP seem to be high severity issues.

If "safety" and "governance" are intended as new classification axes, it might be better to introduce a separate field for them to avoid overloading the meaning of severity_class.

    "CAPABILITY_BOUNDARY_LOSS": {
        "operational_meaning": "Reconstructed replay state no longer preserves an explicit capability, resource, or tool boundary present in the original operational state.",
        "observable_trigger": "Capability-boundary replay contract, fixture expectation, or validator reports missing boundary nodes or boundary edges after reconstruction.",
        "contract_or_invariant_type": "capability_boundary",
        "severity_class": "critical",
        "non_goal": "Not a runtime exploitability claim, live access-control verdict, or external security-breach assertion.",
    },
    "UNAUTHORIZED_CAPABILITY_PATH": {
        "operational_meaning": "Reconstructed replay state introduces an explicit capability, tool, or resource path absent from the original allowed capability boundary.",
        "observable_trigger": "Capability-boundary replay contract, fixture expectation, or validator reports added boundary edges or capability nodes that create a new explicit path.",
        "contract_or_invariant_type": "capability_boundary",
        "severity_class": "critical",
        "non_goal": "Not an intent inference, exploitability judgment, or authorization conclusion derived from prose or unstated policy.",
    },
    "APPROVAL_GATE_LOSS": {
        "operational_meaning": "Replay reconstruction drops an explicit approval, validation, or human-gate commitment required before a guarded action.",
        "observable_trigger": "Fixture expectation, ordering artifact, capability-boundary artifact, or validator reports that a required approval or validation gate is missing before a guarded action path.",
        "contract_or_invariant_type": "governance_gate",
        "severity_class": "high",
        "non_goal": "Not a requirement for live human-in-the-loop runtime behavior and not a clinical, legal, or production approval claim.",
    },
    "POLICY_ENFORCEMENT_GAP": {
        "operational_meaning": "Reconstructed replay state preserves an action or dependency while losing the explicit policy enforcement condition that constrained it.",
        "observable_trigger": "Fixture expectation, policy-order contract, capability-boundary artifact, or validator reports a missing policy or guard condition while the related action path remains present.",
        "contract_or_invariant_type": "policy_enforcement",
        "severity_class": "high",
        "non_goal": "Not a live policy-engine bypass claim, external compliance assertion, or runtime exploitability determination.",
    },
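
If the separate-field option is preferred over remapping, a validation sketch might look like the following. The `category` field name, the two `ALLOWED_*` sets, and the helper `find_invalid_entries` are hypothetical and not part of this PR; the severity scale is assumed to be exactly the three values named in this comment.

```python
from typing import Dict, List

# Severity levels named in the review comment; assumed to be the full scale.
ALLOWED_SEVERITY = {"critical", "high", "medium"}
# Hypothetical separate axis for the values this PR introduces.
ALLOWED_CATEGORY = {"safety", "governance"}


def find_invalid_entries(taxonomy: Dict[str, Dict[str, str]]) -> List[str]:
    """Return sorted labels whose severity_class or category is not allowed."""
    invalid: List[str] = []
    for label, entry in taxonomy.items():
        severity_ok = entry.get("severity_class") in ALLOWED_SEVERITY
        # The category axis is optional in this sketch; check it only if set.
        category = entry.get("category")
        category_ok = category is None or category in ALLOWED_CATEGORY
        if not (severity_ok and category_ok):
            invalid.append(label)
    return sorted(invalid)


# Illustrative entries only; other registry fields are omitted for brevity.
example = {
    "CAPABILITY_BOUNDARY_LOSS": {"severity_class": "critical", "category": "safety"},
    "APPROVAL_GATE_LOSS": {"severity_class": "governance"},  # overloaded value
}
print(find_invalid_entries(example))  # ['APPROVAL_GATE_LOSS']
```

A check of this kind in the test suite would flag an overloaded `severity_class` at registration time, whichever of the two options is adopted.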

    "CONSTRAINT_DRIFT": {
        "operational_meaning": "Constraint preservation falls below full deterministic survival.",
        "observable_trigger": "constraint_survival_rate < 1.0 in replay metrics.",
11 changes: 11 additions & 0 deletions tests/test_failure_taxonomy.py
@@ -76,3 +76,14 @@ def test_registered_labels_do_not_use_banned_fuzzy_terms() -> None:
        normalized = label.lower()
        for banned in BANNED_FUZZY_TERMS:
            assert banned not in normalized, f"label '{label}' contains banned fuzzy term '{banned}'"


def test_capability_security_expansion_labels_are_registered() -> None:
    expected_labels = {
        "CAPABILITY_BOUNDARY_LOSS",
        "UNAUTHORIZED_CAPABILITY_PATH",
        "APPROVAL_GATE_LOSS",
        "POLICY_ENFORCEMENT_GAP",
    }
    missing = sorted(label for label in expected_labels if label not in FAILURE_TAXONOMY)

medium

The logic to find missing labels can be expressed more concisely and idiomatically with a set difference. Both forms rely on hash lookups, so for a collection this small the benefit is readability rather than speed.

Suggested change
-    missing = sorted(label for label in expected_labels if label not in FAILURE_TAXONOMY)
+    missing = sorted(expected_labels - FAILURE_TAXONOMY.keys())

    assert not missing, f"expected capability/security labels missing from taxonomy: {missing}"
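
The two ways of computing `missing` discussed in this thread can be compared with a short, self-contained sketch. The `FAILURE_TAXONOMY` dict here is a small stand-in with illustrative contents, not the real registry, and the variable names `missing_generator` and `missing_set_ops` are made up for the comparison.

```python
# Stand-in for the real FAILURE_TAXONOMY mapping; contents are illustrative.
FAILURE_TAXONOMY = {
    "CAPABILITY_BOUNDARY_LOSS": {},
    "APPROVAL_GATE_LOSS": {},
}

expected_labels = {
    "CAPABILITY_BOUNDARY_LOSS",
    "UNAUTHORIZED_CAPABILITY_PATH",
    "APPROVAL_GATE_LOSS",
    "POLICY_ENFORCEMENT_GAP",
}

# Form used in the PR: membership test per expected label.
missing_generator = sorted(
    label for label in expected_labels if label not in FAILURE_TAXONOMY
)
# Suggested form: set difference against the dict's keys view.
missing_set_ops = sorted(expected_labels - FAILURE_TAXONOMY.keys())

assert missing_generator == missing_set_ops
print(missing_set_ops)
# ['POLICY_ENFORCEMENT_GAP', 'UNAUTHORIZED_CAPABILITY_PATH']
```

Both forms produce the same sorted list, so the choice between them is a matter of style.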