Add capability security failure taxonomy labels #154

Merged
ProfRandom92 merged 1 commit into main from codex/add-capability-security-failure-taxonomy-labels
May 20, 2026

@ProfRandom92

Motivation

  • Expand the canonical failure taxonomy with a small set of capability/security labels for future deterministic fixture/artifact hardening.
  • Provide deterministic, operational semantics for capability-boundary and governance-related replay failures while avoiding exploitability or production-security claims.
  • Keep the change minimal and registration-only so fixture expectations, generated artifacts, README, workflows, and runtime behavior remain unchanged.
  • Ensure the new labels are documented and covered by focused tests so future fixture PRs can rely on a stable registry.

Description

  • Register four new labels in FAILURE_TAXONOMY in src/validation/failure_taxonomy.py: CAPABILITY_BOUNDARY_LOSS, UNAUTHORIZED_CAPABILITY_PATH, APPROVAL_GATE_LOSS, and POLICY_ENFORCEMENT_GAP, each including the required fields operational_meaning, observable_trigger, contract_or_invariant_type, severity_class, and non_goal.
  • Add a registration-only capability/security expansion section to docs/failure_taxonomy.md that documents deterministic, fixture-bound semantics and evidence expectations for these labels.
  • Add a focused test test_capability_security_expansion_labels_are_registered in tests/test_failure_taxonomy.py asserting the four labels are present, while preserving existing generic taxonomy tests and banned-term checks.
  • No fixture manifests, generated artifacts, README, workflows, package files, or runtime orchestration changes are included in this PR.
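
The registration described above can be pictured with a minimal sketch. It assumes FAILURE_TAXONOMY is a plain dict mapping each label to a dict of fields, as the diff excerpt in the review suggests. The one-entry registry, the REQUIRED_FIELDS tuple, and the missing_fields helper are illustrative names, not the repository's code.

```python
# Illustrative stand-in for the registry in src/validation/failure_taxonomy.py.
# Only one of the four new labels is reproduced here.
REQUIRED_FIELDS = (
    "operational_meaning",
    "observable_trigger",
    "contract_or_invariant_type",
    "severity_class",
    "non_goal",
)

FAILURE_TAXONOMY = {
    "APPROVAL_GATE_LOSS": {
        "operational_meaning": "Replay reconstruction drops an explicit approval, "
        "validation, or human-gate commitment required before a guarded action.",
        "observable_trigger": "Fixture expectation, ordering artifact, "
        "capability-boundary artifact, or validator reports that a required approval "
        "or validation gate is missing before a guarded action path.",
        "contract_or_invariant_type": "governance_gate",
        "severity_class": "governance",
        "non_goal": "Not a requirement for live human-in-the-loop runtime behavior "
        "and not a clinical, legal, or production approval claim.",
    },
}


def missing_fields(taxonomy):
    """Return {label: [field names that are absent or empty]} for incomplete entries."""
    problems = {}
    for label, entry in taxonomy.items():
        missing = [field for field in REQUIRED_FIELDS if not entry.get(field)]
        if missing:
            problems[label] = missing
    return problems


# A complete entry produces no findings.
print(missing_fields(FAILURE_TAXONOMY))
```

A check of this shape is registration-only: it inspects the registry's structure and touches no fixtures, artifacts, or runtime paths.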

Testing

  • Ran pytest tests/test_failure_taxonomy.py -q and the new/updated taxonomy tests passed.
  • Ran pytest tests/test_fixture_manifest.py -q and it passed, verifying fixture manifests remain compatible with the taxonomy.
  • Ran npm run check, which completed successfully: the full test suite executed and passed, covering type checks, builds, and broader tests.

ProfRandom92 merged commit 7dc279d into main on May 20, 2026
4 checks passed

gemini-code-assist (bot) left a comment

Code Review

This pull request expands the failure taxonomy by introducing four new capability and security-related labels: CAPABILITY_BOUNDARY_LOSS, UNAUTHORIZED_CAPABILITY_PATH, APPROVAL_GATE_LOSS, and POLICY_ENFORCEMENT_GAP. These changes include updates to the documentation, the core taxonomy definition in Python, and the addition of a registration test. Review feedback highlights an inconsistency in the severity_class naming convention compared to existing values and suggests a more idiomatic set-based comparison in the test suite.

Comment on lines +143 to +170
    "CAPABILITY_BOUNDARY_LOSS": {
        "operational_meaning": "Reconstructed replay state no longer preserves an explicit capability, resource, or tool boundary present in the original operational state.",
        "observable_trigger": "Capability-boundary replay contract, fixture expectation, or validator reports missing boundary nodes or boundary edges after reconstruction.",
        "contract_or_invariant_type": "capability_boundary",
        "severity_class": "safety",
        "non_goal": "Not a runtime exploitability claim, live access-control verdict, or external security-breach assertion.",
    },
    "UNAUTHORIZED_CAPABILITY_PATH": {
        "operational_meaning": "Reconstructed replay state introduces an explicit capability, tool, or resource path absent from the original allowed capability boundary.",
        "observable_trigger": "Capability-boundary replay contract, fixture expectation, or validator reports added boundary edges or capability nodes that create a new explicit path.",
        "contract_or_invariant_type": "capability_boundary",
        "severity_class": "safety",
        "non_goal": "Not an intent inference, exploitability judgment, or authorization conclusion derived from prose or unstated policy.",
    },
    "APPROVAL_GATE_LOSS": {
        "operational_meaning": "Replay reconstruction drops an explicit approval, validation, or human-gate commitment required before a guarded action.",
        "observable_trigger": "Fixture expectation, ordering artifact, capability-boundary artifact, or validator reports that a required approval or validation gate is missing before a guarded action path.",
        "contract_or_invariant_type": "governance_gate",
        "severity_class": "governance",
        "non_goal": "Not a requirement for live human-in-the-loop runtime behavior and not a clinical, legal, or production approval claim.",
    },
    "POLICY_ENFORCEMENT_GAP": {
        "operational_meaning": "Reconstructed replay state preserves an action or dependency while losing the explicit policy enforcement condition that constrained it.",
        "observable_trigger": "Fixture expectation, policy-order contract, capability-boundary artifact, or validator reports a missing policy or guard condition while the related action path remains present.",
        "contract_or_invariant_type": "policy_enforcement",
        "severity_class": "governance",
        "non_goal": "Not a live policy-engine bypass claim, external compliance assertion, or runtime exploitability determination.",
    },

medium

The new severity_class values "safety" and "governance" are inconsistent with the existing values (critical, high, medium) used in the taxonomy. This could cause confusion as they seem to represent categories rather than severity levels.

To improve clarity and maintainability, please consider mapping these to the existing severity scale. For example:

  • CAPABILITY_BOUNDARY_LOSS and UNAUTHORIZED_CAPABILITY_PATH seem to be critical issues.
  • APPROVAL_GATE_LOSS and POLICY_ENFORCEMENT_GAP seem to be high severity issues.

If "safety" and "governance" are intended as new classification axes, it might be better to introduce a separate field for them to avoid overloading the meaning of severity_class.

    "CAPABILITY_BOUNDARY_LOSS": {
        "operational_meaning": "Reconstructed replay state no longer preserves an explicit capability, resource, or tool boundary present in the original operational state.",
        "observable_trigger": "Capability-boundary replay contract, fixture expectation, or validator reports missing boundary nodes or boundary edges after reconstruction.",
        "contract_or_invariant_type": "capability_boundary",
        "severity_class": "critical",
        "non_goal": "Not a runtime exploitability claim, live access-control verdict, or external security-breach assertion.",
    },
    "UNAUTHORIZED_CAPABILITY_PATH": {
        "operational_meaning": "Reconstructed replay state introduces an explicit capability, tool, or resource path absent from the original allowed capability boundary.",
        "observable_trigger": "Capability-boundary replay contract, fixture expectation, or validator reports added boundary edges or capability nodes that create a new explicit path.",
        "contract_or_invariant_type": "capability_boundary",
        "severity_class": "critical",
        "non_goal": "Not an intent inference, exploitability judgment, or authorization conclusion derived from prose or unstated policy.",
    },
    "APPROVAL_GATE_LOSS": {
        "operational_meaning": "Replay reconstruction drops an explicit approval, validation, or human-gate commitment required before a guarded action.",
        "observable_trigger": "Fixture expectation, ordering artifact, capability-boundary artifact, or validator reports that a required approval or validation gate is missing before a guarded action path.",
        "contract_or_invariant_type": "governance_gate",
        "severity_class": "high",
        "non_goal": "Not a requirement for live human-in-the-loop runtime behavior and not a clinical, legal, or production approval claim.",
    },
    "POLICY_ENFORCEMENT_GAP": {
        "operational_meaning": "Reconstructed replay state preserves an action or dependency while losing the explicit policy enforcement condition that constrained it.",
        "observable_trigger": "Fixture expectation, policy-order contract, capability-boundary artifact, or validator reports a missing policy or guard condition while the related action path remains present.",
        "contract_or_invariant_type": "policy_enforcement",
        "severity_class": "high",
        "non_goal": "Not a live policy-engine bypass claim, external compliance assertion, or runtime exploitability determination.",
    },
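
The alternative raised in this comment, keeping severity_class on the existing scale and moving "safety" and "governance" to their own field, could look roughly like the sketch below. The field name failure_domain, the two allowed-value sets, and the check_axes helper are hypothetical. The severity scale is taken from the values the comment cites (critical, high, medium), and the real taxonomy may define more.

```python
# Hypothetical split of the two axes; not the repository's schema.
ALLOWED_SEVERITIES = {"critical", "high", "medium"}  # values cited in the review
ALLOWED_DOMAINS = {"safety", "governance"}  # assumed new classification axis


def check_axes(entry):
    """Return a list of human-readable problems with an entry's two axes."""
    problems = []
    if entry.get("severity_class") not in ALLOWED_SEVERITIES:
        problems.append("severity_class must be one of " + ", ".join(sorted(ALLOWED_SEVERITIES)))
    if entry.get("failure_domain") not in ALLOWED_DOMAINS:
        problems.append("failure_domain must be one of " + ", ".join(sorted(ALLOWED_DOMAINS)))
    return problems


# Entry as merged in this PR: the category sits in severity_class.
as_merged = {"severity_class": "safety"}
# Entry with the axes separated, using the severity the reviewer proposed.
separated = {"severity_class": "critical", "failure_domain": "safety"}

print(check_axes(as_merged))
print(check_axes(separated))
```

Separating the axes lets consumers sort or filter by severity without special-casing category names, at the cost of adding a sixth field to every entry.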

        "APPROVAL_GATE_LOSS",
        "POLICY_ENFORCEMENT_GAP",
    }
    missing = sorted(label for label in expected_labels if label not in FAILURE_TAXONOMY)

medium

The logic to find missing labels can be expressed more concisely and idiomatically using set operations. Both forms rely on hash lookups, so the benefit is readability rather than a meaningful speedup.

Suggested change
    - missing = sorted(label for label in expected_labels if label not in FAILURE_TAXONOMY)
    + missing = sorted(expected_labels - FAILURE_TAXONOMY.keys())
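
A small sketch can confirm that the two forms agree. The FAILURE_TAXONOMY here is a two-entry stand-in with empty field dicts, not the real registry; expected_labels holds the four labels from this PR. The set form works because a dict's keys view supports set operations such as difference.

```python
# Stand-in registry that deliberately lacks two of the four labels.
FAILURE_TAXONOMY = {
    "CAPABILITY_BOUNDARY_LOSS": {},
    "APPROVAL_GATE_LOSS": {},
}

expected_labels = {
    "CAPABILITY_BOUNDARY_LOSS",
    "UNAUTHORIZED_CAPABILITY_PATH",
    "APPROVAL_GATE_LOSS",
    "POLICY_ENFORCEMENT_GAP",
}

# Original form: membership test per label.
missing_loop = sorted(label for label in expected_labels if label not in FAILURE_TAXONOMY)

# Suggested form: set difference against the dict's keys view.
missing_set = sorted(expected_labels - FAILURE_TAXONOMY.keys())

print(missing_loop == missing_set)  # True
print(missing_set)  # ['POLICY_ENFORCEMENT_GAP', 'UNAUTHORIZED_CAPABILITY_PATH']
```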