Harden MCP capability boundary fixtures by ProfRandom92 · Pull Request #155 · ProfRandom92/Comptextv7

ProfRandom92 · 2026-05-20T16:43:54Z

Summary:

Harden existing mcp_trace_replay fixture family by adding explicit structured capability-boundary data to fixture states (capability_boundaries, allowed_tools, resource_boundaries, permission_scopes, capability_scope) and by introducing deterministic boundary drift in reconstructed degraded variants; add deterministic mapping from contract failures to capability/security labels via optional failure_label_on_violation in selected contracts; regenerate affected artifacts.

Changed files:

Fixtures (mcp_trace_replay family): updated original and reconstructed state.json files for mcp_trace_replay_v1, mcp_trace_replay_mild_v1, mcp_trace_replay_moderate_v1, mcp_trace_replay_degraded_v1 and updated expected/failures.json and expected/admissibility.json where applicable.
fixtures/manifest.json (aligned expected_failure_labels for hardened fixtures).
Added failure_label_on_violation to selected contracts in fixtures/mcp_trace_replay_*/original/contracts/*.json to deterministically map contract violations to registered capability/security labels.
src/validation/contract_validator.py (accept and validate optional failure_label_on_violation and use it when emitting failure labels).
Regenerated artifacts changed by fixture updates: artifacts/mcp_trace_replay_results.json, artifacts/replay_semantic_integrity_results.json, artifacts/graph_diff_results.json, artifacts/capability_boundary_replay_results.json, artifacts/tool_ordering_replay_results.json, artifacts/multi_family_admissibility_results.json, and artifacts/multi_family_admissibility_curves.svg.
Tests updated to accept the capability/security labels: tests/test_capability_boundary_replay_artifact.py and tests/test_tool_ordering_replay_artifact.py.

Testing:

Generated artifacts (commands executed or equivalent Python generators run):
- npm run generate:mcp-trace-replay (Python generator used), npm run generate:replay-semantic-integrity, npm run generate:graph-diff.
- python scripts/generate_capability_boundary_replay_artifact.py (used directly when npm script missing).
- python scripts/generate_tool_ordering_replay_artifact.py.
- python scripts/generate_multi_family_admissibility_artifact.py and python scripts/render_multi_family_admissibility_svg.py.
Tests run (all succeeded):
- pytest tests/test_failure_taxonomy.py — passed.
- pytest tests/test_fixture_manifest.py — passed.
- pytest tests/test_mcp_trace_replay_artifact.py — passed.
- pytest tests/test_replay_semantic_integrity_artifact.py — passed.
- pytest tests/test_graph_diff_artifact.py — passed.
- pytest tests/test_capability_boundary_replay_artifact.py — passed.
- pytest tests/test_tool_ordering_replay_artifact.py — passed.
- pytest tests/test_multi_family_admissibility_artifact.py — passed.
- pytest tests/test_multi_family_svg_renderer.py — passed.
- Full test suite: npm run check (which runs layout/typecheck/validate/build/test) completed successfully and the repository test suite passed (271 tests).

Scope note:

This PR is limited to existing mcp_trace_replay fixture hardening only; no new fixture family was added.
No README or workflow changes were made.
No runtime/orchestration behavior, external dependencies, LLM/embedding/vector/fuzzy behavior, or new taxonomy labels were introduced.

Risks:

Fixture expectations and regenerated artifacts are intentionally strict and may require synchronized updates if validator semantics or taxonomy mappings change.

Request review focused on the deterministic capability-boundary evidence structures and the failure_label_on_violation usage in contracts, and confirm the regenerated artifacts are acceptable for downstream validators.

Codex Task

gemini-code-assist

Code Review

This pull request introduces a more granular failure taxonomy for contract validation, replacing generic labels like CAUSAL_DEPENDENCY_LOSS and INVARIANT_VIOLATION with specific ones such as APPROVAL_GATE_LOSS, CAPABILITY_BOUNDARY_LOSS, and POLICY_ENFORCEMENT_GAP. The changes span across artifact results, test fixtures, and the core validation logic in src/validation/contract_validator.py, where a new failure_label_on_violation field is now supported in contract definitions. Feedback focuses on improving the robustness of this new logic by suggesting early normalization of the failure label to a string, which would eliminate redundant type conversions and allow for more precise type hints across the internal validation methods.

gemini-code-assist · 2026-05-20T16:46:11Z

+        failure_label_on_violation = contract.get("failure_label_on_violation")
+        if failure_label_on_violation is not None and str(failure_label_on_violation) not in FAILURE_TAXONOMY:
+            raise ContractValidationError(
+                f"contract '{contract_id}' has unregistered failure_label_on_violation: {failure_label_on_violation}"
+            )
+
        if contract_type == ContractType.ORDERING:
-            return self._validate_ordering(contract_id, layer, severity, definition, reconstructed)
+            return self._validate_ordering(contract_id, layer, severity, definition, reconstructed, failure_label_on_violation)
        if contract_type == ContractType.REACHABILITY:
-            return self._validate_reachability(contract_id, layer, severity, definition, original, reconstructed)
+            return self._validate_reachability(contract_id, layer, severity, definition, original, reconstructed, failure_label_on_violation)
        if contract_type == ContractType.CAUSALITY:
-            return self._validate_causality(contract_id, layer, severity, definition, original, reconstructed)
+            return self._validate_causality(contract_id, layer, severity, definition, original, reconstructed, failure_label_on_violation)
        if contract_type == ContractType.INVARIANT:
-            return self._validate_invariant(contract_id, layer, severity, definition, original, reconstructed)
+            return self._validate_invariant(contract_id, layer, severity, definition, original, reconstructed, failure_label_on_violation)


Normalizing failure_label_on_violation to a string early ensures consistent type handling and allows for more precise type hints in the internal validation methods.

Suggested change

failure_label_on_violation = contract.get("failure_label_on_violation")

if failure_label_on_violation is not None and str(failure_label_on_violation) not in FAILURE_TAXONOMY:

raise ContractValidationError(

f"contract '{contract_id}' has unregistered failure_label_on_violation: {failure_label_on_violation}"

)

if contract_type == ContractType.ORDERING:

return self._validate_ordering(contract_id, layer, severity, definition, reconstructed)

return self._validate_ordering(contract_id, layer, severity, definition, reconstructed, failure_label_on_violation)

if contract_type == ContractType.REACHABILITY:

return self._validate_reachability(contract_id, layer, severity, definition, original, reconstructed)

return self._validate_reachability(contract_id, layer, severity, definition, original, reconstructed, failure_label_on_violation)

if contract_type == ContractType.CAUSALITY:

return self._validate_causality(contract_id, layer, severity, definition, original, reconstructed)

return self._validate_causality(contract_id, layer, severity, definition, original, reconstructed, failure_label_on_violation)

if contract_type == ContractType.INVARIANT:

return self._validate_invariant(contract_id, layer, severity, definition, original, reconstructed)

return self._validate_invariant(contract_id, layer, severity, definition, original, reconstructed, failure_label_on_violation)

failure_label_on_violation = contract.get("failure_label_on_violation")

if failure_label_on_violation is not None:

failure_label_on_violation = str(failure_label_on_violation)

if failure_label_on_violation not in FAILURE_TAXONOMY:

raise ContractValidationError(

f"contract '{contract_id}' has unregistered failure_label_on_violation: {failure_label_on_violation}"

)

if contract_type == ContractType.ORDERING:

return self._validate_ordering(contract_id, layer, severity, definition, reconstructed, failure_label_on_violation)

if contract_type == ContractType.REACHABILITY:

return self._validate_reachability(contract_id, layer, severity, definition, original, reconstructed, failure_label_on_violation)

if contract_type == ContractType.CAUSALITY:

return self._validate_causality(contract_id, layer, severity, definition, original, reconstructed, failure_label_on_violation)

if contract_type == ContractType.INVARIANT:

return self._validate_invariant(contract_id, layer, severity, definition, original, reconstructed, failure_label_on_violation)

gemini-code-assist · 2026-05-20T16:46:11Z

        return False

-    def _validate_ordering(self, contract_id: str, layer: Layer, severity: str, definition: dict[str, Any], reconstructed: dict[str, Any]) -> ValidationResult:
+    def _validate_ordering(self, contract_id: str, layer: Layer, severity: str, definition: dict[str, Any], reconstructed: dict[str, Any], failure_label_on_violation: Any) -> ValidationResult:


Use a specific type hint for failure_label_on_violation instead of Any to improve type safety.

Suggested change

def _validate_ordering(self, contract_id: str, layer: Layer, severity: str, definition: dict[str, Any], reconstructed: dict[str, Any], failure_label_on_violation: Any) -> ValidationResult:

def _validate_ordering(self, contract_id: str, layer: Layer, severity: str, definition: dict[str, Any], reconstructed: dict[str, Any], failure_label_on_violation: str | None) -> ValidationResult:

gemini-code-assist · 2026-05-20T16:46:12Z

            passed=passed,
            severity=severity,
-            failure_label=None if passed else "POLICY_ORDER_BROKEN",
+            failure_label=None if passed else str(failure_label_on_violation or "POLICY_ORDER_BROKEN"),


The str() conversion is redundant if the input is already normalized to a string or None.

Suggested change

failure_label=None if passed else str(failure_label_on_violation or "POLICY_ORDER_BROKEN"),

failure_label=None if passed else (failure_label_on_violation or "POLICY_ORDER_BROKEN"),

gemini-code-assist · 2026-05-20T16:46:12Z

        return sorted(target for target in targets if target in visited)

-    def _validate_reachability(self, contract_id: str, layer: Layer, severity: str, definition: dict[str, Any], original: dict[str, Any], reconstructed: dict[str, Any]) -> ValidationResult:
+    def _validate_reachability(self, contract_id: str, layer: Layer, severity: str, definition: dict[str, Any], original: dict[str, Any], reconstructed: dict[str, Any], failure_label_on_violation: Any) -> ValidationResult:


Use a specific type hint for failure_label_on_violation instead of Any.

Suggested change

def _validate_reachability(self, contract_id: str, layer: Layer, severity: str, definition: dict[str, Any], original: dict[str, Any], reconstructed: dict[str, Any], failure_label_on_violation: Any) -> ValidationResult:

def _validate_reachability(self, contract_id: str, layer: Layer, severity: str, definition: dict[str, Any], original: dict[str, Any], reconstructed: dict[str, Any], failure_label_on_violation: str | None) -> ValidationResult:

gemini-code-assist · 2026-05-20T16:46:12Z

            passed=passed,
            severity=severity,
-            failure_label=None if passed else "RECOVERY_PATH_INVALID",
+            failure_label=None if passed else str(failure_label_on_violation or "RECOVERY_PATH_INVALID"),


The str() conversion is redundant if the input is already normalized.

Suggested change

failure_label=None if passed else str(failure_label_on_violation or "RECOVERY_PATH_INVALID"),

failure_label=None if passed else (failure_label_on_violation or "RECOVERY_PATH_INVALID"),

gemini-code-assist · 2026-05-20T16:46:12Z

        )

-    def _validate_causality(self, contract_id: str, layer: Layer, severity: str, definition: dict[str, Any], original: dict[str, Any], reconstructed: dict[str, Any]) -> ValidationResult:
+    def _validate_causality(self, contract_id: str, layer: Layer, severity: str, definition: dict[str, Any], original: dict[str, Any], reconstructed: dict[str, Any], failure_label_on_violation: Any) -> ValidationResult:


Use a specific type hint for failure_label_on_violation instead of Any.

Suggested change

def _validate_causality(self, contract_id: str, layer: Layer, severity: str, definition: dict[str, Any], original: dict[str, Any], reconstructed: dict[str, Any], failure_label_on_violation: Any) -> ValidationResult:

def _validate_causality(self, contract_id: str, layer: Layer, severity: str, definition: dict[str, Any], original: dict[str, Any], reconstructed: dict[str, Any], failure_label_on_violation: str | None) -> ValidationResult:

gemini-code-assist · 2026-05-20T16:46:12Z

            passed=passed,
            severity=severity,
-            failure_label=None if passed else "CAUSAL_DEPENDENCY_LOSS",
+            failure_label=None if passed else str(failure_label_on_violation or "CAUSAL_DEPENDENCY_LOSS"),


The str() conversion is redundant if the input is already normalized.

Suggested change

failure_label=None if passed else str(failure_label_on_violation or "CAUSAL_DEPENDENCY_LOSS"),

failure_label=None if passed else (failure_label_on_violation or "CAUSAL_DEPENDENCY_LOSS"),

gemini-code-assist · 2026-05-20T16:46:12Z

        )

-    def _validate_invariant(self, contract_id: str, layer: Layer, severity: str, definition: dict[str, Any], original: dict[str, Any], reconstructed: dict[str, Any]) -> ValidationResult:
+    def _validate_invariant(self, contract_id: str, layer: Layer, severity: str, definition: dict[str, Any], original: dict[str, Any], reconstructed: dict[str, Any], failure_label_on_violation: Any) -> ValidationResult:


Use a specific type hint for failure_label_on_violation instead of Any.

Suggested change

def _validate_invariant(self, contract_id: str, layer: Layer, severity: str, definition: dict[str, Any], original: dict[str, Any], reconstructed: dict[str, Any], failure_label_on_violation: Any) -> ValidationResult:

def _validate_invariant(self, contract_id: str, layer: Layer, severity: str, definition: dict[str, Any], original: dict[str, Any], reconstructed: dict[str, Any], failure_label_on_violation: str | None) -> ValidationResult:

gemini-code-assist · 2026-05-20T16:46:12Z

            passed=passed,
            severity=severity,
-            failure_label=None if passed else "INVARIANT_VIOLATION",
+            failure_label=None if passed else str(failure_label_on_violation or "INVARIANT_VIOLATION"),


The str() conversion is redundant if the input is already normalized.

Suggested change

failure_label=None if passed else str(failure_label_on_violation or "INVARIANT_VIOLATION"),

failure_label=None if passed else (failure_label_on_violation or "INVARIANT_VIOLATION"),

ProfRandom92 · 2026-05-20T17:15:56Z

Review fixes addressed:

Missing focused tests for optional contract failure labels: fixed in tests/test_contract_validator.py.
Legacy behavior not proven: added a test confirming contracts without the optional field still emit the default ordering label.
Configured registered label not tested: added a test confirming a registered configured label is emitted on failure.
Invalid configured label not tested: added a test confirming an unregistered configured label raises ContractValidationError.
Validator scope risk: no additional validator behavior was added. The change remains a minimal optional mapping and legacy contracts keep previous default behavior.

Follow-up commit: 239aa6a

Only follow-up file changed: tests/test_contract_validator.py

No README, workflow, package, taxonomy, fixture, or artifact changes were made by this follow-up fix.

Harden MCP capability boundary fixtures

8917a1f

ProfRandom92 added the codex label May 20, 2026 — with ChatGPT Codex Connector

gemini-code-assist Bot reviewed May 20, 2026

View reviewed changes

Add focused tests for optional contract failure labels

239aa6a

ProfRandom92 added 2 commits May 20, 2026 10:49

Validate failure_label_on_violation type strictly

4a67892

Test non-string failure_label_on_violation rejection

10d6683

ProfRandom92 merged commit ff63e92 into main May 20, 2026
4 checks passed

gemini-code-assist Bot mentioned this pull request May 20, 2026

docs: add central project source and scope governance #157

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Harden MCP capability boundary fixtures#155

Harden MCP capability boundary fixtures#155
ProfRandom92 merged 4 commits into
mainfrom
codex/harden-mcp-capability-boundary-fixtures

ProfRandom92 commented May 20, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 20, 2026

Uh oh!

gemini-code-assist Bot May 20, 2026

Uh oh!

gemini-code-assist Bot May 20, 2026

Uh oh!

gemini-code-assist Bot May 20, 2026

Uh oh!

gemini-code-assist Bot May 20, 2026

Uh oh!

gemini-code-assist Bot May 20, 2026

Uh oh!

gemini-code-assist Bot May 20, 2026

Uh oh!

gemini-code-assist Bot May 20, 2026

Uh oh!

gemini-code-assist Bot May 20, 2026

Uh oh!

ProfRandom92 commented May 20, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	def _validate_ordering(self, contract_id: str, layer: Layer, severity: str, definition: dict[str, Any], reconstructed: dict[str, Any], failure_label_on_violation: Any) -> ValidationResult:
	def _validate_ordering(self, contract_id: str, layer: Layer, severity: str, definition: dict[str, Any], reconstructed: dict[str, Any], failure_label_on_violation: str \| None) -> ValidationResult:

	failure_label=None if passed else str(failure_label_on_violation or "POLICY_ORDER_BROKEN"),
	failure_label=None if passed else (failure_label_on_violation or "POLICY_ORDER_BROKEN"),

	def _validate_reachability(self, contract_id: str, layer: Layer, severity: str, definition: dict[str, Any], original: dict[str, Any], reconstructed: dict[str, Any], failure_label_on_violation: Any) -> ValidationResult:
	def _validate_reachability(self, contract_id: str, layer: Layer, severity: str, definition: dict[str, Any], original: dict[str, Any], reconstructed: dict[str, Any], failure_label_on_violation: str \| None) -> ValidationResult:

	def _validate_causality(self, contract_id: str, layer: Layer, severity: str, definition: dict[str, Any], original: dict[str, Any], reconstructed: dict[str, Any], failure_label_on_violation: Any) -> ValidationResult:
	def _validate_causality(self, contract_id: str, layer: Layer, severity: str, definition: dict[str, Any], original: dict[str, Any], reconstructed: dict[str, Any], failure_label_on_violation: str \| None) -> ValidationResult:

	def _validate_invariant(self, contract_id: str, layer: Layer, severity: str, definition: dict[str, Any], original: dict[str, Any], reconstructed: dict[str, Any], failure_label_on_violation: Any) -> ValidationResult:
	def _validate_invariant(self, contract_id: str, layer: Layer, severity: str, definition: dict[str, Any], original: dict[str, Any], reconstructed: dict[str, Any], failure_label_on_violation: str \| None) -> ValidationResult:

Conversation

ProfRandom92 commented May 20, 2026

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

ProfRandom92 commented May 20, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant