Add capability boundary replay artifact #152

Merged
ProfRandom92 merged 1 commit into main from codex/add-deterministic-capability-boundary-replay-artifact on May 20, 2026
Conversation

@ProfRandom92

Owner

Motivation

  • Provide a deterministic, fixture-driven artifact that validates whether explicit capability-boundary commitments survive replay reconstruction, building on the MCP trace replay and graph-diff foundations.
  • Deliver deterministic evidence for capability-boundary node/edge survival without changing fixtures, README, workflows, validators, or adding external dependencies.

Description

  • Added a generator script scripts/generate_capability_boundary_replay_artifact.py that loads fixtures/manifest.json, reads original/*.json and reconstructed/*.json payloads, conservatively extracts only explicit structured capability-boundary data from supported keys, normalizes edges/nodes, and compares original vs reconstructed graphs using normalize_edges, nodes_from_edges, and compare_edges from the graph core.
  • Committed the produced artifact artifacts/capability_boundary_replay_results.json following the stable schema (artifact_id, generated_by, version, evaluation_mode, llm_judges, external_apis, families, global_summary) with deterministic ordering and no timestamps or environment fields.
  • Added tests tests/test_capability_boundary_replay_artifact.py that assert artifact existence, exact regeneration parity, top-level schema stability, determinism/sanitization, manifest alignment (family/fixture counts and IDs), capability-boundary evidence/drift behavior (including zero-data handling), and label discipline using only registered failure labels.
  • Scope note: deterministic capability-boundary replay artifact only; no fixture payload changes, no README or workflow changes, no runtime/orchestration behavior, no new failure labels, and no LLM/embedding/vector/fuzzy/network behavior.
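
The extraction and comparison steps described above can be sketched roughly as follows. This is a minimal illustration, not the PR's script: the key name `capability_boundaries`, the `source`/`target` field names, and the helpers `extract_edges`, `normalize`, and `compare` are all hypothetical stand-ins, since the PR text does not show the real `SUPPORTED_RELATION_KEYS` or the signatures of `normalize_edges` and `compare_edges`.

```python
from typing import Any

# Hypothetical key name; the PR does not list the real SUPPORTED_RELATION_KEYS.
SUPPORTED_KEYS = ("capability_boundaries",)


def extract_edges(payload: dict[str, Any]) -> list[tuple[str, str]]:
    """Collect (source, target) pairs only from explicit structured fields."""
    edges: list[tuple[str, str]] = []
    for key in SUPPORTED_KEYS:
        value = payload.get(key)
        if not isinstance(value, list):
            continue  # missing or malformed values are skipped, never guessed at
        for item in value:
            if not isinstance(item, dict):
                continue
            src, dst = item.get("source"), item.get("target")
            if isinstance(src, str) and isinstance(dst, str):
                edges.append((src, dst))
    return edges


def normalize(edges: list[tuple[str, str]]) -> tuple[tuple[str, str], ...]:
    """Deduplicate and sort so the output order never depends on input order."""
    return tuple(sorted(set(edges)))


def compare(original, replay) -> dict[str, list[tuple[str, str]]]:
    """Report which original edges survived replay, went missing, or appeared."""
    orig, rep = set(original), set(replay)
    return {
        "preserved": sorted(orig & rep),
        "missing": sorted(orig - rep),
        "added": sorted(rep - orig),
    }


original = {
    "capability_boundaries": [
        {"source": "agent", "target": "fs.read"},
        {"source": "agent", "target": "net.send"},
    ],
    "notes": "agent may also write files",  # prose only, so it is ignored
}
replay = {"capability_boundaries": [{"source": "agent", "target": "fs.read"}]}

result = compare(normalize(extract_edges(original)), normalize(extract_edges(replay)))
print(result)
```

In this sketch the prose-only `notes` field contributes nothing, which mirrors the conservative extraction the description calls out.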

Testing

  • Ran the generator with python scripts/generate_capability_boundary_replay_artifact.py; it produces artifacts/capability_boundary_replay_results.json, and the output matches the committed artifact exactly.
  • Ran the unit tests with pytest tests/test_capability_boundary_replay_artifact.py -q (all tests passed), then the suites this change relies on (tests/test_graph_diff_artifact.py, tests/test_replay_graph_core.py, tests/test_fixture_manifest.py, tests/test_failure_taxonomy.py), and finally the full run under npm run check, which reported 262 passed.
  • Determinism checks: running the generator twice against temporary output paths produced identical content, and the artifact contains no timestamps, absolute paths, or environment/user fields.
  • Risks: extraction is intentionally conservative and only uses explicit structured capability-boundary fields, so relations expressed only in prose are excluded by design.
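
A determinism check of the kind listed above might look like the following sketch. It assumes the artifact is serialized with sorted keys; `build_artifact`, the `artifact_id` value, and the `FORBIDDEN_KEYS` set are hypothetical placeholders rather than the names used in the real test file.

```python
import json

# Hypothetical list of field names that would leak time or environment details.
FORBIDDEN_KEYS = {"timestamp", "generated_at", "hostname", "user", "cwd"}


def render(artifact: dict) -> str:
    """Serialize with sorted keys so equal data always yields equal text."""
    return json.dumps(artifact, indent=2, sort_keys=True) + "\n"


def find_forbidden(value, path: str = "$") -> list[str]:
    """Walk the artifact and return the paths of any forbidden keys."""
    hits: list[str] = []
    if isinstance(value, dict):
        for key, child in value.items():
            if key in FORBIDDEN_KEYS:
                hits.append(f"{path}.{key}")
            hits.extend(find_forbidden(child, f"{path}.{key}"))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            hits.extend(find_forbidden(child, f"{path}[{index}]"))
    return hits


def build_artifact() -> dict:
    """Stand-in for the real generator; the values here are illustrative only."""
    return {
        "artifact_id": "example_artifact",  # hypothetical value
        "families": [],
        "global_summary": {"fixtures": 0},
    }


first = render(build_artifact())
second = render(build_artifact())
print("identical:", first == second)
print("forbidden fields:", find_forbidden(json.loads(first)))
```

Comparing the rendered text rather than the parsed objects also catches ordering differences, which a dictionary equality check would hide.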

Codex Task

@gemini-code-assist (Bot) left a comment

Contributor

Code Review

This pull request introduces a system for generating and testing deterministic capability-boundary replay artifacts. It includes a generation script that extracts boundary graphs from JSON payloads, a sample artifact file, and a comprehensive test suite. Review feedback focuses on optimizing memory efficiency by refactoring the graph extraction logic to use iterables and generator expressions, ensuring the script can handle large numbers of payloads without excessive memory consumption.

import sys
from collections import defaultdict
from pathlib import Path
from typing import Any

medium

Add Iterable to the imports to support memory-efficient processing of payloads using generators in the following functions.

Suggested change
- from typing import Any
+ from typing import Any, Iterable

Comment on lines +131 to +140
def _extract_boundary_graph(payloads: list[dict[str, Any]]) -> tuple[tuple[tuple[str, str], ...], tuple[str, ...]]:
    edges: list[tuple[str, str]] = []
    nodes: set[str] = set()

    for relation_key in SUPPORTED_RELATION_KEYS:
        for payload in payloads:
            for relation_value in _collect_relation_values(payload, relation_key):
                rel_edges, rel_nodes = _extract_relation_data(relation_value, relation_key)
                edges.extend(rel_edges)
                nodes.update(rel_nodes)

medium

The current implementation of _extract_boundary_graph iterates over the payloads list once for each key in SUPPORTED_RELATION_KEYS. By swapping the loops, you can process each payload fully in a single pass. This also lets the function accept an Iterable instead of a list: with the original loop order a generator would be exhausted after the first relation key, so the swap is a prerequisite for streaming large numbers of payloads through generators.

Suggested change
- def _extract_boundary_graph(payloads: list[dict[str, Any]]) -> tuple[tuple[tuple[str, str], ...], tuple[str, ...]]:
-     edges: list[tuple[str, str]] = []
-     nodes: set[str] = set()
-     for relation_key in SUPPORTED_RELATION_KEYS:
-         for payload in payloads:
-             for relation_value in _collect_relation_values(payload, relation_key):
-                 rel_edges, rel_nodes = _extract_relation_data(relation_value, relation_key)
-                 edges.extend(rel_edges)
-                 nodes.update(rel_nodes)
+ def _extract_boundary_graph(payloads: Iterable[dict[str, Any]]) -> tuple[tuple[tuple[str, str], ...], tuple[str, ...]]:
+     edges: list[tuple[str, str]] = []
+     nodes: set[str] = set()
+     for payload in payloads:
+         for relation_key in SUPPORTED_RELATION_KEYS:
+             for relation_value in _collect_relation_values(payload, relation_key):
+                 rel_edges, rel_nodes = _extract_relation_data(relation_value, relation_key)
+                 edges.extend(rel_edges)
+                 nodes.update(rel_nodes)
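
The effect of the loop order on generator input can be seen in a small standalone sketch. The key names `allows` and `denies` and the functions `keys_outer` and `payloads_outer` are hypothetical simplifications of the code under review, not the actual helpers.

```python
from typing import Any, Iterable

KEYS = ("allows", "denies")  # hypothetical relation keys


def keys_outer(payloads: Iterable[dict[str, Any]]) -> list[tuple[str, str]]:
    """Original order: re-iterates payloads once per key."""
    seen: list[tuple[str, str]] = []
    for key in KEYS:
        for payload in payloads:
            seen.extend((key, value) for value in payload.get(key, []))
    return seen


def payloads_outer(payloads: Iterable[dict[str, Any]]) -> list[tuple[str, str]]:
    """Suggested order: each payload is visited exactly once."""
    seen: list[tuple[str, str]] = []
    for payload in payloads:
        for key in KEYS:
            seen.extend((key, value) for value in payload.get(key, []))
    return seen


data = [
    {"allows": ["a"], "denies": ["x"]},
    {"allows": ["b"], "denies": ["y"]},
]

from_list = keys_outer(data)                     # a list can be iterated twice
from_gen_old = keys_outer(p for p in data)       # generator is spent after "allows"
from_gen_new = payloads_outer(p for p in data)   # single pass sees everything

print(from_list)
print(from_gen_old)
print(from_gen_new)
```

Note that the two orders emit the same items in a different sequence, so the swap is only behavior-preserving if the result is sorted or otherwise normalized afterwards, which the PR description indicates is the case.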

Comment on lines +166 to +170
original_payloads = [_load_json(path) for path in _discover_payload_files(fixture_root / "original")]
replay_payloads = [_load_json(path) for path in _discover_payload_files(fixture_root / "reconstructed")]

original_edges, original_nodes = _extract_boundary_graph(original_payloads)
replay_edges, replay_nodes = _extract_boundary_graph(replay_payloads)

medium

Loading all JSON payloads into memory at once using list comprehensions can be very memory-intensive as the number and size of fixtures grow. Using generator expressions combined with the updated _extract_boundary_graph (accepting an Iterable) significantly reduces the memory footprint by processing files one at a time.

Suggested change
- original_payloads = [_load_json(path) for path in _discover_payload_files(fixture_root / "original")]
- replay_payloads = [_load_json(path) for path in _discover_payload_files(fixture_root / "reconstructed")]
- original_edges, original_nodes = _extract_boundary_graph(original_payloads)
- replay_edges, replay_nodes = _extract_boundary_graph(replay_payloads)
+ original_edges, original_nodes = _extract_boundary_graph(
+     _load_json(path) for path in _discover_payload_files(fixture_root / "original")
+ )
+ replay_edges, replay_nodes = _extract_boundary_graph(
+     _load_json(path) for path in _discover_payload_files(fixture_root / "reconstructed")
+ )
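
The difference between the list comprehension and the generator expression can be observed by logging when each file is loaded versus when it is used. The sketch below is illustrative only: `load_json` and `consume` are hypothetical stand-ins for `_load_json` and `_extract_boundary_graph`, and the fixture files are created in a temporary directory.

```python
import json
import tempfile
from pathlib import Path

events: list[str] = []


def load_json(path: Path) -> dict:
    """Load one JSON file and record when the load happened."""
    events.append(f"load {path.name}")
    with path.open(encoding="utf-8") as handle:
        return json.load(handle)


def consume(payloads) -> int:
    """Stand-in for the extraction step: visits each payload once."""
    count = 0
    for payload in payloads:
        events.append(f"use {payload['id']}")
        count += 1
    return count


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("a", "b"):
        (root / f"{name}.json").write_text(json.dumps({"id": name}), encoding="utf-8")
    files = sorted(root.glob("*.json"))  # sorted for a deterministic order

    consume([load_json(path) for path in files])   # eager: all loads happen first
    eager_events = list(events)

    events.clear()
    consume(load_json(path) for path in files)     # lazy: load and use interleave
    lazy_events = list(events)

print(eager_events)
print(lazy_events)
```

With the generator, each parsed payload can be released before the next file is read; the accumulated edges and nodes still grow with the input, so the saving applies to the payloads themselves rather than to the extracted graph.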

@ProfRandom92 merged commit 5213a8c into main on May 20, 2026
4 checks passed