Add tool ordering replay artifact by ProfRandom92 · Pull Request #153 · ProfRandom92/Comptextv7

ProfRandom92 · 2026-05-20T16:21:30Z

Motivation

Add a deterministic artifact that validates whether explicit tool-ordering commitments survive replay reconstruction and to surface compact, deterministic evidence for tool-order drift.
This builds on the existing MCP trace replay, replay graph core, graph-diff, and capability-boundary artifact foundation without touching fixture payloads or runtime systems.

Description

New generator scripts/generate_tool_ordering_replay_artifact.py extracts only explicit structured tool-order and required-before relations, uses graph helpers normalize_edges, nodes_from_edges, compare_edges, and find_order_violations, and emits a stable JSON artifact at artifacts/tool_ordering_replay_results.json following the project schema and sorting rules.
Added committed artifact artifacts/tool_ordering_replay_results.json representing deterministic evidence per family/fixture and a global summary.
Added tests at tests/test_tool_ordering_replay_artifact.py that verify artifact existence, generator reproducibility, top-level schema, determinism/sanitization, manifest alignment, tool-order evidence behavior (including zero-data handling), and label discipline.
Changed files: scripts/generate_tool_ordering_replay_artifact.py, artifacts/tool_ordering_replay_results.json, tests/test_tool_ordering_replay_artifact.py.

Testing

Ran generator: python scripts/generate_tool_ordering_replay_artifact.py and verified it writes artifacts/tool_ordering_replay_results.json reproducibly.
Ran targeted tests: pytest tests/test_tool_ordering_replay_artifact.py -q (new test set passed), plus pytest targets used during validation: tests/test_capability_boundary_replay_artifact.py, tests/test_graph_diff_artifact.py, tests/test_replay_graph_core.py, tests/test_fixture_manifest.py, tests/test_failure_taxonomy.py, and the full suite; all tests passed (new tests and full suite green, total test run shown: 270 passed).
Per-project check: npm run check completed successfully in CI-local validation.
Scope note: deterministic tool-ordering replay artifact only; no fixture payload changes, no README changes, no workflow changes, no runtime/orchestration behavior, no new failure labels, and no LLM/embedding/vector/fuzzy/network behavior introduced.
Risks: extraction is intentionally conservative and only uses explicit structured tool-ordering fields (prose-expressed relations are ignored by design).

Codex Task

gemini-code-assist

Code Review

This pull request introduces a tool-ordering replay artifact along with a generation script and a comprehensive test suite. The script analyzes execution sequences from manifest fixtures to detect drift and ordering violations between original and reconstructed payloads. Feedback focuses on a critical logic error in sequence extraction that destroys temporal order, a fragility in edge generation from ordered lists, and the need for more robust testing using synthetic drift scenarios to verify the validation logic.

gemini-code-assist · 2026-05-20T16:23:51Z

+def _extract_sequence(payloads: list[dict[str, Any]]) -> tuple[str, ...]:
+    sequence_values: set[str] = set()
+    for payload in payloads:
+        edges: list[tuple[str, str]] = []
+        required: list[tuple[str, str]] = []
+        _walk(payload, edges, required)
+        for left, right in edges:
+            sequence_values.add(left)
+            sequence_values.add(right)
+    return tuple(sorted(sequence_values))


The _extract_sequence function contains a critical logic error. It extracts nodes from edges and then sorts them alphabetically (sorted(sequence_values)). This completely destroys the temporal order of tool calls found in the replay. As a result, find_order_violations (line 164) will only check if constraints match alphabetical order, failing to detect actual execution order drift.

Additionally, this function is inefficient as it re-traverses the entire payload set. I suggest refactoring the extraction to collect the observed sequence in a single pass while preserving the order of appearance.

Suggested change

def _extract_sequence(payloads: list[dict[str, Any]]) -> tuple[str, ...]:

sequence_values: set[str] = set()

for payload in payloads:

edges: list[tuple[str, str]] = []

required: list[tuple[str, str]] = []

_walk(payload, edges, required)

for left, right in edges:

sequence_values.add(left)

sequence_values.add(right)

return tuple(sorted(sequence_values))

def _extract_sequence(payloads: list[dict[str, Any]]) -> tuple[str, ...]:

sequence: list[str] = []

seen: set[str] = set()

def walk(obj):

if isinstance(obj, dict):

for key, value in obj.items():

if key in ORDERING_KEYS and isinstance(value, list):

for item in value:

node = item if isinstance(item, str) else _id_from_obj(item)

if isinstance(node, str) and node and node not in seen:

sequence.append(node)

seen.add(node)

walk(value)

elif isinstance(obj, list):

for item in obj:

walk(item)

for p in payloads:

walk(p)

return tuple(sequence)

gemini-code-assist · 2026-05-20T16:23:51Z

+def _edges_from_ordered_list(value: list[object]) -> list[tuple[str, str]]:
+    ordered: list[str] = []
+    for item in value:
+        if isinstance(item, str) and item:
+            ordered.append(item)
+            continue
+        obj_id = _id_from_obj(item)
+        if obj_id is not None:
+            ordered.append(obj_id)
+            continue
+        return []
+    return [(ordered[i], ordered[i + 1]) for i in range(len(ordered) - 1)]


The current implementation of _edges_from_ordered_list is fragile. If a single item in the list is not a string and lacks an identifier key, the function returns an empty list, causing the entire sequence of tool calls to be discarded. It should instead skip invalid items to preserve the remaining ordering information.

def _edges_from_ordered_list(value: list[object]) -> list[tuple[str, str]]: ordered: list[str] = [] for item in value: node = item if isinstance(item, str) else _id_from_obj(item) if isinstance(node, str) and node: ordered.append(node) return [(ordered[i], ordered[i + 1]) for i in range(len(ordered) - 1)]

gemini-code-assist · 2026-05-20T16:23:51Z

+def test_tool_ordering_evidence_behavior() -> None:
+    artifact = _load_json(ARTIFACT_PATH)
+    fixtures = [fixture for family in artifact["families"] for fixture in family["fixtures"]]
+
+    with_data = [
+        fixture
+        for fixture in fixtures
+        if fixture["tool_ordering"]["original_edge_count"] > 0
+        or fixture["tool_ordering"]["replay_edge_count"] > 0
+    ]
+
+    if with_data:
+        assert artifact["global_summary"]["fixtures_with_tool_ordering_data"] > 0
+    else:
+        assert artifact["global_summary"]["fixtures_with_tool_ordering_data"] == 0
+
+    drift_count = sum(1 for fixture in fixtures if fixture["tool_ordering"]["drift_detected"])
+    assert artifact["global_summary"]["fixtures_with_tool_ordering_drift"] == drift_count
+
+    if drift_count > 0:
+        assert any(fixture["tool_ordering"]["drift_detected"] for fixture in fixtures)
+


The current tests for tool ordering evidence behavior only validate the 'zero-data' state of the current artifact. There are no tests that verify the generator's ability to actually detect drift or violations when provided with mismatched original and replay payloads. I recommend adding a test case with synthetic payloads that contain known ordering violations to ensure the validation logic is correct and remains so after refactoring.

Add deterministic tool ordering replay artifact

204c3a5

ProfRandom92 added the codex label May 20, 2026 — with ChatGPT Codex Connector

gemini-code-assist Bot reviewed May 20, 2026

View reviewed changes

ProfRandom92 merged commit 2887f2c into main May 20, 2026
4 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add tool ordering replay artifact#153

Add tool ordering replay artifact#153
ProfRandom92 merged 1 commit into
mainfrom
codex/add-tool-ordering-replay-artifact

ProfRandom92 commented May 20, 2026

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot May 20, 2026

Uh oh!

gemini-code-assist Bot May 20, 2026

Uh oh!

gemini-code-assist Bot May 20, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

-def _extract_sequence(payloads: list[dict[str, Any]]) -> tuple[str, ...]:
-    sequence_values: set[str] = set()
-    for payload in payloads:
-        edges: list[tuple[str, str]] = []
-        required: list[tuple[str, str]] = []
-        _walk(payload, edges, required)
-        for left, right in edges:
-            sequence_values.add(left)
-            sequence_values.add(right)
-    return tuple(sorted(sequence_values))
+def _extract_sequence(payloads: list[dict[str, Any]]) -> tuple[str, ...]:
+    sequence: list[str] = []
+    seen: set[str] = set()
+    def walk(obj):
+        if isinstance(obj, dict):
+            for key, value in obj.items():
+                if key in ORDERING_KEYS and isinstance(value, list):
+                    for item in value:
+                        node = item if isinstance(item, str) else _id_from_obj(item)
+                        if isinstance(node, str) and node and node not in seen:
+                            sequence.append(node)
+                            seen.add(node)
+                walk(value)
+        elif isinstance(obj, list):
+            for item in obj:
+                walk(item)
+    for p in payloads:
+        walk(p)
+    return tuple(sequence)

Conversation

ProfRandom92 commented May 20, 2026

Motivation

Description

Testing

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

gemini-code-assist Bot May 20, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant