
Make coding-agent review loop proof deterministic #480

Open
bmdhodl wants to merge 1 commit into main from codex/deterministic-review-loop-proof


@bmdhodl (Owner) commented May 18, 2026

Summary

  • clear the coding-agent review-loop trace before each example run
  • add regression coverage proving reruns do not append stale guard events
  • keep the dogfood incident report aligned to a single proof run
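The first bullet can be illustrated with a minimal sketch. It assumes the example writes to a plain JSONL file; it does not use the agentguard SDK, and `reset_trace`, `append_event`, `run_example`, and the `{"name": ...}` event shape are hypothetical stand-ins rather than the code in this PR.

```python
import json
from pathlib import Path

# File name taken from the PR's validation commands.
TRACE_PATH = Path("coding_agent_review_loop_traces.jsonl")


def reset_trace(path: Path) -> None:
    """Delete a leftover trace file so the next run starts from an empty log."""
    path.unlink(missing_ok=True)  # no error if the file does not exist yet


def append_event(path: Path, event: dict) -> None:
    """Append one event as a single JSON line (hypothetical event shape)."""
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(event) + "\n")


def run_example(path: Path = TRACE_PATH) -> None:
    """Stand-in for the example run: clear first, then write fresh events."""
    reset_trace(path)
    append_event(path, {"name": "guard.budget_exceeded"})
    append_event(path, {"name": "guard.retry_limit_exceeded"})
```

Because the sink is opened in append mode, skipping `reset_trace` would leave the events from earlier runs in the file, which is the stale-event behavior the summary describes.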

Validation

  • python -m pytest sdk/tests/test_example_starters.py -q
  • python examples/coding_agent_review_loop.py
  • python examples/coding_agent_review_loop.py
  • python -m agentguard.cli incident coding_agent_review_loop_traces.jsonl
  • rg -n "guard\.(budget_exceeded|retry_limit_exceeded)" coding_agent_review_loop_traces.jsonl
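The final `rg` check could also be scripted. The sketch below is a rough Python counterpart that counts matching lines instead of printing them; `count_guard_events` is a hypothetical helper, and only the pattern and file name come from the commands above.

```python
import re
from collections import Counter
from pathlib import Path

# Same pattern as the rg command in the validation list.
GUARD_PATTERN = re.compile(r"guard\.(budget_exceeded|retry_limit_exceeded)")


def count_guard_events(path: Path) -> Counter:
    """Count trace lines per guard event name (first match on each line only)."""
    counts: Counter = Counter()
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            match = GUARD_PATTERN.search(line)
            if match:
                counts[match.group(0)] += 1
    return counts
```

Comparing the result of `count_guard_events(Path("coding_agent_review_loop_traces.jsonl"))` after one run and after two runs would show whether reruns still append to the trace.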

Copilot AI review requested due to automatic review settings May 18, 2026 03:52

Copilot AI left a comment


Pull request overview

This PR makes the coding-agent review-loop example produce deterministic proof output by clearing its JSONL trace before each run and adds regression coverage for reruns.

Changes:

  • Clears coding_agent_review_loop_traces.jsonl before the example writes new trace events.
  • Adds a regression test that runs the example twice and verifies guard events are not duplicated.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| examples/coding_agent_review_loop.py | Removes any prior trace file before creating the JSONL sink. |
| sdk/tests/test_example_starters.py | Adds rerun coverage for the coding-agent review-loop trace output. |
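The rerun coverage described in this review might look roughly like the following. This is a sketch under assumptions, not the test added to sdk/tests/test_example_starters.py: `guard_event_names` and `assert_rerun_is_stable` are hypothetical helpers, the `run` callable stands in for invoking the example, and events are assumed to carry a `name` field.

```python
import json
from pathlib import Path
from typing import Callable


def guard_event_names(path: Path) -> list[str]:
    """Return the names of guard events in a JSONL trace, in file order."""
    names = []
    for line in path.read_text(encoding="utf-8").splitlines():
        event = json.loads(line)
        name = event.get("name", "")  # assumed field; real traces may differ
        if name.startswith("guard."):
            names.append(name)
    return names


def assert_rerun_is_stable(run: Callable[[Path], None], path: Path) -> None:
    """Run the example twice and fail if the second run changes the guard events."""
    run(path)
    first = guard_event_names(path)
    run(path)
    second = guard_event_names(path)
    assert first == second, f"rerun changed guard events: {first} -> {second}"
```

A runner that clears or truncates the trace before writing passes this check, while one that only appends fails on the second run because the guard events double.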


2 participants