Fix-loop circuit breaker doesn't account for multi-failure regenerations; test generator writes exact-string assertions that subsequent regenerations don't preserve

**Summary**

Two coupled issues that together make full-pipeline regeneration unreliable when the LLM rewords any error message.
(a) On regeneration, the LLM keeps all contractual behavior (function signatures, exception types, numeric results) but reworded all five ValueError messages, e.g., "Matrix multiplication shapes incompatible" instead of "Incompatible shapes for matrix multiplication." PDD-generated tests assert on exact wording via pytest.raises(match=...), so all five tests fail simultaneously.
(b) Inspection of .pdd/backups/linalg_engine/20260511_124217/ shows the fix loop was making progress — iteration 3 (code_3_0_1_0.py) correctly restored matmul's error string and passed that test. But the LLM addresses roughly one failure per iteration, and the breaker fires at 5 consecutive fix operations. With 5+ simultaneous wording mismatches, the loop cannot converge before being killed.

**Reproduction**

Run A: pdd --local --force --verbose sync --skip-tests --skip-verify --no-steer linalg_engine
       → Success, 61.88s, $0.0202

Run B: pdd --local --force --verbose sync --no-steer linalg_engine
       → Failed, 160.62s, $0.1175
       → "Detected 5 consecutive fix operations. Breaking infinite fix loop."
Evidence: evidence/bug9_full_sync.log, .pdd/backups/linalg_engine/20260511_124217/{code_1..code_3}.py showing per-iteration progress on the wording fix.

**Expected**

A regeneration that mismatches N error strings should be able to converge in N iterations.

**Suggested fix (either is sufficient; both would be better)**

(a) Make the breaker progress-sensitive: don't terminate if the failure count strictly decreased in the prior iteration. Cap by total iterations or wall-clock instead.
(b) Update the test-generation prompt to discourage pytest.raises(match="<exact string>") in favor of pattern-based matches (match=r"shape|incompatible|dimension") or assertions on exc.args/exception type alone. PDD's own prompting guidance advises this ("Prefer observable behavior over private implementation details"), but PDD's test generator doesn't follow it.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix-loop circuit breaker doesn't account for multi-failure regenerations; test generator writes exact-string assertions that subsequent regenerations don't preserve #1203

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Fix-loop circuit breaker doesn't account for multi-failure regenerations; test generator writes exact-string assertions that subsequent regenerations don't preserve #1203

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions