Problem
detect_structural_test_patterns() scans test files on disk with a start_line offset to skip pre-existing content. The approach keeps breaking on edge cases:
Now, an off-by-one: the clamp effective_start = 1 if start_line > len(lines) fires when start_line = len(lines) + 1 (the normal "nothing appended" case). In the failing run the file has 4144 lines and start_line=4145, so the clamp rescans everything, flags pre-existing inspect.getsource calls at lines 3346/3362, and triggers a retry; the CLI was killed (SIGKILL) after 17 min.
Proof
The clamp at pdd/agentic_bug_orchestrator.py:980 treats start_line > len(lines) as "file was truncated, scan everything." But start_line = len(lines) + 1 is the normal case: it means "no new lines were appended." The two cases are indistinguishable from the line count alone.
Pattern: this is the third time this scanning approach has broken. Every fix adds a new edge-case handler that the next edge case breaks.
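The off-by-one can be reproduced in isolation. The snippet below is a minimal sketch, assuming the clamp reduces to the single conditional quoted above; the function name `effective_start` and its two-argument signature are hypothetical, not the actual code at line 980.

```python
def effective_start(start_line: int, num_lines: int) -> int:
    """Hypothetical reduction of the clamp: a start_line past the end of
    the file is interpreted as truncation, so scanning restarts at line 1."""
    return 1 if start_line > num_lines else start_line


lines_before = 4144             # line count snapshotted before Step 9
start_line = lines_before + 1   # first line that could possibly be new

# Case 1: Step 9 appended nothing, so the file still has 4144 lines.
unchanged = effective_start(start_line, 4144)

# Case 2: Step 9 appended 10 lines.
appended = effective_start(start_line, 4154)

print(unchanged)  # 1 (the whole pre-existing file is rescanned)
print(appended)   # 4145
```

The unchanged file is the case that trips the clamp, which matches the failure described in this issue.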
Root Cause
File-on-disk scanning with line-count offsets can't distinguish "nothing new" from "file truncated." Every patch on this logic has been fragile because line counts are a lossy summary of the file state — they can't express which lines are new.
Recommended Fix — Diff-Based Scanning
Replace the line-count snapshot with a full content snapshot, and use difflib.SequenceMatcher to compute the exact set of new/changed line numbers in the current file. Only report violations whose line number is in that set.
Why this works:
nothing added -> diff is empty -> no lines reportable -> zero false positives (fixes this bug)
lines appended -> diff marks just the appended lines -> only those get flagged
file truncated / rewritten -> diff marks only the current lines that differ from the snapshot (none for a pure truncation, all of them for a full rewrite) -> scanned correctly
new file (no snapshot) -> previous_content=None -> scan everything (unchanged behavior)
insertions in the middle -> difflib identifies them correctly (current offset-based approach cannot)
API change: callers switch from snapshotting len(content.splitlines()) to snapshotting content itself.
Variable-tracking semantics are preserved: the function still scans the whole file for source-variable assignments (var = file.read_text()) so a cross-boundary pattern — old line defines the variable, new line uses it in a string-match assertion — still flags correctly. Only violation reporting is gated by the diff result.
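The split between whole-file variable tracking and diff-gated reporting might look like the sketch below. It is deliberately simplified and rests on several assumptions: it takes content strings rather than a path, the two regexes stand in for whatever patterns the real detector uses, and the names `detect`, `changed_lines`, `ASSIGN_RE`, and `ASSERT_IN_RE` are hypothetical.

```python
import difflib
import re
from typing import List, Optional, Set, Tuple

# Simplified stand-ins for the real detector's patterns (assumptions).
ASSIGN_RE = re.compile(r"^\s*(\w+)\s*=\s*.+\.read_text\(\)")
ASSERT_IN_RE = re.compile(r"^\s*assert\s+[\"'].*[\"']\s+in\s+(\w+)\b")


def changed_lines(previous: Optional[str], current: str) -> Set[int]:
    """1-based numbers of lines in `current` that are new or changed."""
    cur = current.splitlines()
    if previous is None:
        return set(range(1, len(cur) + 1))
    sm = difflib.SequenceMatcher(a=previous.splitlines(), b=cur, autojunk=False)
    return {j + 1
            for tag, _, _, j1, j2 in sm.get_opcodes()
            if tag in ("insert", "replace")
            for j in range(j1, j2)}


def detect(current: str, previous: Optional[str] = None) -> List[Tuple[int, str]]:
    reportable = changed_lines(previous, current)
    source_vars: Set[str] = set()
    violations: List[Tuple[int, str]] = []
    for lineno, line in enumerate(current.splitlines(), start=1):
        assign = ASSIGN_RE.match(line)
        if assign:
            # Variables are tracked on every line, old or new.
            source_vars.add(assign.group(1))
            continue
        check = ASSERT_IN_RE.match(line)
        # Reporting, and only reporting, is gated by the diff result.
        if check and check.group(1) in source_vars and lineno in reportable:
            violations.append((lineno, line.strip()))
    return violations


old = 'src = path.read_text()\nassert "foo" in src\n'
new = old + 'assert "bar" in src\n'
print(detect(new, previous=old))   # [(3, 'assert "bar" in src')]
print(detect(old, previous=old))   # []
print(detect(old, previous=None))  # [(2, 'assert "foo" in src')]
```

In the first call the assignment on line 1 predates the snapshot, yet the new assertion on line 3 is still flagged, which is the cross-boundary behavior described above.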
Why not the "scan LLM output" alternative: Step 9 output lacks reliable line numbers relative to the target file. Violations need file line numbers so the retry prompt can tell the LLM where to fix.
Affected Files
pdd/agentic_bug_orchestrator.py:931 — detect_structural_test_patterns() function (replace start_line param with previous_content, remove clamp at line 980, add _compute_changed_lines helper using difflib)
pdd/agentic_bug_orchestrator.py:1528 — rename pre_step9_line_counts: Dict[str, int] to pre_step9_content: Dict[str, str]
pdd/agentic_bug_orchestrator.py:1537 — snapshot full read_text() instead of len(splitlines())
pdd/agentic_bug_orchestrator.py:2004 — retry violation scan: unchanged (no snapshot needed — .bak rewrite means everything is new)
pdd/agentic_bug_orchestrator.py:2136-2137 — coverage-retry violation scan: same caller update
tests/test_structural_test_guard.py — update TestStartLineTruncationSafety and related tests to use the new previous_content API; add regression test for the exact start_line=N+1 scenario (expressed as previous_content=same_content -> 0 violations)
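On the caller side, the snapshot change could be sketched as follows. Only the name `pre_step9_content` and its `Dict[str, str]` type come from the list above; the helper `snapshot_test_files`, the resolved-path key, and the UTF-8 encoding are assumptions made for illustration.

```python
import tempfile
from pathlib import Path
from typing import Dict, Iterable


def snapshot_test_files(paths: Iterable[Path]) -> Dict[str, str]:
    """Hypothetical helper: record the full text of each existing test file,
    keyed by resolved path, before Step 9 runs."""
    pre_step9_content: Dict[str, str] = {}
    for path in paths:
        if path.exists():
            pre_step9_content[str(path.resolve())] = path.read_text(encoding="utf-8")
    return pre_step9_content


with tempfile.TemporaryDirectory() as tmp:
    existing = Path(tmp) / "test_a.py"
    existing.write_text("assert True\n", encoding="utf-8")
    missing = Path(tmp) / "test_new.py"  # created later by Step 9

    snapshot = snapshot_test_files([existing, missing])
    # .get() returns None for files that did not exist at snapshot time,
    # which maps onto the previous_content=None "scan everything" case.
    previous_for_existing = snapshot.get(str(existing.resolve()))
    previous_for_missing = snapshot.get(str(missing.resolve()))

print(repr(previous_for_existing))
print(previous_for_missing)
```

Storing the text instead of a count costs more memory per file, but it is what lets the detector tell an unchanged file from a rewritten one.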
Test Cases (must pass)
Off-by-one regression — file unchanged relative to snapshot -> detect_structural_test_patterns(path, previous_content=original) returns [] even when the unchanged file contains structural patterns.
Appended violations — snapshot has no patterns, file has patterns appended -> only the appended line numbers are reported.
Pre-existing violations preserved — snapshot already contains patterns -> those lines are not reported.
var = x.read_text() in snapshot, assert "y" in var in new lines -> the new assertion is flagged.
previous_content=None -> every line scanned (unchanged behavior).
Affected Files also includes pdd/agentic_bug_orchestrator.py:1868-1871, the initial Step 9 violation scan, which should pass previous_content=pre_step9_content.get(abs_path.resolve_str()).
Impact
pdd_cloud issue "Define and emit a machine-readable claim/evidence ledger for pdd change" #1064: pdd bug attempt 1 ran Step 9, the validator false-flagged pre-existing patterns and triggered a retry, and the run was killed after 17 min (exit -9). Attempt 2 repeated the same failure.
History
start_line)