Skip to content

Bug: structural test validator should scan generated output, not file on disk #774

@Serhan-Asad

Description

@Serhan-Asad

Problem

detect_structural_test_patterns() scans test files on disk with a start_line offset to skip pre-existing content. The approach keeps breaking on edge cases:

Proof

# test_orchestrator.py has 4144 lines, inspect.getsource at lines 3346/3362

detect_structural_test_patterns(file, start_line=4145)  # -> 2 violations (WRONG)
detect_structural_test_patterns(file, start_line=4144)  # -> 0 violations (correct)

The clamp at pdd/agentic_bug_orchestrator.py:980 treats start_line > len(lines) as "file was truncated, scan everything." But start_line = len(lines) + 1 is the normal case — it means "no new lines were appended." These two cases are indistinguishable from line-count alone.

Impact

  • pdd_cloud issue Define and emit a machine-readable claim/evidence ledger for pdd change #1064: pdd bug attempt 1 ran Step 9, validator false-flagged pre-existing patterns, triggered retry, killed after 17 min (exit -9). Attempt 2 repeated the same failure.
  • Pattern: this is the third time this scanning approach has broken. Every fix adds a new edge-case handler that the next edge case breaks.

Root Cause

File-on-disk scanning with line-count offsets can't distinguish "nothing new" from "file truncated." Every patch on this logic has been fragile because line counts are a lossy summary of the file state — they can't express which lines are new.

Recommended Fix — Diff-Based Scanning

Replace the line-count snapshot with a full content snapshot, and use difflib.SequenceMatcher to compute the exact set of new/changed line numbers in the current file. Only report violations whose line number is in that set.

Why this works:

  • nothing added -> diff is empty -> no lines reportable -> zero false positives (fixes this bug)
  • lines appended -> diff marks just the appended lines -> only those get flagged
  • file truncated / rewritten -> diff marks all current lines as new -> scanned correctly
  • new file (no snapshot) -> previous_content=None -> scan everything (unchanged behavior)
  • insertions in the middle -> difflib identifies them correctly (current offset-based approach cannot)

API change:

# Before:
def detect_structural_test_patterns(file_path: str, start_line: Optional[int] = None) -> List[str]: ...

# After:
def detect_structural_test_patterns(file_path: str, previous_content: Optional[str] = None) -> List[str]: ...

Callers switch from snapshotting len(content.splitlines()) to snapshotting content itself.

Variable-tracking semantics are preserved: the function still scans the whole file for source-variable assignments (var = file.read_text()) so a cross-boundary pattern — old line defines the variable, new line uses it in a string-match assertion — still flags correctly. Only violation reporting is gated by the diff result.

Why not the "scan LLM output" alternative: Step 9 output lacks reliable line numbers relative to the target file. Violations need file line numbers so the retry prompt can tell the LLM where to fix.

Affected Files

  • pdd/agentic_bug_orchestrator.py:931detect_structural_test_patterns() function (replace start_line param with previous_content, remove clamp at line 980, add _compute_changed_lines helper using difflib)
  • pdd/agentic_bug_orchestrator.py:1528 — rename pre_step9_line_counts: Dict[str, int] to pre_step9_content: Dict[str, str]
  • pdd/agentic_bug_orchestrator.py:1537 — snapshot full read_text() instead of len(splitlines())
  • pdd/agentic_bug_orchestrator.py:1868-1871 — initial Step 9 violation scan: pass previous_content=pre_step9_content.get(abs_path.resolve_str())
  • pdd/agentic_bug_orchestrator.py:2004 — retry violation scan: unchanged (no snapshot needed — .bak rewrite means everything is new)
  • pdd/agentic_bug_orchestrator.py:2136-2137 — coverage-retry violation scan: same caller update
  • tests/test_structural_test_guard.py — update TestStartLineTruncationSafety and related tests to use the new previous_content API; add regression test for the exact start_line=N+1 scenario (expressed as previous_content=same_content -> 0 violations)

Test Cases (must pass)

  1. Off-by-one regression — file unchanged relative to snapshot -> detect_structural_test_patterns(path, previous_content=original) returns [] even when the unchanged file contains structural patterns.
  2. Appended violations — snapshot has no patterns, file has patterns appended -> only the appended line numbers are reported.
  3. Pre-existing violations preserved — snapshot already contains patterns -> those lines are not reported.
  4. Truncation safe — file rewritten shorter than snapshot -> remaining lines scanned correctly.
  5. Cross-boundary variable trackingvar = x.read_text() in snapshot, assert "y" in var in new lines -> the new assertion is flagged.
  6. New fileprevious_content=None -> every line scanned (unchanged behavior).

History

Metadata

Metadata

Labels

pdd-bugPDD: fix a bugpdd-sonnetPDD: use Claude Sonnet model

Type

No type
No fields configured for issues without a type.

Projects

Status

Done

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions