Skip to content

pdd checkup --pr step 5 (test execution) times out on large repos and blocks step 7 verify #1114

@Serhan-Asad

Description

@Serhan-Asad

Problem

pdd checkup --pr <pr-url> --issue <issue-url> runs an 8-step agentic checkup. Step 5 (test execution, prompt pdd/prompts/agentic_checkup_step5_test_LLM.prompt) instructs the agent to "Run ALL tests — unit tests, integration tests, and end-to-end tests". On large repos like pdd-gltanaka itself (~8,600+ unit tests, plus integration/e2e suites), the agent process exceeds even the retry timeout budget and returns "Timeout expired", aborting the entire pipeline before steps 6.1–6.3 (fix) and step 7 (verify) can run.

The user-facing error is misleading: the orchestrator prints Aborting after Step 5: agent providers unavailable even though providers are healthy — the agent simply ran out of wall-clock time.

Reproduction (verbatim, on PR #1111)

pdd checkup --pr https://github.com/promptdriven/pdd/pull/1111 \
            --issue https://github.com/promptdriven/pdd/issues/1106

Observed timing in .pdd/worktrees/.../checkup-pr-1111/.pdd/agentic-logs/session_*.jsonl:

Step Duration Cost Result
1 discover 63s $1.86 ok
2 deps 68s $0.49 ok
3 build 34s $0.26 ok
4 interfaces 127s $2.96 ok
5 test 600s $0 Timeout expired
5 test (retry) 1200s $0 Timeout expired
6.1–6.3, 7, 8 unreached

Total: $5.57 spent, no verdict.

Root Causes

  1. Step 5 prompt scope is unbounded. agentic_checkup_step5_test_LLM.prompt says "Run ALL tests" with no awareness that in PR mode the changed files (and therefore the relevant tests) are a tiny subset of the suite. For a PR touching 6 files, blanket-running the whole suite is wasteful even when it succeeds.
  2. Per-step timeout budget (600s default, 1200s retry) is fixed regardless of repo size. Configurable via --timeout-adder but there's no auto-scaling heuristic.
  3. Misleading error message. The orchestrator surfaces agent providers unavailable when the actual failure is a hard timeout; the JSONL log shows "response": "Timeout expired". Users reasonably assume API/auth problems and waste time investigating the wrong thing.
  4. No way to resume past a failing step. GitHub state-resume picks up where the last successful comment was posted — step 4 here — so retries are stuck at step 5 forever. There's no --start-step / --skip-step flag to jump to step 7 verify when step 5 is the bottleneck.

Suggested Fix (in order of impact, smallest first)

  1. PR-mode test scoping. In PR mode, derive changed_paths = gh pr diff → expand to associated test files (tests/test_<basename>.py and any <basename> mention in tests/). Pass those as the only mandatory test set in the step 5 prompt; let the agent expand if it finds extra signal. This should cut step 5 from minutes to seconds on most PRs.
  2. Surface the real failure. Change "agent providers unavailable" to "step N timeout (Xs exceeded budget Ys)" when the underlying response is "Timeout expired".
  3. --start-step flag. Add a CLI flag (or env var PDD_CHECKUP_START_STEP) that overrides the GitHub-state-derived resume point, so users can recover by skipping a known-broken step (with appropriate [warning] step N skipped manually in the issue comment trail).
  4. Step 5 timeout heuristic. Scale the per-step timeout by repo size (test count × avg duration) instead of a hardcoded constant.

Acceptance Criteria

  • pdd checkup --pr on a PR touching ≤10 files completes step 5 in < 5 minutes on the pdd-gltanaka repo.
  • When step 5 does time out, the orchestrator's user-facing message correctly says "step 5 timed out after Xs", not "agent providers unavailable".
  • A user can rerun pdd checkup --pr --start-step 7 (or equivalent) to obtain a verdict when an upstream step is broken.
  • Regression test covers the step-5 PR-mode narrowing.

Context

Discovered while verifying PR #1111 against issue #1106 (metadata finalization alignment). Steps 1–4 posted clean findings; the merge was already independently validated via codex review + targeted pytest (321 passed) + broader pytest (8,616 passed) + manual repro confirmation, so this issue blocks tooling UX rather than the underlying fix.

Metadata

Metadata

Assignees

Labels

No labels
No labels

Type

No type
No fields configured for issues without a type.

Projects

Status

In progress

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions