Problem
pdd checkup --pr <pr-url> --issue <issue-url> runs an 8-step agentic checkup. Step 5 (test execution, prompt pdd/prompts/agentic_checkup_step5_test_LLM.prompt) instructs the agent to "Run ALL tests — unit tests, integration tests, and end-to-end tests". On large repos like pdd-gltanaka itself (~8,600+ unit tests, plus integration/e2e suites), the agent process exceeds even the retry timeout budget and returns "Timeout expired", aborting the entire pipeline before steps 6.1–6.3 (fix) and step 7 (verify) can run.
The user-facing error is misleading: the orchestrator prints Aborting after Step 5: agent providers unavailable even though providers are healthy — the agent simply ran out of wall-clock time.
Reproduction (verbatim, on PR #1111)
pdd checkup --pr https://github.com/promptdriven/pdd/pull/1111 \
--issue https://github.com/promptdriven/pdd/issues/1106
Observed timing in .pdd/worktrees/.../checkup-pr-1111/.pdd/agentic-logs/session_*.jsonl:
| Step |
Duration |
Cost |
Result |
| 1 discover |
63s |
$1.86 |
ok |
| 2 deps |
68s |
$0.49 |
ok |
| 3 build |
34s |
$0.26 |
ok |
| 4 interfaces |
127s |
$2.96 |
ok |
| 5 test |
600s |
$0 |
Timeout expired |
| 5 test (retry) |
1200s |
$0 |
Timeout expired |
| 6.1–6.3, 7, 8 |
— |
— |
unreached |
Total: $5.57 spent, no verdict.
Root Causes
- Step 5 prompt scope is unbounded.
agentic_checkup_step5_test_LLM.prompt says "Run ALL tests" with no awareness that in PR mode the changed files (and therefore the relevant tests) are a tiny subset of the suite. For a PR touching 6 files, blanket-running the whole suite is wasteful even when it succeeds.
- Per-step timeout budget (600s default, 1200s retry) is fixed regardless of repo size. Configurable via
--timeout-adder but there's no auto-scaling heuristic.
- Misleading error message. The orchestrator surfaces
agent providers unavailable when the actual failure is a hard timeout; the JSONL log shows "response": "Timeout expired". Users reasonably assume API/auth problems and waste time investigating the wrong thing.
- No way to resume past a failing step. GitHub state-resume picks up where the last successful comment was posted — step 4 here — so retries are stuck at step 5 forever. There's no
--start-step / --skip-step flag to jump to step 7 verify when step 5 is the bottleneck.
Suggested Fix (in order of impact, smallest first)
- PR-mode test scoping. In PR mode, derive
changed_paths = gh pr diff → expand to associated test files (tests/test_<basename>.py and any <basename> mention in tests/). Pass those as the only mandatory test set in the step 5 prompt; let the agent expand if it finds extra signal. This should cut step 5 from minutes to seconds on most PRs.
- Surface the real failure. Change
"agent providers unavailable" to "step N timeout (Xs exceeded budget Ys)" when the underlying response is "Timeout expired".
--start-step flag. Add a CLI flag (or env var PDD_CHECKUP_START_STEP) that overrides the GitHub-state-derived resume point, so users can recover by skipping a known-broken step (with appropriate [warning] step N skipped manually in the issue comment trail).
- Step 5 timeout heuristic. Scale the per-step timeout by repo size (test count × avg duration) instead of a hardcoded constant.
Acceptance Criteria
pdd checkup --pr on a PR touching ≤10 files completes step 5 in < 5 minutes on the pdd-gltanaka repo.
- When step 5 does time out, the orchestrator's user-facing message correctly says
"step 5 timed out after Xs", not "agent providers unavailable".
- A user can rerun
pdd checkup --pr --start-step 7 (or equivalent) to obtain a verdict when an upstream step is broken.
- Regression test covers the step-5 PR-mode narrowing.
Context
Discovered while verifying PR #1111 against issue #1106 (metadata finalization alignment). Steps 1–4 posted clean findings; the merge was already independently validated via codex review + targeted pytest (321 passed) + broader pytest (8,616 passed) + manual repro confirmation, so this issue blocks tooling UX rather than the underlying fix.
Problem
pdd checkup --pr <pr-url> --issue <issue-url>runs an 8-step agentic checkup. Step 5 (test execution, promptpdd/prompts/agentic_checkup_step5_test_LLM.prompt) instructs the agent to "Run ALL tests — unit tests, integration tests, and end-to-end tests". On large repos like pdd-gltanaka itself (~8,600+ unit tests, plus integration/e2e suites), the agent process exceeds even the retry timeout budget and returns"Timeout expired", aborting the entire pipeline before steps 6.1–6.3 (fix) and step 7 (verify) can run.The user-facing error is misleading: the orchestrator prints
Aborting after Step 5: agent providers unavailableeven though providers are healthy — the agent simply ran out of wall-clock time.Reproduction (verbatim, on PR #1111)
pdd checkup --pr https://github.com/promptdriven/pdd/pull/1111 \ --issue https://github.com/promptdriven/pdd/issues/1106Observed timing in
.pdd/worktrees/.../checkup-pr-1111/.pdd/agentic-logs/session_*.jsonl:Total: $5.57 spent, no verdict.
Root Causes
agentic_checkup_step5_test_LLM.promptsays "Run ALL tests" with no awareness that in PR mode the changed files (and therefore the relevant tests) are a tiny subset of the suite. For a PR touching 6 files, blanket-running the whole suite is wasteful even when it succeeds.--timeout-adderbut there's no auto-scaling heuristic.agent providers unavailablewhen the actual failure is a hard timeout; the JSONL log shows"response": "Timeout expired". Users reasonably assume API/auth problems and waste time investigating the wrong thing.--start-step/--skip-stepflag to jump to step 7 verify when step 5 is the bottleneck.Suggested Fix (in order of impact, smallest first)
changed_paths = gh pr diff→ expand to associated test files (tests/test_<basename>.pyand any<basename>mention intests/). Pass those as the only mandatory test set in the step 5 prompt; let the agent expand if it finds extra signal. This should cut step 5 from minutes to seconds on most PRs."agent providers unavailable"to"step N timeout (Xs exceeded budget Ys)"when the underlying response is"Timeout expired".--start-stepflag. Add a CLI flag (or env varPDD_CHECKUP_START_STEP) that overrides the GitHub-state-derived resume point, so users can recover by skipping a known-broken step (with appropriate[warning] step N skipped manuallyin the issue comment trail).Acceptance Criteria
pdd checkup --pron a PR touching ≤10 files completes step 5 in < 5 minutes on the pdd-gltanaka repo."step 5 timed out after Xs", not"agent providers unavailable".pdd checkup --pr --start-step 7(or equivalent) to obtain a verdict when an upstream step is broken.Context
Discovered while verifying PR #1111 against issue #1106 (metadata finalization alignment). Steps 1–4 posted clean findings; the merge was already independently validated via codex review + targeted pytest (321 passed) + broader pytest (8,616 passed) + manual repro confirmation, so this issue blocks tooling UX rather than the underlying fix.