pdd checkup --pr step 5 (test execution) times out on large repos and blocks step 7 verify

## Problem

`pdd checkup --pr <pr-url> --issue <issue-url>` runs an 8-step agentic checkup. Step 5 (test execution, prompt `pdd/prompts/agentic_checkup_step5_test_LLM.prompt`) instructs the agent to **"Run ALL tests — unit tests, integration tests, and end-to-end tests"**. On large repos like pdd-gltanaka itself (~8,600+ unit tests, plus integration/e2e suites), the agent process exceeds even the retry timeout budget and returns `"Timeout expired"`, aborting the entire pipeline before steps 6.1–6.3 (fix) and step 7 (verify) can run.

The user-facing error is misleading: the orchestrator prints `Aborting after Step 5: agent providers unavailable` even though providers are healthy — the agent simply ran out of wall-clock time.

## Reproduction (verbatim, on PR #1111)

```bash
pdd checkup --pr https://github.com/promptdriven/pdd/pull/1111 \
            --issue https://github.com/promptdriven/pdd/issues/1106
```

Observed timing in `.pdd/worktrees/.../checkup-pr-1111/.pdd/agentic-logs/session_*.jsonl`:

| Step | Duration | Cost   | Result            |
|------|----------|--------|-------------------|
| 1 discover    | 63s   | \$1.86  | ok |
| 2 deps        | 68s   | \$0.49  | ok |
| 3 build       | 34s   | \$0.26  | ok |
| 4 interfaces  | 127s  | \$2.96  | ok |
| **5 test**    | **600s**  | \$0     | **Timeout expired** |
| **5 test (retry)** | **1200s** | \$0     | **Timeout expired** |
| 6.1–6.3, 7, 8 | —      | —       | unreached |

Total: \$5.57 spent, no verdict.

## Root Causes

1. **Step 5 prompt scope is unbounded.** `agentic_checkup_step5_test_LLM.prompt` says *"Run ALL tests"* with no awareness that in PR mode the changed files (and therefore the relevant tests) are a tiny subset of the suite. For a PR touching 6 files, blanket-running the whole suite is wasteful even when it succeeds.
2. **Per-step timeout budget (600s default, 1200s retry) is fixed regardless of repo size.** Configurable via `--timeout-adder` but there's no auto-scaling heuristic.
3. **Misleading error message.** The orchestrator surfaces `agent providers unavailable` when the actual failure is a hard timeout; the JSONL log shows `"response": "Timeout expired"`. Users reasonably assume API/auth problems and waste time investigating the wrong thing.
4. **No way to resume past a failing step.** GitHub state-resume picks up where the last successful comment was posted — step 4 here — so retries are stuck at step 5 forever. There's no `--start-step` / `--skip-step` flag to jump to step 7 verify when step 5 is the bottleneck.

## Suggested Fix (in order of impact, smallest first)

1. **PR-mode test scoping.** In PR mode, derive `changed_paths = gh pr diff` → expand to associated test files (`tests/test_<basename>.py` and any `<basename>` mention in `tests/`). Pass those as the only mandatory test set in the step 5 prompt; let the agent expand if it finds extra signal. This should cut step 5 from minutes to seconds on most PRs.
2. **Surface the real failure.** Change `"agent providers unavailable"` to `"step N timeout (Xs exceeded budget Ys)"` when the underlying response is `"Timeout expired"`.
3. **`--start-step` flag.** Add a CLI flag (or env var `PDD_CHECKUP_START_STEP`) that overrides the GitHub-state-derived resume point, so users can recover by skipping a known-broken step (with appropriate `[warning] step N skipped manually` in the issue comment trail).
4. **Step 5 timeout heuristic.** Scale the per-step timeout by repo size (test count × avg duration) instead of a hardcoded constant.

## Acceptance Criteria

- `pdd checkup --pr` on a PR touching ≤10 files completes step 5 in < 5 minutes on the pdd-gltanaka repo.
- When step 5 does time out, the orchestrator's user-facing message correctly says `"step 5 timed out after Xs"`, not `"agent providers unavailable"`.
- A user can rerun `pdd checkup --pr --start-step 7` (or equivalent) to obtain a verdict when an upstream step is broken.
- Regression test covers the step-5 PR-mode narrowing.

## Context

Discovered while verifying PR #1111 against issue #1106 (metadata finalization alignment). Steps 1–4 posted clean findings; the merge was already independently validated via codex review + targeted pytest (321 passed) + broader pytest (8,616 passed) + manual repro confirmation, so this issue blocks tooling UX rather than the underlying fix.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

pdd checkup --pr step 5 (test execution) times out on large repos and blocks step 7 verify #1114

Problem

Reproduction (verbatim, on PR #1111)

Root Causes

Suggested Fix (in order of impact, smallest first)

Acceptance Criteria

Context

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Step	Duration	Cost	Result
1 discover	63s	$1.86	ok
2 deps	68s	$0.49	ok
3 build	34s	$0.26	ok
4 interfaces	127s	$2.96	ok
5 test	600s	$0	Timeout expired
5 test (retry)	1200s	$0	Timeout expired
6.1–6.3, 7, 8	—	—	unreached

pdd checkup --pr step 5 (test execution) times out on large repos and blocks step 7 verify #1114

Description

Problem

Reproduction (verbatim, on PR #1111)

Root Causes

Suggested Fix (in order of impact, smallest first)

Acceptance Criteria

Context

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions