feat(checkup): non-interactive prompt repair orchestrator (check → repair → re-check) #1422

## Summary

When prompt checkup fails (#1420), run a **bounded, non-interactive repair orchestrator**: gather context (including Codex web search when useful), synthesize a repair brief from structured checkup findings, **apply via `pdd change`**, **re-run full checkup**, then continue or report remaining issues.

This issue delivers **Phase 2 (Non-interactive repair)** only.

Best-effort autonomous fixing with **no user questions**. Interactive clarification is #1423.

**Product framing:** Debug the prompt before debugging the code.

**Depends on:** #1420

Part of epic #833.

---

## Design alignment

Part of the **Unified PDD Checkup for Prompts** design (epic #833).

**Core decisions:**
- Single entrypoint: `pdd checkup <target>` (repair via flags, not new commands)
- **Deterministic** check orchestration; **agentic** context gathering and repair planning
- **Apply** through existing `pdd change` / `change()` pipeline (not a new repair-specific LLM API path or JSON patch writer)
- **Codex agent** (`run_agentic_task`) for non-interactive context gathering, including web search when needed
- No user interaction in this issue

**This issue covers:** Phase 2 — non-interactive repair orchestrator only.

---

## Orchestration model

**Deterministic control flow:**

```text
checkup (deterministic, #1420)
     ↓
pass → continue
fail → collect structured findings from report
     ↓
context gathering (Codex agent: repo + stories/tests + issue/PR + web search when useful)
     ↓
synthesize change instructions from findings + context brief
     ↓
apply via internal pdd change (change()) — in-place prompt write
     ↓
re-run full checkup (same authority as #1420)
     ↓
pass/warn → continue | still failing: best-effort → continue + report, strict → block | guard tripped → stop + report
```

**Agentic (non-deterministic):** Codex/context phase and change-instruction synthesis.

**Deterministic:** check → decide → apply via known primitive → re-check → stop.
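The deterministic control flow can be sketched in Python as a loop in which the agentic steps are injected as callables. This is a minimal illustration, not the planned implementation: `CheckResult`, `run_repair_loop`, and the four callables are hypothetical names standing in for the #1420 checkup, the Codex context phase, change-instruction synthesis, and the internal `change()` apply. Only the round bound is enforced here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckResult:
    """Hypothetical summary of one full checkup run."""
    passed: bool
    finding_ids: frozenset[str]


def run_repair_loop(
    check: Callable[[], CheckResult],
    gather_context: Callable[[CheckResult], str],
    synthesize: Callable[[CheckResult, str], str],
    apply_change: Callable[[str], None],
    max_rounds: int = 1,
) -> tuple[CheckResult, int]:
    """Deterministic check -> repair -> re-check loop (illustrative sketch)."""
    result = check()  # deterministic checkup (#1420) decides pass/fail
    rounds = 0
    while not result.passed and rounds < max_rounds:
        brief = gather_context(result)  # agentic: Codex context phase
        instructions = synthesize(result, brief)  # agentic: change-instruction synthesis
        apply_change(instructions)  # known primitive: internal change()
        result = check()  # full re-check with the same authority
        rounds += 1
    return result, rounds
```

Keeping the loop free of LLM calls is what makes it deterministic: the same sequence of check results always yields the same sequence of decisions.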

---

## Entry points (v1)

Repair runs only when checkup **failed** and `prompt_repair != off`.

1. **Workflow hook** — after automatic gate failure in `pdd generate` / `pdd change`
2. **Manual** — `pdd checkup <prompt-target> --prompt-repair best-effort|strict`
3. **Issue/PR** — optional; scoped to **changed / failed** prompt files only (not entire `prompts/`)

---

## Context gathering (v1)

**Always include:**

- prompt file under repair
- structured findings from `pdd.prompt_source_set_report.v1` (`message`, `recommended_action`, `fix_command`, `evidence`, `finding_id`)
- related stories / tests when present
- repo snippets / generated code context when available (for `pdd change` `input_code`)
- issue / PR text when in agentic workflow

**Codex agent context phase (v1, included):**

- Run via `run_agentic_task` (Codex or configured agentic provider)
- May use **web search** and other Codex tools to resolve ambiguous terms, external APIs, or domain definitions
- **Non-interactive** — no questions to the user; best-effort inference only
- Bounded by `max_prompt_repair_seconds` and repair loop guards
- Output: a concise internal **repair brief** fed into change-instruction synthesis (not shown as a user Q&A)

Does not ask the user questions (interactive path is #1423).
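A hypothetical parser for the structured findings might look like this. The `Finding` dataclass, `parse_findings`, and the assumption that the report is a dict with a top-level `schema` key and a `findings` list are illustrative only; the real `pdd.prompt_source_set_report.v1` layout and the `SourceSetFinding` fields may differ.

```python
from dataclasses import dataclass
from typing import Any

REPORT_SCHEMA = "pdd.prompt_source_set_report.v1"


@dataclass(frozen=True)
class Finding:
    """Hypothetical in-memory view of one checkup finding."""
    finding_id: str
    message: str
    recommended_action: str = ""
    fix_command: str = ""
    source_check: str = ""
    evidence: tuple[str, ...] = ()


def parse_findings(report: dict[str, Any]) -> list[Finding]:
    """Extract structured findings from a checkup report dict.

    Assumes (hypothetically) a "schema" key and a "findings" list whose
    items carry the field names listed above; unknown keys are ignored.
    """
    if report.get("schema") != REPORT_SCHEMA:
        raise ValueError(f"unexpected report schema: {report.get('schema')!r}")
    return [
        Finding(
            finding_id=raw["finding_id"],
            message=raw.get("message", ""),
            recommended_action=raw.get("recommended_action", ""),
            fix_command=raw.get("fix_command", ""),
            source_check=raw.get("source_check", ""),
            evidence=tuple(raw.get("evidence", [])),
        )
        for raw in report.get("findings", [])
    ]
```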

---

## Apply path: `pdd change` (not a new repair API)

**Product behavior:** `repair_prompt_from_findings(report, context)` runs these steps:

1. Build a `change_prompt` from:
   - structured checkup findings (`recommended_action`, `message`, `fix_command`, `source_check`)
   - Codex repair brief (when produced)
   - explicit constraints: bounded edits only, preserve unrelated prompt content, no generated code edits
2. Resolve paired `input_prompt` + `input_code` via existing path rules (`construct_paths` / devunit layout)
3. Call internal `change()` (same engine as `pdd change` CLI) — **not** a bespoke repair LLM template or direct patch JSON applicator
4. Write the modified prompt in place (atomic write + backup)

**Rationale:**

- Reuses the established prompt-edit pipeline and delimiter contract (`<<<MODIFIED_PROMPT>>>`)
- Checkup findings already carry `recommended_action` and `fix_command` suitable as change instructions
- Avoids a parallel repair-specific LLM API/integration path

**Note:** `change()` uses the existing PDD LLM stack (`llm_invoke`). This issue does **not** add a new HTTP/API surface for repair; it orchestrates existing primitives.

---

## Repair scope (v1)

Target outcomes (via change instructions, not ad-hoc patch types):

- add missing vocabulary definitions
- normalize contract rule IDs
- add missing coverage lines
- add TODO-style story/test action recommendations
- add waiver placeholders only when policy allows
- clarify vague terms (structured edits, not full rewrites)
- add `<contract_rules>` skeleton when requirements exist but no contracts

No arbitrary full-file rewrites. No generated **code** edits.

---

## Modes

| Mode | Check | Context (Codex) | Apply (`change`) | Re-check |
|------|-------|-----------------|------------------|----------|
| `off` | yes | no | no | no |
| `best-effort` | yes | yes | yes | yes; continue on remaining issues |
| `strict` | yes | yes | yes | yes; block if still failing |
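The trigger rule from Entry points and this table reduce to two small decisions. In this hypothetical sketch, `should_attempt_repair`, `final_outcome`, and `Outcome` are illustrative names, and a `warn` re-check result is assumed to count as passing, following the orchestration diagram.

```python
from enum import Enum

MODES = ("off", "best-effort", "strict")


class Outcome(Enum):
    CONTINUE = "continue"
    BLOCK = "block"


def should_attempt_repair(checkup_passed: bool, mode: str) -> bool:
    """Repair runs only when checkup failed and prompt_repair != off."""
    if mode not in MODES:
        raise ValueError(f"unknown prompt_repair mode: {mode!r}")
    return not checkup_passed and mode != "off"


def final_outcome(recheck_passed: bool, mode: str) -> Outcome:
    """Map the post-repair re-check (pass or warn = passed) to a decision."""
    if mode not in ("best-effort", "strict"):
        raise ValueError(f"no repair re-check runs in mode {mode!r}")
    if recheck_passed or mode == "best-effort":
        return Outcome.CONTINUE  # best-effort continues and reports leftovers
    return Outcome.BLOCK  # strict: still failing after repair
```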

---

## Bounds and loop guards

```toml
[tool.pdd.checkup]
prompt_repair = "best-effort"    # off | best-effort | strict
max_prompt_repair_rounds = 1
max_prompt_token_growth = 1000
max_prompt_repair_seconds = 120
max_repeated_failure_rounds = 1  # same finding_id repeats
min_repair_confidence = 0.6      # skip auto-apply below threshold (optional v1)
```

**Stop conditions:**

- max rounds reached
- wall-clock timeout (includes Codex context phase)
- token budget exceeded after apply
- same failure repeats (`max_repeated_failure_rounds`)
- confidence below threshold (when implemented)
- Codex / change failure in **strict** mode → block (do not silently continue)
- missing `input_code` or unresolvable devunit pairing → report and stop (best-effort: skip file)
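The numeric stop conditions could be tracked per round roughly as follows. `RepairGuards` is a hypothetical helper; it interprets `max_repeated_failure_rounds` as the number of repair rounds in which at least one `finding_id` survives before the loop stops. Confidence thresholds and the strict-mode failure rules are not modeled.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RepairGuards:
    """Hypothetical guard state mirroring the [tool.pdd.checkup] bounds."""
    max_rounds: int = 1
    max_token_growth: int = 1000
    max_seconds: float = 120.0
    max_repeated_failure_rounds: int = 1
    started_at: float = field(default_factory=time.monotonic)
    repeated_rounds: int = 0

    def stop_reason(
        self,
        rounds_done: int,
        token_growth: int,
        before_ids: frozenset,
        after_ids: frozenset,
    ) -> Optional[str]:
        """Return why the loop must stop after a round, or None to continue."""
        if before_ids & after_ids:  # a finding_id survived this repair round
            self.repeated_rounds += 1
        if self.repeated_rounds >= self.max_repeated_failure_rounds:
            return "same failure repeats"
        if token_growth > self.max_token_growth:
            return "token budget exceeded"
        # Wall clock is measured from guard creation, so it covers the Codex phase.
        if time.monotonic() - self.started_at > self.max_seconds:
            return "wall-clock timeout"
        if rounds_done >= self.max_rounds:
            return "max rounds reached"
        return None
```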

---

## CLI

Primary (prompt targets):

```bash
pdd checkup prompts/foo_python.prompt --prompt-repair best-effort
pdd checkup prompts/foo_python.prompt --prompt-repair strict
```

Workflow forwarding (same orchestrator):

```bash
pdd generate ... --prompt-repair best-effort
pdd change ... --prompt-repair best-effort
pdd generate ... --prompt-repair off
```

Flags override `pyproject.toml` / `.pddrc` `[tool.pdd.checkup]` defaults.

---

## Token delta reporting

Measure or report:

- prompt tokens before / after repair
- added tokens
- cache-friendly preamble estimate when available
- warning if growth exceeds budget
- whether Codex web search was used (audit flag)

Example:

```text
Prompt token delta: +312 tokens
Note: 240 tokens are reusable contract preamble.
Context: codex web search used for 1 term lookup.
```
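A token-delta report might be assembled like this. `approx_tokens` is a deliberately crude whitespace counter standing in for the real tokenizer (an assumption; a model-specific counter should be injected via `count`), `token_delta_report` is a hypothetical name, and the cache-friendly preamble estimate is not modeled.

```python
from typing import Any, Callable


def approx_tokens(text: str) -> int:
    """Crude stand-in tokenizer: counts whitespace-separated words."""
    return len(text.split())


def token_delta_report(
    before: str,
    after: str,
    budget: int = 1000,
    count: Callable[[str], int] = approx_tokens,
    web_search_used: bool = False,
) -> dict[str, Any]:
    """Compare prompt sizes before/after repair and flag budget overruns."""
    tokens_before, tokens_after = count(before), count(after)
    delta = tokens_after - tokens_before
    lines = [f"Prompt token delta: {delta:+d} tokens"]
    if delta > budget:
        lines.append(f"Warning: growth exceeds max_prompt_token_growth ({budget}).")
    if web_search_used:
        lines.append("Context: Codex web search used.")
    return {
        "tokens_before": tokens_before,
        "tokens_after": tokens_after,
        "added": delta,
        "over_budget": delta > budget,
        "codex_web_search_used": web_search_used,  # audit flag
        "summary": "\n".join(lines),
    }
```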

---

## Safety and observability

Separate **analysis** from **apply** in output:

- what failed (structured findings)
- what context was used (repo, stories, Codex brief, web search yes/no)
- change instructions summary (not necessarily a full prompt dump)
- what was changed
- whether full checkup passed after repair

**Guards:**

- Deterministic checkup (#1420) remains authority — re-check uses full source-set checkup, not lint alone
- `change()` output is validated (delimiter extraction, include sanitization) before writing
- No generated code edits
- Backup before in-place write
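The backup-then-atomic-write guard can be sketched with the standard library. `write_prompt_atomically` and its `.bak` naming are hypothetical, and the empty-text check is only a placeholder for the real validation of `change()` output.

```python
import contextlib
import os
import shutil
import tempfile
from pathlib import Path


def write_prompt_atomically(path: Path, new_text: str) -> Path:
    """Copy `path` to a backup, then atomically replace it with `new_text`.

    Returns the backup path. The ".bak" suffix is a hypothetical convention.
    """
    if not new_text.strip():
        # Placeholder for real output validation: never overwrite with nothing.
        raise ValueError("refusing to write an empty prompt")
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)  # backup before the in-place write
    # Temp file in the same directory so os.replace stays on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(new_text)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, path)  # atomic rename on POSIX: old or new, never partial
    except BaseException:
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp)
        raise
    return backup
```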

**Audit trail** — each attempt logs under `.pdd/evidence/prompt_repair/<slug>-<timestamp>.json`:

- prompt target
- checks run
- findings before / after
- context sources (incl. `codex_web_search_used`)
- change instructions hash / summary
- token delta
- final outcome
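Writing the audit record could look like the following. `write_audit_record` and `instructions_digest` are hypothetical helpers, and the record keys a caller passes in are illustrative rather than a fixed schema.

```python
import hashlib
import json
import re
from datetime import datetime, timezone
from pathlib import Path


def instructions_digest(change_prompt: str) -> str:
    """Short SHA-256 digest so the log need not store full change instructions."""
    return hashlib.sha256(change_prompt.encode("utf-8")).hexdigest()[:16]


def write_audit_record(root: Path, prompt_target: str, record: dict) -> Path:
    """Write one attempt to <root>/.pdd/evidence/prompt_repair/<slug>-<timestamp>.json."""
    # Slug from the prompt file name; unsafe characters collapse to "-".
    slug = re.sub(r"[^A-Za-z0-9_.-]+", "-", Path(prompt_target).stem).strip("-") or "prompt"
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    out_dir = root / ".pdd" / "evidence" / "prompt_repair"
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / f"{slug}-{stamp}.json"
    payload = {"prompt_target": prompt_target, **record}
    path.write_text(json.dumps(payload, indent=2, sort_keys=True), encoding="utf-8")
    return path
```

A caller would pass the checks run, findings before and after, context sources (including `codex_web_search_used`), `instructions_digest(change_prompt)`, the token delta, and the final outcome in `record`.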

---

## Implementation feasibility (repo baseline)

| Piece | Status in codebase |
|-------|-------------------|
| Structured findings + `recommended_action` | ✅ `checkup_prompt_main.py` / `SourceSetFinding` |
| Full re-check | ✅ `run_checkup_prompt_paths` (#1420) |
| Workflow gate hook | ✅ `maybe_run_workflow_prompt_gate` |
| Codex agent + tools | ✅ `run_agentic_task` in `agentic_common.py` |
| `pdd change` apply | ✅ `change()` / `change_main()` |
| Devunit path + code pairing | ✅ `construct_paths` (may need prompt-only fallback) |

**Gaps to implement in this issue:**

- `repair_prompt_from_findings()` orchestrator wiring gate → context → change → recheck
- Change-instruction builder from findings (+ optional Codex brief)
- In-place apply with backup (today `change_main` often writes to an output path rather than in place)
- Refactor away from any parallel JSON-patch repair path, if one exists on the branch

---

## Non-goals

- No separate `pdd checkup coach` command
- No interactive questions (#1423)
- No CI interactive mode
- No unbounded agentic repo search
- No new repair-specific LLM HTTP/API endpoint
- No repair of unrelated prompts (no whole-`prompts/` sweep)
- No generated code edits

---

## Acceptance criteria

- [ ] Repair triggers only after **failed** #1420 checkup (workflow + manual paths)
- [ ] Re-check uses **full** prompt source-set checkup (`run_checkup_prompt_paths`), not lint alone
- [ ] Consumes structured findings from `pdd.prompt_source_set_report.v1`
- [ ] Codex agent used for non-interactive context gathering; web search allowed in v1 when Codex uses it
- [ ] Apply uses internal `change()` / `pdd change` semantics (not bespoke patch JSON writer)
- [ ] Change instructions derived from checkup findings (`recommended_action`, `message`, `fix_command`)
- [ ] Stops at `max_prompt_repair_rounds` and `max_prompt_repair_seconds`
- [ ] Loop guards: repeated-failure detection; token budget enforcement
- [ ] Token delta recorded or reported
- [ ] Strict blocks on unresolved failures **and** on Codex/change failure
- [ ] Repair failure does not crash workflow in `best-effort` mode
- [ ] Audit record includes context sources and whether web search was used
- [ ] Tests: success, unrepaired failure, max rounds, token warning, disabled repair
- [ ] Integration test: gate fail → Codex context (mocked) → change apply (mocked) → full recheck
