Summary
When prompt checkup fails (#1420), run a bounded, non-interactive repair orchestrator: gather context (including Codex web search when useful), synthesize a repair brief from structured checkup findings, apply via pdd change, re-run full checkup, then continue or report remaining issues.
This issue delivers Phase 2 (Non-interactive repair) only.
Best-effort autonomous fixing with no user questions. Interactive clarification is #1423.
Product framing: Debug the prompt before debugging the code.
Depends on: #1420
Part of epic #833.
Design alignment
Part of the Unified PDD Checkup for Prompts design (epic #833).
Core decisions:
- Single entrypoint: pdd checkup <target> (repair via flags, not new commands)
- Deterministic check orchestration; agentic context gathering and repair planning
- Apply through existing pdd change / change() pipeline (not a new repair-specific LLM API path or JSON patch writer)
- Codex agent (run_agentic_task) for non-interactive context gathering, including web search when needed
- No user interaction in this issue
This issue covers: Phase 2 — non-interactive repair orchestrator only.
Orchestration model
Deterministic control flow:
checkup (deterministic, #1420)
↓
pass → continue
fail → collect structured findings from report
↓
context gathering (Codex agent: repo + stories/tests + issue/PR + web search when useful)
↓
synthesize change instructions from findings + context brief
↓
apply via internal pdd change (change()) — in-place prompt write
↓
re-run full checkup (same authority as #1420)
↓
pass/warn → continue | strict → block | guards → stop + report
Agentic (non-deterministic): Codex/context phase and change-instruction synthesis.
Deterministic: check → decide → apply via known primitive → re-check → stop.
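A minimal sketch of the control flow above, assuming the three phases are injected as callables. The names run_repair_loop and CheckupResult and the outcome strings are hypothetical and only illustrate the check, repair, re-check ordering; they are not existing pdd APIs.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CheckupResult:
    """Hypothetical stand-in for the structured checkup report."""
    passed: bool
    finding_ids: List[str] = field(default_factory=list)


def run_repair_loop(
    check: Callable[[], CheckupResult],
    gather_context: Callable[[CheckupResult], str],
    apply_change: Callable[[CheckupResult, str], None],
    mode: str = "best-effort",
    max_rounds: int = 1,
) -> str:
    """Run check -> context -> apply -> re-check for at most max_rounds rounds."""
    result = check()                    # deterministic checkup
    if result.passed:
        return "passed"
    if mode == "off":
        return "skipped"                # repair disabled: report only
    for _ in range(max_rounds):
        brief = gather_context(result)  # agentic, non-deterministic phase
        apply_change(result, brief)     # apply via the known primitive
        result = check()                # full re-check, same authority
        if result.passed:
            return "passed"
    # strict blocks; best-effort continues and reports remaining issues
    return "blocked" if mode == "strict" else "remaining-issues"
```

Only the two injected callables (gather_context and apply_change) carry non-deterministic behavior in this sketch; the loop itself stays deterministic and bounded.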
Entry points (v1)
Repair runs only when checkup failed and prompt_repair != off.
- Workflow hook: after automatic gate failure in pdd generate / pdd change
- Manual: pdd checkup <prompt-target> --prompt-repair best-effort|strict
- Issue/PR: optional; scoped to changed / failed prompt files only (not entire prompts/)
Context gathering (v1)
Always include:
- prompt file under repair
- structured findings from pdd.prompt_source_set_report.v1 (message, recommended_action, fix_command, evidence, finding_id)
- related stories / tests when present
- repo snippets / generated code context when available (for pdd change input_code)
- issue / PR text when in agentic workflow
Codex agent context phase (v1, included):
- Run via run_agentic_task (Codex or configured agentic provider)
- May use web search and other Codex tools to resolve ambiguous terms, external APIs, or domain definitions
- Non-interactive: no questions to the user; best-effort inference only
- Bounded by max_prompt_repair_seconds and repair loop guards
- Output: a concise internal repair brief fed into change-instruction synthesis (not shown as a user Q&A)
Does not ask the user questions (interactive path is #1423).
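One way to carry the gathered context between phases is a small record type. The sketch below is illustrative only: RepairContext and its field names are hypothetical, chosen to mirror the list above, and do not describe an existing pdd structure.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RepairContext:
    """Hypothetical bundle of everything the repair step may consult."""
    prompt_path: str                      # prompt file under repair
    findings: List[dict]                  # structured checkup findings
    stories_and_tests: List[str] = field(default_factory=list)
    repo_snippets: List[str] = field(default_factory=list)
    issue_text: Optional[str] = None      # only in agentic workflows
    codex_brief: Optional[str] = None     # internal repair brief, if produced
    codex_web_search_used: bool = False   # audit flag

    def sources(self) -> List[str]:
        """Names of the context sources that are actually populated."""
        names = ["prompt", "findings"]    # always included
        if self.stories_and_tests:
            names.append("stories/tests")
        if self.repo_snippets:
            names.append("repo")
        if self.issue_text:
            names.append("issue/PR")
        if self.codex_brief:
            names.append("codex brief")
        return names
```

The sources() list is the kind of summary that could feed the "what context was used" line in the output and the audit record.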
Apply path: pdd change (not a new repair API)
Product behavior: repair_prompt_from_findings(report, context).
- Build a change_prompt from:
  - structured checkup findings (recommended_action, message, fix_command, source_check)
  - Codex repair brief (when produced)
  - explicit constraints: bounded edits only, preserve unrelated prompt content, no generated code edits
- Resolve paired input_prompt + input_code via existing path rules (construct_paths / devunit layout)
- Call internal change() (same engine as pdd change CLI), not a bespoke repair LLM template or direct patch JSON applicator
- Write the modified prompt in place (atomic write + backup)
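The change-instruction builder described above could look roughly like this. It is a sketch under assumptions: build_change_prompt is a hypothetical helper, findings are treated as plain mappings with the field names listed above, and the exact wording of the instructions is invented for illustration.

```python
from typing import Mapping, Optional, Sequence

# Explicit constraints from the list above, phrased as instructions.
CONSTRAINTS = (
    "Make bounded edits only.",
    "Preserve unrelated prompt content.",
    "Do not edit generated code.",
)


def build_change_prompt(
    findings: Sequence[Mapping[str, str]],
    brief: Optional[str] = None,
) -> str:
    """Turn structured findings (plus an optional brief) into change text."""
    lines = ["Repair the prompt so the following checkup findings are resolved:"]
    for i, finding in enumerate(findings, start=1):
        check = finding.get("source_check", "unknown")
        lines.append(f"{i}. [{check}] {finding.get('message', '')}")
        if finding.get("recommended_action"):
            lines.append(f"   Recommended action: {finding['recommended_action']}")
        if finding.get("fix_command"):
            lines.append(f"   Suggested command: {finding['fix_command']}")
    if brief:
        lines += ["", "Context brief:", brief.strip()]
    lines += ["", "Constraints:"] + [f"- {c}" for c in CONSTRAINTS]
    return "\n".join(lines)
```

The resulting string would be passed to change() as the change prompt, so the repair path adds no new template or patch format of its own.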
Rationale:
- Reuses the established prompt-edit pipeline and delimiter contract (<<<MODIFIED_PROMPT>>>)
- Checkup findings already carry recommended_action and fix_command suitable as change instructions
- Avoids a parallel repair-specific LLM API/integration path
Note: change() uses the existing PDD LLM stack (llm_invoke). This issue does not add a new HTTP/API surface for repair; it orchestrates existing primitives.
Repair scope (v1)
Target outcomes (via change instructions, not ad-hoc patch types):
- add missing vocabulary definitions
- normalize contract rule IDs
- add missing coverage lines
- TODO-style story/test action recommendations
- waiver placeholders only when policy allows
- clarify vague terms (structured edits, not full rewrite)
- add <contract_rules> skeleton when requirements exist but no contracts
No arbitrary full-file rewrites. No generated code edits.
Modes
| Mode | Check | Context (Codex) | Apply (change) | Re-check |
| --- | --- | --- | --- | --- |
| off | yes | no | no | no |
| best-effort | yes | yes | yes | yes; continue on remaining issues |
| strict | yes | yes | yes | yes; block if still failing |
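The mode table can be encoded as data so the orchestrator never branches on mode strings in several places. This is a sketch; ModePlan, MODE_PLANS, and plan_for are hypothetical names, not existing pdd identifiers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModePlan:
    """Which phases run for a given prompt_repair mode."""
    check: bool
    context: bool
    apply: bool
    recheck: bool
    block_on_failure: bool  # True only when a failing re-check must block


# One row per mode, mirroring the table above.
MODE_PLANS = {
    "off": ModePlan(True, False, False, False, False),
    "best-effort": ModePlan(True, True, True, True, False),
    "strict": ModePlan(True, True, True, True, True),
}


def plan_for(mode: str) -> ModePlan:
    """Look up the plan for a mode, rejecting unknown values early."""
    try:
        return MODE_PLANS[mode]
    except KeyError:
        raise ValueError(f"unknown prompt_repair mode: {mode!r}") from None
```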
Bounds and loop guards
[tool.pdd.checkup]
prompt_repair = "best-effort"      # off | best-effort | strict
max_prompt_repair_rounds = 1
max_prompt_token_growth = 1000
max_prompt_repair_seconds = 120
max_repeated_failure_rounds = 1    # same finding_id repeats
min_repair_confidence = 0.6        # skip auto-apply below threshold (optional v1)
Stop conditions:
- max rounds reached
- wall-clock timeout (includes Codex context phase)
- token budget exceeded after apply
- same failure repeats (max_repeated_failure_rounds)
- confidence below threshold (when implemented)
- Codex / change failure in strict mode → block (do not silently continue)
- missing input_code or unresolvable devunit pairing → report and stop (best-effort: skip file)
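The numeric stop conditions can be evaluated in one place before each round. The sketch below assumes the defaults from the config block; Guards, LoopState, and stop_reason are hypothetical names, and treating max_repeated_failure_rounds as an inclusive threshold is an assumption about the intended semantics.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Guards:
    """Limits, with defaults taken from the [tool.pdd.checkup] example."""
    max_rounds: int = 1
    max_token_growth: int = 1000
    max_seconds: float = 120.0
    max_repeated_failure_rounds: int = 1


@dataclass
class LoopState:
    """Running totals the orchestrator would maintain (hypothetical)."""
    rounds_done: int = 0
    elapsed_seconds: float = 0.0   # includes the Codex context phase
    token_growth: int = 0          # prompt tokens added so far
    repeated_failure_rounds: int = 0


def stop_reason(state: LoopState, guards: Guards) -> Optional[str]:
    """Return why the loop must stop, or None if another round is allowed."""
    if state.rounds_done >= guards.max_rounds:
        return "max rounds reached"
    if state.elapsed_seconds >= guards.max_seconds:
        return "wall-clock timeout"
    if state.token_growth > guards.max_token_growth:
        return "token budget exceeded"
    if state.repeated_failure_rounds >= guards.max_repeated_failure_rounds:
        return "same failure repeats"
    return None
```

Returning a reason string rather than a boolean lets the same value go straight into the report and the audit record.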
CLI
Primary (prompt targets):
pdd checkup prompts/foo_python.prompt --prompt-repair best-effort
pdd checkup prompts/foo_python.prompt --prompt-repair strict
Workflow forwarding (same orchestrator):
pdd generate ... --prompt-repair best-effort
pdd change ... --prompt-repair best-effort
pdd generate ... --prompt-repair off
Flags override pyproject.toml / .pddrc [tool.pdd.checkup] defaults.
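The precedence rule (flag over config defaults) could be resolved as in the sketch below. resolve_prompt_repair is a hypothetical helper, config stands for the already-parsed [tool.pdd.checkup] table, and the fallback of "off" when neither source sets a value is an assumption, not a documented default.

```python
from typing import Mapping, Optional

VALID_MODES = ("off", "best-effort", "strict")


def resolve_prompt_repair(
    cli_value: Optional[str],
    config: Mapping[str, object],
    default: str = "off",  # assumed fallback; the real default is not specified here
) -> str:
    """Pick the effective mode: CLI flag first, then config, then default."""
    mode = cli_value if cli_value is not None else config.get("prompt_repair", default)
    if mode not in VALID_MODES:
        raise ValueError(f"invalid prompt_repair value: {mode!r}")
    return str(mode)
```

For example, resolve_prompt_repair("strict", {"prompt_repair": "best-effort"}) yields "strict", while passing None for the flag falls back to the configured "best-effort".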
Token delta reporting
Measure or report:
- prompt tokens before / after repair
- added tokens
- cache-friendly preamble estimate when available
- warning if growth exceeds budget
- whether Codex web search was used (audit flag)
Example:
Prompt token delta: +312 tokens
Note: 240 tokens are reusable contract preamble.
Context: codex web search used for 1 term lookup.
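A sketch of how the delta line and budget warning might be produced. Whitespace splitting is only a stand-in tokenizer for illustration; a real implementation would use whatever token counter the PDD LLM stack already provides. token_delta_report and count_tokens are hypothetical names.

```python
from typing import Callable


def count_tokens(text: str) -> int:
    """Stand-in tokenizer: counts whitespace-separated words, not model tokens."""
    return len(text.split())


def token_delta_report(
    before: str,
    after: str,
    budget: int = 1000,  # mirrors max_prompt_token_growth
    count: Callable[[str], int] = count_tokens,
) -> str:
    """Format the token delta and warn when growth exceeds the budget."""
    delta = count(after) - count(before)
    lines = [f"Prompt token delta: {delta:+d} tokens"]
    if delta > budget:
        lines.append(f"Warning: growth exceeds budget of {budget} tokens.")
    return "\n".join(lines)
```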
Safety and observability
Separate analysis from apply in output:
- what failed (structured findings)
- what context was used (repo, stories, Codex brief, web search yes/no)
- change instructions summary (not necessarily full prompt dump)
- what was changed
- whether full checkup passed after repair
Guards:
Audit trail — each attempt logs under .pdd/evidence/prompt_repair/<slug>-<timestamp>.json:
- prompt target
- checks run
- findings before / after
- context sources (incl. codex_web_search_used)
- change instructions hash / summary
- token delta
- final outcome
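Writing one audit record per attempt could be as simple as the sketch below. write_audit_record is a hypothetical helper; the directory layout follows the path above, while the timestamp format and the JSON key names used by callers are assumptions.

```python
import json
import time
from pathlib import Path
from typing import Mapping


def write_audit_record(root: str, slug: str, record: Mapping[str, object]) -> Path:
    """Write <root>/.pdd/evidence/prompt_repair/<slug>-<timestamp>.json."""
    out_dir = Path(root) / ".pdd" / "evidence" / "prompt_repair"
    out_dir.mkdir(parents=True, exist_ok=True)
    # UTC timestamp in a filename-safe form (format is an assumption).
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime())
    path = out_dir / f"{slug}-{stamp}.json"
    path.write_text(json.dumps(dict(record), indent=2, sort_keys=True), encoding="utf-8")
    return path
```

A caller would pass the fields listed above, for example the prompt target, findings before and after, codex_web_search_used, token delta, and final outcome.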
Implementation feasibility (repo baseline)
| Piece | Status in codebase |
| --- | --- |
| Structured findings + recommended_action | ✅ checkup_prompt_main.py / SourceSetFinding |
| Full re-check | ✅ run_checkup_prompt_paths (#1420) |
| Workflow gate hook | ✅ maybe_run_workflow_prompt_gate |
| Codex agent + tools | ✅ run_agentic_task in agentic_common.py |
| pdd change apply | ✅ change() / change_main() |
| Devunit path + code pairing | ✅ construct_paths (may need prompt-only fallback) |
Gaps to implement in this issue:
- repair_prompt_from_findings() orchestrator wiring gate → context → change → recheck
- Change-instruction builder from findings (+ optional Codex brief)
- In-place apply with backup (today change_main often writes to output path)
- Refactor away from parallel JSON-patch repair path if present on branch
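For the in-place apply gap, a backup followed by an atomic replace is one common approach. The sketch below uses only the Python standard library; write_in_place and the .bak suffix are hypothetical choices, not existing pdd behavior.

```python
import os
import shutil
import tempfile
from pathlib import Path


def write_in_place(path: str, new_text: str) -> Path:
    """Back up the prompt, then atomically replace it; return the backup path."""
    target = Path(path)
    backup = target.with_name(target.name + ".bak")
    shutil.copy2(target, backup)  # keep the pre-repair prompt
    # Write to a temp file in the same directory so os.replace stays atomic.
    fd, tmp = tempfile.mkstemp(
        dir=str(target.parent), prefix=target.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            fh.write(new_text)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp, target)   # atomic swap on the same filesystem
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)        # do not leave partial temp files behind
        raise
    return backup
```

If the write fails part-way, the original prompt is untouched and the backup still holds the pre-repair content.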
Non-goals
- … pdd checkup coach command
- … prompts/ sweep)
Acceptance criteria
- … run_checkup_prompt_paths), not lint alone
- … pdd.prompt_source_set_report.v1
- … change() / pdd change semantics (not bespoke patch JSON writer)
- … recommended_action, message, fix_command)
- … max_prompt_repair_rounds and max_prompt_repair_seconds
- … best-effort mode