feat(checkup): non-interactive prompt repair orchestrator (check → repair → re-check) #1422

@DianaTao

Summary

When prompt checkup fails (#1420), run a bounded, non-interactive repair orchestrator: gather context (including Codex web search when useful), synthesize a repair brief from structured checkup findings, apply via pdd change, re-run full checkup, then continue or report remaining issues.

This issue delivers Phase 2 (Non-interactive repair) only.

Best-effort autonomous fixing with no user questions. Interactive clarification is #1423.

Product framing: Debug the prompt before debugging the code.

Depends on: #1420

Part of epic #833.


Design alignment

Part of the Unified PDD Checkup for Prompts design (epic #833).

Core decisions:

  • Single entrypoint: pdd checkup <target> (repair via flags, not new commands)
  • Deterministic check orchestration; agentic context gathering and repair planning
  • Apply through existing pdd change / change() pipeline (not a new repair-specific LLM API path or JSON patch writer)
  • Codex agent (run_agentic_task) for non-interactive context gathering, including web search when needed
  • No user interaction in this issue

This issue covers: Phase 2 — non-interactive repair orchestrator only.


Orchestration model

Deterministic control flow:

checkup (deterministic, #1420)
     ↓
pass → continue
fail → collect structured findings from report
     ↓
context gathering (Codex agent: repo + stories/tests + issue/PR + web search when useful)
     ↓
synthesize change instructions from findings + context brief
     ↓
apply via internal pdd change (change()) — in-place prompt write
     ↓
re-run full checkup (same authority as #1420)
     ↓
pass/warn → continue | strict → block | guards → stop + report

Agentic (non-deterministic): Codex/context phase and change-instruction synthesis.

Deterministic: check → decide → apply via known primitive → re-check → stop.
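The control flow above can be sketched as a small loop. This is a minimal sketch under stated assumptions, not the PDD implementation: `run_check`, `gather_context`, `build_instructions`, and `apply_change` are hypothetical injected callables standing in for the #1420 checkup, the Codex context phase, instruction synthesis, and change(). `CheckResult` and the returned outcome strings are also illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class CheckResult:
    # Assumed shape; the real report is pdd.prompt_source_set_report.v1.
    passed: bool
    finding_ids: List[str] = field(default_factory=list)


def orchestrate_repair(
    prompt_path: str,
    mode: str,  # "off" | "best-effort" | "strict"
    run_check: Callable[[str], CheckResult],
    gather_context: Callable[[str, CheckResult], str],
    build_instructions: Callable[[CheckResult, str], str],
    apply_change: Callable[[str, str], None],
    max_rounds: int = 1,
) -> str:
    """Run check -> repair -> re-check and return an outcome label."""
    result = run_check(prompt_path)
    if result.passed:
        return "pass"
    if mode == "off":
        # Repair disabled; what the gate does next is left to #1420.
        return "unrepaired"
    for _ in range(max_rounds):
        brief = gather_context(prompt_path, result)       # agentic step
        instructions = build_instructions(result, brief)  # agentic step
        apply_change(prompt_path, instructions)           # known primitive
        result = run_check(prompt_path)                   # full re-check
        if result.passed:
            return "pass"
    # Rounds exhausted with findings remaining.
    return "block" if mode == "strict" else "continue_with_issues"
```

Passing the agentic steps in as callables keeps the loop itself deterministic and makes it straightforward to mock the Codex and change phases in tests.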


Entry points (v1)

Repair runs only when checkup failed and prompt_repair != off.

  1. Workflow hook: after automatic gate failure in pdd generate / pdd change
  2. Manual: pdd checkup <prompt-target> --prompt-repair best-effort|strict
  3. Issue/PR: optional; scoped to changed / failed prompt files only (not entire prompts/)

Context gathering (v1)

Always include:

  • prompt file under repair
  • structured findings from pdd.prompt_source_set_report.v1 (message, recommended_action, fix_command, evidence, finding_id)
  • related stories / tests when present
  • repo snippets / generated code context when available (for pdd change input_code)
  • issue / PR text when in agentic workflow

Codex agent context phase (v1, included):

  • Run via run_agentic_task (Codex or configured agentic provider)
  • May use web search and other Codex tools to resolve ambiguous terms, external APIs, or domain definitions
  • Non-interactive — no questions to the user; best-effort inference only
  • Bounded by max_prompt_repair_seconds and repair loop guards
  • Output: a concise internal repair brief fed into change-instruction synthesis (not shown as a user Q&A)

Does not ask the user questions (interactive path is #1423).
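One way to picture the context phase is as a single task description assembled from the inputs listed above and handed to the agent. The sketch below only builds that text. `RepairContext` and `render_context_task` are hypothetical names, the wording of the task is illustrative, and the call into run_agentic_task is deliberately left out because its signature is not given in this issue.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RepairContext:
    # Hypothetical container for the "always include" inputs.
    prompt_path: str
    prompt_text: str
    findings: List[dict]
    related_files: List[str] = field(default_factory=list)
    issue_text: Optional[str] = None


def render_context_task(ctx: RepairContext, max_seconds: int) -> str:
    """Build the non-interactive task text for the context-gathering agent."""
    lines = [
        f"Prepare a concise repair brief for {ctx.prompt_path}.",
        "Do not ask the user questions; infer on a best-effort basis.",
        f"Finish within {max_seconds} seconds.",
        "Web search is allowed for ambiguous terms, external APIs, "
        "or domain definitions.",
        "",
        "Findings:",
    ]
    for finding in ctx.findings:
        finding_id = finding.get("finding_id", "?")
        lines.append(f"- [{finding_id}] {finding.get('message', '')}")
    if ctx.related_files:
        lines += ["", "Related stories/tests:"]
        lines += [f"- {path}" for path in ctx.related_files]
    if ctx.issue_text:
        lines += ["", "Issue/PR text:", ctx.issue_text]
    lines += ["", "Prompt under repair:", ctx.prompt_text]
    return "\n".join(lines)
```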


Apply path: pdd change (not a new repair API)

Product behavior: repair_prompt_from_findings(report, context).

  1. Build a change_prompt from:
    • structured checkup findings (recommended_action, message, fix_command, source_check)
    • Codex repair brief (when produced)
    • explicit constraints: bounded edits only, preserve unrelated prompt content, no generated code edits
  2. Resolve paired input_prompt + input_code via existing path rules (construct_paths / devunit layout)
  3. Call internal change() (same engine as pdd change CLI) — not a bespoke repair LLM template or direct patch JSON applicator
  4. Write the modified prompt in place (atomic write + backup)
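Step 1 above can be sketched as a pure function from findings to instruction text. This is an illustration only: `build_change_prompt` is a hypothetical helper, findings are assumed to be plain mappings carrying the field names listed in step 1, and the exact wording sent to change() is not specified by this issue.

```python
from typing import Iterable, Mapping, Optional

# The explicit constraints named in step 1.
CONSTRAINTS = (
    "Make bounded edits only.",
    "Preserve unrelated prompt content.",
    "Do not edit generated code.",
)


def build_change_prompt(
    findings: Iterable[Mapping[str, str]],
    repair_brief: Optional[str] = None,
) -> str:
    """Turn structured checkup findings into change instructions."""
    parts = ["Repair the prompt so the following checkup findings are resolved:"]
    for finding in findings:
        check = finding.get("source_check", "unknown check")
        line = f"- ({check}) {finding.get('message', '')}"
        action = finding.get("recommended_action")
        if action:
            line += f" Recommended action: {action}"
        fix = finding.get("fix_command")
        if fix:
            line += f" Suggested command: {fix}"
        parts.append(line)
    if repair_brief:
        # Optional output of the Codex context phase.
        parts += ["", "Context brief:", repair_brief.strip()]
    parts += ["", "Constraints:"]
    parts += [f"- {constraint}" for constraint in CONSTRAINTS]
    return "\n".join(parts)
```

Keeping the builder free of I/O means the same findings always yield the same instructions, which also makes the instruction hash in the audit record reproducible.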

Rationale:

  • Reuses the established prompt-edit pipeline and delimiter contract (<<<MODIFIED_PROMPT>>>)
  • Checkup findings already carry recommended_action and fix_command suitable as change instructions
  • Avoids a parallel repair-specific LLM API/integration path

Note: change() uses the existing PDD LLM stack (llm_invoke). This issue does not add a new HTTP/API surface for repair; it orchestrates existing primitives.
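For step 4 of the apply path (atomic write + backup), a common pattern is to copy the original aside, write the new text to a temporary file in the same directory, then swap it into place with os.replace. The sketch below assumes that pattern. `write_in_place_with_backup` and the `.bak` suffix are hypothetical choices, not something this issue specifies.

```python
import os
import shutil
import tempfile
from pathlib import Path


def write_in_place_with_backup(path: Path, new_text: str) -> Path:
    """Replace the contents of `path` atomically and return the backup path."""
    backup = path.with_name(path.name + ".bak")  # hypothetical naming scheme
    shutil.copy2(path, backup)

    # The temp file lives in the same directory so os.replace stays on one
    # filesystem, which is what makes the final swap atomic.
    fd, tmp_name = tempfile.mkstemp(
        dir=str(path.parent), prefix=path.name, suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(new_text)
            tmp.flush()
            os.fsync(tmp.fileno())
        os.replace(tmp_name, path)
    except BaseException:
        # Leave the original untouched and clean up the partial temp file.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return backup
```

If the write fails partway, the prompt file still holds its previous contents, and the backup remains available for the audit trail or a manual revert.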


Repair scope (v1)

Target outcomes (via change instructions, not ad-hoc patch types):

  • add missing vocabulary definitions
  • normalize contract rule IDs
  • add missing coverage lines
  • TODO-style story/test action recommendations
  • waiver placeholders only when policy allows
  • clarify vague terms (structured edits, not full rewrite)
  • add <contract_rules> skeleton when requirements exist but no contracts

No arbitrary full-file rewrites. No generated code edits.


Modes

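| Mode        | Check | Context (Codex) | Apply (change) | Re-check                          |
|-------------|-------|-----------------|----------------|-----------------------------------|
| off         | yes   | no              | no             | no                                |
| best-effort | yes   | yes             | yes            | yes; continue on remaining issues |
| strict      | yes   | yes             | yes            | yes; block if still failing       |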

Bounds and loop guards

[tool.pdd.checkup]
prompt_repair = "best-effort"    # off | best-effort | strict
max_prompt_repair_rounds = 1
max_prompt_token_growth = 1000
max_prompt_repair_seconds = 120
max_repeated_failure_rounds = 1  # same finding_id repeats
min_repair_confidence = 0.6      # skip auto-apply below threshold (optional v1)

Stop conditions:

  • max rounds reached
  • wall-clock timeout (includes Codex context phase)
  • token budget exceeded after apply
  • same failure repeats (max_repeated_failure_rounds)
  • confidence below threshold (when implemented)
  • Codex / change failure in strict mode → block (do not silently continue)
  • missing input_code or unresolvable devunit pairing → report and stop (best-effort: skip file)
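The numeric stop conditions can be evaluated by one guard function called between rounds. The sketch below covers rounds, timeout, token growth, and repeated failures only. `RepairLimits`, `repeated_rounds`, and `stop_reason` are hypothetical names, and reading "same failure repeats" as "at least one finding_id survives from one round to the next" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Set


@dataclass(frozen=True)
class RepairLimits:
    # Field names mirror the [tool.pdd.checkup] keys.
    max_prompt_repair_rounds: int = 1
    max_prompt_token_growth: int = 1000
    max_prompt_repair_seconds: float = 120.0
    max_repeated_failure_rounds: int = 1


def repeated_rounds(history: Sequence[Set[str]]) -> int:
    """Count trailing rounds in which some finding_id survived the repair.

    history[0] holds the finding_ids of the initial check; each later entry
    holds the finding_ids reported by the re-check after a repair round.
    """
    count = 0
    pairs = zip(reversed(history[:-1]), reversed(history[1:]))
    for previous, current in pairs:
        if previous & current:
            count += 1
        else:
            break
    return count


def stop_reason(
    limits: RepairLimits,
    rounds_done: int,
    elapsed_seconds: float,
    token_growth: int,
    history: Sequence[Set[str]],
) -> Optional[str]:
    """Return the first stop condition that applies, or None to keep going."""
    if rounds_done >= limits.max_prompt_repair_rounds:
        return "max_rounds"
    if elapsed_seconds >= limits.max_prompt_repair_seconds:
        return "timeout"  # wall clock includes the Codex context phase
    if token_growth > limits.max_prompt_token_growth:
        return "token_budget"
    if repeated_rounds(history) >= limits.max_repeated_failure_rounds:
        return "repeated_failure"
    return None
```

Returning a reason string rather than a boolean lets the caller log why the loop stopped.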

CLI

Primary (prompt targets):

pdd checkup prompts/foo_python.prompt --prompt-repair best-effort
pdd checkup prompts/foo_python.prompt --prompt-repair strict

Workflow forwarding (same orchestrator):

pdd generate ... --prompt-repair best-effort
pdd change ... --prompt-repair best-effort
pdd generate ... --prompt-repair off

Flags override pyproject.toml / .pddrc [tool.pdd.checkup] defaults.
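The override rule can be illustrated with a small resolver. This is a sketch under assumptions: `resolve_prompt_repair` is a hypothetical helper, `config` stands for the already-parsed [tool.pdd.checkup] table, and the `"off"` fallback used when neither source sets a value is a placeholder, since this issue does not state a built-in default.

```python
from typing import Mapping, Optional

VALID_MODES = ("off", "best-effort", "strict")


def resolve_prompt_repair(
    cli_value: Optional[str],
    config: Mapping[str, object],
    fallback: str = "off",  # assumed placeholder, not a documented default
) -> str:
    """Pick the repair mode: CLI flag first, then config, then fallback."""
    if cli_value is not None:
        mode = cli_value
    else:
        mode = config.get("prompt_repair", fallback)
    if mode not in VALID_MODES:
        raise ValueError(
            f"prompt_repair must be one of {VALID_MODES}, got {mode!r}"
        )
    return str(mode)
```

For example, `resolve_prompt_repair("strict", {"prompt_repair": "best-effort"})` returns `"strict"`, because the flag wins over the configured default.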


Token delta reporting

Measure or report:

  • prompt tokens before / after repair
  • added tokens
  • cache-friendly preamble estimate when available
  • warning if growth exceeds budget
  • whether Codex web search was used (audit flag)

Example:

Prompt token delta: +312 tokens
Note: 240 tokens are reusable contract preamble.
Context: codex web search used for 1 term lookup.
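The report lines in the example could be produced by a formatter along these lines. It is a sketch only: `token_delta_report` is a hypothetical helper, token counts are assumed to be measured elsewhere, and the warning wording is illustrative.

```python
from typing import List, Optional


def token_delta_report(
    tokens_before: int,
    tokens_after: int,
    growth_budget: int,
    reusable_preamble_tokens: Optional[int] = None,
    web_search_lookups: int = 0,
) -> List[str]:
    """Format the token delta summary shown after a repair attempt."""
    delta = tokens_after - tokens_before
    lines = [f"Prompt token delta: {delta:+d} tokens"]
    if reusable_preamble_tokens is not None:
        lines.append(
            f"Note: {reusable_preamble_tokens} tokens are reusable contract preamble."
        )
    if web_search_lookups:
        noun = "lookup" if web_search_lookups == 1 else "lookups"
        lines.append(
            f"Context: codex web search used for {web_search_lookups} term {noun}."
        )
    if delta > growth_budget:
        # Growth beyond max_prompt_token_growth is surfaced as a warning.
        lines.append(
            f"Warning: growth of {delta} tokens exceeds budget of {growth_budget}."
        )
    return lines
```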

Safety and observability

Separate analysis from apply in output:

  • what failed (structured findings)
  • what context was used (repo, stories, Codex brief, web search yes/no)
  • change instructions summary (not necessarily full prompt dump)
  • what was changed
  • whether full checkup passed after repair

Guards:

Audit trail — each attempt logs under .pdd/evidence/prompt_repair/<slug>-<timestamp>.json:

  • prompt target
  • checks run
  • findings before / after
  • context sources (incl. codex_web_search_used)
  • change instructions hash / summary
  • token delta
  • final outcome
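Writing one audit record per attempt might look like the sketch below. The directory and the `<slug>-<timestamp>.json` naming follow the path given above. The timestamp format, the `write_audit_record` helper, and every JSON key other than `codex_web_search_used` are assumptions made for illustration.

```python
import hashlib
import json
import time
from pathlib import Path


def write_audit_record(
    root: Path, slug: str, record: dict, instructions: str
) -> Path:
    """Write one repair-attempt record under .pdd/evidence/prompt_repair/."""
    out_dir = root / ".pdd" / "evidence" / "prompt_repair"
    out_dir.mkdir(parents=True, exist_ok=True)

    # UTC timestamp in a filename-safe form (assumed format).
    stamp = time.strftime("%Y%m%dT%H%M%SZ", time.gmtime())

    payload = dict(record)
    # Store a hash of the change instructions instead of the full text.
    payload["change_instructions_sha256"] = hashlib.sha256(
        instructions.encode("utf-8")
    ).hexdigest()

    path = out_dir / f"{slug}-{stamp}.json"
    path.write_text(
        json.dumps(payload, indent=2, sort_keys=True), encoding="utf-8"
    )
    return path
```

Hashing the instructions keeps the record small while still letting two attempts be compared for identical change requests.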

Implementation feasibility (repo baseline)

| Piece                                    | Status in codebase                              |
|------------------------------------------|-------------------------------------------------|
| Structured findings + recommended_action | checkup_prompt_main.py / SourceSetFinding       |
| Full re-check                            | run_checkup_prompt_paths (#1420)                |
| Workflow gate hook                       | maybe_run_workflow_prompt_gate                  |
| Codex agent + tools                      | run_agentic_task in agentic_common.py           |
| pdd change apply                         | change() / change_main()                        |
| Devunit path + code pairing              | construct_paths (may need prompt-only fallback) |

Gaps to implement in this issue:

  • repair_prompt_from_findings() orchestrator wiring gate → context → change → recheck
  • Change-instruction builder from findings (+ optional Codex brief)
  • In-place apply with backup (today change_main often writes to output path)
  • Refactor away from parallel JSON-patch repair path if present on branch

Non-goals


Acceptance criteria

  • Repair triggers only after a failed #1420 checkup (workflow + manual paths)
  • Re-check uses full prompt source-set checkup (run_checkup_prompt_paths), not lint alone
  • Consumes structured findings from pdd.prompt_source_set_report.v1
  • Codex agent used for non-interactive context gathering; web search allowed in v1 when Codex uses it
  • Apply uses internal change() / pdd change semantics (not bespoke patch JSON writer)
  • Change instructions derived from checkup findings (recommended_action, message, fix_command)
  • Stops at max_prompt_repair_rounds and max_prompt_repair_seconds
  • Loop guards: repeated-failure detection; token budget enforcement
  • Token delta recorded or reported
  • Strict blocks on unresolved failures and on Codex/change failure
  • Repair failure does not crash workflow in best-effort mode
  • Audit record includes context sources and whether web search was used
  • Tests: success, unrepaired failure, max rounds, token warning, disabled repair
  • Integration test: gate fail → Codex context (mocked) → change apply (mocked) → full recheck

Metadata

Labels: pdd-codex (PDD: use OpenAI Codex model)
Status: In progress