Integrate agentic reviewer into checkup CLI and evidence pipeline #1383

@prompt-driven-github

Context

The agentic_reviewer module (built in the preceding sub-issue) must be wired into the pdd checkup policy check CLI and the evidence pipeline. Users should be able to invoke it with --semantic-review agentic, and its findings should appear in evidence manifests and be evaluated by gates.

Task

Update three existing PDD prompt files:

1. pdd/prompts/commands/checkup_python.prompt

Add CLI flags to pdd checkup policy check:

  • --semantic-review [off|llm|agentic] (default: off)
  • --semantic-review-severity [warning|error|info] (default: warning)
  • --max-agent-files INT (default: 20)
  • --max-agent-depth INT (default: 2)
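
The flag surface above can be sketched with the standard library's argparse. This is only an illustration of the choices and defaults listed here: the real command may be built on a different CLI framework, and `build_parser` and the positional argument names are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the real `pdd checkup policy check` command.
    parser = argparse.ArgumentParser(prog="pdd checkup policy check")
    parser.add_argument("prompt_file")   # assumed positional name
    parser.add_argument("target_file")   # assumed positional name
    parser.add_argument(
        "--semantic-review", choices=["off", "llm", "agentic"], default="off"
    )
    parser.add_argument(
        "--semantic-review-severity",
        choices=["warning", "error", "info"],
        default="warning",
    )
    parser.add_argument("--max-agent-files", type=int, default=20)
    parser.add_argument("--max-agent-depth", type=int, default=2)
    return parser


args = build_parser().parse_args(
    ["prompts/refund.prompt", "src/refundPayment.ts", "--semantic-review", "agentic"]
)
print(args.semantic_review, args.max_agent_files, args.max_agent_depth)
```

Restricting the two review flags with `choices` rejects unknown modes at parse time, so the later stages only ever see one of the documented values.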

When --semantic-review agentic is set:

  1. Run deterministic checks first (existing behavior, unchanged).
  2. Invoke agentic_reviewer with contract IR, target artifact paths, and bounds from CLI flags.
  3. Merge agentic findings into the findings list (after deterministic findings).
  4. If the agentic reviewer is unavailable or raises an error, log a warning and continue with deterministic-only results.
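
The four steps above could be orchestrated roughly as follows. This is a minimal sketch, not the checkup implementation: `run_policy_check` and its callable parameters are hypothetical, and the contract IR, target paths, and bounds are assumed to be bound into `run_agentic` by the caller.

```python
import logging
from typing import Callable

logger = logging.getLogger("checkup")

Finding = dict  # normalized finding, shape per the interface contract


def run_policy_check(
    run_deterministic: Callable[[], list[Finding]],
    run_agentic: Callable[[], list[Finding]],
    semantic_review: str = "off",
) -> list[Finding]:
    # Step 1: deterministic checks always run first.
    findings = list(run_deterministic())
    if semantic_review != "agentic":
        return findings
    try:
        # Step 2: invoke the reviewer with whatever inputs the caller bound in.
        agentic_findings = run_agentic()
    except Exception as exc:
        # Step 4: degrade to deterministic-only results.
        logger.warning("agentic review skipped: %s", exc)
        return findings
    # Step 3: agentic findings are appended after deterministic ones.
    return findings + list(agentic_findings)
```

Because the deterministic list is built before the reviewer is called, a reviewer failure cannot drop or reorder deterministic findings.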

2. pdd/prompts/evidence_manifest_python.prompt

Add a semantic_review block to the policy_check section of the evidence manifest:

{
  "policy_check": {
    "semantic_review": {
      "enabled": true,
      "mode": "agentic",
      "max_files": 20,
      "max_follow_depth": 2,
      "finding_count": 1,
      "max_confidence": 0.93
    }
  }
}

When --semantic-review is off, semantic_review.enabled is false.
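
One way to derive this block from the run configuration and the agentic findings is sketched below. `build_semantic_review_block` is a hypothetical helper. Two details are assumptions the issue does not specify: the disabled block carries only `enabled`, and `max_confidence` is `None` when there are no findings.

```python
def build_semantic_review_block(
    mode: str,
    max_files: int,
    max_follow_depth: int,
    agentic_findings: list[dict],
) -> dict:
    if mode == "off":
        # Assumption: no other keys are recorded when the review is disabled.
        return {"enabled": False}
    confidences = [f["confidence"] for f in agentic_findings if "confidence" in f]
    return {
        "enabled": True,
        "mode": mode,
        "max_files": max_files,
        "max_follow_depth": max_follow_depth,
        "finding_count": len(agentic_findings),
        # Assumption: None marks "no findings" rather than 0.0.
        "max_confidence": max(confidences) if confidences else None,
    }


manifest = {
    "policy_check": {
        "semantic_review": build_semantic_review_block(
            "agentic", 20, 2, [{"confidence": 0.93}]
        )
    }
}
```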

3. pdd/prompts/checkup_gates_python.prompt

Consume agentic findings in gate evaluation:

  • Findings with source: agentic_reviewer are treated as advisory warnings by default.
  • Passing --semantic-review-severity error escalates them so that they can fail gates.
  • Deterministic error findings remain blocking regardless of agentic config.
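
The gate rules above might look like this in code. It is a sketch under stated assumptions: `gate_passes` is a hypothetical name, and the `min_confidence` threshold of 0.8 is illustrative only, since the issue says high-confidence findings can fail gates without defining the cutoff.

```python
def gate_passes(
    findings: list[dict],
    semantic_review_severity: str = "warning",
    min_confidence: float = 0.8,  # hypothetical threshold, not from the issue
) -> bool:
    for finding in findings:
        if finding.get("source") == "agentic_reviewer":
            # Agentic findings block only when escalated to error and
            # confident enough; otherwise they stay advisory.
            blocking = (
                semantic_review_severity == "error"
                and finding.get("confidence", 0.0) >= min_confidence
            )
        else:
            # Deterministic errors block regardless of the agentic config.
            blocking = finding.get("severity") == "error"
        if blocking:
            return False
    return True
```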

Interface Contract (from sub-issue 1)

The agentic_reviewer module exposes:

def run_agentic_review(
    contract_ir: list[dict],
    target_files: list[str],
    max_files: int = 20,
    max_follow_depth: int = 2,
    max_search_results: int = 30,
    max_runtime_seconds: int | None = None,
) -> list[dict]:  # normalized findings, may be empty
    ...

Returns an empty list on LLM failure. Each finding has the keys {source, severity, confidence, effect, message, evidence, agent_steps}.
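
A consumer may want to confirm that a finding carries all seven keys before merging it. The check below is a small sketch: `missing_finding_keys` is a hypothetical helper, and every value in `example_finding` is made up for illustration, because the issue lists only the key names and not their value types.

```python
REQUIRED_FINDING_KEYS = {
    "source", "severity", "confidence", "effect",
    "message", "evidence", "agent_steps",
}


def missing_finding_keys(finding: dict) -> set[str]:
    # Returns the contract keys that the finding does not provide.
    return REQUIRED_FINDING_KEYS - set(finding)


# Illustrative values only; the real value types are not given in the issue.
example_finding = {
    "source": "agentic_reviewer",
    "severity": "warning",
    "confidence": 0.93,
    "effect": "placeholder effect",
    "message": "placeholder message",
    "evidence": [],
    "agent_steps": [],
}
```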

Reference files

  • pdd/prompts/commands/checkup_python.prompt — existing checkup CLI to update
  • pdd/prompts/checkup_gates_python.prompt — gate logic to update
  • pdd/prompts/evidence_manifest_python.prompt — manifest schema to extend
  • context/commands/checkup_example.py — current checkup command patterns
  • context/checkup_gates_example.py — current gate logic patterns

Acceptance Criteria

  • pdd checkup policy check prompts/refund.prompt src/refundPayment.ts --target typescript --semantic-review agentic --json runs without error
  • Deterministic checks always run before agentic review regardless of flags
  • Agentic reviewer failure (e.g. LLM unavailable) does not break deterministic policy checking
  • Evidence manifest includes semantic_review.mode = agentic and finding_count when agentic mode is active
  • With --semantic-review off, evidence manifest shows semantic_review.enabled = false
  • Agentic findings appear in JSON output with source: agentic_reviewer and severity: warning by default
  • With --semantic-review-severity error, high-confidence agentic findings can fail gates
  • Tests cover: end-to-end CLI invocation with agentic mode, evidence manifest metadata recording, gate behavior for warning vs error severity

PDD Command Hint

change, sync


Split Contract

Command sequence: change → sync
Allowed write set:

  • pdd/prompts/commands/checkup_python.prompt
  • pdd/prompts/checkup_gates_python.prompt
  • pdd/prompts/evidence_manifest_python.prompt

Acceptance criteria:

  • pdd checkup policy check accepts --semantic-review agentic, --max-agent-files, --max-agent-depth flags
  • Deterministic checks always run before agentic review; agentic failure does not block deterministic results
  • Evidence manifest records semantic_review block with enabled, mode, max_files, max_follow_depth, finding_count, max_confidence
  • Agentic findings appear in JSON output with source: agentic_reviewer and severity: warning by default
  • With --semantic-review-severity error, high-confidence agentic findings can fail gates
  • Tests cover: CLI invocation with agentic mode, evidence manifest metadata, gate severity configuration

Independently mergeable: True
Scope rule: Do not expand beyond this contract or implement sibling sub-issue work. If the contract is insufficient, report the gap instead.

PDD Command Hint: This is a new feature. Use change → sync (modify prompts, then generate and validate code).


Parent: #1371

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Status: In progress
Milestone: No milestone