Integrate agentic reviewer into checkup CLI and evidence pipeline

## Context

The `agentic_reviewer` module (built in the preceding sub-issue) needs to be wired into the `pdd checkup policy check` CLI and the evidence pipeline so users can invoke it via `--semantic-review agentic` and findings appear in evidence manifests and gates.

## Task

Update three existing PDD prompt files:

### 1. `pdd/prompts/commands/checkup_python.prompt`

Add CLI flags to `pdd checkup policy check`:
- `--semantic-review [off|llm|agentic]` (default: `off`)
- `--semantic-review-severity [warning|error|info]` (default: `warning`)
- `--max-agent-files INT` (default: `20`)
- `--max-agent-depth INT` (default: `2`)
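
The flag surface above could be declared roughly as follows. This is a minimal sketch using the standard library's `argparse` purely for illustration; the real command lives in the existing checkup CLI and may use a different framework, and `build_policy_check_parser` is a hypothetical name.

```python
import argparse


def build_policy_check_parser() -> argparse.ArgumentParser:
    """Hypothetical stand-in for the `pdd checkup policy check` option set."""
    parser = argparse.ArgumentParser(prog="pdd checkup policy check")
    # Review mode; "off" keeps today's deterministic-only behavior.
    parser.add_argument(
        "--semantic-review", choices=["off", "llm", "agentic"], default="off"
    )
    # Severity assigned to semantic-review findings.
    parser.add_argument(
        "--semantic-review-severity",
        choices=["warning", "error", "info"],
        default="warning",
    )
    # Bounds forwarded to the agentic reviewer.
    parser.add_argument("--max-agent-files", type=int, default=20)
    parser.add_argument("--max-agent-depth", type=int, default=2)
    return parser


args = build_policy_check_parser().parse_args(
    ["--semantic-review", "agentic", "--max-agent-files", "5"]
)
```

Flags that are not passed keep the defaults listed above, so an invocation without `--semantic-review` behaves as before.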

When `--semantic-review agentic` is set:
1. Run deterministic checks first (existing behavior, unchanged).
2. Invoke `agentic_reviewer` with contract IR, target artifact paths, and bounds from CLI flags.
3. Merge agentic findings into the findings list (after deterministic findings).
4. If the agentic reviewer is unavailable or raises an error, log a warning and continue with deterministic-only results.
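
The four steps above can be sketched as a small orchestration function. The names `collect_findings`, `run_deterministic`, and `run_agentic` are hypothetical; the callables stand in for the existing deterministic checker and for a call to `run_agentic_review` with the contract IR, target paths, and bounds already bound.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)


def collect_findings(
    run_deterministic: Callable[[], list[dict]],
    run_agentic: Optional[Callable[[], list[dict]]],
    semantic_review: str = "off",
) -> list[dict]:
    """Run deterministic checks first, then append agentic findings if enabled."""
    # Step 1: deterministic checks always run, regardless of flags.
    findings = list(run_deterministic())
    if semantic_review != "agentic":
        return findings
    # Step 4 (unavailable): no reviewer callable was provided.
    if run_agentic is None:
        logger.warning("agentic reviewer unavailable; using deterministic results only")
        return findings
    try:
        # Steps 2-3: invoke the reviewer and merge after deterministic findings.
        findings.extend(run_agentic())
    except Exception as exc:  # Step 4 (error): never break deterministic results.
        logger.warning("agentic reviewer failed (%s); using deterministic results only", exc)
    return findings
```

Catching a broad `Exception` is a deliberate choice in this sketch, since any reviewer failure should degrade to deterministic-only output rather than abort the command.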

### 2. `pdd/prompts/evidence_manifest_python.prompt`

Add a `semantic_review` block to the `policy_check` section of the evidence manifest:
```json
{
  "policy_check": {
    "semantic_review": {
      "enabled": true,
      "mode": "agentic",
      "max_files": 20,
      "max_follow_depth": 2,
      "finding_count": 1,
      "max_confidence": 0.93
    }
  }
}
```
When `--semantic-review off` is set, `semantic_review.enabled` is `false`.
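
One way the block could be derived from the merged findings is sketched below. `build_semantic_review_block` is a hypothetical helper. Two behaviors are assumptions not stated in this issue: only `enabled` is emitted when the mode is `off`, and `max_confidence` is `None` when there are no agentic findings.

```python
from typing import Optional


def build_semantic_review_block(
    mode: str,
    findings: list[dict],
    max_files: int = 20,
    max_follow_depth: int = 2,
) -> dict:
    """Build the `policy_check.semantic_review` manifest block (illustrative)."""
    if mode == "off":
        # Assumption: the other keys are omitted when review is disabled.
        return {"enabled": False}
    # Count only findings produced by the agentic reviewer.
    agentic = [f for f in findings if f.get("source") == "agentic_reviewer"]
    confidences = [f["confidence"] for f in agentic if "confidence" in f]
    # Assumption: None signals "no confidence available".
    max_confidence: Optional[float] = max(confidences) if confidences else None
    return {
        "enabled": True,
        "mode": mode,
        "max_files": max_files,
        "max_follow_depth": max_follow_depth,
        "finding_count": len(agentic),
        "max_confidence": max_confidence,
    }
```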

### 3. `pdd/prompts/checkup_gates_python.prompt`

Consume agentic findings in gate evaluation:
- `source: agentic_reviewer` findings are treated as advisory warnings by default.
- Passing `--semantic-review-severity error` escalates them to gate failures.
- Deterministic `error` findings remain blocking regardless of agentic config.
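
The gate rules above might look like the following sketch. `evaluate_gate` is a hypothetical name, and the `min_confidence` threshold of `0.8` is an assumption: this issue refers to "high-confidence" findings without defining a cutoff.

```python
def evaluate_gate(
    findings: list[dict],
    semantic_review_severity: str = "warning",
    min_confidence: float = 0.8,  # assumed cutoff for "high-confidence"
) -> dict:
    """Return whether the gate passes and which findings block it."""
    blocking = []
    for finding in findings:
        if finding.get("source") == "agentic_reviewer":
            # Agentic findings are advisory unless escalated to `error`.
            escalated = semantic_review_severity == "error"
            confident = finding.get("confidence", 0.0) >= min_confidence
            if escalated and confident:
                blocking.append(finding)
        elif finding.get("severity") == "error":
            # Deterministic errors block regardless of agentic configuration.
            blocking.append(finding)
    return {"passed": not blocking, "blocking": blocking}
```

With the default severity, an agentic finding never appears in `blocking`, so enabling agentic review cannot turn a passing gate into a failing one unless escalation is requested.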

## Interface Contract (from sub-issue 1)

The `agentic_reviewer` module exposes:
```python
def run_agentic_review(
    contract_ir: list[dict],
    target_files: list[str],
    max_files: int = 20,
    max_follow_depth: int = 2,
    max_search_results: int = 30,
    max_runtime_seconds: int | None = None,
) -> list[dict]:  # normalized findings, may be empty
    ...
```
Returns an empty list on LLM failure. Each finding is a dict with the keys `{source, severity, confidence, effect, message, evidence, agent_steps}`.
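
Because the CLI and gates depend on this finding shape, a defensive check before merging may be useful. The helper below is a hypothetical sketch, not part of the sub-issue 1 contract; it only verifies that the documented keys are present.

```python
# Keys each finding is documented to carry (from the contract above).
REQUIRED_FINDING_KEYS = frozenset(
    {"source", "severity", "confidence", "effect", "message", "evidence", "agent_steps"}
)


def missing_finding_keys(finding: dict) -> list[str]:
    """Return the documented keys absent from a finding, sorted by name."""
    return sorted(REQUIRED_FINDING_KEYS - finding.keys())
```

A caller could skip, or log, any finding for which this returns a non-empty list instead of passing a malformed entry on to the gates.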

## Reference files

- `pdd/prompts/commands/checkup_python.prompt` — existing checkup CLI to update
- `pdd/prompts/checkup_gates_python.prompt` — gate logic to update
- `pdd/prompts/evidence_manifest_python.prompt` — manifest schema to extend
- `context/commands/checkup_example.py` — current checkup command patterns
- `context/checkup_gates_example.py` — current gate logic patterns

## Acceptance Criteria

- `pdd checkup policy check prompts/refund.prompt src/refundPayment.ts --target typescript --semantic-review agentic --json` runs without error
- Deterministic checks always run before agentic review regardless of flags
- Agentic reviewer failure (e.g. LLM unavailable) does not break deterministic policy checking
- Evidence manifest includes `semantic_review.mode = agentic` and `finding_count` when agentic mode is active
- With `--semantic-review off`, evidence manifest shows `semantic_review.enabled = false`
- Agentic findings appear in JSON output with `source: agentic_reviewer` and `severity: warning` by default
- With `--semantic-review-severity error`, high-confidence agentic findings can fail gates
- Tests cover: end-to-end CLI invocation with agentic mode, evidence manifest metadata recording, gate behavior for warning vs error severity

## PDD Command Hint
change, sync

---
## Split Contract
**Command sequence:** change → sync
**Allowed write set:**
- `pdd/prompts/commands/checkup_python.prompt`
- `pdd/prompts/checkup_gates_python.prompt`
- `pdd/prompts/evidence_manifest_python.prompt`
**Acceptance criteria:**
- `pdd checkup policy check` accepts the `--semantic-review agentic`, `--max-agent-files`, and `--max-agent-depth` flags
- Deterministic checks always run before agentic review; agentic failure does not block deterministic results
- Evidence manifest records a `semantic_review` block with `enabled`, `mode`, `max_files`, `max_follow_depth`, `finding_count`, and `max_confidence`
- Agentic findings appear in JSON output with `source: agentic_reviewer` and `severity: warning` by default
- With `--semantic-review-severity error`, high-confidence agentic findings can fail gates
- Tests cover: CLI invocation with agentic mode, evidence manifest metadata, gate severity configuration
**Independently mergeable:** True
**Scope rule:** Do not expand beyond this contract or implement sibling sub-issue work. If the contract is insufficient, report the gap instead.

---
**PDD Command Hint:** This is a new feature. Use `change → sync` (modify prompts, then generate and validate code).

---
Parent: https://github.com/promptdriven/pdd/issues/1371
Parent issue: #1371
