Context
The agentic_reviewer module (built in the preceding sub-issue) needs to be wired into the pdd checkup policy check CLI and the evidence pipeline so users can invoke it via --semantic-review agentic and findings appear in evidence manifests and gates.
Task
Update three existing PDD prompt files:
1. pdd/prompts/commands/checkup_python.prompt
Add CLI flags to pdd checkup policy check:
--semantic-review [off|llm|agentic] (default: off)
--semantic-review-severity [warning|error|info] (default: warning)
--max-agent-files INT (default: 20)
--max-agent-depth INT (default: 2)
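The flag surface above can be sketched with the standard library. This is only an illustration of the choices and defaults listed in this issue: the existing checkup command may be built on a different CLI framework, and `build_parser` is a hypothetical helper name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for `pdd checkup policy check`; only the four
    # flags named in this issue are modeled, with the defaults stated above.
    parser = argparse.ArgumentParser(prog="pdd checkup policy check")
    parser.add_argument(
        "--semantic-review", choices=["off", "llm", "agentic"], default="off"
    )
    parser.add_argument(
        "--semantic-review-severity",
        choices=["warning", "error", "info"],
        default="warning",
    )
    parser.add_argument("--max-agent-files", type=int, default=20)
    parser.add_argument("--max-agent-depth", type=int, default=2)
    return parser


# No flags given: semantic review stays off and the bounds keep their defaults.
defaults = build_parser().parse_args([])

# Agentic mode with a tighter file bound.
agentic = build_parser().parse_args(
    ["--semantic-review", "agentic", "--max-agent-files", "5"]
)
```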
When --semantic-review agentic is set:
- Run deterministic checks first (existing behavior, unchanged).
- Invoke agentic_reviewer with contract IR, target artifact paths, and bounds from CLI flags.
- Merge agentic findings into the findings list (after deterministic findings).
- If agentic reviewer is unavailable or errors, emit a log warning and continue with deterministic-only results.
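The ordering and fallback rules above could look roughly like the sketch below. It assumes the deterministic findings are already computed and passed in, and that the reviewer is injected as a callable (or `None` when the module cannot be imported). `collect_findings` and the logger name are hypothetical; only the `max_files` and `max_follow_depth` keyword names come from the interface contract.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger("pdd.checkup")  # hypothetical logger name


def collect_findings(
    deterministic: list[dict],
    contract_ir: list[dict],
    target_files: list[str],
    mode: str,
    max_files: int,
    max_depth: int,
    reviewer: Optional[Callable[..., list[dict]]],
) -> list[dict]:
    # Deterministic findings always come first and are never dropped.
    findings = list(deterministic)
    if mode != "agentic":
        return findings
    if reviewer is None:
        logger.warning("agentic reviewer unavailable; using deterministic results only")
        return findings
    try:
        agentic = reviewer(
            contract_ir,
            target_files,
            max_files=max_files,
            max_follow_depth=max_depth,
        )
    except Exception as exc:  # unexpected reviewer errors must not break the check
        logger.warning("agentic reviewer failed (%s); using deterministic results only", exc)
        return findings
    # Agentic findings are appended after the deterministic ones.
    findings.extend(agentic)
    return findings
```

The contract says the reviewer already returns an empty list on LLM failure, so the `try`/`except` here only covers errors outside that path.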
2. pdd/prompts/evidence_manifest_python.prompt
Add a semantic_review block to the policy_check section of the evidence manifest:
{
  "policy_check": {
    "semantic_review": {
      "enabled": true,
      "mode": "agentic",
      "max_files": 20,
      "max_follow_depth": 2,
      "finding_count": 1,
      "max_confidence": 0.93
    }
  }
}
When --semantic-review off, semantic_review.enabled is false.
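One possible way to derive the semantic_review block from the merged findings is sketched below. `build_semantic_review_block` is a hypothetical helper. Two choices in it are assumptions, not part of the spec: emitting only `enabled` when the mode is off, and using `None` for `max_confidence` when there are no agentic findings.

```python
from typing import Any


def build_semantic_review_block(
    mode: str,
    max_files: int,
    max_follow_depth: int,
    findings: list[dict],
) -> dict[str, Any]:
    if mode == "off":
        # Assumption: the issue only requires enabled=false in this case.
        return {"enabled": False}
    # Count only findings produced by the agentic reviewer.
    agentic = [f for f in findings if f.get("source") == "agentic_reviewer"]
    confidences = [f["confidence"] for f in agentic if "confidence" in f]
    return {
        "enabled": True,
        "mode": mode,
        "max_files": max_files,
        "max_follow_depth": max_follow_depth,
        "finding_count": len(agentic),
        # Assumption: None when no agentic finding carries a confidence.
        "max_confidence": max(confidences) if confidences else None,
    }
```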
3. pdd/prompts/checkup_gates_python.prompt
Consume agentic findings in gate evaluation:
- source: agentic_reviewer findings are treated as advisory warnings by default.
- Configurable via --semantic-review-severity error to escalate to gate failures.
- Deterministic error findings remain blocking regardless of agentic config.
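The gate rules above can be sketched as a single predicate. `gate_passes` is a hypothetical name. The issue mentions "high-confidence" findings without giving a cutoff, so `min_confidence=0.8` is purely a placeholder assumption.

```python
def gate_passes(
    findings: list[dict],
    semantic_review_severity: str = "warning",
    min_confidence: float = 0.8,  # placeholder; the real threshold is not specified
) -> bool:
    for finding in findings:
        if finding.get("source") == "agentic_reviewer":
            # Agentic findings block only when escalated to error AND
            # confident enough under the assumed threshold.
            escalated = semantic_review_severity == "error"
            confident = finding.get("confidence", 0.0) >= min_confidence
            if escalated and confident:
                return False
        elif finding.get("severity") == "error":
            # Deterministic errors always block, whatever the agentic config.
            return False
    return True
```

With the default severity, an agentic finding never fails the gate on its own; it is reported as advisory only.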
Interface Contract (from sub-issue 1)
The agentic_reviewer module exposes:
def run_agentic_review(
    contract_ir: list[dict],
    target_files: list[str],
    max_files: int = 20,
    max_follow_depth: int = 2,
    max_search_results: int = 30,
    max_runtime_seconds: int | None = None,
) -> list[dict]:  # normalized findings, may be empty
    ...
Returns empty list on LLM failure. Each finding has: {source, severity, confidence, effect, message, evidence, agent_steps}.
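For tests that consume reviewer output, a small shape check on each finding may be useful. The key names below are taken from the contract above. `missing_finding_keys` and every value in the sample finding are hypothetical, since the issue does not define value types.

```python
REQUIRED_FINDING_KEYS = frozenset(
    {"source", "severity", "confidence", "effect", "message", "evidence", "agent_steps"}
)


def missing_finding_keys(finding: dict) -> list[str]:
    # Return the contract keys that this finding lacks, sorted for stable output.
    return sorted(REQUIRED_FINDING_KEYS - finding.keys())


# Illustrative finding only; all values here are made up for the example.
sample_finding = {
    "source": "agentic_reviewer",
    "severity": "warning",
    "confidence": 0.93,
    "effect": "example effect",
    "message": "example message",
    "evidence": [],
    "agent_steps": [],
}
```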
Reference files
pdd/prompts/commands/checkup_python.prompt — existing checkup CLI to update
pdd/prompts/checkup_gates_python.prompt — gate logic to update
pdd/prompts/evidence_manifest_python.prompt — manifest schema to extend
context/commands/checkup_example.py — current checkup command patterns
context/checkup_gates_example.py — current gate logic patterns
Acceptance Criteria
- pdd checkup policy check prompts/refund.prompt src/refundPayment.ts --target typescript --semantic-review agentic --json runs without error
- Deterministic checks always run before agentic review regardless of flags
- Agentic reviewer failure (e.g. LLM unavailable) does not break deterministic policy checking
- Evidence manifest includes semantic_review.mode = agentic and finding_count when agentic mode is active
- With --semantic-review off, evidence manifest shows semantic_review.enabled = false
- Agentic findings appear in JSON output with source: agentic_reviewer and severity: warning by default
- With --semantic-review-severity error, agentic high-confidence findings can fail gates
- Tests cover: end-to-end CLI invocation with agentic mode, evidence manifest metadata recording, gate behavior for warning vs error severity
PDD Command Hint
change, sync
Split Contract
Command sequence: change → sync
Allowed write set:
pdd/prompts/commands/checkup_python.prompt
pdd/prompts/checkup_gates_python.prompt
pdd/prompts/evidence_manifest_python.prompt
Acceptance criteria:
- pdd checkup policy check accepts --semantic-review agentic, --max-agent-files, --max-agent-depth flags
- Deterministic checks always run before agentic review; agentic failure does not block deterministic results
- Evidence manifest records semantic_review block with enabled, mode, max_files, max_follow_depth, finding_count, max_confidence
- Agentic findings appear in JSON output with source: agentic_reviewer and severity: warning by default
- With --semantic-review-severity error, high-confidence agentic findings can fail gates
- Tests cover: CLI invocation with agentic mode, evidence manifest metadata, gate severity configuration
Independently mergeable: True
Scope rule: Do not expand beyond this contract or implement sibling sub-issue work. If the contract is insufficient, report the gap instead.
PDD Command Hint: This is a new feature. Use change → sync (modify prompts, then generate and validate code).
Parent: #1371