Add core bounded agentic reviewer module for capability policy evidence collection #1382

@prompt-driven-github

Description

Context

PDD capability policy checks need a bounded agentic reviewer that can follow local import/wrapper chains to collect evidence for ambiguous effects that deterministic checks miss (e.g. notificationClient.sendRefundNotice(...) resolving to resend.emails.send(...)).
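To make the ambiguity concrete, here is a minimal sketch of the first-pass evidence a reviewer could pull from a TypeScript target: relative imports (candidates for following) and dotted call expressions (candidates for classification). The regex approach and the helper names `local_imports` and `member_calls` are assumptions for illustration only; the issue does not prescribe how files are parsed, and a real implementation might use a proper parser.

```python
import re
from typing import List, Tuple

# Single-line ES imports from a relative path, e.g. import { x } from "./clients/x";
IMPORT_RE = re.compile(
    r"""^\s*import\s+.*?\s+from\s+['"](\.{1,2}/[^'"]+)['"]""",
    re.MULTILINE,
)

# Dotted call expressions such as notificationClient.sendRefundNotice(
CALL_RE = re.compile(r"\b([A-Za-z_$][\w$]*(?:\.[A-Za-z_$][\w$]*)+)\s*\(")


def local_imports(source: str) -> List[str]:
    """Return relative module specifiers imported by the source text."""
    return IMPORT_RE.findall(source)


def member_calls(source: str) -> List[Tuple[int, str]]:
    """Return (line_number, dotted_name) for each member call in the source text."""
    calls = []
    for match in CALL_RE.finditer(source):
        # Line numbers are 1-based: count newlines before the match start.
        line = source.count("\n", 0, match.start()) + 1
        calls.append((line, match.group(1)))
    return calls
```

Run against the caller file, this would surface `notificationClient.sendRefundNotice` as an unresolved call plus one relative import to follow; run against the wrapper file, it would surface `resend.emails.send`, which is the evidence the deterministic check alone would miss.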

Task

Create a new PDD module agentic_reviewer by adding pdd/prompts/agentic_reviewer_python.prompt.

The module must:

  1. Accept as input:
    • A parsed contract/effect IR (from pdd/prompts/contract_ir_python.prompt schema: {modal, action, resource} list)
    • A list of target artifact file paths to inspect
    • Bounds config: max_files, max_follow_depth, max_search_results, max_runtime_seconds
  2. Implement bounded evidence collection:
    • Read target artifact: extract imports, calls, env reads, file writes, network calls, logging calls
    • Inspect dependency manifests: package.json, requirements.txt, pyproject.toml, go.mod
    • For ambiguous local symbols: follow local import/definition chains up to max_follow_depth (default: 2)
    • Optionally inspect co-located tests/docs that mention the same symbol
    • Hard stop at max_files files inspected
  3. For ambiguous effects, call a constrained LLM classifier with structured input:
    {"contract_effects": [...], "target": "typescript", "observed_evidence": [...], "deterministic_findings": []}
    The LLM must return a strict JSON list of findings. On invalid JSON or LLM unavailability, emit no agentic findings (graceful fallback; do not propagate the exception).
  4. Return normalized findings using the same schema as deterministic findings, plus these extra fields:
    • source: "agentic_reviewer"
    • severity: "warning" (default)
    • confidence: float
    • effect: {action, resource}
    • message: str
    • evidence: [{file, line, excerpt}]
    • agent_steps: [...]
  5. When evidence is insufficient, return {"judgment": "unknown", "confidence": <float below 0.5>, "message": "Insufficient evidence..."}.
  6. Mode: read-only, no network access, no code execution.
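The bounded traversal in step 2 can be sketched as a breadth-first walk with a depth limit, a file cap, and a deadline. This is one possible reading of the requirements, not the required design: the field names mirror the bounds config above, but every default except `max_follow_depth=2` is an illustrative assumption, and `resolve_local_imports` is a hypothetical callback standing in for the real import/definition resolution.

```python
import time
from collections import deque
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Bounds:
    # Only max_follow_depth=2 comes from the issue; other defaults are assumed.
    max_files: int = 20
    max_follow_depth: int = 2
    max_search_results: int = 50
    max_runtime_seconds: float = 30.0


def collect_files(
    targets: List[str],
    resolve_local_imports: Callable[[str], List[str]],
    bounds: Bounds,
) -> List[Tuple[str, int]]:
    """Walk local imports breadth-first and return (path, depth) pairs.

    Targets sit at depth 0. A file at depth == max_follow_depth is still
    inspected, but its own imports are not followed.
    """
    deadline = time.monotonic() + bounds.max_runtime_seconds
    visited: Dict[str, int] = {}
    queue = deque((path, 0) for path in targets)
    while (
        queue
        and len(visited) < bounds.max_files  # hard stop on files inspected
        and time.monotonic() < deadline      # hard stop on runtime
    ):
        path, depth = queue.popleft()
        if path in visited:
            continue  # guards against import cycles
        visited[path] = depth
        if depth < bounds.max_follow_depth:
            for dep in resolve_local_imports(path):
                if dep not in visited:
                    queue.append((dep, depth + 1))
    return list(visited.items())
```

Breadth-first order is a deliberate choice in this sketch: when `max_files` cuts the walk short, the files closest to the target are the ones that were inspected. Because the callback only maps paths to paths, the walk itself stays read-only and performs no network access or code execution.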

Reference files

  • pdd/prompts/contract_ir_python.prompt — input IR schema
  • pdd/prompts/evidence_manifest_python.prompt — output schema conventions
  • context/agentic_checkup_example.py — agentic module patterns
  • context/agentic_checkup_orchestrator_example.py — orchestrator patterns

Acceptance Criteria

  • Module agentic_reviewer is importable and callable with contract IR + file paths + bounds
  • Given a TypeScript file that calls notificationClient.sendRefundNotice(email) and a wrapper file clients/notificationClient.ts that contains resend.emails.send(...), and a contract with MUST_NOT send email, the reviewer returns a warning finding with evidence from both files
  • Following stops at max_follow_depth=2; symbols beyond that depth are not traversed
  • Invalid LLM JSON (e.g. truncated or non-JSON response) causes graceful fallback: no findings emitted, no exception raised
  • When evidence is ambiguous or missing, returns judgment: unknown with confidence < 0.5
  • All bounds (max_files, max_follow_depth, max_search_results) are respected
  • Unit tests cover: wrapper-following (email hidden behind local client), unknown dependency classified as possible violation, insufficient evidence returning unknown, max-depth cutoff, invalid LLM JSON fallback
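The fallback and unknown-judgment criteria above could be satisfied along these lines. This is a sketch under stated assumptions: `parse_llm_findings` and `unknown_judgment` are hypothetical helper names, the sketch treats any non-list or non-object payload as invalid, and the real module would also validate each finding against the deterministic findings schema, which is not shown in this issue.

```python
import json
from typing import Any, Dict, List


def parse_llm_findings(raw: Any) -> List[Dict[str, Any]]:
    """Return tagged findings, or [] when the response is not a JSON list of objects.

    Never raises: truncated JSON, plain text, None, or a wrong top-level
    type all fall back to "no agentic findings".
    """
    try:
        data = json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        return []
    if not isinstance(data, list) or not all(isinstance(i, dict) for i in data):
        return []
    findings = []
    for item in data:
        finding = dict(item)
        finding["source"] = "agentic_reviewer"     # always set by the reviewer
        finding.setdefault("severity", "warning")  # default severity from the issue
        findings.append(finding)
    return findings


def unknown_judgment(confidence: float = 0.0) -> Dict[str, Any]:
    """Build the insufficient-evidence result; confidence must stay below 0.5."""
    if not 0.0 <= confidence < 0.5:
        raise ValueError("unknown judgments require 0.0 <= confidence < 0.5")
    return {
        "judgment": "unknown",
        "confidence": confidence,
        "message": "Insufficient evidence...",  # elided in the issue; kept as-is
    }
```

LLM unavailability (timeouts, connection errors) would need the same treatment at the call site: catch the client error and return an empty list rather than letting it propagate.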

PDD Command Hint

change, sync


Split Contract

Command sequence: change → sync
Allowed write set:

  • pdd/prompts/agentic_reviewer_python.prompt

Acceptance criteria:

  • New pdd/prompts/agentic_reviewer_python.prompt is created and agentic_reviewer module is generated
  • Given a target file calling a local wrapper that resolves to resend.emails.send(), reviewer returns a warning finding with evidence from both files when contract forbids sending email
  • Symbol following stops at max_follow_depth; beyond that depth is not traversed
  • Invalid LLM JSON triggers graceful fallback: no findings emitted and no exception raised
  • Insufficient evidence returns judgment: unknown with confidence < 0.5
  • All bounds (max_files, max_follow_depth, max_search_results) are enforced
  • Tests cover wrapper-following, unknown dependency, insufficient evidence, max-depth cutoff, invalid LLM JSON fallback

Independently mergeable: True
Scope rule: Do not expand beyond this contract or implement sibling sub-issue work. If the contract is insufficient, report the gap instead.

PDD Command Hint: This is a new feature. Use change → sync (modify prompts, then generate and validate code).


Parent: #1371

Metadata

Assignees

No one assigned

    Labels

    enhancement: New feature or request

    Type

    No type

    Projects

    Status
    In progress

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
