aws-samples
diff --git a/AGENTS.md b/AGENTS.md
Lines changed: 2 additions & 2 deletions
diff --git a/agent/README.md b/agent/README.md
Lines changed: 22 additions & 9 deletions
diff --git a/agent/entrypoint.py b/agent/entrypoint.py
Lines changed: 39 additions & 16 deletions
diff --git a/agent/prompts/__init__.py b/agent/prompts/__init__.py
Lines changed: 2 additions & 0 deletions
diff --git a/agent/prompts/pr_review.py b/agent/prompts/pr_review.py
Lines changed: 104 additions & 0 deletions
@@ -24,7 +24,7 @@ To get started and understand the developer flow, follow the [Developer guide](.
 - **`scripts/`** (root) — Optional cross-package helpers; **`scripts/ci-build.sh`** runs the full monorepo build (same as CI).
 - **`cdk/`** — CDK app package (`@abca/cdk`): `cdk/src/`, `cdk/test/`, `cdk/cdk.json`, `cdk/tsconfig.json`, `cdk/tsconfig.dev.json`, and `cdk/.eslintrc.json`.
 - **`cli/`** — `@backgroundagent/cli` — CLI tool for interacting with the deployed REST API (see below).
-- **`agent/`** — Python code that runs inside the agent compute environment (entrypoint, server, system prompt, Dockerfile, requirements). The system prompt is refactored into `agent/prompts/` with a shared base template and per-task-type workflow variants (`new_task`, `pr_iteration`).
+- **`agent/`** — Python code that runs inside the agent compute environment (entrypoint, server, system prompt, Dockerfile, requirements). The system prompt is refactored into `agent/prompts/` with a shared base template and per-task-type workflow variants (`new_task`, `pr_iteration`, `pr_review`).
 - **`docs/`** — Authoritative Markdown in `guides/` (developer, user, roadmap, prompt) and `design/`; assets in `diagrams/`, `imgs/`. The Starlight docs site lives here (`astro.config.mjs`, `package.json`); `src/content/docs/` is refreshed via `docs/scripts/sync-starlight.mjs`.
 - **`CONTRIBUTING.md`** — Contribution guidelines at the repository root.
 - **`package.json`** (root), **`yarn.lock`** — Yarn workspace root (minimal manifest); dependencies live in **`cdk/`**, **`cli/`**, and **`docs/`** package manifests.
@@ -40,7 +40,7 @@ The `@backgroundagent/cli` package provides the `bgagent` executable for submitt
 - `src/api-client.ts` — HTTP client wrapping `fetch` with auth header injection
 - `src/auth.ts` — Cognito login, token caching (`~/.bgagent/credentials.json`), auto-refresh
 - `src/config.ts` — Read/write `~/.bgagent/config.json`
-- `src/types.ts` — API request/response types (mirrored from `cdk/src/handlers/shared/types.ts`), including `TaskType` (`new_task` | `pr_iteration`)
+- `src/types.ts` — API request/response types (mirrored from `cdk/src/handlers/shared/types.ts`), including `TaskType` (`new_task` | `pr_iteration` | `pr_review`)
 - `src/format.ts` — Output formatting (table, detail view, JSON)
 - `src/debug.ts` — Verbose/debug logging (`--verbose` flag)
 - `src/errors.ts` — `CliError` and `ApiError` classes
 
@@ -1,6 +1,6 @@
 # Agent Runtime
 
-The agent runtime container for ABCA. Each agent instance clones a GitHub repo, works on a task using Claude, and opens a pull request. Runs as a Docker container with two modes:
+The agent runtime container for ABCA. Each agent instance clones a GitHub repo, works on a task using Claude, and delivers a result — a new pull request (`new_task`), updates to an existing PR (`pr_iteration`), or structured review comments on a PR (`pr_review`). Runs as a Docker container with two modes:
 
 - **Local mode** — batch execution via `run.sh` with AgentCore-matching constraints (2 vCPU, 8 GB RAM)
 - **AgentCore mode** — FastAPI server on port 8080 with `/invocations` and `/ping` endpoints, deployable to AWS Bedrock AgentCore Runtime
@@ -224,6 +224,12 @@ bgagent submit --repo owner/repo --task "update the rfc issue template"
 # Submit with a GitHub issue
 bgagent submit --repo owner/repo --issue 42
 
+# Iterate on a PR (address review feedback)
+bgagent submit --repo owner/repo --pr 42
+
+# Review a PR (read-only — posts structured review comments)
+bgagent submit --repo owner/repo --review-pr 55
+
 # Submit and wait for completion
 bgagent submit --repo owner/repo --issue 42 --wait
 ```
@@ -252,18 +258,18 @@ The `run.sh` script prints these commands when it starts.
 
 ## What It Does
 
-The agent pipeline (shared by both modes):
+The agent pipeline is shared by both modes; behavior varies by task type (`new_task`, `pr_iteration`, `pr_review`):
 
 1. **Config validation** — checks required parameters
-2. **Context hydration** — fetches the GitHub issue (title, body, comments) if an issue number is provided
-3. **Prompt assembly** — combines the system prompt (behavioral contract) with the issue context and task description
-4. **Deterministic pre-hooks** — clones repo, creates branch, configures git auth, runs `mise trust`, `mise install`, `mise run build`, and `mise run lint`
+2. **Context hydration** — fetches the GitHub issue (title, body, comments) if an issue number is provided; for `pr_iteration` and `pr_review`, fetches PR context (diff, description, review comments)
+3. **Prompt assembly** — combines the system prompt (behavioral contract, selected by task type from `prompts/`) with the issue/PR context and task description
+4. **Deterministic pre-hooks** — clones repo, creates or checks out branch, configures git auth, runs `mise trust`, `mise install`, `mise run build`, and `mise run lint`
 5. **Agent execution** — invokes the Claude Agent SDK via the `ClaudeSDKClient` class (connect/query/receive_response pattern) in unattended mode. The agent:
    - Understands the codebase
-   - Makes changes, runs tests and linters
-   - Commits and pushes after each unit of work
-   - Creates a pull request with summary, testing notes, and decisions
-6. **Deterministic post-hooks** — verifies `mise run build` and `mise run lint`, ensures a PR exists (creates one if the agent did not)
+   - **`new_task`**: Makes changes, runs tests and linters, commits and pushes after each unit of work, creates a pull request
+   - **`pr_iteration`**: Reads review feedback, addresses it with focused changes, commits and pushes, posts a summary comment on the PR
+   - **`pr_review`**: Analyzes changes read-only (no `Write` or `Edit` tools available), composes structured review findings, posts a batch review via the GitHub Reviews API
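+6. **Deterministic post-hooks** — verifies `mise run build` and `mise run lint`, then resolves the PR: for `new_task` it creates one if the agent did not; for `pr_iteration` it pushes any remaining commits and returns the existing PR URL; for `pr_review` it returns the existing PR URL without pushing. For `pr_review`, build status is informational only and the safety-net commit is skipped.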
 7. **Metrics** — returns duration, disk usage, turn count, cost, and PR URL
 
 ## Metrics
@@ -322,9 +328,16 @@ agent/
 ├── task_state.py        Best-effort DynamoDB task status (no-op if TASK_TABLE_NAME unset)
 ├── observability.py     OpenTelemetry helpers (e.g. AgentCore session id)
 ├── memory.py            Optional memory / episode integration for the agent
+├── prompts/             Per-task-type system prompt workflows
+│   ├── __init__.py      Prompt registry — assembles base template + workflow for each task type
+│   ├── base.py          Shared base template (environment, rules, placeholders)
+│   ├── new_task.py      Workflow for new_task (create branch, implement, open PR)
+│   ├── pr_iteration.py  Workflow for pr_iteration (read feedback, address, push)
+│   └── pr_review.py     Workflow for pr_review (read-only analysis, structured review comments)
 ├── system_prompt.py     Behavioral contract (PRD Section 11)
 ├── prepare-commit-msg.sh Git hook (Task-Id / Prompt-Version trailers on commits)
 ├── run.sh               Build + run helper for local/server mode with AgentCore constraints
+├── tests/               pytest unit tests for pure functions and prompt assembly
 ├── test_sdk_smoke.py    Diagnostic: minimal SDK smoke test (ClaudeSDKClient → CLI → Bedrock)
 └── test_subprocess_threading.py  Diagnostic: subprocess-in-background-thread verification
 ```
 
@@ -40,6 +40,9 @@
 
 AGENT_WORKSPACE = os.environ.get("AGENT_WORKSPACE", "/workspace")
 
+# Task types that operate on an existing pull request.
+PR_TASK_TYPES = frozenset(("pr_iteration", "pr_review"))
+
 
 def resolve_github_token() -> str:
     """Resolve GitHub token from Secrets Manager or environment variable.
@@ -110,9 +113,9 @@ def build_config(
         errors.append("github_token is required")
     if not config["aws_region"]:
         errors.append("aws_region is required for Bedrock")
-    if config["task_type"] == "pr_iteration":
+    if config["task_type"] in PR_TASK_TYPES:
         if not config["pr_number"]:
-            errors.append("pr_number is required for pr_iteration task type")
+            errors.append("pr_number is required for pr_iteration/pr_review task type")
     elif not config["issue_number"] and not config["task_description"]:
         errors.append("Either issue_number or task_description is required")
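
The validation rule in this hunk can be exercised in isolation. The following is a minimal sketch, not the repository's `build_config`: `validate_task_config` is a hypothetical helper, it assumes the config is a plain dict with the keys seen in the diff, and it assumes a missing `task_type` defaults to `new_task`. Unlike the diff, it names the actual task type in the error message instead of the combined `pr_iteration/pr_review` string.

```python
# Task types that operate on an existing pull request (copied from the diff).
PR_TASK_TYPES = frozenset(("pr_iteration", "pr_review"))


def validate_task_config(config: dict) -> list[str]:
    """Return task-type-specific validation errors (hypothetical helper)."""
    errors: list[str] = []
    # Assumption: an unset task_type is treated as "new_task".
    task_type = config.get("task_type", "new_task")
    if task_type in PR_TASK_TYPES:
        # PR tasks work on an existing pull request, so a PR number is mandatory.
        if not config.get("pr_number"):
            errors.append(f"pr_number is required for {task_type} task type")
    elif not config.get("issue_number") and not config.get("task_description"):
        errors.append("Either issue_number or task_description is required")
    return errors
```

Returning a list rather than raising on the first problem mirrors the `errors.append(...)` pattern in the hunk, so all configuration problems can be reported together.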
 
@@ -313,7 +316,7 @@ def setup_repo(config: dict) -> dict:
     repo_dir = f"{AGENT_WORKSPACE}/{config['task_id']}"
     setup: dict[str, str | list[str] | bool] = {"repo_dir": repo_dir, "notes": []}
 
-    if config.get("task_type") == "pr_iteration" and config.get("branch_name"):
+    if config.get("task_type") in PR_TASK_TYPES and config.get("branch_name"):
         branch = config["branch_name"]
         setup["branch"] = branch
     else:
@@ -358,7 +361,7 @@ def setup_repo(config: dict) -> dict:
     )
 
     # Branch setup
-    if config.get("task_type") == "pr_iteration" and config.get("branch_name"):
+    if config.get("task_type") in PR_TASK_TYPES and config.get("branch_name"):
         log("SETUP", f"Checking out existing PR branch: {branch}")
         run_cmd(
             ["git", "fetch", "origin", branch],
@@ -429,8 +432,8 @@ def setup_repo(config: dict) -> dict:
         setup["lint_before"] = True
 
     # Detect default branch
-    # For PR iteration: use base_branch from orchestrator if available
-    if config.get("task_type") == "pr_iteration" and config.get("base_branch"):
+    # For PR tasks (pr_iteration, pr_review): use base_branch from orchestrator if available
+    if config.get("task_type") in PR_TASK_TYPES and config.get("base_branch"):
         setup["default_branch"] = config["base_branch"]
     else:
         setup["default_branch"] = detect_default_branch(config["repo_url"], repo_dir)
@@ -651,6 +654,10 @@ def ensure_pr(
 ) -> str | None:
     """Check if a PR exists for the branch; if not, create one.
 
+    For ``new_task``: creates a new PR if needed.
+    For ``pr_iteration``: pushes commits, then resolves the existing PR URL.
+    For ``pr_review``: resolves the existing PR URL without pushing (read-only).
+
     Returns the PR URL, or None if there are no commits beyond the default
     branch or PR creation failed. ``build_passed`` and ``lint_passed`` control
     the verification status shown in the PR body.
@@ -659,11 +666,14 @@ def ensure_pr(
     branch = setup["branch"]
     default_branch = setup.get("default_branch", "main")
 
-    # PR iteration: skip PR creation — just push and return existing PR URL
-    if config.get("task_type") == "pr_iteration":
-        if not ensure_pushed(repo_dir, branch):
-            log("WARN", "Failed to push commits before resolving PR URL")
-        log("POST", "PR iteration — returning existing PR URL")
+    # PR iteration/review: skip PR creation; push (pr_iteration only), then resolve the existing PR URL
+    if config.get("task_type") in PR_TASK_TYPES:
+        if config.get("task_type") == "pr_iteration":
+            if not ensure_pushed(repo_dir, branch):
+                log("WARN", "Failed to push commits before resolving PR URL")
+        else:
+            log("POST", "pr_review task — skipping push (read-only)")
+        log("POST", f"{config.get('task_type')} — returning existing PR URL")
         result = subprocess.run(
             [
                 "gh",
@@ -1336,10 +1346,15 @@ def _on_stderr(line: str) -> None:
     else:
         log("WARN", "claude CLI not found on PATH")
 
+    if config.get("task_type") == "pr_review":
+        allowed_tools = ["Bash", "Read", "Glob", "Grep", "WebFetch"]
+    else:
+        allowed_tools = ["Bash", "Read", "Write", "Edit", "Glob", "Grep", "WebFetch"]
+
     options = ClaudeAgentOptions(
         model=config["anthropic_model"],
         system_prompt=system_prompt,
-        allowed_tools=["Bash", "Read", "Write", "Edit", "Glob", "Grep", "WebFetch"],
+        allowed_tools=allowed_tools,
         permission_mode="bypassPermissions",
         cwd=cwd,
         max_turns=config["max_turns"],
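
The tool restriction in this hunk could also be expressed as a small pure function, which makes it easy to unit test. This is a sketch under assumptions: `allowed_tools_for` and the two tuple constants are hypothetical names, and it assumes the order of entries in `allowed_tools` does not matter to the SDK.

```python
# Tools that never modify the working tree through a dedicated file-editing tool.
READ_ONLY_TOOLS = ("Bash", "Read", "Glob", "Grep", "WebFetch")
# File-editing tools, withheld from read-only review tasks.
WRITE_TOOLS = ("Write", "Edit")


def allowed_tools_for(task_type: str) -> list[str]:
    """Return the tool names to allow for a task type (hypothetical helper)."""
    if task_type == "pr_review":
        return list(READ_ONLY_TOOLS)
    return list(READ_ONLY_TOOLS + WRITE_TOOLS)
```

Note that `Bash` stays in the read-only set, as in the diff, so removing `Write` and `Edit` does not by itself prevent file changes made through shell commands; the prompt's read-only rule still carries that part.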
@@ -1849,8 +1864,11 @@ def run_task(
 
             # Post-hooks
             with task_span("task.post_hooks") as post_span:
-                # Safety net: commit any uncommitted tracked changes
-                safety_committed = ensure_committed(setup["repo_dir"])
+                # Safety net: commit any uncommitted tracked changes (skip for read-only tasks)
+                if config.get("task_type") == "pr_review":
+                    safety_committed = False
+                else:
+                    safety_committed = ensure_committed(setup["repo_dir"])
                 post_span.set_attribute("safety_net.committed", safety_committed)
 
                 build_passed = verify_build(setup["repo_dir"])
@@ -1894,8 +1912,13 @@ def run_task(
             # Default True = assume build was green before, so a post-agent
             # failure IS counted as a regression (conservative).
             build_before = setup.get("build_before", True)
-            build_ok = build_passed or not build_before
-            if not build_passed and not build_before:
+            if config.get("task_type") == "pr_review":
+                build_ok = True  # Review task — build status is informational only
+                if not build_passed:
+                    log("INFO", "pr_review: build failed — informational only, not gating")
+            else:
+                build_ok = build_passed or not build_before
+            if not build_passed and not build_before and config.get("task_type") != "pr_review":
                 log(
                     "WARN",
                     "Post-agent build failed, but build was already failing before "
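
The build-gating decision in this hunk reduces to a three-input truth table. The sketch below restates it as a standalone function; `is_build_ok` is a hypothetical name and the function is an illustration of the rule, not code from the PR.

```python
def is_build_ok(task_type: str, build_passed: bool, build_before: bool = True) -> bool:
    """Decide whether the post-agent build result should gate the task (hypothetical helper).

    build_before defaults to True, matching the conservative default in the diff:
    assume the build was green before the agent ran.
    """
    if task_type == "pr_review":
        # Review tasks never change code, so the build result is informational only.
        return True
    # A failure only counts as a regression if the build passed before the agent ran.
    return build_passed or not build_before
```

Keeping the rule in one function would also remove the need for the separate `config.get("task_type") != "pr_review"` guard on the warning branch.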
 
@@ -3,10 +3,12 @@
 from .base import BASE_PROMPT
 from .new_task import NEW_TASK_WORKFLOW
 from .pr_iteration import PR_ITERATION_WORKFLOW
+from .pr_review import PR_REVIEW_WORKFLOW
 
 _PROMPTS = {
     "new_task": BASE_PROMPT.replace("{workflow}", NEW_TASK_WORKFLOW),
     "pr_iteration": BASE_PROMPT.replace("{workflow}", PR_ITERATION_WORKFLOW),
+    "pr_review": BASE_PROMPT.replace("{workflow}", PR_REVIEW_WORKFLOW),
 }
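
The registry pattern above can be sketched end to end as follows. The template and workflow strings are stand-ins, and `get_prompt` is a hypothetical accessor (the diff does not show how `_PROMPTS` is read). One property worth noting: `str.replace` substitutes only the `{workflow}` marker and leaves other braces, such as `{pr_number}`, untouched for a later substitution step, whereas `str.format` would raise `KeyError` on them.

```python
# Stand-in template and workflows; the real ones live in agent/prompts/.
BASE_PROMPT = "You are a coding agent.\n\n{workflow}"
WORKFLOWS = {
    "new_task": "Create a branch, implement the task, open a PR.",
    "pr_iteration": "Address review feedback on PR #{pr_number}.",
    "pr_review": "Review PR #{pr_number} without modifying files.",
}

# Assemble each prompt once at import time, as the diff does.
_PROMPTS = {
    task_type: BASE_PROMPT.replace("{workflow}", workflow)
    for task_type, workflow in WORKFLOWS.items()
}


def get_prompt(task_type: str) -> str:
    """Look up the assembled prompt for a task type (hypothetical accessor)."""
    try:
        return _PROMPTS[task_type]
    except KeyError:
        raise ValueError(f"Unknown task type: {task_type!r}") from None
```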
 
 
 
@@ -0,0 +1,104 @@
+"""Workflow section for pr_review (review an existing PR without modifying code)."""
+
+PR_REVIEW_WORKFLOW = """\
+## Rules override
+
+**This is a READ-ONLY review task.** The base prompt rules about "Full permissions", \
+"modify any files", and "install any dependencies" do NOT apply to this task. You \
+must NOT modify any source code files, configuration files, or project dependencies. \
+Your only outputs are GitHub review comments and a summary comment on the PR. \
+Your tool permissions enforce this: you have access to Bash, Read, Glob, Grep, \
+and WebFetch only — Write and Edit are not available.
+
+## Workflow
+
+You are reviewing pull request #{pr_number} on `{repo_url}`. Your goal is to \
+analyze the changes and post a structured code review using the GitHub Reviews API. \
+You must NOT modify any files — this is a read-only task.
+
+Follow these steps in order:
+
+1. **Understand the PR context**
+   Read the PR title, body, and any existing review or conversation comments. \
+Understand what the PR is trying to achieve and any constraints or requirements \
+mentioned by the author or reviewers.
+
+2. **Analyze the changes**
+   - Read the full source files for every file changed in the PR (not just the diff hunks). \
+Context matters — you need to understand how the changed code fits into the broader \
+file and module.
+   - Check for correctness, edge cases, error handling, security issues, test coverage, \
+performance concerns, and adherence to project conventions.
+   - Run `mise run build` to check whether the PR builds and tests pass. This is for \
+your analysis — the result does NOT gate the review.
+   - If the repository has a CLAUDE.md, CONTRIBUTING.md, or style guide, check \
+adherence to those guidelines.
+
+3. **Leverage repository memory context**
+   If previous knowledge about this repository is available (see "Previous knowledge \
+about this repository" above), use it to inform your review. Reference specific \
+repository conventions, past issues, or known patterns when relevant. When a finding \
+is informed by repository memory, note it in the description.
+
+4. **Compose findings using the structured comment format**
+   For each finding, use this format:
+
+   ```
+   **Type**: <comment | question | issue | good_point>
+   **Severity**: <minor | medium | major | critical>
+   **Title**: <Short descriptive title>
+
+   **Description**: <Detailed explanation of the finding. If informed by repository \
+memory, note: "(Informed by repository memory: <brief attribution>)">
+
+   **Proposed fix**: <If applicable, describe what should change. Omit for question \
+and good_point types.>
+
+   **AI prompt**: <A ready-to-use prompt that an AI coding assistant could use to \
+address this finding. Should be specific enough to act on without additional context. \
+Omit for good_point types.>
+   ```
+
+   Classification guidelines:
+   - `comment` — An observation, suggestion, or non-blocking recommendation.
+   - `question` — Something that needs clarification from the author before \
+the reviewer can form an opinion. Always phrase as a clear question.
+   - `issue` — A defect, bug, or problem that should be fixed. Severity:
+     - `minor` — Style, naming, minor readability concern.
+     - `medium` — Logic issue, missing validation, or test gap that could cause \
+problems in some scenarios.
+     - `major` — Significant bug, security vulnerability, or correctness issue \
+that will likely cause production problems.
+     - `critical` — Data loss, security breach, or crash affecting all users. \
+Must be fixed before merge.
+   - `good_point` — Something done well that is worth highlighting. No severity, \
+proposed fix, or AI prompt needed.
+
+   The `Severity` line should ONLY be present for `issue` type findings.
+
+5. **Post the review via the GitHub Reviews API**
+   Batch ALL findings into a single review submission using the GitHub Reviews API:
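+   ```
+   gh api repos/{repo_url}/pulls/{pr_number}/reviews \\
+     --method POST \\
+     -f event="COMMENT" \\
+     -f body="<review summary>" \\
+     -f 'comments[][path]=<file>' \\
+     -F 'comments[][line]=<line>' \\
+     -f 'comments[][body]=<finding>'
+   ```
+   - Repeat the three `comments[][...]` fields once per line comment so that each \
+comment is sent as its own object; use `-F` for `line` so it is sent as a number.
+   - Use `event: "COMMENT"`; do NOT approve or request changes.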
+   - Place comments on the specific lines they refer to. Use the diff hunks and \
+file contents to determine the correct line numbers.
+   - For findings that are not file-specific (e.g. architecture concerns, missing \
+tests), include them in the review body rather than as line comments.
+
+6. **Post a summary comment on the PR**
+   After submitting the review, add a top-level summary comment:
+   ```
+   gh pr comment {pr_number} --repo {repo_url} --body "<summary>"
+   ```
+   The summary should include:
+   - Total number of findings by type (issues, comments, questions, good points)
+   - A brief assessment of overall PR quality
+   - Key areas that need attention before merge
+   - Build/test results from step 2\
+"""
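
For readers who want to see what a batched review body looks like, here is a minimal sketch that renders findings in the structured format from step 4 and assembles a single JSON payload with `event`, `body`, and `comments` fields. It is an illustration only: `Finding` and `build_review_payload` are hypothetical names, the rendering covers only the type, severity, title, and description lines, and the agent in this PR composes the review itself through the prompt rather than through code like this.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    """One review finding (hypothetical structure mirroring the prompt's format)."""
    type: str                       # comment | question | issue | good_point
    title: str
    description: str
    path: Optional[str] = None      # None means the finding is not file-specific
    line: Optional[int] = None
    severity: Optional[str] = None  # only rendered for "issue" findings

    def render(self) -> str:
        lines = [f"**Type**: {self.type}"]
        if self.type == "issue" and self.severity:
            lines.append(f"**Severity**: {self.severity}")
        lines += [f"**Title**: {self.title}", "", f"**Description**: {self.description}"]
        return "\n".join(lines)


def build_review_payload(summary: str, findings: list[Finding]) -> str:
    """Batch all findings into one review payload serialized as JSON."""
    inline = [f for f in findings if f.path is not None and f.line is not None]
    general = [f for f in findings if f.path is None or f.line is None]
    # Findings without a file location go into the review body, per step 5.
    body = "\n\n".join([summary, *(f.render() for f in general)])
    payload = {
        "event": "COMMENT",  # never approve or request changes
        "body": body,
        "comments": [
            {"path": f.path, "line": f.line, "body": f.render()} for f in inline
        ],
    }
    return json.dumps(payload)
```

A payload built this way could be sent in one request by piping it to `gh api ... --method POST --input -`, which reads the request body from standard input.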