Skip to content

Commit c03e244

Browse files
committed
feat(project): add review pr task
1 parent 9a8b993 commit c03e244

38 files changed

Lines changed: 565 additions & 133 deletions

AGENTS.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -24,7 +24,7 @@ To get started and understand the developer flow, follow the [Developer guide](.
2424
- **`scripts/`** (root) — Optional cross-package helpers; **`scripts/ci-build.sh`** runs the full monorepo build (same as CI).
2525
- **`cdk/`** — CDK app package (`@abca/cdk`): `cdk/src/`, `cdk/test/`, `cdk/cdk.json`, `cdk/tsconfig.json`, `cdk/tsconfig.dev.json`, and `cdk/.eslintrc.json`.
2626
- **`cli/`**`@backgroundagent/cli` — CLI tool for interacting with the deployed REST API (see below).
27-
- **`agent/`** — Python code that runs inside the agent compute environment (entrypoint, server, system prompt, Dockerfile, requirements). The system prompt is refactored into `agent/prompts/` with a shared base template and per-task-type workflow variants (`new_task`, `pr_iteration`).
27+
- **`agent/`** — Python code that runs inside the agent compute environment (entrypoint, server, system prompt, Dockerfile, requirements). The system prompt is refactored into `agent/prompts/` with a shared base template and per-task-type workflow variants (`new_task`, `pr_iteration`, `pr_review`).
2828
- **`docs/`** — Authoritative Markdown in `guides/` (developer, user, roadmap, prompt) and `design/`; assets in `diagrams/`, `imgs/`. The Starlight docs site lives here (`astro.config.mjs`, `package.json`); `src/content/docs/` is refreshed via `docs/scripts/sync-starlight.mjs`.
2929
- **`CONTRIBUTING.md`** — Contribution guidelines at the repository root.
3030
- **`package.json`** (root), **`yarn.lock`** — Yarn workspace root (minimal manifest); dependencies live in **`cdk/`**, **`cli/`**, and **`docs/`** package manifests.
@@ -40,7 +40,7 @@ The `@backgroundagent/cli` package provides the `bgagent` executable for submitt
4040
- `src/api-client.ts` — HTTP client wrapping `fetch` with auth header injection
4141
- `src/auth.ts` — Cognito login, token caching (`~/.bgagent/credentials.json`), auto-refresh
4242
- `src/config.ts` — Read/write `~/.bgagent/config.json`
43-
- `src/types.ts` — API request/response types (mirrored from `cdk/src/handlers/shared/types.ts`), including `TaskType` (`new_task` | `pr_iteration`)
43+
- `src/types.ts` — API request/response types (mirrored from `cdk/src/handlers/shared/types.ts`), including `TaskType` (`new_task` | `pr_iteration` | `pr_review`)
4444
- `src/format.ts` — Output formatting (table, detail view, JSON)
4545
- `src/debug.ts` — Verbose/debug logging (`--verbose` flag)
4646
- `src/errors.ts``CliError` and `ApiError` classes

agent/README.md

Lines changed: 22 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
# Agent Runtime
22

3-
The agent runtime container for ABCA. Each agent instance clones a GitHub repo, works on a task using Claude, and opens a pull request. Runs as a Docker container with two modes:
3+
The agent runtime container for ABCA. Each agent instance clones a GitHub repo, works on a task using Claude, and delivers a result — a new pull request (`new_task`), updates to an existing PR (`pr_iteration`), or structured review comments on a PR (`pr_review`). Runs as a Docker container with two modes:
44

55
- **Local mode** — batch execution via `run.sh` with AgentCore-matching constraints (2 vCPU, 8 GB RAM)
66
- **AgentCore mode** — FastAPI server on port 8080 with `/invocations` and `/ping` endpoints, deployable to AWS Bedrock AgentCore Runtime
@@ -224,6 +224,12 @@ bgagent submit --repo owner/repo --task "update the rfc issue template"
224224
# Submit with a GitHub issue
225225
bgagent submit --repo owner/repo --issue 42
226226

227+
# Iterate on a PR (address review feedback)
228+
bgagent submit --repo owner/repo --pr 42
229+
230+
# Review a PR (read-only — posts structured review comments)
231+
bgagent submit --repo owner/repo --review-pr 55
232+
227233
# Submit and wait for completion
228234
bgagent submit --repo owner/repo --issue 42 --wait
229235
```
@@ -252,18 +258,18 @@ The `run.sh` script prints these commands when it starts.
252258

253259
## What It Does
254260

255-
The agent pipeline (shared by both modes):
261+
The agent pipeline (shared by both modes). Behavior varies by task type (`new_task`, `pr_iteration`, `pr_review`):
256262

257263
1. **Config validation** — checks required parameters
258-
2. **Context hydration** — fetches the GitHub issue (title, body, comments) if an issue number is provided
259-
3. **Prompt assembly** — combines the system prompt (behavioral contract) with the issue context and task description
260-
4. **Deterministic pre-hooks** — clones repo, creates branch, configures git auth, runs `mise trust`, `mise install`, `mise run build`, and `mise run lint`
264+
2. **Context hydration** — fetches the GitHub issue (title, body, comments) if an issue number is provided; for `pr_iteration` and `pr_review`, fetches PR context (diff, description, review comments)
265+
3. **Prompt assembly** — combines the system prompt (behavioral contract, selected by task type from `prompts/`) with the issue/PR context and task description
266+
4. **Deterministic pre-hooks** — clones repo, creates or checks out branch, configures git auth, runs `mise trust`, `mise install`, `mise run build`, and `mise run lint`
261267
5. **Agent execution** — invokes the Claude Agent SDK via the `ClaudeSDKClient` class (connect/query/receive_response pattern) in unattended mode. The agent:
262268
- Understands the codebase
263-
- Makes changes, runs tests and linters
264-
- Commits and pushes after each unit of work
265-
- Creates a pull request with summary, testing notes, and decisions
266-
6. **Deterministic post-hooks** — verifies `mise run build` and `mise run lint`, ensures a PR exists (creates one if the agent did not)
269+
- **`new_task`**: Makes changes, runs tests and linters, commits and pushes after each unit of work, creates a pull request
270+
- **`pr_iteration`**: Reads review feedback, addresses it with focused changes, commits and pushes, posts a summary comment on the PR
271+
- **`pr_review`**: Analyzes changes read-only (no `Write` or `Edit` tools available), composes structured review findings, posts a batch review via the GitHub Reviews API
272+
6. **Deterministic post-hooks** — verifies `mise run build` and `mise run lint`, ensures a PR exists (creates one if the agent did not). For `pr_review`, build status is informational only and the commit/push steps are skipped.
267273
7. **Metrics** — returns duration, disk usage, turn count, cost, and PR URL
268274

269275
## Metrics
@@ -322,9 +328,16 @@ agent/
322328
├── task_state.py Best-effort DynamoDB task status (no-op if TASK_TABLE_NAME unset)
323329
├── observability.py OpenTelemetry helpers (e.g. AgentCore session id)
324330
├── memory.py Optional memory / episode integration for the agent
331+
├── prompts/ Per-task-type system prompt workflows
332+
│ ├── __init__.py Prompt registry — assembles base template + workflow for each task type
333+
│ ├── base.py Shared base template (environment, rules, placeholders)
334+
│ ├── new_task.py Workflow for new_task (create branch, implement, open PR)
335+
│ ├── pr_iteration.py Workflow for pr_iteration (read feedback, address, push)
336+
│ └── pr_review.py Workflow for pr_review (read-only analysis, structured review comments)
325337
├── system_prompt.py Behavioral contract (PRD Section 11)
326338
├── prepare-commit-msg.sh Git hook (Task-Id / Prompt-Version trailers on commits)
327339
├── run.sh Build + run helper for local/server mode with AgentCore constraints
340+
├── tests/ pytest unit tests for pure functions and prompt assembly
328341
├── test_sdk_smoke.py Diagnostic: minimal SDK smoke test (ClaudeSDKClient → CLI → Bedrock)
329342
└── test_subprocess_threading.py Diagnostic: subprocess-in-background-thread verification
330343
```

agent/entrypoint.py

Lines changed: 39 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -40,6 +40,9 @@
4040

4141
AGENT_WORKSPACE = os.environ.get("AGENT_WORKSPACE", "/workspace")
4242

43+
# Task types that operate on an existing pull request.
44+
PR_TASK_TYPES = frozenset(("pr_iteration", "pr_review"))
45+
4346

4447
def resolve_github_token() -> str:
4548
"""Resolve GitHub token from Secrets Manager or environment variable.
@@ -110,9 +113,9 @@ def build_config(
110113
errors.append("github_token is required")
111114
if not config["aws_region"]:
112115
errors.append("aws_region is required for Bedrock")
113-
if config["task_type"] == "pr_iteration":
116+
if config["task_type"] in PR_TASK_TYPES:
114117
if not config["pr_number"]:
115-
errors.append("pr_number is required for pr_iteration task type")
118+
errors.append("pr_number is required for pr_iteration/pr_review task type")
116119
elif not config["issue_number"] and not config["task_description"]:
117120
errors.append("Either issue_number or task_description is required")
118121

@@ -313,7 +316,7 @@ def setup_repo(config: dict) -> dict:
313316
repo_dir = f"{AGENT_WORKSPACE}/{config['task_id']}"
314317
setup: dict[str, str | list[str] | bool] = {"repo_dir": repo_dir, "notes": []}
315318

316-
if config.get("task_type") == "pr_iteration" and config.get("branch_name"):
319+
if config.get("task_type") in PR_TASK_TYPES and config.get("branch_name"):
317320
branch = config["branch_name"]
318321
setup["branch"] = branch
319322
else:
@@ -358,7 +361,7 @@ def setup_repo(config: dict) -> dict:
358361
)
359362

360363
# Branch setup
361-
if config.get("task_type") == "pr_iteration" and config.get("branch_name"):
364+
if config.get("task_type") in PR_TASK_TYPES and config.get("branch_name"):
362365
log("SETUP", f"Checking out existing PR branch: {branch}")
363366
run_cmd(
364367
["git", "fetch", "origin", branch],
@@ -429,8 +432,8 @@ def setup_repo(config: dict) -> dict:
429432
setup["lint_before"] = True
430433

431434
# Detect default branch
432-
# For PR iteration: use base_branch from orchestrator if available
433-
if config.get("task_type") == "pr_iteration" and config.get("base_branch"):
435+
# For PR tasks (pr_iteration, pr_review): use base_branch from orchestrator if available
436+
if config.get("task_type") in PR_TASK_TYPES and config.get("base_branch"):
434437
setup["default_branch"] = config["base_branch"]
435438
else:
436439
setup["default_branch"] = detect_default_branch(config["repo_url"], repo_dir)
@@ -651,6 +654,10 @@ def ensure_pr(
651654
) -> str | None:
652655
"""Check if a PR exists for the branch; if not, create one.
653656
657+
For ``new_task``: creates a new PR if needed.
658+
For ``pr_iteration``: pushes commits, then resolves the existing PR URL.
659+
For ``pr_review``: resolves the existing PR URL without pushing (read-only).
660+
654661
Returns the PR URL, or None if there are no commits beyond the default
655662
branch or PR creation failed. ``build_passed`` and ``lint_passed`` control
656663
the verification status shown in the PR body.
@@ -659,11 +666,14 @@ def ensure_pr(
659666
branch = setup["branch"]
660667
default_branch = setup.get("default_branch", "main")
661668

662-
# PR iteration: skip PR creation — just push and return existing PR URL
663-
if config.get("task_type") == "pr_iteration":
664-
if not ensure_pushed(repo_dir, branch):
665-
log("WARN", "Failed to push commits before resolving PR URL")
666-
log("POST", "PR iteration — returning existing PR URL")
669+
# PR iteration/review: skip PR creation — just resolve existing PR URL
670+
if config.get("task_type") in PR_TASK_TYPES:
671+
if config.get("task_type") == "pr_iteration":
672+
if not ensure_pushed(repo_dir, branch):
673+
log("WARN", "Failed to push commits before resolving PR URL")
674+
else:
675+
log("POST", "pr_review task — skipping push (read-only)")
676+
log("POST", f"{config.get('task_type')} — returning existing PR URL")
667677
result = subprocess.run(
668678
[
669679
"gh",
@@ -1336,10 +1346,15 @@ def _on_stderr(line: str) -> None:
13361346
else:
13371347
log("WARN", "claude CLI not found on PATH")
13381348

1349+
if config.get("task_type") == "pr_review":
1350+
allowed_tools = ["Bash", "Read", "Glob", "Grep", "WebFetch"]
1351+
else:
1352+
allowed_tools = ["Bash", "Read", "Write", "Edit", "Glob", "Grep", "WebFetch"]
1353+
13391354
options = ClaudeAgentOptions(
13401355
model=config["anthropic_model"],
13411356
system_prompt=system_prompt,
1342-
allowed_tools=["Bash", "Read", "Write", "Edit", "Glob", "Grep", "WebFetch"],
1357+
allowed_tools=allowed_tools,
13431358
permission_mode="bypassPermissions",
13441359
cwd=cwd,
13451360
max_turns=config["max_turns"],
@@ -1849,8 +1864,11 @@ def run_task(
18491864

18501865
# Post-hooks
18511866
with task_span("task.post_hooks") as post_span:
1852-
# Safety net: commit any uncommitted tracked changes
1853-
safety_committed = ensure_committed(setup["repo_dir"])
1867+
# Safety net: commit any uncommitted tracked changes (skip for read-only tasks)
1868+
if config.get("task_type") == "pr_review":
1869+
safety_committed = False
1870+
else:
1871+
safety_committed = ensure_committed(setup["repo_dir"])
18541872
post_span.set_attribute("safety_net.committed", safety_committed)
18551873

18561874
build_passed = verify_build(setup["repo_dir"])
@@ -1894,8 +1912,13 @@ def run_task(
18941912
# Default True = assume build was green before, so a post-agent
18951913
# failure IS counted as a regression (conservative).
18961914
build_before = setup.get("build_before", True)
1897-
build_ok = build_passed or not build_before
1898-
if not build_passed and not build_before:
1915+
if config.get("task_type") == "pr_review":
1916+
build_ok = True # Review task — build status is informational only
1917+
if not build_passed:
1918+
log("INFO", "pr_review: build failed — informational only, not gating")
1919+
else:
1920+
build_ok = build_passed or not build_before
1921+
if not build_passed and not build_before and config.get("task_type") != "pr_review":
18991922
log(
19001923
"WARN",
19011924
"Post-agent build failed, but build was already failing before "

agent/prompts/__init__.py

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3,10 +3,12 @@
33
from .base import BASE_PROMPT
44
from .new_task import NEW_TASK_WORKFLOW
55
from .pr_iteration import PR_ITERATION_WORKFLOW
6+
from .pr_review import PR_REVIEW_WORKFLOW
67

78
_PROMPTS = {
89
"new_task": BASE_PROMPT.replace("{workflow}", NEW_TASK_WORKFLOW),
910
"pr_iteration": BASE_PROMPT.replace("{workflow}", PR_ITERATION_WORKFLOW),
11+
"pr_review": BASE_PROMPT.replace("{workflow}", PR_REVIEW_WORKFLOW),
1012
}
1113

1214

agent/prompts/pr_review.py

Lines changed: 104 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,104 @@
1+
"""Workflow section for pr_review (review an existing PR without modifying code)."""
2+
3+
PR_REVIEW_WORKFLOW = """\
4+
## Rules override
5+
6+
**This is a READ-ONLY review task.** The base prompt rules about "Full permissions", \
7+
"modify any files", and "install any dependencies" do NOT apply to this task. You \
8+
must NOT modify any source code files, configuration files, or project dependencies. \
9+
Your only outputs are GitHub review comments and a summary comment on the PR. \
10+
Your tool permissions enforce this: you have access to Bash, Read, Glob, Grep, \
11+
and WebFetch only — Write and Edit are not available.
12+
13+
## Workflow
14+
15+
You are reviewing pull request #{pr_number} on `{repo_url}`. Your goal is to \
16+
analyze the changes and post a structured code review using the GitHub Reviews API. \
17+
You must NOT modify any files — this is a read-only task.
18+
19+
Follow these steps in order:
20+
21+
1. **Understand the PR context**
22+
Read the PR title, body, and any existing review or conversation comments. \
23+
Understand what the PR is trying to achieve and any constraints or requirements \
24+
mentioned by the author or reviewers.
25+
26+
2. **Analyze the changes**
27+
- Read the full source files for every file changed in the PR (not just the diff hunks). \
28+
Context matters — you need to understand how the changed code fits into the broader \
29+
file and module.
30+
- Check for correctness, edge cases, error handling, security issues, test coverage, \
31+
performance concerns, and adherence to project conventions.
32+
- Run `mise run build` to check whether the PR builds and tests pass. This is for \
33+
your analysis — the result does NOT gate the review.
34+
- If the repository has a CLAUDE.md, CONTRIBUTING.md, or style guide, check \
35+
adherence to those guidelines.
36+
37+
3. **Leverage repository memory context**
38+
If previous knowledge about this repository is available (see "Previous knowledge \
39+
about this repository" above), use it to inform your review. Reference specific \
40+
repository conventions, past issues, or known patterns when relevant. When a finding \
41+
is informed by repository memory, note it in the description.
42+
43+
4. **Compose findings using the structured comment format**
44+
For each finding, use this format:
45+
46+
```
47+
**Type**: <comment | question | issue | good_point>
48+
**Severity**: <minor | medium | major | critical>
49+
**Title**: <Short descriptive title>
50+
51+
**Description**: <Detailed explanation of the finding. If informed by repository \
52+
memory, note: "(Informed by repository memory: <brief attribution>)")>
53+
54+
**Proposed fix**: <If applicable, describe what should change. Omit for questions \
55+
and good_point types.>
56+
57+
**AI prompt**: <A ready-to-use prompt that an AI coding assistant could use to \
58+
address this finding. Should be specific enough to act on without additional context. \
59+
Omit for good_point types.>
60+
```
61+
62+
Classification guidelines:
63+
- `comment` — An observation, suggestion, or non-blocking recommendation.
64+
- `question` — Something that needs clarification from the author before \
65+
the reviewer can form an opinion. Always phrase as a clear question.
66+
- `issue` — A defect, bug, or problem that should be fixed. Severity:
67+
- `minor` — Style, naming, minor readability concern.
68+
- `medium` — Logic issue, missing validation, or test gap that could cause \
69+
problems in some scenarios.
70+
- `major` — Significant bug, security vulnerability, or correctness issue \
71+
that will likely cause production problems.
72+
- `critical` — Data loss, security breach, or crash affecting all users. \
73+
Must be fixed before merge.
74+
- `good_point` — Something done well that is worth highlighting. No severity, \
75+
proposed fix, or AI prompt needed.
76+
77+
The `Severity` line should ONLY be present for `issue` type findings.
78+
79+
5. **Post the review via the GitHub Reviews API**
80+
Batch ALL findings into a single review submission using the GitHub Reviews API:
81+
```
82+
gh api repos/{repo_url}/pulls/{pr_number}/reviews \\
83+
--method POST \\
84+
-f event="COMMENT" \\
85+
-f body="<review summary>" \\
86+
-f 'comments[]={{"path":"<file>","line":<line>,"body":"<finding>"}}'
87+
```
88+
- Use `event: "COMMENT"` — do NOT approve or request changes.
89+
- Place comments on the specific lines they refer to. Use the diff hunks and \
90+
file contents to determine the correct line numbers.
91+
- For findings that are not file-specific (e.g. architecture concerns, missing \
92+
tests), include them in the review body rather than as line comments.
93+
94+
6. **Post a summary comment on the PR**
95+
After submitting the review, add a top-level summary comment:
96+
```
97+
gh pr comment {pr_number} --repo {repo_url} --body "<summary>"
98+
```
99+
The summary should include:
100+
- Total number of findings by type (issues, comments, questions, good points)
101+
- A brief assessment of overall PR quality
102+
- Key areas that need attention before merge
103+
- Build/test results from step 2\
104+
"""

0 commit comments

Comments
 (0)