4 changes: 2 additions & 2 deletions AGENTS.md
@@ -24,7 +24,7 @@ To get started and understand the developer flow, follow the [Developer guide](.
- **`scripts/`** (root) — Optional cross-package helpers; **`scripts/ci-build.sh`** runs the full monorepo build (same as CI).
- **`cdk/`** — CDK app package (`@abca/cdk`): `cdk/src/`, `cdk/test/`, `cdk/cdk.json`, `cdk/tsconfig.json`, `cdk/tsconfig.dev.json`, and `cdk/.eslintrc.json`.
- **`cli/`** — `@backgroundagent/cli` — CLI tool for interacting with the deployed REST API (see below).
- **`agent/`** — Python code that runs inside the agent compute environment (entrypoint, server, system prompt, Dockerfile, requirements).
- **`agent/`** — Python code that runs inside the agent compute environment (entrypoint, server, system prompt, Dockerfile, requirements). The system prompt is refactored into `agent/prompts/` with a shared base template and per-task-type workflow variants (`new_task`, `pr_iteration`, `pr_review`).
- **`docs/`** — Authoritative Markdown in `guides/` (developer, user, roadmap, prompt) and `design/`; assets in `diagrams/`, `imgs/`. The Starlight docs site lives here (`astro.config.mjs`, `package.json`); `src/content/docs/` is refreshed via `docs/scripts/sync-starlight.mjs`.
- **`CONTRIBUTING.md`** — Contribution guidelines at the repository root.
- **`package.json`** (root), **`yarn.lock`** — Yarn workspace root (minimal manifest); dependencies live in **`cdk/`**, **`cli/`**, and **`docs/`** package manifests.
@@ -40,7 +40,7 @@ The `@backgroundagent/cli` package provides the `bgagent` executable for submitt
- `src/api-client.ts` — HTTP client wrapping `fetch` with auth header injection
- `src/auth.ts` — Cognito login, token caching (`~/.bgagent/credentials.json`), auto-refresh
- `src/config.ts` — Read/write `~/.bgagent/config.json`
- `src/types.ts` — API request/response types (mirrored from `cdk/src/handlers/shared/types.ts`)
- `src/types.ts` — API request/response types (mirrored from `cdk/src/handlers/shared/types.ts`), including `TaskType` (`new_task` | `pr_iteration` | `pr_review`)
- `src/format.ts` — Output formatting (table, detail view, JSON)
- `src/debug.ts` — Verbose/debug logging (`--verbose` flag)
- `src/errors.ts` — `CliError` and `ApiError` classes
1 change: 1 addition & 0 deletions agent/Dockerfile
@@ -51,6 +51,7 @@ RUN uv sync --frozen --no-dev --directory /app
# Copy agent code (ARG busts cache so file edits are always picked up)
ARG CACHE_BUST=0
COPY entrypoint.py system_prompt.py server.py task_state.py observability.py memory.py /app/
COPY prompts/ /app/prompts/
COPY prepare-commit-msg.sh /app/
COPY test_sdk_smoke.py test_subprocess_threading.py /app/

31 changes: 22 additions & 9 deletions agent/README.md
@@ -1,6 +1,6 @@
# Agent Runtime

The agent runtime container for ABCA. Each agent instance clones a GitHub repo, works on a task using Claude, and opens a pull request. Runs as a Docker container with two modes:
The agent runtime container for ABCA. Each agent instance clones a GitHub repo, works on a task using Claude, and delivers a result — a new pull request (`new_task`), updates to an existing PR (`pr_iteration`), or structured review comments on a PR (`pr_review`). Runs as a Docker container with two modes:

- **Local mode** — batch execution via `run.sh` with AgentCore-matching constraints (2 vCPU, 8 GB RAM)
- **AgentCore mode** — FastAPI server on port 8080 with `/invocations` and `/ping` endpoints, deployable to AWS Bedrock AgentCore Runtime
@@ -224,6 +224,12 @@ bgagent submit --repo owner/repo --task "update the rfc issue template"
# Submit with a GitHub issue
bgagent submit --repo owner/repo --issue 42

# Iterate on a PR (address review feedback)
bgagent submit --repo owner/repo --pr 42

# Review a PR (read-only — posts structured review comments)
bgagent submit --repo owner/repo --review-pr 55

# Submit and wait for completion
bgagent submit --repo owner/repo --issue 42 --wait
```
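
The PR-based submissions above need a PR number, while the other forms need an issue number or a task description. A minimal Python sketch of that input rule, modeled on the checks this change adds to `build_config` in `agent/entrypoint.py`. The standalone helper `validate_task_inputs` is a hypothetical name used only for illustration; the real checks live inline in `build_config`.

```python
# Task types that operate on an existing pull request (as in entrypoint.py).
PR_TASK_TYPES = frozenset(("pr_iteration", "pr_review"))


def validate_task_inputs(
    task_type: str = "new_task",
    issue_number: str = "",
    task_description: str = "",
    pr_number: str = "",
) -> list:
    """Hypothetical helper: return validation errors (empty list when valid)."""
    errors = []
    if task_type in PR_TASK_TYPES:
        # PR tasks are anchored to an existing PR, so a PR number is mandatory.
        if not pr_number:
            errors.append("pr_number is required for pr_iteration/pr_review task type")
    elif not issue_number and not task_description:
        # Other tasks need something to work from: an issue or a description.
        errors.append("Either issue_number or task_description is required")
    return errors


if __name__ == "__main__":
    print(validate_task_inputs("pr_review"))
    print(validate_task_inputs("pr_iteration", pr_number="42"))
    print(validate_task_inputs("new_task", issue_number="42"))
```

Under this rule a PR task passes validation with no issue number and no task description, which matches the `--pr` and `--review-pr` examples above.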
@@ -252,18 +258,18 @@ The `run.sh` script prints these commands when it starts.

## What It Does

The agent pipeline (shared by both modes):
The agent pipeline is shared by both modes; behavior varies by task type (`new_task`, `pr_iteration`, `pr_review`):

1. **Config validation** — checks required parameters
2. **Context hydration** — fetches the GitHub issue (title, body, comments) if an issue number is provided
3. **Prompt assembly** — combines the system prompt (behavioral contract) with the issue context and task description
4. **Deterministic pre-hooks** — clones repo, creates branch, configures git auth, runs `mise trust`, `mise install`, `mise run build`, and `mise run lint`
2. **Context hydration** — fetches the GitHub issue (title, body, comments) if an issue number is provided; for `pr_iteration` and `pr_review`, fetches PR context (diff, description, review comments)
3. **Prompt assembly** — combines the system prompt (behavioral contract, selected by task type from `prompts/`) with the issue/PR context and task description
4. **Deterministic pre-hooks** — clones repo, creates or checks out branch, configures git auth, runs `mise trust`, `mise install`, `mise run build`, and `mise run lint`
5. **Agent execution** — invokes the Claude Agent SDK via the `ClaudeSDKClient` class (connect/query/receive_response pattern) in unattended mode. The agent:
- Understands the codebase
- Makes changes, runs tests and linters
- Commits and pushes after each unit of work
- Creates a pull request with summary, testing notes, and decisions
6. **Deterministic post-hooks** — verifies `mise run build` and `mise run lint`, ensures a PR exists (creates one if the agent did not)
- **`new_task`**: Makes changes, runs tests and linters, commits and pushes after each unit of work, creates a pull request
- **`pr_iteration`**: Reads review feedback, addresses it with focused changes, commits and pushes, posts a summary comment on the PR
- **`pr_review`**: Analyzes changes read-only (no `Write` or `Edit` tools available), composes structured review findings, posts a batch review via the GitHub Reviews API
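6. **Deterministic post-hooks** — verifies `mise run build` and `mise run lint`, and ensures a PR exists. For `new_task`, a PR is created if the agent did not open one; for `pr_iteration`, remaining commits are pushed and the existing PR URL is resolved; for `pr_review`, the existing PR URL is resolved without committing or pushing, and build status is informational only.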
7. **Metrics** — returns duration, disk usage, turn count, cost, and PR URL
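
Steps 5 and 6 differ by task type mainly in which tools the agent may use and whether a failed build gates the result. The sketch below illustrates both rules as they appear in `entrypoint.py` in this change. The helper names `allowed_tools_for` and `build_gate` are hypothetical; in the source the same logic is written inline.

```python
# Tool lists copied from the change to entrypoint.py.
FULL_TOOLS = ["Bash", "Read", "Write", "Edit", "Glob", "Grep", "WebFetch"]
REVIEW_TOOLS = ["Bash", "Read", "Glob", "Grep", "WebFetch"]  # no Write/Edit


def allowed_tools_for(task_type: str) -> list:
    """Hypothetical helper: pick the tool list passed to the agent options."""
    if task_type == "pr_review":
        return list(REVIEW_TOOLS)
    return list(FULL_TOOLS)


def build_gate(task_type: str, build_passed: bool, build_before: bool = True) -> bool:
    """Hypothetical helper: decide whether the build result counts as OK.

    For pr_review the build status is informational only. For other task
    types a failure only counts when the build was green before the agent ran.
    """
    if task_type == "pr_review":
        return True
    return build_passed or not build_before


if __name__ == "__main__":
    print(allowed_tools_for("pr_review"))
    print(build_gate("new_task", build_passed=False, build_before=True))
    print(build_gate("pr_review", build_passed=False, build_before=True))
```

Note that `Bash` stays in the `pr_review` list, so the read-only behavior rests on omitting `Write` and `Edit` together with the review prompt and the skipped commit and push steps.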

## Metrics
@@ -322,9 +328,16 @@ agent/
├── task_state.py Best-effort DynamoDB task status (no-op if TASK_TABLE_NAME unset)
├── observability.py OpenTelemetry helpers (e.g. AgentCore session id)
├── memory.py Optional memory / episode integration for the agent
├── prompts/ Per-task-type system prompt workflows
│ ├── __init__.py Prompt registry — assembles base template + workflow for each task type
│ ├── base.py Shared base template (environment, rules, placeholders)
│ ├── new_task.py Workflow for new_task (create branch, implement, open PR)
│ ├── pr_iteration.py Workflow for pr_iteration (read feedback, address, push)
│ └── pr_review.py Workflow for pr_review (read-only analysis, structured review comments)
├── system_prompt.py Behavioral contract (PRD Section 11)
├── prepare-commit-msg.sh Git hook (Task-Id / Prompt-Version trailers on commits)
├── run.sh Build + run helper for local/server mode with AgentCore constraints
├── tests/ pytest unit tests for pure functions and prompt assembly
├── test_sdk_smoke.py Diagnostic: minimal SDK smoke test (ClaudeSDKClient → CLI → Bedrock)
└── test_subprocess_threading.py Diagnostic: subprocess-in-background-thread verification
```
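
The `prompts/` package listed above assembles a shared base template with a per-task-type workflow. A minimal sketch of how such a registry could work, assuming a `{workflow}` placeholder in the base template; the template strings, the `{workflow}` placeholder, and the `render` helper are illustrative assumptions, not the contents of the real files. The `ValueError` on an unknown task type mirrors the fallback handling in `_build_system_prompt`.

```python
# Illustrative stand-ins for prompts/base.py and the per-task workflow modules.
BASE_TEMPLATE = "Repository: {repo_url}\nTask: {task_id}\n\n{workflow}"

WORKFLOWS = {
    "new_task": "Create a branch, implement the change, and open a pull request.",
    "pr_iteration": "Address review feedback on PR #{pr_number}, then push.",
    "pr_review": "Review PR #{pr_number} read-only and post structured comments.",
}


def get_system_prompt(task_type: str) -> str:
    """Combine the base template with the workflow for one task type."""
    if task_type not in WORKFLOWS:
        raise ValueError(f"Unknown task_type: {task_type!r}")
    return BASE_TEMPLATE.replace("{workflow}", WORKFLOWS[task_type])


def render(template: str, values: dict) -> str:
    """Hypothetical helper: substitute {name} placeholders by plain replace."""
    for name, value in values.items():
        template = template.replace("{" + name + "}", str(value))
    return template


if __name__ == "__main__":
    prompt = get_system_prompt("pr_iteration")
    print(render(prompt, {"repo_url": "owner/repo", "task_id": "t1", "pr_number": 42}))
```

Plain `str.replace` is used instead of `str.format` so that literal braces elsewhere in a prompt do not raise errors, which is the same substitution style `_build_system_prompt` uses for `{repo_url}`, `{task_id}`, and `{pr_number}`.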
147 changes: 127 additions & 20 deletions agent/entrypoint.py
@@ -31,6 +31,7 @@
import memory as agent_memory
import task_state
from observability import task_span
from prompts import get_system_prompt
from system_prompt import SYSTEM_PROMPT

# ---------------------------------------------------------------------------
@@ -39,6 +40,9 @@

AGENT_WORKSPACE = os.environ.get("AGENT_WORKSPACE", "/workspace")

# Task types that operate on an existing pull request.
PR_TASK_TYPES = frozenset(("pr_iteration", "pr_review"))


def resolve_github_token() -> str:
"""Resolve GitHub token from Secrets Manager or environment variable.
@@ -77,6 +81,9 @@ def build_config(
dry_run: bool = False,
task_id: str = "",
system_prompt_overrides: str = "",
task_type: str = "new_task",
branch_name: str = "",
pr_number: str = "",
) -> dict:
"""Build and validate configuration from explicit parameters.

@@ -94,6 +101,9 @@ def build_config(
"max_turns": max_turns,
"max_budget_usd": max_budget_usd,
"system_prompt_overrides": system_prompt_overrides,
"task_type": task_type,
"branch_name": branch_name,
"pr_number": pr_number,
}

errors = []
@@ -103,7 +113,10 @@ def build_config(
errors.append("github_token is required")
if not config["aws_region"]:
errors.append("aws_region is required for Bedrock")
if not config["issue_number"] and not config["task_description"]:
if config["task_type"] in PR_TASK_TYPES:
if not config["pr_number"]:
errors.append("pr_number is required for pr_iteration/pr_review task type")
elif not config["issue_number"] and not config["task_description"]:
errors.append("Either issue_number or task_description is required")

if errors:
@@ -303,15 +316,19 @@ def setup_repo(config: dict) -> dict:
repo_dir = f"{AGENT_WORKSPACE}/{config['task_id']}"
setup: dict[str, str | list[str] | bool] = {"repo_dir": repo_dir, "notes": []}

# Derive branch slug from issue title or task description
title = ""
if config.get("issue"):
title = config["issue"]["title"]
if not title:
title = config["task_description"]
slug = slugify(title)
branch = f"bgagent/{config['task_id']}/{slug}"
setup["branch"] = branch
if config.get("task_type") in PR_TASK_TYPES and config.get("branch_name"):
branch = config["branch_name"]
setup["branch"] = branch
else:
# Derive branch slug from issue title or task description
title = ""
if config.get("issue"):
title = config["issue"]["title"]
if not title:
title = config["task_description"]
slug = slugify(title)
branch = f"bgagent/{config['task_id']}/{slug}"
setup["branch"] = branch

# Mark the repo directory as safe for git. On persistent session storage
# the mount may be owned by a different UID than the container user,
@@ -343,9 +360,22 @@ def setup_repo(config: dict) -> dict:
cwd=repo_dir,
)

# Create branch
log("SETUP", f"Creating branch: {branch}")
run_cmd(["git", "checkout", "-b", branch], label="create-branch", cwd=repo_dir)
# Branch setup
if config.get("task_type") in PR_TASK_TYPES and config.get("branch_name"):
log("SETUP", f"Checking out existing PR branch: {branch}")
run_cmd(
["git", "fetch", "origin", branch],
label="fetch-pr-branch",
cwd=repo_dir,
)
run_cmd(
["git", "checkout", "-b", branch, f"origin/{branch}"],
label="checkout-pr-branch",
cwd=repo_dir,
)
else:
log("SETUP", f"Creating branch: {branch}")
run_cmd(["git", "checkout", "-b", branch], label="create-branch", cwd=repo_dir)

# Trust mise config files in the cloned repo (required before mise install)
run_cmd(
@@ -402,7 +432,11 @@ def setup_repo(config: dict) -> dict:
setup["lint_before"] = True

# Detect default branch
setup["default_branch"] = detect_default_branch(config["repo_url"], repo_dir)
# For PR tasks (pr_iteration, pr_review): use base_branch from orchestrator if available
if config.get("task_type") in PR_TASK_TYPES and config.get("base_branch"):
setup["default_branch"] = config["base_branch"]
else:
setup["default_branch"] = detect_default_branch(config["repo_url"], repo_dir)

# Install prepare-commit-msg hook for code attribution
_install_commit_hook(repo_dir)
@@ -620,6 +654,10 @@ def ensure_pr(
) -> str | None:
"""Check if a PR exists for the branch; if not, create one.

For ``new_task``: creates a new PR if needed.
For ``pr_iteration``: pushes commits, then resolves the existing PR URL.
For ``pr_review``: resolves the existing PR URL without pushing (read-only).

Returns the PR URL, or None if there are no commits beyond the default
branch or PR creation failed. ``build_passed`` and ``lint_passed`` control
the verification status shown in the PR body.
@@ -628,6 +666,40 @@ def ensure_pr(
branch = setup["branch"]
default_branch = setup.get("default_branch", "main")

# PR iteration/review: skip PR creation — just resolve existing PR URL
if config.get("task_type") in PR_TASK_TYPES:
if config.get("task_type") == "pr_iteration":
if not ensure_pushed(repo_dir, branch):
log("WARN", "Failed to push commits before resolving PR URL")
else:
log("POST", "pr_review task — skipping push (read-only)")
log("POST", f"{config.get('task_type')} — returning existing PR URL")
result = subprocess.run(
[
"gh",
"pr",
"view",
branch,
"--repo",
config["repo_url"],
"--json",
"url",
"-q",
".url",
],
cwd=repo_dir,
capture_output=True,
text=True,
timeout=60,
)
if result.returncode == 0 and result.stdout.strip():
pr_url = result.stdout.strip()
log("POST", f"Existing PR: {pr_url}")
return pr_url
stderr_msg = result.stderr.strip() if result.stderr else "(no stderr)"
log("WARN", f"Could not resolve existing PR URL (rc={result.returncode}): {stderr_msg}")
return None

# Check if the agent already created a PR for this branch
log("POST", "Checking for existing PR...")
result = subprocess.run(
@@ -1274,10 +1346,15 @@ def _on_stderr(line: str) -> None:
else:
log("WARN", "claude CLI not found on PATH")

if config.get("task_type") == "pr_review":
allowed_tools = ["Bash", "Read", "Glob", "Grep", "WebFetch"]
else:
allowed_tools = ["Bash", "Read", "Write", "Edit", "Glob", "Grep", "WebFetch"]

options = ClaudeAgentOptions(
model=config["anthropic_model"],
system_prompt=system_prompt,
allowed_tools=["Bash", "Read", "Write", "Edit", "Glob", "Grep", "WebFetch"],
allowed_tools=allowed_tools,
permission_mode="bypassPermissions",
cwd=cwd,
max_turns=config["max_turns"],
@@ -1482,7 +1559,13 @@ def _build_system_prompt(
overrides: str,
) -> str:
"""Assemble the system prompt with task-specific values and memory context."""
system_prompt = SYSTEM_PROMPT.replace("{repo_url}", config["repo_url"])
task_type = config.get("task_type", "new_task")
try:
system_prompt = get_system_prompt(task_type)
except ValueError:
log("ERROR", f"Unknown task_type {task_type!r} — falling back to default system prompt")
system_prompt = SYSTEM_PROMPT
system_prompt = system_prompt.replace("{repo_url}", config["repo_url"])
system_prompt = system_prompt.replace("{task_id}", config["task_id"])
system_prompt = system_prompt.replace("{workspace}", AGENT_WORKSPACE)
system_prompt = system_prompt.replace("{branch_name}", setup["branch"])
@@ -1513,6 +1596,14 @@ def _build_system_prompt(
memory_context_text = "\n".join(mc_parts)
system_prompt = system_prompt.replace("{memory_context}", memory_context_text)

# Substitute PR-specific placeholders
pr_number_val = config.get("pr_number", "")
if pr_number_val:
system_prompt = system_prompt.replace("{pr_number}", str(pr_number_val))
elif "{pr_number}" in system_prompt:
log("WARN", "System prompt contains {pr_number} placeholder but no pr_number in config")
system_prompt = system_prompt.replace("{pr_number}", "(unknown)")

# Append Blueprint system_prompt_overrides after all placeholder
# substitutions (avoids double-substitution if overrides contain
# template placeholders like {repo_url}).
@@ -1628,6 +1719,9 @@ def run_task(
system_prompt_overrides: str = "",
prompt_version: str = "",
memory_id: str = "",
task_type: str = "new_task",
branch_name: str = "",
pr_number: str = "",
) -> dict:
"""Run the full agent pipeline and return a result dict.

@@ -1652,6 +1746,9 @@ def run_task(
aws_region=aws_region,
task_id=task_id,
system_prompt_overrides=system_prompt_overrides,
task_type=task_type,
branch_name=branch_name,
pr_number=pr_number,
)

log("TASK", f"Task ID: {config['task_id']}")
@@ -1678,6 +1775,8 @@ def run_task(
prompt = hydrated_context["user_prompt"]
if hydrated_context.get("issue"):
config["issue"] = hydrated_context["issue"]
if hydrated_context.get("resolved_base_branch"):
config["base_branch"] = hydrated_context["resolved_base_branch"]
if hydrated_context.get("truncated"):
log("WARN", "Context was truncated by orchestrator token budget")
else:
@@ -1765,8 +1864,11 @@ def run_task(

# Post-hooks
with task_span("task.post_hooks") as post_span:
# Safety net: commit any uncommitted tracked changes
safety_committed = ensure_committed(setup["repo_dir"])
# Safety net: commit any uncommitted tracked changes (skip for read-only tasks)
if config.get("task_type") == "pr_review":
safety_committed = False
else:
safety_committed = ensure_committed(setup["repo_dir"])
post_span.set_attribute("safety_net.committed", safety_committed)

build_passed = verify_build(setup["repo_dir"])
@@ -1810,8 +1912,13 @@ def run_task(
# Default True = assume build was green before, so a post-agent
# failure IS counted as a regression (conservative).
build_before = setup.get("build_before", True)
build_ok = build_passed or not build_before
if not build_passed and not build_before:
if config.get("task_type") == "pr_review":
build_ok = True # Review task — build status is informational only
if not build_passed:
log("INFO", "pr_review: build failed — informational only, not gating")
else:
build_ok = build_passed or not build_before
if not build_passed and not build_before and config.get("task_type") != "pr_review":
log(
"WARN",
"Post-agent build failed, but build was already failing before "