Agent Runtime

The agent runtime container for ABCA. Each agent instance clones a GitHub repo, works on a task using Claude, and delivers a result — a new pull request (new_task), updates to an existing PR (pr_iteration), or structured review comments on a PR (pr_review). Runs as a Docker container with two modes:

Local mode — batch execution via run.sh with AgentCore-matching constraints (2 vCPU, 8 GB RAM)
AgentCore mode — FastAPI server on port 8080 with /invocations and /ping endpoints, deployable to AWS Bedrock AgentCore Runtime

The Docker image is built for linux/arm64 to match AgentCore Runtime requirements.

Prerequisites

Docker (with buildx for ARM64 cross-compilation if on x86)
AWS credentials with Bedrock access (Claude Sonnet)
GitHub fine-grained Personal Access Token

GitHub PAT — Minimal Permissions

Create a fine-grained PAT at GitHub > Settings > Developer settings > Personal access tokens > Fine-grained tokens.

Repository access: Select only the specific repo(s) the agent will work on.

Permission	Access	Reason
Contents	Read and write	`git clone` + `git push`
Pull requests	Read and write	`gh pr create`
Issues	Read	Fetch issue title, body, and comments for context
Metadata	Read	Granted by default

No other permissions are needed.

AWS Credentials

The agent uses Amazon Bedrock for Claude inference. You need credentials with bedrock:InvokeModel and bedrock:InvokeModelWithResponseStream permissions.

Common ways to pass credentials into the container (when using run.sh):

Option A — Environment variables:

export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_SESSION_TOKEN="..."   # if using temporary credentials
export AWS_REGION="us-east-1"

Option B — AWS CLI resolution (recommended for SSO): run.sh runs aws configure export-credentials when the AWS CLI is installed, so you can use aws sso login and optionally AWS_PROFILE without mounting ~/.aws.

Option C — Mount ~/.aws read-only (static access keys in files; SSO often does not work inside the container):

export AWS_PROFILE="my-profile"
export AWS_REGION="us-east-1"

Quick Start (Local Mode)

export GITHUB_TOKEN="ghp_..."
export AWS_REGION="us-east-1"
# Either export keys, or run `aws sso login` (and optionally AWS_PROFILE) and let run.sh resolve credentials
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."

# Run against a GitHub issue
./agent/run.sh "owner/repo" 42

# Run with a task description (no issue)
./agent/run.sh "owner/repo" "Fix the login validation bug"

# Issue + additional instructions
./agent/run.sh "owner/repo" 42 "Focus on the backend validation only"

Local Mode Usage

./agent/run.sh <owner/repo> [issue_or_prompt] [extra_instructions]

The second argument is auto-detected:

If numeric (e.g., 42), it's treated as a GitHub issue number
Otherwise, it's treated as a task description

When an issue number is given, the optional third argument provides additional instructions on top of the issue context.

The run.sh script overrides the container's default CMD to run python /app/entrypoint.py (batch mode) instead of the uvicorn server.

Environment Variables

Variable	Required	Default	Description
`GITHUB_TOKEN`	Yes		Fine-grained PAT (see permissions above)
`AWS_REGION`	Yes		AWS region for Bedrock (e.g., `us-east-1`)
`AWS_ACCESS_KEY_ID`	Conditional†		Explicit keys, if you are not using CLI-based resolution
`AWS_SECRET_ACCESS_KEY`	Conditional†		Explicit keys, if you are not using CLI-based resolution
`AWS_SESSION_TOKEN`	No		For temporary credentials
`AWS_PROFILE`	No		Profile for `aws configure export-credentials` in `run.sh`, or default profile when using the `~/.aws` mount fallback
`ANTHROPIC_MODEL`	No	`us.anthropic.claude-sonnet-4-6`	Bedrock model ID
`MAX_TURNS`	No	`100`	Max agent turns before stopping
`MAX_BUDGET_USD`	No		Local batch only (shell env when running `entrypoint.py` directly). Range 0.01–100; agent stops when the budget is reached. For deployed AgentCore server mode and production tasks, set `max_budget_usd` on task creation (REST API, CLI `--max-budget`, or Blueprint default); the orchestrator sends it in the `/invocations` JSON body — server mode does not read `MAX_BUDGET_USD` from the environment.
`DRY_RUN`	No		Set to `1` to validate config and print the prompt without running the agent
`ANTHROPIC_DEFAULT_HAIKU_MODEL`	No	`anthropic.claude-haiku-4-5-20251001-v1:0`	Bedrock model ID for the pre-flight safety check (see below)

Pre-flight check model: Claude Code runs a quick safety verification using a small Haiku model before executing each tool command. On Bedrock, the default Haiku model ID may not be enabled in your account, causing the check to time out with "Pre-flight check is taking longer than expected" warnings. The agent sets ANTHROPIC_DEFAULT_HAIKU_MODEL to a known-available Bedrock Haiku model ID to avoid this. If you see pre-flight timeout warnings, verify that this model is enabled in your Bedrock model access settings.

† You need valid Bedrock credentials in the container: export keys (Option A), let run.sh inject keys from the AWS CLI after aws sso login or similar (Option B), or mount ~/.aws (Option C). run.sh also sets CLAUDE_CODE_USE_BEDROCK=1 so Claude Code uses Bedrock.

Examples

# Dry run — validate config, fetch issue, print assembled prompt, then exit
DRY_RUN=1 ./agent/run.sh "owner/repo" 42

# Run with a specific model
ANTHROPIC_MODEL="us.anthropic.claude-sonnet-4-6" ./agent/run.sh "owner/repo" 42

# Limit agent to 50 turns
MAX_TURNS=50 ./agent/run.sh "owner/repo" "Add unit tests for the auth module"

# Local batch only — cap cost (production tasks use API max_budget_usd instead)
MAX_BUDGET_USD=5 ./agent/run.sh "owner/repo" "Small refactor"

AgentCore Runtime Mode

When deployed to AgentCore Runtime (or run without CMD override), the container starts a FastAPI server on port 8080.

Container Lifecycle and Isolation

The AgentCore docs state that "each user session receives its own dedicated microVM with isolated compute, memory, and filesystem resources" and that "after session completion, the entire microVM is terminated and memory is sanitized."

To be safe, the agent isolates each task into its own workspace directory:

Task isolation via workspace: Each invocation clones the repo into /workspace/{task_id} (a unique directory per task).
Idle timeout: After ~15 minutes of no invocations, the MicroVM is terminated.
Disk accumulation: The 10 GB disk limit may apply across all invocations within the VM's lifetime.

Endpoints

GET /ping — Health check. Returns {"status": "healthy"}. Stays responsive while the agent runs.

POST /invocations — Accept a task and start the agent in a background thread. The handler returns immediately with an acceptance payload; it does not wait for the agent to finish. While the task runs, progress and the final outcome are written to DynamoDB when TASK_TABLE_NAME is set (see task_state.py); the deployed platform polls that table via the orchestrator. For ad-hoc local testing without DynamoDB, follow docker logs -f bgagent-run (or your container name).

Request payload (representative fields — the API orchestrator sends a fuller object including hydrated GitHub/issue context):

{
  "input": {
    "task_id": "9e285dba622d",
    "repo_url": "owner/repo",
    "prompt": "update the rfc issue template to add a codeowners field",
    "issue_number": "",
    "max_turns": 100,
    "max_budget_usd": 5.0,
    "model_id": "us.anthropic.claude-sonnet-4-6",
    "aws_region": "us-east-1"
  }
}

task_id — Correlates with DynamoDB and logs; if omitted for local experiments, the agent generates a short id.
model_id — Preferred key from the orchestrator; anthropic_model is also accepted.
Optional platform fields (when using the full stack) include hydrated_context, system_prompt_overrides, prompt_version, and memory_id.

All fields in input fall back to container environment variables when omitted. Secrets like GITHUB_TOKEN should be set as runtime environment variables via the CDK stack — not sent in the payload, since AgentCore logs the full request payload in plain text.

Immediate response (acceptance):

{
  "output": {
    "message": {
      "role": "assistant",
      "content": [{"text": "Task accepted: 9e285dba622d"}]
    },
    "result": {
      "status": "accepted",
      "task_id": "9e285dba622d"
    },
    "timestamp": "2026-02-20T01:00:00.000000+00:00"
  }
}

Final metrics (PR URL, cost, turns, build status, etc.) appear in container logs, in DynamoDB when configured, and in the REST API for deployed tasks (GET /v1/tasks/{task_id} via the bgagent CLI or HTTP client).

Testing Server Mode Locally

Use run.sh --server to build and start the server locally. It handles credentials, port mapping, and resource constraints automatically:

# Start server (builds image, resolves AWS creds, exposes :8080)
./agent/run.sh --server "owner/repo"

# Health check
curl http://localhost:8080/ping

# Invoke
curl -X POST http://localhost:8080/invocations \
  -H "Content-Type: application/json" \
  -d '{"input":{"prompt":"Fix the login bug"}}'

The repo URL passed to run.sh is set as a container env var, so it can be omitted from the payload. You can also start the server without a repo and pass it per-request:

./agent/run.sh --server

Invoking via the CLI

Use the bgagent CLI to submit tasks to the deployed agent through the REST API. See cli/ for build instructions.

# Configure the CLI (one-time setup using stack outputs)
bgagent configure \
  --api-url <ApiUrl> --region us-east-1 \
  --user-pool-id <UserPoolId> --client-id <AppClientId>

# Log in
bgagent login --username user@example.com

# Submit with a task description
bgagent submit --repo owner/repo --task "update the rfc issue template"

# Submit with a GitHub issue
bgagent submit --repo owner/repo --issue 42

# Iterate on a PR (address review feedback)
bgagent submit --repo owner/repo --pr 42

# Review a PR (read-only — posts structured review comments)
bgagent submit --repo owner/repo --review-pr 55

# Submit and wait for completion
bgagent submit --repo owner/repo --issue 42 --wait

For the full CLI reference, see the User guide.

Monitoring a Running Agent

The local container runs with a fixed name (bgagent-run). Open a second terminal to monitor it:

# Live agent output (follows logs in real time)
docker logs -f bgagent-run

# CPU, memory, and network usage (updates every second)
docker stats bgagent-run

# Disk usage inside the container (one-off check)
docker exec bgagent-run du -sh /workspace

# Shell into the running container to inspect files
docker exec -it bgagent-run bash

The run.sh script prints these commands when it starts.

What It Does

The agent pipeline (shared by both modes). Behavior varies by task type (new_task, pr_iteration, pr_review):

Config validation — checks required parameters
Context hydration — fetches the GitHub issue (title, body, comments) if an issue number is provided; for pr_iteration and pr_review, fetches PR context (diff, description, review comments)
Prompt assembly — combines the system prompt (behavioral contract, selected by task type from prompts/) with the issue/PR context and task description
Deterministic pre-hooks — clones repo, creates or checks out branch, configures git auth, runs mise trust, mise install, mise run build, and mise run lint
Agent execution — invokes the Claude Agent SDK via the ClaudeSDKClient class (connect/query/receive_response pattern) in unattended mode. The agent:
- Understands the codebase
- new_task: Makes changes, runs tests and linters, commits and pushes after each unit of work, creates a pull request
- pr_iteration: Reads review feedback, addresses it with focused changes, commits and pushes, posts a summary comment on the PR
- pr_review: Analyzes changes read-only (no Write or Edit tools available), composes structured review findings, posts a batch review via the GitHub Reviews API
Deterministic post-hooks — verifies mise run build and mise run lint, ensures a PR exists (creates one if the agent did not). For pr_review, build status is informational only and the commit/push steps are skipped.
Metrics — returns duration, disk usage, turn count, cost, and PR URL

Metrics

After the agent completes, a summary report is printed:

============================================================
METRICS REPORT
============================================================
  status                        : success
  agent_status                  : end_turn
  pr_url                        : https://github.com/owner/repo/pull/3
  build_passed                  : True
  cost_usd                      : 0.3598
  turns                         : 34
  duration_s                    : 312.4
  task_id                       : a1b2c3d4e5f6
  disk_before                   : 0.0 B
  disk_after                    : 487.2 MB
  disk_delta                    : 487.2 MB
============================================================

These map to AgentCore Runtime constraints:

Metric	AgentCore Limit
Docker image size	2 GB
Disk usage (clone + deps + build)	10 GB
Memory	8 GB
CPU	2 vCPU
Duration	8 hours

Building Manually

# Build for ARM64 (AgentCore Runtime target)
docker buildx build --platform linux/arm64 -t bgagent-local --load ./agent

# Check image size
docker images bgagent-local --format "{{.Size}}"

File Structure

agent/
├── Dockerfile           Python 3.13 + Node.js 20 + Claude Code CLI + git + gh + mise (default platform linux/arm64)
├── .dockerignore
├── pyproject.toml       App dependencies (claude-agent-sdk, FastAPI, boto3, OpenTelemetry distro, MCP, …)
├── uv.lock              Locked deps for reproducible `uv sync` in the image
├── mise.toml            Tool versions / tasks used when the target repo relies on mise
├── entrypoint.py        Config, context hydration, ClaudeSDKClient pipeline, metrics, run_task()
├── server.py            FastAPI — async /invocations (background thread) and /ping; OTEL session correlation
├── task_state.py        Best-effort DynamoDB task status (no-op if TASK_TABLE_NAME unset)
├── observability.py     OpenTelemetry helpers (e.g. AgentCore session id)
├── memory.py            Optional memory / episode integration for the agent
├── prompts/             Per-task-type system prompt workflows
│   ├── __init__.py      Prompt registry — assembles base template + workflow for each task type
│   ├── base.py          Shared base template (environment, rules, placeholders)
│   ├── new_task.py      Workflow for new_task (create branch, implement, open PR)
│   ├── pr_iteration.py  Workflow for pr_iteration (read feedback, address, push)
│   └── pr_review.py     Workflow for pr_review (read-only analysis, structured review comments)
├── system_prompt.py     Behavioral contract (PRD Section 11)
├── prepare-commit-msg.sh Git hook (Task-Id / Prompt-Version trailers on commits)
├── run.sh               Build + run helper for local/server mode with AgentCore constraints
├── tests/               pytest unit tests for pure functions and prompt assembly
├── test_sdk_smoke.py    Diagnostic: minimal SDK smoke test (ClaudeSDKClient → CLI → Bedrock)
└── test_subprocess_threading.py  Diagnostic: subprocess-in-background-thread verification

The container CMD runs the app under opentelemetry-instrument with uvicorn using the asyncio event loop (not uvloop), avoiding known subprocess issues with uvloop.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Agent Runtime

Prerequisites

GitHub PAT — Minimal Permissions

AWS Credentials

Quick Start (Local Mode)

Local Mode Usage

Environment Variables

Examples

AgentCore Runtime Mode

Container Lifecycle and Isolation

Endpoints

Testing Server Mode Locally

Invoking via the CLI

Monitoring a Running Agent

What It Does

Metrics

Building Manually

File Structure

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

Agent Runtime

Prerequisites

GitHub PAT — Minimal Permissions

AWS Credentials

Quick Start (Local Mode)

Local Mode Usage

Environment Variables

Examples

AgentCore Runtime Mode

Container Lifecycle and Isolation

Endpoints

Testing Server Mode Locally

Invoking via the CLI

Monitoring a Running Agent

What It Does

Metrics

Building Manually

File Structure