Skip to content

Latest commit

 

History

History
345 lines (256 loc) · 16.2 KB

File metadata and controls

345 lines (256 loc) · 16.2 KB

Agent Runtime

The agent runtime container for ABCA. Each agent instance clones a GitHub repo, works on a task using Claude, and delivers a result — a new pull request (new_task), updates to an existing PR (pr_iteration), or structured review comments on a PR (pr_review). Runs as a Docker container with two modes:

  • Local mode — batch execution via run.sh with AgentCore-matching constraints (2 vCPU, 8 GB RAM)
  • AgentCore mode — FastAPI server on port 8080 with /invocations and /ping endpoints, deployable to AWS Bedrock AgentCore Runtime

The Docker image is built for linux/arm64 to match AgentCore Runtime requirements.

Prerequisites

  • Docker (with buildx for ARM64 cross-compilation if on x86)
  • AWS credentials with Bedrock access (Claude Sonnet)
  • GitHub fine-grained Personal Access Token

GitHub PAT — Minimal Permissions

Create a fine-grained PAT at GitHub > Settings > Developer settings > Personal access tokens > Fine-grained tokens.

Repository access: Select only the specific repo(s) the agent will work on.

Permission Access Reason
Contents Read and write git clone + git push
Pull requests Read and write gh pr create
Issues Read Fetch issue title, body, and comments for context
Metadata Read Granted by default

No other permissions are needed.

AWS Credentials

The agent uses Amazon Bedrock for Claude inference. You need credentials with bedrock:InvokeModel and bedrock:InvokeModelWithResponseStream permissions.

Common ways to pass credentials into the container (when using run.sh):

Option A — Environment variables:

export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."
export AWS_SESSION_TOKEN="..."   # if using temporary credentials
export AWS_REGION="us-east-1"

Option B — AWS CLI resolution (recommended for SSO): run.sh runs aws configure export-credentials when the AWS CLI is installed, so you can use aws sso login and optionally AWS_PROFILE without mounting ~/.aws.

Option C — Mount ~/.aws read-only (static access keys in files; SSO often does not work inside the container):

export AWS_PROFILE="my-profile"
export AWS_REGION="us-east-1"

Quick Start (Local Mode)

export GITHUB_TOKEN="ghp_..."
export AWS_REGION="us-east-1"
# Either export keys, or run `aws sso login` (and optionally AWS_PROFILE) and let run.sh resolve credentials
export AWS_ACCESS_KEY_ID="..."
export AWS_SECRET_ACCESS_KEY="..."

# Run against a GitHub issue
./agent/run.sh "owner/repo" 42

# Run with a task description (no issue)
./agent/run.sh "owner/repo" "Fix the login validation bug"

# Issue + additional instructions
./agent/run.sh "owner/repo" 42 "Focus on the backend validation only"

Local Mode Usage

./agent/run.sh <owner/repo> [issue_or_prompt] [extra_instructions]

The second argument is auto-detected:

  • If numeric (e.g., 42), it's treated as a GitHub issue number
  • Otherwise, it's treated as a task description

When an issue number is given, the optional third argument provides additional instructions on top of the issue context.

The run.sh script overrides the container's default CMD to run python /app/entrypoint.py (batch mode) instead of the uvicorn server.

Environment Variables

Variable Required Default Description
GITHUB_TOKEN Yes Fine-grained PAT (see permissions above)
AWS_REGION Yes AWS region for Bedrock (e.g., us-east-1)
AWS_ACCESS_KEY_ID Conditional† Explicit keys, if you are not using CLI-based resolution
AWS_SECRET_ACCESS_KEY Conditional† Explicit keys, if you are not using CLI-based resolution
AWS_SESSION_TOKEN No For temporary credentials
AWS_PROFILE No Profile for aws configure export-credentials in run.sh, or default profile when using the ~/.aws mount fallback
ANTHROPIC_MODEL No us.anthropic.claude-sonnet-4-6 Bedrock model ID
MAX_TURNS No 100 Max agent turns before stopping
MAX_BUDGET_USD No Local batch only (shell env when running entrypoint.py directly). Range 0.01–100; agent stops when the budget is reached. For deployed AgentCore server mode and production tasks, set max_budget_usd on task creation (REST API, CLI --max-budget, or Blueprint default); the orchestrator sends it in the /invocations JSON body — server mode does not read MAX_BUDGET_USD from the environment.
DRY_RUN No Set to 1 to validate config and print the prompt without running the agent
ANTHROPIC_DEFAULT_HAIKU_MODEL No anthropic.claude-haiku-4-5-20251001-v1:0 Bedrock model ID for the pre-flight safety check (see below)

Pre-flight check model: Claude Code runs a quick safety verification using a small Haiku model before executing each tool command. On Bedrock, the default Haiku model ID may not be enabled in your account, causing the check to time out with "Pre-flight check is taking longer than expected" warnings. The agent sets ANTHROPIC_DEFAULT_HAIKU_MODEL to a known-available Bedrock Haiku model ID to avoid this. If you see pre-flight timeout warnings, verify that this model is enabled in your Bedrock model access settings.

† You need valid Bedrock credentials in the container: export keys (Option A), let run.sh inject keys from the AWS CLI after aws sso login or similar (Option B), or mount ~/.aws (Option C). run.sh also sets CLAUDE_CODE_USE_BEDROCK=1 so Claude Code uses Bedrock.

Examples

# Dry run — validate config, fetch issue, print assembled prompt, then exit
DRY_RUN=1 ./agent/run.sh "owner/repo" 42

# Run with a specific model
ANTHROPIC_MODEL="us.anthropic.claude-sonnet-4-6" ./agent/run.sh "owner/repo" 42

# Limit agent to 50 turns
MAX_TURNS=50 ./agent/run.sh "owner/repo" "Add unit tests for the auth module"

# Local batch only — cap cost (production tasks use API max_budget_usd instead)
MAX_BUDGET_USD=5 ./agent/run.sh "owner/repo" "Small refactor"

AgentCore Runtime Mode

When deployed to AgentCore Runtime (or run without CMD override), the container starts a FastAPI server on port 8080.

Container Lifecycle and Isolation

The AgentCore docs state that "each user session receives its own dedicated microVM with isolated compute, memory, and filesystem resources" and that "after session completion, the entire microVM is terminated and memory is sanitized."

To be safe, the agent isolates each task into its own workspace directory:

  • Task isolation via workspace: Each invocation clones the repo into /workspace/{task_id} (a unique directory per task).
  • Idle timeout: After ~15 minutes of no invocations, the MicroVM is terminated.
  • Disk accumulation: The 10 GB disk limit may apply across all invocations within the VM's lifetime.

Endpoints

GET /ping — Health check. Returns {"status": "healthy"}. Stays responsive while the agent runs.

POST /invocations — Accept a task and start the agent in a background thread. The handler returns immediately with an acceptance payload; it does not wait for the agent to finish. While the task runs, progress and the final outcome are written to DynamoDB when TASK_TABLE_NAME is set (see task_state.py); the deployed platform polls that table via the orchestrator. For ad-hoc local testing without DynamoDB, follow docker logs -f bgagent-run (or your container name).

Request payload (representative fields — the API orchestrator sends a fuller object including hydrated GitHub/issue context):

{
  "input": {
    "task_id": "9e285dba622d",
    "repo_url": "owner/repo",
    "prompt": "update the rfc issue template to add a codeowners field",
    "issue_number": "",
    "max_turns": 100,
    "max_budget_usd": 5.0,
    "model_id": "us.anthropic.claude-sonnet-4-6",
    "aws_region": "us-east-1"
  }
}
  • task_id — Correlates with DynamoDB and logs; if omitted for local experiments, the agent generates a short id.
  • model_id — Preferred key from the orchestrator; anthropic_model is also accepted.
  • Optional platform fields (when using the full stack) include hydrated_context, system_prompt_overrides, prompt_version, and memory_id.

All fields in input fall back to container environment variables when omitted. Secrets like GITHUB_TOKEN should be set as runtime environment variables via the CDK stack — not sent in the payload, since AgentCore logs the full request payload in plain text.

Immediate response (acceptance):

{
  "output": {
    "message": {
      "role": "assistant",
      "content": [{"text": "Task accepted: 9e285dba622d"}]
    },
    "result": {
      "status": "accepted",
      "task_id": "9e285dba622d"
    },
    "timestamp": "2026-02-20T01:00:00.000000+00:00"
  }
}

Final metrics (PR URL, cost, turns, build status, etc.) appear in container logs, in DynamoDB when configured, and in the REST API for deployed tasks (GET /v1/tasks/{task_id} via the bgagent CLI or HTTP client).

Testing Server Mode Locally

Use run.sh --server to build and start the server locally. It handles credentials, port mapping, and resource constraints automatically:

# Start server (builds image, resolves AWS creds, exposes :8080)
./agent/run.sh --server "owner/repo"

# Health check
curl http://localhost:8080/ping

# Invoke
curl -X POST http://localhost:8080/invocations \
  -H "Content-Type: application/json" \
  -d '{"input":{"prompt":"Fix the login bug"}}'

The repo URL passed to run.sh is set as a container env var, so it can be omitted from the payload. You can also start the server without a repo and pass it per-request:

./agent/run.sh --server

Invoking via the CLI

Use the bgagent CLI to submit tasks to the deployed agent through the REST API. See cli/ for build instructions.

# Configure the CLI (one-time setup using stack outputs)
bgagent configure \
  --api-url <ApiUrl> --region us-east-1 \
  --user-pool-id <UserPoolId> --client-id <AppClientId>

# Log in
bgagent login --username user@example.com

# Submit with a task description
bgagent submit --repo owner/repo --task "update the rfc issue template"

# Submit with a GitHub issue
bgagent submit --repo owner/repo --issue 42

# Iterate on a PR (address review feedback)
bgagent submit --repo owner/repo --pr 42

# Review a PR (read-only — posts structured review comments)
bgagent submit --repo owner/repo --review-pr 55

# Submit and wait for completion
bgagent submit --repo owner/repo --issue 42 --wait

For the full CLI reference, see the User guide.

Monitoring a Running Agent

The local container runs with a fixed name (bgagent-run). Open a second terminal to monitor it:

# Live agent output (follows logs in real time)
docker logs -f bgagent-run

# CPU, memory, and network usage (updates every second)
docker stats bgagent-run

# Disk usage inside the container (one-off check)
docker exec bgagent-run du -sh /workspace

# Shell into the running container to inspect files
docker exec -it bgagent-run bash

The run.sh script prints these commands when it starts.

What It Does

The agent pipeline (shared by both modes). Behavior varies by task type (new_task, pr_iteration, pr_review):

  1. Config validation — checks required parameters
  2. Context hydration — fetches the GitHub issue (title, body, comments) if an issue number is provided; for pr_iteration and pr_review, fetches PR context (diff, description, review comments)
  3. Prompt assembly — combines the system prompt (behavioral contract, selected by task type from prompts/) with the issue/PR context and task description
  4. Deterministic pre-hooks — clones repo, creates or checks out branch, configures git auth, runs mise trust, mise install, mise run build, and mise run lint
  5. Agent execution — invokes the Claude Agent SDK via the ClaudeSDKClient class (connect/query/receive_response pattern) in unattended mode. The agent:
    • Understands the codebase
    • new_task: Makes changes, runs tests and linters, commits and pushes after each unit of work, creates a pull request
    • pr_iteration: Reads review feedback, addresses it with focused changes, commits and pushes, posts a summary comment on the PR
    • pr_review: Analyzes changes read-only (no Write or Edit tools available), composes structured review findings, posts a batch review via the GitHub Reviews API
  6. Deterministic post-hooks — verifies mise run build and mise run lint, ensures a PR exists (creates one if the agent did not). For pr_review, build status is informational only and the commit/push steps are skipped.
  7. Metrics — returns duration, disk usage, turn count, cost, and PR URL

Metrics

After the agent completes, a summary report is printed:

============================================================
METRICS REPORT
============================================================
  status                        : success
  agent_status                  : end_turn
  pr_url                        : https://github.com/owner/repo/pull/3
  build_passed                  : True
  cost_usd                      : 0.3598
  turns                         : 34
  duration_s                    : 312.4
  task_id                       : a1b2c3d4e5f6
  disk_before                   : 0.0 B
  disk_after                    : 487.2 MB
  disk_delta                    : 487.2 MB
============================================================

These map to AgentCore Runtime constraints:

Metric AgentCore Limit
Docker image size 2 GB
Disk usage (clone + deps + build) 10 GB
Memory 8 GB
CPU 2 vCPU
Duration 8 hours

Building Manually

# Build for ARM64 (AgentCore Runtime target)
docker buildx build --platform linux/arm64 -t bgagent-local --load ./agent

# Check image size
docker images bgagent-local --format "{{.Size}}"

File Structure

agent/
├── Dockerfile           Python 3.13 + Node.js 20 + Claude Code CLI + git + gh + mise (default platform linux/arm64)
├── .dockerignore
├── pyproject.toml       App dependencies (claude-agent-sdk, FastAPI, boto3, OpenTelemetry distro, MCP, …)
├── uv.lock              Locked deps for reproducible `uv sync` in the image
├── mise.toml            Tool versions / tasks used when the target repo relies on mise
├── entrypoint.py        Config, context hydration, ClaudeSDKClient pipeline, metrics, run_task()
├── server.py            FastAPI — async /invocations (background thread) and /ping; OTEL session correlation
├── task_state.py        Best-effort DynamoDB task status (no-op if TASK_TABLE_NAME unset)
├── observability.py     OpenTelemetry helpers (e.g. AgentCore session id)
├── memory.py            Optional memory / episode integration for the agent
├── prompts/             Per-task-type system prompt workflows
│   ├── __init__.py      Prompt registry — assembles base template + workflow for each task type
│   ├── base.py          Shared base template (environment, rules, placeholders)
│   ├── new_task.py      Workflow for new_task (create branch, implement, open PR)
│   ├── pr_iteration.py  Workflow for pr_iteration (read feedback, address, push)
│   └── pr_review.py     Workflow for pr_review (read-only analysis, structured review comments)
├── system_prompt.py     Behavioral contract (PRD Section 11)
├── prepare-commit-msg.sh Git hook (Task-Id / Prompt-Version trailers on commits)
├── run.sh               Build + run helper for local/server mode with AgentCore constraints
├── tests/               pytest unit tests for pure functions and prompt assembly
├── test_sdk_smoke.py    Diagnostic: minimal SDK smoke test (ClaudeSDKClient → CLI → Bedrock)
└── test_subprocess_threading.py  Diagnostic: subprocess-in-background-thread verification

The container CMD runs the app under opentelemetry-instrument with uvicorn using the asyncio event loop (not uvloop), avoiding known subprocess issues with uvloop.