Status: Active design
Branch: feature/interactive-background-agents
Last updated: 2026-04-29 (rev 6)
ABCA runs background coding agents that clone a repo, implement a task, run tests, and open a pull request. Tasks run from minutes to hours inside an isolated cloud runtime. The interaction model is asynchronous by design: users submit a task and move on; the agent works without supervision; results arrive through notifications (Slack / GitHub comment / email) and as pull requests.
This document describes the interactivity surfaces layered on top of that model — how users check in on, steer, and gate running agents without requiring a live connection to the compute substrate.
- Submit: POST /tasks with a repo and task description. Fire-and-forget by default; the CLI returns a task_id and exits.
- Status: bgagent status <id> returns a deterministic, templated snapshot of current state (last milestone, current turn, elapsed time, cost so far). Backed by a Lambda reading TaskEventsTable; no LLM, no hallucination, no agent interruption.
- Watch: bgagent watch <id> polls TaskEventsTable with an adaptive interval (500 ms while events are arriving, backing off to 5 s when idle). The same endpoint backs the foreground-block UX on ask and the HITL approval waits.
- Nudge: bgagent nudge <id> "<text>" writes a row into TaskNudgesTable. The agent reads pending nudges between turns, acknowledges with a nudge_acknowledged milestone event, and integrates the nudge on its next turn.
- Ask: bgagent ask <id> "<question>" (Phase 2) writes a question row. The agent answers at the next between-turns boundary; the answer surfaces as a status_response event. The CLI default is foreground block-and-poll with a spinner; task and answer are both durable if the CLI disconnects.
- Approval gates: Phase 3 Cedar-driven hard gates. The agent emits approval_required, then waits for a decision from bgagent approve / bgagent deny or a Slack button press. Detailed design in CEDAR_HITL_GATES.md.
- Single AgentCore Runtime authenticated via IAM (SigV4) from the orchestrator Lambda. No JWT-authenticated runtime, no direct CLI-to-runtime path.
- Durable event table (TaskEventsTable) is the one source of truth for agent progress. Every reader (CLI, Slack/GitHub/email dispatchers, status Lambda) reads from this table, never from the live agent.
- Polling-only CLI. No SSE, no WebSockets. DDB eventually-consistent reads with an event_id cursor are cheap, reliable, and compute-agnostic.
- Notification plane as first-class. A FanOutConsumer Lambda subscribes to TaskEventsTable DDB Streams and routes per-event-type to per-channel dispatcher Lambdas (Slack, email, GitHub comment). Per-channel defaults ship in v1.
- Agent interaction via the hook mechanism the Claude Agent SDK provides. Nudges, asks, and approvals all use Stop / between-turns hooks; no mechanism outside the SDK's contract is required.
| Rev | Date | Summary |
|---|---|---|
| 6 | 2026-04-29 | Current active design. Async-only interaction model: single runtime, polling-only CLI, notification plane as first-class UX, bgagent status + bgagent watch + bgagent nudge in v1, bgagent ask + Phase 3 Cedar HITL layered on top. |
- Design goals
- Architecture overview
- Components
- Event model
- User interactions
- Notification plane
- Security and trust model
- State machine
- Error handling and observability
- Debug escape hatch
- Architectural decisions
- Implementation phases
- Open questions
- Appendix A — Claude Agent SDK reference
- Appendix B — AgentCore Runtime reference
- Appendix C — Competitive landscape
- Compute-agnostic. Nothing in the interaction surface depends on a specific compute substrate. The agent could run on AgentCore today and ECS tomorrow with no changes to the CLI or notification plane.
- Survive disconnect. Every interaction is durable in DynamoDB. A CLI crash, a closed laptop, or a flaky network never kills a task and never loses a reply.
- Fire-and-forget by default. Users submit and move on. Active observation is opt-in through status / watch.
- No UX choice at submission time. There is exactly one submit command and one observation command. Users do not pick between "resilient" and "live" when they submit.
- Notification as first-class. When the agent needs a human (approval gate, ask response, task completion), it reaches the user through their configured channel — not by hoping the user is watching a terminal.
- Token-by-token live streaming. Users want to know what step the agent is on, not what character it's typing.
- Sub-200 ms interaction latency. Human interaction in an async coding workflow is calibrated to seconds, not milliseconds.
- Transactional undo of agent actions. Tool calls are committed; the agent cannot retroactively revert a filesystem change because a user objected after the fact.
- Pair-programming / co-edit modes. A different product shape.
| Req | Covered by |
|---|---|
| R1. Users don't pick compute or observability at submission | Single submit command; TaskEventsTable is compute-agnostic |
| R2. Fire-and-forget runs independently | Orchestrator path runs without a client connection |
| R3. HITL notification when configured | approval_required event → FanOutConsumer → Slack/email |
| R4. Users can check in + steer any time | bgagent status + bgagent watch + bgagent nudge + (Phase 2) bgagent ask |
| R5. Agent updates source context if configured | FanOutConsumer → GitHub issue-comment dispatcher (edit-in-place) |
┌─────────────────────────────────────────────────────────────────────────┐
│ CLIENT SURFACES │
│ │
│ bgagent CLI Slack bot GitHub webhook Web UI (future) │
│ │ │ │ │ │
│ └─────────────┴────────────────┴────────────────────┘ │
│ │ │
└────────────────────────────────┼────────────────────────────────────────┘
│ REST (Cognito JWT or HMAC webhook)
▼
┌──────────────────────────────────────────────┐
│ API Gateway (v1) │
│ │
│ POST /tasks submit │
│ GET /tasks/{id} status-api │
│ GET /tasks/{id}/events watch │
│ DELETE /tasks/{id} cancel │
│ POST /tasks/{id}/nudge nudge │
│ POST /tasks/{id}/asks ask (P2) │
│ POST /tasks/{id}/approvals approve P3 │
│ POST /webhooks/tasks GH webhook │
└───────────┬──────────────────────────────────┘
│
┌────────────┼───────────────┬───────────────────────┐
▼ ▼ ▼ ▼
SubmitTaskFn CLI-read Fns Nudge/Ask/Approve Webhook Fn
│ (status/events) write Fns │
│ │ │ │
│ async │ read │ write │ async
│ invoke │ │ │ invoke
▼ ▼ ▼ ▼
OrchestratorFn OrchestratorFn
│ │
│ admission check │
│ InvokeAgentRuntime (SigV4) │
▼ ▼
┌─────────────────────────────────────────────────────────────┐
│ AgentCore Runtime — single IAM-authed │
│ (agent container: pipeline, runner, hooks) │
└──┬────────────────┬───────────────┬──────────────────────┬──┘
│ writes │ reads │ reads │ reads
▼ ▼ ▼ ▼
TaskEvents TaskTable TaskNudges TaskApprovals
Table (state) Table Table (P3)
│ ▲
│ DDB Stream (NEW_IMAGE) │
▼ │
FanOutConsumer (router) │
│ │
├─→ SlackDispatchFn ──▶ Slack Web API │
├─→ EmailDispatchFn ──▶ SES │
└─→ GitHubDispatchFn ──▶ GitHub REST (edit-in-place) │
│ │
│ action-button callback │
│ (approve/deny) │
└─────────────────────────┘
Key properties:
- One write path in, one read path out. Every durable agent signal lands in TaskEventsTable (or TaskTable for state transitions). Every consumer reads from there.
- Orchestrator is the only substrate-aware component. Replace InvokeAgentRuntime with ecs:RunTask and the CLI + notification plane don't notice.
- No client holds a live connection to the agent. watch is a polling loop against TaskEventsTable, not a stream from the runtime.
A single AgentCore Runtime, invoked via bedrock-agentcore:InvokeAgentRuntime with SigV4 from the orchestrator Lambda. No JWT authorizer; no direct CLI access.
- Input: task payload from orchestrator (task_id, repo, task_description, optional initial_approvals, optional trace_flag)
- Output: none via the response stream; the runtime is invoked fire-and-forget. All observable state flows through TaskEventsTable and TaskTable.
- Lifecycle: idleRuntimeSessionTimeout and maxLifetime are both set to 8 hours (AgentCore max). A running task holds the session; an idle runtime is evicted by AgentCore.
- Compute substitutability: replacing this with ECS/Fargate is a change confined to the orchestrator + the AgentCore Runtime CDK construct. Nothing else in the system observes the difference.
Durable-execution Lambda that owns the task lifecycle from submission to terminal state.
Responsibilities:
- Admission control: atomic DDB conditional update on UserConcurrencyTable (active_count < max); reject with 429 if over quota.
- State transition: SUBMITTED → HYDRATING → RUNNING → FINALIZING → terminal.
- Invocation: calls InvokeAgentRuntime with SigV4.
- Poll loop: waits for the agent to land a terminal status in TaskTable; enforces heartbeat watchdog; transitions to FAILED if the container dies.
- Finalize: TTL + concurrency decrement + synthesized terminal event.
Hydration (blueprint merge, repo config, PAT retrieval, prompt assembly) is targeted to live inside the agent container at startup, not in the orchestrator. This keeps the orchestrator thin, lets heavy I/O fail inside a durable 8 h runtime rather than a 15 min Lambda, and gives the runtime container the IAM it needs for those reads anyway.
Status (2026-04-30): the rev-6 PR ships with hydration still in the orchestrator Lambda for scope reasons. Moving it is pure architectural relocation with no user-visible delta and a ~2,700-line porting surface (TypeScript → Python with new boto3 clients and a GraphQL GitHub path). Tracked as AD-11 carry-forward in upstream issue #53; the current plan is a hybrid split: keep lightweight preflight in the orchestrator, move heavy I/O hydration to the container. Contract drift during the deferral window is bounded by the SUPPORTED_HYDRATED_CONTEXT_VERSION version gate in agent/src/models.py.
Validates a submission, writes the TaskRecord with status=SUBMITTED, emits a task_created event, and async-invokes OrchestratorFn.
- Single path for all tasks. No execution_mode branching.
- Works identically for CLI-initiated submissions (Cognito JWT) and webhook-initiated submissions (HMAC, after the webhook authorizer).
The durable event spine. PK = task_id, SK = event_id (ULID), TTL enabled, DDB Streams enabled (NEW_IMAGE).
Writers:
- ProgressWriter (inside the agent container): per tool call, per turn, per milestone, cost updates, errors.
- OrchestratorFn: task_created, hydration_*, session_started, task_*, preflight_*, admission_rejected, guardrail_blocked.
- Cancel/reconciler handlers: task_cancelled, task_stranded.
Readers:
- get-task-events Lambda (backs bgagent watch and bgagent events).
- bgagent status Lambda (templated snapshot).
- FanOutConsumer (stream-subscribed; see §6).
Cost profile is negligible: eventually-consistent queries with a cursor return ~0.5 RCU per page. 50 simultaneous watchers polling every 2 seconds is pennies per active hour.
Task state machine: SUBMITTED → HYDRATING → RUNNING → FINALIZING → {COMPLETED, FAILED, CANCELLED, TIMED_OUT} with Phase 3 adding AWAITING_APPROVAL.
Writers: create-task, orchestrator, cancel, agent pipeline (terminal write), reconcilers. Transitions are conditional DDB writes; illegal transitions are rejected.
PK = task_id, SK = nudge_id. A row represents a pending user steering message.
Producer: POST /tasks/{id}/nudge handler (after ownership check, guardrail scan, and rate-limit conditional update).
Consumer: agent between-turns hook reads pending nudges, emits nudge_acknowledged milestone, and injects the nudge text into the next turn via decision: "block".
Phase 3 approval-request spine. Detailed schema in CEDAR_HITL_GATES.md. Semantics summary:
- Agent writes an approval row with the request context.
- Agent transitions RUNNING → AWAITING_APPROVAL and enters a poll loop.
- User responds via REST (POST /tasks/{id}/approvals/{request_id}) or via a Slack button dispatched by the notification plane.
- On decision, agent transitions back to RUNNING; denial reasons are injected as Stop-hook steering on the next turn.
Lambda subscribed to TaskEventsTable DDB Streams (ParallelizationFactor: 1, preserving per-task_id ordering by shard). Reads per-task notification config (from TaskTable metadata or RepoTable defaults), filters events by channel subscription, and invokes per-channel dispatcher Lambdas.
- SlackDispatchFn: posts to configured channel / DM. Includes action buttons for approval_required events.
- EmailDispatchFn: SES.
- GitHubDispatchFn: edits a single GitHub issue comment in place via PATCH /repos/{o}/{r}/issues/comments/{id}. On 404 (comment deleted upstream) falls back to POSTing a fresh comment. Per-task ordering is guaranteed upstream by DDB Stream ParallelizationFactor: 1, so no conditional-request header is needed (and GitHub's REST API does not accept If-Match on this endpoint; see §6.4).
Detailed routing and default filters in §6.
Two scheduled Lambdas that backstop the state machine:
- Stranded-task reconciler (every 5 min): catches tasks stuck in non-terminal states past a unified timeout (STRANDED_TIMEOUT_SECONDS=1200 default). Covers OrchestratorFn async-invoke crashes and container crashes. Transitions stuck tasks to FAILED with a task_stranded event.
- Concurrency reconciler (every 15 min): recomputes active_count per user by querying the UserStatusIndex GSI and corrects drift in UserConcurrencyTable.
Commands:
- submit: fire-and-forget; returns task_id.
- status <id>: templated snapshot.
- watch <id>: adaptive polling loop.
- events <id>: raw event stream (debug).
- nudge <id> "<text>": steer.
- cancel <id>: stop the task.
- ask <id> "<question>" (Phase 2): ask the agent a question.
- approve <id> / deny <id> / pending / policies (Phase 3): HITL.
Authentication: Cognito User Pool ID token in Authorization header for all REST calls. Token caching in ~/.bgagent/credentials.json with auto-refresh.
TaskEventsTable row:
| Type | Emitted by | Meaning |
|---|---|---|
| task_created | SubmitTaskFn | New task accepted |
| hydration_started / hydration_completed | Agent startup | Blueprint + repo config loaded |
| session_started | Orchestrator | AgentCore session established |
| agent_turn | Runner | One model-roundtrip completed; includes turn number, model, thinking preview |
| agent_tool_call | Runner / PreToolUse hook | About to invoke a tool |
| agent_tool_result | Runner / PostToolUse hook | Tool returned |
| agent_milestone | Agent code (pipeline, hooks) | Named checkpoint (repo_cloned, pr_opened, nudge_acknowledged, ...) |
| agent_cost_update | Runner | Cumulative token + dollar cost |
| agent_error | Runner | Handled exception |
| approval_required (P3) | PreToolUse Cedar hook | Cedar policy requires user decision |
| approval_decided (P3) | Approve/Deny Lambda | User responded |
| status_response (P2) | Between-turns hook | Agent answered an ask |
| nudge_acknowledged | Between-turns hook | Agent saw a nudge before incorporating it |
| pr_created | Pipeline | PR opened for the task |
| task_completed / task_failed / task_cancelled / task_stranded | Orchestrator / reconciler | Terminal |
Named milestones (pr_created, nudge_acknowledged, repo_setup_complete, …) are written as agent_milestone events with metadata.milestone carrying the name. The fan-out router unwraps an allowlisted subset (§6.2) so per-channel default filters can target milestone names directly (e.g. GitHub's default set includes pr_created); unlisted milestone names stay wrapped and do not route. The watch CLI renders all milestones regardless of the allowlist.
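The unwrapping rule can be illustrated with a small routing helper. This is a sketch under assumptions: the event field name type, the allowlist contents beyond pr_created, and the function name are all hypothetical; only the metadata.milestone field and the allowlist idea come from this document.

```python
from typing import Optional

# Hypothetical allowlist; this document only confirms pr_created as routable.
ROUTABLE_MILESTONES = frozenset({"pr_created"})


def routing_type(event: dict,
                 allowlist: frozenset = ROUTABLE_MILESTONES) -> Optional[str]:
    """Return the type that channel filters should match against.

    Non-milestone events route under their own type. Milestones route under
    their milestone name only when allowlisted; otherwise they do not route.
    """
    if event.get("type") != "agent_milestone":
        return event.get("type")
    name = event.get("metadata", {}).get("milestone")
    return name if name in allowlist else None
```

A caller would skip dispatch when the helper returns None, which matches the "stay wrapped and do not route" behavior for unlisted milestone names.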
Text fields (thinking, tool input, tool output, error details) are truncated to 200 characters by default to keep event rows small. The --trace flag raises the cap to 4 KB and additionally writes a full trajectory to S3 (see §10).
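A minimal sketch of the truncation rule follows. The function name is hypothetical, and treating "4 KB" as 4096 characters (rather than bytes) is an assumption made to keep the example simple.

```python
DEFAULT_PREVIEW_CHARS = 200
TRACE_PREVIEW_CHARS = 4 * 1024  # assumption: 4 KB counted in characters


def preview(text: str, trace: bool = False) -> str:
    """Cap a text field at the default or trace-mode preview length."""
    limit = TRACE_PREVIEW_CHARS if trace else DEFAULT_PREVIEW_CHARS
    return text if len(text) <= limit else text[:limit]
```

A byte-accurate cap would need to truncate the encoded form and avoid splitting a multi-byte character; that detail is left out here.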
Consumers page TaskEventsTable using event_id as a cursor: KeyConditionExpression: task_id = :id AND event_id > :cursor, ConsistentRead: false (eventually consistent, in line with the ~0.5 RCU-per-page cost profile). ULID sort order is time-monotonic, so lexical comparison gives time ordering.
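The cursor query can be sketched as a kwargs builder for the low-level boto3 DynamoDB client's query call. The helper names, the Limit value, and the consistent_read parameter are assumptions; the default of eventually-consistent reads follows the cost profile described elsewhere in this document.

```python
from typing import Optional


def build_events_query(task_id: str, cursor: Optional[str] = None,
                       limit: int = 100, consistent_read: bool = False) -> dict:
    """Build query kwargs that return events strictly after the cursor."""
    condition = "task_id = :id"
    values = {":id": {"S": task_id}}
    if cursor is not None:
        # ULIDs compare lexically, so "greater than" means "newer than".
        condition += " AND event_id > :cursor"
        values[":cursor"] = {"S": cursor}
    return {
        "TableName": "TaskEventsTable",
        "KeyConditionExpression": condition,
        "ExpressionAttributeValues": values,
        "ConsistentRead": consistent_read,
        "ScanIndexForward": True,  # oldest first
        "Limit": limit,
    }


def next_cursor(items: list, cursor: Optional[str]) -> Optional[str]:
    """Advance the cursor to the last event seen, or keep it on an empty page."""
    return items[-1]["event_id"]["S"] if items else cursor
```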
$ bgagent submit --repo org/repo "fix the auth timeout bug"
task submitted: abc123
Writes the TaskRecord, fires the orchestrator, and returns. The CLI does not wait. A --wait flag is available for scripting (blocks until terminal state, returns non-zero on failure).
Deterministic, templated snapshot. No LLM.
$ bgagent status abc123
Task abc123 — RUNNING (3m 14s elapsed)
Repo: org/repo
Turn: 7 / ~12
Last milestone: nudge_acknowledged (42s ago)
Current: Bash tool call
Cost: $0.18 / budget $2.00
Last event: 2026-04-29T15:30:12Z
Implementation:
- Lambda reads the last N events from TaskEventsTable + current TaskRecord.
- Renders from a fixed template. Never calls an LLM. Never hallucinates.
- Fast (<200 ms P95), free, safe to call repeatedly.
Polling loop against GET /tasks/{id}/events?after=<cursor> with adaptive interval:
- Start at 500 ms.
- If a poll returns ≥1 event, keep at 500 ms.
- If a poll returns 0 events, back off: 1 s, 2 s, 5 s (cap).
- Reset to 500 ms on the next event.
Renders events as they arrive. Exits on terminal status. Cursor is the last event_id seen.
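The interval schedule above can be written as a pure function. This is a sketch, not the CLI's implementation; the function and constant names are hypothetical.

```python
BACKOFF_STEPS = (0.5, 1.0, 2.0, 5.0)  # seconds; the last value is the cap


def next_interval(current: float, events_returned: int) -> float:
    """Pick the next poll delay from the last delay and the last poll's result."""
    if events_returned > 0:
        # Activity resets the loop to the fastest interval.
        return BACKOFF_STEPS[0]
    for step in BACKOFF_STEPS:
        if step > current:
            return step  # empty poll: move one step slower
    return BACKOFF_STEPS[-1]  # already at the cap
```

Keeping the schedule in one function makes it easy to reuse for the ask and approval waits, which poll the same endpoint.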
Cost profile: 50 simultaneous watchers × ~0.5 RCU per empty poll × 5 s intervals when idle ≈ negligible.
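As a worked version of that estimate, the aggregate read rate is watchers times RCU per poll divided by the poll interval. The helper name is hypothetical, and the figures are the ones quoted above; no pricing is assumed.

```python
def read_rate_rcu_per_second(watchers: int, rcu_per_poll: float,
                             interval_s: float) -> float:
    """Aggregate read capacity consumed per second by a set of pollers."""
    return watchers * rcu_per_poll / interval_s


# 50 idle watchers at the 5 s cap, ~0.5 RCU per empty poll.
idle_rate = read_rate_rcu_per_second(50, 0.5, 5.0)
print(idle_rate)  # 5.0
```

So the idle case is about 5 RCU per second across all 50 watchers.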
$ bgagent nudge abc123 "also fix the logging module, separate commit"
nudge queued: nudge_01JX...
Flow:
- CLI POST /tasks/{id}/nudge → rate-limit conditional update + PutItem in TaskNudgesTable.
- Agent's Stop hook fires between turns. Calls nudge_reader.read_pending(task_id), which returns all pending nudges for this task (concatenated into one <user_nudge> block if multiple).
- Hook emits nudge_acknowledged milestone to ProgressWriter before returning to the SDK. User sees this event immediately via watch or Slack.
- Hook returns {"decision": "block", "reason": <formatted_nudge_text>}. The SDK treats this as the start of the next user turn; the agent incorporates the nudge in its response.
- Nudge row is marked consumed via conditional update (consumed_at set only if currently null).
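The hook's return value in the flow above can be sketched as follows. The function name and the exact layout inside the <user_nudge> block are assumptions; only the decision / reason shape and the single-block concatenation come from this document.

```python
from typing import Optional


def build_stop_hook_output(nudges: list) -> Optional[dict]:
    """Turn pending nudge texts into a Stop-hook result, or None if empty."""
    if not nudges:
        return None  # nothing to inject; let the agent stop or continue as usual
    # Multiple nudges are merged into one block so they cost a single turn.
    body = "\n\n".join(text.strip() for text in nudges)
    return {
        "decision": "block",
        "reason": f"<user_nudge>\n{body}\n</user_nudge>",
    }
```

In the described flow, the nudge_acknowledged milestone would be emitted just before this value is returned to the SDK.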
Cost model — honest: the nudge burns one turn from the task's max_turns budget. The acknowledgment rides in the same turn (the combined-turn ack pattern). This is the only mechanism the Claude Agent SDK exposes for injecting user-visible text mid-run; there is no "append to system prompt mid-conversation" API (see Appendix A).
Ask the agent a natural-language question that requires its own reasoning. Always burns a turn. Always has latency (bounded above by the agent's current turn duration, which can be minutes).
CLI default: foreground block-and-poll with a spinner.
$ bgagent ask abc123 "why did you change the retry logic?"
⠋ queued as ask_01JX... — waiting for agent
⠙ agent is running tool: Bash (turn 7/~12) — 42s elapsed
✓ agent responded (1m 14s)
The existing retry used exponential backoff with no jitter, causing thundering
herd under load. Added jitter to spread retries across the window.
Flow:
- CLI POST /tasks/{id}/asks → {ask_id, cursor}.
- CLI polls GET /events?after=<cursor>&type=status_response&correlation_id=<ask_id> with adaptive interval.
- Spinner renders last agent_turn / agent_tool_call so the user sees the agent is alive.
- Agent's between-turns hook reads the pending ask, injects it as a user turn via decision: "block", agent answers, hook emits status_response {ask_id, content, turn}.
- CLI prints the response and exits.
Flags:
- default → foreground block
- --no-wait → returns ask_id immediately; response delivered via Slack/watch
- --timeout N → override default 5 min (hard cap 10 min)
Durability: the ask row lives in DDB regardless of CLI state. If the user Ctrl-Cs or the terminal closes, the ask still executes; the response is retrievable via bgagent asks show <ask_id>, bgagent watch, or Slack.
Rate limit: 1 open ask per task per user (429 otherwise). Forward-compatible with multi-user team scenarios.
HITL approval commands. All flows are REST + DDB; no streaming. Detailed design in CEDAR_HITL_GATES.md. Summary:
- Agent emits approval_required with the tool context.
- Notification plane dispatches the event (Slack with action buttons, email, GitHub).
- User responds via bgagent approve <id>, bgagent deny <id> --reason "…", or Slack button click.
- Agent's poll loop sees the decision and proceeds or deny-steers.
Writes cancellation_requested flag on TaskRecord; agent's between-turns hook checks it and terminates. Agent's PR-short-circuit logic commits partial work before exit.
TaskEventsTable ──DDB Stream──▶ FanOutConsumer
│
│ reads notification config
│ (per-task or per-repo)
│
┌─────────────┼─────────────┐
▼ ▼ ▼
SlackDispatch EmailDispatch GitHubDispatch
│ │ │
Slack Web API SES GitHub REST API
- Single Lambda subscribes to the DDB Stream. Stateless; fails forward into an SQS DLQ on per-event errors.
- ParallelizationFactor: 1 on the event-source mapping → per-task_id shard ordering preserved for free.
- Router reads per-task notification config (channel enablement + event-type filters), then invokes the relevant dispatcher Lambda(s) per event.
- Dispatchers are separate Lambdas so a GitHub API outage doesn't block Slack notifications.
| Channel | Default subscribed events | Opt-in via --verbose |
|---|---|---|
| Slack | task_completed, task_failed, task_cancelled, pr_created, agent_error, approval_required, status_response | adds agent_milestone |
| Email | task_completed, task_failed, approval_required | none |
| GitHub issue comment | pr_created, terminal status (single edit-in-place comment) | none (already minimal) |
Rationale: if Slack pings on every milestone, users mute the bot within days. Default to the minimal set that surfaces decision-requiring events and completion; power users opt into verbose streams.
approval_required events delivered to Slack include Approve / Deny action buttons. On click, Slack invokes an interaction callback Lambda which writes to TaskApprovalsTable via the same POST /approvals path the CLI uses. This gives the common case (reviewer in Slack, not at a terminal) a one-click response path.
A single comment per task, edited in place as the agent progresses (terminal states + pr_created by default).
Concurrency: Per-task_id ordering is guaranteed upstream by DDB Streams on TaskEventsTable with ParallelizationFactor: 1, and the fanout Lambda is the only writer on its own comment, so concurrent edits of the same comment body are not possible — last-writer-wins is safe because there is no concurrent writer to lose to. The dispatcher issues a single PATCH per event (no GET round-trip, no conditional headers). If the comment has been deleted upstream (404), it falls back to POSTing a fresh comment.
Tolerated races (bounded, logged, not silenced):
- Persist failure after successful POST: if the GitHub POST succeeds but the subsequent TaskTable UpdateItem that persists github_comment_id fails non-benignly (DDB throttling, IAM deny, etc.), the next event for the same task re-POSTs a second comment. Bounded to at most one duplicate per task per failure window (the per-invocation cap stops runaway). Logged at ERROR with error_id: FANOUT_GITHUB_PERSIST_FAILED so operators can alarm and reconcile. A sweeper that matches on the bgagent:task-id= marker body prefix is a post-v1 follow-up.
- 404 → POST race between sibling invocations: if the previously-posted comment was deleted upstream and two consecutive fanout invocations independently re-POST before either persists the new id, both POSTs land. The UpdateItem uses ConditionExpression: github_comment_id = :prev so only the first persist wins; the sibling's saveCommentState surfaces a benign ConditionalCheckFailedException at INFO and the sibling's comment survives on GitHub as an orphan (the bgagent: marker makes it reconcilable offline).
- Transient loadTaskForComment failure: if the task record's GetItem fails transiently, routeEvent's Promise.allSettled records the dispatcher as rejected and the batch continues. No write lands. The event is effectively dropped; the next event (e.g. task_completed after pr_created) will render the current task state.
Legacy field: A previous revision persisted github_comment_etag on the TaskRecord. That field is no longer written or read; items that still carry it from earlier deploys are ignored by the DocumentClient (fields not declared on the typed surface pass through untouched). No migration required.
Why not ETag / If-Match: An earlier revision attempted optimistic concurrency via GitHub's ETag and If-Match. In-account validation (PR #52 Scenario 7-extended) proved this does not work: GitHub's REST API rejects conditional-request headers on PATCH /issues/comments/{id} with HTTP 400 "Conditional request headers are not allowed in unsafe requests unless supported by the endpoint". The ETag returned on GET is a cache validator only; the write endpoint does not honor it. Upstream ordering via the DDB-Stream configuration above is sufficient on its own.
Submitted with the task (optional) or resolved from repo defaults:
{
"notifications": {
"slack": { "enabled": true, "channel": "#coding-agents", "events": ["default"] },
"email": { "enabled": true, "events": ["approval_required", "task_failed"] },
"github": { "enabled": true, "events": ["default"] }
}
}

"default" resolves to the v1 per-channel defaults above.
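Resolving "default" could look like the sketch below. The function name is hypothetical, the email defaults are read from the defaults table above, and expanding GitHub's "terminal status" into explicit task_* event names is an assumption.

```python
# Per-channel defaults, transcribed from the defaults table in this document.
CHANNEL_DEFAULTS = {
    "slack": ["task_completed", "task_failed", "task_cancelled", "pr_created",
              "agent_error", "approval_required", "status_response"],
    "email": ["task_completed", "task_failed", "approval_required"],
    # Assumption: "terminal status" expanded to the terminal event types.
    "github": ["pr_created", "task_completed", "task_failed",
               "task_cancelled", "task_stranded"],
}


def resolve_events(channel: str, config: dict) -> set:
    """Expand a channel's configured event list into concrete event types."""
    if not config.get("enabled", False):
        return set()
    resolved = set()
    for name in config.get("events", ["default"]):
        if name == "default":
            resolved.update(CHANNEL_DEFAULTS[channel])
        else:
            resolved.add(name)
    return resolved
```

With the example config, email resolves to exactly approval_required and task_failed, while Slack and GitHub expand to their default sets.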
| Surface | Auth | Notes |
|---|---|---|
| CLI → REST API (all endpoints) | Cognito JWT (ID token) | Managed by User Pool |
| GitHub webhook → POST /webhooks/tasks | HMAC-SHA256 via request authorizer | Shared secret in Secrets Manager |
| OrchestratorFn → AgentCore Runtime | SigV4 (IAM) | Lambda execution role |
| Agent container → AWS APIs (DDB, S3, Bedrock) | SigV4 via runtime's execution role | Scoped per-runtime |
| Slack button → interaction callback | Slack signing secret | Standard Slack pattern |
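For the HMAC-SHA256 row, signature verification can be sketched with the standard library. This assumes a GitHub-style header value of the form sha256=<hex digest>; the function name is hypothetical and the real authorizer may differ.

```python
import hashlib
import hmac


def verify_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Check a 'sha256=<hex>' signature over the raw request body."""
    digest = hmac.new(secret, body, hashlib.sha256).hexdigest()
    expected = "sha256=" + digest
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), signature_header.encode())
```

The body must be the exact bytes received, before any JSON parsing, or the digest will not match.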
- Ownership check: the Lambda verifies user_id (from Cognito claims) matches the task's user_id before accepting the nudge.
- Rate limit: 10 nudges per task per minute (conditional update on a RATE#<task>#MINUTE#<bucket> row).
- Size cap: 2 KB per nudge.
- Guardrail pre-screen: Bedrock guardrail scans nudge text for prompt-injection patterns before persisting.
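The per-minute bucket in the rate-limit rule can be illustrated as follows. Deriving the bucket as epoch seconds divided by 60 is an assumption, and the in-memory dict stands in for the DynamoDB conditional update described above; names are hypothetical.

```python
NUDGES_PER_MINUTE = 10


def rate_limit_key(task_id: str, epoch_seconds: int) -> str:
    """Key for the counter row covering the minute that contains the timestamp."""
    bucket = epoch_seconds // 60  # assumption: bucket = minute number
    return f"RATE#{task_id}#MINUTE#{bucket}"


def allow_nudge(counters: dict, task_id: str, epoch_seconds: int) -> bool:
    """Count a nudge against its minute bucket; refuse once the cap is hit."""
    key = rate_limit_key(task_id, epoch_seconds)
    if counters.get(key, 0) >= NUDGES_PER_MINUTE:
        return False  # caller would return 429
    counters[key] = counters.get(key, 0) + 1
    return True
```

In DynamoDB the check and the increment would be one conditional update so that concurrent nudges cannot both slip under the cap.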
- Ownership check on approve/deny.
- Atomic state transition via TransactWriteItems (approval row + TaskTable status).
- Recent-decision cache (60 s) prevents retry-loop storms.
- Denial reason sanitized by the DenyTaskFn Lambda (Bedrock output scanner) before persisting.
- Previews truncate to 200 chars → low risk of accidental secret capture in common cases.
- Agent-side output scanners redact secrets before calling ProgressWriter.
- --trace flag opts into larger previews + S3 trajectory dumps; S3 objects are written to a user-scoped prefix with short TTL.
SUBMITTED ──▶ HYDRATING ──▶ RUNNING ──▶ FINALIZING ──▶ COMPLETED
│ │ │ │
│ │ │ └──▶ FAILED
│ │ │ └──▶ TIMED_OUT
│ │ └──▶ CANCELLED
│ │ └──▶ AWAITING_APPROVAL (P3)
│ └──▶ FAILED (stranded)
└──▶ FAILED (stranded)
RUNNING ──▶ AWAITING_APPROVAL ──▶ RUNNING (approve or deny-with-steering)
│
├──▶ CANCELLED (explicit cancel)
└──▶ FAILED (stranded reconciler catches abandoned approval)
The AWAITING_APPROVAL state holds the user's concurrency slot (paused but alive). See CEDAR_HITL_GATES.md for full semantics.
- Every transition is a conditional DDB write: #status = :fromStatus.
- Illegal transitions are rejected at the storage layer (not enforced in code).
- The valid-transition table lives in cdk/src/handlers/shared/task-status.ts.
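A conditional transition write might be built as below, shaped for the low-level boto3 DynamoDB client's update_item call. The helper name, the key schema, and the status attribute name behind #status are assumptions; the condition expression is the one quoted above.

```python
def build_transition_update(task_id: str, from_status: str,
                            to_status: str) -> dict:
    """Build update_item kwargs that move a task only from the expected state."""
    return {
        "TableName": "TaskTable",
        "Key": {"task_id": {"S": task_id}},
        "UpdateExpression": "SET #status = :toStatus",
        # The write is rejected unless the stored status still matches.
        "ConditionExpression": "#status = :fromStatus",
        # "status" is a DynamoDB reserved word, hence the name placeholder.
        "ExpressionAttributeNames": {"#status": "status"},
        "ExpressionAttributeValues": {
            ":fromStatus": {"S": from_status},
            ":toStatus": {"S": to_status},
        },
    }
```

If another writer has already moved the task, the call fails with ConditionalCheckFailedException instead of overwriting the newer state.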
| Component | Failure posture | Rationale |
|---|---|---|
| ProgressWriter | Fail-open (3-strike circuit breaker) | Event telemetry must never crash the task |
| Nudge/ask rate-limit conditional update | Fail-closed (return 429) | Accurate throttling is a product guarantee |
| Cedar policy evaluation | Fail-closed (treat as DENY) | Security-critical; unknown outcome = deny |
| Approval poll DDB read | Fail-open with tolerance (10 consecutive failures → TIMED_OUT) | Tolerate transient DDB errors; fail closed on sustained |
| Notification dispatcher | Fail-open (log + DLQ) | A Slack outage must not block the agent |
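The fail-open posture for ProgressWriter can be sketched as a small wrapper. The class name, counting consecutive failures, and never re-closing the breaker are assumptions; this document only states "3-strike circuit breaker".

```python
class FailOpenWriter:
    """Wrap a write callable so telemetry failures never reach the task."""

    def __init__(self, write, max_strikes: int = 3):
        self._write = write
        self._max_strikes = max_strikes
        self._strikes = 0

    @property
    def tripped(self) -> bool:
        """True once the breaker has seen max_strikes consecutive failures."""
        return self._strikes >= self._max_strikes

    def emit(self, event: dict) -> bool:
        """Try to write one event; return whether it was written."""
        if self.tripped:
            return False  # breaker tripped: skip the write entirely
        try:
            self._write(event)
        except Exception:
            self._strikes += 1  # swallow the error; the task keeps running
            return False
        self._strikes = 0  # a success clears the consecutive-failure count
        return True
```

The caller never sees an exception from emit, which is the property the table asks for; the trade-off is that events are silently lost once the breaker trips.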
Every log line, event, and metric carries task_id. CloudWatch Logs Insights queries across all Lambdas on task_id = "abc123" give the full cross-component picture.
Each component emits OTEL traces with task_id as a baggage item. OrchestratorFn starts the root span; AgentCore runtime continues it via env-var propagation; Lambdas downstream of DDB Streams resume from the event's traceparent attribute.
CloudWatch dashboard shows, per task:
- State transitions timeline
- Event rate by type
- Cost accumulation
- Concurrency slot utilization
Currently deferred — no operational notification channel exists for this project beyond Slack/email user-facing notifications. When an ops channel is added (SNS/PagerDuty), the alarm plumbing is a small follow-up; the metric data is already flowing.
Without live streaming, a developer debugging a misbehaving agent needs a richer offline view than the default 200-char event previews. The --trace flag:
$ bgagent submit --trace --repo org/repo "fix the auth bug"
Changes for a trace-enabled task:
- ProgressWriter preview truncation raised from 200 chars → 4 KB.
- Full agent trajectory (SDK message log, tool I/O, hook callbacks) written to S3 on task completion.
- A trajectory_uploaded milestone event with the S3 URI is emitted; the CLI can surface it at the end of watch or status.
Storage:
- S3 prefix: s3://<bucket>/traces/<user_id>/<task_id>.jsonl.gz.
- TTL: 7 days (lifecycle policy).
- Pre-signed URLs available via bgagent trace download <task_id>.
- Reproducible failure modes during development.
- Customer-reported "agent did the wrong thing" incidents.
- Reward-hacking / hallucination audits.
Not intended for routine observability — that's what watch and notifications are for.
Short summaries of the load-bearing choices. Each decision is phrased as the chosen option; rationales are concise.
Exactly one runtime, invoked via SigV4 from the orchestrator. The CLI never talks directly to the runtime.
Why: Compute-substrate portability (ECS/Fargate swap requires only orchestrator changes); simpler auth; one runtime to operate and observe. Direct CLI-to-runtime paths would reintroduce substrate coupling and force a choice between live-stream and durability at submission time.
bgagent watch / bgagent status / bgagent ask all use REST-polling against TaskEventsTable with an adaptive interval. No SSE. No WebSockets.
Why: Human-scale interaction latency (seconds) is well-served by polling; DDB costs are trivial; no streaming infrastructure to build, operate, or secure. Cursor, GitHub Copilot coding agent, and Codex all use the same pattern.
Every durable signal from the agent flows through this table. Every consumer reads from it.
Why: Decouples the agent from every consumer. CLI, Slack bot, GitHub integration, and any future web UI all read the same substrate without touching the runtime.
FanOutConsumer routes events per-channel with sensible defaults shipping in v1.
Why: In an async product, notifications are the primary UX. Shipping without defaults would cause users to mute integrations on day one.
The between-turns hook emits a nudge_acknowledged milestone to ProgressWriter before returning decision: "block" with the nudge text. One turn burned (same as today); acknowledgment visible immediately.
Why: The Claude Agent SDK does not expose a mechanism to append to system context mid-conversation. The HookEvent enum is fixed; ClaudeAgentOptions.system_prompt is construction-time only; hookSpecificOutput.additionalContext is user-visible-only (confirmed not-planned by Anthropic). One-turn-per-nudge is an architectural constraint of the SDK; we surface it honestly rather than pretending it's free.
status = templated Lambda reading TaskEventsTable. ask = a real question to the agent, always costs a turn, always has latency.
Why: Users understand dashboard reads vs. questions-to-a-thinking-entity. One command per contract is clearer than one command with a flag that silently changes execution model.
Default UX blocks with a spinner showing current agent activity. Durable underneath — CLI disconnect does not cancel the ask or lose the answer.
Why: Matches user expectation of a synchronous CLI call. Survives a closed laptop. Spinner surfaces the bounded-but-non-trivial latency (turns can take minutes) without feeling like a hang.
Phase 3 ships hard gates. No soft questions, no "proceed with default if no response" semantics.
Why: Soft-question-with-timeout creates a ticking-clock UX that's actively hostile in an async workflow. "Gate or no gate" is the coherent choice. A future effect: "advise" tier (non-blocking FYI events, no timeout) is documented in the Phase 3 design as post-v1.
DDB Streams on TaskEventsTable with ParallelizationFactor: 1 give per-task_id ordering. The fanout Lambda is the only writer on its own comment, so no concurrent writer exists to race — last-writer-wins is safe. The dispatcher PATCHes directly (no GET-then-PATCH, no conditional headers).
Why: Simpler than SQS FIFO (no queue, no DLQ, no per-group throughput ceiling), and lower latency than a GET-then-PATCH round-trip.
Rejected alternative — If-Match ETag: An earlier revision of this design used optimistic concurrency via GitHub's ETag. Deploy-validation (PR #52 Scenario 7-extended) proved that PATCH /issues/comments/{id} rejects If-Match with HTTP 400 ("Conditional request headers are not allowed in unsafe requests unless supported by the endpoint"). The ETag returned on GET is a cache validator only. Upstream DDB-Stream ordering makes the ETag unnecessary anyway.
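The direct-PATCH path with the 404 → POST fallback can be sketched as follows. The `http` callable is a hypothetical injected transport (method, path, JSON body → status code, response JSON); the two paths are GitHub's REST endpoints for updating and creating an issue comment.

```python
from typing import Callable, Optional, Tuple

HttpCall = Callable[[str, str, dict], Tuple[int, dict]]  # hypothetical transport


def upsert_status_comment(
    http: HttpCall,
    repo: str,  # "owner/name"
    issue_number: int,
    comment_id: Optional[int],
    body: str,
) -> int:
    """Edit the comment in place; recreate it if it was deleted. Returns the live comment id."""
    if comment_id is not None:
        # No GET first and no If-Match header: stream ordering already serializes writers.
        status, _ = http("PATCH", f"/repos/{repo}/issues/comments/{comment_id}", {"body": body})
        if status == 200:
            return comment_id
        if status != 404:
            raise RuntimeError(f"PATCH failed with HTTP {status}")
    # Either no comment exists yet or it returned 404: post a fresh one.
    status, payload = http("POST", f"/repos/{repo}/issues/{issue_number}/comments", {"body": body})
    if status != 201:
        raise RuntimeError(f"POST failed with HTTP {status}")
    return payload["id"]
```

The caller would persist the returned id so the next event for the same task edits the new comment.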
One timeout value covers all stranded cases (orchestrator crash, container crash, general abandonment).
Why: The interactive-specific timeout disappeared along with the interactive path. One reconciler, one threshold, easier to reason about.
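A single-threshold reconciler reduces to one comparison per task. In the sketch below the 30-minute value, the `status` / `last_event_at` field names, and the terminal status set are all hypothetical placeholders.

```python
from datetime import datetime, timedelta
from typing import Iterable, List

STRANDED_AFTER = timedelta(minutes=30)  # hypothetical threshold
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED"}  # hypothetical status names


def find_stranded(
    tasks: Iterable[dict],
    now: datetime,
    threshold: timedelta = STRANDED_AFTER,
) -> List[str]:
    """Return ids of non-terminal tasks whose last event is older than the threshold."""
    return [
        t["task_id"]
        for t in tasks
        if t["status"] not in TERMINAL_STATUSES and now - t["last_event_at"] > threshold
    ]
```

Orchestrator crash, container crash, and abandonment all look the same from here: a non-terminal task that has stopped producing events.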
Blueprint merging, repo config, PAT retrieval, and prompt assembly are targeted for the agent container at startup, not the orchestrator Lambda.
Why: Hydration artifacts (cloned repos, merged blueprints, rendered prompts) are large and only needed inside the runtime. Failures belong inside the durable 8 h runtime rather than a 15 min Lambda. The runtime already has the IAM it needs for those reads. Industry precedent (Cursor background agents, GitHub Copilot coding agent, Devin, Temporal's activity-worker pattern, LangGraph's queue-worker split) converges on worker-side hydration for long-running async agents.
Target shape — hybrid split: keep the cheap preflight in the orchestrator (PAT validity check, repo-existence check, guardrail screen on the raw task_description) so we still fail fast before burning an AgentCore compute slot. Move the heavy I/O hydration (GitHub issue / PR fetch including review threads, prompt assembly, Memory retrieval, S3 blueprint reads) into the agent container.
Status (2026-04-30): deferred to a follow-up PR, tracked at upstream issue #53. Rev-6 ships with full hydration still in the orchestrator Lambda. Reasons: (a) pure architectural relocation with no user-visible change, (b) ~2,700 lines porting surface (1,190 LOC of context-hydration.ts + 1,514 LOC of tests) requiring new boto3 surfaces in the container and a GraphQL GitHub client, (c) PR #52 already ships 10,000+ lines of changes across the SSE removal — folding in hydration would blur the review narrative. The Pydantic SUPPORTED_HYDRATED_CONTEXT_VERSION gate in agent/src/models.py bounds drift risk during the deferral window.
Opt-in per task: 4 KB previews + full trajectory to S3 with TTL.
Why: Without live streaming, debugging needs a richer offline artifact. Opt-in keeps normal-task storage costs flat.
- Single orchestrator path; delete all direct-SSE / two-runtime / interactive-mode infrastructure
- bgagent status (deterministic)
- bgagent watch with adaptive polling interval
- bgagent nudge with combined-turn acknowledgment
- FanOutConsumer router + per-channel default filters
- GitHub edit-in-place dispatcher (DDB-Stream ordering, 404 → POST fallback)
- Stub Slack/email dispatchers (log-only, ready for real integration in Phase 2)
- Unified stranded-task reconciler timeout
- --trace debug flag
- bgagent ask end-to-end (REST, agent-side between-turns hook, foreground block-and-poll CLI, durability-on-disconnect)
- Real Slack dispatcher (webhook + action buttons → approval callback Lambda)
- Per-task notification config + bgagent notifications configure
- Hard-gate approval gates with Cedar policy evaluation
- bgagent approve / deny / pending / policies
- AWAITING_APPROVAL state + orchestrator handling
- Full design in CEDAR_HITL_GATES.md
- Real email dispatcher (SES)
- Real GitHub dispatcher (beyond the v1 edit-in-place stub)
- Per-repo default notification config
- --verbose opt-in for milestone-level events
- Dashboard widgets for notification delivery health
- LLM-synthesized status summary: bgagent ask without targeting the agent; Lambda calls an LLM to narrate state. Cost + hallucination trade-offs; revisit if v1 feedback warrants.
- Cedar effect: "advise" tier (non-blocking FYI policy tier for post-v1). Design sketch in CEDAR_HITL_GATES.md.
- Outbound WebSocket from agent: only if a concrete sub-200 ms latency requirement surfaces. Agent-initiated egress avoids dual-auth problems and works on any compute.
- Multi-user watch: multiple users attached to the same task's live event stream (teams).
| ID | Question | Owner |
|---|---|---|
| Q1 | Retention policy for --trace S3 artifacts: 7 days or 30? Size cap per user? | Design |
| Q2 | Should bgagent pending (Phase 3) show all pending approvals across all of a user's tasks, or filter to a single task_id? | Phase 3 impl |
| Q3 | Slack action button callbacks: Slack signing secret rotation strategy? | Phase 2 impl |
| Q4 | Per-repo default notification config precedence vs per-task overrides: does per-task always win? Partial overrides? | Phase 4 impl |
| Q5 | bgagent ask concurrent limit: do we expose --queue semantics to explicitly enqueue vs 429? | Phase 2 impl |
Pinned version: claude-agent-sdk==0.1.53 (Python).
HookEvent enum: PreToolUse | PostToolUse | PostToolUseFailure | UserPromptSubmit | Stop | SubagentStart | SubagentStop | PreCompact | PermissionRequest | Notification.
Our usage:
- PreToolUse → Cedar policy evaluation (Phase 3), can_use_tool-style allow/deny.
- PostToolUse → output scanner (secret/PII redaction).
- Stop (between-turns) → _cancel_between_turns_hook, _nudge_between_turns_hook, Phase 2 ask hook, Phase 3 approval poll.
Stop hook return values:
- {} → no-op, SDK proceeds to stop or loop.
- {"decision": "block", "reason": "<text>"} → SDK emits reason as a synthetic user turn; agent responds on its next iteration.
This is the only SDK-supported mechanism to inject agent-visible text mid-conversation. Implications:
- Every nudge, ask, and deny-with-steering burns one turn from max_turns.
- No "append to system prompt mid-run" primitive exists. ClaudeAgentOptions.system_prompt is set at construction.
- hookSpecificOutput.additionalContext on PostToolUse appears in docs but does not reach the model's context; Anthropic has confirmed this as not-planned (GitHub issues claude-code#18427, claude-code#19643).
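Because every injection costs a turn, one plausible way to combine the between-turns hooks is to run them in priority order and stop at the first one that blocks. The sketch below is an assumption about composition inside our own code, not a description of how the SDK merges multiple hook results; `StopHook` is a simplified, hypothetical signature.

```python
from typing import Callable, Dict, Sequence

StopHook = Callable[[], Dict[str, str]]  # hypothetical simplified hook signature


def run_between_turns(hooks: Sequence[StopHook]) -> Dict[str, str]:
    """Run hooks in priority order; the first blocking result wins."""
    for hook in hooks:
        result = hook()
        if result.get("decision") == "block":
            # Later hooks are not consulted; their pending work waits for the next boundary.
            return result
    return {}  # nothing to inject: the SDK proceeds to stop or loop
```

With this shape, at most one turn is spent per between-turns boundary regardless of how many nudges, asks, or approvals are queued.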
ClaudeSDKClient.interrupt() cancels the current turn without rolling back prior tool results. It is used in our cancel path along with the cancellation_requested flag on TaskRecord.
- HTTP on port 8080: /invocations (JSON + optional SSE response), /ping (liveness).
- /ping returning "HealthyBusy" signals an active session and prevents idle eviction.
- maxLifetime and idleRuntimeSessionTimeout are both configurable up to 8 hours. We set both to the maximum.
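The liveness decision itself is small. In this sketch the JSON shape (`status`, `time_of_last_update`) and the idle value "Healthy" are assumptions about the runtime contract; this document only states that "HealthyBusy" marks an active session.

```python
import time
from typing import Optional


def ping_payload(active_tasks: int, now: Optional[float] = None) -> dict:
    """Body for GET /ping: report busy while any task is still running."""
    status = "HealthyBusy" if active_tasks > 0 else "Healthy"  # "Healthy" is assumed
    timestamp = int(now if now is not None else time.time())
    return {"status": status, "time_of_last_update": timestamp}  # field names assumed
```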
bedrock-agentcore:InvokeAgentRuntime — SigV4-authenticated API call from the orchestrator. Payload is the task context; response is ignored (fire-and-forget).
Same runtimeSessionId routes to the same MicroVM within the same runtime ARN. We use this property for the agent's own internal resumability (re-invocation with the same session ID lands on the same container if it's still alive), but never for CLI→runtime direct attach (which we don't do).
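A sketch of the fire-and-forget invocation with a session ID derived deterministically from the task, so a re-invocation for the same task targets the same session. The `client` is injected (in production it would be `boto3.client("bedrock-agentcore")`); the hashing scheme and the helper names are hypothetical, and hashing is used here on the assumption that the service enforces a minimum session-ID length.

```python
import hashlib
import json


def session_id_for(task_id: str) -> str:
    """Derive a stable session id from the task id (hypothetical scheme)."""
    return "task-" + hashlib.sha256(task_id.encode("utf-8")).hexdigest()


def start_task(client, runtime_arn: str, task_id: str, task_context: dict) -> str:
    """Invoke the runtime and ignore the response body (fire-and-forget)."""
    session_id = session_id_for(task_id)
    client.invoke_agent_runtime(
        agentRuntimeArn=runtime_arn,
        runtimeSessionId=session_id,
        payload=json.dumps(task_context).encode("utf-8"),
    )
    return session_id
```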
Products surveyed for interaction patterns (primary sources: product docs, engineering blogs):
| Product | Interaction model | Notes |
|---|---|---|
| Devin (Cognition) | Slack-thread chat during execution; fully async notifications | Closest analog; mid-run Q&A via in-thread messages is a shipped feature |
| GitHub Copilot coding agent | Fire-and-forget; progress visible as commits/PR activity | No mid-run steering; notifications via GitHub itself |
| OpenAI Codex (cloud) | SSE in web UI; external view is polling; no mid-run course-correction | Explicitly documents inability to steer mid-run |
| Replit Agent | Task board UI; user checks progress; no live terminal stream | Novel: automated "Decision-Time Guidance" (internal classifier-driven steering) |
| Cursor background agents | Pure fire-and-forget; user manually checks state | No built-in completion notifications (open feature request) |
Key observations:
- Fire-and-forget + notifications is the dominant pattern for long-running coding agents.
- Mid-run steering exists only where there's a persistent conversation surface (Devin's Slack thread); our bgagent nudge + bgagent ask is the equivalent.
- No product ships "proceed with default if no response" for approval gates. Hard gates or no gates: that's the shipped landscape.
- Polling-based observation is ubiquitous and well-tolerated at minute-to-hour task durations.
```jsonc
{
  "task_id": "abc123",                 // PK
  "event_id": "01JXY...",              // SK, ULID (time-sortable)
  "event_type": "agent_tool_call",
  "timestamp": "2026-04-29T15:30:12Z",
  "ttl": 1735689600,
  "metadata": {
    "tool_name": "Bash",
    "tool_input_preview": "pytest tests/ -x",  // ≤200 chars by default; 4KB with --trace
    "turn": 7,
    "...": "..."
  }
}
```
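The preview limit in the schema above can be sketched as a small truncation helper. Treating "4KB" as 4,096 characters rather than bytes, and marking truncation with a trailing "...", are simplifying assumptions; `tool_call_metadata` is a hypothetical helper name.

```python
DEFAULT_PREVIEW_CHARS = 200
TRACE_PREVIEW_CHARS = 4096  # assumption: "4KB" counted in characters, not bytes


def preview(text: str, trace: bool = False) -> str:
    """Truncate tool input to the preview limit, marking truncation with '...'."""
    limit = TRACE_PREVIEW_CHARS if trace else DEFAULT_PREVIEW_CHARS
    if len(text) <= limit:
        return text
    return text[: limit - 3] + "..."


def tool_call_metadata(tool_name: str, tool_input: str, turn: int, trace: bool = False) -> dict:
    """Build the metadata map for an agent_tool_call event."""
    return {
        "tool_name": tool_name,
        "tool_input_preview": preview(tool_input, trace),
        "turn": turn,
    }
```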