Concurrency control - Per-user limits (default 3) with atomic admission and automated drift reconciliation
Stranded task reconciler - Scheduled Lambda detects tasks stuck in SUBMITTED, HYDRATING, or AWAITING_APPROVAL and drives them to failure with proper cleanup
Idempotency - Idempotency-Key header on POST requests (24-hour TTL)
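The Idempotency-Key behaviour listed above can be illustrated with a small in-memory sketch. Only the header name and the 24-hour TTL come from this page; the `IdempotencyStore` class, its `run_once` method, and the dictionary backing it are hypothetical stand-ins, and the real service presumably uses a persistent store.

```python
import time

TTL_SECONDS = 24 * 60 * 60  # 24-hour TTL, as stated above


class IdempotencyStore:
    """Hypothetical in-memory stand-in for a TTL-backed idempotency table."""

    def __init__(self, clock=time.time):
        self._clock = clock
        self._entries = {}  # key -> (expires_at, cached_response)

    def run_once(self, key, handler):
        """Return the cached response for `key`, or call `handler` and cache it."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # replayed request inside the TTL window
        response = handler()
        self._entries[key] = (now + TTL_SECONDS, response)
        return response
```

A repeated POST with the same key inside the window gets the first response back without creating a second task; after the TTL expires the key is treated as new.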
Task types
Workflow-driven tasks - Task types are declarative, versioned workflow files (agent/workflows/**) interpreted by an agent-side step runner, not hardcoded task_type branches. Selected via workflow_ref (the task_type enum is removed). New task types are authored as YAML + registered step handlers, not core-code changes (ADR-014, WORKFLOWS.md)
coding/new-task-v1 - Branch, implement, build/test, open PR
coding/pr-review-v1 - Read-only structured code review via GitHub Reviews API (no Write/Edit tools)
Repo-less (knowledge) workflows - requires_repo:false workflows run end-to-end with no GitHub repo: hydrate_context → run_agent → deliver_artifact, delivering the agent's result to S3 (artifacts/{task_id}/) surfaced on TaskDetail.artifact_uri. Ships a reference knowledge/web-research-v1 workflow; memory keys on user:{user_id}
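The step-runner idea behind workflow-driven tasks can be sketched roughly as follows. The step names (`hydrate_context`, `run_agent`, `deliver_artifact`), `requires_repo`, and the `artifacts/{task_id}/` prefix come from the list above; the registry, the decorator, the handler bodies, the bucket name, and the file name are all hypothetical, and the real workflows are YAML files rather than Python dictionaries.

```python
from typing import Callable, Dict

# Hypothetical registry mapping step names to handlers.
STEP_HANDLERS: Dict[str, Callable[[dict], dict]] = {}


def register_step(name: str):
    """Register a handler under a step name (illustrative only)."""
    def decorator(fn):
        STEP_HANDLERS[name] = fn
        return fn
    return decorator


@register_step("hydrate_context")
def hydrate_context(ctx: dict) -> dict:
    return {**ctx, "context": f"memory for user:{ctx['user_id']}"}


@register_step("run_agent")
def run_agent(ctx: dict) -> dict:
    return {**ctx, "result": "research summary"}


@register_step("deliver_artifact")
def deliver_artifact(ctx: dict) -> dict:
    # Bucket and file name are made up; only the artifacts/{task_id}/ prefix is from the text.
    uri = f"s3://example-bucket/artifacts/{ctx['task_id']}/result.md"
    return {**ctx, "artifact_uri": uri}


def run_workflow(workflow: dict, ctx: dict) -> dict:
    """Interpret a declarative step list by dispatching to registered handlers."""
    for step in workflow["steps"]:
        handler = STEP_HANDLERS.get(step)
        if handler is None:
            raise KeyError(f"no handler registered for step {step!r}")
        ctx = handler(ctx)
    return ctx


# Stand-in for a parsed repo-less workflow definition.
WEB_RESEARCH = {
    "requires_repo": False,
    "steps": ["hydrate_context", "run_agent", "deliver_artifact"],
}
```

The point of the design is that adding a task type means adding a definition and, if needed, a handler, while `run_workflow` itself stays unchanged.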
Onboarding and customization
Blueprint construct - Per-repo CDK configuration (model, max turns, prompt overrides, egress allowlist, GitHub token, Cedar policies, approval gate cap)
Output screening - Regex-based secret/PII scanner with PostToolUse hook redaction
Content sanitization - HTML stripping, injection pattern neutralization, control character removal
Cedar policy engine and HITL gates - Tool-call governance (allow / hard-deny / soft-deny requiring approval) with fail-closed default, per-repo Cedar policies, submit-time initial_approvals, AWAITING_APPROVAL state, bgagent approve / deny / pending / policies, and REST approval APIs. Stranded approvals in AWAITING_APPROVAL are cleared by the stranded-task reconciler. See CEDAR_HITL_GATES.md
Per-session IAM scoping - Agent assumes a per-task SessionRole via sts:AssumeRole with session tags {user_id, repo, task_id} and refreshable credentials (1-hour role-chaining cap; tasks up to 8 h). Tenant-data DynamoDB access uses dynamodb:LeadingKeys = ${aws:PrincipalTag/task_id}; S3 traces/attachments use a ${aws:PrincipalTag/user_id} prefix. Bedrock model invocation still uses the compute role (see Bedrock IAM session-tag attribution under What's next). See SECURITY.md
Model invocation logging - Full prompt/response audit trail (90-day retention)
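For the per-session IAM scoping item above, a policy along these lines would express the two conditions it names. This is a sketch under assumptions: the action lists, the `traces/` key layout, and both function names are illustrative; only the session tag names, the `dynamodb:LeadingKeys` condition, and the `${aws:PrincipalTag/...}` variables come from this page.

```python
def session_tags(user_id: str, repo: str, task_id: str) -> list:
    """Session tags in the Key/Value list shape that STS AssumeRole accepts."""
    return [
        {"Key": "user_id", "Value": user_id},
        {"Key": "repo", "Value": repo},
        {"Key": "task_id", "Value": task_id},
    ]


def session_role_policy(table_arn: str, bucket: str) -> dict:
    """Build an illustrative policy document scoped by principal tags."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Partition key must equal the task_id session tag.
                "Effect": "Allow",
                "Action": ["dynamodb:GetItem", "dynamodb:Query", "dynamodb:PutItem"],
                "Resource": table_arn,
                "Condition": {
                    "ForAllValues:StringEquals": {
                        "dynamodb:LeadingKeys": ["${aws:PrincipalTag/task_id}"]
                    }
                },
            },
            {
                # Objects are reachable only under the caller's user_id prefix
                # (the traces/ layout here is an assumption).
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/traces/${{aws:PrincipalTag/user_id}}/*",
            },
        ],
    }
```

Because the policy refers to tags rather than literal identifiers, one role definition can serve every task while each session only reaches its own rows and objects.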
Memory and learning
AgentCore Memory - Semantic (repo knowledge) and episodic (task episodes) strategies with namespace templates
Task attachments (multimodal) - attachments on create-task: inline base64 (≤ 500 KB), presigned upload (up to 10 MB), and URL fetch with SSRF protection. Images (PNG, JPEG) and text files (TXT, CSV, MD, JSON, PDF, LOG) pass through Guardrail screening, magic-bytes validation, and re-encoding. CLI --attachment, Slack file uploads, and Linear image extraction share the same schema. See ATTACHMENTS.md
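The magic-bytes validation mentioned for attachments can be sketched as below for the two image types. The PNG and JPEG signatures are the standard ones; the function names, the error messages, and reading "500 KB" and "10 MB" as binary units are assumptions, and Guardrail screening and re-encoding are out of scope here.

```python
PNG_MAGIC = b"\x89PNG\r\n\x1a\n"
JPEG_MAGIC = b"\xff\xd8\xff"

INLINE_LIMIT = 500 * 1024            # inline base64 limit (assumed binary KB)
PRESIGNED_LIMIT = 10 * 1024 * 1024   # presigned upload limit (assumed binary MB)


def sniff_image_type(data: bytes):
    """Return the MIME type implied by the leading bytes, or None."""
    if data.startswith(PNG_MAGIC):
        return "image/png"
    if data.startswith(JPEG_MAGIC):
        return "image/jpeg"
    return None


def validate_image(data: bytes, declared_type: str, inline: bool) -> str:
    """Reject oversized payloads and payloads whose bytes contradict the declared type."""
    limit = INLINE_LIMIT if inline else PRESIGNED_LIMIT
    if len(data) > limit:
        raise ValueError("attachment exceeds size limit")
    actual = sniff_image_type(data)
    if actual is None or actual != declared_type:
        raise ValueError("content does not match declared type")
    return actual
```

Checking the bytes rather than trusting the declared type or file extension is what stops a renamed file from passing as an image.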
Webhooks
HMAC-SHA256 webhooks - External systems create tasks without Cognito credentials
Turn caps - Per-task max turns (1-500, default 100) with Blueprint defaults
Cost budget - Per-task max budget in USD ($0.01-$100)
Data retention - Automatic TTL-based cleanup (default 90 days)
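The HMAC-SHA256 webhook authentication listed under Webhooks can be sketched with the standard library. The `sha256=` prefix on the signature value and the function names are assumptions borrowed from common webhook conventions, not something this page specifies.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Compute the signature a sender would attach to the raw request body."""
    return "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Recompute the signature and compare in constant time."""
    expected = sign(secret, body)
    return hmac.compare_digest(expected.encode(), signature_header.encode())
```

Verification must run over the exact bytes received, before any JSON parsing, because re-serializing the body can change whitespace and invalidate the signature.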
Interactive task UX
Real-time watch - bgagent watch streams progress events with adaptive polling (500 ms active; 1/2/5 s idle backoff), cold-start retry, clean exit on terminal state
Mid-run steering (nudge) - bgagent nudge sends guidance to a running agent; combined-turn acknowledgement (agent emits nudge_acknowledged before incorporating)
Execution tracing - --trace on submit raises preview cap to 4 KB and uploads full gzipped NDJSON trajectory to S3; bgagent trace download retrieves it
Deterministic status snapshot - bgagent status shows operational fields (turn, last milestone, current tool/turn, cost) from the task record + recent events with no LLM in the loop—suited to ops/debug, not a narrative manager-style report (see Smart progress updates under What's next)
Debug output - --verbose flag emits full HTTP request/response on stderr for any CLI command
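The adaptive polling schedule quoted for bgagent watch (500 ms while active, then 1/2/5 s when idle) can be written as a small pure function. Driving the backoff from the number of consecutive polls that returned no new events is an assumption; the page does not say what the CLI uses as its idle signal.

```python
ACTIVE_INTERVAL_S = 0.5
IDLE_BACKOFF_S = (1.0, 2.0, 5.0)


def next_poll_interval(consecutive_empty_polls: int) -> float:
    """Seconds to wait before the next poll, given how many polls in a row were empty."""
    if consecutive_empty_polls <= 0:
        return ACTIVE_INTERVAL_S
    # Walk up the backoff ladder and stay on the last rung.
    index = min(consecutive_empty_polls, len(IDLE_BACKOFF_S)) - 1
    return IDLE_BACKOFF_S[index]
```

Any poll that returns events would reset the counter to zero, which drops the interval back to 500 ms.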
Notification plane
DDB Stream fanout - FanOut Consumer Lambda on TaskEventsTable streams (ParallelizationFactor: 1 for per-task ordering) routes events to channel dispatchers
GitHub edit-in-place - Single status comment per task on the target PR, edited in place as progress events fire (phase, milestone, cost, link)
Routable agent milestones - Named checkpoints (pr_created, nudge_acknowledged) unwrapped against allowlist for channel filter matching
Slack notification dispatcher - FanOut Block Kit messages for Slack-origin tasks (lifecycle events, threaded replies, terminal dedup, in-thread cancel). Generic fallback text for unmapped event types (e.g. some milestones); richer milestone and approval-gate rendering is follow-up work
Deploy-preview screenshots - Listens for GitHub deployment_status: success events from any provider (Vercel, Amplify Hosting, Netlify, GitHub Actions); captures the preview URL via AgentCore Browser; posts a markdown image comment on the open PR (and on the linked Linear issue if Linear is configured). Lambda-only, deterministic, ~10–15 s post-deploy. See Deploy preview screenshots guide.
Email dispatcher - Log-only stub; pending SES integration
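One way to read the "routable agent milestones" item is that a generic milestone event is unwrapped to its checkpoint name only when that name is on an allowlist, so channel filters can match on it. The sketch below follows that reading; the event field names (`event_type`, `milestone`) and the subscription shape are hypothetical.

```python
# Checkpoint names taken from the list above.
ROUTABLE_MILESTONES = frozenset({"pr_created", "nudge_acknowledged"})


def routable_event_type(event: dict) -> str:
    """Unwrap allowlisted milestones to their checkpoint name; leave everything else as is."""
    if event.get("event_type") == "agent_milestone":
        name = event.get("milestone")
        if name in ROUTABLE_MILESTONES:
            return name
    return event.get("event_type", "unknown")


def channels_for(event: dict, subscriptions: dict) -> list:
    """Return the channels whose filter set contains the routable type."""
    key = routable_event_type(event)
    return sorted(ch for ch, types in subscriptions.items() if key in types)
```

Milestones that are not allowlisted still route under the generic type, which matches the fallback-text behaviour described for the Slack dispatcher.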
Channels
Slack integration - @mention task submission, bgagent slack link / setup, file attachments on submit, threaded progress notifications. See SLACK_SETUP_GUIDE.md
Linear integration - Label-triggered tasks, bgagent linear setup / link, progress comments on issues. See LINEAR_SETUP_GUIDE.md
Jira integration - Label-triggered tasks on Jira Cloud, bgagent jira setup / map / link, progress comments via the Jira REST v3 API. See JIRA_SETUP_GUIDE.md and ADR-015
Observability
OpenTelemetry - Custom spans for pipeline phases with CloudWatch querying
Audit trail - TaskEvents table with chronological event log per task
Runtime error classifier - Pattern-matching classifier that categorizes task errors (auth/network/concurrency/compute/agent/guardrail/config/timeout/unknown) with human-readable titles, descriptions, remedies, and retryability flags. Computed at API response time; powers structured CLI error display and CloudWatch alarm routing
Enhanced error classifiers - Specific terminal-state classifiers (error_max_turns, error_max_budget_usd, error_during_execution) for precise CLI display and alarm routing
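A pattern-matching classifier of the kind described above might look like the following. The category names are from the list; the regular expressions, titles, and retryability choices are invented for illustration and cover only four of the categories.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Classification:
    category: str
    title: str
    retryable: bool


# Illustrative rules; the first matching pattern wins.
_RULES = [
    (re.compile(r"access ?denied|unauthorized|expired token", re.I),
     Classification("auth", "Authentication failed", False)),
    (re.compile(r"timed? ?out|deadline exceeded", re.I),
     Classification("timeout", "Operation timed out", True)),
    (re.compile(r"throttl|too many requests", re.I),
     Classification("concurrency", "Concurrency limit reached", True)),
    (re.compile(r"connection (reset|refused)", re.I),
     Classification("network", "Network error", True)),
]

UNKNOWN = Classification("unknown", "Unclassified error", False)


def classify(message: str) -> Classification:
    """Map a raw error message to a category, falling back to unknown."""
    for pattern, result in _RULES:
        if pattern.search(message):
            return result
    return UNKNOWN
```

Since classification is computed at API response time, rules like these can be refined without rewriting stored task records.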
Agent harness
Default branch detection - Dynamic detection via gh repo view
Uncommitted work safety net - Auto-commit before PR creation
Build/lint verification - Pre- and post-agent baselines in PR body
Prompt versioning - SHA-256 hash for A/B comparison
Per-commit attribution - Task-Id and Prompt-Version git trailers
Persistent session storage - /mnt/workspace for npm and config caches
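Prompt versioning and per-commit attribution fit together roughly as sketched here. The trailer names `Task-Id` and `Prompt-Version` come from the list above; using the full hex digest as the version string and the two function names are assumptions.

```python
import hashlib


def prompt_version(prompt_text: str) -> str:
    """SHA-256 hex digest of the prompt text, used as a stable version identifier."""
    return hashlib.sha256(prompt_text.encode("utf-8")).hexdigest()


def commit_message(subject: str, task_id: str, prompt_text: str) -> str:
    """Build a commit message whose final paragraph holds the git trailers."""
    version = prompt_version(prompt_text)
    return f"{subject}\n\nTask-Id: {task_id}\nPrompt-Version: {version}\n"
```

Because the hash depends only on the prompt text, two tasks run with the same prompt share a version, which is what makes A/B comparison across commits possible.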
Docs and DX
Quick start guide - Zero to first PR in ~30 minutes
Prompt guide - Best practices, anti-patterns, examples
Claude Code plugin - Interactive skills for setup, deploy, submit, troubleshoot
What's next
Planned capabilities, grouped by theme. Items are independent and may ship in any order.
Credentials and authorization
Capability
Description
Per-repo GitHub credentials
GitHub App per org/repo via AgentCore Token Vault. Auto-refresh for long sessions. Sets the pattern for GitLab, Jira, Slack integrations.
Principal-to-repo authorization
Map Cognito identities to allowed repository sets. Users can only trigger work on authorized repos.
End-to-end task attribution
Propagate task_id, user_id, and trace context consistently across orchestrator logs, agent OpenTelemetry, GitHub/API calls, and TaskEvents so every downstream action is attributable in incident response (aligns with Zero Trust agent-identity guidance).
Emergency session containment
Documented operator runbook and APIs: cancel task, terminate compute session, revoke short-lived credentials (assumed role, GitHub App token). Pairs with Per-session IAM scoping, Per-repo GitHub credentials, and Behavioral circuit breaker automated containment.
Delegation chain propagation
Embed a cryptographically signed actor chain (user_id → orchestrator → agent) in credentials issued to the agent. Downstream services (GitHub commits, API calls) can trace any action back to the originating human principal. Enables per-action accountability, compliance audit, and fine-grained authorization decisions based on the full delegation lineage rather than only the immediate caller.
Workload-anchored credential binding
Bind agent credentials to the specific MicroVM execution environment via attestation (e.g., instance identity document or platform-level workload identity). Credentials become non-transferable — unusable if exfiltrated from the VM. Complements per-session IAM scoping (which limits scope) with environment binding (which limits where credentials can be exercised).
Layered credential derivation
Extend per-session scoping with a derivation model where each layer in the execution stack receives progressively narrower credentials. The orchestrator holds a task-scoped token; the agent runtime derives a further-restricted token limited to specific tools and repositories; tool invocations receive single-use or time-boxed tokens for each external call. Limits blast radius at every boundary, not just at task creation.
Agent quality
Capability
Description
Autonomous feedback loop
Extend the orchestrator state machine beyond PR_OPENED with a PR watcher phase. Auto-resume the agent when CI fails (inject failure logs), merge conflicts arise (rebase instructions), or reviewers request changes (inline comments). Continue the loop until the PR is merged or a human cancels. Optionally auto-merge when CI passes and review is approved. Transforms ABCA from "open PR" to "merge PR".
Tiered validation pipeline
Three post-agent tiers: tool validation (build/test/lint), code quality (DRY/SOLID/complexity), risk and blast radius analysis.
In-pipeline build/lint fix-up loop
Today the agent path is linear (clone → code → build → lint → PR); a post-change verify_build / verify_lint failure fails the task. Instead, loop back into the agent with the failure output as extra context, up to a configurable retry count, then fail only if fixes are exhausted—while still respecting the existing max_turns budget. Likely implementable in pipeline.py (after run_agent(), on verification failure re-invoke the agent) without orchestrator changes; distinct from the Autonomous feedback loop (PR/CI after the PR exists).
In-pipeline pre-PR self-review
Post-hooks already run build / lint, but the LLM is not prompted to self-review its own diff before the PR. Add an optional in-pipeline step: surface the change set (diff), have the model critique it (bugs, style, edge cases, test gaps), then iterate on fixes—within the same max_turns / budget constraints. Aims to improve first-pass PR quality before human or CI review; implementable alongside other pipeline.py phases.
PR risk classification
Rule-based risk classifier at submission. Drives model selection, budget defaults, approval requirements.
PR scope creep check (pr_review)
Add an advisory-first scope analysis in pr_review that compares declared intent (task description / issue / PR narrative) to the actual diff and touched areas. Return structured output with scope_rating (within_scope/mild_expansion/significant_expansion/likely_scope_creep), confidence, and rationale (files, API/schema/config changes, unrelated dependency churn). Start as non-blocking reviewer guidance; optional policy gates can be enabled later for high-risk repos.
Review feedback memory loop
Capture PR review comments via webhook, extract rules via LLM, persist as searchable memory.
PR outcome tracking
Track merge/reject via GitHub webhooks. Positive/negative signals feed evaluation and memory.
Assign prompt variants per task or cohort; compare merge rate, failure rate, and token usage with statistical guardrails.
LLM-assisted trace analysis
Automated deep dive on failed trajectories (logs + spans) to surface recurring reasoning and tool-use failure modes.
Validation and risk analytics
Dashboards for PR risk labels, validation outcomes, and trends by repo, user, and prompt_version; eventually feed learned memory rules into Tier 2 when the tiered pipeline ships.
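The in-pipeline build/lint fix-up loop described in this table could take roughly the following shape. This is a sketch of the control flow only: the callables, the return dictionary, and counting turns as a simple integer are assumptions, and nothing here reflects the actual contents of pipeline.py.

```python
def run_with_fixups(run_agent, verify, max_fix_attempts: int, max_turns: int) -> dict:
    """Re-invoke the agent with verification output until it passes or budgets run out.

    run_agent(extra_context) returns the number of turns it used.
    verify() returns (ok, output), where output is the build/lint failure text.
    """
    turns_used = run_agent(None)  # initial agent run, no failure context yet
    attempts = 0
    while True:
        ok, output = verify()
        if ok:
            return {"status": "verified", "attempts": attempts, "turns": turns_used}
        if attempts >= max_fix_attempts or turns_used >= max_turns:
            return {"status": "failed", "attempts": attempts, "turns": turns_used}
        attempts += 1
        turns_used += run_agent(output)  # feed the failure output back as extra context
```

The loop stops on whichever limit is hit first, so the retry count never lets a task exceed its existing max_turns budget by more than one agent invocation.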
Memory security
Capability
Description
Trust-aware retrieval
Weight memories by freshness, source type, pattern consistency.
Temporal decay
Configurable per-entry TTL with faster decay for unverified content.
Anomaly detection
CloudWatch metrics on write patterns; alarms for burst writes or suspicious content.
Quarantine and rollback
Operator API for isolating suspicious entries and restoring pre-task snapshots.
Write-ahead validation
Route proposed memory writes through a guardian model.
Review feedback quorum
Promote review-derived rules to persistent memory only after corroboration (e.g. pattern seen across trusted reviewers and PRs), reducing single-comment poisoning. Complements Review feedback memory loop.
Memory backup to S3
Scheduled export of AgentCore Memory namespaces to versioned S3 for disaster recovery and pre-poisoning restore (see design: SECURITY.md).
Memory extraction replay
Operator API (e.g. start_memory_extraction_job) to re-run failed PR-review extraction after webhook or Lambda errors.
Structured knowledge graph (tier 4)
Optional long-term direction if semantic + episodic memory proves insufficient for repo-specific query patterns.
Security (execution guardrails)
Capability
Description
Behavioral circuit breaker
Per-session limits on tool-call rate, cumulative cost, consecutive failures, and file churn; pause or terminate when thresholds are exceeded. On trip: terminate session, revoke short-lived credentials where applicable, emit a containment audit event. Configurable per repo via Blueprint (design: SECURITY.md, REPO_ONBOARDING.md). Prefer hard containment over friction-only limits (rate/turn caps alone).
Tool capability tiers
Opt-in extended tool profile per repo: MCP servers, plugins, and additional Gateway-mediated tools beyond the default minimal surface (COMPUTE.md). Enforced at Gateway and policy layers.
MCP supply-chain controls
For extended-tier repos: pin or self-host MCP servers; keep .mcp.json in version control; verify tool descriptors before enablement; no dynamic tool discovery in production blueprints. Mitigates tool poisoning and rug-pull risks (SECURITY.md, COMPUTE.md).
Untrusted hydration content boundaries
Delimit external content in assembled prompts (issue/PR bodies, fetched URLs, review comments) so the model treats it as untrusted context (spotlighting-style framing). Complements Bedrock Guardrails at hydration time (context-hydration.ts).
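The behavioral circuit breaker in this table can be sketched as a small stateful guard. The thresholds and names are hypothetical, a plain tool-call count stands in for the tool-call rate, and file churn is omitted; on a trip, the real design would also terminate the session and revoke credentials, which this sketch does not model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Limits:
    max_tool_calls: int
    max_cost_usd: float
    max_consecutive_failures: int


class CircuitBreaker:
    """Tracks per-session counters and latches once any limit is exceeded."""

    def __init__(self, limits: Limits):
        self.limits = limits
        self.tool_calls = 0
        self.cost_usd = 0.0
        self.consecutive_failures = 0
        self.tripped: Optional[str] = None

    def record(self, cost_usd: float, succeeded: bool) -> Optional[str]:
        """Record one tool call; return the trip reason, or None while healthy."""
        if self.tripped:
            return self.tripped  # stays tripped: containment is not self-healing
        self.tool_calls += 1
        self.cost_usd += cost_usd
        self.consecutive_failures = 0 if succeeded else self.consecutive_failures + 1
        if self.tool_calls > self.limits.max_tool_calls:
            self.tripped = "tool_calls"
        elif self.cost_usd > self.limits.max_cost_usd:
            self.tripped = "cumulative_cost"
        elif self.consecutive_failures >= self.limits.max_consecutive_failures:
            self.tripped = "consecutive_failures"
        return self.tripped
```

Latching the tripped state reflects the stated preference for hard containment over friction-only limits.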
Interactive task UX
Capability
Description
Smart progress updates (manager-style status)
Extend check-in beyond the shipped deterministic snapshot: human-readable progress that answers what the agent completed, what it plans next, and which decisions or blockers matter—surfaced via bgagent status, notification channels (GitHub/Slack/email), and the future control panel. Prefer structured agent-emitted progress events in TaskEventsTable (e.g. done / next / decisions / blockers) so all readers stay consistent and auditable; complement with Phase 2 bgagent ask for on-demand Q&A and an optional read-path LLM-synthesized summary over events (no agent turn) where cost/latency trade-offs are acceptable. Distinct from raw watch/events streams and from post-mortem LLM-assisted trace analysis. Design context: INTERACTIVE_AGENTS.md.
bgagent ask (Phase 2)
Mid-run questions to the agent (POST /tasks/{id}/asks); answers durable as status_response events with CLI block-and-poll. Enables interactive summaries (e.g. "what changed so far?") without a separate status API. Ships as part of the interactive check-in layer in INTERACTIVE_AGENTS.md Phase 2.
LLM-synthesized status summary (optional)
Optional bgagent status mode where a Lambda narrates recent TaskEvents without waking the agent—deferred in design due to cost and hallucination risk; pursue behind a flag only if agent-authored progress reports are insufficient. Complements, does not replace, Smart progress updates.
Channels and integrations
Capability
Description
Additional git providers
GitLab (and optionally Bitbucket). Same workflow, provider-specific API adapters.
Slack notification polish
Rich Block Kit for agent_milestone and approval_requested (today many map to generic fallback text); in-thread approve/deny buttons wired to HITL APIs. Should render Smart progress updates when that ships.
Control panel
Web UI: task list, task detail with logs/traces, cancel, metrics dashboards, cost attribution. Task detail should show manager-style progress alongside raw events/traces.
Email notification dispatcher
SES-based email notifications via the existing fanout pipeline. Log-only stub ships today (see unchecked Email dispatcher under What's ready).
Per-user notification preferences
DynamoDB (or equivalent) store for preferred channels, per-channel config, and event filters (INPUT_GATEWAY.md).
Browser extension channel
Lightweight extension to open tasks from GitHub issue/PR pages using existing webhook or OAuth-issued JWT; same internal message contract as other channels.
Compute and performance
Capability
Description
Adaptive model router
Per-turn model selection by complexity. Cheaper models for reads, Opus for complex reasoning. ~30-40% cost reduction. Related: Complexity-aware model router under Cost governance.
Alternative compute
ECS/Fargate or EKS via ComputeStrategy (EcsComputeStrategy exists; Agent stack wiring is commented out). For workloads exceeding AgentCore's 2 GB image limit or requiring GPU.
Environment pre-warming
Pre-build container layers per repo. Snapshot-on-schedule (rebuild on push). Reduces cold start from minutes to seconds.
S3-backed SDK session store (portable transcripts)
Plumb the Claude Agent SDK SessionStore to S3 (dedicated bucket or prefix) with eager flush, IAM-scoped access, conditional part creates, checksums, adaptive retries, and structured logging. Emit metrics or alarms on transcript mirror failures; own graceful shutdown (disconnect on SIGTERM/cancel) so in-flight frames can flush. Persist task_id ↔ Claude session UUID (from the first ResultMessage) for resume on another worker; keep agent cwd stable so SDK-derived project_key paths stay predictable. Plan compaction when part count threatens resume latency; optional S3 Express One Zone when the fleet is single-AZ. Complements checked Persistent session storage (FUSE caches on /mnt/workspace) and end-of-task trace upload to traces/...jsonl.gz.
Onboarding and repo lifecycle
Capability
Description
Automated re-onboarding
Event-driven refresh of blueprint-related artifacts when the default branch changes materially (GitHub webhook); optional EventBridge schedule for periodic drift checks. Distinct from Scheduled triggers (task creation).
Dynamic onboarding artifacts
When repo hygiene is weak, generate attachments for the agent context: codebase summaries, dependency graphs, suggested rules from layout (REPO_ONBOARDING.md).
Documentation and specifications
Capability
Description
Exposed project specifications
Publish and surface human- and machine-readable specs—for example OpenAPI or JSON Schema generated from the REST API, explicit extension-point and integration indexes, and stable links into architecture contracts—so operators and contributors can modify, extend, or fork the solution without reverse-engineering the codebase. Complements the design-doc links at the end of this page.
Cost governance
Capability
Description
Bedrock IAM session-tag attribution
Route Bedrock InvokeModel through assumed credentials that carry {user_id, repo, task_id} session tags. Per-session IAM scoping (#209) already tags the SessionRole for DynamoDB/S3; model calls still use the AgentCore/ECS compute role today. Extend aws_session.py (or equivalent) so inference is chargeable in Cost Explorer / CUR 2.0 by principal tag. Operator must activate IAM principal cost allocation tags (see COST_MODEL.md).
Bedrock per-request metadata
Pass task_id, user_id, and repo on each Bedrock call via request metadata / X-Amzn-Bedrock-Request-Metadata into model invocation logs. Complements IAM attribution; does not replace in-app cost_usd. Requires Claude Code / SDK support for metadata on InvokeModel.
Cost dashboard and export API
Log Insights widgets on invocation logs; optional API or export for monthly spend roll-ups by user_id / repo from the task table. Operator dashboard today covers task-level cost aggregates, not Bedrock chargeback dimensions.
Optional tagged application inference profiles
CDK-managed Bedrock application inference profiles per onboarded repo or environment; set ANTHROPIC_MODEL to tagged profile ARN for resourceTags/* billing when repo count is bounded.
Org and team budgets
Per-user and per-team monthly token or USD budgets with alerting (e.g. 80%) and optional hard stop at 100%. Per-task max_budget_usd and turn caps ship today.
Complexity-aware model router
Route each request to the most appropriate model based on task complexity (simple reads/edits to cheaper models, deeper reasoning to stronger models) while honoring budget and policy constraints. Related: Adaptive model router under Compute and performance.
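The org and team budget rule in this table (alert at a threshold such as 80%, optional hard stop at 100%) reduces to a small decision function. The function name, the return values, and the choice to alert rather than block when no hard stop is configured are assumptions.

```python
def budget_action(spent_usd: float, budget_usd: float,
                  alert_fraction: float = 0.8, hard_stop: bool = False) -> str:
    """Return "ok", "alert", or "block" for the current monthly spend."""
    if budget_usd <= 0:
        raise ValueError("budget must be positive")
    used = spent_usd / budget_usd
    if used >= 1.0:
        # Over budget: block only when the operator opted into a hard stop.
        return "block" if hard_stop else "alert"
    if used >= alert_fraction:
        return "alert"
    return "ok"
```

A check like this would sit alongside, not replace, the per-task max_budget_usd and turn caps that already ship.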
Observability and safe deploy
Capability
Description
Deployed runtime E2E verification
Phase 0 landed: @aws-cdk/integ-tests-alpha + integ-runner deploy a trimmed Task API stack to a real account, assert the create-and-persist happy path (task persists at SUBMITTED), then tear it down (mise //cdk:integ). In CI it runs per-PR via workflow_run when the diff touches cdk/** or agent/**, behind the integ environment's admin-approval gate, and posts a required integ-smoke status that blocks merge (workflow_dispatch retained for manual runs). Phase 1 (full lifecycle / real agent runs) and Phase 2 (channels) follow. See ADR-013.
Admission backlog observability
Metric and alarm when SUBMITTED task depth exceeds an operator threshold (capacity and admission health).
Admission queue with deferred pickup
When admission is at capacity, persist tasks in a durable queue instead of failing them. Automatically re-attempt admission and continue processing in FIFO order (with optional priority lanes) as concurrency becomes available. Preserve cancel/idempotency semantics and expose queue position/ETA in task status.
Safe orchestrator deploys
Pre-deploy checks for active tasks (drain or warn); blue-green or canary Lambda deploy for the durable orchestrator with rollback on error regressions (OBSERVABILITY.md).
Unified cross-plane trace correlation
Single trace root per task across orchestrator, MicroVM OpenTelemetry, TaskEvents, and S3 trace artifacts. Gap-fill beyond existing AgentCore session baggage (OBSERVABILITY.md).
Immutable audit export
Append-only export of TaskEvents and policy decisions to S3 (e.g. Object Lock). Complements the Centralized policy framework PolicyDecisionEvent schema for compliance and tamper-evident investigation.
Security operations metrics (dwell time and coverage)
CloudWatch metrics and dashboard panels: time from anomaly (circuit breaker, guardrail spike, policy deny burst) to operator awareness; fraction of security/ops alarms investigated. Targets shortened exploit windows.
Automated alert first-pass triage
On selected security/ops alarms, a Lambda produces a structured disposition from logs, traces, and TaskEvents before human review. Distinct from LLM-assisted trace analysis (post-mortem on failed tasks).
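The admission queue with deferred pickup described in this table can be modelled in memory as below. The class, the integer lane convention (lower number means higher priority), and the method names are hypothetical, and durability, cancellation, and ETA estimation are left out.

```python
import heapq
import itertools


class AdmissionQueue:
    """FIFO within a lane, lower lane number first; illustrative and in-memory only."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.running = set()
        self._heap = []                 # entries are (lane, sequence, task_id)
        self._seq = itertools.count()   # tie-breaker that preserves arrival order

    def submit(self, task_id: str, lane: int = 1) -> str:
        """Admit immediately when there is room and nothing is waiting, else queue."""
        if len(self.running) < self.capacity and not self._heap:
            self.running.add(task_id)
            return "admitted"
        heapq.heappush(self._heap, (lane, next(self._seq), task_id))
        return "queued"

    def complete(self, task_id: str) -> list:
        """Free a slot and admit waiting tasks in priority-then-arrival order."""
        self.running.discard(task_id)
        admitted = []
        while self._heap and len(self.running) < self.capacity:
            _, _, next_id = heapq.heappop(self._heap)
            self.running.add(next_id)
            admitted.append(next_id)
        return admitted

    def position(self, task_id: str):
        """1-based queue position, or None if the task is not waiting."""
        for index, (_, _, tid) in enumerate(sorted(self._heap), start=1):
            if tid == task_id:
                return index
        return None
```

The sequence counter is what keeps ordering FIFO inside a lane, since the heap alone would not guarantee arrival order among equal priorities.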
Scale and collaboration
Capability
Description
Multi-user and teams
Team visibility, shared approval queues, team concurrency/cost budgets, memory isolation.
Agent swarm
Planner-worker architecture for complex multi-file tasks. DAG of subtasks, merge orchestrator, one consolidated PR. Workers receive a strict subset of planner credentials; orchestrator-issued subtask intent; per-worker OpenTelemetry spans under a shared trace root (prevents confused-deputy / unscoped privilege inheritance).
Multi-user nudge
Extend bgagent nudge to support multiple users injecting context into the same running task. Per-nudge commit attribution. (Single-user nudge shipped.)
Scheduled triggers
Cron-based task creation via EventBridge (dependency updates, nightly flaky test checks).
Platform maturity
Capability
Description
Unified liveness decision model (follow-up design ticket)
Normalize task health evaluation across compute backends so heartbeat, compute session status, and DynamoDB state are handled through a single typed decision path. Define explicit backend capabilities (for example, heartbeat support), deterministic precedence rules for terminal outcomes, and regression tests that prevent cross-runtime false failures like ECS heartbeat mismatch.
Pure decision function orchestrator refactor
Extract orchestrator decision logic into pure functions that take a frozen snapshot and return a typed action. Side-effectful execution applies actions with CAS (compare-and-swap) guards on DynamoDB updated_at to prevent stale writes. Makes the orchestrator exhaustively unit-testable without mocking I/O, eliminates competing-worker race conditions, and is a prerequisite for the autonomous feedback loop.
Blueprint custom steps and step sequences
Lambda-backed pre-agent / post-agent steps and optional step_sequence overrides with CDK synth + runtime validation and INVALID_STEP_SEQUENCE on misconfiguration (REPO_ONBOARDING.md, ORCHESTRATOR.md).
Blueprint RepoConfig parity
Extend the Blueprint construct to persist per-repo default max_budget_usd and memory_token_budget in DynamoDB (orchestrator already merges max_budget_usd when present; hydration uses a fixed memory token cap today).
Orchestrator DLQ
Dead-letter path for task orchestration after retry exhaustion so operators can inspect and replay failed durable executions (ORCHESTRATOR.md).
The scheduled stranded-task reconciler has shipped (it detects and fails stuck tasks). Remaining work: operator notification before forced failure and a manual resume option (ORCHESTRATOR.md).
EventBridge / SNS integration
Publish task lifecycle events to EventBridge or SNS for external consumers beyond the built-in DDB-Stream fanout (which already powers GitHub edit-in-place, Slack, and email dispatchers).
CDK constructs library
Publish reusable constructs to Construct Hub with semver versioning.
Centralized policy framework
Unified Cedar-based framework with PolicyDecisionEvent audit schema. Three enforcement modes with observe-before-enforce rollout.
Zero Trust control review ("impossible vs tedious")
Document a standing design test in SECURITY.md: prefer controls that remove capability over friction-only mitigations (rate limits, observe-only DNS). Use when prioritizing DNS enforcement, credential scoping, and containment vs. throttling.
Formal verification
TLA+ specification of task state machine, concurrency, cancellation races, reconciler interleavings.
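The pure-decision-function refactor in this table separates deciding from doing. The sketch below shows the idea with a frozen snapshot, a typed action, and a compare-and-swap guard on updated_at. All names, the RUNNING and FAILED status strings, the heartbeat rule, and the dictionary standing in for DynamoDB are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    status: str
    updated_at: int
    heartbeat_age_s: int


@dataclass(frozen=True)
class Action:
    kind: str                     # "noop" or "fail"
    reason: Optional[str] = None


def decide(snapshot: Snapshot, stale_after_s: int = 300) -> Action:
    """Pure function: no I/O, so it can be unit-tested exhaustively."""
    if snapshot.status != "RUNNING":
        return Action("noop")
    if snapshot.heartbeat_age_s > stale_after_s:
        return Action("fail", "heartbeat_stale")
    return Action("noop")


def apply_with_cas(table: dict, task_id: str, snapshot: Snapshot,
                   action: Action, now: int) -> bool:
    """Apply the action only if the record is unchanged since the snapshot.

    With DynamoDB this guard would be a conditional write on updated_at.
    """
    record = table[task_id]
    if record["updated_at"] != snapshot.updated_at:
        return False  # another worker wrote first; drop the stale decision
    if action.kind == "fail":
        record["status"] = "FAILED"
        record["updated_at"] = now
    return True
```

Two workers that read the same snapshot will both compute the same action, but only the first write passes the guard, which is how the design avoids competing-worker races.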
Agent asset registry
Capability
Description
Central asset registry
A versioned, platform-managed registry from which agents resolve assets at runtime instead of requiring them to be vendored in source. Assets include skills, plugins, MCP server definitions, capabilities (Change Manifest verification strategies, knowledge tool configurations), custom prompt fragments, and Cedar policy modules. The registry is the single source of truth the Change Manifest's L1 (tool_needed validation) and L5 (knowledge resolution) evaluate against — replacing the implicit "known tool registry" currently assumed in the design. Backed by DynamoDB (metadata + version index) with S3 (artifact storage).
Asset versioning and immutability
Every asset version is immutable once published. Blueprints pin asset versions explicitly (no floating latest in production). Version resolution follows semver constraints. Rollback is a re-pin to a prior version, not mutation. Enables reproducible agent executions and safe rollout of new tool versions without affecting running tasks.
Asset lifecycle management
Publish, deprecate, and retire flow for registry assets. Deprecation emits warnings in task telemetry when a pinned asset version is nearing end-of-life. Retirement blocks new task starts that reference the asset. Operator API for bulk migration (re-pin all blueprints from v1 to v2).
Capability descriptors
Structured metadata per registry asset declaring what the asset provides (tool surface, permissions required, resource limits, Cedar actions it introduces) and what it requires (runtime dependencies, network egress domains, minimum compute profile). The agent runtime uses descriptors to configure the execution environment dynamically — enabling MCP servers, injecting context, and adjusting sandbox permissions based on resolved capabilities rather than static blueprint lists.
Blueprint registry references
Extend the Blueprint construct so knowledge_tools, mcp_servers, tool profiles, and capability configurations reference registry asset identifiers and version constraints instead of inline definitions. At task start the orchestrator resolves pinned versions from the registry, fetches artifacts, and provisions the agent environment. Decouples asset authoring from infrastructure deployment.
Registry access control
Cedar policies govern who can publish, deprecate, or pin assets. Scoped by asset type and namespace (e.g., org-private vs. platform-provided). Read access (resolution at task start) is unrestricted within the deployment; write access requires operator or publisher role.
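Version resolution for pinned registry assets could work along these lines. This sketch supports only exact pins and a simplified caret rule (same major version, at least the base version) and ignores pre-release tags and the special handling of 0.x versions found in full semver range implementations; the function names are hypothetical.

```python
def parse(version: str) -> tuple:
    """Turn "1.10.1" into (1, 10, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def satisfies(version: str, constraint: str) -> bool:
    """Exact pin ("1.2.0") or simplified caret range ("^1.1.0")."""
    candidate = parse(version)
    if constraint.startswith("^"):
        base = parse(constraint[1:])
        return candidate[0] == base[0] and candidate >= base
    return candidate == parse(constraint)


def resolve(published: list, constraint: str) -> str:
    """Pick the highest published version that satisfies the constraint."""
    matches = [v for v in published if satisfies(v, constraint)]
    if not matches:
        raise LookupError(f"no published version satisfies {constraint!r}")
    return max(matches, key=parse)
```

Because published versions are immutable, resolving the same constraint against the same published set always yields the same artifact, and a rollback is just a re-pin to an earlier version.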