Roadmap

What's shipped and what's coming next.

What's ready

Core platform

Autonomous agent execution - Isolated MicroVM (AgentCore Runtime) per task with shell, filesystem, and git access
CLI and REST API - Submit, list, get, cancel, nudge, watch, trace, status, webhook management; Cedar HITL (approve, deny, pending, policies); Slack and Linear workspace setup; view audit events; Cognito auth with token caching
Durable orchestrator - Lambda Durable Functions with checkpoint/resume; survives transient failures up to 9 hours
Task state machine - SUBMITTED → HYDRATING → RUNNING → AWAITING_APPROVAL (Cedar HITL) → FINALIZING → COMPLETED / FAILED / CANCELLED / TIMED_OUT
Concurrency control - Per-user limits (default 3) with atomic admission and automated drift reconciliation
Stranded task reconciler - Scheduled Lambda detects tasks stuck in SUBMITTED, HYDRATING, or AWAITING_APPROVAL and drives them to failure with proper cleanup
Idempotency - Idempotency-Key header on POST requests (24-hour TTL)
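
The task state machine listed above can be read as a transition table. The following is a minimal sketch, not the orchestrator's actual code: the state names come from the list, but the exact set of allowed edges (for example, that AWAITING_APPROVAL returns to RUNNING, or that any non-terminal state may fail, be cancelled, or time out) is an assumption, and `can_transition` is a hypothetical helper.

```python
from enum import Enum


class TaskState(str, Enum):
    SUBMITTED = "SUBMITTED"
    HYDRATING = "HYDRATING"
    RUNNING = "RUNNING"
    AWAITING_APPROVAL = "AWAITING_APPROVAL"
    FINALIZING = "FINALIZING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"
    TIMED_OUT = "TIMED_OUT"


# Terminal states never transition again.
TERMINAL = {TaskState.COMPLETED, TaskState.FAILED,
            TaskState.CANCELLED, TaskState.TIMED_OUT}

# Assumption: every non-terminal state may exit to one of these.
FAILURE_EXITS = {TaskState.FAILED, TaskState.CANCELLED, TaskState.TIMED_OUT}

# Assumed happy-path edges, inferred from the arrow sequence in the roadmap.
FORWARD = {
    TaskState.SUBMITTED: {TaskState.HYDRATING},
    TaskState.HYDRATING: {TaskState.RUNNING},
    TaskState.RUNNING: {TaskState.AWAITING_APPROVAL, TaskState.FINALIZING},
    TaskState.AWAITING_APPROVAL: {TaskState.RUNNING},
    TaskState.FINALIZING: {TaskState.COMPLETED},
}


def can_transition(current: TaskState, target: TaskState) -> bool:
    """Return True if moving from `current` to `target` is allowed in this sketch."""
    if current in TERMINAL:
        return False
    return target in FORWARD.get(current, set()) or target in FAILURE_EXITS
```

A table like this keeps the legal edges in one place, which is also the shape a formal model or an exhaustive unit test would check against.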

Task types

Workflow-driven tasks - Task types are declarative, versioned workflow files (agent/workflows/**) interpreted by an agent-side step runner, not hardcoded task_type branches. Selected via workflow_ref (the task_type enum is removed). New task types are authored as YAML + registered step handlers, not core-code changes (ADR-014, WORKFLOWS.md)
coding/new-task-v1 - Branch, implement, build/test, open PR
coding/pr-iteration-v1 - Check out PR branch, read review feedback, address it, push
coding/pr-review-v1 - Read-only structured code review via GitHub Reviews API (no Write/Edit tools)
Repo-less (knowledge) workflows - requires_repo:false workflows run end-to-end with no GitHub repo: hydrate_context → run_agent → deliver_artifact, delivering the agent's result to S3 (artifacts/{task_id}/) surfaced on TaskDetail.artifact_uri. Ships a reference knowledge/web-research-v1 workflow; memory keys on user:{user_id}
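
To illustrate how a declarative workflow plus registered step handlers can replace hardcoded task-type branches, here is a rough sketch of a step runner for the repo-less sequence hydrate_context → run_agent → deliver_artifact. Everything in it is hypothetical: the real workflow files are YAML under agent/workflows/**, whereas this sketch uses a plain dict, and the `register_step` decorator, the handler bodies, the context keys, and the `result.md` file name are assumptions made for illustration.

```python
from typing import Callable, Dict

StepHandler = Callable[[dict], dict]

# Registry of step name -> handler (hypothetical; mirrors "registered step handlers").
STEP_HANDLERS: Dict[str, StepHandler] = {}


def register_step(name: str):
    """Decorator that registers a handler under a step name."""
    def decorator(fn: StepHandler) -> StepHandler:
        STEP_HANDLERS[name] = fn
        return fn
    return decorator


@register_step("hydrate_context")
def hydrate_context(ctx: dict) -> dict:
    return {**ctx, "prompt": f"Task: {ctx['description']}"}


@register_step("run_agent")
def run_agent(ctx: dict) -> dict:
    # Placeholder for the real agent invocation.
    return {**ctx, "result": f"(agent output for: {ctx['prompt']})"}


@register_step("deliver_artifact")
def deliver_artifact(ctx: dict) -> dict:
    # The artifacts/{task_id}/ prefix is from the roadmap; the file name is assumed.
    return {**ctx, "artifact_uri": f"artifacts/{ctx['task_id']}/result.md"}


# Stand-in for a parsed workflow file.
WORKFLOW = {
    "ref": "knowledge/web-research-v1",
    "requires_repo": False,
    "steps": ["hydrate_context", "run_agent", "deliver_artifact"],
}


def run_workflow(workflow: dict, ctx: dict) -> dict:
    """Run each declared step in order, threading the context through."""
    for step in workflow["steps"]:
        if step not in STEP_HANDLERS:
            raise KeyError(f"unregistered step: {step}")
        ctx = STEP_HANDLERS[step](ctx)
    return ctx
```

Under this design, adding a task type means writing a new step list and, if needed, registering a new handler; the runner itself does not change.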

Onboarding and customization

Blueprint construct - Per-repo CDK configuration (model, max turns, prompt overrides, egress allowlist, GitHub token, Cedar policies, approval gate cap)
Repo-level project config - Agent loads CLAUDE.md, .claude/rules/, .claude/settings.json, .mcp.json
Per-repo overrides - Model ID, max turns, max budget (per-task request and/or RepoTable; Blueprint CDK default pending), system prompt overrides, poll interval, dedicated token

Security

Memory and learning

AgentCore Memory - Semantic (repo knowledge) and episodic (task episodes) strategies with namespace templates
Content integrity - SHA-256 hashing, source provenance tracking, schema v3
Fail-open design - Memory never blocks task execution; 2,000-token budget
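
The fail-open behavior and token budget described above could look roughly like the sketch below. It is illustrative only: `load_memory_context` and the `fetch` callable are hypothetical names, and counting tokens as whitespace-separated words is a simplifying assumption rather than the tokenizer the platform uses.

```python
import logging
from typing import Callable, List

logger = logging.getLogger("memory")


def load_memory_context(fetch: Callable[[], List[str]],
                        token_budget: int = 2000) -> str:
    """Return memory text within the budget, or "" if retrieval fails."""
    try:
        entries = fetch()
    except Exception:
        # Fail open: a memory outage must not block the task.
        logger.warning("memory retrieval failed; continuing without memory",
                       exc_info=True)
        return ""

    kept, used = [], 0
    for entry in entries:
        cost = len(entry.split())  # crude token estimate (assumption)
        if used + cost > token_budget:
            break
        kept.append(entry)
        used += cost
    return "\n".join(kept)
```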

Context hydration

Rich prompt assembly - Task description + GitHub issue/PR content + memory context (~100K token budget)
Token budget management - Oldest comments trimmed first; title/body always preserved
Task attachments (multimodal) - attachments on create-task: inline base64 (≤ 500 KB), presigned upload (up to 10 MB), and URL fetch with SSRF protection. Images (PNG, JPEG) and text files (TXT, CSV, MD, JSON, PDF, LOG) pass through Guardrail screening, magic-bytes validation, and re-encoding. CLI --attachment, Slack file uploads, and Linear image extraction share the same schema. See ATTACHMENTS.md
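
The trimming rule under Token budget management (oldest comments dropped first, title and body always kept) can be sketched as follows. This is a simplified illustration: `assemble_prompt` and `estimate_tokens` are hypothetical names, and the word-count token estimate is an assumption standing in for a real tokenizer.

```python
from typing import List


def estimate_tokens(text: str) -> int:
    # Crude estimate: one token per whitespace-separated word (assumption).
    return len(text.split())


def assemble_prompt(title: str, body: str, comments: List[str], budget: int) -> str:
    """Join title, body, and as many of the newest comments as fit.

    `comments` must be ordered oldest first. Title and body are never trimmed,
    even if they alone exceed the budget.
    """
    fixed = estimate_tokens(title) + estimate_tokens(body)
    kept = list(comments)
    while kept and fixed + sum(estimate_tokens(c) for c in kept) > budget:
        kept.pop(0)  # drop the oldest remaining comment
    return "\n\n".join([title, body, *kept])
```

For example, with a budget of 3 and comments `["old one two", "new"]`, only `"new"` survives alongside the title and body.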

Webhooks

HMAC-SHA256 webhooks - External systems create tasks without Cognito credentials
Webhook management - Create, list, revoke with soft delete (30-day TTL)
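
For readers unfamiliar with HMAC-signed webhooks, the sketch below shows the general technique using Python's standard library. It does not describe the platform's wire format: which header carries the signature, whether the digest is hex or base64, and exactly which bytes are signed are not specified here, so the hex-over-raw-body choice and the `sign` / `verify` helper names are assumptions.

```python
import hashlib
import hmac


def sign(secret: bytes, body: bytes) -> str:
    """Compute a hex HMAC-SHA256 signature over the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Check a received signature using a constant-time comparison."""
    expected = sign(secret, body)
    return hmac.compare_digest(expected, signature_hex)
```

A sender would compute `sign(secret, body)` and attach it to the request; the receiver recomputes it over the bytes it actually received. `hmac.compare_digest` avoids leaking how many leading characters matched.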

Cost and limits

Turn caps - Per-task max turns (1-500, default 100) with Blueprint defaults
Cost budget - Per-task max budget in USD ($0.01-$100)
Data retention - Automatic TTL-based cleanup (default 90 days)
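
The turn and budget ranges above translate naturally into request validation. The sketch below is an assumption-laden illustration, not the API's validator: `TaskLimits` is a hypothetical type, and treating the budget as optional is a guess; only the ranges (1-500 turns, default 100; $0.01-$100) come from the list.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class TaskLimits:
    max_turns: int = 100
    max_budget_usd: Optional[Decimal] = None  # assumed optional

    def __post_init__(self) -> None:
        if not 1 <= self.max_turns <= 500:
            raise ValueError("max_turns must be between 1 and 500")
        if self.max_budget_usd is not None and not (
            Decimal("0.01") <= self.max_budget_usd <= Decimal("100")
        ):
            raise ValueError("max_budget_usd must be between 0.01 and 100")
```

Using `Decimal` for the budget avoids floating-point surprises at the 0.01 boundary.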

Interactive task UX

Real-time watch - bgagent watch streams progress events with adaptive polling (500 ms active; 1/2/5 s idle backoff), cold-start retry, clean exit on terminal state
Mid-run steering (nudge) - bgagent nudge sends guidance to a running agent; combined-turn acknowledgement (agent emits nudge_acknowledged before incorporating)
Execution tracing - --trace on submit raises preview cap to 4 KB and uploads full gzipped NDJSON trajectory to S3; bgagent trace download retrieves it
Deterministic status snapshot - bgagent status shows operational fields (turn, last milestone, current tool/turn, cost) from the task record + recent events with no LLM in the loop—suited to ops/debug, not a narrative manager-style report (see Smart progress updates under What's next)
Debug output - --verbose flag emits full HTTP request/response on stderr for any CLI command
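
The adaptive polling schedule used by bgagent watch (500 ms while active; 1/2/5 s idle backoff) can be expressed as a small pure function. This is a sketch under an assumption: it treats "idle" as the number of consecutive polls that returned no new events and advances one backoff step per empty poll, which may differ from how the CLI measures idleness. `next_poll_interval` is a hypothetical name.

```python
ACTIVE_INTERVAL_S = 0.5
IDLE_BACKOFF_S = (1.0, 2.0, 5.0)


def next_poll_interval(consecutive_empty_polls: int) -> float:
    """Return seconds to wait before the next poll.

    0 empty polls means events are flowing, so poll at the active rate.
    Each further empty poll moves one step along the backoff ladder,
    capping at the last value.
    """
    if consecutive_empty_polls <= 0:
        return ACTIVE_INTERVAL_S
    index = min(consecutive_empty_polls, len(IDLE_BACKOFF_S)) - 1
    return IDLE_BACKOFF_S[index]
```

A caller would reset the counter to zero whenever a poll returns at least one event.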

Notification plane

DDB Stream fanout - FanOut Consumer Lambda on TaskEventsTable streams (ParallelizationFactor: 1 for per-task ordering) routes events to channel dispatchers
GitHub edit-in-place - Single status comment per task on the target PR, edited in place as progress events fire (phase, milestone, cost, link)
Routable agent milestones - Named checkpoints (pr_created, nudge_acknowledged) unwrapped against allowlist for channel filter matching
Slack notification dispatcher - FanOut Block Kit messages for Slack-origin tasks (lifecycle events, threaded replies, terminal dedup, in-thread cancel). Generic fallback text for unmapped event types (e.g. some milestones); richer milestone and approval-gate rendering is follow-up work
Deploy-preview screenshots - Listens for GitHub deployment_status: success events from any provider (Vercel, Amplify Hosting, Netlify, GitHub Actions); captures the preview URL via AgentCore Browser; posts a markdown image comment on the open PR (and on the linked Linear issue if Linear is configured). Lambda-only, deterministic, ~10–15 s post-deploy. See Deploy preview screenshots guide.
Email dispatcher - Log-only stub; pending SES integration

Channels

Slack integration - @mention task submission, bgagent slack link / setup, file attachments on submit, threaded progress notifications. See SLACK_SETUP_GUIDE.md
Linear integration - Label-triggered tasks, bgagent linear setup / link, progress comments on issues. See LINEAR_SETUP_GUIDE.md
Jira integration - Label-triggered tasks on Jira Cloud, bgagent jira setup / map / link, progress comments via the Jira REST v3 API. See JIRA_SETUP_GUIDE.md and ADR-015

Observability

OpenTelemetry - Custom spans for pipeline phases with CloudWatch querying
Operator dashboard - Task success rate, cost, duration, build/lint pass rates, AgentCore metrics
Alarms - Stuck tasks, orchestration failures, counter drift, crash rate, guardrail failures
Audit trail - TaskEvents table with chronological event log per task
Runtime error classifier - Pattern-matching classifier that categorizes task errors (auth/network/concurrency/compute/agent/guardrail/config/timeout/unknown) with human-readable titles, descriptions, remedies, and retryability flags. Computed at API response time; powers structured CLI error display and CloudWatch alarm routing
Enhanced error classifiers - Specific terminal-state classifiers (error_max_turns, error_max_budget_usd, error_during_execution) for precise CLI display and alarm routing
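
A pattern-matching error classifier of the kind described above might be structured like the sketch below. The category names (auth, network, timeout, unknown) are taken from the list; the regular expressions, titles, remedies, and the `classify` function are all hypothetical examples rather than the shipped rule set.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Classification:
    category: str
    title: str
    remedy: str
    retryable: bool


# Hypothetical rules; first match wins.
RULES = [
    (re.compile(r"\b40[13]\b|unauthorized|bad credentials", re.I),
     Classification("auth", "Authentication failed",
                    "Check the token configured for this repository.", False)),
    (re.compile(r"timed? ?out|deadline exceeded", re.I),
     Classification("timeout", "Operation timed out",
                    "Retry the task or raise the relevant limit.", True)),
    (re.compile(r"ECONNRESET|connection refused", re.I),
     Classification("network", "Network error",
                    "Retry; check the egress allowlist if it persists.", True)),
]

UNKNOWN = Classification("unknown", "Unclassified error",
                         "Inspect the task events for details.", False)


def classify(message: str) -> Classification:
    """Map a raw error message to a structured classification."""
    for pattern, classification in RULES:
        if pattern.search(message):
            return classification
    return UNKNOWN
```

Because classification is a pure function of the stored error text, it can be computed at API response time, and the rules can be refined without rewriting stored task records.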

Agent harness

Default branch detection - Dynamic detection via gh repo view
Uncommitted work safety net - Auto-commit before PR creation
Build/lint verification - Pre- and post-agent baselines in PR body
Prompt versioning - SHA-256 hash for A/B comparison
Per-commit attribution - Task-Id and Prompt-Version git trailers
Persistent session storage - /mnt/workspace for npm and config caches
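
Prompt versioning and per-commit attribution fit together as shown in this sketch. The trailer names Task-Id and Prompt-Version come from the list above; hashing the raw prompt text and using the full hex digest are assumptions, and `prompt_version` / `commit_message` are hypothetical helpers.

```python
import hashlib


def prompt_version(prompt_text: str) -> str:
    """Derive a stable version identifier from the prompt contents."""
    return hashlib.sha256(prompt_text.encode("utf-8")).hexdigest()


def commit_message(subject: str, task_id: str, prompt_text: str) -> str:
    """Build a commit message whose final paragraph is a git trailer block."""
    return (
        f"{subject}\n"
        "\n"
        f"Task-Id: {task_id}\n"
        f"Prompt-Version: {prompt_version(prompt_text)}\n"
    )
```

Since the hash changes whenever the prompt text changes, commits and task outcomes can be grouped by prompt version for comparison.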

Docs and DX

Quick start guide - Zero to first PR in ~30 minutes
Prompt guide - Best practices, anti-patterns, examples
Claude Code plugin - Interactive skills for setup, deploy, submit, troubleshoot

What's next

Planned capabilities, grouped by theme. Items are independent and may ship in any order.

Credentials and authorization

Capability	Description
Per-repo GitHub credentials	GitHub App per org/repo via AgentCore Token Vault. Auto-refresh for long sessions. Sets the pattern for GitLab, Jira, Slack integrations.
Principal-to-repo authorization	Map Cognito identities to allowed repository sets. Users can only trigger work on authorized repos.
End-to-end task attribution	Propagate `task_id`, `user_id`, and trace context consistently across orchestrator logs, agent OpenTelemetry, GitHub/API calls, and `TaskEvents` so every downstream action is attributable in incident response (aligns with Zero Trust agent-identity guidance).
Emergency session containment	Documented operator runbook and APIs: cancel task, terminate compute session, revoke short-lived credentials (assumed role, GitHub App token). Pairs with Per-session IAM scoping, Per-repo GitHub credentials, and Behavioral circuit breaker automated containment.
Delegation chain propagation	Embed a cryptographically signed actor chain (`user_id → orchestrator → agent`) in credentials issued to the agent. Downstream services (GitHub commits, API calls) can trace any action back to the originating human principal. Enables per-action accountability, compliance audit, and fine-grained authorization decisions based on the full delegation lineage rather than only the immediate caller.
Workload-anchored credential binding	Bind agent credentials to the specific MicroVM execution environment via attestation (e.g., instance identity document or platform-level workload identity). Credentials become non-transferable — unusable if exfiltrated from the VM. Complements per-session IAM scoping (which limits scope) with environment binding (which limits where credentials can be exercised).
Layered credential derivation	Extend per-session scoping with a derivation model where each layer in the execution stack receives progressively narrower credentials. The orchestrator holds a task-scoped token; the agent runtime derives a further-restricted token limited to specific tools and repositories; tool invocations receive single-use or time-boxed tokens for each external call. Limits blast radius at every boundary, not just at task creation.

Agent quality

Capability	Description
Autonomous feedback loop	Extend the orchestrator state machine beyond `PR_OPENED` with a PR watcher phase. Auto-resume the agent when CI fails (inject failure logs), merge conflicts arise (rebase instructions), or reviewers request changes (inline comments). Continue the loop until the PR is merged or a human cancels. Optionally auto-merge when CI passes and review is approved. Transforms ABCA from "open PR" to "merge PR".
Tiered validation pipeline	Three post-agent tiers: tool validation (build/test/lint), code quality (DRY/SOLID/complexity), risk and blast radius analysis.
In-pipeline build/lint fix-up loop	Today the agent path is linear (clone → code → build → lint → PR); a post-change verify_build / verify_lint failure fails the task. Instead, loop back into the agent with the failure output as extra context, up to a configurable retry count, then fail only if fixes are exhausted—while still respecting the existing max_turns budget. Likely implementable in `pipeline.py` (after `run_agent()`, on verification failure re-invoke the agent) without orchestrator changes; distinct from the Autonomous feedback loop (PR/CI after the PR exists).
In-pipeline pre-PR self-review	Post-hooks already run build / lint, but the LLM is not prompted to self-review its own diff before the PR. Add an optional in-pipeline step: surface the change set (diff), have the model critique it (bugs, style, edge cases, test gaps), then iterate on fixes—within the same max_turns / budget constraints. Aims to improve first-pass PR quality before human or CI review; implementable alongside other `pipeline.py` phases.
PR risk classification	Rule-based risk classifier at submission. Drives model selection, budget defaults, approval requirements.
PR scope creep check (`pr_review`)	Add an advisory-first scope analysis in `pr_review` that compares declared intent (task description / issue / PR narrative) to the actual diff and touched areas. Return structured output with `scope_rating` (`within_scope`/`mild_expansion`/`significant_expansion`/`likely_scope_creep`), confidence, and rationale (files, API/schema/config changes, unrelated dependency churn). Start as non-blocking reviewer guidance; optional policy gates can be enabled later for high-risk repos.
Review feedback memory loop	Capture PR review comments via webhook, extract rules via LLM, persist as searchable memory.
PR outcome tracking	Track merge/reject via GitHub webhooks. Positive/negative signals feed evaluation and memory.
Evaluation pipeline	Failure categorization, memory effectiveness metrics (merge rate, revision cycles, CI pass rate).
A/B prompt experiments	Assign prompt variants per task or cohort; compare merge rate, failure rate, and token usage with statistical guardrails.
LLM-assisted trace analysis	Automated deep dive on failed trajectories (logs + spans) to surface recurring reasoning and tool-use failure modes.
Validation and risk analytics	Dashboards for PR risk labels, validation outcomes, and trends by repo, user, and `prompt_version`; eventually feed learned memory rules into Tier 2 when the tiered pipeline ships.

Memory security

Capability	Description
Trust-aware retrieval	Weight memories by freshness, source type, pattern consistency.
Temporal decay	Configurable per-entry TTL with faster decay for unverified content.
Anomaly detection	CloudWatch metrics on write patterns; alarms for burst writes or suspicious content.
Quarantine and rollback	Operator API for isolating suspicious entries and restoring pre-task snapshots.
Write-ahead validation	Route proposed memory writes through a guardian model.
Review feedback quorum	Promote review-derived rules to persistent memory only after corroboration (e.g. pattern seen across trusted reviewers and PRs), reducing single-comment poisoning. Complements Review feedback memory loop.
Memory backup to S3	Scheduled export of AgentCore Memory namespaces to versioned S3 for disaster recovery and pre-poisoning restore (see design: `SECURITY.md`).
Memory extraction replay	Operator API (e.g. `start_memory_extraction_job`) to re-run failed PR-review extraction after webhook or Lambda errors.
Structured knowledge graph (tier 4)	Optional long-term direction if semantic + episodic memory proves insufficient for repo-specific query patterns.

Security (execution guardrails)

Capability	Description
Behavioral circuit breaker	Per-session limits on tool-call rate, cumulative cost, consecutive failures, and file churn; pause or terminate when thresholds are exceeded. On trip: terminate session, revoke short-lived credentials where applicable, emit a `containment` audit event. Configurable per repo via Blueprint (design: `SECURITY.md`, `REPO_ONBOARDING.md`). Prefer hard containment over friction-only limits (rate/turn caps alone).
Tool capability tiers	Opt-in extended tool profile per repo: MCP servers, plugins, and additional Gateway-mediated tools beyond the default minimal surface (`COMPUTE.md`). Enforced at Gateway and policy layers.
MCP supply-chain controls	For extended-tier repos: pin or self-host MCP servers; keep `.mcp.json` in version control; verify tool descriptors before enablement; no dynamic tool discovery in production blueprints. Mitigates tool poisoning and rug-pull risks (`SECURITY.md`, `COMPUTE.md`).
Untrusted hydration content boundaries	Delimit external content in assembled prompts (issue/PR bodies, fetched URLs, review comments) so the model treats it as untrusted context (spotlighting-style framing). Complements Bedrock Guardrails at hydration time (`context-hydration.ts`).
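
As a rough illustration of the Behavioral circuit breaker row above, the sketch below evaluates per-session counters against thresholds and builds a `containment` audit event when one trips. The design is still planned, so every name and number here is hypothetical: the threshold values are placeholders, and the event fields other than the `containment` type are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BreakerLimits:
    # Placeholder thresholds, not platform defaults.
    max_tool_calls_per_minute: int = 60
    max_cost_usd: float = 10.0
    max_consecutive_failures: int = 5
    max_files_changed: int = 200


@dataclass
class SessionStats:
    tool_calls_last_minute: int = 0
    cost_usd: float = 0.0
    consecutive_failures: int = 0
    files_changed: int = 0


def tripped_reason(stats: SessionStats, limits: BreakerLimits) -> Optional[str]:
    """Return the first exceeded limit's name, or None if the session is healthy."""
    checks = [
        ("tool_call_rate", stats.tool_calls_last_minute > limits.max_tool_calls_per_minute),
        ("cumulative_cost", stats.cost_usd > limits.max_cost_usd),
        ("consecutive_failures", stats.consecutive_failures > limits.max_consecutive_failures),
        ("file_churn", stats.files_changed > limits.max_files_changed),
    ]
    for reason, exceeded in checks:
        if exceeded:
            return reason
    return None


def containment_event(task_id: str, reason: str) -> dict:
    """Build the audit event emitted when the breaker trips (assumed shape)."""
    return {"task_id": task_id, "event_type": "containment", "reason": reason}
```

In a real deployment the trip handler would also terminate the session and revoke short-lived credentials, as the row describes; the sketch covers only the decision and the audit record.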

Interactive task UX

Capability	Description
Smart progress updates (manager-style status)	Extend check-in beyond the shipped deterministic snapshot: human-readable progress that answers what the agent completed, what it plans next, and which decisions or blockers matter—surfaced via `bgagent status`, notification channels (GitHub/Slack/email), and the future control panel. Prefer structured agent-emitted progress events in `TaskEventsTable` (e.g. done / next / decisions / blockers) so all readers stay consistent and auditable; complement with Phase 2 `bgagent ask` for on-demand Q&A and an optional read-path LLM-synthesized summary over events (no agent turn) where cost/latency trade-offs are acceptable. Distinct from raw `watch`/`events` streams and from post-mortem LLM-assisted trace analysis. Design context: INTERACTIVE_AGENTS.md.
`bgagent ask` (Phase 2)	Mid-run questions to the agent (`POST /tasks/{id}/asks`); answers durable as `status_response` events with CLI block-and-poll. Enables interactive summaries (e.g. "what changed so far?") without a separate status API. Ships as part of the interactive check-in layer in INTERACTIVE_AGENTS.md Phase 2.
LLM-synthesized status summary (optional)	Optional `bgagent status` mode where a Lambda narrates recent `TaskEvents` without waking the agent—deferred in design due to cost and hallucination risk; pursue behind a flag only if agent-authored progress reports are insufficient. Complements, does not replace, Smart progress updates.

Channels and integrations

Capability	Description
Additional git providers	GitLab (and optionally Bitbucket). Same workflow, provider-specific API adapters.
Slack notification polish	Rich Block Kit for `agent_milestone` and `approval_requested` (today many map to generic fallback text); in-thread approve/deny buttons wired to HITL APIs. Should render Smart progress updates when that ships.
Control panel	Web UI: task list, task detail with logs/traces, cancel, metrics dashboards, cost attribution. Task detail should show manager-style progress alongside raw events/traces.
Email notification dispatcher	SES-based email notifications via the existing fanout pipeline. A log-only stub ships today (see Email dispatcher under What's ready).
Per-user notification preferences	DynamoDB (or equivalent) store for preferred channels, per-channel config, and event filters (`INPUT_GATEWAY.md`).
Browser extension channel	Lightweight extension to open tasks from GitHub issue/PR pages using existing webhook or OAuth-issued JWT; same internal message contract as other channels.

Compute and performance

Capability	Description
Adaptive model router	Per-turn model selection by complexity: cheaper models for reads, Opus for complex reasoning. Targets a ~30-40% cost reduction. Related: Complexity-aware model router under Cost governance.
Alternative compute	ECS/Fargate or EKS via `ComputeStrategy` (`EcsComputeStrategy` exists; Agent stack wiring is commented out). For workloads exceeding AgentCore's 2 GB image limit or requiring GPU.
Environment pre-warming	Pre-build container layers per repo. Snapshot-on-schedule (rebuild on push). Goal: cut cold start from minutes to seconds.
S3-backed SDK session store (portable transcripts)	Plumb the Claude Agent SDK `SessionStore` to S3 (dedicated bucket or prefix) with eager flush, IAM-scoped access, conditional part creates, checksums, adaptive retries, and structured logging. Emit metrics or alarms on transcript mirror failures; own graceful shutdown (`disconnect` on SIGTERM/cancel) so in-flight frames can flush. Persist `task_id` ↔ Claude session UUID (from the first `ResultMessage`) for resume on another worker; keep agent `cwd` stable so SDK-derived `project_key` paths stay predictable. Plan compaction when part count threatens resume latency; optional S3 Express One Zone when the fleet is single-AZ. Complements the shipped Persistent session storage (FUSE caches on `/mnt/workspace`) and end-of-task trace upload to `traces/...jsonl.gz`.

Onboarding and repo lifecycle

Capability	Description
Automated re-onboarding	Event-driven refresh of blueprint-related artifacts when the default branch changes materially (GitHub webhook); optional EventBridge schedule for periodic drift checks. Distinct from Scheduled triggers (task creation).
Dynamic onboarding artifacts	When repo hygiene is weak, generate attachments for the agent context: codebase summaries, dependency graphs, suggested rules from layout (`REPO_ONBOARDING.md`).

Documentation and specifications

Capability	Description
Exposed project specifications	Publish and surface human- and machine-readable specs—for example OpenAPI or JSON Schema generated from the REST API, explicit extension-point and integration indexes, and stable links into architecture contracts—so operators and contributors can modify, extend, or fork the solution without reverse-engineering the codebase. Complements the design-doc links at the end of this page.

Cost governance

Capability	Description
Bedrock IAM session-tag attribution	Route Bedrock InvokeModel through assumed credentials that carry `{user_id, repo, task_id}` session tags. Per-session IAM scoping (#209) already tags the SessionRole for DynamoDB/S3; model calls still use the AgentCore/ECS compute role today. Extend `aws_session.py` (or equivalent) so inference is chargeable in Cost Explorer / CUR 2.0 by principal tag. Operator must activate IAM principal cost allocation tags (see COST_MODEL.md).
Bedrock per-request metadata	Pass `task_id`, `user_id`, and `repo` on each Bedrock call via request metadata / `X-Amzn-Bedrock-Request-Metadata` into model invocation logs. Complements IAM attribution; does not replace in-app `cost_usd`. Requires Claude Code / SDK support for metadata on InvokeModel.
Cost dashboard and export API	Log Insights widgets on invocation logs; optional API or export for monthly spend roll-ups by `user_id` / `repo` from the task table. Operator dashboard today covers task-level cost aggregates, not Bedrock chargeback dimensions.
Optional tagged application inference profiles	CDK-managed Bedrock application inference profiles per onboarded repo or environment; set `ANTHROPIC_MODEL` to tagged profile ARN for `resourceTags/*` billing when repo count is bounded.
Org and team budgets	Per-user and per-team monthly token or USD budgets with alerting (e.g. 80%) and optional hard stop at 100%. Per-task `max_budget_usd` and turn caps ship today.
Complexity-aware model router	Route each request to the most appropriate model based on task complexity (simple reads/edits to cheaper models, deeper reasoning to stronger models) while honoring budget and policy constraints. Related: Adaptive model router under Compute and performance.
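
The alert-then-stop behavior in the Org and team budgets row (alert at, for example, 80%; optional hard stop at 100%) reduces to a small decision function. This is a sketch of a planned capability: `budget_action`, its parameters, and the returned action strings are hypothetical.

```python
def budget_action(spent_usd: float, monthly_budget_usd: float,
                  alert_ratio: float = 0.8, hard_stop: bool = True) -> str:
    """Return "allow", "alert", or "block" for the current spend level."""
    if monthly_budget_usd <= 0:
        raise ValueError("monthly_budget_usd must be positive")
    ratio = spent_usd / monthly_budget_usd
    if ratio >= 1.0:
        # With the hard stop disabled, overspend only keeps alerting.
        return "block" if hard_stop else "alert"
    if ratio >= alert_ratio:
        return "alert"
    return "allow"
```

The same function could be evaluated per user and per team at admission time, with the stricter result winning.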

Observability and safe deploy

Capability	Description
Deployed runtime E2E verification	Phase 0 landed: `@aws-cdk/integ-tests-alpha` + `integ-runner` deploy a trimmed Task API stack to a real account, assert the create-and-persist happy path (task persists at `SUBMITTED`), then tear it down (`mise //cdk:integ`). In CI it runs per-PR via `workflow_run` when the diff touches `cdk/` or `agent/`, behind the `integ` environment's admin-approval gate, and posts a required `integ-smoke` status that blocks merge (`workflow_dispatch` retained for manual runs). Phase 1 (full lifecycle / real agent runs) and Phase 2 (channels) follow. See ADR-013.
Admission backlog observability	Metric and alarm when `SUBMITTED` task depth exceeds an operator threshold (capacity and admission health).
Admission queue with deferred pickup	When admission is at capacity, persist tasks in a durable queue instead of failing them. Automatically re-attempt admission and continue processing in FIFO order (with optional priority lanes) as concurrency becomes available. Preserve cancel/idempotency semantics and expose queue position/ETA in task status.
Safe orchestrator deploys	Pre-deploy checks for active tasks (drain or warn); blue-green or canary Lambda deploy for the durable orchestrator with rollback on error regressions (`OBSERVABILITY.md`).
Unified cross-plane trace correlation	Single trace root per task across orchestrator, MicroVM OpenTelemetry, `TaskEvents`, and S3 trace artifacts. Gap-fill beyond existing AgentCore session baggage (`OBSERVABILITY.md`).
Immutable audit export	Append-only export of `TaskEvents` and policy decisions to S3 (e.g. Object Lock). Complements Centralized policy framework `PolicyDecisionEvent` schema for compliance and tamper-evident investigation.
Security operations metrics (dwell time and coverage)	CloudWatch metrics and dashboard panels: time from anomaly (circuit breaker, guardrail spike, policy deny burst) to operator awareness; fraction of security/ops alarms investigated. Targets shortened exploit windows.
Automated alert first-pass triage	On selected security/ops alarms, a Lambda produces a structured disposition from logs, traces, and `TaskEvents` before human review. Distinct from LLM-assisted trace analysis (post-mortem on failed tasks).

Scale and collaboration

Capability	Description
Multi-user and teams	Team visibility, shared approval queues, team concurrency/cost budgets, memory isolation.
Agent swarm	Planner-worker architecture for complex multi-file tasks. DAG of subtasks, merge orchestrator, one consolidated PR. Workers receive a strict subset of planner credentials; orchestrator-issued subtask intent; per-worker OpenTelemetry spans under a shared trace root (prevents confused-deputy / unscoped privilege inheritance).
Multi-user nudge	Extend `bgagent nudge` to support multiple users injecting context into the same running task. Per-nudge commit attribution. (Single-user nudge shipped.)
Scheduled triggers	Cron-based task creation via EventBridge (dependency updates, nightly flaky test checks).

Platform maturity

Capability	Description
Unified liveness decision model (follow-up design ticket)	Normalize task health evaluation across compute backends so heartbeat, compute session status, and DynamoDB state are handled through a single typed decision path. Define explicit backend capabilities (for example, heartbeat support), deterministic precedence rules for terminal outcomes, and regression tests that prevent cross-runtime false failures like ECS heartbeat mismatch.
Pure decision function orchestrator refactor	Extract orchestrator decision logic into pure functions that take a frozen snapshot and return a typed action. Side-effectful execution applies actions with CAS (compare-and-swap) guards on DynamoDB `updated_at` to prevent stale writes. Makes the orchestrator exhaustively unit-testable without mocking I/O, eliminates competing-worker race conditions, and is a prerequisite for the autonomous feedback loop.
Blueprint custom steps and step sequences	Lambda-backed `pre-agent` / `post-agent` steps and optional `step_sequence` overrides with CDK synth + runtime validation and `INVALID_STEP_SEQUENCE` on misconfiguration (`REPO_ONBOARDING.md`, `ORCHESTRATOR.md`).
Blueprint RepoConfig parity	Extend the Blueprint construct to persist per-repo default `max_budget_usd` and `memory_token_budget` in DynamoDB (orchestrator already merges `max_budget_usd` when present; hydration uses a fixed memory token cap today).
Orchestrator DLQ	Dead-letter path for task orchestration after retry exhaustion so operators can inspect and replay failed durable executions (`ORCHESTRATOR.md`).
Stuck-task reconciliation (operator notify/resume)	The scheduled stranded-task reconciler shipped (detects and fails stuck tasks). Further: operator notification before forced failure, manual resume option (`ORCHESTRATOR.md`).
EventBridge / SNS integration	Publish task lifecycle events to EventBridge or SNS for external consumers beyond the built-in DDB-Stream fanout (which already powers the GitHub edit-in-place and Slack dispatchers and the log-only email stub).
CDK constructs library	Publish reusable constructs to Construct Hub with semver versioning.
Centralized policy framework	Unified Cedar-based framework with `PolicyDecisionEvent` audit schema. Three enforcement modes with observe-before-enforce rollout.
Zero Trust control review ("impossible vs tedious")	Document a standing design test in `SECURITY.md`: prefer controls that remove capability over friction-only mitigations (rate limits, observe-only DNS). Use when prioritizing DNS enforcement, credential scoping, and containment vs. throttling.
Formal verification	TLA+ specification of task state machine, concurrency, cancellation races, reconciler interleavings.
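
The Pure decision function orchestrator refactor row describes a split between a pure `decide` step and a side-effectful `apply` step guarded by compare-and-swap on `updated_at`. The sketch below illustrates that split under stated assumptions: the snapshot fields, the heartbeat rule, the action shape, and the in-memory table (a stand-in for a DynamoDB conditional write) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Snapshot:
    state: str
    updated_at: int        # version stamp read with the snapshot
    heartbeat_age_s: int


@dataclass(frozen=True)
class Action:
    kind: str              # "noop" or "fail" in this sketch
    new_state: Optional[str] = None


def decide(snapshot: Snapshot, heartbeat_timeout_s: int = 300) -> Action:
    """Pure: same snapshot in, same action out, no I/O."""
    if snapshot.state == "RUNNING" and snapshot.heartbeat_age_s > heartbeat_timeout_s:
        return Action("fail", "FAILED")
    return Action("noop")


class InMemoryTable:
    """Stand-in for a table row that supports a conditional update."""

    def __init__(self, state: str, updated_at: int) -> None:
        self.state = state
        self.updated_at = updated_at

    def conditional_update(self, new_state: str, expected_updated_at: int, now: int) -> bool:
        if self.updated_at != expected_updated_at:
            return False  # someone else wrote first; reject the stale write
        self.state, self.updated_at = new_state, now
        return True


def apply(table: InMemoryTable, snapshot: Snapshot, action: Action, now: int) -> bool:
    """Side-effectful: apply the action only if the row is unchanged since the snapshot."""
    if action.kind == "noop":
        return False
    return table.conditional_update(action.new_state, snapshot.updated_at, now)
```

Because `decide` takes a frozen snapshot and performs no I/O, it can be unit-tested exhaustively; the CAS guard in `apply` is what prevents two workers holding the same snapshot from both writing.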

Agent asset registry

Capability	Description
Central asset registry	A versioned, platform-managed registry from which agents resolve assets at runtime instead of requiring them to be vendored in source. Assets include skills, plugins, MCP server definitions, capabilities (Change Manifest verification strategies, knowledge tool configurations), custom prompt fragments, and Cedar policy modules. The registry is the single source of truth the Change Manifest's L1 (`tool_needed` validation) and L5 (knowledge resolution) evaluate against — replacing the implicit "known tool registry" currently assumed in the design. Backed by DynamoDB (metadata + version index) with S3 (artifact storage).
Asset versioning and immutability	Every asset version is immutable once published. Blueprints pin asset versions explicitly (no floating `latest` in production). Version resolution follows semver constraints. Rollback is a re-pin to a prior version, not mutation. Enables reproducible agent executions and safe rollout of new tool versions without affecting running tasks.
Asset lifecycle management	Publish, deprecate, and retire flow for registry assets. Deprecation emits warnings in task telemetry when a pinned asset version is nearing end-of-life. Retirement blocks new task starts that reference the asset. Operator API for bulk migration (re-pin all blueprints from v1 to v2).
Capability descriptors	Structured metadata per registry asset declaring what the asset provides (tool surface, permissions required, resource limits, Cedar actions it introduces) and what it requires (runtime dependencies, network egress domains, minimum compute profile). The agent runtime uses descriptors to configure the execution environment dynamically — enabling MCP servers, injecting context, and adjusting sandbox permissions based on resolved capabilities rather than static blueprint lists.
Blueprint registry references	Extend the Blueprint construct so `knowledge_tools`, `mcp_servers`, tool profiles, and capability configurations reference registry asset identifiers and version constraints instead of inline definitions. At task start the orchestrator resolves pinned versions from the registry, fetches artifacts, and provisions the agent environment. Decouples asset authoring from infrastructure deployment.
Registry access control	Cedar policies govern who can publish, deprecate, or pin assets. Scoped by asset type and namespace (e.g., org-private vs. platform-provided). Read access (resolution at task start) is unrestricted within the deployment; write access requires operator or publisher role.
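
To make the Asset versioning and immutability row concrete, here is a minimal sketch of resolving a blueprint's version constraint against the set of published versions. It is an illustration of a planned capability, not a specification: `resolve` is a hypothetical function, only exact pins and caret constraints are handled, the caret rule assumed here is "same major version, at least the stated version", and pre-release or build metadata is not supported.

```python
from typing import List, Tuple


def parse(version: str) -> Tuple[int, ...]:
    """Parse "MAJOR.MINOR.PATCH" into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split("."))


def resolve(constraint: str, published: List[str]) -> str:
    """Pick the published version that satisfies the constraint."""
    if constraint == "latest":
        # Mirrors the "no floating latest in production" rule.
        raise ValueError("floating 'latest' is not allowed; pin a version")
    if constraint.startswith("^"):
        base = parse(constraint[1:])
        candidates = [
            v for v in published
            if parse(v)[0] == base[0] and parse(v) >= base
        ]
        if not candidates:
            raise LookupError(f"no published version satisfies {constraint}")
        return max(candidates, key=parse)
    if constraint in published:
        return constraint  # exact pin
    raise LookupError(f"version {constraint} is not published")
```

Comparing parsed tuples rather than strings matters: as strings, "1.10.1" sorts before "1.2.0". Rollback under this model is simply re-pinning to an earlier exact version.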

Design docs to keep in sync: ARCHITECTURE.md, ORCHESTRATOR.md, API_CONTRACT.md, ATTACHMENTS.md, CEDAR_HITL_GATES.md, INPUT_GATEWAY.md, INTERACTIVE_AGENTS.md, REPO_ONBOARDING.md, MEMORY.md, OBSERVABILITY.md, COMPUTE.md, SECURITY.md, EVALUATION.md.
