feat: add FargateAgentStack as alternative compute backend by MichaelWalker-git · Pull Request #4 · aws-samples/sample-autonomous-cloud-coding-agents

MichaelWalker-git · 2026-04-02T04:44:50Z

Summary

Introduces an AWS Fargate + Step Functions compute backend as an alternative to the existing AgentCore runtime
New FargateAgentStack coexists alongside AgentStack, sharing DynamoDB tables, VPC, and secrets via cross-stack references
Step Functions Standard workflow replaces durable Lambda polling with native ECS RunTask .sync integration
Reuses same agent/ Python code and Dockerfile

Changes

Story 1: Expose shared resources from AgentStack as public readonly properties
Story 2: FargateAgentCluster construct (ECS Cluster + Fargate TaskDefinition: 4 vCPU, 16GB, 100GB ephemeral)
Story 3: TaskStepFunction construct (Load → Admit → Hydrate → Run Fargate → Finalize, 9h timeout)
Story 4: 6 thin Lambda handlers for SFN steps, all importing from shared/orchestrator
Story 5: FargateAgentStack composing all constructs with cdk-nag compliance
Story 6: Wired into main.ts producing two CloudFormation templates
Story 7: VPC endpoints for ECS (ecs, ecs-agent, ecs-telemetry) and Step Functions (states)

Test plan

546 unit tests pass across 44 test suites
Both templates synthesize: backgroundagent-dev.template.json and backgroundagent-fargate-dev.template.json
cdk-nag AwsSolutionsChecks pass with documented suppressions
Deploy both stacks (npx projen deploy --all)
Start SFN execution with existing task_id and verify full lifecycle
Verify error handling (invalid repo → HandleError → FAILED)

Copilot

Pull request overview

This PR introduces a new AWS Fargate + Step Functions compute backend (FargateAgentStack) that runs the existing agent/ container on ECS and orchestrates task lifecycle via a Standard Step Functions workflow, coexisting alongside the current AgentStack and sharing core resources (VPC, DDB tables, Secrets).

Changes:

Adds FargateAgentCluster (ECS cluster + Fargate task definition) and TaskStepFunction (SFN workflow using ECS RunTask .sync).
Adds 6 thin Step Functions Lambda step handlers that delegate to src/handlers/shared/orchestrator.
Exposes shared resources from AgentStack, adds required VPC interface endpoints, and wires both stacks in src/main.ts.

Reviewed changes

Copilot reviewed 31 out of 31 changed files in this pull request and generated 7 comments.

Show a summary per file

File	Description
test/stacks/fargate-agent.test.ts	Synth tests for new `FargateAgentStack`
test/handlers/sfn-steps/transition-to-running.test.ts	Unit test for step handler env override output
test/handlers/sfn-steps/load-task.test.ts	Unit test for loading task + blueprint
test/handlers/sfn-steps/hydrate-context.test.ts	Unit test for hydration step wrapper
test/handlers/sfn-steps/handle-error.test.ts	Unit tests for SFN error handling wrapper
test/handlers/sfn-steps/finalize-task.test.ts	Unit test for finalize step wrapper
test/handlers/sfn-steps/admission-control.test.ts	Unit tests for admission control wrapper
test/constructs/task-step-function.test.ts	SFN construct synthesis/assertion tests
test/constructs/fargate-agent-cluster.test.ts	ECS/Fargate construct synthesis/assertion tests
test/constructs/agent-vpc.test.ts	Updates VPC endpoint count + new endpoint assertions
src/stacks/fargate-agent.ts	New stack composing ECS, SFN, and step Lambdas
src/stacks/agent.ts	Exposes shared resources as public readonly properties; disables X-Ray traces
src/main.ts	Instantiates both AgentStack and FargateAgentStack
src/handlers/sfn-steps/transition-to-running.ts	Transitions task to RUNNING and builds container env overrides
src/handlers/sfn-steps/load-task.ts	Step handler: load task + blueprint config
src/handlers/sfn-steps/hydrate-context.ts	Step handler: hydrate context and transition to HYDRATING
src/handlers/sfn-steps/handle-error.ts	Step handler: fail task + (attempt) concurrency release
src/handlers/sfn-steps/finalize-task.ts	Step handler: finalize task after ECS completion
src/handlers/sfn-steps/admission-control.ts	Step handler: user concurrency admission check
src/constructs/task-step-function.ts	New Step Functions workflow construct (Lambda steps + ECS RunTask)
src/constructs/fargate-agent-cluster.ts	New ECS cluster + Fargate task definition (agent container)
src/constructs/agent-vpc.ts	Adds ECS + Step Functions interface VPC endpoints
docs/src/content/docs/roadmap/Roadmap.md	Roadmap updates (engineering practice + iteration items)
docs/src/content/docs/design/Fargate-stack.md	New docs-site design doc for Fargate stack
docs/design/FARGATE_STACK.md	New repo-level design doc for Fargate stack
docs/.projen/files.json	Registers new docs-site file
docs/.gitignore	Ensures docs-site file isn’t ignored
docs/.gitattributes	Marks docs-site file as generated
.projenrc.ts	CI/projen changes (mise env vars; agent:check no longer enforced)
.projen/tasks.json	Generated task change to make agent checks non-fatal
.github/workflows/build.yml	Build workflow updates for mise install/token env

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Add a new CDK stack that runs autonomous coding agents on AWS Fargate, orchestrated by Step Functions, as an alternative to AgentCore Runtime. Key changes: - Expose shared resources from AgentStack as public properties - Add FargateAgentCluster construct (ECS cluster, task def, ARM64 container) - Add TaskStepFunction construct (Step Functions state machine with Load → Admit → Hydrate → Transition → RunFargate → Finalize flow) - Add 6 thin Lambda handlers for Step Functions steps - Add VPC endpoints for ECS and Step Functions - Wire FargateAgentStack into main.ts alongside existing AgentStack - Self-install trivy in security:image task to avoid GitHub API rate limits

krokoko · 2026-04-06T16:20:38Z

Review:

Thanks for the thorough design doc and implementation work, Michael. The PR is well-structured, tests are comprehensive, and the individual constructs are solid. However, after reviewing the changes against the project's documented architecture, I believe this PR takes an approach that diverges from the intended design in a fundamental way, and I think we should discuss this before proceeding.

Core concern: this bypasses the `ComputeStrategy` interface

The project's architecture explicitly defines a strategy pattern for
compute backends. This is not a vague aspiration — it is documented as a
design principle and cross-referenced across multiple design docs:

ARCHITECTURE.md (Design Principles): "Critical components should
be accessed through internal interfaces (e.g., ComputeStrategy,
MemoryStore) so that implementations can be swapped without rewriting the
codebase."
ARCHITECTURE.md (Extensibility Model): "The internal
extensibility model — interface-driven components (ComputeStrategy,
blueprint customization layers, swappable providers) — serves platform
operators who want to customize behavior without forking."
REPO_ONBOARDING.md (lines 341-373): The ComputeStrategy
interface is fully specified with startSession(), pollSession(),
stopSession() methods, SessionHandle, and SessionStatus types.
COMPUTE.md: "The ComputeStrategy interface should be designed
in Iteration 3a (as an interface contract) even if only the AgentCore
implementation exists initially. This ensures the orchestrator is not
tightly coupled to AgentCore-specific assumptions."
ROADMAP.md (Iteration 5): "The orchestrator calls abstract
methods (startSession, stopSession, pollSession); the implementation
maps to AgentCore, Fargate, or EKS. Repos select the strategy via
compute_type in their blueprint configuration."

Instead of implementing Fargate behind this interface, this PR
creates:

A separate CDK stack (FargateAgentStack) with its own
orchestration
A separate orchestration engine (Step Functions) that replaces the
existing durable Lambda orchestrator
6 new Lambda handlers that duplicate the orchestration pipeline
steps
A parallel control plane that runs alongside the existing one

The result is two independent orchestration paths that share DynamoDB
tables but have no common abstraction. This is precisely the outcome the
ComputeStrategy design was meant to prevent.

What the design calls for vs. what this PR does

Aspect	Intended design (`ComputeStrategy`)	This PR
Orchestrator	Single orchestrator dispatches to compute backends via
interface	Two separate orchestrators (durable Lambda + Step Functions)
Adding a backend	Implement `ComputeStrategy` interface (~3 methods)
Create an entire parallel stack + orchestration engine
Switching a repo's backend	Change `compute_type` in blueprint config

Deploy a second CloudFormation stack |

Specific issues

The SFN step handlers re-implement the orchestration pipeline.
While they call into shared/orchestrator.ts functions (good), they
rebuild the pipeline flow in Step Functions ASL. The step ordering, error
handling, retry logic, and state transitions are now defined in two
places: the durable Lambda handler (orchestrate-task.ts) and the Step
Functions definition (task-step-function.ts). Any future changes to the
orchestration flow (e.g., adding pre-flight checks from the Iter 3c
roadmap) must be made in both places.
The FargateAgentStack as a separate stack creates tight cross-stack
coupling. Cross-stack references via CloudFormation exports are brittle
— you cannot update or delete exported values while another stack imports
them. The design docs even note this risk. Keeping compute backends
within the same stack (or using SSM parameters) would be more
maintainable.
VPC endpoints for ECS/StepFunctions are added unconditionally to
AgentVpc. These 4 endpoints add cost (~$30/month total) for all
deployments, even those not using Fargate. These should be conditional or
part of the Fargate-specific infrastructure.
The X-Ray tracing disable in agent.ts is an unrelated change. The
comment says it needs account-level pre-configuration — this should be a
separate PR.
Story 8 (Blueprint routing via create-task-core.ts) is described in
the design doc but not implemented. Without it, there is no way for the
shared API to route tasks to the Fargate backend based on compute_type.
The Step Functions state machine can only be started manually.

Suggested path forward

I'd recommend restructuring this as follows:

First PR: Implement the ComputeStrategy interface and refactor the
orchestrator.
- Create the TypeScript interface (ComputeStrategy, SessionHandle,
  SessionStatus) as specified in REPO_ONBOARDING.md.
- Extract the current AgentCore-specific code from orchestrator.ts
  into an AgentCoreComputeStrategy class.
- Refactor orchestrate-task.ts to use the strategy pattern. The
  existing pipeline steps (load → admit → hydrate → start → poll → finalize)
  stay in the orchestrator; only startSession/pollSession/stopSession
  are delegated to the strategy.
- Add a strategy factory/resolver that maps compute_type → strategy
  implementation.
- This is a pure refactor — behavior is unchanged, same stack, same
  durable Lambda.
Second PR: Add EcsFargateComputeStrategy implementation.
- Implement startSession() using ECS RunTask.
- Implement pollSession() using DescribeTasks (or rely on the
  agent's DDB writes, as the current AgentCore path does).
- Implement stopSession() using StopTask.
- Add the ECS cluster, Fargate task definition, and container to the
  existing AgentStack (conditionally, based on whether any blueprint uses
  compute_type: 'ecs').
- The orchestrator's existing durable Lambda pipeline handles the
  lifecycle — no Step Functions needed for this use case.
Optional third PR: Step Functions orchestrator as an alternative to
durable Lambda.
- If there's a separate motivation to replace the durable Lambda with
  Step Functions (better visibility, declarative retries, etc.), that's a
  valid improvement — but it should apply to all compute backends, not
  just Fargate. It would replace the orchestration engine, not the compute
  backend.

This approach keeps the codebase aligned with the documented architecture,
avoids maintaining two parallel orchestration pipelines, and makes adding
future backends (EKS, custom EC2) straightforward.

The Fargate infrastructure work in this PR (ECS cluster, task definition,
container setup) is well done and can be reused largely as-is in the
restructured approach. The main ask is to wire it through the
ComputeStrategy abstraction rather than building a parallel stack.

…al union (krokoko review aws-samples#2, aws-samples#4, aws-samples#10) Three related type-safety fixes from the code review on PR aws-samples#52. Grouped because they all touch the same contract surface (``TaskDetail`` on both sides of the wire) and share the theme "honest types at the DDB + API boundary". ## Findings addressed **aws-samples#2 — turns_attempted / turns_completed bypass coerceNumericOrNull** ``toTaskDetail`` at ``cdk/src/handlers/shared/types.ts`` was passing ``record.turns_attempted`` and ``record.turns_completed`` through with only ``?? null`` — same bug class as the ``costUsd.toFixed is not a function`` crash fixed at the fanout boundary in commit ``9fe704e`` / consolidated in ``c09bfd7``. The DDB Document-client deserializes ``Number`` attributes as ``string`` on some code paths; any downstream caller doing arithmetic on these fields would crash at runtime. Fix extends beyond the two fields the review called out — ALL numeric fields sourced from the DDB record now route through ``coerceNumericOrNull``: ``duration_s``, ``cost_usd``, ``max_turns``, ``max_budget_usd``, ``turns_attempted``, ``turns_completed``. A new JSDoc block on ``toTaskDetail`` documents the contract ("all numeric fields coerced through shared helper; do not bypass") so a future field addition has a clear pattern to follow. Scope-bounded: the non-numeric fields (``task_description``, ``pr_url``, …) and request-body-validated ints (``issue_number``, ``pr_number``) stay untouched. **aws-samples#4 — CLI/CDK type drift on turns_attempted / turns_completed** Per AGENTS.md the CDK and CLI ``TaskDetail`` declarations must stay in sync. CDK declared both fields as required (``number | null``); CLI marked them optional (``number | null | undefined``). The tightening means the server's guarantee (``toTaskDetail`` always emits both fields, defaulting to ``null`` when absent on the record) now flows honestly to the CLI type. **aws-samples#10 — channel_source typed as string instead of literal union** Added an exported ``ChannelSource = 'api' | 'webhook'`` literal union in both CDK and CLI ``types.ts``. ``TaskRecord.channel_source`` and ``TaskDetail.channel_source`` on both sides now use the narrow type. Downstream switches/predicates that read ``channel_source`` get exhaustiveness checking at compile time, matching the internal ``CreateTaskContext.channelSource`` literal already in use at ``create-task-core.ts``. Reviewer's comment: "the internal CreateTaskContext correctly narrows it but the external types don't" — now they do. ## Tests +3 regression tests in ``cdk/test/handlers/shared/error-classifier.test.ts`` under the ``toTaskDetail integration`` block: - String-typed DDB numeric fields coerce to finite numbers on the ``TaskDetail`` output (``typeof === 'number'`` + exact value for all six coerced fields). - Unparseable numeric strings collapse to ``null`` without crash (defence test for the non-finite branch in ``coerceNumericOrNull``). - ``channel_source`` narrows to the literal union — uses ``@ts-expect-error`` to pin that a widened ``'slack'`` fails to compile, so a future ``ChannelSource`` regression surfaces in CI immediately. CLI test fixtures updated to include ``channel_source: 'api'`` and ``turns_attempted: null`` / ``turns_completed: null`` where the non-optional fields were previously omitted. No CLI test count change — the fixture additions satisfy the stricter contract without requiring new test bodies. CDK suite: 1032 passing (was 1029). CLI suite: 190 passing. Refs: krokoko code review on PR aws-samples#52 (findings 2, 4, 10)

Lands the stateless CDK primitives for Cedar-HITL approval gates so Chunk 5's REST handlers can be wired onto concrete tables. Completes §15.2 tasks 9, 20, and 25. Constructs ---------- ``TaskApprovalsTable`` (§10.1) - PK ``task_id`` + SK ``request_id`` (ULID). Matches the agent-side primitives landed in the prior commit. - GSI ``user_id-status-index`` with user_id PK + status SK and an ``INCLUDE`` projection limited to the fields GET /v1/pending renders. Three deny-sensitive attrs (``deny_reason``, ``scope``, ``tool_input_sha256``) deliberately omitted from the projection — the list endpoint only returns PENDING rows in practice, but excluding them kills the projection-leak concern outright and costs no bytes today. - Exports ``USER_STATUS_INDEX_NAME`` as a module constant + mirrors it on ``construct.userStatusIndexName`` so handlers referencing the GSI fail compile-time on a rename. - TTL attribute ``ttl`` (agent writes ``created_at + timeout_s + 120s``). - No DynamoDB streams per §11.2. TaskEventsTable carries the audit fan-out; streams here would duplicate. - Default RemovalPolicy.DESTROY to match the rest of the sample. Production deploys override to RETAIN per §10.1. ``SlackUserMappingTable`` (§11.2, finding aws-samples#4) - Single-key (``slack_user_id`` PK). No SK, no TTL, no GSI, no stream. The forward-only shape is the trust boundary — a reverse GSI (Cognito → Slack) would let a compromised Cognito sub enumerate Slack identities without adding v1 capability. - Writes land through LinkSlackUserFn (Chunk 5) which enforces the ``attribute_not_exists(slack_user_id)`` condition so a prior legitimate mapping cannot be overwritten by a later compromise. ``task-status.ts`` — AWAITING_APPROVAL (§10.3) - Added to TaskStatus enum + ACTIVE_STATUSES (NOT TERMINAL_STATUSES: the task is alive, paused on a human decision). - VALID_TRANSITIONS wires the five edges §10.3 enumerates: RUNNING → AWAITING_APPROVAL (soft-deny entry) HYDRATING → AWAITING_APPROVAL (rare early-gate case) AWAITING_APPROVAL → RUNNING (approve / deny resume) AWAITING_APPROVAL → CANCELLED (user cancel mid-approval) AWAITING_APPROVAL → FAILED (stranded-approval reconciler) - Notably NOT added: AWAITING_APPROVAL → FINALIZING (approve-during-cleanup race) AWAITING_APPROVAL → COMPLETED (skip RUNNING) AWAITING_APPROVAL → TIMED_OUT (timer lives on the approval row, not the task clock) These are regression tests so a future refactor cannot quietly add them and bypass the `awaiting_approval_request_id = :rid` invariant. Tests: +29 total. - TaskApprovalsTable (11 tests): PK/SK schema, PAY_PER_REQUEST, PITR default + override, TTL attribute, NO streams, GSI schema + projection + sensitive-attr exclusion, removal policy default + override, ``USER_STATUS_INDEX_NAME`` constant parity with the construct field. - SlackUserMappingTable (8 tests): single-key schema (explicit KeySchema length assertion), PAY_PER_REQUEST, PITR, no streams, no reverse GSI, DESTROY default, TTL absent. - TaskStatus (+10 tests over existing: 5 new assertions on the 9-state cardinality, AWAITING_APPROVAL membership, and the transition graph including the three forbidden edges). The existing assertions updated for the new state count. No stack wiring yet — ``agent.ts`` instantiation + env var plumbing + grants land in the next commit alongside the Cedar-WASM Lambda layer.

Lands the three read/discovery handlers Chunk 6 (CLI) needs to power ``bgagent pending``, ``bgagent policies list/show``, and ``bgagent notifications configure slack``. Completes §15.2 tasks 14, 15, and 25 (handler side). Handlers -------- ``get-pending.ts`` (§7.7 — GET /v1/pending) - Queries ``user_id-status-index`` GSI on TaskApprovalsTable with ``user_id = :caller AND status = :pending``. Without the GSI this would be a full-table Scan per call — under ``watch -n1 bgagent pending`` that exhausts burst capacity for the whole fleet (§10.1 finding aws-samples#8). - Response maps each row to ``PendingApprovalSummary`` with a derived ``expires_at = created_at + timeout_s`` so the CLI can render time-to-timeout without doing arithmetic on ISO strings. - Severity coerced to ``medium`` on unknown values so GSI writes that drift from the enum don't break the list response. - Rate-limited 10/min/user (synthetic row on the same table, namespaced ``RATE#<user>#PENDING`` so it does not collide with the approve/deny counter). ``get-policies.ts`` (§7.6 — GET /v1/repos/{repo_id}/policies) - Combines ``BUILTIN_HARD_DENY_POLICIES`` + ``BUILTIN_SOFT_DENY_POLICIES`` with the repo's ``cedar_policies`` blueprint override. Runs the combined text through ``parseRules`` and returns ``{hard[], soft[]}`` rule summaries. - 5-minute per-repo in-Lambda cache; cold starts throw it away. ``_resetCacheForTests`` exposed for unit-test isolation. - Repo ID is URL-decoded from the path (``owner%2Frepo`` common in CLI UX). - Rate-limited 30/min/user. - Blueprint load failure falls back to built-ins with a WARN log; invalid blueprint cedar text returns 503 ``SERVICE_UNAVAILABLE`` rather than a misleading empty list. ``link-slack-user.ts`` (§11.2 finding aws-samples#4 — POST /v1/notifications/slack/link) - Writes to SlackUserMappingTable with ``ConditionExpression: attribute_not_exists(slack_user_id)``. This guard is the entire admission control the §11.2 design hinges on: even a compromised Slack admin cannot overwrite an existing mapping. - Validates ``slack_user_id`` shape (letters, digits, underscores, 2–40 chars) so junk rows cannot land. - Conflict surface is 409 ``REQUEST_ALREADY_DECIDED`` — reused error code (the payload message directs the user to unlink via support). - Slack link_token end-to-end validation against Slack OAuth is deferred — v1 accepts the token on trust from the Cognito-authed caller; it is persisted in CloudWatch for audit. Supporting primitives --------------------- ``shared/builtin-policies.ts`` — mirrors ``agent/policies/hard_deny.cedar`` and ``agent/policies/soft_deny.cedar`` as TypeScript string constants. Embedded rather than read from disk because Lambda's esbuild bundler does not copy non-TS assets by default and a dedicated bundling hook is more code than the embed. A drift test (``builtin-policies.test.ts``) asserts byte-equality with the agent files so any change on one side without the other flips red at build time. ``shared/cedar-policy.ts`` — ``parseRules`` now skips the unannotated ``base_permit`` entry (both tiers need it as a Cedar catch-all; it is not a user-facing rule so it stays out of ParsedRule[]). This matches the agent-side ``_parse_policy_annotations`` behaviour. Tests: +37 total. - get-pending (8): 401 on missing auth, 429 on rate limit, empty result, GSI query shape, row → PendingApprovalSummary with derived expires_at, severity fallback, missing timeout → expires_at falls back to created_at, 500 on DDB error. - get-policies (11): 401/400 validation, built-in rules listed on empty repo, URL-decoded repo path, custom blueprint rule lands in soft, per-repo cache across calls, 429 rate limit, 503 on invalid blueprint cedar, fallback on load failure, hard rules omit severity / approval_timeout_s, soft rules carry them. - link-slack-user (8): 401/400 validation, shape check, 201 on success, 409 on overwrite attempt, 500 on unknown DDB error, whitespace trim on slack_user_id, ConditionExpression verified. - builtin-policies (4): drift byte-equality with both agent files, parseRules round-trip for hard/soft rule IDs. - cedar-policy (updated): ``base_permit`` is skipped from ParsedRule[] rather than rejected. Stack wiring (task-api.ts routes, agent.ts layer attachment, CreateTaskFn extension, orchestrator + reconciler + fanout) lands in the next commit.

…After (aws-samples#79 review aws-samples#4) PR aws-samples#79 review aws-samples#4 surfaced a sibling-channel-failure hazard: when GitHub or Email rate-limits, the record lands in ``batchItemFailures``. On the Lambda retry, every Slack-subscribed event for that record runs again. Terminal events were already guarded by ``slack_notified_terminal``; ``agent_error`` was not — operators would page twice on a single agent failure if a sibling channel happened to fail. Generalize the dedup mechanism. ``TERMINAL_EVENTS`` is replaced by a ``SLACK_DEDUP_ATTRIBUTE`` map that marks each event type with the ``channel_metadata`` attribute that should guard the post: - 5 terminals share ``slack_notified_terminal`` (any first-arriving terminal claims the right; subsequent terminals dedup against it) - ``agent_error`` gets its own ``slack_dispatched_agent_error`` so a duplicate agent_error doesn't reuse the terminal slot - ``task_created`` / ``session_started`` map to ``null`` because they already use the per-event ``slack_*_msg_ts`` conditional persists from review #1 — the conditional already provides full idempotency (a separate marker would be redundant) Also surfaces Slack's ``Retry-After`` header on rate-limited responses through a dedicated ``fanout.slack.retryable_api_error`` warn so operators reading CloudWatch can see the recovery window instead of guessing from sustained warn rate. Tests: - logs Retry-After header on rate-limited Slack responses (new): asserts ``retry_after_seconds`` propagates from Slack's response header into the warn metadata - existing terminal-codes parametrized test untouched (terminal branch doesn't read headers) - existing retryable test gains a ``headers: { get: () => null }`` stub on the fetch mock so the headers.get call doesn't crash Reviewer suggested a per-channel dispatch bitmap as the alternative. Rejected as premature: the duplicate-GitHub-PATCH is harmless (idempotent), Email is still a stub, and the dedup map covers the specific agent_error pain identified above. A bitmap would add a new table + IAM grants + per-dispatch DDB cost for a hypothetical problem (Slack rate-limiting AND a sibling channel failure).

…ples#79 test gap aws-samples#33) Pre-PR-aws-samples#79 review-fix aws-samples#4 there was no direct test for the ``slack_dispatched_agent_error`` dedup attribute or its interaction with the existing ``slack_notified_terminal`` slot. A future refactor that collapsed the two slots — or renamed one of them — would silently break the sibling-channel-failure-retry guarantee that fix aws-samples#4 added. Adds 4 tests: - ``agent_error claims its own dedup attribute``: pins the UpdateExpression and ConditionExpression strings so a refactor that renames the attribute breaks loudly. - ``agent_error retry hits the dedup guard``: end-to-end scenario matching review aws-samples#4 — task already has ``slack_dispatched_agent_error: true``, retry must short-circuit before chat.postMessage. Without the guard, a second rotating_light fires. - ``terminal dedup attribute is per-class``: a flaky task_completed-then-task_failed sequence dedups against the same ``slack_notified_terminal`` slot. Catches the regression where the orchestrator emits both terminal types and we'd otherwise post both ✅ and ❌. - ``agent_error and terminals use distinct dedup slots``: the important negative — having ``slack_dispatched_agent_error`` set must NOT shadow a subsequent ``task_completed``. Pins the slot separation so a future merge into a single slot can't silently drop terminals after an agent_error.

#79) * feat(fanout): migrate SlackNotifyFn to FanOutConsumer subscriber (#64) Move the Slack outbound delivery off its own DynamoDB Streams consumer onto FanOutConsumer as a per-channel dispatcher. Drops TaskEventsTable from 2 concurrent stream readers to 1, restoring headroom for future channels (Email, Teams, etc.) without exceeding the documented DynamoDB Streams 2-reader-per-shard practical limit. The PR also addresses an adversarial code review on the original migration; the body below walks through each piece in the order it landed. ## (a) Migration - `cdk/src/handlers/slack-notify.ts` — rewritten as exported `dispatchSlackEvent(event, ddb)` plus a tagged `SlackApiError` class. The standalone `handler(event)` stream entrypoint is gone; the FanOutConsumer is now the only TaskEventsTable stream reader. Behaviour preserved bit-for-bit: channel_source==='slack' gate, terminal-event dedup via conditional UpdateItem on `channel_metadata.slack_notified_terminal`, threaded replies under the @mention or task_created message, emoji transitions (eyes -> hourglass -> ✅/❌/🚫/⏲), DM channel_id -> user_id rewrite, intermediate session+created message cleanup on terminal events. - `cdk/src/handlers/fanout-task-events.ts` — replaces the log-only `dispatchToSlack` stub with a wrapper that calls dispatchSlackEvent and routes errors via the new typed contract (see (b) below). Slack defaults gain task_created, session_started, task_timed_out so the router fans out the lifecycle events the old SlackNotifyFn handled; the dispatcher's channel_source gate keeps non-Slack tasks unaffected. - `cdk/src/constructs/fanout-consumer.ts` — adds a scoped `secretsmanager:GetSecretValue` grant on `bgagent/slack/*` so the fanout Lambda can fetch per-workspace bot tokens. Same scope the old SlackNotifyFn role held. - `cdk/src/constructs/slack-integration.ts` — deletes SlackNotifyFn, its DynamoEventSource, its IAM policy, and its NagSuppressions entry. Drops the now-unused StartingPosition / FilterCriteria / FilterRule / lambdaEventSources imports. After this lands, `aws lambda list-event-source-mappings` shows exactly one consumer of the TaskEventsTable stream (FanOutFn); verified on the dev stack with end-to-end @mention + cancel + CLI isolation scenarios. ## (b) Review fix #1 — partial-batch retry semantics (BLOCKER) The first review pass found that the post-migration handler silently dropped Slack-side infra errors (DDB throttle on the GetItem, Secrets Manager 5xx, transient Slack timeout). Pre-migration the SlackNotifyFn handler rethrew non-SlackApiError so Lambda retried the batch; post-migration `Promise.allSettled` swallowed the rejection and routeEvent returned an empty list with no escalation path to `batchItemFailures`. routeEvent's return type changed from `NotificationChannel[]` to `{ dispatched, infraRejections }`. The handler now pushes the record into `batchItemFailures` whenever `infraRejections.length>0`, so Lambda replays the record under the partial-batch contract. The warn line on rejection is tagged `retryable: true` so operators can alert distinctly from the channel-terminal swallow path. GitHub got the symmetric treatment: 4xx (excluding the existing 401 and 404 handling) is now treated as a channel-terminal swallow via `fanout.github.api_error` instead of escalating to retry. ## (c) Review fix #2 — split SlackApiError into terminal + retryable Originally any `!result.ok` Slack response was wrapped in SlackApiError and swallowed. That collapsed retryable codes (`ratelimited`, `service_unavailable`, `internal_error`, `fatal_error`, `request_timeout`) into the same swallow as `channel_not_found` — a tier-1 Slack outage would silently drop every message. Introduced `TERMINAL_SLACK_API_ERRORS` set + `classifySlackError` helper. Terminal codes still throw SlackApiError (router swallows). Retryable codes throw a plain Error so the router classifies them as infra rejections and Lambda replays. ## (d) Review fix #3 — NOTIFIABLE_EVENTS / CHANNEL_DEFAULTS drift The original migration added task_created/session_started/task_timed_out to CHANNEL_DEFAULTS.slack but the dispatcher's NOTIFIABLE_EVENTS gate already excluded several events the router was subscribing Slack to (agent_error, pr_created, task_stranded). Result: Slack was reported as `dispatched` for events it silently dropped — telemetry lied, agent_error never reached operators on Slack-origin tasks, and task_stranded rendered the generic "Event: task_stranded for owner/repo" fallback (UX regression). Added render cases for task_stranded and agent_error in slack-blocks.ts and added them to NOTIFIABLE_EVENTS. Forward-compat approval_required and status_response stay out of NOTIFIABLE_EVENTS until their emitters ship; a new cross-file consistency test in fanout-task-events.test.ts fails if anyone re-introduces the drift. The Slack dispatcher wrapper now passes `effectiveEventType` so an agent_milestone(pr_created) wrapper is unwrapped before NOTIFIABLE_EVENTS matching. Without the rewrite, the dispatcher would short-circuit on the wrapper string `agent_milestone`. ## (e) Review fix #4 — conditional UpdateItem on lifecycle persists Once the BLOCKER fix made batches retry, the original task_created and session_started UpdateItem calls became hazardous: a Slack POST that succeeded but whose follow-up UpdateItem failed transiently would, on retry, post a second root and overwrite slack_thread_ts — orphaning every threaded reply that had threaded under the first ts. Both UpdateItems now carry an `attribute_not_exists` ConditionExpression on the relevant `channel_metadata.slack_*_msg_ts`. On ConditionalCheckFailedException the handler logs at info, deletes the duplicate Slack message via `chat.delete`, and returns. Sibling retry wins the race; the duplicate is cleaned up. ## (f) Dev-stack regression: drop pr_created from Slack defaults Live verification surfaced a UX duplication: pr_created (subscribed in CHANNEL_DEFAULTS.slack as the original §6.2 design called for) and task_completed both rendered messages with View PR buttons, posted seconds apart. The original SlackNotifyFn had silently dropped pr_created (NOTIFIABLE_EVENTS gate), so users hadn't relied on it. Removed pr_created from CHANNEL_DEFAULTS.slack and from NOTIFIABLE_EVENTS, and removed the prCreatedMessage renderer. GitHub keeps pr_created (its edit-in-place comment surface genuinely benefits from the early checkpoint). ## Verification - mise //cdk:compile — clean - mise //cdk:test — 1183 / 1183 pass (8 net-new tests added for the review fixes: NOTIFIABLE_EVENTS drift guard, retryable Slack codes, GitHub 4xx swallow, infra rejection escalation, SlackApiError swallow, task_stranded render) - mise //cdk:eslint — clean - mise //cdk:synth — confirms exactly one Lambda::EventSourceMapping on TaskEventsTable, pointing at FanOutFn - Dev-stack scenarios — @mention happy path, Cancel button, CLI submit (channel_source=api -> zero Slack dispatches, GitHub edit-in-place still fires) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(fanout): retry GitHub 403/429 instead of swallowing as terminal (#79 review #1) PR #79 review found that the new 4xx terminal-swallow path treats HTTP 403 and 429 as channel-terminal — but on GitHub these are transient rate-limit responses (403 with "API rate limit exceeded", 429 "Too Many Requests"). Under a reconciliation wave that touches many tasks, an entire window of GitHub comment updates would be permanently dropped with only a warn log. Carve out 403 and 429 from the swallow guard so they propagate as infra rejections through ``Promise.allSettled``. The record lands in ``batchItemFailures`` and Lambda replays until the rate-limit window clears (or DLQs after ``retryAttempts``). Test coverage: parametrized over 403 + 429 with a GitHubCommentError mock at the helper boundary, asserting the record's eventID surfaces in ``batchItemFailures`` rather than being absorbed. * fix(fanout): guard Slack Secrets Manager grant on a prop (#79 review #2) Every other external-service grant in FanOutConsumer (taskTable, repoTable, githubTokenSecret) is gated by ``if (props.X)``, so a deployment that hasn't onboarded the corresponding service stays free of dangling IAM permissions. The original migration broke the pattern with an unconditional ``bgagent/slack/*`` Secrets Manager grant — dev stacks without Slack onboarding ended up holding read permission on a resource pattern they never use, with a misleading ``cdk-nag AwsSolutions-IAM5`` suppression reason. Adds an optional ``slackSecretArnPattern`` prop on ``FanOutConsumerProps``; the policy statement is only attached when the prop is set. ``cdk/src/stacks/agent.ts`` now computes the ``bgagent/slack/*`` ARN inline and passes it through, mirroring the other guarded props. ``ArnFormat`` and ``Stack`` imports moved out of fanout-consumer.ts since the construct no longer needs them. No changes to live behaviour — agent.ts always passes the prop, so the IAM policy still attaches in production. The dispatcher will log-and-fail-retry on a missing pattern (covered by review #3 fix). Test gap covering the construct itself ships in a follow-up commit (test gap #34). * fix(fanout): throw on missing TASK_TABLE_NAME env var (#79 review #3) Pre-fix: when ``TASK_TABLE_NAME`` was unset on a Slack-subscribed event, ``dispatchSlackEvent`` returned silently after a warn line. The router counted Slack as ``dispatched`` and a broken stack quietly dropped every Slack notification — operators only saw it in the warn-rate metric, with no rejected-channel signal. Post-fix: throw a plain Error so the rejection propagates as an infra rejection through ``Promise.allSettled``. The router pushes the record into ``batchItemFailures``, Lambda retries the batch, the ``fanout.dispatcher.rejected`` warn fires per record, and operators get a distinct alarm. Also bumps the existing log line from ``warn`` to ``error`` and attaches an ``error_id: FANOUT_SLACK_MISSING_TASK_TABLE`` so the deployment-bug case can be distinguished from per-record failures. Test: ``throws when TASK_TABLE_NAME env var is missing`` deletes the env var, asserts the throw, asserts no DDB call was attempted (env-var guard fires first). * fix(fanout): match SlackApiError by name as well as instanceof (#79 review #7) When a bundler ever duplicates the slack-notify module (rare with NodejsFunction tree-shaking but possible if dual-bundled), two distinct SlackApiError classes coexist and ``instanceof`` against one fails for instances of the other. The dispatcher would see a foreign-class SlackApiError, fall through to the rethrow branch, and the router would treat it as an infra rejection — flipping a channel-terminal swallow into infinite Lambda retries. Add an ``err.name === 'SlackApiError'`` fallback so the swallow branch fires either way. Mirrors the duck-typed ``GitHubCommentError`` check used elsewhere in the same handler. Test: synthesise a plain Error with name === 'SlackApiError' (NOT an instance of the mock's SlackApiError class) and assert batchItemFailures stays empty — proving the swallow path catches both shapes. * fix(fanout): extend TERMINAL_SLACK_API_ERRORS with permission codes (#79 review #8) Original set omitted documented Slack permission/scope failures. Codes outside the set fall to the retryable branch, so a misconfiguration like ``ekm_access_denied`` or ``missing_scope`` would burn 3 Lambda retries before DLQ on every event — even though the failure is fundamentally a configuration bug that no retry can clear. Adds: - Permission/scope: missing_scope, ekm_access_denied, team_access_not_granted, posting_to_general_channel_denied - Payload shape: invalid_arguments Reorganized the set into commented blocks (channel-shape, auth, permission/scope, payload-shape) so future additions go in the right bucket and the rationale stays visible. Test coverage: parametrized over the full TERMINAL_SLACK_API_ERRORS set (21 codes) — every one must throw SlackApiError so the router swallows it. The existing retryable test.each remains intact and covers the negative-class case (codes outside the set throw a plain Error and escalate to retry). * fix(fanout): promote Slack reaction/delete network errors to error logs (#79 review #5) The reaction / delete helpers (``addReaction``, ``removeReaction``, ``deleteMessage``) used to log every catch at warn with a single generic event key, lumping API-level rejections (e.g. ``no_reaction``) together with infrastructure failures (DNS lookup, TLS handshake, fetch timeout, JSON parse error from a hostile gateway). Operators who alarmed on the warn rate saw a flat signal that masked genuine infra problems. Split the boundary: - API-level (``!result.ok`` after a successful HTTP call) stays at warn with channel-specific event keys (``fanout.slack.reaction_add_api_error``, ``fanout.slack.reaction_remove_api_error``, ``fanout.slack.message_delete_api_error``). These are per-message UX problems; operators don't page. - Network errors (the outer ``catch (err)`` after ``fetch``) promote to ``logger.error`` with dedicated event keys (``fanout.slack.reaction_add_network_error``, ``fanout.slack.reaction_remove_network_error``, ``fanout.slack.message_delete_network_error``) and ``error_id``s (``FANOUT_SLACK_REACTION_NETWORK``, ``FANOUT_SLACK_DELETE_NETWORK``) so each has its own alarmable signal. User-visible symptoms when these fire silently: stale emoji reactions (hourglass never swaps to ✅) and orphaned intermediate messages. Behaviour unchanged: errors are still swallowed (per-message reactions and intermediate cleanup are best-effort by design; they must not fail the batch), but operators now get distinct metrics for each failure class. * fix(fanout): emit fanout.slack.dup_delete_failed on ghost-message accumulation (#79 review #6) The conditional UpdateItem dup-delete path (``task_created`` / ``session_started`` lifecycle persists) calls ``deleteMessage`` to clean up the duplicate Slack message that landed when a sibling retry won the race. The delete is inherently best-effort — but if it fails, the duplicate becomes a permanent ghost in the thread and operators had no way to alarm on the rate. Refactor ``deleteMessage`` to return a boolean (``true`` on success or ``message_not_found``-as-already-gone, ``false`` otherwise) and emit a dedicated ``fanout.slack.dup_delete_failed`` event with an ``error_id: FANOUT_SLACK_DUP_DELETE_FAILED`` from the dup-delete callsites when the cleanup couldn't complete. The terminal-event cleanup paths (``slack_session_msg_ts``, ``slack_created_msg_ts``) intentionally don't fire this event — those paths target genuinely-stale UX cleanup, not retry-driven duplicates, so an alarm there would be noise. No new tests beyond the existing dup-delete coverage; the ``deleteMessage`` return value isn't yet asserted at the unit level, but the behavior is fully exercised by the existing ``dup-delete`` integration paths (test gap #31 will add an explicit failure-path assertion when it lands). * chore(fanout): tighten RouteOutcome arrays to ReadonlyArray (#79 review #9) ``RouteOutcome.dispatched`` and ``infraRejections`` were typed as plain ``NotificationChannel[]`` — which made ``readonly`` on the property prevent reassignment but still allow callers to mutate the underlying array via ``.push``, ``.splice``, or ``.sort``. Inconsistent with the ``ReadonlySet<string>`` used for ``CHANNEL_DEFAULTS`` in the same file. Tightening to ``ReadonlyArray<NotificationChannel>`` makes the contract honest: the router owns the arrays, callers read them. Test suite updated to use ``[...outcome.dispatched].sort()`` where it previously called ``.sort()`` directly — the explicit copy makes the intent clear and would have surfaced any silent test-side mutation. * refactor(fanout): make SlackDispatchEvent a type alias of FanOutEvent (#79 review #10) The two interfaces were structurally identical: same five fields, same readonly modifiers, same metadata shape. The decoupling was purely nominal and a silent-drift footgun — adding a field to ``FanOutEvent`` (e.g. when the router starts plumbing an ``approval_required`` ID through) would not flow into ``SlackDispatchEvent``, leaving the dispatcher unaware until a downstream test happened to fail. Replace with a one-line type alias: export type SlackDispatchEvent = FanOutEvent; The slack-notify module now type-imports ``FanOutEvent`` from fanout-task-events. ``import type`` is erased at compile time, so the runtime bundle still has the one-way dep (fanout-task-events → slack-notify) — no module-cycle hazard. Reviewer-suggested ``Pick<FanOutEvent, 'task_id' | …>`` was considered and rejected: the dispatcher uses every field of ``FanOutEvent``, so the Pick would just enumerate the same five fields with extra noise. A direct alias keeps the intent obvious and prevents drift identically. * fix(fanout): generalize Slack dedup to cover agent_error + log Retry-After (#79 review #4) PR #79 review #4 surfaced a sibling-channel-failure hazard: when GitHub or Email rate-limits, the record lands in ``batchItemFailures``. On the Lambda retry, every Slack-subscribed event for that record runs again. Terminal events were already guarded by ``slack_notified_terminal``; ``agent_error`` was not — operators would page twice on a single agent failure if a sibling channel happened to fail. Generalize the dedup mechanism. ``TERMINAL_EVENTS`` is replaced by a ``SLACK_DEDUP_ATTRIBUTE`` map that marks each event type with the ``channel_metadata`` attribute that should guard the post: - 5 terminals share ``slack_notified_terminal`` (any first-arriving terminal claims the right; subsequent terminals dedup against it) - ``agent_error`` gets its own ``slack_dispatched_agent_error`` so a duplicate agent_error doesn't reuse the terminal slot - ``task_created`` / ``session_started`` map to ``null`` because they already use the per-event ``slack_*_msg_ts`` conditional persists from review #1 — the conditional already provides full idempotency (a separate marker would be redundant) Also surfaces Slack's ``Retry-After`` header on rate-limited responses through a dedicated ``fanout.slack.retryable_api_error`` warn so operators reading CloudWatch can see the recovery window instead of guessing from sustained warn rate. Tests: - logs Retry-After header on rate-limited Slack responses (new): asserts ``retry_after_seconds`` propagates from Slack's response header into the warn metadata - existing terminal-codes parametrized test untouched (terminal branch doesn't read headers) - existing retryable test gains a ``headers: { get: () => null }`` stub on the fetch mock so the headers.get call doesn't crash Reviewer suggested a per-channel dispatch bitmap as the alternative. Rejected as premature: the duplicate-GitHub-PATCH is harmless (idempotent), Email is still a stub, and the dedup map covers the specific agent_error pain identified above. A bitmap would add a new table + IAM grants + per-dispatch DDB cost for a hypothetical problem (Slack rate-limiting AND a sibling channel failure). * test(fanout): conditional UpdateItem race + dup-delete coverage (#79 test gap) Adds 4 tests covering the lifecycle-persist conditional path that review fix #1 introduced and review fix #6 hardened. Pre-PR-#79 the only ConditionalCheckFailed coverage was the terminal-dedup path; the new lifecycle-persist + dup-delete code lacked direct assertions and was flagged 9/10 criticality by the reviewer. - task_created persist ConditionalCheckFailed → posts duplicate then deletes it: pins the cleanup behaviour that prevents ghost task_created posts in the channel - session_started persist ConditionalCheckFailed → posts duplicate then deletes it: parallel coverage for the other lifecycle attribute (slack_session_msg_ts) - dup-delete failure emits fanout.slack.dup_delete_failed with error_id: pins the operator-alarm signal added in review fix #6; asserts both the event key and the FANOUT_SLACK_DUP_DELETE_FAILED error_id propagate - chat.delete returning message_not_found is treated as success (no dup_delete_failed): negative-class assertion. Prevents false-positive alarms when the race resolves cleanly (the duplicate was already deleted by a prior retry). The ghost / message_not_found tests use ``fetchMock.mockImplementation`` URL-routing rather than ``.mockResolvedValueOnce`` chains because ``updateReaction`` issues 2-3 reaction-API fetches between chat.postMessage and chat.delete; routing by URL keeps the test focused on the load-bearing chat.delete behaviour without coupling to reaction call order. * test(fanout): cover task_stranded + agent_error renderers (#79 test gap #32) Pre-PR-#79 the new ``taskStrandedMessage`` and ``agentErrorMessage`` helpers in slack-blocks.ts had no direct unit tests. Reviewer flagged this as a 7/10 gap because the renderers carry the prior_status / error_type / message_preview metadata threaded through from the event source — silent drift in the metadata field names would produce ugly fallback messages in production. Adds 5 tests: - task_stranded WITH metadata renders the prior_status parenthetical (``Task stranded for org/repo (last status: RUNNING)``) so operators can tell at a glance whether the task hung in HYDRATING vs RUNNING — without the parenthetical the reviewer's "generic Event: ..." UX regression would resurface. - task_stranded WITHOUT metadata still renders cleanly (legacy events written before the reconciler started stamping metadata must not crash or leak ``undefined``). - agent_error with full metadata (error_type + message_preview) renders the rotating_light, type, and preview. - agent_error WITHOUT metadata stays sensible — no leaked ``undefined`` strings or empty ``_Type:_`` line. - agent_error truncates a 500-char message_preview to keep Slack channel UX readable. * test(fanout): cover agent_error dedup + dedup-slot isolation (#79 test gap #33) Pre-PR-#79 review-fix #4 there was no direct test for the ``slack_dispatched_agent_error`` dedup attribute or its interaction with the existing ``slack_notified_terminal`` slot. A future refactor that collapsed the two slots — or renamed one of them — would silently break the sibling-channel-failure-retry guarantee that fix #4 added. Adds 4 tests: - ``agent_error claims its own dedup attribute``: pins the UpdateExpression and ConditionExpression strings so a refactor that renames the attribute breaks loudly. - ``agent_error retry hits the dedup guard``: end-to-end scenario matching review #4 — task already has ``slack_dispatched_agent_error: true``, retry must short-circuit before chat.postMessage. Without the guard, a second rotating_light fires. - ``terminal dedup attribute is per-class``: a flaky task_completed-then-task_failed sequence dedups against the same ``slack_notified_terminal`` slot. Catches the regression where the orchestrator emits both terminal types and we'd otherwise post both ✅ and ❌. - ``agent_error and terminals use distinct dedup slots``: the important negative — having ``slack_dispatched_agent_error`` set must NOT shadow a subsequent ``task_completed``. Pins the slot separation so a future merge into a single slot can't silently drop terminals after an agent_error. * test(fanout): add construct-level tests for FanOutConsumer (#79 test gap #34) The construct shipped on issue #64 with no unit-level coverage of its IAM contract. The only synth-level signal lived inside ``slack-integration.test.ts`` ("0 EventSourceMapping") which proved the migration didn't regress the OTHER construct. Reviewer flagged this 6/10 — and the gap is what allowed review #2 (unconditional Slack secret grant) to slip through in the first place. Adds 6 tests: - ``attaches a single DynamoEventSource on the TaskEventsTable stream``: pins the architectural invariant — issue #64 was fundamentally about reaching exactly-one stream reader. Adding a second consumer must fail this test loudly. - ``creates a DLQ for the fanout Lambda``: pins retention period + presence; a DLQ-less deployment would silently drop poison-pill records past retryAttempts. - ``omits the bgagent/slack/* grant when slackSecretArnPattern is not provided``: the review #2 invariant. Iterates every IAM::Policy and asserts NONE of them grant secretsmanager:* on a bgagent/slack/* ARN. A regression that re-introduces the unconditional grant breaks this test. - ``attaches the bgagent/slack/* grant only when slackSecretArnPattern is provided``: the positive case. Pins the grant shape (action, effect, resource pattern). - ``passes TASK_TABLE_NAME env var when taskTable is provided``: review #3 dependency — the dispatcher throws on missing env. - ``omits TASK_TABLE_NAME env var when taskTable is not provided``: graceful degrade for dev stacks that haven't onboarded the TaskTable yet (matches the construct's documented contract). * test(fanout): cover task_stranded through terminal dedup (#79 test gap #35) The reconciler at handlers/reconcile-stranded-tasks.ts:170 emits BOTH ``task_stranded`` and ``task_failed`` for a heartbeat-expired task — one for the operator signal, one to drive the FAILED status transition. Pre-PR-#79 this pair had no test coverage; reviewer flagged this 8/10 because the visible failure mode (a paired "Task stranded" + "Task failed" double-page in Slack) would surface in production but be silent in CI. Adds 2 tests: - ``task_stranded posts and writes the terminal dedup marker on first arrival``: pins that task_stranded participates in the shared terminal slot and renders the warning message with metadata. Catches a regression that omits task_stranded from the dedup map entirely. - ``task_stranded after a sibling task_failed dedups``: the operational scenario — task_failed already claimed ``slack_notified_terminal``; the subsequent task_stranded must short-circuit before chat.postMessage. Without this guard, operators get the double-page the reviewer warned about. * fix(fanout): re-read TaskRecord before terminal cleanup to close orphan-message race Live observation during PR #79 review verification: the same Slack @mention happy path sometimes leaves the 🚀 task_created message in the thread (orphaned beside the ✅ task_completed) and sometimes deletes it cleanly. The race window: 1. ``task_created`` stream batch posts the rocket message and persists ``slack_created_msg_ts`` via the conditional UpdateItem introduced in PR #79 review fix #1. 2. ``task_completed`` stream batch fires ~30s later. Its initial GetItem races the prior UpdateItem and sees a stale ``channel_metadata`` WITHOUT ``slack_created_msg_ts``. 3. The terminal cleanup branch checks ``channelMeta.slack_created_msg_ts`` — undefined — silently skips the chat.delete. The rocket message stays in the thread. Add a fresh GetItem inside the TERMINAL_EVENTS cleanup branch, after the dedup UpdateItem has linearized our view of the table. Any prior ``slack_*_msg_ts`` writes are visible by then, so the cleanup fires correctly. On a re-read failure (DDB throttle / transient blip) we fall back to the dispatch-entry snapshot and emit ``fanout.slack.cleanup_reread_failed`` so operators can alarm on the rate. Pre-existing race (the unconditional UpdateItem in pre-PR-#79 was the same shape — wrote, GetItem on the next batch could miss it). PR #79 doesn't introduce it but doesn't fix it either; this commit does, since the live screenshot evidence appeared during review verification. Tests: - ``terminal cleanup re-reads TaskRecord``: scripts a stale dispatch-entry GetItem followed by a fresh re-read GetItem with ``slack_created_msg_ts`` present; asserts chat.delete fires against the freshly-read ts. - ``terminal cleanup falls back to dispatch-entry snapshot when re-read fails``: defense-in-depth — DDB throttle on the re-read must not break terminal delivery; cleanup uses the entry snapshot and emits the fallback warn. --------- Co-authored-by: bgagent <bgagent@noreply.github.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: Alain Krok <alkrok@amazon.com>

Lands the stateless CDK primitives for Cedar-HITL approval gates so Chunk 5's REST handlers can be wired onto concrete tables. Completes §15.2 tasks 9, 20, and 25. Constructs ---------- ``TaskApprovalsTable`` (§10.1) - PK ``task_id`` + SK ``request_id`` (ULID). Matches the agent-side primitives landed in the prior commit. - GSI ``user_id-status-index`` with user_id PK + status SK and an ``INCLUDE`` projection limited to the fields GET /v1/pending renders. Three deny-sensitive attrs (``deny_reason``, ``scope``, ``tool_input_sha256``) deliberately omitted from the projection — the list endpoint only returns PENDING rows in practice, but excluding them kills the projection-leak concern outright and costs no bytes today. - Exports ``USER_STATUS_INDEX_NAME`` as a module constant + mirrors it on ``construct.userStatusIndexName`` so handlers referencing the GSI fail compile-time on a rename. - TTL attribute ``ttl`` (agent writes ``created_at + timeout_s + 120s``). - No DynamoDB streams per §11.2. TaskEventsTable carries the audit fan-out; streams here would duplicate. - Default RemovalPolicy.DESTROY to match the rest of the sample. Production deploys override to RETAIN per §10.1. ``SlackUserMappingTable`` (§11.2, finding aws-samples#4) - Single-key (``slack_user_id`` PK). No SK, no TTL, no GSI, no stream. The forward-only shape is the trust boundary — a reverse GSI (Cognito → Slack) would let a compromised Cognito sub enumerate Slack identities without adding v1 capability. - Writes land through LinkSlackUserFn (Chunk 5) which enforces the ``attribute_not_exists(slack_user_id)`` condition so a prior legitimate mapping cannot be overwritten by a later compromise. ``task-status.ts`` — AWAITING_APPROVAL (§10.3) - Added to TaskStatus enum + ACTIVE_STATUSES (NOT TERMINAL_STATUSES: the task is alive, paused on a human decision). - VALID_TRANSITIONS wires the five edges §10.3 enumerates: RUNNING → AWAITING_APPROVAL (soft-deny entry) HYDRATING → AWAITING_APPROVAL (rare early-gate case) AWAITING_APPROVAL → RUNNING (approve / deny resume) AWAITING_APPROVAL → CANCELLED (user cancel mid-approval) AWAITING_APPROVAL → FAILED (stranded-approval reconciler) - Notably NOT added: AWAITING_APPROVAL → FINALIZING (approve-during-cleanup race) AWAITING_APPROVAL → COMPLETED (skip RUNNING) AWAITING_APPROVAL → TIMED_OUT (timer lives on the approval row, not the task clock) These are regression tests so a future refactor cannot quietly add them and bypass the `awaiting_approval_request_id = :rid` invariant. Tests: +29 total. - TaskApprovalsTable (11 tests): PK/SK schema, PAY_PER_REQUEST, PITR default + override, TTL attribute, NO streams, GSI schema + projection + sensitive-attr exclusion, removal policy default + override, ``USER_STATUS_INDEX_NAME`` constant parity with the construct field. - SlackUserMappingTable (8 tests): single-key schema (explicit KeySchema length assertion), PAY_PER_REQUEST, PITR, no streams, no reverse GSI, DESTROY default, TTL absent. - TaskStatus (+10 tests over existing: 5 new assertions on the 9-state cardinality, AWAITING_APPROVAL membership, and the transition graph including the three forbidden edges). The existing assertions updated for the new state count. No stack wiring yet — ``agent.ts`` instantiation + env var plumbing + grants land in the next commit alongside the Cedar-WASM Lambda layer.

Lands the three read/discovery handlers Chunk 6 (CLI) needs to power ``bgagent pending``, ``bgagent policies list/show``, and ``bgagent notifications configure slack``. Completes §15.2 tasks 14, 15, and 25 (handler side). Handlers -------- ``get-pending.ts`` (§7.7 — GET /v1/pending) - Queries ``user_id-status-index`` GSI on TaskApprovalsTable with ``user_id = :caller AND status = :pending``. Without the GSI this would be a full-table Scan per call — under ``watch -n1 bgagent pending`` that exhausts burst capacity for the whole fleet (§10.1 finding aws-samples#8). - Response maps each row to ``PendingApprovalSummary`` with a derived ``expires_at = created_at + timeout_s`` so the CLI can render time-to-timeout without doing arithmetic on ISO strings. - Severity coerced to ``medium`` on unknown values so GSI writes that drift from the enum don't break the list response. - Rate-limited 10/min/user (synthetic row on the same table, namespaced ``RATE#<user>#PENDING`` so it does not collide with the approve/deny counter). ``get-policies.ts`` (§7.6 — GET /v1/repos/{repo_id}/policies) - Combines ``BUILTIN_HARD_DENY_POLICIES`` + ``BUILTIN_SOFT_DENY_POLICIES`` with the repo's ``cedar_policies`` blueprint override. Runs the combined text through ``parseRules`` and returns ``{hard[], soft[]}`` rule summaries. - 5-minute per-repo in-Lambda cache; cold starts throw it away. ``_resetCacheForTests`` exposed for unit-test isolation. - Repo ID is URL-decoded from the path (``owner%2Frepo`` common in CLI UX). - Rate-limited 30/min/user. - Blueprint load failure falls back to built-ins with a WARN log; invalid blueprint cedar text returns 503 ``SERVICE_UNAVAILABLE`` rather than a misleading empty list. ``link-slack-user.ts`` (§11.2 finding aws-samples#4 — POST /v1/notifications/slack/link) - Writes to SlackUserMappingTable with ``ConditionExpression: attribute_not_exists(slack_user_id)``. This guard is the entire admission control the §11.2 design hinges on: even a compromised Slack admin cannot overwrite an existing mapping. - Validates ``slack_user_id`` shape (letters, digits, underscores, 2–40 chars) so junk rows cannot land. - Conflict surface is 409 ``REQUEST_ALREADY_DECIDED`` — reused error code (the payload message directs the user to unlink via support). - Slack link_token end-to-end validation against Slack OAuth is deferred — v1 accepts the token on trust from the Cognito-authed caller; it is persisted in CloudWatch for audit. Supporting primitives --------------------- ``shared/builtin-policies.ts`` — mirrors ``agent/policies/hard_deny.cedar`` and ``agent/policies/soft_deny.cedar`` as TypeScript string constants. Embedded rather than read from disk because Lambda's esbuild bundler does not copy non-TS assets by default and a dedicated bundling hook is more code than the embed. A drift test (``builtin-policies.test.ts``) asserts byte-equality with the agent files so any change on one side without the other flips red at build time. ``shared/cedar-policy.ts`` — ``parseRules`` now skips the unannotated ``base_permit`` entry (both tiers need it as a Cedar catch-all; it is not a user-facing rule so it stays out of ParsedRule[]). This matches the agent-side ``_parse_policy_annotations`` behaviour. Tests: +37 total. - get-pending (8): 401 on missing auth, 429 on rate limit, empty result, GSI query shape, row → PendingApprovalSummary with derived expires_at, severity fallback, missing timeout → expires_at falls back to created_at, 500 on DDB error. - get-policies (11): 401/400 validation, built-in rules listed on empty repo, URL-decoded repo path, custom blueprint rule lands in soft, per-repo cache across calls, 429 rate limit, 503 on invalid blueprint cedar, fallback on load failure, hard rules omit severity / approval_timeout_s, soft rules carry them. - link-slack-user (8): 401/400 validation, shape check, 201 on success, 409 on overwrite attempt, 500 on unknown DDB error, whitespace trim on slack_user_id, ConditionExpression verified. - builtin-policies (4): drift byte-equality with both agent files, parseRules round-trip for hard/soft rule IDs. - cedar-policy (updated): ``base_permit`` is skipped from ParsedRule[] rather than rejected. Stack wiring (task-api.ts routes, agent.ts layer attachment, CreateTaskFn extension, orchestrator + reconciler + fanout) lands in the next commit.

* docs(cedar-hitl): restore and revise HITL gates design, fold adversarial findings Design doc was accidentally removed in 0742ebe; restored from b34d7cd and substantially revised under a new filename. "Phase 3" framing dropped — this is the Cedar HITL approval gates feature. - Renamed PHASE3_CEDAR_HITL.md → CEDAR_HITL_GATES.md; all "phase" gating removed (Phase 3a/3b → v1 / future work §17). - Integrated 16 findings from 2026-05-06 adversarial review with realistic scenarios. Major structural changes: - Decision #23 (new): cross-engine parity contract between cedarpy (agent, Python) and @cedar-policy/cedar-wasm@4.10.0 (Lambda, TS). - §11.2: SlackUserMappingTable with OAuth user-initiated mapping; severity- gated Slack approvals; admin has no write path. - §7.1/§12.3: ApproveTaskFn uses cross-table TransactWriteItems for atomicity. - §10.1: user_id-status-index GSI on TaskApprovalsTable; v1 not v-later. - §15.6: cedar-wasm as a Lambda layer shared across policy Lambdas. - Gate-cap revision (2026-05-07): decision #13 — default 50, blueprint- configurable via security.approvalGateCap (bounded 1–500), persisted on TaskTable. Cache memory bound decoupled: 50-entry LRU regardless of cap. IMPL-22 adds telemetry-driven re-evaluation criteria. - Timeout adversarial+advocate pass (2026-05-07): - §6.5 VM-throttle race fix: re-read row on failed TIMED_OUT ConditionCheckFailed; honor APPROVED if user beat the timer. IMPL-24. - Sub-120s @approval_timeout_s emits blueprint-load WARN. IMPL-25. - User-visible timeout cap milestones (approval_timeout_capped_at_submit, approval_ceiling_shrinking). IMPL-26. - Runtime JWT: no refresh logic in agent/src/ (container uses IAM role); ceiling stays min(1h, maxLifetime_remaining - 120s). IMPL-27. - Three new CloudWatch metrics for timeout tuning. IMPL-28. - §14.8 new: off-hours trade-off section (fail-closed is the invariant). - §13.13 new: notification-delivery failure does NOT pause the timer (bypass-prevention). - Added six mermaid diagrams: three-outcome decision flow, end-to-end round- trip, TaskApprovalsTable state machine, Slack user-mapping, fail-closed decision flow, cross-engine parity check. - Cross-references updated in INTERACTIVE_AGENTS.md and SECURITY.md. - Starlight mirror regenerated via docs/scripts/sync-starlight.mjs. No code changes in this commit — design work only. Implementation lands in a follow-up PR per §15.2 task list. * feat(cedar-hitl): pin Cedar engines and seed cross-engine parity contract Chunk 1 of the Cedar HITL gates PR (docs/design/CEDAR_HITL_GATES.md). Lays the foundation before engine rewrites in Chunk 2+: both Cedar engines pinned exactly per decision #23, annotation surface validated by Day-1 spikes per decision #22, and the golden-file parity fixtures seeded so every subsequent chunk can rely on the contract. - Pin cedarpy==4.8.0 (agent) and @cedar-policy/cedar-wasm@4.10.0 (cdk) exactly (no ^/~); document both in mise.toml header. - Add agent/tests/test_cedarpy_annotations_contract.py (10 tests) validating all 5 annotations round-trip verbatim via policies_to_json_str() under staticPolicies.<id>.annotations. - Add cdk/test/handlers/shared/cedar-policy.test.ts (12 tests) validating policySetTextToParts + policyToJson extract the same annotations verbatim and isAuthorized returns the documented {type, response} wrapper shape. - Add contracts/cedar-parity/ with 5 golden-file fixtures (single-match, multi-match, hard-deny, soft-deny write, no-match default-allow) + README documenting the contract. Every fixture policy carries a @rule_id - including the base permit as @rule_id("base_permit") - so the parity tests raise if either engine returns an unannotated match instead of silently dropping it. - Add agent/tests/test_cedar_parity.py (6 tests, cedarpy side) and cdk/test/handlers/shared/cedar-parity.test.ts (6 tests, cedar-wasm side) loading the shared fixtures and asserting (decision, sorted rule_ids) match expected. Both tests hard-import cedarpy/cedar-wasm so a dependency regression fails loud rather than silently skipping. - Update docs/design/CEDAR_HITL_GATES.md sections 15.2 row 3, 15.6 prose and the parity mermaid diagram to point at contracts/cedar-parity/ (the precedent set by contracts/memory-hash-vectors.json) instead of a new tests/fixtures/ dir. Regenerate the Starlight mirror. - Add IMPL-29 noting the cedarpy diagnostics.reasons / cedar-wasm diagnostics.reason naming asymmetry surfaced by the spikes; engine code normalizes at the boundary. - Fix rev-4 -> rev-5 cosmetic footer drift. Test counts: agent 500 -> 516 (+16), cdk 1036 -> 1054 (+18), cli 190 unchanged. No production code changes in this chunk; engine rewrite lands in Chunk 2. Follow-up: separate chore issue to move contracts/memory-hash-vectors.json into a self-named subdir for consistency with contracts/cedar-parity/. * feat(cedar-hitl): three-outcome PolicyEngine core Chunk 2 of the Cedar HITL gates PR. Rewrites agent/src/policy.py into the three-outcome engine specified in docs/design/CEDAR_HITL_GATES.md section 6. The REQUIRE_APPROVAL outcome is the human-in-the-loop surface the next chunks (PreToolUse hook extension, REST API, CLI) plug into. This chunk ships the engine and its load-time validation; no hook or wire-format changes yet. Engine: - Outcome enum (ALLOW, DENY, REQUIRE_APPROVAL) + extended PolicyDecision with .allowed backward-compat shim for Phase 1a/1b/2 callers. Custom __init__ accepts both outcome= and legacy allowed= kwargs so existing tests keep working verbatim. - Three-outcome pipeline per section 6.2: hard-deny eval (absolute) -> allowlist fast-path (tool_type/tool_group/bash_pattern/write_path/ all_session) -> recent-decision cache (60s TTL on DENIED/TIMED_OUT) -> soft-deny eval (with post-eval rule-scope allowlist check and blueprint_disable filtering) -> default ALLOW. - ApprovalAllowlist (section 6.4): parses and matches every scope type. Strips whitespace and rejects empty-after-strip values so "tool_type: Read " normalizes instead of silently mismatching (review finding 6). - RecentDecisionCache (section 12.9): 50-entry LRU, INDEPENDENT of approvalGateCap. Populated only on DENIED/TIMED_OUT. Session-scoped (documented section 12.8 caveat). - Annotation handling (sections 5.2 + 6.3): parses @rule_id, @tier, @approval_timeout_s, @severity, @category via cedarpy.policies_to_json_str(); merges on multi-match with min timeout (clamped by 30s floor) and max severity. - Load-time validation (sections 5.1, 12.4): rejects missing/mismatched @tier, missing @rule_id, sub-floor timeouts, duplicate rule_ids across tiers, blueprint text > 64 KB, disable entries naming built-in hard-deny rules (finding 9), approval_gate_cap outside [1, 500] (decision 13). Sub-120s @approval_timeout_s emits WARN but accepts (IMPL-25). - Fail-closed posture (section 13): cedarpy parse errors surface via diagnostics.errors -> RuntimeError raised inside _eval_tier -> outer handler returns DENY with reason "fail-closed: <ExceptionType>". TypeError on json.dumps of unhashable tool_input surfaces as distinct "fail-closed: unhashable_tool_input" reason (review finding 5). Built-in policies: - agent/policies/hard_deny.cedar: base_permit catch-all + rm_slash + write_git_internals + write_git_internals_nested + drop_table + pr_review-specific Write/Edit forbids (absolute). - agent/policies/soft_deny.cedar: base_permit (catch-all required in each tier so cedarpy default-deny does not convert no-match into DENY) + force_push_any + force_push_main + push_to_protected_branch + write_env_files + write_credentials. All soft rules carry @tier, @rule_id, @approval_timeout_s, @severity, @category per section 15.4 starter set. Review findings addressed (1 blocker, 8 significant, plus minor): - blueprint_disable actually disables soft rules at eval time instead of silently no-op (the blocker: test coverage had been a silent-pass). - Legacy extra_policies with @tier/@rule_id rejected to avoid undefined double-annotation behavior. - _matching_rule_ids logs WARN on unknown policy IDs (state-drift signal). - base_permit validator exemption restricted to effect=="permit" so misnamed forbid rules cannot bypass validation (finding 7). - Hard-tier Cedar no_decision logged at WARN (signals missing/malformed base_permit catch-all). - Allowlist whitespace normalization + empty-value rejection. - StrEnum upgrade, Callable moved to TYPE_CHECKING, assert replaced with explicit RuntimeError for S101 compliance. Phase 1 compatibility: - All 39 existing test_policy.py tests pass unchanged via the .allowed property. One test (test_invalid_policy_syntax_fails_closed) updated to patch _hard_policies instead of the removed _policies attribute; docstring explains the rewrite. - extra_policies kwarg preserved; callers with annotated rules must migrate to blueprint_soft_policies / blueprint_hard_policies. Test counts: agent 516 -> 576 (+60: 51 three-outcome + 9 regression fixes). cli 190 unchanged. cdk 1054 unchanged. Carry-forward to Chunk 3: - extra_policies semantic shift (Phase 1 DENY -> Chunk 2 REQUIRE_APPROVAL); .allowed=False preserved but .outcome differs. Switchover happens when hooks.py adopts the three-outcome branching. - Cross-tier action-context asymmetry (review finding 8): document rule-authoring constraint in section 5.5 of design. - Probe entity-shape coverage (finding 10): extend _probe_cedar to exercise Write/Edit/Bash action paths, not just invoke_tool. * feat(cedar-hitl): approval milestone writers + engine counters Adds the 14 agent-side approval milestone writers (§11.1) on ``_ProgressWriter`` so Chunk 3's hook integration has a typed API instead of stringly-typed ``write_agent_milestone`` calls, and the per-task gate counter / per-container sliding-window rate limit / denial-injection queue on ``PolicyEngine`` that §6.5 requires. Why now: the hook work lands cleanly only after these surfaces exist — every code path in ``pre_tool_use_hook``'s REQUIRE_APPROVAL branch calls one of these helpers. Shipping them separately lets the hook commit be about the state machine, not the event-shape bookkeeping. Engine additions: - ``approval_gate_count`` / ``increment_approval_gate_count``: the per-task counter §12.9 bounds at ``approvalGateCap``. Session-scoped in v1; persistence tracked in §17. - ``approvals_in_last_minute`` / ``record_approval_gate_timestamp``: sliding-window rate limit (20/min/container, §12.9). Prune on read so callers see the current count without a separate tick. - ``queue_denial_injection`` / ``drain_denial_injections``: queue consumed by ``_denial_between_turns_hook`` at the next Stop seam (§6.5). Reason is pre-sanitized upstream by ``DenyTaskFn``. - ``mark_ceiling_shrinking_emitted``: emit-once latch for IMPL-26. - ``APPROVAL_RATE_LIMIT`` / ``APPROVAL_RATE_WINDOW_S`` module consts the hook imports rather than re-deriving. Milestone writers (§11.1 table, 14 agent-emitted of 15): - ``pre_approvals_loaded``, ``approval_requested``, ``approval_granted``, ``approval_denied``, ``approval_timed_out``, ``approval_stranded``, ``approval_write_failed``, ``approval_resume_failed``, ``approval_poll_degraded``, ``approval_timeout_capped``, ``approval_ceiling_shrinking``, ``approval_cap_exceeded``, ``approval_rate_limit_exceeded``, ``approval_late_win``. - ``approval_decision_recorded`` (Lambda audit) and ``approval_timeout_capped_at_submit`` (CreateTaskFn) stay on the Lambda side — Chunk 5 owns those. Each helper is a thin wrapper over ``_put_event("agent_milestone", ...)`` so the shared circuit-breaker + classifier path (finding #6/#8) continues to apply. Metadata keys mirror the §11.1 shapes verbatim (``maxLifetime_remaining_s`` preserves the design-doc spelling for downstream parsers). Tests: +29 total. 17 on ``TestApprovalMilestoneHelpers`` pin the DDB payload shape for each helper (including the two ``approval_timeout_capped`` reason variants — rule_annotation carries matching_rule_ids; maxLifetime_ceiling omits the field). 12 on the engine: counter monotonicity, rate-window prune semantics at window boundary, denial-queue FIFO + drain-clears, ceiling-shrinking latch idempotency. No caller changes — engine and writer surfaces are additive. Hook integration lands in commit C. * feat(cedar-hitl): TaskApprovals + AWAITING_APPROVAL transition primitives Adds the four agent-side DDB primitives §6.5 + IMPL-24 need for the three-outcome hook integration in the next commit: - ``transact_write_approval_request`` — cross-table TransactWriteItems: Put(TaskApprovalsTable) with ``attribute_not_exists(request_id)`` + Update(TaskTable) gated on ``status = RUNNING``. Atomic per §12.3 so a concurrent cancel cannot land the task in AWAITING_APPROVAL with no matching approval row (or vice versa). - ``transact_resume_from_approval`` — Update(TaskTable) gated on ``status = AWAITING_APPROVAL AND awaiting_approval_request_id = :rid``. The ``request_id`` condition prevents resuming with a stale ID after a reconciler race (§13.9). - ``best_effort_update_approval_status`` — conditional UpdateItem on the approval row with ``status = :pending`` guard. Returns False on ``ConditionalCheckFailedException``; this is the signal IMPL-24's re-read path fires on (§6.5 pseudocode lines 846-879, §13.12). - ``get_approval_row`` — GetItem with ``ConsistentRead=True`` by default. Required by IMPL-24's re-read; kept opt-out (bool flag) for future cold-path callers that don't need the strong read. Errors: - ``ApprovalTablesUnavailable`` for env-var-missing — raised loud so a pre-Chunk-4 deploy fails closed (hook will map to DENY) rather than silently no-op'ing the gate. - ``ApprovalWriteError`` / ``ApprovalResumeError`` wrap ``TransactionCanceledException`` with the cancellation reasons list. The hook uses these to distinguish the "concurrent cancel" branch from real DDB outages. - ``ConditionalCheckFailedException`` on ``update_item`` is consumed and returned as ``False`` from ``best_effort_update_approval_status`` — the caller (hook) needs the boolean to decide whether to re-read, not to propagate. - All other DDB errors propagate so the hook's outer try/except can classify fail-closed with a specific reason. Implementation notes: - Uses ``boto3.client("dynamodb")`` low-level API (not resource). ``transact_write_items`` lives on the client, and marshalling the approval row attributes explicitly gives deterministic DDB shapes that the tests can assert on. ``_py_to_ddb_attr`` covers the subset of Python types §10.1 actually uses (str/int/bool/None/list of str); any other type raises TypeError loudly rather than silently writing something unexpected. - ``_extract_error_code`` / ``_extract_cancellation_reasons`` duck-type on ``exc.response`` so we don't need botocore at import time (tests use a minimal exception class). - Errors from unsupported types (floats, dicts, etc.) are caught BEFORE the DDB round-trip so the unit-test asserts ``transact_write_items`` was not called — catches schema drift early. - Status constants (``_STATUS_RUNNING`` / ``_STATUS_AWAITING_APPROVAL``) named so a rename in CDK cannot silently diverge the Python path. Tests: +20 total. - 5 on TransactWriteApprovalRequest: env-missing, happy-path shape assertion (both items + conditions), TransactionCanceled → ApprovalWriteError with reasons preserved, other errors propagate, unsupported type rejected before any DDB call. - 3 on TransactResumeFromApproval: env-missing, happy-path expression shape (includes REMOVE awaiting_approval_request_id), cancel → ApprovalResumeError. - 4 on BestEffortUpdateApprovalStatus: happy path returns True, ``reason`` kwarg attaches ``deny_reason``, ConditionalCheckFailed returns False (IMPL-24's signal), other errors propagate. - 4 on GetApprovalRow: ConsistentRead default True, opt-out False, row-not-found returns None, row unmarshalling through every supported DDB attribute type. - 4 on helpers: error-code extraction with and without ClientError-shape, cancellation-reasons extraction with and without. No runtime callers yet — hook integration lands in commit C. Physical TaskApprovalsTable lands in Chunk 4; Python side is wire-compatible so the hook work can be unit-tested today with mocked clients. * feat(cedar-hitl): PreToolUse three-outcome REQUIRE_APPROVAL path Wires the agent to the full §6.5 pseudocode: cap + rate-limit check, atomic TransactWriteItems for pending row + TaskTable AWAITING_APPROVAL, 2s→5s ConsistentRead poll, IMPL-24 VM-throttle race re-read, resume transition, scope propagation to allowlist, and denial-injection queue consumed at the next Stop seam. Completes §15.2 rows 26 + 27. Hook control flow (three outcomes) ---------------------------------- - ALLOW / DENY: existing Phase 1 behavior, now switching on ``.outcome`` rather than ``.allowed``. Legacy Phase 1/2 tests still green because PolicyDecision preserves the ``.allowed`` shim. - REQUIRE_APPROVAL (new): extracted into ``_handle_require_approval`` for readability. Delegates to ``task_state`` primitives and ``engine.*`` counter surfaces from the prior commits; no new DDB client construction here. Key pieces: - ``_compute_effective_timeout`` applies the §6.5 min(rule, default, lifetime) formula. The engine's ``_merge_annotations`` has already clipped decision.timeout_s against the task default; the hook adds the remaining-lifetime ceiling and floors at FLOOR_30S. ``clip_reason`` distinguishes ``rule_annotation`` (rule was tighter than task default) from ``maxLifetime_ceiling`` (task is late in its life) so ``approval_timeout_capped`` carries the right reason. - ``_remaining_maxlifetime_s`` reads ``AGENTCORE_MAX_LIFETIME_S`` + ``TASK_STARTED_AT`` env vars (8h default). Returns ``None`` when the start timestamp is absent — the hook treats that as "unknown, don't clip" rather than pre-DENYing, so Phase 1 test paths that don't set the env var still see the old task-default behaviour. Chunk 4/5 will wire these at task launch. - ``_poll_for_decision`` uses 2s cadence for the first 30s then 5s (IMPL-12). All polls use ``ConsistentRead=True`` per IMPL-24. 3 consecutive GetItem failures emit ``approval_poll_degraded``; 10 consecutive failures fall through as TIMED_OUT with a specific reason (§13.2). - ``_reconcile_late_decision`` implements IMPL-24 re-read: on a ConditionCheckFailed from the TIMED_OUT write, re-read with ConsistentRead. APPROVED → rebuild outcome, propagate scope to allowlist, run normal allow flow, emit ``approval_late_win``. DENIED → honor the user's sanitized reason. PENDING or row gone → fall through with TIMED_OUT (fail-closed, §13.12 last paragraph). Cancel-wins semantics (finding #2) ---------------------------------- ``_denial_between_turns_hook`` is registered AFTER ``_nudge_between_turns_hook`` in ``between_turns_hooks`` so cancel short-circuits both. The hook re-checks ``_cancel_requested`` itself as belt-and-braces (matching the nudge hook) so a future reorder does not silently break cancel-wins. Denial queue is PRESERVED on cancel — not drained — so a denial still sitting on the queue when the task is being torn down does not leak across tasks (the engine is per-task per §IMPL-7). ``stop_hook`` threads ``engine`` into ``ctx`` so the denial hook can ``drain_denial_injections``. ``build_hook_matchers`` accepts a new ``user_id`` kwarg (§12.2) so approval rows carry caller identity for the REST side's ownership check. ``permissionDecisionReason`` guaranteed surface ----------------------------------------------- The hook's deny return is the ONLY guaranteed surface the SDK emits to the agent; denial injection is best-effort (pre-empted by cancel). ``_deny_response`` pipes every reason through ``_strip_ansi`` + ``_truncate(500)``: ANSI sequences can never reach the model, and the line stays loggable. §12.7 requirement. Tests: +24 agent hook tests (47 total in test_hooks.py). Run in 0.92s via a ``_fast_poll`` fixture that collapses ``asyncio.sleep`` to a no-op AND advances ``hooks.time.monotonic`` by the requested duration so the poll wall-clock deadline actually trips. Happy paths: - APPROVED + scope propagation to allowlist + milestones. - APPROVED with scope=this_call does NOT grow allowlist. - DENIED queues denial injection + populates recent-decision cache (next identical call auto-denies). - TIMED_OUT writes TIMED_OUT row and emits approval_timed_out. IMPL-24 race: four branches. - APPROVED re-read → allow flow, approval_late_win milestone, scope propagated, resume succeeds. - DENIED re-read → deny flow, approval_late_win milestone, user's reason is the permissionDecisionReason. - Still-PENDING re-read → fail-closed fall-through (no late_win). - Row-gone re-read → same fail-closed fall-through. Cap / rate-limit / write failure / resume failure branches all: - Short-circuit before any DDB write when the local guard fires (cap, rate limit). - Emit the right approval_* milestone. - Return DENY with a specific permissionDecisionReason. Sanitization: - ANSI stripped from deny reason. - Deny reason truncated to ≤500 chars. Timeout clipping: - rule_annotation reason when a rule's approval_timeout_s is below the task default; matching_rule_ids populated. - maxLifetime_ceiling reason when remaining lifetime is the tightest bound; matching_rule_ids is None. - approval_ceiling_shrinking emits exactly once per task (IMPL-26 latch). Denial injection hook (6 tests): - Draining produces a <user_denial request_id=... decided_at=...> block with XML-escaped reason. - Cancel short-circuit preserves the queue so the denial is not lost; just not injected into a dying agent. - Hostile reason (</user_denial>...<user_nudge>) is XML-escaped so the envelope cannot be forged. - No-engine ctx returns [] (Phase 1 call sites still work). - Registered LAST in ``between_turns_hooks`` (invariant for §6.5 finding #2). - End-to-end via stop_hook: queued denial becomes ``decision=block`` + reason on the Stop return. Carry-forward ------------- - ``_remaining_maxlifetime_s`` returns None when TASK_STARTED_AT is unset — Chunk 4/5 will wire this at task launch. Tracked in §16. - ``approval_gate_count`` lives on the engine (session-scoped) not on TaskTable in v1. §13.6 notes that the reconciler + approval_gate_cap still bound worst-case across container restarts. Chunk 7+ tracks persistence when telemetry justifies it. - Denial injection emits a ``user_denial_injected`` milestone that is NOT in the §11.1 enumerated table. It mirrors ``nudge_acknowledged`` for stream visibility; keep the name distinct from the ``approval_*`` prefix so future §11.1 consumers can't confuse it with an approval outcome. * feat(cedar-hitl): TaskApprovalsTable + SlackUserMapping + status enum Lands the stateless CDK primitives for Cedar-HITL approval gates so Chunk 5's REST handlers can be wired onto concrete tables. Completes §15.2 tasks 9, 20, and 25. Constructs ---------- ``TaskApprovalsTable`` (§10.1) - PK ``task_id`` + SK ``request_id`` (ULID). Matches the agent-side primitives landed in the prior commit. - GSI ``user_id-status-index`` with user_id PK + status SK and an ``INCLUDE`` projection limited to the fields GET /v1/pending renders. Three deny-sensitive attrs (``deny_reason``, ``scope``, ``tool_input_sha256``) deliberately omitted from the projection — the list endpoint only returns PENDING rows in practice, but excluding them kills the projection-leak concern outright and costs no bytes today. - Exports ``USER_STATUS_INDEX_NAME`` as a module constant + mirrors it on ``construct.userStatusIndexName`` so handlers referencing the GSI fail compile-time on a rename. - TTL attribute ``ttl`` (agent writes ``created_at + timeout_s + 120s``). - No DynamoDB streams per §11.2. TaskEventsTable carries the audit fan-out; streams here would duplicate. - Default RemovalPolicy.DESTROY to match the rest of the sample. Production deploys override to RETAIN per §10.1. ``SlackUserMappingTable`` (§11.2, finding #4) - Single-key (``slack_user_id`` PK). No SK, no TTL, no GSI, no stream. The forward-only shape is the trust boundary — a reverse GSI (Cognito → Slack) would let a compromised Cognito sub enumerate Slack identities without adding v1 capability. - Writes land through LinkSlackUserFn (Chunk 5) which enforces the ``attribute_not_exists(slack_user_id)`` condition so a prior legitimate mapping cannot be overwritten by a later compromise. ``task-status.ts`` — AWAITING_APPROVAL (§10.3) - Added to TaskStatus enum + ACTIVE_STATUSES (NOT TERMINAL_STATUSES: the task is alive, paused on a human decision). - VALID_TRANSITIONS wires the five edges §10.3 enumerates: RUNNING → AWAITING_APPROVAL (soft-deny entry) HYDRATING → AWAITING_APPROVAL (rare early-gate case) AWAITING_APPROVAL → RUNNING (approve / deny resume) AWAITING_APPROVAL → CANCELLED (user cancel mid-approval) AWAITING_APPROVAL → FAILED (stranded-approval reconciler) - Notably NOT added: AWAITING_APPROVAL → FINALIZING (approve-during-cleanup race) AWAITING_APPROVAL → COMPLETED (skip RUNNING) AWAITING_APPROVAL → TIMED_OUT (timer lives on the approval row, not the task clock) These are regression tests so a future refactor cannot quietly add them and bypass the `awaiting_approval_request_id = :rid` invariant. Tests: +29 total. - TaskApprovalsTable (11 tests): PK/SK schema, PAY_PER_REQUEST, PITR default + override, TTL attribute, NO streams, GSI schema + projection + sensitive-attr exclusion, removal policy default + override, ``USER_STATUS_INDEX_NAME`` constant parity with the construct field. - SlackUserMappingTable (8 tests): single-key schema (explicit KeySchema length assertion), PAY_PER_REQUEST, PITR, no streams, no reverse GSI, DESTROY default, TTL absent. - TaskStatus (+10 tests over existing: 5 new assertions on the 9-state cardinality, AWAITING_APPROVAL membership, and the transition graph including the three forbidden edges). The existing assertions updated for the new state count. No stack wiring yet — ``agent.ts`` instantiation + env var plumbing + grants land in the next commit alongside the Cedar-WASM Lambda layer. * feat(cedar-hitl): Cedar-wasm layer + wire approval tables into agent stack Activates the agent-side approval path and ships the Lambda layer Chunk 5's REST handlers need. Cedar-wasm Lambda layer (§15.2 task 10) ---------------------------------------- ``CedarWasmLayer`` bundles ``@cedar-policy/cedar-wasm@4.10.0`` into ``/opt/nodejs/node_modules/`` so Lambdas can ``require('@cedar-policy/cedar-wasm/nodejs')`` without shipping the 4 MB wasm binary in every function package. A dedicated ``cdk/layers/cedar-wasm/`` directory carries a minimal ``package.json`` pinning the exact version — bundling runs ``npm install --omit=dev`` against that manifest, so the layer build is hermetic from any ``cdk/node_modules/`` drift. The bundler has two fallbacks: - Docker (``public.ecr.aws/sam/build-nodejs22.x``) for CI / prod deploys. - Local-npm fallback for environments without Docker (unit-test synths + `cdk synth` on runners that lack Docker). The local path is safe here because the layer ships pure JS + a prebuilt wasm binary — no native build step. Three constants exposed from the module: - ``CEDAR_WASM_VERSION`` — single source of truth for the pinned version; tests assert this matches both ``cdk/package.json`` and the layer manifest, so the three places the version lives stay in sync. - ``CEDAR_WASM_MIN_LAMBDA_MEMORY_MB`` — 512 MB floor for attaching Lambdas per §15.2 task 10. - ``CedarWasmLayer.layer`` — the underlying ``LayerVersion`` for Chunk 5 handlers to attach via ``fn.addLayers(...)``. Agent stack wiring (§15.2 task 19) ------------------------------------ ``agent.ts`` now instantiates: - ``TaskApprovalsTable`` (prior commit) — grants RW to the runtime so ``pre_tool_use_hook`` can TransactWriteItems + ConsistentRead the PENDING row. - ``SlackUserMappingTable`` (prior commit) — not granted to the runtime; only the link-user Lambda (Chunk 5) writes here. - ``CedarWasmLayer`` — the layer's asset lands in the synthed template so Chunk 5 handlers can reference ``.layer`` without causing a new asset on their deploy. New runtime env vars: - ``TASK_APPROVALS_TABLE_NAME`` — consumed by ``task_state._require_tables``; its absence previously raised ``ApprovalTablesUnavailable`` → hook DENY. Now set, so the approval path is live on deploy. - ``AGENTCORE_MAX_LIFETIME_S = '28800'`` — 8 hours, matching ``lifecycleConfiguration.maxLifetime``. Consumed by the hook's ``_remaining_maxlifetime_s`` for the maxLifetime ceiling clip (§6.5). Kept in sync with the lifecycle via a direct test assertion so drift surfaces at build time. New CfnOutputs: ``TaskApprovalsTableName``, ``SlackUserMappingTableName``, ``CedarWasmLayerArn``. Each is useful for post-deploy smoke tests (`aws dynamodb describe-table` / `aws lambda get-layer-version`). Tests: +8 layer tests + 9 agent-stack assertions. Layer: - LayerVersion resource count. - Compatible runtimes (nodejs20/22). - Description carries the pinned version. - CEDAR_WASM_VERSION matches ``cdk/package.json``. - CEDAR_WASM_VERSION matches ``layers/cedar-wasm/package.json``. - CEDAR_WASM_MIN_LAMBDA_MEMORY_MB ≥ 512. - Custom description override works. - ``.layer`` exposes a real ``LayerVersion``. Agent stack: - Table count updated from 6 → 8. - TaskApprovalsTable schema match (task_id PK / request_id SK, user_id-status-index GSI presence). - SlackUserMappingTable single-key schema. - LayerVersion count + compatibleRuntimes. - Three new CfnOutputs present. - TASK_APPROVALS_TABLE_NAME env var on the runtime. - AGENTCORE_MAX_LIFETIME_S == '28800' (drift guard). Carry-forward ------------- - ``TASK_STARTED_AT`` is the other input the hook's ``_remaining_maxlifetime_s`` consumes — it's a PER-TASK value the orchestrator must stamp at invocation time, not a stack-level env var. Chunk 5's orchestrator changes need to add it to the runtime invocation payload / session env. For now the hook's fallback ("unknown, don't clip") keeps approvals functional. - Chunk 5 will attach the CedarWasmLayer onto ApproveTaskFn, DenyTaskFn, GetPoliciesFn, CreateTaskFn and assert ``memorySize >= CEDAR_WASM_MIN_LAMBDA_MEMORY_MB`` for each. * feat(cedar-hitl): approve + deny handlers + shared types (§7.1, §7.2) Lands the two user-facing REST handlers that flip a PENDING approval row to APPROVED / DENIED, the shared types both call sites and the CLI consume, and the Lambda-side Cedar parser future Chunk-5 handlers (get-policies, create-task validation) will use. Wire types (shared/types.ts) ---------------------------- - ApprovalScope union covering every shape the agent's ApprovalAllowlist understands. Typed so approve-task / create-task / CLI (Chunk 6) all agree at compile time. - ApprovalRecord / ApprovalRequest / ApprovalResponse / DenyRequest / DenyResponse / PendingApprovalSummary / GetPendingResponse / PolicyRuleSummary / GetPoliciesResponse / LinkSlackUserRequest / LinkSlackUserResponse / SlackUserMappingRecord / ApprovalDecisionRecordedEvent / CreateTaskApprovalExtensions. - Constants: DENY_REASON_MAX_LENGTH=2000, INITIAL_APPROVALS_MAX_ENTRIES=20, INITIAL_APPROVALS_MAX_ENTRY_LENGTH=128, APPROVAL_TIMEOUT_S_MIN=30, APPROVAL_TIMEOUT_S_MAX=3600, APPROVAL_TIMEOUT_S_DEFAULT=300. New error codes: REQUEST_NOT_FOUND, REQUEST_ALREADY_DECIDED, TASK_NOT_AWAITING_APPROVAL. Shared helpers -------------- - shared/approval-scope.ts — parseApprovalScope validates every shape; rejects unknown tool types / groups / prefixes, empty values, over-128-char strings. isDegeneratePattern implements §7.4 (length ≤ 2, all-wildcard, wildcard ratio > 50%) for Chunk-5 create-task. - shared/deny-reason-scanner.ts — scanDenyReason redacts AWS keys, GitHub PATs (classic + fine-grained), Slack tokens, PEM blocks, bearer tokens with [REDACTED-...] markers. Mirrors agent/src/output_scanner.py so the deny reason the agent ultimately reads is never raw user input. - shared/cedar-policy.ts — parseRules pulls the five HITL annotations (tier/rule_id/severity/approval_timeout_s/category) into a ParsedRule[], preserving positional policy_id for IMPL-29 diagnostics-to-rule_id resolution. isHardDenyRule, isValidRuleId, matchingRuleIds, concatPolicies exposed for future handlers. Handlers -------- - approve-task.ts (§7.1) — POST /v1/tasks/{task_id}/approve - Cross-table TransactWriteItems: approval row PENDING → APPROVED guarded by user_id = :caller AND status = :pending; TaskTable no-op Update guarded by status = AWAITING_APPROVAL AND awaiting_approval_request_id = :rid. - TransactionCanceledException classified by per-item CancellationReasons. Approval-row failure collapses to 404 REQUEST_NOT_FOUND (no existence oracle per §7.1 finding #6); task-row failure → 409 TASK_NOT_AWAITING_APPROVAL. - Optional scope defaults to this_call. - Per-user per-minute rate limit (30/min, synthetic row). - Writes approval_decision_recorded audit event (IMPL-6). Audit failure is logged but does not fail the request — decision is already committed. - deny-task.ts (§7.2) — POST /v1/tasks/{task_id}/deny - Same cross-table pattern; status → DENIED + deny_reason. - Reason is scanDenyReason-sanitized + truncated to DENY_REASON_MAX_LENGTH BEFORE any persistence — agent and audit both read sanitized form; raw input never stored. - Same rate-limit namespace as approve. Tests: +64 total (cedar-policy-parser 24, approval-scope 28, deny-reason-scanner 13, approve-task 14, deny-task 9). Secret test fixtures are assembled from string fragments so the source never holds a contiguous secret literal — Code Defender pre-commit hook otherwise blocks. Stack wiring (task-api.ts routes, agent.ts layer attachment, CreateTaskFn extension, orchestrator + reconciler + fanout + LinkSlackUserFn + GetPolicies + GetPending) lands in the next commit. * feat(cedar-hitl): get-pending + get-policies + link-slack-user handlers Lands the three read/discovery handlers Chunk 6 (CLI) needs to power ``bgagent pending``, ``bgagent policies list/show``, and ``bgagent notifications configure slack``. Completes §15.2 tasks 14, 15, and 25 (handler side). Handlers -------- ``get-pending.ts`` (§7.7 — GET /v1/pending) - Queries ``user_id-status-index`` GSI on TaskApprovalsTable with ``user_id = :caller AND status = :pending``. Without the GSI this would be a full-table Scan per call — under ``watch -n1 bgagent pending`` that exhausts burst capacity for the whole fleet (§10.1 finding #8). - Response maps each row to ``PendingApprovalSummary`` with a derived ``expires_at = created_at + timeout_s`` so the CLI can render time-to-timeout without doing arithmetic on ISO strings. - Severity coerced to ``medium`` on unknown values so GSI writes that drift from the enum don't break the list response. - Rate-limited 10/min/user (synthetic row on the same table, namespaced ``RATE#<user>#PENDING`` so it does not collide with the approve/deny counter). ``get-policies.ts`` (§7.6 — GET /v1/repos/{repo_id}/policies) - Combines ``BUILTIN_HARD_DENY_POLICIES`` + ``BUILTIN_SOFT_DENY_POLICIES`` with the repo's ``cedar_policies`` blueprint override. Runs the combined text through ``parseRules`` and returns ``{hard[], soft[]}`` rule summaries. - 5-minute per-repo in-Lambda cache; cold starts throw it away. ``_resetCacheForTests`` exposed for unit-test isolation. - Repo ID is URL-decoded from the path (``owner%2Frepo`` common in CLI UX). - Rate-limited 30/min/user. - Blueprint load failure falls back to built-ins with a WARN log; invalid blueprint cedar text returns 503 ``SERVICE_UNAVAILABLE`` rather than a misleading empty list. ``link-slack-user.ts`` (§11.2 finding #4 — POST /v1/notifications/slack/link) - Writes to SlackUserMappingTable with ``ConditionExpression: attribute_not_exists(slack_user_id)``. This guard is the entire admission control the §11.2 design hinges on: even a compromised Slack admin cannot overwrite an existing mapping. - Validates ``slack_user_id`` shape (letters, digits, underscores, 2–40 chars) so junk rows cannot land. - Conflict surface is 409 ``REQUEST_ALREADY_DECIDED`` — reused error code (the payload message directs the user to unlink via support). - Slack link_token end-to-end validation against Slack OAuth is deferred — v1 accepts the token on trust from the Cognito-authed caller; it is persisted in CloudWatch for audit. Supporting primitives --------------------- ``shared/builtin-policies.ts`` — mirrors ``agent/policies/hard_deny.cedar`` and ``agent/policies/soft_deny.cedar`` as TypeScript string constants. Embedded rather than read from disk because Lambda's esbuild bundler does not copy non-TS assets by default and a dedicated bundling hook is more code than the embed. A drift test (``builtin-policies.test.ts``) asserts byte-equality with the agent files so any change on one side without the other flips red at build time. ``shared/cedar-policy.ts`` — ``parseRules`` now skips the unannotated ``base_permit`` entry (both tiers need it as a Cedar catch-all; it is not a user-facing rule so it stays out of ParsedRule[]). This matches the agent-side ``_parse_policy_annotations`` behaviour. Tests: +37 total. - get-pending (8): 401 on missing auth, 429 on rate limit, empty result, GSI query shape, row → PendingApprovalSummary with derived expires_at, severity fallback, missing timeout → expires_at falls back to created_at, 500 on DDB error. - get-policies (11): 401/400 validation, built-in rules listed on empty repo, URL-decoded repo path, custom blueprint rule lands in soft, per-repo cache across calls, 429 rate limit, 503 on invalid blueprint cedar, fallback on load failure, hard rules omit severity / approval_timeout_s, soft rules carry them. - link-slack-user (8): 401/400 validation, shape check, 201 on success, 409 on overwrite attempt, 500 on unknown DDB error, whitespace trim on slack_user_id, ConditionExpression verified. - builtin-policies (4): drift byte-equality with both agent files, parseRules round-trip for hard/soft rule IDs. - cedar-policy (updated): ``base_permit`` is skipped from ParsedRule[] rather than rejected. Stack wiring (task-api.ts routes, agent.ts layer attachment, CreateTaskFn extension, orchestrator + reconciler + fanout) lands in the next commit. * feat(cedar-hitl): wire Chunk 5 routes + orchestrator + reconciler + agent plumbing Completes Chunk 5 end-to-end: the five new Lambdas are instantiated and wired onto the REST API, the orchestrator threads approval-related data through to the agent runtime, the stranded-task reconciler sweeps AWAITING_APPROVAL tasks, and the agent pipeline accepts the new per-task approval configuration. Stack wiring (agent.ts + task-api.ts) ------------------------------------- - TaskApi construct accepts `taskApprovalsTable`, `slackUserMappingTable`, `cedarWasmLayer` props. Approve/Deny/GetPending Lambdas are created when the approvals table is present; GetPolicies also requires the cedar-wasm layer + RepoTable. Slack-link Lambda attaches when the slack mapping table is provided. - New routes: POST /tasks/{task_id}/approve POST /tasks/{task_id}/deny GET /pending GET /repos/{repo_id}/policies POST /notifications/slack/link - GetPoliciesFn configures `memorySize: 512` (Cedar-wasm floor from §15.2 task 10) and externalizes `@cedar-policy/cedar-wasm` from the esbuild bundle so the layer provides the wasm binary at runtime. - CedarWasmLayer compatibleRuntimes extended to include nodejs24.x (the Lambda runtime) — the Node 20/22 list was the original §15.2 spec but the actual function uses Node 24. - agent.ts passes all three new constructs into TaskApi. Orchestrator (shared/orchestrator.ts) ------------------------------------- - `finalizeTask` now treats AWAITING_APPROVAL as a "task still alive" terminal-timeout source: on poll exhaustion the task transitions to TIMED_OUT with a distinct `approval_poll_timeout` reason + error message ("Orchestrator poll timeout exceeded while awaiting approval"). The stranded-approval reconciler is the secondary safety net (§13.6) for tasks the orchestrator already lost track of. - Invocation payload now carries three new fields: - `task_started_at` (ISO 8601 at HYDRATING → RUNNING time) — consumed by the agent hook's `_remaining_maxlifetime_s` so the §6.5 maxLifetime ceiling math uses the real task clock instead of the fail-open fallback. - `approval_timeout_s` (when the submit payload supplied it). - `initial_approvals` (when the submit payload supplied entries). Stranded-task reconciler ------------------------ - Sweeps AWAITING_APPROVAL in addition to SUBMITTED/HYDRATING. - New `APPROVAL_STRANDED_TIMEOUT_SECONDS` env var (default 7200s = 2h) — double §7.3's 1h ceiling so this reconciler never races the happy-path timer. - Distinct failure message on approval-stranded vs generic-stranded so users see "approval stranded — container evicted" rather than the misleading "no pipeline attached" copy. Fanout (handlers/fanout-task-events.ts) --------------------------------------- - Slack channel default set replaces the forward-compat `approval_required` stub with the real §11.1 events: `approval_requested` and `approval_stranded`. Other approval milestones (granted/denied/timed_out/late_win/etc.) stay out of default routing to avoid notification fatigue — the CLI surfaces those confirmations directly. - Email default replaces `approval_required` with `approval_requested` (high-severity gates only; severity gating happens in the dispatcher). Create-task validation (shared/create-task-core.ts) --------------------------------------------------- - New request fields: - `approval_timeout_s` — integer within `[APPROVAL_TIMEOUT_S_MIN, APPROVAL_TIMEOUT_S_MAX]`. - `initial_approvals` — array of scope strings; each entry must be a valid `ApprovalScope` per `parseApprovalScope`; bash_pattern and write_path scopes get the §7.4 degenerate-pattern check. - TaskRecord extended with `approval_timeout_s`, `initial_approvals`, `approval_gate_count` (seeded to 0 at admission), and `awaiting_approval_request_id` (written atomically by the agent's `transact_write_approval_request` primitive). Agent plumbing (models.py / config.py / pipeline.py / runner.py / server.py) ---------------------------------------------------------------------------- - `TaskConfig` adds `approval_timeout_s`, `initial_approvals`. - `build_config`, `run_task`, `_run_task_background`, and `_extract_invocation_params` thread the two new fields from payload → config → PolicyEngine. - `server._extract_invocation_params` stamps `os.environ["TASK_STARTED_AT"]` from the payload so the hook's `_remaining_maxlifetime_s` returns real values (carry-forward from Chunk 3 resolved). - `runner.py` constructs PolicyEngine with `initial_approvals` + `task_default_timeout_s` when supplied; the engine clamps bad values at construction time. Tests ----- All CDK tests pass: 1219 / 1219. All agent tests pass: 648 / 648. Affected suites (changes only): - test/stacks/agent.test.ts: cedar-wasm layer CompatibleRuntimes now expects `nodejs24.x`; table count still 8. - test/constructs/cedar-wasm-layer.test.ts: same runtime expansion. - test/handlers/fanout-task-events.test.ts: approval_required → approval_requested/approval_stranded in Slack default set; approval_required → approval_requested in Email default set. - test/handlers/reconcile-stranded-tasks.test.ts: primeResponses now queue a third `Items: []` for AWAITING_APPROVAL queries; queryCalls assertion bumped to 3. Carry-forward (non-blocking) ---------------------------- - GetPoliciesFn has write access to TaskApprovalsTable (for the rate-limit counter path). A future permissions audit should tighten this to a single-item write scoped to `RATE#<user>#*`. - TASK_STARTED_AT env var is only set when a payload supplies it; server.py still supports the Phase 2 no-payload startup path. * feat(cedar-hitl): Chunk 6 CLI — approve / deny / pending / policies Ships the four user-facing commands that close the Cedar HITL loop: once Chunks 1-5 have a PENDING approval row and the Slack/Email fan-out has notified the user, Chunk 6 is how they actually respond. New commands (cli/src/commands/) -------------------------------- - `bgagent approve <task-id> <request-id> [--scope <scope>] [--yes]` Default scope is `this_call`; callers extend allowlist with `tool_type:Bash`, `rule:<id>`, etc. `all_session` is the only scope that requires `--yes` to confirm — mirrors the safety UX from §8.4. Error classification maps 404 → "run `bgagent pending`", 409 → "task no longer awaiting approval", 429 → rate-limit, 401 → login. - `bgagent deny <task-id> <request-id> [--reason ... | --reason-file ...]` `--reason-file` accepts multi-line reasons that would otherwise need shell quoting. Client-side `DENY_REASON_MAX_LENGTH` cap avoids a round-trip on obviously-too-long reasons; the server still truncates. Reason is sanitized server-side (output_scanner) before ever reaching the agent. - `bgagent pending [--output text|json]` Lists every PENDING approval owned by the caller. Rendered with approve/deny hints inline so the user can copy-paste the next command. JSON output for scripting. Rate-limited server-side. - `bgagent policies list --repo <owner/repo> [--tier hard|soft]` `bgagent policies show --repo <owner/repo> --rule <rule_id>` Discovery commands so users can find rule IDs without reading CDK source. Both subcommands reuse a single `listPolicies` API call and filter locally. Wire changes ------------ - `cli/src/api-client.ts`: `approveTask`, `denyTask`, `listPending`, `listPolicies` — each matching the §7.1 / §7.2 / §7.6 / §7.7 request/response shapes. `approveTask` omits the `scope` body field when unset so the server's `this_call` default applies. - `cli/src/types.ts`: mirrors the Chunk 5 server types verbatim — `ApprovalScope` union, `ApprovalRequest/Response`, `DenyRequest/Response`, `PendingApprovalSummary`, `GetPoliciesResponse`, `PolicyRuleSummary`, plus the five constants (`DENY_REASON_MAX_LENGTH`, `INITIAL_APPROVALS_MAX_ENTRIES`, `INITIAL_APPROVALS_MAX_ENTRY_LENGTH`, `APPROVAL_TIMEOUT_S_MIN/MAX/DEFAULT`). - `cli/src/bin/bgagent.ts`: registers the four new commands in the order they appear in help output. Tests: +27 new (217 total). - approve (9): default scope, custom scope, all_session guard + `--yes` bypass, JSON output, 404/409/401/429 error classifications. - deny (6): no-reason path, `--reason`, `--reason-file` with tmpdir fixture, mutually-exclusive rejection, over-length rejection, 404 classification. - pending (5): empty render, populated render with approve/deny hints, JSON output, 401 and 429 classifications. - policies (7): list both tiers, `--tier` filter, `--output json`, bad `--tier`, show found rule, show unknown rule, 404 repo-not-onboarded classification. Carry-forward ------------- - `submit.ts` extension with `--approval-timeout` / `--pre-approve` flags is deferred to a follow-up commit — the server already accepts these fields on POST /v1/tasks (Chunk 5), and `bgagent submit` already forwards unknown payload fields through the existing request path, so users can set them via `--body-file` today until the explicit flags land. - `--output json` on error branches currently returns a CliError instead of a JSON error envelope; matches the pattern the existing commands use (status, cancel, nudge). Follow-up to standardize JSON error envelopes across the whole CLI if that becomes a common scripting pain point. * feat(cedar-hitl): Chunk 7a — persist gate counter + IMPL-23 cache observability Persist approval_gate_count to TaskTable across container restarts per §13.6 so the cumulative gate budget survives eviction. Emit pre_approvals_loaded after PolicyEngine init per §4 step 7 / §11.1 so operators see the starting approval posture in the live SSE stream. Add IMPL-23 cache-hit observability: cache hits attach metadata to PolicyDecision, hook forwards to new write_policy_decision_cached progress helper (decision_source="recent_decision_cache"). Why: container restarts were silently resetting the per-task gate counter, re-exposing users to another approvalGateCap-worth of gates per restart. Cache-driven denies were invisible in TaskEventsTable beyond the initial gate. Fresh tasks emitted no "starting posture" signal so dashboards could not distinguish "no pre-approvals seeded" from "agent has not started". Surface additions: - task_state.increment_approval_gate_count_in_ddb — best-effort atomic ADD on approval_gate_count - PolicyEngine(initial_approval_gate_count=N) — seed session counter - TaskConfig.initial_approval_gate_count — orchestrator payload field - progress_writer.write_policy_decision_cached — IMPL-23 emitter - PolicyDecision.cache_hit_metadata — observability-only field - _CachedDecision.original_decision_ts — wall-clock preservation - runner._initialize_policy_engine_and_hooks — extracted helper Counter survival is a safety bound, not correctness: DDB failure does NOT block the gate (§13.6). Joint-update invariant on status + awaiting_approval_request_id (§10.2) is preserved — counter uses separate UpdateItem, not merged into resume transaction. Tests: +36 agent (648→684), +8 CDK (1219→1227), +6 new runner tests. * feat(cedar-hitl): Chunk 7b — persist approval_gate_cap from blueprint Capture the per-task approval-gate cap at submit-time (§4 step 5, decision #13, §13.6) so a blueprint-configured override is frozen onto the TaskRecord. Mid-task blueprint edits cannot shift the cap beneath a running task; container restarts re-seed the agent's PolicyEngine from the persisted value instead of its compile-time default-50. Why: Chunk 7a added approval_gate_count persistence but the cap itself was still resolved from the blueprint on every restart — so an operator lowering security.approvalGateCap mid-task would retroactively fail-close the running task. The design has always said cap is frozen at submit; this chunk makes the implementation match. Surface additions: - BlueprintProps.security.approvalGateCap (CDK, synth-validated [1, 500] integer) — new per-repo blueprint prop - RepoConfig.approval_gate_cap + BlueprintConfig.approval_gate_cap - TaskRecord.approval_gate_cap + APPROVAL_GATE_CAP_{MIN,MAX,DEFAULT} - create-task-core now calls loadRepoConfig, resolves cap, bounds- checks, persists; returns 503 SERVICE_UNAVAILABLE on invalid blueprint data (permanent until admin re-deploys, not transient) - orchestrator.ts: isValidApprovalGateCap integer+bounds guard; logs warn if a persisted cap is structurally invalid (schema drift / hand-edited DDB row) - TaskConfig.approval_gate_cap: int | None = None (agent-side); runner threads to PolicyEngine kwarg when not None - "Task created" log line now carries approval_gate_cap + approval_gate_cap_source ("blueprint" | "platform_default") so operators can detect a broken-plumbing deploy at the single chokepoint where all fallback layers converge Per silent-failure review: - HIGH: 500 → 503 + logger.error for permanent misconfig - HIGH: cap + source in task-created log (catches 4-layer cascade) - MEDIUM: orchestrator guard tightened past typeof (NaN, Infinity, floats, out-of-bounds all omitted + warned) Tests: CDK 1263/1263 (+36), agent 694/694 (+10). CLI unchanged. * feat(cedar-hitl): Chunk 7c — observability wrap-up for resolved cap + warn path Close three deferred items from Chunks 7a/7b before Chunks 8-10: - runner.py init log now carries approval_gate_cap=N + approval_gate_cap_source=threaded|engine_default. Matches the handler log key so CloudWatch Insights can join across the cascade; agent can't distinguish blueprint-override from platform-default-frozen (handler log is the ground truth). - server.py adds _warn_cw helper routing [server/warn] lines to a dedicated CloudWatch stream (server_warn/<task_id>). stdout print is preserved for local dev + existing capsys tests. AgentCore does not forward container stdout to APPLICATION_LOGS, so pre-7c warnings about malformed invocation payloads were invisible in production. Failure counter shared with _debug_cw for a single alarm surface; hoisted above writer defs for import-time ordering safety. - blueprint.ts emits a synth-time info annotation when security.approvalGateCap is omitted so operators see a signal that the repo will rely on the platform default of 50. Without this, the default was a silent fallback at the handler layer — only visible by inspecting a TaskRecord at runtime. Tests: agent 694→700 (+6), cdk 1263→1265 (+2), cli unchanged. Design refs: §4 step 5, §11.1, §13.6, decision #13. * feat(cedar-hitl): Chunk 8a — extend approval outcome event schema Add created_at / effective_timeout_s / matching_rule_ids to approval_granted / approval_denied / approval_timed_out events so the incoming ApprovalMetricsPublisher Lambda (Chunk 8b) can compute decision latency and emit a rule_id-dimensioned timeout breakdown without a round-trip GetItem against TaskEventsTable. Fields are added conditionally — omitted from metadata when the caller did not supply them — so the event stream stays free of null-value noise and legacy callers continue to produce valid payloads. Publisher handles missing fields via explicit skip-and-log on the specific metric branch (not fallback-to-zero). Agent tests extended: +6 progress_writer tests, +3 hooks tests. Baseline 700 → 710. No consumer wired yet — this commit is a forward-compatible superset; Chunk 8b ships the CDK publisher + dashboard widgets. * feat(cedar-hitl): Chunk 8b — ApprovalMetricsPublisher + native CloudWatch dashboard widgets Ship the Cedar-HITL dashboard widgets from §11.3 / IMPL-28 via the MetricsPublisher architecture (Option E): - New ApprovalMetricsPublisher Lambda consumes TaskEventsTable DDB stream as consumer #2 (FanoutConsumer is #1; stream is within its 2-consumer soft cap — documented in task-events-table.ts). - Handler emits CloudWatch EMF for 3 metrics in namespace ABCA/Cedar-HITL: * ApprovalRequestCount + ClippedApprovalCount (reason dim) → ApprovalTimeoutClipRate widget (MathExpression with IF-guard against NaN on zero-denominator periods) * TimedOutEffectiveTimeout (rule_id dim with allowlist cardinality cap) → ApprovalTimeoutBreakdown widget * ApprovalDecisionLatencyMs (outcome dim) → ApprovalDecisionLatency widget with per-outcome p50/p90/p99 - Observability-of-observability (silent-failure review): * MetricsPublisherHeartbeat per batch so dashboard gaps distinguish "no traffic" from "pipeline broken" * MetricEmitSkipped with a reason dim on schema mismatches, parse anomalies, unknown rule ids — never fall back to latency=0 or count=0 which would poison percentile widgets * Expected high-volume skip reasons (non-milestone events, REMOVE records) DO NOT emit MetricEmitSkipped — only anomaly reasons (missing keys, missing milestone name) do, so real signal isn't drowned * Structured log lines alongside every skip so the absence of metrics is also observable via CloudWatch Logs Insights - Cardinality caps via ``RULE_ID_ALLOWLIST`` + ``normalizeClipReason``. Unknown values collapse to ``other`` / ``unknown`` buckets with dashboard series so the collapse is discoverable rather than silently accruing custom-metric cost. - Event-source-mapping filter pattern rejects non-agent_milestone records at the service layer; handler-layer allowlist catches anything that slips through. Filter pattern correctness tested structurally + positively/negatively probed (silent-failure H3). - Per-record try/catch + reportBatchItemFailures + SQS DLQ mirror the fanout-task-events.ts poison-pill pattern exactly. Deferred to Chunk 10 chore issues: - DLQ alarms (fanout + publisher) — fire-into-void until notification channel lands, so wire with §11.5 alarms as a group - Explicit log-group declaration (IAM drift defense) - stdout-flush race documentation (pre-existing pattern in fanout) - EMF 100-updates/sec throttle alarm Tests: cdk 1265 → 1327 (+62); agent 710 (unchanged); cli 217 (unchanged). All pass. §11.5 alarm plumbing now unblocked — publisher provides the metrics infrastructure the design always intended; only the notification-channel SNS wiring is left. * docs(cedar-hitl): Chunk 9 — sync design doc to Chunks 7b / 8a / 8b Bring CEDAR_HITL_GATES.md current with the code that shipped in Chunks 7b (approval_gate_cap persist), 8a (outcome event schema superset), and 8b (ApprovalMetricsPublisher + dashboard widgets): - §10.2 adds the missing approval_gate_cap row (carry-forward drift from Chunk 7b). Bounds + frozen-at-submit semantics documented. - §11.1 outcome events (approval_granted / approval_denied / approval_timed_out) now document the Chunk 8a optional fields (created_at, effective_timeout_s, matching_rule_ids) plus the publisher's skip-on-missing-field policy. - §11.1 intro names ApprovalMetricsPublisherFn as consumer #2 and points to §11.3 for the metric schema. - §11.3 rewritten to describe the Option E architecture: publisher Lambda + EMF + native CloudWatch metrics in namespace ABCA/Cedar-HITL, MathExpression with divide-by-zero guard, rule_id cardinality cap, observability-of-observability via heartbeat + skip meta-metrics, widget layout (12/12 over 24), 2-consumer stream budget. Dropped the stale "Retired the old bundled widget" line — that widget never shipped. - §11.5 reframed as "deferred (notification-channel gated)" with a plumbing-status paragraph noting the metric infra now exists; only SNS wiring remains. Alarm list expanded to include DLQ and publisher-health alarms. - §16 IMPL-28 rewritten for Option E; §15.2 row 46 expanded to reference the 4 new test files; Appendix B checklist updated. Starlight mirror regenerated via ``cd docs && node scripts/sync-starlight.mjs``. No code changes. Test baselines unchanged. Adversarial comment-analyzer review verified every new claim against committed code — zero inaccuracies. * feat(cedar-hitl): Chunk 10 review fixes — close 2 blockers + tighten 2 mediums Full-branch adversarial review (code-reviewer + silent-failure-hunter on all 18 commits) surfaced findings that only appear at final-state. Addressing the blockers + low-cost meds before deploy: B2 — stranded approvals were invisible to the dashboard: - Reconciler writes ``event_type: 'task_stranded'``; the metrics publisher's event-source filter only accepts ``event_type: 'agent_milestone'``, so AWAITING_APPROVAL evictions produced zero §11.3 signal. - Fix: reconciler now additionally emits an ``agent_milestone`` with ``milestone: 'approval_stranded'`` when the stranded task was AWAITING_APPROVAL. Publisher allowlist extended; classifier emits ``ApprovalStrandedCount`` counter. SUBMITTED / HYDRATING stranded events unchanged (guarded by test). B1 — heartbeat comment was false reassurance: - Event-source filter blocks Lambda invocation when no ``agent_milestone`` records exist in the poll window, so a quiet period produces the same widget gap as a broken pipeline. The code + design-doc wording claimed "gap = pipeline broken" which would mislead the on-call. - Fix: corrected module + function docstrings to describe the heartbeat as "present when active, not pipeline-alive-always." Operators should alarm on the combination (heartbeat-absent + recent TaskEventsTable traffic) or wire a scheduled canary — the latter tracked as a §11.5 follow-up. M1 — safety-critical milestones produced zero dashboard signal: - ``approval_cap_exceeded`` (§12.9 per-task cap) and ``approval_rate_limit_exceeded`` (per-user per-minute rate) were emitted by the agent but not on the publisher allowlist. A production bug where every gate hit the cap would have been invisible. - Fix: both added to APPROVAL_METRIC_MILESTONES with ``ApprovalCapExceededCount`` / ``ApprovalRateLimitExceededCount`` counters. No dimensions — the request_id in the event carries per-user correlation for ad-hoc log-insights investigation. H2 — filter / handler eventName disagreement: - Event-source filter required ``INSERT``; handler accepted ``INSERT`` and ``MODIFY``. Benign today (TaskEventsTable is put-only), but a future chunk MODIFY-ing records would be silently dropped by the filter while the handler was ready to process them. - Fix: handler now INSERT-only, matching the filter. Single source of truth on the eventName invariant. M1-rename — ``expected_non_approval_milestone`` skip reason was misleading (the non-metric approval milestones like ``approval_late_win`` also land in this bucket). Renamed to ``expected_milestone_not_tracked``. Tests: cdk 1327 → 1332 (+5: 3 classifier branches for new metrics, 1 reconciler AWAITING_APPROVAL path, 1 SUBMITTED-not-double-counted guard). Agent + CLI unchanged. All pre-commit hooks green; pre-push security fails only on the 3 pre-existing CVEs tracked for chore issue filing. Deferred findings from the same review (file as chore issues): - H1: agent-dies-between-TIMED_OUT-and-resume loses latency (edge, affects p99 bias) - H3: late-win APPROVED created_at staleness invariant (works today, document invariant) - H4: _warn_cw daemon-thread burst under adversarial payload - M2-M4: late-win metric, rename helpers, etc. No upstream PR filing this chunk — deploy to Sam's AWS account for integration testing first. * fix(cedar-hitl): suppress AwsSolutions-IAM5 on Runtime ExecutionRole overflow policies Synth + deploy were blocked by cdk-nag: the Cedar HITL additions (TaskApprovalsTable grant + SlackUserMappingTable + extra env vars threaded to the AgentCore runtime) pushed the runtime ExecutionRole past CDK's inline-policy size limit, so CDK auto-splits excess statements into ``OverflowPolicy1``. The overflow inherits the same wildcard ``bedrock:InvokeModel*`` / CloudWatch actions as the base policy but lives at a path (``Runtime/ExecutionRole/OverflowPolicy1/Resource``) that the existing ``addResourceSuppressions(runtime, ..., applyToChildren: true)`` cannot reach — CDK creates overflow policies lazily during synth ``prepare()``, after the construct tree has been frozen and after static suppressions have been cached. Suppress via an Aspect at MUTATING priority so the suppression is applied before cdk-nag's READONLY visitor runs. Matches any path containing ``/Runtime/ExecutionRole/OverflowPolicy`` + ending ``/Resource`` so future ``OverflowPolicy2``, etc. are covered without hardcoding indices. Verified: ``mise //cdk:synth`` now completes cleanly. ``mise //cdk:test`` still 1332/1332. * fix(cedar-hitl): E2E deploy-readiness — policies bundle + onboarding gate + CLI error visibility Three E2E T1.4 + T2.2 findings from the Chunk 10 integration-test session. Batched into one commit since all three need the same redeploy to verify: 1. agent/Dockerfile: COPY policies/ into the container image. ``PolicyEngine.__init__`` reads ``/app/policies/hard_deny.cedar`` + ``soft_deny.cedar`` at import time via ``_POLICIES_DIR = Path(__file__).parent.parent / "policies"``. The Dockerfile only copied ``src/``, so the directory was missing and every Cedar-HITL task failed at 0 turns with ``missing built-in hard-deny policies``. Introduced alongside Chunk 2 when the policy files were first added — Dockerfile was never updated. Zero tasks on this branch ever su…

@Version

…rer findings (#248) Resolve the remaining findings from isadeks's review (findings #2/#4 were already fixed in the prior commit): #1 — bare submit silently stopped opening PRs. The server ladder resolves an absent workflow_ref to the repo-less default/agent-v1 (by design), so a CLI `submit --repo X --task Y` regressed from "opens a PR" to "S3 artifact". The CLI now sends coding/new-task-v1 explicitly when a repo is present and no workflow/ pr flag is given, preserving the old new_task default. #3 — repo-optional + repo disagreed across the boundary. The agent took the repo-less path off config.requires_repo alone, while the orchestrator built a repo-bound prompt off repo presence. pipeline.run_task now takes the repo-less path only when requires_repo is false AND no repo was supplied; a repo-optional workflow given a repo runs the repo-bound path (clone/build/PR), matching CDK admission and the prompt. #6 — resolveWorkflowRef silently discarded a @Version constraint. It now honors the constraint (Phase 1: must match the single shipped version) and returns null otherwise; create-task-core 400s with a distinct "version not available" message via resolveWorkflowRefError, instead of quietly running the shipped version. #7 — deliver_artifact unset-target default disagreed between validator (lenient full set) and runtime (s3). Introduce DEFAULT_DELIVER_TARGET as the single source of truth; produced_outcomes(None) now models that exact default, so a primary:comment workflow that omits target is correctly flagged by rule 11. #10 — descriptor/YAML drift had no parity test. The sync test now asserts required_inputs (allOf/oneOf) parity in addition to id/version/requires_repo/ read_only/model, and a new test cross-checks the agent-side _KNOWN_WRITEABLE_WORKFLOW_IDS (the third hand-maintained copy) against the writeable repo-bound descriptors so drift fails CI.

@Version

…rkflows (aws-samples#248) (aws-samples#296) * Initial design docs * docs(workflows): address principal-review findings on ADR-014 - Validator parity: declare JSON Schema as the single canonical shape contract (consumed, not re-implemented), keep exactly one cross-field validator impl in Phases 1-3, and add a contracts/workflow-validation/ golden corpus to lock any Phase-4 second impl to parity (mirrors the cedar-parity mechanism). - Cedar principal migration: split into its own isolated PR (Phase 2a) reviewed alone with regenerated parity fixtures, landing ahead of the pr_iteration/pr_review workflow migrations (Phase 2b) that depend on it. - Repo-optionality: reframe web_research as a not-yet-runnable schema fixture (not an acceptance proof) and require the two blocking open questions (memory actorId, artifact delivery) to be resolved as a recorded ADR addendum before the Phase-0 schema is frozen. Regenerated Starlight mirrors. * feat(agent): workflow models + loader (aws-samples#248 Phase 1) Add the agent/src/workflow/ package: Pydantic models mirroring workflow.schema.json and a loader that resolves a first-party workflow id, parses YAML, and shape-validates against the canonical JSON Schema. Additive only — does not yet touch task_type (the breaking cutover lands later in Phase 1). - models.py: Workflow + sub-models; resolved_requires_repo applies the domain-derived default (coding=true, knowledge/hybrid=false). - loader.py: JSON-Schema shape validation (one canonical contract, consumed not re-implemented), YAML load, id/path agreement, and path-traversal defense. Cross-field rules deliberately live in the separate validator (Task #2), so there is one cross-field impl. - Promote pyyaml + jsonschema to direct deps (were only transitive); the loader must not depend on another package's transitive pin. - Fix a latent schema bug found by the new tests: the requires_repo/ read_only allOf conditionals fired vacuously when the property was absent, wrongly applying repo-less constraints to a coding workflow relying on the domain default. Guard each `if` with `required`. 17 new tests; agent ruff + ty + full suite (836) green. * feat(agent): cross-field workflow validator + golden corpus (aws-samples#248 Phase 1) The single CI-time implementation of the WORKFLOWS.md cross-field rules the JSON Schema cannot express (rules 1-9, 11, 12, 14). The runtime loader stays shape-only and trusts this verdict, so there is exactly one cross-field implementation in Phases 1-3 (avoids the cedar-style two-language drift). contracts/workflow-validation/ ships the golden parity corpus from Phase 1 so the expected-verdict contract is fixed before aws-samples#246 adds a second validator, mirroring contracts/cedar-parity/. test_workflow_tree_valid.py gates every first-party workflow file through the validator at CI time. * feat(agent): workflow step runner + handler registry (aws-samples#248 Phase 1) The agent-side step runner that interprets workflow.steps inside the container (ADR-014): a StepKind→handler registry, in-order execution honouring each step's on_failure policy (fail / continue / skip_remaining), per-step step:<name>:start/<status> progress milestones, and resume-aware checkpointing (deterministic side-effect-free steps recorded to workflow_state.json on the persistent mount are skipped on resume; agentic/side-effecting steps re-run idempotently per WORKFLOWS.md §Step execution semantics). Handlers are thin wrappers over the existing helpers (setup_repo, run_agent, verify_build/lint, ensure_pr) so this is maximal structural change, minimal logic change. post_review/deliver_artifact are registered but raise NotImplementedError (Phase 2b / Phase 3) — fail-loud, not silent no-op, keeping validator rule-8 handler parity honest. Orchestration core is unit-tested with fakes; wiring into pipeline.run_task is task 5. * feat(agent): first-party workflow files — coding/new-task-v1 + default/agent-v1 (aws-samples#248 Phase 1) Migrates the new_task task type to coding/new-task-v1.yaml (the heavyweight clone→build→open-PR coding path) and ships the platform default workflow default/agent-v1.yaml — the conservative last rung of the resolution ladder used when no workflow_ref and no Blueprint default apply (requires_repo:false, read-leaning tool set, deliver-as-comment). pr_iteration/pr_review migrations are Phase 2b (depend on the Cedar principal migration). Both pass the cross-field validator and the test_workflow_tree_valid CI gate (file path agrees with declared id; zero violations). * fix(agent): address code-review findings in workflow validator + runner (aws-samples#248 Phase 1) Fixes the group-A bugs surfaced by the high-recall code review of the Phase 1 commits: runner.py - Resume-skip product loss: clone_repo/hydrate_context populate in-memory products (ctx.setup, ctx.user_prompt) that can't be rebuilt from the JSON checkpoint, so skipping them on resume left ctx.setup None (breaking ensure_pr) and ctx.user_prompt empty (unguided agent). Narrow _RESUMABLE_SKIP_KINDS to verify_build/verify_lint — steps whose ENTIRE product is the checkpointed boolean re-applied via ctx.artifacts. clone_repo/hydrate_context now re-run on resume (handler-level idempotency), matching WORKFLOWS.md's intent. - Skipped steps now emit step:<name>:start / :skipped milestones so a watcher sees them accounted for instead of a gap. validator.py - _HANDLER_KINDS (rule 8) is now derived from runner.STEP_HANDLERS — the single source of truth — instead of a hand-copied list that could silently drift. - rule-6 tier ceiling now applies the elevated-only-fields check to read-only (the strictest tier) as well as standard; previously read-only could declare mcp_servers/plugins/skills unchecked. - rule-11 now verifies a deliver_artifact step's target actually produces the declared comment/artifact outcome, not merely that the step kind is present. Tests + corpus updated; rule3/rule7 fixtures switched to s3_and_comment target to stay isolated to their intended rule, plus a new rule11 target-mismatch fixture. Full agent suite green (909 passed), ruff + ty clean. * feat(agent): wire workflow runner into pipeline behind a gate (aws-samples#248 Phase 1, task 5) Builds the runner→pipeline seam and gates it off (default production behavior unchanged). pipeline.run_task's agent invocation now goes through _execute_agent_step: when WORKFLOW_RUNNER_ENABLED is set AND task_type maps to a shipped workflow (only new_task→coding/new-task-v1 in Phase 1), the single run_agent step is dispatched through workflow.run_workflow — exercising the real handler registry, step milestones, and result threading — while clone, context assembly, and post-hooks stay on the proven inline path. Otherwise it is exactly the legacy asyncio.run(run_agent(...)) call. pr_iteration/pr_review have no workflow file until Phase 2b, so they fall back to inline. run_workflow gains an only_kinds filter so the seam drives just the run_agent step (no double clone / double PR against the inline post-hooks). The flag is flipped to default-on in the tasks 6-8 cutover. Folds in the group-B code-review fixes (effective when the seam is enabled): - system prompt: hydrate_context builds ctx.system_prompt via the existing build_system_prompt (was empty), reusing pipeline's logic rather than reimplementing it. - verify gate: new shared _gate_status gives verify_build/verify_lint matching semantics — informational/read_only never gate, regression_only consults build_before/lint_before (mirrors pipeline's build_ok = passed or not build_before), strict always gates. Fixes the regression_only build bug and the verify_lint read_only asymmetry. - clone_repo is idempotent (reuses a pre-populated ctx.setup). - StepContext threads the --trace trajectory into run_agent (was dropped). ensure_pr strategy reconciliation stays deferred to task 8 (it requires removing the task_type branch inside ensure_pr); recorded in local-docs. Full agent suite green (930 passed), ruff + ty clean. * fix(agent): address second-round review findings on the workflow seam (aws-samples#248 Phase 1) - Error-handling contract (highest severity): when the run_agent step's handler raises, run_workflow captures it into a failed StepOutcome rather than propagating. _execute_agent_step now RE-RAISES in that case so run_task's except block restores full fidelity — the log_error_cw APPLICATION_LOGS mirror (read by TaskDashboard / bgagent status), the span error, and the real exception text — instead of silently downgrading to a generic AgentResult. This also makes the previously-dead agent_result-is-None branch live. - _gate_status: an unset gate now defaults to regression_only semantics (gate only a regression; a pre-existing failure does not gate), matching pipeline.py which is unconditionally 'build_ok = passed or not build_before'. Previously an unset gate fell through to strict, contradicting the docstring's 'mirrors pipeline.py' claim and would wrongly fail a broken-before repo. - run_agent handler fails loud on an empty system prompt rather than running an unguided agent loop (repo-less prompt assembly is Phase 3; this turns the gap into an attributable failed step instead of a silent context-free run). - _workflow_id_for_task_type: documented as a transitional bridge tied to the canonical task_type→workflow table in WORKFLOWS.md / API_CONTRACT.md, to be kept in sync until tasks 6-8 delete it. Tests added for each behavior change. Full agent suite green (934 passed), ruff + ty clean. * feat(api,cli): replace task_type with workflow_ref/resolved_workflow (aws-samples#248 tasks 6-7) Breaking API change (no alias): task_type is removed end-to-end; workflow_ref selects a workflow and resolves to a pinned {id, version} at the create-task boundary. CDK/CLI types (task 6, in lockstep — check:types-sync): - Remove TaskType + isPrTaskType; add ResolvedWorkflow. - CreateTaskRequest.workflow_ref?; TaskRecord/TaskDetail/TaskSummary resolved_workflow (+ mappers). - CLI: --pr/--review-pr now set workflow_ref (coding/pr-iteration-v1, coding/pr-review-v1); new --workflow flag; format reads resolved_workflow.id. CDK resolution boundary (task 7): - New workflows.ts: CDK-side descriptor mirror of agent/workflows/** + resolveWorkflowRef (ladder: explicit ref → default/agent-v1), isValidWorkflowRef, requires_repo/read_only/uses_pr accessors. Drift-guard test keeps the table in sync with the shipped YAML. - validation.ts: drop VALID_TASK_TYPES/isValidTaskType; hasTaskSpec now takes the resolved workflow's required_inputs contract. - create-task-core.ts: resolve workflow, validate inputs against the contract, persist workflow_ref + resolved_workflow. Repo stays required for ALL workflows in Phase 1 (repo-less admission is Phase 3). - preflight.ts: taskType param → readOnly + requiresRepo; repo-less short-circuits to passed (seam for Phase 3). - orchestrate-task/context-hydration/orchestrator/ecs-strategy: derive read_only/requires_repo/PR-ness from resolved_workflow; payload swaps task_type → resolved_workflow. ecs bootCommand passes resolved_workflow (lockstep with agent run_task in task 8). CLI 294 tests + CDK suite green; check:types-sync OK. Agent cutover (task 8) lands next to consume resolved_workflow. * feat(agent): complete task_type→workflow cutover in the agent (aws-samples#248 task 8) Removes the Python TaskType enum / PR_TASK_TYPES / _PROMPTS-by-task_type and the gated seam; the workflow runner is now the sole agentic path, driven by the resolved_workflow {id, version} threaded from the create-task boundary. - models.py: delete TaskType enum; TaskConfig gains resolved_workflow, policy_principal, is_pr_workflow (drops task_type). - config.py: build_config takes resolved_workflow (not task_type); derives policy_principal via workflow.policy_principal_for and is_pr_workflow; PR_TASK_TYPES → PR_WORKFLOW_IDS. - Cedar UNCHANGED (Phase 2a owns the principal migration): the Agent::TaskAgent::"<id>" scheme and contracts/cedar-parity/ are untouched. policy_principal_for maps read_only⇒pr_review, else id→legacy {new_task, pr_iteration, pr_review}. Only policy.py edit: drop the WARN-only TaskType() validation. runner.py passes config.policy_principal as the principal. All test_policy* pass unchanged. - prompts rekeyed by workflow id (get_system_prompt falls back to the default coding prompt for an unknown id rather than raising). - pipeline.py: _execute_agent_step loads from resolved_workflow, always-on; the 3 task_type=='pr_review' branches → workflow.read_only; ensure_pr takes an explicit strategy (create/push_resolve/resolve) from the workflow step, resolving the deferred code-review item. - post_hooks.ensure_pr: strategy param replaces task_type self-inspection. repo.py: PR-branch resume keys off config.is_pr_workflow. - server.py/entrypoint.py: thread resolved_workflow; validation keyed on PR workflow ids. - New faithful Phase-1 workflow files coding/pr-iteration-v1 (push_resolve) + coding/pr-review-v1 (read_only, resolve, no Write/Edit per rule 4). All 4 first-party files validate; CDK descriptor drift-guard green. - docs: API_CONTRACT.md migration table + resolved_workflow responses; Starlight mirror synced. Agent suite 922 passed, ruff + ty clean. With tasks 6-7 (e829c4f) this completes the breaking cutover across API + CLI + agent. * fix(agent): use ty ignore syntax + annotate workflow test handlers (aws-samples#248) The full build was never completed on this branch; ty type-checking and ruff format gates were red. No src logic changes — type/format hygiene only. - Tests used mypy-style `# type: ignore[arg-type]`, but the project type checker is ty (`# ty: ignore[rule]`), so the suppression silently no-op'd. - test_workflow_runner.py: annotate inline step handlers as StepHandler-compatible `(step: Step, ctx: StepContext) -> StepOutcome`; type the `handlers` dicts as `dict[str, StepHandler]` (ty treats the alias params as positional-only → dict-value invariance rejects named-param funcs); convert the stale ignore to `# ty: ignore[invalid-argument-type]`. - test_entrypoint.py: assert `resolved_workflow is not None` before subscripting (TaskConfig.resolved_workflow is `dict | None`). - Apply ruff format reflows (runner.py, validator.py, test_workflow_tree_valid.py) the branch was committed without. * feat(agent): Cedar read-only enforcement keys off context.read_only (aws-samples#248 Phase 2a) Migrate read-only enforcement from the literal `Agent::TaskAgent::"pr_review"` principal match to the `context.read_only` attribute, so the Write/Edit hard-deny attaches to the *property* and fires for every read-only workflow uniformly — not just coding/pr-review. Removes the Phase-2b principal bridge. This is the security-load-bearing step: an error here *silently weakens* enforcement (the rule stops matching) rather than failing loudly. Guarded by new contracts/cedar-parity/ golden fixtures verified on BOTH the cedarpy (Python) and cedar-wasm (TypeScript) engines. Policy (both bindings, kept byte-for-byte identical — drift guard enforces): - agent/policies/hard_deny.cedar + cdk/.../builtin-policies.ts: pr_review_forbid_write/edit (principal == "pr_review") → read_only_forbid_write/edit (when context.read_only == true, any principal). Wiring: - policy.py: PolicyEngine gains read_only kwarg; threads read_only into every Cedar request context (probe + base_context). - models.py: TaskConfig.read_only; config.py derives it from the resolved workflow; runner.py passes config.read_only to PolicyEngine. - workflow/loader.py: remove the read_only ⇒ "pr_review" bridge in policy_principal_for — the principal is now an identity/audit tag only; pr-review keeps its own id-derived principal. Parity fixtures (new): read-only-forbid-write, read-only-forbid-edit (read_only=true ⇒ deny), read-only-false-permits-write (read_only=false ⇒ base_permit — proves the deny is gated on the property, not always-on). Tests: TestPrReviewPermissions → TestReadOnlyPermissions, now asserting the deny fires for any read-only principal AND that read_only=false permits Write; test_hooks + test_config updated. agent 927 / cdk 1822 green; ty + ruff clean. Docs: ADR-014 addendum (2026-06-08) records dropping the isolated-PR/ordering requirement (2b shipped first behind the bridge, so read-only was never unprotected); WORKFLOWS.md §"Replacing the Cedar principal" + phasing table updated to past tense. * docs(workflows): resolve repo-optional open questions + freeze schema (aws-samples#248) ADR-014 addendum (2026-06-08) settles the two open questions that gated the Phase-0 schema freeze (WORKFLOWS.md open questions #1, #2). With the one implied schema reshape applied, the schema is frozen — later phases add handlers and plumbing, not schema fields. Decision 1 — Memory actorId for repo-less tasks: per-user user:{cognito_sub}. Caller-scoped (no cross-tenant bleed; mirrors the per-user trace prefix); cross- workflow pooling explicitly not adopted. NO schema field — a fixed platform fallback, applied in memory.py in Phase 3 when repo is absent. Decision 2 — Artifact delivery: named Python deliverers (open target string), shared S3 plumbing pinned now. deliver_artifact.target resolves to a registered deliverer (workflow/deliverers.py → DELIVERERS), same pattern as STEP_HANDLERS — a new delivery method is a registered deliverer, not a schema change. Plumbing frozen: task-scoped key artifacts/{task_id}/, prefix-scoped SessionRole grant, size limit, TaskDetail URL surfacing, workflow:{id} repo-less tenant tag. Schema reshape applied: - workflow.schema.json: steps[].target drops its enum (stays type:string, minLength:1); the closed set moves into the deliverer registry. - models.py: DeliverTarget Literal → str. - New workflow/deliverers.py: Deliverer dataclass + DELIVERERS registry + produced_outcomes(); single source of truth for "what each target produces." - validator.py rule 11: consults the registry (_deliverer_produces) instead of the hardcoded _DELIVER_TARGET_OUTCOMES map. The three first-party names keep their exact produced-outcome sets, so NO existing workflow / fixture / golden vector changes verdict; an unknown deliverer name now produces nothing (new guard, tested). Tests: test_workflow_deliverers.py (registry contract) + a rule-11 unknown-deliverer case. agent 932 green; ty + ruff clean. deliver_artifact remains a NotImplementedError stub until Phase 3 — only the contract is frozen. * feat(api,agent): repo-optional admission for repo-less workflows (aws-samples#248 Phase 3) First slice of Phase 3 (repo-optional tasks): the admission path now accepts a submission with no repo when the resolved workflow does not require one, so a repo-less knowledge workflow (e.g. default/agent-v1) can be admitted, hydrated, and run from task_description alone. The deliverer/S3 + agent-runtime clone-skip slices follow separately. Semantics: requires_repo decides whether a repo is MANDATORY; requires_repo:false means repo-OPTIONAL (a repo-less workflow may still target a repo). The repo-less behavior keys off repo *absence*, not the workflow flag — a repo-optional workflow given a repo still takes the repo-bound path. Types (CDK + CLI mirror, kept in sync — check:types-sync green): - CreateTaskRequest.repo, TaskRecord.repo → optional; TaskDetail.repo, TaskSummary.repo → string | null. Mappers null-coalesce. Admission (create-task-core.ts): resolve the workflow FIRST, then gate repo on workflow.requiresRepo (missing repo on a repo-bound workflow → 400). Onboarding + blueprint-cap lookup runs only when a repo is present; repo-less takes the platform default cap. Record omits repo when absent; replay-required-fields drops 'repo'; event metadata null-coalesces. Hydration (context-hydration.ts): new repo-less branch (keyed on !task.repo) that assembles from task_description + per-user memory, returning before the repo-bound fetch path; assembleUserPrompt drops the Repository: line when repo is absent. Memory (memory.ts + orchestrator.ts): loadMemoryContext/writeMinimalEpisode take an actorNamespace (repo for coding, user:{user_id} for repo-less — ADR-014 addendum). Preflight already honored requiresRepo (Phase-1 seam) — widened its repo param to string|undefined. fanout-task-events skips the GitHub channel for a repo-less task. loadBlueprintConfig skips the RepoTable lookup when repo is absent. Agent (config.py): build_config loads the workflow up-front and requires repo_url/github_token only when requires_repo; repo-less runs from task_description. CLI shows '—' for a repo-less task. Tests: repo-less acceptance (create-task-core, create-task, context-hydration, CLI format) + repo-bound-missing-repo rejection; memory test log key repo → actor_namespace. agent 932 / cdk 1825 / cli 296 green; ty + ruff + eslint clean. deliver_artifact is still a NotImplementedError stub — a repo-less task can be admitted and hydrated but not yet deliver an artifact (next slice). * feat(agent): repo-less pipeline path via the workflow runner (aws-samples#248 Phase 3) Second Phase-3 slice: a repo-less workflow now runs the agent loop in-container with no clone / build / PR. When config.requires_repo is false, run_task delegates to _run_repoless_task, which drives the workflow's steps (hydrate_context → run_agent) through the workflow step runner — coding vs knowledge now differ by the workflow's steps list, per ADR-014's end-state — and assembles a terminal TaskResult with pr_url=None. Scope boundary (deliberate): only hydrate_context + run_agent are driven here. deliver_artifact is the workflow's terminal step but its handler is a stub until the artifacts-bucket S3/IAM contract ships (next slice), so it is excluded via only_kinds; delivery is deferred, the agent run is not. The coding path is left byte-identical — its hard-won inline cancel-safety (skip ensure_pr on a cancelled task) is NOT yet runner-driven, so routing the full coding step list through the runner is a separate cancel-aware change, intentionally not done here. - models.py: TaskConfig.requires_repo (defaults True; coding unaffected). config.py build_config sets it from the resolved workflow. - pipeline.py: requires_repo branch before the repo-coupled segment; _run_repoless_task runs the runner + assembles/persists the terminal result. run_task repo_url now defaults to "" (repo-less callers omit it). - prompts/default_agent.py: repo-less system prompt (no git/branch/PR placeholders); registered for default/agent-v1. - prompt_builder.build_repoless_system_prompt + shared _render_memory_context. Tests: repo-less pipeline run (no setup_repo, no PR, success, repo-less prompt); repo-less prompt has no repo placeholders. agent 934 green; ty + ruff clean. * fix(agent): address PR-review findings on Phase 2a/3 (aws-samples#248) Findings from the plugin review (code-reviewer, silent-failure-hunter, pr-test-analyzer, type-design-analyzer) on 4c34c8b..HEAD. HIGH — repo-less task reported COMPLETED while delivering nothing (silent-failure-hunter). _run_repoless_task skips deliver_artifact (stub until the artifacts contract ships), but the workflow's terminal outcome (e.g. `comment`) was never produced, yet status was success. Now: if the primary terminal outcome is delivery-backed and delivery was skipped, the task is a loud FAILED with an explicit error + delivery_deferred milestone naming the gap. MEDIUM — config.py load-failure fallback bug + fail-open (code-reviewer, silent-failure-hunter). The WorkflowValidationError branch compared workflow_id against DEFAULT_WORKFLOW_ID (the *coding* default) for requires_repo, contradicting its comment, and defaulted read_only=False (fail-open) for unknown ids. Now: log the failure (was silent); fail CLOSED — read_only=True for any id not in _KNOWN_WRITEABLE_WORKFLOW_IDS; requires_repo keyed off the real repo-less default (REPO_LESS_DEFAULT_WORKFLOW_ID = default/agent-v1). MEDIUM — TaskConfig missing cross-field invariant (type-design-analyzer). Added _validate_requires_repo_has_repo (mirrors _validate_trace_requires_user_id): requires_repo=True ⇒ non-empty repo_url, enforced at construction. repo_url and github_token now default to "" so a repo-less TaskConfig is constructible; the validator preserves the repo-bound invariant. LOW/secondary — rule-8 now also checks deliver_artifact.target resolves in DELIVERERS (type-design-analyzer), so an unknown target is caught universally, not only when it collides with the primary outcome. Test gaps closed (pr-test-analyzer): repo-OPTIONAL workflow given a valid repo (admission + hydration); repo-less memory keyed user:{user_id} (asserted); malformed-repo-on-repo-bound rejection; preflight repo-less short-circuit + repo-bound-no-repo invariant; repo-less agent_result=None → FAILED; config load-failure fail-closed; TaskConfig validator; rule-8 deliver target. agent 941 / cdk 1831 / cli 296 green; ty + ruff + type-sync clean. * feat(agent,api): deliver_artifact + repo-less memory — close criterion aws-samples#4 (aws-samples#248 Phase 3) Final Phase-3 slice: a repo-less knowledge task now runs end-to-end — admitted, hydrated, agent loop, AND delivers its output. Closes acceptance criterion aws-samples#4 (a non-coding task runs without a repo) for the s3/comment deliverers. Infra (IAM + env): - agent-session-role.ts: s3:PutObject grant scoped to artifacts/${aws:PrincipalTag/task_id}/* on the trace-artifacts bucket (mirrors the traces/ pattern; task_id-scoped per ADR-014 addendum). cdk-nag reason updated. - agent.ts: ARTIFACTS_BUCKET_NAME runtime env (same bucket as traces). Deliverers (agent/src/workflow/deliverers.py): - DELIVERERS gains a deliver(target, ctx) dispatcher + DeliveryResult. The s3 deliverer uploads the agent's result text to artifacts/{task_id}/result.md via the tenant-scoped S3 client (5 MiB cap); the comment deliverer records a delivered_comment milestone for the channel fanout; s3_and_comment does both. - AgentResult.result_text captures ResultMessage.result (the knowledge-task deliverable); runner.py populates it on success. - runner._handle_deliver_artifact: now implemented (was NotImplementedError), routes through deliver(); a delivery failure is a failed step (terminal FAILED), never a silent skip. Pipeline (pipeline.py): - _run_repoless_task drives the FULL step list (deliver_artifact included); replaces the delivery-deferred FAILED gate with the real runner verdict (agent succeeded but workflow failed ⇒ attributed FAILED). Sets TaskResult.artifact_uri. - Repo-less episodic memory write keyed user:{user_id} (fail-open). Memory (memory.py): write_task_episode repo→actor param; _validate_actor accepts owner/repo OR user:{id} so the same write path serves coding + knowledge tasks. Types: TaskResult/TaskRecord/TaskDetail + CLI mirror gain artifact_uri (mirrors trace_s3_uri); CLI detail view shows an Artifact: line. check:types-sync green. Tests: deliverer dispatch (s3 upload + key/body, comment milestone, s3_and_comment, missing bucket, empty text, oversize); repo-less pipeline now succeeds + sets artifact_uri=null for comment-only; session-role artifacts grant; deliver_artifact dropped from the fail-loud stub list. agent 947 / cdk 1832 / cli 296 green; synth + cdk-nag clean; ty + ruff + type-sync clean. * feat(agent): make repo-less delivery retrievable + ship a knowledge workflow (aws-samples#248 Phase 3) Three should-fix items closing the gap where "criterion aws-samples#4" was met by machinery but the default repo-less path delivered to a channel that isn't wired. 1. Stale docstring: _run_repoless_task said deliver_artifact was a deferred stub — it ships now. Updated to describe the real full-step-list run. 2. Default workflow delivers retrievably. The channel fan-out only knows slack/email/github; an api-origin repo-less task (the common default case) has no channel, so a `comment`-only deliverer surfaced nothing the user could get. default/agent-v1 now uses `target: s3_and_comment` (primary outcome `artifact`): the S3 upload to artifacts/{task_id}/ is always retrievable, the comment milestone is still rendered by the fan-out when a channel exists. 3. Ship a runnable knowledge workflow: agent/workflows/knowledge/web-research-v1.yaml — the concrete repo-less demonstration (research → S3 artifact, no repo). Built-in capabilities only (registry MCP/skills are Phase 4); CDK descriptor mirror + drift-guard + resolver tests updated. Also fixes a latent bug the knowledge workflow surfaced: a repo-less id with no registered prompt fell back to the CODING prompt (leaking unsubstitutable {repo_url}/{branch_name}). get_system_prompt gains repo_less= so repo-less workflows fall back to the repo-less default-agent prompt instead. Tests: repo-less pipeline now asserts artifact_uri set (s3_and_comment); repo-less prompt fallback; knowledge workflow validates/resolves. agent 949 / cdk 1833 green; ty + ruff + type-sync clean; docs synced. * fix(agent): address full-branch review — 2 repo-less blockers + cleanups (aws-samples#248) Findings from the four-reviewer full-branch pass. Two were blockers that would have shipped the repo-less feature broken despite green unit tests (the tests didn't cross the server/task_state boundaries). BLOCKER 1 — repo-less task rejected on the AgentCore backend. server.py _validate_required_params required repo_url unconditionally, so /invocations returned 400 for every repo-less task before the pipeline ran (ECS backend dodged it by calling run_task directly). Now gates repo_url on the workflow's requires_repo (resolved via load_workflow; fails SAFE — assume required — on a load error). Test: repo-less payload accepted, repo-bound still requires repo. BLOCKER 2 — artifact_uri computed but never persisted. task_state.write_terminal's field allowlist had no artifact_uri branch, so the URI was dropped before DynamoDB and TaskDetail.artifact_uri was always null (delivery not retrievable despite the S3 object existing). Added the persist branch + tests. Secondary: - comment deliverer no longer overstates: _post_comment docstring + WORKFLOWS.md state plainly that delivered_comment is recorded for the event stream (visible in `bgagent watch`) but is NOT yet routed to an external channel (not in the fan-out ROUTABLE_MILESTONES). comment_posted=True only when actually recorded. - test pins the config→engine read_only seam (runner.py PolicyEngine(read_only=)): dropping it now fails a test instead of silently disabling the Write/Edit deny. - stale phase-boundary comments corrected (post_review docstring, runner module docstring + only_kinds + run_agent guard, pr-review/pr-iteration YAML headers) — they described Phase 2a/2b/3 as pending when those shipped on this branch. - ADR-014: replaced the dangling pre-rebase commit hash 838c72e (unreachable after rebase) with a descriptive reference. agent 954 / cdk 1833 green; ty + ruff + type-sync clean; docs synced. * test(cli): cover artifact_uri render lines (aws-samples#248 Phase 3) The Artifact: display lines added to formatTaskDetail / formatStatusSnapshot introduced uncovered truthy branches that dropped CLI branch coverage below the 84% gate (full `mise run build` caught it; the per-package run had passed on the prior content). Adds non-null + null cases for both renderers, mirroring the existing trace_s3_uri tests. cli 300 passed; full build green. * fix(agent): address PR review findings + green CI (aws-samples#248) CI: ruff-format the 4 files that tripped the build self-mutation (config.py, deliverers.py, test_pipeline.py, test_task_state.py) so "Fail build on mutation" passes. Review findings: - prompts/__init__.py: get_system_prompt now logs WARN when an id has no registered prompt and falls back, restoring the signal the old build_system_prompt emitted (silent-failure-hunter). - pipeline._run_repoless_task: chain wf_result.failed_step.error into the synthesized AgentResult so a run_agent failure on the repo-less path is diagnosable instead of a generic message (code-reviewer). - server._validate_required_params: narrow the broad `except Exception` around load_workflow to WorkflowValidationError/ImportError so a genuine programming error surfaces loudly instead of being masked as "could not resolve requires_repo" (still fails safe — repo required). - workflows.ts workflowIsReadOnly: document why the unknown-id default is `false` (conservative admission: needsWrite=!readOnly) and must NOT be "aligned" to the agent-side fail-closed read-only default. - runner.py / pr-review-v1.yaml: comment fixes — artifact_key→artifact_uri, drop never-written review_posted, rule 4→6, post_review handler is registered-but-unimplemented not "no shipped handler" (comment-analyzer). Tests (pr-test-analyzer gaps): - test_memory.py: cover _validate_actor (user:{id} namespace + rejects) and a write_task_episode(actor="user:...") reaching create_event. - test_pipeline.py: cover the repo-less delivery-gate-failure branch (agent succeeds but deliver_artifact fails → terminal FAILED naming the step). Docs: ROADMAP.md "Task types" now reflects workflow-driven tasks + repo-less knowledge workflows; Starlight mirror regenerated. agent 962 passed, ruff+ty clean, cdk compile+workflows tests green. * fix(agent): ship first-party workflow files in the image + sync user docs (aws-samples#248) The Dockerfile copied agent/src, agent/policies, and contracts but never agent/workflows, so the runtime loader (which resolves /app/workflows/<domain>/ <name>-vN.yaml) failed every workflow load with WorkflowValidationError. Add COPY agent/workflows/ /app/workflows/ so the five shipped workflows + the JSON schema land in the image. Verified in a local build: load_workflow( 'coding/new-task-v1') resolves to 1.0.0. Also bring the user-facing guides in line with the workflow-driven model: USER_GUIDE/QUICK_START now document workflow_ref, the --workflow flag, and resolved_workflow responses (the retired task_type is shown only as a migration mapping). Rename the "Task types" section to "Workflows", update the Starlight sync map + sidebar slug, and regenerate the mirrors. * fix(agent,api,cli): address PR-review findings on the workflow seam (aws-samples#248) Resolve five verified findings from the aws-samples#248 review: HIGH — thread agent_config.allowed_tools to the SDK. runner.py hardcoded the full tool surface, so a workflow's declared allowed_tools never reached ClaudeAgentOptions and the read_only:false read-leaning lanes (default/agent-v1, web-research-v1) could still get Write/Bash. Add TaskConfig.allowed_tools (populated from the resolved workflow), resolve it in run_agent via _resolve_allowed_tools (drops Write/Edit when read_only), and add SDK-layer tests. Cedar missing-read_only fail-open — add the ADR-014-promised parity vector. With context omitting read_only, both engines fail OPEN (Allow + base_permit); safety rests on policy.py::_eval_tier re-raising on diagnostics.errors. Add contracts/cedar-parity/read-only-missing-attribute.json (verified on cedarpy AND cedar-wasm) plus test_policy.py tests pinning the re-raise + always-inject discipline. MEDIUM — repo-less artifact success gate. pipeline.py now fails the task when the primary outcome is `artifact` but no artifact_uri was produced (matches the WORKFLOWS.md "agent-success AND S3 key present" contract), not just when the deliver step raised. MEDIUM — rule 13 model allow-list admission. Add WORKFLOW_MODEL_ALLOWLIST + disallowedWorkflowModel() and enforce it in create-task-core (unpermitted model => 400, no silent downgrade); descriptor sync-test asserts modelId matches the YAML and stays on the allow-list. LOW — CLI repo-less submission. --repo is no longer a hard requiredOption: required unless --workflow selects a repo-less workflow (still required with --pr/--review-pr). Lets bgagent reach knowledge/web-research-v1. Rule 10 (one production version per lineage) remains intentionally unenforced at the single-file layer — it is a registry/promotion-time property. * fix(agent,api,cli): address PR-review aws-samples#296 resolution-ladder + deliverer findings (aws-samples#248) Resolve the remaining findings from isadeks's review (findings #2/aws-samples#4 were already fixed in the prior commit): #1 — bare submit silently stopped opening PRs. The server ladder resolves an absent workflow_ref to the repo-less default/agent-v1 (by design), so a CLI `submit --repo X --task Y` regressed from "opens a PR" to "S3 artifact". The CLI now sends coding/new-task-v1 explicitly when a repo is present and no workflow/ pr flag is given, preserving the old new_task default. aws-samples#3 — repo-optional + repo disagreed across the boundary. The agent took the repo-less path off config.requires_repo alone, while the orchestrator built a repo-bound prompt off repo presence. pipeline.run_task now takes the repo-less path only when requires_repo is false AND no repo was supplied; a repo-optional workflow given a repo runs the repo-bound path (clone/build/PR), matching CDK admission and the prompt. aws-samples#6 — resolveWorkflowRef silently discarded a @Version constraint. It now honors the constraint (Phase 1: must match the single shipped version) and returns null otherwise; create-task-core 400s with a distinct "version not available" message via resolveWorkflowRefError, instead of quietly running the shipped version. aws-samples#7 — deliver_artifact unset-target default disagreed between validator (lenient full set) and runtime (s3). Introduce DEFAULT_DELIVER_TARGET as the single source of truth; produced_outcomes(None) now models that exact default, so a primary:comment workflow that omits target is correctly flagged by rule 11. aws-samples#10 — descriptor/YAML drift had no parity test. The sync test now asserts required_inputs (allOf/oneOf) parity in addition to id/version/requires_repo/ read_only/model, and a new test cross-checks the agent-side _KNOWN_WRITEABLE_WORKFLOW_IDS (the third hand-maintained copy) against the writeable repo-bound descriptors so drift fails CI. * fix(agent): close PR-review aws-samples#296 follow-ups — post-hook fail-soft, prompt, cap, parity (aws-samples#248) aws-samples#5 (High) — post-hook load_workflow had no fail-soft fallback, so a reload failure AFTER run_agent mutated/committed the tree stranded the work as FAILED with no PR. read_only now comes from config (already fallback-computed and the verdict that drove Cedar); the reload — needed only for the ensure_pr strategy — is wrapped in the same WorkflowValidationError fallback build_config uses, defaulting to "create" so the PR still opens. aws-samples#8 — knowledge/web-research-v1 had no registered prompt and silently degraded to the generic default-agent prompt. Add a research-specialized, repo-less prompt (prompts/web_research.py) and register it. aws-samples#9 — the 5 MiB artifact cap was checked AFTER encoding to UTF-8 (a second full copy). Reject on the character count up front (chars ≤ bytes in UTF-8) so the cap actually bounds memory on the MicroVM before materializing the bytes. aws-samples#10 residual — add a parity test for the agent-side REPO_LESS_DEFAULT_WORKFLOW_ID / DEFAULT_WORKFLOW_ID constants: the repo-less default must match the CDK platform default and be requires_repo:false in the YAML; the coding default must be repo-bound. Closes the last hand-maintained-copy axis aws-samples#10 named. Perf — memoize load_workflow (@cache): files are image-baked and immutable per process and Workflow is frozen, so the 3-4 loads per task now parse once. Leaves the coding-lane decorative-`gate` item (run_workflow only drives run_agent while verify_build/ensure_pr stay inline) as a tracked follow-up — it is a larger half-migrated-runner refactor both reviewers flagged as non-blocking. --------- Co-authored-by: bgagent <bgagent@noreply.github.com>

MichaelWalker-git requested a review from a team as a code owner April 2, 2026 04:44

MichaelWalker-git requested a review from Copilot April 2, 2026 21:52

Copilot started reviewing on behalf of MichaelWalker-git April 2, 2026 21:53 View session

Copilot AI reviewed Apr 2, 2026

View reviewed changes

MichaelWalker-git force-pushed the feat/fargate-agent-stack branch from 2dbf7eb to 574b205 Compare April 3, 2026 16:24

MichaelWalker-git mentioned this pull request Apr 7, 2026

feat(compute): add ComputeStrategy interface with ECS Fargate backend #8

Merged

5 tasks

MichaelWalker-git closed this Apr 7, 2026

scottschreckengaust mentioned this pull request Apr 24, 2026

docs: add least-privilege deployment roles and deployment guide #46

Merged

13 tasks

isadeks mentioned this pull request May 13, 2026

feat(fanout): migrate SlackNotifyFn to FanOutConsumer subscriber (#64) #79

Merged

8 tasks

This was referenced May 19, 2026

Add CloudWatch alarm for ApprovalMetricsPublisher EMF rate ceiling #114

Open

Migrate dashboard LogQueryWidget widgets to EMF metrics #115

Open

theagenticguy mentioned this pull request Jun 2, 2026

feat(ci): scan container image + SAST on PRs — weekly-only cadence let a HIGH CVE sit on main for 6 days #235

Open

5 tasks

krokoko mentioned this pull request Jun 9, 2026

feat(tasks): workflow-driven tasks — declarative steps + repo-less workflows (#248) #296

Merged

5 tasks

flamingquaks mentioned this pull request Jun 11, 2026

feat(slack): Slack-button approvals for Cedar HITL gates (#112) #316

Closed

theagenticguy mentioned this pull request Jun 12, 2026

fix(project): fix multiple issues identified #324

Merged

5 tasks

This was referenced Jun 12, 2026

RFC: Event-Driven Governance and Actions #230

Open

feat(agent): observable blocker signal + bounded self-remediation for environmental faults #251

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: add FargateAgentStack as alternative compute backend#4

feat: add FargateAgentStack as alternative compute backend#4
MichaelWalker-git wants to merge 1 commit into
mainfrom
feat/fargate-agent-stack

MichaelWalker-git commented Apr 2, 2026

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

krokoko commented Apr 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

MichaelWalker-git commented Apr 2, 2026

Summary

Changes

Test plan

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

krokoko commented Apr 6, 2026

Core concern: this bypasses the ComputeStrategy interface

What the design calls for vs. what this PR does

Specific issues

Suggested path forward

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Core concern: this bypasses the `ComputeStrategy` interface