Commit be3207a
feat: Cedar HITL approval gates for agent tool use (aws-samples#88)
* docs(cedar-hitl): restore and revise HITL gates design, fold adversarial findings
Design doc was accidentally removed in 0742ebe; restored from b34d7cd and
substantially revised under a new filename. "Phase 3" framing dropped — this
is the Cedar HITL approval gates feature.
- Renamed PHASE3_CEDAR_HITL.md → CEDAR_HITL_GATES.md; all "phase" gating
removed (Phase 3a/3b → v1 / future work §17).
- Integrated 16 findings from 2026-05-06 adversarial review with realistic
scenarios. Major structural changes:
- Decision #23 (new): cross-engine parity contract between cedarpy (agent,
Python) and @cedar-policy/cedar-wasm@4.10.0 (Lambda, TS).
- §11.2: SlackUserMappingTable with OAuth user-initiated mapping; severity-
gated Slack approvals; admin has no write path.
- §7.1/§12.3: ApproveTaskFn uses cross-table TransactWriteItems for atomicity.
- §10.1: user_id-status-index GSI on TaskApprovalsTable; v1 not v-later.
- §15.6: cedar-wasm as a Lambda layer shared across policy Lambdas.
- Gate-cap revision (2026-05-07): decision #13 — default 50, blueprint-
configurable via security.approvalGateCap (bounded 1–500), persisted on
TaskTable. Cache memory bound decoupled: 50-entry LRU regardless of cap.
IMPL-22 adds telemetry-driven re-evaluation criteria.
- Timeout adversarial+advocate pass (2026-05-07):
- §6.5 VM-throttle race fix: re-read row on failed TIMED_OUT
ConditionCheckFailed; honor APPROVED if user beat the timer. IMPL-24.
- Sub-120s @approval_timeout_s emits blueprint-load WARN. IMPL-25.
- User-visible timeout cap milestones (approval_timeout_capped_at_submit,
approval_ceiling_shrinking). IMPL-26.
- Runtime JWT: no refresh logic in agent/src/ (container uses IAM role);
ceiling stays min(1h, maxLifetime_remaining - 120s). IMPL-27.
- Three new CloudWatch metrics for timeout tuning. IMPL-28.
- §14.8 new: off-hours trade-off section (fail-closed is the invariant).
- §13.13 new: notification-delivery failure does NOT pause the timer
(bypass-prevention).
- Added six mermaid diagrams: three-outcome decision flow, end-to-end round-
trip, TaskApprovalsTable state machine, Slack user-mapping, fail-closed
decision flow, cross-engine parity check.
- Cross-references updated in INTERACTIVE_AGENTS.md and SECURITY.md.
- Starlight mirror regenerated via docs/scripts/sync-starlight.mjs.
No code changes in this commit — design work only. Implementation lands in a
follow-up PR per §15.2 task list.
* feat(cedar-hitl): pin Cedar engines and seed cross-engine parity contract
Chunk 1 of the Cedar HITL gates PR (docs/design/CEDAR_HITL_GATES.md).
Lays the foundation before engine rewrites in Chunk 2+: both Cedar engines
pinned exactly per decision #23, annotation surface validated by Day-1
spikes per decision #22, and the golden-file parity fixtures seeded so
every subsequent chunk can rely on the contract.
- Pin cedarpy==4.8.0 (agent) and @cedar-policy/cedar-wasm@4.10.0 (cdk)
exactly (no ^/~); document both in mise.toml header.
- Add agent/tests/test_cedarpy_annotations_contract.py (10 tests)
validating all 5 annotations round-trip verbatim via
policies_to_json_str() under staticPolicies.<id>.annotations.
- Add cdk/test/handlers/shared/cedar-policy.test.ts (12 tests) validating
policySetTextToParts + policyToJson extract the same annotations
verbatim and isAuthorized returns the documented {type, response}
wrapper shape.
- Add contracts/cedar-parity/ with 5 golden-file fixtures (single-match,
multi-match, hard-deny, soft-deny write, no-match default-allow) +
README documenting the contract. Every fixture policy carries a
@rule_id - including the base permit as @rule_id("base_permit") - so
the parity tests raise if either engine returns an unannotated match
instead of silently dropping it.
- Add agent/tests/test_cedar_parity.py (6 tests, cedarpy side) and
cdk/test/handlers/shared/cedar-parity.test.ts (6 tests, cedar-wasm
side) loading the shared fixtures and asserting (decision, sorted
rule_ids) match expected. Both tests hard-import cedarpy/cedar-wasm
so a dependency regression fails loud rather than silently skipping.
- Update docs/design/CEDAR_HITL_GATES.md sections 15.2 row 3, 15.6
prose and the parity mermaid diagram to point at contracts/cedar-parity/
(the precedent set by contracts/memory-hash-vectors.json) instead of a
new tests/fixtures/ dir. Regenerate the Starlight mirror.
- Add IMPL-29 noting the cedarpy diagnostics.reasons / cedar-wasm
diagnostics.reason naming asymmetry surfaced by the spikes; engine
code normalizes at the boundary.
- Fix rev-4 -> rev-5 cosmetic footer drift.
Test counts: agent 500 -> 516 (+16), cdk 1036 -> 1054 (+18), cli 190
unchanged. No production code changes in this chunk; engine rewrite
lands in Chunk 2.
Follow-up: separate chore issue to move contracts/memory-hash-vectors.json
into a self-named subdir for consistency with contracts/cedar-parity/.
* feat(cedar-hitl): three-outcome PolicyEngine core
Chunk 2 of the Cedar HITL gates PR. Rewrites agent/src/policy.py into
the three-outcome engine specified in docs/design/CEDAR_HITL_GATES.md
section 6. The REQUIRE_APPROVAL outcome is the human-in-the-loop surface
the next chunks (PreToolUse hook extension, REST API, CLI) plug into.
This chunk ships the engine and its load-time validation; no hook or
wire-format changes yet.
Engine:
- Outcome enum (ALLOW, DENY, REQUIRE_APPROVAL) + extended PolicyDecision
with .allowed backward-compat shim for Phase 1a/1b/2 callers. Custom
__init__ accepts both outcome= and legacy allowed= kwargs so existing
tests keep working verbatim.
- Three-outcome pipeline per section 6.2: hard-deny eval (absolute) ->
allowlist fast-path (tool_type/tool_group/bash_pattern/write_path/
all_session) -> recent-decision cache (60s TTL on DENIED/TIMED_OUT) ->
soft-deny eval (with post-eval rule-scope allowlist check and
blueprint_disable filtering) -> default ALLOW.
- ApprovalAllowlist (section 6.4): parses and matches every scope type.
Strips whitespace and rejects empty-after-strip values so
"tool_type: Read " normalizes instead of silently mismatching (review
finding 6).
- RecentDecisionCache (section 12.9): 50-entry LRU, INDEPENDENT of
approvalGateCap. Populated only on DENIED/TIMED_OUT. Session-scoped
(documented section 12.8 caveat).
- Annotation handling (sections 5.2 + 6.3): parses @rule_id, @tier,
@approval_timeout_s, @severity, @category via
cedarpy.policies_to_json_str(); merges on multi-match with min timeout
(clamped by 30s floor) and max severity.
- Load-time validation (sections 5.1, 12.4): rejects missing/mismatched
@tier, missing @rule_id, sub-floor timeouts, duplicate rule_ids across
tiers, blueprint text > 64 KB, disable entries naming built-in
hard-deny rules (finding 9), approval_gate_cap outside [1, 500]
(decision 13). Sub-120s @approval_timeout_s emits WARN but accepts
(IMPL-25).
- Fail-closed posture (section 13): cedarpy parse errors surface via
diagnostics.errors -> RuntimeError raised inside _eval_tier -> outer
handler returns DENY with reason "fail-closed: <ExceptionType>".
TypeError on json.dumps of unhashable tool_input surfaces as distinct
"fail-closed: unhashable_tool_input" reason (review finding 5).
Built-in policies:
- agent/policies/hard_deny.cedar: base_permit catch-all + rm_slash +
write_git_internals + write_git_internals_nested + drop_table +
pr_review-specific Write/Edit forbids (absolute).
- agent/policies/soft_deny.cedar: base_permit (catch-all required in
each tier so cedarpy default-deny does not convert no-match into
DENY) + force_push_any + force_push_main + push_to_protected_branch
+ write_env_files + write_credentials. All soft rules carry @tier,
@rule_id, @approval_timeout_s, @severity, @category per section 15.4
starter set.
Review findings addressed (1 blocker, 8 significant, plus minor):
- blueprint_disable actually disables soft rules at eval time instead
of silently no-op (the blocker: test coverage had been a silent-pass).
- Legacy extra_policies with @tier/@rule_id rejected to avoid undefined
double-annotation behavior.
- _matching_rule_ids logs WARN on unknown policy IDs (state-drift
signal).
- base_permit validator exemption restricted to effect=="permit" so
misnamed forbid rules cannot bypass validation (finding 7).
- Hard-tier Cedar no_decision logged at WARN (signals missing/malformed
base_permit catch-all).
- Allowlist whitespace normalization + empty-value rejection.
- StrEnum upgrade, Callable moved to TYPE_CHECKING, assert replaced
with explicit RuntimeError for S101 compliance.
Phase 1 compatibility:
- All 39 existing test_policy.py tests pass unchanged via the
.allowed property. One test (test_invalid_policy_syntax_fails_closed)
updated to patch _hard_policies instead of the removed _policies
attribute; docstring explains the rewrite.
- extra_policies kwarg preserved; callers with annotated rules must
migrate to blueprint_soft_policies / blueprint_hard_policies.
Test counts: agent 516 -> 576 (+60: 51 three-outcome + 9 regression
fixes). cli 190 unchanged. cdk 1054 unchanged.
Carry-forward to Chunk 3:
- extra_policies semantic shift (Phase 1 DENY -> Chunk 2
REQUIRE_APPROVAL); .allowed=False preserved but .outcome differs.
Switchover happens when hooks.py adopts the three-outcome branching.
- Cross-tier action-context asymmetry (review finding 8): document
rule-authoring constraint in section 5.5 of design.
- Probe entity-shape coverage (finding 10): extend _probe_cedar to
exercise Write/Edit/Bash action paths, not just invoke_tool.
* feat(cedar-hitl): approval milestone writers + engine counters
Adds the 14 agent-side approval milestone writers (§11.1) on
``_ProgressWriter`` so Chunk 3's hook integration has a typed API
instead of stringly-typed ``write_agent_milestone`` calls, and the
per-task gate counter / per-container sliding-window rate limit /
denial-injection queue on ``PolicyEngine`` that §6.5 requires.
Why now: the hook work lands cleanly only after these surfaces exist
— every code path in ``pre_tool_use_hook``'s REQUIRE_APPROVAL branch
calls one of these helpers. Shipping them separately lets the hook
commit be about the state machine, not the event-shape bookkeeping.
Engine additions:
- ``approval_gate_count`` / ``increment_approval_gate_count``: the
per-task counter §12.9 bounds at ``approvalGateCap``. Session-scoped
in v1; persistence tracked in §17.
- ``approvals_in_last_minute`` / ``record_approval_gate_timestamp``:
sliding-window rate limit (20/min/container, §12.9). Prune on read
so callers see the current count without a separate tick.
- ``queue_denial_injection`` / ``drain_denial_injections``: queue
consumed by ``_denial_between_turns_hook`` at the next Stop seam
(§6.5). Reason is pre-sanitized upstream by ``DenyTaskFn``.
- ``mark_ceiling_shrinking_emitted``: emit-once latch for IMPL-26.
- ``APPROVAL_RATE_LIMIT`` / ``APPROVAL_RATE_WINDOW_S`` module consts
the hook imports rather than re-deriving.
Milestone writers (§11.1 table, 14 agent-emitted of 15):
- ``pre_approvals_loaded``, ``approval_requested``,
``approval_granted``, ``approval_denied``, ``approval_timed_out``,
``approval_stranded``, ``approval_write_failed``,
``approval_resume_failed``, ``approval_poll_degraded``,
``approval_timeout_capped``, ``approval_ceiling_shrinking``,
``approval_cap_exceeded``, ``approval_rate_limit_exceeded``,
``approval_late_win``.
- ``approval_decision_recorded`` (Lambda audit) and
``approval_timeout_capped_at_submit`` (CreateTaskFn) stay on the
Lambda side — Chunk 5 owns those.
Each helper is a thin wrapper over ``_put_event("agent_milestone",
...)`` so the shared circuit-breaker + classifier path (finding #6/#8)
continues to apply. Metadata keys mirror the §11.1 shapes verbatim
(``maxLifetime_remaining_s`` preserves the design-doc spelling for
downstream parsers).
Tests: +29 total. 17 on ``TestApprovalMilestoneHelpers`` pin the DDB
payload shape for each helper (including the two
``approval_timeout_capped`` reason variants — rule_annotation carries
matching_rule_ids; maxLifetime_ceiling omits the field). 12 on the
engine: counter monotonicity, rate-window prune semantics at window
boundary, denial-queue FIFO + drain-clears, ceiling-shrinking latch
idempotency.
No caller changes — engine and writer surfaces are additive. Hook
integration lands in commit C.
* feat(cedar-hitl): TaskApprovals + AWAITING_APPROVAL transition primitives
Adds the four agent-side DDB primitives §6.5 + IMPL-24 need for the
three-outcome hook integration in the next commit:
- ``transact_write_approval_request`` — cross-table TransactWriteItems:
Put(TaskApprovalsTable) with ``attribute_not_exists(request_id)`` +
Update(TaskTable) gated on ``status = RUNNING``. Atomic per §12.3 so
a concurrent cancel cannot land the task in AWAITING_APPROVAL with
no matching approval row (or vice versa).
- ``transact_resume_from_approval`` — Update(TaskTable) gated on
``status = AWAITING_APPROVAL AND awaiting_approval_request_id =
:rid``. The ``request_id`` condition prevents resuming with a stale
ID after a reconciler race (§13.9).
- ``best_effort_update_approval_status`` — conditional UpdateItem on
the approval row with ``status = :pending`` guard. Returns False on
``ConditionalCheckFailedException``; this is the signal IMPL-24's
re-read path fires on (§6.5 pseudocode lines 846-879, §13.12).
- ``get_approval_row`` — GetItem with ``ConsistentRead=True`` by
default. Required by IMPL-24's re-read; kept opt-out (bool flag) for
future cold-path callers that don't need the strong read.
Errors:
- ``ApprovalTablesUnavailable`` for env-var-missing — raised loud so
a pre-Chunk-4 deploy fails closed (hook will map to DENY) rather
than silently no-op'ing the gate.
- ``ApprovalWriteError`` / ``ApprovalResumeError`` wrap
``TransactionCanceledException`` with the cancellation reasons
list. The hook uses these to distinguish the "concurrent cancel"
branch from real DDB outages.
- ``ConditionalCheckFailedException`` on ``update_item`` is consumed
and returned as ``False`` from ``best_effort_update_approval_status``
— the caller (hook) needs the boolean to decide whether to
re-read, not to propagate.
- All other DDB errors propagate so the hook's outer try/except can
classify fail-closed with a specific reason.
Implementation notes:
- Uses ``boto3.client("dynamodb")`` low-level API (not resource).
``transact_write_items`` lives on the client, and marshalling the
approval row attributes explicitly gives deterministic DDB shapes
that the tests can assert on. ``_py_to_ddb_attr`` covers the
subset of Python types §10.1 actually uses (str/int/bool/None/list
of str); any other type raises TypeError loudly rather than
silently writing something unexpected.
- ``_extract_error_code`` / ``_extract_cancellation_reasons`` duck-type
on ``exc.response`` so we don't need botocore at import time (tests
use a minimal exception class).
- Errors from unsupported types (floats, dicts, etc.) are caught
BEFORE the DDB round-trip so the unit-test asserts
``transact_write_items`` was not called — catches schema drift
early.
- Status constants (``_STATUS_RUNNING`` / ``_STATUS_AWAITING_APPROVAL``)
named so a rename in CDK cannot silently diverge the Python path.
Tests: +20 total.
- 5 on TransactWriteApprovalRequest: env-missing, happy-path shape
assertion (both items + conditions), TransactionCanceled → ApprovalWriteError
with reasons preserved, other errors propagate, unsupported type rejected
before any DDB call.
- 3 on TransactResumeFromApproval: env-missing, happy-path expression
shape (includes REMOVE awaiting_approval_request_id), cancel →
ApprovalResumeError.
- 4 on BestEffortUpdateApprovalStatus: happy path returns True,
``reason`` kwarg attaches ``deny_reason``, ConditionalCheckFailed
returns False (IMPL-24's signal), other errors propagate.
- 4 on GetApprovalRow: ConsistentRead default True, opt-out False,
row-not-found returns None, row unmarshalling through every
supported DDB attribute type.
- 4 on helpers: error-code extraction with and without
ClientError-shape, cancellation-reasons extraction with and without.
No runtime callers yet — hook integration lands in commit C. Physical
TaskApprovalsTable lands in Chunk 4; Python side is wire-compatible so
the hook work can be unit-tested today with mocked clients.
* feat(cedar-hitl): PreToolUse three-outcome REQUIRE_APPROVAL path
Wires the agent to the full §6.5 pseudocode: cap + rate-limit check,
atomic TransactWriteItems for pending row + TaskTable AWAITING_APPROVAL,
2s→5s ConsistentRead poll, IMPL-24 VM-throttle race re-read, resume
transition, scope propagation to allowlist, and denial-injection queue
consumed at the next Stop seam. Completes §15.2 rows 26 + 27.
Hook control flow (three outcomes)
----------------------------------
- ALLOW / DENY: existing Phase 1 behavior, now switching on
``.outcome`` rather than ``.allowed``. Legacy Phase 1/2 tests still
green because PolicyDecision preserves the ``.allowed`` shim.
- REQUIRE_APPROVAL (new): extracted into ``_handle_require_approval``
for readability. Delegates to ``task_state`` primitives and
``engine.*`` counter surfaces from the prior commits; no new DDB
client construction here.
Key pieces:
- ``_compute_effective_timeout`` applies the §6.5 min(rule, default,
lifetime) formula. The engine's ``_merge_annotations`` has already
clipped decision.timeout_s against the task default; the hook adds
the remaining-lifetime ceiling and floors at FLOOR_30S.
``clip_reason`` distinguishes ``rule_annotation`` (rule was tighter
than task default) from ``maxLifetime_ceiling`` (task is late in
its life) so ``approval_timeout_capped`` carries the right reason.
- ``_remaining_maxlifetime_s`` reads ``AGENTCORE_MAX_LIFETIME_S`` +
``TASK_STARTED_AT`` env vars (8h default). Returns ``None`` when
the start timestamp is absent — the hook treats that as "unknown,
don't clip" rather than pre-DENYing, so Phase 1 test paths that
don't set the env var still see the old task-default behaviour.
Chunk 4/5 will wire these at task launch.
- ``_poll_for_decision`` uses 2s cadence for the first 30s then 5s
(IMPL-12). All polls use ``ConsistentRead=True`` per IMPL-24. 3
consecutive GetItem failures emit ``approval_poll_degraded``; 10
consecutive failures fall through as TIMED_OUT with a specific
reason (§13.2).
- ``_reconcile_late_decision`` implements IMPL-24 re-read: on a
ConditionCheckFailed from the TIMED_OUT write, re-read with
ConsistentRead. APPROVED → rebuild outcome, propagate scope to
allowlist, run normal allow flow, emit ``approval_late_win``.
DENIED → honor the user's sanitized reason. PENDING or row gone
→ fall through with TIMED_OUT (fail-closed, §13.12 last paragraph).
Cancel-wins semantics (finding #2)
----------------------------------
``_denial_between_turns_hook`` is registered AFTER
``_nudge_between_turns_hook`` in ``between_turns_hooks`` so cancel
short-circuits both. The hook re-checks ``_cancel_requested`` itself
as belt-and-braces (matching the nudge hook) so a future reorder does
not silently break cancel-wins. Denial queue is PRESERVED on cancel —
not drained — so a denial still sitting on the queue when the task is
being torn down does not leak across tasks (the engine is per-task
per §IMPL-7).
``stop_hook`` threads ``engine`` into ``ctx`` so the denial hook can
``drain_denial_injections``. ``build_hook_matchers`` accepts a new
``user_id`` kwarg (§12.2) so approval rows carry caller identity for
the REST side's ownership check.
``permissionDecisionReason`` guaranteed surface
-----------------------------------------------
The hook's deny return is the ONLY guaranteed surface the SDK emits
to the agent; denial injection is best-effort (pre-empted by cancel).
``_deny_response`` pipes every reason through ``_strip_ansi`` +
``_truncate(500)``: ANSI sequences can never reach the model, and the
line stays loggable. §12.7 requirement.
Tests: +24 agent hook tests (47 total in test_hooks.py). Run in 0.92s
via a ``_fast_poll`` fixture that collapses ``asyncio.sleep`` to a
no-op AND advances ``hooks.time.monotonic`` by the requested duration
so the poll wall-clock deadline actually trips.
Happy paths:
- APPROVED + scope propagation to allowlist + milestones.
- APPROVED with scope=this_call does NOT grow allowlist.
- DENIED queues denial injection + populates recent-decision cache
(next identical call auto-denies).
- TIMED_OUT writes TIMED_OUT row and emits approval_timed_out.
IMPL-24 race: four branches.
- APPROVED re-read → allow flow, approval_late_win milestone, scope
propagated, resume succeeds.
- DENIED re-read → deny flow, approval_late_win milestone, user's
reason is the permissionDecisionReason.
- Still-PENDING re-read → fail-closed fall-through (no late_win).
- Row-gone re-read → same fail-closed fall-through.
Cap / rate-limit / write failure / resume failure branches all:
- Short-circuit before any DDB write when the local guard fires
(cap, rate limit).
- Emit the right approval_* milestone.
- Return DENY with a specific permissionDecisionReason.
Sanitization:
- ANSI stripped from deny reason.
- Deny reason truncated to ≤500 chars.
Timeout clipping:
- rule_annotation reason when a rule's approval_timeout_s is below
the task default; matching_rule_ids populated.
- maxLifetime_ceiling reason when remaining lifetime is the tightest
bound; matching_rule_ids is None.
- approval_ceiling_shrinking emits exactly once per task (IMPL-26
latch).
Denial injection hook (6 tests):
- Draining produces a <user_denial request_id=... decided_at=...>
block with XML-escaped reason.
- Cancel short-circuit preserves the queue so the denial is not
lost; just not injected into a dying agent.
- Hostile reason (</user_denial>...<user_nudge>) is XML-escaped so
the envelope cannot be forged.
- No-engine ctx returns [] (Phase 1 call sites still work).
- Registered LAST in ``between_turns_hooks`` (invariant for §6.5
finding #2).
- End-to-end via stop_hook: queued denial becomes
``decision=block`` + reason on the Stop return.
Carry-forward
-------------
- ``_remaining_maxlifetime_s`` returns None when TASK_STARTED_AT is
unset — Chunk 4/5 will wire this at task launch. Tracked in §16.
- ``approval_gate_count`` lives on the engine (session-scoped) not on
TaskTable in v1. §13.6 notes that the reconciler + approval_gate_cap
still bound worst-case across container restarts. Chunk 7+ tracks
persistence when telemetry justifies it.
- Denial injection emits a ``user_denial_injected`` milestone that is
NOT in the §11.1 enumerated table. It mirrors ``nudge_acknowledged``
for stream visibility; keep the name distinct from the ``approval_*``
prefix so future §11.1 consumers can't confuse it with an approval
outcome.
* feat(cedar-hitl): TaskApprovalsTable + SlackUserMapping + status enum
Lands the stateless CDK primitives for Cedar-HITL approval gates so
Chunk 5's REST handlers can be wired onto concrete tables. Completes
§15.2 tasks 9, 20, and 25.
Constructs
----------
``TaskApprovalsTable`` (§10.1)
- PK ``task_id`` + SK ``request_id`` (ULID). Matches the agent-side
primitives landed in the prior commit.
- GSI ``user_id-status-index`` with user_id PK + status SK and an
``INCLUDE`` projection limited to the fields GET /v1/pending
renders. Three deny-sensitive attrs (``deny_reason``, ``scope``,
``tool_input_sha256``) deliberately omitted from the projection —
the list endpoint only returns PENDING rows in practice, but
excluding them kills the projection-leak concern outright and
costs no bytes today.
- Exports ``USER_STATUS_INDEX_NAME`` as a module constant + mirrors
it on ``construct.userStatusIndexName`` so handlers referencing
the GSI fail compile-time on a rename.
- TTL attribute ``ttl`` (agent writes ``created_at + timeout_s +
120s``).
- No DynamoDB streams per §11.2. TaskEventsTable carries the audit
fan-out; streams here would duplicate.
- Default RemovalPolicy.DESTROY to match the rest of the sample.
Production deploys override to RETAIN per §10.1.
``SlackUserMappingTable`` (§11.2, finding #4)
- Single-key (``slack_user_id`` PK). No SK, no TTL, no GSI, no
stream. The forward-only shape is the trust boundary — a reverse
GSI (Cognito → Slack) would let a compromised Cognito sub
enumerate Slack identities without adding v1 capability.
- Writes land through LinkSlackUserFn (Chunk 5) which enforces the
``attribute_not_exists(slack_user_id)`` condition so a prior
legitimate mapping cannot be overwritten by a later compromise.
``task-status.ts`` — AWAITING_APPROVAL (§10.3)
- Added to TaskStatus enum + ACTIVE_STATUSES (NOT TERMINAL_STATUSES:
the task is alive, paused on a human decision).
- VALID_TRANSITIONS wires the five edges §10.3 enumerates:
RUNNING → AWAITING_APPROVAL (soft-deny entry)
HYDRATING → AWAITING_APPROVAL (rare early-gate case)
AWAITING_APPROVAL → RUNNING (approve / deny resume)
AWAITING_APPROVAL → CANCELLED (user cancel mid-approval)
AWAITING_APPROVAL → FAILED (stranded-approval reconciler)
- Notably NOT added:
AWAITING_APPROVAL → FINALIZING (approve-during-cleanup race)
AWAITING_APPROVAL → COMPLETED (skip RUNNING)
AWAITING_APPROVAL → TIMED_OUT (timer lives on the approval
row, not the task clock)
These are regression tests so a future refactor cannot quietly
add them and bypass the `awaiting_approval_request_id = :rid`
invariant.
Tests: +29 total.
- TaskApprovalsTable (11 tests): PK/SK schema, PAY_PER_REQUEST,
PITR default + override, TTL attribute, NO streams, GSI schema +
projection + sensitive-attr exclusion, removal policy default +
override, ``USER_STATUS_INDEX_NAME`` constant parity with the
construct field.
- SlackUserMappingTable (8 tests): single-key schema (explicit
KeySchema length assertion), PAY_PER_REQUEST, PITR, no streams,
no reverse GSI, DESTROY default, TTL absent.
- TaskStatus (+10 tests over existing: 5 new assertions on the
9-state cardinality, AWAITING_APPROVAL membership, and the
transition graph including the three forbidden edges). The
existing assertions updated for the new state count.
No stack wiring yet — ``agent.ts`` instantiation + env var plumbing +
grants land in the next commit alongside the Cedar-WASM Lambda layer.
* feat(cedar-hitl): Cedar-wasm layer + wire approval tables into agent stack
Activates the agent-side approval path and ships the Lambda layer
Chunk 5's REST handlers need.
Cedar-wasm Lambda layer (§15.2 task 10)
----------------------------------------
``CedarWasmLayer`` bundles ``@cedar-policy/cedar-wasm@4.10.0`` into
``/opt/nodejs/node_modules/`` so Lambdas can
``require('@cedar-policy/cedar-wasm/nodejs')`` without shipping the
4 MB wasm binary in every function package. A dedicated
``cdk/layers/cedar-wasm/`` directory carries a minimal ``package.json``
pinning the exact version — bundling runs ``npm install --omit=dev``
against that manifest, so the layer build is hermetic from any
``cdk/node_modules/`` drift.
The bundler has two fallbacks:
- Docker (``public.ecr.aws/sam/build-nodejs22.x``) for CI / prod
deploys.
- Local-npm fallback for environments without Docker (unit-test
synths + `cdk synth` on runners that lack Docker). The local
path is safe here because the layer ships pure JS + a prebuilt
wasm binary — no native build step.
Three constants exposed from the module:
- ``CEDAR_WASM_VERSION`` — single source of truth for the pinned
version; tests assert this matches both ``cdk/package.json`` and
the layer manifest, so the three places the version lives stay
in sync.
- ``CEDAR_WASM_MIN_LAMBDA_MEMORY_MB`` — 512 MB floor for attaching
Lambdas per §15.2 task 10.
- ``CedarWasmLayer.layer`` — the underlying ``LayerVersion`` for
Chunk 5 handlers to attach via ``fn.addLayers(...)``.
Agent stack wiring (§15.2 task 19)
------------------------------------
``agent.ts`` now instantiates:
- ``TaskApprovalsTable`` (prior commit) — grants RW to the runtime
so ``pre_tool_use_hook`` can TransactWriteItems + ConsistentRead
the PENDING row.
- ``SlackUserMappingTable`` (prior commit) — not granted to the
runtime; only the link-user Lambda (Chunk 5) writes here.
- ``CedarWasmLayer`` — the layer's asset lands in the synthed
template so Chunk 5 handlers can reference ``.layer`` without
causing a new asset on their deploy.
New runtime env vars:
- ``TASK_APPROVALS_TABLE_NAME`` — consumed by
``task_state._require_tables``; its absence previously raised
``ApprovalTablesUnavailable`` → hook DENY. Now set, so the
approval path is live on deploy.
- ``AGENTCORE_MAX_LIFETIME_S = '28800'`` — 8 hours, matching
``lifecycleConfiguration.maxLifetime``. Consumed by the hook's
``_remaining_maxlifetime_s`` for the maxLifetime ceiling clip
(§6.5). Kept in sync with the lifecycle via a direct test
assertion so drift surfaces at build time.
New CfnOutputs: ``TaskApprovalsTableName``, ``SlackUserMappingTableName``,
``CedarWasmLayerArn``. Each is useful for post-deploy smoke tests
(`aws dynamodb describe-table` / `aws lambda get-layer-version`).
Tests: +8 layer tests + 9 agent-stack assertions.
Layer:
- LayerVersion resource count.
- Compatible runtimes (nodejs20/22).
- Description carries the pinned version.
- CEDAR_WASM_VERSION matches ``cdk/package.json``.
- CEDAR_WASM_VERSION matches ``layers/cedar-wasm/package.json``.
- CEDAR_WASM_MIN_LAMBDA_MEMORY_MB ≥ 512.
- Custom description override works.
- ``.layer`` exposes a real ``LayerVersion``.
Agent stack:
- Table count updated from 6 → 8.
- TaskApprovalsTable schema match (task_id PK / request_id SK,
user_id-status-index GSI presence).
- SlackUserMappingTable single-key schema.
- LayerVersion count + compatibleRuntimes.
- Three new CfnOutputs present.
- TASK_APPROVALS_TABLE_NAME env var on the runtime.
- AGENTCORE_MAX_LIFETIME_S == '28800' (drift guard).
Carry-forward
-------------
- ``TASK_STARTED_AT`` is the other input the hook's
``_remaining_maxlifetime_s`` consumes — it's a PER-TASK value the
orchestrator must stamp at invocation time, not a stack-level env
var. Chunk 5's orchestrator changes need to add it to the runtime
invocation payload / session env. For now the hook's fallback
("unknown, don't clip") keeps approvals functional.
- Chunk 5 will attach the CedarWasmLayer onto ApproveTaskFn,
DenyTaskFn, GetPoliciesFn, CreateTaskFn and assert
``memorySize >= CEDAR_WASM_MIN_LAMBDA_MEMORY_MB`` for each.
* feat(cedar-hitl): approve + deny handlers + shared types (§7.1, §7.2)
Lands the two user-facing REST handlers that flip a PENDING approval
row to APPROVED / DENIED, the shared types both call sites and the
CLI consume, and the Lambda-side Cedar parser future Chunk-5 handlers
(get-policies, create-task validation) will use.
Wire types (shared/types.ts)
----------------------------
- ApprovalScope union covering every shape the agent's
ApprovalAllowlist understands. Typed so approve-task / create-task /
CLI (Chunk 6) all agree at compile time.
- ApprovalRecord / ApprovalRequest / ApprovalResponse / DenyRequest /
DenyResponse / PendingApprovalSummary / GetPendingResponse /
PolicyRuleSummary / GetPoliciesResponse / LinkSlackUserRequest /
LinkSlackUserResponse / SlackUserMappingRecord /
ApprovalDecisionRecordedEvent / CreateTaskApprovalExtensions.
- Constants: DENY_REASON_MAX_LENGTH=2000, INITIAL_APPROVALS_MAX_ENTRIES=20,
INITIAL_APPROVALS_MAX_ENTRY_LENGTH=128, APPROVAL_TIMEOUT_S_MIN=30,
APPROVAL_TIMEOUT_S_MAX=3600, APPROVAL_TIMEOUT_S_DEFAULT=300.
New error codes: REQUEST_NOT_FOUND, REQUEST_ALREADY_DECIDED,
TASK_NOT_AWAITING_APPROVAL.
Shared helpers
--------------
- shared/approval-scope.ts — parseApprovalScope validates every shape;
rejects unknown tool types / groups / prefixes, empty values,
over-128-char strings. isDegeneratePattern implements §7.4 (length
≤ 2, all-wildcard, wildcard ratio > 50%) for Chunk-5 create-task.
- shared/deny-reason-scanner.ts — scanDenyReason redacts AWS keys,
GitHub PATs (classic + fine-grained), Slack tokens, PEM blocks,
bearer tokens with [REDACTED-...] markers. Mirrors
agent/src/output_scanner.py so the deny reason the agent
ultimately reads is never raw user input.
- shared/cedar-policy.ts — parseRules pulls the five HITL annotations
(tier/rule_id/severity/approval_timeout_s/category) into a
ParsedRule[], preserving positional policy_id for IMPL-29
diagnostics-to-rule_id resolution. isHardDenyRule, isValidRuleId,
matchingRuleIds, concatPolicies exposed for future handlers.
Handlers
--------
- approve-task.ts (§7.1) — POST /v1/tasks/{task_id}/approve
- Cross-table TransactWriteItems: approval row PENDING → APPROVED
guarded by user_id = :caller AND status = :pending; TaskTable
no-op Update guarded by status = AWAITING_APPROVAL AND
awaiting_approval_request_id = :rid.
- TransactionCanceledException classified by per-item
CancellationReasons. Approval-row failure collapses to 404
REQUEST_NOT_FOUND (no existence oracle per §7.1 finding #6);
task-row failure → 409 TASK_NOT_AWAITING_APPROVAL.
- Optional scope defaults to this_call.
- Per-user per-minute rate limit (30/min, synthetic row).
- Writes approval_decision_recorded audit event (IMPL-6). Audit
failure is logged but does not fail the request — decision is
already committed.
- deny-task.ts (§7.2) — POST /v1/tasks/{task_id}/deny
- Same cross-table pattern; status → DENIED + deny_reason.
- Reason is scanDenyReason-sanitized + truncated to
DENY_REASON_MAX_LENGTH BEFORE any persistence — agent and audit
both read sanitized form; raw input never stored.
- Same rate-limit namespace as approve.
Tests: +64 total (cedar-policy-parser 24, approval-scope 28,
deny-reason-scanner 13, approve-task 14, deny-task 9). Secret test
fixtures are assembled from string fragments so the source never
holds a contiguous secret literal — Code Defender pre-commit hook
otherwise blocks.
Stack wiring (task-api.ts routes, agent.ts layer attachment,
CreateTaskFn extension, orchestrator + reconciler + fanout +
LinkSlackUserFn + GetPolicies + GetPending) lands in the next
commit.
* feat(cedar-hitl): get-pending + get-policies + link-slack-user handlers
Lands the three read/discovery handlers Chunk 6 (CLI) needs to power
``bgagent pending``, ``bgagent policies list/show``, and
``bgagent notifications configure slack``. Completes §15.2 tasks
14, 15, and 25 (handler side).
Handlers
--------
``get-pending.ts`` (§7.7 — GET /v1/pending)
- Queries ``user_id-status-index`` GSI on TaskApprovalsTable with
``user_id = :caller AND status = :pending``. Without the GSI
this would be a full-table Scan per call — under
``watch -n1 bgagent pending`` that exhausts burst capacity for
the whole fleet (§10.1 finding #8).
- Response maps each row to ``PendingApprovalSummary`` with a
derived ``expires_at = created_at + timeout_s`` so the CLI can
render time-to-timeout without doing arithmetic on ISO strings.
- Severity coerced to ``medium`` on unknown values so GSI writes
that drift from the enum don't break the list response.
- Rate-limited 10/min/user (synthetic row on the same table,
namespaced ``RATE#<user>#PENDING`` so it does not collide with
the approve/deny counter).
``get-policies.ts`` (§7.6 — GET /v1/repos/{repo_id}/policies)
- Combines ``BUILTIN_HARD_DENY_POLICIES`` + ``BUILTIN_SOFT_DENY_POLICIES``
with the repo's ``cedar_policies`` blueprint override. Runs the
combined text through ``parseRules`` and returns
``{hard[], soft[]}`` rule summaries.
- 5-minute per-repo in-Lambda cache; cold starts throw it away.
``_resetCacheForTests`` exposed for unit-test isolation.
- Repo ID is URL-decoded from the path (``owner%2Frepo`` common in
CLI UX).
- Rate-limited 30/min/user.
- Blueprint load failure falls back to built-ins with a WARN log;
invalid blueprint cedar text returns 503 ``SERVICE_UNAVAILABLE``
rather than a misleading empty list.
``link-slack-user.ts`` (§11.2 finding #4 — POST /v1/notifications/slack/link)
- Writes to SlackUserMappingTable with
``ConditionExpression: attribute_not_exists(slack_user_id)``. This
guard is the entire admission control the §11.2 design hinges on:
even a compromised Slack admin cannot overwrite an existing
mapping.
- Validates ``slack_user_id`` shape (letters, digits, underscores,
2–40 chars) so junk rows cannot land.
- Conflict surface is 409 ``REQUEST_ALREADY_DECIDED`` — reused
error code (the payload message directs the user to unlink via
support).
- Slack link_token end-to-end validation against Slack OAuth is
deferred — v1 accepts the token on trust from the Cognito-authed
caller; it is persisted in CloudWatch for audit.
Supporting primitives
---------------------
``shared/builtin-policies.ts`` — mirrors ``agent/policies/hard_deny.cedar``
and ``agent/policies/soft_deny.cedar`` as TypeScript string constants.
Embedded rather than read from disk because Lambda's esbuild bundler
does not copy non-TS assets by default and a dedicated bundling hook
is more code than the embed. A drift test
(``builtin-policies.test.ts``) asserts byte-equality with the agent
files so any change on one side without the other flips red at build
time.
``shared/cedar-policy.ts`` — ``parseRules`` now skips the unannotated
``base_permit`` entry (both tiers need it as a Cedar catch-all; it
is not a user-facing rule so it stays out of ParsedRule[]). This
matches the agent-side ``_parse_policy_annotations`` behaviour.
Tests: +37 total.
- get-pending (8): 401 on missing auth, 429 on rate limit, empty
result, GSI query shape, row → PendingApprovalSummary with
derived expires_at, severity fallback, missing timeout → expires_at
falls back to created_at, 500 on DDB error.
- get-policies (11): 401/400 validation, built-in rules listed on
empty repo, URL-decoded repo path, custom blueprint rule lands
in soft, per-repo cache across calls, 429 rate limit, 503 on
invalid blueprint cedar, fallback on load failure, hard rules
omit severity / approval_timeout_s, soft rules carry them.
- link-slack-user (8): 401/400 validation, shape check, 201 on
success, 409 on overwrite attempt, 500 on unknown DDB error,
whitespace trim on slack_user_id, ConditionExpression verified.
- builtin-policies (4): drift byte-equality with both agent files,
parseRules round-trip for hard/soft rule IDs.
- cedar-policy (updated): ``base_permit`` is skipped from
ParsedRule[] rather than rejected.
Stack wiring (task-api.ts routes, agent.ts layer attachment,
CreateTaskFn extension, orchestrator + reconciler + fanout) lands in
the next commit.
* feat(cedar-hitl): wire Chunk 5 routes + orchestrator + reconciler + agent plumbing
Completes Chunk 5 end-to-end: the five new Lambdas are instantiated
and wired onto the REST API, the orchestrator threads approval-related
data through to the agent runtime, the stranded-task reconciler sweeps
AWAITING_APPROVAL tasks, and the agent pipeline accepts the new
per-task approval configuration.
Stack wiring (agent.ts + task-api.ts)
-------------------------------------
- TaskApi construct accepts `taskApprovalsTable`, `slackUserMappingTable`,
`cedarWasmLayer` props. Approve/Deny/GetPending Lambdas are created
when the approvals table is present; GetPolicies also requires the
cedar-wasm layer + RepoTable. Slack-link Lambda attaches when the
slack mapping table is provided.
- New routes:
POST /tasks/{task_id}/approve
POST /tasks/{task_id}/deny
GET /pending
GET /repos/{repo_id}/policies
POST /notifications/slack/link
- GetPoliciesFn configures `memorySize: 512` (Cedar-wasm floor from
§15.2 task 10) and externalizes `@cedar-policy/cedar-wasm` from the
esbuild bundle so the layer provides the wasm binary at runtime.
- CedarWasmLayer compatibleRuntimes extended to include nodejs24.x
(the Lambda runtime) — the Node 20/22 list was the original §15.2
spec but the actual function uses Node 24.
- agent.ts passes all three new constructs into TaskApi.
Orchestrator (shared/orchestrator.ts)
-------------------------------------
- `finalizeTask` now treats AWAITING_APPROVAL as a "task still alive"
terminal-timeout source: on poll exhaustion the task transitions to
TIMED_OUT with a distinct `approval_poll_timeout` reason + error
message ("Orchestrator poll timeout exceeded while awaiting approval").
The stranded-approval reconciler is the secondary safety net (§13.6)
for tasks the orchestrator already lost track of.
- Invocation payload now carries three new fields:
- `task_started_at` (ISO 8601 at HYDRATING → RUNNING time) —
consumed by the agent hook's `_remaining_maxlifetime_s` so the
§6.5 maxLifetime ceiling math uses the real task clock instead
of the fail-open fallback.
- `approval_timeout_s` (when the submit payload supplied it).
- `initial_approvals` (when the submit payload supplied entries).
Stranded-task reconciler
------------------------
- Sweeps AWAITING_APPROVAL in addition to SUBMITTED/HYDRATING.
- New `APPROVAL_STRANDED_TIMEOUT_SECONDS` env var (default 7200s =
2h) — double §7.3's 1h ceiling so this reconciler never races the
happy-path timer.
- Distinct failure message on approval-stranded vs generic-stranded
so users see "approval stranded — container evicted" rather than
the misleading "no pipeline attached" copy.
Fanout (handlers/fanout-task-events.ts)
---------------------------------------
- Slack channel default set replaces the forward-compat
`approval_required` stub with the real §11.1 events:
`approval_requested` and `approval_stranded`. Other approval
milestones (granted/denied/timed_out/late_win/etc.) stay out of
default routing to avoid notification fatigue — the CLI surfaces
those confirmations directly.
- Email default replaces `approval_required` with `approval_requested`
(high-severity gates only; severity gating happens in the dispatcher).
Create-task validation (shared/create-task-core.ts)
---------------------------------------------------
- New request fields:
- `approval_timeout_s` — integer within
`[APPROVAL_TIMEOUT_S_MIN, APPROVAL_TIMEOUT_S_MAX]`.
- `initial_approvals` — array of scope strings; each entry must
be a valid `ApprovalScope` per `parseApprovalScope`; bash_pattern
and write_path scopes get the §7.4 degenerate-pattern check.
- TaskRecord extended with `approval_timeout_s`, `initial_approvals`,
`approval_gate_count` (seeded to 0 at admission), and
`awaiting_approval_request_id` (written atomically by the agent's
`transact_write_approval_request` primitive).
Agent plumbing (models.py / config.py / pipeline.py / runner.py / server.py)
----------------------------------------------------------------------------
- `TaskConfig` adds `approval_timeout_s`, `initial_approvals`.
- `build_config`, `run_task`, `_run_task_background`, and
`_extract_invocation_params` thread the two new fields from payload
→ config → PolicyEngine.
- `server._extract_invocation_params` stamps `os.environ["TASK_STARTED_AT"]`
from the payload so the hook's `_remaining_maxlifetime_s` returns
real values (carry-forward from Chunk 3 resolved).
- `runner.py` constructs PolicyEngine with `initial_approvals` +
`task_default_timeout_s` when supplied; the engine clamps bad
values at construction time.
Tests
-----
All CDK tests pass: 1219 / 1219.
All agent tests pass: 648 / 648.
Affected suites (changes only):
- test/stacks/agent.test.ts: cedar-wasm layer CompatibleRuntimes
now expects `nodejs24.x`; table count still 8.
- test/constructs/cedar-wasm-layer.test.ts: same runtime expansion.
- test/handlers/fanout-task-events.test.ts: approval_required →
approval_requested/approval_stranded in Slack default set;
approval_required → approval_requested in Email default set.
- test/handlers/reconcile-stranded-tasks.test.ts: primeResponses
now queue a third `Items: []` for AWAITING_APPROVAL queries;
queryCalls assertion bumped to 3.
Carry-forward (non-blocking)
----------------------------
- GetPoliciesFn has write access to TaskApprovalsTable (for the
rate-limit counter path). A future permissions audit should
tighten this to a single-item write scoped to `RATE#<user>#*`.
- TASK_STARTED_AT env var is only set when a payload supplies it;
server.py still supports the Phase 2 no-payload startup path.
* feat(cedar-hitl): Chunk 6 CLI — approve / deny / pending / policies
Ships the four user-facing commands that close the Cedar HITL loop:
once Chunks 1-5 have a PENDING approval row and the Slack/Email fan-out
has notified the user, Chunk 6 is how they actually respond.
New commands (cli/src/commands/)
--------------------------------
- `bgagent approve <task-id> <request-id> [--scope <scope>] [--yes]`
Default scope is `this_call`; callers extend allowlist with
`tool_type:Bash`, `rule:<id>`, etc. `all_session` is the only scope
that requires `--yes` to confirm — mirrors the safety UX from
§8.4. Error classification maps 404 → "run `bgagent pending`", 409
→ "task no longer awaiting approval", 429 → rate-limit, 401 → login.
- `bgagent deny <task-id> <request-id> [--reason ... | --reason-file ...]`
`--reason-file` accepts multi-line reasons that would otherwise
need shell quoting. Client-side `DENY_REASON_MAX_LENGTH` cap avoids
a round-trip on obviously-too-long reasons; the server still
truncates. Reason is sanitized server-side (output_scanner) before
ever reaching the agent.
- `bgagent pending [--output text|json]`
Lists every PENDING approval owned by the caller. Rendered with
approve/deny hints inline so the user can copy-paste the next
command. JSON output for scripting. Rate-limited server-side.
- `bgagent policies list --repo <owner/repo> [--tier hard|soft]`
`bgagent policies show --repo <owner/repo> --rule <rule_id>`
Discovery commands so users can find rule IDs without reading CDK
source. Both subcommands reuse a single `listPolicies` API call
and filter locally.
Wire changes
------------
- `cli/src/api-client.ts`: `approveTask`, `denyTask`, `listPending`,
`listPolicies` — each matching the §7.1 / §7.2 / §7.6 / §7.7
request/response shapes. `approveTask` omits the `scope` body field
when unset so the server's `this_call` default applies.
- `cli/src/types.ts`: mirrors the Chunk 5 server types verbatim —
`ApprovalScope` union, `ApprovalRequest/Response`, `DenyRequest/Response`,
`PendingApprovalSummary`, `GetPoliciesResponse`, `PolicyRuleSummary`,
plus the five constants (`DENY_REASON_MAX_LENGTH`,
`INITIAL_APPROVALS_MAX_ENTRIES`, `INITIAL_APPROVALS_MAX_ENTRY_LENGTH`,
`APPROVAL_TIMEOUT_S_MIN/MAX/DEFAULT`).
- `cli/src/bin/bgagent.ts`: registers the four new commands in the
order they appear in help output.
Tests: +27 new (217 total).
- approve (9): default scope, custom scope, all_session guard +
`--yes` bypass, JSON output, 404/409/401/429 error classifications.
- deny (6): no-reason path, `--reason`, `--reason-file` with
tmpdir fixture, mutually-exclusive rejection, over-length rejection,
404 classification.
- pending (5): empty render, populated render with approve/deny
hints, JSON output, 401 and 429 classifications.
- policies (7): list both tiers, `--tier` filter, `--output json`,
bad `--tier`, show found rule, show unknown rule, 404
repo-not-onboarded classification.
Carry-forward
-------------
- `submit.ts` extension with `--approval-timeout` / `--pre-approve`
flags is deferred to a follow-up commit — the server already accepts
these fields on POST /v1/tasks (Chunk 5), and `bgagent submit`
already forwards unknown payload fields through the existing
request path, so users can set them via `--body-file` today until
the explicit flags land.
- `--output json` on error branches currently returns a CliError
instead of a JSON error envelope; matches the pattern the existing
commands use (status, cancel, nudge). Follow-up to standardize
JSON error envelopes across the whole CLI if that becomes a
common scripting pain point.
* feat(cedar-hitl): Chunk 7a — persist gate counter + IMPL-23 cache observability
Persist approval_gate_count to TaskTable across container restarts per
§13.6 so the cumulative gate budget survives eviction. Emit
pre_approvals_loaded after PolicyEngine init per §4 step 7 / §11.1 so
operators see the starting approval posture in the live SSE stream.
Add IMPL-23 cache-hit observability: cache hits attach metadata to
PolicyDecision, hook forwards to new write_policy_decision_cached
progress helper (decision_source="recent_decision_cache").
Why: container restarts were silently resetting the per-task gate
counter, re-exposing users to another approvalGateCap-worth of gates
per restart. Cache-driven denies were invisible in TaskEventsTable
beyond the initial gate. Fresh tasks emitted no "starting posture"
signal so dashboards could not distinguish "no pre-approvals seeded"
from "agent has not started".
Surface additions:
- task_state.increment_approval_gate_count_in_ddb — best-effort
atomic ADD on approval_gate_count
- PolicyEngine(initial_approval_gate_count=N) — seed session counter
- TaskConfig.initial_approval_gate_count — orchestrator payload field
- progress_writer.write_policy_decision_cached — IMPL-23 emitter
- PolicyDecision.cache_hit_metadata — observability-only field
- _CachedDecision.original_decision_ts — wall-clock preservation
- runner._initialize_policy_engine_and_hooks — extracted helper
Counter survival is a safety bound, not correctness: DDB failure
does NOT block the gate (§13.6). Joint-update invariant on status
+ awaiting_approval_request_id (§10.2) is preserved — counter uses
separate UpdateItem, not merged into resume transaction.
Tests: +36 agent (648→684), +8 CDK (1219→1227), +6 new runner tests.
* feat(cedar-hitl): Chunk 7b — persist approval_gate_cap from blueprint
Capture the per-task approval-gate cap at submit-time (§4 step 5,
decision #13, §13.6) so a blueprint-configured override is frozen
onto the TaskRecord. Mid-task blueprint edits cannot shift the cap
beneath a running task; container restarts re-seed the agent's
PolicyEngine from the persisted value instead of its compile-time
default-50.
Why: Chunk 7a added approval_gate_count persistence but the cap
itself was still resolved from the blueprint on every restart —
so an operator lowering security.approvalGateCap mid-task would
retroactively fail-close the running task. The design has always
said cap is frozen at submit; this chunk makes the implementation
match.
Surface additions:
- BlueprintProps.security.approvalGateCap (CDK, synth-validated
[1, 500] integer) — new per-repo blueprint prop
- RepoConfig.approval_gate_cap + BlueprintConfig.approval_gate_cap
- TaskRecord.approval_gate_cap + APPROVAL_GATE_CAP_{MIN,MAX,DEFAULT}
- create-task-core now calls loadRepoConfig, resolves cap, bounds-
checks, persists; returns 503 SERVICE_UNAVAILABLE on invalid
blueprint data (permanent until admin re-deploys, not transient)
- orchestrator.ts: isValidApprovalGateCap integer+bounds guard;
logs warn if a persisted cap is structurally invalid (schema
drift / hand-edited DDB row)
- TaskConfig.approval_gate_cap: int | None = None (agent-side);
runner threads to PolicyEngine kwarg when not None
- "Task created" log line now carries approval_gate_cap +
approval_gate_cap_source ("blueprint" | "platform_default") so
operators can detect a broken-plumbing deploy at the single
chokepoint where all fallback layers converge
Per silent-failure review:
- HIGH: 500 → 503 + logger.error for permanent misconfig
- HIGH: cap + source in task-created log (catches 4-layer cascade)
- MEDIUM: orchestrator guard tightened past typeof (NaN, Infinity,
floats, out-of-bounds all omitted + warned)
Tests: CDK 1263/1263 (+36), agent 694/694 (+10). CLI unchanged.
* feat(cedar-hitl): Chunk 7c — observability wrap-up for resolved cap + warn path
Close three deferred items from Chunks 7a/7b before Chunks 8-10:
- runner.py init log now carries approval_gate_cap=N +
approval_gate_cap_source=threaded|engine_default. Matches the
handler log key so CloudWatch Insights can join across the
cascade; agent can't distinguish blueprint-override from
platform-default-frozen (handler log is the ground truth).
- server.py adds _warn_cw helper routing [server/warn] lines to
a dedicated CloudWatch stream (server_warn/<task_id>). stdout
print is preserved for local dev + existing capsys tests.
AgentCore does not forward container stdout to APPLICATION_LOGS,
so pre-7c warnings about malformed invocation payloads were
invisible in production. Failure counter shared with _debug_cw
for a single alarm surface; hoisted above writer defs for
import-time ordering safety.
- blueprint.ts emits a synth-time info annotation when
security.approvalGateCap is omitted so operators see a signal
that the repo will rely on the platform default of 50. Without
this, the default was a silent fallback at the handler layer —
only visible by inspecting a TaskRecord at runtime.
Tests: agent 694→700 (+6), cdk 1263→1265 (+2), cli unchanged.
Design refs: §4 step 5, §11.1, §13.6, decision #13.
* feat(cedar-hitl): Chunk 8a — extend approval outcome event schema
Add created_at / effective_timeout_s / matching_rule_ids to
approval_granted / approval_denied / approval_timed_out events so
the incoming ApprovalMetricsPublisher Lambda (Chunk 8b) can compute
decision latency and emit a rule_id-dimensioned timeout breakdown
without a round-trip GetItem against TaskEventsTable.
Fields are added conditionally — omitted from metadata when the
caller did not supply them — so the event stream stays free of
null-value noise and legacy callers continue to produce valid
payloads. Publisher handles missing fields via explicit skip-and-log
on the specific metric branch (not fallback-to-zero).
Agent tests extended: +6 progress_writer tests, +3 hooks tests.
Baseline 700 → 710. No consumer wired yet — this commit is a
forward-compatible superset; Chunk 8b ships the CDK publisher +
dashboard widgets.
* feat(cedar-hitl): Chunk 8b — ApprovalMetricsPublisher + native CloudWatch dashboard widgets
Ship the Cedar-HITL dashboard widgets from §11.3 / IMPL-28 via the
MetricsPublisher architecture (Option E):
- New ApprovalMetricsPublisher Lambda consumes TaskEventsTable DDB
stream as consumer #2 (FanoutConsumer is #1; stream is within its
2-consumer soft cap — documented in task-events-table.ts).
- Handler emits CloudWatch EMF for 3 metrics in namespace
ABCA/Cedar-HITL:
* ApprovalRequestCount + ClippedApprovalCount (reason dim) →
ApprovalTimeoutClipRate widget (MathExpression with IF-guard
against NaN on zero-denominator periods)
* TimedOutEffectiveTimeout (rule_id dim with allowlist
cardinality cap) → ApprovalTimeoutBreakdown widget
* ApprovalDecisionLatencyMs (outcome dim) → ApprovalDecisionLatency
widget with per-outcome p50/p90/p99
- Observability-of-observability (silent-failure review):
* MetricsPublisherHeartbeat per batch so dashboard gaps
distinguish "no traffic" from "pipeline broken"
* MetricEmitSkipped with a reason dim on schema mismatches,
parse anomalies, unknown rule ids — never fall back to
latency=0 or count=0 which would poison percentile widgets
* Expected high-volume skip reasons (non-milestone events,
REMOVE records) DO NOT emit MetricEmitSkipped — only
anomaly reasons (missing keys, missing milestone name) do,
so real signal isn't drowned
* Structured log lines alongside every skip so the absence of
metrics is also observable via CloudWatch Logs Insights
- Cardinality caps via ``RULE_ID_ALLOWLIST`` + ``normalizeClipReason``.
Unknown values collapse to ``other`` / ``unknown`` buckets with
dashboard series so the collapse is discoverable rather than
silently accruing custom-metric cost.
- Event-source-mapping filter pattern rejects non-agent_milestone
records at the service layer; handler-layer allowlist catches
anything that slips through. Filter pattern correctness tested
structurally + positively/negatively probed (silent-failure H3).
- Per-record try/catch + reportBatchItemFailures + SQS DLQ mirror
the fanout-task-events.ts poison-pill pattern exactly.
Deferred to Chunk 10 chore issues:
- DLQ alarms (fanout + publisher) — fire-into-void until
notification channel lands, so wire with §11.5 alarms as a group
- Explicit log-group declaration (IAM drift defense)
- stdout-flush race documentation (pre-existing pattern in fanout)
- EMF 100-updates/sec throttle alarm
Tests: cdk 1265 → 1327 (+62); agent 710 (unchanged); cli 217
(unchanged). All pass. §11.5 alarm plumbing now unblocked —
publisher provides the metrics infrastructure the design always
intended; only the notification-channel SNS wiring is left.
* docs(cedar-hitl): Chunk 9 — sync design doc to Chunks 7b / 8a / 8b
Bring CEDAR_HITL_GATES.md current with the code that shipped in
Chunks 7b (approval_gate_cap persist), 8a (outcome event schema
superset), and 8b (ApprovalMetricsPublisher + dashboard widgets):
- §10.2 adds the missing approval_gate_cap row (carry-forward
drift from Chunk 7b). Bounds + frozen-at-submit semantics
documented.
- §11.1 outcome events (approval_granted / approval_denied /
approval_timed_out) now document the Chunk 8a optional fields
(created_at, effective_timeout_s, matching_rule_ids) plus the
publisher's skip-on-missing-field policy.
- §11.1 intro names ApprovalMetricsPublisherFn as consumer #2 and
points to §11.3 for the metric schema.
- §11.3 rewritten to describe the Option E architecture:
publisher Lambda + EMF + native CloudWatch metrics in namespace
ABCA/Cedar-HITL, MathExpression with divide-by-zero guard,
rule_id cardinality cap, observability-of-observability via
heartbeat + skip meta-metrics, widget layout (12/12 over 24),
2-consumer stream budget. Dropped the stale "Retired the old
bundled widget" line — that widget never shipped.
- §11.5 reframed as "deferred (notification-channel gated)" with
a plumbing-status paragraph noting the metric infra now exists;
only SNS wiring remains. Alarm list expanded to include DLQ
and publisher-health alarms.
- §16 IMPL-28 rewritten for Option E; §15.2 row 46 expanded to
reference the 4 new test files; Appendix B checklist updated.
Starlight mirror regenerated via ``cd docs && node
scripts/sync-starlight.mjs``.
No code changes. Test baselines unchanged. Adversarial
comment-analyzer review verified every new claim against
committed code — zero inaccuracies.
* feat(cedar-hitl): Chunk 10 review fixes — close 2 blockers + tighten 2 mediums
Full-branch adversarial review (code-reviewer + silent-failure-hunter
on all 18 commits) surfaced findings that only appear at final-state.
Addressing the blockers + low-cost meds before deploy:
B2 — stranded approvals were invisible to the dashboard:
- Reconciler writes ``event_type: 'task_stranded'``; the metrics
publisher's event-source filter only accepts
``event_type: 'agent_milestone'``, so AWAITING_APPROVAL evictions
produced zero §11.3 signal.
- Fix: reconciler now additionally emits an ``agent_milestone``
with ``milestone: 'approval_stranded'`` when the stranded task
was AWAITING_APPROVAL. Publisher allowlist extended; classifier
emits ``ApprovalStrandedCount`` counter. SUBMITTED / HYDRATING
stranded events unchanged (guarded by test).
B1 — heartbeat comment was false reassurance:
- Event-source filter blocks Lambda invocation when no
``agent_milestone`` records exist in the poll window, so a
quiet period produces the same widget gap as a broken
pipeline. The code + design-doc wording claimed "gap =
pipeline broken" which would mislead the on-call.
- Fix: corrected module + function docstrings to describe the
heartbeat as "present when active, not pipeline-alive-always."
Operators should alarm on the combination
(heartbeat-absent + recent TaskEventsTable traffic) or wire
a scheduled canary — the latter tracked as a §11.5 follow-up.
M1 — safety-critical milestones produced zero dashboard signal:
- ``approval_cap_exceeded`` (§12.9 per-task cap) and
``approval_rate_limit_exceeded`` (per-user per-minute rate)
were emitted by the agent but not on the publisher allowlist.
A production bug where every gate hit the cap would have
been invisible.
- Fix: both added to APPROVAL_METRIC_MILESTONES with
``ApprovalCapExceededCount`` / ``ApprovalRateLimitExceededCount``
counters. No dimensions — the request_id in the event carries
per-user correlation for ad-hoc log-insights investigation.
H2 — filter / handler eventName disagreement:
- Event-source filter required ``INSERT``; handler accepted
``INSERT`` and ``MODIFY``. Benign today (TaskEventsTable is
put-only), but a future chunk MODIFY-ing records would be
silently dropped by the filter while the handler was ready
to process them.
- Fix: handler now INSERT-only, matching the filter. Single
source of truth on the eventName invariant.
M1-rename — ``expected_non_approval_milestone`` skip reason was
misleading (the non-metric approval milestones like
``approval_late_win`` also land in this bucket). Renamed to
``expected_milestone_not_tracked``.
Tests: cdk 1327 → 1332 (+5: 3 classifier branches for new metrics,
1 reconciler AWAITING_APPROVAL path, 1 SUBMITTED-not-double-counted
guard). Agent + CLI unchanged. All pre-commit hooks green; pre-push
security fails only on the 3 pre-existing CVEs tracked for chore
issue filing.
Deferred findings from the same review (file as chore issues):
- H1: agent-dies-between-TIMED_OUT-and-resume loses latency
(edge, affects p99 bias)
- H3: late-win APPROVED created_at staleness invariant
(works today, document invariant)
- H4: _warn_cw daemon-thread burst under adversarial payload
- M2-M4: late-win metric, rename helpers, etc.
No upstream PR filing this chunk — deploy to Sam's AWS account
for integration testing first.
* fix(cedar-hitl): suppress AwsSolutions-IAM5 on Runtime ExecutionRole overflow policies
Synth + deploy were blocked by cdk-nag: the Cedar HITL additions
(TaskApprovalsTable grant + SlackUserMappingTable + extra env vars
threaded to the AgentCore runtime) pushed the runtime ExecutionRole
past CDK's inline-policy size limit, so CDK auto-splits excess
statements into ``OverflowPolicy1``. The overflow inherits the same
wildcard ``bedrock:InvokeModel*`` / CloudWatch actions as the base
policy but lives at a path
(``Runtime/ExecutionRole/OverflowPolicy1/Resource``) that the
existing ``addResourceSuppressions(runtime, ..., applyToChildren:
true)`` cannot reach — CDK creates overflow policies lazily during
synth ``prepare()``, after the construct tree has been frozen and
after static suppressions have been cached.
Suppress via an Aspect at MUTATING priority so the suppression is
applied before cdk-nag's READONLY visitor runs. Matches any path
containing ``/Runtime/ExecutionRole/OverflowPolicy`` + ending
``/Resource`` so future ``OverflowPolicy2``, etc. are covered
without hardcoding indices.
Verified: ``mise //cdk:synth`` now completes cleanly.
``mise //cdk:test`` still 1332/1332.
* fix(cedar-hitl): E2E deploy-readiness — policies bundle + onboarding gate + CLI error visibility
Three E2E T1.4 + T2.2 findings from the Chunk 10 integration-test
session. Batched into one commit since all three need the same
redeploy to verify:
1. agent/Dockerfile: COPY policies/ into the container image.
``PolicyEngine.__init__`` reads
``/app/policies/hard_deny.cedar`` + ``soft_deny.cedar`` at import
time via ``_POLICIES_DIR = Path(__file__).parent.parent /
"policies"``. The Dockerfile only copied ``src/``, so the
directory was missing and every Cedar-HITL task failed at 0 turns
with ``missing built-in hard-deny policies``. Introduced
alongside Chunk 2 when the policy files were first added —
Dockerfile was never updated. Zero tasks on this branch ever
su…1 parent 79d7d8f commit be3207a
143 files changed
Lines changed: 25105 additions & 2463 deletions
File tree
- agent
- policies
- src
- tests
- cdk
- layers/cedar-wasm
- src
- constructs
- handlers
- stacks
- test
- constructs
- handlers
- stacks
- cli
- src
- bin
- commands
- test
- commands
- contracts
- cedar-parity
- docs
- design
- guides
- scripts
- src/content/docs
- architecture
- customizing
- developer-guide
- getting-started
- using
- scripts
Some content is hidden
Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
71 | 71 | | |
72 | 72 | | |
73 | 73 | | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
74 | 90 | | |
75 | 91 | | |
76 | 92 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
47 | 47 | | |
48 | 48 | | |
49 | 49 | | |
| 50 | + | |
50 | 51 | | |
51 | 52 | | |
52 | 53 | | |
| |||
This file was deleted.
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
20 | 20 | | |
21 | 21 | | |
22 | 22 | | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
23 | 30 | | |
24 | 31 | | |
25 | 32 | | |
26 | 33 | | |
27 | 34 | | |
28 | 35 | | |
29 | | - | |
30 | | - | |
31 | 36 | | |
32 | 37 | | |
33 | 38 | | |
| |||
49 | 54 | | |
50 | 55 | | |
51 | 56 | | |
52 | | - | |
53 | | - | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
54 | 60 | | |
55 | 61 | | |
56 | 62 | | |
57 | 63 | | |
58 | | - | |
59 | | - | |
60 | | - | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
61 | 79 | | |
62 | 80 | | |
63 | 81 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
52 | 52 | | |
53 | 53 | | |
54 | 54 | | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
55 | 59 | | |
56 | | - | |
| 60 | + | |
57 | 61 | | |
58 | 62 | | |
59 | 63 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
11 | 11 | | |
12 | 12 | | |
13 | 13 | | |
14 | | - | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
15 | 26 | | |
16 | 27 | | |
17 | 28 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
92 | 92 | | |
93 | 93 | | |
94 | 94 | | |
| 95 | + | |
| 96 | + | |
| 97 | + | |
| 98 | + | |
95 | 99 | | |
96 | 100 | | |
97 | 101 | | |
| |||
146 | 150 | | |
147 | 151 | | |
148 | 152 | | |
| 153 | + | |
| 154 | + | |
| 155 | + | |
| 156 | + | |
149 | 157 | | |
150 | 158 | | |
151 | 159 | | |
| |||
0 commit comments