Skip to content

Commit be3207a

Browse files
scoropezakrokokobgagent
authored
feat: Cedar HITL approval gates for agent tool use (aws-samples#88)
* docs(cedar-hitl): restore and revise HITL gates design, fold adversarial findings Design doc was accidentally removed in 0742ebe; restored from b34d7cd and substantially revised under a new filename. "Phase 3" framing dropped — this is the Cedar HITL approval gates feature. - Renamed PHASE3_CEDAR_HITL.md → CEDAR_HITL_GATES.md; all "phase" gating removed (Phase 3a/3b → v1 / future work §17). - Integrated 16 findings from 2026-05-06 adversarial review with realistic scenarios. Major structural changes: - Decision #23 (new): cross-engine parity contract between cedarpy (agent, Python) and @cedar-policy/cedar-wasm@4.10.0 (Lambda, TS). - §11.2: SlackUserMappingTable with OAuth user-initiated mapping; severity- gated Slack approvals; admin has no write path. - §7.1/§12.3: ApproveTaskFn uses cross-table TransactWriteItems for atomicity. - §10.1: user_id-status-index GSI on TaskApprovalsTable; v1 not v-later. - §15.6: cedar-wasm as a Lambda layer shared across policy Lambdas. - Gate-cap revision (2026-05-07): decision #13 — default 50, blueprint- configurable via security.approvalGateCap (bounded 1–500), persisted on TaskTable. Cache memory bound decoupled: 50-entry LRU regardless of cap. IMPL-22 adds telemetry-driven re-evaluation criteria. - Timeout adversarial+advocate pass (2026-05-07): - §6.5 VM-throttle race fix: re-read row on failed TIMED_OUT ConditionCheckFailed; honor APPROVED if user beat the timer. IMPL-24. - Sub-120s @approval_timeout_s emits blueprint-load WARN. IMPL-25. - User-visible timeout cap milestones (approval_timeout_capped_at_submit, approval_ceiling_shrinking). IMPL-26. - Runtime JWT: no refresh logic in agent/src/ (container uses IAM role); ceiling stays min(1h, maxLifetime_remaining - 120s). IMPL-27. - Three new CloudWatch metrics for timeout tuning. IMPL-28. - §14.8 new: off-hours trade-off section (fail-closed is the invariant). - §13.13 new: notification-delivery failure does NOT pause the timer (bypass-prevention). - Added six mermaid diagrams: three-outcome decision flow, end-to-end round- trip, TaskApprovalsTable state machine, Slack user-mapping, fail-closed decision flow, cross-engine parity check. - Cross-references updated in INTERACTIVE_AGENTS.md and SECURITY.md. - Starlight mirror regenerated via docs/scripts/sync-starlight.mjs. No code changes in this commit — design work only. Implementation lands in a follow-up PR per §15.2 task list. * feat(cedar-hitl): pin Cedar engines and seed cross-engine parity contract Chunk 1 of the Cedar HITL gates PR (docs/design/CEDAR_HITL_GATES.md). Lays the foundation before engine rewrites in Chunk 2+: both Cedar engines pinned exactly per decision #23, annotation surface validated by Day-1 spikes per decision #22, and the golden-file parity fixtures seeded so every subsequent chunk can rely on the contract. - Pin cedarpy==4.8.0 (agent) and @cedar-policy/cedar-wasm@4.10.0 (cdk) exactly (no ^/~); document both in mise.toml header. - Add agent/tests/test_cedarpy_annotations_contract.py (10 tests) validating all 5 annotations round-trip verbatim via policies_to_json_str() under staticPolicies.<id>.annotations. - Add cdk/test/handlers/shared/cedar-policy.test.ts (12 tests) validating policySetTextToParts + policyToJson extract the same annotations verbatim and isAuthorized returns the documented {type, response} wrapper shape. - Add contracts/cedar-parity/ with 5 golden-file fixtures (single-match, multi-match, hard-deny, soft-deny write, no-match default-allow) + README documenting the contract. Every fixture policy carries a @rule_id - including the base permit as @rule_id("base_permit") - so the parity tests raise if either engine returns an unannotated match instead of silently dropping it. - Add agent/tests/test_cedar_parity.py (6 tests, cedarpy side) and cdk/test/handlers/shared/cedar-parity.test.ts (6 tests, cedar-wasm side) loading the shared fixtures and asserting (decision, sorted rule_ids) match expected. Both tests hard-import cedarpy/cedar-wasm so a dependency regression fails loud rather than silently skipping. - Update docs/design/CEDAR_HITL_GATES.md sections 15.2 row 3, 15.6 prose and the parity mermaid diagram to point at contracts/cedar-parity/ (the precedent set by contracts/memory-hash-vectors.json) instead of a new tests/fixtures/ dir. Regenerate the Starlight mirror. - Add IMPL-29 noting the cedarpy diagnostics.reasons / cedar-wasm diagnostics.reason naming asymmetry surfaced by the spikes; engine code normalizes at the boundary. - Fix rev-4 -> rev-5 cosmetic footer drift. Test counts: agent 500 -> 516 (+16), cdk 1036 -> 1054 (+18), cli 190 unchanged. No production code changes in this chunk; engine rewrite lands in Chunk 2. Follow-up: separate chore issue to move contracts/memory-hash-vectors.json into a self-named subdir for consistency with contracts/cedar-parity/. * feat(cedar-hitl): three-outcome PolicyEngine core Chunk 2 of the Cedar HITL gates PR. Rewrites agent/src/policy.py into the three-outcome engine specified in docs/design/CEDAR_HITL_GATES.md section 6. The REQUIRE_APPROVAL outcome is the human-in-the-loop surface the next chunks (PreToolUse hook extension, REST API, CLI) plug into. This chunk ships the engine and its load-time validation; no hook or wire-format changes yet. Engine: - Outcome enum (ALLOW, DENY, REQUIRE_APPROVAL) + extended PolicyDecision with .allowed backward-compat shim for Phase 1a/1b/2 callers. Custom __init__ accepts both outcome= and legacy allowed= kwargs so existing tests keep working verbatim. - Three-outcome pipeline per section 6.2: hard-deny eval (absolute) -> allowlist fast-path (tool_type/tool_group/bash_pattern/write_path/ all_session) -> recent-decision cache (60s TTL on DENIED/TIMED_OUT) -> soft-deny eval (with post-eval rule-scope allowlist check and blueprint_disable filtering) -> default ALLOW. - ApprovalAllowlist (section 6.4): parses and matches every scope type. Strips whitespace and rejects empty-after-strip values so "tool_type: Read " normalizes instead of silently mismatching (review finding 6). - RecentDecisionCache (section 12.9): 50-entry LRU, INDEPENDENT of approvalGateCap. Populated only on DENIED/TIMED_OUT. Session-scoped (documented section 12.8 caveat). - Annotation handling (sections 5.2 + 6.3): parses @rule_id, @tier, @approval_timeout_s, @severity, @category via cedarpy.policies_to_json_str(); merges on multi-match with min timeout (clamped by 30s floor) and max severity. - Load-time validation (sections 5.1, 12.4): rejects missing/mismatched @tier, missing @rule_id, sub-floor timeouts, duplicate rule_ids across tiers, blueprint text > 64 KB, disable entries naming built-in hard-deny rules (finding 9), approval_gate_cap outside [1, 500] (decision 13). Sub-120s @approval_timeout_s emits WARN but accepts (IMPL-25). - Fail-closed posture (section 13): cedarpy parse errors surface via diagnostics.errors -> RuntimeError raised inside _eval_tier -> outer handler returns DENY with reason "fail-closed: <ExceptionType>". TypeError on json.dumps of unhashable tool_input surfaces as distinct "fail-closed: unhashable_tool_input" reason (review finding 5). Built-in policies: - agent/policies/hard_deny.cedar: base_permit catch-all + rm_slash + write_git_internals + write_git_internals_nested + drop_table + pr_review-specific Write/Edit forbids (absolute). - agent/policies/soft_deny.cedar: base_permit (catch-all required in each tier so cedarpy default-deny does not convert no-match into DENY) + force_push_any + force_push_main + push_to_protected_branch + write_env_files + write_credentials. All soft rules carry @tier, @rule_id, @approval_timeout_s, @severity, @category per section 15.4 starter set. Review findings addressed (1 blocker, 8 significant, plus minor): - blueprint_disable actually disables soft rules at eval time instead of silently no-op (the blocker: test coverage had been a silent-pass). - Legacy extra_policies with @tier/@rule_id rejected to avoid undefined double-annotation behavior. - _matching_rule_ids logs WARN on unknown policy IDs (state-drift signal). - base_permit validator exemption restricted to effect=="permit" so misnamed forbid rules cannot bypass validation (finding 7). - Hard-tier Cedar no_decision logged at WARN (signals missing/malformed base_permit catch-all). - Allowlist whitespace normalization + empty-value rejection. - StrEnum upgrade, Callable moved to TYPE_CHECKING, assert replaced with explicit RuntimeError for S101 compliance. Phase 1 compatibility: - All 39 existing test_policy.py tests pass unchanged via the .allowed property. One test (test_invalid_policy_syntax_fails_closed) updated to patch _hard_policies instead of the removed _policies attribute; docstring explains the rewrite. - extra_policies kwarg preserved; callers with annotated rules must migrate to blueprint_soft_policies / blueprint_hard_policies. Test counts: agent 516 -> 576 (+60: 51 three-outcome + 9 regression fixes). cli 190 unchanged. cdk 1054 unchanged. Carry-forward to Chunk 3: - extra_policies semantic shift (Phase 1 DENY -> Chunk 2 REQUIRE_APPROVAL); .allowed=False preserved but .outcome differs. Switchover happens when hooks.py adopts the three-outcome branching. - Cross-tier action-context asymmetry (review finding 8): document rule-authoring constraint in section 5.5 of design. - Probe entity-shape coverage (finding 10): extend _probe_cedar to exercise Write/Edit/Bash action paths, not just invoke_tool. * feat(cedar-hitl): approval milestone writers + engine counters Adds the 14 agent-side approval milestone writers (§11.1) on ``_ProgressWriter`` so Chunk 3's hook integration has a typed API instead of stringly-typed ``write_agent_milestone`` calls, and the per-task gate counter / per-container sliding-window rate limit / denial-injection queue on ``PolicyEngine`` that §6.5 requires. Why now: the hook work lands cleanly only after these surfaces exist — every code path in ``pre_tool_use_hook``'s REQUIRE_APPROVAL branch calls one of these helpers. Shipping them separately lets the hook commit be about the state machine, not the event-shape bookkeeping. Engine additions: - ``approval_gate_count`` / ``increment_approval_gate_count``: the per-task counter §12.9 bounds at ``approvalGateCap``. Session-scoped in v1; persistence tracked in §17. - ``approvals_in_last_minute`` / ``record_approval_gate_timestamp``: sliding-window rate limit (20/min/container, §12.9). Prune on read so callers see the current count without a separate tick. - ``queue_denial_injection`` / ``drain_denial_injections``: queue consumed by ``_denial_between_turns_hook`` at the next Stop seam (§6.5). Reason is pre-sanitized upstream by ``DenyTaskFn``. - ``mark_ceiling_shrinking_emitted``: emit-once latch for IMPL-26. - ``APPROVAL_RATE_LIMIT`` / ``APPROVAL_RATE_WINDOW_S`` module consts the hook imports rather than re-deriving. Milestone writers (§11.1 table, 14 agent-emitted of 15): - ``pre_approvals_loaded``, ``approval_requested``, ``approval_granted``, ``approval_denied``, ``approval_timed_out``, ``approval_stranded``, ``approval_write_failed``, ``approval_resume_failed``, ``approval_poll_degraded``, ``approval_timeout_capped``, ``approval_ceiling_shrinking``, ``approval_cap_exceeded``, ``approval_rate_limit_exceeded``, ``approval_late_win``. - ``approval_decision_recorded`` (Lambda audit) and ``approval_timeout_capped_at_submit`` (CreateTaskFn) stay on the Lambda side — Chunk 5 owns those. Each helper is a thin wrapper over ``_put_event("agent_milestone", ...)`` so the shared circuit-breaker + classifier path (finding #6/#8) continues to apply. Metadata keys mirror the §11.1 shapes verbatim (``maxLifetime_remaining_s`` preserves the design-doc spelling for downstream parsers). Tests: +29 total. 17 on ``TestApprovalMilestoneHelpers`` pin the DDB payload shape for each helper (including the two ``approval_timeout_capped`` reason variants — rule_annotation carries matching_rule_ids; maxLifetime_ceiling omits the field). 12 on the engine: counter monotonicity, rate-window prune semantics at window boundary, denial-queue FIFO + drain-clears, ceiling-shrinking latch idempotency. No caller changes — engine and writer surfaces are additive. Hook integration lands in commit C. * feat(cedar-hitl): TaskApprovals + AWAITING_APPROVAL transition primitives Adds the four agent-side DDB primitives §6.5 + IMPL-24 need for the three-outcome hook integration in the next commit: - ``transact_write_approval_request`` — cross-table TransactWriteItems: Put(TaskApprovalsTable) with ``attribute_not_exists(request_id)`` + Update(TaskTable) gated on ``status = RUNNING``. Atomic per §12.3 so a concurrent cancel cannot land the task in AWAITING_APPROVAL with no matching approval row (or vice versa). - ``transact_resume_from_approval`` — Update(TaskTable) gated on ``status = AWAITING_APPROVAL AND awaiting_approval_request_id = :rid``. The ``request_id`` condition prevents resuming with a stale ID after a reconciler race (§13.9). - ``best_effort_update_approval_status`` — conditional UpdateItem on the approval row with ``status = :pending`` guard. Returns False on ``ConditionalCheckFailedException``; this is the signal IMPL-24's re-read path fires on (§6.5 pseudocode lines 846-879, §13.12). - ``get_approval_row`` — GetItem with ``ConsistentRead=True`` by default. Required by IMPL-24's re-read; kept opt-out (bool flag) for future cold-path callers that don't need the strong read. Errors: - ``ApprovalTablesUnavailable`` for env-var-missing — raised loud so a pre-Chunk-4 deploy fails closed (hook will map to DENY) rather than silently no-op'ing the gate. - ``ApprovalWriteError`` / ``ApprovalResumeError`` wrap ``TransactionCanceledException`` with the cancellation reasons list. The hook uses these to distinguish the "concurrent cancel" branch from real DDB outages. - ``ConditionalCheckFailedException`` on ``update_item`` is consumed and returned as ``False`` from ``best_effort_update_approval_status`` — the caller (hook) needs the boolean to decide whether to re-read, not to propagate. - All other DDB errors propagate so the hook's outer try/except can classify fail-closed with a specific reason. Implementation notes: - Uses ``boto3.client("dynamodb")`` low-level API (not resource). ``transact_write_items`` lives on the client, and marshalling the approval row attributes explicitly gives deterministic DDB shapes that the tests can assert on. ``_py_to_ddb_attr`` covers the subset of Python types §10.1 actually uses (str/int/bool/None/list of str); any other type raises TypeError loudly rather than silently writing something unexpected. - ``_extract_error_code`` / ``_extract_cancellation_reasons`` duck-type on ``exc.response`` so we don't need botocore at import time (tests use a minimal exception class). - Errors from unsupported types (floats, dicts, etc.) are caught BEFORE the DDB round-trip so the unit-test asserts ``transact_write_items`` was not called — catches schema drift early. - Status constants (``_STATUS_RUNNING`` / ``_STATUS_AWAITING_APPROVAL``) named so a rename in CDK cannot silently diverge the Python path. Tests: +20 total. - 5 on TransactWriteApprovalRequest: env-missing, happy-path shape assertion (both items + conditions), TransactionCanceled → ApprovalWriteError with reasons preserved, other errors propagate, unsupported type rejected before any DDB call. - 3 on TransactResumeFromApproval: env-missing, happy-path expression shape (includes REMOVE awaiting_approval_request_id), cancel → ApprovalResumeError. - 4 on BestEffortUpdateApprovalStatus: happy path returns True, ``reason`` kwarg attaches ``deny_reason``, ConditionalCheckFailed returns False (IMPL-24's signal), other errors propagate. - 4 on GetApprovalRow: ConsistentRead default True, opt-out False, row-not-found returns None, row unmarshalling through every supported DDB attribute type. - 4 on helpers: error-code extraction with and without ClientError-shape, cancellation-reasons extraction with and without. No runtime callers yet — hook integration lands in commit C. Physical TaskApprovalsTable lands in Chunk 4; Python side is wire-compatible so the hook work can be unit-tested today with mocked clients. * feat(cedar-hitl): PreToolUse three-outcome REQUIRE_APPROVAL path Wires the agent to the full §6.5 pseudocode: cap + rate-limit check, atomic TransactWriteItems for pending row + TaskTable AWAITING_APPROVAL, 2s→5s ConsistentRead poll, IMPL-24 VM-throttle race re-read, resume transition, scope propagation to allowlist, and denial-injection queue consumed at the next Stop seam. Completes §15.2 rows 26 + 27. Hook control flow (three outcomes) ---------------------------------- - ALLOW / DENY: existing Phase 1 behavior, now switching on ``.outcome`` rather than ``.allowed``. Legacy Phase 1/2 tests still green because PolicyDecision preserves the ``.allowed`` shim. - REQUIRE_APPROVAL (new): extracted into ``_handle_require_approval`` for readability. Delegates to ``task_state`` primitives and ``engine.*`` counter surfaces from the prior commits; no new DDB client construction here. Key pieces: - ``_compute_effective_timeout`` applies the §6.5 min(rule, default, lifetime) formula. The engine's ``_merge_annotations`` has already clipped decision.timeout_s against the task default; the hook adds the remaining-lifetime ceiling and floors at FLOOR_30S. ``clip_reason`` distinguishes ``rule_annotation`` (rule was tighter than task default) from ``maxLifetime_ceiling`` (task is late in its life) so ``approval_timeout_capped`` carries the right reason. - ``_remaining_maxlifetime_s`` reads ``AGENTCORE_MAX_LIFETIME_S`` + ``TASK_STARTED_AT`` env vars (8h default). Returns ``None`` when the start timestamp is absent — the hook treats that as "unknown, don't clip" rather than pre-DENYing, so Phase 1 test paths that don't set the env var still see the old task-default behaviour. Chunk 4/5 will wire these at task launch. - ``_poll_for_decision`` uses 2s cadence for the first 30s then 5s (IMPL-12). All polls use ``ConsistentRead=True`` per IMPL-24. 3 consecutive GetItem failures emit ``approval_poll_degraded``; 10 consecutive failures fall through as TIMED_OUT with a specific reason (§13.2). - ``_reconcile_late_decision`` implements IMPL-24 re-read: on a ConditionCheckFailed from the TIMED_OUT write, re-read with ConsistentRead. APPROVED → rebuild outcome, propagate scope to allowlist, run normal allow flow, emit ``approval_late_win``. DENIED → honor the user's sanitized reason. PENDING or row gone → fall through with TIMED_OUT (fail-closed, §13.12 last paragraph). Cancel-wins semantics (finding #2) ---------------------------------- ``_denial_between_turns_hook`` is registered AFTER ``_nudge_between_turns_hook`` in ``between_turns_hooks`` so cancel short-circuits both. The hook re-checks ``_cancel_requested`` itself as belt-and-braces (matching the nudge hook) so a future reorder does not silently break cancel-wins. Denial queue is PRESERVED on cancel — not drained — so a denial still sitting on the queue when the task is being torn down does not leak across tasks (the engine is per-task per §IMPL-7). ``stop_hook`` threads ``engine`` into ``ctx`` so the denial hook can ``drain_denial_injections``. ``build_hook_matchers`` accepts a new ``user_id`` kwarg (§12.2) so approval rows carry caller identity for the REST side's ownership check. ``permissionDecisionReason`` guaranteed surface ----------------------------------------------- The hook's deny return is the ONLY guaranteed surface the SDK emits to the agent; denial injection is best-effort (pre-empted by cancel). ``_deny_response`` pipes every reason through ``_strip_ansi`` + ``_truncate(500)``: ANSI sequences can never reach the model, and the line stays loggable. §12.7 requirement. Tests: +24 agent hook tests (47 total in test_hooks.py). Run in 0.92s via a ``_fast_poll`` fixture that collapses ``asyncio.sleep`` to a no-op AND advances ``hooks.time.monotonic`` by the requested duration so the poll wall-clock deadline actually trips. Happy paths: - APPROVED + scope propagation to allowlist + milestones. - APPROVED with scope=this_call does NOT grow allowlist. - DENIED queues denial injection + populates recent-decision cache (next identical call auto-denies). - TIMED_OUT writes TIMED_OUT row and emits approval_timed_out. IMPL-24 race: four branches. - APPROVED re-read → allow flow, approval_late_win milestone, scope propagated, resume succeeds. - DENIED re-read → deny flow, approval_late_win milestone, user's reason is the permissionDecisionReason. - Still-PENDING re-read → fail-closed fall-through (no late_win). - Row-gone re-read → same fail-closed fall-through. Cap / rate-limit / write failure / resume failure branches all: - Short-circuit before any DDB write when the local guard fires (cap, rate limit). - Emit the right approval_* milestone. - Return DENY with a specific permissionDecisionReason. Sanitization: - ANSI stripped from deny reason. - Deny reason truncated to ≤500 chars. Timeout clipping: - rule_annotation reason when a rule's approval_timeout_s is below the task default; matching_rule_ids populated. - maxLifetime_ceiling reason when remaining lifetime is the tightest bound; matching_rule_ids is None. - approval_ceiling_shrinking emits exactly once per task (IMPL-26 latch). Denial injection hook (6 tests): - Draining produces a <user_denial request_id=... decided_at=...> block with XML-escaped reason. - Cancel short-circuit preserves the queue so the denial is not lost; just not injected into a dying agent. - Hostile reason (</user_denial>...<user_nudge>) is XML-escaped so the envelope cannot be forged. - No-engine ctx returns [] (Phase 1 call sites still work). - Registered LAST in ``between_turns_hooks`` (invariant for §6.5 finding #2). - End-to-end via stop_hook: queued denial becomes ``decision=block`` + reason on the Stop return. Carry-forward ------------- - ``_remaining_maxlifetime_s`` returns None when TASK_STARTED_AT is unset — Chunk 4/5 will wire this at task launch. Tracked in §16. - ``approval_gate_count`` lives on the engine (session-scoped) not on TaskTable in v1. §13.6 notes that the reconciler + approval_gate_cap still bound worst-case across container restarts. Chunk 7+ tracks persistence when telemetry justifies it. - Denial injection emits a ``user_denial_injected`` milestone that is NOT in the §11.1 enumerated table. It mirrors ``nudge_acknowledged`` for stream visibility; keep the name distinct from the ``approval_*`` prefix so future §11.1 consumers can't confuse it with an approval outcome. * feat(cedar-hitl): TaskApprovalsTable + SlackUserMapping + status enum Lands the stateless CDK primitives for Cedar-HITL approval gates so Chunk 5's REST handlers can be wired onto concrete tables. Completes §15.2 tasks 9, 20, and 25. Constructs ---------- ``TaskApprovalsTable`` (§10.1) - PK ``task_id`` + SK ``request_id`` (ULID). Matches the agent-side primitives landed in the prior commit. - GSI ``user_id-status-index`` with user_id PK + status SK and an ``INCLUDE`` projection limited to the fields GET /v1/pending renders. Three deny-sensitive attrs (``deny_reason``, ``scope``, ``tool_input_sha256``) deliberately omitted from the projection — the list endpoint only returns PENDING rows in practice, but excluding them kills the projection-leak concern outright and costs no bytes today. - Exports ``USER_STATUS_INDEX_NAME`` as a module constant + mirrors it on ``construct.userStatusIndexName`` so handlers referencing the GSI fail compile-time on a rename. - TTL attribute ``ttl`` (agent writes ``created_at + timeout_s + 120s``). - No DynamoDB streams per §11.2. TaskEventsTable carries the audit fan-out; streams here would duplicate. - Default RemovalPolicy.DESTROY to match the rest of the sample. Production deploys override to RETAIN per §10.1. ``SlackUserMappingTable`` (§11.2, finding #4) - Single-key (``slack_user_id`` PK). No SK, no TTL, no GSI, no stream. The forward-only shape is the trust boundary — a reverse GSI (Cognito → Slack) would let a compromised Cognito sub enumerate Slack identities without adding v1 capability. - Writes land through LinkSlackUserFn (Chunk 5) which enforces the ``attribute_not_exists(slack_user_id)`` condition so a prior legitimate mapping cannot be overwritten by a later compromise. ``task-status.ts`` — AWAITING_APPROVAL (§10.3) - Added to TaskStatus enum + ACTIVE_STATUSES (NOT TERMINAL_STATUSES: the task is alive, paused on a human decision). - VALID_TRANSITIONS wires the five edges §10.3 enumerates: RUNNING → AWAITING_APPROVAL (soft-deny entry) HYDRATING → AWAITING_APPROVAL (rare early-gate case) AWAITING_APPROVAL → RUNNING (approve / deny resume) AWAITING_APPROVAL → CANCELLED (user cancel mid-approval) AWAITING_APPROVAL → FAILED (stranded-approval reconciler) - Notably NOT added: AWAITING_APPROVAL → FINALIZING (approve-during-cleanup race) AWAITING_APPROVAL → COMPLETED (skip RUNNING) AWAITING_APPROVAL → TIMED_OUT (timer lives on the approval row, not the task clock) These are regression tests so a future refactor cannot quietly add them and bypass the `awaiting_approval_request_id = :rid` invariant. Tests: +29 total. - TaskApprovalsTable (11 tests): PK/SK schema, PAY_PER_REQUEST, PITR default + override, TTL attribute, NO streams, GSI schema + projection + sensitive-attr exclusion, removal policy default + override, ``USER_STATUS_INDEX_NAME`` constant parity with the construct field. - SlackUserMappingTable (8 tests): single-key schema (explicit KeySchema length assertion), PAY_PER_REQUEST, PITR, no streams, no reverse GSI, DESTROY default, TTL absent. - TaskStatus (+10 tests over existing: 5 new assertions on the 9-state cardinality, AWAITING_APPROVAL membership, and the transition graph including the three forbidden edges). The existing assertions updated for the new state count. No stack wiring yet — ``agent.ts`` instantiation + env var plumbing + grants land in the next commit alongside the Cedar-WASM Lambda layer. * feat(cedar-hitl): Cedar-wasm layer + wire approval tables into agent stack Activates the agent-side approval path and ships the Lambda layer Chunk 5's REST handlers need. Cedar-wasm Lambda layer (§15.2 task 10) ---------------------------------------- ``CedarWasmLayer`` bundles ``@cedar-policy/cedar-wasm@4.10.0`` into ``/opt/nodejs/node_modules/`` so Lambdas can ``require('@cedar-policy/cedar-wasm/nodejs')`` without shipping the 4 MB wasm binary in every function package. A dedicated ``cdk/layers/cedar-wasm/`` directory carries a minimal ``package.json`` pinning the exact version — bundling runs ``npm install --omit=dev`` against that manifest, so the layer build is hermetic from any ``cdk/node_modules/`` drift. The bundler has two fallbacks: - Docker (``public.ecr.aws/sam/build-nodejs22.x``) for CI / prod deploys. - Local-npm fallback for environments without Docker (unit-test synths + `cdk synth` on runners that lack Docker). The local path is safe here because the layer ships pure JS + a prebuilt wasm binary — no native build step. Three constants exposed from the module: - ``CEDAR_WASM_VERSION`` — single source of truth for the pinned version; tests assert this matches both ``cdk/package.json`` and the layer manifest, so the three places the version lives stay in sync. - ``CEDAR_WASM_MIN_LAMBDA_MEMORY_MB`` — 512 MB floor for attaching Lambdas per §15.2 task 10. - ``CedarWasmLayer.layer`` — the underlying ``LayerVersion`` for Chunk 5 handlers to attach via ``fn.addLayers(...)``. Agent stack wiring (§15.2 task 19) ------------------------------------ ``agent.ts`` now instantiates: - ``TaskApprovalsTable`` (prior commit) — grants RW to the runtime so ``pre_tool_use_hook`` can TransactWriteItems + ConsistentRead the PENDING row. - ``SlackUserMappingTable`` (prior commit) — not granted to the runtime; only the link-user Lambda (Chunk 5) writes here. - ``CedarWasmLayer`` — the layer's asset lands in the synthed template so Chunk 5 handlers can reference ``.layer`` without causing a new asset on their deploy. New runtime env vars: - ``TASK_APPROVALS_TABLE_NAME`` — consumed by ``task_state._require_tables``; its absence previously raised ``ApprovalTablesUnavailable`` → hook DENY. Now set, so the approval path is live on deploy. - ``AGENTCORE_MAX_LIFETIME_S = '28800'`` — 8 hours, matching ``lifecycleConfiguration.maxLifetime``. Consumed by the hook's ``_remaining_maxlifetime_s`` for the maxLifetime ceiling clip (§6.5). Kept in sync with the lifecycle via a direct test assertion so drift surfaces at build time. New CfnOutputs: ``TaskApprovalsTableName``, ``SlackUserMappingTableName``, ``CedarWasmLayerArn``. Each is useful for post-deploy smoke tests (`aws dynamodb describe-table` / `aws lambda get-layer-version`). Tests: +8 layer tests + 9 agent-stack assertions. Layer: - LayerVersion resource count. - Compatible runtimes (nodejs20/22). - Description carries the pinned version. - CEDAR_WASM_VERSION matches ``cdk/package.json``. - CEDAR_WASM_VERSION matches ``layers/cedar-wasm/package.json``. - CEDAR_WASM_MIN_LAMBDA_MEMORY_MB ≥ 512. - Custom description override works. - ``.layer`` exposes a real ``LayerVersion``. Agent stack: - Table count updated from 6 → 8. - TaskApprovalsTable schema match (task_id PK / request_id SK, user_id-status-index GSI presence). - SlackUserMappingTable single-key schema. - LayerVersion count + compatibleRuntimes. - Three new CfnOutputs present. - TASK_APPROVALS_TABLE_NAME env var on the runtime. - AGENTCORE_MAX_LIFETIME_S == '28800' (drift guard). Carry-forward ------------- - ``TASK_STARTED_AT`` is the other input the hook's ``_remaining_maxlifetime_s`` consumes — it's a PER-TASK value the orchestrator must stamp at invocation time, not a stack-level env var. Chunk 5's orchestrator changes need to add it to the runtime invocation payload / session env. For now the hook's fallback ("unknown, don't clip") keeps approvals functional. - Chunk 5 will attach the CedarWasmLayer onto ApproveTaskFn, DenyTaskFn, GetPoliciesFn, CreateTaskFn and assert ``memorySize >= CEDAR_WASM_MIN_LAMBDA_MEMORY_MB`` for each. * feat(cedar-hitl): approve + deny handlers + shared types (§7.1, §7.2) Lands the two user-facing REST handlers that flip a PENDING approval row to APPROVED / DENIED, the shared types both call sites and the CLI consume, and the Lambda-side Cedar parser future Chunk-5 handlers (get-policies, create-task validation) will use. Wire types (shared/types.ts) ---------------------------- - ApprovalScope union covering every shape the agent's ApprovalAllowlist understands. Typed so approve-task / create-task / CLI (Chunk 6) all agree at compile time. - ApprovalRecord / ApprovalRequest / ApprovalResponse / DenyRequest / DenyResponse / PendingApprovalSummary / GetPendingResponse / PolicyRuleSummary / GetPoliciesResponse / LinkSlackUserRequest / LinkSlackUserResponse / SlackUserMappingRecord / ApprovalDecisionRecordedEvent / CreateTaskApprovalExtensions. - Constants: DENY_REASON_MAX_LENGTH=2000, INITIAL_APPROVALS_MAX_ENTRIES=20, INITIAL_APPROVALS_MAX_ENTRY_LENGTH=128, APPROVAL_TIMEOUT_S_MIN=30, APPROVAL_TIMEOUT_S_MAX=3600, APPROVAL_TIMEOUT_S_DEFAULT=300. New error codes: REQUEST_NOT_FOUND, REQUEST_ALREADY_DECIDED, TASK_NOT_AWAITING_APPROVAL. Shared helpers -------------- - shared/approval-scope.ts — parseApprovalScope validates every shape; rejects unknown tool types / groups / prefixes, empty values, over-128-char strings. isDegeneratePattern implements §7.4 (length ≤ 2, all-wildcard, wildcard ratio > 50%) for Chunk-5 create-task. - shared/deny-reason-scanner.ts — scanDenyReason redacts AWS keys, GitHub PATs (classic + fine-grained), Slack tokens, PEM blocks, bearer tokens with [REDACTED-...] markers. Mirrors agent/src/output_scanner.py so the deny reason the agent ultimately reads is never raw user input. - shared/cedar-policy.ts — parseRules pulls the five HITL annotations (tier/rule_id/severity/approval_timeout_s/category) into a ParsedRule[], preserving positional policy_id for IMPL-29 diagnostics-to-rule_id resolution. isHardDenyRule, isValidRuleId, matchingRuleIds, concatPolicies exposed for future handlers. Handlers -------- - approve-task.ts (§7.1) — POST /v1/tasks/{task_id}/approve - Cross-table TransactWriteItems: approval row PENDING → APPROVED guarded by user_id = :caller AND status = :pending; TaskTable no-op Update guarded by status = AWAITING_APPROVAL AND awaiting_approval_request_id = :rid. - TransactionCanceledException classified by per-item CancellationReasons. Approval-row failure collapses to 404 REQUEST_NOT_FOUND (no existence oracle per §7.1 finding #6); task-row failure → 409 TASK_NOT_AWAITING_APPROVAL. - Optional scope defaults to this_call. - Per-user per-minute rate limit (30/min, synthetic row). - Writes approval_decision_recorded audit event (IMPL-6). Audit failure is logged but does not fail the request — decision is already committed. - deny-task.ts (§7.2) — POST /v1/tasks/{task_id}/deny - Same cross-table pattern; status → DENIED + deny_reason. - Reason is scanDenyReason-sanitized + truncated to DENY_REASON_MAX_LENGTH BEFORE any persistence — agent and audit both read sanitized form; raw input never stored. - Same rate-limit namespace as approve. Tests: +64 total (cedar-policy-parser 24, approval-scope 28, deny-reason-scanner 13, approve-task 14, deny-task 9). Secret test fixtures are assembled from string fragments so the source never holds a contiguous secret literal — Code Defender pre-commit hook otherwise blocks. Stack wiring (task-api.ts routes, agent.ts layer attachment, CreateTaskFn extension, orchestrator + reconciler + fanout + LinkSlackUserFn + GetPolicies + GetPending) lands in the next commit. * feat(cedar-hitl): get-pending + get-policies + link-slack-user handlers Lands the three read/discovery handlers Chunk 6 (CLI) needs to power ``bgagent pending``, ``bgagent policies list/show``, and ``bgagent notifications configure slack``. Completes §15.2 tasks 14, 15, and 25 (handler side). Handlers -------- ``get-pending.ts`` (§7.7 — GET /v1/pending) - Queries ``user_id-status-index`` GSI on TaskApprovalsTable with ``user_id = :caller AND status = :pending``. Without the GSI this would be a full-table Scan per call — under ``watch -n1 bgagent pending`` that exhausts burst capacity for the whole fleet (§10.1 finding #8). - Response maps each row to ``PendingApprovalSummary`` with a derived ``expires_at = created_at + timeout_s`` so the CLI can render time-to-timeout without doing arithmetic on ISO strings. - Severity coerced to ``medium`` on unknown values so GSI writes that drift from the enum don't break the list response. - Rate-limited 10/min/user (synthetic row on the same table, namespaced ``RATE#<user>#PENDING`` so it does not collide with the approve/deny counter). ``get-policies.ts`` (§7.6 — GET /v1/repos/{repo_id}/policies) - Combines ``BUILTIN_HARD_DENY_POLICIES`` + ``BUILTIN_SOFT_DENY_POLICIES`` with the repo's ``cedar_policies`` blueprint override. Runs the combined text through ``parseRules`` and returns ``{hard[], soft[]}`` rule summaries. - 5-minute per-repo in-Lambda cache; cold starts throw it away. ``_resetCacheForTests`` exposed for unit-test isolation. - Repo ID is URL-decoded from the path (``owner%2Frepo`` common in CLI UX). - Rate-limited 30/min/user. - Blueprint load failure falls back to built-ins with a WARN log; invalid blueprint cedar text returns 503 ``SERVICE_UNAVAILABLE`` rather than a misleading empty list. ``link-slack-user.ts`` (§11.2 finding #4 — POST /v1/notifications/slack/link) - Writes to SlackUserMappingTable with ``ConditionExpression: attribute_not_exists(slack_user_id)``. This guard is the entire admission control the §11.2 design hinges on: even a compromised Slack admin cannot overwrite an existing mapping. - Validates ``slack_user_id`` shape (letters, digits, underscores, 2–40 chars) so junk rows cannot land. - Conflict surface is 409 ``REQUEST_ALREADY_DECIDED`` — reused error code (the payload message directs the user to unlink via support). - Slack link_token end-to-end validation against Slack OAuth is deferred — v1 accepts the token on trust from the Cognito-authed caller; it is persisted in CloudWatch for audit. Supporting primitives --------------------- ``shared/builtin-policies.ts`` — mirrors ``agent/policies/hard_deny.cedar`` and ``agent/policies/soft_deny.cedar`` as TypeScript string constants. Embedded rather than read from disk because Lambda's esbuild bundler does not copy non-TS assets by default and a dedicated bundling hook is more code than the embed. A drift test (``builtin-policies.test.ts``) asserts byte-equality with the agent files so any change on one side without the other flips red at build time. ``shared/cedar-policy.ts`` — ``parseRules`` now skips the unannotated ``base_permit`` entry (both tiers need it as a Cedar catch-all; it is not a user-facing rule so it stays out of ParsedRule[]). This matches the agent-side ``_parse_policy_annotations`` behaviour. Tests: +37 total. - get-pending (8): 401 on missing auth, 429 on rate limit, empty result, GSI query shape, row → PendingApprovalSummary with derived expires_at, severity fallback, missing timeout → expires_at falls back to created_at, 500 on DDB error. - get-policies (11): 401/400 validation, built-in rules listed on empty repo, URL-decoded repo path, custom blueprint rule lands in soft, per-repo cache across calls, 429 rate limit, 503 on invalid blueprint cedar, fallback on load failure, hard rules omit severity / approval_timeout_s, soft rules carry them. - link-slack-user (8): 401/400 validation, shape check, 201 on success, 409 on overwrite attempt, 500 on unknown DDB error, whitespace trim on slack_user_id, ConditionExpression verified. - builtin-policies (4): drift byte-equality with both agent files, parseRules round-trip for hard/soft rule IDs. - cedar-policy (updated): ``base_permit`` is skipped from ParsedRule[] rather than rejected. Stack wiring (task-api.ts routes, agent.ts layer attachment, CreateTaskFn extension, orchestrator + reconciler + fanout) lands in the next commit. * feat(cedar-hitl): wire Chunk 5 routes + orchestrator + reconciler + agent plumbing Completes Chunk 5 end-to-end: the five new Lambdas are instantiated and wired onto the REST API, the orchestrator threads approval-related data through to the agent runtime, the stranded-task reconciler sweeps AWAITING_APPROVAL tasks, and the agent pipeline accepts the new per-task approval configuration. Stack wiring (agent.ts + task-api.ts) ------------------------------------- - TaskApi construct accepts `taskApprovalsTable`, `slackUserMappingTable`, `cedarWasmLayer` props. Approve/Deny/GetPending Lambdas are created when the approvals table is present; GetPolicies also requires the cedar-wasm layer + RepoTable. Slack-link Lambda attaches when the slack mapping table is provided. - New routes: POST /tasks/{task_id}/approve POST /tasks/{task_id}/deny GET /pending GET /repos/{repo_id}/policies POST /notifications/slack/link - GetPoliciesFn configures `memorySize: 512` (Cedar-wasm floor from §15.2 task 10) and externalizes `@cedar-policy/cedar-wasm` from the esbuild bundle so the layer provides the wasm binary at runtime. - CedarWasmLayer compatibleRuntimes extended to include nodejs24.x (the Lambda runtime) — the Node 20/22 list was the original §15.2 spec but the actual function uses Node 24. - agent.ts passes all three new constructs into TaskApi. Orchestrator (shared/orchestrator.ts) ------------------------------------- - `finalizeTask` now treats AWAITING_APPROVAL as a "task still alive" terminal-timeout source: on poll exhaustion the task transitions to TIMED_OUT with a distinct `approval_poll_timeout` reason + error message ("Orchestrator poll timeout exceeded while awaiting approval"). The stranded-approval reconciler is the secondary safety net (§13.6) for tasks the orchestrator already lost track of. - Invocation payload now carries three new fields: - `task_started_at` (ISO 8601 at HYDRATING → RUNNING time) — consumed by the agent hook's `_remaining_maxlifetime_s` so the §6.5 maxLifetime ceiling math uses the real task clock instead of the fail-open fallback. - `approval_timeout_s` (when the submit payload supplied it). - `initial_approvals` (when the submit payload supplied entries). Stranded-task reconciler ------------------------ - Sweeps AWAITING_APPROVAL in addition to SUBMITTED/HYDRATING. - New `APPROVAL_STRANDED_TIMEOUT_SECONDS` env var (default 7200s = 2h) — double §7.3's 1h ceiling so this reconciler never races the happy-path timer. - Distinct failure message on approval-stranded vs generic-stranded so users see "approval stranded — container evicted" rather than the misleading "no pipeline attached" copy. Fanout (handlers/fanout-task-events.ts) --------------------------------------- - Slack channel default set replaces the forward-compat `approval_required` stub with the real §11.1 events: `approval_requested` and `approval_stranded`. Other approval milestones (granted/denied/timed_out/late_win/etc.) stay out of default routing to avoid notification fatigue — the CLI surfaces those confirmations directly. - Email default replaces `approval_required` with `approval_requested` (high-severity gates only; severity gating happens in the dispatcher). Create-task validation (shared/create-task-core.ts) --------------------------------------------------- - New request fields: - `approval_timeout_s` — integer within `[APPROVAL_TIMEOUT_S_MIN, APPROVAL_TIMEOUT_S_MAX]`. - `initial_approvals` — array of scope strings; each entry must be a valid `ApprovalScope` per `parseApprovalScope`; bash_pattern and write_path scopes get the §7.4 degenerate-pattern check. - TaskRecord extended with `approval_timeout_s`, `initial_approvals`, `approval_gate_count` (seeded to 0 at admission), and `awaiting_approval_request_id` (written atomically by the agent's `transact_write_approval_request` primitive). Agent plumbing (models.py / config.py / pipeline.py / runner.py / server.py) ---------------------------------------------------------------------------- - `TaskConfig` adds `approval_timeout_s`, `initial_approvals`. - `build_config`, `run_task`, `_run_task_background`, and `_extract_invocation_params` thread the two new fields from payload → config → PolicyEngine. - `server._extract_invocation_params` stamps `os.environ["TASK_STARTED_AT"]` from the payload so the hook's `_remaining_maxlifetime_s` returns real values (carry-forward from Chunk 3 resolved). - `runner.py` constructs PolicyEngine with `initial_approvals` + `task_default_timeout_s` when supplied; the engine clamps bad values at construction time. Tests ----- All CDK tests pass: 1219 / 1219. All agent tests pass: 648 / 648. Affected suites (changes only): - test/stacks/agent.test.ts: cedar-wasm layer CompatibleRuntimes now expects `nodejs24.x`; table count still 8. - test/constructs/cedar-wasm-layer.test.ts: same runtime expansion. - test/handlers/fanout-task-events.test.ts: approval_required → approval_requested/approval_stranded in Slack default set; approval_required → approval_requested in Email default set. - test/handlers/reconcile-stranded-tasks.test.ts: primeResponses now queue a third `Items: []` for AWAITING_APPROVAL queries; queryCalls assertion bumped to 3. Carry-forward (non-blocking) ---------------------------- - GetPoliciesFn has write access to TaskApprovalsTable (for the rate-limit counter path). A future permissions audit should tighten this to a single-item write scoped to `RATE#<user>#*`. - TASK_STARTED_AT env var is only set when a payload supplies it; server.py still supports the Phase 2 no-payload startup path. * feat(cedar-hitl): Chunk 6 CLI — approve / deny / pending / policies Ships the four user-facing commands that close the Cedar HITL loop: once Chunks 1-5 have a PENDING approval row and the Slack/Email fan-out has notified the user, Chunk 6 is how they actually respond. New commands (cli/src/commands/) -------------------------------- - `bgagent approve <task-id> <request-id> [--scope <scope>] [--yes]` Default scope is `this_call`; callers extend allowlist with `tool_type:Bash`, `rule:<id>`, etc. `all_session` is the only scope that requires `--yes` to confirm — mirrors the safety UX from §8.4. Error classification maps 404 → "run `bgagent pending`", 409 → "task no longer awaiting approval", 429 → rate-limit, 401 → login. - `bgagent deny <task-id> <request-id> [--reason ... | --reason-file ...]` `--reason-file` accepts multi-line reasons that would otherwise need shell quoting. Client-side `DENY_REASON_MAX_LENGTH` cap avoids a round-trip on obviously-too-long reasons; the server still truncates. Reason is sanitized server-side (output_scanner) before ever reaching the agent. - `bgagent pending [--output text|json]` Lists every PENDING approval owned by the caller. Rendered with approve/deny hints inline so the user can copy-paste the next command. JSON output for scripting. Rate-limited server-side. - `bgagent policies list --repo <owner/repo> [--tier hard|soft]` `bgagent policies show --repo <owner/repo> --rule <rule_id>` Discovery commands so users can find rule IDs without reading CDK source. Both subcommands reuse a single `listPolicies` API call and filter locally. Wire changes ------------ - `cli/src/api-client.ts`: `approveTask`, `denyTask`, `listPending`, `listPolicies` — each matching the §7.1 / §7.2 / §7.6 / §7.7 request/response shapes. `approveTask` omits the `scope` body field when unset so the server's `this_call` default applies. - `cli/src/types.ts`: mirrors the Chunk 5 server types verbatim — `ApprovalScope` union, `ApprovalRequest/Response`, `DenyRequest/Response`, `PendingApprovalSummary`, `GetPoliciesResponse`, `PolicyRuleSummary`, plus the five constants (`DENY_REASON_MAX_LENGTH`, `INITIAL_APPROVALS_MAX_ENTRIES`, `INITIAL_APPROVALS_MAX_ENTRY_LENGTH`, `APPROVAL_TIMEOUT_S_MIN/MAX/DEFAULT`). - `cli/src/bin/bgagent.ts`: registers the four new commands in the order they appear in help output. Tests: +27 new (217 total). - approve (9): default scope, custom scope, all_session guard + `--yes` bypass, JSON output, 404/409/401/429 error classifications. - deny (6): no-reason path, `--reason`, `--reason-file` with tmpdir fixture, mutually-exclusive rejection, over-length rejection, 404 classification. - pending (5): empty render, populated render with approve/deny hints, JSON output, 401 and 429 classifications. - policies (7): list both tiers, `--tier` filter, `--output json`, bad `--tier`, show found rule, show unknown rule, 404 repo-not-onboarded classification. Carry-forward ------------- - `submit.ts` extension with `--approval-timeout` / `--pre-approve` flags is deferred to a follow-up commit — the server already accepts these fields on POST /v1/tasks (Chunk 5), and `bgagent submit` already forwards unknown payload fields through the existing request path, so users can set them via `--body-file` today until the explicit flags land. - `--output json` on error branches currently returns a CliError instead of a JSON error envelope; matches the pattern the existing commands use (status, cancel, nudge). Follow-up to standardize JSON error envelopes across the whole CLI if that becomes a common scripting pain point. * feat(cedar-hitl): Chunk 7a — persist gate counter + IMPL-23 cache observability Persist approval_gate_count to TaskTable across container restarts per §13.6 so the cumulative gate budget survives eviction. Emit pre_approvals_loaded after PolicyEngine init per §4 step 7 / §11.1 so operators see the starting approval posture in the live SSE stream. Add IMPL-23 cache-hit observability: cache hits attach metadata to PolicyDecision, hook forwards to new write_policy_decision_cached progress helper (decision_source="recent_decision_cache"). Why: container restarts were silently resetting the per-task gate counter, re-exposing users to another approvalGateCap-worth of gates per restart. Cache-driven denies were invisible in TaskEventsTable beyond the initial gate. Fresh tasks emitted no "starting posture" signal so dashboards could not distinguish "no pre-approvals seeded" from "agent has not started". Surface additions: - task_state.increment_approval_gate_count_in_ddb — best-effort atomic ADD on approval_gate_count - PolicyEngine(initial_approval_gate_count=N) — seed session counter - TaskConfig.initial_approval_gate_count — orchestrator payload field - progress_writer.write_policy_decision_cached — IMPL-23 emitter - PolicyDecision.cache_hit_metadata — observability-only field - _CachedDecision.original_decision_ts — wall-clock preservation - runner._initialize_policy_engine_and_hooks — extracted helper Counter survival is a safety bound, not correctness: DDB failure does NOT block the gate (§13.6). Joint-update invariant on status + awaiting_approval_request_id (§10.2) is preserved — counter uses separate UpdateItem, not merged into resume transaction. Tests: +36 agent (648→684), +8 CDK (1219→1227), +6 new runner tests. * feat(cedar-hitl): Chunk 7b — persist approval_gate_cap from blueprint Capture the per-task approval-gate cap at submit-time (§4 step 5, decision #13, §13.6) so a blueprint-configured override is frozen onto the TaskRecord. Mid-task blueprint edits cannot shift the cap beneath a running task; container restarts re-seed the agent's PolicyEngine from the persisted value instead of its compile-time default-50. Why: Chunk 7a added approval_gate_count persistence but the cap itself was still resolved from the blueprint on every restart — so an operator lowering security.approvalGateCap mid-task would retroactively fail-close the running task. The design has always said cap is frozen at submit; this chunk makes the implementation match. Surface additions: - BlueprintProps.security.approvalGateCap (CDK, synth-validated [1, 500] integer) — new per-repo blueprint prop - RepoConfig.approval_gate_cap + BlueprintConfig.approval_gate_cap - TaskRecord.approval_gate_cap + APPROVAL_GATE_CAP_{MIN,MAX,DEFAULT} - create-task-core now calls loadRepoConfig, resolves cap, bounds- checks, persists; returns 503 SERVICE_UNAVAILABLE on invalid blueprint data (permanent until admin re-deploys, not transient) - orchestrator.ts: isValidApprovalGateCap integer+bounds guard; logs warn if a persisted cap is structurally invalid (schema drift / hand-edited DDB row) - TaskConfig.approval_gate_cap: int | None = None (agent-side); runner threads to PolicyEngine kwarg when not None - "Task created" log line now carries approval_gate_cap + approval_gate_cap_source ("blueprint" | "platform_default") so operators can detect a broken-plumbing deploy at the single chokepoint where all fallback layers converge Per silent-failure review: - HIGH: 500 → 503 + logger.error for permanent misconfig - HIGH: cap + source in task-created log (catches 4-layer cascade) - MEDIUM: orchestrator guard tightened past typeof (NaN, Infinity, floats, out-of-bounds all omitted + warned) Tests: CDK 1263/1263 (+36), agent 694/694 (+10). CLI unchanged. * feat(cedar-hitl): Chunk 7c — observability wrap-up for resolved cap + warn path Close three deferred items from Chunks 7a/7b before Chunks 8-10: - runner.py init log now carries approval_gate_cap=N + approval_gate_cap_source=threaded|engine_default. Matches the handler log key so CloudWatch Insights can join across the cascade; agent can't distinguish blueprint-override from platform-default-frozen (handler log is the ground truth). - server.py adds _warn_cw helper routing [server/warn] lines to a dedicated CloudWatch stream (server_warn/<task_id>). stdout print is preserved for local dev + existing capsys tests. AgentCore does not forward container stdout to APPLICATION_LOGS, so pre-7c warnings about malformed invocation payloads were invisible in production. Failure counter shared with _debug_cw for a single alarm surface; hoisted above writer defs for import-time ordering safety. - blueprint.ts emits a synth-time info annotation when security.approvalGateCap is omitted so operators see a signal that the repo will rely on the platform default of 50. Without this, the default was a silent fallback at the handler layer — only visible by inspecting a TaskRecord at runtime. Tests: agent 694→700 (+6), cdk 1263→1265 (+2), cli unchanged. Design refs: §4 step 5, §11.1, §13.6, decision #13. * feat(cedar-hitl): Chunk 8a — extend approval outcome event schema Add created_at / effective_timeout_s / matching_rule_ids to approval_granted / approval_denied / approval_timed_out events so the incoming ApprovalMetricsPublisher Lambda (Chunk 8b) can compute decision latency and emit a rule_id-dimensioned timeout breakdown without a round-trip GetItem against TaskEventsTable. Fields are added conditionally — omitted from metadata when the caller did not supply them — so the event stream stays free of null-value noise and legacy callers continue to produce valid payloads. Publisher handles missing fields via explicit skip-and-log on the specific metric branch (not fallback-to-zero). Agent tests extended: +6 progress_writer tests, +3 hooks tests. Baseline 700 → 710. No consumer wired yet — this commit is a forward-compatible superset; Chunk 8b ships the CDK publisher + dashboard widgets. * feat(cedar-hitl): Chunk 8b — ApprovalMetricsPublisher + native CloudWatch dashboard widgets Ship the Cedar-HITL dashboard widgets from §11.3 / IMPL-28 via the MetricsPublisher architecture (Option E): - New ApprovalMetricsPublisher Lambda consumes TaskEventsTable DDB stream as consumer #2 (FanoutConsumer is #1; stream is within its 2-consumer soft cap — documented in task-events-table.ts). - Handler emits CloudWatch EMF for 3 metrics in namespace ABCA/Cedar-HITL: * ApprovalRequestCount + ClippedApprovalCount (reason dim) → ApprovalTimeoutClipRate widget (MathExpression with IF-guard against NaN on zero-denominator periods) * TimedOutEffectiveTimeout (rule_id dim with allowlist cardinality cap) → ApprovalTimeoutBreakdown widget * ApprovalDecisionLatencyMs (outcome dim) → ApprovalDecisionLatency widget with per-outcome p50/p90/p99 - Observability-of-observability (silent-failure review): * MetricsPublisherHeartbeat per batch so dashboard gaps distinguish "no traffic" from "pipeline broken" * MetricEmitSkipped with a reason dim on schema mismatches, parse anomalies, unknown rule ids — never fall back to latency=0 or count=0 which would poison percentile widgets * Expected high-volume skip reasons (non-milestone events, REMOVE records) DO NOT emit MetricEmitSkipped — only anomaly reasons (missing keys, missing milestone name) do, so real signal isn't drowned * Structured log lines alongside every skip so the absence of metrics is also observable via CloudWatch Logs Insights - Cardinality caps via ``RULE_ID_ALLOWLIST`` + ``normalizeClipReason``. Unknown values collapse to ``other`` / ``unknown`` buckets with dashboard series so the collapse is discoverable rather than silently accruing custom-metric cost. - Event-source-mapping filter pattern rejects non-agent_milestone records at the service layer; handler-layer allowlist catches anything that slips through. Filter pattern correctness tested structurally + positively/negatively probed (silent-failure H3). - Per-record try/catch + reportBatchItemFailures + SQS DLQ mirror the fanout-task-events.ts poison-pill pattern exactly. Deferred to Chunk 10 chore issues: - DLQ alarms (fanout + publisher) — fire-into-void until notification channel lands, so wire with §11.5 alarms as a group - Explicit log-group declaration (IAM drift defense) - stdout-flush race documentation (pre-existing pattern in fanout) - EMF 100-updates/sec throttle alarm Tests: cdk 1265 → 1327 (+62); agent 710 (unchanged); cli 217 (unchanged). All pass. §11.5 alarm plumbing now unblocked — publisher provides the metrics infrastructure the design always intended; only the notification-channel SNS wiring is left. * docs(cedar-hitl): Chunk 9 — sync design doc to Chunks 7b / 8a / 8b Bring CEDAR_HITL_GATES.md current with the code that shipped in Chunks 7b (approval_gate_cap persist), 8a (outcome event schema superset), and 8b (ApprovalMetricsPublisher + dashboard widgets): - §10.2 adds the missing approval_gate_cap row (carry-forward drift from Chunk 7b). Bounds + frozen-at-submit semantics documented. - §11.1 outcome events (approval_granted / approval_denied / approval_timed_out) now document the Chunk 8a optional fields (created_at, effective_timeout_s, matching_rule_ids) plus the publisher's skip-on-missing-field policy. - §11.1 intro names ApprovalMetricsPublisherFn as consumer #2 and points to §11.3 for the metric schema. - §11.3 rewritten to describe the Option E architecture: publisher Lambda + EMF + native CloudWatch metrics in namespace ABCA/Cedar-HITL, MathExpression with divide-by-zero guard, rule_id cardinality cap, observability-of-observability via heartbeat + skip meta-metrics, widget layout (12/12 over 24), 2-consumer stream budget. Dropped the stale "Retired the old bundled widget" line — that widget never shipped. - §11.5 reframed as "deferred (notification-channel gated)" with a plumbing-status paragraph noting the metric infra now exists; only SNS wiring remains. Alarm list expanded to include DLQ and publisher-health alarms. - §16 IMPL-28 rewritten for Option E; §15.2 row 46 expanded to reference the 4 new test files; Appendix B checklist updated. Starlight mirror regenerated via ``cd docs && node scripts/sync-starlight.mjs``. No code changes. Test baselines unchanged. Adversarial comment-analyzer review verified every new claim against committed code — zero inaccuracies. * feat(cedar-hitl): Chunk 10 review fixes — close 2 blockers + tighten 2 mediums Full-branch adversarial review (code-reviewer + silent-failure-hunter on all 18 commits) surfaced findings that only appear at final-state. Addressing the blockers + low-cost meds before deploy: B2 — stranded approvals were invisible to the dashboard: - Reconciler writes ``event_type: 'task_stranded'``; the metrics publisher's event-source filter only accepts ``event_type: 'agent_milestone'``, so AWAITING_APPROVAL evictions produced zero §11.3 signal. - Fix: reconciler now additionally emits an ``agent_milestone`` with ``milestone: 'approval_stranded'`` when the stranded task was AWAITING_APPROVAL. Publisher allowlist extended; classifier emits ``ApprovalStrandedCount`` counter. SUBMITTED / HYDRATING stranded events unchanged (guarded by test). B1 — heartbeat comment was false reassurance: - Event-source filter blocks Lambda invocation when no ``agent_milestone`` records exist in the poll window, so a quiet period produces the same widget gap as a broken pipeline. The code + design-doc wording claimed "gap = pipeline broken" which would mislead the on-call. - Fix: corrected module + function docstrings to describe the heartbeat as "present when active, not pipeline-alive-always." Operators should alarm on the combination (heartbeat-absent + recent TaskEventsTable traffic) or wire a scheduled canary — the latter tracked as a §11.5 follow-up. M1 — safety-critical milestones produced zero dashboard signal: - ``approval_cap_exceeded`` (§12.9 per-task cap) and ``approval_rate_limit_exceeded`` (per-user per-minute rate) were emitted by the agent but not on the publisher allowlist. A production bug where every gate hit the cap would have been invisible. - Fix: both added to APPROVAL_METRIC_MILESTONES with ``ApprovalCapExceededCount`` / ``ApprovalRateLimitExceededCount`` counters. No dimensions — the request_id in the event carries per-user correlation for ad-hoc log-insights investigation. H2 — filter / handler eventName disagreement: - Event-source filter required ``INSERT``; handler accepted ``INSERT`` and ``MODIFY``. Benign today (TaskEventsTable is put-only), but a future chunk MODIFY-ing records would be silently dropped by the filter while the handler was ready to process them. - Fix: handler now INSERT-only, matching the filter. Single source of truth on the eventName invariant. M1-rename — ``expected_non_approval_milestone`` skip reason was misleading (the non-metric approval milestones like ``approval_late_win`` also land in this bucket). Renamed to ``expected_milestone_not_tracked``. Tests: cdk 1327 → 1332 (+5: 3 classifier branches for new metrics, 1 reconciler AWAITING_APPROVAL path, 1 SUBMITTED-not-double-counted guard). Agent + CLI unchanged. All pre-commit hooks green; pre-push security fails only on the 3 pre-existing CVEs tracked for chore issue filing. Deferred findings from the same review (file as chore issues): - H1: agent-dies-between-TIMED_OUT-and-resume loses latency (edge, affects p99 bias) - H3: late-win APPROVED created_at staleness invariant (works today, document invariant) - H4: _warn_cw daemon-thread burst under adversarial payload - M2-M4: late-win metric, rename helpers, etc. No upstream PR filing this chunk — deploy to Sam's AWS account for integration testing first. * fix(cedar-hitl): suppress AwsSolutions-IAM5 on Runtime ExecutionRole overflow policies Synth + deploy were blocked by cdk-nag: the Cedar HITL additions (TaskApprovalsTable grant + SlackUserMappingTable + extra env vars threaded to the AgentCore runtime) pushed the runtime ExecutionRole past CDK's inline-policy size limit, so CDK auto-splits excess statements into ``OverflowPolicy1``. The overflow inherits the same wildcard ``bedrock:InvokeModel*`` / CloudWatch actions as the base policy but lives at a path (``Runtime/ExecutionRole/OverflowPolicy1/Resource``) that the existing ``addResourceSuppressions(runtime, ..., applyToChildren: true)`` cannot reach — CDK creates overflow policies lazily during synth ``prepare()``, after the construct tree has been frozen and after static suppressions have been cached. Suppress via an Aspect at MUTATING priority so the suppression is applied before cdk-nag's READONLY visitor runs. Matches any path containing ``/Runtime/ExecutionRole/OverflowPolicy`` + ending ``/Resource`` so future ``OverflowPolicy2``, etc. are covered without hardcoding indices. Verified: ``mise //cdk:synth`` now completes cleanly. ``mise //cdk:test`` still 1332/1332. * fix(cedar-hitl): E2E deploy-readiness — policies bundle + onboarding gate + CLI error visibility Three E2E T1.4 + T2.2 findings from the Chunk 10 integration-test session. Batched into one commit since all three need the same redeploy to verify: 1. agent/Dockerfile: COPY policies/ into the container image. ``PolicyEngine.__init__`` reads ``/app/policies/hard_deny.cedar`` + ``soft_deny.cedar`` at import time via ``_POLICIES_DIR = Path(__file__).parent.parent / "policies"``. The Dockerfile only copied ``src/``, so the directory was missing and every Cedar-HITL task failed at 0 turns with ``missing built-in hard-deny policies``. Introduced alongside Chunk 2 when the policy files were first added — Dockerfile was never updated. Zero tasks on this branch ever su…
1 parent 79d7d8f commit be3207a

143 files changed

Lines changed: 25105 additions & 2463 deletions

File tree

Some content is hidden

Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.

.dockerignore

Lines changed: 58 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,58 @@
1+
# Build context is repo root (see cdk/src/stacks/agent.ts) so the
2+
# Dockerfile can COPY contracts/ alongside agent/. Exclusions below
3+
# keep the context lean — without them the entire monorepo (CDK
4+
# cdk.out/, node_modules/, docs/dist/, etc.) gets uploaded on every
5+
# AgentCore deploy.
6+
7+
# CDK output (recursive include if not excluded)
8+
cdk/cdk.out/
9+
cdk/lib/
10+
cdk/node_modules/
11+
12+
# CLI and docs build artifacts
13+
cli/lib/
14+
cli/node_modules/
15+
docs/dist/
16+
docs/node_modules/
17+
docs/.astro/
18+
19+
# Shared node_modules
20+
node_modules/
21+
22+
# Agent venv and cache (rebuilt inside image via uv)
23+
agent/.venv/
24+
agent/__pycache__/
25+
agent/**/__pycache__/
26+
agent/**/*.pyc
27+
28+
# Git and tooling
29+
.git/
30+
.prek/
31+
.claude/
32+
**/.DS_Store
33+
34+
# Docs and assets not needed in image
35+
*.md
36+
*.png
37+
*.drawio
38+
*.html
39+
*.gif
40+
*.tape
41+
42+
# Worktrees + scratch
43+
abca-worktrees/
44+
.next-session-prompt.md
45+
.e2e-test-plan.md
46+
47+
# Test/coverage output
48+
coverage/
49+
**/coverage/
50+
.pytest_cache/
51+
**/.pytest_cache/
52+
53+
# IDE / OS
54+
.idea/
55+
.vscode/
56+
yarn-error.log
57+
yarn-debug.log
58+
npm-debug.log*

.pre-commit-config.yaml

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -71,6 +71,22 @@ repos:
7171
exclude: ^docs/node_modules/
7272
stages: [pre-commit]
7373

74+
- id: types-sync-cdk-cli
75+
name: type sync drift (CDK ↔ CLI)
76+
entry: bash -lc 'cd "$(git rev-parse --show-toplevel)" && mise run check:types-sync'
77+
language: system
78+
pass_filenames: false
79+
files: ^(cdk/src/handlers/shared/types\.ts$|cli/src/types\.ts$|scripts/check-types-sync\.ts$)
80+
stages: [pre-commit]
81+
82+
- id: constants-sync
83+
name: cross-language constants drift (contracts/constants.json)
84+
entry: bash -lc 'cd "$(git rev-parse --show-toplevel)" && mise run check:constants-sync'
85+
language: system
86+
pass_filenames: false
87+
files: ^(contracts/constants\.json$|agent/src/policy\.py$|cdk/src/handlers/shared/types\.ts$|cdk/src/constructs/blueprint\.ts$|scripts/check-constants-sync\.ts$)
88+
stages: [pre-commit]
89+
7490
- id: monorepo-security-pre-push
7591
name: security scans (pre-push)
7692
entry: bash -lc 'cd "$(git rev-parse --show-toplevel)" && mise run hooks:pre-push:security'

AGENTS.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -47,6 +47,7 @@ Handler entry tests: `cdk/test/handlers/orchestrate-task.test.ts`, `create-task.
4747
- **`prek install`** fails if Git **`core.hooksPath`** is set — another hook manager owns hooks; see [CONTRIBUTING.md](./CONTRIBUTING.md).
4848
- **Editing on `main` directly** — ALWAYS create a worktree with a feature branch for changes, even trivial ones. Main should stay clean; all work flows through worktree → branch → PR → merge.
4949
- **Git worktrees** — Always **`git fetch origin main`** before creating a new worktree to ensure you branch from the latest remote state. `node_modules/` and `agent/.venv/` are per-tree (not shared). Run **`mise run install`** in each new worktree before building. All CDK path references (`__dirname`-relative) and mise `config_roots` resolve correctly without extra setup.
50+
- **Bumping Cedar engines in isolation**`cedarpy` (Python, `agent/pyproject.toml`) and `@cedar-policy/cedar-wasm` (TypeScript, `cdk/package.json`) are two language bindings over the same Cedar Rust core. They MUST move together; even patch-version drift between bindings can yield divergent `(decision, matching_rule_ids)` on the same `(policy, input)` — invisible to per-side unit tests, caught (only) by `contracts/cedar-parity/` golden fixtures in CI. If you bump one engine you MUST bump the other to a tested-compatible version AND refresh the parity fixtures in the same commit. Both pins are EXACT (no `^`/`~`). See `docs/design/CEDAR_HITL_GATES.md` §15.6 (decision #23) and the parity-contract banner in `mise.toml`. **DO NOT** accept upstream's "Update branch" or auto-merge suggestions on cedarpy without verifying parity with cedar-wasm.
5051

5152
### Tech stack
5253

agent/.dockerignore

Lines changed: 0 additions & 6 deletions
This file was deleted.

agent/Dockerfile

Lines changed: 25 additions & 7 deletions
Original file line numberDiff line numberDiff line change
@@ -20,14 +20,19 @@ COPY --from=gh-builder /out/gh /usr/local/bin/gh
2020
# - build-essential (native compilation for some repos)
2121
# - curl (downloads)
2222
RUN apt-get update && \
23+
# Patch any base-image CVEs that have a fix available in the
24+
# current Debian point release. Without this, transitive system-
25+
# library CVEs (e.g. libnghttp2 CVE-2026-27135) ride the base
26+
# ``python:3.13-slim`` tag until upstream rebuilds, which can be
27+
# weeks. ``--no-install-recommends`` keeps the upgrade narrow and
28+
# reproducible — only already-installed packages get bumped.
29+
apt-get upgrade -y --no-install-recommends && \
2330
apt-get install -y --no-install-recommends \
2431
curl \
2532
git \
2633
build-essential \
2734
ca-certificates \
2835
gnupg && \
29-
# Upgrade base image's CVE-2026-27135 vulnerability
30-
apt-get upgrade -y --no-install-recommends libnghttp2-14 && \
3136
# Cleanup early to keep peak disk usage low during builds.
3237
apt-get clean && \
3338
rm -rf /var/lib/apt/lists/* /var/cache/apt/archives/*
@@ -49,15 +54,28 @@ RUN npm install -g npm@latest && \
4954
# Install uv (fast Python package manager) — pinned for reproducibility
5055
COPY --from=ghcr.io/astral-sh/uv:0.11.14 /uv /usr/local/bin/uv
5156

52-
# Install Python dependencies via uv
53-
COPY pyproject.toml uv.lock /app/
57+
# Install Python dependencies via uv. Build context is repo root (set in
58+
# ``cdk/src/stacks/agent.ts``) so source paths are prefixed with ``agent/``.
59+
COPY agent/pyproject.toml agent/uv.lock /app/
5460
RUN uv sync --frozen --no-dev --directory /app
5561

5662
# Copy agent code (ARG busts cache so file edits are always picked up)
5763
ARG CACHE_BUST=0
58-
COPY src/ /app/src/
59-
COPY prepare-commit-msg.sh /app/
60-
COPY test_sdk_smoke.py test_subprocess_threading.py /app/
64+
COPY agent/src/ /app/src/
65+
# Cedar HITL built-in policy files (hard_deny.cedar + soft_deny.cedar).
66+
# ``agent/src/policy.py::_POLICIES_DIR`` resolves to ``/app/policies``
67+
# at import time; without these files the PolicyEngine init raises
68+
# ``missing built-in hard-deny policies`` and every task fails at 0
69+
# turns before the agent even connects to the CLI. Discovered during
70+
# Chunk 10 E2E T2.2 — the Dockerfile previously only copied ``src/``.
71+
COPY agent/policies/ /app/policies/
72+
# Cross-language constants (S9). ``agent/src/policy.py`` reads
73+
# ``/app/contracts/constants.json`` at import; the same file is consumed
74+
# by ``cdk/src/handlers/shared/types.ts`` at synth time. See
75+
# ``contracts/README.md`` for the contract.
76+
COPY contracts/ /app/contracts/
77+
COPY agent/prepare-commit-msg.sh /app/
78+
COPY agent/test_sdk_smoke.py agent/test_subprocess_threading.py /app/
6179

6280
# Create non-root user (Claude Code CLI refuses bypassPermissions as root)
6381
RUN useradd -m -s /bin/bash agent && \

agent/mise.toml

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -52,8 +52,12 @@ run = "uvx bandit[toml] -c pyproject.toml -r . --severity-level=high"
5252

5353
[tasks."security:image"]
5454
description = "Scan container image with trivy"
55+
# Build context is repo root (..) so Dockerfile can COPY contracts/
56+
# alongside agent/ — matches cdk/src/stacks/agent.ts. Without -f and
57+
# the .. context, the build fails because COPY agent/... can't find
58+
# agent/ inside the agent/ directory.
5559
run = [
56-
"docker image inspect bgagent-local:latest >/dev/null 2>&1 || (ARCH=\"$(uname -m)\"; PLATFORM=\"linux/arm64\"; if [ \"$ARCH\" = \"x86_64\" ]; then PLATFORM=\"linux/amd64\"; fi; docker build --build-arg TARGETPLATFORM=\"$PLATFORM\" --build-arg CACHE_BUST=\"$(date +%s)\" -t bgagent-local:latest .)",
60+
"docker image inspect bgagent-local:latest >/dev/null 2>&1 || (ARCH=\"$(uname -m)\"; PLATFORM=\"linux/arm64\"; if [ \"$ARCH\" = \"x86_64\" ]; then PLATFORM=\"linux/amd64\"; fi; docker build --build-arg TARGETPLATFORM=\"$PLATFORM\" --build-arg CACHE_BUST=\"$(date +%s)\" -f Dockerfile -t bgagent-local:latest ..)",
5761
"trivy image --scanners vuln --ignore-unfixed --ignorefile .trivyignore --severity HIGH,CRITICAL --exit-code 1 bgagent-local:latest",
5862
]
5963

agent/policies/hard_deny.cedar

Lines changed: 59 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,59 @@
1+
// Built-in hard-deny policy set for Cedar HITL engine.
2+
//
3+
// Hard-deny is ABSOLUTE: no --pre-approve scope and no blueprint `disable:`
4+
// directive can bypass these rules. See docs/design/CEDAR_HITL_GATES.md
5+
// §12.5 and decision #8.
6+
//
7+
// Every rule in this file MUST carry @tier("hard") + @rule_id annotations.
8+
// Adding a rule here expands the set of categorically-forbidden agent
9+
// actions; removing a rule requires a security review.
10+
11+
// Base catch-all permit. Specific forbid rules below override.
12+
@rule_id("base_permit")
13+
permit (principal, action, resource);
14+
15+
// pr_review tasks may never invoke Write. Absolute; cannot be overridden
16+
// by per-blueprint customization or --pre-approve.
17+
@tier("hard")
18+
@rule_id("pr_review_forbid_write")
19+
forbid (
20+
principal == Agent::TaskAgent::"pr_review",
21+
action == Agent::Action::"invoke_tool",
22+
resource == Agent::Tool::"Write"
23+
);
24+
25+
// pr_review tasks may never invoke Edit.
26+
@tier("hard")
27+
@rule_id("pr_review_forbid_edit")
28+
forbid (
29+
principal == Agent::TaskAgent::"pr_review",
30+
action == Agent::Action::"invoke_tool",
31+
resource == Agent::Tool::"Edit"
32+
);
33+
34+
// Reject `rm -rf /` and similar absolute-root destructive commands.
35+
@tier("hard")
36+
@rule_id("rm_slash")
37+
forbid (principal, action == Agent::Action::"execute_bash", resource)
38+
when { context.command like "*rm -rf /*" };
39+
40+
// Reject writes into `.git/` at the repo root (breaks local git state).
41+
@tier("hard")
42+
@rule_id("write_git_internals")
43+
forbid (principal, action == Agent::Action::"write_file", resource)
44+
when { context.file_path like ".git/*" };
45+
46+
// Reject writes into nested `.git/` directories (submodules, worktrees).
47+
@tier("hard")
48+
@rule_id("write_git_internals_nested")
49+
forbid (principal, action == Agent::Action::"write_file", resource)
50+
when { context.file_path like "*/.git/*" };
51+
52+
// Reject any SQL DROP TABLE through Bash — agents should not be running
53+
// destructive DDL against production or dev databases without a human
54+
// in the loop. Hard-deny because even "just testing locally" is a common
55+
// vector for data loss (wrong DB connected via saved credentials).
56+
@tier("hard")
57+
@rule_id("drop_table")
58+
forbid (principal, action == Agent::Action::"execute_bash", resource)
59+
when { context.command like "*DROP TABLE*" };

agent/policies/soft_deny.cedar

Lines changed: 84 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,84 @@
1+
// Base catch-all permit. Without it, cedarpy's default-deny would turn
2+
// every non-matching Cedar evaluation on this tier into a DENY decision,
3+
// making the soft tier indistinguishable from hard-deny. With it, Cedar
4+
// returns ALLOW (no matching forbid) and our engine's STEP 3 sees only
5+
// the genuine forbid hits as REQUIRE_APPROVAL.
6+
@rule_id("base_permit")
7+
permit (principal, action, resource);
8+
9+
// Built-in soft-deny policy set for Cedar HITL engine.
10+
//
11+
// Soft-deny is the HUMAN-IN-THE-LOOP surface: matching rules pause the
12+
// tool call, write an approval request to DynamoDB, and await a human
13+
// response via `bgagent approve` / `bgagent deny`. See
14+
// docs/design/CEDAR_HITL_GATES.md §§2, 6, 15.4.
15+
//
16+
// Every rule in this file MUST carry:
17+
// @tier("soft")
18+
// @rule_id("...") — stable ID for --pre-approve rule:X
19+
// @approval_timeout_s — integer seconds >= 30 (<120 emits WARN per IMPL-25)
20+
// @severity — "low" | "medium" | "high"
21+
// @category — optional free-form UX grouping
22+
//
23+
// Blueprints may OPT OUT of specific rules here via
24+
// `security.cedarPolicies.disable: [rule_id]`. They may NOT disable any
25+
// rule in hard_deny.cedar (blueprint loader rejects those at task start).
26+
27+
// Gate any git --force / -f push. 300s default approval window, medium severity.
28+
// Covers both long-form (--force) and short-form (-f) variants, including
29+
// the bare `git push -f` invocation with no branch argument.
30+
@tier("soft")
31+
@rule_id("force_push_any")
32+
@approval_timeout_s("300")
33+
@severity("medium")
34+
@category("destructive")
35+
forbid (principal, action == Agent::Action::"execute_bash", resource)
36+
when { context.command like "*git push --force*"
37+
|| context.command like "*git push -f *"
38+
|| context.command like "*git push -f" };
39+
40+
// Force-push to main/prod specifically — longer window, higher severity.
41+
// Multi-match with force_push_any is expected: the engine's annotation
42+
// merging picks min(300, 600)=300s and max(medium, high)=high.
43+
@tier("soft")
44+
@rule_id("force_push_main")
45+
@approval_timeout_s("600")
46+
@severity("high")
47+
@category("destructive")
48+
forbid (principal, action == Agent::Action::"execute_bash", resource)
49+
when { context.command like "*git push --force origin main*"
50+
|| context.command like "*git push --force origin prod*"
51+
|| context.command like "*git push -f origin main*"
52+
|| context.command like "*git push -f origin prod*" };
53+
54+
// Non-force pushes to protected branches — catches the case where an
55+
// agent bypasses PR workflow by pushing directly.
56+
@tier("soft")
57+
@rule_id("push_to_protected_branch")
58+
@approval_timeout_s("300")
59+
@severity("medium")
60+
@category("destructive")
61+
forbid (principal, action == Agent::Action::"execute_bash", resource)
62+
when { context.command like "*git push origin main*"
63+
|| context.command like "*git push origin master*"
64+
|| context.command like "*git push origin prod*"
65+
|| context.command like "*git push origin release/*" };
66+
67+
// Writes to `.env` files typically contain secrets. 600s window, high severity.
68+
@tier("soft")
69+
@rule_id("write_env_files")
70+
@approval_timeout_s("600")
71+
@severity("high")
72+
@category("filesystem")
73+
forbid (principal, action == Agent::Action::"write_file", resource)
74+
when { context.file_path like "*.env" };
75+
76+
// Writes to any path containing "credentials" — SSH keys, AWS creds,
77+
// service-account JSON, etc. 300s window, high severity.
78+
@tier("soft")
79+
@rule_id("write_credentials")
80+
@approval_timeout_s("300")
81+
@severity("high")
82+
@category("auth")
83+
forbid (principal, action == Agent::Action::"write_file", resource)
84+
when { context.file_path like "*credentials*" };

agent/pyproject.toml

Lines changed: 12 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -11,7 +11,18 @@ dependencies = [
1111
"uvicorn==0.47.0", #https://pypi.org/project/uvicorn/
1212
"aws-opentelemetry-distro==0.17.0", #https://pypi.org/project/aws-opentelemetry-distro/
1313
"mcp==1.27.1", #https://pypi.org/project/mcp/
14-
"cedarpy==4.8.3", #https://github.com/k9securityio/cedar-py
14+
# CEDAR ENGINE PARITY — DO NOT BUMP IN ISOLATION.
15+
# cedarpy (Python, agent runtime) and @cedar-policy/cedar-wasm (TypeScript,
16+
# CDK Lambdas) are two language bindings over the same Cedar Rust core.
17+
# Even patch-version drift between the bindings can produce divergent
18+
# (decision, matching_rule_ids) on the same (policy, input) — a class
19+
# of bug invisible to per-side unit tests. The contracts/cedar-parity/
20+
# golden fixtures are how CI catches divergence; if you bump cedarpy
21+
# you MUST bump @cedar-policy/cedar-wasm to a tested-compatible version
22+
# in cdk/package.json AND refresh the parity fixtures, in the same
23+
# commit. See docs/design/CEDAR_HITL_GATES.md §15.6 (decision #23) and
24+
# the parity-contract banner in mise.toml.
25+
"cedarpy==4.8.0", #https://github.com/k9securityio/cedar-py — EXACT pin (no ^/~), parity with @cedar-policy/cedar-wasm@4.10.0
1526
]
1627

1728
[tool.bandit]

agent/src/config.py

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -92,6 +92,10 @@ def build_config(
9292
channel_metadata: dict[str, str] | None = None,
9393
trace: bool = False,
9494
user_id: str = "",
95+
approval_timeout_s: int | None = None,
96+
initial_approvals: list[str] | None = None,
97+
initial_approval_gate_count: int = 0,
98+
approval_gate_cap: int | None = None,
9599
) -> TaskConfig:
96100
"""Build and validate configuration from explicit parameters.
97101
@@ -146,6 +150,10 @@ def build_config(
146150
channel_metadata=channel_metadata or {},
147151
trace=trace,
148152
user_id=user_id,
153+
approval_timeout_s=approval_timeout_s,
154+
initial_approvals=initial_approvals or [],
155+
initial_approval_gate_count=initial_approval_gate_count,
156+
approval_gate_cap=approval_gate_cap,
149157
)
150158

151159

0 commit comments

Comments
 (0)