RFC: Event-Driven Governance and Actions

**Primary area:** Cross-cutting / multiple

**Related issue or feature request:** (none — new proposal)

---

## Summary

Today, human-in-the-loop (HITL) and most governance controls are **synchronous and tool-centric**: Cedar policies in `PreToolUse` gate Bash, Write, Read, etc. Operators configure bash patterns and tool shapes — not semantic moments like "plan ready", "PR opened", or "cumulative cost exceeded $25".

Meanwhile, **observability and notifications are already event-driven** via `TaskEventsTable` and `FanOutConsumer`, but that plane cannot prevent side effects unless something blocked earlier on the hot path.

This RFC proposes a unified **Event Governance** layer: a normative event catalog, declarative **event rules** (condition → action), **sync** (in-agent, can block) vs **async** (stream consumer, react only) evaluation modes, **registry-native configuration** (versioned `event-rule-pack` assets pinned by blueprints), and **UX as a first-class requirement** (`bgagent submit` governance preview, unified `bgagent pending` for event- and tool-sourced approvals). Tool-level Cedar HITL remains the fail-closed safety net for execution.

## Use case and motivation

**Who it's for:** platform engineers, blueprint authors, security reviewers, and operators using `bgagent`, Slack, and GitHub fan-out.

**Pain today:**

| Need | Today | Gap |
|------|-------|-----|
| Approve plan before code runs | Awkward Cedar on Write/Bash | Same intent reachable via many tool sequences |
| Pause at cumulative cost threshold | Tool-centric only | No aggregate rules on `agent_cost_update` |
| Notify on `pr_created` on protected branch | Fan-out filters | Cannot gate *before* PR without sync checkpoint |
| Escalate on high-severity `approval_requested` | Partial fan-out | Not unified with rule packs / audit |

Expressing lifecycle governance as Cedar on tool argv is **unreliable** (retry/model variation, approval fatigue, false positives). Reactive needs (notify, audit, post-hoc cancel) do not belong on the tool hot path.

**After this RFC:** Operators configure governance against **named lifecycle moments** (checkpoints, milestones, aggregates). Sync checkpoints block irreversible work; async rules notify, escalate, or cancel without pretending they prevented an action that already happened.

## Proposal

Core design:

### Two planes, one catalog

1. **Normative event catalog** — stable names/schemas for lifecycle, execution, milestone, checkpoint, and policy events (JSON Schema, additive versioning).
2. **Event rules** — `on` event + `when` conditions → actions: `require_approval`, `notify`, `escalate`, `cancel_task`, `inject_nudge`, `observe_only`.
3. **Sync evaluation** — in-agent at checkpoints (`checkpoint:before_execution`, `before_open_pr`, etc.); same latency class as Cedar; can transition to `AWAITING_APPROVAL`.
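4. **Async evaluation:** a `TaskEventsTable` stream consumer handles notifications, aggregates (such as a cost ceiling on `agent_cost_update`), and post-hoc cancel. It reacts tens to hundreds of milliseconds after the event, so the UX must never present an async rule as blocking; any approval it raises must be explicitly labeled as reactive.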
5. **Precedence:** tool Cedar hard-deny always wins; async never overrides sync deny; composable with existing `TaskApprovalsTable` / `bgagent approve` / `deny`.
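
The rule shape and precedence order above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the proposed implementation: the `EventRule` fields mirror the `on` / `when` / action vocabulary of this RFC, but the class itself, the `mode` field, the `cumulative_cost_usd` payload key, and the outcome strings returned by `resolve` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class EventRule:
    rule_id: str
    on: str                                  # catalog event name
    when: Callable[[dict[str, Any]], bool]   # predicate over the event payload
    action: str                              # e.g. "require_approval", "notify"
    mode: str                                # "sync" (can block) or "async" (react only)


def fired_actions(rules, event_type, payload, mode):
    """Return (rule_id, action) for each rule that matches this event in this mode."""
    return [
        (r.rule_id, r.action)
        for r in rules
        if r.on == event_type and r.mode == mode and r.when(payload)
    ]


def resolve(cedar_hard_deny, sync_denied, async_actions):
    """Apply the precedence order: tool Cedar hard-deny, then sync deny, then async."""
    if cedar_hard_deny:
        return "denied_by_tool_cedar"
    if sync_denied:
        # Async results are ignored here: they never override a sync deny.
        return "denied_by_sync_rule"
    if "cancel_task" in async_actions:
        return "cancelled_by_async_rule"
    return "allowed"


# Hypothetical rules: a plan-review gate and a $25 cumulative cost ceiling.
RULES = [
    EventRule("plan-review", "checkpoint:before_execution",
              lambda p: True, "require_approval", "sync"),
    EventRule("cost-ceiling", "agent_cost_update",
              lambda p: p.get("cumulative_cost_usd", 0) > 25, "cancel_task", "async"),
]
```

For example, `fired_actions(RULES, "agent_cost_update", {"cumulative_cost_usd": 30}, "async")` yields the `cost-ceiling` rule, while the same call with a cost of 10 yields nothing. Whether `when` ends up as Cedar, CEL, or JSONLogic is an open question; a Python predicate stands in for it here.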

### Capability registry (configuration home)

| Asset type | Consumed by |
|------------|-------------|
| `cedar-policy-module` | Agent `PolicyEngine` |
| `event-rule-pack` | Sync evaluator + async consumer |
| `notification-profile` | Fan-out / event consumer |
| `checkpoint-catalog` | Agent pipeline + Change Manifest L1 |

Blueprints **pin** semver assets instead of inlining fragile YAML. Interim: inline `eventRules` until registry MVP (Phase 3).
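
As a rough sketch of what pinning could look like at task start, the snippet below parses a pin and resolves it against a registry. The `name@MAJOR.MINOR.PATCH` syntax, the `Pin` type, the `org-baseline` asset name, and the in-memory `registry` dict are all assumptions made for illustration; this RFC does not define a pin format.

```python
from typing import Any, NamedTuple


class Pin(NamedTuple):
    asset_type: str              # e.g. "event-rule-pack"
    name: str                    # hypothetical asset name
    version: tuple[int, ...]     # exact semver triple, no ranges


def parse_pin(asset_type: str, spec: str) -> Pin:
    """Parse 'name@MAJOR.MINOR.PATCH' (an assumed syntax) into a Pin."""
    name, _, version = spec.partition("@")
    parts = version.split(".")
    if not name or len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected name@MAJOR.MINOR.PATCH, got {spec!r}")
    return Pin(asset_type, name, tuple(int(p) for p in parts))


def resolve_pins(pins: list[Pin], registry: dict[Pin, Any]) -> dict[Pin, Any]:
    """Look up every pin; fail loudly instead of floating to another version."""
    missing = [p for p in pins if p not in registry]
    if missing:
        raise LookupError(f"unresolved pins: {missing}")
    return {p: registry[p] for p in pins}


pin = parse_pin("event-rule-pack", "org-baseline@1.4.2")
registry = {pin: {"rules": []}}
resolved = resolve_pins([pin], registry)
```

Requiring an exact version keeps a task's governance reproducible: the same blueprint resolves to the same rule pack until someone changes the pin.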

### UX (before → after)

**Before:** `bgagent pending` shows bash argv; submitters surprised by mid-run tool gates.

**After:**

- **Submit:** governance preview from resolved registry pins (estimated interactive gates, rules that may fire).
- **Watch:** human-readable moments (`Plan verified — awaiting your approval`).
- **Pending:** unified queue — event-sourced and tool-sourced approvals differ in trigger context, not approve/deny mechanics.
- **Authoring:** `bgagent registry list`, `bgagent rules eval --fixture`, observe→enforce rollout per pack.
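
A submit-time governance preview could be derived from the resolved rules roughly as follows. This is only a sketch: the dictionary keys (`rule_id`, `on`, `mode`, `action`), the `pr-ping` rule, and the output wording are assumptions, and the real `bgagent submit` output is not specified by this RFC.

```python
def governance_preview(rules: list[dict[str, str]]) -> str:
    """Summarize which rules may pause the task and which can only react."""
    gates = [r for r in rules
             if r["mode"] == "sync" and r["action"] == "require_approval"]
    reactive = [r for r in rules if r["mode"] == "async"]

    lines = [f"Interactive gates (task may pause): {len(gates)}"]
    lines += [f"  - {r['on']} [{r['rule_id']}]" for r in gates]
    lines.append(f"Reactive rules (cannot block): {len(reactive)}")
    lines += [f"  - {r['on']} -> {r['action']} [{r['rule_id']}]" for r in reactive]
    return "\n".join(lines)


# Hypothetical resolved rules for one task.
RESOLVED = [
    {"rule_id": "plan-review", "on": "checkpoint:before_execution",
     "mode": "sync", "action": "require_approval"},
    {"rule_id": "pr-ping", "on": "pr_created",
     "mode": "async", "action": "notify"},
]

print(governance_preview(RESOLVED))
```

Separating the two groups in the preview tells the submitter up front which rules can stop work and which can only respond after the fact.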

### Phased delivery

| Phase | Scope | Outcome |
|-------|--------|---------|
| 0 | Catalog + `observe_only` + `PolicyDecisionEvent` | "Would have fired" in watch stream |
| 1 | Async notify/fan-out + webhook | Ping on PR/cost without new HITL |
| 2 | Sync checkpoints + manifest | Plan review before code (primary UX win) |
| 3 | Registry-native `event-rule-pack` | Org-wide versioned policy rollout |
| 4 | Advanced aggregates + async cancel | Operator automation |

Phases 0–2 can ship with inline blueprint config; Phase 3 aligns with agent asset registry MVP.

### Data model extensions

- **TaskApprovalsTable:** `source` (`tool`|`event`), `event_id`, `checkpoint`, `rule_pack_id`, `rule_id`.
- **TaskEventsTable:** catalog event types; optional `correlation_id` for dedupe.
- **PolicyDecisionEvent** (roadmap): unified audit for every evaluation.
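
To make the `TaskApprovalsTable` extension concrete, here is a hedged sketch of an approval record carrying the new attributes, plus a single renderer for the unified pending queue. The `ApprovalRecord` class, the `summary` field, the validation rule for event-sourced records, and the `pending_line` format are hypothetical; only the attribute names `source`, `event_id`, `checkpoint`, `rule_pack_id`, and `rule_id` come from the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ApprovalRecord:
    task_id: str
    source: str                         # "tool" or "event"
    summary: str                        # hypothetical human-readable trigger text
    event_id: Optional[str] = None
    checkpoint: Optional[str] = None
    rule_pack_id: Optional[str] = None
    rule_id: Optional[str] = None

    def __post_init__(self) -> None:
        if self.source not in ("tool", "event"):
            raise ValueError(f"unknown source: {self.source!r}")
        # Assumption: an event-sourced approval must name the event and rule behind it.
        if self.source == "event" and not (self.event_id and self.rule_id):
            raise ValueError("event-sourced approvals need event_id and rule_id")


def pending_line(rec: ApprovalRecord) -> str:
    """Render one row of the unified queue; only the trigger context differs."""
    origin = rec.checkpoint if rec.source == "event" and rec.checkpoint else rec.source
    return f"[{origin}] {rec.task_id}: {rec.summary}"


tool_gate = ApprovalRecord("task-1", "tool", "Bash: git push origin main")
event_gate = ApprovalRecord(
    "task-1", "event", "Plan verified, awaiting your approval",
    event_id="evt-9", checkpoint="checkpoint:before_execution",
    rule_pack_id="org-baseline", rule_id="plan-review",
)
```

Both records flow through the same `pending_line`, which matches the intent that event-sourced and tool-sourced approvals share approve/deny mechanics.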

**Cross-links:** [`CEDAR_HITL_GATES.md`](https://github.com/aws-samples/sample-autonomous-cloud-coding-agents/blob/main/docs/design/CEDAR_HITL_GATES.md), [`CHANGE_MANIFEST.md`](https://github.com/aws-samples/sample-autonomous-cloud-coding-agents/blob/main/docs/design/CHANGE_MANIFEST.md), [`ORCHESTRATOR.md`](https://github.com/aws-samples/sample-autonomous-cloud-coding-agents/blob/main/docs/design/ORCHESTRATOR.md), Roadmap — Agent asset registry & Centralized policy framework.

## Out of scope

- **Replacing tool Cedar with event rules** — fail-closed execution safety stays on `PreToolUse`.
- **Stream-only HITL** — async consumer alone cannot block fast agents (race).
- **Inline blueprint YAML as the long-term config model** — bootstrap only until registry ships.
- **EventBridge as the primary internal bus** — complementary export; `TaskEventsTable` remains source of truth for task-scoped ordering (initially).
- **Full registry MVP** in Phases 0–2: the design anticipates the registry, but the catalog, observe mode, and sync checkpoints can land without it.
- **Raw JSONPath/Cedar in default operator UX** — verbose mode only.
- **Separate approve commands** for event vs tool gates.

## Potential challenges

| Risk | Mitigation |
|------|------------|
| Sync vs async confusion | Explicit modes in rule schema; UX copy for reactive approvals |
| Third DynamoDB Streams consumer capacity (more than two concurrent readers per shard risks throttling) | Plan Kinesis migration; do not multiply consumers blindly |
| Overlapping tool + event `require_approval` | Scope algebra TBD; idempotency key `(task_id, rule_id, correlation_id)` |
| Async `require_approval` after `pr_created` | UX must state PR already exists — cannot un-create |
| Evaluator down | Sync blocking rules fail-closed; async notify may degrade with telemetry |
| Registry not ready | Inline blueprint `eventRules` + migration to pins |
| Checkpoint trust model | Mandatory pipeline hooks vs agent-declared milestones — open question |
| Rule language choice | Cedar-on-events vs CEL/JSONLogic — author persona TBD |

**Open questions** (see RFC §12): rule language, scope algebra, checkpoint emission trust, third consumer design, Change Manifest state vs `manifest_verified`, multi-tenant registry merge order, observe→enforce granularity.
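
Two of the mitigations in the table above, the idempotency key `(task_id, rule_id, correlation_id)` and fail-closed sync evaluation, can be illustrated with a small sketch. The `ApprovalDeduper` class, the `evaluate_sync_fail_closed` helper, and the `"blocked"` outcome string are hypothetical names. The in-memory set stands in for durable storage; in a DynamoDB-backed table a conditional write on the composite key could play the same role, but that mapping is an assumption rather than something this RFC specifies.

```python
from typing import Callable


class ApprovalDeduper:
    """Suppress duplicate approval requests for the same (task, rule, correlation)."""

    def __init__(self) -> None:
        self._seen: set[tuple[str, str, str]] = set()

    def should_create(self, task_id: str, rule_id: str, correlation_id: str) -> bool:
        key = (task_id, rule_id, correlation_id)
        if key in self._seen:
            return False          # already requested: do not prompt the approver again
        self._seen.add(key)
        return True


def evaluate_sync_fail_closed(evaluate: Callable[[], str]) -> str:
    """Run a sync evaluator; treat any evaluator failure as a blocking outcome."""
    try:
        return evaluate()
    except Exception:
        # Evaluator unavailable: block rather than let irreversible work proceed.
        return "blocked"


def evaluator_down() -> str:
    raise TimeoutError("evaluator unreachable")


deduper = ApprovalDeduper()
first = deduper.should_create("task-1", "plan-review", "corr-1")
repeat = deduper.should_create("task-1", "plan-review", "corr-1")
outcome = evaluate_sync_fail_closed(evaluator_down)
```

Here `first` is `True`, `repeat` is `False`, and `outcome` is `"blocked"`. The sketch does not address the scope algebra for overlapping tool and event approvals, which remains open.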

## Dependencies and integrations

| Component | Role |
|-----------|------|
| **Agent** (`pipeline.py`, `runner.py`, `progress_writer.py`, hooks) | Emit catalog events; sync checkpoint evaluation |
| **CDK** (`fanout-task-events.ts`, orchestrator, Blueprint construct) | Stream consumer; resolve registry pins at task start; extend fan-out |
| **CLI** (`bgagent submit`, `watch`, `pending`, future `registry`/`rules eval`) | Governance preview, unified pending UX |
| **TaskEventsTable / TaskApprovalsTable** | Event log + approval extensions |
| **Agent asset registry** (roadmap) | `event-rule-pack`, `notification-profile` assets |
| **Centralized policy framework** (roadmap) | `PolicyDecisionEvent`, observe/enforce modes |

**Related roadmap items:** Agent asset registry, Centralized policy framework.

## Alternative solutions

| Alternative | Verdict |
|-------------|---------|
| Extend Cedar only — synthetic `action::Event` types | Defer until event context schema stable; high author burden |
| Stream-only HITL in Lambda | **Rejected** for blocking — race with fast agents |
| Inline blueprint YAML forever | Bootstrap only; no versioning at scale |
| Replace tool Cedar with events | **Rejected** |
| EventBridge as primary bus | Complementary export; internal SoT stays TaskEventsTable |

---

**Note:** Non-triaged RFCs may not get timely review. PRs on non-triaged issues might not be accepted.

* RFC PR:
* Approved by:
* Reviewed by:

