Primary area: Cross-cutting / multiple
Related issue or feature request: (none — new proposal)
Summary
Today, human-in-the-loop (HITL) and most governance controls are synchronous and tool-centric: Cedar policies in PreToolUse gate Bash, Write, Read, etc. Operators configure bash patterns and tool shapes — not semantic moments like "plan ready", "PR opened", or "cumulative cost exceeded $25".
Meanwhile, observability and notifications are already event-driven via TaskEventsTable and FanOutConsumer, but that plane cannot prevent side effects unless something blocked earlier on the hot path.
This RFC proposes a unified Event Governance layer: a normative event catalog, declarative event rules (condition → action), sync (in-agent, can block) vs async (stream consumer, react only) evaluation modes, registry-native configuration (versioned event-rule-pack assets pinned by blueprints), and UX as a first-class requirement (bgagent submit governance preview, unified bgagent pending for event- and tool-sourced approvals). Tool-level Cedar HITL remains the fail-closed safety net for execution.
Use case and motivation
Who it's for: platform engineers, blueprint authors, security reviewers, and operators using bgagent, Slack, and GitHub fan-out.
Pain today:
| Need | Today | Gap |
| --- | --- | --- |
| Approve plan before code runs | Awkward Cedar on Write/Bash | Same intent reachable via many tool sequences |
| Pause at cumulative cost threshold | Tool-centric only | No aggregate rules on agent_cost_update |
| Notify on pr_created on protected branch | Fan-out filters | Cannot gate before PR without sync checkpoint |
| Escalate on high-severity approval_requested | Partial fan-out | Not unified with rule packs / audit |
Expressing lifecycle governance as Cedar on tool argv is unreliable (retry/model variation, approval fatigue, false positives). Reactive needs (notify, audit, post-hoc cancel) do not belong on the tool hot path.
After this RFC: Operators configure governance against named lifecycle moments (checkpoints, milestones, aggregates). Sync checkpoints block irreversible work; async rules notify, escalate, or cancel without pretending they prevented an action that already happened.
Proposal
Core design:
Two planes, one catalog
- Normative event catalog — stable names/schemas for lifecycle, execution, milestone, checkpoint, and policy events (JSON Schema, additive versioning).
- Event rules — on event + when conditions → actions: require_approval, notify, escalate, cancel_task, inject_nudge, observe_only.
- Sync evaluation — in-agent at checkpoints (checkpoint:before_execution, before_open_pr, etc.); same latency class as Cedar; can transition to AWAITING_APPROVAL.
- Async evaluation — TaskEventsTable stream consumer for notify, aggregates (agent_cost_update cost ceiling), post-hoc cancel; tens–hundreds of ms; must not imply blocking unless UX is explicit.
- Precedence: tool Cedar hard-deny always wins; async never overrides sync deny; composable with existing TaskApprovalsTable / bgagent approve / deny.
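The rule shape and precedence order above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `EventRule` class, the `evaluate` and `resolve` helpers, the `mode` field, and the decision strings are hypothetical names, and the condition is a plain callable because the rule language is still an open question. Only the action names and the precedence order come from the proposal.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class EventRule:
    """Hypothetical in-memory form of one event rule (on + when -> actions)."""
    rule_id: str
    on: str                                   # catalog event name
    when: Callable[[Dict[str, Any]], bool]    # condition over the event payload
    actions: Tuple[str, ...]                  # e.g. ("require_approval", "notify")
    mode: str = "sync"                        # "sync" (in-agent) or "async" (stream)


def evaluate(rules: List[EventRule], event_type: str,
             payload: Dict[str, Any], mode: str) -> List[str]:
    """Collect the actions of every rule matching this event in this mode."""
    fired: List[str] = []
    for rule in rules:
        if rule.on == event_type and rule.mode == mode and rule.when(payload):
            fired.extend(rule.actions)
    return fired


def resolve(tool_cedar_denied: bool, sync_decision: str,
            async_actions: List[str]) -> str:
    """Apply the precedence order: tool Cedar deny, then sync, then async."""
    if tool_cedar_denied:
        return "deny"                  # tool Cedar hard-deny always wins
    if sync_decision == "deny":
        return "deny"                  # async never overrides a sync deny
    if sync_decision == "awaiting_approval":
        return "awaiting_approval"
    if "cancel_task" in async_actions:
        return "cancel_task"           # reactive, post-hoc cancel
    return "allow"
```

A rule such as "require approval at checkpoint:before_execution when more than ten files change" would then be an `EventRule` whose `when` inspects a (hypothetical) `files_changed` payload field.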
Capability registry (configuration home)
| Asset type | Consumed by |
| --- | --- |
| cedar-policy-module | Agent PolicyEngine |
| event-rule-pack | Sync evaluator + async consumer |
| notification-profile | Fan-out / event consumer |
| checkpoint-catalog | Agent pipeline + Change Manifest L1 |
Blueprints pin semver assets instead of inlining fragile YAML. Interim: inline eventRules until registry MVP (Phase 3).
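As a rough sketch of how pin resolution with an inline fallback might behave, assuming a hypothetical blueprint shape with a `pins` mapping, a `name@MAJOR.MINOR.PATCH` pin syntax, and an in-memory dict standing in for the registry (none of these are defined by this RFC; only `eventRules` and the asset type name come from the text):

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class Pin:
    asset_type: str   # e.g. "event-rule-pack"
    name: str
    version: str      # exact semver, e.g. "1.4.0"


def parse_pin(asset_type: str, spec: str) -> Pin:
    """Parse 'name@MAJOR.MINOR.PATCH'; ranges are rejected in this sketch."""
    name, sep, version = spec.partition("@")
    parts = version.split(".")
    if not sep or not name or len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError(f"expected name@MAJOR.MINOR.PATCH, got {spec!r}")
    return Pin(asset_type, name, version)


# Hypothetical registry stand-in: (asset_type, name, version) -> list of rules.
Registry = Dict[Tuple[str, str, str], List[Dict[str, Any]]]


def resolve_event_rules(blueprint: Dict[str, Any], registry: Registry) -> List[Dict[str, Any]]:
    """Prefer pinned event-rule-pack assets; fall back to inline eventRules."""
    specs = blueprint.get("pins", {}).get("event-rule-pack", [])
    if specs:
        rules: List[Dict[str, Any]] = []
        for spec in specs:
            pin = parse_pin("event-rule-pack", spec)
            # A missing pin raises KeyError, so a task never starts on a guess.
            rules.extend(registry[(pin.asset_type, pin.name, pin.version)])
        return rules
    # Interim path until the registry MVP: rules inlined in the blueprint.
    return list(blueprint.get("eventRules", []))
```

Requiring exact versions here is a design choice of the sketch, not of the RFC; it keeps the governance preview reproducible for a given blueprint.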
UX (before → after)
Before: bgagent pending shows bash argv; submitters are surprised by mid-run tool gates.
After:
- Submit: governance preview from resolved registry pins (estimated interactive gates, rules that may fire).
- Watch: human-readable moments (Plan verified — awaiting your approval).
- Pending: unified queue — event-sourced and tool-sourced approvals differ in trigger context, not approve/deny mechanics.
- Authoring: bgagent registry list, bgagent rules eval --fixture, observe→enforce rollout per pack.
Phased delivery
| Phase | Scope | Outcome |
| --- | --- | --- |
| 0 | Catalog + observe_only + PolicyDecisionEvent | "Would have fired" in watch stream |
| 1 | Async notify/fan-out + webhook | Ping on PR/cost without new HITL |
| 2 | Sync checkpoints + manifest | Plan review before code (primary UX win) |
| 3 | Registry-native event-rule-pack | Org-wide versioned policy rollout |
| 4 | Advanced aggregates + async cancel | Operator automation |
Phases 0–2 can ship with inline blueprint config; Phase 3 aligns with agent asset registry MVP.
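To make the Phase 0 "would have fired" behaviour and the Phase 4 cost aggregate concrete, here is a minimal sketch of an async handler for agent_cost_update. It assumes the event payload carries a running total in a field called `cumulative_cost_usd` and a `task_id`; both field names, the `CostCeilingRule` class, and the decision record shape are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class CostCeilingRule:
    rule_id: str
    ceiling_usd: float
    enforce: bool = False   # False means observe_only (Phase 0 style rollout)
    fired: bool = False     # fire at most once per task in this sketch


def on_cost_update(rule: CostCeilingRule, event: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Handle one agent_cost_update event; return a decision record or None."""
    # Assumption: the payload reports the cumulative cost, not a delta.
    total = event["cumulative_cost_usd"]
    if rule.fired or total <= rule.ceiling_usd:
        return None
    rule.fired = True
    return {
        "task_id": event["task_id"],
        "rule_id": rule.rule_id,
        # Async evaluation reacts after the spend has already happened.
        "action": "cancel_task" if rule.enforce else "observe_only",
        "would_have_fired": not rule.enforce,
        "observed_cost_usd": total,
    }
```

Flipping `enforce` per rule (or per pack) is one possible reading of the observe→enforce rollout; the actual granularity is listed as an open question.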
Data model extensions
- TaskApprovalsTable: source (tool|event), event_id, checkpoint, rule_pack_id, rule_id.
- TaskEventsTable: catalog event types; optional correlation_id for dedupe.
- PolicyDecisionEvent (roadmap): unified audit for every evaluation.
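The extended approval record and the unified pending view could look roughly like the following. The field names `source`, `event_id`, `checkpoint`, `rule_pack_id`, and `rule_id` come from the list above; `approval_id`, `summary`, the validation rule, and the output format of `pending_line` are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ApprovalRecord:
    task_id: str
    approval_id: str                    # hypothetical key
    source: str                         # "tool" or "event"
    summary: str                        # hypothetical human-readable text
    event_id: Optional[str] = None
    checkpoint: Optional[str] = None
    rule_pack_id: Optional[str] = None
    rule_id: Optional[str] = None

    def __post_init__(self) -> None:
        if self.source not in ("tool", "event"):
            raise ValueError(f"unknown source: {self.source!r}")
        # Assumption: event-sourced approvals must be traceable to a rule.
        if self.source == "event" and not (self.event_id and self.rule_id):
            raise ValueError("event-sourced approvals need event_id and rule_id")


def pending_line(rec: ApprovalRecord) -> str:
    """Render one row of a unified pending queue; only the trigger differs."""
    if rec.source == "event":
        trigger = rec.checkpoint or rec.event_id
    else:
        trigger = "tool gate"
    return f"{rec.approval_id}  [{rec.source}] {trigger}: {rec.summary}"
```

Both kinds of record share one identifier, so a single approve or deny command can act on either without knowing which plane raised it.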
Cross-links: CEDAR_HITL_GATES.md, CHANGE_MANIFEST.md, ORCHESTRATOR.md, Roadmap — Agent asset registry & Centralized policy framework.
Out of scope
- Replacing tool Cedar with event rules — fail-closed execution safety stays on PreToolUse.
- Stream-only HITL — async consumer alone cannot block fast agents (race).
- Inline blueprint YAML as the long-term config model — bootstrap only until registry ships.
- EventBridge as the primary internal bus — complementary export; TaskEventsTable remains source of truth for task-scoped ordering (initially).
- Full registry MVP in Phase 0–2 (designed for; not required to land catalog + observe + sync checkpoints).
- Raw JSONPath/Cedar in default operator UX — verbose mode only.
- Separate approve commands for event vs tool gates.
Potential challenges
| Risk | Mitigation |
| --- | --- |
| Sync vs async confusion | Explicit modes in rule schema; UX copy for reactive approvals |
| Third DDB stream consumer capacity | Plan Kinesis migration; do not multiply consumers blindly |
| Overlapping tool + event require_approval | Scope algebra TBD; idempotency key (task_id, rule_id, correlation_id) |
| Async require_approval after pr_created | UX must state PR already exists — cannot un-create |
| Evaluator down | Sync blocking rules fail-closed; async notify may degrade with telemetry |
| Registry not ready | Inline blueprint eventRules + migration to pins |
| Checkpoint trust model | Mandatory pipeline hooks vs agent-declared milestones — open question |
| Rule language choice | Cedar-on-events vs CEL/JSONLogic — author persona TBD |
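The "Evaluator down" mitigation distinguishes two failure behaviours. A minimal sketch of that split, assuming hypothetical `evaluator` and `notifier` callables, a plain dict for telemetry counters, and a `"blocked"` decision string (none of which are specified by this RFC):

```python
from typing import Any, Callable, Dict


def evaluate_sync_checkpoint(evaluator: Callable[[str, Dict[str, Any]], Dict[str, Any]],
                             checkpoint: str,
                             payload: Dict[str, Any]) -> Dict[str, Any]:
    """Fail closed: if the evaluator errors, the checkpoint stays blocked."""
    try:
        return evaluator(checkpoint, payload)
    except Exception as exc:
        return {"decision": "blocked",
                "reason": f"evaluator_error: {type(exc).__name__}"}


def run_async_notify(notifier: Callable[[Dict[str, Any]], None],
                     event: Dict[str, Any],
                     telemetry: Dict[str, int]) -> bool:
    """Degrade: a failed notification is counted but never blocks the task."""
    try:
        notifier(event)
        return True
    except Exception:
        telemetry["notify_failures"] = telemetry.get("notify_failures", 0) + 1
        return False
```

The asymmetry is deliberate in this sketch: a sync error withholds permission, while an async error only loses a notification and leaves a counter behind for operators.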
Open questions (see RFC §12): rule language, scope algebra, checkpoint emission trust, third consumer design, Change Manifest state vs manifest_verified, multi-tenant registry merge order, observe→enforce granularity.
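The idempotency key (task_id, rule_id, correlation_id) proposed for overlapping approvals can be illustrated with an in-memory deduper. This is a sketch only: `ApprovalDeduper` and the `apr-N` identifier format are hypothetical, and a real implementation would presumably rely on a conditional write against TaskApprovalsTable rather than a process-local dict.

```python
from typing import Dict, Tuple


class ApprovalDeduper:
    """Create at most one approval per (task_id, rule_id, correlation_id)."""

    def __init__(self) -> None:
        self._seen: Dict[Tuple[str, str, str], str] = {}

    def request(self, task_id: str, rule_id: str, correlation_id: str) -> Tuple[str, bool]:
        """Return (approval_id, created); created is False for a replay."""
        key = (task_id, rule_id, correlation_id)
        if key in self._seen:
            return self._seen[key], False
        approval_id = f"apr-{len(self._seen) + 1}"   # hypothetical id format
        self._seen[key] = approval_id
        return approval_id, True
```

With this shape, a retried checkpoint or a redelivered stream record maps back to the existing approval instead of prompting the operator a second time.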
Dependencies and integrations
| Component | Role |
| --- | --- |
| Agent (pipeline.py, runner.py, progress_writer.py, hooks) | Emit catalog events; sync checkpoint evaluation |
| CDK (fanout-task-events.ts, orchestrator, Blueprint construct) | Stream consumer; resolve registry pins at task start; extend fan-out |
| CLI (bgagent submit, watch, pending, future registry/rules eval) | Governance preview, unified pending UX |
| TaskEventsTable / TaskApprovalsTable | Event log + approval extensions |
| Agent asset registry (roadmap) | event-rule-pack, notification-profile assets |
| Centralized policy framework (roadmap) | PolicyDecisionEvent, observe/enforce modes |
Related roadmap items: Agent asset registry, Centralized policy framework.
Alternative solutions
| Alternative | Verdict |
| --- | --- |
| Extend Cedar only — synthetic action::Event types | Defer until event context schema stable; high author burden |
| Stream-only HITL in Lambda | Rejected for blocking — race with fast agents |
| Inline blueprint YAML forever | Bootstrap only; no versioning at scale |
| Replace tool Cedar with events | Rejected |
| EventBridge as primary bus | Complementary export; internal SoT stays TaskEventsTable |
Note: Non-triaged RFCs may not get timely review. PRs on non-triaged issues might not be accepted.
- RFC PR:
- Approved by:
- Reviewed by: