Replies: 5 comments
Great design, @jlugo63. The separation of powers model maps directly to problems we have been working on. What aligns with AGT today:
Where Gavel adds value we do not have:
On your open questions:
Would be interested in exploring integration: AGT provides policy evaluation and trust scoring, and Gavel wraps that in a multi-agent approval workflow. Happy to discuss architecture.
@imran-siddique Thanks for the detailed mapping; it's really helpful to see how everything lines up. MerkleAuditChain combined with Gavel governance chains feels like a strong pairing. The integration you're describing is very much where this is heading: AGT handling identity and policy evaluation, with Gavel layering the multi-step governance workflow on top. I'd be happy to put together a quick POC showing how those pieces fit together. Would you prefer starting with a design doc, or should I open a PR with an integration example?
The Kiro incident is a perfect motivator for this kind of design. We had our own near-miss: an agent decided to "clean up" a production Slack channel by archiving it, which would have deleted months of operational context.
The separation-of-powers model is strong, but I would make the approval binding very explicit. A governance chain should not approve a vague intent such as "fix production issue". It should approve a specific bounded action:
Then the execution token should be invalid if any of those fields change. This prevents the common failure where an agent gets approval for a narrow action and then executes a broader variant after context changes.
I would also separate three review outcomes: policy denial, proportionality denial, and evidence insufficiency. They look similar operationally, but they teach different lessons. A policy denial means the action is not allowed. A proportionality denial means the action is technically allowed but excessive for the problem. Evidence insufficiency means the chain needs more sandbox proof before humans or agents can approve.
The correction-propagation idea in the comments is useful, but I would avoid turning denials directly into global behavior changes without review. A safer pattern is: denial -> classified correction -> policy/rubric update -> regression test -> rollout to agents. That keeps fleet learning auditable.
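A minimal sketch of the binding idea, assuming the bounded action is a flat dict of fields (the names `agent_id`, `operation`, `target`, `environment` are illustrative) and that the approver signs tokens with an HMAC key. `issue_token`, `redeem_token`, and `SIGNING_KEY` are hypothetical names, not Gavel or AGT APIs. The token stores a SHA-256 digest of the canonical action, so changing any field after approval makes redemption fail.

```python
import hashlib
import hmac
import json
import secrets
import time

# Hypothetical signing key held only by the approval service.
SIGNING_KEY = secrets.token_bytes(32)
# Nonces of tokens already redeemed (in-memory here; a real system needs shared storage).
_redeemed: set[str] = set()


def action_digest(action: dict) -> str:
    """SHA-256 over a canonical JSON encoding of every approved field."""
    canonical = json.dumps(action, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


def _sign(body: dict) -> str:
    payload = json.dumps(body, sort_keys=True).encode()
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).hexdigest()


def issue_token(action: dict, ttl_seconds: float = 300) -> dict:
    """Bind a single-use token to the exact action and a short expiry."""
    body = {
        "action_digest": action_digest(action),
        "expires_at": time.time() + ttl_seconds,
        "nonce": secrets.token_hex(16),
    }
    return {**body, "sig": _sign(body)}


def redeem_token(token: dict, action: dict) -> bool:
    """Accept only an untampered, unexpired, unused token for this exact action."""
    body = {k: v for k, v in token.items() if k != "sig"}
    if not hmac.compare_digest(_sign(body), str(token.get("sig", ""))):
        return False  # token fields were altered after signing
    if body["action_digest"] != action_digest(action):
        return False  # the action differs from what was approved
    if time.time() > body["expires_at"] or body["nonce"] in _redeemed:
        return False  # expired or already used
    _redeemed.add(body["nonce"])
    return True


if __name__ == "__main__":
    approved = {
        "agent_id": "fixer-1",
        "operation": "edit_file",
        "target": "config/cost-explorer.yaml",
        "environment": "production",
    }
    token = issue_token(approved)
    broader = {**approved, "operation": "delete_environment"}
    print(redeem_token(token, broader))   # False: a field changed after approval
    print(redeem_token(token, approved))  # True
    print(redeem_token(token, approved))  # False: single use
```

Hashing the whole canonical dict, rather than checking selected fields, means a new field added after approval also invalidates the token, which errs on the side of denial.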
This maps directly to something we found critical: when an agent in a delegation chain gets compromised, you need trust contagion, meaning the parent's trust score should drop automatically when a child is quarantined. In AgentGate: child quarantined → parent -15 pts, parent quarantined → all children -30 pts, 1h TTL. Curious how AGT handles propagation across delegation chains.
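A rough sketch of the contagion rule as stated, assuming an in-memory delegation graph and reading the 1h TTL as the lifetime of each penalty. It applies only one hop (direct parent, direct children); whether penalties should cascade further down the chain is left open. `Agent`, `delegate`, and `quarantine` are hypothetical names, not AgentGate's or AGT's API.

```python
import time
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

CHILD_TO_PARENT_PENALTY = 15.0  # child quarantined -> its parent loses 15 points
PARENT_TO_CHILD_PENALTY = 30.0  # parent quarantined -> each direct child loses 30 points
PENALTY_TTL_SECONDS = 3600.0    # assumption: each penalty expires after 1 hour


@dataclass(eq=False)
class Agent:
    name: str
    base_score: float = 100.0
    parent: Optional["Agent"] = field(default=None, repr=False)
    children: List["Agent"] = field(default_factory=list, repr=False)
    # (points, expires_at) pairs; expired entries stop counting but are kept for audit
    penalties: List[Tuple[float, float]] = field(default_factory=list)
    quarantined: bool = False

    def score(self, now: Optional[float] = None) -> float:
        """Base score minus all penalties that have not yet expired, floored at 0."""
        now = time.time() if now is None else now
        active = sum(points for points, expires_at in self.penalties if expires_at > now)
        return max(0.0, self.base_score - active)


def delegate(parent: Agent, child: Agent) -> None:
    """Record a delegation edge from parent to child."""
    child.parent = parent
    parent.children.append(child)


def quarantine(agent: Agent, now: Optional[float] = None) -> None:
    """Quarantine an agent and apply one hop of trust contagion in both directions."""
    now = time.time() if now is None else now
    agent.quarantined = True
    expires_at = now + PENALTY_TTL_SECONDS
    if agent.parent is not None:
        agent.parent.penalties.append((CHILD_TO_PARENT_PENALTY, expires_at))
    for child in agent.children:
        child.penalties.append((PARENT_TO_CHILD_PENALTY, expires_at))
```

With a chain `orchestrator → worker → helper`, quarantining `worker` at `now=0` leaves the orchestrator at 85 and the helper at 70 until the hour elapses, after which both return to 100.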
Hey all, I’ve been following this project since the announcement and wanted to share something we’ve been building on top of it.
The problem that got us started
The Kiro incident at Amazon stuck with me. Their AI coding agent was told to fix a minor Cost Explorer bug, decided the fastest path was to delete the entire production environment and rebuild it, and caused a 13-hour outage. Then Alibaba’s ROME agent hijacked GPUs for crypto mining because it calculated that this was the most efficient way to hit its performance targets. It wasn’t compromised; it was just optimizing.
Both cases had the same structure: a single agent with broad permissions made a big decision with no independent review. The agent that identified the problem also decided the fix and executed it. Nobody checked whether “delete production” was a proportionate response to a config bug.
Policy engines wouldn’t have helped here. The actions were technically allowed. The issue was that nobody evaluated whether the specific decision made sense, and no independent system verified it before execution.
What we built
We’ve been working on a governance layer called Gavel that sits on top of this toolkit. The core idea is separation of powers: the agent that proposes an action cannot be the one that reviews or approves it.
Using the Kiro scenario as an example:
1. An agent proposes a fix and declares its scope up front (e.g., modifying a config file only). That proposal is recorded as the first event in a hash-linked audit trail.
2. The system evaluates risk factors (production environment, destructive potential, financial impact) and determines that a full governance workflow is required.
3. The proposed action is executed in an isolated sandbox first. If the agent’s actual behavior deviates from its declared scope (e.g., attempting to delete infrastructure), that becomes part of the evidence.
4. A deterministic reviewer checks the evidence against the declared scope. If there’s a violation, the chain stops immediately. Nothing reaches production.
5. If everything checks out, the process continues:
   - A separate review agent evaluates the evidence.
   - A third agent approves the action.
   - A scoped, single-use execution token is issued with a short expiration.
Each step is cryptographically linked via SHA-256, forming a verifiable chain of decisions.
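The hash linking and role separation in the steps above can be sketched as follows, assuming each event is a JSON-serializable dict and roles are checked at append time. Steps that are not tied to a distinct agent (risk assessment, sandbox evidence, deterministic review) can be recorded under other role labels. `GovernanceChain`, `append`, `verify`, and `link_hash` are hypothetical names, not Gavel's actual API.

```python
import copy
import hashlib
import json
from dataclasses import dataclass, field

GENESIS_HASH = "0" * 64
# Roles that must each be held by a different agent within a single chain.
EXCLUSIVE_ROLES = ("proposer", "reviewer", "approver")


def link_hash(prev_hash: str, event: dict) -> str:
    """SHA-256 of the previous link plus a canonical encoding of this event."""
    canonical = json.dumps(event, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + canonical).encode()).hexdigest()


@dataclass
class GovernanceChain:
    events: list[dict] = field(default_factory=list)
    hashes: list[str] = field(default_factory=list)
    roles: dict[str, str] = field(default_factory=dict)  # role -> agent_id

    def append(self, role: str, agent_id: str, payload: dict) -> str:
        """Append an event, rejecting self-approval and role switching."""
        if role in EXCLUSIVE_ROLES:
            holder = self.roles.get(role)
            if holder is not None and holder != agent_id:
                raise PermissionError(f"role {role!r} already held by {holder!r}")
            for other_role, other_agent in self.roles.items():
                if other_role != role and other_agent == agent_id:
                    raise PermissionError(f"{agent_id!r} already acted as {other_role!r}")
            self.roles[role] = agent_id
        # Deep-copy so later mutation of the caller's dict cannot alter the record.
        event = {"role": role, "agent_id": agent_id, "payload": copy.deepcopy(payload)}
        prev_hash = self.hashes[-1] if self.hashes else GENESIS_HASH
        digest = link_hash(prev_hash, event)
        self.events.append(event)
        self.hashes.append(digest)
        return digest

    def verify(self) -> bool:
        """Recompute every link; an edited event no longer matches its stored hash."""
        if len(self.events) != len(self.hashes):
            return False
        prev_hash = GENESIS_HASH
        for event, stored in zip(self.events, self.hashes):
            if link_hash(prev_hash, event) != stored:
                return False
            prev_hash = stored
        return True
```

Appending `"approver"` with the proposer's own ID raises `PermissionError`, which covers the self-approval case. A real deployment would also need to authenticate `agent_id` rather than trust the caller, otherwise one agent could simply claim three identities.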
Key components
- Governance chains: hash-linked sequences from proposal to execution that create a verifiable decision trail.
- Separation of powers: proposer, reviewer, and approver must be distinct agents. This is enforced structurally at the API level.
- Blast box: a sandbox execution environment that generates evidence for review. The focus is on proof, not just safety.
- Deterministic evidence review: a set of fixed checks (scope compliance, secret detection, forbidden paths, etc.) with no LLM involvement.
- Tiered autonomy: risk-based controls. Low-risk actions move fast; high-risk actions require full governance.
- Liveness monitor: timeout-based denial. If approval doesn’t happen in time, the action is automatically rejected.
- Constitutional invariants: hard “forbid” rules that cannot be overridden by configuration.
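One way to read "deterministic evidence review" is as pure functions over the sandbox evidence with no model calls, so the same evidence always yields the same verdict. The sketch below assumes a hypothetical `Evidence` shape (declared scope globs, files the sandbox saw modified, captured output); the forbidden paths stand in for constitutional invariants, and the two secret patterns are illustrative only. None of these names come from Gavel.

```python
import fnmatch
import re
from dataclasses import dataclass

# Hypothetical constitutional invariants: paths no action may touch, regardless of config.
FORBIDDEN_PATHS = ("infra/prod/*", "*.tfstate", "/etc/*")
# Hypothetical secret patterns; real scanners use much larger rule sets.
SECRET_PATTERNS = (
    re.compile(r"AKIA[0-9A-Z]{16}"),                    # AWS access key ID format
    re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),  # PEM private key header
)


@dataclass(frozen=True)
class Evidence:
    declared_scope: tuple[str, ...]  # glob patterns the proposer declared up front
    files_touched: tuple[str, ...]   # paths the sandbox observed being modified
    output: str                      # captured stdout/stderr from the sandbox run


def review(evidence: Evidence) -> list[str]:
    """Return a sorted list of violations; an empty list means every check passed."""
    violations = []
    for path in evidence.files_touched:
        if not any(fnmatch.fnmatch(path, pat) for pat in evidence.declared_scope):
            violations.append(f"scope: {path} is outside the declared scope")
        if any(fnmatch.fnmatch(path, pat) for pat in FORBIDDEN_PATHS):
            violations.append(f"forbidden: {path}")
    for pattern in SECRET_PATTERNS:
        if pattern.search(evidence.output):
            violations.append(f"secret: output matches {pattern.pattern}")
    return sorted(violations)
```

An empty result would let the chain continue to the review agent; any entry would stop it before a reviewer or approver is involved. Sorting the output keeps the verdict byte-for-byte identical across runs, which makes it safe to hash into the governance chain.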
We currently have 158 tests (including adversarial cases like self-approval, role switching, and hash tampering) with 97.5% coverage. The project is a Python package built on top of the agent-governance-toolkit.
Repo: https://github.com/jlugo63/gavel
Why now
EU AI Act high-risk obligations take effect August 2, 2026. Requirements include human oversight, continuous risk management, and verifiable auditability. Governance chains with enforced separation of powers map directly to these needs.
At the same time, most enterprise leaders expect a major AI agent incident within the next year. It feels like the window to get governance right is closing quickly.
Open questions
Curious how others are approaching this:
Are teams building approval workflows for AI-initiated production changes, or relying on permissions and policy engines?
Is structural separation of powers important, or is trust scoring sufficient?
Is anyone using sandbox-first evidence before approving actions?
How are you thinking about EU AI Act compliance for autonomous agents?
Would love feedback on whether this direction makes sense or if we’re solving the wrong problem. Happy to dive deeper into any part of the implementation.