Architecture overview

Bridle is a runtime control plane for production AI agents. The architecture has three planes (control / data / telemetry) and one identity model that links them.

Three planes

Control plane (`bridle/cp_server/`)

The central FastAPI service. Owns policy authoring artifacts (bundles in Postgres), gateway registration, audit ingestion, mode-flip operations, and the two reports (shadow + trace).

POST  /v1/bundles                       publish + sign
GET   /v1/bundles/{id}                  fetch by id
GET   /v1/bundles/active                latest for gateway
POST  /v1/gateways/register             register + model_list
GET   /v1/gateways/{id}/status          active bundle + last-seen
POST  /v1/gateways/{id}/heartbeat       gateway pings on activate
POST  /v1/audit                         batch audit ingest
POST  /v1/policies/{id}/mode            shadow ↔ enforce flip
GET   /v1/reports/shadow                aggregated would-have-action
GET   /v1/reports/trace/{trace_id}      ordered obs/decision/outcome
GET   /v1/public_key                    bootstrap the signature verifier

The CP signs bundles with an ed25519 key. Gateways are bootstrapped with the matching public key and verify every bundle they activate.

Data plane (in the gateway)

Two enforcement surfaces, one GatewayInterceptor instance:

LLM gateway: LiteLLM Proxy + BridleLogger (a CustomLogger in callback position 0). The logger's async_pre_call_hook calls the interceptor, which evaluates policy and produces one of four results: allow the call unchanged, return a modified request dict, return a block string, or raise.
Tool surface: @bridle.tool("issue_refund") decorator wraps any async tool function. Identity (session_id, agent_id, trace_id, ...) flows via contextvars set by session_context(...). The decorator calls the same interceptor before invoking the tool.

Both surfaces share the same _pending, state_service, audit_ledger, policy_engine, and classifier objects. This is Bridle's unifying invariant: "one session, two enforcement surfaces, one policy engine, one audit ledger."

The data plane connects to Postgres for audit + session state, and to the CP via HTTPBundleLoader for signed bundle distribution.

Telemetry plane (the audit ledger)

Every decision lands as one append-only row in audit_rows: tenant_id, agent_id, actor_id, session_id, trace_id, request_id, observation_type, matched_policy_ids, mode_at_evaluation, final_action, would_have_action, final_outcome, cost_at_decision_usd, record_hash, previous_record_hash.

Rows are chained by record_hash for tamper evidence. Queries:

Shadow report (/v1/reports/shadow): group by policy across a tenant + window, sum cost_at_decision_usd as a v0 proxy for "prevented spend."
Trace report (/v1/reports/trace/{trace_id}): ordered events for one agent turn — the incident-review primitive.
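The record_hash / previous_record_hash chaining mentioned above can be sketched as follows. The source does not say which hash function or serialization Bridle uses, so SHA-256 over canonical JSON, the all-zero genesis value, and the helper names (append_row, verify_chain) are assumptions for illustration only.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed previous_record_hash for the first row


def record_hash(row, previous_hash):
    """Hash the row's content together with the previous row's hash."""
    payload = json.dumps(row, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((previous_hash + payload).encode("utf-8")).hexdigest()


def append_row(ledger, row):
    """Append a row, linking it to the hash of the current last row."""
    previous = ledger[-1]["record_hash"] if ledger else GENESIS
    ledger.append({**row,
                   "previous_record_hash": previous,
                   "record_hash": record_hash(row, previous)})


def verify_chain(ledger):
    """Return the index of the first row that breaks the chain, or None."""
    previous = GENESIS
    for index, stored in enumerate(ledger):
        content = {k: v for k, v in stored.items()
                   if k not in ("record_hash", "previous_record_hash")}
        if (stored["previous_record_hash"] != previous
                or stored["record_hash"] != record_hash(content, previous)):
            return index
        previous = stored["record_hash"]
    return None


ledger = []
append_row(ledger, {"request_id": "r-1", "final_action": "allow", "cost_at_decision_usd": 0.02})
append_row(ledger, {"request_id": "r-2", "final_action": "block", "cost_at_decision_usd": 0.40})
append_row(ledger, {"request_id": "r-3", "final_action": "allow", "cost_at_decision_usd": 0.01})

intact = verify_chain(ledger)        # None: every link checks out
ledger[1]["final_action"] = "allow"  # simulate tampering with the second row
broken_at = verify_chain(ledger)     # 1: the edited row no longer matches its hash
```

Editing any stored row changes its recomputed hash, so verification fails at that row unless every later hash is also rewritten, which is what makes the ledger tamper-evident rather than tamper-proof.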

Identity envelope

Every observation, decision, outcome, and audit row carries the same envelope. The fields that link surfaces together are:

Field	Purpose
`tenant_id`	Customer-level isolation
`session_id`	Per-product session; joins LLM + tool events for the same agent run
`trace_id`	Per-call trace; can be set by the agent to link a turn across surfaces
`agent_id`	Which agent made the call
`actor_id`	The end-user/service the agent is acting on behalf of
`request_id`	Joins one observation + decision + outcome triple

session_id is the v0.6 grouping primitive. trace_id was added in the v0.5.1 hardening: set X-Trace-Id on an LLM call and pass the same value to session_context(trace_id=...) for the tool call, and a single trace report links the whole turn.
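A minimal sketch of the linking idea, under stated assumptions: one trace_id value is reused for the LLM call's X-Trace-Id header and for the tool call's context, and a trace report is then just a filter plus an ordering. The in-memory rows, the seq ordering column, the surface and kind fields, and the trace_report helper are hypothetical; the real report is served by the control plane.

```python
import uuid

# One identifier for the whole agent turn.
trace_id = uuid.uuid4().hex

# The same value is handed to both surfaces.
llm_headers = {"X-Trace-Id": trace_id}                       # sent with the LLM call
tool_context = {"session_id": "s-1", "trace_id": trace_id}   # kwargs for session_context(...)

# Toy audit rows; "seq" is a hypothetical ordering column.
audit_rows = [
    {"seq": 4, "trace_id": trace_id, "request_id": "r-2", "surface": "tool", "kind": "decision"},
    {"seq": 1, "trace_id": trace_id, "request_id": "r-1", "surface": "llm", "kind": "observation"},
    {"seq": 3, "trace_id": trace_id, "request_id": "r-2", "surface": "tool", "kind": "observation"},
    {"seq": 2, "trace_id": trace_id, "request_id": "r-1", "surface": "llm", "kind": "decision"},
    {"seq": 5, "trace_id": "another-turn", "request_id": "r-9", "surface": "llm", "kind": "observation"},
]


def trace_report(rows, wanted_trace_id):
    """Return the rows for one trace, in order."""
    matching = (row for row in rows if row["trace_id"] == wanted_trace_id)
    return sorted(matching, key=lambda row: row["seq"])


report = trace_report(audit_rows, trace_id)
surfaces = [row["surface"] for row in report]
```

Here the report interleaves LLM and tool events for the same turn, which is only possible because both surfaces were given the identical trace_id.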

YAML policy authoring (v0.5)

operator                                  control plane
─────────                                 ─────────────
  edit policy.yaml
       │
  bridle policy publish *.yaml
       │
       │   POST /v1/bundles
       ▼
       (CP validates + signs + persists)
       │
       │
       ▼
  gateway HTTPBundleLoader polls
       │
       │   GET /v1/bundles/active
       ▼
       (verify signature with cached public key)
       (run bundle_validator)
       (check expires_at — refuse if past)
       (engine.set_active_bundle(bundle))
       │
       ▼
       runtime: next request evaluates against the new bundle
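The gateway-side checks in the flow above can be sketched as a single function that either activates a fetched bundle or leaves the cached one in place. This is an illustration only: the Bundle fields, the Loader class, the offer method, and its return strings are assumptions, and the signature check is a stub standing in for real ed25519 verification.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


@dataclass(frozen=True)
class Bundle:
    bundle_id: str
    payload: bytes
    signature: bytes
    expires_at: datetime


class Loader:
    """Toy loader: keeps the last good bundle unless a new one passes every check."""

    def __init__(self,
                 verify_signature: Callable[[bytes, bytes], bool],
                 validate: Callable[[Bundle], bool]):
        self.verify_signature = verify_signature
        self.validate = validate
        self.active: Optional[Bundle] = None

    def offer(self, bundle: Optional[Bundle], now: datetime) -> str:
        """Run the checks in order; on any failure the cached bundle stays active."""
        if bundle is None:                       # fetch failed, e.g. CP unreachable
            return "cp-unreachable"
        if not self.verify_signature(bundle.payload, bundle.signature):
            return "bad-signature"
        if not self.validate(bundle):
            return "invalid"
        if bundle.expires_at <= now:
            return "expired"
        self.active = bundle
        return "activated"


# Stubs: a real gateway would verify an ed25519 signature with the cached public key.
loader = Loader(verify_signature=lambda payload, sig: sig == b"ok",
                validate=lambda bundle: len(bundle.payload) > 0)

now = datetime(2025, 1, 1, tzinfo=timezone.utc)
good = Bundle("b-1", b"policies-v1", b"ok", now + timedelta(hours=1))
forged = Bundle("b-2", b"policies-v2", b"forged", now + timedelta(hours=1))
stale = Bundle("b-3", b"policies-v3", b"ok", now - timedelta(seconds=1))

outcomes = [loader.offer(b, now) for b in (good, forged, stale, None)]
```

Only the first offer changes loader.active; the forged, expired, and missing bundles are all refused while b-1 keeps serving requests.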

YAML compiles to the existing signed PolicyBundle, so the runtime is unchanged and YAML is only an authoring layer. The six supported `type:` values map to canonical kebab-case policy IDs the engine already recognizes.

Failure modes (v0.5.1 hardening)

Failure	Behavior	Test
Bundle signature invalid	Loader refuses; cached stays	`test_http_bundle_loader.test_loader_rejects_bundle_with_bad_signature`
Bundle expired	Loader refuses; cached stays	`test_failure_modes.test_loader_rejects_expired_bundle_and_keeps_cached`
CP unreachable	Loader returns None; cached stays	`test_failure_modes.test_cp_*`
Audit shipper unavailable	Re-buffers in memory	`test_audit_shipper.*`
Policy engine raises	Synthetic `policy-engine-error` decision via worst-severity `fail_modes.on_engine_error`; raw exception never propagates	`test_failure_modes.test_engine_error_*`
Postgres restart	All five durable tables survive	`test_postgres_durability.*`
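The "policy engine raises" row can be illustrated with a small wrapper that converts any engine exception into a synthetic decision. Everything here is a sketch under assumptions: the severity ordering (block worse than allow), the fail-closed default when no policies are present, the dict-shaped decision, and the safe_evaluate and worst_fail_mode helpers are not taken from Bridle's code.

```python
# Assumed ordering: a higher number is a more severe action.
SEVERITY = {"allow": 0, "block": 1}


def worst_fail_mode(policies):
    """Pick the most severe on_engine_error action across the active policies."""
    modes = [policy["fail_modes"]["on_engine_error"] for policy in policies]
    # Assumption: with no policies at all, fail closed.
    return max(modes, key=SEVERITY.__getitem__, default="block")


def safe_evaluate(engine, observation, policies):
    """Evaluate policy; on any engine exception return a synthetic decision instead."""
    try:
        return engine(observation)
    except Exception as exc:
        return {
            "matched_policy_ids": ["policy-engine-error"],
            "final_action": worst_fail_mode(policies),
            "error_type": type(exc).__name__,  # recorded, never re-raised
        }


def healthy_engine(observation):
    return {"matched_policy_ids": [], "final_action": "allow"}


def broken_engine(observation):
    raise ValueError("malformed rule")


lenient = [{"fail_modes": {"on_engine_error": "allow"}}]
mixed = lenient + [{"fail_modes": {"on_engine_error": "block"}}]

ok = safe_evaluate(healthy_engine, {"tool": "issue_refund"}, mixed)
failed_mixed = safe_evaluate(broken_engine, {"tool": "issue_refund"}, mixed)
failed_lenient = safe_evaluate(broken_engine, {"tool": "issue_refund"}, lenient)
```

The caller always receives a decision it can audit and act on, so a bug in one rule degrades to the configured fail mode instead of surfacing as an unhandled exception in the request path.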

Durability

All five operational stores are Postgres-backed via asyncpg:

Table	Holds
`audit_rows`	Every decision
`sessions`	Per-session cost + counters
`tool_intents`	Loop-detector window
`policy_bundles`	Signed bundles + signature blob
`gateway_registry`	Gateway model_list + last-seen + active bundle

Migrations are flat SQL under bridle/migrations/, mounted into the Postgres container as init scripts via docker-compose.yml.

Spike regression suite

The original LiteLLM Path-A spike that settled the architecture lives at tests/spikes/litellm_enforcement/. It pins litellm==1.86.0 and re-runs 16 tests against a live LiteLLM Proxy + mock OpenAI upstream to verify the async_pre_call_hook contract the rest of the product depends on. Run it before any LiteLLM bump:

bash tests/spikes/litellm_enforcement/run_spike.sh

What's intentionally NOT in v0.6

See ADR-006 §"What v0.5.1 deliberately does NOT do" and ADR-005 §"What v0.5 deliberately does NOT do":

No web UI
No RBAC
No billing
No arbitrary policy logic / full DSL
No additional providers beyond LiteLLM
No per-rule targeting in the runtime (bundle-level only)
No auto-rollback / canary on bundle activation (mode-flip endpoint is the rollout mechanism)

These wait for design-partner pain.
