v0.3 — Control plane loop demo

Tag: v0.3-control-loop-demo
Commit: c70e3c8 week3-control-plane-loop-green
Date: 2026-05-25

What this release is

The first runnable version of Bridle that demonstrates the company thesis end-to-end: a central control plane, two enforcement surfaces, signed policy distribution, and an audit-backed shadow report that proves a central mode flip changes runtime behavior without redeploy.

This is a demo release. It is intentionally not pilot-ready yet — see "What must become durable in Week 4" below.

Architecture at this tag

                  +--------------------------+
                  |   Bridle Control Plane   |
                  |   (FastAPI, in-memory)   |
                  |  - signed bundles        |
                  |  - gateway registry      |
                  |  - audit ingest          |
                  |  - shadow report         |
                  |  - mode-flip endpoint    |
                  +-----------+--------------+
                              | signed bundles (poll)
                              | audit rows (push, batched)
            +-----------------+---------------------+
            |                                       |
   +--------+------------+                +---------+---------+
   |  LiteLLM Proxy +    |                |  Tool middleware  |
   |  BridleLogger (#0)  |                |  @bridle.tool     |
   +---------------------+                +-------------------+
            |                                       |
            v                                       v
       upstream provider                       business logic

Both surfaces share:

  • one GatewayInterceptor
  • one LocalDemoPolicyEngine (bundle-driven config)
  • one InMemorySessionStateService
  • one InMemoryAuditLedger

Both surfaces also emit identity envelopes carrying the same trace_id / session_id / agent_id / actor_id / tenant_id.
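As an illustration of the shared identity envelope, here is a minimal sketch. The five field names come from this note; the `IdentityEnvelope` class, the `new_envelope` helper, and the row layout are hypothetical and are not Bridle's actual types.

```python
import uuid
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class IdentityEnvelope:
    """Hypothetical envelope shape; only the field names are from the release note."""
    trace_id: str
    session_id: str
    agent_id: str
    actor_id: str
    tenant_id: str


def new_envelope(session_id: str, agent_id: str, actor_id: str,
                 tenant_id: str, trace_id: Optional[str] = None) -> IdentityEnvelope:
    # Generate a trace_id only when the caller did not supply one.
    return IdentityEnvelope(
        trace_id=trace_id or uuid.uuid4().hex,
        session_id=session_id,
        agent_id=agent_id,
        actor_id=actor_id,
        tenant_id=tenant_id,
    )


# One agent turn: both surfaces attach the same envelope to their audit rows.
envelope = new_envelope("sess-1", "refund-agent", "user-42", "tenant-a",
                        trace_id="trace-abc")
llm_row = {"surface": "llm", **asdict(envelope)}
tool_row = {"surface": "tool", **asdict(envelope)}
```

Because both rows carry identical identity fields, an LLM decision and a tool decision from the same turn can later be joined on trace_id.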

Test scoreboard

| Suite | Tests | Result |
|---|---|---|
| Contract models | 4 | |
| Bundle validator | 4 | |
| Interceptor unit | 4 | |
| Week 1 budget lifecycle | 2 | |
| Week 1 e2e (real LiteLLM proxy) | 5 | |
| Week 2 tool-call acceptance | 9 | |
| Week 3 CP server | 10 | |
| Week 3 HTTP bundle loader | 3 | |
| Week 3 audit shipper | 5 | |
| Week 3 central loop + trace linkage | 3 | |
| Spike regression (separate run_spike.sh) | 16 | |
| Total | 65 | 65/65 |

Latency

| Metric | Measured | Target |
|---|---|---|
| LLM e2e p95 (Week 1, real LiteLLM) | 10.63 ms | < 25 ms |
| Tool decision p95 (Week 2) | 0.08 ms | < 25 ms |
| CP publish + signature roundtrip | < 5 ms | n/a |
| Loader fetch + verify + swap | ~2 ms | n/a |

Accepted DoD

Week 1: LLM gateway enforcement with allow / mutate / block / fail-closed and audit row per request. ✅

Week 2: Tool-call enforcement via @bridle.tool decorator + contextvars. Refund-agent demo proves one session, two enforcement surfaces, one policy engine, one audit ledger. ✅

Week 3: From the central control plane, flip a policy from shadow to enforce, publish a signed bundle, gateway picks it up, next request changes behavior, audit/report proves it. ✅
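The Week 3 loop (publish a signed bundle, verify it at the gateway, swap it in, observe changed behavior) can be approximated with the sketch below. This note does not state which signature scheme Bridle uses, so HMAC-SHA256 with a shared secret is swapped in purely for illustration; the bundle fields, `decide`, and the `shadow_block` label are likewise hypothetical.

```python
import hashlib
import hmac
import json

SECRET = b"demo-signing-key"  # assumption: shared secret for this sketch only


def sign_bundle(bundle: dict) -> dict:
    # Canonicalize with sorted keys so signer and verifier hash the same bytes.
    payload = json.dumps(bundle, sort_keys=True)
    sig = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    return {"payload": payload, "signature": sig}


def verify_and_load(signed: dict) -> dict:
    expected = hmac.new(SECRET, signed["payload"].encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected, signed["signature"]):
        raise ValueError("bad bundle signature")
    return json.loads(signed["payload"])


def decide(bundle: dict, cost_usd: float) -> str:
    if cost_usd <= bundle["max_cost_usd"]:
        return "allow"
    # In shadow mode the violation is only recorded; in enforce mode it blocks.
    return "block" if bundle["mode"] == "enforce" else "shadow_block"


# Gateway starts with a shadow-mode bundle.
active = verify_and_load(sign_bundle({"version": 1, "mode": "shadow", "max_cost_usd": 1.0}))
before = decide(active, 2.5)

# Control plane flips the mode and publishes version 2; the gateway swaps it in.
active = verify_and_load(sign_bundle({"version": 2, "mode": "enforce", "max_cost_usd": 1.0}))
after = decide(active, 2.5)

# A payload edited after signing must be rejected.
tampered = sign_bundle({"version": 3, "mode": "shadow", "max_cost_usd": 1.0})
tampered["payload"] = tampered["payload"].replace("shadow", "enforce")
try:
    verify_and_load(tampered)
    tamper_rejected = False
except ValueError:
    tamper_rejected = True
```

The same request costs the same amount before and after the flip; only the verified bundle differs, which is the property the shadow report is meant to demonstrate.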

Demo

One command runs the full Week 3 DoD demo end-to-end (CP + interceptor + poller + audit ship + report + mode flip):

python -m bridle.demo.control_loop

A Week 2 refund-agent demo also runs standalone:

python -m bridle.demo.refund_agent

For the LLM gateway path through a real LiteLLM proxy (Week 1 demo), see tests/test_week1_e2e.py — the proxy command is documented in that file's header.

Known constraints (at v0.3)

These are deliberate v0 trade-offs documented in ADR-001 / ADR-002 / ADR-003. They do not block Week 4.

  • LiteLLM 1.86.0 pin. The spike regression suite (tests/spikes/litellm_enforcement/run_spike.sh) must pass before any LiteLLM version bump.
  • Single CustomLogger position #0. Observability loggers register after enforcement; only callback [0] receives async_pre_call_hook.
  • Mutation targets must be in model_list. Bundle validator enforces this at publish and mode-flip time.
  • Callback module must live next to the LiteLLM config YAML. LiteLLM resolves dotted paths relative to the config file's directory.
  • Bundle validator runs against the union of all registered model lists for a tenant at publish time. The gateway-side validator is the binding check.
  • prevented_spend_usd in the shadow report is a v0 proxy. It uses cost_at_decision_usd as the proxy for what the policy would have prevented. Exact per-action calculation is a v1 task.
  • Trace propagation requires the agent to set X-Trace-Id and pass the same trace_id to session_context. Framework adapters (OpenAI Agents SDK, LangGraph) will bridge this automatically when written; v0 requires the agent author to do it.
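To make the two validation constraints above concrete, the following sketch checks mutation targets first against the union of a tenant's registered model lists (the publish-time check) and then against a single gateway's own list (the binding check). The rule shape, the `mutate_to` key, the gateway ids, and the model names are hypothetical.

```python
def missing_mutation_targets(rules, model_lists):
    """Return mutation targets that appear in no supplied model list.

    rules: list of dicts; a rule that rewrites the model has a "mutate_to" key.
    model_lists: mapping of gateway id -> list of model names it serves.
    """
    allowed = set().union(*model_lists.values())
    targets = {r["mutate_to"] for r in rules if r.get("mutate_to")}
    return sorted(targets - allowed)


rules = [
    {"id": "downgrade-expensive", "mutate_to": "small-model"},
    {"id": "block-over-budget"},  # no mutation, nothing to validate
]
model_lists = {
    "gw-a": ["large-model", "small-model"],
    "gw-b": ["large-model"],
}

# Publish-time check on the control plane: union across all gateways.
cp_missing = missing_mutation_targets(rules, model_lists)

# Gateway-side check: each gateway validates against its own list only.
gw_b_missing = missing_mutation_targets(rules, {"gw-b": model_lists["gw-b"]})
```

Here the bundle passes at publish time because gw-a serves the target, yet gw-b would still reject it, which illustrates why the gateway-side validator is described as the binding check.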

What is intentionally fake / in-memory at v0.3

These are the surfaces that must become durable before a pilot:

| Surface | Today | Week 4 target |
|---|---|---|
| Audit ledger | InMemoryAuditLedger, lost on gateway restart | Postgres |
| Audit store (CP) | InMemoryAuditStore, lost on CP restart | Postgres |
| Session state | InMemorySessionStateService, lost on gateway restart | Postgres |
| Policy bundles | InMemoryBundleStore, lost on CP restart | Postgres |
| Gateway registry | InMemoryGatewayRegistry, lost on CP restart | Postgres |
| CP signing key | KeyManager.from_env or generated ephemeral | Persist in CP filesystem or KMS |
| Auth | Static bearer token (BRIDLE_CP_MASTER_KEY) | Tracked but not in Week 4 |
| RBAC, web UI, billing | Absent | Not in Week 4 |

What must become durable in Week 4

Per the Week 4 brief, in order:

  1. Postgres durability for audit_rows, events, sessions, policy_bundles, gateway_registry. No ClickHouse, no warehouse.
  2. Failure-mode tests that demonstrate trust: bad signature, bundle expired, CP unreachable, audit shipper unreachable, policy engine error → fail-by-severity.
  3. One-command demo: make demo-control-loop (or equivalent) spins up CP + LiteLLM proxy + mock + runs the loop + tears down.
  4. Trace report helper: query trace_id → ordered LLM/tool observation/decision/outcome rows. This is the incident-review primitive.
  5. Design-partner walkthrough at docs/demo/design_partner_walkthrough.md in buyer language.
  6. Pilot-readiness review — Week 4 ADR + updated release note.

Week 4 DoD: a fresh machine can run one command and observe the full loop, with durable audit rows surviving a CP and gateway restart, plus a trace report that links LLM and tool decisions for one agent turn.
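The trace report helper from item 4 could look roughly like the sketch below. It runs over an in-memory list rather than Postgres, and the row fields (`ts`, `surface`, `kind`, `detail`) are assumptions; only trace_id and the idea of ordered LLM/tool rows come from this note.

```python
from operator import itemgetter


def trace_report(rows, trace_id):
    """Return all audit rows for one trace, ordered by timestamp."""
    matching = [r for r in rows if r["trace_id"] == trace_id]
    return sorted(matching, key=itemgetter("ts"))


# Hypothetical audit rows from two traces, deliberately out of order.
rows = [
    {"trace_id": "t1", "ts": 3, "surface": "tool", "kind": "outcome", "detail": "refund issued"},
    {"trace_id": "t2", "ts": 2, "surface": "llm", "kind": "decision", "detail": "allow"},
    {"trace_id": "t1", "ts": 1, "surface": "llm", "kind": "decision", "detail": "allow"},
    {"trace_id": "t1", "ts": 2, "surface": "tool", "kind": "decision", "detail": "allow"},
]

report = trace_report(rows, "t1")
for row in report:
    print(row["ts"], row["surface"], row["kind"], row["detail"])
```

In a durable version the filter and ordering would presumably move into a SQL query; the output shape, one ordered timeline per trace_id spanning both surfaces, is what makes it usable for incident review.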

Commits in this release

c70e3c8 week3-control-plane-loop-green
8aa4b02 w3-s1: bridle control plane HTTP server + bundle publish/sign
779626b rename: billion-baby/controlplane → Bridle
4398a45 week2-tool-call-enforcement-green
0fcd9e5 week1-litellm-gateway-enforcement-green