16 changes: 16 additions & 0 deletions CHANGELOG.md
@@ -13,6 +13,22 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
- **cli**: `barbacane compile` now discovers specs from the manifest's `specs` folder when `--spec` is not provided — `barbacane compile -m barbacane.yaml -o api.bca` works with zero spec args.
- **cli**: `barbacane init` now scaffolds a `specs/` directory and places the generated spec in `specs/api.yaml` with `specs: ./specs/` in the manifest.
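
  A minimal sketch of the scaffolded manifest implied by these two entries; only the `specs` key is taken from them, and any other manifest keys are omitted:

  ```yaml
  # barbacane.yaml, as scaffolded by `barbacane init` (other manifest keys omitted)
  # `barbacane compile -m barbacane.yaml -o api.bca` then discovers specs/api.yaml from here.
  specs: ./specs/
  ```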

#### AI Gateway middlewares (ADR-0024)
- **`ai-prompt-guard` middleware plugin**: validates LLM chat-completion requests before dispatch — named profiles carry `max_messages`, `max_message_length`, regex `blocked_patterns`, and managed `system_template` with `{var}` substitution. Short-circuits with 400 + RFC 9457 problem+json on violation.
- **`ai-token-limit` middleware plugin**: token-based sliding-window rate limiting for LLM endpoints. Named profiles carry `quota` + `window` (seconds); `partition_key` / `policy_name` / `count` stay top-level. Advisory semantics: streaming responses can't be interrupted mid-flight, so overshoots are absorbed and the next request 429s. Emits standard `ratelimit-*` response headers.
- **`ai-cost-tracker` middleware plugin**: per-request LLM cost in USD from a configurable `provider/model` price table (USD per 1,000 tokens). Emits the Prometheus counter `barbacane_plugin_ai_cost_tracker_cost_dollars` with `provider` and `model` labels for Grafana spend dashboards. No profile map — prices are operator facts, not policy.
- **`ai-response-guard` middleware plugin**: inspects LLM responses (OpenAI chat-completion format) in `on_response`. Named profiles carry `redact` rules (regex → replacement, scoped to `choices[].message.content` and `delta.content`) and `blocked_patterns` (a match replaces the response with a 502). Streamed responses cannot be redacted after the fact; the plugin emits `redactions_skipped_streaming_total` instead.
- **Named-profile + CEL composition pattern**: all four AI middlewares read a `context_key` (default `ai.policy`, overridable) to select the active profile. A `cel` middleware upstream writes `ai.policy` via `on_match.set_context`; one CEL decision fans out to prompt strictness, token budget, redaction strictness, and the `ai-proxy` dispatcher's named targets (via `ai.target`). See the sketch below.
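
A hedged composition sketch of this fan-out: the field names are the ones cited in the entries above, while the surrounding structure (the `middlewares:` list shape, `config:` nesting, the profile names, the prices, and the CEL expressions) is illustrative and assumed rather than confirmed syntax.

```yaml
# Illustrative spec excerpt: structure and values are assumptions, not confirmed syntax.
middlewares:
  - name: cel                          # one decision upstream drives all four plugins
    config:
      on_match:
        set_context:
          ai.policy: 'request.headers["x-tier"] == "pro" ? "relaxed" : "strict"'
          ai.target: 'request.headers["x-tier"] == "pro" ? "big-model" : "small-model"'

  - name: ai-prompt-guard              # selects its profile from ai.policy
    config:
      context_key: ai.policy           # the default; shown for clarity
      default_profile: strict
      profiles:
        strict:
          max_messages: 20
          max_message_length: 4000
          blocked_patterns: ['(?i)ignore (all )?previous instructions']
          system_template: "You are the support bot for {tenant}."
        relaxed:
          max_messages: 100
          max_message_length: 32000

  - name: ai-token-limit               # advisory sliding window, as described above
    config:
      policy_name: per-minute
      partition_key: header:x-api-key
      profiles:
        strict:  { quota: 10000,  window: 60 }
        relaxed: { quota: 100000, window: 60 }

  - name: ai-cost-tracker              # no profile map: prices are operator facts
    config:
      prices:                          # USD per 1,000 tokens (placeholder values)
        openai/gpt-4o: 0.005
        anthropic/claude-sonnet: 0.003

  - name: ai-response-guard
    config:
      default_profile: strict
      profiles:
        strict:
          redact:
            - pattern: '\b\d{3}-\d{2}-\d{4}\b'   # US SSN shape
              replacement: '[REDACTED]'
          blocked_patterns: ['(?i)internal use only']
        relaxed: {}
```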

### Changed
- **plugin**: `ai-token-limit` config now uses `quota` + `window` (seconds) — aligned with the `rate-limit` plugin — instead of `max_tokens_per_minute` / `max_tokens_per_hour`. For multiple concurrent windows (e.g. per-minute and per-hour caps), stack two instances of the middleware with different `policy_name`s (see the sketch after this list).
- **plugin**: AI guard/limit plugins (`ai-prompt-guard`, `ai-token-limit`, `ai-response-guard`) **fail-closed** on misconfiguration — a missing `default_profile` or invalid regex in a profile returns `500 problem+json` instead of silently letting traffic through. A silently disabled PII rule is precisely the class of bug operators only catch from an incident.
- **plugin**: `ai-token-limit` now persists the resolved partition key into context between `on_request` and `on_response` (scoped by `policy_name`) so `client_ip` and `header:*` partition sources charge the same bucket the request was admitted against. Previously token consumption leaked into a shared `"unknown"` bucket, effectively disabling per-consumer budgeting for those partition sources.
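
A minimal sketch of the stacked-windows pattern from the first entry above, under the same schema assumptions as the excerpt in the Added section:

```yaml
# Two instances of ai-token-limit, one per window. quota/window/policy_name
# are the fields named in this entry; the nesting and profile name are assumed.
middlewares:
  - name: ai-token-limit
    config:
      policy_name: per-minute
      profiles:
        default: { quota: 10000, window: 60 }     # 10k tokens per rolling minute
  - name: ai-token-limit
    config:
      policy_name: per-hour
      profiles:
        default: { quota: 200000, window: 3600 }  # 200k tokens per rolling hour
```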

### Fixed
- **gateway**: dispatcher plugins now receive the middleware chain's accumulated context — previously `host_context_get` calls inside a dispatcher (e.g. `ai-proxy` reading `ai.target` written by `cel`) returned nothing because the dispatcher instance was started with an empty context. This also means context keys *written* by a dispatcher (e.g. `ai.prompt_tokens` from `ai-proxy`) now flow into the `on_response` middleware chain, which is what lets `ai-cost-tracker` and `ai-token-limit` actually see token usage.
- **gateway**: stale framing headers (`content-length`, `transfer-encoding`, `connection`, `keep-alive`) from upstream responses are stripped before returning to the client so `on_response` middleware that mutates the body (e.g. `ai-response-guard` PII redaction) doesn't cause `IncompleteMessage` errors from a length mismatch.

## [0.6.3] - 2026-04-07

### Fixed
14 changes: 9 additions & 5 deletions README.md
@@ -9,10 +9,10 @@
<p align="center">
<a href="https://github.com/barbacane-dev/barbacane/actions/workflows/ci.yml"><img src="https://github.com/barbacane-dev/barbacane/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
<a href="https://docs.barbacane.dev"><img src="https://img.shields.io/badge/docs-docs.barbacane.dev-blue" alt="Documentation"></a>
<img src="https://img.shields.io/badge/unit%20tests-505%20passing-brightgreen" alt="Unit Tests">
<img src="https://img.shields.io/badge/plugin%20tests-684%20passing-brightgreen" alt="Plugin Tests">
<img src="https://img.shields.io/badge/integration%20tests-267%20passing-brightgreen" alt="Integration Tests">
<img src="https://img.shields.io/badge/cli%20tests-16%20passing-brightgreen" alt="CLI Tests">
<img src="https://img.shields.io/badge/unit%20tests-517%20passing-brightgreen" alt="Unit Tests">
<img src="https://img.shields.io/badge/plugin%20tests-777%20passing-brightgreen" alt="Plugin Tests">
<img src="https://img.shields.io/badge/integration%20tests-275%20passing-brightgreen" alt="Integration Tests">
<img src="https://img.shields.io/badge/cli%20tests-23%20passing-brightgreen" alt="CLI Tests">
<img src="https://img.shields.io/badge/ui%20tests-44%20passing-brightgreen" alt="UI Tests">
<img src="https://img.shields.io/badge/e2e%20tests-11%20passing-brightgreen" alt="E2E Tests">
<img src="https://img.shields.io/badge/rust-1.75%2B-orange" alt="Rust Version">
@@ -59,7 +59,7 @@ Full documentation is available at **[docs.barbacane.dev](https://docs.barbacane.dev)**

- [Getting Started](https://docs.barbacane.dev/guide/getting-started.html) — First steps with Barbacane
- [Spec Configuration](https://docs.barbacane.dev/guide/spec-configuration.html) — Configure routing and middleware
- [Middlewares](https://docs.barbacane.dev/guide/middlewares.html) — Authentication, rate limiting, caching
- [Middlewares](https://docs.barbacane.dev/guide/middlewares/) — Authentication, rate limiting, caching
- [Dispatchers](https://docs.barbacane.dev/guide/dispatchers.html) — Route requests to backends
- [Control Plane](https://docs.barbacane.dev/guide/control-plane.html) — REST API for spec and artifact management
- [Web UI](https://docs.barbacane.dev/guide/web-ui.html) — Web-based management interface
@@ -115,6 +115,10 @@ The playground includes a Train Travel API demo with WireMock backend, full observability …
| `response-transformer` | Middleware | Modify status code, headers, and body before client |
| `observability` | Middleware | SLO monitoring and detailed logging |
| `http-log` | Middleware | Send request/response logs to HTTP endpoint |
| `ai-prompt-guard` | Middleware | Validate and constrain LLM prompts under named policy profiles |
| `ai-token-limit` | Middleware | Token-based sliding-window rate limiting for LLM endpoints |
| `ai-cost-tracker` | Middleware | Record per-request LLM cost (USD) from a configurable price table |
| `ai-response-guard` | Middleware | PII redaction and blocked-pattern scanning on LLM responses |

## Performance

12 changes: 6 additions & 6 deletions ROADMAP.md
@@ -12,7 +12,7 @@ What's actively being worked on:

- [x] `request-transformer` plugin — modify headers, query params, path, body before upstream
- [x] `response-transformer` plugin — modify response status code, headers, body before client
- [x] Documentation for transformation plugins — **done** (documented in `docs/guide/middlewares.md`)
- [x] Documentation for transformation plugins — **done** (documented in `docs/guide/middlewares/`)

---

@@ -21,7 +21,7 @@ What's actively being worked on:
Near-term items ready to be picked up:

- [ ] `tcp-log` plugin — send logs to TCP endpoint
- [x] Security plugins documentation — **done** (documented in `docs/guide/middlewares.md`)
- [x] Security plugins documentation — **done** (documented in `docs/guide/middlewares/`)
- [ ] Structured log format documentation
- [ ] Integration guides (Datadog, Splunk, ELK)
- [x] `barbacane dev` — local dev server with file watching — **done**
@@ -87,10 +87,10 @@ Near-term items ready to be picked up:
|--------|------|----------|-------------|
| ~~`cel` routing extension~~ | ~~Middleware~~ | ~~P0~~ | ~~`on_match.set_context` + `context_set` capability for policy-driven model routing~~ — **done** |
| ~~`ai-proxy`~~ | ~~Dispatcher~~ | ~~P0~~ | ~~Route requests to LLM providers (OpenAI, Anthropic, Ollama); unified OpenAI-compatible API; format translation; provider fallback; policy-driven routing via named targets; token count context propagation~~ — **done** |
| `ai-token-limit` | Middleware | P1 | Token-based rate limiting per consumer/model/time window (runs on_response, reads token counts from context set by `ai-proxy`) |
| `ai-cost-tracker` | Middleware | P1 | Records cost metrics per provider/model via configurable price table; emits Prometheus counter for spend dashboards |
| `ai-prompt-guard` | Middleware | P1 | Validate and constrain prompts: length limits, regex-based prompt injection detection, managed system template injection |
| `ai-response-guard` | Middleware | P1 | Inspect LLM responses: PII redaction, blocked pattern detection; logs warnings when redaction is needed on already-streamed responses |
| ~~`ai-token-limit`~~ | ~~Middleware~~ | ~~P1~~ | ~~Token-based rate limiting per consumer/model/time window (runs on_response, reads token counts from context set by `ai-proxy`)~~ — **done** |
| ~~`ai-cost-tracker`~~ | ~~Middleware~~ | ~~P1~~ | ~~Records cost metrics per provider/model via configurable price table; emits Prometheus counter for spend dashboards~~ — **done** |
| ~~`ai-prompt-guard`~~ | ~~Middleware~~ | ~~P1~~ | ~~Validate and constrain prompts: length limits, regex-based prompt injection detection, managed system template injection~~ — **done** |
| ~~`ai-response-guard`~~ | ~~Middleware~~ | ~~P1~~ | ~~Inspect LLM responses: PII redaction, blocked pattern detection; logs warnings when redaction is needed on already-streamed responses~~ — **done** |

---
