Commit a4b02e4

Merge pull request #67 from barbacane-dev/feat/ai-gateway-middlewares
feat: AI gateway middleware suite + middlewares docs reorg (ADR-0024)
2 parents 9b6a1e5 + fc825eb commit a4b02e4

55 files changed

Lines changed: 7309 additions & 1626 deletions

CHANGELOG.md

Lines changed: 16 additions & 0 deletions
@@ -13,6 +13,22 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - **cli**: `barbacane compile` now discovers specs from the manifest's `specs` folder when `--spec` is not provided — `barbacane compile -m barbacane.yaml -o api.bca` works with zero spec args.
 - **cli**: `barbacane init` now scaffolds a `specs/` directory and places the generated spec in `specs/api.yaml` with `specs: ./specs/` in the manifest.
 
+#### AI Gateway middlewares (ADR-0024)
+- **`ai-prompt-guard` middleware plugin**: validates LLM chat-completion requests before dispatch — named profiles carry `max_messages`, `max_message_length`, regex `blocked_patterns`, and a managed `system_template` with `{var}` substitution. Short-circuits with a 400 RFC 9457 problem+json response on violation.
+- **`ai-token-limit` middleware plugin**: token-based sliding-window rate limiting for LLM endpoints. Named profiles carry `quota` + `window` (seconds); `partition_key` / `policy_name` / `count` stay top-level. Advisory semantics: streaming responses can't be interrupted mid-flight, so overshoots are absorbed and the next request is refused with a 429. Emits standard `ratelimit-*` response headers.
+- **`ai-cost-tracker` middleware plugin**: computes per-request LLM cost in USD from a configurable `provider/model` price table (USD per 1,000 tokens). Emits the Prometheus counter `barbacane_plugin_ai_cost_tracker_cost_dollars` with `provider` and `model` labels for Grafana spend dashboards. No profile map — prices are operator facts, not policy.
+- **`ai-response-guard` middleware plugin**: inspects LLM responses (OpenAI chat-completion format) in `on_response`. Named profiles carry `redact` rules (regex → replacement, scoped to `choices[].message.content` and `delta.content`) and `blocked_patterns` (a match replaces the whole response with a 502). Streamed responses cannot be redacted after the fact; the plugin emits `redactions_skipped_streaming_total` instead.
+- **Named-profile + CEL composition pattern**: all four AI middlewares read a `context_key` (default `ai.policy`, overridable) to select the active profile. A `cel` middleware upstream writes `ai.policy` via `on_match.set_context`; one CEL decision fans out to prompt strictness, token budget, redaction strictness, and the `ai-proxy` dispatcher's named targets (via `ai.target`).
+
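The composition pattern described in the last bullet can be sketched as a spec fragment. This is a hypothetical illustration: the overall layout (`middlewares:` list, per-plugin `config:` block, `profiles` map, CEL `expr` key) is assumed, and only the identifiers named in the entries above (`cel`, `on_match.set_context`, `context_key`, `ai.policy`, `ai.target`, `default_profile`, `max_messages`, `max_message_length`, `blocked_patterns`, `system_template`) come from this changelog.

```yaml
# Hypothetical spec fragment; layout assumed, field names from the changelog.
middlewares:
  - plugin: cel
    config:
      on_match:
        # One CEL decision fans out to every AI middleware via context.
        - expr: 'request.header("x-tier") == "free"'
          set_context:
            ai.policy: strict       # read by the ai-* middlewares below
            ai.target: cheap-model  # read by the ai-proxy dispatcher

  - plugin: ai-prompt-guard
    config:
      context_key: ai.policy        # default value, shown for clarity
      default_profile: standard
      profiles:
        standard:
          max_messages: 50
          max_message_length: 8000
        strict:
          max_messages: 10
          max_message_length: 2000
          blocked_patterns:
            - "(?i)ignore (all )?previous instructions"
          system_template: "You are a support bot for {tenant}."
```

With this shape, one upstream CEL decision selects the prompt strictness, token budget, redaction strictness, and `ai-proxy` target for the whole chain; a violating request short-circuits with a 400 problem+json, per the entry above.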
+### Changed
+- **plugin**: `ai-token-limit` config now uses `quota` + `window` (seconds) — aligned with the `rate-limit` plugin — instead of `max_tokens_per_minute` / `max_tokens_per_hour`. For multiple concurrent windows (e.g. per-minute and per-hour caps), stack two instances of the middleware with different `policy_name`s.
+- **plugin**: AI guard/limit plugins (`ai-prompt-guard`, `ai-token-limit`, `ai-response-guard`) now **fail closed** on misconfiguration — a missing `default_profile` or an invalid regex in a profile returns a 500 problem+json instead of silently letting traffic through. A silently disabled PII rule is precisely the class of bug operators only catch from an incident.
+- **plugin**: `ai-token-limit` now persists the resolved partition key into context between `on_request` and `on_response` (scoped by `policy_name`), so `client_ip` and `header:*` partition sources charge the same bucket the request was admitted against. Previously, token consumption leaked into a shared `"unknown"` bucket, effectively disabling per-consumer budgeting for those partition sources.
+
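The stacking advice above can be sketched as follows. Again a hypothetical fragment: the list layout and `profiles` map shape are assumed, while `quota`, `window`, `partition_key`, `policy_name`, and `default_profile` come from this changelog.

```yaml
# Hypothetical: two stacked ai-token-limit instances replace the old
# max_tokens_per_minute / max_tokens_per_hour pair.
middlewares:
  - plugin: ai-token-limit
    config:
      policy_name: per-minute   # distinct names keep the two buckets separate
      partition_key: client_ip
      default_profile: standard
      profiles:
        standard:
          quota: 20000          # tokens
          window: 60            # seconds
  - plugin: ai-token-limit
    config:
      policy_name: per-hour
      partition_key: client_ip
      default_profile: standard
      profiles:
        standard:
          quota: 500000
          window: 3600
```

Each instance tracks its own sliding window, so a request must fit under both the per-minute and the per-hour budget to be admitted.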
+### Fixed
+- **gateway**: dispatcher plugins now receive the middleware chain's accumulated context — previously, `host_context_get` calls inside a dispatcher (e.g. `ai-proxy` reading `ai.target` written by `cel`) returned nothing because the dispatcher instance was started with an empty context. Context keys *written* by a dispatcher (e.g. `ai.prompt_tokens` from `ai-proxy`) now also flow into the `on_response` middleware chain, which is what lets `ai-cost-tracker` and `ai-token-limit` actually see token usage.
+- **gateway**: stale framing headers (`content-length`, `transfer-encoding`, `connection`, `keep-alive`) from upstream responses are stripped before returning to the client, so `on_response` middleware that mutates the body (e.g. `ai-response-guard` PII redaction) doesn't cause `IncompleteMessage` errors from a length mismatch.
+
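For the `ai-cost-tracker` price table mentioned in the Added section, a config might look like the sketch below. The unit (USD per 1,000 tokens) and the `provider/model` keying come from this changelog; the `prices` / `prompt` / `completion` key names and all dollar figures are illustrative assumptions.

```yaml
# Hypothetical price-table sketch for ai-cost-tracker (no profile map:
# prices are operator facts, not policy). Figures are placeholders.
middlewares:
  - plugin: ai-cost-tracker
    config:
      prices:
        openai/gpt-4o:
          prompt: 0.005         # USD per 1,000 prompt tokens
          completion: 0.015     # USD per 1,000 completion tokens
        anthropic/claude-sonnet:
          prompt: 0.003
          completion: 0.015
```

The emitted counter `barbacane_plugin_ai_cost_tracker_cost_dollars{provider,model}` can then back a Grafana spend panel, e.g. `sum by (model) (increase(barbacane_plugin_ai_cost_tracker_cost_dollars[1d]))`.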
 ## [0.6.3] - 2026-04-07
 
 ### Fixed

README.md

Lines changed: 9 additions & 5 deletions
@@ -9,10 +9,10 @@
 <p align="center">
 <a href="https://github.com/barbacane-dev/barbacane/actions/workflows/ci.yml"><img src="https://github.com/barbacane-dev/barbacane/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
 <a href="https://docs.barbacane.dev"><img src="https://img.shields.io/badge/docs-docs.barbacane.dev-blue" alt="Documentation"></a>
-<img src="https://img.shields.io/badge/unit%20tests-505%20passing-brightgreen" alt="Unit Tests">
-<img src="https://img.shields.io/badge/plugin%20tests-684%20passing-brightgreen" alt="Plugin Tests">
-<img src="https://img.shields.io/badge/integration%20tests-267%20passing-brightgreen" alt="Integration Tests">
-<img src="https://img.shields.io/badge/cli%20tests-16%20passing-brightgreen" alt="CLI Tests">
+<img src="https://img.shields.io/badge/unit%20tests-517%20passing-brightgreen" alt="Unit Tests">
+<img src="https://img.shields.io/badge/plugin%20tests-777%20passing-brightgreen" alt="Plugin Tests">
+<img src="https://img.shields.io/badge/integration%20tests-275%20passing-brightgreen" alt="Integration Tests">
+<img src="https://img.shields.io/badge/cli%20tests-23%20passing-brightgreen" alt="CLI Tests">
 <img src="https://img.shields.io/badge/ui%20tests-44%20passing-brightgreen" alt="UI Tests">
 <img src="https://img.shields.io/badge/e2e%20tests-11%20passing-brightgreen" alt="E2E Tests">
 <img src="https://img.shields.io/badge/rust-1.75%2B-orange" alt="Rust Version">
@@ -59,7 +59,7 @@ Full documentation is available at **[docs.barbacane.dev](https://docs.barbacane
 
 - [Getting Started](https://docs.barbacane.dev/guide/getting-started.html) — First steps with Barbacane
 - [Spec Configuration](https://docs.barbacane.dev/guide/spec-configuration.html) — Configure routing and middleware
-- [Middlewares](https://docs.barbacane.dev/guide/middlewares.html) — Authentication, rate limiting, caching
+- [Middlewares](https://docs.barbacane.dev/guide/middlewares/) — Authentication, rate limiting, caching
 - [Dispatchers](https://docs.barbacane.dev/guide/dispatchers.html) — Route requests to backends
 - [Control Plane](https://docs.barbacane.dev/guide/control-plane.html) — REST API for spec and artifact management
 - [Web UI](https://docs.barbacane.dev/guide/web-ui.html) — Web-based management interface
@@ -115,6 +115,10 @@ The playground includes a Train Travel API demo with WireMock backend, full obse
 | `response-transformer` | Middleware | Modify status code, headers, and body before client |
 | `observability` | Middleware | SLO monitoring and detailed logging |
 | `http-log` | Middleware | Send request/response logs to HTTP endpoint |
+| `ai-prompt-guard` | Middleware | Validate and constrain LLM prompts under named policy profiles |
+| `ai-token-limit` | Middleware | Token-based sliding-window rate limiting for LLM endpoints |
+| `ai-cost-tracker` | Middleware | Record per-request LLM cost (USD) from a configurable price table |
+| `ai-response-guard` | Middleware | PII redaction and blocked-pattern scanning on LLM responses |
 
 ## Performance

ROADMAP.md

Lines changed: 6 additions & 6 deletions
@@ -12,7 +12,7 @@ What's actively being worked on:
 
 - [x] `request-transformer` plugin — modify headers, query params, path, body before upstream
 - [x] `response-transformer` plugin — modify response status code, headers, body before client
-- [x] Documentation for transformation plugins — **done** (documented in `docs/guide/middlewares.md`)
+- [x] Documentation for transformation plugins — **done** (documented in `docs/guide/middlewares/`)
 
 ---
@@ -21,7 +21,7 @@ What's actively being worked on:
 Near-term items ready to be picked up:
 
 - [ ] `tcp-log` plugin — send logs to TCP endpoint
-- [x] Security plugins documentation — **done** (documented in `docs/guide/middlewares.md`)
+- [x] Security plugins documentation — **done** (documented in `docs/guide/middlewares/`)
 - [ ] Structured log format documentation
 - [ ] Integration guides (Datadog, Splunk, ELK)
 - [x] `barbacane dev` — local dev server with file watching — **done**
@@ -87,10 +87,10 @@ Near-term items ready to be picked up:
 |--------|------|----------|-------------|
 | ~~`cel` routing extension~~ | ~~Middleware~~ | ~~P0~~ | ~~`on_match.set_context` + `context_set` capability for policy-driven model routing~~ **done** |
 | ~~`ai-proxy`~~ | ~~Dispatcher~~ | ~~P0~~ | ~~Route requests to LLM providers (OpenAI, Anthropic, Ollama); unified OpenAI-compatible API; format translation; provider fallback; policy-driven routing via named targets; token count context propagation~~ **done** |
-| `ai-token-limit` | Middleware | P1 | Token-based rate limiting per consumer/model/time window (runs on_response, reads token counts from context set by `ai-proxy`) |
-| `ai-cost-tracker` | Middleware | P1 | Records cost metrics per provider/model via configurable price table; emits Prometheus counter for spend dashboards |
-| `ai-prompt-guard` | Middleware | P1 | Validate and constrain prompts: length limits, regex-based prompt injection detection, managed system template injection |
-| `ai-response-guard` | Middleware | P1 | Inspect LLM responses: PII redaction, blocked pattern detection; logs warnings when redaction is needed on already-streamed responses |
+| ~~`ai-token-limit`~~ | ~~Middleware~~ | ~~P1~~ | ~~Token-based rate limiting per consumer/model/time window (runs on_response, reads token counts from context set by `ai-proxy`)~~ **done** |
+| ~~`ai-cost-tracker`~~ | ~~Middleware~~ | ~~P1~~ | ~~Records cost metrics per provider/model via configurable price table; emits Prometheus counter for spend dashboards~~ **done** |
+| ~~`ai-prompt-guard`~~ | ~~Middleware~~ | ~~P1~~ | ~~Validate and constrain prompts: length limits, regex-based prompt injection detection, managed system template injection~~ **done** |
+| ~~`ai-response-guard`~~ | ~~Middleware~~ | ~~P1~~ | ~~Inspect LLM responses: PII redaction, blocked pattern detection; logs warnings when redaction is needed on already-streamed responses~~ **done** |
 
 ---
