## CHANGELOG.md

### Added

- **cli**: `barbacane compile` now discovers specs from the manifest's `specs` folder when `--spec` is not provided — `barbacane compile -m barbacane.yaml -o api.bca` works with zero spec args.
- **cli**: `barbacane init` now scaffolds a `specs/` directory and places the generated spec in `specs/api.yaml`, with `specs: ./specs/` in the manifest (layout sketched below).
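
A minimal sketch of the resulting layout; the `specs` key and file paths are the documented behavior, and any other manifest fields are omitted:

```yaml
# barbacane.yaml (as scaffolded by `barbacane init`)
specs: ./specs/   # `barbacane compile -m barbacane.yaml -o api.bca`
                  # picks up specs/api.yaml from here when --spec is omitted
```
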
#### AI Gateway middlewares (ADR-0024)

- **`ai-prompt-guard` middleware plugin**: validates LLM chat-completion requests before dispatch — named profiles carry `max_messages`, `max_message_length`, regex `blocked_patterns`, and a managed `system_template` with `{var}` substitution. Short-circuits with a 400 + RFC 9457 problem+json on violation (config sketch below).
- **`ai-token-limit` middleware plugin**: token-based sliding-window rate limiting for LLM endpoints. Named profiles carry `quota` + `window` (seconds); `partition_key` / `policy_name` / `count` stay top-level. Advisory semantics: streaming responses can't be interrupted mid-flight, so overshoots are absorbed and the next request gets a 429. Emits standard `ratelimit-*` response headers (sketch below).
- **`ai-cost-tracker` middleware plugin**: computes per-request LLM cost in USD from a configurable `provider/model` price table (USD per 1,000 tokens). Emits the Prometheus counter `barbacane_plugin_ai_cost_tracker_cost_dollars` with `provider` and `model` labels for Grafana spend dashboards. No profile map — prices are operator facts, not policy (sketch below).
- **`ai-response-guard` middleware plugin**: inspects LLM responses (OpenAI chat-completion format) in `on_response`. Named profiles carry `redact` rules (regex → replacement, scoped to `choices[].message.content` and `delta.content`) and `blocked_patterns` (a match replaces the response with a 502). Streamed responses cannot be redacted after the fact; the plugin emits `redactions_skipped_streaming_total` instead (sketch below).
- **Named-profile + CEL composition pattern**: all four AI middlewares read a `context_key` (default `ai.policy`, overridable) to select the active profile. A `cel` middleware upstream writes `ai.policy` via `on_match.set_context`; one CEL decision fans out to prompt strictness, token budget, redaction strictness, and the `ai-proxy` dispatcher's named targets (via `ai.target`). See the sketches below.
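
A hedged sketch of an `ai-prompt-guard` profile. The field names (`context_key`, `default_profile`, `max_messages`, `max_message_length`, `blocked_patterns`, `system_template`) come from this changelog; the `middlewares:` nesting, the profile name, and all values are illustrative assumptions:

```yaml
middlewares:
  - name: ai-prompt-guard
    config:
      context_key: ai.policy    # default; where the active profile name is read from
      default_profile: strict   # omitting this fails closed (see Changed below)
      profiles:
        strict:
          max_messages: 20
          max_message_length: 4000
          blocked_patterns:
            - "(?i)ignore (all )?previous instructions"
          system_template: "You are a support assistant for {tenant}."  # {var} substitution
```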
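
The matching `ai-token-limit` entry, continuing the same assumed list; `quota` and `window` sit in the profile while `partition_key` and `policy_name` stay top-level, as described above:

```yaml
  - name: ai-token-limit
    config:
      policy_name: per-minute
      partition_key: client_ip   # or header:<name>; each partition gets its own window
      context_key: ai.policy
      default_profile: strict
      profiles:
        strict:
          quota: 10000           # tokens admitted per window (illustrative)
          window: 60             # window length in seconds
```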
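
For `ai-cost-tracker`, only "USD per 1,000 tokens keyed by provider/model" is documented; the `prices` key name and the rates below are assumptions:

```yaml
  - name: ai-cost-tracker
    config:
      prices:                    # USD per 1,000 tokens; rates are illustrative
        openai/gpt-4o: 0.005
        anthropic/claude-sonnet: 0.006
        ollama/llama3: 0.0       # self-hosted models can be tracked at zero cost
```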
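
An `ai-response-guard` profile under the same assumptions; the `pattern`/`replacement` rule shape is guessed, while `redact`, `blocked_patterns`, and the content scoping are from the changelog:

```yaml
  - name: ai-response-guard
    config:
      context_key: ai.policy
      default_profile: strict
      profiles:
        strict:
          redact:                # applied to choices[].message.content and delta.content
            - pattern: '\b\d{3}-\d{2}-\d{4}\b'   # US-SSN-shaped digits (illustrative)
              replacement: "[redacted]"
          blocked_patterns:
            - "(?i)internal use only"            # a match swaps the response for a 502
```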
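
Finally, the upstream `cel` middleware that fans the decision out. Only `on_match.set_context` and the `ai.policy`/`ai.target` context keys are documented; the match-expression field name and its syntax are assumptions, and in a real chain this entry would sit before the four plugins above:

```yaml
  - name: cel
    config:
      match: "request.headers['x-tier'] == 'internal'"   # CEL expression; field name assumed
      on_match:
        set_context:
          ai.policy: relaxed       # selects the active profile in all four AI middlewares
          ai.target: local-llama   # selects the ai-proxy dispatcher's named target
```
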
### Changed

- **plugin**: `ai-token-limit` config now uses `quota` + `window` (seconds) — aligned with the `rate-limit` plugin — instead of `max_tokens_per_minute` / `max_tokens_per_hour`. For multiple concurrent windows (e.g. per-minute and per-hour caps), stack two instances of the middleware with different `policy_name`s, as sketched below.
- **plugin**: AI guard/limit plugins (`ai-prompt-guard`, `ai-token-limit`, `ai-response-guard`) **fail closed** on misconfiguration — a missing `default_profile` or an invalid regex in a profile returns a `500` problem+json instead of silently letting traffic through. A silently disabled PII rule is precisely the class of bug operators only catch from an incident.
- **plugin**: `ai-token-limit` now persists the resolved partition key into context between `on_request` and `on_response` (scoped by `policy_name`), so `client_ip` and `header:*` partition sources charge the same bucket the request was admitted against. Previously, token consumption leaked into a shared `"unknown"` bucket, effectively disabling per-consumer budgeting for those partition sources.
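
A sketch of the stacking pattern, using the same assumed manifest shape as the examples above; quotas are illustrative:

```yaml
middlewares:
  - name: ai-token-limit
    config:
      policy_name: per-minute
      default_profile: default
      profiles:
        default: { quota: 10000, window: 60 }
  - name: ai-token-limit
    config:
      policy_name: per-hour    # a distinct policy_name keeps the two buckets separate
      default_profile: default
      profiles:
        default: { quota: 200000, window: 3600 }
```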
### Fixed

- **gateway**: dispatcher plugins now receive the middleware chain's accumulated context — previously, `host_context_get` calls inside a dispatcher (e.g. `ai-proxy` reading `ai.target` written by `cel`) returned nothing because the dispatcher instance was started with an empty context. Context keys *written* by a dispatcher (e.g. `ai.prompt_tokens` from `ai-proxy`) now also flow into the `on_response` middleware chain, which is what lets `ai-cost-tracker` and `ai-token-limit` actually see token usage.
- **gateway**: stale framing headers (`content-length`, `transfer-encoding`, `connection`, `keep-alive`) from upstream responses are stripped before returning to the client, so `on_response` middleware that mutates the body (e.g. `ai-response-guard` PII redaction) doesn't cause `IncompleteMessage` errors from a length mismatch.

## ROADMAP.md

What's actively being worked on:

- [x] `request-transformer` plugin — modify headers, query params, path, body before upstream
- [x] `response-transformer` plugin — modify response status code, headers, body before client
- [x] Documentation for transformation plugins — **done** (documented in `docs/guide/middlewares/`)

---
Near-term items ready to be picked up:
- [ ] `tcp-log` plugin — send logs to TCP endpoint
- [x] Security plugins documentation — **done** (documented in `docs/guide/middlewares/`)
- [ ] Structured log format documentation
- [ ] Integration guides (Datadog, Splunk, ELK)
- [x] `barbacane dev` — local dev server with file watching — **done**
| Plugin | Type | Priority | Description |
|--------|------|----------|-------------|
|~~`cel` routing extension~~|~~Middleware~~|~~P0~~|~~`on_match.set_context` + `context_set` capability for policy-driven model routing~~ — **done**|
|~~`ai-proxy`~~|~~Dispatcher~~|~~P0~~|~~Route requests to LLM providers (OpenAI, Anthropic, Ollama); unified OpenAI-compatible API; format translation; provider fallback; policy-driven routing via named targets; token count context propagation~~ — **done**|
|~~`ai-token-limit`~~|~~Middleware~~|~~P1~~|~~Token-based rate limiting per consumer/model/time window (runs on_response, reads token counts from context set by `ai-proxy`)~~ — **done**|
|~~`ai-cost-tracker`~~|~~Middleware~~|~~P1~~|~~Records cost metrics per provider/model via configurable price table; emits Prometheus counter for spend dashboards~~ — **done**|
|~~`ai-prompt-guard`~~|~~Middleware~~|~~P1~~|~~Validate and constrain prompts: length limits, regex-based prompt injection detection, managed system template injection~~ — **done**|
|~~`ai-response-guard`~~|~~Middleware~~|~~P1~~|~~Inspect LLM responses: PII redaction, blocked pattern detection; logs warnings when redaction is needed on already-streamed responses~~ — **done**|