16 changes: 16 additions & 0 deletions CHANGELOG.md
@@ -13,6 +13,22 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
- **cli**: `barbacane compile` now discovers specs from the manifest's `specs` folder when `--spec` is not provided — `barbacane compile -m barbacane.yaml -o api.bca` works with zero spec args.
- **cli**: `barbacane init` now scaffolds a `specs/` directory and places the generated spec in `specs/api.yaml` with `specs: ./specs/` in the manifest.
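
  A minimal sketch of the scaffolded manifest implied by these two entries; only the `specs` key is taken from them, and any other manifest keys are omitted:

  ```yaml
  # barbacane.yaml, as scaffolded by `barbacane init` (other manifest keys omitted)
  # `barbacane compile -m barbacane.yaml -o api.bca` then discovers specs/api.yaml from here.
  specs: ./specs/
  ```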

#### AI Gateway middlewares (ADR-0024)
- **`ai-prompt-guard` middleware plugin**: validates LLM chat-completion requests before dispatch — named profiles carry `max_messages`, `max_message_length`, regex `blocked_patterns`, and managed `system_template` with `{var}` substitution. Short-circuits with 400 + RFC 9457 problem+json on violation.
- **`ai-token-limit` middleware plugin**: token-based sliding-window rate limiting for LLM endpoints. Named profiles carry `quota` + `window` (seconds); `partition_key` / `policy_name` / `count` stay top-level. Advisory semantics: streaming responses can't be interrupted mid-flight, so overshoots are absorbed and the next request 429s. Emits standard `ratelimit-*` response headers.
- **`ai-cost-tracker` middleware plugin**: per-request LLM cost in USD from a configurable `provider/model` price table (USD per 1,000 tokens). Emits the Prometheus counter `barbacane_plugin_ai_cost_tracker_cost_dollars` with `provider` and `model` labels for Grafana spend dashboards. No profile map — prices are operator facts, not policy.
- **`ai-response-guard` middleware plugin**: inspects LLM responses (OpenAI chat-completion format) in `on_response`. Named profiles carry `redact` rules (regex → replacement, scoped to `choices[].message.content` and `delta.content`) and `blocked_patterns` (a match replaces the response with a 502). Streamed responses cannot be redacted after the fact; the plugin emits `redactions_skipped_streaming_total` instead.
- **Named-profile + CEL composition pattern**: all four AI middlewares read a `context_key` (default `ai.policy`, overridable) to select the active profile. A `cel` middleware upstream writes `ai.policy` via `on_match.set_context`; one CEL decision fans out to prompt strictness, token budget, redaction strictness, and the `ai-proxy` dispatcher's named targets (via `ai.target`). See the sketch below.
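
A hedged composition sketch of this fan-out: the field names are the ones cited in the entries above, while the surrounding structure (the `middlewares:` list shape, `config:` nesting, the profile names, the prices, and the CEL expressions) is illustrative and assumed rather than confirmed syntax.

```yaml
# Illustrative spec excerpt: structure and values are assumptions, not confirmed syntax.
middlewares:
  - name: cel                          # one decision upstream drives all four plugins
    config:
      on_match:
        set_context:
          ai.policy: 'request.headers["x-tier"] == "pro" ? "relaxed" : "strict"'
          ai.target: 'request.headers["x-tier"] == "pro" ? "big-model" : "small-model"'

  - name: ai-prompt-guard              # selects its profile from ai.policy
    config:
      context_key: ai.policy           # the default; shown for clarity
      default_profile: strict
      profiles:
        strict:
          max_messages: 20
          max_message_length: 4000
          blocked_patterns: ['(?i)ignore (all )?previous instructions']
          system_template: "You are the support bot for {tenant}."
        relaxed:
          max_messages: 100
          max_message_length: 32000

  - name: ai-token-limit               # advisory sliding window, as described above
    config:
      policy_name: per-minute
      partition_key: header:x-api-key
      profiles:
        strict:  { quota: 10000,  window: 60 }
        relaxed: { quota: 100000, window: 60 }

  - name: ai-cost-tracker              # no profile map: prices are operator facts
    config:
      prices:                          # USD per 1,000 tokens (placeholder values)
        openai/gpt-4o: 0.005
        anthropic/claude-sonnet: 0.003

  - name: ai-response-guard
    config:
      default_profile: strict
      profiles:
        strict:
          redact:
            - pattern: '\b\d{3}-\d{2}-\d{4}\b'   # US SSN shape
              replacement: '[REDACTED]'
          blocked_patterns: ['(?i)internal use only']
        relaxed: {}
```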

### Changed
- **plugin**: `ai-token-limit` config now uses `quota` + `window` (seconds) — aligned with the `rate-limit` plugin — instead of `max_tokens_per_minute` / `max_tokens_per_hour`. For multiple concurrent windows (e.g. per-minute and per-hour caps), stack two instances of the middleware with different `policy_name`s (see the sketch after this list).
- **plugin**: AI guard/limit plugins (`ai-prompt-guard`, `ai-token-limit`, `ai-response-guard`) **fail-closed** on misconfiguration — a missing `default_profile` or invalid regex in a profile returns `500 problem+json` instead of silently letting traffic through. A silently disabled PII rule is precisely the class of bug operators only catch from an incident.
- **plugin**: `ai-token-limit` now persists the resolved partition key into context between `on_request` and `on_response` (scoped by `policy_name`) so `client_ip` and `header:*` partition sources charge the same bucket the request was admitted against. Previously token consumption leaked into a shared `"unknown"` bucket, effectively disabling per-consumer budgeting for those partition sources.
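
A minimal sketch of the stacked-windows pattern from the first entry above, under the same schema assumptions as the excerpt in the Added section:

```yaml
# Two instances of ai-token-limit, one per window. quota/window/policy_name
# are the fields named in this entry; the nesting and profile name are assumed.
middlewares:
  - name: ai-token-limit
    config:
      policy_name: per-minute
      profiles:
        default: { quota: 10000, window: 60 }     # 10k tokens per rolling minute
  - name: ai-token-limit
    config:
      policy_name: per-hour
      profiles:
        default: { quota: 200000, window: 3600 }  # 200k tokens per rolling hour
```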

### Fixed
- **gateway**: dispatcher plugins now receive the middleware chain's accumulated context — previously `host_context_get` calls inside a dispatcher (e.g. `ai-proxy` reading `ai.target` written by `cel`) returned nothing because the dispatcher instance was started with an empty context. This also means context keys *written* by a dispatcher (e.g. `ai.prompt_tokens` from `ai-proxy`) now flow into the `on_response` middleware chain, which is what lets `ai-cost-tracker` and `ai-token-limit` actually see token usage.
- **gateway**: stale framing headers (`content-length`, `transfer-encoding`, `connection`, `keep-alive`) from upstream responses are stripped before returning to the client so `on_response` middleware that mutates the body (e.g. `ai-response-guard` PII redaction) doesn't cause `IncompleteMessage` errors from a length mismatch.

## [0.6.3] - 2026-04-07

### Fixed
14 changes: 9 additions & 5 deletions README.md
@@ -9,10 +9,10 @@
<p align="center">
<a href="https://github.com/barbacane-dev/barbacane/actions/workflows/ci.yml"><img src="https://github.com/barbacane-dev/barbacane/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
<a href="https://docs.barbacane.dev"><img src="https://img.shields.io/badge/docs-docs.barbacane.dev-blue" alt="Documentation"></a>
<img src="https://img.shields.io/badge/unit%20tests-505%20passing-brightgreen" alt="Unit Tests">
<img src="https://img.shields.io/badge/plugin%20tests-684%20passing-brightgreen" alt="Plugin Tests">
<img src="https://img.shields.io/badge/integration%20tests-267%20passing-brightgreen" alt="Integration Tests">
<img src="https://img.shields.io/badge/cli%20tests-16%20passing-brightgreen" alt="CLI Tests">
<img src="https://img.shields.io/badge/unit%20tests-517%20passing-brightgreen" alt="Unit Tests">
<img src="https://img.shields.io/badge/plugin%20tests-777%20passing-brightgreen" alt="Plugin Tests">
<img src="https://img.shields.io/badge/integration%20tests-275%20passing-brightgreen" alt="Integration Tests">
<img src="https://img.shields.io/badge/cli%20tests-23%20passing-brightgreen" alt="CLI Tests">
<img src="https://img.shields.io/badge/ui%20tests-44%20passing-brightgreen" alt="UI Tests">
<img src="https://img.shields.io/badge/e2e%20tests-11%20passing-brightgreen" alt="E2E Tests">
<img src="https://img.shields.io/badge/rust-1.75%2B-orange" alt="Rust Version">
@@ -59,7 +59,7 @@ Full documentation is available at **[docs.barbacane.dev](https://docs.barbacane.dev)**

- [Getting Started](https://docs.barbacane.dev/guide/getting-started.html) — First steps with Barbacane
- [Spec Configuration](https://docs.barbacane.dev/guide/spec-configuration.html) — Configure routing and middleware
- [Middlewares](https://docs.barbacane.dev/guide/middlewares.html) — Authentication, rate limiting, caching
- [Middlewares](https://docs.barbacane.dev/guide/middlewares/) — Authentication, rate limiting, caching
- [Dispatchers](https://docs.barbacane.dev/guide/dispatchers.html) — Route requests to backends
- [Control Plane](https://docs.barbacane.dev/guide/control-plane.html) — REST API for spec and artifact management
- [Web UI](https://docs.barbacane.dev/guide/web-ui.html) — Web-based management interface
@@ -115,6 +115,10 @@ The playground includes a Train Travel API demo with WireMock backend, full observability …
| `response-transformer` | Middleware | Modify status code, headers, and body before client |
| `observability` | Middleware | SLO monitoring and detailed logging |
| `http-log` | Middleware | Send request/response logs to HTTP endpoint |
| `ai-prompt-guard` | Middleware | Validate and constrain LLM prompts under named policy profiles |
| `ai-token-limit` | Middleware | Token-based sliding-window rate limiting for LLM endpoints |
| `ai-cost-tracker` | Middleware | Record per-request LLM cost (USD) from a configurable price table |
| `ai-response-guard` | Middleware | PII redaction and blocked-pattern scanning on LLM responses |

## Performance

12 changes: 6 additions & 6 deletions ROADMAP.md
@@ -12,7 +12,7 @@ What's actively being worked on:

- [x] `request-transformer` plugin — modify headers, query params, path, body before upstream
- [x] `response-transformer` plugin — modify response status code, headers, body before client
- [x] Documentation for transformation plugins — **done** (documented in `docs/guide/middlewares.md`)
- [x] Documentation for transformation plugins — **done** (documented in `docs/guide/middlewares/`)

---

@@ -21,7 +21,7 @@ What's actively being worked on:
Near-term items ready to be picked up:

- [ ] `tcp-log` plugin — send logs to TCP endpoint
- [x] Security plugins documentation — **done** (documented in `docs/guide/middlewares.md`)
- [x] Security plugins documentation — **done** (documented in `docs/guide/middlewares/`)
- [ ] Structured log format documentation
- [ ] Integration guides (Datadog, Splunk, ELK)
- [x] `barbacane dev` — local dev server with file watching — **done**
@@ -87,10 +87,10 @@ Near-term items ready to be picked up:
|--------|------|----------|-------------|
| ~~`cel` routing extension~~ | ~~Middleware~~ | ~~P0~~ | ~~`on_match.set_context` + `context_set` capability for policy-driven model routing~~ — **done** |
| ~~`ai-proxy`~~ | ~~Dispatcher~~ | ~~P0~~ | ~~Route requests to LLM providers (OpenAI, Anthropic, Ollama); unified OpenAI-compatible API; format translation; provider fallback; policy-driven routing via named targets; token count context propagation~~ — **done** |
| `ai-token-limit` | Middleware | P1 | Token-based rate limiting per consumer/model/time window (runs on_response, reads token counts from context set by `ai-proxy`) |
| `ai-cost-tracker` | Middleware | P1 | Records cost metrics per provider/model via configurable price table; emits Prometheus counter for spend dashboards |
| `ai-prompt-guard` | Middleware | P1 | Validate and constrain prompts: length limits, regex-based prompt injection detection, managed system template injection |
| `ai-response-guard` | Middleware | P1 | Inspect LLM responses: PII redaction, blocked pattern detection; logs warnings when redaction is needed on already-streamed responses |
| ~~`ai-token-limit`~~ | ~~Middleware~~ | ~~P1~~ | ~~Token-based rate limiting per consumer/model/time window (runs on_response, reads token counts from context set by `ai-proxy`)~~ — **done** |
| ~~`ai-cost-tracker`~~ | ~~Middleware~~ | ~~P1~~ | ~~Records cost metrics per provider/model via configurable price table; emits Prometheus counter for spend dashboards~~ — **done** |
| ~~`ai-prompt-guard`~~ | ~~Middleware~~ | ~~P1~~ | ~~Validate and constrain prompts: length limits, regex-based prompt injection detection, managed system template injection~~ — **done** |
| ~~`ai-response-guard`~~ | ~~Middleware~~ | ~~P1~~ | ~~Inspect LLM responses: PII redaction, blocked pattern detection; logs warnings when redaction is needed on already-streamed responses~~ — **done** |

---
