
feat: AI gateway middleware suite + middlewares docs reorg (ADR-0024)#67

Merged
ndreno merged 2 commits into main from feat/ai-gateway-middlewares
Apr 29, 2026
Conversation

@ndreno ndreno commented Apr 20, 2026

Summary

Ships the four middlewares that extend ai-proxy into a full LLM gateway (ADR-0024), plus the host-side and lint plumbing discovered while validating the end-to-end composition.

  • 4 new plugins (named-profile + CEL composition, fail-closed on misconfig): ai-prompt-guard, ai-token-limit, ai-cost-tracker, ai-response-guard.
  • Host fixes to make cross-plugin context work: dispatchers now receive the middleware chain's context and their post-dispatch writes flow into on_response; stale framing headers are stripped so body-mutating response middlewares don't break HTTP framing.
  • Lint: new vacuum function compiles regex patterns at lint time; removed the stale "no duplicate middlewares" rule — stacking is first-class.
  • Docs reorg: split the 1,782-line middlewares.md into 8 per-category pages under docs/guide/middlewares/.

What's new

Plugins

| Plugin | Role |
| --- | --- |
| ai-prompt-guard | Per-profile message count / length limits, blocked-pattern regex, managed system template with `{var}` substitution |
| ai-token-limit | Token-based sliding-window rate limiting; persists the resolved partition in context so `client_ip` / `header:*` sources charge the same bucket on_request and on_response |
| ai-cost-tracker | Per-request USD metric from a configurable price table; emits `cost_dollars` Prometheus counter |
| ai-response-guard | PII regex redaction + blocked-pattern 502 rewrite; fail-closed on misconfig or invalid regex |
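The sliding-window semantics behind ai-token-limit can be sketched as follows. This is an illustrative sketch only, not the plugin's real internals; the `TokenWindow` / `try_charge` names are hypothetical, and the actual plugin persists its state via the host context rather than an in-process deque.

```rust
use std::collections::VecDeque;

/// Hypothetical sliding-window token limiter: at most `quota` tokens
/// may be charged within any trailing window of `window_ms` milliseconds.
struct TokenWindow {
    quota: u64,
    window_ms: u64,
    events: VecDeque<(u64, u64)>, // (timestamp_ms, tokens charged)
}

impl TokenWindow {
    fn new(quota: u64, window_ms: u64) -> Self {
        Self { quota, window_ms, events: VecDeque::new() }
    }

    /// Charge `tokens` at time `now_ms`; returns false when the quota
    /// would be exceeded (the request should then be rejected).
    fn try_charge(&mut self, now_ms: u64, tokens: u64) -> bool {
        // Evict charges that have slid out of the trailing window.
        while let Some(&(ts, _)) = self.events.front() {
            if now_ms.saturating_sub(ts) >= self.window_ms {
                self.events.pop_front();
            } else {
                break;
            }
        }
        let used: u64 = self.events.iter().map(|&(_, t)| t).sum();
        if used + tokens > self.quota {
            return false;
        }
        self.events.push_back((now_ms, tokens));
        true
    }
}

fn main() {
    let mut w = TokenWindow::new(100, 60_000); // 100 tokens per minute
    assert!(w.try_charge(0, 80));
    assert!(!w.try_charge(1_000, 30)); // 80 + 30 would exceed the quota
    assert!(w.try_charge(61_000, 30)); // the first charge slid out
    println!("ok");
}
```

Charging the same bucket on_request and on_response is why the resolved partition must be persisted in context: both phases need to resolve to one window instance.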

Host fixes

  • Dispatcher context plumbing (crates/barbacane/src/main.rs): middleware_context is now set on the dispatcher instance before dispatch, and post_dispatch_context is captured after. Fixes cel → ai-proxy (via ai.target) and ai-proxy → ai-cost-tracker / ai-proxy → ai-token-limit (via ai.prompt_tokens etc.) — both were silently broken before.
  • Stale framing-header strip (build_response_from_plugin): content-length, transfer-encoding, connection, keep-alive from upstream are dropped so on_response middlewares that mutate the body (redaction!) don't cause IncompleteMessage errors.
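The framing-header strip amounts to filtering a fixed set of header names before the client response is built. A minimal sketch, assuming headers as plain string pairs (the real `build_response_from_plugin` operates on the host's response types):

```rust
/// Headers that describe the *upstream* body framing or connection; the
/// host recomputes framing after on_response middlewares mutate the body.
const STALE_FRAMING: [&str; 4] =
    ["content-length", "transfer-encoding", "connection", "keep-alive"];

fn strip_stale_framing(headers: Vec<(String, String)>) -> Vec<(String, String)> {
    headers
        .into_iter()
        .filter(|(name, _)| !STALE_FRAMING.contains(&name.to_ascii_lowercase().as_str()))
        .collect()
}

fn main() {
    let headers = vec![
        ("Content-Length".to_string(), "42".to_string()),
        ("Content-Type".to_string(), "application/json".to_string()),
    ];
    let kept = strip_stale_framing(headers);
    assert_eq!(kept, vec![("Content-Type".to_string(), "application/json".to_string())]);
    println!("ok");
}
```

Keeping a stale `content-length` after redaction shrinks the body is exactly what produces the `IncompleteMessage` errors mentioned above.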

Lint (shift-left)

  • barbacane-validate-ai-regex: new vacuum function that compiles regex patterns in ai-prompt-guard and ai-response-guard profiles at lint time.
  • Removed barbacane-no-duplicate-middlewares: middleware stacking is first-class per ADR-0024 (cel routing rules, rate-limit layered keys, ai-token-limit multi-window). The rule's premise was stale.

Docs reorg

Old: one 1,782-line middlewares.md. New: 8 per-category pages under docs/guide/middlewares/ (index / authentication / authorization / traffic-control / observability / transformation / caching / ai-gateway). Stacking is documented as a first-class composition mechanism with worked examples for cel, rate-limit, and ai-token-limit. Cross-links updated in dispatchers.md, extensions.md, SUMMARY.md, README.md, ROADMAP.md.

Test plan

  • cargo test on each plugin (93 unit tests total): ai-prompt-guard 27, ai-token-limit 29, ai-cost-tracker 13, ai-response-guard 24.
  • cargo clippy --all-targets -- -D warnings clean on each plugin.
  • cargo test -p barbacane-test --test ai_gateway (3 integration tests): redaction end-to-end, CEL profile selection, token-limit partition regression.
  • cargo test -p barbacane-test --test compilation (16 smoke tests, 6 new).
  • cargo test -p barbacane-test --test ai_proxy (no regression from host changes).
  • cargo test --workspace --exclude barbacane-test (no regression).
  • ./docs/rulesets/tests/run-tests.sh — 14/14 passing including the new invalid-ai-regex negative fixture.
  • cargo fmt --all clean.
  • cargo deny check advisories — pre-existing RUSTSEC-2026-0098/0099 on rustls-webpki via async-nats, unrelated to this PR.

ADR

Breaking notes

Pre-1.0, so no back-compat shim, but worth calling out:

  • ai-token-limit config shape changed mid-design (before this PR's branch) from max_tokens_per_minute / max_tokens_per_hour to quota + window, aligning with the rate-limit plugin. Multi-window setups stack two instances with distinct policy_names.
  • docs/guide/middlewares.md was removed; anchor-based deep links (e.g. middlewares.html#ai-token-limit) will 404. Canonical URLs are now under docs/guide/middlewares/<category>.html.
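The `quota` + `window` shape and the stacked multi-window pattern can be illustrated with a hypothetical config struct (field names follow the text above; the struct itself is not the plugin's real config type):

```rust
/// Hypothetical shape of the new ai-token-limit config: one
/// (quota, window) pair per instance, matching rate-limit semantics.
/// The old shape bundled max_tokens_per_minute / max_tokens_per_hour
/// into a single instance.
#[derive(Debug)]
struct AiTokenLimitConfig {
    policy_name: String, // must be distinct per stacked instance
    quota: u64,          // tokens allowed per window
    window_secs: u64,    // window length in seconds
}

fn main() {
    // Multi-window setup: two stacked instances with distinct policy_names.
    let per_minute = AiTokenLimitConfig {
        policy_name: "tl-minute".into(),
        quota: 10_000,
        window_secs: 60,
    };
    let per_hour = AiTokenLimitConfig {
        policy_name: "tl-hour".into(),
        quota: 200_000,
        window_secs: 3_600,
    };
    assert_ne!(per_minute.policy_name, per_hour.policy_name);
    println!("ok");
}
```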

Completes the ADR-0024 AI gateway by shipping the four middlewares that
extend `ai-proxy` into a full LLM gateway, plus the host-side and lint
plumbing discovered while validating the end-to-end composition.

### Plugins (named-profile + CEL composition, fail-closed on misconfig)

- **ai-prompt-guard**: per-profile message count / length limits,
  blocked-pattern regex, and managed system-template injection with
  `{var}` substitution.
- **ai-token-limit**: token-based sliding-window rate limiting.
  Persists the resolved partition key in context (scoped by
  `policy_name`) so `client_ip` / `header:*` sources charge the same
  bucket on_request and on_response. `quota` + `window` match the
  `rate-limit` plugin's semantics.
- **ai-cost-tracker**: per-request USD metric from a configurable price
  table; emits `cost_dollars` Prometheus counter labelled by provider
  and model.
- **ai-response-guard**: per-profile PII regex redaction +
  blocked-pattern 502 rewrite. Invalid regexes and missing `default_profile`
  return 500 — silently disabled PII rules are the kind of bug
  operators only catch from an incident.
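The ai-cost-tracker computation reduces to a price-table lookup keyed by provider and model. A minimal sketch under assumed per-1K-token pricing units; the real config keys and Prometheus plumbing are not shown in this PR:

```rust
use std::collections::HashMap;

/// Hypothetical price table: (provider, model) ->
/// (USD per 1K prompt tokens, USD per 1K completion tokens).
struct PriceTable {
    prices: HashMap<(String, String), (f64, f64)>,
}

impl PriceTable {
    /// Returns None for unknown (provider, model) pairs so the caller
    /// can decide how to treat unpriced traffic.
    fn cost_dollars(&self, provider: &str, model: &str, prompt: u64, completion: u64) -> Option<f64> {
        let &(p_in, p_out) = self.prices.get(&(provider.to_string(), model.to_string()))?;
        Some(prompt as f64 / 1000.0 * p_in + completion as f64 / 1000.0 * p_out)
    }
}

fn main() {
    let mut prices = HashMap::new();
    prices.insert(("openai".to_string(), "gpt-4o".to_string()), (0.005, 0.015));
    let table = PriceTable { prices };
    let cost = table.cost_dollars("openai", "gpt-4o", 2000, 1000).unwrap();
    assert!((cost - 0.025).abs() < 1e-9); // 2 * 0.005 + 1 * 0.015
    println!("{cost}");
}
```

The resulting value is what would feed the `cost_dollars` counter, labelled by provider and model.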

### Host fixes (uncovered by the new integration test)

- Dispatcher instances now receive the middleware chain's accumulated
  context and their post-dispatch context flows into on_response. This
  makes `cel → ai-proxy` (via `ai.target`) and `ai-proxy → cost-tracker`
  / `ai-proxy → token-limit` (via `ai.prompt_tokens` etc.) actually
  work — previously each plugin instance had its own isolated context.
- Stale `content-length` / `transfer-encoding` / `connection` /
  `keep-alive` headers from upstream responses are stripped before the
  client response, so on_response middlewares that mutate the body
  (redaction!) don't cause `IncompleteMessage` errors.
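The context-plumbing fix can be modelled as a single mutable map threaded through the chain. This is a toy model, not the host's real API; the function names are illustrative stand-ins for the plugins involved:

```rust
use std::collections::HashMap;

type Context = HashMap<String, String>;

// cel writes the routing decision on_request.
fn on_request_cel(ctx: &mut Context) {
    ctx.insert("ai.target".into(), "openai/gpt-4o".into());
}

// The dispatcher now sees the chain's accumulated context (previously it
// got a fresh, isolated one), and its post-dispatch writes survive.
fn dispatch_ai_proxy(ctx: &mut Context) {
    assert!(ctx.contains_key("ai.target"));
    ctx.insert("ai.prompt_tokens".into(), "2000".into());
}

// on_response middlewares read the dispatcher's post-dispatch writes.
fn on_response_cost_tracker(ctx: &Context) -> Option<u64> {
    ctx.get("ai.prompt_tokens")?.parse().ok()
}

fn main() {
    let mut ctx = Context::new();
    on_request_cel(&mut ctx);
    dispatch_ai_proxy(&mut ctx);
    assert_eq!(on_response_cost_tracker(&ctx), Some(2000));
    println!("ok");
}
```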

### Lint (shift-left)

- `barbacane-validate-ai-regex`: new vacuum function that compiles
  regex patterns in `ai-prompt-guard` and `ai-response-guard` profiles
  at lint time.
- Removed `barbacane-no-duplicate-middlewares`: middleware stacking is a
  first-class composition mechanism (cel routing rules, rate-limit
  layered keys, ai-token-limit multi-window) — the rule's
  "each plugin at most once" premise was stale.

### Docs

- Split `docs/guide/middlewares.md` (1,782 lines) into per-category
  pages under `docs/guide/middlewares/` (index, authentication,
  authorization, traffic-control, observability, transformation,
  caching, ai-gateway). Stacking is now documented as a first-class
  composition mechanism, with worked examples for `cel`, `rate-limit`,
  and `ai-token-limit`.
- Cross-links updated in `dispatchers.md`, `extensions.md`,
  `SUMMARY.md`, `index.md`, `README.md`, `ROADMAP.md`.

### Testing

- 93 plugin unit tests across the 4 AI plugins.
- 3 integration tests (`crates/barbacane-test/tests/ai_gateway.rs`):
  redaction end-to-end, CEL profile selection, token-limit partition
  regression.
- 6 compilation smoke tests in `tests/fixtures/` — one per plugin plus
  the combined `ai-gateway.yaml` composition.
- 1 negative ruleset fixture (`invalid-ai-regex.yaml`) asserting the
  new regex validator flags four broken patterns.
@ndreno ndreno self-assigned this Apr 20, 2026
@ndreno ndreno added the documentation, enhancement, architecture, and middleware labels Apr 20, 2026
…dvisories

Two pre-existing CI issues uncovered by this PR's runs, unrelated to the
AI gateway work itself:

- **Clippy**: `collect_refs_recursive` in `barbacane-wasm/src/secrets.rs`
  used a nested `if` inside a `match` arm that a newer clippy version
  flags (`clippy::collapsible_match`). Collapsed into a guarded arm.
- **cargo-deny**: two new `rustls-webpki` advisories (RUSTSEC-2026-0098
  and RUSTSEC-2026-0099, both about name-constraint validation) pinned
  in via `async-nats`. Added to the existing ignore list — same
  rationale as the earlier RUSTSEC-2026-0049 entry: no upgrade path
  until `async-nats` bumps its `rustls-webpki` dependency.
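The clippy fix follows the usual `collapsible_match` rewrite: a nested `if let` inside a match arm becomes a single guarded arm. An illustrative reproduction (the real code in `barbacane-wasm/src/secrets.rs` differs):

```rust
enum Node {
    Ref(String),
    Leaf(i32),
}

// Before (flagged by newer clippy as collapsible_match):
//   match node {
//       Node::Ref(name) => {
//           if name.starts_with("secret:") { return Some(name.as_str()); }
//           None
//       }
//       _ => None,
//   }
// After: the inner `if` collapsed into a match guard.
fn ref_name(node: &Node) -> Option<&str> {
    match node {
        Node::Ref(name) if name.starts_with("secret:") => Some(name.as_str()),
        _ => None,
    }
}

fn main() {
    assert_eq!(ref_name(&Node::Ref("secret:db".into())), Some("secret:db"));
    assert_eq!(ref_name(&Node::Ref("plain".into())), None);
    assert_eq!(ref_name(&Node::Leaf(1)), None);
    println!("ok");
}
```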
klaude previously approved these changes Apr 20, 2026

bbe64 left a comment

It's huge, and it looks good. Here is my approval.

@ndreno ndreno merged commit a4b02e4 into main Apr 29, 2026
12 checks passed
ndreno added a commit to barbacane-dev/website that referenced this pull request Apr 29, 2026
…ility

Two passages drifted from reality once ai-proxy + the AI governance
middleware suite were verified:

1. The "MCP gateway vs AI gateway vs API gateway" section claimed the
   three are "not the same box". They are three categories, but a single
   well-architected gateway (dispatcher + middleware composition) can
   span all three. Reworded to clarify that the architecture choice is
   orthogonal to the category distinction.

2. The closing Barbacane mention positioned Barbacane only as an MCP
   gateway, which undersells actual capability. Updated to mention all
   three layers: API gateway, outbound AI gateway (ai-proxy), and MCP
   gateway, composed from the same primitives.

Claims verified against:
- docs.barbacane.dev/guide/dispatchers.html (ai-proxy)
- adr/0024-ai-gateway-plugin.md (positioning)
- barbacane-dev/barbacane#67 (AI governance middleware suite)