feat: AI gateway middleware suite + middlewares docs reorg (ADR-0024) #67
Merged
Completes the ADR-0024 AI gateway by shipping the four middlewares that
extend `ai-proxy` into a full LLM gateway, plus the host-side and lint
plumbing discovered while validating the end-to-end composition.
### Plugins (named-profile + CEL composition, fail-closed on misconfig)
- **ai-prompt-guard**: per-profile message count / length limits,
blocked-pattern regex, and managed system-template injection with
`{var}` substitution.
- **ai-token-limit**: token-based sliding-window rate limiting.
Persists the resolved partition key in context (scoped by
`policy_name`) so `client_ip` / `header:*` sources charge the same
bucket on_request and on_response. `quota` + `window` match the
`rate-limit` plugin's semantics.
- **ai-cost-tracker**: per-request USD metric from a configurable price
table; emits `cost_dollars` Prometheus counter labelled by provider
and model.
- **ai-response-guard**: per-profile PII regex redaction + blocked-
pattern 502 rewrite. Invalid regexes and missing `default_profile`
return 500 — silently disabled PII rules are the kind of bug
operators only catch from an incident.
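The `ai-token-limit` behaviour described above can be sketched roughly as follows. This is an illustration, not the plugin's code: the `Context` map, the context key format, the `TokenLimiter` type, and expressing the window in seconds are all assumptions made for the example. It shows why the resolved partition key is stored in context under a `policy_name`-scoped key: on_response charges the same bucket that on_request checked.

```rust
use std::collections::{HashMap, VecDeque};

/// Request-scoped context shared by on_request and on_response
/// (hypothetical stand-in for the host's middleware context).
type Context = HashMap<String, String>;

/// Token-based sliding-window limiter for one policy (illustrative only).
struct TokenLimiter {
    policy_name: String,
    quota: u64,       // max tokens allowed inside one window
    window_secs: u64, // window length; seconds are an assumption
    // partition key -> (timestamp, tokens) charges still inside the window
    usage: HashMap<String, VecDeque<(u64, u64)>>,
}

impl TokenLimiter {
    fn new(policy_name: &str, quota: u64, window_secs: u64) -> Self {
        TokenLimiter {
            policy_name: policy_name.to_string(),
            quota,
            window_secs,
            usage: HashMap::new(),
        }
    }

    /// Context key scoped by policy_name so stacked instances do not collide
    /// (the key format is made up for this sketch).
    fn context_key(&self) -> String {
        format!("ai-token-limit.{}.partition", self.policy_name)
    }

    /// Tokens charged to `partition` within the window ending at `now`.
    fn used(&mut self, partition: &str, now: u64) -> u64 {
        let window = self.window_secs;
        let entries = self.usage.entry(partition.to_string()).or_default();
        // Evict charges that have slid out of the window.
        while let Some(&(ts, _)) = entries.front() {
            if ts + window <= now {
                entries.pop_front();
            } else {
                break;
            }
        }
        entries.iter().map(|&(_, tokens)| tokens).sum()
    }

    /// Resolve the partition once, persist it, and allow only if quota remains.
    fn on_request(&mut self, ctx: &mut Context, client_ip: &str, now: u64) -> bool {
        ctx.insert(self.context_key(), client_ip.to_string());
        self.used(client_ip, now) < self.quota
    }

    /// Charge the tokens reported after dispatch to the bucket chosen on_request.
    fn on_response(&mut self, ctx: &Context, tokens: u64, now: u64) {
        if let Some(partition) = ctx.get(&self.context_key()) {
            self.usage
                .entry(partition.clone())
                .or_default()
                .push_back((now, tokens));
        }
    }
}

fn main() {
    let mut limiter = TokenLimiter::new("per-minute", 100, 60);
    let mut ctx = Context::new();
    assert!(limiter.on_request(&mut ctx, "10.0.0.1", 0));
    limiter.on_response(&ctx, 100, 1); // the response consumed the whole quota
    assert!(!limiter.on_request(&mut ctx, "10.0.0.1", 30)); // same bucket, rejected
    assert!(limiter.on_request(&mut ctx, "10.0.0.1", 61)); // charge has expired
}
```

Without the persisted key, on_response would have to re-resolve the partition, and a source such as `client_ip` could then charge a different bucket from the one that was checked.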
### Host fixes (uncovered by the new integration test)
- Dispatcher instances now receive the middleware chain's accumulated
context and their post-dispatch context flows into on_response. This
makes `cel → ai-proxy` (via `ai.target`) and `ai-proxy → cost-tracker`
/ `ai-proxy → token-limit` (via `ai.prompt_tokens` etc.) actually
work — previously each plugin instance had its own isolated context.
- Stale `content-length` / `transfer-encoding` / `connection` /
`keep-alive` headers from upstream responses are stripped before the
client response, so on_response middlewares that mutate the body
(redaction!) don't cause `IncompleteMessage` errors.
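As a rough illustration of the header fix, the sketch below drops the four framing headers named above from an upstream header list. The function name `strip_stale_headers` and the `Vec<(String, String)>` representation are assumptions for the example; the actual host code may model headers differently.

```rust
/// Headers that describe the upstream message framing or connection and go
/// stale once a middleware rewrites the body (list taken from the PR text).
const STALE_HEADERS: [&str; 4] = [
    "content-length",
    "transfer-encoding",
    "connection",
    "keep-alive",
];

/// Drop stale framing headers; names are compared case-insensitively.
/// (Hypothetical helper, not the host's real function.)
fn strip_stale_headers(headers: Vec<(String, String)>) -> Vec<(String, String)> {
    headers
        .into_iter()
        .filter(|(name, _)| {
            !STALE_HEADERS
                .iter()
                .any(|&stale| name.eq_ignore_ascii_case(stale))
        })
        .collect()
}

fn main() {
    let upstream = vec![
        ("Content-Type".to_string(), "application/json".to_string()),
        ("Content-Length".to_string(), "512".to_string()),
        ("Connection".to_string(), "keep-alive".to_string()),
    ];
    let cleaned = strip_stale_headers(upstream);
    // Only the content type survives; the HTTP layer can then recompute
    // framing for the (possibly redacted) body.
    assert_eq!(cleaned.len(), 1);
    println!("{:?}", cleaned);
}
```

If the original `content-length` were kept after redaction shortened or lengthened the body, the client would wait for bytes that never arrive or see a truncated body, which is consistent with the `IncompleteMessage` errors mentioned above.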
### Lint (shift-left)
- `barbacane-validate-ai-regex`: new vacuum function that compiles
regex patterns in `ai-prompt-guard` and `ai-response-guard` profiles
at lint time.
- Removed `barbacane-no-duplicate-middlewares`: middleware stacking is a
first-class composition mechanism (cel routing rules, rate-limit
layered keys, ai-token-limit multi-window) — the rule's
"each plugin at most once" premise was stale.
### Docs
- Split `docs/guide/middlewares.md` (1,782 lines) into per-category
pages under `docs/guide/middlewares/` (index, authentication,
authorization, traffic-control, observability, transformation,
caching, ai-gateway). Stacking is now documented as a first-class
composition mechanism, with worked examples for `cel`, `rate-limit`,
and `ai-token-limit`.
- Cross-links updated in `dispatchers.md`, `extensions.md`,
`SUMMARY.md`, `index.md`, `README.md`, `ROADMAP.md`.
### Testing
- 93 plugin unit tests across the 4 AI plugins.
- 3 integration tests (`crates/barbacane-test/tests/ai_gateway.rs`):
redaction end-to-end, CEL profile selection, token-limit partition
regression.
- 6 compilation smoke tests in `tests/fixtures/` — one per plugin plus
the combined `ai-gateway.yaml` composition.
- 1 negative ruleset fixture (`invalid-ai-regex.yaml`) asserting the
new regex validator flags four broken patterns.
…dvisories
Two pre-existing CI issues uncovered by this PR's runs, unrelated to the AI gateway work itself:
- **Clippy**: `collect_refs_recursive` in `barbacane-wasm/src/secrets.rs` used a nested `if` inside a `match` arm that a newer clippy version flags (`clippy::collapsible_match`). Collapsed into a guarded arm.
- **cargo-deny**: two new `rustls-webpki` advisories (RUSTSEC-2026-0098 and RUSTSEC-2026-0099, both about name-constraint validation) pinned in via `async-nats`. Added to the existing ignore list with the same rationale as the earlier RUSTSEC-2026-0049 entry: no upgrade path until `async-nats` bumps its `rustls-webpki` dependency.
klaude
previously approved these changes
Apr 20, 2026
This was referenced Apr 22, 2026
bbe64
approved these changes
Apr 29, 2026
Contributor
bbe64
left a comment
It's huge, and it looks good. Here is my approval.
ndreno
added a commit
to barbacane-dev/website
that referenced
this pull request
Apr 29, 2026
…ility
Two passages drifted from reality once ai-proxy + the AI governance middleware suite were verified:
1. The "MCP gateway vs AI gateway vs API gateway" section claimed the three are "not the same box". They are three categories, but a single well-architected gateway (dispatcher + middleware composition) can span all three. Reworded to clarify that the architecture choice is orthogonal to the category distinction.
2. The closing Barbacane mention positioned Barbacane only as an MCP gateway, which undersells actual capability. Updated to mention all three layers: API gateway, outbound AI gateway (ai-proxy), and MCP gateway, composed from the same primitives.
Claims verified against:
- docs.barbacane.dev/guide/dispatchers.html (ai-proxy)
- adr/0024-ai-gateway-plugin.md (positioning)
- barbacane-dev/barbacane#67 (AI governance middleware suite)
## Summary
Ships the four middlewares that extend `ai-proxy` into a full LLM gateway (ADR-0024), plus the host-side and lint plumbing discovered while validating the end-to-end composition.
- Plugins: `ai-prompt-guard`, `ai-token-limit`, `ai-cost-tracker`, `ai-response-guard`.
- Host fixes: dispatcher context now flows into `on_response`; stale framing headers are stripped so body-mutating response middlewares don't break HTTP framing.
- Docs: split `middlewares.md` into 8 per-category pages under `docs/guide/middlewares/`.
## What's new
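### Plugins
- `ai-prompt-guard`: per-profile message count / length limits, blocked-pattern regex, and managed system-template injection with `{var}` substitution.
- `ai-token-limit`: token-based sliding-window rate limiting; `client_ip` / `header:*` sources charge the same bucket on_request and on_response.
- `ai-cost-tracker`: per-request USD metric from a configurable price table, emitted as the `cost_dollars` Prometheus counter.
- `ai-response-guard`: per-profile PII regex redaction and blocked-pattern 502 rewrite.
### Host fixes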
- Context propagation (`crates/barbacane/src/main.rs`): `middleware_context` is now set on the dispatcher instance before dispatch, and `post_dispatch_context` is captured after. Fixes `cel → ai-proxy` (via `ai.target`) and `ai-proxy → ai-cost-tracker` / `ai-proxy → ai-token-limit` (via `ai.prompt_tokens` etc.); both were silently broken before.
- Stale framing headers (`build_response_from_plugin`): `content-length`, `transfer-encoding`, `connection`, and `keep-alive` from upstream are dropped so on_response middlewares that mutate the body (redaction!) don't cause `IncompleteMessage` errors.
### Lint (shift-left)
- `barbacane-validate-ai-regex`: new vacuum function that compiles regex patterns in `ai-prompt-guard` and `ai-response-guard` profiles at lint time.
- Removed `barbacane-no-duplicate-middlewares`: middleware stacking is first-class per ADR-0024 (cel routing rules, rate-limit layered keys, ai-token-limit multi-window). The rule's premise was stale.
## Docs reorg
Old: one 1,782-line `middlewares.md`. New: 8 per-category pages under `docs/guide/middlewares/` (index / authentication / authorization / traffic-control / observability / transformation / caching / ai-gateway). Stacking is documented as a first-class composition mechanism with worked examples for `cel`, `rate-limit`, and `ai-token-limit`. Cross-links updated in `dispatchers.md`, `extensions.md`, `SUMMARY.md`, `README.md`, `ROADMAP.md`.
## Test plan
- `cargo test` on each plugin (93 unit tests total): `ai-prompt-guard` 27, `ai-token-limit` 29, `ai-cost-tracker` 13, `ai-response-guard` 24.
- `cargo clippy --all-targets -- -D warnings` clean on each plugin.
- `cargo test -p barbacane-test --test ai_gateway` (3 integration tests): redaction end-to-end, CEL profile selection, token-limit partition regression.
- `cargo test -p barbacane-test --test compilation` (16 smoke tests, 6 new).
- `cargo test -p barbacane-test --test ai_proxy` (no regression from host changes).
- `cargo test --workspace --exclude barbacane-test` (no regression).
- `./docs/rulesets/tests/run-tests.sh`: 14/14 passing including the new `invalid-ai-regex` negative fixture.
- `cargo fmt --all` clean.
- `cargo deny check advisories`: pre-existing RUSTSEC-2026-0098/0099 on `rustls-webpki` via `async-nats`, unrelated to this PR.
## ADR
`ai-token-limit` and `ai-response-guard`.
## Breaking notes
Pre-1.0, so no back-compat shim, but worth calling out:
- `ai-token-limit` config shape changed mid-design (before this PR's branch) from `max_tokens_per_minute` / `max_tokens_per_hour` to `quota` + `window`, aligning with the `rate-limit` plugin. Multi-window setups stack two instances with distinct `policy_name`s.
- `docs/guide/middlewares.md` was removed; anchor-based deep links (e.g. `middlewares.html#ai-token-limit`) will 404. Canonical URLs are now under `docs/guide/middlewares/<category>.html`.
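To make the `ai-token-limit` config change concrete, here is a minimal sketch of how an old-style config could map onto stacked `quota` + `window` instances. The field names `quota`, `window`, `policy_name`, `max_tokens_per_minute`, and `max_tokens_per_hour` come from the notes above; the struct types, the `migrate` helper, the policy names, and expressing `window` in seconds are assumptions for illustration only.

```rust
/// New-style instance config. Field names follow the PR notes; the
/// seconds unit for `window` is an assumption.
#[derive(Debug, PartialEq)]
struct TokenLimitConfig {
    policy_name: String,
    quota: u64,
    window: u64,
}

/// Old-style config with fixed per-minute / per-hour fields.
struct LegacyConfig {
    max_tokens_per_minute: Option<u64>,
    max_tokens_per_hour: Option<u64>,
}

/// Produce one stacked instance per configured legacy limit
/// (hypothetical helper; the policy names are made up).
fn migrate(legacy: &LegacyConfig) -> Vec<TokenLimitConfig> {
    let mut stacked = Vec::new();
    if let Some(quota) = legacy.max_tokens_per_minute {
        stacked.push(TokenLimitConfig {
            policy_name: "per-minute".to_string(),
            quota,
            window: 60,
        });
    }
    if let Some(quota) = legacy.max_tokens_per_hour {
        stacked.push(TokenLimitConfig {
            policy_name: "per-hour".to_string(),
            quota,
            window: 3600,
        });
    }
    stacked
}

fn main() {
    let legacy = LegacyConfig {
        max_tokens_per_minute: Some(10_000),
        max_tokens_per_hour: Some(200_000),
    };
    // Two limits become two middleware instances with distinct policy names.
    for instance in migrate(&legacy) {
        println!("{:?}", instance);
    }
}
```

The distinct `policy_name` values matter because each instance scopes its persisted partition key by policy, so the two windows keep separate buckets.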