
feat: AI gateway middleware suite + middlewares docs reorg (ADR-0024)#67

Merged
ndreno merged 2 commits into main from feat/ai-gateway-middlewares
Apr 29, 2026
Conversation

@ndreno ndreno commented Apr 20, 2026

Summary

Ships the four middlewares that extend ai-proxy into a full LLM gateway (ADR-0024), plus the host-side and lint plumbing discovered while validating the end-to-end composition.

  • 4 new plugins (named-profile + CEL composition, fail-closed on misconfig): ai-prompt-guard, ai-token-limit, ai-cost-tracker, ai-response-guard.
  • Host fixes to make cross-plugin context work: dispatchers now receive the middleware chain's context and their post-dispatch writes flow into on_response; stale framing headers are stripped so body-mutating response middlewares don't break HTTP framing.
  • Lint: new vacuum function compiles regex patterns at lint time; removed the stale "no duplicate middlewares" rule — stacking is first-class.
  • Docs reorg: split the 1,782-line middlewares.md into 8 per-category pages under docs/guide/middlewares/.

What's new

Plugins

| Plugin | Role |
| --- | --- |
| ai-prompt-guard | Per-profile message count / length limits, blocked-pattern regex, managed system template with `{var}` substitution |
| ai-token-limit | Token-based sliding-window rate limiting; persists the resolved partition in context so `client_ip` / `header:*` sources charge the same bucket on_request and on_response |
| ai-cost-tracker | Per-request USD metric from a configurable price table; emits `cost_dollars` Prometheus counter |
| ai-response-guard | PII regex redaction + blocked-pattern 502 rewrite; fail-closed on misconfig or invalid regex |
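The sliding-window semantics behind ai-token-limit can be sketched as follows. This is an illustrative sketch only, not the plugin's real internals; the `TokenWindow` / `try_charge` names are hypothetical, and the actual plugin persists its state via the host context rather than an in-process deque.

```rust
use std::collections::VecDeque;

/// Hypothetical sliding-window token limiter: at most `quota` tokens
/// may be charged within any trailing window of `window_ms` milliseconds.
struct TokenWindow {
    quota: u64,
    window_ms: u64,
    events: VecDeque<(u64, u64)>, // (timestamp_ms, tokens charged)
}

impl TokenWindow {
    fn new(quota: u64, window_ms: u64) -> Self {
        Self { quota, window_ms, events: VecDeque::new() }
    }

    /// Charge `tokens` at time `now_ms`; returns false when the quota
    /// would be exceeded (the request should then be rejected).
    fn try_charge(&mut self, now_ms: u64, tokens: u64) -> bool {
        // Evict charges that have slid out of the trailing window.
        while let Some(&(ts, _)) = self.events.front() {
            if now_ms.saturating_sub(ts) >= self.window_ms {
                self.events.pop_front();
            } else {
                break;
            }
        }
        let used: u64 = self.events.iter().map(|&(_, t)| t).sum();
        if used + tokens > self.quota {
            return false;
        }
        self.events.push_back((now_ms, tokens));
        true
    }
}

fn main() {
    let mut w = TokenWindow::new(100, 60_000); // 100 tokens per minute
    assert!(w.try_charge(0, 80));
    assert!(!w.try_charge(1_000, 30)); // 80 + 30 would exceed the quota
    assert!(w.try_charge(61_000, 30)); // the first charge slid out
    println!("ok");
}
```

Charging the same bucket on_request and on_response is why the resolved partition must be persisted in context: both phases need to resolve to one window instance.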

Host fixes

  • Dispatcher context plumbing (crates/barbacane/src/main.rs): middleware_context is now set on the dispatcher instance before dispatch, and post_dispatch_context is captured after. Fixes cel → ai-proxy (via ai.target) and ai-proxy → ai-cost-tracker / ai-proxy → ai-token-limit (via ai.prompt_tokens etc.) — both were silently broken before.
  • Stale framing-header strip (build_response_from_plugin): content-length, transfer-encoding, connection, keep-alive from upstream are dropped so on_response middlewares that mutate the body (redaction!) don't cause IncompleteMessage errors.
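The framing-header strip amounts to filtering a fixed set of header names before the client response is built. A minimal sketch, assuming headers as plain string pairs (the real `build_response_from_plugin` operates on the host's response types):

```rust
/// Headers that describe the *upstream* body framing or connection; the
/// host recomputes framing after on_response middlewares mutate the body.
const STALE_FRAMING: [&str; 4] =
    ["content-length", "transfer-encoding", "connection", "keep-alive"];

fn strip_stale_framing(headers: Vec<(String, String)>) -> Vec<(String, String)> {
    headers
        .into_iter()
        .filter(|(name, _)| !STALE_FRAMING.contains(&name.to_ascii_lowercase().as_str()))
        .collect()
}

fn main() {
    let headers = vec![
        ("Content-Length".to_string(), "42".to_string()),
        ("Content-Type".to_string(), "application/json".to_string()),
    ];
    let kept = strip_stale_framing(headers);
    assert_eq!(kept, vec![("Content-Type".to_string(), "application/json".to_string())]);
    println!("ok");
}
```

Keeping a stale `content-length` after redaction shrinks the body is exactly what produces the `IncompleteMessage` errors mentioned above.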

Lint (shift-left)

  • barbacane-validate-ai-regex: new vacuum function that compiles regex patterns in ai-prompt-guard and ai-response-guard profiles at lint time.
  • Removed barbacane-no-duplicate-middlewares: middleware stacking is first-class per ADR-0024 (cel routing rules, rate-limit layered keys, ai-token-limit multi-window). The rule's premise was stale.

Docs reorg

Old: one 1,782-line middlewares.md. New: 8 per-category pages under docs/guide/middlewares/ (index / authentication / authorization / traffic-control / observability / transformation / caching / ai-gateway). Stacking is documented as a first-class composition mechanism with worked examples for cel, rate-limit, and ai-token-limit. Cross-links updated in dispatchers.md, extensions.md, SUMMARY.md, README.md, ROADMAP.md.

Test plan

  • cargo test on each plugin (93 unit tests total): ai-prompt-guard 27, ai-token-limit 29, ai-cost-tracker 13, ai-response-guard 24.
  • cargo clippy --all-targets -- -D warnings clean on each plugin.
  • cargo test -p barbacane-test --test ai_gateway (3 integration tests): redaction end-to-end, CEL profile selection, token-limit partition regression.
  • cargo test -p barbacane-test --test compilation (16 smoke tests, 6 new).
  • cargo test -p barbacane-test --test ai_proxy (no regression from host changes).
  • cargo test --workspace --exclude barbacane-test (no regression).
  • ./docs/rulesets/tests/run-tests.sh — 14/14 passing including the new invalid-ai-regex negative fixture.
  • cargo fmt --all clean.
  • cargo deny check advisories — pre-existing RUSTSEC-2026-0098/0099 on rustls-webpki via async-nats, unrelated to this PR.

ADR

Breaking notes

Pre-1.0, so no back-compat shim, but worth calling out:

  • ai-token-limit config shape changed mid-design (before this PR's branch) from max_tokens_per_minute / max_tokens_per_hour to quota + window, aligning with the rate-limit plugin. Multi-window setups stack two instances with distinct policy_names.
  • docs/guide/middlewares.md was removed; anchor-based deep links (e.g. middlewares.html#ai-token-limit) will 404. Canonical URLs are now under docs/guide/middlewares/<category>.html.
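The `quota` + `window` shape and the stacked multi-window pattern can be illustrated with a hypothetical config struct (field names follow the text above; the struct itself is not the plugin's real config type):

```rust
/// Hypothetical shape of the new ai-token-limit config: one
/// (quota, window) pair per instance, matching rate-limit semantics.
/// The old shape bundled max_tokens_per_minute / max_tokens_per_hour
/// into a single instance.
#[derive(Debug)]
struct AiTokenLimitConfig {
    policy_name: String, // must be distinct per stacked instance
    quota: u64,          // tokens allowed per window
    window_secs: u64,    // window length in seconds
}

fn main() {
    // Multi-window setup: two stacked instances with distinct policy_names.
    let per_minute = AiTokenLimitConfig {
        policy_name: "tl-minute".into(),
        quota: 10_000,
        window_secs: 60,
    };
    let per_hour = AiTokenLimitConfig {
        policy_name: "tl-hour".into(),
        quota: 200_000,
        window_secs: 3_600,
    };
    assert_ne!(per_minute.policy_name, per_hour.policy_name);
    println!("ok");
}
```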

Completes the ADR-0024 AI gateway by shipping the four middlewares that
extend `ai-proxy` into a full LLM gateway, plus the host-side and lint
plumbing discovered while validating the end-to-end composition.

### Plugins (named-profile + CEL composition, fail-closed on misconfig)

- **ai-prompt-guard**: per-profile message count / length limits,
  blocked-pattern regex, and managed system-template injection with
  `{var}` substitution.
- **ai-token-limit**: token-based sliding-window rate limiting.
  Persists the resolved partition key in context (scoped by
  `policy_name`) so `client_ip` / `header:*` sources charge the same
  bucket on_request and on_response. `quota` + `window` match the
  `rate-limit` plugin's semantics.
- **ai-cost-tracker**: per-request USD metric from a configurable price
  table; emits `cost_dollars` Prometheus counter labelled by provider
  and model.
- **ai-response-guard**: per-profile PII regex redaction +
  blocked-pattern 502 rewrite. Invalid regexes and missing `default_profile`
  return 500 — silently disabled PII rules are the kind of bug
  operators only catch from an incident.
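The ai-cost-tracker computation reduces to a price-table lookup keyed by provider and model. A minimal sketch under assumed per-1K-token pricing units; the real config keys and Prometheus plumbing are not shown in this PR:

```rust
use std::collections::HashMap;

/// Hypothetical price table: (provider, model) ->
/// (USD per 1K prompt tokens, USD per 1K completion tokens).
struct PriceTable {
    prices: HashMap<(String, String), (f64, f64)>,
}

impl PriceTable {
    /// Returns None for unknown (provider, model) pairs so the caller
    /// can decide how to treat unpriced traffic.
    fn cost_dollars(&self, provider: &str, model: &str, prompt: u64, completion: u64) -> Option<f64> {
        let &(p_in, p_out) = self.prices.get(&(provider.to_string(), model.to_string()))?;
        Some(prompt as f64 / 1000.0 * p_in + completion as f64 / 1000.0 * p_out)
    }
}

fn main() {
    let mut prices = HashMap::new();
    prices.insert(("openai".to_string(), "gpt-4o".to_string()), (0.005, 0.015));
    let table = PriceTable { prices };
    let cost = table.cost_dollars("openai", "gpt-4o", 2000, 1000).unwrap();
    assert!((cost - 0.025).abs() < 1e-9); // 2 * 0.005 + 1 * 0.015
    println!("{cost}");
}
```

The resulting value is what would feed the `cost_dollars` counter, labelled by provider and model.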

### Host fixes (uncovered by the new integration test)

- Dispatcher instances now receive the middleware chain's accumulated
  context and their post-dispatch context flows into on_response. This
  makes `cel → ai-proxy` (via `ai.target`) and `ai-proxy → cost-tracker`
  / `ai-proxy → token-limit` (via `ai.prompt_tokens` etc.) actually
  work — previously each plugin instance had its own isolated context.
- Stale `content-length` / `transfer-encoding` / `connection` /
  `keep-alive` headers from upstream responses are stripped before the
  client response, so on_response middlewares that mutate the body
  (redaction!) don't cause `IncompleteMessage` errors.
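The context-plumbing fix can be modelled as a single mutable map threaded through the chain. This is a toy model, not the host's real API; the function names are illustrative stand-ins for the plugins involved:

```rust
use std::collections::HashMap;

type Context = HashMap<String, String>;

// cel writes the routing decision on_request.
fn on_request_cel(ctx: &mut Context) {
    ctx.insert("ai.target".into(), "openai/gpt-4o".into());
}

// The dispatcher now sees the chain's accumulated context (previously it
// got a fresh, isolated one), and its post-dispatch writes survive.
fn dispatch_ai_proxy(ctx: &mut Context) {
    assert!(ctx.contains_key("ai.target"));
    ctx.insert("ai.prompt_tokens".into(), "2000".into());
}

// on_response middlewares read the dispatcher's post-dispatch writes.
fn on_response_cost_tracker(ctx: &Context) -> Option<u64> {
    ctx.get("ai.prompt_tokens")?.parse().ok()
}

fn main() {
    let mut ctx = Context::new();
    on_request_cel(&mut ctx);
    dispatch_ai_proxy(&mut ctx);
    assert_eq!(on_response_cost_tracker(&ctx), Some(2000));
    println!("ok");
}
```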

### Lint (shift-left)

- `barbacane-validate-ai-regex`: new vacuum function that compiles
  regex patterns in `ai-prompt-guard` and `ai-response-guard` profiles
  at lint time.
- Removed `barbacane-no-duplicate-middlewares`: middleware stacking is a
  first-class composition mechanism (cel routing rules, rate-limit
  layered keys, ai-token-limit multi-window) — the rule's
  "each plugin at most once" premise was stale.

### Docs

- Split `docs/guide/middlewares.md` (1,782 lines) into per-category
  pages under `docs/guide/middlewares/` (index, authentication,
  authorization, traffic-control, observability, transformation,
  caching, ai-gateway). Stacking is now documented as a first-class
  composition mechanism, with worked examples for `cel`, `rate-limit`,
  and `ai-token-limit`.
- Cross-links updated in `dispatchers.md`, `extensions.md`,
  `SUMMARY.md`, `index.md`, `README.md`, `ROADMAP.md`.

### Testing

- 93 plugin unit tests across the 4 AI plugins.
- 3 integration tests (`crates/barbacane-test/tests/ai_gateway.rs`):
  redaction end-to-end, CEL profile selection, token-limit partition
  regression.
- 6 compilation smoke tests in `tests/fixtures/` — one per plugin plus
  the combined `ai-gateway.yaml` composition.
- 1 negative ruleset fixture (`invalid-ai-regex.yaml`) asserting the
  new regex validator flags four broken patterns.
@ndreno ndreno self-assigned this Apr 20, 2026
@ndreno ndreno added the documentation, enhancement, architecture, and middleware labels Apr 20, 2026
…dvisories

Two pre-existing CI issues uncovered by this PR's runs, unrelated to the
AI gateway work itself:

- **Clippy**: `collect_refs_recursive` in `barbacane-wasm/src/secrets.rs`
  used a nested `if` inside a `match` arm that a newer clippy version
  flags (`clippy::collapsible_match`). Collapsed into a guarded arm.
- **cargo-deny**: two new `rustls-webpki` advisories (RUSTSEC-2026-0098
  and RUSTSEC-2026-0099, both about name-constraint validation) pinned
  in via `async-nats`. Added to the existing ignore list — same
  rationale as the earlier RUSTSEC-2026-0049 entry: no upgrade path
  until `async-nats` bumps its `rustls-webpki` dependency.
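The clippy fix follows the usual `collapsible_match` rewrite: a nested `if let` inside a match arm becomes a single guarded arm. An illustrative reproduction (the real code in `barbacane-wasm/src/secrets.rs` differs):

```rust
enum Node {
    Ref(String),
    Leaf(i32),
}

// Before (flagged by newer clippy as collapsible_match):
//   match node {
//       Node::Ref(name) => {
//           if name.starts_with("secret:") { return Some(name.as_str()); }
//           None
//       }
//       _ => None,
//   }
// After: the inner `if` collapsed into a match guard.
fn ref_name(node: &Node) -> Option<&str> {
    match node {
        Node::Ref(name) if name.starts_with("secret:") => Some(name.as_str()),
        _ => None,
    }
}

fn main() {
    assert_eq!(ref_name(&Node::Ref("secret:db".into())), Some("secret:db"));
    assert_eq!(ref_name(&Node::Ref("plain".into())), None);
    assert_eq!(ref_name(&Node::Leaf(1)), None);
    println!("ok");
}
```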
klaude previously approved these changes Apr 20, 2026

bbe64 left a comment

It's huge, and it looks good. Here is my approval.

@ndreno ndreno merged commit a4b02e4 into main Apr 29, 2026
12 checks passed
ndreno added a commit to barbacane-dev/website that referenced this pull request Apr 29, 2026
…ility

Two passages drifted from reality once ai-proxy + the AI governance
middleware suite were verified:

1. The "MCP gateway vs AI gateway vs API gateway" section claimed the
   three are "not the same box". They are three categories, but a single
   well-architected gateway (dispatcher + middleware composition) can
   span all three. Reworded to clarify that the architecture choice is
   orthogonal to the category distinction.

2. The closing Barbacane mention positioned Barbacane only as an MCP
   gateway, which undersells actual capability. Updated to mention all
   three layers: API gateway, outbound AI gateway (ai-proxy), and MCP
   gateway, composed from the same primitives.

Claims verified against:
- docs.barbacane.dev/guide/dispatchers.html (ai-proxy)
- adr/0024-ai-gateway-plugin.md (positioning)
- barbacane-dev/barbacane#67 (AI governance middleware suite)