docs(adr): propose ADR-0030 — AI gateway Responses API support#69
Extends ADR-0024 with OpenAI Responses API support, dynamic model routing, and a spec-driven `/v1/models` endpoint. Establishes the caller-owned model principle and retires the `model` field from target configs.

Key decisions:

- Protocol-aware `ai-proxy` dispatcher (Chat Completions + Responses)
- Stateless Responses translation; pragmatic `store` UX (accept + warn header, hard-reject on `previous_response_id`)
- `routes` table with glob-pattern matching and `allow`/`deny` gating
- `/v1/models` served by the `ai-proxy` dispatcher via an importable OpenAPI fragment — no native data plane carve-out

Breaking change for existing ADR-0024 deployments: delete the `model` field from each target. Justified by pre-1.0 status.
Resolves the seven review points + a few stylistic fixes:
1. Fix the consumer-policy CEL example: today the `cel` plugin exposes
   `request.body` as a raw string, so `request.body.model` wouldn't evaluate.
   Commit to a small cel-plugin extension that binds parsed JSON under
   `request.body_json` as an explicit prerequisite for the AI example.
2. Rewrite §4's compiler-prerequisite block: multi-file specs already
   work (`manifest.rs:268-322` + `artifact.rs:417-426`), and `env://` secrets
   already work. The shipped ai-gateway fragment is a regular spec the
   compiler already knows how to consume. The only remaining friction
   is per-operation dispatch-config duplication, with two concrete
   v1 paths laid out (`env://`-baked fragment now; root-level
   `x-barbacane-dispatch-defaults` as a follow-up).
3. `/v1/models` caching: spell out the cache via the existing `host_cache_*`
   capability, its scope (per-instance), thundering-herd mitigation
   (single-flight), and the partial-failure response shape (200 + partial
   flag + warnings array).
4. `routes` + `allow`/`deny` + `ai.target`: catalog policy is attached to the
   target, not the resolution path — it applies on every path, including
   `ai.target`-driven dispatch. This prevents a CEL misconfig from leaking a
   denied model.
5. Pin glob syntax via a regex pattern on the plugin JSON schema so
invalid syntax fails at lint time, not at runtime.
6. Add an escape-hatch example for the no-fallthrough allow/deny rule.
7. Rename metrics to the `barbacane_plugin_<plugin>_<metric>` convention:
   `barbacane_plugin_ai_proxy_responses_store_downgrades_total`,
   `barbacane_plugin_ai_proxy_responses_reasoning_dropped_total`,
   `barbacane_plugin_ai_proxy_models_provider_failures_total{provider}`.
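For point 1, a sketch of what the consumer-policy example could look like once `request.body_json` lands. The middleware attachment shape, the `consumer.tier` binding, and the config keys are illustrative assumptions, not the shipped schema:

```yaml
# Hypothetical cel middleware config: deny expensive models to the
# "free" consumer tier. Requires the proposed request.body_json binding;
# request.body alone is a raw string and cannot be dotted into.
middleware:
  - plugin: cel
    config:
      expression: >
        !(consumer.tier == "free" &&
          request.body_json.model.startsWith("gpt-4"))
      on_deny:
        status: 403
        message: "model not permitted for this consumer tier"
```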
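For point 2, an illustrative shape for the shipped fragment: `env://` secrets work today, and the root-level `x-barbacane-dispatch-defaults` block shows the proposed follow-up that removes per-operation duplication. Key names here are sketches, not final:

```yaml
# Illustrative ai-gateway fragment. Without the root-level defaults
# block, the same dispatch config would be repeated on each of the
# three operations below.
openapi: 3.1.0
info: { title: ai-gateway fragment, version: "0.1.0" }
x-barbacane-dispatch-defaults:
  plugin: ai-proxy
  config:
    targets:
      openai:
        base_url: https://api.openai.com
        api_key: env://OPENAI_API_KEY
paths:
  /v1/chat/completions:
    post: { operationId: chatCompletions }
  /v1/responses:
    post: { operationId: responses }
  /v1/models:
    get: { operationId: listModels }
```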
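For point 3, the single-flight mitigation sketched in plain Rust. This is a stand-in for logic that would really sit behind the `host_cache_*` capability, not the plugin's actual API: concurrent `/v1/models` callers coalesce onto one upstream catalog fetch, and everyone gets the published result.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Cache state: empty, one fetch in flight, or filled.
enum State<T> {
    Idle,
    InFlight,
    Done(T),
}

/// Minimal single-flight cell: concurrent callers coalesce onto one fetch.
struct SingleFlight<T> {
    state: Mutex<State<T>>,
    cv: Condvar,
}

impl<T: Clone> SingleFlight<T> {
    fn new() -> Self {
        Self { state: Mutex::new(State::Idle), cv: Condvar::new() }
    }

    fn get(&self, fetch: impl FnOnce() -> T) -> T {
        let mut guard = self.state.lock().unwrap();
        loop {
            match &*guard {
                State::Done(v) => return v.clone(),
                // Someone else is fetching: block until they publish.
                State::InFlight => guard = self.cv.wait(guard).unwrap(),
                State::Idle => {
                    *guard = State::InFlight;
                    drop(guard); // run the fetch without holding the lock
                    let v = fetch();
                    *self.state.lock().unwrap() = State::Done(v.clone());
                    self.cv.notify_all();
                    return v;
                }
            }
        }
    }
}

/// Eight concurrent "requests"; returns (upstream fetches, responses served).
fn demo() -> (usize, usize) {
    let cell = Arc::new(SingleFlight::new());
    let fetches = Arc::new(AtomicUsize::new(0));
    let handles: Vec<_> = (0..8)
        .map(|_| {
            let (cell, fetches) = (cell.clone(), fetches.clone());
            thread::spawn(move || {
                cell.get(|| {
                    // Simulated slow upstream catalog fetch.
                    fetches.fetch_add(1, Ordering::SeqCst);
                    thread::sleep(Duration::from_millis(50));
                    vec!["gpt-4o".to_string(), "claude-3-5-sonnet".to_string()]
                })
            })
        })
        .collect();
    let served = handles
        .into_iter()
        .map(|h| h.join().unwrap())
        .filter(|models| models.len() == 2)
        .count();
    (fetches.load(Ordering::SeqCst), served)
}

fn main() {
    let (fetches, served) = demo();
    println!("upstream fetches: {fetches}, responses served: {served}");
    assert_eq!(fetches, 1);
}
```

The state machine never returns to `Idle`, so exactly one caller ever executes the fetch regardless of timing; a production version would add TTL expiry and the partial-failure flag described above.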
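Points 4 and 6 can be illustrated together in a self-contained Rust sketch: glob resolution, target-attached `allow`/`deny` gating, the no-fallthrough rule, and the catch-all `*` escape hatch. Data shapes are illustrative, not the plugin's real config structs:

```rust
/// Glob of the style the routes table proposes: a literal model id, or a
/// literal prefix with one trailing `*`. (The ADR pins the exact grammar
/// via a regex on the plugin JSON schema; this is the matching side.)
fn glob_match(pattern: &str, model: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => model.starts_with(prefix),
        None => pattern == model,
    }
}

struct Target<'a> {
    name: &'a str,
    allow: &'a [&'a str], // empty = allow everything not denied
    deny: &'a [&'a str],
}

/// Resolve a model to a target, then apply that target's catalog policy.
/// The gate lives on the target, so it fires on every resolution path
/// (routes table or ai.target alike): a CEL misconfig cannot leak a
/// denied model.
fn resolve<'a>(
    routes: &[(&str, &'a Target<'a>)],
    model: &str,
) -> Result<&'a str, &'static str> {
    let (_, target) = routes
        .iter()
        .find(|(pat, _)| glob_match(pat, model))
        .ok_or("no route for model")?; // no fallthrough: unmatched = reject
    if target.deny.iter().any(|p| glob_match(p, model)) {
        return Err("model denied by target policy");
    }
    if !target.allow.is_empty() && !target.allow.iter().any(|p| glob_match(p, model)) {
        return Err("model not in target allow list");
    }
    Ok(target.name)
}

fn main() {
    let anthropic = Target { name: "anthropic", allow: &["claude-*"], deny: &["claude-2*"] };
    let openai = Target { name: "openai", allow: &[], deny: &[] };

    // No catch-all route: anything unmatched is rejected outright.
    let strict: Vec<(&str, &Target)> = vec![("claude-*", &anthropic)];
    // Escape hatch: an explicit "*" route restores a permissive default.
    let open: Vec<(&str, &Target)> =
        vec![("claude-*", &anthropic), ("*", &openai)];

    assert_eq!(resolve(&strict, "claude-3-5-sonnet"), Ok("anthropic"));
    assert_eq!(resolve(&strict, "claude-2.1"), Err("model denied by target policy"));
    assert_eq!(resolve(&strict, "gpt-4o"), Err("no route for model"));
    assert_eq!(resolve(&open, "gpt-4o"), Ok("openai"));
    println!("routing policy checks passed");
}
```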
Stylistic:
- Drop a duplicate bullet in §0.
- Document the synthetic Responses id as a random uuid-v4, matching
  upstream OpenAI semantics.
- Document the silent reasoning-item drop with a `Warning: 299` header and a
  metric, since silently dropping reasoning can degrade multi-turn
  agent quality in ways the client cannot detect.
- Spell out the migration UX: `additionalProperties: false` on
  ai-proxy's schema means a leftover `model:` field is rejected by
  `vacuum:barbacane` at lint time with a self-explanatory error.
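A sketch of the schema mechanics behind that migration UX, with illustrative property names for the target object (the real plugin schema may differ):

```yaml
# With additionalProperties: false, a leftover `model` key in a target
# fails JSON-schema validation at lint time instead of being silently
# ignored at runtime.
target:
  type: object
  additionalProperties: false
  properties:
    base_url: { type: string }
    api_key:  { type: string }
    allow:    { type: array, items: { type: string } }
    deny:     { type: array, items: { type: string } }
  # `model` is intentionally absent: `model: gpt-4o` in a target now
  # surfaces as an "unknown property" lint error from vacuum:barbacane.
```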
uuid-v7 ids sort chronologically in log greps (no separate sort by `created_at` needed), and the embedded timestamp leaks no information the response wasn't already carrying. The workspace already enables both the v4 and v7 features on the `uuid` crate, so no dependency change.
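The sortability claim follows from the RFC 9562 v7 layout: the first 48 bits are a big-endian millisecond timestamp, so hex-string order tracks creation order. A std-only sketch below; real code would just call `Uuid::now_v7()` from the `uuid` crate:

```rust
/// Assemble a UUIDv7 from a unix-millisecond timestamp and 10 random
/// bytes, per the RFC 9562 field layout. (Sketch only, to show why the
/// ids sort; production code uses the uuid crate's Uuid::now_v7().)
fn uuid_v7(unix_millis: u64, rand: [u8; 10]) -> String {
    let mut b = [0u8; 16];
    // First 48 bits: big-endian millisecond timestamp, so lexicographic
    // order of the hex string follows creation order.
    b[..6].copy_from_slice(&unix_millis.to_be_bytes()[2..]);
    b[6] = 0x70 | (rand[0] & 0x0F); // version nibble = 7
    b[7] = rand[1];
    b[8] = 0x80 | (rand[2] & 0x3F); // RFC variant bits = 10
    b[9..].copy_from_slice(&rand[3..]);

    let hex = |s: &[u8]| s.iter().map(|x| format!("{x:02x}")).collect::<String>();
    format!(
        "{}-{}-{}-{}-{}",
        hex(&b[..4]), hex(&b[4..6]), hex(&b[6..8]), hex(&b[8..10]), hex(&b[10..])
    )
}

fn main() {
    let rand = [0xAB; 10];
    let earlier = uuid_v7(1_700_000_000_000, rand);
    let later = uuid_v7(1_700_000_000_001, rand);
    // Chronological order == lexicographic order: a plain `sort` in a
    // log grep already yields creation order, no created_at join needed.
    assert!(earlier < later);
    println!("{earlier}\n{later}");
}
```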
## Summary

Proposes ADR-0030, extending the AI gateway (ADR-0024) with:

- The `ai-proxy` dispatcher becomes protocol-aware and serves both surfaces against the same target pool. Motivated by clients like `codex-cli` that use the Responses API, not Chat Completions.
- The `model` identifier is part of the client's contract, not the gateway's. The gateway routes, validates, and gates; it does not decide the model.
- A `routes` table with glob patterns (`claude-*` → anthropic, `gpt-*` → openai), optional `allow`/`deny` lists for static catalog policy, and the existing `cel` middleware for dynamic consumer policy.
- `/v1/models` served by the `ai-proxy` dispatcher itself when bound to the route — no native data plane carve-out; it stays inside Barbacane's spec-driven routing model. Shipped via an importable OpenAPI fragment so operators don't duplicate config across three operations.
- Pragmatic `store` UX — accept `store: true` (most clients send it as an unexamined default), emit a `Warning: 299` header + a `store_downgrade_total` metric, and reject only the genuinely stateful features (`previous_response_id`).

## Breaking change

The `model` field is removed from `targets.<name>` and from the flat top-level config. Migration for existing ADR-0024 deployments: delete the field from each target. Justified by the pre-1.0 status and the codebase convention of avoiding backward-compat shims at this stage.

## Out of scope (deferred)

Stateful Responses features (`previous_response_id`, `GET /v1/responses/{id}`, cancel) — requires a session-scoped storage capability in the WASM runtime; separate ADR.

## Context

This ADR originated from a conversation with @marmeladema, who built a stateful OpenAI Responses ↔ Anthropic translation proxy for running `codex-cli` against Claude — a harder problem than what Barbacane's `ai-proxy` does today, precisely because it preserves conversation state across turns (`previous_response_id`, tool-use chains, etc.). That discussion surfaced two gaps on Barbacane's side: no Responses API surface, and the fact that hard-coding `model` identifiers in gateway config is operational friction and conflates routing intent with catalog policy.

This ADR addresses the stateless slice of what @marmeladema's proxy does — which covers the 80% use case (clients that send `store: true` as an unexamined default but never actually chain responses) without requiring a new session-scoped storage primitive in the WASM runtime. The fully stateful slice (`previous_response_id` continuation, response retrieval/cancel) is deferred to a phase-2 ADR that will introduce that primitive; @marmeladema's existing work is likely directly reusable there.

## Test plan

Review the `store` UX section for alignment with @marmeladema's implementation, especially the stateless boundary.