
docs(adr): propose ADR-0030 — AI gateway Responses API support #69

Open
ndreno wants to merge 3 commits into main from docs/adr-0030-responses-api

Conversation

@ndreno (Contributor) commented Apr 22, 2026

Summary

Proposes ADR-0030, extending the AI gateway (ADR-0024) with:

  • OpenAI Responses API support alongside Chat Completions. The ai-proxy dispatcher becomes protocol-aware and serves both surfaces against the same target pool. Motivated by clients like codex-cli that use the Responses API, not Chat Completions.
  • Caller-owned model principle — the model identifier is part of the client's contract, not the gateway's. The gateway routes, validates, and gates; it does not decide the model.
  • Dynamic model routing via a new routes table with glob patterns (claude-* → anthropic, gpt-* → openai), optional allow / deny lists for static catalog policy, and the existing cel middleware for dynamic consumer policy.
  • /v1/models served by the ai-proxy dispatcher itself when bound to the route — no native data plane carve-out, stays inside Barbacane's spec-driven routing model. Shipped via an importable OpenAPI fragment so operators don't duplicate config across three operations.
  • Pragmatic store UX — accept store: true (most clients send it as an unexamined default), emit Warning: 299 header + store_downgrade_total metric, reject only the genuinely stateful features (previous_response_id).
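Concretely, the routing and gating pieces above might combine into a dispatcher config along these lines (a sketch only; the field names and nesting are assumptions, not the ADR's final schema):

```yaml
# Hypothetical ai-proxy dispatcher config (illustrative shape only):
ai-proxy:
  routes:
    - pattern: "claude-*"   # glob over the caller-supplied model id
      target: anthropic
    - pattern: "gpt-*"
      target: openai
  targets:
    anthropic:
      allow: ["claude-sonnet-*", "claude-opus-*"]  # static catalog policy
    openai:
      deny: ["gpt-3.5-*"]   # anything not denied passes
```

First matching route wins; the caller still owns the model string end to end — the gateway only resolves it to a target and applies catalog policy.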

Breaking change

The model field is removed from targets.<name> and from the flat top-level config. Migration for existing ADR-0024 deployments: delete the field from each target. Justified by the pre-1.0 status and the codebase convention of avoiding backward-compat shims at this stage.
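For an existing deployment, the migration amounts to a one-line deletion per target (config shape illustrative, following the ADR-0024 layout):

```yaml
# Before (ADR-0024):
targets:
  openai:
    url: https://api.openai.com/v1
    model: gpt-4o        # ← delete this line

# After (ADR-0030) — the caller's request body carries the model id:
targets:
  openai:
    url: https://api.openai.com/v1
```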

Out of scope (deferred)

  • Stateful Responses API (previous_response_id, GET /v1/responses/{id}, cancel) — requires a session-scoped storage capability in the WASM runtime; separate ADR.
  • True SSE translation against Anthropic — inherits the deferral from ADR-0024.
  • Ollama Responses API support — Ollama's OpenAI-compat surface is Chat Completions only as of 2026-04.

Context

This ADR originated from a conversation with @marmeladema, who built a stateful OpenAI Responses ↔ Anthropic translation proxy for running codex-cli against Claude — a harder problem than what Barbacane's ai-proxy does today, precisely because it preserves conversation state across turns (previous_response_id, tool-use chains, etc.). That discussion surfaced two gaps on Barbacane's side:

  1. "Clients send OpenAI Chat Completions" is a narrower assumption than "OpenAI-compatible" — the Responses API is a distinct protocol with a cleaner mapping to Anthropic Messages (item-based vs. message-based).
  2. Pinning model identifiers in gateway config is operational friction and conflates routing intent with catalog policy.

This ADR addresses the stateless slice of what @marmeladema's proxy does — which covers the 80% use case (clients that send store: true as an unexamined default but never actually chain responses) without requiring a new session-scoped storage primitive in the WASM runtime. The fully stateful slice (previous_response_id continuation, response retrieval/cancel) is deferred to a phase 2 ADR that will introduce that primitive; @marmeladema's existing work is likely directly reusable there.
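The stateless boundary described above reduces to a small three-way decision: hard-reject previous_response_id, downgrade store: true with a warning, accept everything else. A minimal std-only sketch (the types and names are hypothetical, not the plugin's actual API):

```rust
/// Outcome of inspecting the stateful fields of a Responses API request.
/// Hypothetical types — a sketch of the policy, not the plugin's code.
#[derive(Debug, PartialEq)]
enum StoreDecision {
    Accept,
    /// `store: true` downgraded: serve the request, attach `Warning: 299`,
    /// bump the store-downgrade counter.
    AcceptWithWarning(&'static str),
    /// Genuinely stateful feature: cannot be honored without
    /// session-scoped storage, so hard-reject.
    Reject(&'static str),
}

fn decide(store: bool, previous_response_id: Option<&str>) -> StoreDecision {
    if previous_response_id.is_some() {
        return StoreDecision::Reject("previous_response_id requires stateful mode");
    }
    if store {
        return StoreDecision::AcceptWithWarning(
            "299 - \"store is not persisted; responses are ephemeral\"",
        );
    }
    StoreDecision::Accept
}

fn main() {
    // codex-cli-style default: store:true, no chaining → accepted + warned.
    assert!(matches!(decide(true, None), StoreDecision::AcceptWithWarning(_)));
    // Actual response chaining → rejected.
    assert!(matches!(decide(false, Some("resp_abc")), StoreDecision::Reject(_)));
    assert_eq!(decide(false, None), StoreDecision::Accept);
    println!("ok");
}
```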

Test plan

  • Review the decisions in section 0 (principle) and section 2 (Responses translation, store UX) for alignment with @marmeladema's implementation, especially the stateless boundary
  • Confirm the spec fragment / import mechanism proposed in section 4 is compatible with the compiler's current capabilities (or scope the addition)
  • Once accepted: implementation PRs per section (1 → 2 → 3 → 4), contract tests added per provider × protocol matrix

Extends ADR-0024 with OpenAI Responses API support, dynamic model
routing, and a spec-driven /v1/models endpoint. Establishes the
caller-owned model principle and retires the `model` field from
target configs.

Key decisions:
- Protocol-aware ai-proxy dispatcher (Chat Completions + Responses)
- Stateless Responses translation; pragmatic `store` UX (accept +
  warn header, hard-reject on `previous_response_id`)
- `routes` table with glob-pattern matching and `allow`/`deny` gating
- `/v1/models` served by the ai-proxy dispatcher via an importable
  OpenAPI fragment — no native data plane carve-out
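First-match glob resolution with target-attached deny gating (per the key decisions above) can be sketched in a few lines of std-only Rust. Names are illustrative, and only prefix globs are handled here — the ADR pins the full glob syntax at the schema level:

```rust
/// Minimal matcher for the glob shapes the examples use:
/// an exact id, or a `prefix*` pattern. Illustrative only.
fn glob_match(pattern: &str, model: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => model.starts_with(prefix),
        None => pattern == model,
    }
}

/// Resolve the caller-supplied model to a target, then apply the
/// target-attached deny list (catalog policy applies on every path).
fn resolve<'a>(
    routes: &[(&str, &'a str)],
    deny: &[(&str, &str)],
    model: &str,
) -> Option<&'a str> {
    let target = routes
        .iter()
        .find(|(pat, _)| glob_match(pat, model))
        .map(|(_, t)| *t)?;
    // No fallthrough: a denied model is rejected, not rerouted.
    if deny.iter().any(|(t, pat)| *t == target && glob_match(pat, model)) {
        return None;
    }
    Some(target)
}

fn main() {
    let routes = [("claude-*", "anthropic"), ("gpt-*", "openai")];
    let deny = [("openai", "gpt-3.5-*")]; // hypothetical catalog policy
    assert_eq!(resolve(&routes, &deny, "claude-sonnet-4"), Some("anthropic"));
    assert_eq!(resolve(&routes, &deny, "gpt-4o"), Some("openai"));
    assert_eq!(resolve(&routes, &deny, "gpt-3.5-turbo"), None); // denied, no fallthrough
    assert_eq!(resolve(&routes, &deny, "mistral-7b"), None);    // no route
    println!("ok");
}
```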

Breaking change for existing ADR-0024 deployments: delete the `model`
field from each target. Justified by pre-1.0 status.
@ndreno ndreno requested a review from marmeladema April 22, 2026 22:27
@ndreno ndreno self-assigned this Apr 22, 2026
@ndreno ndreno added enhancement New feature or request help wanted Extra attention is needed labels Apr 22, 2026
ndreno added 2 commits April 30, 2026 17:17
Resolves the seven review points + a few stylistic fixes:

1. Fix the consumer-policy CEL example: today the cel plugin exposes
   request.body as a raw string, so request.body.model wouldn't evaluate.
   Commit to a small cel-plugin extension that binds parsed JSON under
   request.body_json as an explicit prerequisite for the AI example.
2. Rewrite §4's compiler-prerequisite block: multi-file specs already
   work (manifest.rs:268-322 + artifact.rs:417-426), and env:// secrets
   already work. The shipped ai-gateway fragment is a regular spec the
   compiler already knows how to consume. The only remaining friction
   is per-operation dispatch-config duplication, with two concrete
   v1 paths laid out (env://-baked fragment now; root-level
   x-barbacane-dispatch-defaults as a follow-up).
3. /v1/models caching: spell out cache via the existing host_cache_*
   capability, scope (per-instance), thundering-herd mitigation
   (single-flight), and partial-failure response shape (200 + partial
   flag + warnings array).
4. routes + allow/deny + ai.target: catalog policy is attached to the
   target, not the resolution path — applies on every path, including
   ai.target-driven dispatch. Prevents a CEL misconfig from leaking a
   denied model.
5. Pin glob syntax via a regex pattern on the plugin JSON schema so
   invalid syntax fails at lint time, not at runtime.
6. Add an escape-hatch example for the no-fallthrough allow/deny rule.
7. Rename metrics to barbacane_plugin_<plugin>_<metric> convention:
   barbacane_plugin_ai_proxy_responses_store_downgrades_total,
   barbacane_plugin_ai_proxy_responses_reasoning_dropped_total,
   barbacane_plugin_ai_proxy_models_provider_failures_total{provider}.
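With the request.body_json binding from point 1 in place, the consumer-policy example might read roughly like this (config shape and the consumer.tier attribute are hypothetical; only the request.body_json name comes from the commit above):

```yaml
# Hypothetical cel-plugin consumer policy (illustrative shape):
plugins:
  - name: cel
    config:
      # Only premium consumers may call opus-class models.
      expression: >
        !request.body_json.model.startsWith("claude-opus-")
        || consumer.tier == "premium"
```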

Stylistic:
- Drop a duplicate bullet in §0.
- Document the synthetic Responses id as random uuid-v4, matching
  upstream OpenAI semantics.
- Document the silent reasoning-item drop with a Warning: 299 and a
  metric, since silently dropping reasoning can degrade multi-turn
  agent quality in ways the client cannot detect.
- Spell out the migration UX: additionalProperties: false on
  ai-proxy's schema means a leftover model: field is rejected by
  vacuum:barbacane at lint time with a self-explanatory error.
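Both the lint-time glob pin (point 5) and the migration UX above hang off the plugin JSON schema. A sketch of the relevant fragment, with illustrative property names and a deliberately narrow prefix-glob regex:

```json
{
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "routes": {
      "type": "array",
      "items": {
        "type": "object",
        "additionalProperties": false,
        "properties": {
          "pattern": {
            "type": "string",
            "pattern": "^[A-Za-z0-9._:-]+\\*?$"
          },
          "target": { "type": "string" }
        },
        "required": ["pattern", "target"]
      }
    }
  }
}
```

With additionalProperties: false at the top level, a leftover model: key fails schema validation at lint time rather than being silently ignored.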
uuid-v7 ids sort chronologically in log greps (no separate sort by
created_at needed), and the embedded timestamp leaks no information
the response wasn't already carrying. Workspace already enables both
v4 and v7 features on the uuid crate, so no dep change.
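The sortability claim follows directly from UUIDv7's layout: the top 48 bits are a big-endian unix-millisecond timestamp, so numeric and hex-string order track creation order. A std-only sketch of that layout (illustrative — real code would just call the uuid crate's now_v7()):

```rust
/// Build a UUIDv7-shaped u128: 48-bit unix-ms timestamp, version 7,
/// RFC 4122 variant, remaining bits from caller-supplied entropy.
/// Layout sketch only; use uuid::Uuid::now_v7() in practice.
fn v7_like(unix_millis: u64, entropy: u64) -> u128 {
    let ts = (unix_millis as u128 & 0xFFFF_FFFF_FFFF) << 80; // 48-bit timestamp
    let ver = 0x7u128 << 76;                                 // version nibble
    let rand_a = ((entropy >> 52) as u128 & 0x0FFF) << 64;   // 12 random bits
    let var = 0b10u128 << 62;                                // RFC 4122 variant
    let rand_b = entropy as u128 & 0x3FFF_FFFF_FFFF_FFFF;    // 62 random bits
    ts | ver | rand_a | var | rand_b
}

fn main() {
    let earlier = v7_like(1_700_000_000_000, 0x00AB_CDEF_0123_4567);
    let later   = v7_like(1_700_000_000_001, 0);
    // A later creation time sorts after, regardless of the random bits:
    assert!(later > earlier);
    assert!(format!("{later:032x}") > format!("{earlier:032x}"));
    println!("{earlier:032x}");
}
```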