
docs(adr): propose ADR-0030 — AI gateway Responses API support #69

Open
ndreno wants to merge 3 commits into main from docs/adr-0030-responses-api

Conversation

@ndreno (Contributor) commented Apr 22, 2026

Summary

Proposes ADR-0030, extending the AI gateway (ADR-0024) with:

  • OpenAI Responses API support alongside Chat Completions. The ai-proxy dispatcher becomes protocol-aware and serves both surfaces against the same target pool. Motivated by clients like codex-cli that use the Responses API, not Chat Completions.
  • Caller-owned model principle — the model identifier is part of the client's contract, not the gateway's. The gateway routes, validates, and gates; it does not decide the model.
  • Dynamic model routing via a new routes table with glob patterns (claude-* → anthropic, gpt-* → openai), optional allow / deny lists for static catalog policy, and the existing cel middleware for dynamic consumer policy.
  • /v1/models served by the ai-proxy dispatcher itself when bound to the route — no native data plane carve-out, stays inside Barbacane's spec-driven routing model. Shipped via an importable OpenAPI fragment so operators don't duplicate config across three operations.
  • Pragmatic store UX — accept store: true (most clients send it as an unexamined default), emit Warning: 299 header + store_downgrade_total metric, reject only the genuinely stateful features (previous_response_id).
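Concretely, the routing and gating pieces above might combine into a dispatcher config along these lines (a sketch only; the field names and nesting are assumptions, not the ADR's final schema):

```yaml
# Hypothetical ai-proxy dispatcher config (illustrative shape only):
ai-proxy:
  routes:
    - pattern: "claude-*"   # glob over the caller-supplied model id
      target: anthropic
    - pattern: "gpt-*"
      target: openai
  targets:
    anthropic:
      allow: ["claude-sonnet-*", "claude-opus-*"]  # static catalog policy
    openai:
      deny: ["gpt-3.5-*"]   # anything not denied passes
```

First matching route wins; the caller still owns the model string end to end — the gateway only resolves it to a target and applies catalog policy.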

Breaking change

The model field is removed from targets.<name> and from the flat top-level config. Migration for existing ADR-0024 deployments: delete the field from each target. Justified by the pre-1.0 status and the codebase convention of avoiding backward-compat shims at this stage.
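For an existing deployment, the migration amounts to a one-line deletion per target (config shape illustrative, following the ADR-0024 layout):

```yaml
# Before (ADR-0024):
targets:
  openai:
    url: https://api.openai.com/v1
    model: gpt-4o        # ← delete this line

# After (ADR-0030) — the caller's request body carries the model id:
targets:
  openai:
    url: https://api.openai.com/v1
```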

Out of scope (deferred)

  • Stateful Responses API (previous_response_id, GET /v1/responses/{id}, cancel) — requires a session-scoped storage capability in the WASM runtime; separate ADR.
  • True SSE translation against Anthropic — inherits the deferral from ADR-0024.
  • Ollama Responses API support — Ollama's OpenAI-compat surface is Chat Completions only as of 2026-04.

Context

This ADR originated from a conversation with @marmeladema, who built a stateful OpenAI Responses ↔ Anthropic translation proxy for running codex-cli against Claude — a harder problem than what Barbacane's ai-proxy does today, precisely because it preserves conversation state across turns (previous_response_id, tool-use chains, etc.). That discussion surfaced two gaps on Barbacane's side:

  1. "Clients send OpenAI Chat Completions" is a narrower assumption than "OpenAI-compatible" — the Responses API is a distinct protocol with a cleaner mapping to Anthropic Messages (item-based vs. message-based).
  2. Pinning model identifiers in gateway config is operational friction and conflates routing intent with catalog policy.

This ADR addresses the stateless slice of what @marmeladema's proxy does — which covers the 80% use case (clients that send store: true as an unexamined default but never actually chain responses) without requiring a new session-scoped storage primitive in the WASM runtime. The fully stateful slice (previous_response_id continuation, response retrieval/cancel) is deferred to a phase 2 ADR that will introduce that primitive; @marmeladema's existing work is likely directly reusable there.
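The stateless boundary described above reduces to a small three-way decision: hard-reject previous_response_id, downgrade store: true with a warning, accept everything else. A minimal std-only sketch (the types and names are hypothetical, not the plugin's actual API):

```rust
/// Outcome of inspecting the stateful fields of a Responses API request.
/// Hypothetical types — a sketch of the policy, not the plugin's code.
#[derive(Debug, PartialEq)]
enum StoreDecision {
    Accept,
    /// `store: true` downgraded: serve the request, attach `Warning: 299`,
    /// bump the store-downgrade counter.
    AcceptWithWarning(&'static str),
    /// Genuinely stateful feature: cannot be honored without
    /// session-scoped storage, so hard-reject.
    Reject(&'static str),
}

fn decide(store: bool, previous_response_id: Option<&str>) -> StoreDecision {
    if previous_response_id.is_some() {
        return StoreDecision::Reject("previous_response_id requires stateful mode");
    }
    if store {
        return StoreDecision::AcceptWithWarning(
            "299 - \"store is not persisted; responses are ephemeral\"",
        );
    }
    StoreDecision::Accept
}

fn main() {
    // codex-cli-style default: store:true, no chaining → accepted + warned.
    assert!(matches!(decide(true, None), StoreDecision::AcceptWithWarning(_)));
    // Actual response chaining → rejected.
    assert!(matches!(decide(false, Some("resp_abc")), StoreDecision::Reject(_)));
    assert_eq!(decide(false, None), StoreDecision::Accept);
    println!("ok");
}
```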

Test plan

  • Review the decisions in section 0 (principle) and section 2 (Responses translation, store UX) for alignment with @marmeladema's implementation, especially the stateless boundary
  • Confirm the spec fragment / import mechanism proposed in section 4 is compatible with the compiler's current capabilities (or scope the addition)
  • Once accepted: implementation PRs per section (1 → 2 → 3 → 4), contract tests added per provider × protocol matrix

Extends ADR-0024 with OpenAI Responses API support, dynamic model
routing, and a spec-driven /v1/models endpoint. Establishes the
caller-owned model principle and retires the `model` field from
target configs.

Key decisions:
- Protocol-aware ai-proxy dispatcher (Chat Completions + Responses)
- Stateless Responses translation; pragmatic `store` UX (accept +
  warn header, hard-reject on `previous_response_id`)
- `routes` table with glob-pattern matching and `allow`/`deny` gating
- `/v1/models` served by the ai-proxy dispatcher via an importable
  OpenAPI fragment — no native data plane carve-out
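First-match glob resolution with target-attached deny gating (per the key decisions above) can be sketched in a few lines of std-only Rust. Names are illustrative, and only prefix globs are handled here — the ADR pins the full glob syntax at the schema level:

```rust
/// Minimal matcher for the glob shapes the examples use:
/// an exact id, or a `prefix*` pattern. Illustrative only.
fn glob_match(pattern: &str, model: &str) -> bool {
    match pattern.strip_suffix('*') {
        Some(prefix) => model.starts_with(prefix),
        None => pattern == model,
    }
}

/// Resolve the caller-supplied model to a target, then apply the
/// target-attached deny list (catalog policy applies on every path).
fn resolve<'a>(
    routes: &[(&str, &'a str)],
    deny: &[(&str, &str)],
    model: &str,
) -> Option<&'a str> {
    let target = routes
        .iter()
        .find(|(pat, _)| glob_match(pat, model))
        .map(|(_, t)| *t)?;
    // No fallthrough: a denied model is rejected, not rerouted.
    if deny.iter().any(|(t, pat)| *t == target && glob_match(pat, model)) {
        return None;
    }
    Some(target)
}

fn main() {
    let routes = [("claude-*", "anthropic"), ("gpt-*", "openai")];
    let deny = [("openai", "gpt-3.5-*")]; // hypothetical catalog policy
    assert_eq!(resolve(&routes, &deny, "claude-sonnet-4"), Some("anthropic"));
    assert_eq!(resolve(&routes, &deny, "gpt-4o"), Some("openai"));
    assert_eq!(resolve(&routes, &deny, "gpt-3.5-turbo"), None); // denied, no fallthrough
    assert_eq!(resolve(&routes, &deny, "mistral-7b"), None);    // no route
    println!("ok");
}
```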

Breaking change for existing ADR-0024 deployments: delete the `model`
field from each target. Justified by pre-1.0 status.
@ndreno ndreno requested a review from marmeladema April 22, 2026 22:27
@ndreno ndreno self-assigned this Apr 22, 2026
@ndreno ndreno added enhancement New feature or request help wanted Extra attention is needed labels Apr 22, 2026
ndreno added 2 commits April 30, 2026 17:17
Resolves the seven review points + a few stylistic fixes:

1. Fix the consumer-policy CEL example: today the cel plugin exposes
   request.body as a raw string, so request.body.model wouldn't evaluate.
   Commit to a small cel-plugin extension that binds parsed JSON under
   request.body_json as an explicit prerequisite for the AI example.
2. Rewrite §4's compiler-prerequisite block: multi-file specs already
   work (manifest.rs:268-322 + artifact.rs:417-426), and env:// secrets
   already work. The shipped ai-gateway fragment is a regular spec the
   compiler already knows how to consume. The only remaining friction
   is per-operation dispatch-config duplication, with two concrete
   v1 paths laid out (env://-baked fragment now; root-level
   x-barbacane-dispatch-defaults as a follow-up).
3. /v1/models caching: spell out cache via the existing host_cache_*
   capability, scope (per-instance), thundering-herd mitigation
   (single-flight), and partial-failure response shape (200 + partial
   flag + warnings array).
4. routes + allow/deny + ai.target: catalog policy is attached to the
   target, not the resolution path — applies on every path, including
   ai.target-driven dispatch. Prevents a CEL misconfig from leaking a
   denied model.
5. Pin glob syntax via a regex pattern on the plugin JSON schema so
   invalid syntax fails at lint time, not at runtime.
6. Add an escape-hatch example for the no-fallthrough allow/deny rule.
7. Rename metrics to barbacane_plugin_<plugin>_<metric> convention:
   barbacane_plugin_ai_proxy_responses_store_downgrades_total,
   barbacane_plugin_ai_proxy_responses_reasoning_dropped_total,
   barbacane_plugin_ai_proxy_models_provider_failures_total{provider}.
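With the request.body_json binding from point 1 in place, the consumer-policy example might read roughly like this (config shape and the consumer.tier attribute are hypothetical; only the request.body_json name comes from the commit above):

```yaml
# Hypothetical cel-plugin consumer policy (illustrative shape):
plugins:
  - name: cel
    config:
      # Only premium consumers may call opus-class models.
      expression: >
        !request.body_json.model.startsWith("claude-opus-")
        || consumer.tier == "premium"
```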

Stylistic:
- Drop a duplicate bullet in §0.
- Document the synthetic Responses id as random uuid-v4, matching
  upstream OpenAI semantics.
- Document the silent reasoning-item drop with a Warning: 299 and a
  metric, since silently dropping reasoning can degrade multi-turn
  agent quality in ways the client cannot detect.
- Spell out the migration UX: additionalProperties: false on
  ai-proxy's schema means a leftover model: field is rejected by
  vacuum:barbacane at lint time with a self-explanatory error.
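Both the lint-time glob pin (point 5) and the migration UX above hang off the plugin JSON schema. A sketch of the relevant fragment, with illustrative property names and a deliberately narrow prefix-glob regex:

```json
{
  "type": "object",
  "additionalProperties": false,
  "properties": {
    "routes": {
      "type": "array",
      "items": {
        "type": "object",
        "additionalProperties": false,
        "properties": {
          "pattern": {
            "type": "string",
            "pattern": "^[A-Za-z0-9._:-]+\\*?$"
          },
          "target": { "type": "string" }
        },
        "required": ["pattern", "target"]
      }
    }
  }
}
```

With additionalProperties: false at the top level, a leftover model: key fails schema validation at lint time rather than being silently ignored.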
uuid-v7 ids sort chronologically in log greps (no separate sort by
created_at needed), and the embedded timestamp leaks no information
the response wasn't already carrying. Workspace already enables both
v4 and v7 features on the uuid crate, so no dep change.
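The sortability claim follows directly from UUIDv7's layout: the top 48 bits are a big-endian unix-millisecond timestamp, so numeric and hex-string order track creation order. A std-only sketch of that layout (illustrative — real code would just call the uuid crate's now_v7()):

```rust
/// Build a UUIDv7-shaped u128: 48-bit unix-ms timestamp, version 7,
/// RFC 4122 variant, remaining bits from caller-supplied entropy.
/// Layout sketch only; use uuid::Uuid::now_v7() in practice.
fn v7_like(unix_millis: u64, entropy: u64) -> u128 {
    let ts = (unix_millis as u128 & 0xFFFF_FFFF_FFFF) << 80; // 48-bit timestamp
    let ver = 0x7u128 << 76;                                 // version nibble
    let rand_a = ((entropy >> 52) as u128 & 0x0FFF) << 64;   // 12 random bits
    let var = 0b10u128 << 62;                                // RFC 4122 variant
    let rand_b = entropy as u128 & 0x3FFF_FFFF_FFFF_FFFF;    // 62 random bits
    ts | ver | rand_a | var | rand_b
}

fn main() {
    let earlier = v7_like(1_700_000_000_000, 0x00AB_CDEF_0123_4567);
    let later   = v7_like(1_700_000_000_001, 0);
    // A later creation time sorts after, regardless of the random bits:
    assert!(later > earlier);
    assert!(format!("{later:032x}") > format!("{earlier:032x}"));
    println!("{earlier:032x}");
}
```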