diff --git a/modules/ai-gateway/pages/configure-provider.adoc b/modules/ai-gateway/pages/configure-provider.adoc
index b146a3d..6675fdf 100644
--- a/modules/ai-gateway/pages/configure-provider.adoc
+++ b/modules/ai-gateway/pages/configure-provider.adoc
@@ -1,4 +1,246 @@
-= Configure Your LLM Provider
-:description: Connect AI Gateway to your preferred LLM providers.
+= Configure an LLM Provider
+:description: Create an LLM provider in AI Gateway to proxy requests to OpenAI, Anthropic, Google AI, AWS Bedrock, or any OpenAI-compatible endpoint through a managed Redpanda URL.
+:page-topic-type: how-to
+:personas: platform_admin, app_developer
+// Page aliases for the consolidated quickstart and setup-guide redirects will land in a follow-up cleanup PR that also deletes the legacy pages (gateway-quickstart.adoc, gateway-architecture.adoc, aggregation.adoc, routing-cel.adoc, admin/setup-guide.adoc, builders/discover-gateways.adoc) and retargets the ~80 cross-module xrefs (agents, integrations, observability) that still point at them.
+:learning-objective-1: Create an LLM provider for OpenAI, Anthropic, Google AI, AWS Bedrock, or an OpenAI-compatible endpoint
+:learning-objective-2: Select the models you want to expose through the provider
+:learning-objective-3: Verify the provider is reachable using the built-in Test Connection control
-// TODO: Add content
+include::ROOT:partial$adp-la.adoc[]
+
+An LLM provider is the primary resource in AI Gateway. When you create one, Redpanda gives you a managed proxy URL that your applications can point at: Redpanda handles the upstream API keys, forwards requests to the provider, and records usage for you. This guide walks you through creating a provider for each supported upstream.
+
+After completing this guide, you will be able to:
+
+* [ ] {learning-objective-1}
+* [ ] {learning-objective-2}
+* [ ] {learning-objective-3}
+
+== Prerequisites
+
+* Access to a Redpanda Cloud cluster with ADP enabled.
++
+// TODO: this guide describes the cluster-embedded view available today on cloud.redpanda.com. The standalone-ADP UI launches as a separate product surface; sign-in URL, IAM model, and role-permission requirements will change. Update once standalone ADP ships.
+* An API key (or AWS credentials for Bedrock) for the upstream provider you want to configure.
+* One or more secrets already created in your dataplane's secret store for the provider's credentials. Secret references must use `UPPER_SNAKE_CASE`. For example: `OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `AWS_ACCESS_KEY_ID`.
++
+// TODO: xref the secrets-management page for ADP once confirmed.
+
+== Open the Create LLM provider page
+
+. Sign in to https://cloud.redpanda.com[cloud.redpanda.com] and open the cluster you want to configure.
+. In the sidebar, expand *ADP* and select *LLM Providers*.
+. Click *Create provider*. The *Create LLM provider* page opens.
+
+== Fill in the Provider card
+
+The first card on the page collects identity fields.
+
+[cols="1,1,3"]
+|===
+|Field |Required |Notes
+
+|*Name*
+|Yes
+|Machine identifier. Lowercase letters, numbers, and hyphens only (`^[a-z][a-z0-9-]*$`), up to 63 characters. Immutable after creation. Appears in the proxy URL (`/llm/v1/providers/<name>/...`). The form auto-suggests a friendly name (for example, `red-space-bear`); override it if you want something more descriptive.
+
+|*Display name* (Advanced options)
+|No
+|Human-readable label shown in dashboards and model selectors. Up to 253 characters. Leave blank to use the *Name*. 
+|===
+
+Display name lives in the *Advanced options* expander on the same card.
+
+== Choose a provider type
+
+The *Provider type* picker shows five cards. Pick the one that matches your upstream.
+
+[cols="1,3"]
+|===
+|Type |Use when
+
+|*OpenAI*
+|Proxy GPT, o-series, and embeddings through the OpenAI API. Best when you already hold an OpenAI API key or want the broadest GPT model catalog.
+
+|*Anthropic*
+|Call Claude Opus, Sonnet, and Haiku directly. Strong at coding, long-context reasoning, and tool use. Supports forwarding client `Authorization` headers to Anthropic for enterprise and Max-plan subscription passthrough (see <<anthropic-authorization-passthrough>>).
+
+|*Google AI*
+|Reach Gemini Pro, Flash, and multimodal models via Google AI Studio. Ideal for long-context workloads and image/video inputs.
+
+|*AWS Bedrock*
+|Invoke foundation models (Claude, Llama, Titan, Nova) hosted inside your AWS account. Requires an AWS region and credentials (static, STS-assumed role, or the default credential chain).
+
+|*OpenAI-compatible*
+|Point at any OpenAI-compatible endpoint that serves `/v1/chat/completions` (vLLM, Ollama, LM Studio, LocalAI, Together, Groq, OpenRouter). Useful for self-hosted models and aggregator gateways. Requires a *Base URL*; authentication is optional.
+|===
+
+Selecting a type reveals the type-specific configuration block below the picker.
+
+== Fill in the type-specific configuration
+
+[tabs]
+======
+OpenAI::
++
+[cols="1,3"]
+|===
+|Field |Notes
+
+|*Base URL*
+|Optional. Leave empty for the standard OpenAI API (`https://api.openai.com/v1`). Override for Azure OpenAI or other OpenAI-hosted endpoints.
+
+|*API key*
+|Required. Secret-store reference for the OpenAI API key. Must be `UPPER_SNAKE_CASE`, for example `OPENAI_API_KEY`.
+|===
+
+Anthropic::
++
+[cols="1,3"]
+|===
+|Field |Notes
+
+|*Base URL*
+|Optional. Leave empty for the standard Anthropic API (`https://api.anthropic.com`).
+
+|*API key*
+|Required unless *Auth passthrough* is on. `UPPER_SNAKE_CASE`, for example `ANTHROPIC_API_KEY`.
+
+|*Auth passthrough*
+|Optional toggle. When on, the client's `Authorization` header is forwarded to Anthropic instead of using a server-side API key. Used for enterprise and Max-plan OAuth passthrough: each client authenticates with its own Anthropic subscription. Leave the API key reference empty when using passthrough.
+|===
+
+Google AI::
++
+[cols="1,3"]
+|===
+|Field |Notes
+
+|*Base URL*
+|Optional. Leave empty for the standard Google AI API (`https://generativelanguage.googleapis.com`).
+
+|*API key*
+|Required. Secret-store reference for the Google AI API key. `UPPER_SNAKE_CASE`, for example `GOOGLE_AI_API_KEY`.
+|===
++
+[IMPORTANT]
+====
+Gemini uses the `x-goog-api-key` header for authentication, not `Authorization: Bearer`. This matters when you wire up clients. See xref:connect-agent.adoc[Connect your agent].
+====
+
+AWS Bedrock::
++
+[cols="1,3"]
+|===
+|Field |Notes
+
+|*Region*
+|Required. AWS region where the Bedrock endpoint is deployed, for example `us-east-1`.
+
+|*Base URL*
+|Optional. Override the default regional Bedrock endpoint.
+
+|*Credentials*
+|Choose one of:
+
+* *Default credential chain* (leave all credential fields unset). Uses environment variables, IRSA, EKS Pod Identity, or instance profile.
+* *Static credentials*. Secret-store references for the access key ID and secret access key, both `UPPER_SNAKE_CASE` (typically `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`).
+* *Assume role*. 
Provide a `role_arn`, optional `external_id` (required when the role's trust policy mandates it), and optional `session_name` (surfaces in CloudTrail audit). +|=== + +OpenAI-compatible:: ++ +[cols="1,3"] +|=== +|Field |Notes + +|*Base URL* +|Required. URL of your OpenAI-compatible endpoint, for example `http://vllm.internal:8000/v1`, `http://ollama.local:11434/v1`, or an aggregator like Together / Groq / OpenRouter. + +|*API key* +|Optional. Leave empty for no-auth endpoints (common for local runtimes). `UPPER_SNAKE_CASE` if set. +|=== ++ +TIP: OpenAI-compatible endpoints can serve any model. Enter the exact model identifiers your upstream server exposes (for example, `meta-llama/Llama-3.3-70B-Instruct` or `qwen3:8b`). +====== + +[[select-models]] +== Select models + +Models you select on this form become the catalog the provider exposes. Leave the list empty to allow every model the upstream catalog returns. + +For *OpenAI*, *Anthropic*, *Google AI*, and *AWS Bedrock*, the form shows a picker backed by the provider's catalog. Pick from the list, or type a model identifier the catalog doesn't show. For *OpenAI-compatible*, the form takes a freeform list — type the exact identifiers your upstream serves. + +[NOTE] +==== +Models are stored as structured `ProviderModel` entries (one entry per model, with the model name as the only required field). A future Phase 2 release will add per-model metadata such as custom pricing overrides. The legacy flat `models` field still works on writes for backward compatibility. +==== + +After you create the provider, the detail page renders each model as a card with capability badges (for example, *Vision*, *Reasoning*, *Streaming*) lifted from the catalog. + +== Save and verify + +. Click *Create provider*. The button activates once *Name* and *Type* are both set; the right-hand *Summary* panel checks them off as you fill them in. +. On the provider's detail page, the *Connection* card shows your *Proxy URL*, *Discovery* URL, *Base URL*, and *API key ref*. Copy the *Proxy URL* — this is where your applications point. +. Scroll to the *Verify connection* section. Pick a model from the dropdown and click *Test Connection*. The status updates from "Not tested yet" to a pass/fail indicator. Use the *Show commands* disclosure if you want to see the equivalent curl or SDK call. +. To wire up an application, open *Connect your app* further down the page or follow xref:connect-agent.adoc[Connect your agent]. + +A successful Test Connection result confirms that the provider's credentials, region (Bedrock), and network path are all correct. If the call fails, see <>. + +[[anthropic-authorization-passthrough]] +== Anthropic: authorization passthrough + +If you want each client to authenticate against Anthropic with its own subscription (Claude Pro, Max, Team, or enterprise), enable *Auth passthrough* instead of configuring a server-side API key. In this mode: + +* Leave the *API key* field empty. +* Clients must send their own Anthropic `Authorization` header with every request. AI Gateway forwards it unchanged. +* Use this when you want to aggregate individual client subscriptions rather than share a single API account. + +The provider detail page shows whether Auth passthrough is enabled in the *Connection* card. + +== Edit, disable, or delete a provider + +* *Edit*: click *Edit* on the detail page. You can change any field *except* `Name` and `Type`, which are immutable. Model lists, credential references, and the enabled state can all change. 
+* *Disable*: click *Disable* on the detail page. The provider remains in the list, but requests to its proxy URL are rejected until you enable it again. Use this when you want to pause traffic without losing configuration. +* *Delete*: scroll to the *Delete this provider* section at the bottom of the detail page and click *Delete*. The action is permanent; in-flight requests fail and downstream clients receive errors until reconfigured. + +[[troubleshooting]] +== Troubleshooting + +[cols="1,2"] +|=== +|Symptom |What to check + +|`secret "" not found` +|Confirm the secret exists in your dataplane's secret store and the reference in the provider configuration is spelled identically (`UPPER_SNAKE_CASE`, no typos). + +|Bedrock returns `AccessDenied` or region errors +|Verify the AWS region field matches the region where your Bedrock models are enabled. Bedrock model availability varies by region. + +|Anthropic returns 401 when passthrough is enabled +|Confirm the client is sending its own `Authorization` header and the *API key* field on the provider is empty. + +|Gemini returns 401 +|Gemini uses the `x-goog-api-key` header, not `Authorization`. If you're seeing 401s on Gemini, check that the client is sending the correct header. See xref:connect-agent.adoc[Connect your agent]. + +|Provider list empty or 403 +|Confirm your account has the `dataplane_adp_llmprovider_*` permissions in ADP. ++ +// TODO: confirm the exact role/permission model once the standalone ADP UI launches. +|=== + +// TODO: add screenshots of common error toasts once captured from the live environment. + +== Out of scope + +AI Gateway does not provide these capabilities. For current status, consult the Redpanda Cloud release notes. + +* *Multi-provider routing, failover, and retries across providers.* A synthetic provider that fans requests to multiple upstreams is not part of AI Gateway. +* *Spend limits.* Per-user, per-org, and global cost caps are not available. The provider detail page shows a *Cost & usage* placeholder labeled "Coming soon"; see xref:governance:budgets.adoc[Token budgets and limits]. +* *Rate limits.* Requests-per-second, per-minute, or per-day limits are not available. +* *Managed MCP aggregation at the gateway.* Register MCP tool servers separately under *ADP* → *MCP Servers*. + +== Next steps + +* xref:connect-agent.adoc[Connect your agent]. Point your application's SDK at the proxy URL and make requests. diff --git a/modules/ai-gateway/pages/connect-agent.adoc b/modules/ai-gateway/pages/connect-agent.adoc index 5210f25..21b080a 100644 --- a/modules/ai-gateway/pages/connect-agent.adoc +++ b/modules/ai-gateway/pages/connect-agent.adoc @@ -1,59 +1,89 @@ = Connect Your Agent -:description: Integrate your AI agent or application with Redpanda Agentic Data Plan for unified LLM access. +:description: Point your application or AI agent at an AI Gateway provider's proxy URL. Covers the URL shape, the rpai-based local auth flow, the OIDC client-credentials flow for CI, and SDK examples for OpenAI, Anthropic, Google AI, AWS Bedrock, and OpenAI-compatible endpoints. 
:page-topic-type: how-to
:personas: app_developer
-:learning-objective-1: Configure your application to use AI Gateway with OpenAI-compatible SDKs
-:learning-objective-2: Make LLM requests through the gateway and handle responses appropriately
-:learning-objective-3: Validate your integration end-to-end
+:page-aliases: redpanda-cloud:ai-agents:ai-gateway/builders/connect-your-agent.adoc
+:learning-objective-1: Construct the proxy URL for an LLM provider you have configured
+:learning-objective-2: Authenticate to AI Gateway using rpai for local development or OIDC client credentials for CI and programmatic clients
+:learning-objective-3: Send requests through the proxy URL with the SDK of your choice

include::ROOT:partial$adp-la.adoc[]

-This guide shows you how to connect your glossterm:AI agent[] or application to Redpanda Agentic Data Plan. This is also called "Bring Your Own Agent" (BYOA). You'll configure your client SDK, make your first request, and validate the integration.
+This guide shows how to connect your glossterm:AI agent[] or application to AI Gateway. You'll construct the proxy URL for a provider you have already created, authenticate (with `rpai` for local development or with OIDC client credentials for CI), and send your first request with the SDK of your choice.

After completing this guide, you will be able to:

-* [ ] Configure your application to use AI Gateway with OpenAI-compatible SDKs
-* [ ] Make LLM requests through the gateway and handle responses appropriately
-* [ ] Validate your integration end-to-end
+* [ ] {learning-objective-1}
+* [ ] {learning-objective-2}
+* [ ] {learning-objective-3}

== Prerequisites

-* You have discovered an available gateway and noted its Gateway ID and endpoint.
+* A configured LLM provider. If you haven't created one yet, see xref:configure-provider.adoc[Configure an LLM provider].
+* For local development: nothing else — you'll install the `rpai` CLI in the next section.
+* For CI or programmatic clients: a Redpanda Cloud service account with OIDC client credentials. See xref:redpanda-cloud:security:cloud-authentication.adoc[Authenticate to Redpanda Cloud].
+
-If not, see xref:builders/discover-gateways.adoc[].
+// TODO: confirm whether ADP hosts its own service-account IAM post-standalone, or continues to share Redpanda Cloud Organization IAM.
+* A development environment with your chosen programming language.

-* You have a service account with OIDC client credentials. See xref:redpanda-cloud:security:cloud-authentication.adoc[].
-* You have a development environment with your chosen programming language.

+== Proxy URL anatomy

-== Integration overview

+Every provider you create in AI Gateway gets its own proxy URL:

-Connecting to AI Gateway requires two configuration changes:

+[source,text]
+----
+<gateway-url>/llm/v1/providers/<provider-name>/<api-path>
+----

-. *Change the base URL*: Point to the gateway endpoint instead of the provider's API. The gateway ID is embedded in the endpoint URL.
-. *Add authentication*: Use an OIDC access token from your service account instead of provider API keys.
+* `<gateway-url>`: the AI Gateway base URL for your dataplane. Cluster-specific subdomain on `clusters.rdpa.co` (for example, `https://aigw.<cluster-id>.clusters.rdpa.co`). Copy the exact value from the *Proxy URL* field on any provider's *Connection* card.
+* `<provider-name>`: the name you gave the provider when you created it, for example `my-openai` or `prod-anthropic`.
+* `<api-path>`: the upstream provider's native API path (for example, `v1/chat/completions` for OpenAI, `v1/messages` for Anthropic). See the assembled example below.
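+
+As a concrete sketch, assume a hypothetical cluster host `aigw.abc123.clusters.rdpa.co` and a provider named `my-openai`; the three segments then assemble like this (authentication is covered in the next section):
+
+[source,bash]
+----
+# <gateway-url>     = https://aigw.abc123.clusters.rdpa.co  (copy yours from the Connection card)
+# <provider-name>   = my-openai
+# <api-path>        = v1/chat/completions  (OpenAI's native chat path)
+curl https://aigw.abc123.clusters.rdpa.co/llm/v1/providers/my-openai/v1/chat/completions \
+  -H "Authorization: Bearer $(rpai auth token)" \
+  -H "Content-Type: application/json" \
+  -d '{"model": "gpt-4o", "messages": [{"role": "user", "content": "ping"}]}'
+----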
-[[authenticate-with-oidc]]
-== Authenticate with OIDC
+AI Gateway forwards the request to the upstream provider, attaches the configured credentials, and records the request for observability. Your application never sees the upstream API key.

-AI Gateway uses OIDC through service accounts that can be used as a `client_credentials` grant to authenticate and exchange for access and ID tokens.
+TIP: The provider detail page generates ready-to-run `rpai`-based snippets pre-filled with the correct proxy URL and paths. When in doubt, copy from the *Connect your app* section there.

-=== Create a service account
+[[authenticate-with-rpai]]
+== Authenticate with `rpai` (recommended for local development)

-. In the Redpanda Cloud UI, go to https://cloud.redpanda.com/organization-iam?tab=service-accounts[*Organization IAM* > *Service account*^].
-. Create a new service account and note the *Client ID* and *Client Secret*.
+The provider detail page surfaces an *Install rpai CLI* card with copy-pasteable steps. The flow is the same for every provider type:

-For details, see xref:redpanda-cloud:security:cloud-authentication.adoc#authenticate-to-the-cloud-api[Authenticate to the Cloud API].
+. Install the CLI. Pick the install method that matches your OS — for example, on macOS:
++
+[source,bash]
+----
+brew install redpanda-data/tap/rpai
+----
++
+// TODO: confirm the canonical install methods for Linux and Windows once the standalone ADP UI ships.

-=== Configure your OIDC client
+. Log in with the gateway URL from the provider's *Connection* card:
++
+[source,bash]
+----
+rpai auth login --server https://aigw.<cluster-id>.clusters.rdpa.co
+----

-Use the following OIDC configuration:
+. Point your SDK at the proxy URL and let `rpai auth token` mint a fresh token on each call. Set environment variables:
++
+[source,bash]
+----
+export PROXY_URL="<gateway-url>/llm/v1/providers/<provider-name>"
+export OPENAI_API_KEY="$(rpai auth token)"  # or ANTHROPIC_API_KEY, etc.
+----
+
+`rpai auth token` returns a short-lived OIDC access token. Refresh by running it again — most users wire it into a wrapper script or shell function.
+
+== Authenticate with OIDC client credentials (CI and programmatic)
+
+When `rpai` isn't available (CI runners, server-side processes, headless agents), use the OIDC `client_credentials` grant directly. Values are surfaced on the provider's *Connection* card; defaults at the time of writing are below.

[cols="1,2", options="header"]
|===
-|Parameter |Value
+|Parameter |Value (today)

|Discovery URL
-|`\https://auth.prd.cloud.redpanda.com/.well-known/openid-configuration`
+|`\https://auth.prd.cloud.redpanda.com/.well-known/openid-configuration`. Also surfaced as the *Discovery* field on the provider's Connection card.

|Token endpoint
|`\https://auth.prd.cloud.redpanda.com/oauth/token`
@@ -65,10 +95,10 @@ Use the following OIDC configuration:
|`client_credentials`
|===

-The discovery URL returns OIDC metadata, including the token endpoint and other configuration details. Use an OIDC client library that supports metadata discovery (such as `openid-client` for Node.js) so that endpoints are resolved automatically. If your library does not support discovery, you can fetch the discovery URL directly and extract the required endpoints from the JSON response.
+// TODO: confirm the audience for ADP once the standalone UI launches. The values above match today's cluster-embedded view. 
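+
+To sanity-check these values before wiring up a client, you can fetch the discovery document directly; this sketch assumes `curl` and `jq` are installed:
+
+[source,bash]
+----
+# The discovery document lists the token endpoint and the supported grant types
+curl -s https://auth.prd.cloud.redpanda.com/.well-known/openid-configuration \
+  | jq '{token_endpoint, grant_types_supported}'
+----
+
+The `token_endpoint` in the response should match the table above.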
[tabs]
-====
+======
cURL::
+
--
@@ -92,21 +122,21 @@ Python (authlib)::
[source,python]
----
from authlib.integrations.requests_client import OAuth2Session
-
-client = OAuth2Session(
-    client_id="<client-id>",
-    client_secret="<client-secret>",
-)
+import requests

# Discover token endpoint from OIDC metadata
-import requests
metadata = requests.get(
    "https://auth.prd.cloud.redpanda.com/.well-known/openid-configuration"
).json()
token_endpoint = metadata["token_endpoint"]

+client = OAuth2Session(
+    client_id="<client-id>",
+    client_secret="<client-secret>",
+    token_endpoint=token_endpoint,
+)
+
token = client.fetch_token(
-    token_endpoint,
    grant_type="client_credentials",
    audience="cloudv2-production.redpanda.cloud",
)
@@ -114,7 +144,7 @@ token = client.fetch_token(

access_token = token["access_token"]
----

-This example performs a one-time token fetch. For automatic token renewal on subsequent requests, pass `token_endpoint` to the `OAuth2Session` constructor. Note that for `client_credentials` grants, `authlib` obtains a new token rather than using a refresh token.
+Passing `token_endpoint` to the `OAuth2Session` constructor lets `authlib` handle renewal automatically. For `client_credentials` grants, it fetches a new token rather than using a refresh token.
--

Node.js (openid-client)::
@@ -139,51 +169,51 @@ const tokenSet = await client.grant({

const accessToken = tokenSet.access_token;
----

-====
+======
+
+=== Token lifecycle management

-=== Make authenticated requests
+IMPORTANT: Your client is responsible for refreshing tokens before they expire. OIDC access tokens have a limited TTL set by the identity provider and are not automatically renewed by AI Gateway. Check the `expires_in` field in the token response for the exact duration.

-Requests require two headers:
+* Proactively refresh at ~80% of the token's TTL to avoid failed requests.
+* `authlib` (Python) handles renewal automatically when you pass `token_endpoint` to `OAuth2Session`.
+* For other languages, cache the token and its expiry, then request a new token before the current one expires (see the sketch after this list).
+* If you're using `rpai`, just rerun `rpai auth token` — it handles refresh against the same OIDC endpoint. 
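+
+A minimal sketch of that cache-and-refresh pattern, reusing the `authlib` client from the Python example above (the 0.8 factor and the helper name are illustrative, not part of any SDK):
+
+[source,python]
+----
+import time
+
+_cache = {"token": None, "refresh_at": 0.0}
+
+def get_access_token(client):
+    """Return a cached access token, fetching a new one at ~80% of its TTL."""
+    if _cache["token"] is None or time.time() >= _cache["refresh_at"]:
+        token = client.fetch_token(
+            grant_type="client_credentials",
+            audience="cloudv2-production.redpanda.cloud",
+        )
+        _cache["token"] = token["access_token"]
+        # Schedule the next fetch before the current token actually expires
+        _cache["refresh_at"] = time.time() + 0.8 * token.get("expires_in", 300)
+    return _cache["token"]
+----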
-* `Authorization: Bearer <token>` - your OIDC access token
-* `rp-aigw-id: <gateway-id>` - your AI Gateway ID
+== Send requests with your SDK

-Set these environment variables for consistent configuration:
+The examples in this section assume you've set:

[source,bash]
----
-export REDPANDA_GATEWAY_URL="<gateway-url>"
-export REDPANDA_GATEWAY_ID="<gateway-id>"
+export PROXY_URL="<gateway-url>/llm/v1/providers/<provider-name>"
+export AUTH_TOKEN="$(rpai auth token)"  # or an OIDC access token from above
----

[tabs]
-====
-Python (OpenAI SDK)::
+======
+OpenAI SDK::
+
[source,python]
----
import os
from openai import OpenAI

-# Configure client to use AI Gateway with OIDC token
client = OpenAI(
-    base_url=os.getenv("REDPANDA_GATEWAY_URL"),
-    api_key=access_token,  # OIDC access token from Step 2
+    base_url=os.environ["PROXY_URL"],   # .../llm/v1/providers/my-openai
+    api_key=os.environ["AUTH_TOKEN"],   # rpai or OIDC access token
)

-# Make a request
response = client.chat.completions.create(
-    model="openai/gpt-5.2-mini",  # Note: vendor/model_id format
-    messages=[{"role": "user", "content": "Hello, AI Gateway!"}],
-    max_tokens=100
+    model="gpt-4o",  # native OpenAI model ID
+    messages=[{"role": "user", "content": "Hello from AI Gateway"}],
)
-
print(response.choices[0].message.content)
----
-
-Python (Anthropic SDK)::
+
-The Anthropic SDK can also route through AI Gateway using the OpenAI-compatible endpoint:
+The OpenAI SDK calls the proxy's `/v1/chat/completions` path, which AI Gateway forwards to OpenAI unchanged. Use it with any OpenAI provider and, with a different `base_url`, with any OpenAI-compatible provider (vLLM, Ollama, LM Studio, Together, Groq, OpenRouter).
+
+Anthropic SDK::
+
[source,python]
----
@@ -191,475 +221,170 @@ import os
from anthropic import Anthropic

client = Anthropic(
-    base_url=os.getenv("REDPANDA_GATEWAY_URL"),
-    api_key=access_token,  # OIDC access token from Step 2
+    base_url=os.environ["PROXY_URL"],   # .../llm/v1/providers/my-anthropic
+    auth_token=os.environ["AUTH_TOKEN"],  # rpai or OIDC access token
)

-# Make a request
message = client.messages.create(
-    model="anthropic/claude-sonnet-4.5",
-    max_tokens=100,
-    messages=[{"role": "user", "content": "Hello, AI Gateway!"}]
+    model="claude-sonnet-4-6",
+    max_tokens=1024,
+    messages=[{"role": "user", "content": "Hello from AI Gateway"}],
)
-
print(message.content[0].text)
----
-
-Node.js (OpenAI SDK)::
+
-[source,javascript]
-----
-import OpenAI from 'openai';
+The Anthropic SDK hits `v1/messages` on the proxy, which AI Gateway forwards to Anthropic. If the provider is configured with *Auth passthrough*, send your own Anthropic `Authorization` header instead of an `auth_token`. AI Gateway forwards it unchanged.

-const openai = new OpenAI({
-  baseURL: process.env.REDPANDA_GATEWAY_URL,
-  apiKey: accessToken,  // OIDC access token from Step 2
-});
-
-// Make a request
-const response = await openai.chat.completions.create({
-  model: 'openai/gpt-5.2-mini',
-  messages: [{ role: 'user', content: 'Hello, AI Gateway!' }],
-  max_tokens: 100
-});
-
-console.log(response.choices[0].message.content);
-----
-
-cURL::
+Google Gemini SDK::
+
-[source,bash]
-----
-curl ${REDPANDA_GATEWAY_URL}/chat/completions \
-  -H "Authorization: Bearer ${AUTH_TOKEN}" \
-  -H "Content-Type: application/json" \
-  -H "rp-aigw-id: ${REDPANDA_GATEWAY_ID}" \
-  -d '{
-    "model": "openai/gpt-5.2-mini",
-    "messages": [{"role": "user", "content": "Hello, AI Gateway!"}],
-    "max_tokens": 100
-  }'
-----
-====
-
-=== Token lifecycle management
-
-IMPORTANT: Your agent is responsible for refreshing tokens before they expire. 
OIDC access tokens have a limited time-to-live (TTL), determined by the identity provider, and are not automatically renewed by the AI Gateway. Check the `expires_in` field in the token response for the exact duration. - -* Proactively refresh tokens at approximately 80% of the token's TTL to avoid failed requests. -* `authlib` (Python) can handle token renewal automatically when you pass `token_endpoint` to the `OAuth2Session` constructor. For `client_credentials` grants, it obtains a new token rather than using a refresh token. -* For other languages, cache the token and its expiry time, then request a new token before the current one expires. - -== Model naming convention - -When making requests through AI Gateway, use the `vendor/model_id` format for the model parameter: - -* `openai/gpt-5.2` -* `openai/gpt-5.2-mini` -* `anthropic/claude-sonnet-4.5` -* `anthropic/claude-opus-4.6` - -This format tells AI Gateway which provider to route the request to. For example: - [source,python] ---- -# Route to OpenAI -response = client.chat.completions.create( - model="openai/gpt-5.2", - messages=[...] +import os +from google import genai + +client = genai.Client( + api_key=os.environ["AUTH_TOKEN"], # forwarded as x-goog-api-key + http_options={"base_url": os.environ["PROXY_URL"]}, # .../llm/v1/providers/my-google ) -# Route to Anthropic (same client, different model) -response = client.chat.completions.create( - model="anthropic/claude-sonnet-4.5", - messages=[...] +response = client.models.generate_content( + model="gemini-2.0-flash", + contents="Hello from AI Gateway", ) +print(response.text) ---- ++ +[IMPORTANT] +==== +Gemini authenticates with the `x-goog-api-key` header, not `Authorization: Bearer`. Most Google SDKs set `x-goog-api-key` automatically from the `api_key` parameter. If you hand-roll the request, set the header yourself. +==== -// To see which models are available in your gateway, see xref:builders/available-models.adoc[]. - -== Handle responses - -Responses from AI Gateway follow the OpenAI API format: - +AWS Bedrock:: ++ +Bedrock is different: SigV4 signing is performed *server-side* by AI Gateway using the credentials on the provider. Your client only needs to call the proxy URL with an `rpai` or OIDC token. ++ [source,python] ---- -response = client.chat.completions.create( - model="openai/gpt-5.2-mini", - messages=[{"role": "user", "content": "Explain AI Gateway"}], - max_tokens=200 -) - -# Access the response -message_content = response.choices[0].message.content -finish_reason = response.choices[0].finish_reason # 'stop', 'length', etc. - -# Token usage -prompt_tokens = response.usage.prompt_tokens -completion_tokens = response.usage.completion_tokens -total_tokens = response.usage.total_tokens +import os, httpx -print(f"Response: {message_content}") -print(f"Tokens: {prompt_tokens} prompt + {completion_tokens} completion = {total_tokens} total") +response = httpx.post( + f"{os.environ['PROXY_URL']}/model/anthropic.claude-3-5-sonnet-20241022-v2:0/invoke", + headers={"Authorization": f"Bearer {os.environ['AUTH_TOKEN']}"}, + json={ + "anthropic_version": "bedrock-2023-05-31", + "messages": [{"role": "user", "content": "Hello"}], + "max_tokens": 1024, + }, +) +print(response.json()) ---- ++ +// TODO: verify Bedrock request shape end-to-end on adp-production once credentials are available; replace placeholder model ID with the inference profile your provider exposes. 
-== Handle errors - -AI Gateway returns standard HTTP status codes: - +OpenAI-compatible:: ++ +Use the OpenAI SDK with the proxy URL of the OpenAI-compatible provider and whatever model identifier the upstream exposes: ++ [source,python] ---- -from openai import OpenAI, OpenAIError +import os +from openai import OpenAI client = OpenAI( - base_url=os.getenv("REDPANDA_GATEWAY_URL"), - api_key=access_token, # OIDC access token + base_url=os.environ["PROXY_URL"], # .../llm/v1/providers/my-vllm + api_key=os.environ["AUTH_TOKEN"], ) -try: - response = client.chat.completions.create( - model="openai/gpt-5.2-mini", - messages=[{"role": "user", "content": "Hello"}] - ) - print(response.choices[0].message.content) - -except OpenAIError as e: - if e.status_code == 400: - print("Bad request - check model name and parameters") - elif e.status_code == 401: - print("Authentication failed - check OIDC token") - elif e.status_code == 404: - print("Model not found - check available models") - elif e.status_code == 429: - print("Rate limit exceeded - slow down requests") - elif e.status_code >= 500: - print("Gateway or provider error - retry with exponential backoff") - else: - print(f"Error: {e}") +response = client.chat.completions.create( + model="meta-llama/Llama-3.3-70B-Instruct", # as exposed by your upstream + messages=[{"role": "user", "content": "Hello"}], +) ---- +====== -Common error codes: - -* *400*: Bad request (invalid parameters, malformed JSON) -* *401*: Authentication failed (invalid or expired OIDC token) -* *403*: Forbidden (no access to this gateway) -* *404*: Model not found (model not enabled in gateway) -* *429*: Rate limit exceeded (too many requests) -* *500/502/503*: Server error (gateway or provider issue) +[NOTE] +==== +The provider detail page also has client guides for *Claude Code*, *Codex*, and *Gemini* (the desktop client). Open *Connect your app* on the provider's page to see the per-client setup instructions. +==== == Streaming responses -AI Gateway supports streaming for real-time token generation: +Streaming passes through unchanged. Use the SDK's native streaming API; the proxy forwards the stream byte-for-byte. [source,python] ---- response = client.chat.completions.create( - model="openai/gpt-5.2-mini", + model="gpt-4o", messages=[{"role": "user", "content": "Write a short poem"}], - stream=True # Enable streaming + stream=True, ) -# Process chunks as they arrive for chunk in response: if chunk.choices[0].delta.content: - print(chunk.choices[0].delta.content, end='', flush=True) - -print() # New line after streaming completes ----- - -== Switch between providers - -One of AI Gateway's key benefits is easy provider switching without code changes: - -[source,python] + print(chunk.choices[0].delta.content, end="", flush=True) ---- -# Try OpenAI -response = client.chat.completions.create( - model="openai/gpt-5.2", - messages=[{"role": "user", "content": "Explain quantum computing"}] -) -# Try Anthropic (same code, different model) -response = client.chat.completions.create( - model="anthropic/claude-sonnet-4.5", - messages=[{"role": "user", "content": "Explain quantum computing"}] -) ----- - -Compare responses, latency, and cost to determine the best model for your use case. - -== Validate your integration +== Handle errors -=== Test connectivity +AI Gateway returns standard HTTP status codes. 
The upstream provider's error body passes through, so your existing SDK error handling works: -[source,python] ----- -import os -from openai import OpenAI - -def test_gateway_connection(access_token): - """Test basic connectivity to AI Gateway""" - client = OpenAI( - base_url=os.getenv("REDPANDA_GATEWAY_URL"), - api_key=access_token, # OIDC access token - ) - - try: - # Simple test request - response = client.chat.completions.create( - model="openai/gpt-5.2-mini", - messages=[{"role": "user", "content": "test"}], - max_tokens=10 - ) - print("✓ Gateway connection successful") - return True - except Exception as e: - print(f"✗ Gateway connection failed: {e}") - return False - -if __name__ == "__main__": - token = get_oidc_token() # Your OIDC token retrieval - test_gateway_connection(token) ----- - -=== Test multiple models +[cols="1,3"] +|=== +|Status |Meaning -[source,python] ----- -def test_models(): - """Test multiple models through the gateway""" - models = [ - "openai/gpt-5.2-mini", - "anthropic/claude-sonnet-4.5" - ] - - for model in models: - try: - response = client.chat.completions.create( - model=model, - messages=[{"role": "user", "content": "Say hello"}], - max_tokens=10 - ) - print(f"✓ {model}: {response.choices[0].message.content}") - except Exception as e: - print(f"✗ {model}: {e}") ----- +|400 +|Bad request. Invalid parameters or malformed JSON. -// === Check request logs -// -// After making requests, verify they appear in observability: -// -// . Navigate to *AI Gateway* → *Gateways* → Select your gateway → *Logs* -// . Filter by your request timestamp -// . Verify your requests are logged with correct model, tokens, and cost +|401 +|Authentication failed. Token invalid, expired, or (for Gemini) sent in the wrong header. -// See xref:builders/monitor-your-usage.adoc[] for details. +|403 +|Forbidden. The service account lacks the required role, or the provider is disabled. -== Integrate with AI development tools +|404 +|Provider or model not found. Verify the provider name in the URL and the model identifier. -[tabs] -==== -Claude Code:: -+ -Configure Claude Code to use AI Gateway: -+ -[source,bash] ----- -claude mcp add --transport http redpanda-aigateway ${REDPANDA_GATEWAY_URL}/mcp \ - --header "Authorization: Bearer ${AUTH_TOKEN}" ----- -+ -Or edit `~/.claude/config.json`: -+ -[source,json] ----- -{ - "mcpServers": { - "redpanda-ai-gateway": { - "transport": "http", - "url": "/mcp", - "headers": { - "Authorization": "Bearer " - } - } - } -} ----- -+ -ifdef::integrations-available[] -See xref:integrations/claude-code-user.adoc[] for complete setup. -endif::[] +|429 +|Rate limited by the upstream provider. AI Gateway does not enforce its own rate limits today. Respect `Retry-After` if present. -VS Code Continue Extension:: -+ -Edit `~/.continue/config.json`: -+ -[source,json] ----- -{ - "models": [ - { - "title": "AI Gateway - GPT-5.2", - "provider": "openai", - "model": "openai/gpt-5.2", - "apiBase": "", - "apiKey": "" - } - ] -} ----- -+ -ifdef::integrations-available[] -See xref:integrations/continue-user.adoc[] for complete setup. -endif::[] - -Cursor IDE:: -+ -. Open Cursor Settings (*Cursor* → *Settings* or `Cmd+,`) -. Navigate to *AI* settings -. Add custom OpenAI-compatible provider: -+ -[source,json] ----- -{ - "cursor.ai.providers.openai.apiBase": "" -} ----- -+ -ifdef::integrations-available[] -See xref:integrations/cursor-user.adoc[] for complete setup. -endif::[] -==== +|5xx +|Upstream or gateway error. Retry with exponential backoff. 
+|=== == Best practices -=== Use environment variables - -Store configuration in environment variables, not hardcoded in code: - -[source,python] ----- -# Good -base_url = os.getenv("REDPANDA_GATEWAY_URL") - -# Bad -base_url = "https://gw.ai.panda.com" # Don't hardcode URLs or credentials ----- - -=== Implement retry logic - -Implement exponential backoff for transient errors: - -[source,python] ----- -import time -from openai import OpenAI, OpenAIError - -def make_request_with_retry(client, max_retries=3): - for attempt in range(max_retries): - try: - return client.chat.completions.create( - model="openai/gpt-5.2-mini", - messages=[{"role": "user", "content": "Hello"}] - ) - except OpenAIError as e: - if e.status_code >= 500 and attempt < max_retries - 1: - wait_time = 2 ** attempt # Exponential backoff - print(f"Retrying in {wait_time}s...") - time.sleep(wait_time) - else: - raise ----- - -=== Monitor your usage - -Regularly check your usage to avoid unexpected costs: - -[source,python] ----- -# Track tokens in your application -total_tokens = 0 -request_count = 0 - -for request in requests: - response = client.chat.completions.create(...) - total_tokens += response.usage.total_tokens - request_count += 1 - -print(f"Total tokens: {total_tokens} across {request_count} requests") ----- - -// See xref:builders/monitor-your-usage.adoc[] for detailed monitoring. - -=== Handle rate limits gracefully - -Respect rate limits and implement backoff: - -[source,python] ----- -try: - response = client.chat.completions.create(...) -except OpenAIError as e: - if e.status_code == 429: - # Rate limited - wait and retry - retry_after = int(e.response.headers.get('Retry-After', 60)) - print(f"Rate limited. Waiting {retry_after}s...") - time.sleep(retry_after) - # Retry request ----- +* *Use environment variables* for the proxy URL and token; never hard-code them. +* *Wrap `rpai auth token`* in a script or shell function so refresh is invisible to your SDK code. +* *Implement retry with exponential backoff* for 5xx and timeout conditions. +* *Respect `Retry-After`* on 429 responses. +* *Rotate service account credentials* on a schedule your organization accepts. +* *Observe usage* through the Cloud UI on each provider's detail page. A *Cost & usage* section is in development (the UI shows a "Coming soon" placeholder today). == Troubleshooting -=== "Authentication failed" - -Problem: 401 Unauthorized - -Solutions: - -* Check that your OIDC token has not expired and refresh it if necessary -* Verify the audience is set to `cloudv2-production.redpanda.cloud` -* Check that the service account has access to the specified gateway -* Ensure the `Authorization` header is formatted correctly: `Bearer ` - -=== "Model not found" - -Problem: 404 Model not found - -Solutions: - -* Verify the model name uses `vendor/model_id` format -// * Check available models: See xref:builders/available-models.adoc[] -* Confirm the model is enabled in your gateway (contact administrator) - -=== "Rate limit exceeded" - -Problem: 429 Too Many Requests +=== 401 Unauthorized -Solutions: +* If you're using `rpai`: rerun `rpai auth login` to refresh the session, then `rpai auth token` to mint a new token. +* If you're using OIDC client credentials: check the token hasn't expired and refresh it. Verify the audience is `cloudv2-production.redpanda.cloud` and the `Authorization` header is formatted `Bearer `. +* For Gemini: ensure the token is sent as `x-goog-api-key`, not `Authorization`. 
+* For Anthropic with passthrough: ensure the client is sending a valid Anthropic `Authorization` header. -* Reduce request rate -* Implement exponential backoff -* Contact administrator to review rate limits -* Consider using a different gateway if available +=== 404 Not found -=== "Connection timeout" +* Re-check the provider name in the proxy URL. The segment after `/providers/` must match the provider's `Name` exactly. +* For model-not-found: confirm the model identifier is one your provider's catalog actually serves. OpenAI-compatible endpoints accept whatever model IDs the upstream exposes. -Problem: Request times out +=== 403 Forbidden -Solutions: +* The service account may lack the required roles. Ask an admin to grant `dataplane_adp_llmprovider_get` at minimum. +* The provider may be disabled. Check the *Status* field on its *Connection* card. -* Check network connectivity to the gateway endpoint -* Verify the gateway endpoint URL is correct -* Check if the gateway is operational (contact administrator) -* Increase client timeout if processing complex requests +=== Connection timeout or reset -//== Next steps +* Verify the proxy URL is correct (copy directly from the provider's *Connection* card). +* Check that the provider isn't pointing at a private base URL your client can't reach (OpenAI-compatible providers only). +* Confirm the upstream provider's status page. -//Now that your agent is connected: +== Next steps -// * xref:builders/available-models.adoc[Available Models] - Learn about model selection and routing -// * xref:builders/use-mcp-tools.adoc[Use MCP Tools] - Access tools from MCP servers (if enabled) -// * xref:builders/monitor-your-usage.adoc[Monitor Your Usage] - Track requests and costs -ifdef::integrations-available[] -* xref:integrations/index.adoc[Integrations] - Configure specific tools and IDEs -endif::[] +* xref:configure-provider.adoc[Configure an LLM provider]. Add another provider to your dataplane. diff --git a/modules/ai-gateway/pages/index.adoc b/modules/ai-gateway/pages/index.adoc index 916d833..8cb089c 100644 --- a/modules/ai-gateway/pages/index.adoc +++ b/modules/ai-gateway/pages/index.adoc @@ -1,6 +1,5 @@ = AI Gateway -:description: Keep AI-powered apps running with automatic provider failover, prevent runaway spend with centralized budget controls, and govern access across teams, apps, and service accounts. :page-layout: index -:personas: platform_admin, app_developer, evaluator +:description: Redpanda's managed proxy for LLM APIs. Create an LLM provider, and point your applications at a Redpanda-hosted URL with managed secrets, authentication, and observability. -include::ROOT:partial$adp-la.adoc[] \ No newline at end of file +include::ROOT:partial$adp-la.adoc[] diff --git a/modules/ai-gateway/pages/overview.adoc b/modules/ai-gateway/pages/overview.adoc index c92da1e..fd67291 100644 --- a/modules/ai-gateway/pages/overview.adoc +++ b/modules/ai-gateway/pages/overview.adoc @@ -1,195 +1,123 @@ -= What is an AI Gateway? -:page-aliases: redpanda-cloud:ai-agents:ai-gateway/what-is-ai-gateway.adoc -:description: Understand how AI Gateway keeps AI-powered apps highly available across providers and prevents runaway AI spend with centralized cost governance. 
-:page-topic-type: concept -:personas: evaluator, app_developer, platform_admin -:learning-objective-1: Explain how AI Gateway keeps AI-powered apps highly available through governed provider failover -:learning-objective-2: Describe how AI Gateway prevents runaway AI spend with centralized budget controls and tenancy-based governance -:learning-objective-3: Identify when AI Gateway fits your use case based on availability requirements, cost governance needs, and multi-provider or MCP tool usage += AI Gateway Overview +:description: AI Gateway is Redpanda's managed proxy for LLM APIs. Create a provider for OpenAI, Anthropic, Google AI, AWS Bedrock, or an OpenAI-compatible endpoint, and point your applications at a Redpanda-hosted URL with managed secrets, authentication, and observability. +:page-topic-type: overview +:personas: platform_admin, app_developer, evaluator +:page-aliases: redpanda-cloud:ai-agents:ai-gateway/what-is-ai-gateway.adoc, redpanda-cloud:ai-agents:ai-gateway/gateway-architecture.adoc, redpanda-cloud:ai-agents:ai-gateway/cel-routing-cookbook.adoc, redpanda-cloud:ai-agents:ai-gateway/mcp-aggregation-guide.adoc, redpanda-cloud:ai-agents:ai-gateway/builders/discover-gateways.adoc +:learning-objective-1: Describe what AI Gateway is and how a managed proxy differs from direct upstream calls +:learning-objective-2: Explain how LLM providers, secrets, and OIDC authentication fit together in AI Gateway +:learning-objective-3: Identify use cases where AI Gateway fits, and use cases where it does not include::ROOT:partial$adp-la.adoc[] -Redpanda AI Gateway keeps your AI-powered applications highly available and your AI spend under control. It sits between your applications and the LLM providers and AI tools they depend on. If a provider goes down, the gateway provides automatic failover to keep your apps running. It also offers centralized budget controls to prevent runaway costs. For platform teams, it adds governance at the model-fallback level, tenancy modeling for teams, individuals, apps, and service accounts, and a single proxy layer for both LLM models and glossterm:MCP server[,MCP servers]. +AI Gateway is Redpanda's managed proxy for LLM APIs. Instead of giving every application a provider API key and letting it call the upstream directly, you create an *LLM provider* in Redpanda Cloud and point your applications at a Redpanda-hosted proxy URL. Redpanda handles the upstream credentials, forwards the request, and records usage. Your code continues to use the provider's native SDK. -== The problem +After reading this page, you will be able to: -Modern AI applications face two business-critical challenges: staying up and staying on budget. +* [ ] {learning-objective-1} +* [ ] {learning-objective-2} +* [ ] {learning-objective-3} -First, applications typically hardcode provider-specific SDKs. An application using OpenAI's SDK cannot easily switch to Anthropic or Google without code changes and redeployment. When a provider hits rate limits, suffers an outage, or degrades in performance, your application goes down with it. Your end users don't care which provider you use; they care that the app works. +== The problem AI Gateway solves -Second, costs can spiral without centralized controls. Without a single view of token consumption across teams and applications, it's difficult to attribute costs to specific customers, features, or environments. 
Testing and debugging can generate unexpected bills, and there's no way to enforce budgets or rate limits per team, application, or service account. The result: runaway spend that finance discovers only after the fact.
+Teams adopting LLMs can quickly hit operational problems:

-These two challenges are compounded by fragmented observability across provider dashboards, which makes it harder to detect availability issues or cost anomalies in time to act. And as organizations adopt glossterm:AI agent[,AI agents] that call glossterm:MCP tool[,MCP tools], the lack of centralized tool governance adds another dimension of uncontrolled cost and risk.
+* *Credential sprawl:* Every team that touches an LLM gets its own API key. Rotation is manual, offboarding is manual, and it's hard to know who's using what.
+* *SDK lock-in and switching cost:* Each provider has its own SDK, auth scheme, and model catalog. Swapping OpenAI for Anthropic means a code change, not a configuration change.
+* *No shared view of usage:* Provider dashboards tell you what a single API key spent. They don't tell you what your organization spent, broken down by team or application.

-== What AI Gateway solves
+== What AI Gateway gives you

-Redpanda AI Gateway delivers two core business outcomes, high availability and cost governance, backed by platform-level controls that set it apart from simple proxy layers.
+AI Gateway consolidates provider access behind the following capabilities.

-=== High availability through governed failover
+=== Traffic stays in your VPC

-Your end users don't care whether you use OpenAI, Anthropic, or Google: they care that your app stays up. AI Gateway lets you configure provider pools with automatic failover, so when your primary provider hits rate limits, times out, or returns errors, the gateway routes requests to a fallback provider with no code changes and no downtime for your users.
+LLM requests are proxied through your dataplane's AI Gateway. API keys are stored in your dataplane's secret store and never leave your infrastructure. Upstream calls leave your VPC only when the LLM provider is third-party (OpenAI, Anthropic, Google AI); self-hosted OpenAI-compatible endpoints stay entirely inside your network.

-Unlike simple retry logic, AI Gateway provides governance at the failover level: you define which providers fail over to which, under what conditions, and with what priority. This controlled failover can significantly improve uptime even during extended provider outages.
+=== Centralized secrets

-=== Cost governance and budget controls
+The upstream API key (or AWS credentials for Bedrock) lives in the Redpanda secret store and is attached to the provider at configuration time. Your application never sees it; rotation happens in one place.

-AI Gateway gives you centralized fiscal control over AI spend. Set monthly budget caps for each gateway, enforce them automatically, and set rate limits per team, environment, or application. No more runaway costs discovered after the fact.
+=== A managed proxy URL per provider

-You can route requests to different models based on user attributes. For example, to direct premium users to a more capable model while routing free tier users to a cost-effective option, use a CEL expression. For example:
+Every provider you create has its own URL of the form `<gateway-url>/llm/v1/providers/<provider-name>/<api-path>`. Your application points its SDK at this URL instead of the upstream, continues to use the provider's native API, and authenticates to Redpanda with a short-lived OIDC access token. 
The gateway base is a cluster-specific subdomain (for example, `aigw.<cluster-id>.clusters.rdpa.co`); copy the exact value from the *Proxy URL* field on any provider's detail page.

-[source,cel]
-----
-// Route premium users to best model, free users to cost-effective model
-request.headers["x-user-tier"] == "premium"
-  ? "anthropic/claude-opus-4.6"
-  : "anthropic/claude-sonnet-4.5"
-----
+=== Native SDK compatibility

-You can also set different rate limits and spend limits for each environment to prevent staging or development traffic from consuming production budgets.
+Use the provider's own SDK: OpenAI, Anthropic, Google AI, AWS Bedrock, or any OpenAI-compatible client (vLLM, Ollama, LM Studio, LocalAI, Together, Groq, OpenRouter). AI Gateway does not require a single unified SDK; it forwards native requests to the native upstream.

-=== Tenancy and access governance
+=== Managed authentication

-AI Gateway provides multi-tenant isolation by design. Create separate gateways for teams, individual developers, applications, or service accounts, each with their own budgets, rate limits, routing policies, and observability scope. This tenancy model lets platform teams govern who uses what, how much they spend, and which models and tools they can access, without building custom authorization layers.
+Applications authenticate to Redpanda with OIDC service accounts instead of long-lived provider API keys. Service accounts live in Redpanda Cloud IAM, follow the same role and audit model as every other resource, and mint short-lived tokens that are easy to revoke. The recommended local flow uses the `rpai` CLI for token refresh; CI and programmatic clients use the OIDC client-credentials grant directly. See xref:connect-agent.adoc[Connect your agent].

-=== Unified LLM access (single endpoint for all providers)
+=== Per-provider observability

-AI Gateway provides a single OpenAI-compatible endpoint that routes requests to multiple LLM providers. Instead of integrating with each provider's SDK separately, you configure your application once and switch providers by changing only the model parameter.
+The provider's detail page in the Cloud UI records request and token counts. A *Cost & usage* section is in development (the UI shows a "Coming soon" placeholder today).

-Without AI Gateway, you need different SDKs and patterns for each provider:
+== What's in the UI

-[source,python]
-----
-# OpenAI
-from openai import OpenAI
-client = OpenAI(api_key="sk-...")
-response = client.chat.completions.create(
-    model="gpt-5.2",
-    messages=[{"role": "user", "content": "Hello"}]
-)
+// TODO: this guide describes the cluster-embedded view available today on cloud.redpanda.com. The standalone-ADP UI launches as a separate product surface; sign-in URL, IAM model, and sidebar layout will change. Update this section once standalone ADP ships.

-# Anthropic (different SDK, different patterns)
-from anthropic import Anthropic
-client = Anthropic(api_key="sk-ant-...")
-response = client.messages.create(
-    model="claude-sonnet-4.5",
-    max_tokens=1024,
-    messages=[{"role": "user", "content": "Hello"}]
-)
-----
+In Redpanda Cloud, open your cluster and expand the *ADP* section in the sidebar. You'll see four sub-items:

-With AI Gateway, you use the OpenAI SDK for all providers:
+* *LLM Providers*: Create, edit, enable, and delete providers. This is the home of AI Gateway configuration.
+* *MCP Servers*: Register glossterm:MCP[] tool servers for agents. Separate from the AI Gateway proxy URL. 
+* *OAuth Providers*: Register OAuth providers for user-delegated flows (for example, GitHub or Google). +* *My Connections*: Per-user OAuth token management. -[source,python] ----- -from openai import OpenAI +LLM Providers is where you'll spend most of your time. The other three are covered by their own docs. -# Single configuration, multiple providers -client = OpenAI( - base_url="", - api_key="your-redpanda-token", -) +== Supported providers -# Route to OpenAI -response = client.chat.completions.create( - model="openai/gpt-5.2", - messages=[{"role": "user", "content": "Hello"}] -) +AI Gateway supports five provider types. The UI labels and short descriptions match the picker on the *Create LLM provider* page. -# Route to Anthropic (same code, different model string) -response = client.chat.completions.create( - model="anthropic/claude-sonnet-4.5", - messages=[{"role": "user", "content": "Hello"}] -) +[cols="1,3"] +|=== +|Type |Typical upstream -# Route to Google Gemini (same code, different model string) -response = client.chat.completions.create( - model="google/gemini-2.0-flash", - messages=[{"role": "user", "content": "Hello"}] -) ----- +|*OpenAI* +|Proxy GPT, o-series, and embeddings through the OpenAI API. Best when you already hold an OpenAI API key or want the broadest GPT model catalog. -To switch providers, you change only the `model` parameter from `openai/gpt-5.2` to `anthropic/claude-sonnet-4.5`. No code changes or redeployment needed. +|*Anthropic* +|Call Claude Opus, Sonnet, and Haiku directly. Optionally forwards the client's `Authorization` header for enterprise and Max-plan subscription passthrough. -=== Proxy for LLM models and MCP servers +|*Google AI* +|Reach Gemini Pro, Flash, and multimodal models via Google AI Studio. Ideal for long-context workloads and image/video inputs. -AI Gateway acts as a single proxy layer for both LLM model requests and MCP servers. For LLM traffic, it provides a unified endpoint. For AI agents that use MCP tools, it aggregates multiple MCP servers and provides deferred tool loading, which dramatically reduces token costs. +|*AWS Bedrock* +|Invoke foundation models (Claude, Llama, Titan, Nova) hosted inside your AWS account. Use when data residency, IAM, or VPC egress matter more than raw feature parity. Signed with SigV4 server-side by AI Gateway. -Without AI Gateway, agents typically load all available MCP tools from multiple MCP servers at startup. This approach sends 50+ tool definitions with every request, creating high token costs (thousands of tokens per request), slow agent startup times, and no centralized governance over which tools agents can access. +|*OpenAI-compatible* +|Point at any OpenAI-compatible endpoint (vLLM, Ollama, LM Studio, LocalAI, Together, Groq, OpenRouter). Useful for self-hosted models and aggregator gateways that ship `/v1/chat/completions`. +|=== -With AI Gateway, you configure approved MCP servers once, and the gateway loads only search and orchestrator tools initially. Agents query for specific tools only when needed, which often reduces token usage by 80-90% depending on your configuration and the number of tools aggregated. You also gain centralized approval and governance over which MCP servers your agents can access. - -For complex workflows, AI Gateway provides a JavaScript-based orchestrator tool that reduces multi-step workflows from multiple round trips to a single call. 
-=== Unified LLM access (single endpoint for all providers)
+
+=== Per-provider observability
-AI Gateway provides a single OpenAI-compatible endpoint that routes requests to multiple LLM providers. Instead of integrating with each provider's SDK separately, you configure your application once and switch providers by changing only the model parameter.
+
+The provider's detail page in the Cloud UI records request and token counts. A *Cost & usage* section is in development (the UI shows a "Coming soon" placeholder today).
-Without AI Gateway, you need different SDKs and patterns for each provider:
+
+== What's in the UI
-[source,python]
-----
-# OpenAI
-from openai import OpenAI
-client = OpenAI(api_key="sk-...")
-response = client.chat.completions.create(
-    model="gpt-5.2",
-    messages=[{"role": "user", "content": "Hello"}]
-)
+
+// TODO: this guide describes the cluster-embedded view available today on cloud.redpanda.com. The standalone-ADP UI launches as a separate product surface; sign-in URL, IAM model, and sidebar layout will change. Update this section once standalone ADP ships.
-
-# Anthropic (different SDK, different patterns)
-from anthropic import Anthropic
-client = Anthropic(api_key="sk-ant-...")
-response = client.messages.create(
-    model="claude-sonnet-4.5",
-    max_tokens=1024,
-    messages=[{"role": "user", "content": "Hello"}]
-)
-----
+
+In Redpanda Cloud, open your cluster and expand the *ADP* section in the sidebar. You'll see four sub-items:
-With AI Gateway, you use the OpenAI SDK for all providers:
+
+* *LLM Providers*: Create, edit, enable, and delete providers. This is the home of AI Gateway configuration.
+* *MCP Servers*: Register glossterm:MCP[] tool servers for agents. Separate from the AI Gateway proxy URL.
+* *OAuth Providers*: Register OAuth providers for user-delegated flows (for example, GitHub or Google).
+* *My Connections*: Per-user OAuth token management.
-[source,python]
-----
-from openai import OpenAI
+
+LLM Providers is where you'll spend most of your time. The other three are covered by their own docs.
-
-# Single configuration, multiple providers
-client = OpenAI(
-    base_url="<your-gateway-url>",
-    api_key="your-redpanda-token",
-)
+
+== Supported providers
-
-# Route to OpenAI
-response = client.chat.completions.create(
-    model="openai/gpt-5.2",
-    messages=[{"role": "user", "content": "Hello"}]
-)
+
+AI Gateway supports five provider types. The UI labels and short descriptions match the picker on the *Create LLM provider* page.
-
-# Route to Anthropic (same code, different model string)
-response = client.chat.completions.create(
-    model="anthropic/claude-sonnet-4.5",
-    messages=[{"role": "user", "content": "Hello"}]
-)
+
+[cols="1,3"]
+|===
+|Type |Typical upstream
-
-# Route to Google Gemini (same code, different model string)
-response = client.chat.completions.create(
-    model="google/gemini-2.0-flash",
-    messages=[{"role": "user", "content": "Hello"}]
-)
-----
+
+|*OpenAI*
+|Proxy GPT, o-series, and embeddings through the OpenAI API. Best when you already hold an OpenAI API key or want the broadest GPT model catalog.
-To switch providers, you change only the `model` parameter from `openai/gpt-5.2` to `anthropic/claude-sonnet-4.5`. No code changes or redeployment needed.
+
+|*Anthropic*
+|Call Claude Opus, Sonnet, and Haiku directly. Optionally forwards the client's `Authorization` header for enterprise and Max-plan subscription passthrough. See the sketch after this table.
-=== Proxy for LLM models and MCP servers
+
+|*Google AI*
+|Reach Gemini Pro, Flash, and multimodal models via Google AI Studio. Ideal for long-context workloads and image/video inputs.
-AI Gateway acts as a single proxy layer for both LLM model requests and MCP servers. For LLM traffic, it provides a unified endpoint. For AI agents that use MCP tools, it aggregates multiple MCP servers and provides deferred tool loading, which dramatically reduces token costs.
+
+|*AWS Bedrock*
+|Invoke foundation models (Claude, Llama, Titan, Nova) hosted inside your AWS account. Use when data residency, IAM, or VPC egress matter more than raw feature parity. Signed with SigV4 server-side by AI Gateway.
-Without AI Gateway, agents typically load all available MCP tools from multiple MCP servers at startup. This approach sends 50+ tool definitions with every request, creating high token costs (thousands of tokens per request), slow agent startup times, and no centralized governance over which tools agents can access.
+
+|*OpenAI-compatible*
+|Point at any OpenAI-compatible endpoint (vLLM, Ollama, LM Studio, LocalAI, Together, Groq, OpenRouter). Useful for self-hosted models and aggregator gateways that expose `/v1/chat/completions`.
+|===
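+
+As an illustration of the Anthropic passthrough option, the sketch below sends the caller's own bearer token and relies on the gateway to forward the `Authorization` header upstream instead of injecting a server-side key. The proxy URL shape, the provider name `anthropic-main`, and the `ANTHROPIC_OAUTH_TOKEN` variable are illustrative assumptions.
+
+[source,python]
+----
+import os
+
+from anthropic import Anthropic
+
+# Placeholder proxy URL; copy the real value from the provider's detail page.
+client = Anthropic(
+    base_url="https://aigw.<cluster-id>.clusters.rdpa.co/llm/v1/providers/anthropic-main",
+    # auth_token is sent as `Authorization: Bearer ...`, which a
+    # passthrough-enabled provider forwards to Anthropic unchanged.
+    auth_token=os.environ["ANTHROPIC_OAUTH_TOKEN"],
+)
+
+response = client.messages.create(
+    model="claude-sonnet-4.5",
+    max_tokens=1024,
+    messages=[{"role": "user", "content": "Hello"}],
+)
+print(response.content[0].text)
+----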
-With AI Gateway, you configure approved MCP servers once, and the gateway loads only search and orchestrator tools initially. Agents query for specific tools only when needed, which often reduces token usage by 80-90% depending on your configuration and the number of tools aggregated. You also gain centralized approval and governance over which MCP servers your agents can access.
-
-For complex workflows, AI Gateway provides a JavaScript-based orchestrator tool that reduces multi-step workflows from multiple round trips to a single call. For example, you can create a workflow that searches a vector database and, if the results are insufficient, falls back to web search—all in one orchestration step.
-
-=== Unified observability and cost tracking
-
-AI Gateway provides a single dashboard that tracks all LLM traffic across providers, eliminating the need to switch between multiple provider dashboards.
-
-The dashboard tracks request volume for each gateway, model, and provider, along with token usage for both prompt and completion tokens. You can view estimated spend per model with cross-provider comparisons, latency metrics (p50, p95, p99), and errors broken down by type, provider, and model.
-
-This unified view helps you answer critical questions such as which model is the most cost-effective for your use case, why a specific user request failed, how much your staging environment costs each week, and what the latency difference is between providers for your workload.
-
-ifdef::ai-hub-available[]
-== Gateway modes
-
-AI Gateway supports two modes to accommodate different organizational needs:
-
-*AI Hub Mode* provides zero-configuration access with pre-configured backend pools and intelligent routing. Platform admins simply add provider credentials (OpenAI, Anthropic, Google Gemini), and all teams immediately benefit from 17 routing rules and 6 backend pools. Users can toggle preferences like vision routing or long-context routing, but the underlying architecture is managed by Redpanda. This mode eliminates the complexity of LLM gateway configuration. IT adds API keys once, and all teams benefit immediately.
-
-*Custom Mode* provides full control over routing rules, backend pools, rate limits, and policies. Admins configure every aspect of the gateway to meet specific requirements. This mode is ideal when you need custom routing logic based business rules, specific failover behavior, or integration with custom infrastructure like Azure OpenAI or AWS Bedrock.
-
-To understand which mode fits your use case, see xref:gateway-modes.adoc[].
-endif::[]
-
-== Common gateway patterns
-
-Some common patterns for configuring gateways include:
-
-* *Team isolation*: When multiple teams share infrastructure but need separate budgets and policies, create one gateway for each team. For example, you might configure Team A's gateway with a $5K/month budget for both staging and production environments, while Team B's gateway has a $10K/month budget with different rate limits. Each team sees only their own traffic in the observability dashboards, providing clear cost attribution and isolation.
-* *Environment separation*: To prevent staging traffic from affecting production metrics, create separate gateways for each environment. Configure the staging gateway with lower rate limits, restricted model access, and aggressive cost controls to prevent runaway expenses. The production gateway can have higher rate limits, access to all models, and alerting configured to detect anomalies.
-* *Primary and fallback for reliability*: To ensure uptime during provider outages, configure provider pools with automatic failover. For example, you can set OpenAI as your primary provider (preferred for quality) and configure Anthropic as the fallback that activates when the gateway detects rate limits or timeouts from OpenAI. Monitor the fallback rate to detect primary provider issues early, before they impact your users.
-* *A/B testing models*: To compare model quality and cost without dual integration, route a percentage of traffic to different models. For example, you can send 80% of traffic to `claude-sonnet-4.5` and 20% to `claude-opus-4.6`, then compare quality metrics and costs in the observability dashboard before adjusting the split.
-* *Customer-based routing*: For SaaS products with tiered pricing (for example, free, pro, enterprise), use CEL routing based on request headers to match users with appropriate models:
-
-=== Customer-based routing
-
-For SaaS products with tiered pricing (for example, free, pro, enterprise), use CEL routing based on request headers to match users with appropriate models:
-
-[source,cel]
-----
-request.headers["x-customer-tier"] == "enterprise" ? "anthropic/claude-opus-4.6" :
-request.headers["x-customer-tier"] == "pro" ? "anthropic/claude-sonnet-4.5" :
-"anthropic/claude-haiku"
-----
+
+See xref:configure-provider.adoc[Configure an LLM provider] for the full form reference for each type.

== When to use AI Gateway

-AI Gateway is ideal for organizations that:
-
-* Use or plan to use multiple LLM providers
-* Need centralized cost tracking and budgeting
-* Want to experiment with different models without code changes
-* Require high availability during provider outages
-* Have multiple teams or customers using AI services
-* Build AI agents that need MCP tool aggregation
-* Need unified observability across all AI traffic
+AI Gateway is a good fit when you want to:
-AI Gateway may not be necessary if:
+
+* Pull provider API keys out of application code and manage them centrally.
+* Keep LLM traffic inside your dataplane's VPC.
+* Authenticate applications to LLMs using the same OIDC identity you use for other Redpanda Cloud resources.
+* Run a self-hosted OpenAI-compatible endpoint (vLLM, Ollama, LM Studio) alongside first-party providers behind a single management plane.
+* Separate operator and developer roles: operators configure providers and credentials; developers point at proxy URLs.
-
-* You only use a single provider with simple requirements
-* You have minimal AI traffic (< 1000 requests/day)
-* You don't need cost tracking or policy enforcement
-* Your application doesn't require provider switching
+
+It is not the right fit when you:
-== Next steps
+
+* Only ever call a single provider with a single API key and are happy managing that key inline.
+* Need routing, failover, or load balancing across providers; see <<out-of-scope>>.
-* xref:gateway-quickstart.adoc[Gateway Quickstart] - Get started quickly with a basic gateway setup
+
+[[out-of-scope]]
+== Out of scope
-*For Administrators:*
+
+AI Gateway does not provide the following capabilities. For current status, consult the Redpanda Cloud release notes.
-* xref:admin/setup-guide.adoc[Setup Guide] - Enable providers, models, and create gateways
+
+* *Multi-provider routing, failover, and retries.* A synthetic provider that fans requests to multiple upstreams is not part of AI Gateway. For a client-side alternative, see the sketch after this list.
-* xref:gateway-architecture.adoc[Architecture Deep Dive] - Technical architecture details
+* *Spend limits.* Per-user, per-org, and global cost caps are not available. The provider detail page shows a *Cost & usage* placeholder labeled "Coming soon"; see xref:governance:budgets.adoc[Token budgets and limits] for the read-only spending visibility shipping at GA.
+* *Rate limits.* Requests-per-second, per-minute, or per-day caps are not available.
+* *Managed MCP aggregation at the gateway.* Register MCP tool servers separately under *ADP* → *MCP Servers*.
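+
+If you need failover, it lives in your application. The following is a minimal client-side sketch, assuming two hypothetical providers that both accept OpenAI-format requests (an OpenAI-type `openai-main` and a self-hosted OpenAI-compatible `vllm-local`) and an OIDC token in `REDPANDA_OIDC_TOKEN`; the URL shape and model names are also assumptions.
+
+[source,python]
+----
+import os
+
+from openai import APIError, OpenAI
+
+GATEWAY = "https://aigw.<cluster-id>.clusters.rdpa.co/llm/v1/providers"  # placeholder
+TOKEN = os.environ["REDPANDA_OIDC_TOKEN"]
+
+# Ordered fallback candidates; provider and model names are hypothetical.
+CANDIDATES = [
+    ("openai-main", "gpt-5.2"),
+    ("vllm-local", "llama-3.1-8b-instruct"),
+]
+
+def chat(messages):
+    last_error = None
+    for provider, model in CANDIDATES:
+        client = OpenAI(base_url=f"{GATEWAY}/{provider}/v1", api_key=TOKEN)
+        try:
+            return client.chat.completions.create(model=model, messages=messages)
+        except APIError as exc:
+            last_error = exc  # try the next candidate
+    raise last_error
+
+reply = chat([{"role": "user", "content": "Hello"}])
+print(reply.choices[0].message.content)
+----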
-*For Builders:*
+
+== Next steps
-* xref:builders/discover-gateways.adoc[Discover Available Gateways] - Find which gateways you can access
+
+. xref:configure-provider.adoc[Configure an LLM provider]. Create your first provider and copy its proxy URL.
-* xref:connect-agent.adoc[Connect Your Agent] - Integrate your application
+. xref:connect-agent.adoc[Connect your agent]. Point your application's SDK at the proxy URL.