Merged
2 changes: 1 addition & 1 deletion modules/ai-gateway/pages/admin/setup-guide.adoc
Original file line number Diff line number Diff line change
@@ -379,4 +379,4 @@ Users can then discover and connect to the gateway using the information provide
== Next steps

* xref:routing-cel.adoc[CEL Routing Cookbook]
* xref:integrations/index.adoc[Integrations]
* xref:integrations:index.adoc[Integrations]
129 changes: 92 additions & 37 deletions modules/ai-gateway/pages/connect-agent.adoc
@@ -1,13 +1,13 @@
= Connect Your Agent
:description: Point your application or AI agent at an AI Gateway provider's proxy URL. Covers the URL shape, the local auth flow with the `rpk ai` plugin, the OIDC client-credentials flow for CI, and SDK examples for OpenAI, Anthropic, Google AI, AWS Bedrock, and OpenAI-compatible endpoints.
:description: Point your application or AI agent at an AI Gateway provider's proxy URL. Covers the URL shape, the local development workflow with `rpk ai`, the OIDC client-credentials flow for CI and application code, and SDK examples for OpenAI, Anthropic, Google AI, AWS Bedrock, and OpenAI-compatible endpoints.
:page-topic-type: how-to
:personas: app_developer
:page-aliases: redpanda-cloud:ai-agents:ai-gateway/builders/connect-your-agent.adoc
:learning-objective-1: Construct the proxy URL for an LLM provider you have configured
:learning-objective-2: Authenticate to AI Gateway using the `rpk ai` plugin for local development or OIDC client credentials for CI and programmatic clients
:learning-objective-2: Authenticate to AI Gateway with `rpk` for local development or with OIDC client credentials for CI and programmatic clients
:learning-objective-3: Send requests through the proxy URL with the SDK of your choice

This guide shows how to connect your glossterm:AI agent[] or application to AI Gateway. You'll construct the proxy URL for a provider you have already created, authenticate (with the `rpk ai` plugin for local development or with OIDC client credentials for CI), and send your first request with the SDK of your choice.
This guide shows how to connect your glossterm:AI agent[] or application to AI Gateway. You construct the proxy URL for a provider you have already created, authenticate (with `rpk cloud login` for local development or with OIDC client credentials for CI and application code), and send your first request with the SDK of your choice.

After completing this guide, you will be able to:

@@ -17,8 +17,8 @@ After completing this guide, you will be able to:

== Prerequisites

* A configured LLM provider. If you haven't created one yet, see xref:configure-provider.adoc[Configure an LLM provider].
* For local development: nothing else; you'll install the `rpk ai` plugin in the next section.
* A configured LLM provider. If you haven't created one yet, see xref:ai-gateway:configure-provider.adoc[Configure an LLM provider].
* For local development, nothing else. You'll install `rpk ai` in the next section.
* For CI or programmatic clients: a Redpanda Cloud service account with OIDC client credentials. See xref:redpanda-cloud:security:cloud-authentication.adoc[Authenticate to Redpanda Cloud].
+
// TODO: confirm whether ADP hosts its own service-account IAM post-standalone, or continues to share Redpanda Cloud Organization IAM.
@@ -41,41 +41,84 @@ AI Gateway forwards the request to the upstream provider, attaches the configure

TIP: The provider detail page generates ready-to-run snippets pre-filled with the correct proxy URL and paths. When in doubt, copy from the *Connect your app* section there.
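The proxy URL shape described above can be assembled with a one-line helper. This is an illustrative sketch, not part of the product; the cluster ID and provider name below are placeholders:

```python
def proxy_url(gateway_base, provider_name):
    """Build the provider proxy URL: <gateway-base>/llm/v1/providers/<provider-name>."""
    return f"{gateway_base.rstrip('/')}/llm/v1/providers/{provider_name}"

# Hypothetical cluster ID and provider name:
print(proxy_url("https://aigw.abc123.clusters.rdpa.co", "my-openai"))
# https://aigw.abc123.clusters.rdpa.co/llm/v1/providers/my-openai
```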

// Updated for PRs #30273 / #30327 / #30360 (rpk ai managed plugin).
[[authenticate-with-rpk-ai]]
[[authenticate-with-rpai]]
== Authenticate with `rpk ai` (recommended for local development)
== Use `rpk ai` for local development

The `rpk ai` plugin is distributed through `rpk`'s plugin manager. The provider detail page surfaces an *Install* card with copy-pasteable steps. The flow is the same for every provider type:
The `rpk ai` command is the Redpanda AI CLI. Use it to manage AI Gateway resources (LLM providers, MCP servers, OAuth providers) and call MCP tools from the command line. Authentication for `rpk ai` is handled by `rpk cloud login`, and the AI Gateway URL is resolved from your active rpk cloud profile.

. Install the plugin:
. Install `rpk ai`:
+
[source,bash]
----
rpk plugin install ai
rpk ai install
----
+
Update later with `rpk ai upgrade`; remove with `rpk ai uninstall`.

. Log in with the gateway URL from the provider's *Connection* card:
. Log in to Redpanda Cloud:
+
[source,bash]
----
rpk ai auth login --server https://aigw.<cluster-id>.clusters.rdpa.co
rpk cloud login
----
+
This caches a cloud token in `~/.config/rpk/rpk.yaml`. On every invocation, `rpk ai` reads the cached token automatically.

. Point your SDK at the proxy URL and let `rpk ai auth token` mint a fresh token on each call. Set environment variables:
. Select a profile that points at a cluster with AI Gateway v2 attached. The AI Gateway URL is cached on the profile when you create it.
+
[source,bash]
----
export PROXY_URL="<your-gateway-base>/llm/v1/providers/<provider-name>"
export OPENAI_API_KEY="$(rpk ai auth token)" # or ANTHROPIC_API_KEY, etc.
rpk profile use <profile-name>
# or, to switch the cluster the active profile points at:
rpk cloud cluster use <cluster-id>
----

`rpk ai auth token` returns a short-lived OIDC access token. Refresh by running it again: most users wire it into a wrapper script or shell function.
. Verify the connection:
+
[source,bash]
----
rpk ai llm list
----

TIP: The plugin supports named profiles for pointing at multiple gateways. Run `rpk ai profile create <name> --dataplane-url <gateway-url> --auth-mode device` to create one, then `rpk ai profile use <name>` to switch. See `rpk ai profile --help` for the full set of subcommands.
If the cached cloud token has expired, `rpk ai` returns a 401 with a hint to rerun `rpk cloud login`.

[TIP]
====
To target a specific gateway URL for a single invocation (for example, when running against a staging gateway without switching profiles), pass `--rpai-endpoint`:

[source,bash]
----
rpk ai --rpai-endpoint https://aigw.<cluster-id>.clusters.rdpa.co llm list
----

You can also export `RPAI_ENDPOINT` to override for the shell session.
====

// TODO(rpk-ai): rpai suppresses auth/profile subtrees in plugin mode today (cloudv2 apps/rpai/internal/cmd/root.go:127-135). If that changes, document `rpk ai auth` and `rpk ai profile` here.

=== Environment variables

The `rpk ai` command honors the following environment variables:

[cols="1,3"]
|===
|Variable |Purpose

|`RPAI_TOKEN`
|Bearer token for the gateway. Normally injected automatically from your cached `rpk cloud login` token; set explicitly to override.

|`RPAI_ENDPOINT`
|AI Gateway URL. Normally resolved from your active rpk cloud profile; set explicitly to override.

|`RPAI_PROFILE`, `RPAI_CONFIG`, `RPAI_VERBOSE`, `RPAI_FORMAT`
|Map to `--rpai-profile`, `--rpai-config`, `--rpai-verbose`, `--format`. Long flag names are renamed under `rpk ai` to avoid collision with `rpk`'s globals; short flags (`-p`, `-c`, `-v`, `-o`) are unchanged.
|===

== Authenticate with OIDC client credentials (CI and programmatic)

When the `rpk ai` plugin isn't available (CI runners, server-side processes, headless agents), use the OIDC `client_credentials` grant directly. Values are surfaced on the provider's *Connection* card; defaults at the time of writing are below.
For application code, CI runners, server-side processes, and headless agents, use the OIDC `client_credentials` grant directly. This is the canonical authentication path for SDK-style usage; `rpk ai` is for command-line workflows, not for embedding in application code. Values are surfaced on the provider's *Connection* card; defaults at the time of writing are below.

[cols="1,2", options="header"]
|===
@@ -146,6 +189,7 @@ Passing `token_endpoint` to the `OAuth2Session` constructor lets `authlib` handl

Node.js (openid-client)::
+
--
[source,javascript]
----
import { Issuer } from 'openid-client';
@@ -166,6 +210,7 @@ const tokenSet = await client.grant({

const accessToken = tokenSet.access_token;
----
--
======

=== Token lifecycle management
@@ -175,7 +220,7 @@ IMPORTANT: Your client is responsible for refreshing tokens before they expire.
* Proactively refresh at ~80% of the token's TTL to avoid failed requests.
* `authlib` (Python) handles renewal automatically when you pass `token_endpoint` to `OAuth2Session`.
* For other languages, cache the token and its expiry, then request a new token before the current one expires.
* If you're using `rpk ai`, just rerun `rpk ai auth token`: it handles refresh against the same OIDC endpoint.
* For SDK code, refresh OIDC client-credentials tokens through your client library (see the `authlib` example above).
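The refresh-at-80%-of-TTL guidance can be sketched as a small cache. This is a hedged sketch, not a gateway API: `fetch_token` stands in for whatever client-credentials call your OIDC library makes, returning the access token and its lifetime in seconds:

```python
import time

class TokenCache:
    """Caches an access token and refreshes it at ~80% of its TTL."""

    def __init__(self, fetch_token):
        # fetch_token() must return (access_token, expires_in_seconds).
        self._fetch = fetch_token
        self._token = None
        self._refresh_at = 0.0

    def get(self):
        now = time.monotonic()
        if self._token is None or now >= self._refresh_at:
            self._token, expires_in = self._fetch()
            # Refresh proactively at 80% of the TTL so requests never race expiry.
            self._refresh_at = now + 0.8 * expires_in
        return self._token
```

Call `cache.get()` before each request; the first call after the refresh deadline transparently fetches a new token.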

== Send requests with your SDK

@@ -184,21 +229,22 @@ The examples in this section assume you've set:
[source,bash]
----
export PROXY_URL="<your-gateway-base>/llm/v1/providers/<provider-name>"
export AUTH_TOKEN="$(rpk ai auth token)" # or an OIDC access token from above
export AUTH_TOKEN="<oidc-access-token>" # from the client_credentials flow above
----

[tabs]
======
OpenAI SDK::
+
--
[source,python]
----
import os
from openai import OpenAI

client = OpenAI(
base_url=os.environ["PROXY_URL"], # .../llm/v1/providers/my-openai
api_key=os.environ["AUTH_TOKEN"], # rpk ai or OIDC access token
api_key=os.environ["AUTH_TOKEN"], # OIDC access token
)

response = client.chat.completions.create(
@@ -207,19 +253,21 @@ response = client.chat.completions.create(
)
print(response.choices[0].message.content)
----
+

The OpenAI SDK calls the proxy's `/v1/chat/completions` path, which AI Gateway forwards to OpenAI unchanged. Use it with any OpenAI provider and, with a different `base_url`, with any OpenAI-compatible provider (vLLM, Ollama, LM Studio, Together, Groq, OpenRouter).
--

Anthropic SDK::
+
--
[source,python]
----
import os
from anthropic import Anthropic

client = Anthropic(
base_url=os.environ["PROXY_URL"], # .../llm/v1/providers/my-anthropic
auth_token=os.environ["AUTH_TOKEN"], # rpk ai or OIDC access token
auth_token=os.environ["AUTH_TOKEN"], # OIDC access token
)

message = client.messages.create(
@@ -229,11 +277,13 @@ message = client.messages.create(
)
print(message.content[0].text)
----
+

The Anthropic SDK hits `/v1/messages` on the proxy, which AI Gateway forwards to Anthropic. If the provider is configured with *Auth passthrough*, send your own Anthropic `Authorization` header instead of an `auth_token`. AI Gateway forwards it unchanged.
--

Google Gemini SDK::
+
--
[source,python]
----
import os
@@ -250,16 +300,18 @@ response = client.models.generate_content(
)
print(response.text)
----
+

[IMPORTANT]
====
Gemini authenticates with the `x-goog-api-key` header, not `Authorization: Bearer`. Most Google SDKs set `x-goog-api-key` automatically from the `api_key` parameter. If you hand-roll the request, set the header yourself.
====
--

AWS Bedrock::
+
Bedrock is different: SigV4 signing is performed *server-side* by AI Gateway using the credentials on the provider. Your client only needs to call the proxy URL with an `rpk ai` or OIDC token.
+
--
Bedrock is different: SigV4 signing is performed *server-side* by AI Gateway using the credentials on the provider. Your client only needs to call the proxy URL with an OIDC access token.

[source,python]
----
import os, httpx
@@ -278,14 +330,16 @@ response = httpx.post(
print(response.json())
----

See xref:configure-provider.adoc#bedrock-inference-profiles[the Bedrock provider reference] for inference-profile selection guidance.
+
See xref:ai-gateway:configure-provider.adoc#bedrock-inference-profiles[the Bedrock provider reference] for inference-profile selection guidance.

TIP: Bedrock's `Converse` API works the same way: send to `/model/\{MODEL_ID}/converse` with a Converse-shaped body. Or use the AWS SDK's `bedrockruntime` client and set its `BaseEndpoint` to the proxy URL; the SDK signs the request, AI Gateway re-signs server-side with the provider's credentials, and your client never sees AWS keys.
--

OpenAI-compatible::
+
--
Use the OpenAI SDK with the proxy URL of the OpenAI-compatible provider and whatever model identifier the upstream exposes:
+

[source,python]
----
import os
@@ -301,6 +355,7 @@ response = client.chat.completions.create(
messages=[{"role": "user", "content": "Hello"}],
)
----
--
======

[NOTE]
@@ -354,18 +409,18 @@ AI Gateway returns standard HTTP status codes. The upstream provider's error bod

== Best practices

* *Use environment variables* for the proxy URL and token; never hard-code them.
* *Wrap `rpk ai auth token`* in a script or shell function so refresh is invisible to your SDK code.
* *Implement retry with exponential backoff* for 5xx and timeout conditions.
* *Respect `Retry-After`* on 429 responses.
* *Rotate service account credentials* on a schedule your organization accepts.
* *Observe usage* through the ADP UI on each provider's detail page. A *Cost & usage* section is in development (the UI shows a "Coming soon" placeholder today).
* Use environment variables for the proxy URL and token. Never hard-code them.
* Refresh OIDC tokens through your client library so refresh is invisible to your SDK code (`authlib` for Python, `openid-client` for Node.js, etc.).
* Implement retry with exponential backoff for 5xx and timeout conditions.
* Respect `Retry-After` on 429 responses.
* Rotate service account credentials on a schedule your organization accepts.
* Observe usage through the ADP UI on each provider's detail page. A *Cost & usage* section is in development (the UI shows a "Coming soon" placeholder today).
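The two retry bullets can be sketched transport-agnostically. This is an illustrative pattern, not gateway-specific behavior; `send` is any zero-argument callable returning an object with `.status_code` and `.headers` (an httpx or requests response fits):

```python
import random
import time

def backoff_delay(attempt, retry_after=None):
    """Delay before retry `attempt` (0-based). A server-supplied
    Retry-After hint wins; otherwise exponential backoff with jitter."""
    if retry_after is not None:
        return float(retry_after)
    return float(2 ** attempt) + random.uniform(0, 1)

def send_with_retry(send, max_attempts=5):
    """Call send() until it returns a non-retryable status."""
    for attempt in range(max_attempts):
        response = send()
        if response.status_code == 429:
            # Respect Retry-After on 429 responses.
            time.sleep(backoff_delay(attempt, response.headers.get("Retry-After")))
        elif response.status_code >= 500:
            # Exponential backoff for 5xx: 1s, 2s, 4s, ... plus jitter.
            time.sleep(backoff_delay(attempt))
        else:
            return response
    raise RuntimeError(f"request failed after {max_attempts} attempts")
```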

== Troubleshooting

=== 401 Unauthorized

* If you're using `rpk ai`: rerun `rpk ai auth login` to refresh the session, then `rpk ai auth token` to mint a new token.
* If you're using `rpk ai`: rerun `rpk cloud login` to refresh the cached cloud token. Token expiry surfaces as a 401 with this hint in the error.
* If you're using OIDC client credentials: check the token hasn't expired and refresh it. Verify the audience is `cloudv2-production.redpanda.cloud` and the `Authorization` header is formatted `Bearer <token>`.
* For Gemini: ensure the token is sent as `x-goog-api-key`, not `Authorization`.
* For Anthropic with passthrough: ensure the client is sending a valid Anthropic `Authorization` header.
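While debugging a 401, it can help to inspect the token's `aud` and `exp` claims locally. The sketch below decodes the JWT payload without verifying the signature, so use it only for inspection, never for trust decisions:

```python
import base64
import json

def jwt_claims(token):
    """Decode a JWT's payload (middle segment) without signature
    verification -- enough to check `aud` and `exp` while debugging."""
    payload = token.split(".")[1]
    # Restore the base64url padding that JWTs strip.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))
```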
@@ -388,4 +443,4 @@ AI Gateway returns standard HTTP status codes. The upstream provider's error bod

== Next steps

* xref:configure-provider.adoc[Configure an LLM provider]
* xref:ai-gateway:configure-provider.adoc[Configure an LLM provider]
2 changes: 1 addition & 1 deletion modules/ai-gateway/pages/gateway-quickstart.adoc
@@ -529,6 +529,6 @@ const openai = new OpenAI({

* xref:routing-cel.adoc[]
* xref:aggregation.adoc[]
* xref:integrations/index.adoc[]
* xref:integrations:index.adoc[]
* xref:gateway-architecture.adoc[]
* xref:overview.adoc[]
10 changes: 5 additions & 5 deletions modules/ai-gateway/pages/overview.adoc
@@ -46,7 +46,7 @@ Use the provider's own SDK: OpenAI, Anthropic, Google AI, AWS Bedrock, or any Op

=== Managed authentication

Applications authenticate to ADP with OIDC service accounts instead of long-lived provider API keys. Service accounts use the same role and audit model as every other ADP resource, and mint short-lived tokens that are easy to revoke. The recommended local flow uses the `rpk ai` plugin for token refresh; CI and programmatic clients use the OIDC client-credentials grant directly. See xref:connect-agent.adoc[Connect your agent].
Applications authenticate to ADP with OIDC service accounts instead of long-lived provider API keys. Service accounts use the same role and audit model as every other ADP resource, and mint short-lived tokens that are easy to revoke. For local command-line workflows, use `rpk cloud login` to authenticate and `rpk ai` to talk to the gateway. CI and programmatic clients use the OIDC client-credentials grant directly. See xref:ai-gateway:connect-agent.adoc[Connect your agent].

=== Per-provider observability

@@ -78,7 +78,7 @@ AI Gateway supports five provider types. The UI labels and short descriptions ma
|Call Claude Opus, Sonnet, and Haiku directly. Optionally forwards the client's `Authorization` header for enterprise and Max-plan subscription passthrough.

|*Google AI*
|Reach Gemini Pro, Flash, and multimodal models via Google AI Studio. Ideal for long-context workloads and image/video inputs.
|Reach Gemini Pro, Flash, and multimodal models through Google AI Studio. Ideal for long-context workloads and image/video inputs.

|*AWS Bedrock*
|Invoke foundation models (Claude, Llama, Titan, Nova) hosted inside your AWS account. Use when data residency, IAM, or VPC egress matter more than raw feature parity. Signed with SigV4 server-side by AI Gateway.
@@ -87,7 +87,7 @@
|Point at any OpenAI-compatible endpoint (vLLM, Ollama, LM Studio, LocalAI, Together, Groq, OpenRouter). Useful for self-hosted models and aggregator gateways that ship `/v1/chat/completions`.
|===

See xref:configure-provider.adoc[Configure an LLM provider] for the full form reference for each type.
See xref:ai-gateway:configure-provider.adoc[Configure an LLM provider] for the full form reference for each type.

== When to use AI Gateway

@@ -116,5 +116,5 @@ AI Gateway does not provide these capabilities. For current status, consult the

== Next steps

. xref:configure-provider.adoc[Configure an LLM provider]
. xref:connect-agent.adoc[Connect your agent]
. xref:ai-gateway:configure-provider.adoc[Configure an LLM provider]
. xref:ai-gateway:connect-agent.adoc[Connect your agent]
4 changes: 2 additions & 2 deletions modules/governance/pages/budgets.adoc
@@ -61,7 +61,7 @@ Some guardrail evaluators call an LLM to do their work. A toxicity classifier, f

Guardrail evaluator cost surfaces in the same spending pipeline as user-facing LLM calls. The evaluator's cost is attributed to the *evaluator's configured upstream provider* — usually a small classifier model, separate from the user-facing LLM — so per-provider breakdowns separate the two automatically.

For the per-evaluator cost model and how it interacts with the dashboard's spend view, see xref:governance:guardrails.adoc[Configure guardrails].
For the per-evaluator cost model and how it interacts with the dashboard's spend view, see xref:governance:guardrails/index.adoc[Configure guardrails].

// TODO: confirm with eng that guardrail evaluator cost flows into the same SpendingService as user-facing LLM cost (vs. a separate stream). Open Q A3 in the companion plan, also flagged on the Guardrails plan.

@@ -87,7 +87,7 @@ Cap-management arrives after GA per the Governance V0 PRD. The planned feature s
* *Alert hooks* — webhook, email, or chat notifications when a cap is approached or exceeded.
* *Multi-tenant cap-setting* — per-tenant caps with override semantics.

Until those features ship, treat the dashboard and breakdown queries as your visibility layer and use platform-level guardrails (xref:governance:guardrails.adoc[Configure guardrails]) for selective request blocking.
Until those features ship, treat the dashboard and breakdown queries as your visibility layer and use platform-level guardrails (xref:governance:guardrails/index.adoc[Configure guardrails]) for selective request blocking.

// TODO: once the cap-management surface lands, replace this section with a forward link to the configuration how-to. If cap-management content grows beyond a single section, split this page into a sub-folder. Open Q C1 in the companion plan.
