
Commit fe71a7e

Merge pull request #12 from redpanda-data/rpk-ai-doc-corrections
Correct rpk ai usage across AI Gateway and MCP pages
2 parents 4c12189 + 63fb644 commit fe71a7e

12 files changed

Lines changed: 185 additions & 120 deletions


modules/ai-gateway/pages/admin/setup-guide.adoc

Lines changed: 1 addition & 1 deletion
@@ -379,4 +379,4 @@ Users can then discover and connect to the gateway using the information provide
 == Next steps
 
 * xref:routing-cel.adoc[CEL Routing Cookbook]
-* xref:integrations/index.adoc[Integrations]
+* xref:integrations:index.adoc[Integrations]

modules/ai-gateway/pages/connect-agent.adoc

Lines changed: 92 additions & 37 deletions
@@ -1,13 +1,13 @@
 = Connect Your Agent
-:description: Point your application or AI agent at an AI Gateway provider's proxy URL. Covers the URL shape, the local auth flow with the `rpk ai` plugin, the OIDC client-credentials flow for CI, and SDK examples for OpenAI, Anthropic, Google AI, AWS Bedrock, and OpenAI-compatible endpoints.
+:description: Point your application or AI agent at an AI Gateway provider's proxy URL. Covers the URL shape, the local development workflow with `rpk ai`, the OIDC client-credentials flow for CI and application code, and SDK examples for OpenAI, Anthropic, Google AI, AWS Bedrock, and OpenAI-compatible endpoints.
 :page-topic-type: how-to
 :personas: app_developer
 :page-aliases: redpanda-cloud:ai-agents:ai-gateway/builders/connect-your-agent.adoc
 :learning-objective-1: Construct the proxy URL for an LLM provider you have configured
-:learning-objective-2: Authenticate to AI Gateway using the `rpk ai` plugin for local development or OIDC client credentials for CI and programmatic clients
+:learning-objective-2: Authenticate to AI Gateway with `rpk` for local development or with OIDC client credentials for CI and programmatic clients
 :learning-objective-3: Send requests through the proxy URL with the SDK of your choice
 
-This guide shows how to connect your glossterm:AI agent[] or application to AI Gateway. You'll construct the proxy URL for a provider you have already created, authenticate (with the `rpk ai` plugin for local development or with OIDC client credentials for CI), and send your first request with the SDK of your choice.
+This guide shows how to connect your glossterm:AI agent[] or application to AI Gateway. You construct the proxy URL for a provider you have already created, authenticate (with `rpk cloud login` for local development or with OIDC client credentials for CI and application code), and send your first request with the SDK of your choice.
 
 After completing this guide, you will be able to:
 
@@ -17,8 +17,8 @@ After completing this guide, you will be able to:
 
 == Prerequisites
 
-* A configured LLM provider. If you haven't created one yet, see xref:configure-provider.adoc[Configure an LLM provider].
-* For local development: nothing else; you'll install the `rpk ai` plugin in the next section.
+* A configured LLM provider. If you haven't created one yet, see xref:ai-gateway:configure-provider.adoc[Configure an LLM provider].
+* For local development, nothing else. You'll install `rpk ai` in the next section.
 * For CI or programmatic clients: a Redpanda Cloud service account with OIDC client credentials. See xref:redpanda-cloud:security:cloud-authentication.adoc[Authenticate to Redpanda Cloud].
 +
 // TODO: confirm whether ADP hosts its own service-account IAM post-standalone, or continues to share Redpanda Cloud Organization IAM.
@@ -41,41 +41,84 @@ AI Gateway forwards the request to the upstream provider, attaches the configure
 
 TIP: The provider detail page generates ready-to-run snippets pre-filled with the correct proxy URL and paths. When in doubt, copy from the *Connect your app* section there.
 
+// Updated for PRs #30273 / #30327 / #30360 (rpk ai managed plugin).
 [[authenticate-with-rpk-ai]]
 [[authenticate-with-rpai]]
-== Authenticate with `rpk ai` (recommended for local development)
+== Use `rpk ai` for local development
 
-The `rpk ai` plugin is distributed through `rpk`'s plugin manager. The provider detail page surfaces an *Install* card with copy-pasteable steps. The flow is the same for every provider type:
+The `rpk ai` command is the Redpanda AI CLI. Use it to manage AI Gateway resources (LLM providers, MCP servers, OAuth providers) and call MCP tools from the command line. Authentication for `rpk ai` is owned by `rpk cloud login`. The active AI Gateway URL comes from your active rpk cloud profile.
 
-. Install the plugin:
+. Install `rpk ai`:
 +
 [source,bash]
 ----
-rpk plugin install ai
+rpk ai install
 ----
++
+Update later with `rpk ai upgrade`; remove with `rpk ai uninstall`.
 
-. Log in with the gateway URL from the provider's *Connection* card:
+. Log in to Redpanda Cloud:
 +
 [source,bash]
 ----
-rpk ai auth login --server https://aigw.<cluster-id>.clusters.rdpa.co
+rpk cloud login
 ----
++
+This caches a cloud token in `~/.config/rpk/rpk.yaml`. On every invocation, `rpk ai` reads the cached token automatically.
 
-. Point your SDK at the proxy URL and let `rpk ai auth token` mint a fresh token on each call. Set environment variables:
+. Select a profile that points at a cluster with AI Gateway v2 attached. The AI Gateway URL is cached on the profile when you create it.
 +
 [source,bash]
 ----
-export PROXY_URL="<your-gateway-base>/llm/v1/providers/<provider-name>"
-export OPENAI_API_KEY="$(rpk ai auth token)" # or ANTHROPIC_API_KEY, etc.
+rpk profile use <profile-name>
+# or, to switch the cluster the active profile points at:
+rpk cloud cluster use <cluster-id>
 ----
 
-`rpk ai auth token` returns a short-lived OIDC access token. Refresh by running it again: most users wire it into a wrapper script or shell function.
+. Verify the connection:
++
+[source,bash]
+----
+rpk ai llm list
+----
 
-TIP: The plugin supports named profiles for pointing at multiple gateways. Run `rpk ai profile create <name> --dataplane-url <gateway-url> --auth-mode device` to create one, then `rpk ai profile use <name>` to switch. See `rpk ai profile --help` for the full set of subcommands.
+If the cached cloud token has expired, `rpk ai` returns a 401 with a hint to rerun `rpk cloud login`.
+
+[TIP]
+====
+To target a specific gateway URL for a single invocation (for example, when running against a staging gateway without switching profiles), pass `--rpai-endpoint`:
+
+[source,bash]
+----
+rpk ai --rpai-endpoint https://aigw.<cluster-id>.clusters.rdpa.co llm list
+----
+
+You can also export `RPAI_ENDPOINT` to override for the shell session.
+====
+
+// TODO(rpk-ai): rpai suppresses auth/profile subtrees in plugin mode today (cloudv2 apps/rpai/internal/cmd/root.go:127-135). If that changes, document `rpk ai auth` and `rpk ai profile` here.
+
+=== Environment variables
+
+The `rpk ai` command honors the following environment variables:
+
+[cols="1,3"]
+|===
+|Variable |Purpose
+
+|`RPAI_TOKEN`
+|Bearer token for the gateway. Normally injected automatically from your cached `rpk cloud login` token; set explicitly to override.
+
+|`RPAI_ENDPOINT`
+|AI Gateway URL. Normally resolved from your active rpk cloud profile; set explicitly to override.
+
+|`RPAI_PROFILE`, `RPAI_CONFIG`, `RPAI_VERBOSE`, `RPAI_FORMAT`
+|Map to `--rpai-profile`, `--rpai-config`, `--rpai-verbose`, `--format`. Long flag names are renamed under `rpk ai` to avoid collision with `rpk`'s globals; short flags (`-p`, `-c`, `-v`, `-o`) are unchanged.
+|===
 
 == Authenticate with OIDC client credentials (CI and programmatic)
 
-When the `rpk ai` plugin isn't available (CI runners, server-side processes, headless agents), use the OIDC `client_credentials` grant directly. Values are surfaced on the provider's *Connection* card; defaults at the time of writing are below.
+For application code, CI runners, server-side processes, and headless agents, use the OIDC `client_credentials` grant directly. This is the canonical authentication path for SDK-style usage; `rpk ai` is for command-line workflows, not for embedding in application code. Values are surfaced on the provider's *Connection* card; defaults at the time of writing are below.
 
 [cols="1,2", options="header"]
 |===
@@ -146,6 +189,7 @@ Passing `token_endpoint` to the `OAuth2Session` constructor lets `authlib` handl
 
 Node.js (openid-client)::
 +
+--
 [source,javascript]
 ----
 import { Issuer } from 'openid-client';
@@ -166,6 +210,7 @@ const tokenSet = await client.grant({
 
 const accessToken = tokenSet.access_token;
 ----
+--
 ======
 
 === Token lifecycle management
@@ -175,7 +220,7 @@ IMPORTANT: Your client is responsible for refreshing tokens before they expire.
 * Proactively refresh at ~80% of the token's TTL to avoid failed requests.
 * `authlib` (Python) handles renewal automatically when you pass `token_endpoint` to `OAuth2Session`.
 * For other languages, cache the token and its expiry, then request a new token before the current one expires.
-* If you're using `rpk ai`, just rerun `rpk ai auth token`: it handles refresh against the same OIDC endpoint.
+* For SDK code, refresh OIDC client-credentials tokens through your client library (see the `authlib` example above).
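As an aside on this hunk: the "refresh at ~80% of TTL" guidance the new text keeps can be sketched as a small cache. This is a minimal illustration, not any documented API; `fetch_token` is a hypothetical callable standing in for whatever performs the `client_credentials` request:

```python
import time

class TokenCache:
    """Sketch: cache an access token and refresh it proactively at
    ~80% of its TTL. fetch_token is a hypothetical callable returning
    (access_token, expires_in_seconds)."""

    def __init__(self, fetch_token, refresh_at=0.8, clock=time.monotonic):
        self._fetch = fetch_token
        self._refresh_at = refresh_at  # fraction of TTL at which to refresh
        self._clock = clock            # injectable clock, useful in tests
        self._token = None
        self._refresh_after = 0.0      # absolute time of proactive refresh

    def get(self):
        now = self._clock()
        if self._token is None or now >= self._refresh_after:
            token, ttl = self._fetch()
            self._token = token
            self._refresh_after = now + ttl * self._refresh_at
        return self._token
```

A request path would call `cache.get()` before each call and never see an expired token, which is the intent of the bullet above.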
 
 == Send requests with your SDK

@@ -184,21 +229,22 @@ The examples in this section assume you've set:
 [source,bash]
 ----
 export PROXY_URL="<your-gateway-base>/llm/v1/providers/<provider-name>"
-export AUTH_TOKEN="$(rpk ai auth token)" # or an OIDC access token from above
+export AUTH_TOKEN="<oidc-access-token>" # from the client_credentials flow above
 ----
 
 [tabs]
 ======
 OpenAI SDK::
 +
+--
 [source,python]
 ----
 import os
 from openai import OpenAI
 
 client = OpenAI(
     base_url=os.environ["PROXY_URL"], # .../llm/v1/providers/my-openai
-    api_key=os.environ["AUTH_TOKEN"], # rpk ai or OIDC access token
+    api_key=os.environ["AUTH_TOKEN"], # OIDC access token
 )
 
 response = client.chat.completions.create(
@@ -207,19 +253,21 @@ response = client.chat.completions.create(
 )
 print(response.choices[0].message.content)
 ----
-+
+
 The OpenAI SDK calls the proxy's `/v1/chat/completions` path, which AI Gateway forwards to OpenAI unchanged. Use it with any OpenAI provider and, with a different `base_url`, with any OpenAI-compatible provider (vLLM, Ollama, LM Studio, Together, Groq, OpenRouter).
+--
 
 Anthropic SDK::
 +
+--
 [source,python]
 ----
 import os
 from anthropic import Anthropic
 
 client = Anthropic(
     base_url=os.environ["PROXY_URL"], # .../llm/v1/providers/my-anthropic
-    auth_token=os.environ["AUTH_TOKEN"], # rpk ai or OIDC access token
+    auth_token=os.environ["AUTH_TOKEN"], # OIDC access token
 )
 
 message = client.messages.create(
@@ -229,11 +277,13 @@ message = client.messages.create(
 )
 print(message.content[0].text)
 ----
-+
+
 The Anthropic SDK hits `v1/messages` on the proxy, which AI Gateway forwards to Anthropic. If the provider is configured with *Auth passthrough*, send your own Anthropic `Authorization` header instead of an `auth_token`. AI Gateway forwards it unchanged.
+--
 
 Google Gemini SDK::
 +
+--
 [source,python]
 ----
 import os
250300
)
251301
print(response.text)
252302
----
253-
+
303+
254304
[IMPORTANT]
255305
====
256306
Gemini authenticates with the `x-goog-api-key` header, not `Authorization: Bearer`. Most Google SDKs set `x-goog-api-key` automatically from the `api_key` parameter. If you hand-roll the request, set the header yourself.
257307
====
308+
--
258309
259310
AWS Bedrock::
260311
+
261-
Bedrock is different: SigV4 signing is performed *server-side* by AI Gateway using the credentials on the provider. Your client only needs to call the proxy URL with an `rpk ai` or OIDC token.
262-
+
312+
--
313+
Bedrock is different: SigV4 signing is performed *server-side* by AI Gateway using the credentials on the provider. Your client only needs to call the proxy URL with an OIDC access token.
314+
263315
[source,python]
264316
----
265317
import os, httpx
@@ -278,14 +330,16 @@ response = httpx.post(
 print(response.json())
 ----
 
-See xref:configure-provider.adoc#bedrock-inference-profiles[the Bedrock provider reference] for inference-profile selection guidance.
-+
+See xref:ai-gateway:configure-provider.adoc#bedrock-inference-profiles[the Bedrock provider reference] for inference-profile selection guidance.
+
 TIP: Bedrock's `Converse` API works the same way: send to `/model/\{MODEL_ID}/converse` with a Converse-shaped body. Or use the AWS SDK's `bedrockruntime` client and set its `BaseEndpoint` to the proxy URL; the SDK signs the request, AI Gateway re-signs server-side with the provider's credentials, and your client never sees AWS keys.
+--
 
 OpenAI-compatible::
 +
+--
 Use the OpenAI SDK with the proxy URL of the OpenAI-compatible provider and whatever model identifier the upstream exposes:
-+
+
 [source,python]
 ----
 import os
301355
messages=[{"role": "user", "content": "Hello"}],
302356
)
303357
----
358+
--
304359
======
305360

306361
[NOTE]
@@ -354,18 +409,18 @@ AI Gateway returns standard HTTP status codes. The upstream provider's error bod
 
 == Best practices
 
-* *Use environment variables* for the proxy URL and token; never hard-code them.
-* *Wrap `rpk ai auth token`* in a script or shell function so refresh is invisible to your SDK code.
-* *Implement retry with exponential backoff* for 5xx and timeout conditions.
-* *Respect `Retry-After`* on 429 responses.
-* *Rotate service account credentials* on a schedule your organization accepts.
-* *Observe usage* through the ADP UI on each provider's detail page. A *Cost & usage* section is in development (the UI shows a "Coming soon" placeholder today).
+* Use environment variables for the proxy URL and token. Never hard-code them.
+* Refresh OIDC tokens through your client library so refresh is invisible to your SDK code (`authlib` for Python, `openid-client` for Node.js, etc.).
+* Implement retry with exponential backoff for 5xx and timeout conditions.
+* Respect `Retry-After` on 429 responses.
+* Rotate service account credentials on a schedule your organization accepts.
+* Observe usage through the ADP UI on each provider's detail page. A *Cost & usage* section is in development (the UI shows a "Coming soon" placeholder today).
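As an aside, the two retry bullets kept by this hunk (exponential backoff on 5xx, honoring `Retry-After` on 429) can be sketched as follows. The names are illustrative, not any documented API; the response shape (`.status_code`, `.headers`) mirrors httpx/requests-style clients:

```python
import random
import time

def call_with_retries(send, max_attempts=5, base_delay=0.5, sleep=time.sleep):
    """Sketch: send() returns a response with .status_code and .headers.
    Retries 5xx and 429 responses; honors Retry-After on 429; uses
    exponential backoff with jitter otherwise. Timeouts would be handled
    the same way by catching the client's timeout exception."""
    resp = send()
    for attempt in range(1, max_attempts):
        if resp.status_code < 500 and resp.status_code != 429:
            return resp  # success or a non-retryable client error
        if resp.status_code == 429 and "Retry-After" in resp.headers:
            delay = float(resp.headers["Retry-After"])  # server-directed wait
        else:
            delay = base_delay * (2 ** (attempt - 1)) * (1 + random.random())
        sleep(delay)
        resp = send()
    return resp  # out of attempts; caller inspects the final status
```

For example, `call_with_retries(lambda: httpx.post(url, ...))` would absorb a transient 503 and a rate-limit 429 before surfacing the final response.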
 
 == Troubleshooting
 
 === 401 Unauthorized
 
-* If you're using `rpk ai`: rerun `rpk ai auth login` to refresh the session, then `rpk ai auth token` to mint a new token.
+* If you're using `rpk ai`: rerun `rpk cloud login` to refresh the cached cloud token. Token expiry surfaces as a 401 with this hint in the error.
 * If you're using OIDC client credentials: check the token hasn't expired and refresh it. Verify the audience is `cloudv2-production.redpanda.cloud` and the `Authorization` header is formatted `Bearer <token>`.
 * For Gemini: ensure the token is sent as `x-goog-api-key`, not `Authorization`.
 * For Anthropic with passthrough: ensure the client is sending a valid Anthropic `Authorization` header.
@@ -388,4 +443,4 @@ AI Gateway returns standard HTTP status codes. The upstream provider's error bod
 
 == Next steps
 
-* xref:configure-provider.adoc[Configure an LLM provider]
+* xref:ai-gateway:configure-provider.adoc[Configure an LLM provider]
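As an editorial aside on this file's diff: the OIDC `client_credentials` exchange it designates as the canonical SDK path boils down to one form-encoded POST. A stdlib-only sketch, with a placeholder token URL (real values come from the provider's *Connection* card):

```python
from urllib.parse import urlencode
from urllib.request import Request

def build_token_request(token_url, client_id, client_secret, audience):
    """Sketch: build the POST for an OIDC client_credentials grant.
    Field names follow the standard grant; the audience matches the
    value the troubleshooting section says to verify."""
    body = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "audience": audience,
    }).encode()
    return Request(
        token_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )

# Sending this with urllib.request.urlopen(req) returns JSON containing
# access_token and expires_in; pass access_token as the SDK's api_key.
req = build_token_request(
    "https://auth.example.com/oauth/token",  # placeholder endpoint
    "my-client-id",                          # hypothetical credentials
    "my-client-secret",
    "cloudv2-production.redpanda.cloud",
)
```

In practice you would use `authlib` or `openid-client` as the page recommends; the sketch only shows what those libraries send on the wire.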

modules/ai-gateway/pages/gateway-quickstart.adoc

Lines changed: 1 addition & 1 deletion
@@ -529,6 +529,6 @@ const openai = new OpenAI({
 
 * xref:routing-cel.adoc[]
 * xref:aggregation.adoc[]
-* xref:integrations/index.adoc[]
+* xref:integrations:index.adoc[]
 * xref:gateway-architecture.adoc[]
 * xref:overview.adoc[]

modules/ai-gateway/pages/overview.adoc

Lines changed: 5 additions & 5 deletions
@@ -46,7 +46,7 @@ Use the provider's own SDK: OpenAI, Anthropic, Google AI, AWS Bedrock, or any Op
 
 === Managed authentication
 
-Applications authenticate to ADP with OIDC service accounts instead of long-lived provider API keys. Service accounts use the same role and audit model as every other ADP resource, and mint short-lived tokens that are easy to revoke. The recommended local flow uses the `rpk ai` plugin for token refresh; CI and programmatic clients use the OIDC client-credentials grant directly. See xref:connect-agent.adoc[Connect your agent].
+Applications authenticate to ADP with OIDC service accounts instead of long-lived provider API keys. Service accounts use the same role and audit model as every other ADP resource, and mint short-lived tokens that are easy to revoke. For local command-line workflows, use `rpk cloud login` to authenticate and `rpk ai` to talk to the gateway. CI and programmatic clients use the OIDC client-credentials grant directly. See xref:ai-gateway:connect-agent.adoc[Connect your agent].
 
 === Per-provider observability
 
@@ -78,7 +78,7 @@ AI Gateway supports five provider types. The UI labels and short descriptions ma
 |Call Claude Opus, Sonnet, and Haiku directly. Optionally forwards the client's `Authorization` header for enterprise and Max-plan subscription passthrough.
 
 |*Google AI*
-|Reach Gemini Pro, Flash, and multimodal models via Google AI Studio. Ideal for long-context workloads and image/video inputs.
+|Reach Gemini Pro, Flash, and multimodal models through Google AI Studio. Ideal for long-context workloads and image/video inputs.
 
 |*AWS Bedrock*
 |Invoke foundation models (Claude, Llama, Titan, Nova) hosted inside your AWS account. Use when data residency, IAM, or VPC egress matter more than raw feature parity. Signed with SigV4 server-side by AI Gateway.
@@ -87,7 +87,7 @@ AI Gateway supports five provider types. The UI labels and short descriptions ma
 |Point at any OpenAI-compatible endpoint (vLLM, Ollama, LM Studio, LocalAI, Together, Groq, OpenRouter). Useful for self-hosted models and aggregator gateways that ship `/v1/chat/completions`.
 |===
 
-See xref:configure-provider.adoc[Configure an LLM provider] for the full form reference for each type.
+See xref:ai-gateway:configure-provider.adoc[Configure an LLM provider] for the full form reference for each type.
 
 == When to use AI Gateway
 
@@ -116,5 +116,5 @@ AI Gateway does not provide these capabilities. For current status, consult the
 
 == Next steps
 
-. xref:configure-provider.adoc[Configure an LLM provider]
-. xref:connect-agent.adoc[Connect your agent]
+. xref:ai-gateway:configure-provider.adoc[Configure an LLM provider]
+. xref:ai-gateway:connect-agent.adoc[Connect your agent]

modules/governance/pages/budgets.adoc

Lines changed: 2 additions & 2 deletions
@@ -61,7 +61,7 @@ Some guardrail evaluators call an LLM to do their work. A toxicity classifier, f
 
 Guardrail evaluator cost surfaces in the same spending pipeline as user-facing LLM calls. The evaluator's cost is attributed to the *evaluator's configured upstream provider* — usually a small classifier model, separate from the user-facing LLM — so per-provider breakdowns separate the two automatically.
 
-For the per-evaluator cost model and how it interacts with the dashboard's spend view, see xref:governance:guardrails.adoc[Configure guardrails].
+For the per-evaluator cost model and how it interacts with the dashboard's spend view, see xref:governance:guardrails/index.adoc[Configure guardrails].
 
 // TODO: confirm with eng that guardrail evaluator cost flows into the same SpendingService as user-facing LLM cost (vs. a separate stream). Open Q A3 in the companion plan, also flagged on the Guardrails plan.
 
@@ -87,7 +87,7 @@ Cap-management arrives after GA per the Governance V0 PRD. The planned feature s
 * *Alert hooks* — webhook, email, or chat notifications when a cap is approached or exceeded.
 * *Multi-tenant cap-setting* — per-tenant caps with override semantics.
 
-Until those features ship, treat the dashboard and breakdown queries as your visibility layer and use platform-level guardrails (xref:governance:guardrails.adoc[Configure guardrails]) for selective request blocking.
+Until those features ship, treat the dashboard and breakdown queries as your visibility layer and use platform-level guardrails (xref:governance:guardrails/index.adoc[Configure guardrails]) for selective request blocking.
 
 // TODO: once the cap-management surface lands, replace this section with a forward link to the configuration how-to. If cap-management content grows beyond a single section, split this page into a sub-folder. Open Q C1 in the companion plan.
