modules/ai-gateway/pages/connect-agent.adoc (+92 -37)
@@ -1,13 +1,13 @@
 = Connect Your Agent
-:description: Point your application or AI agent at an AI Gateway provider's proxy URL. Covers the URL shape, the local auth flow with the `rpk ai` plugin, the OIDC client-credentials flow for CI, and SDK examples for OpenAI, Anthropic, Google AI, AWS Bedrock, and OpenAI-compatible endpoints.
+:description: Point your application or AI agent at an AI Gateway provider's proxy URL. Covers the URL shape, the local development workflow with `rpk ai`, the OIDC client-credentials flow for CI and application code, and SDK examples for OpenAI, Anthropic, Google AI, AWS Bedrock, and OpenAI-compatible endpoints.
 :learning-objective-1: Construct the proxy URL for an LLM provider you have configured
-:learning-objective-2: Authenticate to AI Gateway using the `rpk ai` plugin for local development or OIDC client credentials for CI and programmatic clients
+:learning-objective-2: Authenticate to AI Gateway with `rpk` for local development or with OIDC client credentials for CI and programmatic clients
 :learning-objective-3: Send requests through the proxy URL with the SDK of your choice
 
-This guide shows how to connect your glossterm:AI agent[] or application to AI Gateway. You'll construct the proxy URL for a provider you have already created, authenticate (with the `rpk ai` plugin for local development or with OIDC client credentials for CI), and send your first request with the SDK of your choice.
+This guide shows how to connect your glossterm:AI agent[] or application to AI Gateway. You construct the proxy URL for a provider you have already created, authenticate (with `rpk cloud login` for local development or with OIDC client credentials for CI and application code), and send your first request with the SDK of your choice.
 
 After completing this guide, you will be able to:
@@ -17,8 +17,8 @@ After completing this guide, you will be able to:
 
 == Prerequisites
 
-* A configured LLM provider. If you haven't created one yet, see xref:configure-provider.adoc[Configure an LLM provider].
-* For local development: nothing else; you'll install the `rpk ai` plugin in the next section.
+* A configured LLM provider. If you haven't created one yet, see xref:ai-gateway:configure-provider.adoc[Configure an LLM provider].
+* For local development, nothing else. You'll install `rpk ai` in the next section.
 * For CI or programmatic clients: a Redpanda Cloud service account with OIDC client credentials. See xref:redpanda-cloud:security:cloud-authentication.adoc[Authenticate to Redpanda Cloud].
+
 // TODO: confirm whether ADP hosts its own service-account IAM post-standalone, or continues to share Redpanda Cloud Organization IAM.
@@ -41,41 +41,84 @@ AI Gateway forwards the request to the upstream provider, attaches the configure
 
 TIP: The provider detail page generates ready-to-run snippets pre-filled with the correct proxy URL and paths. When in doubt, copy from the *Connect your app* section there.
 
+// Updated for PRs #30273 / #30327 / #30360 (rpk ai managed plugin).
 [[authenticate-with-rpk-ai]]
 [[authenticate-with-rpai]]
-== Authenticate with `rpk ai` (recommended for local development)
+== Use `rpk ai` for local development
 
-The `rpk ai` plugin is distributed through `rpk`'s plugin manager. The provider detail page surfaces an *Install* card with copy-pasteable steps. The flow is the same for every provider type:
+The `rpk ai` command is the Redpanda AI CLI. Use it to manage AI Gateway resources (LLM providers, MCP servers, OAuth providers) and call MCP tools from the command line. Authentication for `rpk ai` is owned by `rpk cloud login`. The active AI Gateway URL comes from your active rpk cloud profile.
 
-. Install the plugin:
+. Install `rpk ai`:
+
 [source,bash]
 ----
-rpk plugin install ai
+rpk ai install
 ----
+
+Update later with `rpk ai upgrade`; remove with `rpk ai uninstall`.
 
-. Log in with the gateway URL from the provider's *Connection* card:
+. Log in to Redpanda Cloud:
+
 [source,bash]
 ----
-rpk ai auth login --server https://aigw.<cluster-id>.clusters.rdpa.co
+rpk cloud login
 ----
+
+This caches a cloud token in `~/.config/rpk/rpk.yaml`. On every invocation, `rpk ai` reads the cached token automatically.
 
-. Point your SDK at the proxy URL and let `rpk ai auth token` mint a fresh token on each call. Set environment variables:
+. Select a profile that points at a cluster with AI Gateway v2 attached. The AI Gateway URL is cached on the profile when you create it.
+
 [source,bash]
 ----
-export OPENAI_API_KEY="$(rpk ai auth token)"  # or ANTHROPIC_API_KEY, etc.
+rpk profile use <profile-name>
+# or, to switch the cluster the active profile points at:
+rpk cloud cluster use <cluster-id>
 ----
 
-`rpk ai auth token` returns a short-lived OIDC access token. Refresh by running it again: most users wire it into a wrapper script or shell function.
+. Verify the connection:
+
+[source,bash]
+----
+rpk ai llm list
+----
 
-TIP: The plugin supports named profiles for pointing at multiple gateways. Run `rpk ai profile create <name> --dataplane-url <gateway-url> --auth-mode device` to create one, then `rpk ai profile use <name>` to switch. See `rpk ai profile --help` for the full set of subcommands.
+If the cached cloud token has expired, `rpk ai` returns a 401 with a hint to rerun `rpk cloud login`.
+
+[TIP]
+====
+To target a specific gateway URL for a single invocation (for example, when running against a staging gateway without switching profiles), pass `--rpai-endpoint`:
+
+[source,bash]
+----
+rpk ai --rpai-endpoint https://aigw.<cluster-id>.clusters.rdpa.co llm list
+----
+
+You can also export `RPAI_ENDPOINT` to override for the shell session.
+====
+
+// TODO(rpk-ai): rpai suppresses auth/profile subtrees in plugin mode today (cloudv2 apps/rpai/internal/cmd/root.go:127-135). If that changes, document `rpk ai auth` and `rpk ai profile` here.
+
+=== Environment variables
+
+The `rpk ai` command honors the following environment variables:
+
+[cols="1,3"]
+|===
+|Variable |Purpose
+
+|`RPAI_TOKEN`
+|Bearer token for the gateway. Normally injected automatically from your cached `rpk cloud login` token; set explicitly to override.
+
+|`RPAI_ENDPOINT`
+|AI Gateway URL. Normally resolved from your active rpk cloud profile; set explicitly to override.
+
+|`RPAI_PROFILE`, `RPAI_CONFIG`, `RPAI_VERBOSE`, `RPAI_FORMAT`
+|Map to `--rpai-profile`, `--rpai-config`, `--rpai-verbose`, `--format`. Long flag names are renamed under `rpk ai` to avoid collision with `rpk`'s globals; short flags (`-p`, `-c`, `-v`, `-o`) are unchanged.
+|===
 
 == Authenticate with OIDC client credentials (CI and programmatic)
 
-When the `rpk ai` plugin isn't available (CI runners, server-side processes, headless agents), use the OIDC `client_credentials` grant directly. Values are surfaced on the provider's *Connection* card; defaults at the time of writing are below.
+For application code, CI runners, server-side processes, and headless agents, use the OIDC `client_credentials` grant directly. This is the canonical authentication path for SDK-style usage; `rpk ai` is for command-line workflows, not for embedding in application code. Values are surfaced on the provider's *Connection* card; defaults at the time of writing are below.
 
 [cols="1,2", options="header"]
 |===
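
To make the exchange concrete, here is a minimal Python sketch of the `client_credentials` grant using `authlib`. The token endpoint URL and the `RP_CLIENT_ID`/`RP_CLIENT_SECRET` variable names are placeholders; copy the real values from the provider's *Connection* card:

[source,python]
----
# Minimal client_credentials sketch. The token_endpoint URL below is a
# placeholder; the real endpoint, client ID, and secret come from the
# provider's *Connection* card.
import os
from authlib.integrations.requests_client import OAuth2Session

session = OAuth2Session(
    client_id=os.environ["RP_CLIENT_ID"],          # placeholder variable name
    client_secret=os.environ["RP_CLIENT_SECRET"],  # placeholder variable name
    token_endpoint="https://auth.example.redpanda.com/oauth/token",
)
token = session.fetch_token(
    grant_type="client_credentials",
    audience="cloudv2-production.redpanda.cloud",
)
# The access token is what you pass to an SDK as its API key.
print(token["access_token"])
----

Pass `token["access_token"]` wherever an SDK expects an API key, and re-fetch before the token's `expires_in` window elapses.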
@@ -146,6 +189,7 @@ Passing `token_endpoint` to the `OAuth2Session` constructor lets `authlib` handl
 The OpenAI SDK calls the proxy's `/v1/chat/completions` path, which AI Gateway forwards to OpenAI unchanged. Use it with any OpenAI provider and, with a different `base_url`, with any OpenAI-compatible provider (vLLM, Ollama, LM Studio, Together, Groq, OpenRouter).
 The Anthropic SDK hits `v1/messages` on the proxy, which AI Gateway forwards to Anthropic. If the provider is configured with *Auth passthrough*, send your own Anthropic `Authorization` header instead of an `auth_token`. AI Gateway forwards it unchanged.
 Gemini authenticates with the `x-goog-api-key` header, not `Authorization: Bearer`. Most Google SDKs set `x-goog-api-key` automatically from the `api_key` parameter. If you hand-roll the request, set the header yourself.
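
As an illustration of the hand-rolled case, a direct Gemini call through the proxy could look like the following sketch (the path, model name, and environment-variable names are assumptions, not product defaults):

[source,python]
----
# Hand-rolled Gemini request: the token goes in x-goog-api-key, not in an
# Authorization header. The AIGW_* variable names are illustrative.
import os, httpx

response = httpx.post(
    f"{os.environ['AIGW_PROXY_URL']}/v1beta/models/gemini-1.5-flash:generateContent",
    headers={"x-goog-api-key": os.environ["AIGW_TOKEN"]},
    json={"contents": [{"parts": [{"text": "Hello through the gateway"}]}]},
)
print(response.json())
----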
 ====
+--
 
 AWS Bedrock::
+
-Bedrock is different: SigV4 signing is performed *server-side* by AI Gateway using the credentials on the provider. Your client only needs to call the proxy URL with an `rpk ai` or OIDC token.
-
+--
+Bedrock is different: SigV4 signing is performed *server-side* by AI Gateway using the credentials on the provider. Your client only needs to call the proxy URL with an OIDC access token.
+
 [source,python]
 ----
 import os, httpx
@@ -278,14 +330,16 @@ response = httpx.post(
 print(response.json())
 ----
 
-See xref:configure-provider.adoc#bedrock-inference-profiles[the Bedrock provider reference] for inference-profile selection guidance.
-
+
+See xref:ai-gateway:configure-provider.adoc#bedrock-inference-profiles[the Bedrock provider reference] for inference-profile selection guidance.
+
 TIP: Bedrock's `Converse` API works the same way: send to `/model/\{MODEL_ID}/converse` with a Converse-shaped body. Or use the AWS SDK's `bedrockruntime` client and set its `BaseEndpoint` to the proxy URL; the SDK signs the request, AI Gateway re-signs server-side with the provider's credentials, and your client never sees AWS keys.
+--
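
A hypothetical sketch of the `Converse` variant described in the TIP, reusing the same httpx pattern (the model ID and environment-variable names are illustrative):

[source,python]
----
# Converse-shaped request through the proxy. SigV4 signing still happens
# server-side; the client sends only its OIDC bearer token.
import os, httpx

MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"  # example model ID

response = httpx.post(
    f"{os.environ['AIGW_PROXY_URL']}/model/{MODEL_ID}/converse",
    headers={"Authorization": f"Bearer {os.environ['AIGW_TOKEN']}"},
    json={"messages": [{"role": "user", "content": [{"text": "Hello"}]}]},
)
print(response.json())
----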
 
 OpenAI-compatible::
+
+--
 Use the OpenAI SDK with the proxy URL of the OpenAI-compatible provider and whatever model identifier the upstream exposes:
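
For orientation, such a call generally follows this shape (a sketch assuming the OpenAI Python SDK v1+; the base URL suffix, model identifier, and variable names are placeholders):

[source,python]
----
# OpenAI SDK pointed at the proxy URL of an OpenAI-compatible provider.
# AIGW_PROXY_URL and AIGW_TOKEN are illustrative variable names.
import os
from openai import OpenAI

client = OpenAI(
    base_url=os.environ["AIGW_PROXY_URL"] + "/v1",  # assumes the proxy exposes /v1/*
    api_key=os.environ["AIGW_TOKEN"],
)
completion = client.chat.completions.create(
    model="my-upstream-model",  # whatever identifier the upstream exposes
    messages=[{"role": "user", "content": "ping"}],
)
print(completion.choices[0].message.content)
----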
@@ -354,18 +409,18 @@ AI Gateway returns standard HTTP status codes. The upstream provider's error bod
 
 == Best practices
 
-* *Use environment variables* for the proxy URL and token; never hard-code them.
-* *Wrap `rpk ai auth token`* in a script or shell function so refresh is invisible to your SDK code.
-* *Implement retry with exponential backoff* for 5xx and timeout conditions.
-* *Respect `Retry-After`* on 429 responses.
-* *Rotate service account credentials* on a schedule your organization accepts.
-* *Observe usage* through the ADP UI on each provider's detail page. A *Cost & usage* section is in development (the UI shows a "Coming soon" placeholder today).
+* Use environment variables for the proxy URL and token. Never hard-code them.
+* Refresh OIDC tokens through your client library so refresh is invisible to your SDK code (`authlib` for Python, `openid-client` for Node.js, etc.).
+* Implement retry with exponential backoff for 5xx and timeout conditions, as in the sketch after this list.
+* Respect `Retry-After` on 429 responses.
+* Rotate service account credentials on a schedule your organization accepts.
+* Observe usage through the ADP UI on each provider's detail page. A *Cost & usage* section is in development (the UI shows a "Coming soon" placeholder today).
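
The retry guidance above might translate into something like the following sketch (the endpoint path and environment-variable names are placeholders):

[source,python]
----
# Exponential backoff for 5xx and timeouts, honoring Retry-After on 429.
import os, time, httpx

def post_with_retry(path: str, body: dict, max_attempts: int = 5) -> httpx.Response:
    delay = 1.0
    for _ in range(max_attempts):
        try:
            response = httpx.post(
                f"{os.environ['AIGW_PROXY_URL']}{path}",
                headers={"Authorization": f"Bearer {os.environ['AIGW_TOKEN']}"},
                json=body,
                timeout=30.0,
            )
        except httpx.TimeoutException:
            time.sleep(delay)
            delay *= 2
            continue
        if response.status_code == 429:
            # Respect Retry-After when the server provides it.
            time.sleep(float(response.headers.get("Retry-After", delay)))
            delay *= 2
        elif response.status_code >= 500:
            time.sleep(delay)
            delay *= 2
        else:
            return response
    raise RuntimeError(f"Gave up after {max_attempts} attempts")
----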
 
 == Troubleshooting
 
 === 401 Unauthorized
 
-* If you're using `rpk ai`: rerun `rpk ai auth login` to refresh the session, then `rpk ai auth token` to mint a new token.
+* If you're using `rpk ai`: rerun `rpk cloud login` to refresh the cached cloud token. Token expiry surfaces as a 401 with this hint in the error.
 * If you're using OIDC client credentials: check the token hasn't expired and refresh it. Verify the audience is `cloudv2-production.redpanda.cloud` and the `Authorization` header is formatted `Bearer <token>`.
 * For Gemini: ensure the token is sent as `x-goog-api-key`, not `Authorization`.
 * For Anthropic with passthrough: ensure the client is sending a valid Anthropic `Authorization` header.
@@ -388,4 +443,4 @@ AI Gateway returns standard HTTP status codes. The upstream provider's error bod
 
 == Next steps
 
-* xref:configure-provider.adoc[Configure an LLM provider]
+* xref:ai-gateway:configure-provider.adoc[Configure an LLM provider]
modules/ai-gateway/pages/overview.adoc (+5 -5)
@@ -46,7 +46,7 @@ Use the provider's own SDK: OpenAI, Anthropic, Google AI, AWS Bedrock, or any Op
 
 === Managed authentication
 
-Applications authenticate to ADP with OIDC service accounts instead of long-lived provider API keys. Service accounts use the same role and audit model as every other ADP resource, and mint short-lived tokens that are easy to revoke. The recommended local flow uses the `rpk ai` plugin for token refresh; CI and programmatic clients use the OIDC client-credentials grant directly. See xref:connect-agent.adoc[Connect your agent].
+Applications authenticate to ADP with OIDC service accounts instead of long-lived provider API keys. Service accounts use the same role and audit model as every other ADP resource, and mint short-lived tokens that are easy to revoke. For local command-line workflows, use `rpk cloud login` to authenticate and `rpk ai` to talk to the gateway. CI and programmatic clients use the OIDC client-credentials grant directly. See xref:ai-gateway:connect-agent.adoc[Connect your agent].
 
 === Per-provider observability
 
@@ -78,7 +78,7 @@ AI Gateway supports five provider types. The UI labels and short descriptions ma
 |Call Claude Opus, Sonnet, and Haiku directly. Optionally forwards the client's `Authorization` header for enterprise and Max-plan subscription passthrough.
 
 |*Google AI*
-|Reach Gemini Pro, Flash, and multimodal models via Google AI Studio. Ideal for long-context workloads and image/video inputs.
+|Reach Gemini Pro, Flash, and multimodal models through Google AI Studio. Ideal for long-context workloads and image/video inputs.
 
 |*AWS Bedrock*
 |Invoke foundation models (Claude, Llama, Titan, Nova) hosted inside your AWS account. Use when data residency, IAM, or VPC egress matter more than raw feature parity. Signed with SigV4 server-side by AI Gateway.
@@ -87,7 +87,7 @@ AI Gateway supports five provider types. The UI labels and short descriptions ma
 |Point at any OpenAI-compatible endpoint (vLLM, Ollama, LM Studio, LocalAI, Together, Groq, OpenRouter). Useful for self-hosted models and aggregator gateways that ship `/v1/chat/completions`.
 |===
 
-See xref:configure-provider.adoc[Configure an LLM provider] for the full form reference for each type.
+See xref:ai-gateway:configure-provider.adoc[Configure an LLM provider] for the full form reference for each type.
 
 == When to use AI Gateway
 
@@ -116,5 +116,5 @@ AI Gateway does not provide these capabilities. For current status, consult the
 
 == Next steps
 
-. xref:configure-provider.adoc[Configure an LLM provider]
-. xref:connect-agent.adoc[Connect your agent]
+. xref:ai-gateway:configure-provider.adoc[Configure an LLM provider]
+. xref:ai-gateway:connect-agent.adoc[Connect your agent]
modules/governance/pages/budgets.adoc (+2 -2)
@@ -61,7 +61,7 @@ Some guardrail evaluators call an LLM to do their work. A toxicity classifier, f
 
 Guardrail evaluator cost surfaces in the same spending pipeline as user-facing LLM calls. The evaluator's cost is attributed to the *evaluator's configured upstream provider* — usually a small classifier model, separate from the user-facing LLM — so per-provider breakdowns separate the two automatically.
 
-For the per-evaluator cost model and how it interacts with the dashboard's spend view, see xref:governance:guardrails.adoc[Configure guardrails].
+For the per-evaluator cost model and how it interacts with the dashboard's spend view, see xref:governance:guardrails/index.adoc[Configure guardrails].
 
 // TODO: confirm with eng that guardrail evaluator cost flows into the same SpendingService as user-facing LLM cost (vs. a separate stream). Open Q A3 in the companion plan, also flagged on the Guardrails plan.
 
@@ -87,7 +87,7 @@ Cap-management arrives after GA per the Governance V0 PRD. The planned feature s
 * *Alert hooks* — webhook, email, or chat notifications when a cap is approached or exceeded.
 * *Multi-tenant cap-setting* — per-tenant caps with override semantics.
 
-Until those features ship, treat the dashboard and breakdown queries as your visibility layer and use platform-level guardrails (xref:governance:guardrails.adoc[Configure guardrails]) for selective request blocking.
+Until those features ship, treat the dashboard and breakdown queries as your visibility layer and use platform-level guardrails (xref:governance:guardrails/index.adoc[Configure guardrails]) for selective request blocking.
 
 // TODO: once the cap-management surface lands, replace this section with a forward link to the configuration how-to. If cap-management content grows beyond a single section, split this page into a sub-folder. Open Q C1 in the companion plan.
0 commit comments