Merged
2 changes: 1 addition & 1 deletion modules/ai-gateway/pages/admin/setup-guide.adoc
Original file line number Diff line number Diff line change
@@ -379,4 +379,4 @@ Users can then discover and connect to the gateway using the information provide
== Next steps

* xref:routing-cel.adoc[CEL Routing Cookbook]
* xref:integrations/index.adoc[Integrations]
* xref:integrations:index.adoc[Integrations]
129 changes: 92 additions & 37 deletions modules/ai-gateway/pages/connect-agent.adoc
@@ -1,13 +1,13 @@
= Connect Your Agent
:description: Point your application or AI agent at an AI Gateway provider's proxy URL. Covers the URL shape, the local auth flow with the `rpk ai` plugin, the OIDC client-credentials flow for CI, and SDK examples for OpenAI, Anthropic, Google AI, AWS Bedrock, and OpenAI-compatible endpoints.
:description: Point your application or AI agent at an AI Gateway provider's proxy URL. Covers the URL shape, the local development workflow with `rpk ai`, the OIDC client-credentials flow for CI and application code, and SDK examples for OpenAI, Anthropic, Google AI, AWS Bedrock, and OpenAI-compatible endpoints.
:page-topic-type: how-to
:personas: app_developer
:page-aliases: redpanda-cloud:ai-agents:ai-gateway/builders/connect-your-agent.adoc
:learning-objective-1: Construct the proxy URL for an LLM provider you have configured
:learning-objective-2: Authenticate to AI Gateway using the `rpk ai` plugin for local development or OIDC client credentials for CI and programmatic clients
:learning-objective-2: Authenticate to AI Gateway with `rpk` for local development or with OIDC client credentials for CI and programmatic clients
:learning-objective-3: Send requests through the proxy URL with the SDK of your choice

This guide shows how to connect your glossterm:AI agent[] or application to AI Gateway. You'll construct the proxy URL for a provider you have already created, authenticate (with the `rpk ai` plugin for local development or with OIDC client credentials for CI), and send your first request with the SDK of your choice.
This guide shows how to connect your glossterm:AI agent[] or application to AI Gateway. You construct the proxy URL for a provider you have already created, authenticate (with `rpk cloud login` for local development or with OIDC client credentials for CI and application code), and send your first request with the SDK of your choice.

After completing this guide, you will be able to:

@@ -17,8 +17,8 @@ After completing this guide, you will be able to:

== Prerequisites

* A configured LLM provider. If you haven't created one yet, see xref:configure-provider.adoc[Configure an LLM provider].
* For local development: nothing else; you'll install the `rpk ai` plugin in the next section.
* A configured LLM provider. If you haven't created one yet, see xref:ai-gateway:configure-provider.adoc[Configure an LLM provider].
* For local development, nothing else. You'll install `rpk ai` in the next section.
* For CI or programmatic clients: a Redpanda Cloud service account with OIDC client credentials. See xref:redpanda-cloud:security:cloud-authentication.adoc[Authenticate to Redpanda Cloud].
+
// TODO: confirm whether ADP hosts its own service-account IAM post-standalone, or continues to share Redpanda Cloud Organization IAM.
@@ -41,41 +41,84 @@ AI Gateway forwards the request to the upstream provider, attaches the configure

TIP: The provider detail page generates ready-to-run snippets pre-filled with the correct proxy URL and paths. When in doubt, copy from the *Connect your app* section there.
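The proxy URL shape described above can be assembled with a one-line helper. This is an illustrative sketch, not part of the product; the cluster ID and provider name below are placeholders:

```python
def proxy_url(gateway_base, provider_name):
    """Build the provider proxy URL: <gateway-base>/llm/v1/providers/<provider-name>."""
    return f"{gateway_base.rstrip('/')}/llm/v1/providers/{provider_name}"

# Hypothetical cluster ID and provider name:
print(proxy_url("https://aigw.abc123.clusters.rdpa.co", "my-openai"))
# https://aigw.abc123.clusters.rdpa.co/llm/v1/providers/my-openai
```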

// Updated for PRs #30273 / #30327 / #30360 (rpk ai managed plugin).
[[authenticate-with-rpk-ai]]
[[authenticate-with-rpai]]
== Authenticate with `rpk ai` (recommended for local development)
== Use `rpk ai` for local development

The `rpk ai` plugin is distributed through `rpk`'s plugin manager. The provider detail page surfaces an *Install* card with copy-pasteable steps. The flow is the same for every provider type:
The `rpk ai` command is the Redpanda AI CLI. Use it to manage AI Gateway resources (LLM providers, MCP servers, OAuth providers) and call MCP tools from the command line. Authentication for `rpk ai` is handled by `rpk cloud login`, and the AI Gateway URL is resolved from your active rpk cloud profile.

. Install the plugin:
. Install `rpk ai`:
+
[source,bash]
----
rpk plugin install ai
rpk ai install
----
+
Update later with `rpk ai upgrade`; remove with `rpk ai uninstall`.

. Log in with the gateway URL from the provider's *Connection* card:
. Log in to Redpanda Cloud:
+
[source,bash]
----
rpk ai auth login --server https://aigw.<cluster-id>.clusters.rdpa.co
rpk cloud login
----
+
This caches a cloud token in `~/.config/rpk/rpk.yaml`. On every invocation, `rpk ai` reads the cached token automatically.

. Point your SDK at the proxy URL and let `rpk ai auth token` mint a fresh token on each call. Set environment variables:
. Select a profile that points at a cluster with AI Gateway v2 attached. The AI Gateway URL is cached on the profile when you create it.
+
[source,bash]
----
export PROXY_URL="<your-gateway-base>/llm/v1/providers/<provider-name>"
export OPENAI_API_KEY="$(rpk ai auth token)" # or ANTHROPIC_API_KEY, etc.
rpk profile use <profile-name>
# or, to switch the cluster the active profile points at:
rpk cloud cluster use <cluster-id>
----

`rpk ai auth token` returns a short-lived OIDC access token. Refresh by running it again: most users wire it into a wrapper script or shell function.
. Verify the connection:
+
[source,bash]
----
rpk ai llm list
----

TIP: The plugin supports named profiles for pointing at multiple gateways. Run `rpk ai profile create <name> --dataplane-url <gateway-url> --auth-mode device` to create one, then `rpk ai profile use <name>` to switch. See `rpk ai profile --help` for the full set of subcommands.
If the cached cloud token has expired, `rpk ai` returns a 401 with a hint to rerun `rpk cloud login`.

[TIP]
====
To target a specific gateway URL for a single invocation (for example, when running against a staging gateway without switching profiles), pass `--rpai-endpoint`:

[source,bash]
----
rpk ai --rpai-endpoint https://aigw.<cluster-id>.clusters.rdpa.co llm list
----

You can also export `RPAI_ENDPOINT` to override for the shell session.
====

// TODO(rpk-ai): rpai suppresses auth/profile subtrees in plugin mode today (cloudv2 apps/rpai/internal/cmd/root.go:127-135). If that changes, document `rpk ai auth` and `rpk ai profile` here.

=== Environment variables

The `rpk ai` command honors the following environment variables:

[cols="1,3"]
|===
|Variable |Purpose

|`RPAI_TOKEN`
|Bearer token for the gateway. Normally injected automatically from your cached `rpk cloud login` token; set explicitly to override.

|`RPAI_ENDPOINT`
|AI Gateway URL. Normally resolved from your active rpk cloud profile; set explicitly to override.

|`RPAI_PROFILE`, `RPAI_CONFIG`, `RPAI_VERBOSE`, `RPAI_FORMAT`
|Map to `--rpai-profile`, `--rpai-config`, `--rpai-verbose`, `--format`. Long flag names are renamed under `rpk ai` to avoid collision with `rpk`'s globals; short flags (`-p`, `-c`, `-v`, `-o`) are unchanged.
|===

== Authenticate with OIDC client credentials (CI and programmatic)

When the `rpk ai` plugin isn't available (CI runners, server-side processes, headless agents), use the OIDC `client_credentials` grant directly. Values are surfaced on the provider's *Connection* card; defaults at the time of writing are below.
For application code, CI runners, server-side processes, and headless agents, use the OIDC `client_credentials` grant directly. This is the canonical authentication path for SDK-style usage; `rpk ai` is for command-line workflows, not for embedding in application code. Values are surfaced on the provider's *Connection* card; defaults at the time of writing are below.

[cols="1,2", options="header"]
|===
@@ -146,6 +189,7 @@ Passing `token_endpoint` to the `OAuth2Session` constructor lets `authlib` handl

Node.js (openid-client)::
+
--
[source,javascript]
----
import { Issuer } from 'openid-client';
@@ -166,6 +210,7 @@ const tokenSet = await client.grant({

const accessToken = tokenSet.access_token;
----
--
======

=== Token lifecycle management
@@ -175,7 +220,7 @@ IMPORTANT: Your client is responsible for refreshing tokens before they expire.
* Proactively refresh at ~80% of the token's TTL to avoid failed requests.
* `authlib` (Python) handles renewal automatically when you pass `token_endpoint` to `OAuth2Session`.
* For other languages, cache the token and its expiry, then request a new token before the current one expires.
* If you're using `rpk ai`, just rerun `rpk ai auth token`: it handles refresh against the same OIDC endpoint.
* For SDK code, refresh OIDC client-credentials tokens through your client library (see the `authlib` example above).
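The refresh-at-80%-of-TTL guidance can be sketched as a small cache. This is a hedged sketch, not a gateway API: `fetch_token` stands in for whatever client-credentials call your OIDC library makes, returning the access token and its lifetime in seconds:

```python
import time

class TokenCache:
    """Caches an access token and refreshes it at ~80% of its TTL."""

    def __init__(self, fetch_token):
        # fetch_token() must return (access_token, expires_in_seconds).
        self._fetch = fetch_token
        self._token = None
        self._refresh_at = 0.0

    def get(self):
        now = time.monotonic()
        if self._token is None or now >= self._refresh_at:
            self._token, expires_in = self._fetch()
            # Refresh proactively at 80% of the TTL so requests never race expiry.
            self._refresh_at = now + 0.8 * expires_in
        return self._token
```

Call `cache.get()` before each request; the first call after the refresh deadline transparently fetches a new token.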

== Send requests with your SDK

@@ -184,21 +229,22 @@ The examples in this section assume you've set:
[source,bash]
----
export PROXY_URL="<your-gateway-base>/llm/v1/providers/<provider-name>"
export AUTH_TOKEN="$(rpk ai auth token)" # or an OIDC access token from above
export AUTH_TOKEN="<oidc-access-token>" # from the client_credentials flow above
----

[tabs]
======
OpenAI SDK::
+
--
[source,python]
----
import os
from openai import OpenAI

client = OpenAI(
base_url=os.environ["PROXY_URL"], # .../llm/v1/providers/my-openai
api_key=os.environ["AUTH_TOKEN"], # rpk ai or OIDC access token
api_key=os.environ["AUTH_TOKEN"], # OIDC access token
)

response = client.chat.completions.create(
@@ -207,19 +253,21 @@ response = client.chat.completions.create(
)
print(response.choices[0].message.content)
----
+

The OpenAI SDK calls the proxy's `/v1/chat/completions` path, which AI Gateway forwards to OpenAI unchanged. Use it with any OpenAI provider and, with a different `base_url`, with any OpenAI-compatible provider (vLLM, Ollama, LM Studio, Together, Groq, OpenRouter).
--

Anthropic SDK::
+
--
[source,python]
----
import os
from anthropic import Anthropic

client = Anthropic(
base_url=os.environ["PROXY_URL"], # .../llm/v1/providers/my-anthropic
auth_token=os.environ["AUTH_TOKEN"], # rpk ai or OIDC access token
auth_token=os.environ["AUTH_TOKEN"], # OIDC access token
)

message = client.messages.create(
@@ -229,11 +277,13 @@ message = client.messages.create(
)
print(message.content[0].text)
----
+

The Anthropic SDK hits `/v1/messages` on the proxy, which AI Gateway forwards to Anthropic. If the provider is configured with *Auth passthrough*, send your own Anthropic `Authorization` header instead of an `auth_token`. AI Gateway forwards it unchanged.
--

Google Gemini SDK::
+
--
[source,python]
----
import os
@@ -250,16 +300,18 @@ response = client.models.generate_content(
)
print(response.text)
----
+

[IMPORTANT]
====
Gemini authenticates with the `x-goog-api-key` header, not `Authorization: Bearer`. Most Google SDKs set `x-goog-api-key` automatically from the `api_key` parameter. If you hand-roll the request, set the header yourself.
====
--

AWS Bedrock::
+
Bedrock is different: SigV4 signing is performed *server-side* by AI Gateway using the credentials on the provider. Your client only needs to call the proxy URL with an `rpk ai` or OIDC token.
+
--
Bedrock is different: SigV4 signing is performed *server-side* by AI Gateway using the credentials on the provider. Your client only needs to call the proxy URL with an OIDC access token.

[source,python]
----
import os, httpx
@@ -278,14 +330,16 @@ response = httpx.post(
print(response.json())
----

See xref:configure-provider.adoc#bedrock-inference-profiles[the Bedrock provider reference] for inference-profile selection guidance.
+
See xref:ai-gateway:configure-provider.adoc#bedrock-inference-profiles[the Bedrock provider reference] for inference-profile selection guidance.

TIP: Bedrock's `Converse` API works the same way: send to `/model/\{MODEL_ID}/converse` with a Converse-shaped body. Or use the AWS SDK's `bedrockruntime` client and set its `BaseEndpoint` to the proxy URL; the SDK signs the request, AI Gateway re-signs server-side with the provider's credentials, and your client never sees AWS keys.
--

OpenAI-compatible::
+
--
Use the OpenAI SDK with the proxy URL of the OpenAI-compatible provider and whatever model identifier the upstream exposes:
+

[source,python]
----
import os
@@ -301,6 +355,7 @@ response = client.chat.completions.create(
messages=[{"role": "user", "content": "Hello"}],
)
----
--
======

[NOTE]
@@ -354,18 +409,18 @@ AI Gateway returns standard HTTP status codes. The upstream provider's error bod

== Best practices

* *Use environment variables* for the proxy URL and token; never hard-code them.
* *Wrap `rpk ai auth token`* in a script or shell function so refresh is invisible to your SDK code.
* *Implement retry with exponential backoff* for 5xx and timeout conditions.
* *Respect `Retry-After`* on 429 responses.
* *Rotate service account credentials* on a schedule your organization accepts.
* *Observe usage* through the ADP UI on each provider's detail page. A *Cost & usage* section is in development (the UI shows a "Coming soon" placeholder today).
* Use environment variables for the proxy URL and token. Never hard-code them.
* Refresh OIDC tokens through your client library so refresh is invisible to your SDK code (`authlib` for Python, `openid-client` for Node.js, etc.).
* Implement retry with exponential backoff for 5xx and timeout conditions.
* Respect `Retry-After` on 429 responses.
* Rotate service account credentials on a schedule your organization accepts.
* Observe usage through the ADP UI on each provider's detail page. A *Cost & usage* section is in development (the UI shows a "Coming soon" placeholder today).
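The two retry bullets can be sketched transport-agnostically. This is an illustrative pattern, not gateway-specific behavior; `send` is any zero-argument callable returning an object with `.status_code` and `.headers` (an httpx or requests response fits):

```python
import random
import time

def backoff_delay(attempt, retry_after=None):
    """Delay before retry `attempt` (0-based). A server-supplied
    Retry-After hint wins; otherwise exponential backoff with jitter."""
    if retry_after is not None:
        return float(retry_after)
    return float(2 ** attempt) + random.uniform(0, 1)

def send_with_retry(send, max_attempts=5):
    """Call send() until it returns a non-retryable status."""
    for attempt in range(max_attempts):
        response = send()
        if response.status_code == 429:
            # Respect Retry-After on 429 responses.
            time.sleep(backoff_delay(attempt, response.headers.get("Retry-After")))
        elif response.status_code >= 500:
            # Exponential backoff for 5xx: 1s, 2s, 4s, ... plus jitter.
            time.sleep(backoff_delay(attempt))
        else:
            return response
    raise RuntimeError(f"request failed after {max_attempts} attempts")
```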

== Troubleshooting

=== 401 Unauthorized

* If you're using `rpk ai`: rerun `rpk ai auth login` to refresh the session, then `rpk ai auth token` to mint a new token.
* If you're using `rpk ai`: rerun `rpk cloud login` to refresh the cached cloud token. Token expiry surfaces as a 401 with this hint in the error.
* If you're using OIDC client credentials: check the token hasn't expired and refresh it. Verify the audience is `cloudv2-production.redpanda.cloud` and the `Authorization` header is formatted `Bearer <token>`.
* For Gemini: ensure the token is sent as `x-goog-api-key`, not `Authorization`.
* For Anthropic with passthrough: ensure the client is sending a valid Anthropic `Authorization` header.
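While debugging a 401, it can help to inspect the token's `aud` and `exp` claims locally. The sketch below decodes the JWT payload without verifying the signature, so use it only for inspection, never for trust decisions:

```python
import base64
import json

def jwt_claims(token):
    """Decode a JWT's payload (middle segment) without signature
    verification -- enough to check `aud` and `exp` while debugging."""
    payload = token.split(".")[1]
    # Restore the base64url padding that JWTs strip.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))
```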
@@ -388,4 +443,4 @@ AI Gateway returns standard HTTP status codes. The upstream provider's error bod

== Next steps

* xref:configure-provider.adoc[Configure an LLM provider]
* xref:ai-gateway:configure-provider.adoc[Configure an LLM provider]
2 changes: 1 addition & 1 deletion modules/ai-gateway/pages/gateway-quickstart.adoc
@@ -529,6 +529,6 @@ const openai = new OpenAI({

* xref:routing-cel.adoc[]
* xref:aggregation.adoc[]
* xref:integrations/index.adoc[]
* xref:integrations:index.adoc[]
* xref:gateway-architecture.adoc[]
* xref:overview.adoc[]
10 changes: 5 additions & 5 deletions modules/ai-gateway/pages/overview.adoc
@@ -46,7 +46,7 @@ Use the provider's own SDK: OpenAI, Anthropic, Google AI, AWS Bedrock, or any Op

=== Managed authentication

Applications authenticate to ADP with OIDC service accounts instead of long-lived provider API keys. Service accounts use the same role and audit model as every other ADP resource, and mint short-lived tokens that are easy to revoke. The recommended local flow uses the `rpk ai` plugin for token refresh; CI and programmatic clients use the OIDC client-credentials grant directly. See xref:connect-agent.adoc[Connect your agent].
Applications authenticate to ADP with OIDC service accounts instead of long-lived provider API keys. Service accounts use the same role and audit model as every other ADP resource, and mint short-lived tokens that are easy to revoke. For local command-line workflows, use `rpk cloud login` to authenticate and `rpk ai` to talk to the gateway. CI and programmatic clients use the OIDC client-credentials grant directly. See xref:ai-gateway:connect-agent.adoc[Connect your agent].

=== Per-provider observability

@@ -78,7 +78,7 @@ AI Gateway supports five provider types. The UI labels and short descriptions ma
|Call Claude Opus, Sonnet, and Haiku directly. Optionally forwards the client's `Authorization` header for enterprise and Max-plan subscription passthrough.

|*Google AI*
|Reach Gemini Pro, Flash, and multimodal models via Google AI Studio. Ideal for long-context workloads and image/video inputs.
|Reach Gemini Pro, Flash, and multimodal models through Google AI Studio. Ideal for long-context workloads and image/video inputs.

|*AWS Bedrock*
|Invoke foundation models (Claude, Llama, Titan, Nova) hosted inside your AWS account. Use when data residency, IAM, or VPC egress matter more than raw feature parity. Signed with SigV4 server-side by AI Gateway.
@@ -87,7 +87,7 @@
|Point at any OpenAI-compatible endpoint (vLLM, Ollama, LM Studio, LocalAI, Together, Groq, OpenRouter). Useful for self-hosted models and aggregator gateways that ship `/v1/chat/completions`.
|===

See xref:configure-provider.adoc[Configure an LLM provider] for the full form reference for each type.
See xref:ai-gateway:configure-provider.adoc[Configure an LLM provider] for the full form reference for each type.

== When to use AI Gateway

@@ -116,5 +116,5 @@ AI Gateway does not provide these capabilities. For current status, consult the

== Next steps

. xref:configure-provider.adoc[Configure an LLM provider]
. xref:connect-agent.adoc[Connect your agent]
. xref:ai-gateway:configure-provider.adoc[Configure an LLM provider]
. xref:ai-gateway:connect-agent.adoc[Connect your agent]
4 changes: 2 additions & 2 deletions modules/governance/pages/budgets.adoc
@@ -61,7 +61,7 @@ Some guardrail evaluators call an LLM to do their work. A toxicity classifier, f

Guardrail evaluator cost surfaces in the same spending pipeline as user-facing LLM calls. The evaluator's cost is attributed to the *evaluator's configured upstream provider* — usually a small classifier model, separate from the user-facing LLM — so per-provider breakdowns separate the two automatically.

For the per-evaluator cost model and how it interacts with the dashboard's spend view, see xref:governance:guardrails.adoc[Configure guardrails].
For the per-evaluator cost model and how it interacts with the dashboard's spend view, see xref:governance:guardrails/index.adoc[Configure guardrails].

// TODO: confirm with eng that guardrail evaluator cost flows into the same SpendingService as user-facing LLM cost (vs. a separate stream). Open Q A3 in the companion plan, also flagged on the Guardrails plan.

@@ -87,7 +87,7 @@ Cap-management arrives after GA per the Governance V0 PRD. The planned feature s
* *Alert hooks* — webhook, email, or chat notifications when a cap is approached or exceeded.
* *Multi-tenant cap-setting* — per-tenant caps with override semantics.

Until those features ship, treat the dashboard and breakdown queries as your visibility layer and use platform-level guardrails (xref:governance:guardrails.adoc[Configure guardrails]) for selective request blocking.
Until those features ship, treat the dashboard and breakdown queries as your visibility layer and use platform-level guardrails (xref:governance:guardrails/index.adoc[Configure guardrails]) for selective request blocking.

// TODO: once the cap-management surface lands, replace this section with a forward link to the configuration how-to. If cap-management content grows beyond a single section, split this page into a sub-folder. Open Q C1 in the companion plan.
