Commit 6a94484

Merge branch 'DOC-1867-Document-feature-AI-Gateway-help-cloud-team-polish-clean-up' into adp-pkg1
# Conflicts:
#	modules/ai-agents/pages/mcp/remote/tool-patterns.adoc
2 parents 8eddc98 + 7a8a65c commit 6a94484

30 files changed

Lines changed: 974 additions & 1356 deletions

modules/ROOT/nav.adoc

Lines changed: 16 additions & 20 deletions
@@ -64,39 +64,35 @@
6464
*** xref:ai-agents:observability/ingest-custom-traces.adoc[Ingest Traces from Custom Agents]
6565
** xref:ai-agents:ai-gateway/index.adoc[AI Gateway]
6666
*** xref:ai-agents:ai-gateway/what-is-ai-gateway.adoc[Overview]
67-
*** xref:ai-agents:ai-gateway/gateway-modes.adoc[Gateway Modes]
6867
*** xref:ai-agents:ai-gateway/gateway-quickstart.adoc[Quickstart]
6968
*** xref:ai-agents:ai-gateway/gateway-architecture.adoc[Architecture]
7069
*** For Administrators
7170
**** xref:ai-agents:ai-gateway/admin/setup-guide.adoc[Setup Guide]
72-
**** xref:ai-agents:ai-gateway/admin/configure-ai-hub.adoc[Configure AI Hub Gateway]
73-
**** xref:ai-agents:ai-gateway/admin/eject-to-custom-mode.adoc[Eject to Custom Mode]
7471
*** For Builders
7572
**** xref:ai-agents:ai-gateway/builders/discover-gateways.adoc[Discover Gateways]
76-
**** xref:ai-agents:ai-gateway/builders/use-ai-hub-gateway.adoc[Use AI Hub Gateway]
7773
**** xref:ai-agents:ai-gateway/builders/connect-your-agent.adoc[Connect Your Agent]
7874
**** xref:ai-agents:ai-gateway/cel-routing-cookbook.adoc[CEL Routing Patterns]
7975
**** xref:ai-agents:ai-gateway/mcp-aggregation-guide.adoc[MCP Aggregation]
8076
//*** Observability
8177
//**** xref:ai-agents:ai-gateway/observability-logs.adoc[Request Logs]
8278
//**** xref:ai-agents:ai-gateway/observability-metrics.adoc[Metrics and Analytics]
8379
//*** xref:ai-agents:ai-gateway/migration-guide.adoc[Migrate]
84-
*** xref:ai-agents:ai-gateway/integrations/index.adoc[Integrations]
85-
**** Claude Code
86-
***** xref:ai-agents:ai-gateway/integrations/claude-code-admin.adoc[Admin Guide]
87-
***** xref:ai-agents:ai-gateway/integrations/claude-code-user.adoc[User Guide]
88-
**** Cline
89-
***** xref:ai-agents:ai-gateway/integrations/cline-admin.adoc[Admin Guide]
90-
***** xref:ai-agents:ai-gateway/integrations/cline-user.adoc[User Guide]
91-
**** Continue.dev
92-
***** xref:ai-agents:ai-gateway/integrations/continue-admin.adoc[Admin Guide]
93-
***** xref:ai-agents:ai-gateway/integrations/continue-user.adoc[User Guide]
94-
**** Cursor IDE
95-
***** xref:ai-agents:ai-gateway/integrations/cursor-admin.adoc[Admin Guide]
96-
***** xref:ai-agents:ai-gateway/integrations/cursor-user.adoc[User Guide]
97-
**** GitHub Copilot
98-
***** xref:ai-agents:ai-gateway/integrations/github-copilot-admin.adoc[Admin Guide]
99-
***** xref:ai-agents:ai-gateway/integrations/github-copilot-user.adoc[User Guide]
80+
//*** xref:ai-agents:ai-gateway/integrations/index.adoc[Integrations]
81+
//**** Claude Code
82+
//***** xref:ai-agents:ai-gateway/integrations/claude-code-admin.adoc[Admin Guide]
83+
//***** xref:ai-agents:ai-gateway/integrations/claude-code-user.adoc[User Guide]
84+
//**** Cline
85+
//***** xref:ai-agents:ai-gateway/integrations/cline-admin.adoc[Admin Guide]
86+
//***** xref:ai-agents:ai-gateway/integrations/cline-user.adoc[User Guide]
87+
//**** Continue.dev
88+
//***** xref:ai-agents:ai-gateway/integrations/continue-admin.adoc[Admin Guide]
89+
//***** xref:ai-agents:ai-gateway/integrations/continue-user.adoc[User Guide]
90+
//**** Cursor IDE
91+
//***** xref:ai-agents:ai-gateway/integrations/cursor-admin.adoc[Admin Guide]
92+
//***** xref:ai-agents:ai-gateway/integrations/cursor-user.adoc[User Guide]
93+
//**** GitHub Copilot
94+
//***** xref:ai-agents:ai-gateway/integrations/github-copilot-admin.adoc[Admin Guide]
95+
//***** xref:ai-agents:ai-gateway/integrations/github-copilot-user.adoc[User Guide]
10096
10197
* xref:develop:connect/about.adoc[Redpanda Connect]
10298
** xref:develop:connect/connect-quickstart.adoc[Quickstart]

modules/ai-agents/pages/ai-gateway/admin/setup-guide.adoc

Lines changed: 104 additions & 65 deletions
@@ -8,7 +8,7 @@
88

99
include::ai-agents:partial$ai-gateway-byoc-note.adoc[]
1010

11-
This guide walks administrators through the complete setup process for AI Gateway, from enabling LLM providers to configuring routing policies and MCP tool aggregation.
11+
This guide walks administrators through the setup process for AI Gateway, from enabling LLM providers to configuring routing policies and MCP tool aggregation.
1212

1313
After completing this guide, you will be able to:
1414

@@ -26,7 +26,7 @@ After completing this guide, you will be able to:
2626

2727
Providers represent upstream services (Anthropic, OpenAI, Google AI) and associated credentials. Providers are disabled by default and must be enabled explicitly by an administrator.
2828

29-
. In the Redpanda Cloud Console, navigate to *AI Gateway* → *Providers*.
29+
. In the Redpanda Cloud Console, navigate to *Agentic AI* → *Providers*.
3030
. Select a provider (for example, Anthropic).
3131
. On the Configuration tab for the provider, click *Add configuration*.
3232
. Enter your API Key for the provider.
@@ -43,17 +43,15 @@ The model catalog is the set of models made available through the gateway. Model
4343

4444
The infrastructure that serves the model differs based on the provider you select. For example, OpenAI has different reliability and availability metrics than Anthropic. When you consider all metrics, you can design your gateway to use different providers for different use cases.
4545

46-
. Navigate to *AI Gateway* → *Models*.
46+
. Navigate to *Agentic AI* → *Models*.
4747
. Review the list of available models from enabled providers.
48-
. For each model you want to expose through gateways, toggle it to *Enabled*.
49-
+
50-
Common models to enable:
48+
. For each model you want to expose through gateways, toggle it to *Enabled*. For example:
5149
+
5250
--
53-
* `openai/gpt-4o` - OpenAI's most capable model
54-
* `openai/gpt-4o-mini` - Cost-effective OpenAI model
55-
* `anthropic/claude-sonnet-3.5` - Balanced Anthropic model
56-
* `anthropic/claude-opus-4` - Anthropic's most capable model
51+
* `openai/gpt-5.2`
52+
* `openai/gpt-5.2-mini`
53+
* `anthropic/claude-sonnet-4.5`
54+
* `anthropic/claude-opus-4.6`
5755
--
5856

5957
. Click *Save changes*.
@@ -62,14 +60,13 @@ Only enabled models will be accessible through gateways. You can enable or disab
6260

6361
=== Model naming convention
6462

65-
Model requests must use the `vendor/model_id` format in the model property of the request body. This format allows AI Gateway to route requests to the appropriate provider.
66-
67-
Examples:
63+
Model requests must use the `vendor/model_id` format in the model property of the request body. This format allows AI Gateway to route requests to the appropriate provider. For example:
6864

69-
* `openai/gpt-4o`
70-
* `anthropic/claude-sonnet-3.5`
71-
* `openai/gpt-4o-mini`
65+
* `openai/gpt-5.2`
66+
* `anthropic/claude-sonnet-4.5`
67+
* `openai/gpt-5.2-mini`
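
As a minimal sketch of the `vendor/model_id` convention, the Python below splits a model string into its vendor and model ID. The helper name `split_model` is hypothetical and is not part of AI Gateway; it only illustrates the format that requests are expected to follow.

```python
def split_model(model: str) -> tuple[str, str]:
    """Split a 'vendor/model_id' string into (vendor, model_id).

    Hypothetical helper for illustration; not an AI Gateway API.
    """
    # partition() splits on the first "/" only, so the model ID keeps any later slashes.
    vendor, sep, model_id = model.partition("/")
    if not sep or not vendor or not model_id:
        raise ValueError(f"expected 'vendor/model_id', got {model!r}")
    return vendor, model_id
```

For example, `split_model("openai/gpt-5.2")` yields the vendor `openai` and the model ID `gpt-5.2`, while a bare `gpt-5.2` raises `ValueError`.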
7268

69+
ifdef::ai-hub-available[]
7370
== Choose a gateway mode
7471

7572
Before creating a gateway, decide which mode fits your needs.
@@ -102,12 +99,13 @@ For detailed comparison, see xref:ai-gateway/gateway-modes.adoc[].
10299

103100
* *AI Hub Mode*: See xref:ai-gateway/admin/configure-ai-hub.adoc[] for setup instructions
104101
* *Custom Mode*: Continue with "Create a gateway" below for manual configuration
102+
endif::[]
105103

106104
== Create a gateway
107105

108106
A gateway is a logical configuration boundary (policies + routing + observability) on top of a single deployment. It's a "virtual gateway" that you can create per team, environment (staging/production), product, or customer.
109107

110-
. Navigate to *AI Gateway* → *Gateways*.
108+
. Navigate to *Agentic AI* → *Gateways*.
111109
. Click *Create Gateway*.
112110
. Configure the gateway:
113111
+
@@ -126,11 +124,12 @@ TIP: A workspace is conceptually similar to a resource group in Redpanda streami
126124
. After creation, note the following information:
127125
+
128126
--
129-
* *Gateway ID*: Unique identifier (for example, `gw_abc123`) - users include this in the `rp-aigw-id` header
130-
* *Gateway Endpoint*: Base URL for API requests (for example, `https://gw.ai.panda.com`)
127+
* *Gateway endpoint*: URL for API requests (for example, `https://example/gateways/d633lffcc16s73ct95mg/v1`)
128+
+
129+
The gateway ID is embedded in the URL.
131130
--
132131

133-
You'll share the Gateway ID and Endpoint with users who need to access this gateway.
132+
You'll share the gateway endpoint with users who need to access this gateway.
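
Because the gateway ID is embedded in the endpoint URL, a client that still needs the ID (for logging, for example) can recover it from the path. The following is a sketch only: it assumes the `.../gateways/<gateway-id>/v1` layout shown in the example URL, and `gateway_id_from_endpoint` is a hypothetical helper name.

```python
from urllib.parse import urlparse


def gateway_id_from_endpoint(endpoint: str) -> str:
    """Return the path segment that follows 'gateways' in the endpoint URL.

    Assumes the .../gateways/<gateway-id>/v1 layout from the example URL.
    """
    segments = [s for s in urlparse(endpoint).path.split("/") if s]
    try:
        return segments[segments.index("gateways") + 1]
    except (ValueError, IndexError):
        # "gateways" is missing, or nothing follows it in the path.
        raise ValueError(f"no gateway ID found in {endpoint!r}") from None
```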
134133

135134
== Configure LLM routing
136135

@@ -188,7 +187,7 @@ Provider pools define which LLM providers handle requests, with support for prim
188187
--
189188
* *Name*: For example, `primary-anthropic`
190189
* *Providers*: Select one or more providers (for example, Anthropic)
191-
* *Models*: Choose which models to include (for example, `anthropic/claude-sonnet-3.5`)
190+
* *Models*: Choose which models to include (for example, `anthropic/claude-sonnet-4.5`)
192191
* *Load balancing*: If multiple providers are selected, choose a distribution strategy (for example, round-robin or weighted)
193192
--
194193

@@ -197,7 +196,7 @@ Provider pools define which LLM providers handle requests, with support for prim
197196
--
198197
* *Name*: For example, `fallback-openai`
199198
* *Providers*: Select fallback provider (for example, OpenAI)
200-
* *Models*: Choose fallback models (for example, `openai/gpt-4o`)
199+
* *Models*: Choose fallback models (for example, `openai/gpt-5.2`)
201200
* *Trigger conditions*: When to activate fallback:
202201
** Rate limit exceeded (429 from primary)
203202
** Timeout (primary provider slow)
@@ -215,8 +214,8 @@ Example CEL expression for tier-based routing:
215214
[source,cel]
216215
----
217216
request.headers["x-user-tier"] == "premium"
218-
? "anthropic/claude-opus-4"
219-
: "anthropic/claude-sonnet-3.5"
217+
? "anthropic/claude-opus-4.6"
218+
: "anthropic/claude-sonnet-4.5"
220219
----
221220

222221
. Click *Save routing configuration*.
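
To reason about the tier-based CEL expression before saving it, the same decision can be mirrored in plain Python. This is an illustrative sketch, not how the gateway evaluates CEL: `route_by_tier` is a hypothetical name, and the sketch falls back to the Sonnet model when the `x-user-tier` header is missing, which is an assumption about the desired behavior.

```python
def route_by_tier(headers: dict[str, str]) -> str:
    """Mirror the CEL ternary: premium users get Opus, everyone else Sonnet."""
    # .get() returns None for a missing header, so such requests take the else branch.
    if headers.get("x-user-tier") == "premium":
        return "anthropic/claude-opus-4.6"
    return "anthropic/claude-sonnet-4.5"
```

Running a few representative header sets through a mirror like this is a cheap way to confirm which model each user tier should reach.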
@@ -227,49 +226,78 @@ TIP: Provider pool (UI) = Backend pool (API)
227226

228227
If a provider pool contains multiple providers, you can distribute traffic to balance load or optimize for cost/performance:
229228

230-
* *Round-robin*: Distribute evenly across all providers
231-
* *Weighted*: Assign weights (for example, 80% to Anthropic, 20% to OpenAI)
232-
* *Least latency*: Route to fastest provider based on recent performance
233-
* *Cost-optimized*: Route to cheapest provider for each model
229+
* Round-robin: Distribute evenly across all providers
230+
* Weighted: Assign weights (for example, 80% to Anthropic, 20% to OpenAI)
231+
* Least latency: Route to fastest provider based on recent performance
232+
* Cost-optimized: Route to cheapest provider for each model
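
The weighted strategy can be pictured as a random choice in proportion to each provider's weight. The sketch below simulates an 80/20 split with Python's standard `random` module; it does not describe the gateway's actual implementation, and `pick_provider` is a hypothetical name.

```python
import random


def pick_provider(weights: dict[str, float], rng: random.Random) -> str:
    """Pick one provider with probability proportional to its weight."""
    providers = list(weights)
    return rng.choices(providers, weights=[weights[p] for p in providers], k=1)[0]


# Simulate 10,000 requests with a seeded generator so the run is repeatable.
rng = random.Random(0)
weights = {"anthropic": 80, "openai": 20}
counts = {"anthropic": 0, "openai": 0}
for _ in range(10_000):
    counts[pick_provider(weights, rng)] += 1
```

Over many requests, the share routed to each provider approaches its configured weight, although any short window of traffic can deviate from it.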
234233

235234
== Configure MCP tools (optional)
236235

237236
If your users will build glossterm:AI agent[,AI agents] that need access to glossterm:MCP tool[,tools] via glossterm:MCP[,Model Context Protocol (MCP)], configure MCP tool aggregation.
238237

239238
On the gateway details page, select the *MCP* tab to configure tool discovery and execution. The MCP proxy aggregates multiple glossterm:MCP server[,MCP servers], allowing agents to find and call tools through a single endpoint.
240239

240+
=== Configure MCP rate limits
241+
242+
Rate limits for MCP work the same way as LLM rate limits.
243+
244+
. In the *MCP* tab, locate the *Rate Limit* section.
245+
. Click *Add rate limit*.
246+
. Configure the maximum requests per second and optional burst allowance.
247+
. Click *Save*.
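
A requests-per-second limit with a burst allowance is commonly modeled as a token bucket. The sketch below shows that model under stated assumptions: this page does not say which algorithm AI Gateway uses, and the `TokenBucket` class is hypothetical. Time is passed in explicitly so the behavior is easy to follow.

```python
class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `burst`."""

    def __init__(self, rate: float, burst: int) -> None:
        self.rate = rate
        self.capacity = burst
        self.tokens = float(burst)  # start with a full bucket
        self.last = 0.0  # timestamp of the previous call, in seconds

    def allow(self, now: float) -> bool:
        # Refill tokens for the elapsed time, capped at the burst capacity.
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With `rate=2` and `burst=3`, three immediate requests pass, a fourth is rejected, and one more is admitted after half a second once a token has been refilled.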
248+
241249
=== Add MCP servers
242250

243-
. In the *MCP* tab, click *Add MCP server*.
251+
. In the *MCP* tab, click *Create MCP Server*.
244252
. Configure the server:
245253
+
246254
--
247-
* *Server name*: Human-readable identifier (for example, `database-server`, `slack-server`)
248-
* *Server URL*: Endpoint for the MCP server (for example, `https://mcp-database.example.com`)
249-
* *Authentication*: Configure authentication if required (bearer token, API key, mTLS)
250-
* *Enabled tools*: Select which tools from this server to expose (or *All tools*)
255+
* *Server ID*: Unique identifier for this server
256+
* *Display Name*: Human-readable name (for example, `database-server`, `slack-server`)
257+
* *Server Address*: Endpoint URL for the MCP server (for example, `https://mcp-database.example.com`)
251258
--
252259

253-
. Click *Test connection* to verify connectivity.
254-
. Click *Save* to add the server to this gateway.
260+
. Configure server settings:
261+
+
262+
--
263+
* *Timeout (seconds)*: Maximum time to wait for a response from this server
264+
* *Enabled*: Whether this server is active and accepting requests
265+
* *Defer Loading Override*: Controls whether tools from this server are loaded upfront or on demand
266+
+
267+
[cols="1,2"]
268+
|===
269+
|Option |Description
255270

256-
Repeat for each MCP server you want to aggregate.
271+
|Inherit from gateway
272+
|Use the gateway-level deferred loading setting (default)
257273

258-
=== Configure deferred tool loading
274+
|Enabled
275+
|Always defer loading from this server. Agents receive only a search tool initially and query for specific tools when needed. This can reduce token usage by 80-90%.
259276

260-
Deferred tool loading dramatically reduces token costs by initially exposing only a search tool and orchestrator, rather than listing all available tools.
277+
|Disabled
278+
|Always load all tools from this server upfront.
279+
|===
261280

262-
. In the *MCP* tab, locate *Deferred Loading*.
263-
. Toggle *Enable deferred tool loading* to *On*.
264-
. Configure behavior:
281+
* *Forward OIDC Token Override*: Controls whether the client's OIDC token is forwarded to this MCP server
265282
+
266-
--
267-
* *Initially expose*: Search tool + orchestrator only
268-
* *Load on demand*: Tools are retrieved when agents query for them
269-
* *Token savings*: Expect 80-90% reduction in token usage for tool definitions
283+
[cols="1,2"]
284+
|===
285+
|Option |Description
286+
287+
|Inherit from gateway
288+
|Use the gateway-level OIDC forwarding setting (default)
289+
290+
|Enabled
291+
|Always forward the OIDC token to this server
292+
293+
|Disabled
294+
|Never forward the OIDC token to this server
295+
|===
270296
--
271297

272-
. Click *Save*.
298+
. Click *Save* to add the server to this gateway.
299+
300+
Repeat for each MCP server you want to aggregate.
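
The *Defer Loading Override* and *Forward OIDC Token Override* options both follow the same three-way pattern: inherit the gateway-level setting, or force it on or off for one server. A minimal sketch of that resolution logic, with hypothetical names (`Override`, `effective_setting`) that are not part of the product:

```python
from enum import Enum


class Override(Enum):
    INHERIT = "inherit"
    ENABLED = "enabled"
    DISABLED = "disabled"


def effective_setting(server_override: Override, gateway_default: bool) -> bool:
    """Resolve a per-server override against the gateway-level setting."""
    if server_override is Override.INHERIT:
        return gateway_default
    # An explicit override wins regardless of the gateway-level value.
    return server_override is Override.ENABLED
```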
273301

274302
See xref:ai-gateway/mcp-aggregation-guide.adoc[] for detailed information about MCP aggregation.
275303

@@ -279,11 +307,24 @@ The MCP orchestrator is a built-in MCP server that enables programmatic tool cal
279307

280308
Example: A workflow requiring 47 file reads can be reduced from 49 round trips to just 1 round trip using the orchestrator.
281309

282-
The orchestrator is enabled by default when you enable MCP tools. You can configure:
310+
The orchestrator is pre-configured when you initialize the MCP gateway. Its server configuration (Server ID, Display Name, Transport, Command, and Timeout) is system-managed and cannot be modified.
283311

284-
* *Execution timeout*: Maximum time for orchestrator workflows (for example, 30 seconds)
285-
* *Memory limit*: Maximum memory for JavaScript execution (for example, 128MB)
286-
* *Allowed operations*: Restrict which MCP tools can be called from orchestrator workflows
312+
You can configure blocked tool patterns to prevent specific tools from being called through the orchestrator:
313+
314+
. In the *MCP* tab, select the orchestrator server to edit it.
315+
. Under *Blocked Tools*, click *Add Pattern* to add glob patterns for tools that should be blocked from execution.
316+
+
317+
Example patterns:
318+
+
319+
--
320+
* `server_id:*` - Block all tools from a specific server
321+
* `*:dangerous_tool` - Block a specific tool across all servers
322+
* `specific:tool` - Block a single tool on a specific server
323+
--
324+
+
325+
NOTE: The orchestrator's own tools are blocked by default to prevent recursive execution.
326+
327+
. Click *Save*.
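
To preview which tools a set of blocked patterns would catch, the patterns can be tested locally with Python's `fnmatch` module. This is an approximation: it assumes tools are matched as `server_id:tool_name` and that the gateway's glob semantics resemble `fnmatch`, neither of which is confirmed here. The server and tool names are made up.

```python
from fnmatch import fnmatchcase


def is_blocked(server_id: str, tool: str, patterns: list[str]) -> bool:
    """Return True if 'server_id:tool' matches any blocked glob pattern."""
    name = f"{server_id}:{tool}"
    return any(fnmatchcase(name, pattern) for pattern in patterns)


# Hypothetical patterns: every tool on the "files" server, plus one tool on any server.
patterns = ["files:*", "*:dangerous_tool"]
```

Note that in `fnmatch`, `*` also matches the `:` character, so a pattern such as `*:dangerous_tool` may match more names locally than a stricter matcher would.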
287328

288329
== Verify your setup
289330

@@ -293,9 +334,8 @@ After completing the setup, verify that the gateway is working correctly:
293334

294335
[source,bash]
295336
----
296-
curl https://{GATEWAY_ENDPOINT}/v1/models \
297-
-H "Authorization: Bearer ${REDPANDA_CLOUD_TOKEN}" \
298-
-H "rp-aigw-id: ${GATEWAY_ID}"
337+
curl ${GATEWAY_ENDPOINT}/models \
338+
-H "Authorization: Bearer ${REDPANDA_CLOUD_TOKEN}"
299339
----
300340

301341
Expected result: List of enabled models.
@@ -304,37 +344,34 @@ Expected result: List of enabled models.
304344

305345
[source,bash]
306346
----
307-
curl https://{GATEWAY_ENDPOINT}/v1/chat/completions \
347+
curl ${GATEWAY_ENDPOINT}/chat/completions \
308348
-H "Authorization: Bearer ${REDPANDA_CLOUD_TOKEN}" \
309-
-H "rp-aigw-id: ${GATEWAY_ID}" \
310349
-H "Content-Type: application/json" \
311350
-d '{
312-
"model": "openai/gpt-4o-mini",
351+
"model": "openai/gpt-5.2-mini",
313352
"messages": [{"role": "user", "content": "Hello, AI Gateway!"}],
314353
"max_tokens": 50
315354
}'
316355
----
317356

318357
Expected result: Successful completion response.
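
The chat completion request above can also be built from Python using only the standard library. The sketch below constructs the request without sending it, so the URL, headers, and body can be inspected first. `build_chat_request` is a hypothetical helper, and the endpoint and token values are placeholders to be supplied by the caller.

```python
import json
import urllib.request


def build_chat_request(endpoint: str, token: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST to <endpoint>/chat/completions."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 50,
    }
    return urllib.request.Request(
        url=f"{endpoint.rstrip('/')}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass the returned object to `urllib.request.urlopen` and decode the JSON response.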
319358

320-
=== Check observability
359+
=== Check the gateway overview
321360

322-
. Navigate to *AI Gateway* → *Gateways* → Select your gateway → *Analytics*.
323-
. Verify that your test request appears in the request logs.
324-
. Check metrics:
361+
. Navigate to *Gateways* → Select your gateway → *Overview*.
362+
. Check the aggregate metrics to verify your test request was processed:
325363
+
326364
--
327-
* Request count: Should show your test request
328-
* Token usage: Should show tokens consumed
329-
* Estimated cost: Should show calculated cost
365+
* Total Requests: Should have incremented
366+
* Total Tokens: Should show tokens consumed
367+
* Total Cost: Should show estimated cost
330368
--
331369

332370
== Share access with users
333371

334372
Now that your gateway is configured, share access with users (builders):
335373

336-
. Provide the *Gateway ID* (for example, `gw_abc123`)
337-
. Provide the *Gateway Endpoint* (for example, `https://gw.ai.panda.com`)
374+
. Provide the *Gateway endpoint* (for example, `https://example/gateways/gw_abc123/v1`)
338375
. Share API credentials (Redpanda Cloud tokens with appropriate permissions)
339376
. (Optional) Document available models and any routing policies
340377
. (Optional) Share rate limits and budget information
@@ -352,6 +389,8 @@ Users can then discover and connect to the gateway using the information provide
352389
//*Monitor and observe:*
353390
//
354391

392+
ifdef::integrations-available[]
355393
*Integrate tools:*
356394

357395
* xref:ai-gateway/integrations/index.adoc[Integrations] - Admin guides for Claude Code, Cursor, and other tools
396+
endif::[]
