
Commit 83ca90e

Merge branch 'DOC-1867-Document-feature-AI-Gateway-help-cloud-team-polish-clean-up' into adp-pkg1
# Conflicts:
#	modules/ROOT/nav.adoc
2 parents: 5c20723 + 44c4528

26 files changed

Lines changed: 14173 additions & 9 deletions

modules/ROOT/nav.adoc

Lines changed: 37 additions & 3 deletions
@@ -41,6 +41,37 @@
 **** xref:ai-agents:agents/integration-overview.adoc[Integration Patterns]
 **** xref:ai-agents:agents/pipeline-integration-patterns.adoc[Pipeline to Agent]
 **** xref:ai-agents:agents/a2a-concepts.adoc[A2A Protocol]
+** xref:ai-agents:ai-gateway/index.adoc[AI Gateway]
+*** xref:ai-agents:ai-gateway/what-is-ai-gateway.adoc[Overview]
+*** xref:ai-agents:ai-gateway/gateway-quickstart.adoc[Quickstart]
+*** xref:ai-agents:ai-gateway/gateway-architecture.adoc[Architecture]
+*** For Administrators
+**** xref:ai-agents:ai-gateway/admin/setup-guide.adoc[Setup Guide]
+*** For Builders
+**** xref:ai-agents:ai-gateway/builders/discover-gateways.adoc[Discover Gateways]
+**** xref:ai-agents:ai-gateway/builders/connect-your-agent.adoc[Connect Your Agent]
+**** xref:ai-agents:ai-gateway/cel-routing-cookbook.adoc[CEL Routing Patterns]
+**** xref:ai-agents:ai-gateway/mcp-aggregation-guide.adoc[MCP Aggregation]
+*** Observability
+**** xref:ai-agents:ai-gateway/observability-logs.adoc[Request Logs]
+**** xref:ai-agents:ai-gateway/observability-metrics.adoc[Metrics and Analytics]
+*** xref:ai-agents:ai-gateway/migration-guide.adoc[Migrate]
+*** xref:ai-agents:ai-gateway/integrations/index.adoc[Integrations]
+**** Claude Code
+***** xref:ai-agents:ai-gateway/integrations/claude-code-admin.adoc[Admin Guide]
+***** xref:ai-agents:ai-gateway/integrations/claude-code-user.adoc[User Guide]
+**** Cline
+***** xref:ai-agents:ai-gateway/integrations/cline-admin.adoc[Admin Guide]
+***** xref:ai-agents:ai-gateway/integrations/cline-user.adoc[User Guide]
+**** Continue.dev
+***** xref:ai-agents:ai-gateway/integrations/continue-admin.adoc[Admin Guide]
+***** xref:ai-agents:ai-gateway/integrations/continue-user.adoc[User Guide]
+**** Cursor IDE
+***** xref:ai-agents:ai-gateway/integrations/cursor-admin.adoc[Admin Guide]
+***** xref:ai-agents:ai-gateway/integrations/cursor-user.adoc[User Guide]
+**** GitHub Copilot
+***** xref:ai-agents:ai-gateway/integrations/github-copilot-admin.adoc[Admin Guide]
+***** xref:ai-agents:ai-gateway/integrations/github-copilot-user.adoc[User Guide]
 ** xref:ai-agents:mcp/index.adoc[MCP]
 *** xref:ai-agents:mcp/overview.adoc[Overview]
 *** xref:ai-agents:mcp/remote/index.adoc[Remote MCP]
@@ -50,10 +81,13 @@
 **** xref:ai-agents:mcp/remote/create-tool.adoc[Create a Tool]
 **** xref:ai-agents:mcp/remote/best-practices.adoc[Best Practices]
 **** xref:ai-agents:mcp/remote/tool-patterns.adoc[Tool Patterns]
-**** xref:ai-agents:mcp/remote/troubleshooting.adoc[Troubleshoot]
-**** xref:ai-agents:mcp/remote/manage-servers.adoc[Manage Servers]
+**** xref:ai-agents:mcp/remote/troubleshooting.adoc[Troubleshooting]
+**** xref:ai-agents:mcp/remote/admin-guide.adoc[Admin Guide]
+***** xref:ai-agents:mcp/remote/manage-servers.adoc[Manage Servers]
 **** xref:ai-agents:mcp/remote/monitor-mcp-servers.adoc[Monitor MCP Servers]
-**** xref:ai-agents:mcp/remote/scale-resources.adoc[Scale Resources]
+***** xref:ai-agents:mcp/remote/scale-resources.adoc[Scale Resources]
+***** xref:ai-agents:mcp/remote/monitor-activity.adoc[Monitor Activity]
+**** xref:ai-agents:mcp/remote/pipeline-patterns.adoc[MCP Server Patterns]
 *** xref:ai-agents:mcp/local/index.adoc[Redpanda Cloud Management MCP Server]
 **** xref:ai-agents:mcp/local/overview.adoc[Overview]
 **** xref:ai-agents:mcp/local/quickstart.adoc[Quickstart]
Lines changed: 326 additions & 0 deletions
@@ -0,0 +1,326 @@
= AI Gateway Setup Guide
:description: Complete setup guide for administrators to enable providers, configure models, create gateways, and set up routing policies.
:page-topic-type: how-to
:personas: platform_admin
:learning-objective-1: Enable LLM providers and models in the catalog
:learning-objective-2: Create and configure gateways with routing policies, rate limits, and spend limits
:learning-objective-3: Set up MCP tool aggregation for AI agents

include::ai-agents:partial$ai-gateway-byoc-note.adoc[]

This guide walks administrators through the complete setup process for AI Gateway, from enabling LLM providers to configuring routing policies and MCP tool aggregation.

After completing this guide, you will be able to:

* [ ] Enable LLM providers and models in the catalog
* [ ] Create and configure gateways with routing policies, rate limits, and spend limits
* [ ] Set up MCP tool aggregation for AI agents

== Prerequisites

* Access to the Redpanda Cloud Console with administrator privileges
* API keys for at least one LLM provider (OpenAI or Anthropic)
* (Optional) MCP server endpoints if you plan to use tool aggregation

== Enable a provider

Providers represent upstream services (Anthropic, OpenAI) and their associated credentials. Providers are disabled by default and must be explicitly enabled by an administrator.

. In the Redpanda Cloud Console, navigate to *AI Gateway* → *Providers*.
. Select a provider (for example, Anthropic or OpenAI).
. On the *Configuration* tab for the provider, click *Add configuration*.
. Enter your API key for the provider.
+
TIP: Store provider API keys securely. Each provider configuration can have multiple API keys for rotation and redundancy.

. Click *Save* to enable the provider.

Repeat this process for each LLM provider you want to make available through AI Gateway.

== Enable models

The model catalog is the set of models made available through the gateway. Models are disabled by default. After enabling a provider, you can enable its models.

The infrastructure that serves a model differs by provider. For example, OpenAI has different reliability and availability characteristics than Anthropic. Weigh these characteristics when you decide which providers serve which use cases.

. Navigate to *AI Gateway* → *Models*.
. Review the list of available models from enabled providers.
. For each model you want to expose through gateways, toggle it to *Enabled*.
+
Common models to enable:
+
--
* `openai/gpt-4o` - OpenAI's most capable model
* `openai/gpt-4o-mini` - Cost-effective OpenAI model
* `anthropic/claude-sonnet-3.5` - Balanced Anthropic model
* `anthropic/claude-opus-4` - Anthropic's most capable model
--

. Click *Save changes*.

Only enabled models are accessible through gateways. You can enable or disable models at any time without affecting existing gateways.

=== Model naming convention

Model requests must use the `vendor/model_id` format in the `model` property of the request body. This format allows AI Gateway to route each request to the appropriate provider.

Examples:

* `openai/gpt-4o`
* `anthropic/claude-sonnet-3.5`
* `openai/gpt-4o-mini`
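
A minimal client-side check of this naming convention can be sketched as follows. The `parse_model` helper is hypothetical, shown only to illustrate the format; it is not part of AI Gateway:

```python
def parse_model(model: str) -> tuple[str, str]:
    """Split a gateway model name into (vendor, model_id).

    AI Gateway expects the `vendor/model_id` form in the request
    body's `model` property; this is a hypothetical client-side check.
    """
    vendor, sep, model_id = model.partition("/")
    if not sep or not vendor or not model_id:
        raise ValueError(f"expected 'vendor/model_id', got {model!r}")
    return vendor, model_id

print(parse_model("openai/gpt-4o"))               # ('openai', 'gpt-4o')
print(parse_model("anthropic/claude-sonnet-3.5")) # ('anthropic', 'claude-sonnet-3.5')
```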

== Create a gateway

A gateway is a logical configuration boundary (policies + routing + observability) on top of a single deployment. It's a "virtual gateway" that you can create per team, environment (staging/production), product, or customer.

. Navigate to *AI Gateway* → *Gateways*.
. Click *Create Gateway*.
. Configure the gateway:
+
--
* *Name*: Choose a descriptive name (for example, `production-gateway`, `team-ml-gateway`, `staging-gateway`)
* *Workspace*: Select the workspace this gateway belongs to
+
TIP: A workspace is conceptually similar to a resource group in Redpanda streaming.
+
* *Description* (optional): Add context about this gateway's purpose
* *Tags* (optional): Add metadata for organization and filtering
--

. Click *Create*.

. After creation, note the following information:
+
--
* *Gateway ID*: Unique identifier (for example, `gw_abc123`) that users include in the `rp-aigw-id` header
* *Gateway Endpoint*: Base URL for API requests (for example, `https://gw.ai.panda.com`)
--

You'll share the Gateway ID and Endpoint with users who need to access this gateway.

== Configure LLM routing

On the gateway details page, select the *LLM* tab to configure rate limits, spend limits, routing, and provider pools with fallback options.

The LLM routing pipeline visually represents the request lifecycle:

. *Rate Limit*: Global rate limit (for example, 100 requests/second)
. *Spend Limit / Monthly Budget*: Monthly budget with blocking enforcement (for example, $15K/month)
. *Routing*: Primary provider pool with optional fallback provider pools
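
The pipeline order matters: a request is rejected by the rate limiter or the budget check before any routing happens. A toy sketch of that lifecycle (the status codes and callables are illustrative assumptions, not the gateway's actual API):

```python
def handle_request(request, rate_ok, within_budget, route):
    """Toy model of the LLM pipeline stages, applied in order:
    rate limit -> spend limit -> routing."""
    if not rate_ok(request):
        return (429, "rate limit exceeded")
    if not within_budget(request):
        return (402, "monthly budget exhausted")  # assumed code for Block mode
    return route(request)

result = handle_request({}, lambda r: False, lambda r: True, lambda r: (200, "routed"))
print(result)  # (429, 'rate limit exceeded')
```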

=== Configure rate limits

Rate limits control how many requests can be processed within a time window.

. In the *LLM* tab, locate the *Rate Limit* section.
. Click *Add rate limit*.
. Configure the limit:
+
--
* *Requests per second*: Maximum requests per second (for example, `100`)
* *Burst allowance* (optional): Allow temporary bursts above the limit
--

. Click *Save*.

Rate limits apply to all requests through this gateway, regardless of model or provider.
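
One common way to model a per-second limit with a burst allowance is a token bucket. The sketch below is a hypothetical illustration of the semantics, not the gateway's actual implementation:

```python
class TokenBucket:
    """Illustrative token bucket: sustain `rate` requests/second,
    tolerate up to `burst` extra requests in a spike."""

    def __init__(self, rate: float, burst: int):
        self.rate = rate                    # refill rate, tokens/second
        self.capacity = rate + burst        # bucket size including burst headroom
        self.tokens = float(self.capacity)
        self.last = 0.0                     # timestamp of the previous check

    def allow(self, now: float) -> bool:
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

# With rate=100 and burst=20, 120 back-to-back requests pass; the 121st is rejected.
bucket = TokenBucket(rate=100, burst=20)
results = [bucket.allow(0.0) for _ in range(121)]
print(results.count(True))  # 120
```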

=== Configure spend limits and budgets

Spend limits prevent runaway costs by blocking requests after a monthly budget is exceeded.

. In the *LLM* tab, locate the *Spend Limit* section.
. Click *Configure budget*.
. Set the budget:
+
--
* *Monthly budget*: Maximum spend per month (for example, `$15000`)
* *Enforcement*: Choose *Block* to reject requests after the budget is exceeded, or *Alert* to notify but allow requests
* *Notification threshold* (optional): Alert when a given percentage of the budget is consumed (for example, `80%`)
--

. Click *Save*.

Budget tracking uses estimated costs based on token usage and public provider pricing.
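
Putting the enforcement options together: with a $15,000 budget, *Block* enforcement, and an 80% notification threshold, the per-request decision can be sketched as follows (a hypothetical illustration of the semantics):

```python
def budget_action(spent: float, budget: float, alert_pct: float = 80.0) -> str:
    """Return the decision for the current month's estimated spend:
    'block' past the budget (Block enforcement), 'alert' past the
    notification threshold, otherwise 'allow'."""
    if spent >= budget:
        return "block"
    if spent >= budget * alert_pct / 100.0:
        return "alert"
    return "allow"

print(budget_action(9_000, 15_000))   # allow
print(budget_action(12_500, 15_000))  # alert (above 80% of $15K)
print(budget_action(15_200, 15_000))  # block
```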

=== Configure routing and provider pools

Provider pools define which LLM providers handle requests, with support for primary and fallback configurations.

. In the *LLM* tab, locate the *Routing* section.
. Click *Add provider pool*.
. Configure the primary pool:
+
--
* *Name*: For example, `primary-anthropic`
* *Providers*: Select one or more providers (for example, Anthropic)
* *Models*: Choose which models to include (for example, `anthropic/claude-sonnet-3.5`)
* *Load balancing*: If multiple providers are selected, choose a distribution strategy (for example, round-robin or weighted)
--

. (Optional) Click *Add fallback pool* to configure automatic failover:
+
--
* *Name*: For example, `fallback-openai`
* *Providers*: Select the fallback provider (for example, OpenAI)
* *Models*: Choose fallback models (for example, `openai/gpt-4o`)
* *Trigger conditions*: When to activate the fallback:
** Rate limit exceeded (429 from primary)
** Timeout (primary provider slow)
** Server errors (5xx from primary)
--

. (Optional) Configure routing rules using CEL expressions:
+
For simple routing, select *Route all requests to primary pool*.
+
For advanced routing based on request properties, use CEL expressions. See xref:ai-gateway/cel-routing-cookbook.adoc[] for examples.
+
Example CEL expression for tier-based routing:
+
[source,cel]
----
request.headers["x-user-tier"] == "premium"
  ? "anthropic/claude-opus-4"
  : "anthropic/claude-sonnet-3.5"
----

. Click *Save routing configuration*.

TIP: A provider pool in the UI corresponds to a backend pool in the API.
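
The trigger conditions above amount to: try the primary pool, and fail over on a 429, a 5xx, or a timeout. A hypothetical client-side illustration of that decision (not the gateway's internals):

```python
def call_with_fallback(primary, fallback):
    """Route to `primary` first; fall back on 429 (rate limit),
    5xx (server error), or a timeout. Both arguments are callables
    returning an (http_status, body) tuple."""
    try:
        status, body = primary()
    except TimeoutError:
        return fallback()
    if status == 429 or status >= 500:
        return fallback()
    return status, body

# Primary is rate limited, so the fallback pool answers.
result = call_with_fallback(lambda: (429, ""), lambda: (200, "from fallback"))
print(result)  # (200, 'from fallback')
```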

=== Load balancing and multi-provider distribution

If a provider pool contains multiple providers, you can distribute traffic to balance load or optimize for cost and performance:

* *Round-robin*: Distribute evenly across all providers
* *Weighted*: Assign weights (for example, 80% to Anthropic, 20% to OpenAI)
* *Least latency*: Route to the fastest provider based on recent performance
* *Cost-optimized*: Route to the cheapest provider for each model
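
As an illustration, the *Weighted* strategy behaves like weighted random selection. The sketch below is hypothetical, not the gateway's implementation:

```python
import random

def pick_provider(weights: dict, rnd: random.Random) -> str:
    """Pick one provider name, with probability proportional to its weight."""
    names = list(weights)
    return rnd.choices(names, weights=[weights[n] for n in names], k=1)[0]

# 80/20 split between Anthropic and OpenAI, as in the example above.
rnd = random.Random(42)  # seeded only to make the sketch reproducible
picks = [pick_provider({"anthropic": 80, "openai": 20}, rnd) for _ in range(1000)]
print(picks.count("anthropic"))  # close to 800 of 1000
```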

== Configure MCP tools (optional)

If your users will build AI agents that need access to tools via MCP (Model Context Protocol), configure MCP tool aggregation.

On the gateway details page, select the *MCP* tab to configure tool discovery and execution. The MCP proxy aggregates multiple MCP servers, allowing agents to find and call tools through a single endpoint.

=== Add MCP servers

. In the *MCP* tab, click *Add MCP server*.
. Configure the server:
+
--
* *Server name*: Human-readable identifier (for example, `database-server`, `slack-server`)
* *Server URL*: Endpoint for the MCP server (for example, `https://mcp-database.example.com`)
* *Authentication*: Configure authentication if required (bearer token, API key, mTLS)
* *Enabled tools*: Select which tools from this server to expose (or *All tools*)
--

. Click *Test connection* to verify connectivity.
. Click *Save* to add the server to this gateway.

Repeat for each MCP server you want to aggregate.
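
Conceptually, aggregation merges each server's tool list into one catalog behind the gateway endpoint. The sketch below illustrates the idea with a hypothetical `server/tool` naming scheme (the actual namespacing used by the proxy is not specified here):

```python
def aggregate_tools(servers: dict) -> list:
    """Merge per-server tool lists into one catalog, prefixing each
    tool with its server name to keep names unambiguous."""
    return [
        f"{server}/{tool}"
        for server, tools in sorted(servers.items())
        for tool in tools
    ]

catalog = aggregate_tools({
    "database-server": ["query", "insert"],
    "slack-server": ["post_message"],
})
print(catalog)  # ['database-server/query', 'database-server/insert', 'slack-server/post_message']
```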

=== Configure deferred tool loading

Deferred tool loading dramatically reduces token costs by initially exposing only a search tool and an orchestrator, rather than listing all available tools.

. In the *MCP* tab, locate *Deferred Loading*.
. Toggle *Enable deferred tool loading* to *On*.
. Configure behavior:
+
--
* *Initially expose*: Search tool and orchestrator only
* *Load on demand*: Tools are retrieved when agents query for them
* *Token savings*: Expect an 80-90% reduction in token usage for tool definitions
--

. Click *Save*.

See xref:ai-gateway/mcp-aggregation-guide.adoc[] for detailed information about MCP aggregation.

=== Configure the MCP orchestrator

The MCP orchestrator is a built-in MCP server that enables programmatic tool calling. Agents can generate JavaScript code that calls multiple tools in a single orchestrated step, reducing the number of round trips.

Example: a workflow requiring 47 file reads can drop from 49 round trips to just 1 by using the orchestrator.

The orchestrator is enabled by default when you enable MCP tools. You can configure:

* *Execution timeout*: Maximum time for orchestrator workflows (for example, 30 seconds)
* *Memory limit*: Maximum memory for JavaScript execution (for example, 128MB)
* *Allowed operations*: Restrict which MCP tools can be called from orchestrator workflows

== Verify your setup

After completing the setup, verify that the gateway is working correctly.

=== Test the gateway endpoint

[source,bash]
----
curl "${GATEWAY_ENDPOINT}/v1/models" \
  -H "Authorization: Bearer ${REDPANDA_CLOUD_TOKEN}" \
  -H "rp-aigw-id: ${GATEWAY_ID}"
----

Expected result: a list of enabled models.

=== Send a test request

[source,bash]
----
curl "${GATEWAY_ENDPOINT}/v1/chat/completions" \
  -H "Authorization: Bearer ${REDPANDA_CLOUD_TOKEN}" \
  -H "rp-aigw-id: ${GATEWAY_ID}" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Hello, AI Gateway!"}],
    "max_tokens": 50
  }'
----

Expected result: a successful completion response.

=== Check observability

. Navigate to *AI Gateway* → *Gateways*, select your gateway, and open *Analytics*.
. Verify that your test request appears in the request logs.
. Check metrics:
+
--
* Request count: Should show your test request
* Token usage: Should show tokens consumed
* Estimated cost: Should show the calculated cost
--

== Share access with users

Now that your gateway is configured, share access with your users (builders):

. Provide the *Gateway ID* (for example, `gw_abc123`).
. Provide the *Gateway Endpoint* (for example, `https://gw.ai.panda.com`).
. Share API credentials (Redpanda Cloud tokens with appropriate permissions).
. (Optional) Document available models and any routing policies.
. (Optional) Share rate limits and budget information.

Users can then discover and connect to the gateway using the information provided. See xref:ai-gateway/builders/discover-gateways.adoc[] for user documentation.

== Next steps

*Configure and optimize:*

// * xref:ai-gateway/admin/manage-gateways.adoc[Manage Gateways] - List, edit, and delete gateways
* xref:ai-gateway/cel-routing-cookbook.adoc[CEL Routing Cookbook] - Advanced routing patterns
// * xref:ai-gateway/admin/networking-configuration.adoc[Networking Configuration] - Configure private endpoints and connectivity

*Monitor and observe:*

* xref:ai-gateway/observability-metrics.adoc[Monitor Usage] - Track costs and usage across all gateways
* xref:ai-gateway/observability-logs.adoc[Request Logs] - View and filter request logs

*Integrate tools:*

* xref:ai-gateway/integrations/index.adoc[Integrations] - Admin guides for Claude Code, Cursor, and other tools
