ai-agents:partial$adp-la.adoc
Configure Redpanda AI Gateway to support Continue.dev clients accessing multiple LLM providers and MCP tools through flexible, native-format endpoints.
After reading this page, you will be able to:
-
❏ Configure AI Gateway endpoints for Continue.dev connectivity.
-
❏ Set up multi-provider backends with native format routing.
-
❏ Deploy MCP tool aggregation for Continue.dev tool discovery.
-
AI Gateway deployed on a BYOC cluster running Redpanda version 25.3 or later
-
Administrator access to the AI Gateway UI
-
API keys for at least one LLM provider (Anthropic, OpenAI, or others)
-
Understanding of AI Gateway concepts
Continue.dev is a highly configurable open-source AI coding assistant that integrates with VS Code and JetBrains IDEs. Unlike other AI assistants, Continue.dev uses native provider API formats rather than requiring transforms to a unified format. This architectural choice provides maximum flexibility but requires specific gateway configuration.
Key characteristics:
-
Uses native provider formats (Anthropic format for Anthropic, OpenAI format for OpenAI)
-
Supports multiple LLM providers simultaneously with per-provider configuration
-
Custom API endpoints via
apiBaseconfiguration -
Custom headers via
requestOptions.headers -
Built-in MCP support for tool discovery and execution
-
Autocomplete, chat, and inline edit modes
Continue.dev connects to AI Gateway differently than unified-format clients:
-
Each provider requires a separate backend configured without format transforms
-
LLM endpoint:
https://{CLUSTER_ID}.cloud.redpanda.com/ai-gateway/v1/{provider}(provider-specific paths) -
MCP endpoint:
https://{CLUSTER_ID}.cloud.redpanda.com/ai-gateway/mcpfor tool discovery and execution
The gateway handles:
-
Authentication via bearer tokens in the
Authorizationheader -
Provider-specific request formats without transformation
-
Model routing using provider-native model identifiers
-
MCP server aggregation for multi-tool workflows
-
Request logging and cost tracking per gateway
Continue.dev works with multiple providers. Enable the providers your users will access.
To enable Anthropic with native format support:
-
Navigate to AI Gateway > Providers in the Redpanda Cloud console
-
Select Anthropic from the provider list
-
Click Add configuration
-
Enter your Anthropic API key
-
Under Format, select Native Anthropic (not OpenAI-compatible)
-
Click Save
The gateway now accepts Anthropic’s native /v1/messages format.
To enable OpenAI:
-
Navigate to AI Gateway > Providers
-
Select OpenAI from the provider list
-
Click Add configuration
-
Enter your OpenAI API key
-
Under Format, select Native OpenAI
-
Click Save
Continue.dev supports many providers. For each provider:
-
Add the provider configuration in the gateway
-
Ensure the format is set to the provider’s native format
-
Do not enable format transforms (Continue.dev handles format differences in its client code)
Common additional providers:
-
Google Gemini (native Google format)
-
Mistral AI (OpenAI-compatible format)
-
Together AI (OpenAI-compatible format)
-
Ollama (OpenAI-compatible format for local models)
After enabling providers, enable specific models:
-
Navigate to AI Gateway > Models
-
Enable the models you want Continue.dev clients to access
Common models for Continue.dev:
-
claude-opus-4.6(Anthropic, high quality) -
claude-sonnet-4.5(Anthropic, balanced) -
gpt-5.2(OpenAI, high quality) -
gpt-5.2-mini(OpenAI, fast autocomplete) -
o1-mini(OpenAI, reasoning)
-
-
Click Save
Continue.dev uses provider-native model identifiers (for example, claude-sonnet-4.5 not anthropic/claude-sonnet-4.5).
Create a dedicated gateway to isolate Continue.dev traffic and apply specific policies.
-
Navigate to Agentic > AI Gateway > Routers
-
Click Create Gateway
-
Enter gateway details:
Field Value Name
continue-gateway(or your preferred name)Workspace
Select the workspace for access control grouping
Description
Gateway for Continue.dev IDE clients
-
Click Create
-
Copy the gateway endpoint URL from the gateway details page
Continue.dev requires separate backend configurations for each provider because it uses native formats.
-
Navigate to the gateway’s Backends tab
-
Click Add Backend
-
Configure:
Field Value Backend name
anthropic-nativeProvider
Anthropic
Format
Native Anthropic (no transform)
Path
/v1/anthropicEnabled models
All Anthropic models you enabled in the catalog
-
Click Save
Continue.dev will send requests to https://{CLUSTER_ID}.cloud.redpanda.com/ai-gateway/v1/anthropic using Anthropic’s native format.
-
Click Add Backend
-
Configure:
Field Value Backend name
openai-nativeProvider
OpenAI
Format
Native OpenAI (no transform)
Path
/v1/openaiEnabled models
All OpenAI models you enabled in the catalog
-
Click Save
Continue.dev will send requests to https://{CLUSTER_ID}.cloud.redpanda.com/ai-gateway/v1/openai using OpenAI’s native format.
Set up routing policies for Continue.dev requests.
Configure routing rules that apply to each backend:
-
Navigate to the gateway’s Routing tab
-
For each backend, click Add Route
-
Configure basic routing:
true # Matches all requests to this backend -
Add a primary provider configuration with your Anthropic API key
-
(Optional) Add a fallback configuration for redundancy if you have multiple API keys
-
Click Save
For providers with multiple API keys, configure failover:
-
In the backend’s routing configuration, add multiple provider configurations
-
Set failover conditions:
-
Rate limits (HTTP 429)
-
Timeouts (no response within 30 seconds)
-
5xx errors (provider unavailable)
-
-
Configure load balancing: Round robin across available keys
-
Click Save
Continue.dev requests automatically fail over to healthy API keys when the primary key experiences issues.
Prevent runaway usage from Continue.dev clients:
-
Navigate to the gateway’s Rate Limits tab
-
Configure global limits:
Setting Recommended Value Global rate limit
200 requests per minute (Continue.dev autocomplete can generate many requests)
Per-user rate limit
20 requests per minute (if using user identification headers)
Per-backend limits
Vary by provider (autocomplete backends need higher limits)
-
Click Save
The gateway blocks requests exceeding these limits and returns HTTP 429 errors.
Continue.dev’s autocomplete feature generates frequent, short requests. Configure higher rate limits for autocomplete-specific backends:
-
Autocomplete models (for example,
gpt-5.2-mini): 100 requests per minute per user -
Chat models (for example,
claude-sonnet-4.5): 20 requests per minute per user
Control LLM costs across all providers:
-
Navigate to the gateway’s Spend Limits tab
-
Configure:
Setting Value Monthly budget
$10,000 (adjust based on expected usage)
Enforcement
Block requests after budget exceeded
Alert threshold
80% of budget (sends notification)
-
Click Save
The gateway tracks estimated costs per request across all providers and blocks traffic when the monthly budget is exhausted.
Enable Continue.dev to discover and use tools from multiple MCP servers through a single endpoint.
-
Navigate to the gateway’s MCP tab
-
Click Add MCP Server
-
Enter server details:
Field Value Display name
Descriptive name (for example,
redpanda-data-catalog,code-search-tools)Endpoint URL
MCP server endpoint (for example, Remote MCP server URL)
Authentication
Bearer token or other authentication mechanism
-
Click Save
Repeat for each MCP server you want to aggregate.
Reduce token costs for Continue.dev sessions with many available tools:
-
Under MCP Settings, enable Deferred tool loading
-
Click Save
When enabled:
-
Continue.dev initially receives only a search tool and orchestrator tool
-
Continue.dev queries for specific tools by name when needed
-
Token usage decreases by 80-90% for configurations with many tools
This is particularly important for Continue.dev because autocomplete and chat modes both use tool discovery.
The MCP orchestrator reduces multi-step workflows to single calls:
-
Under MCP Settings, enable MCP Orchestrator
-
Configure:
Setting Value Orchestrator model
Select a model with strong code generation capabilities (for example,
claude-sonnet-4.5)Execution timeout
30 seconds
Backend
Select the Anthropic backend (orchestrator works best with Claude models)
-
Click Save
Continue.dev can now invoke the orchestrator tool to execute complex, multi-step operations in a single request.
Continue.dev clients authenticate using bearer tokens.
-
Navigate to Security > API Tokens in the Redpanda Cloud console
-
Click Create Token
-
Enter token details:
Field Value Name
continue-accessScopes
ai-gateway:read,ai-gateway:writeExpiration
Set appropriate expiration based on security policies
-
Click Create
-
Copy the token (it appears only once)
Distribute this token to Continue.dev users through secure channels.
Provide these instructions to users configuring Continue.dev in their IDE.
Continue.dev supports both JSON and YAML configuration formats. This guide uses YAML (config.yaml) because it supports MCP server configuration and environment variable interpolation:
-
VS Code:
~/.continue/config.yaml -
JetBrains:
~/.continue/config.yaml
|
Note
|
While config.json is still supported for basic LLM configuration, config.yaml is required for MCP server integration.
|
Users configure Continue.dev with separate provider entries for each backend:
models:
- title: Claude Sonnet (Redpanda)
provider: anthropic
model: claude-sonnet-4.5
apiBase: https://{CLUSTER_ID}.cloud.redpanda.com/ai-gateway/v1/anthropic
apiKey: YOUR_API_TOKEN
- title: GPT-5.2 (Redpanda)
provider: openai
model: gpt-5.2
apiBase: https://{CLUSTER_ID}.cloud.redpanda.com/ai-gateway/v1/openai
apiKey: YOUR_API_TOKEN
- title: GPT-5.2-mini (Autocomplete)
provider: openai
model: gpt-5.2-mini
apiBase: https://{CLUSTER_ID}.cloud.redpanda.com/ai-gateway/v1/openai
apiKey: YOUR_API_TOKEN
tabAutocompleteModel:
title: GPT-5.2-mini (Autocomplete)
provider: openai
model: gpt-5.2-mini
apiBase: https://{CLUSTER_ID}.cloud.redpanda.com/ai-gateway/v1/openai
apiKey: YOUR_API_TOKENReplace:
-
{CLUSTER_ID}: Your Redpanda cluster ID -
YOUR_API_TOKEN: The API token generated earlier
Configure Continue.dev to connect to the aggregated MCP endpoint.
The preferred method is to create MCP server configuration files in the ~/.continue/mcpServers/ directory:
-
Create the directory:
mkdir -p ~/.continue/mcpServers -
Create
~/.continue/mcpServers/redpanda-ai-gateway.yaml:transport: type: streamable-http url: https://{CLUSTER_ID}.cloud.redpanda.com/ai-gateway/mcp headers: Authorization: Bearer YOUR_API_TOKEN
ImportantFor production deployments, use environment variable interpolation with ${{ secrets.VARIABLE }}syntax instead of hardcoding tokens. See Configure with environment variables in the user guide for details.
Continue.dev automatically discovers MCP server configurations in this directory.
Alternatively, embed MCP server configuration in ~/.continue/config.yaml:
mcpServers:
- transport:
type: streamable-http
url: https://{CLUSTER_ID}.cloud.redpanda.com/ai-gateway/mcp
headers:
Authorization: Bearer YOUR_API_TOKENReplace:
-
{CLUSTER_ID}: Your Redpanda cluster ID -
YOUR_API_TOKEN: The API token generated earlier
This configuration connects Continue.dev to the aggregated MCP endpoint with authentication headers.
Configure different models for different Continue.dev modes:
| Mode | Recommended Model | Reason |
|---|---|---|
Chat |
|
High quality for complex questions |
Autocomplete |
|
Fast, cost-effective for frequent requests |
Inline edit |
|
Balanced quality and speed for code modifications |
Embeddings |
|
Cost-effective for code search |
Track Continue.dev activity through gateway observability features.
-
Navigate to AI Gateway > Observability > Logs
-
Filter by gateway ID:
continue-gateway -
Review:
-
Request timestamps and duration
-
Backend and model used per request
-
Token usage (prompt and completion tokens)
-
Estimated cost per request
-
HTTP status codes and errors
-
Continue.dev generates different request patterns:
-
Autocomplete: Many short requests with low token counts
-
Chat: Longer requests with context and multi-turn conversations
-
Inline edit: Medium-length requests with code context
-
Navigate to AI Gateway > Observability > Metrics
-
Select the Continue.dev gateway
-
Review:
Metric Purpose Request volume by backend
Identify which providers are most used
Token usage by model
Track consumption patterns (autocomplete vs chat)
Estimated spend by backend
Monitor costs across providers
Latency (p50, p95, p99) by backend
Detect provider-specific performance issues
Error rate by backend
Identify failing providers or misconfigured backends
Programmatically access logs for integration with monitoring systems:
curl https://{CLUSTER_ID}.cloud.redpanda.com/api/ai-gateway/logs \
-H "Authorization: Bearer YOUR_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{
"gateway_id": "GATEWAY_ID",
"start_time": "2026-01-01T00:00:00Z",
"end_time": "2026-01-14T23:59:59Z",
"limit": 100
}'Apply these security best practices for Continue.dev deployments.
Create tokens with minimal required scopes:
-
ai-gateway:read: Required for MCP tool discovery -
ai-gateway:write: Required for LLM requests and tool execution
Avoid granting broader scopes like admin or cluster:write.
If Continue.dev clients connect from known networks, configure network policies:
-
Use cloud provider security groups to restrict access to AI Gateway endpoints
-
Allowlist only the IP ranges where Continue.dev clients operate
-
Monitor for unauthorized access attempts in request logs
Set short token lifetimes for high-security environments:
-
Development environments: 90 days
-
Production environments: 30 days
Automate token rotation to reduce manual overhead.
Review which MCP tools Continue.dev clients can access:
-
Periodically audit the MCP servers configured in the gateway
-
Remove unused or deprecated MCP servers
-
Monitor tool execution logs for unexpected behavior
Common issues and solutions when configuring AI Gateway for Continue.dev.
Symptom: Connection errors when Continue.dev tries to discover tools or send LLM requests.
Causes and solutions:
-
Invalid gateway ID: Verify the gateway endpoint URL matches the URL from the console
-
Expired token: Generate a new API token and update the Continue.dev configuration
-
Wrong backend path: Verify
apiBasematches the backend path (for example,/v1/anthropicnot/v1) -
Network connectivity: Verify the cluster endpoint is accessible from the client network
-
Provider not enabled: Ensure at least one backend is configured with models enabled
Symptom: Continue.dev shows "model not found" or similar errors.
Causes and solutions:
-
Model not enabled in catalog: Enable the model in the gateway’s model catalog
-
Model identifier mismatch: Use provider-native names (for example,
claude-sonnet-4.5notanthropic/claude-sonnet-4.5) -
Wrong backend for model: Verify the model is associated with the correct backend (Anthropic models with Anthropic backend)
Symptom: Responses are malformed or Continue.dev reports format errors.
Causes and solutions:
-
Transform enabled on backend: Ensure backend format is set to native (no OpenAI-compatible transform for Anthropic)
-
Wrong provider for apiBase: Verify Continue.dev’s
providerfield matches the backend’s provider -
Headers not passed: Confirm
requestOptions.headersis correctly configured
Symptom: Autocomplete suggestions don’t appear or are delayed.
Causes and solutions:
-
Wrong model for autocomplete: Use a fast model like
gpt-5.2-miniintabAutocompleteModel -
Rate limits too restrictive: Increase rate limits for autocomplete backend
-
High backend latency: Check backend metrics and consider provider failover
-
Token exhaustion: Verify spending limits haven’t been reached
Symptom: Continue.dev does not discover MCP tools.
Causes and solutions:
-
MCP configuration missing: Ensure
mcpServersis configured -
MCP servers not configured in gateway: Add MCP server endpoints in the gateway’s MCP tab
-
Deferred loading enabled but search failing: Check that the search tool is correctly configured
-
MCP server authentication failing: Verify MCP server authentication credentials in the gateway configuration
Symptom: Token usage and costs exceed expectations.
Causes and solutions:
-
Autocomplete using expensive model: Configure
tabAutocompleteModelto usegpt-5.2-miniinstead of larger models -
Deferred tool loading disabled: Enable deferred tool loading to reduce tokens by 80-90%
-
No rate limits: Apply per-minute rate limits to prevent runaway usage
-
Missing spending limits: Set monthly budget limits with blocking enforcement
-
Chat using wrong model: Route chat requests to cost-effective models (for example,
claude-sonnet-4.5instead ofclaude-opus-4.6)
Symptom: Continue.dev receives HTTP 429 Too Many Requests errors.
Causes and solutions:
-
Rate limit exceeded: Review and increase rate limits if usage is legitimate (autocomplete needs higher limits)
-
Upstream provider rate limits: Check if the upstream LLM provider is rate-limiting; configure failover to alternate API keys
-
Budget exhausted: Verify monthly spending limit has not been reached
Symptom: Same prompt produces different results when switching providers.
This is expected behavior, not a configuration issue:
-
Different models have different capabilities and response styles
-
Continue.dev uses native formats, which may include provider-specific parameters
-
Users should select the appropriate model for their task (quality vs speed vs cost)