Skip to content

Commit 1578918

Browse files
hongyi-chenoz-agent
andcommitted
Add Custom inference endpoint page stub for CI link check
This file is the canonical version created on PR #71 (hyc/plan-updates-byok-cie). It is duplicated here so that the link checker on this branch can resolve the relative references to /support-and-community/plans-and-billing/custom-inference-endpoint/ that this PR introduces. When PR #71 merges into hyc/plan-updates, git will reconcile the identical file contents automatically. Co-Authored-By: Oz <oz-agent@warp.dev>
1 parent 980e9d0 commit 1578918

1 file changed

Lines changed: 117 additions & 0 deletions

File tree

Lines changed: 117 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,117 @@
1+
---
2+
title: Custom inference endpoint
3+
description: >-
4+
Connect Warp to any OpenAI-compatible custom inference endpoint, such as
5+
OpenRouter, LiteLLM, z.ai, or an internal gateway. Available on the Free
6+
plan and all eligible paid plans.
7+
---
8+
9+
A **Custom inference endpoint (CIE)** lets you connect Warp's agents to any OpenAI-compatible inference endpoint, so you can route AI requests through your preferred model router, hosted gateway, or internal infrastructure.
10+
11+
CIE is the right fit when you want to choose your provider, consolidate billing through a third-party router, or run inference behind your own gateway — without giving up the agent experience inside Warp. When a CIE is configured and selected, Warp **never consumes your** [credits](/support-and-community/plans-and-billing/credits/) for the request.
12+
13+
:::note
14+
CIE is available on the Free plan and on all eligible paid plans. See [warp.dev/pricing](https://www.warp.dev/pricing) for the current list of eligible plans.
15+
:::
16+
17+
:::note
18+
BYOK and custom inference endpoint support are available for individual users and organizations with 10 or fewer employees, subject to Warp's Terms of Service. Companies or organizations with more than 10 employees require a Warp Business or Enterprise plan to use custom inference endpoints.
19+
:::
20+
21+
## Key features
22+
23+
* **OpenAI-compatible** - Works with any endpoint that implements the OpenAI Chat Completions API.
24+
* **Provider flexibility** - Use a model router (OpenRouter, LiteLLM), a model provider with an OpenAI-compatible surface (z.ai), or your own internal gateway.
25+
* **No Warp credits consumed** - Inference is billed directly by your endpoint provider; Warp's metered features remain unaffected.
26+
* **Local configuration** - Endpoint URLs and credentials are stored locally on your device and never synced to the cloud.
27+
28+
## How it works
29+
30+
CIE expects your endpoint to implement the **OpenAI Chat Completions API** (`POST /v1/chat/completions`). Any service that exposes a compatible surface can be used as a CIE target:
31+
32+
* **OpenRouter** - Aggregates many model providers behind a single OpenAI-compatible API and consolidated billing.
33+
* **LiteLLM** - A self-hosted proxy that exposes a unified, OpenAI-compatible API across providers.
34+
* **z.ai** - A model provider with an OpenAI-compatible API surface for its models.
35+
* **Internal gateways** - Any in-house service that fronts model providers behind an OpenAI-compatible endpoint (for example, a corporate AI gateway with logging, redaction, or access control).
36+
37+
When you configure a CIE, Warp stores the endpoint URL, model identifiers, and credentials **locally on your device**. They are never synced to Warp's servers.
38+
39+
:::caution
40+
CIE does not apply to [Oz Cloud Agents](/agent-platform/cloud-agents/overview/). Because CIE configuration is stored locally, it is not available to cloud-hosted agent runs. Cloud agent runs always consume [Warp credits](/support-and-community/plans-and-billing/credits/).
41+
:::
42+
43+
When a CIE-routed model is selected:
44+
45+
* Warp **does not consume** any of your [credits](/support-and-community/plans-and-billing/credits/).
46+
* Costs are billed directly by your endpoint provider.
47+
* Warp does not retain or store your endpoint credentials on any of its servers.
48+
49+
## Enabling a custom inference endpoint
50+
51+
To enable and configure a custom inference endpoint:
52+
53+
1. In Warp, open **Settings** and search for `inference endpoint` to jump to the configuration.
54+
2. Add your endpoint URL (the base URL that exposes `/v1/chat/completions`) and any required credentials (typically an API key).
55+
3. Specify the model identifier(s) you want to route through this endpoint.
56+
4. Save the configuration. Once added, you'll see your custom models appear in the model picker.
57+
58+
When you explicitly select a CIE-routed model from the model picker, Warp routes the request through your endpoint instead of consuming Warp's credits.
59+
60+
The CIE configuration flow mirrors the [Bring your own API key](/support-and-community/plans-and-billing/bring-your-own-api-key/) setup, so the steps will feel familiar if you've already configured BYOK.
61+
62+
## Billing behavior
63+
64+
### Warp credits
65+
66+
When you select a CIE-routed model from the model picker:
67+
68+
* No Warp credits are consumed for that request.
69+
* Inference is billed directly by your endpoint provider, according to their pricing.
70+
* The credit transparency footer will show "0 credits used" for CIE-routed requests.
71+
72+
### Auto routing still uses Warp credits
73+
74+
Warp's **Auto** models dynamically route across providers using Warp's infrastructure. Because Auto routing depends on Warp, **Auto always consumes Warp's credits**, even if you've configured a custom inference endpoint.
75+
76+
To use your endpoint, select the specific CIE-backed model from the model picker rather than an Auto option.
77+
78+
### Other AI features in Warp
79+
80+
Some AI-powered features rely on Warp's infrastructure and are unaffected by CIE configuration. These continue to consume credits according to your plan; see [Credits](/support-and-community/plans-and-billing/credits/) for details.
81+
82+
## Zero Data Retention (ZDR)
83+
84+
Warp is **SOC 2 compliant** and has **Zero Data Retention (ZDR)** agreements with all of its contracted LLM providers.
85+
86+
When you use a custom inference endpoint:
87+
88+
* Data retention is determined by **your endpoint provider** and any upstream model providers they route to.
89+
* Warp **cannot enforce ZDR** for requests sent through a custom inference endpoint.
90+
* If your endpoint provider does not have ZDR with the underlying model provider, your requests may be retained according to their terms.
91+
92+
Review your endpoint provider's data handling and retention policies before routing sensitive prompts through a CIE.
93+
94+
## Plan availability
95+
96+
CIE is available on the Free plan and on all eligible paid plans. See [warp.dev/pricing](https://www.warp.dev/pricing) for the current list of eligible plans and any plan-specific limits.
97+
98+
CIE is available to individual users and to organizations with 10 or fewer employees, subject to Warp's Terms of Service. Companies or organizations with more than 10 employees need a Warp Business or Enterprise plan to use custom inference endpoints.
99+
100+
Centrally configured, admin-managed CIE for teams is not yet available. Each user configures their own endpoint locally. Enterprise teams that need centrally managed model routing today should see [Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/).
101+
102+
## How CIE differs from BYOK and BYOLLM
103+
104+
Warp offers three ways to bring your own AI infrastructure. Use this table to pick the right one, and follow the links for full details.
105+
106+
| Name | Meaning | Plans |
107+
| --- | --- | --- |
108+
| **[Bring your own API key](/support-and-community/plans-and-billing/bring-your-own-api-key/)** (BYOK) | Use your own API key for OpenAI, Anthropic, or Google models. Keys are stored locally on your device. | Free and all eligible paid plans |
109+
| **Custom inference endpoint** (CIE) | Connect Warp to an OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, or an internal gateway. | Free and all eligible paid plans |
110+
| **[Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/)** (BYOLLM) | Enterprise-managed inference through your cloud provider (AWS Bedrock, Azure Foundry, Google Vertex) or approved internal infrastructure, with Warp handling routing, orchestration, governance, and observability. | Enterprise only |
111+
112+
## Related resources
113+
114+
* [Bring your own API key](/support-and-community/plans-and-billing/bring-your-own-api-key/) — Use your own OpenAI, Anthropic, or Google API keys.
115+
* [Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/) — Enterprise-managed inference through your cloud provider or approved infrastructure.
116+
* [Model Choice](/agent-platform/capabilities/model-choice/) — Full list of supported models and `model_id` values.
117+
* [Credits](/support-and-community/plans-and-billing/credits/) — How Warp credits work and when they're consumed.

0 commit comments

Comments
 (0)