Commit d3f0b70

BYOK rewrite + new Custom inference endpoint page + sidebar entry (#71)

* Rewrite BYOK page, add Custom inference endpoint page, and update sidebar

Open BYOK to Free + all eligible paid plans (previously gated to paid plans starting with Build). Reframe BYOK alongside two adjacent options:

- Add a 'How BYOK differs from Custom inference endpoint and BYOLLM' comparison section with a three-row matrix (Name / Meaning / Plans) on the BYOK page.
- Refresh BYOK model examples to current frontier (Claude Opus 4.7, Claude Sonnet 4.6, GPT-5.5, Gemini 3.1 Pro) for parity with the Model Choice page.
- Clarify that centrally configured, admin-managed BYOK is not yet available; keep the user-level configuration story.
- Add a Related resources section linking to the new CIE page, BYOLLM, Model Choice, and Credits.

Add a new Custom inference endpoint page under plans-and-billing/ using the feature-doc structure. Covers what CIE is, the OpenAI-compatible Chat Completions API requirement, example endpoints (OpenRouter, LiteLLM, z.ai, internal gateways), enablement steps, billing behavior (no Warp credits consumed; Auto still uses credits), ZDR caveats (depend on endpoint provider), Free + eligible paid plan availability, the same three-row comparison matrix, and Related resources.

Per the May 2026 editorial rule, neither page hard-codes per-plan monthly credit counts; both link to warp.dev/pricing.

Insert the CIE page slug into the Plans and billing sidebar section immediately after BYOK.

Part of the May 2026 pricing docs overhaul on hyc/plan-updates.

Co-Authored-By: Oz <oz-agent@warp.dev>

* Add BYOK / CIE org-size disclosure and soften CIE in-app search keyword

Per follow-up review on PR #71:

- Add a second top-of-page :::note callout on both bring-your-own-api-key.mdx and custom-inference-endpoint.mdx clarifying that BYOK and CIE are available to individual users and organizations with 10 or fewer employees; larger organizations need a Warp Business or Enterprise plan.
- Mirror the same disclosure as a one-sentence statement near the bottom of each page so readers who jump to the BYOK Enterprise/Business section (BYOK) or the Plan availability section (CIE) see the restriction inline.
- Soften the CIE 'Enabling' step from 'search for custom inference endpoint' to 'search for inference endpoint' since the exact configuration name was not verifiable.

Co-Authored-By: Oz <oz-agent@warp.dev>

* docs(pricing-may-2026): weave platform credits into BYOK + CIE billing copy

Soften the "never consumes Warp credits" claims on the BYOK and Custom inference endpoint (CIE) pages so they accurately reflect the May 2026 launch: on Business and Enterprise, local agent runs that use customer-supplied inference still consume platform credits for Warp's platform infrastructure (run lifecycle, integrations, observability).

Changes on each page:

- Trim the frontmatter description to mention the Business/Enterprise platform credits caveat for local runs (keeping length around the 160-char target).
- Add a :::note callout right after the "never consumes your credits" paragraph pointing readers at the platform credits doc.
- Add a footnote line under the BYOK / CIE / BYOLLM comparison table noting that platform credits may apply for local agent runs on Business and Enterprise across all three customer-supplied inference paths.

Co-Authored-By: Oz <oz-agent@warp.dev>

---------

Co-authored-by: Oz <oz-agent@warp.dev>
1 parent 4ecda11 commit d3f0b70

3 files changed

Lines changed: 171 additions & 17 deletions

src/content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx

Lines changed: 47 additions & 17 deletions
@@ -1,25 +1,46 @@
 ---
-title: Bring Your Own API Key
+title: Bring your own API key
 description: >-
-  Warp's paid plans include the ability to bring your own API keys (BYOK) for
-  OpenAI, Anthropic, and Google AI models.
+  Use your own OpenAI, Anthropic, or Google API keys. Never consumes AI
+  credits — on Business and Enterprise, platform credits may apply for
+  local agent runs.
 ---
 
-Warp supports **Bring Your Own Key (BYOK)** for users who want to connect Warp’s agent to their own Anthropic, OpenAI, or Google API accounts.
+Warp supports **Bring your own API key (BYOK)** for users who want to connect Warp's agents to their own Anthropic, OpenAI, or Google API accounts.
 
-This lets you use your own API keys to access models directly, giving you full control over model selection, billing, and data routing. See [Model Choice](/agent-platform/capabilities/model-choice/) for a list of supported models.
+BYOK gives you full control over model selection, billing, and data routing. See [Model Choice](/agent-platform/capabilities/model-choice/) for the full list of supported models. When you route a request through your own key, Warp **never consumes your** [credits](/support-and-community/plans-and-billing/credits/) for that request.
 
-BYOK provides greater flexibility in model access and ensures Warp **never consumes your** [credits](/support-and-community/plans-and-billing/credits/) for requests routed through your own keys.
+:::note
+On the Business and Enterprise plans, local agent runs that use BYOK still consume platform credits for Warp's platform infrastructure (run lifecycle, integrations, observability). See [platform credits](/support-and-community/plans-and-billing/platform-credits/) for what's covered.
+:::
 
 :::note
-BYOK is currently only available on Warp's paid plans, starting with Build. Learn more about plans and pricing [warp.dev/pricing](https://www.warp.dev/pricing).
+BYOK is available on the Free plan and on all eligible paid plans. See [warp.dev/pricing](https://www.warp.dev/pricing) for the current list of eligible plans.
 :::
 
-## How does BYOK work?
+:::note
+BYOK and custom inference endpoint support are available for individual users and organizations with 10 or fewer employees, subject to Warp's Terms of Service. Companies or organizations with more than 10 employees require a Warp Business or Enterprise plan to use BYOK.
+:::
+
+## How BYOK differs from Custom inference endpoint and BYOLLM
+
+Warp offers three ways to bring your own AI infrastructure. Use this table to pick the right one, and follow the links for full details.
+
+| Name | Meaning | Plans |
+| --- | --- | --- |
+| **Bring your own API key** (BYOK) | Use your own API key for OpenAI, Anthropic, or Google models. Keys are stored locally on your device. | Free and all eligible paid plans |
+| **[Custom inference endpoint](/support-and-community/plans-and-billing/custom-inference-endpoint/)** (CIE) | Connect Warp to an OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, or an internal gateway. | Free and all eligible paid plans |
+| **[Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/)** (BYOLLM) | Enterprise-managed inference through your cloud provider (AWS Bedrock, Azure Foundry, Google Vertex) or approved internal infrastructure, with Warp handling routing, orchestration, governance, and observability. | Enterprise only |
+
+See [warp.dev/pricing](https://www.warp.dev/pricing) for current plan availability.
+
+Platform credits may apply for local agent runs on Business and Enterprise when using BYOK, CIE, or BYOLLM. See [platform credits](/support-and-community/plans-and-billing/platform-credits/).
+
+## How BYOK works
 
 When you add your own model API keys in Warp, those keys are stored **locally on your device** and are **never synced to the cloud**.
 
-Warp uses these API keys to directly route your agent requests to the model provider you've configured.
+Warp uses these API keys to route your agent requests directly to the model provider you've configured.
 
 :::caution
 BYOK does not apply to [Oz Cloud Agents](/agent-platform/cloud-agents/overview/). Because your API keys are stored locally on your device, they are not available to cloud-hosted agent runs. Cloud agent runs always consume [Warp credits](/support-and-community/plans-and-billing/credits/).
@@ -53,9 +74,9 @@ When you explicitly select a model with a key icon, Warp routes requests through
 
 ### Auto Model
 
-Warp's **Auto** models dynamically route requests across different models based on context and performance. Because this routing logic depends on Warps infrastructure, **Auto always consumes Warp's credits**, even if youve configured your own API keys.
+Warp's **Auto** models dynamically route requests across different models based on context and performance. Because this routing logic depends on Warp's infrastructure, **Auto always consumes Warp's credits**, even if you've configured your own API keys.
 
-To use your own key, select a specific provider model (for example, Claude Sonnet 4.5, GPT-5, or Gemini 2.5 Pro) directly from the model picker with a key icon.
+To use your own key, select a specific provider model (for example, Claude Opus 4.7, Claude Sonnet 4.6, GPT-5.5, or Gemini 3.1 Pro) directly from the model picker with a key icon.
 
 ### Credit usage
 
@@ -93,7 +114,7 @@ If your key:
 
 **Failover and fallback:**
 
-By default, Warp does not fall back to your credits when a BYOK (Bring Your Own Key) request fails.
+By default, Warp does not fall back to your credits when a BYOK request fails.
 
 You can choose to enable **Warp credit fallback**. When enabled, if an agent request fails with your BYOK model (for example, due to an API error or quota limit), Warp will automatically route the request to one of Warp’s provided models. Warp always prioritizes your API keys first and only uses Warp credits when necessary.
 
@@ -113,10 +134,19 @@ Warp itself never stores your LLM API keys.
 
 ### BYOK on Enterprise and Business plans
 
-Currently, BYOK is configured at the **user level**, not the team or admin level:
+BYOK is available to individual users and to organizations with 10 or fewer employees, subject to Warp's Terms of Service. Companies or organizations with more than 10 employees need a Warp Business or Enterprise plan to use BYOK.
+
+Today, BYOK is configured at the **user level** on every plan, including Enterprise and Business:
+
+* Each team member can add and manage their own API keys locally on their device.
+* Centrally configured, admin-managed BYOK is not yet available — admins cannot enforce or share API keys across team members from a single place.
+* There is no organization-level Admin Panel for BYOK management today.
+
+If your organization needs centrally managed model routing now, see [Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/) for the Enterprise-managed option. To discuss a fit, contact us at [warp.dev/contact-sales](https://warp.dev/contact-sales).
 
-* Each team member can add and manage their own API keys locally.
-* Team admins cannot yet enforce or share API keys across members.
-* There is currently no organization-level Admin Panel for BYOK management.
+## Related resources
 
-If your organization has specific needs for managed keys or enterprise-level control, please contact us at [warp.dev/contact-sales](https://warp.dev/contact-sales).
+* [Custom inference endpoint](/support-and-community/plans-and-billing/custom-inference-endpoint/) — Route Warp through any OpenAI-compatible endpoint, such as OpenRouter, LiteLLM, z.ai, or an internal gateway.
+* [Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/) — Enterprise-managed inference through your cloud provider or approved infrastructure.
+* [Model Choice](/agent-platform/capabilities/model-choice/) — Full list of supported models and `model_id` values.
+* [Credits](/support-and-community/plans-and-billing/credits/) — How Warp credits work and when they're consumed.
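
The BYOK page states several rules that decide whether a request draws on Warp credits: Auto always uses Warp credits, cloud agent runs always use them, an explicitly selected key-icon model does not, and a failed BYOK request falls back to Warp's models only when **Warp credit fallback** is enabled. The TypeScript sketch below encodes those rules in one function so the cases can be compared side by side. It is a hypothetical model, not Warp's implementation: `ModelSelection`, `RunContext`, `Billing`, and `billingFor` are invented names, and platform credits on Business and Enterprise are deliberately left out.

```typescript
// Hypothetical model of the BYOK credit rules described on the page.
// All names are illustrative; this is not Warp's implementation.
// Platform credits (Business and Enterprise) are intentionally not modeled.

type ModelSelection =
  | { kind: "auto" } // Warp's Auto routing
  | { kind: "warp"; modelId: string } // a Warp-provided model
  | { kind: "byok"; modelId: string }; // a model shown with the key icon

interface RunContext {
  selection: ModelSelection;
  cloudAgent: boolean; // true for Oz Cloud Agent runs
  byokRequestFailed: boolean; // e.g. an API error or quota limit on your key
  warpCreditFallback: boolean; // the opt-in "Warp credit fallback" setting
}

interface Billing {
  route: "your-key" | "warp" | "failed";
  consumesWarpCredits: boolean;
}

function billingFor(ctx: RunContext): Billing {
  // Keys live only on the local device, so cloud-hosted runs cannot use them.
  if (ctx.cloudAgent) {
    return { route: "warp", consumesWarpCredits: true };
  }
  // Auto and Warp-provided models run on Warp's infrastructure;
  // Auto does so even when your own keys are configured.
  if (ctx.selection.kind !== "byok") {
    return { route: "warp", consumesWarpCredits: true };
  }
  // An explicitly selected key-icon model goes straight to the provider.
  if (!ctx.byokRequestFailed) {
    return { route: "your-key", consumesWarpCredits: false };
  }
  // Fallback is off by default: without it, the failed request is not
  // rerouted to Warp's models and no Warp credits are spent.
  return ctx.warpCreditFallback
    ? { route: "warp", consumesWarpCredits: true }
    : { route: "failed", consumesWarpCredits: false };
}
```

For example, a successful local request on a key-icon model returns `{ route: "your-key", consumesWarpCredits: false }`, while the same selection on a cloud agent run returns a Warp-credit route.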

src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx

Lines changed: 123 additions & 0 deletions
@@ -0,0 +1,123 @@
+---
+title: Custom inference endpoint
+description: >-
+  Connect Warp to OpenAI-compatible endpoints (OpenRouter, LiteLLM, z.ai,
+  internal gateways). On Business and Enterprise, platform credits may
+  apply for local runs.
+---
+
+A **Custom inference endpoint (CIE)** lets you connect Warp's agents to any OpenAI-compatible inference endpoint, so you can route AI requests through your preferred model router, hosted gateway, or internal infrastructure.
+
+CIE is the right fit when you want to choose your provider, consolidate billing through a third-party router, or run inference behind your own gateway — without giving up the agent experience inside Warp. When a CIE is configured and selected, Warp **never consumes your** [credits](/support-and-community/plans-and-billing/credits/) for the request.
+
+:::note
+CIE is available on the Free plan and on all eligible paid plans. See [warp.dev/pricing](https://www.warp.dev/pricing) for the current list of eligible plans.
+:::
+
+:::note
+BYOK and custom inference endpoint support are available for individual users and organizations with 10 or fewer employees, subject to Warp's Terms of Service. Companies or organizations with more than 10 employees require a Warp Business or Enterprise plan to use custom inference endpoints.
+:::
+
+## Key features
+
+* **OpenAI-compatible** - Works with any endpoint that implements the OpenAI Chat Completions API.
+* **Provider flexibility** - Use a model router (OpenRouter, LiteLLM), a model provider with an OpenAI-compatible surface (z.ai), or your own internal gateway.
+* **No Warp credits consumed** - Inference is billed directly by your endpoint provider; Warp's metered features remain unaffected.
+* **Local configuration** - Endpoint URLs and credentials are stored locally on your device and never synced to the cloud.
+
+## How it works
+
+CIE expects your endpoint to implement the **OpenAI Chat Completions API** (`POST /v1/chat/completions`). Any service that exposes a compatible surface can be used as a CIE target:
+
+* **OpenRouter** - Aggregates many model providers behind a single OpenAI-compatible API and consolidated billing.
+* **LiteLLM** - A self-hosted proxy that exposes a unified, OpenAI-compatible API across providers.
+* **z.ai** - A model provider with an OpenAI-compatible API surface for its models.
+* **Internal gateways** - Any in-house service that fronts model providers behind an OpenAI-compatible endpoint (for example, a corporate AI gateway with logging, redaction, or access control).
+
+When you configure a CIE, Warp stores the endpoint URL, model identifiers, and credentials **locally on your device**. They are never synced to Warp's servers.
+
+:::caution
+CIE does not apply to [Oz Cloud Agents](/agent-platform/cloud-agents/overview/). Because CIE configuration is stored locally, it is not available to cloud-hosted agent runs. Cloud agent runs always consume [Warp credits](/support-and-community/plans-and-billing/credits/).
+:::
+
+When a CIE-routed model is selected:
+
+* Warp **does not consume** any of your [credits](/support-and-community/plans-and-billing/credits/).
+* Costs are billed directly by your endpoint provider.
+* Warp does not retain or store your endpoint credentials on any of its servers.
+
+## Enabling a custom inference endpoint
+
+To enable and configure a custom inference endpoint:
+
+1. In Warp, open **Settings** and search for `inference endpoint` to jump to the configuration.
+2. Add your endpoint URL (the base URL that exposes `/v1/chat/completions`) and any required credentials (typically an API key).
+3. Specify the model identifier(s) you want to route through this endpoint.
+4. Save the configuration. Once added, you'll see your custom models appear in the model picker.
+
+When you explicitly select a CIE-routed model from the model picker, Warp routes the request through your endpoint instead of consuming Warp's credits.
+
+The CIE configuration flow mirrors the [Bring your own API key](/support-and-community/plans-and-billing/bring-your-own-api-key/) setup, so the steps will feel familiar if you've already configured BYOK.
+
+## Billing behavior
+
+### Warp credits
+
+When you select a CIE-routed model from the model picker:
+
+* No Warp credits are consumed for that request.
+* Inference is billed directly by your endpoint provider, according to their pricing.
+* The credit transparency footer will show "0 credits used" for CIE-routed requests.
+
+:::note
+On Business and Enterprise plans, local agent runs that use a custom inference endpoint still consume platform credits for Warp's platform infrastructure. See [platform credits](/support-and-community/plans-and-billing/platform-credits/) for the full breakdown.
+:::
+
+### Auto routing still uses Warp credits
+
+Warp's **Auto** models dynamically route across providers using Warp's infrastructure. Because Auto routing depends on Warp, **Auto always consumes Warp's credits**, even if you've configured a custom inference endpoint.
+
+To use your endpoint, select the specific CIE-backed model from the model picker rather than an Auto option.
+
+### Other AI features in Warp
+
+Some AI-powered features rely on Warp's infrastructure and are unaffected by CIE configuration. These continue to consume credits according to your plan; see [Credits](/support-and-community/plans-and-billing/credits/) for details.
+
+## Zero Data Retention (ZDR)
+
+Warp is **SOC 2 compliant** and has **Zero Data Retention (ZDR)** agreements with all of its contracted LLM providers.
+
+When you use a custom inference endpoint:
+
+* Data retention is determined by **your endpoint provider** and any upstream model providers they route to.
+* Warp **cannot enforce ZDR** for requests sent through a custom inference endpoint.
+* If your endpoint provider does not have ZDR with the underlying model provider, your requests may be retained according to their terms.
+
+Review your endpoint provider's data handling and retention policies before routing sensitive prompts through a CIE.
+
+## Plan availability
+
+CIE is available on the Free plan and on all eligible paid plans. See [warp.dev/pricing](https://www.warp.dev/pricing) for the current list of eligible plans and any plan-specific limits.
+
+CIE is available to individual users and to organizations with 10 or fewer employees, subject to Warp's Terms of Service. Companies or organizations with more than 10 employees need a Warp Business or Enterprise plan to use custom inference endpoints.
+
+Centrally configured, admin-managed CIE for teams is not yet available. Each user configures their own endpoint locally. Enterprise teams that need centrally managed model routing today should see [Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/).
+
+## How CIE differs from BYOK and BYOLLM
+
+Warp offers three ways to bring your own AI infrastructure. Use this table to pick the right one, and follow the links for full details.
+
+| Name | Meaning | Plans |
+| --- | --- | --- |
+| **[Bring your own API key](/support-and-community/plans-and-billing/bring-your-own-api-key/)** (BYOK) | Use your own API key for OpenAI, Anthropic, or Google models. Keys are stored locally on your device. | Free and all eligible paid plans |
+| **Custom inference endpoint** (CIE) | Connect Warp to an OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, or an internal gateway. | Free and all eligible paid plans |
+| **[Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/)** (BYOLLM) | Enterprise-managed inference through your cloud provider (AWS Bedrock, Azure Foundry, Google Vertex) or approved internal infrastructure, with Warp handling routing, orchestration, governance, and observability. | Enterprise only |
+
+Platform credits may apply for local agent runs on Business and Enterprise when using BYOK, CIE, or BYOLLM. See [platform credits](/support-and-community/plans-and-billing/platform-credits/).
+
+## Related resources
+
+* [Bring your own API key](/support-and-community/plans-and-billing/bring-your-own-api-key/) — Use your own OpenAI, Anthropic, or Google API keys.
+* [Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/) — Enterprise-managed inference through your cloud provider or approved infrastructure.
+* [Model Choice](/agent-platform/capabilities/model-choice/) — Full list of supported models and `model_id` values.
+* [Credits](/support-and-community/plans-and-billing/credits/) — How Warp credits work and when they're consumed.
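
The CIE page requires an endpoint that implements the OpenAI Chat Completions API (`POST /v1/chat/completions`) and asks for "the base URL that exposes `/v1/chat/completions`". Before adding an endpoint in Warp, it may help to confirm that it answers a minimal Chat Completions call. The TypeScript sketch below builds that request and can send it with the global `fetch` (Node.js 18+ or a browser). It assumes Bearer-token authentication, which OpenAI's API and many compatible services use but your gateway may not. The helper names, URLs, model id, and key are placeholders, and the handling of base URLs that already end in `/v1` is a convenience heuristic, not something Warp documents.

```typescript
// Hypothetical helpers for checking an OpenAI-compatible endpoint before
// configuring it as a CIE. Not part of Warp; names and URLs are placeholders.

interface ChatMessage {
  role: "system" | "user" | "assistant";
  content: string;
}

interface ChatRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Join a base URL with the Chat Completions path.
// Heuristic (assumption): accept base URLs with or without a trailing "/v1".
function chatCompletionsUrl(baseUrl: string): string {
  const trimmed = baseUrl.replace(/\/+$/, "");
  return trimmed.endsWith("/v1")
    ? `${trimmed}/chat/completions`
    : `${trimmed}/v1/chat/completions`;
}

// Build a minimal Chat Completions request without sending it.
function buildChatRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  messages: ChatMessage[],
): ChatRequest {
  return {
    url: chatCompletionsUrl(baseUrl),
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`, // assumption: Bearer-token auth
      },
      body: JSON.stringify({ model, messages }),
    },
  };
}

// Send one short prompt and return the first choice's text (or "").
async function probeEndpoint(baseUrl: string, apiKey: string, model: string): Promise<string> {
  const { url, init } = buildChatRequest(baseUrl, apiKey, model, [
    { role: "user", content: "Reply with OK." },
  ]);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`Endpoint returned HTTP ${res.status}`);
  }
  const data = (await res.json()) as {
    choices?: { message?: { content?: string | null } }[];
  };
  return data.choices?.[0]?.message?.content ?? "";
}
```

For example, `probeEndpoint("http://localhost:4000", "<your-key>", "<model-id>")` would target a LiteLLM proxy assumed to be running locally on port 4000; adjust the URL, key, and model id to your own setup. Keeping request building separate from sending lets you inspect the exact URL and body without making a network call.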

src/sidebar.ts

Lines changed: 1 addition & 0 deletions
@@ -502,6 +502,7 @@ export const sidebarTopics: StarlightSidebarTopicsUserConfig = [
 'support-and-community/plans-and-billing/credits',
 'support-and-community/plans-and-billing/add-on-credits',
 'support-and-community/plans-and-billing/bring-your-own-api-key',
+'support-and-community/plans-and-billing/custom-inference-endpoint',
 'support-and-community/plans-and-billing/overages-legacy',
 'support-and-community/plans-and-billing/pricing-faqs',
 ],
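
Both pages carry the same org-size disclosure: BYOK and CIE are available to individual users and to organizations with 10 or fewer employees, while larger organizations need a Warp Business or Enterprise plan. The hypothetical TypeScript function below restates that threshold as code so the boundary case (exactly 10 employees) is unambiguous. It covers only the employee-count rule; per-plan eligibility (listed at warp.dev/pricing) and Warp's Terms of Service are out of scope, and the `Customer` type, the constant name, and the plan strings are illustrative.

```typescript
// Hypothetical restatement of the org-size rule shared by the BYOK and CIE pages.
// Ignores per-plan eligibility and the Terms of Service; names are illustrative.

type Customer =
  | { kind: "individual" }
  | { kind: "organization"; employees: number; plan: string };

const MAX_EMPLOYEES_WITHOUT_BUSINESS_PLAN = 10;

function byokOrCieAllowed(customer: Customer): boolean {
  if (customer.kind === "individual") {
    return true;
  }
  // "10 or fewer employees" includes exactly 10.
  if (customer.employees <= MAX_EMPLOYEES_WITHOUT_BUSINESS_PLAN) {
    return true;
  }
  // More than 10 employees: a Business or Enterprise plan is required.
  return customer.plan === "Business" || customer.plan === "Enterprise";
}
```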
