warpdotdev · hongyi-chen · May 21, 2026 · May 19, 2026 · May 20, 2026 · May 20, 2026
diff --git a/src/content/docs/enterprise/enterprise-features/bring-your-own-llm.mdx b/src/content/docs/enterprise/enterprise-features/bring-your-own-llm.mdx
@@ -1,16 +1,16 @@
 ---
 title: Bring your own LLM
 description: >-
-  Route Warp's agents through your AWS Bedrock models for billing control and
-  infrastructure flexibility.
+  Route Warp's agents through your organization's managed inference
+  infrastructure for governance, billing control, and model flexibility.
 ---
 
-Warp supports **Bring Your Own LLM (BYOLLM)** for enterprise teams that need to run inference on their own cloud infrastructure. With BYOLLM, your team can use Warp's agents while routing inference through models hosted in your AWS Bedrock environment.
+Warp supports **Bring your own LLM (BYOLLM)** for Enterprise teams that want to run inference on their own managed infrastructure. BYOLLM covers two patterns: cloud-provider Model-as-a-Service (AWS Bedrock, Google Vertex AI, Azure AI Foundry) and approved internal inference gateways.
 
-This gives you control over cloud spend and model hosting, without changing how your team works in Warp.
+With BYOLLM, your team uses Warp's agents while Warp manages routing, orchestration, governance, and observability across the providers you've approved. Inference runs in your environment; admins control which models are available to whom.
 
 :::caution
-BYOLLM currently supports **AWS Bedrock** only. Coming soon: Azure Foundry and Google Vertex support.
+**AWS Bedrock** is the GA implementation today. **Google Vertex AI** and **Azure AI Foundry** support is on the roadmap. Approved internal gateways are evaluated on a case-by-case basis with your Warp account team.
 
 BYOLLM applies to interactive agents in the terminal. Cloud agents do not yet support BYOLLM routing.
 :::
@@ -19,9 +19,29 @@ BYOLLM applies to interactive agents in the terminal. Cloud agents do not yet su
 BYOLLM is only available on Warp's Enterprise plan. Contact [warp.dev/contact-sales](https://www.warp.dev/contact-sales) to learn more.
 :::
 
+## How BYOLLM differs from BYOK and Custom inference endpoint
+
+Warp offers three ways to bring your own inference into the product. BYOLLM is one of them, and it serves a different use case than the others.
+
+| Name | Meaning | Plans |
+| --- | --- | --- |
+| Bring your own API key (BYOK) | User-level API keys for OpenAI, Anthropic, or Google. Each user configures their own key locally; Warp uses it to call the provider directly. | Free, Build, Max (orgs with 10 or fewer employees); Business or Enterprise required for larger orgs |
+| Custom inference endpoint (CIE) | User-level OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, or an internal gateway. Each user configures the endpoint locally. | Free, Build, Max (orgs with 10 or fewer employees); Business or Enterprise required for larger orgs |
+| Bring your own LLM (BYOLLM) | Enterprise-only managed inference infrastructure: cloud-provider Model-as-a-Service (Bedrock, Vertex, Foundry) or approved internal gateways. Warp manages routing, orchestration, governance, and observability for the whole team. | Enterprise |
+
+:::note
+BYOK and custom inference endpoint support are available for individual users and organizations with 10 or fewer employees, subject to Warp's Terms of Service. Companies or organizations with more than 10 employees require a Warp Business or Enterprise plan to use these features.
+:::
+
+Use [Bring your own API key](/support-and-community/plans-and-billing/bring-your-own-api-key/) or [Custom inference endpoint](/support-and-community/plans-and-billing/custom-inference-endpoint/) when an individual developer wants to authenticate to a provider with their own key or endpoint. Use BYOLLM when an organization wants Warp to manage inference routing across approved providers for the whole team.
+
+:::note
+Centrally configured BYOK and Custom inference endpoint for Enterprise, where admins approve providers or endpoints for the entire organization through the Admin Panel, are planned as a fast-follow and are not available at launch. Until then, BYOK and CIE remain user-level configurations, and BYOLLM remains the path for admin-managed inference infrastructure.
+:::
+
 ## Key features
 
-* **Cloud-native credentials** - Authenticate using each user’s AWS IAM identity. Warp does not store API keys.
+* **Cloud-native credentials** - Authenticate using each user's cloud-native identity (AWS IAM today; Google Cloud and Azure identities on the roadmap). Warp does not store API keys.
 * **Admin-enforced routing** - Team admins configure which models are available to users in AWS Bedrock, with the ability to disable non-Bedrock model access entirely.
 * **Consolidated billing** - Inference costs are billed directly to your AWS account, leveraging existing cloud commitments.
 
@@ -134,6 +154,10 @@ When a request routes through BYOLLM:
 * **Warp does not consume credits** for that request.
 * Your cloud provider account receives the inference costs directly.
 
+:::note
+BYOLLM-routed local agent runs on Enterprise do not consume AI credits, but they still consume platform credits for Warp's platform infrastructure (run orchestration, observability, integrations). Inference costs are billed directly to your cloud provider account. See [platform credits](/support-and-community/plans-and-billing/platform-credits/) for the full breakdown.
+:::
+
 ### Routing behavior
 
 Warp's agents automatically select the best model for your task while respecting your admin's routing policies. If you configure a model for BYOLLM, requests for that model route to AWS Bedrock.
@@ -187,18 +211,9 @@ However, when using BYOLLM:
 
 ## FAQ
 
-### How is BYOLLM different from BYOK?
-
-**BYOK (Bring Your Own Key)** lets individual users add their own API keys for direct model provider access (e.g., Anthropic, OpenAI, Google). Warp stores keys locally on the user's device.
+### How is BYOLLM different from BYOK and Custom inference endpoint?
 
-**BYOLLM (Bring Your Own LLM)** routes inference through your organization's cloud infrastructure (AWS Bedrock) using cloud-native IAM. Admins configure it at the admin level and it applies to the entire team.
-
-| Feature | BYOK | BYOLLM |
-| --- | --- | --- |
-| Configuration level | User | Admin/Team |
-| Authentication | API keys (local) | Cloud IAM (per-user) |
-| Billing | Direct to provider | Your cloud account |
-| Data locality | Provider infrastructure | Your cloud infrastructure |
+See [How BYOLLM differs from BYOK and Custom inference endpoint](#how-byollm-differs-from-byok-and-custom-inference-endpoint) at the top of this page for a comparison and plan-availability details. In short: BYOK and CIE are user-level configurations available to individual users and orgs with 10 or fewer employees on Free, Build, and Max, and to all users on Business and Enterprise. BYOLLM is Enterprise-only managed inference infrastructure where Warp routes the whole team's traffic through providers your admins have approved.
 
 ### Does BYOLLM work with Auto?
 
@@ -222,7 +237,9 @@ Yes. Admins can configure routing policies to require specific models to use BYO
 
 ## Related resources
 
-* [Bring Your Own API Key](/support-and-community/plans-and-billing/bring-your-own-api-key/)
+* [Bring your own API key](/support-and-community/plans-and-billing/bring-your-own-api-key/) — User-level keys for OpenAI, Anthropic, and Google
+* [Custom inference endpoint](/support-and-community/plans-and-billing/custom-inference-endpoint/) — Connect an OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, or an internal gateway
+* [Platform credits](/support-and-community/plans-and-billing/platform-credits/) — Warp's platform-infrastructure credit bucket
 * [Model Choice](/agent-platform/capabilities/model-choice/) — Full list of supported models
 * [Admin Panel](/enterprise/team-management/admin-panel/) — Configure team settings
 * [Contact Sales](https://www.warp.dev/contact-sales) — Get help with enterprise setup
diff --git a/...content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx b/...content/docs/support-and-community/plans-and-billing/bring-your-own-api-key.mdx
@@ -1,29 +1,46 @@
 ---
-title: Bring Your Own API Key
+title: Bring your own API key
 description: >-
-  Warp's paid plans include the ability to bring your own API keys (BYOK) for
-  OpenAI, Anthropic, and Google AI models.
+  Use your own OpenAI, Anthropic, or Google API keys. BYOK requests never
+  consume AI credits; on Business and Enterprise, platform credits may apply
+  for local agent runs.
 ---
 
-Warp supports **Bring Your Own Key (BYOK)** for users who want to connect Warp’s agent to their own Anthropic, OpenAI, or Google API accounts.
+Warp supports **Bring your own API key (BYOK)** for users who want to connect Warp's agents to their own Anthropic, OpenAI, or Google API accounts.
 
-This lets you use your own API keys to access models directly, giving you full control over model selection, billing, and data routing. See [Model Choice](/agent-platform/capabilities/model-choice/) for a list of supported models.
+BYOK gives you full control over model selection, billing, and data routing. See [Model Choice](/agent-platform/capabilities/model-choice/) for the full list of supported models. When you route a request through your own key, Warp **never consumes your** [credits](/support-and-community/plans-and-billing/credits/) for that request.
 
-BYOK provides greater flexibility in model access and ensures Warp **never consumes your** [credits](/support-and-community/plans-and-billing/credits/) for requests routed through your own keys.
+:::note
+On the Business and Enterprise plans, local agent runs that use BYOK still consume platform credits for Warp's platform infrastructure (run lifecycle, integrations, observability). See [platform credits](/support-and-community/plans-and-billing/platform-credits/) for what's covered.
+:::
 
 :::note
-BYOK is currently only available on Warp's paid plans, starting with Build. Learn more about plans and pricing [warp.dev/pricing](https://www.warp.dev/pricing).
+BYOK is available on the Free plan and on all eligible paid plans. See [warp.dev/pricing](https://www.warp.dev/pricing) for the current list of eligible plans.
 :::
 
-:::caution
-BYOK and customer-supplied inference (BYOLLM via Amazon Bedrock or Google Vertex, plus custom endpoints) are available to individual users and organizations with 10 or fewer employees or users on any plan. Organizations with more than 10 employees or users must be on a Warp Business or Enterprise plan to use BYOK or customer-supplied inference. See Warp's [Terms of Service](https://www.warp.dev/terms-of-service) for details.
+:::note
+BYOK and custom inference endpoint support are available for individual users and organizations with 10 or fewer employees, subject to Warp's [Terms of Service](https://www.warp.dev/terms-of-service). Companies or organizations with more than 10 employees require a Warp Business or Enterprise plan to use BYOK or customer-supplied inference.
 :::
 
-## How does BYOK work?
+## How BYOK differs from Custom inference endpoint and BYOLLM
+
+Warp offers three ways to bring your own AI infrastructure. Use this table to pick the right one, and follow the links for full details.
+
+| Name | Meaning | Plans |
+| --- | --- | --- |
+| **Bring your own API key** (BYOK) | Use your own API key for OpenAI, Anthropic, or Google models. Keys are stored locally on your device. | Free and all eligible paid plans |
+| **[Custom inference endpoint](/support-and-community/plans-and-billing/custom-inference-endpoint/)** (CIE) | Connect Warp to an OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, or an internal gateway. | Free and all eligible paid plans |
+| **[Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/)** (BYOLLM) | Enterprise-managed inference through your cloud provider (AWS Bedrock, Azure AI Foundry, Google Vertex AI) or approved internal infrastructure, with Warp handling routing, orchestration, governance, and observability. | Enterprise only |
+
+See [warp.dev/pricing](https://www.warp.dev/pricing) for current plan availability.
+
+Platform credits may apply for local agent runs on Business and Enterprise when using BYOK, CIE, or BYOLLM. See [platform credits](/support-and-community/plans-and-billing/platform-credits/).
+
+## How BYOK works
 
 When you add your own model API keys in Warp, those keys are stored **locally on your device** and are **never synced to the cloud**.
 
-Warp uses these API keys to directly route your agent requests to the model provider you've configured.
+Warp uses these API keys to route your agent requests directly to the model provider you've configured.
 
 :::caution
 BYOK does not apply to [Cloud Agents](/agent-platform/cloud-agents/overview/). Because your API keys are stored locally on your device, they are not available to cloud-hosted agent runs. Cloud agent runs always consume [Warp credits](/support-and-community/plans-and-billing/credits/).
@@ -57,9 +74,9 @@ When you explicitly select a model with a key icon, Warp routes requests through
 
 ### Auto Model
 
-Warp's **Auto** models dynamically route requests across different models based on context and performance. Because this routing logic depends on Warp’s infrastructure, **Auto always consumes Warp's credits**, even if you’ve configured your own API keys.
+Warp's **Auto** models dynamically route requests across different models based on context and performance. Because this routing logic depends on Warp's infrastructure, **Auto always consumes Warp's credits**, even if you've configured your own API keys.
 
-To use your own key, select a specific provider model (for example, Claude Sonnet 4.5, GPT-5, or Gemini 2.5 Pro) directly from the model picker with a key icon.
+To use your own key, select a specific provider model (for example, Claude Opus 4.7, Claude Sonnet 4.6, GPT-5.5, or Gemini 3.1 Pro) directly from the model picker with a key icon.
 
 ### Credit usage
 
@@ -97,7 +114,7 @@ If your key:
 
 **Failover and fallback:**
 
-By default, Warp does not fall back to your credits when a BYOK (Bring Your Own Key) request fails.
+By default, Warp does not fall back to your credits when a BYOK request fails.
 
 You can choose to enable **Warp credit fallback**. When enabled, if an agent request fails with your BYOK model (for example, due to an API error or quota limit), Warp will automatically route the request to one of Warp’s provided models. Warp always prioritizes your API keys first and only uses Warp credits when necessary.
 
@@ -117,12 +134,19 @@ Warp itself never stores your LLM API keys.
 
 ### BYOK on Enterprise and Business plans
 
-Organizations with more than 10 employees or users must be on a Warp Business or Enterprise plan to use BYOK or customer-supplied inference. See Warp's [Terms of Service](https://www.warp.dev/terms-of-service) for the full eligibility rule.
+BYOK is available to individual users and to organizations with 10 or fewer employees, subject to Warp's [Terms of Service](https://www.warp.dev/terms-of-service). Companies or organizations with more than 10 employees need a Warp Business or Enterprise plan to use BYOK or customer-supplied inference.
+
+Today, BYOK is configured at the **user level** on every plan, including Enterprise and Business:
+
+* Each team member can add and manage their own API keys locally on their device.
+* Centrally configured, admin-managed BYOK is not yet available — admins cannot enforce or share API keys across team members from a single place.
+* There is no organization-level Admin Panel for BYOK management today.
 
-Currently, BYOK is configured at the **user level**, not the team or admin level:
+If your organization needs centrally managed model routing now, see [Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/) for the Enterprise-managed option. To discuss a fit, contact us at [warp.dev/contact-sales](https://www.warp.dev/contact-sales).
 
-* Each team member can add and manage their own API keys locally.
-* Team admins cannot yet enforce or share API keys across members.
-* There is currently no organization-level Admin Panel for BYOK management.
+## Related resources
 
-If your organization has specific needs for managed keys or enterprise-level control, please contact us at [warp.dev/contact-sales](https://www.warp.dev/contact-sales).
+* [Custom inference endpoint](/support-and-community/plans-and-billing/custom-inference-endpoint/) — Route Warp through any OpenAI-compatible endpoint, such as OpenRouter, LiteLLM, z.ai, or an internal gateway.
+* [Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/) — Enterprise-managed inference through your cloud provider or approved infrastructure.
+* [Model Choice](/agent-platform/capabilities/model-choice/) — Full list of supported models and `model_id` values.
+* [Credits](/support-and-community/plans-and-billing/credits/) — How Warp credits work and when they're consumed.