docs(pricing-may-2026): BYOLLM reframe, plan summary, Enterprise billing, and plans-and-billing index (#72)

hongyi-chen · oz-agent · web-flow · commit 1352c3f18656 · 2026-05-19T16:37:21.000-07:00
* docs: update BYOLLM, plan summary, Enterprise billing, and plans-and-billing index for May 2026

  - bring-your-own-llm.mdx: Reframe as Enterprise-only managed inference (Bedrock GA, Vertex/Foundry roadmap, internal gateways). Add a comparison section that contrasts BYOLLM with BYOK and Custom inference endpoint, with a note that centrally configured BYOK/CIE for Enterprise is a fast-follow after launch.
  - plans-pricing-refunds.mdx: Add May 2026 plan summary (Free, Build, Max, Business, Enterprise) with seat limits and qualitative descriptions of each plan's value. Link out to warp.dev/pricing for current monthly credit allowances instead of hard-coding numbers. Add a Custom inference endpoint bullet to the existing sub-page list.
  - enterprise/support-and-resources/billing.mdx: Clarify that team-wide spend limits are also available on self-serve paid plans while per-user spend limits are Enterprise-only. Add a related-resources link to the Enterprise Analytics API.
  - plans-and-billing/index.mdx: Add a Custom inference endpoint bullet so the new page is discoverable from the plans-and-billing landing page.

  Part of the May 2026 Warp pricing docs overhaul (hyc/plan-updates).

  Co-Authored-By: Oz <oz-agent@warp.dev>

* docs(pricing-may-2026): correct plan seat limits per orchestrator

  Update May 2026 plan summary in plans-pricing-refunds.mdx with the actual seat limits from the pricing-context dump:

  - Free / Build / Max: Up to 10 team members
  - Business: Up to 25 team members
  - Enterprise: Unlimited team members (custom contract)

  Also reframe descriptions so 'individual' language is dropped from plans that support up to 10 team members.

  Co-Authored-By: Oz <oz-agent@warp.dev>

* docs(pricing-may-2026): add BYOK / CIE org-size disclosure (10 or fewer employees)

  Per user requirement: BYOK and custom inference endpoint are limited to individual users and orgs with 10 or fewer employees on Free / Build / Max; larger orgs require a Business or Enterprise plan.

  - bring-your-own-llm.mdx: Update the BYOK and CIE rows in the comparison matrix to reflect the org-size constraint, add a :::note disclosure callout immediately below the matrix, and rephrase the corresponding FAQ for consistency.
  - plans-pricing-refunds.mdx: Add the same :::note disclosure callout after the May 2026 plan summary bullet list, before the ZDR sentence (one callout covering the Free / Build / Max bullets).

  Co-Authored-By: Oz <oz-agent@warp.dev>

* Add Custom inference endpoint page stub for CI link check

  This file is the canonical version created on PR #71 (hyc/plan-updates-byok-cie). It is duplicated here so that the link checker on this branch can resolve the relative references to /support-and-community/plans-and-billing/custom-inference-endpoint/ that this PR introduces. When PR #71 merges into hyc/plan-updates, git will reconcile the identical file contents automatically.

  Co-Authored-By: Oz <oz-agent@warp.dev>

* docs(pricing-may-2026): BYOLLM cross-provider IAM + platform credits in plan summary, Enterprise billing

  - bring-your-own-llm.mdx: rewrite Cloud-native credentials bullet so it covers AWS IAM (GA) plus Google Cloud and Azure identities (roadmap); add a platform-credits note next to the BYOLLM 'no Warp credits consumed' framing so Enterprise readers know local agent runs still consume platform credits; add the platform-credits page to Related resources.
  - plans-pricing-refunds.mdx: append a one-liner about platform credits to the Business and Enterprise bullets of the May 2026 plan summary so readers understand when platform credits apply across customer-supplied inference.
  - enterprise/support-and-resources/billing.mdx: add the platform-credits page to Related resources alongside Add-on Credits and Credits.

  Pre-existing build/link-check failures (CIE sidebar registration; platform-credits.mdx not yet on the umbrella) are out of scope here and will be resolved by PR #71's sidebar.ts update and the umbrella rebase onto main.

  Co-Authored-By: Oz <oz-agent@warp.dev>

* Remove duplicate CIE stub; PR #71 owns the canonical custom-inference-endpoint.mdx

  The CIE page is owned by PR #71 (hyc/plan-updates-byok-cie), which also adds the matching src/sidebar.ts entry. Removing the stub here so the PR #72 → umbrella merge auto-resolves cleanly after #71 lands.

  Co-Authored-By: Oz <oz-agent@warp.dev>

---------

Co-authored-by: Oz <oz-agent@warp.dev>
diff --git a/src/content/docs/enterprise/enterprise-features/bring-your-own-llm.mdx b/src/content/docs/enterprise/enterprise-features/bring-your-own-llm.mdx
@@ -1,16 +1,16 @@
 ---
 title: Bring your own LLM
 description: >-
-  Route Warp's agents through your AWS Bedrock models for billing control and
-  infrastructure flexibility.
+  Route Warp's agents through your organization's managed inference
+  infrastructure for governance, billing control, and model flexibility.
 ---
 
-Warp supports **Bring Your Own LLM (BYOLLM)** for enterprise teams that need to run inference on their own cloud infrastructure. With BYOLLM, your team can use Warp's agents while routing inference through models hosted in your AWS Bedrock environment.
+Warp supports **Bring your own LLM (BYOLLM)** for Enterprise teams that want to run inference on their own managed infrastructure. BYOLLM covers two patterns: cloud-provider Model-as-a-Service (AWS Bedrock, Google Vertex AI, Azure AI Foundry) and approved internal inference gateways.
 
-This gives you control over cloud spend and model hosting, without changing how your team works in Warp.
+With BYOLLM, your team uses Warp's agents while Warp manages routing, orchestration, governance, and observability across the providers you've approved. Inference runs in your environment; admins control which models are available to whom.
 
 :::caution
-BYOLLM currently supports **AWS Bedrock** only. Coming soon: Azure Foundry and Google Vertex support.
+**AWS Bedrock** is the generally available (GA) implementation today. Support for **Google Vertex AI** and **Azure AI Foundry** is on the roadmap. Approved internal gateways are evaluated case by case with your Warp account team.
 
 BYOLLM applies to interactive Oz agents in the terminal. Oz cloud agents do not yet support BYOLLM routing.
 :::
@@ -19,9 +19,29 @@ BYOLLM applies to interactive Oz agents in the terminal. Oz cloud agents do not
 BYOLLM is only available on Warp's Enterprise plan. Contact [warp.dev/contact-sales](https://warp.dev/contact-sales) to learn more.
 :::
 
+## How BYOLLM differs from BYOK and Custom inference endpoint
+
+Warp offers three ways to bring your own inference into the product. BYOLLM is one of them, and it serves a different use case than the others.
+
+| Option | What it is | Plan availability |
+| --- | --- | --- |
+| Bring your own API key (BYOK) | User-level API keys for OpenAI, Anthropic, or Google. Each user configures their own key locally; Warp uses it to call the provider directly. | Free, Build, Max (orgs with 10 or fewer employees); Business or Enterprise required for larger orgs |
+| Custom inference endpoint (CIE) | User-level OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, or an internal gateway. Each user configures the endpoint locally. | Free, Build, Max (orgs with 10 or fewer employees); Business or Enterprise required for larger orgs |
+| Bring your own LLM (BYOLLM) | Enterprise-only managed inference infrastructure: cloud-provider Model-as-a-Service (Bedrock, Vertex, Foundry) or approved internal gateways. Warp manages routing, orchestration, governance, and observability for the whole team. | Enterprise |
+
+:::note
+BYOK and custom inference endpoint support are available for individual users and organizations with 10 or fewer employees, subject to Warp's Terms of Service. Companies or organizations with more than 10 employees require a Warp Business or Enterprise plan to use these features.
+:::
+
+Use [Bring your own API key](/support-and-community/plans-and-billing/bring-your-own-api-key/) or [Custom inference endpoint](/support-and-community/plans-and-billing/custom-inference-endpoint/) when an individual developer wants to authenticate to a provider with their own key or endpoint. Use BYOLLM when an organization wants Warp to manage inference routing across approved providers for the whole team.
+
+:::note
+Centrally configured BYOK and Custom inference endpoint for Enterprise, where admins approve providers or endpoints for the entire organization through the Admin Panel, are planned as a fast-follow and are not available at launch. Until then, BYOK and CIE remain user-level configurations, and BYOLLM remains the path for admin-managed inference infrastructure.
+:::
+
 ## Key features
 
-* **Cloud-native credentials** - Authenticate using each user’s AWS IAM identity. Warp does not store API keys.
+* **Cloud-native credentials** - Authenticate using each user's cloud-native identity (AWS IAM today; Google Cloud and Azure identities on the roadmap). Warp does not store API keys.
 * **Admin-enforced routing** - Team admins configure which models are available to users in AWS Bedrock, with the ability to disable non-Bedrock model access entirely.
 * **Consolidated billing** - Inference costs are billed directly to your AWS account, leveraging existing cloud commitments.
 
@@ -134,6 +154,10 @@ When a request routes through BYOLLM:
 * **Warp does not consume credits** for that request.
 * Your cloud provider account receives the inference costs directly.
 
+:::note
+BYOLLM-routed local agent runs on Enterprise still consume platform credits for Warp's platform infrastructure (run orchestration, observability, integrations). Inference costs are billed directly to your cloud provider account. See [platform credits](/support-and-community/plans-and-billing/platform-credits/) for the full breakdown.
+:::
+
 ### Routing behavior
 
 Warp's agents automatically select the best model for your task while respecting your admin's routing policies. If you configure a model for BYOLLM, requests for that model route to AWS Bedrock.
@@ -187,18 +211,9 @@ However, when using BYOLLM:
 
 ## FAQ
 
-### How is BYOLLM different from BYOK?
-
-**BYOK (Bring Your Own Key)** lets individual users add their own API keys for direct model provider access (e.g., Anthropic, OpenAI, Google). Warp stores keys locally on the user's device.
+### How is BYOLLM different from BYOK and Custom inference endpoint?
 
-**BYOLLM (Bring Your Own LLM)** routes inference through your organization's cloud infrastructure (AWS Bedrock) using cloud-native IAM. Admins configure it at the admin level and it applies to the entire team.
-
-| Feature | BYOK | BYOLLM |
-| --- | --- | --- |
-| Configuration level | User | Admin/Team |
-| Authentication | API keys (local) | Cloud IAM (per-user) |
-| Billing | Direct to provider | Your cloud account |
-| Data locality | Provider infrastructure | Your cloud infrastructure |
+See [How BYOLLM differs from BYOK and Custom inference endpoint](#how-byollm-differs-from-byok-and-custom-inference-endpoint) earlier on this page for a comparison and plan-availability details. In short: BYOK and CIE are user-level configurations. On Free, Build, and Max they are available to individual users and organizations with 10 or fewer employees; on Business and Enterprise they are available to all users. BYOLLM is Enterprise-only managed inference infrastructure where Warp routes the whole team's traffic through providers your admins have approved.
 
 ### Does BYOLLM work with Auto?
 
@@ -222,7 +237,9 @@ Yes. Admins can configure routing policies to require specific models to use BYO
 
 ## Related resources
 
-* [Bring Your Own API Key](/support-and-community/plans-and-billing/bring-your-own-api-key/)
+* [Bring Your Own API Key](/support-and-community/plans-and-billing/bring-your-own-api-key/) — User-level keys for OpenAI, Anthropic, and Google
+* [Custom inference endpoint](/support-and-community/plans-and-billing/custom-inference-endpoint/) — Connect an OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, or an internal gateway
+* [Platform credits](/support-and-community/plans-and-billing/platform-credits/) — Warp's platform-infrastructure credit bucket
 * [Model Choice](/agent-platform/capabilities/model-choice/) — Full list of supported models
 * [Admin Panel](/enterprise/team-management/admin-panel/) — Configure team settings
 * [Contact Sales](https://warp.dev/contact-sales) — Get help with enterprise setup
diff --git a/src/content/docs/enterprise/support-and-resources/billing.mdx b/src/content/docs/enterprise/support-and-resources/billing.mdx
@@ -72,6 +72,10 @@ Enterprise administrators can set monthly spending limits across the following f
 
 Spending is tracked across all payment types (Add-on Credits, pay-as-you-go usage) so limits apply consistently regardless of how usage is funded.
 
+:::note
+Team-wide spending limits (cloud, local, and total) are also available on Warp's self-serve paid plans through admin-managed Reload settings. **Per-user spending limits are Enterprise-only.** For deeper visibility into how individual users consume credits, see the [Enterprise Analytics API](/enterprise/enterprise-features/analytics-api/).
+:::
+
 #### Monthly spend alerts
 
 Warp sends alerts to administrators as team usage approaches each configured spending limit, so you can adjust caps, purchase more credits, or communicate with your team before agent usage is blocked at the cap.
@@ -84,6 +88,8 @@ For enterprises with credit pools, administrators receive alerts as the team cre
 
 * [Credits](/support-and-community/plans-and-billing/credits/) - How credits are calculated and consumed
 * [Add-on Credits](/support-and-community/plans-and-billing/add-on-credits/) - Purchase additional credits and configure auto-reload
+* [Platform credits](/support-and-community/plans-and-billing/platform-credits/) - The third credit bucket alongside AI credits and compute credits, covering Warp's platform infrastructure
 * [Pricing FAQs](/support-and-community/plans-and-billing/pricing-faqs/) - Common billing questions
 * [Bring Your Own LLM](/enterprise/enterprise-features/bring-your-own-llm/) - BYOLLM billing and configuration
+* [Enterprise Analytics API](/enterprise/enterprise-features/analytics-api/) - Programmatic access to team usage and spend data
 * [Admin Panel](/enterprise/team-management/admin-panel/) - Configure spending limits and billing settings
diff --git a/src/content/docs/support-and-community/plans-and-billing/index.mdx b/src/content/docs/support-and-community/plans-and-billing/index.mdx
@@ -11,5 +11,6 @@ Warp offers flexible plans designed for individual developers, teams, and enterp
 * [**Credits**](/support-and-community/plans-and-billing/credits/) - How credits are used and calculated across AI features
 * [**Add-on Credits**](/support-and-community/plans-and-billing/add-on-credits/) - Purchase additional credits or enable automatic reloads
 * [**Bring Your Own API Key**](/support-and-community/plans-and-billing/bring-your-own-api-key/) - Connect your own model provider API keys
+* [**Custom inference endpoint**](/support-and-community/plans-and-billing/custom-inference-endpoint/) - Connect an OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, or an internal gateway
 * [**Overages (Legacy)**](/support-and-community/plans-and-billing/overages-legacy/) - Information for users on legacy plans with overages
 * [**Pricing FAQs**](/support-and-community/plans-and-billing/pricing-faqs/) - Answers to common questions about plans and billing
diff --git a/src/content/docs/support-and-community/plans-and-billing/plans-pricing-refunds.mdx b/src/content/docs/support-and-community/plans-and-billing/plans-pricing-refunds.mdx
@@ -21,9 +21,26 @@ Visit [warp.dev/pricing](https://warp.dev/pricing) to see the latest plans and w
 * [Credits](/support-and-community/plans-and-billing/credits/) — learn how credits are used and calculated across AI features.
 * [Add-on Credits](/support-and-community/plans-and-billing/add-on-credits/) — purchase additional credits or enable automatic reloads at discounted rates.
 * [Bring Your Own API Key](/support-and-community/plans-and-billing/bring-your-own-api-key/) — connect your own model provider API keys for custom usage and billing.
+* [Custom inference endpoint](/support-and-community/plans-and-billing/custom-inference-endpoint/) — connect an OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, or an internal gateway.
 * [Overages (Legacy)](/support-and-community/plans-and-billing/overages-legacy/) — information for users on legacy plans with overages enabled.
 * [Pricing FAQs](/support-and-community/plans-and-billing/pricing-faqs/) — answers to common questions about plans, billing, and usage. Don’t have Warp yet? [Download Warp](https://warp.dev/download) and get started for free today.
 
+### May 2026 plan summary
+
+Below is a high-level summary of Warp's plans as of May 14, 2026. Visit [warp.dev/pricing](https://www.warp.dev/pricing) for current monthly credit allowances and seat pricing.
+
+* **Free** — Up to 10 team members. For developers exploring Warp. Includes core terminal features and a monthly credit allowance for trying Warp's agents. Supports [Bring Your Own API Key](/support-and-community/plans-and-billing/bring-your-own-api-key/) and [custom inference endpoints](/support-and-community/plans-and-billing/custom-inference-endpoint/) so you can keep working with your own provider after the included allowance is used.
+* **Build** — Up to 10 team members. For developers using Warp's agents as a daily driver. Includes a higher monthly credit allowance than Free, [Add-on Credits](/support-and-community/plans-and-billing/add-on-credits/) with auto-reload, and the same BYOK and custom inference endpoint support as Free.
+* **Max** — Up to 10 team members. For heavy users. Includes everything in Build with a higher monthly credit allowance and a better effective Reload rate.
+* **Business** — Up to 25 team members. For small-to-midsize teams. Includes everything in Max plus team-wide collaboration features, Reload credits with admin-managed auto-reload and team-wide spend caps, SAML-based SSO, and admin-configurable data controls. Platform credits apply to cloud agent runs and to local runs that use customer-supplied inference (BYOK or CIE).
+* **Enterprise** — Unlimited team members (custom contract). For organizations with advanced security, compliance, or scale needs. Includes everything in Business plus [Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/), the [Enterprise Analytics API](/enterprise/enterprise-features/analytics-api/), per-user spend limits, advanced admin controls, and Implementation Engineer Support (a structured multi-week implementation program with hands-on guidance from Warp engineers to help your team deploy production Oz Cloud Agent use cases). Platform credits apply to all cloud agent runs and to local runs using BYOLLM, BYOK, or CIE.
+
+:::note
+BYOK and custom inference endpoint support are available for individual users and organizations with 10 or fewer employees, subject to Warp's Terms of Service. Companies or organizations with more than 10 employees require a Warp Business or Enterprise plan to use these features.
+:::
+
+Model provider Zero Data Retention (ZDR) applies across all plans through Warp's contracted LLM providers. See [Pricing FAQs](/support-and-community/plans-and-billing/pricing-faqs/) for details on data controls.
+
 ### Warp’s refund policies
 
 Please review the details of our refund policies below. To request a refund, email [**billing@warp.dev**](mailto:billing@warp.dev) with information about your situation — the more context you provide, the faster we can resolve your request.