From 83199d05cf0336fed79715d4b81d969334b29896 Mon Sep 17 00:00:00 2001
From: Hong Yi Chen <hongyigma@gmail.com>
Date: Tue, 12 May 2026 18:31:35 -0700
Subject: [PATCH 1/5] Overhaul pricing FAQs for May 14, 2026 pricing changes

* Delete the entire 'Warp's pricing change FAQs (Oct 30, 2025)' section
  and remove the stale anchor link from the 'How can I get the most out
  of my Warp plan?' callout.
* Rewrite plan-recommendation copy with qualitative Build / Max /
  Business / Enterprise positioning; replace 'enforced team-wide ZDR'
  language with 'admin-configurable data controls' and split model-
  provider ZDR out as a separate, all-plan concept.
* Update the Lite-model FAQ to mention BYOK and Custom Inference
  Endpoint (CIE) alongside Reload credits.
* Add new FAQs: multi-seat team credits with grandfathered pooled
  credits, what-to-do-when-you-need-more-AI-usage (Max + Reload + BYOK +
  CIE), how auto-reload works for teams, and how service-account /
  team-scoped API key requests are billed on self-serve plans.
* Add BYOK-on-all-plans and CIE FAQs; rename 'Add-on Credits' to
  'Reload credits' and fix one remaining stale anchor link inside the
  downgrade caution callout.
* Add new tail-end 'Warp's pricing change FAQs (May 14, 2026)' section
  covering seat limits + grandfathering, Reload credits attribution
  change, Max plan credit-allocation change + grandfathering, BYOK on
  Free, the CIE launch, and the ZDR / data-controls clarification.

Per the editorial rule, no per-plan monthly credit counts are
hard-coded; the page links to warp.dev/pricing for current allowances.

Co-Authored-By: Oz <oz-agent@warp.dev>
---
 .../plans-and-billing/pricing-faqs.mdx        | 228 +++++++++---------
 1 file changed, 111 insertions(+), 117 deletions(-)

diff --git a/src/content/docs/support-and-community/plans-and-billing/pricing-faqs.mdx b/src/content/docs/support-and-community/plans-and-billing/pricing-faqs.mdx
index 5d9ddbea..a6ff7b01 100644
--- a/src/content/docs/support-and-community/plans-and-billing/pricing-faqs.mdx
+++ b/src/content/docs/support-and-community/plans-and-billing/pricing-faqs.mdx
@@ -23,39 +23,33 @@ After entering your payment details, you’ll receive an invoice and confirmatio
 ### How can I get the most out of my Warp plan?
 
 :::caution
-Warp's legacy paid plans included Pro, Turbo, and Lightspeed.
+Warp's legacy paid plans included Pro, Turbo, and Lightspeed. These plans are no longer offered to new subscribers.
 
-After **Oct 30, 2025**, we have rolled out the new Build plan to replace them. Existing subscribers will start to roll over onto the Build plan starting **Dec 1st, 2025**. For questions related to the new pricing changes, please refer to [Warp's pricing change FAQs](/support-and-community/plans-and-billing/pricing-faqs/#warps-pricing-change-faqs-oct-30-2025).
-
-To see more details on the latest plan, please visit [**warp.dev/pricing**](https://www.warp.dev/pricing).
+For details on those legacy plans and how their usage was billed, see [Overages (Legacy)](/support-and-community/plans-and-billing/overages-legacy/). For the most up-to-date list of plans, seat limits, and credit allowances, visit [warp.dev/pricing](https://www.warp.dev/pricing).
 :::
 
-Warp's plans are designed for developers who rely on AI to code, debug, and move faster with their team.
-
-* **Build**, one usage-based plan with a set of credits, ability to Bring Your Own API Key (BYOK), and access to [Add-on credits](/support-and-community/plans-and-billing/add-on-credits/) with volume-based discounts. See more on [Bring Your Own API Key](/support-and-community/plans-and-billing/bring-your-own-api-key/).
-* **Business** includes everything in Build, with advanced features like automatically enforced team-wide Zero Data Retention, SAML-based SSO, and support for teams up to 50 seats.
+Warp's plans are designed for developers who rely on AI to code, debug, and move faster with their team. Pick the plan that matches the scale of your usage and the controls your team needs:
 
-Legacy plans:
+* **Build** — Single-user usage-based plan with monthly credits, the ability to [bring your own API key (BYOK)](/support-and-community/plans-and-billing/bring-your-own-api-key/) or point Warp at a [custom inference endpoint](/support-and-community/plans-and-billing/custom-inference-endpoint/), and access to [Reload credits](/support-and-community/plans-and-billing/add-on-credits/) with volume-based discounts.
+* **Max** — Single-user plan for developers with heavier AI usage. Includes a larger monthly credit allowance than Build, plus a better effective rate for credits than buying Reload credits on Build.
+* **Business** — Multi-seat plan for teams. Includes everything in Build, plus admin-configurable data controls, SAML-based SSO, and centralized billing. Available up to the seat limit listed at [warp.dev/pricing](https://www.warp.dev/pricing).
+* **Enterprise** — Custom plan for organizations that need higher seat counts, [Bring Your Own LLM (BYOLLM)](/enterprise/enterprise-features/bring-your-own-llm/) managed inference, granular admin controls, advanced security and compliance, and dedicated support.
 
-* **Pro** included higher credit limits than the Free plan, support for larger codebases with [Codebase Context](/agent-platform/capabilities/codebase-context/), and access to premium models with optional pay-as-you-go overages.
-* **Turbo** included even higher credit limits, larger Codebase Context indexing, and the option to pay for additional usage beyond included credits via [Overages (Legacy)](/support-and-community/plans-and-billing/overages-legacy/). Add-on Credits were not available on this plan.
-* **Pro** included higher credit limits than the Free plan, support for larger codebases with [Codebase Context](/agent-platform/capabilities/codebase-context/), and access to premium models with optional pay-as-you-go [Overages (Legacy)](/support-and-community/plans-and-billing/overages-legacy/).
-* **Turbo** included even higher credit limits, larger Codebase Context indexing, and the option to pay for additional usage beyond included credits via [Overages (Legacy)](/support-and-community/plans-and-billing/overages-legacy/). Add-on Credits were not available on this plan.
-* **Lightspeed** was Warp's most powerful legacy plan, offering the highest credit limits, expanded codebase indexing, access to top-tier models, and pay-as-you-go [Overages (Legacy)](/support-and-community/plans-and-billing/overages-legacy/) so you could keep working without interruption.
-
-For the most up-to-date feature and usage details, visit [**warp.dev/pricing**](https://www.warp.dev/pricing).
+For the most up-to-date feature and usage details — including current per-plan seat limits and monthly credit allowances — visit [warp.dev/pricing](https://www.warp.dev/pricing).
 
 ### How can I subscribe to a Warp Enterprise plan?
 
 Warp offers two options for larger teams and organizations:
 
-* **Business Plan**: Supports up to 50 seats and is available for immediate upgrade. It includes automatically enforced team-wide Zero Data Retention by default and admin-controlled SAML-based SSO.
-* **Enterprise Plan**: Offers custom pricing, credit limits, and terms. Along with support for larger engineering orgs or teams with advanced security, compliance, or support needs.
+* **Business plan**: Self-serve multi-seat plan available for immediate upgrade. Includes admin-configurable data controls and admin-controlled SAML-based SSO. See [warp.dev/pricing](https://www.warp.dev/pricing) for the current seat limit.
+* **Enterprise plan**: Custom pricing, credit allowances, and terms — built for larger engineering organizations or teams with advanced security, compliance, or support needs. Enterprise also includes [Bring Your Own LLM (BYOLLM)](/enterprise/enterprise-features/bring-your-own-llm/) managed inference, where Warp routes traffic through your AWS Bedrock, Google Vertex, Azure Foundry, or internal LLM gateway with routing, orchestration, governance, and observability provided by Warp.
 
 :::tip
 If you’d like to explore Enterprise, [contact our sales team](https://warp.dev/contact-sales) and someone from Warp will follow up.
 :::
 
+Independent of plan, all model provider traffic is covered by Warp's Zero Data Retention (ZDR) agreements with the underlying model providers (OpenAI, Anthropic, Google). Admin-configurable data controls — including team-wide retention policies and training opt-outs — are available on Business and Enterprise.
+
 ### What counts as a team member and how does billing work for members?
 
 In Warp, a _team member_ is any seat with access to your Team — including the shared Warp Drive, Notebooks, Workflows, and other team resources. All plans allow you to invite unlimited users, but to unlock higher limits and advanced features, you’ll need to upgrade your team to a plan. Upgrading applies to your entire team, including your own account and all active members.
@@ -101,7 +95,7 @@ You can use your Warp account on multiple personal computers. Warp is designed t
 ### What happens when I downgrade during a billing cycle?
 
 :::caution
-Note this only applies when switching between legacy plans (Pro, Turbo, Lightspeed, or the Old Business) or switching the new plans (Build, New Business). When switching between legacy to new plans, the change is immediate, prorated, and the credits are reset. See more in [What happens when I change from my legacy plan to the new Build or Business plans?](/support-and-community/plans-and-billing/pricing-faqs/#what-happens-when-i-change-from-my-legacy-plan-to-the-new-build-or-business-plans).
+This describes downgrades between current paid plans (Build, Max, Business). Users transitioning from a legacy plan (Pro, Turbo, Lightspeed, old Business) to a current plan saw an immediate, prorated change, with credits reset when the plan changed; see [Overages (Legacy)](/support-and-community/plans-and-billing/overages-legacy/) for details on the legacy plans.
 :::
 
 The subscription will downgrade to the lower plan limits at the end of the billing cycle. If you’re switching between paid plans, any AI usage you've already accumulated will carry over.\
@@ -142,16 +136,67 @@ Warp now abstracts away token usage, so you don't need to manage or track it dir
 
 If you're curious, you can read the [OpenAI article on tokens](https://help.openai.com/en/articles/4936856-understanding-tokens), or refer to the pricing page for plan-level credit allocations. If you reach your monthly credit limits on a paid plan, premium models will be temporarily disabled until your quota resets at the start of your next billing cycle.
 
-If you’d like to continue using premium models beyond your included quota, purchase [Add-on Credits](/support-and-community/plans-and-billing/add-on-credits/) in **Settings** > **Billing and usage** (users still on legacy Pro, Turbo, or Lightspeed plans continue to use [Overages (Legacy)](/support-and-community/plans-and-billing/overages-legacy/) until their first renewal after December 1, 2025).
+If you’d like to continue using premium models beyond your included quota, purchase [Reload credits](/support-and-community/plans-and-billing/add-on-credits/) in **Settings** > **Billing and usage** (users still on legacy Pro, Turbo, or Lightspeed plans continue to use [Overages (Legacy)](/support-and-community/plans-and-billing/overages-legacy/) until their first renewal after December 1, 2025).
 
 ### How often do my credits reset?
 
-Allotted credits refill every 30 days from your signup date. When you upgrade to a [paid plan](https://www.warp.dev/pricing), you will be given more credits immediately. You can follow along with your refill period by referencing **Settings** > **Billing and usage**. Alternatively, purchase [Add-on Credits](/support-and-community/plans-and-billing/add-on-credits/), or enable auto reload with a monthly spend limit, to continue using premium models beyond your included quota. Users still on legacy Pro, Turbo, or Lightspeed plans continue to use [Overages (Legacy)](/support-and-community/plans-and-billing/overages-legacy/) until their first renewal after December 1, 2025.
+Allotted credits refill every 30 days from your signup date. When you upgrade to a [paid plan](https://www.warp.dev/pricing), you will be given more credits immediately. You can follow along with your refill period by referencing **Settings** > **Billing and usage**. Alternatively, purchase [Reload credits](/support-and-community/plans-and-billing/add-on-credits/), or enable auto-reload with a monthly spend limit, to continue using premium models beyond your included quota. Users still on legacy Pro, Turbo, or Lightspeed plans continue to use [Overages (Legacy)](/support-and-community/plans-and-billing/overages-legacy/) until their first renewal after December 1, 2025.
 
 :::note
 Unused credits do not rollover to the next cycle and can't be transferred to other accounts.
 :::
 
+### How do credits work for multi-seat teams?
+
+On the new multi-seat paid plans, credits attribute differently than they did on Warp's older pooled-credit teams:
+
+* **Plan-included monthly credits** — Each seat receives its own monthly credit allowance that resets every 30 days based on the team's renewal date. See [warp.dev/pricing](https://www.warp.dev/pricing) for the current per-seat allowance on your plan.
+* **Reload credits** — As of May 14, 2026, [Reload credits](/support-and-community/plans-and-billing/add-on-credits/) are scoped to the individual user who purchased or was allocated them, not pooled across the team. A single heavy user can no longer drain the whole team's purchased credits.
+* **Grandfathered pooled credits** — Teams that purchased Reload credits (formerly Add-on Credits) before May 14, 2026 keep their existing pooled balance until it's exhausted. Pooled credits are spent down first across the team; all new Reload credit purchases after May 14, 2026 are user-scoped.
+* **Team-wide spend cap** — Admins set a single team-wide monthly spend cap that governs auto-reload across the team. See [How does auto-reload work for teams?](#how-does-auto-reload-work-for-teams) below.
+
+Enterprise plans support team-scoped credit pools and per-user spend limits separately — see [enterprise billing](/enterprise/support-and-resources/billing/).
+
+### What if I need more AI usage than my plan includes?
+
+If you regularly run through your plan's monthly credit allowance, you have a few options:
+
+* **Upgrade to Max** — Designed for developers with heavier AI usage. Max includes a higher monthly credit allowance than Build, plus a better effective rate for credits than buying [Reload credits](/support-and-community/plans-and-billing/add-on-credits/) on Build. See [warp.dev/pricing](https://www.warp.dev/pricing) for current allowances.
+* **Purchase Reload credits** — Top up your account on demand at $10 / 400 credits, $20 / 1,000 credits, $50 / 3,000 credits, or $100 / 6,500 credits. Larger denominations have a better effective rate. Reload credits roll over month-to-month and remain valid for 12 months.
+* **Enable auto-reload** — Pick a denomination and a monthly spend cap, and Warp will automatically purchase Reload credits when your balance drops below 100 credits, up to your cap.
+* **Bring your own API key (BYOK)** — Point Warp at your own OpenAI, Anthropic, or Google API key. Requests routed through BYOK don't consume Warp credits — you're billed directly by the model provider. See [Bring Your Own API Key](/support-and-community/plans-and-billing/bring-your-own-api-key/).
+* **Use a custom inference endpoint** — Route requests through any OpenAI-compatible endpoint (OpenRouter, LiteLLM, z.ai, an internal gateway, etc.) without spending Warp credits. See [Custom Inference Endpoint](/support-and-community/plans-and-billing/custom-inference-endpoint/).
+
+For organization-scale needs (cloud-provider managed inference, granular admin controls, or higher seat counts), Enterprise plans include [Bring Your Own LLM (BYOLLM)](/enterprise/enterprise-features/bring-your-own-llm/).
+
+### How does auto-reload work for teams?
+
+Auto-reload prevents team members from getting blocked by credit exhaustion. It works the same way for single-user and multi-seat paid plans, with one extra knob for teams.
+
+When auto-reload is **on**:
+
+* The admin chooses a Reload credit **denomination** ($10 / 400, $20 / 1,000, $50 / 3,000, or $100 / 6,500). Larger denominations have a better effective per-credit rate.
+* Whenever any individual user's balance (plan credits plus the Reload credits available to them) drops below **100 credits**, Warp automatically purchases another bundle of the configured denomination on the team's behalf.
+* All auto-reload purchases count against a single **team-wide monthly spend cap** that the admin sets. Once the team hits the cap in a given month, auto-reload pauses until the next billing cycle or until the admin raises the cap.
+
+When auto-reload is **off**, users keep working as long as they have plan credits or previously purchased Reload credits, or have pointed Warp at their own API key or [custom inference endpoint](/support-and-community/plans-and-billing/custom-inference-endpoint/). Once those credits run out, premium-model usage that draws on Warp credits is blocked until credits are topped up or the next billing cycle begins.
+
+Auto-reload can be enabled, paused, or reconfigured at any time in **Settings** > **Billing and usage**.
+
+### How are service account / team-scoped API key requests billed on self-serve plans?
+
+Programmatic usage that runs against a Warp team-level API key, scheduled agent, or service account (rather than a specific human user) is billed against the **admin who created the key** on self-serve plans (Build, Max, Business):
+
+1. First, the admin's plan-included monthly credits are consumed.
+2. Once those are exhausted, the admin's own Reload credits are consumed (and auto-reload triggers if the admin has it on).
+3. If both are empty and auto-reload is off or has reached the team-wide monthly spend cap, the request is blocked.
+
+This waterfall keeps team-level usage transparent and attributable on self-serve plans, even though credits are user-scoped.
+
+:::note
+Enterprise plans support team-scoped credit pools, so service-account and scheduled-agent traffic can be billed against a dedicated team pool instead of an individual admin. See [enterprise billing](/enterprise/support-and-resources/billing/) for details.
+:::
+
 ### Can I use a Free plan if I'm a developer at a large company or organization?
 
 Yes. Developers at any company size are welcome to use Warp’s Free plan.
@@ -175,12 +220,24 @@ For more details, please [visit the Security Overview](https://www.warp.dev/lega
 
 ### What happened to the Lite model?
 
-Over time, the Lite model—originally designed as a fallback when premium models ran out—began to deliver inconsistent results, especially for users running complex, multi-step prompts.
+Over time, the Lite model — originally designed as a fallback when premium models ran out — began to deliver inconsistent results, especially for users running complex, multi-step prompts.
 
-For credit-efficient usage, we encourage you to try our new **Auto (cost-efficiency) model**, which automatically selects the optimal model based on task complexity to help extend your credits. To continue AI usage please either add [Add-on Credits](/support-and-community/plans-and-billing/add-on-credits/) or consider [using your own API key](/support-and-community/plans-and-billing/bring-your-own-api-key/).
+For credit-efficient usage, we encourage you to try our new **Auto (cost-efficiency) model**, which automatically selects the optimal model based on task complexity to help extend your credits. To continue AI usage beyond your included credits, you can purchase [Reload credits](/support-and-community/plans-and-billing/add-on-credits/), [bring your own API key (BYOK)](/support-and-community/plans-and-billing/bring-your-own-api-key/), or point Warp at a [custom inference endpoint](/support-and-community/plans-and-billing/custom-inference-endpoint/). BYOK and custom inference endpoints don't consume Warp credits, since requests go directly to your provider.
 
 If you have any questions or feedback, please connect with us in our [community Slack](/support-and-community/#sending-warp-feedback).
 
+### Can I bring my own API key?
+
+Yes. On Free, Build, Max, Business, and Enterprise plans, you can configure your own OpenAI, Anthropic, or Google API key in **Settings** > **AI** > **Manage models**. Requests routed through your own key don't consume Warp credits — you're billed directly by the model provider.
+
+See [Bring Your Own API Key](/support-and-community/plans-and-billing/bring-your-own-api-key/) for setup steps, the list of supported providers and models, and the differences between BYOK, custom inference endpoints, and BYOLLM.
+
+### Does Warp support custom inference endpoints?
+
+Yes. In addition to BYOK, Warp can route requests to any OpenAI-compatible inference endpoint — including OpenRouter, LiteLLM, z.ai, and internal gateways your team already runs. Custom inference endpoint requests also don't consume Warp credits.
+
+Custom inference endpoints are available on Free, Build, Max, Business, and Enterprise. See [Custom Inference Endpoint](/support-and-community/plans-and-billing/custom-inference-endpoint/) for configuration steps, billing behavior, and how custom inference endpoints differ from BYOK and from Enterprise's [Bring Your Own LLM (BYOLLM)](/enterprise/enterprise-features/bring-your-own-llm/) managed inference.
+
 ### What payment options are available for Warp's self-service plans?
 
 Warp uses Stripe as our payments processor and currently accepts the following payment methods: credit card, debit card, Link, Apple Pay, Google Pay.
@@ -233,122 +290,59 @@ The team at Warp is standing by and ready to help. For subscribers technical iss
 
 ---
 
-### Warp's pricing change FAQs (Oct 30, 2025)
-
-For more details, see this blog post on [Warp's plan changes](https://www.warp.dev/blog/warp-new-pricing-flexibility-byok).
-
-#### How do I change from my current plan to the new Build or Business plan?
-
-You can switch to the new Warp Build or Business plan anytime from **Settings** > **Billing and usage** > **Manage billing** > **Update subscription** in the Warp app or at [app.warp.dev/upgrade](https://app.warp.dev/upgrade). Select Change plan, then choose the plan that fits your needs.
-
-If you take no action, your Pro, Turbo, Lightspeed, or legacy Business plan will automatically move to the new structure on your first renewal after **December 1, 2025**. You’ll receive an email before your renewal with details to make the transition easier.
-
-#### What happens when I change from my legacy plan to the new Build or Business plans?
-
-If you move from Warp’s legacy Pro, Turbo, Lightspeed, or old Business plans to the new Build or Business plans:
-
-* You’ll receive a prorated credit balance on Stripe for your current plan, based on how far you are into your billing cycle. This balance can be applied toward monthly Build fees or any Add-on Credits you purchase.
-  * You can view your credit balance by going to **Settings** > **Billing and usage** > **Manage Account**. You can also view your credit balance on the Stripe invoice that was sent when your plan changed to Build or Business.
-* Your credit balance will reset to **0/1,500** when you switch to the Build or Business plan.
-
-If you switched immediately after the rollout, before a subsequent update was applied, we’ll retroactively reset your credit balance to 0/1,500.
-
-* You should see this reflected in **Settings** > **Billing and usage**. If you experience any issues, please contact us at **build-priority@warp.dev**.
-
-:::note
-We recommend you use all the credits on your legacy plan before you switch over to the new plans. This way you can make best use of them before they are reset to the new plan limits.
-:::
-
-:::caution
-Add-on credit auto reload will be enabled by default for some legacy plan users when they transition to the Build plan. Please see more in our [Pricing FAQs](/support-and-community/plans-and-billing/pricing-faqs/#what-happens-to-my-current-plan-pro-turbo-lightspeed).
-:::
-
-#### What should I keep in mind about this change?
-
-* **BYOK and Add-on credits**: These are only available on the new Build and Business plans. Switching early gives you immediate access.
-* **Pricing differences**: Depending on your usage, your monthly cost may increase or decrease. You’ll now pay based on what you actually use.
-* **Renewal timing**: You’ll stay on your current plan until your renewal date after December 1. No interruptions to service will occur.
-* **Transparency**: You can view your credit balance, monthly spend limit, and Add-on settings anytime in **Settings** > **Billing and usage**.
-
-For full details, see [warp.dev/pricing](https://www.warp.dev/pricing) or reach out to billing@warp.dev if you have questions about your transition.
-
-#### For existing paid users: when will the new pricing take effect for my account?
-
-For **new customers**, the new pricing and packaging take effect immediately on Oct 30, 2025.
-
-For **existing monthly subscribers**, changes will apply on your first renewal after **December 1, 2025**; most likely during the month of December 2025. For **annual subscribers**, the new plan and pricing will take effect on your next renewal after December 1, 2025.
-
-If you have any questions, please reach out to us at **billing@warp.dev**.
-
-#### **What happens to my current plan (Pro, Turbo, Lightspeed, Business)?**
-
-You will retain your current plan and credits until the first renewal after December 1, 2025. At renewal, all current Pro, Turbo, Lightspeed, and Business plans will transition to the new Warp Build and Business plans.
-
-The Build and new Business plans include 1,500 monthly credits, the ability to purchase [Add-on Credits](/support-and-community/plans-and-billing/add-on-credits/) that roll over for 12 months, and the ability to bring your own API key. Learn more at [warp.dev/pricing](https://www.warp.dev/pricing).
-
-In addition, [Add-on credit auto reload](/support-and-community/plans-and-billing/add-on-credits/#id-2.-enable-auto-reload) will be automatically enabled for some legacy plan users in the following ways (but can be opted out of or modified at any time). Our goal is to maintain the same maximum monthly spend in line with your legacy plan subscription plus any Overages:
-
-* **Pro:** Will transition to the Build plan. Auto-reload _**will not**_**&#x20;be enabled by default**.
-* **Turbo:** Will transition to the Build plan. Auto-reload _**will**_**&#x20;be enabled by default**. It will default to $30 auto-reload monthly spending limit for monthly subscribers and $22 for yearly subscribers. A handful of Turbo subscribers received a bulk discount for teams of 3 or more—please check your email for details on the default spending limits for your account.
-* **Lightspeed:** Will transition to the Build plan. Auto-reload _**will**_**&#x20;be enabled by default**. It will default to $205 auto-reload monthly spending limit for monthly subscribers and $182 for yearly subscribers.
-* **Business:** Will transition to the new Business plan. Auto-reload _**will**_**&#x20;be enabled by default**. It will default to $10 auto-reload monthly spending limit for both monthly and yearly subscribers.
-
-In any of the above cases, if Overages were enabled, we will set the monthly auto-reload spending limit equal to your Overage spending limit plus any of the amounts listed above.\
-\
-If your total auto-reload monthly spend limit is $80 or above, we will set the Add-on credit denomination to $20 / 1000 credits by default. If your total auto reload monthly spending limit is below $80, we will set the Add-on credit denomination to $10 / 400 credits by default.
-
-#### Can I continue to use Warp as my primary terminal?
-
-Yes, the terminal features of Warp will continue to be free to use for developers. Learn more at [Plans And Pricing](/support-and-community/plans-and-billing/plans-pricing-refunds/).
+### Warp's pricing change FAQs (May 14, 2026)
 
-#### How are Add-on credits different from overages?
+The following questions cover the May 14, 2026 pricing updates: seat limits per plan, Reload credits attribution, Max plan credit allowance changes, BYOK on Free, the new Custom Inference Endpoint feature, and ZDR / data-controls clarifications.
 
-Add-on credits replace overages with a simpler, prepaid system. They’re up to \~40% cheaper than the old overage rates, roll over month-to-month, and remain valid for 12 months. They also come with Warp’s full SOC 2 / Zero Data Retention protection.
+#### Are there new seat limits per plan?
 
-#### Do credits rollover?
+Yes. As of May 14, 2026, each plan has an explicit seat limit. See [warp.dev/pricing](https://www.warp.dev/pricing) for the current per-plan caps.
 
-For existing users on legacy plans, plan credits on Pro, Turbo, and Lightspeed do not rollover.
+Teams that already exceed the new seat limit on their current plan remain in good standing — you won't be downgraded or charged differently. However, you **cannot add new members above the seat cap**, including backfilling members who leave. To grow past the cap, switch to the next plan up (or to Enterprise) at any time in **Settings** > **Billing and usage**.
 
-For the Build plan, credits will not rollover but Add-on credits will rollover and be valid for 12 months from the date of purchase.
+#### How are Reload credits being attributed differently?
 
-#### Can I purchase Add-on Credits on legacy plans (Pro, Turbo, Lightspeed)?
+Before May 14, 2026, Reload credits (previously called Add-on Credits) on multi-seat teams were **pooled** — every team member drew from a single shared balance. As of May 14, 2026, **Reload credits are user-scoped**: each user has their own balance, and a single heavy user can no longer drain the whole team's purchased credits.
 
-No. Add-on Credits (including auto-reload) are only available on the Build, Business, and Enterprise plans. If you attempt to purchase Add-on Credits on a legacy plan, the purchase will not go through. To access Add-on Credits, switch to the Build plan at any time from **Settings** > **Billing and usage** or at [app.warp.dev/upgrade](https://app.warp.dev/upgrade). If you need additional usage while on a legacy plan, you can use [Overages (Legacy)](/support-and-community/plans-and-billing/overages-legacy/) instead.
+**Grandfathering for pre-May 2026 pooled credits**: Existing pooled Reload credit balances are honored. They drain first across the team before any new user-scoped Reload credits are consumed. No new credits will be added to the pooled balance; all Reload credit purchases made on or after May 14, 2026 are user-scoped.
 
-#### Can I bring my own key on legacy plans (Pro, Turbo, Lightspeed)?
+For details, see [Reload credits](/support-and-community/plans-and-billing/add-on-credits/).
 
-No, Bring-your-own API key for OpenAI, Anthropic, and Gemini is only available to users on the Warp Build plan. You can choose to switch your existing plan to Warp Build at any time before your applicable renewal date to access BYOK.
+#### Is the Max plan's monthly credit allowance changing?
 
-#### How does the monthly spend limit on Add-on Credits work?
+Yes. The Max plan's monthly credit allowance is decreasing as of May 14, 2026. For the current allowance, see [warp.dev/pricing](https://www.warp.dev/pricing).
 
-You set a monthly spend limit that applies to your AI usage for each calendar month. This limit acts as the maximum amount you can spend on credits during that period.
+**Grandfathering for existing Max subscribers**:
 
-If a purchase would exceed your limit, it won’t go through—you’ll need to either increase your limit or choose a smaller purchase amount.
+* **Annual subscribers** are honored at the prior credit allowance through the end of their current annual contract. The new allowance takes effect on the first renewal after the contract ends.
+* **Monthly subscribers** are honored at the prior credit allowance until their first renewal or payment on or after **July 1, 2026**. The new allowance takes effect at that renewal.
+* **Any plan change forfeits grandfathering** — switching to or from Max during the grandfather window moves you to the new allowance immediately.
 
-**For auto reload settings:**
+Confirm the exact allowance applied to your account in **Settings** > **Billing and usage**.
 
-* New users who enable auto reload will start with a $200 spend limit.
-* Existing paid plan users who enable auto reload will have their limit match their existing Overages spend limit (if previously configured, otherwise $200).
+#### Can I bring my own API key on the Free plan now?
 
-#### I’m an individual developer and need more than 1,500 credits per month. What’s the right plan for me?
+Yes. As of May 14, 2026, **bring your own API key (BYOK)** is available on all plans, including Free. Previously, BYOK required a Build, Business, or Enterprise subscription. You can configure your OpenAI, Anthropic, or Google key in **Settings** > **AI** > **Manage models**.
 
-If you regularly use more than 1,500 credits per month, the Build plan is designed for you. It includes 1,500 monthly credits and gives you the flexibility to scale further with Add-on Credits, which you can purchase at discounted rates directly under **Settings** > **Billing and usage**.
+See [Bring Your Own API Key](/support-and-community/plans-and-billing/bring-your-own-api-key/) for the full list of supported providers and setup steps.
 
-Add-on Credits roll over month to month, remain valid for 12 months, and offer up to \~40% savings for larger denominations. You can also enable auto reload to automatically top up your credits when your balance runs low.
+#### What is the new Custom Inference Endpoint feature?
 
-If you’re part of a team that needs shared credit management, SSO, or enforced Zero Data Retention (ZDR), the Business plan provides all the same AI capabilities plus advanced security and administrative features.
+**Custom Inference Endpoint (CIE)** is a new way to route Warp's AI traffic through any OpenAI-compatible inference endpoint — including OpenRouter, LiteLLM, z.ai, and internal gateways your team already runs. CIE is available on Free, Build, Max, Business, and Enterprise.
 
-#### Should I subscribe to the Build plan or the Business plan?
+CIE differs from BYOK and BYOLLM:
 
-If you’re an individual developer or part of a small team, the Build plan is the best fit. It includes 1,500 monthly credits, discounted Add-on Credits for additional usage, and the ability to bring your own API key (BYOK) for OpenAI, Anthropic, or Google models. You’ll also get unlimited Warp Drive objects, collaboration tools, and the highest codebase indexing limits.
+* **BYOK** sends requests directly to OpenAI, Anthropic, or Google using your own provider API key.
+* **Custom inference endpoint** sends requests to any OpenAI-compatible URL you control (or that your team runs).
+* **BYOLLM** is an Enterprise-only managed inference feature where Warp routes traffic through your AWS Bedrock, Google Vertex, Azure Foundry, or internal LLM gateway with full routing, orchestration, governance, and observability provided by Warp.
 
-If you’re part of a larger team (up to 50 members) that needs advanced administrative and security controls, choose the Business plan. It includes everything in Build, plus SSO, enforced Zero Data Retention (ZDR), shared Add-on Credits that can be used by all team members, and centralized billing – ideal for organizations that need stronger security, compliance, and team-wide management.
+For setup and details, see [Custom Inference Endpoint](/support-and-community/plans-and-billing/custom-inference-endpoint/).
 
-#### How do credits work for multi-seat teams?
+#### How does Zero Data Retention (ZDR) work on the new plans?
 
-For teams on the Build or Business plans, credits are managed at two levels:
+Two distinct things are sometimes both called "ZDR". The May 14, 2026 pricing update makes the distinction clearer:
 
-* **Included monthly credits**: Each seat on a paid plan receives 1,500 individual credits per month. These credits are tied to the user and reset every 30 days based on their subscription or renewal date.
-* **Add-on Credits**: Once individual plan credits are used up, team members draw from a shared Add-on Credit balance that can be topped up and managed by team admins.
+* **Model provider ZDR** — Warp has Zero Data Retention agreements with the underlying model providers (OpenAI, Anthropic, Google) so your prompts and completions aren't retained by those providers for training or logging. This applies to **all plans, including Free**.
+* **Admin-configurable data controls** — Settings that let admins control what Warp itself retains for the team (data retention windows, training opt-outs, etc.). These are available on **Business and Enterprise**, and replace what was previously described as "automatically enforced team-wide ZDR."
 
-This shared model gives teams the flexibility to handle variable AI usage – heavy users can consume more when needed, while the entire team benefits from shared value and volume-based discounts.
+If you previously relied on the "automatically enforced team-wide ZDR" language on Business, your data controls are unchanged — Warp still does not retain your team's data for training, and admins can still configure retention. The terminology change disambiguates Warp's controls from the provider-level ZDR that applies to every plan.

From 86b11f61b5852e8e2e1101f3c4f32f227fee07ca Mon Sep 17 00:00:00 2001
From: Hong Yi Chen <hongyigma@gmail.com>
Date: Tue, 12 May 2026 18:40:41 -0700
Subject: [PATCH 2/5] Add BYOK / CIE org-size disclosure to pricing FAQs

Per PR #73 follow-up: BYOK and custom inference endpoint support are
available to individual users and organizations with 10 or fewer
employees; larger organizations need a Business or Enterprise plan.

* 'Can I bring my own API key?' FAQ: append the verbatim disclosure
  paragraph at the end of the answer.
* 'Does Warp support custom inference endpoints?' FAQ: same.
* 'What if I need more AI usage than my plan includes?' FAQ: add a
  one-line italicized cross-reference below the BYOK / CIE bullets
  pointing to the BYOK FAQ above.

Co-Authored-By: Oz <oz-agent@warp.dev>
---
 .../plans-and-billing/pricing-faqs.mdx                      | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/src/content/docs/support-and-community/plans-and-billing/pricing-faqs.mdx b/src/content/docs/support-and-community/plans-and-billing/pricing-faqs.mdx
index a6ff7b01..58f70285 100644
--- a/src/content/docs/support-and-community/plans-and-billing/pricing-faqs.mdx
+++ b/src/content/docs/support-and-community/plans-and-billing/pricing-faqs.mdx
@@ -167,6 +167,8 @@ If you regularly run through your plan's monthly credit allowance, you have a fe
 * **Bring your own API key (BYOK)** — Point Warp at your own OpenAI, Anthropic, or Google API key. Requests routed through BYOK don't consume Warp credits — you're billed directly by the model provider. See [Bring Your Own API Key](/support-and-community/plans-and-billing/bring-your-own-api-key/).
 * **Use a custom inference endpoint** — Route requests through any OpenAI-compatible endpoint (OpenRouter, LiteLLM, z.ai, an internal gateway, etc.) without spending Warp credits. See [Custom Inference Endpoint](/support-and-community/plans-and-billing/custom-inference-endpoint/).
 
+*BYOK and custom inference endpoint availability is subject to organization size — see [Can I bring my own API key?](#can-i-bring-my-own-api-key) for details.*
+
 For organization-scale needs (cloud-provider managed inference, granular admin controls, or higher seat counts), Enterprise plans include [Bring Your Own LLM (BYOLLM)](/enterprise/enterprise-features/bring-your-own-llm/).
 
 ### How does auto-reload work for teams?
@@ -232,12 +234,16 @@ Yes. On Free, Build, Max, Business, and Enterprise plans, you can configure your
 
 See [Bring Your Own API Key](/support-and-community/plans-and-billing/bring-your-own-api-key/) for setup steps, the list of supported providers and models, and the differences between BYOK, custom inference endpoints, and BYOLLM.
 
+BYOK and custom inference endpoint support are available for individual users and organizations with 10 or fewer employees, subject to Warp's Terms of Service. Companies or organizations with more than 10 employees require a Warp Business or Enterprise plan to use these features.
+
 ### Does Warp support custom inference endpoints?
 
 Yes. In addition to BYOK, Warp can route requests to any OpenAI-compatible inference endpoint — including OpenRouter, LiteLLM, z.ai, and internal gateways your team already runs. Custom inference endpoint requests also don't consume Warp credits.
 
 Custom inference endpoints are available on Free, Build, Max, Business, and Enterprise. See [Custom Inference Endpoint](/support-and-community/plans-and-billing/custom-inference-endpoint/) for configuration steps, billing behavior, and how custom inference endpoints differ from BYOK and from Enterprise's [Bring Your Own LLM (BYOLLM)](/enterprise/enterprise-features/bring-your-own-llm/) managed inference.
 
+BYOK and custom inference endpoint support are available for individual users and organizations with 10 or fewer employees, subject to Warp's Terms of Service. Companies or organizations with more than 10 employees require a Warp Business or Enterprise plan to use these features.
+
 ### What payment options are available for Warp's self-service plans?
 
 Warp uses Stripe as our payments processor and currently accepts the following payment methods: credit card, debit card, Link, Apple Pay, Google Pay.

From eb9240f58f84e2895c3545e63f47780d4f7eb096 Mon Sep 17 00:00:00 2001
From: Hong Yi Chen <hongyigma@gmail.com>
Date: Tue, 12 May 2026 18:50:08 -0700
Subject: [PATCH 3/5] Add Custom inference endpoint page stub for CI link check

This file is the canonical version created on PR #71
(hyc/plan-updates-byok-cie). It is duplicated here so that the
link checker on this branch can resolve the relative references
to /support-and-community/plans-and-billing/custom-inference-endpoint/
that this PR introduces.

When PR #71 merges into hyc/plan-updates, git will reconcile the
identical file contents automatically.

Co-Authored-By: Oz <oz-agent@warp.dev>
---
 .../custom-inference-endpoint.mdx             | 117 ++++++++++++++++++
 1 file changed, 117 insertions(+)
 create mode 100644 src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx

diff --git a/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx b/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx
new file mode 100644
index 00000000..77465ba4
--- /dev/null
+++ b/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx
@@ -0,0 +1,117 @@
+---
+title: Custom inference endpoint
+description: >-
+  Connect Warp to any OpenAI-compatible custom inference endpoint, such as
+  OpenRouter, LiteLLM, z.ai, or an internal gateway. Available on the Free
+  plan and all eligible paid plans.
+---
+
+A **Custom inference endpoint (CIE)** lets you connect Warp's agents to any OpenAI-compatible inference endpoint, so you can route AI requests through your preferred model router, hosted gateway, or internal infrastructure.
+
+CIE is the right fit when you want to choose your provider, consolidate billing through a third-party router, or run inference behind your own gateway — without giving up the agent experience inside Warp. When a CIE is configured and selected, Warp **never consumes your** [credits](/support-and-community/plans-and-billing/credits/) for the request.
+
+:::note
+CIE is available on the Free plan and on all eligible paid plans. See [warp.dev/pricing](https://www.warp.dev/pricing) for the current list of eligible plans.
+:::
+
+:::note
+BYOK and custom inference endpoint support are available for individual users and organizations with 10 or fewer employees, subject to Warp's Terms of Service. Companies or organizations with more than 10 employees require a Warp Business or Enterprise plan to use custom inference endpoints.
+:::
+
+## Key features
+
+* **OpenAI-compatible** - Works with any endpoint that implements the OpenAI Chat Completions API.
+* **Provider flexibility** - Use a model router (OpenRouter, LiteLLM), a model provider with an OpenAI-compatible surface (z.ai), or your own internal gateway.
+* **No Warp credits consumed** - Inference is billed directly by your endpoint provider; Warp's metered features remain unaffected.
+* **Local configuration** - Endpoint URLs and credentials are stored locally on your device and never synced to the cloud.
+
+## How it works
+
+CIE expects your endpoint to implement the **OpenAI Chat Completions API** (`POST /v1/chat/completions`). Any service that exposes a compatible surface can be used as a CIE target:
+
+* **OpenRouter** - Aggregates many model providers behind a single OpenAI-compatible API and consolidated billing.
+* **LiteLLM** - A self-hosted proxy that exposes a unified, OpenAI-compatible API across providers.
+* **z.ai** - A model provider with an OpenAI-compatible API surface for its models.
+* **Internal gateways** - Any in-house service that fronts model providers behind an OpenAI-compatible endpoint (for example, a corporate AI gateway with logging, redaction, or access control).
+
+When you configure a CIE, Warp stores the endpoint URL, model identifiers, and credentials **locally on your device**. They are never synced to Warp's servers.
+
+:::caution
+CIE does not apply to [Oz Cloud Agents](/agent-platform/cloud-agents/overview/). Because CIE configuration is stored locally, it is not available to cloud-hosted agent runs. Cloud agent runs always consume [Warp credits](/support-and-community/plans-and-billing/credits/).
+:::
+
+When a CIE-routed model is selected:
+
+* Warp **does not consume** any of your [credits](/support-and-community/plans-and-billing/credits/).
+* Costs are billed directly by your endpoint provider.
+* Warp does not retain or store your endpoint credentials on any of its servers.
+
+## Enabling a custom inference endpoint
+
+To enable and configure a custom inference endpoint:
+
+1. In Warp, open **Settings** and search for `inference endpoint` to jump to the configuration.
+2. Add your endpoint URL (the base URL that exposes `/v1/chat/completions`) and any required credentials (typically an API key).
+3. Specify the model identifier(s) you want to route through this endpoint.
+4. Save the configuration. Once added, you'll see your custom models appear in the model picker.
+
+When you explicitly select a CIE-routed model from the model picker, Warp routes the request through your endpoint instead of consuming Warp's credits.
+
+The CIE configuration flow mirrors the [Bring your own API key](/support-and-community/plans-and-billing/bring-your-own-api-key/) setup, so the steps will feel familiar if you've already configured BYOK.
+
+## Billing behavior
+
+### Warp credits
+
+When you select a CIE-routed model from the model picker:
+
+* No Warp credits are consumed for that request.
+* Inference is billed directly by your endpoint provider, according to their pricing.
+* The credit transparency footer will show "0 credits used" for CIE-routed requests.
+
+### Auto routing still uses Warp credits
+
+Warp's **Auto** models dynamically route across providers using Warp's infrastructure. Because Auto routing depends on Warp, **Auto always consumes Warp's credits**, even if you've configured a custom inference endpoint.
+
+To use your endpoint, select the specific CIE-backed model from the model picker rather than an Auto option.
+
+### Other AI features in Warp
+
+Some AI-powered features rely on Warp's infrastructure and are unaffected by CIE configuration. These continue to consume credits according to your plan; see [Credits](/support-and-community/plans-and-billing/credits/) for details.
+
+## Zero Data Retention (ZDR)
+
+Warp is **SOC 2 compliant** and has **Zero Data Retention (ZDR)** agreements with all of its contracted LLM providers.
+
+When you use a custom inference endpoint:
+
+* Data retention is determined by **your endpoint provider** and any upstream model providers they route to.
+* Warp **cannot enforce ZDR** for requests sent through a custom inference endpoint.
+* If your endpoint provider does not have ZDR with the underlying model provider, your requests may be retained according to their terms.
+
+Review your endpoint provider's data handling and retention policies before routing sensitive prompts through a CIE.
+
+## Plan availability
+
+CIE is available on the Free plan and on all eligible paid plans. See [warp.dev/pricing](https://www.warp.dev/pricing) for the current list of eligible plans and any plan-specific limits.
+
+CIE is available to individual users and to organizations with 10 or fewer employees, subject to Warp's Terms of Service. Companies or organizations with more than 10 employees need a Warp Business or Enterprise plan to use custom inference endpoints.
+
+Centrally configured, admin-managed CIE for teams is not yet available. Each user configures their own endpoint locally. Enterprise teams that need centrally managed model routing today should see [Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/).
+
+## How CIE differs from BYOK and BYOLLM
+
+Warp offers three ways to bring your own AI infrastructure. Use this table to pick the right one, and follow the links for full details.
+
+| Name | Meaning | Plans |
+| --- | --- | --- |
+| **[Bring your own API key](/support-and-community/plans-and-billing/bring-your-own-api-key/)** (BYOK) | Use your own API key for OpenAI, Anthropic, or Google models. Keys are stored locally on your device. | Free and all eligible paid plans |
+| **Custom inference endpoint** (CIE) | Connect Warp to an OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, or an internal gateway. | Free and all eligible paid plans |
+| **[Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/)** (BYOLLM) | Enterprise-managed inference through your cloud provider (AWS Bedrock, Azure Foundry, Google Vertex) or approved internal infrastructure, with Warp handling routing, orchestration, governance, and observability. | Enterprise only |
+
+## Related resources
+
+* [Bring your own API key](/support-and-community/plans-and-billing/bring-your-own-api-key/) — Use your own OpenAI, Anthropic, or Google API keys.
+* [Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/) — Enterprise-managed inference through your cloud provider or approved infrastructure.
+* [Model Choice](/agent-platform/capabilities/model-choice/) — Full list of supported models and `model_id` values.
+* [Credits](/support-and-community/plans-and-billing/credits/) — How Warp credits work and when they're consumed.

From 3e22ebe27da285873a5b6b6b3a4424766f72c65b Mon Sep 17 00:00:00 2001
From: Hong Yi Chen <hongyigma@gmail.com>
Date: Tue, 19 May 2026 16:16:34 -0700
Subject: [PATCH 4/5] docs(pricing-may-2026): add platform-credits caveats,
 expand service-account FAQ, new platform-credits FAQ

Adds platform-credits caveats to the BYOK, custom inference endpoint, and "What if I need more AI usage" FAQs so Business/Enterprise readers know local agent runs that use customer-supplied inference still consume platform credits.

Expands the service-account / team-scoped API key FAQ to introduce the task billing principal concept, spell out the owner-pool waterfall, describe auto-reload off vs on behavior, and clarify that attribution stays on the service account while billing rolls up to the team owner. Expands the Enterprise note in that FAQ to cover team-scoped credit pool depletion and the PAYG fallback.

Adds a new "How do platform credits factor in?" FAQ after the multi-seat credits FAQ, summarizing the three credit buckets (AI, compute, platform) and when platform credits apply.

Pre-existing build error (CIE page missing from sidebar topic) is unrelated to this change and is fixed by PR #71's src/sidebar.ts update once it merges into the umbrella.

Co-Authored-By: Oz <oz-agent@warp.dev>
---
 .../plans-and-billing/pricing-faqs.mdx        | 34 ++++++++++++++-----
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/src/content/docs/support-and-community/plans-and-billing/pricing-faqs.mdx b/src/content/docs/support-and-community/plans-and-billing/pricing-faqs.mdx
index 58f70285..50c13717 100644
--- a/src/content/docs/support-and-community/plans-and-billing/pricing-faqs.mdx
+++ b/src/content/docs/support-and-community/plans-and-billing/pricing-faqs.mdx
@@ -157,6 +157,17 @@ On the new multi-seat paid plans, credits attribute differently than they did on
 
 Enterprise plans support team-scoped credit pools and per-user spend limits separately — see [enterprise billing](/enterprise/support-and-resources/billing/).
 
+### How do platform credits factor in?
+
+Warp meters credits across three buckets: **AI credits** (the model call), **compute credits** (the sandbox a cloud agent runs in), and **platform credits** (run lifecycle, integrations, dashboard, APIs, and observability). All three draw from the same pool — your monthly Warp credits first, then [Reload credits](/support-and-community/plans-and-billing/add-on-credits/) once those are exhausted.
+
+Platform credits apply in two situations:
+
+* **Every cloud agent run, on every plan.** Warp's platform infrastructure coordinates every cloud run regardless of which agent or inference source it uses.
+* **Local agent runs on Business and Enterprise that use customer-supplied inference** — BYOK, BYOLLM via Amazon Bedrock or Google Vertex, or a custom inference endpoint. Warp isn't paying for the model call, but Warp's platform infrastructure is still running the agent.
+
+Local agent runs on Free, Build, or Max — and local runs on Business or Enterprise that use Warp-managed inference — do not consume platform credits. See [platform credits](/support-and-community/plans-and-billing/platform-credits/) for the full breakdown.
+
 ### What if I need more AI usage than my plan includes?
 
 If you regularly run through your plan's monthly credit allowance, you have a few options:
@@ -164,8 +175,8 @@ If you regularly run through your plan's monthly credit allowance, you have a fe
 * **Upgrade to Max** — Designed for developers with heavier AI usage. Max includes a higher monthly credit allowance than Build, plus a better effective rate for credits than buying [Reload credits](/support-and-community/plans-and-billing/add-on-credits/) on Build. See [warp.dev/pricing](https://www.warp.dev/pricing) for current allowances.
 * **Purchase Reload credits** — Top up your account on demand at $10 / 400 credits, $20 / 1,000 credits, $50 / 3,000 credits, or $100 / 6,500 credits. Larger denominations have a better effective rate. Reload credits roll over month-to-month and remain valid for 12 months.
 * **Enable auto-reload** — Pick a denomination and a monthly spend cap, and Warp will automatically purchase Reload credits when your balance drops below 100 credits, up to your cap.
-* **Bring your own API key (BYOK)** — Point Warp at your own OpenAI, Anthropic, or Google API key. Requests routed through BYOK don't consume Warp credits — you're billed directly by the model provider. See [Bring Your Own API Key](/support-and-community/plans-and-billing/bring-your-own-api-key/).
-* **Use a custom inference endpoint** — Route requests through any OpenAI-compatible endpoint (OpenRouter, LiteLLM, z.ai, an internal gateway, etc.) without spending Warp credits. See [Custom Inference Endpoint](/support-and-community/plans-and-billing/custom-inference-endpoint/).
+* **Bring your own API key (BYOK)** — Point Warp at your own OpenAI, Anthropic, or Google API key. Requests routed through BYOK don't consume Warp credits — you're billed directly by the model provider. See [Bring Your Own API Key](/support-and-community/plans-and-billing/bring-your-own-api-key/). On Business and Enterprise, platform credits may apply for local agent runs — see [platform credits](/support-and-community/plans-and-billing/platform-credits/).
+* **Use a custom inference endpoint** — Route requests through any OpenAI-compatible endpoint (OpenRouter, LiteLLM, z.ai, an internal gateway, etc.) without spending Warp credits. See [Custom Inference Endpoint](/support-and-community/plans-and-billing/custom-inference-endpoint/). On Business and Enterprise, platform credits may apply for local agent runs — see [platform credits](/support-and-community/plans-and-billing/platform-credits/).
 
 *BYOK and custom inference endpoint availability is subject to organization size — see [Can I bring my own API key?](#can-i-bring-my-own-api-key) for details.*
 
@@ -187,16 +198,19 @@ Auto-reload can be enabled, paused, or reconfigured at any time in **Settings**
 
 ### How are service account / team-scoped API key requests billed on self-serve plans?
 
-Programmatic usage that runs against a Warp team-level API key, scheduled agent, or service account (rather than a specific human user) is billed against the **admin who created the key** on self-serve plans (Build, Max, Business):
+Some AI requests on Warp can't be linked to an individual user — for example, requests made with a team-scoped API key, a scheduled agent, or an agent identity. These requests have no individual *task billing principal*. On self-serve plans (Build, Max, Business), Warp bills these requests against the **team owner**, while attribution stays on the service account.
 
-1. First, the admin's plan-included monthly credits are consumed.
-2. Once those are exhausted, the admin's own Reload credits are consumed (and auto-reload triggers if the admin has it on).
-3. If both are empty and auto-reload can't keep up with the team-wide spend cap, the request is blocked.
+The waterfall on the owner's account is:
 
-This waterfall keeps team-level usage transparent and attributable on self-serve plans, even though credits are user-scoped.
+1. First, the owner's plan-included monthly credits are consumed.
+2. Once those are exhausted, the owner's Reload credits are consumed.
+
+When auto-reload is **off**, the service account is blocked once both buckets are depleted. When auto-reload is **on**, service-account usage can trigger auto-reload on the owner's pool subject to the team-wide spend cap; further service-account requests then deplete that reloaded balance until the cap is reached.
+
+This waterfall keeps usage attributable to the service account in audit logs and usage breakdowns, while making billing predictable for the team owner.
 
 :::note
-Enterprise plans support team-scoped credit pools, so service-account and scheduled-agent traffic can be billed against a dedicated team pool instead of an individual admin. See [enterprise billing](/enterprise/support-and-resources/billing/) for details.
+Enterprise plans support team-scoped credit pools, so service-account and scheduled-agent traffic draws from the team pool rather than an individual admin. When the team pool is depleted, traffic falls to pay-as-you-go (PAYG) if enabled in the contract; otherwise it's blocked. See [enterprise billing](/enterprise/support-and-resources/billing/).
 :::
 
 ### Can I use a Free plan if I'm a developer at a large company or organization?
@@ -236,6 +250,8 @@ See [Bring Your Own API Key](/support-and-community/plans-and-billing/bring-your
 
 BYOK and custom inference endpoint support are available for individual users and organizations with 10 or fewer employees, subject to Warp's Terms of Service. Companies or organizations with more than 10 employees require a Warp Business or Enterprise plan to use these features.
 
+On Business and Enterprise, local agent runs that use BYOK still consume platform credits for Warp's platform infrastructure. See [platform credits](/support-and-community/plans-and-billing/platform-credits/).
+
 ### Does Warp support custom inference endpoints?
 
 Yes. In addition to BYOK, Warp can route requests to any OpenAI-compatible inference endpoint — including OpenRouter, LiteLLM, z.ai, and internal gateways your team already runs. Custom inference endpoint requests also don't consume Warp credits.
@@ -244,6 +260,8 @@ Custom inference endpoints are available on Free, Build, Max, Business, and Ente
 
 BYOK and custom inference endpoint support are available for individual users and organizations with 10 or fewer employees, subject to Warp's Terms of Service. Companies or organizations with more than 10 employees require a Warp Business or Enterprise plan to use these features.
 
+On Business and Enterprise, local agent runs that use a custom inference endpoint still consume platform credits for Warp's platform infrastructure. See [platform credits](/support-and-community/plans-and-billing/platform-credits/).
+
 ### What payment options are available for Warp's self-service plans?
 
 Warp uses Stripe as our payments processor and currently accepts the following payment methods: credit card, debit card, Link, Apple Pay, Google Pay.

From 653c6c32317b68040809d0cb5bbeb987c3a36bc9 Mon Sep 17 00:00:00 2001
From: Hong Yi Chen <hongyigma@gmail.com>
Date: Tue, 19 May 2026 16:32:49 -0700
Subject: [PATCH 5/5] Remove duplicate CIE stub; PR #71 owns the canonical
 custom-inference-endpoint.mdx

The CIE page is owned by PR #71 (hyc/plan-updates-byok-cie), which also
adds the matching src/sidebar.ts entry. Removing the stub here so the
PR #73 → umbrella merge auto-resolves cleanly after #71 lands.

Co-Authored-By: Oz <oz-agent@warp.dev>
---
 .../custom-inference-endpoint.mdx             | 117 ------------------
 1 file changed, 117 deletions(-)
 delete mode 100644 src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx

diff --git a/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx b/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx
deleted file mode 100644
index 77465ba4..00000000
--- a/src/content/docs/support-and-community/plans-and-billing/custom-inference-endpoint.mdx
+++ /dev/null
@@ -1,117 +0,0 @@
----
-title: Custom inference endpoint
-description: >-
-  Connect Warp to any OpenAI-compatible custom inference endpoint, such as
-  OpenRouter, LiteLLM, z.ai, or an internal gateway. Available on the Free
-  plan and all eligible paid plans.
----
-
-A **Custom inference endpoint (CIE)** lets you connect Warp's agents to any OpenAI-compatible inference endpoint, so you can route AI requests through your preferred model router, hosted gateway, or internal infrastructure.
-
-CIE is the right fit when you want to choose your provider, consolidate billing through a third-party router, or run inference behind your own gateway — without giving up the agent experience inside Warp. When a CIE is configured and selected, Warp **never consumes your** [credits](/support-and-community/plans-and-billing/credits/) for the request.
-
-:::note
-CIE is available on the Free plan and on all eligible paid plans. See [warp.dev/pricing](https://www.warp.dev/pricing) for the current list of eligible plans.
-:::
-
-:::note
-BYOK and custom inference endpoint support are available for individual users and organizations with 10 or fewer employees, subject to Warp's Terms of Service. Companies or organizations with more than 10 employees require a Warp Business or Enterprise plan to use custom inference endpoints.
-:::
-
-## Key features
-
-* **OpenAI-compatible** - Works with any endpoint that implements the OpenAI Chat Completions API.
-* **Provider flexibility** - Use a model router (OpenRouter, LiteLLM), a model provider with an OpenAI-compatible surface (z.ai), or your own internal gateway.
-* **No Warp credits consumed** - Inference is billed directly by your endpoint provider; Warp's metered features remain unaffected.
-* **Local configuration** - Endpoint URLs and credentials are stored locally on your device and never synced to the cloud.
-
-## How it works
-
-CIE expects your endpoint to implement the **OpenAI Chat Completions API** (`POST /v1/chat/completions`). Any service that exposes a compatible surface can be used as a CIE target:
-
-* **OpenRouter** - Aggregates many model providers behind a single OpenAI-compatible API and consolidated billing.
-* **LiteLLM** - A self-hosted proxy that exposes a unified, OpenAI-compatible API across providers.
-* **z.ai** - A model provider with an OpenAI-compatible API surface for its models.
-* **Internal gateways** - Any in-house service that fronts model providers behind an OpenAI-compatible endpoint (for example, a corporate AI gateway with logging, redaction, or access control).
-
-When you configure a CIE, Warp stores the endpoint URL, model identifiers, and credentials **locally on your device**. They are never synced to Warp's servers.
-
-:::caution
-CIE does not apply to [Oz Cloud Agents](/agent-platform/cloud-agents/overview/). Because CIE configuration is stored locally, it is not available to cloud-hosted agent runs. Cloud agent runs always consume [Warp credits](/support-and-community/plans-and-billing/credits/).
-:::
-
-When a CIE-routed model is selected:
-
-* Warp **does not consume** any of your [credits](/support-and-community/plans-and-billing/credits/).
-* Costs are billed directly by your endpoint provider.
-* Warp does not retain or store your endpoint credentials on any of its servers.
-
-## Enabling a custom inference endpoint
-
-To enable and configure a custom inference endpoint:
-
-1. In Warp, open **Settings** and search for `inference endpoint` to jump to the configuration.
-2. Add your endpoint URL (the base URL that exposes `/v1/chat/completions`) and any required credentials (typically an API key).
-3. Specify the model identifier(s) you want to route through this endpoint.
-4. Save the configuration. Once added, you'll see your custom models appear in the model picker.
-
-When you explicitly select a CIE-routed model from the model picker, Warp routes the request through your endpoint instead of consuming Warp's credits.
-
-The CIE configuration flow mirrors the [Bring your own API key](/support-and-community/plans-and-billing/bring-your-own-api-key/) setup, so the steps will feel familiar if you've already configured BYOK.
-
-## Billing behavior
-
-### Warp credits
-
-When you select a CIE-routed model from the model picker:
-
-* No Warp credits are consumed for that request.
-* Inference is billed directly by your endpoint provider, according to their pricing.
-* The credit transparency footer will show "0 credits used" for CIE-routed requests.
-
-### Auto routing still uses Warp credits
-
-Warp's **Auto** models dynamically route across providers using Warp's infrastructure. Because Auto routing depends on Warp, **Auto always consumes Warp's credits**, even if you've configured a custom inference endpoint.
-
-To use your endpoint, select the specific CIE-backed model from the model picker rather than an Auto option.
-
-### Other AI features in Warp
-
-Some AI-powered features rely on Warp's infrastructure and are unaffected by CIE configuration. These continue to consume credits according to your plan; see [Credits](/support-and-community/plans-and-billing/credits/) for details.
-
-## Zero Data Retention (ZDR)
-
-Warp is **SOC 2 compliant** and has **Zero Data Retention (ZDR)** agreements with all of its contracted LLM providers.
-
-When you use a custom inference endpoint:
-
-* Data retention is determined by **your endpoint provider** and any upstream model providers they route to.
-* Warp **cannot enforce ZDR** for requests sent through a custom inference endpoint.
-* If your endpoint provider does not have ZDR with the underlying model provider, your requests may be retained according to their terms.
-
-Review your endpoint provider's data handling and retention policies before routing sensitive prompts through a CIE.
-
-## Plan availability
-
-CIE is available on the Free plan and on all eligible paid plans. See [warp.dev/pricing](https://www.warp.dev/pricing) for the current list of eligible plans and any plan-specific limits.
-
-CIE is available to individual users and to organizations with 10 or fewer employees, subject to Warp's Terms of Service. Companies or organizations with more than 10 employees need a Warp Business or Enterprise plan to use custom inference endpoints.
-
-Centrally configured, admin-managed CIE for teams is not yet available. Each user configures their own endpoint locally. Enterprise teams that need centrally managed model routing today should see [Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/).
-
-## How CIE differs from BYOK and BYOLLM
-
-Warp offers three ways to bring your own AI infrastructure. Use this table to pick the right one, and follow the links for full details.
-
-| Name | Meaning | Plans |
-| --- | --- | --- |
-| **[Bring your own API key](/support-and-community/plans-and-billing/bring-your-own-api-key/)** (BYOK) | Use your own API key for OpenAI, Anthropic, or Google models. Keys are stored locally on your device. | Free and all eligible paid plans |
-| **Custom inference endpoint** (CIE) | Connect Warp to an OpenAI-compatible endpoint such as OpenRouter, LiteLLM, z.ai, or an internal gateway. | Free and all eligible paid plans |
-| **[Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/)** (BYOLLM) | Enterprise-managed inference through your cloud provider (AWS Bedrock, Azure Foundry, Google Vertex) or approved internal infrastructure, with Warp handling routing, orchestration, governance, and observability. | Enterprise only |
-
-## Related resources
-
-* [Bring your own API key](/support-and-community/plans-and-billing/bring-your-own-api-key/) — Use your own OpenAI, Anthropic, or Google API keys.
-* [Bring your own LLM](/enterprise/enterprise-features/bring-your-own-llm/) — Enterprise-managed inference through your cloud provider or approved infrastructure.
-* [Model Choice](/agent-platform/capabilities/model-choice/) — Full list of supported models and `model_id` values.
-* [Credits](/support-and-community/plans-and-billing/credits/) — How Warp credits work and when they're consumed.