Commit d92c594

adding billing section docs
1 parent 2823438 commit d92c594

3 files changed

Lines changed: 273 additions & 3 deletions

docs/docs.json

Lines changed: 1 addition & 1 deletion
@@ -44,7 +44,7 @@
4444
{
4545
"group": "Understand the API",
4646
"icon": "head-side-gear",
47-
"pages": ["model-api/docs/understand-the-api"]
47+
"pages": ["model-api/docs/understand-the-api", "model-api/docs/billing"]
4848
},
4949
{
5050
"group": "Open AI Completions",

docs/model-api/docs/billing.mdx

Lines changed: 272 additions & 0 deletions
@@ -0,0 +1,272 @@
1+
---
2+
title: 'Billing & Credits'
3+
description: 'How billing works for open and closed models'
4+
icon: 'credit-card'
5+
mode: 'wide'
6+
---
7+
8+
Bytez uses a credit-based system. Credits are consumed when you run models, and how they're consumed depends on whether you're using closed-source or open-source models.
9+
10+
## Plans
11+
12+
<CardGroup cols={2}>
13+
<Card title="Free" icon="gift">
14+
{`$0 / month - Get $1 in free credits`}
15+
16+
- Run open models up to 7B parameters
17+
- Access all closed model providers
18+
- 1 concurrent request (open models)
19+
- 10 requests/second (closed models)
20+
- Credits refresh every 4 weeks
21+
22+
</Card>
23+
<Card title="Pay-as-you-go" icon="rocket">
24+
{`$3 / month - Get $5 in credits`}
25+
26+
- Run open models up to 120B parameters
27+
- Access all closed model providers
28+
- Rate limits scale with credits purchased
29+
- Unlimited closed model requests
30+
- Add credits anytime
31+
32+
</Card>
33+
</CardGroup>
34+
35+
## How Credits Work
36+
37+
Credits are a unified currency across all models on Bytez. When you run a model, credits are deducted from your balance based on usage.
38+
39+
| Model Type | How Credits Are Consumed |
40+
| ------------- | ----------------------------------------------------------------- |
41+
| Closed models | Based on provider pricing (per token, per image, per video, etc.) |
42+
| Open models | Per second of inference |
43+
44+
Two things depend on how many credits you purchased in the last 4 weeks:
45+
46+
1. **Which open models you can access** - Larger open models unlock at higher purchase totals
47+
2. **Your rate limits** - Purchasing more credits unlocks more concurrent requests
48+
49+
<Info>
50+
Adding credits immediately unlocks higher tiers. You don't need to wait for the next billing
51+
cycle.
52+
</Info>
53+
54+
### Credit Unlock Thresholds
55+
56+
| Credits Purchased (last 4 weeks) | Open Model Access | Concurrent Requests (7B) |
57+
| -------------------------------- | ----------------- | ------------------------ |
58+
| $0 (Free) | Up to 7B | 1 |
59+
| $3+ | Up to 7B | 4 |
60+
| $10+ | Up to 35B | 4 |
61+
| $25+ | Up to 70B | 10 |
62+
| $50+ | Up to 120B | 20 |
63+
| $100+ | Up to 120B | 40 |
64+
| $500+ | Up to 120B | 200 |
65+
| $1,000+ | Up to 120B | 400 |
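
To see how the table resolves for a given purchase total, here is a minimal sketch of the lookup. The thresholds are copied from the table above; the `TIERS` array and `tierFor` function are hypothetical names used for illustration and are not part of the Bytez API.

```javascript
// Thresholds copied from the Credit Unlock Thresholds table, highest first.
// maxModelB is the largest open model size (billions of parameters) unlocked.
const TIERS = [
  { minPurchased: 1000, maxModelB: 120, concurrent7B: 400 },
  { minPurchased: 500, maxModelB: 120, concurrent7B: 200 },
  { minPurchased: 100, maxModelB: 120, concurrent7B: 40 },
  { minPurchased: 50, maxModelB: 120, concurrent7B: 20 },
  { minPurchased: 25, maxModelB: 70, concurrent7B: 10 },
  { minPurchased: 10, maxModelB: 35, concurrent7B: 4 },
  { minPurchased: 3, maxModelB: 7, concurrent7B: 4 },
  { minPurchased: 0, maxModelB: 7, concurrent7B: 1 },
];

// Hypothetical helper: returns the first (highest) tier whose threshold is met.
function tierFor(purchasedLast4Weeks) {
  if (purchasedLast4Weeks < 0) {
    throw new RangeError('Purchase total cannot be negative');
  }
  return TIERS.find((tier) => purchasedLast4Weeks >= tier.minPurchased);
}

console.log(tierFor(25)); // the row for $25+ in the table
```

Because the list is ordered from the highest threshold down, the first match is always the best tier you qualify for.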
66+
67+
<Warning>Credits expire 4 weeks after purchase. Use them or lose them!</Warning>
68+
69+
---
70+
71+
## Closed Model Billing
72+
73+
For closed-source models (OpenAI, Anthropic, Google, Mistral, Cohere), we pass through the provider's pricing plus a small platform fee.
74+
75+
```
76+
Your cost = Provider price + 2% platform fee
77+
```
78+
79+
Providers charge differently depending on the model and modality - per token for text, per image for image generation, per second for video, etc. We pass through whatever the provider charges.
80+
81+
**Example:** If OpenAI charges {`$0.000001`} per M tokens, you pay {`$0.00000102`} per M tokens.
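
The formula above can be sketched in a few lines. This is only an illustration of the arithmetic: `closedModelCost` is a hypothetical helper, and the provider cost you pass in is whatever the provider charged for the request.

```javascript
// Platform fee from the formula above: 2% on top of the provider price.
const PLATFORM_FEE = 0.02;

// Hypothetical helper: provider cost in, your cost out (same currency unit).
function closedModelCost(providerCost) {
  return providerCost * (1 + PLATFORM_FEE);
}

console.log(closedModelCost(10).toFixed(2)); // "10.20"
```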
82+
83+
<Accordion title="Why the 2% fee?">
84+
The platform fee covers:
85+
- Unified API translation and standardization
86+
- Request routing and load balancing
87+
- Usage tracking and analytics
88+
- Support and reliability infrastructure
89+
90+
You get a single API, single billing, and single format across all providers.
91+
92+
</Accordion>
93+
94+
### What's included
95+
96+
- **Pass-through pricing** - Pay only for what the provider charges
97+
- **No minimum** - No monthly minimums or commitments
98+
- **Real-time pricing** - We pass through provider rates as they change
99+
100+
---
101+
102+
## Open Model Billing
103+
104+
Open-source models run on our **serverless GPU infrastructure**. You're billed per second of inference time - no cold start fees, no idle charges.
105+
106+
```
107+
Your cost = Inference time (seconds) x Rate for model size
108+
```
109+
110+
### Pricing by Model Size
111+
112+
Bigger models use more VRAM, so they cost more per second:
113+
114+
| Model Size | Per Second | Per Hour |
115+
| ---------- | ---------- | -------- |
116+
| 7B | $0.000072 | ~$0.26 |
117+
| 15B | $0.000108 | ~$0.39 |
118+
| 35B | $0.000144 | ~$0.52 |
119+
| 70B | $0.000216 | ~$0.78 |
120+
| 120B | $0.00036 | ~$1.30 |
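
As a rough sketch, the cost of a request can be estimated from the table. The rates below are copied from the table above; `RATE_PER_SECOND` and `openModelCost` are hypothetical names, and the sketch does not model how partial seconds are rounded.

```javascript
// Per-second rates (USD) copied from the Pricing by Model Size table.
const RATE_PER_SECOND = {
  '7B': 0.000072,
  '15B': 0.000108,
  '35B': 0.000144,
  '70B': 0.000216,
  '120B': 0.00036,
};

// Hypothetical helper: cost = inference time (seconds) x rate for model size.
function openModelCost(modelSize, seconds) {
  const rate = RATE_PER_SECOND[modelSize];
  if (rate === undefined) {
    throw new Error(`Unknown model size: ${modelSize}`);
  }
  return seconds * rate;
}

console.log(openModelCost('70B', 60).toFixed(5)); // "0.01296"
```

Passing 3600 seconds reproduces the Per Hour column, for example about 0.78 for a 70B model.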
121+
122+
<Accordion title="How we calculate pricing">
123+
Our base rate is **$0.0000045/GB-second** of VRAM used.
124+
125+
For comparison:
126+
127+
- **Bytez:** $0.0000045/GB-sec (with Nvidia GPUs)
128+
- **AWS Lambda:** $0.0000167/GB-sec (CPUs only)
129+
130+
That's roughly **3.7x cheaper** than AWS Lambda per GB-second, and you get serverless Nvidia GPUs rather than CPUs only.
131+
132+
</Accordion>
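
The per-second rates in the table follow from the base rate multiplied by the VRAM a model uses. The sketch below works backwards from a table rate to the implied VRAM and recomputes the AWS Lambda comparison. The VRAM figures it prints are derived by division and are not stated anywhere in this page, so treat them as an inference. All names are hypothetical.

```javascript
// Rates (USD per GB-second) quoted in the "How we calculate pricing" section.
const BYTEZ_RATE_PER_GB_SECOND = 0.0000045;
const LAMBDA_RATE_PER_GB_SECOND = 0.0000167;

// Hypothetical helper: VRAM (GB, rounded to the nearest GB) implied by a
// per-second rate, assuming rate = base rate x GB of VRAM.
function impliedVramGb(ratePerSecond) {
  return Math.round(ratePerSecond / BYTEZ_RATE_PER_GB_SECOND);
}

// Ratio behind the "3.7x" comparison.
const lambdaRatio = LAMBDA_RATE_PER_GB_SECOND / BYTEZ_RATE_PER_GB_SECOND;

console.log(impliedVramGb(0.000072)); // 16 (7B row)
console.log(lambdaRatio.toFixed(1)); // "3.7"
```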
133+
134+
### What's included
135+
136+
- **Per-second billing** - Billed in 1-second increments
137+
- **No cold start fees** - You don't pay while the model loads
138+
- **No idle charges** - You don't pay when not running inference
139+
- **No reserved instances** - No commitments, no minimums
140+
141+
---
142+
143+
## Auto-Reload
144+
145+
Auto-reload tops up your credit balance automatically when it runs low, so API calls don't fail because credits ran out (until you reach your monthly maximum).
146+
147+
### How it works
148+
149+
| Setting | Default | Description |
150+
| ------------- | ------- | --------------------------------------------- |
151+
| Threshold | $3 | Reload triggers when balance drops below this |
152+
| Reload amount | $10 | Amount added to your balance |
153+
| Monthly max | $100 | Maximum auto-reload spend per month |
154+
155+
<Steps>
156+
<Step title="Balance drops below threshold">
157+
When your credit balance falls below $3 (default), auto-reload activates
158+
</Step>
159+
<Step title="Card is charged">
160+
Your saved payment method is charged $10 (default reload amount)
161+
</Step>
162+
<Step title="Credits are added">$10 in credits is immediately added to your balance</Step>
163+
<Step title="Monthly cap enforced">
164+
Auto-reload stops if you've hit your monthly maximum ($100 default)
165+
</Step>
166+
</Steps>
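
The steps above can be sketched as a small pure function. This is an illustration of the logic only, not how Bytez implements it: the names are hypothetical, and it assumes a reload is skipped when it would push the month's auto-reload spend past the cap.

```javascript
// Defaults copied from the settings table above (USD).
const DEFAULTS = { threshold: 3, reloadAmount: 10, monthlyMax: 100 };

// Hypothetical helper: given the current balance and the amount already
// auto-reloaded this month, return the state after one balance check.
// No card is charged here; this only models the bookkeeping.
function applyAutoReload(state, settings = DEFAULTS) {
  const belowThreshold = state.balance < settings.threshold;
  // Assumption: a reload that would exceed the monthly cap does not run.
  const withinCap =
    state.reloadedThisMonth + settings.reloadAmount <= settings.monthlyMax;

  if (!belowThreshold || !withinCap) {
    return { ...state, reloaded: false };
  }
  return {
    balance: state.balance + settings.reloadAmount,
    reloadedThisMonth: state.reloadedThisMonth + settings.reloadAmount,
    reloaded: true,
  };
}

console.log(applyAutoReload({ balance: 2.5, reloadedThisMonth: 0 }));
```

With the defaults, a balance of 2.5 triggers one reload, while the same balance with 100 already reloaded this month does not.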
167+
168+
### If Auto-Reload is Disabled
169+
170+
When auto-reload is off and your credits run out, you may get an API response like this:
171+
172+
```json
173+
{
174+
"status": 402,
175+
"error": "Payment Required",
176+
"message": "Insufficient credits. Please add credits to continue."
177+
}
178+
```
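
If you call the API directly, you may want to detect this case and surface it separately from other failures. A minimal sketch follows. It assumes the HTTP status code is also 402, matching the `status` field in the body. `isInsufficientCredits` and `requestOrThrow` are hypothetical helpers, not part of any Bytez SDK.

```javascript
// Returns true when a response signals exhausted credits.
// Assumption: the HTTP status code is 402, like the "status" field in the body.
function isInsufficientCredits(httpStatus, body) {
  const bodyStatus =
    body !== null && typeof body === 'object' ? body.status : undefined;
  return httpStatus === 402 || bodyStatus === 402;
}

// Hypothetical wrapper around fetch; url and options are whatever you already send.
async function requestOrThrow(url, options) {
  const response = await fetch(url, options);
  const body = await response.json();
  if (isInsufficientCredits(response.status, body)) {
    throw new Error(`Out of credits: ${body.message}`);
  }
  return body;
}
```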
179+
180+
<Warning>
181+
If you're running production workloads, we recommend enabling auto-reload to prevent unexpected
182+
failures.
183+
</Warning>
184+
185+
### Configuring Auto-Reload
186+
187+
You can enable, disable, or adjust auto-reload settings in your [API Dashboard](https://bytez.com/api/billing).
188+
189+
190+
191+
---
192+
193+
## Auto-Scaling (Open Models)
194+
195+
By default, if you exceed your open model rate limits, requests are rejected with a rate-limit error.
196+
197+
If you want your rate limits to automatically scale with your traffic in production, add `autoScale: true` to your request:
198+
199+
```javascript
200+
const response = await fetch('https://api.bytez.com/v1/chat/completions', {
201+
method: 'POST',
202+
headers: {
203+
'Authorization': API_KEY,
204+
'Content-Type': 'application/json'
205+
},
206+
body: JSON.stringify({
207+
model: 'meta-llama/Llama-3-70b',
208+
messages: [...],
209+
autoScale: true
210+
})
211+
});
212+
```
213+
214+
When enabled, the system automatically purchases the extra credits needed to keep scaling. Set your **Max Monthly Spend** in your [API Dashboard](https://bytez.com/api/billing) to cap costs, so you can auto-scale while staying within budget.
215+
216+
<Info>
217+
On the Pay-as-you-go plan, closed model requests are not rate limited, so auto-scaling isn't needed.
218+
</Info>
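
If only some of your traffic should auto-scale, one option is to build the request in a helper and opt in per call. The sketch below reuses the header and body fields from the example in this section. `buildChatRequest` is a hypothetical helper, and the `role`/`content` message shape in the usage line is an assumption based on the chat completions endpoint.

```javascript
// Hypothetical helper: builds fetch options for a chat completion request.
// autoScale is only added to the body when explicitly requested.
function buildChatRequest(apiKey, model, messages, { autoScale = false } = {}) {
  const body = { model, messages };
  if (autoScale) {
    body.autoScale = true;
  }
  return {
    method: 'POST',
    headers: {
      Authorization: apiKey,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify(body),
  };
}

// Usage: pass the result as the second argument to fetch.
const options = buildChatRequest(
  'YOUR_API_KEY', // placeholder, replace with your key
  'meta-llama/Llama-3-70b',
  [{ role: 'user', content: 'Hello' }], // assumed message shape
  { autoScale: true }
);
console.log(options.body);
```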
219+
220+
## Billing Cycle
221+
222+
<AccordionGroup>
223+
<Accordion title="Free Plan">
224+
- **Billing:** None
225+
- **Credits:** {`$1`} free credits, refreshed every 4 weeks
226+
- **Expiration:** Credits expire 4 weeks after grant
227+
</Accordion>
228+
<Accordion title="Pay-as-you-go Plan">
229+
- **Billing:** {`$3/month`} charged on signup date
230+
- **Credits:** $5 in credits granted each
231+
billing cycle
232+
- **Expiration:** All credits expire 4 weeks after grant
233+
</Accordion>
234+
</AccordionGroup>
235+
236+
### Adding Credits Mid-Cycle
237+
238+
You can add credits at any time. When you do:
239+
240+
1. **Immediate access** - Higher model tiers and rate limits unlock instantly
241+
2. **No proration** - You get the full credit amount immediately
242+
3. **Credits stack** - Purchased credits add to your existing balance
243+
244+
<Info>
245+
**Example:** You're on `Pay-as-you-go` with {`$2`} remaining. You add {`$25`}. Your new balance is {`$27`},
246+
and because you purchased {`$25`} in the last 4 weeks, 70B models and 10 concurrent requests unlock immediately.
247+
</Info>
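
Tier unlocks depend on what you purchased in the last 4 weeks, not on your current balance. A minimal sketch of that rolling total is shown below. The `purchases` record shape and function name are hypothetical, and the sketch assumes the window is exactly 28 days. Whether the monthly plan fee counts toward the total is not covered here.

```javascript
// 4 weeks expressed in milliseconds.
const FOUR_WEEKS_MS = 28 * 24 * 60 * 60 * 1000;

// Hypothetical helper: sums purchases made within the last 4 weeks.
// Each purchase is { amount: number, at: epoch milliseconds }.
function purchasedInLast4Weeks(purchases, now = Date.now()) {
  return purchases
    .filter((p) => p.at <= now && now - p.at < FOUR_WEEKS_MS)
    .reduce((sum, p) => sum + p.amount, 0);
}

const DAY_MS = 24 * 60 * 60 * 1000;
const now = Date.UTC(2025, 0, 31);
const total = purchasedInLast4Weeks(
  [
    { amount: 25, at: now - 1 * DAY_MS }, // inside the window
    { amount: 10, at: now - 30 * DAY_MS }, // older than 4 weeks, ignored
  ],
  now
);
console.log(total); // 25
```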
248+
249+
---
250+
251+
## FAQ
252+
253+
<AccordionGroup>
254+
<Accordion title="What happens if I run out of credits mid-request?">
255+
In-flight requests will complete. Only new requests will fail with a 402 error.
256+
</Accordion>
257+
<Accordion title="Can I get a refund on unused credits?">
258+
No. Credits are non-refundable and expire 4 weeks after purchase.
259+
</Accordion>
260+
<Accordion title="How do I track my usage?">
261+
Visit your [API Dashboard](https://bytez.com/api/billing) to see real-time usage, credit
262+
balance, and request history.
263+
</Accordion>
264+
<Accordion title="Why do bigger models require more credits purchased?">
265+
Larger models require more GPU resources (VRAM). Requiring a minimum purchase threshold ensures
266+
you have enough credits to complete meaningful workloads without running out mid-task.
267+
</Accordion>
268+
<Accordion title="Is there volume pricing?">
269+
For high-volume usage (>{`$1,000`}/month), contact us at [team@bytez.com](mailto:team@bytez.com) for
270+
custom pricing.
271+
</Accordion>
272+
</AccordionGroup>

docs/model-api/docs/understand-the-api.mdx

Lines changed: 0 additions & 2 deletions
@@ -44,8 +44,6 @@ Now, let's dive into the specific ways we handle requests under the hood for ope
4444

4545
**Key Takeaway:** For closed-source models, we act as a router and standardization layer. You interact with a **single, unified protocol**, making it easy to switch between model providers or use multiple providers without changing your code structure. The inference itself happens on the provider's infrastructure.
4646

47-
**Billing**: We don't charge anything for closed source models. Billing for closed-source models is based on the provider's pricing. They'll bill you based on the API key you provide.
48-
4947
</Accordion>
5048
<Accordion icon="lock-keyhole-open" title="Open‑Source Models –  Serverless GPU Inference">
5149
When you run an **open‑source** model, Bytez handles all the heavy lifting for you.
