| 1 | +--- |
| 2 | +title: 'Billing & Credits' |
| 3 | +description: 'How billing works for open and closed models' |
| 4 | +icon: 'credit-card' |
| 5 | +mode: 'wide' |
| 6 | +--- |
| 7 | + |
| 8 | +Bytez uses a credit-based system. Credits are consumed when you run models, and how they're consumed depends on whether you're using closed-source or open-source models. |
| 9 | + |
| 10 | +## Plans |
| 11 | + |
| 12 | +<CardGroup cols={2}> |
| 13 | + <Card title="Free" icon="gift"> |
| 14 | + {`$0 / month - Get $1 in free credits`} |
| 15 | + |
| 16 | + - Run open models up to 7B parameters |
| 17 | + - Access all closed model providers |
| 18 | + - 1 concurrent request (open models) |
| 19 | + - 10 requests/second (closed models) |
| 20 | + - Credits refresh every 4 weeks |
| 21 | + |
| 22 | + </Card> |
| 23 | + <Card title="Pay-as-you-go" icon="rocket"> |
| 24 | + {`$3 / month - Get $5 in credits`} |
| 25 | + |
| 26 | + - Run open models up to 120B parameters |
| 27 | + - Access all closed model providers |
| 28 | + - Rate limits scale with credits purchased |
| 29 | + - Unlimited closed model requests |
| 30 | + - Add credits anytime |
| 31 | + |
| 32 | + </Card> |
| 33 | +</CardGroup> |
| 34 | + |
| 35 | +## How Credits Work |
| 36 | + |
| 37 | +Credits are a unified currency across all models on Bytez. When you run a model, credits are deducted from your balance based on usage. |
| 38 | + |
| 39 | +| Model Type | How Credits Are Consumed | |
| 40 | +| ------------- | ----------------------------------------------------------------- | |
| 41 | +| Closed models | Based on provider pricing (per token, per image, per video, etc.) | |
| 42 | +| Open models | Per second of inference | |
| 43 | + |
| 44 | +Your credits purchased in the last 4 weeks determine two things: |
| 45 | + |
| 46 | +1. **Which open models you can access** - Larger open models require more credits purchased to unlock |
| 47 | +2. **Your rate limits** - Purchasing more credits unlocks more concurrent requests |
| 48 | + |
| 49 | +<Info> |
| 50 | + Adding credits immediately unlocks higher tiers. You don't need to wait for the next billing |
| 51 | + cycle. |
| 52 | +</Info> |
| 53 | + |
| 54 | +### Credit Unlock Thresholds |
| 55 | + |
| 56 | +| Credits Purchased (last 4 weeks) | Open Model Access | Concurrent Requests (7B) | |
| 57 | +| -------------------------------- | ----------------- | ------------------------ | |
| 58 | +| $0 (Free) | Up to 7B | 1 | |
| 59 | +| $3+ | Up to 7B | 4 | |
| 60 | +| $10+ | Up to 35B | 4 | |
| 61 | +| $25+ | Up to 70B | 10 | |
| 62 | +| $50+ | Up to 120B | 20 | |
| 63 | +| $100+ | Up to 120B | 40 | |
| 64 | +| $500+ | Up to 120B | 200 | |
| 65 | +| $1,000+ | Up to 120B | 400 | |
| 66 | + |
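The threshold table can be read as a simple lookup. The sketch below is only an illustration: `TIERS` and `tierFor` are hypothetical names, not part of the Bytez API, and the values are copied from the table above.

```javascript
// Thresholds copied from the table above (USD purchased in the last 4 weeks).
// Ordered from highest to lowest so the first match is the best tier.
const TIERS = [
  { minPurchased: 1000, maxModel: '120B', concurrent7B: 400 },
  { minPurchased: 500, maxModel: '120B', concurrent7B: 200 },
  { minPurchased: 100, maxModel: '120B', concurrent7B: 40 },
  { minPurchased: 50, maxModel: '120B', concurrent7B: 20 },
  { minPurchased: 25, maxModel: '70B', concurrent7B: 10 },
  { minPurchased: 10, maxModel: '35B', concurrent7B: 4 },
  { minPurchased: 3, maxModel: '7B', concurrent7B: 4 },
  { minPurchased: 0, maxModel: '7B', concurrent7B: 1 },
];

// Hypothetical helper: returns the highest tier whose threshold is met.
function tierFor(purchasedLast4Weeks) {
  return TIERS.find((tier) => purchasedLast4Weeks >= tier.minPurchased);
}

const tier = tierFor(25);
console.log(tier.maxModel, tier.concurrent7B); // 70B 10
```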
| 67 | +<Warning>Credits expire 4 weeks after purchase. Use them or lose them!</Warning> |
| 68 | + |
| 69 | +--- |
| 70 | + |
| 71 | +## Closed Model Billing |
| 72 | + |
| 73 | +For closed-source models (OpenAI, Anthropic, Google, Mistral, Cohere), we pass through the provider's pricing plus a small platform fee. |
| 74 | + |
| 75 | +``` |
| 76 | +Your cost = Provider price + 2% platform fee |
| 77 | +``` |
| 78 | + |
| 79 | +Providers charge differently depending on the model and modality: per token for text, per image for image generation, per second for video, and so on. |
| 80 | + |
| 81 | +**Example:** If OpenAI charges {`$0.000001`} per M tokens, you pay {`$0.00000102`} per M tokens. |
| 82 | + |
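As a rough illustration of the formula above, the fee can be applied to any provider charge. `PLATFORM_FEE` and `closedModelCost` are hypothetical names used for this sketch only, and actual billing may round differently.

```javascript
const PLATFORM_FEE = 0.02; // 2% platform fee from the formula above

// Hypothetical helper: providerCost is whatever the provider charges, in USD.
function closedModelCost(providerCost) {
  return providerCost * (1 + PLATFORM_FEE);
}

// A request that costs $10.00 at the provider's price:
console.log(closedModelCost(10).toFixed(2)); // "10.20"
```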
| 83 | +<Accordion title="Why the 2% fee?"> |
| 84 | + The platform fee covers: |
| 85 | + - Unified API translation and standardization |
| 86 | + - Request routing and load balancing |
| 87 | + - Usage tracking and analytics |
| 88 | + - Support and reliability infrastructure |
| 89 | + |
| 90 | +You get a single API, single billing, and single format across all providers. |
| 91 | + |
| 92 | +</Accordion> |
| 93 | + |
| 94 | +### What's included |
| 95 | + |
| 96 | +- **Pass-through pricing** - Pay the provider's price plus the 2% platform fee, nothing more |
| 97 | +- **No minimum** - No monthly minimums or commitments |
| 98 | +- **Real-time pricing** - We pass through provider rates as they change |
| 99 | + |
| 100 | +--- |
| 101 | + |
| 102 | +## Open Model Billing |
| 103 | + |
| 104 | +Open-source models run on our **serverless GPU infrastructure**. You're billed per second of inference time - no cold start fees, no idle charges. |
| 105 | + |
| 106 | +``` |
| 107 | +Your cost = Inference time (seconds) x Rate for model size |
| 108 | +``` |
| 109 | + |
| 110 | +### Pricing by Model Size |
| 111 | + |
| 112 | +Bigger models use more VRAM, so they cost more per second: |
| 113 | + |
| 114 | +| Model Size | Per Second | Per Hour | |
| 115 | +| ---------- | ---------- | -------- | |
| 116 | +| 7B | $0.000072 | ~$0.26 | |
| 117 | +| 15B | $0.000108 | ~$0.39 | |
| 118 | +| 35B | $0.000144 | ~$0.52 | |
| 119 | +| 70B | $0.000216 | ~$0.78 | |
| 120 | +| 120B | $0.00036 | ~$1.30 | |
| 121 | + |
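To estimate a bill from the table above, multiply inference seconds by the rate for the model size. This is a minimal sketch: `RATE_PER_SECOND` and `openModelCost` are hypothetical names, and how sizes between the listed rows map to a rate is not specified here, so the helper only accepts the listed sizes.

```javascript
// Per-second rates (USD) copied from the table above, keyed by model size.
const RATE_PER_SECOND = {
  '7B': 0.000072,
  '15B': 0.000108,
  '35B': 0.000144,
  '70B': 0.000216,
  '120B': 0.00036,
};

// Hypothetical helper: cost = inference seconds x rate for the model size.
function openModelCost(modelSize, seconds) {
  const rate = RATE_PER_SECOND[modelSize];
  if (rate === undefined) throw new Error(`Unknown model size: ${modelSize}`);
  return seconds * rate;
}

// One hour of continuous inference on a 70B model:
console.log(openModelCost('70B', 3600).toFixed(2)); // "0.78"
```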
| 122 | +<Accordion title="How we calculate pricing"> |
| 123 | + Our base rate is **$0.0000045/GB-second** of VRAM used. |
| 124 | + |
| 125 | +For comparison: |
| 126 | + |
| 127 | +- **Bytez:** $0.0000045/GB-sec (with Nvidia GPUs) |
| 128 | +- **AWS Lambda:** $0.0000167/GB-sec (CPUs only) |
| 129 | + |
| 130 | +That's **3.7x cheaper** than AWS Lambda, and you get serverless Nvidia GPUs, not just serverless CPUs. |
| 131 | + |
| 132 | +</Accordion> |
| 133 | + |
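The figures above can be checked with a little arithmetic. The sketch below reproduces the 3.7x ratio and, assuming each per-second rate is simply the base rate multiplied by the VRAM billed (an inference, not something stated here), shows the VRAM each rate would imply. `impliedVramGb` is a hypothetical helper.

```javascript
const BASE_RATE = 0.0000045; // USD per GB-second of VRAM, from the text above
const LAMBDA_RATE = 0.0000167; // USD per GB-second, as quoted above

// Price ratio behind the "3.7x" figure.
const ratio = LAMBDA_RATE / BASE_RATE;
console.log(ratio.toFixed(1)); // "3.7"

// Assumption: per-second rate = BASE_RATE x billed VRAM in GB.
// Rounded to two decimals to hide floating-point noise.
function impliedVramGb(ratePerSecond) {
  return Math.round((ratePerSecond / BASE_RATE) * 100) / 100;
}

console.log(impliedVramGb(0.000072)); // 16 (the 7B rate)
console.log(impliedVramGb(0.00036)); // 80 (the 120B rate)
```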
| 134 | +### What's included |
| 135 | + |
| 136 | +- **Per-second billing** - Billed in 1-second increments |
| 137 | +- **No cold start fees** - You don't pay while the model loads |
| 138 | +- **No idle charges** - You don't pay when not running inference |
| 139 | +- **No reserved instances** - No commitments, no minimums |
| 140 | + |
| 141 | +--- |
| 142 | + |
| 143 | +## Auto-Reload |
| 144 | + |
| 145 | +Auto-reload automatically tops up your credit balance when it runs low, so your API calls don't fail because of an empty balance (up to your monthly maximum). |
| 146 | + |
| 147 | +### How it works |
| 148 | + |
| 149 | +| Setting | Default | Description | |
| 150 | +| ------------- | ------- | --------------------------------------------- | |
| 151 | +| Threshold | $3 | Reload triggers when balance drops below this | |
| 152 | +| Reload amount | $10 | Amount added to your balance | |
| 153 | +| Monthly max | $100 | Maximum auto-reload spend per month | |
| 154 | + |
| 155 | +<Steps> |
| 156 | + <Step title="Balance drops below threshold"> |
| 157 | + When your credit balance falls below $3 (default), auto-reload activates |
| 158 | + </Step> |
| 159 | + <Step title="Card is charged"> |
| 160 | + Your saved payment method is charged $10 (default reload amount) |
| 161 | + </Step> |
| 162 | + <Step title="Credits are added">$10 in credits is immediately added to your balance</Step> |
| 163 | + <Step title="Monthly cap enforced"> |
| 164 | + Auto-reload stops if you've hit your monthly maximum ($100 default) |
| 165 | + </Step> |
| 166 | +</Steps> |
| 167 | + |
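The steps above can be modeled as a single check. This is a sketch under stated assumptions, not the actual billing logic: `autoReload` is a hypothetical function, and it assumes a reload is skipped whenever it would push the month's auto-reload spend past the cap.

```javascript
// Defaults from the settings table above (USD).
const DEFAULTS = { threshold: 3, reloadAmount: 10, monthlyMax: 100 };

// Hypothetical model of one auto-reload check.
// Assumption: a reload that would exceed the monthly cap is skipped entirely.
function autoReload(balance, reloadedThisMonth, settings = DEFAULTS) {
  const belowThreshold = balance < settings.threshold;
  const withinCap =
    reloadedThisMonth + settings.reloadAmount <= settings.monthlyMax;
  if (!belowThreshold || !withinCap) {
    return { balance, reloadedThisMonth, charged: 0 };
  }
  return {
    balance: balance + settings.reloadAmount,
    reloadedThisMonth: reloadedThisMonth + settings.reloadAmount,
    charged: settings.reloadAmount,
  };
}

console.log(autoReload(2.5, 0).balance); // 12.5 (reload triggered)
console.log(autoReload(2.5, 100).charged); // 0 (monthly cap reached)
```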
| 168 | +### If Auto-Reload is Disabled |
| 169 | + |
| 170 | +When auto-reload is off and your credits run out, you may get an API response like this: |
| 171 | + |
| 172 | +```json |
| 173 | +{ |
| 174 | + "status": 402, |
| 175 | + "error": "Payment Required", |
| 176 | + "message": "Insufficient credits. Please add credits to continue." |
| 177 | +} |
| 178 | +``` |
| 179 | + |
| 180 | +<Warning> |
| 181 | + If you're running production workloads, we recommend enabling auto-reload to prevent unexpected |
| 182 | + failures. |
| 183 | +</Warning> |
| 184 | + |
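Client code can detect this response and surface a clear message instead of retrying. The sketch below checks the parsed JSON body shown above. `isInsufficientCredits` is a hypothetical helper, and whether the HTTP status line also carries 402 is not confirmed here, so only the body is inspected.

```javascript
// Hypothetical helper: true when a parsed response body matches the
// insufficient-credits shape shown above.
function isInsufficientCredits(body) {
  return Boolean(body) && body.status === 402;
}

const body = {
  status: 402,
  error: 'Payment Required',
  message: 'Insufficient credits. Please add credits to continue.',
};

if (isInsufficientCredits(body)) {
  // Retrying will not help until credits are added.
  console.log(`Billing error: ${body.message}`);
}
```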
| 185 | +### Configuring Auto-Reload |
| 186 | + |
| 187 | +You can enable, disable, or adjust auto-reload settings in your [API Dashboard](https://bytez.com/api/billing). |
| 188 | + |
| 189 | + |
| 190 | + |
| 191 | +--- |
| 192 | + |
| 193 | +## Auto-Scaling (Open Models) |
| 194 | + |
| 195 | +By default, if you exceed your open model rate limits, requests are rejected with a rate-limit error. |
| 196 | + |
| 197 | +If you want your rate limits to automatically scale with your traffic in production, add `autoScale: true` to your request: |
| 198 | + |
| 199 | +```javascript |
| 200 | +const response = await fetch('https://api.bytez.com/v1/chat/completions', { |
| 201 | + method: 'POST', |
| 202 | + headers: { |
| 203 | + 'Authorization': API_KEY, |
| 204 | + 'Content-Type': 'application/json' |
| 205 | + }, |
| 206 | + body: JSON.stringify({ |
| 207 | + model: 'meta-llama/Llama-3-70b', |
| 208 | + messages: [...], |
| 209 | + autoScale: true |
| 210 | + }) |
| 211 | +}); |
| 212 | +``` |
| 213 | + |
| 214 | +When enabled, the system automatically purchases the extra credits needed to keep scaling. Set a **Max Monthly Spend** in your [API Dashboard](https://bytez.com/api/billing) to cap costs, so you can auto-scale while staying within budget. |
| 215 | + |
| 216 | +<Info> |
| 217 | +For closed models, you get unlimited rate limits on a pay-as-you-go basis - no auto-scaling needed. |
| 218 | +</Info> |
| 219 | + |
| 220 | +## Billing Cycle |
| 221 | + |
| 222 | +<AccordionGroup> |
| 223 | + <Accordion title="Free Plan"> |
| 224 | + - **Billing:** None |
| 225 | + - **Credits:** {`$1`} free credits, refreshed every 4 weeks |
| 226 | + - **Expiration:** Credits expire 4 weeks after grant |
| 227 | + </Accordion> |
| 228 | + <Accordion title="Pay-as-you-go Plan"> |
| 229 | + - **Billing:** {`$3/month`} charged on signup date |
| 230 | + - **Credits:** $5 in credits granted each |
| 231 | + billing cycle |
| 232 | + - **Expiration:** All credits expire 4 weeks after grant |
| 233 | + </Accordion> |
| 234 | +</AccordionGroup> |
| 235 | + |
| 236 | +### Adding Credits Mid-Cycle |
| 237 | + |
| 238 | +You can add credits at any time. When you do: |
| 239 | + |
| 240 | +1. **Immediate access** - Higher model tiers and rate limits unlock instantly |
| 241 | +2. **No proration** - You get the full credit amount immediately |
| 242 | +3. **Credits stack** - Purchased credits add to your existing balance |
| 243 | + |
| 244 | +<Info> |
| 245 | + **Example:** You're on `Pay-as-you-go` with {`$2`} remaining. You add {`$25`}. Your new balance is {`$27`}, |
| 246 | + which immediately unlocks 70B models and 10 concurrent requests. |
| 247 | +</Info> |
| 248 | + |
| 249 | +--- |
| 250 | + |
| 251 | +## FAQ |
| 252 | + |
| 253 | +<AccordionGroup> |
| 254 | + <Accordion title="What happens if I run out of credits mid-request?"> |
| 255 | + In-flight requests will complete. Only new requests will fail with a 402 error. |
| 256 | + </Accordion> |
| 257 | + <Accordion title="Can I get a refund on unused credits?"> |
| 258 | + Credits are non-refundable and expire 4 weeks after purchase. |
| 259 | + </Accordion> |
| 260 | + <Accordion title="How do I track my usage?"> |
| 261 | + Visit your [API Dashboard](https://bytez.com/api/billing) to see real-time usage, credit |
| 262 | + balance, and request history. |
| 263 | + </Accordion> |
| 264 | + <Accordion title="Why do bigger models require more credits purchased?"> |
| 265 | + Larger models require more GPU resources (VRAM). Requiring a minimum purchase threshold ensures |
| 266 | + you have enough credits to complete meaningful workloads without running out mid-task. |
| 267 | + </Accordion> |
| 268 | + <Accordion title="Is there volume pricing?"> |
| 269 | + For high-volume usage (>{`$1,000`}/month), contact us at [team@bytez.com](mailto:team@bytez.com) for |
| 270 | + custom pricing. |
| 271 | + </Accordion> |
| 272 | +</AccordionGroup> |