Merged
14 commits
2 changes: 1 addition & 1 deletion release-notes.mdx
@@ -261,7 +261,7 @@ Flash now supports deploying endpoints to [multiple datacenters](/flash/configur
- **Self-service worker upgrade**: Rebuild and roll workers from the dashboard without support tickets.
- **Edit template from endpoint page**: Inline edit and redeploy the underlying template directly from the endpoint view.
- **Improved Serverless metrics page**: Refinements to charts and filters for quicker root-cause analysis.
-- [Flex and active workers](/serverless/pricing): Discounted always-on "active" capacity for baseline load with on-demand "flex" workers for bursts.
+- [Flex and active workers](/serverless/pricing): Always-on "active" workers for baseline load with on-demand "flex" workers for bursts.
- **Billing explorer**: Inspect costs by resource, region, and time to identify optimization opportunities.

</Update>
2 changes: 1 addition & 1 deletion serverless/development/optimization.mdx
@@ -52,7 +52,7 @@ For private models, [embed them in your Docker image](/serverless/workers/create

### Maintain active workers

-Set [active workers](/serverless/endpoints/endpoint-configurations#active-workers) > 0 to eliminate cold starts entirely. Active workers cost up to 30% less than flex workers.
+Set [active workers](/serverless/endpoints/endpoint-configurations#active-workers) > 0 to eliminate cold starts entirely.

**Formula**: `Active workers = (Requests/min × Request duration in seconds) / 60`

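The sizing formula above can be sketched as a small helper. This is a minimal illustration, not part of the platform's API; the function name `recommended_active_workers` is a hypothetical choice, and rounding up is an assumption so that capacity covers the steady-state load.

```python
import math

def recommended_active_workers(requests_per_min: float, request_duration_s: float) -> int:
    # Formula from the docs: Active workers = (Requests/min × Request duration in seconds) / 60
    # Round up (assumption) so the worker count covers the steady-state load.
    return math.ceil(requests_per_min * request_duration_s / 60)

# 120 requests/min, each taking 2.5 seconds, needs 5 always-warm workers.
print(recommended_active_workers(120, 2.5))  # → 5
```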
2 changes: 1 addition & 1 deletion serverless/endpoints/endpoint-configurations.mdx
@@ -55,7 +55,7 @@ For endpoints with fewer than five workers, all workers use the highest-priority

### Active workers

-Minimum number of workers that remain warm and ready at all times. Setting this to 1+ eliminates cold starts. Active workers incur charges when idle but receive a 20-30% discount.
+Minimum number of workers that remain warm and ready at all times. Setting this to 1+ eliminates cold starts. Active workers incur charges continuously, including when idle.

### Max workers

2 changes: 1 addition & 1 deletion serverless/pricing.mdx
@@ -20,7 +20,7 @@ Serverless offers pay-per-second pricing with no upfront costs. You're billed fr
| | Flex workers | Active workers |
|---|--------------|----------------|
| **Behavior** | Scale to zero when idle | Always running (24/7) |
-| **Pricing** | Standard per-second rate | 20–30% discount |
+| **Pricing** | Standard per-second rate | Discounts available through sales inquiry |
| **Best for** | Variable workloads, cost optimization | Consistent traffic, low-latency requirements |

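The flex-vs-active trade-off in the table comes down to billed seconds: flex workers bill only while running, active workers bill around the clock. A minimal sketch of that comparison, where the per-second rates and the 60% utilization figure are purely hypothetical placeholders (real rates are in the GPU pricing table, and any active-worker discount is arranged through sales):

```python
def monthly_cost(rate_per_s: float, busy_fraction: float, always_on: bool) -> float:
    # Flex workers bill only for busy seconds; active workers bill 24/7.
    seconds_per_month = 30 * 24 * 3600
    billed = seconds_per_month if always_on else seconds_per_month * busy_fraction
    return rate_per_s * billed

# Hypothetical illustrative rates only, NOT actual prices:
flex_rate = 0.00040    # per second, standard rate
active_rate = 0.00030  # per second, assuming a negotiated discount

flex = monthly_cost(flex_rate, busy_fraction=0.6, always_on=False)
active = monthly_cost(active_rate, busy_fraction=0.6, always_on=True)
print(f"flex: ${flex:.2f}/mo, active: ${active:.2f}/mo")
```

Under these assumed numbers, flex is cheaper at 60% utilization despite the higher rate; the break-even point shifts toward active workers as utilization approaches 100%.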
## GPU pricing
2 changes: 1 addition & 1 deletion serverless/workers/overview.mdx
@@ -39,7 +39,7 @@ To deploy workers with AI/ML models, follow this order of preference:

Workers can run in two modes depending on your latency and cost requirements:

-- **Active workers** run continuously (24/7) and are always ready to process requests instantly. They eliminate cold starts entirely and receive a discounted rate, making them ideal for latency-sensitive or high-traffic applications.
+- **Active workers** run continuously (24/7) and are always ready to process requests instantly. They eliminate cold starts entirely, making them ideal for latency-sensitive or high-traffic applications.

- **Flex workers** scale dynamically based on demand, spinning down to zero when idle. They incur cold starts when scaling up but cost nothing when not in use, making them ideal for variable or sporadic workloads.
