@@ -1,24 +1,24 @@
 ---
-title: Using volumes to optimize cold starts on RunPod
+title: Using volumes to optimize cold starts on Runpod
 date: 2024-08-13
-description: "Learn how to use volumes with dstack to optimize model inference cold start times on RunPod."
+description: "Learn how to use volumes with dstack to optimize model inference cold start times on Runpod."
 slug: volumes-on-runpod
 categories:
   - Changelog
 ---
 
-# Using volumes to optimize cold starts on RunPod
+# Using volumes to optimize cold starts on Runpod
 
 Deploying custom models in the cloud often faces the challenge of cold start times, including the time to provision a
 new instance and download the model. This is especially relevant for services with autoscaling when new model replicas
 need to be provisioned quickly.
 
 Let's explore how `dstack` optimizes this process using volumes, with an example of
-deploying a model on RunPod.
+deploying a model on Runpod.
 
 <!-- more -->
 
-Suppose you want to deploy Llama 3.1 on RunPod as a [service](../../docs/concepts/services.md):
+Suppose you want to deploy Llama 3.1 on Runpod as a [service](../../docs/concepts/services.md):
 
 <div editor-title="examples/llms/llama31/tgi/service.dstack.yml">
 
@@ -59,9 +59,9 @@ When starting each replica, `text-generation-launcher` downloads the model to th
 usually takes under a minute, but larger models may take longer. Repeated downloads can significantly affect
 auto-scaling efficiency.
 
-Great news: RunPod supports network volumes, which we can use for caching models across multiple replicas.
+Great news: Runpod supports network volumes, which we can use for caching models across multiple replicas.
 
-With `dstack`, you can create a RunPod volume using the following configuration:
+With `dstack`, you can create a Runpod volume using the following configuration:
 
 <div editor-title="examples/mist/volumes/runpod.dstack.yml">
 
@@ -130,7 +130,7 @@ resources:
 In this case, `dstack` attaches the specified volume to each new replica. This ensures the model is downloaded only
 once, reducing cold start time in proportion to the model size.
 
-A notable feature of RunPod is that volumes can be attached to multiple containers simultaneously. This capability is
+A notable feature of Runpod is that volumes can be attached to multiple containers simultaneously. This capability is
 particularly useful for auto-scalable services or distributed tasks.
 
 Using [volumes](../../docs/concepts/volumes.md) not only optimizes inference cold start times but also enhances the
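The volume configuration the article points to (`runpod.dstack.yml`) falls inside an elided hunk of this excerpt. Based on `dstack`'s volume configuration schema, it would look roughly like the sketch below; the name, region, and size are illustrative assumptions, not values from the commit:

```yaml
type: volume
# Hypothetical name; the actual name is elided from this excerpt
name: llama31-volume
# Provision the network volume through the runpod backend
backend: runpod
# Region and size are illustrative assumptions
region: EU-SE-1
size: 100GB
```

Such a configuration is then applied with the `dstack apply` CLI command, after which the volume can be referenced by name from run configurations.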
|
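Likewise, the part of the service configuration that mounts the volume into each replica (elided around the `resources:` hunk) would use a `volumes` section. A minimal sketch, assuming `dstack`'s service schema; the names, image, and mount path are illustrative assumptions:

```yaml
type: service
# Name, image, and paths below are illustrative assumptions
name: llama31-service
image: ghcr.io/huggingface/text-generation-inference:latest
env:
  # Point the Hugging Face cache at the volume so the model download persists
  - HF_HOME=/data
commands:
  - text-generation-launcher
port: 80
resources:
  gpu: 24GB
volumes:
  # Mount the previously created volume into every replica
  - name: llama31-volume
    path: /data
```

With a mount like this, the first replica downloads the model into the volume, and later replicas find it already cached there, which is what cuts the cold start time as replicas scale up.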