Commit 9565045

[Docs] Minor improvements (#2766)
- Updated PyTorch Distributed example
- Missing links in Clusters guide
- Added `Utilization policy` examples to `Dev environments`, `Tasks`, `Services`
- Added `Utilization policy` and `Retry policy` to `Protips`
- Updated examples from Python 3.10 to 3.12
1 parent 6157962 commit 9565045

File tree

14 files changed: +387 additions, -245 deletions

docs/blog/posts/efa.md

Lines changed: 1 addition & 1 deletion
@@ -117,7 +117,7 @@ name: efa-task
 # The size of the cluster
 nodes: 2
 
-python: "3.12"
+python: 3.12
 
 # Commands to run on each node
 commands:
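A likely reason earlier examples quoted the version (`python: "3.10"`) is that a bare YAML scalar like `3.10` resolves to a float and loses its trailing zero, while `3.12` happens to round-trip intact. This is an illustration of the YAML/float behavior only; how `dstack` itself coerces the value is an assumption.

```python
def yaml_float_roundtrip(version: float) -> str:
    """Mimic a YAML loader reading a bare scalar as a float,
    then turning it back into a string."""
    return str(version)

# The trailing zero of 3.10 is lost, so "3.10" must be quoted in YAML...
print(yaml_float_roundtrip(3.10))  # -> 3.1
# ...whereas 3.12 survives unquoted.
print(yaml_float_roundtrip(3.12))  # -> 3.12
```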

docs/docs/concepts/dev-environments.md

Lines changed: 100 additions & 73 deletions
@@ -99,66 +99,6 @@ init:
 
 </div>
 
-### Inactivity duration
-
-Set [`inactivity_duration`](../reference/dstack.yml/dev-environment.md#inactivity_duration)
-to automatically stop the dev environment after a configured period of inactivity.
-
-<div editor-title=".dstack.yml">
-
-```yaml
-type: dev-environment
-name: vscode
-ide: vscode
-
-# Stop if inactive for 2 hours
-inactivity_duration: 2h
-```
-
-</div>
-
-The dev environment becomes inactive when you close the remote VS Code window,
-close any `ssh <run name>` shells, and stop the `dstack apply` or `dstack attach` command.
-If you go offline without stopping anything manually, the dev environment will also become inactive
-within about 3 minutes.
-
-If `inactivity_duration` is configured for your dev environment, you can see how long
-it has been inactive in `dstack ps --verbose`.
-
-<div class="termy">
-
-```shell
-$ dstack ps --verbose
- NAME    BACKEND  RESOURCES       PRICE    STATUS                 SUBMITTED
- vscode  cudo     2xCPU, 8GB,     $0.0286  running                8 mins ago
-                  100.0GB (disk)           (inactive for 2m 34s)
-```
-
-</div>
-
-If you reattach to the dev environment using [`dstack attach`](../reference/cli/dstack/attach.md),
-the inactivity timer will be reset within a few seconds.
-
-??? info "In-place update"
-    As long as the configuration defines the `name` property, the value of `inactivity_duration`
-    can be changed for a running dev environment without a restart.
-    Just change the value in the configuration and run `dstack apply` again.
-
-    <div class="termy">
-
-    ```shell
-    $ dstack apply -f .dstack.yml
-
-    Detected configuration changes that can be updated in-place: ['inactivity_duration']
-    Update the run? [y/n]:
-    ```
-
-    </div>
-
-> `inactivity_duration` is not to be confused with [`idle_duration`](#idle-duration).
-> The latter determines how soon the underlying cloud instance will be terminated
-> _after_ the dev environment is stopped.
-
 ### Resources
 
 When you specify a resource value like `cpu` or `memory`,
@@ -307,19 +247,6 @@ If you don't assign a value to an environment variable (see `HF_TOKEN` above),
 | `DSTACK_REPO_ID`  | The ID of the repo                  |
 | `DSTACK_GPUS_NUM` | The total number of GPUs in the run |
 
-### Spot policy
-
-By default, `dstack` uses on-demand instances. However, you can change that
-via the [`spot_policy`](../reference/dstack.yml/dev-environment.md#spot_policy) property. It accepts `spot`, `on-demand`, and `auto`.
-
-!!! info "Reference"
-    Dev environments support many more configuration options,
-    incl. [`backends`](../reference/dstack.yml/dev-environment.md#backends),
-    [`regions`](../reference/dstack.yml/dev-environment.md#regions),
-    [`max_price`](../reference/dstack.yml/dev-environment.md#max_price), and
-    [`max_duration`](../reference/dstack.yml/dev-environment.md#max_duration),
-    among [others](../reference/dstack.yml/dev-environment.md).
-
 ### Retry policy
 
 By default, if `dstack` can't find capacity or the instance is interrupted, the run will fail.
@@ -345,8 +272,108 @@ retry:
 
 </div>
 
+### Inactivity duration
+
+Set [`inactivity_duration`](../reference/dstack.yml/dev-environment.md#inactivity_duration)
+to automatically stop the dev environment after a configured period of inactivity.
+
+<div editor-title=".dstack.yml">
+
+```yaml
+type: dev-environment
+name: vscode
+ide: vscode
+
+# Stop if inactive for 2 hours
+inactivity_duration: 2h
+```
+
+</div>
+
+The dev environment becomes inactive when you close the remote VS Code window,
+close any `ssh <run name>` shells, and stop the `dstack apply` or `dstack attach` command.
+If you go offline without stopping anything manually, the dev environment will also become inactive
+within about 3 minutes.
+
+If `inactivity_duration` is configured for your dev environment, you can see how long
+it has been inactive in `dstack ps --verbose`.
+
+<div class="termy">
+
+```shell
+$ dstack ps --verbose
+ NAME    BACKEND  RESOURCES       PRICE    STATUS                 SUBMITTED
+ vscode  cudo     2xCPU, 8GB,     $0.0286  running                8 mins ago
+                  100.0GB (disk)           (inactive for 2m 34s)
+```
+
+</div>
+
+If you reattach to the dev environment using [`dstack attach`](../reference/cli/dstack/attach.md),
+the inactivity timer will be reset within a few seconds.
+
+??? info "In-place update"
+    As long as the configuration defines the `name` property, the value of `inactivity_duration`
+    can be changed for a running dev environment without a restart.
+    Just change the value in the configuration and run `dstack apply` again.
+
+    <div class="termy">
+
+    ```shell
+    $ dstack apply -f .dstack.yml
+
+    Detected configuration changes that can be updated in-place: ['inactivity_duration']
+    Update the run? [y/n]:
+    ```
+
+    </div>
+
+> `inactivity_duration` is not to be confused with [`idle_duration`](#idle-duration).
+> The latter determines how soon the underlying cloud instance will be terminated
+> _after_ the dev environment is stopped.
+
+### Utilization policy
+
+Sometimes it’s useful to track whether a dev environment is fully utilizing all GPUs. While you can check this with
+[`dstack metrics`](../reference/cli/dstack/metrics.md), `dstack` also lets you set a policy to auto-terminate the run if any GPU is underutilized.
+
+Below is an example of a dev environment that auto-terminates if any GPU stays below 10% utilization for 1 hour.
+
+<div editor-title=".dstack.yml">
+
+```yaml
+type: dev-environment
+name: my-dev
+
+python: 3.12
+ide: cursor
+
+resources:
+  gpu: H100:8
+
+utilization_policy:
+  min_gpu_utilization: 10
+  time_window: 1h
+```
+
+</div>
+
+### Spot policy
+
+By default, `dstack` uses on-demand instances. However, you can change that
+via the [`spot_policy`](../reference/dstack.yml/dev-environment.md#spot_policy) property. It accepts `spot`, `on-demand`, and `auto`.
+
 --8<-- "docs/concepts/snippets/manage-fleets.ext"
 
+!!! info "Reference"
+    Dev environments support many more configuration options,
+    incl. [`backends`](../reference/dstack.yml/dev-environment.md#backends),
+    [`regions`](../reference/dstack.yml/dev-environment.md#regions),
+    [`max_price`](../reference/dstack.yml/dev-environment.md#max_price), and
+    [`max_duration`](../reference/dstack.yml/dev-environment.md#max_duration),
+    among [others](../reference/dstack.yml/dev-environment.md).
+
+
 --8<-- "docs/concepts/snippets/manage-runs.ext"
 
 !!! info "What's next?"
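The `utilization_policy` added above terminates a run if any single GPU stays below `min_gpu_utilization` percent for the whole `time_window`. A minimal sketch of that decision rule (names and sampling model are illustrative assumptions, not dstack's actual implementation):

```python
def should_terminate(samples_per_gpu: list[list[int]],
                     min_gpu_utilization: int) -> bool:
    """Decide whether the utilization policy fires.

    samples_per_gpu[i] holds the utilization samples (in percent) collected
    for GPU i over the whole time window. The policy triggers if ANY GPU
    stayed below the threshold for EVERY sample in the window.
    """
    return any(
        all(sample < min_gpu_utilization for sample in samples)
        for samples in samples_per_gpu
    )

# GPU 0 is busy, GPU 1 idles through the entire window -> policy triggers
print(should_terminate([[90, 85, 95], [3, 2, 4]], min_gpu_utilization=10))  # True
# A single sample at or above the threshold keeps the run alive
print(should_terminate([[5, 20, 5]], min_gpu_utilization=10))  # False
```

Note the `any`/`all` nesting: a briefly idle GPU does not trigger termination; it must be underutilized for the entire window.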

docs/docs/concepts/services.md

Lines changed: 56 additions & 24 deletions
@@ -14,13 +14,13 @@ type: service
 name: llama31
 
 # If `image` is not specified, dstack uses its default image
-python: "3.11"
+python: 3.12
 env:
   - HF_TOKEN
   - MODEL_ID=meta-llama/Meta-Llama-3.1-8B-Instruct
   - MAX_MODEL_LEN=4096
 commands:
-  - pip install vllm
+  - uv pip install vllm
   - vllm serve $MODEL_ID
     --max-model-len $MAX_MODEL_LEN
     --tensor-parallel-size $DSTACK_GPUS_NUM
@@ -128,13 +128,13 @@ type: service
 # The name is optional, if not specified, generated randomly
 name: llama31-service
 
-python: "3.10"
+python: 3.12
 
 # Required environment variables
 env:
   - HF_TOKEN
 commands:
-  - pip install vllm
+  - uv pip install vllm
   - vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct --max-model-len 4096
 # Expose the port of the service
 port: 8000
@@ -184,7 +184,7 @@ name: http-server-service
 # Disable authorization
 auth: false
 
-python: "3.10"
+python: 3.12
 
 # Commands of the service
 commands:
@@ -220,7 +220,7 @@ env:
   - DASH_ROUTES_PATHNAME_PREFIX=/proxy/services/main/dash/
 
 commands:
-  - pip install dash
+  - uv pip install dash
   # Assuming the Dash app is in your repo at app.py
   - python app.py
 
@@ -303,11 +303,11 @@ type: service
 # The name is optional, if not specified, generated randomly
 name: llama31-service
 
-python: "3.10"
+python: 3.12
 
 # Commands of the service
 commands:
-  - pip install vllm
+  - uv pip install vllm
   - python -m vllm.entrypoints.openai.api_server
     --model mistralai/Mixtral-8X7B-Instruct-v0.1
     --host 0.0.0.0
@@ -384,7 +384,7 @@ type: service
 name: http-server-service
 
 # If `image` is not specified, dstack uses its base image
-python: "3.10"
+python: 3.12
 
 # Commands of the service
 commands:
@@ -407,7 +407,7 @@ port: 8000
 name: http-server-service
 
 # If `image` is not specified, dstack uses its base image
-python: "3.10"
+python: 3.12
 # Ensure nvcc is installed (req. for Flash Attention)
 nvcc: true
 
@@ -480,15 +480,15 @@ type: service
 # The name is optional, if not specified, generated randomly
 name: llama-2-7b-service
 
-python: "3.10"
+python: 3.12
 
 # Environment variables
 env:
   - HF_TOKEN
   - MODEL=NousResearch/Llama-2-7b-chat-hf
 # Commands of the service
 commands:
-  - pip install vllm
+  - uv pip install vllm
   - python -m vllm.entrypoints.openai.api_server --model $MODEL --port 8000
 # The port of the service
 port: 8000
@@ -512,18 +512,6 @@ resources:
 | `DSTACK_REPO_ID`  | The ID of the repo                  |
 | `DSTACK_GPUS_NUM` | The total number of GPUs in the run |
 
-### Spot policy
-
-By default, `dstack` uses on-demand instances. However, you can change that
-via the [`spot_policy`](../reference/dstack.yml/service.md#spot_policy) property. It accepts `spot`, `on-demand`, and `auto`.
-
-!!! info "Reference"
-    Services support many more configuration options,
-    incl. [`backends`](../reference/dstack.yml/service.md#backends),
-    [`regions`](../reference/dstack.yml/service.md#regions),
-    [`max_price`](../reference/dstack.yml/service.md#max_price), and
-    among [others](../reference/dstack.yml/service.md).
-
 ### Retry policy
 
 By default, if `dstack` can't find capacity, or the service exits with an error, or the instance is interrupted, the run will fail.
@@ -550,8 +538,52 @@ retry:
 If one replica of a multi-replica service fails with retry enabled,
 `dstack` will resubmit only the failed replica while keeping active replicas running.
 
+### Spot policy
+
+By default, `dstack` uses on-demand instances. However, you can change that
+via the [`spot_policy`](../reference/dstack.yml/service.md#spot_policy) property. It accepts `spot`, `on-demand`, and `auto`.
+
+### Utilization policy
+
+Sometimes it’s useful to track whether a service is fully utilizing all GPUs. While you can check this with
+[`dstack metrics`](../reference/cli/dstack/metrics.md), `dstack` also lets you set a policy to auto-terminate the run if any GPU is underutilized.
+
+Below is an example of a service that auto-terminates if any GPU stays below 10% utilization for 1 hour.
+
+<div editor-title=".dstack.yml">
+
+```yaml
+type: service
+name: llama-2-7b-service
+
+python: 3.12
+env:
+  - HF_TOKEN
+  - MODEL=NousResearch/Llama-2-7b-chat-hf
+commands:
+  - uv pip install vllm
+  - python -m vllm.entrypoints.openai.api_server --model $MODEL --port 8000
+port: 8000
+
+resources:
+  gpu: 24GB
+
+utilization_policy:
+  min_gpu_utilization: 10
+  time_window: 1h
+```
+
+</div>
+
 --8<-- "docs/concepts/snippets/manage-fleets.ext"
 
+!!! info "Reference"
+    Services support many more configuration options,
+    incl. [`backends`](../reference/dstack.yml/service.md#backends),
+    [`regions`](../reference/dstack.yml/service.md#regions),
+    [`max_price`](../reference/dstack.yml/service.md#max_price), and
+    among [others](../reference/dstack.yml/service.md).
+
 --8<-- "docs/concepts/snippets/manage-runs.ext"
 
 !!! info "What's next?"
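The duration strings these configurations use (`2h` for `inactivity_duration`, `1h` for `time_window`) follow a simple `<number><unit>` shape. A minimal parser under that assumption (dstack's actual accepted grammar may be wider):

```python
def parse_duration(value: str) -> int:
    """Return the number of seconds for strings like '30s', '15m', '2h', '1d'."""
    units = {"s": 1, "m": 60, "h": 3600, "d": 86400}
    number, unit = value[:-1], value[-1]
    if unit not in units or not number.isdigit():
        raise ValueError(f"unsupported duration: {value!r}")
    return int(number) * units[unit]

print(parse_duration("2h"))  # 7200
print(parse_duration("1h"))  # 3600
```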
