Commit 9565045

[Docs] Minor improvements (#2766)
- Updated PyTorch Distributed example
- Missing links in Clusters guide
- Added `Utilization policy` examples to `Dev environments`, `Tasks`, `Services`
- Added `Utilization policy` and `Retry policy` to `Protips`
- Updated examples from Python 3.10 to 3.12
1 parent 6157962 commit 9565045

File tree

14 files changed: +387 additions, -245 deletions

docs/blog/posts/efa.md

Lines changed: 1 addition & 1 deletion
@@ -117,7 +117,7 @@ name: efa-task
 # The size of the cluster
 nodes: 2
 
-python: "3.12"
+python: 3.12
 
 # Commands to run on each node
 commands:
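A likely reason earlier examples quoted the version (`python: "3.10"`) is that a bare YAML scalar like `3.10` resolves to a float and loses its trailing zero, while `3.12` happens to round-trip intact. This is an illustration of the YAML/float behavior only; how `dstack` itself coerces the value is an assumption.

```python
def yaml_float_roundtrip(version: float) -> str:
    """Mimic a YAML loader reading a bare scalar as a float,
    then turning it back into a string."""
    return str(version)

# The trailing zero of 3.10 is lost, so "3.10" must be quoted in YAML...
print(yaml_float_roundtrip(3.10))  # -> 3.1
# ...whereas 3.12 survives unquoted.
print(yaml_float_roundtrip(3.12))  # -> 3.12
```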

docs/docs/concepts/dev-environments.md

Lines changed: 100 additions & 73 deletions
@@ -99,66 +99,6 @@ init:
 
 </div>
 
-### Inactivity duration
-
-Set [`inactivity_duration`](../reference/dstack.yml/dev-environment.md#inactivity_duration)
-to automatically stop the dev environment after a configured period of inactivity.
-
-<div editor-title=".dstack.yml">
-
-```yaml
-type: dev-environment
-name: vscode
-ide: vscode
-
-# Stop if inactive for 2 hours
-inactivity_duration: 2h
-```
-
-</div>
-
-The dev environment becomes inactive when you close the remote VS Code window,
-close any `ssh <run name>` shells, and stop the `dstack apply` or `dstack attach` command.
-If you go offline without stopping anything manually, the dev environment will also become inactive
-within about 3 minutes.
-
-If `inactivity_duration` is configured for your dev environment, you can see how long
-it has been inactive in `dstack ps --verbose`.
-
-<div class="termy">
-
-```shell
-$ dstack ps --verbose
- NAME    BACKEND  RESOURCES       PRICE    STATUS                 SUBMITTED
- vscode  cudo     2xCPU, 8GB,     $0.0286  running                8 mins ago
-                  100.0GB (disk)           (inactive for 2m 34s)
-```
-
-</div>
-
-If you reattach to the dev environment using [`dstack attach`](../reference/cli/dstack/attach.md),
-the inactivity timer will be reset within a few seconds.
-
-??? info "In-place update"
-    As long as the configuration defines the `name` property, the value of `inactivity_duration`
-    can be changed for a running dev environment without a restart.
-    Just change the value in the configuration and run `dstack apply` again.
-
-    <div class="termy">
-
-    ```shell
-    $ dstack apply -f .dstack.yml
-
-    Detected configuration changes that can be updated in-place: ['inactivity_duration']
-    Update the run? [y/n]:
-    ```
-
-    </div>
-
-> `inactivity_duration` is not to be confused with [`idle_duration`](#idle-duration).
-> The latter determines how soon the underlying cloud instance will be terminated
-> _after_ the dev environment is stopped.
-
 ### Resources
 
 When you specify a resource value like `cpu` or `memory`,
@@ -307,19 +247,6 @@ If you don't assign a value to an environment variable (see `HF_TOKEN` above),
 | `DSTACK_REPO_ID`  | The ID of the repo                  |
 | `DSTACK_GPUS_NUM` | The total number of GPUs in the run |
 
-### Spot policy
-
-By default, `dstack` uses on-demand instances. However, you can change that
-via the [`spot_policy`](../reference/dstack.yml/dev-environment.md#spot_policy) property. It accepts `spot`, `on-demand`, and `auto`.
-
-!!! info "Reference"
-    Dev environments support many more configuration options,
-    incl. [`backends`](../reference/dstack.yml/dev-environment.md#backends),
-    [`regions`](../reference/dstack.yml/dev-environment.md#regions),
-    [`max_price`](../reference/dstack.yml/dev-environment.md#max_price), and
-    [`max_duration`](../reference/dstack.yml/dev-environment.md#max_duration),
-    among [others](../reference/dstack.yml/dev-environment.md).
-
 ### Retry policy
 
 By default, if `dstack` can't find capacity or the instance is interrupted, the run will fail.
@@ -345,8 +272,108 @@ retry:
 
 </div>
 
+### Inactivity duration
+
+Set [`inactivity_duration`](../reference/dstack.yml/dev-environment.md#inactivity_duration)
+to automatically stop the dev environment after a configured period of inactivity.
+
+<div editor-title=".dstack.yml">
+
+```yaml
+type: dev-environment
+name: vscode
+ide: vscode
+
+# Stop if inactive for 2 hours
+inactivity_duration: 2h
+```
+
+</div>
+
+The dev environment becomes inactive when you close the remote VS Code window,
+close any `ssh <run name>` shells, and stop the `dstack apply` or `dstack attach` command.
+If you go offline without stopping anything manually, the dev environment will also become inactive
+within about 3 minutes.
+
+If `inactivity_duration` is configured for your dev environment, you can see how long
+it has been inactive in `dstack ps --verbose`.
+
+<div class="termy">
+
+```shell
+$ dstack ps --verbose
+ NAME    BACKEND  RESOURCES       PRICE    STATUS                 SUBMITTED
+ vscode  cudo     2xCPU, 8GB,     $0.0286  running                8 mins ago
+                  100.0GB (disk)           (inactive for 2m 34s)
+```
+
+</div>
+
+If you reattach to the dev environment using [`dstack attach`](../reference/cli/dstack/attach.md),
+the inactivity timer will be reset within a few seconds.
+
+??? info "In-place update"
+    As long as the configuration defines the `name` property, the value of `inactivity_duration`
+    can be changed for a running dev environment without a restart.
+    Just change the value in the configuration and run `dstack apply` again.
+
+    <div class="termy">
+
+    ```shell
+    $ dstack apply -f .dstack.yml
+
+    Detected configuration changes that can be updated in-place: ['inactivity_duration']
+    Update the run? [y/n]:
+    ```
+
+    </div>
+
+> `inactivity_duration` is not to be confused with [`idle_duration`](#idle-duration).
+> The latter determines how soon the underlying cloud instance will be terminated
+> _after_ the dev environment is stopped.
+
+### Utilization policy
+
+Sometimes it’s useful to track whether a dev environment is fully utilizing all GPUs. While you can check this with
+[`dstack metrics`](../reference/cli/dstack/metrics.md), `dstack` also lets you set a policy to auto-terminate the run if any GPU is underutilized.
+
+Below is an example of a dev environment that auto-terminates if any GPU stays below 10% utilization for 1 hour.
+
+<div editor-title=".dstack.yml">
+
+```yaml
+type: dev-environment
+name: my-dev
+
+python: 3.12
+ide: cursor
+
+resources:
+  gpu: H100:8
+
+utilization_policy:
+  min_gpu_utilization: 10
+  time_window: 1h
+```
+
+</div>
+
+### Spot policy
+
+By default, `dstack` uses on-demand instances. However, you can change that
+via the [`spot_policy`](../reference/dstack.yml/dev-environment.md#spot_policy) property. It accepts `spot`, `on-demand`, and `auto`.
+
 --8<-- "docs/concepts/snippets/manage-fleets.ext"
 
+!!! info "Reference"
+    Dev environments support many more configuration options,
+    incl. [`backends`](../reference/dstack.yml/dev-environment.md#backends),
+    [`regions`](../reference/dstack.yml/dev-environment.md#regions),
+    [`max_price`](../reference/dstack.yml/dev-environment.md#max_price), and
+    [`max_duration`](../reference/dstack.yml/dev-environment.md#max_duration),
+    among [others](../reference/dstack.yml/dev-environment.md).
+
+
 --8<-- "docs/concepts/snippets/manage-runs.ext"
 
 !!! info "What's next?"
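The `utilization_policy` added above terminates a run if any single GPU stays below `min_gpu_utilization` percent for the whole `time_window`. A minimal sketch of that decision rule (names and sampling model are illustrative assumptions, not dstack's actual implementation):

```python
def should_terminate(samples_per_gpu: list[list[int]],
                     min_gpu_utilization: int) -> bool:
    """Decide whether the utilization policy fires.

    samples_per_gpu[i] holds the utilization samples (in percent) collected
    for GPU i over the whole time window. The policy triggers if ANY GPU
    stayed below the threshold for EVERY sample in the window.
    """
    return any(
        all(sample < min_gpu_utilization for sample in samples)
        for samples in samples_per_gpu
    )

# GPU 0 is busy, GPU 1 idles through the entire window -> policy triggers
print(should_terminate([[90, 85, 95], [3, 2, 4]], min_gpu_utilization=10))  # True
# A single sample at or above the threshold keeps the run alive
print(should_terminate([[5, 20, 5]], min_gpu_utilization=10))  # False
```

Note the `any`/`all` nesting: a briefly idle GPU does not trigger termination; it must be underutilized for the entire window.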

docs/docs/concepts/services.md

Lines changed: 56 additions & 24 deletions
@@ -14,13 +14,13 @@ type: service
 name: llama31
 
 # If `image` is not specified, dstack uses its default image
-python: "3.11"
+python: 3.12
 env:
   - HF_TOKEN
   - MODEL_ID=meta-llama/Meta-Llama-3.1-8B-Instruct
   - MAX_MODEL_LEN=4096
 commands:
-  - pip install vllm
+  - uv pip install vllm
   - vllm serve $MODEL_ID
     --max-model-len $MAX_MODEL_LEN
     --tensor-parallel-size $DSTACK_GPUS_NUM
@@ -128,13 +128,13 @@ type: service
 # The name is optional, if not specified, generated randomly
 name: llama31-service
 
-python: "3.10"
+python: 3.12
 
 # Required environment variables
 env:
   - HF_TOKEN
 commands:
-  - pip install vllm
+  - uv pip install vllm
   - vllm serve meta-llama/Meta-Llama-3.1-8B-Instruct --max-model-len 4096
 # Expose the port of the service
 port: 8000
@@ -184,7 +184,7 @@ name: http-server-service
 # Disable authorization
 auth: false
 
-python: "3.10"
+python: 3.12
 
 # Commands of the service
 commands:
@@ -220,7 +220,7 @@ env:
   - DASH_ROUTES_PATHNAME_PREFIX=/proxy/services/main/dash/
 
 commands:
-  - pip install dash
+  - uv pip install dash
   # Assuming the Dash app is in your repo at app.py
   - python app.py
 
@@ -303,11 +303,11 @@ type: service
 # The name is optional, if not specified, generated randomly
 name: llama31-service
 
-python: "3.10"
+python: 3.12
 
 # Commands of the service
 commands:
-  - pip install vllm
+  - uv pip install vllm
   - python -m vllm.entrypoints.openai.api_server
     --model mistralai/Mixtral-8X7B-Instruct-v0.1
     --host 0.0.0.0
@@ -384,7 +384,7 @@ type: service
 name: http-server-service
 
 # If `image` is not specified, dstack uses its base image
-python: "3.10"
+python: 3.12
 
 # Commands of the service
 commands:
@@ -407,7 +407,7 @@ port: 8000
 name: http-server-service
 
 # If `image` is not specified, dstack uses its base image
-python: "3.10"
+python: 3.12
 # Ensure nvcc is installed (req. for Flash Attention)
 nvcc: true
 
@@ -480,15 +480,15 @@ type: service
 # The name is optional, if not specified, generated randomly
 name: llama-2-7b-service
 
-python: "3.10"
+python: 3.12
 
 # Environment variables
 env:
   - HF_TOKEN
   - MODEL=NousResearch/Llama-2-7b-chat-hf
 # Commands of the service
 commands:
-  - pip install vllm
+  - uv pip install vllm
   - python -m vllm.entrypoints.openai.api_server --model $MODEL --port 8000
 # The port of the service
 port: 8000
@@ -512,18 +512,6 @@ resources:
 | `DSTACK_REPO_ID`  | The ID of the repo                  |
 | `DSTACK_GPUS_NUM` | The total number of GPUs in the run |
 
-### Spot policy
-
-By default, `dstack` uses on-demand instances. However, you can change that
-via the [`spot_policy`](../reference/dstack.yml/service.md#spot_policy) property. It accepts `spot`, `on-demand`, and `auto`.
-
-!!! info "Reference"
-    Services support many more configuration options,
-    incl. [`backends`](../reference/dstack.yml/service.md#backends),
-    [`regions`](../reference/dstack.yml/service.md#regions),
-    [`max_price`](../reference/dstack.yml/service.md#max_price), and
-    among [others](../reference/dstack.yml/service.md).
-
 ### Retry policy
 
 By default, if `dstack` can't find capacity, or the service exits with an error, or the instance is interrupted, the run will fail.
@@ -550,8 +538,52 @@ retry:
 If one replica of a multi-replica service fails with retry enabled,
 `dstack` will resubmit only the failed replica while keeping active replicas running.
 
+### Spot policy
+
+By default, `dstack` uses on-demand instances. However, you can change that
+via the [`spot_policy`](../reference/dstack.yml/service.md#spot_policy) property. It accepts `spot`, `on-demand`, and `auto`.
+
+### Utilization policy
+
+Sometimes it’s useful to track whether a service is fully utilizing all GPUs. While you can check this with
+[`dstack metrics`](../reference/cli/dstack/metrics.md), `dstack` also lets you set a policy to auto-terminate the run if any GPU is underutilized.
+
+Below is an example of a service that auto-terminates if any GPU stays below 10% utilization for 1 hour.
+
+<div editor-title=".dstack.yml">
+
+```yaml
+type: service
+name: llama-2-7b-service
+
+python: 3.12
+env:
+  - HF_TOKEN
+  - MODEL=NousResearch/Llama-2-7b-chat-hf
+commands:
+  - uv pip install vllm
+  - python -m vllm.entrypoints.openai.api_server --model $MODEL --port 8000
+port: 8000
+
+resources:
+  gpu: 24GB
+
+utilization_policy:
+  min_gpu_utilization: 10
+  time_window: 1h
+```
+
+</div>
+
 --8<-- "docs/concepts/snippets/manage-fleets.ext"
 
+!!! info "Reference"
+    Services support many more configuration options,
+    incl. [`backends`](../reference/dstack.yml/service.md#backends),
+    [`regions`](../reference/dstack.yml/service.md#regions),
+    [`max_price`](../reference/dstack.yml/service.md#max_price), and
+    among [others](../reference/dstack.yml/service.md).
+
 --8<-- "docs/concepts/snippets/manage-runs.ext"
 
 !!! info "What's next?"
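The duration strings these configurations use (`2h` for `inactivity_duration`, `1h` for `time_window`) follow a simple `<number><unit>` shape. A minimal parser under that assumption (dstack's actual accepted grammar may be wider):

```python
def parse_duration(value: str) -> int:
    """Return the number of seconds for strings like '30s', '15m', '2h', '1d'."""
    units = {"s": 1, "m": 60, "h": 3600, "d": 86400}
    number, unit = value[:-1], value[-1]
    if unit not in units or not number.isdigit():
        raise ValueError(f"unsupported duration: {value!r}")
    return int(number) * units[unit]

print(parse_duration("2h"))  # 7200
print(parse_duration("1h"))  # 3600
```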
