Releases · dstackai/dstack

15 May 11:45

jvstme

0.20.20

90c00cf

0.20.20 Latest


Services

NVIDIA Dynamo

This update adds support for Prefill-Decode (PD) disaggregated inference with NVIDIA Dynamo.

Previously, dstack supported PD disaggregation only with Shepherd Model Gateway as the router and SGLang as the inference engine for workers. With this update, a replica group can declare router: { type: dynamo }, allowing workers to use inference engines such as SGLang, vLLM, or TensorRT-LLM.

type: service
name: dynamo-pd

env:
  - HF_TOKEN
  - MODEL_ID=zai-org/GLM-4.5-Air-FP8

replicas:
  - count: 1
    docker: true
    commands:
      - apt-get update
      - apt-get install -y python3-dev python3-venv
      - python3 -m venv ~/dyn-venv
      - source ~/dyn-venv/bin/activate
      - pip install -U pip
      - pip install "ai-dynamo[sglang]==1.1.1"
      - git clone https://github.com/ai-dynamo/dynamo.git
      # Brings up the NATS / etcd compose stack and runs the Dynamo HTTP frontend.
      - docker compose -f dynamo/deploy/docker-compose.yml up -d
      - |
        python3 -m dynamo.frontend \
          --http-host 0.0.0.0 --http-port 8000 \
          --discovery-backend etcd --router-mode kv \
          --kv-cache-block-size 64
    resources:
      cpu: 4
    router:
      type: dynamo

  - count: 1..4
    scaling:
      metric: rps
      target: 3
    python: "3.12"
    nvcc: true
    commands:
      # dstack injects DSTACK_ROUTER_INTERNAL_IP after the router replica
      # is provisioned. Compose the etcd/NATS endpoints from it.
      - export ETCD_ENDPOINTS="http://$DSTACK_ROUTER_INTERNAL_IP:2379"
      - export NATS_SERVER="nats://$DSTACK_ROUTER_INTERNAL_IP:4222"
      # Set to enable /health endpoint required by dstack probes.
      - export DYN_SYSTEM_PORT="8000"
      # Wait until the router's etcd and NATS ports are actually accepting connections.
      - |
        until (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/2379) 2>/dev/null \
           && (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/4222) 2>/dev/null; do
          echo "waiting for etcd/NATS on $DSTACK_ROUTER_INTERNAL_IP..."; sleep 3
        done
      - pip install "ai-dynamo[sglang]==1.1.1"
      - |
        python3 -m dynamo.sglang \
          --model-path $MODEL_ID --served-model-name $MODEL_ID \
          --discovery-backend etcd --host 0.0.0.0 \
          --page-size 64 \
          --disaggregation-mode prefill --disaggregation-transfer-backend nixl
    resources:
      gpu: H200

  - count: 1..8
    scaling:
      metric: rps
      target: 2
    python: "3.12"
    nvcc: true
    commands:
      - export ETCD_ENDPOINTS="http://$DSTACK_ROUTER_INTERNAL_IP:2379"
      - export NATS_SERVER="nats://$DSTACK_ROUTER_INTERNAL_IP:4222"
      - export DYN_SYSTEM_PORT="8000"
      - |
        until (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/2379) 2>/dev/null \
           && (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/4222) 2>/dev/null; do
          echo "waiting for etcd/NATS on $DSTACK_ROUTER_INTERNAL_IP..."; sleep 3
        done
      - pip install "ai-dynamo[sglang]==1.1.1"
      - |
        python3 -m dynamo.sglang \
          --model-path $MODEL_ID --served-model-name $MODEL_ID \
          --discovery-backend etcd --host 0.0.0.0 \
          --page-size 64 \
          --disaggregation-mode decode --disaggregation-transfer-backend nixl
    resources:
      gpu: H200

port: 8000
model: zai-org/GLM-4.5-Air-FP8

# Custom probe is required for PD disaggregation.
probes:
  - type: http
    url: /health
    interval: 15s

dstack provisions the router replica, injects DSTACK_ROUTER_INTERNAL_IP into non-router replicas, and lets Dynamo workers connect directly to the router’s etcd and NATS services.
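The readiness loop in the worker commands above can be sketched in Python as follows. This is a minimal illustration, not part of dstack or Dynamo: the helper names `port_is_open` and `wait_for_ports` are hypothetical, and only the ports (2379 for etcd, 4222 for NATS) and the 3-second retry interval come from the configuration above.

```python
import socket
import time


def port_is_open(host, port, connect_timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=connect_timeout):
            return True
    except OSError:
        return False


def wait_for_ports(host, ports, interval=3.0, timeout=300.0):
    """Poll until every port accepts connections; raise TimeoutError otherwise."""
    deadline = time.monotonic() + timeout
    while not all(port_is_open(host, port) for port in ports):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{host} ports {list(ports)} not ready in {timeout}s")
        time.sleep(interval)


# Hypothetical usage inside a worker replica:
#   import os
#   wait_for_ports(os.environ["DSTACK_ROUTER_INTERNAL_IP"], [2379, 4222])
```

Unlike the shell loop, this sketch gives up after a timeout (an added assumption), so a worker that cannot reach the router fails with an error instead of waiting indefinitely.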

Refer to the Dynamo example for full deployment instructions.

Replica groups

It's now possible to configure the image, docker, python, nvcc, and privileged properties at the replica group level. This enables complex multi-component services like NVIDIA Dynamo, where different replicas require different runtime environments.

Exports

Gateways

Gateways can now be exported and shared across projects, enabling centralized gateway management in multi-project setups.

$ dstack export --project main create my-export --gateway shared-gateway --importer team
 NAME       FLEETS  GATEWAYS        IMPORTERS 
 my-export  -       shared-gateway  team

Now, if you list gateways in the team project, you'll see the exported gateway:

$ dstack gateway --project team
 NAME                 BACKEND          HOSTNAME        DOMAIN                 DEFAULT  STATUS  
 main/shared-gateway  aws (eu-west-1)  108.131.126.35  gtw.mycompany.example           running

Additionally, gateway domains now support optional project name interpolation using ${{ run.project_name }}, allowing different projects to use different domains on the same shared gateway.

type: gateway
name: shared-gateway

backend: aws
region: eu-west-1

domain: ${{ run.project_name }}.mycompany.example
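To illustrate what the interpolation produces, here is a minimal Python sketch. It is not dstack's implementation: the function name `render_gateway_domain` and the tolerance for extra whitespace inside the braces are assumptions made for the example.

```python
import re

# Matches the placeholder shown in the release notes: ${{ run.project_name }}
_PROJECT_PLACEHOLDER = re.compile(r"\$\{\{\s*run\.project_name\s*\}\}")


def render_gateway_domain(template, project_name):
    """Substitute the project name into a gateway domain template."""
    # A function is used as the replacement so the name is inserted literally.
    return _PROJECT_PLACEHOLDER.sub(lambda _match: project_name, template)


print(render_gateway_domain("${{ run.project_name }}.mycompany.example", "team"))
```

With the gateway configuration above, services in the team project would get subdomains of team.mycompany.example, while services in main would use main.mycompany.example.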

Global exports

Users with global admin privileges can now export SSH fleets and gateways to all projects at once, enabling organization-wide resource sharing.

$ dstack export create global-export --gateway shared-gateway --global
 NAME           FLEETS  GATEWAYS        IMPORTERS
 global-export  -       shared-gateway  *

AWS

EFA clusters

Previously, fleets that used EFA (Elastic Fabric Adapter) with multiple network interfaces required public_ips: False. With this release, dstack allows creating such fleets with public IPs. This simplifies the use of interconnected clusters on AWS by removing the need to run the dstack server and CLI inside a private VPC.

Kubernetes

Permissions

dstack now requires the watch permission for pods within the namespace. See Required permissions for up-to-date ClusterRole and Role manifests.

Backend configuration

The namespace property of the kubernetes backend configuration is now formally deprecated. It still takes effect and remains the source of truth in this version, but future versions will read the namespace from the current kubeconfig context instead.

Migration guide

If namespace is unset or set to default in both the backend config and the kubeconfig, no action is required — default continues to be used.
If namespace is set to the same value (e.g. ns-a) in both the backend config and the kubeconfig, no action is required.
If namespace is set to ns-a in the backend config but the kubeconfig has a different value (or none), set the namespace to ns-a in your kubeconfig context to prepare for future versions.
It is only safe to remove namespace from the backend config if its value is default.
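The rules above can be summarized as a small decision function. This is only a sketch of the guide's logic, with hypothetical function names; the case where the backend config is unset or default while the kubeconfig names another namespace is not spelled out in the guide and is handled here by extrapolation.

```python
from typing import Optional

DEFAULT_NAMESPACE = "default"


def effective(namespace: Optional[str]) -> str:
    """Treat an unset namespace as "default"."""
    return namespace or DEFAULT_NAMESPACE


def migration_action(backend_ns: Optional[str], kubeconfig_ns: Optional[str]) -> str:
    """Return what to do given the backend config and kubeconfig namespaces."""
    if effective(backend_ns) == effective(kubeconfig_ns):
        return "no action required"
    # The backend config is the source of truth in this version, so align the
    # kubeconfig context with it. (Extrapolated for a default backend value.)
    return f"set the kubeconfig context namespace to {effective(backend_ns)}"


def safe_to_remove_backend_namespace(backend_ns: Optional[str]) -> bool:
    """Removing the property is only safe when its value is default."""
    return effective(backend_ns) == DEFAULT_NAMESPACE


print(migration_action("ns-a", None))
print(migration_action("ns-a", "ns-a"))
```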

What's changed

[Services] Allow to specify image, docker, python, nvcc, privileged at replica group level by @Bihan in #3832
[Internal]: Delete some unused classes by @jvstme in #3842
[Internal] Fix pyright failing in CI by @jvstme in #3846
[Internal] Update RunpodApiClient by @un-def in #3847
[Internal] Fix openai SDK failing in tests by @jvstme in #3849
[RunPod] Handle deleting non-existent volume by @r4victor in #3853
[Runpod] Fix broken registry_auth support by @un-def in #3844
[UX] Raise ImportError on Python 3.14 or later by @r4victor in #3855
[Exports] Gateway support by @jvstme in #3845
[Internal] Rename docs/ to mkdocs/, move examples under /docs/, inline source by @peterschmidt85 in #3859
[Kubernetes] Deprecate namespace in backend config by @un-def in #3858
[Gateways] Allow setting imported gateway as project default by @jvstme in #3860
[Internal] Forbid exporting the built-in dstack Sky gateway by @jvstme in #3864
[AWS] Support multi-EFA instances with public IPs by @r4victor in #3865
[Internal] Add server-side validation for fleet configuration subtypes by @un-def in #3848
[Verda] Optimize terminating Verda instances by @jvstme in #3811
[Internal] Introduce GatewayModel.forbid_new_services by @jvstme in #3863
[Docs] Introduce CLI & API guide; rework the HTTP API reference page by @peterschmidt85 in #3869
[Internal] Add script to set up Kubernetes cluster for dstack backend by @un-def in #3866
Fix Pyright errors with requests==2.34.0 by @jvstme in #3873
Add project name interpolation in gateway domains by @jvstme in #3870
[Bugfix] Fix duplicate headers with in-server proxy by @jvstme in #3872
[Docs]: Gateway Exports by @jvstme in #3862
[Kubernetes] Fai...

Contributors

un-def, Bihan, and 3 other contributors


30 Apr 11:01

r4victor

0.20.19

9e23658

0.20.19

Services

RPS window for autoscaling

Services now support a window property in the scaling spec that defines the time window used to calculate RPS. Allowed values are 30s, 1m, and 5m (default is 1m). Previously, the RPS was always calculated using a 1m window.

type: service
image: nginx
port: 80

replicas: 0..1
scaling:
  metric: rps
  # 1 request per second, calculated over a 5-minute window
  target: 1
  window: 5m
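To see why the window matters, the following Python sketch computes an average RPS over each allowed window for the same burst of traffic. It is an illustrative model only: the function `average_rps` and the timestamp-based calculation are assumptions, not dstack's actual metric collection.

```python
# Allowed window values from the release notes, in seconds.
WINDOW_SECONDS = {"30s": 30, "1m": 60, "5m": 300}


def average_rps(request_times, now, window="1m"):
    """Average requests per second over the trailing window ending at `now`."""
    seconds = WINDOW_SECONDS[window]
    recent = sum(1 for t in request_times if now - seconds < t <= now)
    return recent / seconds


# A burst of 60 requests spread over the last 30 seconds.
now = 100.0
burst = [now - 0.5 * i for i in range(60)]
for window in ("30s", "1m", "5m"):
    print(window, average_rps(burst, now, window))
```

The same burst reads as 2.0 RPS over 30s, 1.0 over 1m, and 0.2 over 5m, so a shorter window reacts to spikes faster while a longer one smooths them out.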

Kubernetes

registry_auth

The kubernetes backend now supports the registry_auth property for pulling Docker images from private registries:

type: service
image: nvcr.io/nim/deepseek-ai/deepseek-r1-distill-llama-8b
registry_auth:
  username: $oauthtoken
  password: ${{ secrets.ngc_api_key }}

dstack automatically creates and sets up imagePullSecrets for the pods. This requires new permissions for the Kubernetes role:

rules:
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["create", "delete"]
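For context, an image pull secret of type kubernetes.io/dockerconfigjson carries a JSON document keyed by registry host. The Python sketch below builds such a payload from registry_auth-style credentials. How dstack constructs its secrets internally is not shown in these notes, so treat the function name and the registry host as illustrative assumptions.

```python
import base64
import json


def build_dockerconfigjson(registry, username, password):
    """Build the .dockerconfigjson payload for a single registry."""
    # The "auth" field is base64("username:password").
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    config = {
        "auths": {
            registry: {"username": username, "password": password, "auth": auth}
        }
    }
    return json.dumps(config)


# Hypothetical credentials; a real password would come from a dstack secret.
payload = build_dockerconfigjson("nvcr.io", "$oauthtoken", "example-api-key")
print(payload)
```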

Read-only volumes

Kubernetes volume configurations now support a new read_only property. When set to true, it enforces readOnly: true in the pod's volumeMounts.

type: volume
backend: kubernetes
name: my-volume
size: 100GB
read_only: true

Server

Faster processing

The server has been optimized to reduce processing latencies. As a result, many operations now take less time: run provisioning is up to 14s faster and run termination is up to 7s faster.

Examples

Documentation and examples have been refreshed, including new Qwen3.6-27B and DeepSeek V4 examples. A new prefill-decode blog post shows how to run SGLang PD disaggregation via Shepherd Model Gateway.

Breaking changes

Python 3.9 support dropped

Running dstack on Python 3.9 is no longer supported, as Python 3.9 reached end-of-life on 2025-10-31. Please upgrade to Python 3.10 or later.

What's Changed

Refresh quickstart and service docs with Qwen3.6-27B by @peterschmidt85 in #3819
Disallow running dstack on Python 3.9 by @jvstme in #3817
Create placeholder instance models by @r4victor in #3821
Add DeepSeek V4 model docs by @peterschmidt85 in #3823
Reduce pipelines processing latencies by @r4victor in #3828
[Docs]: Update scale_up/down_delay descriptions by @jvstme in #3831
Clean up exports on project and fleet deletion by @jvstme in #3827
[shim,runner] Improve logging options by @un-def in #3822
Allow configuring RPS window for service scaling by @jvstme in #3830
Replace sglang_router with smg in PD examples by @Bihan in #3836
Interpolate JobSpec secrets for Compute.run_job() by @un-def in #3834
Kubernetes: configure imagePullSecrets by @un-def in #3835
Kubernetes: add read_only volume property by @un-def in #3838

Full Changelog: 0.20.18...0.20.19

Contributors

un-def, Bihan, and 3 other contributors


23 Apr 14:54

un-def

0.20.18

ad4b638

0.20.18

CLI

For VM-based backends as well as SSH fleets, the CLI now shows Docker image pull progress in the format <extracted>/<downloaded>/<total>.

Offers

This update reduces the time required to fetch backend offers and initialize backends, making both dstack offer and dstack apply faster:

- runpod — 0.66s => 0.03s (22x)
- amddevcloud — 2.26s => 0.85s (2.7x)
- cudo — 2.48s => 1.02s (2.4x)
- verda — 3.27s => 1.74s (1.9x)
- lambda — 3.24s => 1.89s (1.7x)
- vastai — 3.27s => 1.77s (1.8x)
- gcp — 3.74s => 2.54s (1.5x)
- azure — 5.83s => 3.11s (1.9x)
- aws — 6.58s => 3.56s (1.8x)

Secrets

The Manager project role can now manage secrets if the allow_managers_manage_secrets property is enabled in the server’s default_permissions config:

default_permissions:
  allow_managers_manage_secrets: true

Previously, only the Admin role was allowed to manage secrets.

GPUs

This update adds support for GeForce RTX 2, 3, 4, and 5 series GPUs, which were previously not detected properly across both backend and SSH fleets.

GCP

The gcp backend now requires the compute.projects.get permission. Make sure this permission is granted to any custom IAM roles used by dstack.

What's changed

Optimize GCP offers by @r4victor in #3793
Optimize InstanceOffer construction by @r4victor in #3794
Speed up GCP validate_credentials by @r4victor in #3795
Support secrets management by Manager role by @r4victor in #3801
Fix update_default_project() crash on server without TTY by @un-def in #3797
Kubernetes: fix is_hard_taint check by @un-def in #3803
Fix deleting idle instance from fleet with runs by @jvstme in #3807
[Docs] Update examples by @peterschmidt85 in #3798
Display image pull progress in CLI by @jvstme in #3805
[Docs] Add an inline kubeconfig example to the kubernetes backend documentation by @peterschmidt85 in #3813
Avoid Verda instance termination warnings by @jvstme in #3810
[Internal] Improve warning message in ServerConfigManager.apply_config() by @un-def in #3804
Add missing join to volumes query in JobSubmittedWorker by @un-def in #3816
Add CLI deprecation warnings about gateway routers by @jvstme in #3814
Bump gpuhunt, add support for all GeForce RTX 2..5 series by @un-def in #3818
Add missing compute.projects.get GCP permission by @un-def in #3820

Full changelog: 0.20.17...0.20.18

Contributors

un-def, r4victor, and 2 other contributors


16 Apr 12:45

peterschmidt85

0.20.17

6216394

0.20.17

PD disaggregation

This update simplifies running SGLang with Prefill-Decode disaggregation.

Previously, PD disaggregation required configuring router on the gateway, which meant
the gateway had to run in the same cluster as the service to communicate with service
replicas.

With this update, router is configured on a service replica group instead. This allows
using a standard gateway outside the service cluster.

Below is an example service configuration for running zai-org/GLM-4.5-Air-FP8 using replica groups:

type: service
name: prefill-decode
image: lmsysorg/sglang:latest

env:
  - HF_TOKEN
  - MODEL_ID=zai-org/GLM-4.5-Air-FP8

replicas:
  - count: 1
    commands:
      - pip install sglang_router
      - |
        python -m sglang_router.launch_router \
          --host 0.0.0.0 \
          --port 8000 \
          --pd-disaggregation \
          --prefill-policy cache_aware
    router:
      type: sglang
    resources:
      cpu: 4

  - count: 1..4
    scaling:
      metric: rps
      target: 3
    commands:
      - |
        python -m sglang.launch_server \
          --model-path $MODEL_ID \
          --disaggregation-mode prefill \
          --disaggregation-transfer-backend nixl \
          --host 0.0.0.0 \
          --port 8000 \
          --disaggregation-bootstrap-port 8998
    resources:
      gpu: H200

  - count: 1..8
    scaling:
      metric: rps
      target: 2
    commands:
      - |
        python -m sglang.launch_server \
          --model-path $MODEL_ID \
          --disaggregation-mode decode \
          --disaggregation-transfer-backend nixl \
          --host 0.0.0.0 \
          --port 8000
    resources:
      gpu: H200

port: 8000
model: zai-org/GLM-4.5-Air-FP8

# Custom probe is required for PD disaggregation.
probes:
  - type: http
    url: /health
    interval: 15s

Note: this setup requires the service fleet or cluster to provide a CPU node for the
router replica.

Kubernetes

The kubernetes backend adds support for both network and instance volumes.

Network volumes

You can either create a new network volume or register an existing one. To create a new
network volume, specify size and optionally storage_class_name and/or
access_modes:

type: volume
backend: kubernetes
name: my-volume

size: 100GB

This automatically creates a PersistentVolumeClaim and associates it with the volume.

If you don't specify storage_class_name, the decision is delegated to the
DefaultStorageClass admission controller, if enabled.

If you don't specify access_modes, it defaults to [ReadWriteOnce]. To attach
volumes to multiple runs at the same time, set it to [ReadWriteMany] or
[ReadWriteMany, ReadOnlyMany].

To reuse an existing PersistentVolumeClaim, specify its name in claim_name:

type: volume
backend: kubernetes
name: my-volume

claim_name: existing-pvc

Once a volume configuration is applied, you can attach it to your runs via volumes:

type: dev-environment
name: vscode-vol

ide: vscode

volumes:
  - name: my-volume
    path: /volume_data

Instance volumes

In addition to network volumes, the kubernetes backend now supports instance volumes:

type: dev-environment
name: vscode-vol

ide: vscode

volumes:
  - instance_path: /mnt/volume
    path: /volume_data

Unlike network volumes, which persist across instances, instance volumes persist data
only within a particular instance. They are useful for storing caches or when you
manually mount a shared filesystem into the instance path.

Note: using volumes with the kubernetes backend requires the corresponding
permissions.

Performance

Fetching backend offers for the first time has been optimized and is now much faster. As
a result, dstack apply, dstack offer, and the offers UI are all more responsive.
Here are the improvements for some of the major backends:

- aws — 41.43s => 6.61s (6.3x)
- azure — 12.49s => 5.50s (2.3x)
- gcp — 13.51s => 5.20s (2.6x)
- nebius — 10.74s => 3.80s (2.8x)
- runpod — 9.36s => 0.09s (104x)
- verda — 9.49s => 2.33s (4.1x)

Fleets

In-place update

Backend fleets now have initial support for in-place updates. You can update nodes,
reservation, tags, resources, backends, regions, availability_zones,
instance_types, spot_policy, and max_price without re-creating the entire fleet.
If existing idle instances do not match the updated configuration, dstack replaces
them.

Default resources

Fleets used to have default resources set to cpu=2.. mem=8GB.. disk=100GB.. when
left unspecified. This meant any offers with fewer resources were excluded from such
fleets. If you wanted to run on a mem=4GB VM, you had to specify resources in both
the run and fleet configurations.

Now fleets have no default resources, so all offers are available by default. If you
need to add extra constraints on which offers can be provisioned in a fleet, specify
resources explicitly.

Run configurations continue to have default minimum resources set to
cpu=2.. mem=8GB.. disk=100GB.. to avoid provisioning instances that are too small.
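The effect of dropping the fleet defaults can be modeled with a small Python sketch. This is a simplified model under stated assumptions, not dstack's matching code: the `Offer` and `MinResources` types and the example offer are hypothetical, and only lower bounds are considered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MinResources:
    cpu: int
    mem_gb: int
    disk_gb: int


@dataclass(frozen=True)
class Offer:
    name: str
    cpu: int
    mem_gb: int
    disk_gb: int


# Defaults quoted in the release notes: cpu=2.. mem=8GB.. disk=100GB..
DEFAULT_MIN = MinResources(cpu=2, mem_gb=8, disk_gb=100)


def satisfies(offer: Offer, minimum: Optional[MinResources]) -> bool:
    """True if the offer meets every lower bound; None means no constraint."""
    if minimum is None:
        return True
    return (
        offer.cpu >= minimum.cpu
        and offer.mem_gb >= minimum.mem_gb
        and offer.disk_gb >= minimum.disk_gb
    )


def offer_allowed(offer, fleet_min, run_min) -> bool:
    """An offer must satisfy both the fleet's and the run's constraints."""
    return satisfies(offer, fleet_min) and satisfies(offer, run_min)


small_vm = Offer(name="hypothetical-small", cpu=2, mem_gb=4, disk_gb=100)
run_min = MinResources(cpu=2, mem_gb=4, disk_gb=100)  # run asks for mem=4GB..

before = offer_allowed(small_vm, fleet_min=DEFAULT_MIN, run_min=run_min)
after = offer_allowed(small_vm, fleet_min=None, run_min=run_min)
print(before, after)
```

In this model the 4GB offer is rejected under the old fleet defaults even though the run accepts it, and becomes available once the fleet has no default resources.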

Offers

The dstack offer CLI command now supports the --fleet argument, which allows you to
see only offers from the specified fleets.

dstack offer --fleet my-fleet --fleet another-project/other-fleet

The same is now supported in the UI on both the Offers and Launch pages.

Exports

Importers can now delete an import via
dstack import delete <export-project>/<export-name>. This is useful when an export
was created by the exporter, but the importer no longer needs it and does not want to
wait until the exporter deletes it.

AWS

RTX Pro 6000

The aws backend adds support for g7e.* instances offering RTXPRO6000 GPUs.

Docker

Default Docker registry

If you'd like to cache Docker images through your own Docker registry, you can now
configure it when starting the dstack server:

export DSTACK_SERVER_DEFAULT_DOCKER_REGISTRY=<registry base hostname>
export DSTACK_SERVER_DEFAULT_DOCKER_REGISTRY_USERNAME=<registry username>
export DSTACK_SERVER_DEFAULT_DOCKER_REGISTRY_PASSWORD=<registry password>

These settings should only be used for registries that act as a pull-through cache for
Docker Hub. This is useful if you would like to avoid rate limits when you have too
many image pulls.

Migration note

Warning

Since v0.20.0, dstack has required fleets before runs can be submitted.

Until now, the deprecated DSTACK_FF_AUTOCREATED_FLEETS_ENABLED feature flag allowed submitting runs without fleets. In 0.20.17, this flag has been removed.

What's changed

Drop deprecated scheduled tasks by @r4victor in #3749
[Docs]: Rename REST API -> HTTP API by @jvstme in #3748
Rework runner job submission flow by @un-def in #3743
Default Docker registry and credentials by @jvstme in #3747
Detect Verda provisioning errors earlier by @jvstme in #3753
Optimize Python DB tests by @r4victor in #3755
Add case study on Graphsignal's use of dstack for inference benchmarking by @peterschmidt85 in #3751
Allow combining on/off idle_duration between runs and fleets by @r4victor in #3756
Fix no offers retry for scheduled runs by @r4victor in #3759
Support dynamic run waiting CLI status with extra renderables by @r4victor in #3760
Kubernetes: add instance volumes support by @un-def in #3758
Init gateways in background by @r4victor in #3762
Store source backend config by @r4victor in #3764
Show offers in dstack apply for elastic container fleets by @peterschmidt85 in #3754
Support cloud fleet in-place update by @r4victor in #3766
Set up HTTP ALB listener for ACM gateway by @r4victor in #3767
Evict jobs if instance is no longer imported by @jvstme in #3772
Implement cloud fleet in-place update for provisioning fields by @r4victor in #3775
Drop fleet default min resources by @r4victor in #3776
Support --fleet in dstack offer by @peterschmidt85 in #3774
Support imported fleets in dstack fleet get by @jvstme in #3773
Limit fleet consolidation attempts by @r4victor in #3777
[Docs]: Examples cleanup and installation updates by @peterschmidt85 in #3765
Support AWS G7e (RTXPRO6000) instances by @jvstme in #3752
Support imported fleets in dstack event by @jvstme in #3779
Drop autocreated fleets by @r4victor in #3782
Support fleet filters in the Offers and Launch UI by @peterschmidt85 in #3780
Support router as replica with pipelines...

Contributors

un-def, Bihan, and 3 other contributors


06 Apr 12:03

peterschmidt85

0.20.16

fc9afa9

0.20.16

Server

Performance

This release introduces a major overhaul of dstack server background processing. A single server
replica can now handle ~10x more resources, supporting at least 1000 active instances and runs. In
benchmarks, we observed 2x-10x faster processing (see #3551).

Provisioning 200 instances: 12 minutes -> 4 minutes.
Running a 200-node task: >25 minutes -> 4 minutes.
Terminating 50 instances: 60 seconds -> 10 seconds.

The performance gains come from a new, more efficient background processing architecture. Server
hardware requirements and memory consumption remain the same.

If you need to temporarily revert this behavior, set
DSTACK_FF_PIPELINE_PROCESSING_DISABLED=1 before starting the server.

Upgrade notes

Warning

This release includes significant internal changes to the dstack server. Test in a staging
environment before upgrading production whenever possible.

Warning

Rolling upgrades from 0.20.13 or older directly to 0.20.16 are not supported. Do not run
replicas on 0.20.13 (or older) and 0.20.16 at the same time. Upgrade to 0.20.15 first, or
scale server replicas down to 1 before upgrading.

SSH proxy

Servers can enforce proxy-only SSH access by combining SSH proxy with the new
DSTACK_SERVER_SSHPROXY_ENFORCED flag. When enabled, runs omit user-provided keys from authorized
lists and expect clients to connect via the proxy endpoint that run details expose. For more details, see the server deployment guide.

Note

SSH proxy is experimental, and behavior may change in future releases.

UI

SSH keys

User settings now include an SSH keys tab where you can upload OpenSSH public keys, see their fingerprints, and remove keys that no longer belong to you. Uploaded keys let you open SSH sessions without relying on the client key that dstack attach manages automatically, and duplicate keys are rejected with a clear error.

CLI

dstack attach

When SSH proxy is enabled on the server, dstack attach now routes through the proxy automatically and receives the proxy host, port, and upstream ID from run connection info. Servers can opt into proxy-only access by setting DSTACK_SERVER_SSHPROXY_ENFORCED, which stops embedding direct SSH keys in runs.

export DSTACK_SERVER_SSHPROXY_ENFORCED=1

Backends

RunPod

RunPod backends can now provision on-demand CPU offerings in secure cloud regions, so jobs that request gpu: 0 schedule successfully without tricking the scheduler. Disk size checks respect the per-offer limits RunPod publishes.

resources:
  gpu: 0
  cpu: 8
  memory: 32GB

Verda

Verda startup scripts and SSH keys are now generated per instance and removed reliably on teardown, preventing stale credentials and improving cleanup when a rollout provisions multiple machines.

Major bug-fixes

Improved Git-related CLI repo errors with actionable messages for missing credentials, detached HEAD state, and non-repository directories (#3730).

What's changed

[Internal] Don't reload server on cli package changes by @un-def in #3706
Fix SELinux denials and "Text file busy" on SSH fleet provisioning by @peterschmidt85 in #3712
Add support for user-provided SSH public keys by @un-def in #3688
Move stop_runner() to JobTerminating pipeline by @r4victor in #3714
Add web UI for user public keys by @un-def in #3713
[Landing] Update headings and descriptions for clarity in README, installation, and quickstart guides to amplify agentic orchestration (WIP) by @peterschmidt85 in #3710
Add pipelines optimizations by @r4victor in #3719
Reject user interaction in runner_ssh_tunnel by @un-def in #3716
Use sshproxy for CLI attach if enabled by @un-def in #3711
Enable pipelines by default by @r4victor in #3728
Do not wait in VerdaCompute.create_instance by @jvstme in #3723
Pass delete_permanently when deleting Verda instances by @peterschmidt85 in #3734
Fix pipelines not running on Python <= 3.10 by @r4victor in #3736
Tests: bump pytest-asyncio>=0.25.2 by @un-def in #3733
Fix docs Swagger UI rendering for REST API pages by @peterschmidt85 in #3729
Guard cached get_offers with an execution lock by @r4victor in #3738
Fix JobRunningPipeline not reclaiming stale jobs for terminating runs by @r4victor in #3741
runpod: support on-demand CPU offers and provisioning by @peterschmidt85 in #3726
Add JobMetricsPoint.job_id index by @r4victor in #3742
Fix SENTRY_TRACES_BACKGROUND_SAMPLE_RATE not respected by @r4victor in #3744
Update Server Deployment guide for pipelines by @r4victor in #3745
[Docs] Add dstack-sshproxy deployment guide by @un-def in #3720
Revamp repo errors handling by @un-def in #3730
[chore]: Fix add_row_from_dict() typing issues by @jvstme in #3739
Handle concurrent repo blob/file archive uploads by @un-def in #3737
Verda: make startup script and SSH key lifecycle per-instance with reliable cleanup by @peterschmidt85 in #3718

Full changelog: 0.20.15...0.20.16

Contributors

un-def, r4victor, and 2 other contributors


27 Mar 12:56

peterschmidt85

0.20.15

6f1f1f4

0.20.15

Backends

CloudRift

The cloudrift backend now supports provisioning AMD MI350X GPUs.

Major bug-fixes

[AMD] Handle amd-smi 7.x output format (#3701) — Support both amd-smi output formats: flat array (ROCm 6.x) and wrapped {"gpu_data": [...]} (ROCm 7.x).
[CloudRift] CloudRift VMs boot with incorrect RTC clock (~1h ahead). Added NTP sync wait before launching the shim.
[UI] The Launch wizard now supports model and some other YAML properties (e.g. to allow deploying models via templates).
[SSH fleets] On SELinux-enforcing hosts (RHEL, Rocky, CentOS), the shim service failed to start with "Permission denied" because files moved from /tmp kept their temporary SELinux labels.
[UI] Services using HTTPS with an AWS ACM certificate were incorrectly displayed with an http:// URL in the UI.
[SSH fleets] Setting blocks at the top level of an SSH fleet config was silently ignored, causing "No matching instance offers available" errors. It only worked when set per host.
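The amd-smi fix in the list above amounts to accepting two JSON shapes. A minimal Python sketch of that normalization follows. The function name is hypothetical and this is not dstack's actual parser; the per-GPU fields in the usage lines are placeholders, since the real amd-smi records contain many more keys.

```python
import json


def parse_amd_smi_gpus(raw):
    """Return the list of per-GPU records from either amd-smi JSON format."""
    data = json.loads(raw)
    if isinstance(data, list):
        # ROCm 6.x: flat array of GPU records.
        return data
    if isinstance(data, dict) and "gpu_data" in data:
        # ROCm 7.x: records wrapped as {"gpu_data": [...]}.
        return data["gpu_data"]
    raise ValueError("unrecognized amd-smi output format")


# Placeholder records for illustration only.
print(parse_amd_smi_gpus('[{"gpu": 0}, {"gpu": 1}]'))
print(parse_amd_smi_gpus('{"gpu_data": [{"gpu": 0}, {"gpu": 1}]}'))
```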

What's changed

Fix pipeline fetcher deadlock by @r4victor in #3704
[CloudRift] Fix NTP clock skew breaking Docker; handle amd-smi 7.x output by @peterschmidt85 in #3701
Fix SELinux denials on SSH fleet provisioning by @peterschmidt85 in #3702
[UI] Add missing fields to Launch UI supported fields list by @peterschmidt85 in #3671
Fix http protocol shown for https services with acm cert by @r4victor in #3709
Respect top-level blocks in SSH fleet configuration by @un-def in #3700

Full changelog: 0.20.14...0.20.15

Contributors

un-def, r4victor, and peterschmidt85


26 Mar 17:38

peterschmidt85

0.20.14

a61cc26

0.20.14

Dev environments

You can create dev environments without specifying ide to connect via SSH only:

type: dev-environment
name: dev

python: "3.12"

resources:
  gpu: H100NVL:1

Exports

The following improvements are part of the exports feature, which enables sharing fleets across projects.

You can now list imported resources using:

$ dstack import list

 NAME                 FLEETS
 project-a/my-export  my-fleet, another-fleet

Imported fleets can be used directly in your workflows via cross-project references:

type: dev-environment
name: dev

python: "3.12"

resources:
  gpu: H100NVL:1

fleets: [project-a/my-fleet]

Or via CLI:

dstack apply -f .dstack.yml --fleet project-a/my-fleet

Imported fleets are also included in dstack offer, and dstack event shows imported fleets with a project prefix.
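Cross-project references follow a project/fleet naming pattern. The Python sketch below shows one way such a reference could be split. It is an assumption-based illustration: the function name is hypothetical, and treating a bare name as belonging to the current project is inferred from the examples rather than taken from dstack's code.

```python
def parse_fleet_ref(ref, current_project):
    """Split "project/fleet" into (project, fleet); bare names stay local."""
    project, separator, fleet = ref.partition("/")
    if not separator:
        # No project prefix: the fleet belongs to the current project.
        return current_project, ref
    return project, fleet


print(parse_fleet_ref("project-a/my-fleet", current_project="main"))
print(parse_fleet_ref("my-fleet", current_project="main"))
```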

Backends

Azure

Added support for Azure GPU VM series based on the latest NVIDIA hardware:

NCads H100 v5 (H100 NVL GPUs, up to 2 GPUs per VM)
ND H100 v5 (8× H100 GPUs for large-scale training workloads)
ND H200 v5 (next-gen H200 GPUs with increased memory and bandwidth for large models)

type: fleet
name: my-fleet

nodes: 1

resources:
  gpu: H100NVL:1

Major bug-fixes

Fixed AWS private gateway provisioning failures caused by the AWS 32-character load balancer name limit and by setups with multiple private subnets per availability zone
Restored proper instance startup handling on AWS, resolving issues such as volumes not attaching reliably
Removed unnecessary Docker metadata requests for default images, reducing the risk of registry rate limits

What's changed

Add Exports concept page and CLI reference by @peterschmidt85 in #3659
Refactor process_submitted_jobs by @r4victor in #3666
Add API for SSH proxy by @un-def in #3646
Add per-job hourly log quota enforced on runner by @peterschmidt85 in #3668
Implement submitted jobs pipeline by @r4victor in #3670
Do not filter by fleet in filter_instances() by @jvstme in #3674
Show imported instances in dstack offer by @jvstme in #3673
Prepare process_runs for pipelines migration by @r4victor in #3675
Fix the "No fleets" warning in UI by @jvstme in #3669
Fix the "No Fleets" warning in CLI by @jvstme in #3667
Show targets with project prefix in dstack event by @jvstme in #3680
Make ide optional for dev-environment by @peterschmidt85 in #3656
Fix 500 server error when re-applying replica groups service by @Bihan in #3678
Cross-project fleet references in CLI and YAML by @jvstme in #3677
Add SSH and IDE connection info to runs API by @un-def in #3681
Implement run pipeline by @r4victor in #3686
Do not request image config for default images by @jvstme in #3684
/imports/list API and dstack import list CLI by @jvstme in #3682
Check lock_expires_at in deprecated background tasks by @r4victor in #3689
Support external pipelines registration by @r4victor in #3691
Handle TimeoutError on pipeline draining by @r4victor in #3692
Drop redundant instance select with lock by @r4victor in #3693
Restore instance.wait_until_running() in AWS create_instance by @r4victor in #3694
Complete fix replica groups issue 3676 by @Bihan in #3687
Fix AWS private gateway provisioning by @r4victor in #3698
[Azure] Add support for H100 NVL and H200 VM series; refactor instance creation methods to cleanup failed instances by @peterschmidt85 in #3699

Full changelog: 0.20.13...0.20.14

Contributors

un-def, Bihan, and 3 other contributors

13 Mar 13:42

jvstme

0.20.13

f077822

0.20.13

Exports

SSH fleet sharing

You can now share SSH fleets across projects using the new exports system:

$ dstack export \
        --project team-a \
        create shared-gpus \
        --fleet gpu-fleet-1 \
        --fleet gpu-fleet-2 \
        --importer team-b \
        --importer team-c

 NAME         FLEETS                    IMPORTERS
 shared-gpus  gpu-fleet-1, gpu-fleet-2  team-b, team-c

From the importer project’s perspective, exported fleets appear in dstack fleet list and can be used for runs just like the project’s own fleets:

$ dstack fleet --project team-b list

 NAME                NODES  GPU          SPOT  BACKEND  PRICE  STATUS  CREATED
 my-local-fleet      1      -            -     ssh      -      active  3 days ago
 team-a/gpu-fleet-1  2      A100:80GB:8  -     ssh      -      active  1 week ago
 team-a/gpu-fleet-2  1      H100:80GB:4  -     ssh      -      active  2 days ago

Learn more about exports in the docs.
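As the listing above shows, imported fleets are displayed with the exporting project as a prefix (`team-a/gpu-fleet-1`), while the project's own fleets have no prefix. A minimal sketch of how a script might split such display names, assuming the `project/fleet` form shown in the table; `FleetRef` and `parse_fleet_ref` are hypothetical helpers, not part of dstack:

```python
from typing import NamedTuple, Optional


class FleetRef(NamedTuple):
    project: Optional[str]  # None means the fleet belongs to the current project
    name: str


def parse_fleet_ref(display_name: str) -> FleetRef:
    """Split a listed fleet name into (exporting project, fleet name)."""
    project, sep, name = display_name.partition("/")
    if not sep:
        # No prefix: the fleet is owned by the current project.
        return FleetRef(None, display_name)
    return FleetRef(project, name)


# Names copied from the `dstack fleet list` output above.
listed = ["my-local-fleet", "team-a/gpu-fleet-1", "team-a/gpu-fleet-2"]
imported = [ref for ref in map(parse_fleet_ref, listed) if ref.project is not None]
print(imported)
```

Here `imported` keeps only the two fleets exported by `team-a`.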

UI

Project templates

Project settings now allow you to configure a custom templates repository.

Task connect

Task pages now include a Connect section that guides you through accessing the ports exposed by the task.

Backends

Crusoe

The Crusoe backend now supports H200 and B200 GPUs. The cluster docs have been updated to demonstrate native Crusoe backend usage for configuring a high-performance InfiniBand cluster.

Vast.ai

The new community_cloud backend setting allows you to restrict usage to secure cloud offers only:

backends:
- type: vastai
  community_cloud: false  # Use only secure cloud offers

The default remains community_cloud: true.
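The effect of the setting can be illustrated with a small sketch. It only models the behavior described above (a missing key behaves like `community_cloud: true`, and `false` leaves secure cloud offers only); the `select_offers` function and the `community` field on offers are hypothetical and do not reflect dstack's internal code:

```python
def select_offers(offers, backend_config):
    """Return the offers allowed by the backend's community_cloud setting."""
    # A missing key behaves like `community_cloud: true`, per the release note.
    allow_community = backend_config.get("community_cloud", True)
    return [o for o in offers if allow_community or not o["community"]]


# Hypothetical offers: one community cloud, one secure cloud.
offers = [
    {"id": "offer-1", "community": True},
    {"id": "offer-2", "community": False},
]

default_ids = [o["id"] for o in select_offers(offers, {"type": "vastai"})]
secure_ids = [
    o["id"]
    for o in select_offers(offers, {"type": "vastai", "community_cloud": False})
]
print(default_ids, secure_ids)
```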

What's changed

[UI]: Refresh button does not refresh run logs, metrics, or events by @olgenn in #3618
Update Crusoe cluster docs by @peterschmidt85 in #3620
Report runtime working_dir and username from runner via JobRuntimeData by @peterschmidt85 in #3617
Implement fleet pipeline by @r4victor in #3623
Replace RunPod -> Runpod in docs, blog, comments by @jvstme in #3625
Extend fleet and instance permission tests by @jvstme in #3627
Reorganize Go codebase by @un-def in #3628
Display imported fleets with project prefix in CLI by @jvstme in #3630
Fix Crusoe CPU instances and add H200/B200 support by @peterschmidt85 in #3619
Do not show SSH fleet resources in dstack fleet by @jvstme in #3632
[UI] Unify Connect UX across run configuration types (plus fix) by @peterschmidt85 in #3622
[UI]: Implement dynamic property filter options by @olgenn in #3621
Fleet sharing main mechanisms by @jvstme in #3629
Prevent Hot Aisle min reservation period error by @jvstme in #3633
Implement instance pipeline by @r4victor in #3636
Migrate attribute comments to docstrings by @r4victor in #3639
Allow concurrent run and TERMINATING jobs processing by @r4victor in #3641
[UI] Add per-project templates repo support by @peterschmidt85 in #3640
Implement terminating jobs pipeline by @r4victor in #3643
[UI] Make Launch to respect resources in templates by @peterschmidt85 in #3642
[Skills] AMD image selection, files, repos, image guidance by @peterschmidt85 in #3634
Refactor process_running_jobs background task by @r4victor in #3648
Exports API and CLI by @jvstme in #3647
Add community_cloud to VastAI backend (default true) by @peterschmidt85 in #3635
Do not show resources for SSH fleets in UI by @peterschmidt85 in #3649
[Blog] Infrastructure orchestration is an agent skill by @peterschmidt85 in #3645
Fix error when imported fleet has no capacity by @jvstme in #3652
Improve CLI settings section and fix status indicator colors by @peterschmidt85 in #3650
Upgrade litestream 0.5.0 → 0.5.9 and simplify entrypoint restore by @peterschmidt85 in #3653
Fix services on imported fleets by @jvstme in #3654
Implement running jobs pipeline by @r4victor in #3657
Fix error submitting run to empty imported fleet by @jvstme in #3661
Fix updating pre-0.18.2 gateways (including Sky) by @jvstme in #3658
Fix CLI compatibility with older servers by @jvstme in #3664

Full Changelog: 0.20.12...0.20.13

Contributors

un-def, olgenn, and 3 other contributors

26 Feb 15:52

peterschmidt85

0.20.12

733a17c

0.20.12

Backends

Crusoe

dstack now supports Crusoe as a backend, enabling VM-based provisioning with GPU instances. The backend supports both single-node and multi-node cluster provisioning with InfiniBand.

type: fleet
name: my-crusoe-fleet

backends: [crusoe]
resources:
  gpu: A100:8
nodes: 2
placement: cluster

Note

Support for CPU instances, H200, B200, GB200, MI300X, MI355X, and volumes is coming soon.

UI

Launch wizard

The UI now includes a launch wizard that lets users create runs from pre-defined templates. Instead of writing YAML from scratch, users can select a template, pick GPU resources, adjust settings, and review the final configuration, all through a guided flow.

To enable the launch wizard, point the server to a templates repository:

$ DSTACK_SERVER_TEMPLATES_REPO=https://github.com/dstackai/dstack-templates dstack server

Templates are YAML files under .dstack/templates in the repo. Each template has type set to template, a unique name, a title, configurable parameters, and a configuration that defines the dstack run:

type: template
name: in-browser-ide

title: In-browser IDE
description: Access the instance using VS Code in the browser.

parameters:
  - type: name
  - type: resources
  - type: python_or_docker
  - type: repo
  - type: working_dir
    
  - type: env
    title: Password
    name: PASSWORD
    value: $random-password

configuration:
  type: service
  
  auth: false
  gateway: true
  https: auto

  env:
    - BIND_ADDR=0.0.0.0:8080
  commands:
    - |
      echo "Your password is $PASSWORD. Share it carefully as it grants full access to the IDE."
    - |
      curl -fsSL https://code-server.dev/install.sh | sh -s -- --method standalone --prefix /tmp/code-server
    - |
      /tmp/code-server/bin/code-server --bind-addr $BIND_ADDR --auth password --disable-telemetry --disable-update-check .
  port: 8080

  probes:
    - type: http
      url: /healthz

See dstack-templates for an example repository.
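Before committing a template to the repository, it can help to check that it has the fields listed above. The following is a rough sketch of such a check, not dstack's own validation: `check_template` and `REQUIRED_KEYS` are hypothetical names, and the template is given as a plain dict (as it would look after loading the YAML) to keep the example dependency-free:

```python
REQUIRED_KEYS = ("type", "name", "title", "parameters", "configuration")


def check_template(template: dict) -> list:
    """Return a list of problems found in a loaded template; empty if none."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in template]
    if template.get("type") != "template":
        problems.append("type must be 'template'")
    return problems


# A trimmed-down version of the in-browser IDE template shown above.
template = {
    "type": "template",
    "name": "in-browser-ide",
    "title": "In-browser IDE",
    "parameters": [{"type": "name"}, {"type": "resources"}],
    "configuration": {"type": "service", "port": 8080},
}

print(check_template(template))
print(check_template({"type": "service"}))
```

The first call reports no problems; the second reports the missing fields and the wrong `type`.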

Note

The launch wizard is an experimental feature. Currently, templates are configured per server. Per-project templates configuration is coming soon.

Instances

The UI now has an Instance details page where you can view detailed information about an instance, including its events and inspect data. Instance names across the UI—including on Events pages—now link directly to this page.

What's changed

Document Adding indexes by @r4victor in #3594
[Blog] Model inference with Prefill-Decode disaggregation by @peterschmidt85 in #3595
[runner] Drop buildLDLibraryPathEnv() by @un-def in #3593
Fix mutually exclusive fields validation by @jvstme in #3598
[Docs] PD disaggregation by @Bihan in #3592
[Docs] Clarify how K8s resources and offers work by @un-def in #3565
Allow https: auto for services by @peterschmidt85 in #3600
Implement gateway pipeline by @r4victor in #3599
Allow detecting whether service https is unset by @jvstme in #3601
Implement volume pipeline by @r4victor in #3604
[Website] Minor edits by @peterschmidt85 in #3609
[Website] Add robots.txt and structured data by @peterschmidt85 in #3610
Add templates API and launch wizard UI by @peterschmidt85 in #3605
[UI] Include availability issues information to offer cards by @peterschmidt85 in #3607
Fix SSH fleet with proxy_jump in-place update check by @un-def in #3612
Bump gpuhunt==0.1.17 by @r4victor in #3615
[UI] Add Instance details page by @peterschmidt85 in #3614
Add Crusoe Cloud backend by @peterschmidt85 in #3602

Full changelog: 0.20.11...0.20.12

Contributors

un-def, Bihan, and 3 other contributors

26 Feb 09:17

r4victor

0.20.11

e88e25f

0.20.11

This release fixes a potential issue where a server replica could fail to start because a migration tried to create an index that already existed on Postgres.
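The general failure mode can be reproduced with a short sketch. It uses SQLite from the Python standard library instead of Postgres, the table and index names are made up, and it is not the change made in #3591; it only shows why re-running a plain `CREATE INDEX` fails while the `IF NOT EXISTS` form is safe to repeat (Postgres accepts the same clause, including with `CREATE INDEX CONCURRENTLY`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (id INTEGER PRIMARY KEY, status TEXT)")


def create_index_strict(conn):
    # Fails if the index already exists, e.g. when another replica created it.
    conn.execute("CREATE INDEX ix_jobs_status ON jobs (status)")


def create_index_idempotent(conn):
    # Safe to run repeatedly: does nothing if the index is already there.
    conn.execute("CREATE INDEX IF NOT EXISTS ix_jobs_status ON jobs (status)")


create_index_strict(conn)
try:
    create_index_strict(conn)
    second_run_failed = False
except sqlite3.OperationalError:
    second_run_failed = True

create_index_idempotent(conn)  # no error on an existing index
print(second_run_failed)
```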

What's Changed

Fix concurrent indexes migration by @r4victor in #3591

Full Changelog: 0.20.10...0.20.11

Contributors

r4victor

Releases: dstackai/dstack

0.20.20

Services

NVIDIA Dynamo

Replica groups

Exports

Gateways

Global exports

AWS

EFA clusters

Kubernetes

Permissions

Backend configuration

Migration guide

What's changed

Contributors

0.20.19

Services

RPS window for autoscaling

Kubernetes

registry_auth

Read-only volumes

Server

Faster processing

Examples

Breaking changes

Python 3.9 support dropped

What's Changed

Contributors

0.20.18

CLI

Offers

Secrets

GPUs

GCP

What's changed

Contributors

0.20.17

PD disaggregation

Kubernetes

Network volumes

Instance volumes

Performance

Fleets

In-place update

Default resources

Offers

Exports

AWS

RTX Pro 6000

Docker

Default Docker registry

Migration note

What's changed

Contributors

0.20.16

Server

Performance

Upgrade notes

SSH proxy

UI

SSH keys

CLI

dstack attach

Backends

RunPod

Verda

Major bug-fixes

What's changed

Contributors

0.20.15

Backends

CloudRift

Major bug-fixes

`registry_auth`

`dstack attach`