Releases: dstackai/dstack
0.20.20
Services
NVIDIA Dynamo
This update adds support for Prefill-Decode (PD) disaggregated inference with NVIDIA Dynamo.
Previously, dstack supported PD disaggregation only with Shepherd Model Gateway as the router and SGLang as the inference engine for workers. With this update, a replica group can declare router: { type: dynamo }, allowing workers to use inference engines such as SGLang, vLLM, or TensorRT-LLM.
type: service
name: dynamo-pd
env:
  - HF_TOKEN
  - MODEL_ID=zai-org/GLM-4.5-Air-FP8
replicas:
  - count: 1
    docker: true
    commands:
      - apt-get update
      - apt-get install -y python3-dev python3-venv
      - python3 -m venv ~/dyn-venv
      - source ~/dyn-venv/bin/activate
      - pip install -U pip
      - pip install "ai-dynamo[sglang]==1.1.1"
      - git clone https://github.com/ai-dynamo/dynamo.git
      # Brings up the NATS / etcd compose stack and runs the Dynamo HTTP frontend.
      - docker compose -f dynamo/deploy/docker-compose.yml up -d
      - |
        python3 -m dynamo.frontend \
          --http-host 0.0.0.0 --http-port 8000 \
          --discovery-backend etcd --router-mode kv \
          --kv-cache-block-size 64
    resources:
      cpu: 4
    router:
      type: dynamo
  - count: 1..4
    scaling:
      metric: rps
      target: 3
    python: "3.12"
    nvcc: true
    commands:
      # dstack injects DSTACK_ROUTER_INTERNAL_IP after the router replica
      # is provisioned. Compose the etcd/NATS endpoints from it.
      - export ETCD_ENDPOINTS="http://$DSTACK_ROUTER_INTERNAL_IP:2379"
      - export NATS_SERVER="nats://$DSTACK_ROUTER_INTERNAL_IP:4222"
      # Set to enable /health endpoint required by dstack probes.
      - export DYN_SYSTEM_PORT="8000"
      # Wait until the router's etcd and NATS ports are actually accepting connections.
      - |
        until (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/2379) 2>/dev/null \
          && (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/4222) 2>/dev/null; do
          echo "waiting for etcd/NATS on $DSTACK_ROUTER_INTERNAL_IP..."; sleep 3
        done
      - pip install "ai-dynamo[sglang]==1.1.1"
      - |
        python3 -m dynamo.sglang \
          --model-path $MODEL_ID --served-model-name $MODEL_ID \
          --discovery-backend etcd --host 0.0.0.0 \
          --page-size 64 \
          --disaggregation-mode prefill --disaggregation-transfer-backend nixl
    resources:
      gpu: H200
  - count: 1..8
    scaling:
      metric: rps
      target: 2
    python: "3.12"
    nvcc: true
    commands:
      - export ETCD_ENDPOINTS="http://$DSTACK_ROUTER_INTERNAL_IP:2379"
      - export NATS_SERVER="nats://$DSTACK_ROUTER_INTERNAL_IP:4222"
      - export DYN_SYSTEM_PORT="8000"
      - |
        until (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/2379) 2>/dev/null \
          && (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/4222) 2>/dev/null; do
          echo "waiting for etcd/NATS on $DSTACK_ROUTER_INTERNAL_IP..."; sleep 3
        done
      - pip install "ai-dynamo[sglang]==1.1.1"
      - |
        python3 -m dynamo.sglang \
          --model-path $MODEL_ID --served-model-name $MODEL_ID \
          --discovery-backend etcd --host 0.0.0.0 \
          --page-size 64 \
          --disaggregation-mode decode --disaggregation-transfer-backend nixl
    resources:
      gpu: H200
port: 8000
model: zai-org/GLM-4.5-Air-FP8
# Custom probe is required for PD disaggregation.
probes:
  - type: http
    url: /health
    interval: 15s
dstack provisions the router replica, injects DSTACK_ROUTER_INTERNAL_IP into non-router replicas, and lets Dynamo workers connect directly to the router’s etcd and NATS services.
Refer to the Dynamo example for full deployment instructions.
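The readiness wait in the worker commands above relies on bash's /dev/tcp redirection. As a rough sketch, the same check can be written in Python. The helpers port_open and wait_for_ports are hypothetical names, not part of dstack or Dynamo; ports 2379 (etcd) and 4222 (NATS) are the ones used in the example configuration.

```python
import socket
import time


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_ports(host, ports, interval=3.0, max_attempts=None):
    """Poll until every port accepts connections.

    Returns True once all ports are reachable, or False if max_attempts
    polls fail. With max_attempts=None it waits indefinitely, like the
    bash loop in the example configuration.
    """
    attempts = 0
    while not all(port_open(host, p) for p in ports):
        attempts += 1
        if max_attempts is not None and attempts >= max_attempts:
            return False
        print(f"waiting for ports {list(ports)} on {host}...")
        time.sleep(interval)
    return True
```

A worker could call wait_for_ports(os.environ["DSTACK_ROUTER_INTERNAL_IP"], [2379, 4222]) before starting the inference engine.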
Replica groups
It's now possible to configure the image, docker, python, nvcc, and privileged properties at the replica group level. This enables complex multi-component services like NVIDIA Dynamo, where different replicas require different runtime environments.
Exports
Gateways
Gateways can now be exported and shared across projects, enabling centralized gateway management in multi-project setups.
$ dstack export --project main create my-export --gateway shared-gateway --importer team
 NAME       FLEETS  GATEWAYS        IMPORTERS
 my-export  -       shared-gateway  team
Now, if you list gateways in the team project, you'll see the exported gateway:
$ dstack gateway --project team
 NAME                 BACKEND          HOSTNAME        DOMAIN                 DEFAULT  STATUS
 main/shared-gateway  aws (eu-west-1)  108.131.126.35  gtw.mycompany.example           running
Additionally, gateway domains now support optional project name interpolation using ${{ run.project_name }}, allowing different projects to use different domains on the same shared gateway.
type: gateway
name: shared-gateway
backend: aws
region: eu-west-1
domain: ${{ run.project_name }}.mycompany.example
Global exports
Users with global admin privileges can now export SSH fleets and gateways to all projects at once, enabling organization-wide resource sharing.
$ dstack export create global-export --gateway shared-gateway --global
 NAME           FLEETS  GATEWAYS        IMPORTERS
 global-export  -       shared-gateway  *
AWS
EFA clusters
Previously, fleets that used EFA (Elastic Fabric Adapter) with multiple network interfaces required public_ips: False. With this release, dstack allows creating such fleets with public IPs. This simplifies the use of interconnected clusters on AWS by removing the need to run the dstack server and CLI inside a private VPC.
Kubernetes
Permissions
dstack now requires the watch permission for pods within the namespace. See Required permissions for up-to-date ClusterRole and Role manifests.
Backend configuration
The namespace property of the kubernetes backend configuration is now formally deprecated. It still takes effect and remains the source of truth in this version, but future versions will read the namespace from the current kubeconfig context instead.
Migration guide
- If namespace is unset or set to default in both the backend config and the kubeconfig, no action is required; default continues to be used.
- If namespace is set to the same value (e.g. ns-a) in both the backend config and the kubeconfig, no action is required.
- If namespace is set to ns-a in the backend config but the kubeconfig has a different value (or none), set the namespace to ns-a in your kubeconfig context to prepare for future versions.
- It is only safe to remove namespace from the backend config if its value is default.
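The rules in the migration guide can be sketched as a small decision helper. This is only an illustration of the guide's logic, not code from dstack: the function names are hypothetical, an unset namespace is treated as default (as the first rule suggests), and the case the guide does not list explicitly (backend config on default, kubeconfig on another namespace) is assumed to follow the same "align the kubeconfig" rule.

```python
from typing import Optional

DEFAULT_NS = "default"


def effective(ns: Optional[str]) -> str:
    """Treat an unset namespace as "default"."""
    return ns or DEFAULT_NS


def migration_action(backend_ns: Optional[str], kubeconfig_ns: Optional[str]) -> str:
    """Return the step suggested by the migration guide for a namespace pair."""
    backend, kube = effective(backend_ns), effective(kubeconfig_ns)
    if backend == kube:
        return "no action required"
    # The backend config is still the source of truth in this version,
    # so the kubeconfig context is the side to align.
    return f"set the kubeconfig context namespace to {backend}"


def safe_to_remove_from_backend_config(backend_ns: Optional[str]) -> bool:
    """Removing the property is only safe when its value is "default"."""
    return effective(backend_ns) == DEFAULT_NS


print(migration_action("ns-a", None))
print(safe_to_remove_from_backend_config("ns-a"))
```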
What's changed
- [Services] Allow to specify image, docker, python, nvcc, privileged at replica group level by @Bihan in #3832
- [Internal]: Delete some unused classes by @jvstme in #3842
- [Internal] Fix pyright failing in CI by @jvstme in #3846
- [Internal] Update RunpodApiClient by @un-def in #3847
- [Internal] Fix openai SDK failing in tests by @jvstme in #3849
- [RunPod] Handle deleting non-existent volume by @r4victor in #3853
- [Runpod] Fix broken registry_auth support by @un-def in #3844
- [UX] Raise ImportError on Python 3.14 or later by @r4victor in #3855
- [Exports] Gateway support by @jvstme in #3845
- [Internal] Rename docs/ to mkdocs/, move examples under /docs/, inline source by @peterschmidt85 in #3859
- [Kubernetes] Deprecate namespace in backend config by @un-def in #3858
- [Gateways] Allow setting imported gateway as project default by @jvstme in #3860
- [Internal] Forbid exporting the built-in dstack Sky gateway by @jvstme in #3864
- [AWS] Support multi-EFA instances with public IPs by @r4victor in #3865
- [Internal] Add server-side validation for fleet configuration subtypes by @un-def in #3848
- [Verda] Optimize terminating Verda instances by @jvstme in #3811
- [Internal] Introduce GatewayModel.forbid_new_services by @jvstme in #3863
- [Docs] Introduce CLI & API guide; rework the HTTP API reference page by @peterschmidt85 in #3869
- [Internal] Add script to set up Kubernetes cluster for dstack backend by @un-def in #3866
- Fix Pyright errors with requests==2.34.0 by @jvstme in #3873
- Add project name interpolation in gateway domains by @jvstme in #3870
- [Bugfix] Fix duplicate headers with in-server proxy by @jvstme in #3872
- [Docs]: Gateway Exports by @jvstme in #3862
- [Kubernetes] Fai...
0.20.19
Services
RPS window for autoscaling
Services now support a window property in the scaling spec that defines the time window used to calculate RPS. Allowed values are 30s, 1m, and 5m (default is 1m). Previously, the RPS was always calculated using a 1m window.
type: service
image: nginx
port: 80
replicas: 0..1
scaling:
  metric: rps
  # 1 request per second, calculated over a 5-minute window
  target: 1
  window: 5m
Kubernetes
registry_auth
The kubernetes backend now supports the registry_auth property for pulling Docker images from private registries:
type: service
image: nvcr.io/nim/deepseek-ai/deepseek-r1-distill-llama-8b
registry_auth:
  username: $oauthtoken
  password: ${{ secrets.ngc_api_key }}
dstack automatically creates and sets up imagePullSecrets for the pods. This requires new permissions for the Kubernetes role:
rules:
  resources: ["secrets"]
  verbs: ["create", "delete"]
Read-only volumes
Kubernetes volume configurations now support a new read_only property. When set to true, it enforces readOnly: true in the pod's volumeMounts.
type: volume
backend: kubernetes
name: my-volume
size: 100GB
read_only: true
Server
Faster processing
The server has been optimized to reduce processing latencies. As a result, many operations now take less time: run provisioning is up to 14s faster and run termination is up to 7s faster.
Examples
Documentation and examples have been refreshed, including new Qwen3.6-27B and DeepSeek V4 examples. A new prefill-decode blog post shows how to run SGLang PD disaggregation via Shepherd Model Gateway.
Breaking changes
Python 3.9 support dropped
Running dstack on Python 3.9 is no longer supported, as Python 3.9 reached end-of-life on 2025-10-31. Please upgrade to Python 3.10 or later.
What's Changed
- Refresh quickstart and service docs with Qwen3.6-27B by @peterschmidt85 in #3819
- Disallow running dstack on Python 3.9 by @jvstme in #3817
- Create placeholder instance models by @r4victor in #3821
- Add DeepSeek V4 model docs by @peterschmidt85 in #3823
- Reduce pipelines processing latencies by @r4victor in #3828
- [Docs]: Update scale_up/down_delay descriptions by @jvstme in #3831
- Clean up exports on project and fleet deletion by @jvstme in #3827
- [shim,runner] Improve logging options by @un-def in #3822
- Allow configuring RPS window for service scaling by @jvstme in #3830
- Replace sglang_router with smg in PD examples by @Bihan in #3836
- Interpolate JobSpec secrets for Compute.run_job() by @un-def in #3834
- Kubernetes: configure imagePullSecrets by @un-def in #3835
- Kubernetes: add read_only volume property by @un-def in #3838
Full Changelog: 0.20.18...0.20.19
0.20.18
CLI
For VM-based backends as well as SSH fleets, the CLI now shows Docker image pull progress in the format <extracted>/<downloaded>/<total>.
Offers
This update reduces the time required to fetch backend offers and initialize backends, making both dstack offer and dstack apply faster:
- runpod — 0.66s => 0.03s (22x)
- amddevcloud — 2.26s => 0.85s (2.7x)
- cudo — 2.48s => 1.02s (2.4x)
- verda — 3.27s => 1.74s (1.9x)
- lambda — 3.24s => 1.89s (1.7x)
- vastai — 3.27s => 1.77s (1.8x)
- gcp — 3.74s => 2.54s (1.5x)
- azure — 5.83s => 3.11s (1.9x)
- aws — 6.58s => 3.56s (1.8x)
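The factor in parentheses is the old time divided by the new time. The following sketch reproduces the figures from the list above; the timings dict and the speedup helper are illustrative names, and the numbers are copied from the list.

```python
# (before, after) in seconds, as reported in the list above.
timings = {
    "runpod": (0.66, 0.03),
    "amddevcloud": (2.26, 0.85),
    "cudo": (2.48, 1.02),
    "verda": (3.27, 1.74),
    "lambda": (3.24, 1.89),
    "vastai": (3.27, 1.77),
    "gcp": (3.74, 2.54),
    "azure": (5.83, 3.11),
    "aws": (6.58, 3.56),
}


def speedup(before: float, after: float) -> float:
    """Speedup factor: how many times faster the new timing is."""
    return before / after


for backend, (before, after) in timings.items():
    print(f"{backend}: {before:.2f}s => {after:.2f}s ({speedup(before, after):.1f}x)")
```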
Secrets
The Manager project role can now manage secrets if the allow_managers_manage_secrets property is enabled in the server’s default_permissions config:
default_permissions:
  allow_managers_manage_secrets: true
Previously, only the Admin role was allowed to manage secrets.
GPUs
This update adds support for GeForce RTX 2, 3, 4, and 5 series GPUs, which were previously not detected properly across both backend and SSH fleets.
GCP
The gcp backend now requires the compute.projects.get permission. Make sure this permission is granted to any custom IAM roles used by dstack.
What's changed
- Optimize GCP offers by @r4victor in #3793
- Optimize InstanceOffer construction by @r4victor in #3794
- Speed up GCP validate_credentials by @r4victor in #3795
- Support secrets management by Manager role by @r4victor in #3801
- Fix update_default_project() crash on server without TTY by @un-def in #3797
- Kubernetes: fix is_hard_taint check by @un-def in #3803
- Fix deleting idle instance from fleet with runs by @jvstme in #3807
- [Docs] Update examples by @peterschmidt85 in #3798
- Display image pull progress in CLI by @jvstme in #3805
- [Docs] Add an inline kubeconfig example to the kubernetes backend documentation by @peterschmidt85 in #3813
- Avoid Verda instance termination warnings by @jvstme in #3810
- [Internal] Improve warning message in ServerConfigManager.apply_config() by @un-def in #3804
- Add missing join to volumes query in JobSubmittedWorker by @un-def in #3816
- Add CLI deprecation warnings about gateway routers by @jvstme in #3814
- Bump gpuhunt, add support for all GeForce RTX 2..5 series by @un-def in #3818
- Add missing compute.projects.get GCP permission by @un-def in #3820
Full changelog: 0.20.17...0.20.18
0.20.17
PD disaggregation
This update simplifies running SGLang with Prefill-Decode disaggregation.
Previously, PD disaggregation required configuring router on the gateway, which meant
the gateway had to run in the same cluster as the service to communicate with service
replicas.
With this update, router is configured on a service replica group instead. This allows
using a standard gateway outside the service cluster.
Below is an example service configuration for running zai-org/GLM-4.5-Air-FP8 using replica groups:
type: service
name: prefill-decode
image: lmsysorg/sglang:latest
env:
  - HF_TOKEN
  - MODEL_ID=zai-org/GLM-4.5-Air-FP8
replicas:
  - count: 1
    commands:
      - pip install sglang_router
      - |
        python -m sglang_router.launch_router \
          --host 0.0.0.0 \
          --port 8000 \
          --pd-disaggregation \
          --prefill-policy cache_aware
    router:
      type: sglang
    resources:
      cpu: 4
  - count: 1..4
    scaling:
      metric: rps
      target: 3
    commands:
      - |
        python -m sglang.launch_server \
          --model-path $MODEL_ID \
          --disaggregation-mode prefill \
          --disaggregation-transfer-backend nixl \
          --host 0.0.0.0 \
          --port 8000 \
          --disaggregation-bootstrap-port 8998
    resources:
      gpu: H200
  - count: 1..8
    scaling:
      metric: rps
      target: 2
    commands:
      - |
        python -m sglang.launch_server \
          --model-path $MODEL_ID \
          --disaggregation-mode decode \
          --disaggregation-transfer-backend nixl \
          --host 0.0.0.0 \
          --port 8000
    resources:
      gpu: H200
port: 8000
model: zai-org/GLM-4.5-Air-FP8
# Custom probe is required for PD disaggregation.
probes:
  - type: http
    url: /health
    interval: 15s
Note: this setup requires the service fleet or cluster to provide a CPU node for the
router replica.
Kubernetes
The kubernetes backend adds support for both network and instance volumes.
Network volumes
You can either create a new network volume or register an existing one. To create a new
network volume, specify size and optionally storage_class_name and/or
access_modes:
type: volume
backend: kubernetes
name: my-volume
size: 100GB
This automatically creates a PersistentVolumeClaim and associates it with the volume.
If you don't specify storage_class_name, the decision is delegated to the
DefaultStorageClass admission controller, if enabled.
If you don't specify access_modes, it defaults to [ReadWriteOnce]. To attach
volumes to multiple runs at the same time, set it to [ReadWriteMany] or
[ReadWriteMany, ReadOnlyMany].
To reuse an existing PersistentVolumeClaim, specify its name in claim_name:
type: volume
backend: kubernetes
name: my-volume
claim_name: existing-pvc
Once a volume configuration is applied, you can attach it to your runs via volumes:
type: dev-environment
name: vscode-vol
ide: vscode
volumes:
  - name: my-volume
    path: /volume_data
Instance volumes
In addition to network volumes, the kubernetes backend now supports instance volumes:
type: dev-environment
name: vscode-vol
ide: vscode
volumes:
  - instance_path: /mnt/volume
    path: /volume_data
Unlike network volumes, which persist across instances, instance volumes persist data
only within a particular instance. They are useful for storing caches or when you
manually mount a shared filesystem into the instance path.
Note: using volumes with the kubernetes backend requires the corresponding
permissions.
Performance
Fetching backend offers for the first time has been optimized and is now much faster. As
a result, dstack apply, dstack offer, and the offers UI are all more responsive.
Here are the improvements for some of the major backends:
- aws — 41.43s => 6.61s (6.3x)
- azure — 12.49s => 5.50s (2.3x)
- gcp — 13.51s => 5.20s (2.6x)
- nebius — 10.74s => 3.80s (2.8x)
- runpod — 9.36s => 0.09s (104x)
- verda — 9.49s => 2.33s (4.1x)
Fleets
In-place update
Backend fleets now support initial in-place updates. You can update nodes,
reservation, tags, resources, backends, regions, availability_zones,
instance_types, spot_policy, and max_price without re-creating the entire fleet.
If existing idle instances do not match the updated configuration, dstack replaces
them.
Default resources
Fleets used to have default resources set to cpu=2.. mem=8GB.. disk=100GB.. when
left unspecified. This meant any offers with fewer resources were excluded from such
fleets. If you wanted to run on a mem=4GB VM, you had to specify resources in both
the run and fleet configurations.
Now fleets have no default resources, so all offers are available by default. If you
need to add extra constraints on which offers can be provisioned in a fleet, specify
resources explicitly.
Run configurations continue to have default minimum resources set to
cpu=2.. mem=8GB.. disk=100GB.. to avoid provisioning instances that are too small.
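The effect of dropping fleet defaults can be illustrated with a small filter over offers. This is a simplified model rather than dstack's matching code: MinResources, Offer, satisfies, and eligible are hypothetical names, and only the lower bounds from the cpu=2.. mem=8GB.. disk=100GB.. notation are modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MinResources:
    """Lower bounds only, mirroring the `cpu=2.. mem=8GB.. disk=100GB..` notation."""
    cpu: int = 0
    mem_gb: int = 0
    disk_gb: int = 0


@dataclass(frozen=True)
class Offer:
    name: str
    cpu: int
    mem_gb: int
    disk_gb: int


def satisfies(offer: Offer, req: Optional[MinResources]) -> bool:
    """An offer matches when it meets every lower bound; None means no constraint."""
    if req is None:
        return True
    return (
        offer.cpu >= req.cpu
        and offer.mem_gb >= req.mem_gb
        and offer.disk_gb >= req.disk_gb
    )


def eligible(offers, fleet_req, run_req):
    """Names of offers that pass both the fleet and the run constraints."""
    return [o.name for o in offers if satisfies(o, fleet_req) and satisfies(o, run_req)]


OLD_FLEET_DEFAULT = MinResources(cpu=2, mem_gb=8, disk_gb=100)
NEW_FLEET_DEFAULT = None  # fleets no longer impose default resources
RUN_DEFAULT = MinResources(cpu=2, mem_gb=8, disk_gb=100)

offers = [Offer("small", 2, 4, 100), Offer("large", 4, 16, 200)]
run_wants_4gb = MinResources(cpu=2, mem_gb=4, disk_gb=100)

print(eligible(offers, OLD_FLEET_DEFAULT, run_wants_4gb))
print(eligible(offers, NEW_FLEET_DEFAULT, run_wants_4gb))
```

Under the old fleet default, the 4GB offer is excluded even though the run allows it. With no fleet default, only the run's own constraints decide.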
Offers
The dstack offer CLI command now supports the --fleet argument, which allows you to
see only offers from the specified fleets.
dstack offer --fleet my-fleet --fleet another-project/other-fleet
The same is now supported in the UI on both the Offers and Launch pages.
Exports
Importers can now delete an import via
dstack import delete <export-project>/<export-name>. This is useful when an export
was created by the exporter, but the importer no longer needs it and does not want to
wait until the exporter deletes it.
AWS
RTX Pro 6000
The aws backend adds support for g7e.* instances offering RTXPRO6000 GPUs.
Docker
Default Docker registry
If you'd like to cache Docker images through your own Docker registry, you can now
configure it when starting the dstack server:
export DSTACK_SERVER_DEFAULT_DOCKER_REGISTRY=<registry base hostname>
export DSTACK_SERVER_DEFAULT_DOCKER_REGISTRY_USERNAME=<registry username>
export DSTACK_SERVER_DEFAULT_DOCKER_REGISTRY_PASSWORD=<registry password>
These settings should only be used for registries that act as a pull-through cache for
Docker Hub. This is useful if you would like to avoid rate limits when you have too
many image pulls.
Migration note
Warning
Since v0.20.0, dstack has required fleets before runs can be submitted.
Until now, the deprecated DSTACK_FF_AUTOCREATED_FLEETS_ENABLED feature flag allowed submitting runs without fleets. In 0.20.17, this flag has been removed.
What's changed
- Drop deprecated scheduled tasks by @r4victor in #3749
- [Docs]: Rename REST API -> HTTP API by @jvstme in #3748
- Rework runner job submission flow by @un-def in #3743
- Default Docker registry and credentials by @jvstme in #3747
- Detect Verda provisioning errors earlier by @jvstme in #3753
- Optimize Python DB tests by @r4victor in #3755
- Add case study on Graphsignal's use of dstack for inference benchmarking by @peterschmidt85 in #3751
- Allow combining on/off idle_duration between runs and fleets by @r4victor in #3756
- Fix no offers retry for scheduled runs by @r4victor in #3759
- Support dynamic run waiting CLI status with extra renderables by @r4victor in #3760
- Kubernetes: add instance volumes support by @un-def in #3758
- Init gateways in background by @r4victor in #3762
- Store source backend config by @r4victor in #3764
- Show offers in dstack apply for elastic container fleets by @peterschmidt85 in #3754
- Support cloud fleet in-place update by @r4victor in #3766
- Set up HTTP ALB listener for ACM gateway by @r4victor in #3767
- Evict jobs if instance is no longer imported by @jvstme in #3772
- Implement cloud fleet in-place update for provisioning fields by @r4victor in #3775
- Drop fleet default min resources by @r4victor in #3776
- Support --fleet in dstack offer by @peterschmidt85 in #3774
- Support imported fleets in dstack fleet get by @jvstme in #3773
- Limit fleet consolidation attempts by @r4victor in #3777
- [Docs]: Examples cleanup and installation updates by @peterschmidt85 in #3765
- Support AWS G7e (RTXPRO6000) instances by @jvstme in #3752
- Support imported fleets in dstack event by @jvstme in #3779
- Drop autocreated fleets by @r4victor in #3782
- Support fleet filters in the Offers and Launch UI by @peterschmidt85 in #3780
- Support router as replica with pipelines...
0.20.16
Server
Performance
This release introduces a major overhaul of dstack server background processing. A single server
replica can now handle ~10x more resources, supporting at least 1000 active instances and runs. In
benchmarks, we observed 2x-10x faster processing (see #3551).
- Provisioning 200 instances: 12 minutes -> 4 minutes.
- Running a 200-node task: >25 minutes -> 4 minutes.
- Terminating 50 instances: 60 seconds -> 10 seconds.
The performance gains come from a new, more efficient background processing architecture. Server
hardware requirements and memory consumption remain the same.
If you need to temporarily revert this behavior, set
DSTACK_FF_PIPELINE_PROCESSING_DISABLED=1 before starting the server.
Upgrade notes
Warning
This release includes significant internal changes to the dstack server. Test in a staging
environment before upgrading production whenever possible.
Warning
Rolling upgrades from 0.20.13 or older directly to 0.20.16 are not supported. Do not run
replicas on 0.20.13 (or older) and 0.20.16 at the same time. Upgrade to 0.20.15 first, or
scale server replicas down to 1 before upgrading.
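The rolling-upgrade constraint can be expressed as a pre-flight check over the versions of the server replicas that would run at the same time. This is a sketch of the rule as stated above, not a dstack utility; replicas_can_coexist and parse are hypothetical names, and the check covers only the 0.20.16 release named in the warning.

```python
def parse(version: str) -> tuple[int, ...]:
    """Turn "0.20.16" into (0, 20, 16) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


OLD_CUTOFF = parse("0.20.13")  # this version and older cannot coexist with 0.20.16
NEW_RELEASE = parse("0.20.16")


def replicas_can_coexist(versions: list[str]) -> bool:
    """False if a 0.20.13-or-older replica would run alongside a 0.20.16 replica."""
    parsed = [parse(v) for v in versions]
    has_old = any(v <= OLD_CUTOFF for v in parsed)
    has_new = any(v == NEW_RELEASE for v in parsed)
    return not (has_old and has_new)


print(replicas_can_coexist(["0.20.13", "0.20.16"]))
print(replicas_can_coexist(["0.20.15", "0.20.16"]))
```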
SSH proxy
Servers can enforce proxy-only SSH access by combining SSH proxy with the new
DSTACK_SERVER_SSHPROXY_ENFORCED flag. When enabled, runs omit user-provided keys from authorized
lists and expect clients to connect via the proxy endpoint that run details expose. For more details, see the server deployment guide.
Note
SSH proxy is experimental, and behavior may change in future releases.
UI
SSH keys
User settings now include an SSH keys tab where you can upload OpenSSH public keys, see their fingerprints, and remove keys that no longer belong to you. Uploaded keys let you open SSH sessions without relying on the client key that dstack attach manages automatically, and duplicate keys are rejected with a clear error.
CLI
dstack attach
When SSH proxy is enabled on the server, dstack attach now routes through the proxy automatically and receives the proxy host, port, and upstream ID from run connection info. Servers can opt into proxy-only access by setting DSTACK_SERVER_SSHPROXY_ENFORCED, which stops embedding direct SSH keys in runs.
export DSTACK_SERVER_SSHPROXY_ENFORCED=1
Backends
RunPod
RunPod backends can now provision on-demand CPU offerings in secure cloud regions, so jobs that request gpu: 0 schedule successfully without tricking the scheduler. Disk size checks respect the per-offer limits RunPod publishes.
resources:
  gpu: 0
  cpu: 8
  memory: 32GB
Verda
Verda startup scripts and SSH keys are now generated per instance and removed reliably on teardown, preventing stale credentials and improving cleanup when a rollout provisions multiple machines.
Major bug-fixes
- Improved Git-related CLI repo errors with actionable messages for missing credentials, detached HEAD state, and non-repository directories (#3730).
What's changed
- [Internal] Don't reload server on cli package changes by @un-def in #3706
- Fix SELinux denials and "Text file busy" on SSH fleet provisioning by @peterschmidt85 in #3712
- Add support for user-provided SSH public keys by @un-def in #3688
- Move stop_runner() to JobTerminating pipeline by @r4victor in #3714
- Add web UI for user public keys by @un-def in #3713
- [Landing] Update headings and descriptions for clarity in README, installation, and quickstart guides to amplify agentic orchestration (WIP) by @peterschmidt85 in #3710
- Add pipelines optimizations by @r4victor in #3719
- Reject user interaction in runner_ssh_tunnel by @un-def in #3716
- Use sshproxy for CLI attach if enabled by @un-def in #3711
- Enable pipelines by default by @r4victor in #3728
- Do not wait in VerdaCompute.create_instance by @jvstme in #3723
- Pass delete_permanently when deleting Verda instances by @peterschmidt85 in #3734
- Fix pipelines not running on Python <= 3.10 by @r4victor in #3736
- Tests: bump pytest-asyncio>=0.25.2 by @un-def in #3733
- Fix docs Swagger UI rendering for REST API pages by @peterschmidt85 in #3729
- Guard cached get_offers with an execution lock by @r4victor in #3738
- Fix JobRunningPipeline not reclaiming stale jobs for terminating runs by @r4victor in #3741
- runpod: support on-demand CPU offers and provisioning by @peterschmidt85 in #3726
- Add JobMetricsPoint.job_id index by @r4victor in #3742
- Fix SENTRY_TRACES_BACKGROUND_SAMPLE_RATE not respected by @r4victor in #3744
- Update Server Deployment guide for pipelines by @r4victor in #3745
- [Docs] Add dstack-sshproxy deployment guide by @un-def in #3720
- Revamp repo errors handling by @un-def in #3730
- [chore]: Fix add_row_from_dict() typing issues by @jvstme in #3739
- Handle concurrent repo blob/file archive uploads by @un-def in #3737
- Verda: make startup script and SSH key lifecycle per-instance with reliable cleanup by @peterschmidt85 in #3718
Full changelog: 0.20.15...0.20.16
0.20.15
Backends
CloudRift
The cloudrift backend now supports provisioning AMD MI350X GPUs.
Major bug-fixes
- [AMD] Handle amd-smi 7.x output format (#3701). Both amd-smi output formats are now supported: flat array (ROCm 6.x) and wrapped {"gpu_data": [...]} (ROCm 7.x).
- [CloudRift] CloudRift VMs boot with incorrect RTC clock (~1h ahead). Added NTP sync wait before launching the shim.
- [UI] Support model and some other YAML properties in the Launch wizard (e.g. to allow deploying models via templates).
- [SSH fleets] On SELinux-enforcing hosts (RHEL, Rocky, CentOS), the shim service failed to start with "Permission denied" because files moved from /tmp kept their temporary SELinux labels.
- [UI] Services using HTTPS with an AWS ACM certificate were incorrectly displayed with an http:// URL in the UI.
- [SSH fleets] Setting blocks at the top level of an SSH fleet config was silently ignored, causing "No matching instance offers available" errors. It only worked when set per host.
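The two amd-smi output shapes mentioned in the first fix can be normalized with a few lines of Python. This is a minimal sketch, not the code from #3701: extract_gpu_entries is a hypothetical helper, and the "gpu" field inside each entry is an invented placeholder used only to make the sample inputs concrete.

```python
import json
from typing import Any


def extract_gpu_entries(raw: str) -> list[dict[str, Any]]:
    """Normalize amd-smi JSON output to a flat list of per-GPU dicts."""
    data = json.loads(raw)
    # ROCm 7.x wraps the list: {"gpu_data": [...]}
    if isinstance(data, dict) and "gpu_data" in data:
        data = data["gpu_data"]
    # ROCm 6.x emits the list directly: [...]
    if not isinstance(data, list):
        raise ValueError("unexpected amd-smi output shape")
    return data


flat = '[{"gpu": 0}, {"gpu": 1}]'                    # ROCm 6.x style
wrapped = '{"gpu_data": [{"gpu": 0}, {"gpu": 1}]}'   # ROCm 7.x style
print(extract_gpu_entries(flat) == extract_gpu_entries(wrapped))
```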
What's changed
- Fix pipeline fetcher deadlock by @r4victor in #3704
- [CloudRift] Fix NTP clock skew breaking Docker; handle amd-smi 7.x output by @peterschmidt85 in #3701
- Fix SELinux denials on SSH fleet provisioning by @peterschmidt85 in #3702
- [UI] Add missing fields to Launch UI supported fields list by @peterschmidt85 in #3671
- Fix http protocol shown for https services with acm cert by @r4victor in #3709
- Respect top-level blocks in SSH fleet configuration by @un-def in #3700
Full changelog: 0.20.14...0.20.15
0.20.14
Dev environments
You can create dev environments without specifying ide to connect via SSH only:
type: dev-environment
name: dev
python: "3.12"
resources:
  gpu: H100NVL:1
Exports
The following improvements are part of the exports feature, which enables sharing fleets across projects.
You can now list imported resources using:
$ dstack import list
 NAME                 FLEETS
 project-a/my-export  my-fleet, another-fleet
Imported fleets can be used directly in your workflows via cross-project references:
type: dev-environment
name: dev
python: "3.12"
resources:
  gpu: H100NVL:1
fleets: [project-a/my-fleet]
Or via CLI:
dstack apply -f .dstack.yml --fleet project-a/my-fleet
Imported fleets are also included in dstack offer. Plus, dstack event shows imported fleets with a project prefix.
Backends
Azure
Added support for Azure GPU VM series based on the latest NVIDIA hardware:
- NCads H100 v5 (H100 NVL GPUs, up to 2 GPUs per VM)
- ND H100 v5 (8× H100 GPUs for large-scale training workloads)
- ND H200 v5 (next-gen H200 GPUs with increased memory and bandwidth for large models)
type: fleet
name: my-fleet
nodes: 1
resources:
  gpu: H100NVL:1
Major bug-fixes
- Fixed AWS private gateway provisioning failures caused by the AWS 32-character load balancer name limit and by setups with multiple private subnets per availability zone
- Restored proper instance startup handling on AWS, resolving issues such as volumes not attaching reliably
- Removed unnecessary Docker metadata requests for default images, reducing the risk of registry rate limits
What's changed
- Add Exports concept page and CLI reference by @peterschmidt85 in #3659
- Refactor process_submitted_jobs by @r4victor in #3666
- Add API for SSH proxy by @un-def in #3646
- Add per-job hourly log quota enforced on runner by @peterschmidt85 in #3668
- Implement submitted jobs pipeline by @r4victor in #3670
- Do not filter by fleet in filter_instances() by @jvstme in #3674
- Show imported instances in dstack offer by @jvstme in #3673
- Prepare process_runs for pipelines migration by @r4victor in #3675
- Fix the "No fleets" warning in UI by @jvstme in #3669
- Fix the "No Fleets" warning in CLI by @jvstme in #3667
- Show targets with project prefix in dstack event by @jvstme in #3680
- Make ide optional for dev-environment by @peterschmidt85 in #3656
- Fix 500 server error when re-applying replica groups service by @Bihan in #3678
- Cross-project fleet references in CLI and YAML by @jvstme in #3677
- Add SSH and IDE connection info to runs API by @un-def in #3681
- Implement run pipeline by @r4victor in #3686
- Do not request image config for default images by @jvstme in #3684
- /imports/list API and dstack import list CLI by @jvstme in #3682
- Check lock_expires_at in deprecated background tasks by @r4victor in #3689
- Support external pipelines registration by @r4victor in #3691
- Handle TimeoutError on pipeline draining by @r4victor in #3692
- Drop redundant instance select with lock by @r4victor in #3693
- Restore instance.wait_until_running() in AWS create_instance by @r4victor in #3694
- Complete fix replica groups issue 3676 by @Bihan in #3687
- Fix AWS private gateway provisioning by @r4victor in #3698
- [Azure] Add support for H100 NVL and H200 VM series; refactor instance creation methods to cleanup failed instances by @peterschmidt85 in #3699
Full changelog: 0.20.13...0.20.14
0.20.13
Exports
SSH fleet sharing
You can now share SSH fleets across projects using the new exports system:
$ dstack export \
--project team-a \
create shared-gpus \
--fleet gpu-fleet-1 \
--fleet gpu-fleet-2 \
--importer team-b \
--importer team-c
 NAME         FLEETS                    IMPORTERS
 shared-gpus  gpu-fleet-1, gpu-fleet-2  team-b, team-c
From the importer project’s perspective, exported fleets appear in dstack fleet list and can be used for runs just like the project’s own fleets:
$ dstack fleet --project team-b list
 NAME                NODES  GPU          SPOT  BACKEND  PRICE  STATUS  CREATED
 my-local-fleet      1      -            -     ssh      -      active  3 days ago
 team-a/gpu-fleet-1  2      A100:80GB:8  -     ssh      -      active  1 week ago
 team-a/gpu-fleet-2  1      H100:80GB:4  -     ssh      -      active  2 days ago
Learn more about exports in the docs.
UI
Project templates
Project settings now allow you to configure a custom templates repository.
Task connect
Task pages now include a Connect section that guides you through accessing the ports exposed by the task.
Backends
Crusoe
The Crusoe backend now supports H200 and B200 GPUs. The cluster docs have been updated to demonstrate native Crusoe backend usage for configuring a high-performance InfiniBand cluster.
Vast.ai
The new community_cloud backend setting allows you to restrict usage to secure cloud offers only:
backends:
  - type: vastai
    community_cloud: false # Use only secure cloud offers
The default remains community_cloud: true.
What's changed
- [UI]: Refresh button does not refresh run logs, metrics, or events by @olgenn in #3618
- Update Crusoe cluster docs by @peterschmidt85 in #3620
- Report runtime working_dir and username from runner via JobRuntimeData by @peterschmidt85 in #3617
- Implement fleet pipeline by @r4victor in #3623
- Replace RunPod -> Runpod in docs, blog, comments by @jvstme in #3625
- Extend fleet and instance permission tests by @jvstme in #3627
- Reorganize Go codebase by @un-def in #3628
- Display imported fleets with project prefix in CLI by @jvstme in #3630
- Fix Crusoe CPU instances and add H200/B200 support by @peterschmidt85 in #3619
- Do not show SSH fleet resources in dstack fleet by @jvstme in #3632
- [UI] Unify Connect UX across run configuration types (plus fix) by @peterschmidt85 in #3622
- [UI]: Implement dynamic property filter options by @olgenn in #3621
- Fleet sharing main mechanisms by @jvstme in #3629
- Prevent Hot Aisle min reservation period error by @jvstme in #3633
- Implement instance pipeline by @r4victor in #3636
- Migrate attribute comments to docstrings by @r4victor in #3639
- Allow concurrent run and TERMINATING jobs processing by @r4victor in #3641
- [UI] Add per-project templates repo support by @peterschmidt85 in #3640
- Implement terminating jobs pipeline by @r4victor in #3643
- [UI] Make Launch to respect resources in templates by @peterschmidt85 in #3642
- [Skills] AMD image selection, files, repos, image guidance by @peterschmidt85 in #3634
- Refactor process_running_jobs background task by @r4victor in #3648
- Exports API and CLI by @jvstme in #3647
- Add community_cloud to VastAI backend (default true) by @peterschmidt85 in #3635
- Do not show resources for SSH fleets in UI by @peterschmidt85 in #3649
- [Blog] Infrastructure orchestration is an agent skill by @peterschmidt85 in #3645
- Fix error when imported fleet has no capacity by @jvstme in #3652
- Improve CLI settings section and fix status indicator colors by @peterschmidt85 in #3650
- Upgrade litestream 0.5.0 → 0.5.9 and simplify entrypoint restore by @peterschmidt85 in #3653
- Fix services on imported fleets by @jvstme in #3654
- Implement running jobs pipeline by @r4victor in #3657
- Fix error submitting run to empty imported fleet by @jvstme in #3661
- Fix updating pre-0.18.2 gateways (including Sky) by @jvstme in #3658
- Fix CLI compatibility with older servers by @jvstme in #3664
Full Changelog: 0.20.12...0.20.13
0.20.12
Backends
Crusoe
dstack now supports Crusoe as a backend, enabling VM-based provisioning with GPU instances. The backend supports both single-node and multi-node cluster provisioning with InfiniBand.
type: fleet
name: my-crusoe-fleet
backends: [crusoe]
resources:
  gpu: A100:8
nodes: 2
placement: cluster
Note
CPU instances, H200, B200, GB200, MI300X, MI355X and volumes support is coming soon.
UI
Launch wizard
The UI now includes a launch wizard that lets users create runs from pre-defined templates. Instead of writing YAML from scratch, users can select a template, pick GPU resources, adjust settings, and review the final configuration, all through a guided flow.
To enable the launch wizard, point the server to a templates repository:
$ DSTACK_SERVER_TEMPLATES_REPO=https://github.com/dstackai/dstack-templates dstack server
Templates are YAML files under .dstack/templates in the repo. Each template has type set to template, a unique name, a title, configurable parameters, and a configuration that defines the dstack run:
type: template
name: in-browser-ide
title: In-browser IDE
description: Access the instance using VS Code in the browser.
parameters:
  - type: name
  - type: resources
  - type: python_or_docker
  - type: repo
  - type: working_dir
  - type: env
    title: Password
    name: PASSWORD
    value: $random-password
configuration:
  type: service
  auth: false
  gateway: true
  https: auto
  env:
    - BIND_ADDR=0.0.0.0:8080
  commands:
    - |
      echo "Your password is $PASSWORD. Share it carefully as it grants full access to the IDE."
    - |
      curl -fsSL https://code-server.dev/install.sh | sh -s -- --method standalone --prefix /tmp/code-server
    - |
      /tmp/code-server/bin/code-server --bind-addr $BIND_ADDR --auth password --disable-telemetry --disable-update-check .
  port: 8080
  probes:
    - type: http
      url: /healthz
See dstack-templates for an example repository.
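To make the template structure described above concrete, here is a minimal Python sketch that checks an already-parsed template (a plain dict) for the fields the notes list: type, name, title, parameters, and configuration. Whether dstack treats every one of these as mandatory is an assumption of this sketch, and check_template and duplicate_names are hypothetical helpers, not dstack APIs.

```python
from typing import Any, Dict, List

# Fields the release notes say each template has. Treating all of them as
# required is an assumption of this sketch, not confirmed dstack behavior.
REQUIRED_FIELDS = ("type", "name", "title", "parameters", "configuration")


def check_template(template: Dict[str, Any]) -> List[str]:
    """Return a list of problems found in one parsed template mapping."""
    problems = [
        f"missing field: {field}"
        for field in REQUIRED_FIELDS
        if field not in template
    ]
    # Only flag a wrong type when the field is present; a missing one is
    # already reported above.
    if template.get("type", "template") != "template":
        problems.append("type must be 'template'")
    return problems


def duplicate_names(templates: List[Dict[str, Any]]) -> List[str]:
    """Return template names that occur more than once, in first-seen order."""
    seen: set = set()
    dupes: List[str] = []
    for template in templates:
        name = template.get("name")
        if name is None:
            continue
        if name in seen and name not in dupes:
            dupes.append(name)
        seen.add(name)
    return dupes


if __name__ == "__main__":
    ide = {
        "type": "template",
        "name": "in-browser-ide",
        "title": "In-browser IDE",
        "parameters": [{"type": "name"}, {"type": "resources"}],
        "configuration": {"type": "service", "port": 8080},
    }
    print(check_template(ide))
    print(duplicate_names([ide, dict(ide)]))
```

A check along these lines could run in the templates repository's CI before the server is pointed at it. Loading the YAML files themselves is left out to keep the sketch dependency-free.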
Note
The launch wizard is an experimental feature. Currently, templates are configured per server. Per-project templates configuration is coming soon.
Instances
The UI now has an Instance details page where you can view detailed information about an instance, including its events and inspect data. Instance names across the UI—including on Events pages—now link directly to this page.
What's changed
- Document Adding indexes by @r4victor in #3594
- [Blog] Model inference with Prefill-Decode disaggregation by @peterschmidt85 in #3595
- [runner] Drop buildLDLibraryPathEnv() by @un-def in #3593
- Fix mutually exclusive fields validation by @jvstme in #3598
- [Docs] PD disaggregation by @Bihan in #3592
- [Docs] Clarify how K8s resources and offers work by @un-def in #3565
- Allow https: auto for services by @peterschmidt85 in #3600
- Implement gateway pipeline by @r4victor in #3599
- Allow detecting whether service https is unset by @jvstme in #3601
- Implement volume pipeline by @r4victor in #3604
- [Website] Minor edits by @peterschmidt85 in #3609
- [Website] Add robots.txt and structured data by @peterschmidt85 in #3610
- Add templates API and launch wizard UI by @peterschmidt85 in #3605
- [UI] Include availability issues information to offer cards by @peterschmidt85 in #3607
- Fix SSH fleet with proxy_jump in-place update check by @un-def in #3612
- Bump gpuhunt==0.1.17 by @r4victor in #3615
- [UI] Add Instance details page by @peterschmidt85 in #3614
- Add Crusoe Cloud backend by @peterschmidt85 in #3602
Full changelog: 0.20.11...0.20.12
