Services
NVIDIA Dynamo
This update adds support for Prefill-Decode (PD) disaggregated inference with NVIDIA Dynamo.
Previously, dstack supported PD disaggregation only with Shepherd Model Gateway as the router and SGLang as the inference engine for workers. With this update, a replica group can declare router: { type: dynamo }, allowing workers to use inference engines such as SGLang, vLLM, or TensorRT-LLM.
type: service
name: dynamo-pd

env:
  - HF_TOKEN
  - MODEL_ID=zai-org/GLM-4.5-Air-FP8

replicas:
  - count: 1
    docker: true
    commands:
      - apt-get update
      - apt-get install -y python3-dev python3-venv
      - python3 -m venv ~/dyn-venv
      - source ~/dyn-venv/bin/activate
      - pip install -U pip
      - pip install "ai-dynamo[sglang]==1.1.1"
      - git clone https://github.com/ai-dynamo/dynamo.git
      # Brings up the NATS / etcd compose stack and runs the Dynamo HTTP frontend.
      - docker compose -f dynamo/deploy/docker-compose.yml up -d
      - |
        python3 -m dynamo.frontend \
          --http-host 0.0.0.0 --http-port 8000 \
          --discovery-backend etcd --router-mode kv \
          --kv-cache-block-size 64
    resources:
      cpu: 4
    router:
      type: dynamo

  - count: 1..4
    scaling:
      metric: rps
      target: 3
    python: "3.12"
    nvcc: true
    commands:
      # dstack injects DSTACK_ROUTER_INTERNAL_IP after the router replica
      # is provisioned. Compose the etcd/NATS endpoints from it.
      - export ETCD_ENDPOINTS="http://$DSTACK_ROUTER_INTERNAL_IP:2379"
      - export NATS_SERVER="nats://$DSTACK_ROUTER_INTERNAL_IP:4222"
      # Set to enable /health endpoint required by dstack probes.
      - export DYN_SYSTEM_PORT="8000"
      # Wait until the router's etcd and NATS ports are actually accepting connections.
      - |
        until (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/2379) 2>/dev/null \
          && (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/4222) 2>/dev/null; do
          echo "waiting for etcd/NATS on $DSTACK_ROUTER_INTERNAL_IP..."; sleep 3
        done
      - pip install "ai-dynamo[sglang]==1.1.1"
      - |
        python3 -m dynamo.sglang \
          --model-path $MODEL_ID --served-model-name $MODEL_ID \
          --discovery-backend etcd --host 0.0.0.0 \
          --page-size 64 \
          --disaggregation-mode prefill --disaggregation-transfer-backend nixl
    resources:
      gpu: H200

  - count: 1..8
    scaling:
      metric: rps
      target: 2
    python: "3.12"
    nvcc: true
    commands:
      - export ETCD_ENDPOINTS="http://$DSTACK_ROUTER_INTERNAL_IP:2379"
      - export NATS_SERVER="nats://$DSTACK_ROUTER_INTERNAL_IP:4222"
      - export DYN_SYSTEM_PORT="8000"
      - |
        until (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/2379) 2>/dev/null \
          && (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/4222) 2>/dev/null; do
          echo "waiting for etcd/NATS on $DSTACK_ROUTER_INTERNAL_IP..."; sleep 3
        done
      - pip install "ai-dynamo[sglang]==1.1.1"
      - |
        python3 -m dynamo.sglang \
          --model-path $MODEL_ID --served-model-name $MODEL_ID \
          --discovery-backend etcd --host 0.0.0.0 \
          --page-size 64 \
          --disaggregation-mode decode --disaggregation-transfer-backend nixl
    resources:
      gpu: H200

port: 8000
model: zai-org/GLM-4.5-Air-FP8

# Custom probe is required for PD disaggregation.
probes:
  - type: http
    url: /health
    interval: 15s

dstack provisions the router replica, injects DSTACK_ROUTER_INTERNAL_IP into non-router replicas, and lets Dynamo workers connect directly to the router’s etcd and NATS services.
Refer to the Dynamo example for full deployment instructions.
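For readers who prefer to see the worker readiness check outside of shell, the following is a minimal Python sketch of the same idea as the until loop in the worker commands: poll the router's etcd (2379) and NATS (4222) ports until both accept TCP connections. The function names and the max_attempts parameter are hypothetical and are not part of dstack or Dynamo; the shell loop in the configuration above remains the documented approach.

```python
import socket
import time
from typing import Optional, Sequence


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def wait_for_ports(
    host: str,
    ports: Sequence[int],
    interval: float = 3.0,
    max_attempts: Optional[int] = None,
) -> bool:
    """Poll until every port in `ports` accepts connections.

    max_attempts=None polls forever, like the shell loop. A finite value is
    a hypothetical addition that lets the caller give up and get False.
    """
    attempt = 0
    while True:
        if all(port_is_open(host, port) for port in ports):
            return True
        attempt += 1
        if max_attempts is not None and attempt >= max_attempts:
            return False
        print(f"waiting for etcd/NATS on {host}...")
        time.sleep(interval)


# Example call on a worker replica (assumes the variable is set by dstack):
#   wait_for_ports(os.environ["DSTACK_ROUTER_INTERNAL_IP"], [2379, 4222])
```

Checking both ports in one pass mirrors the `&&` in the shell loop: a worker proceeds only when etcd and NATS are reachable at the same time.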
Replica groups
It's now possible to configure the image, docker, python, nvcc, and privileged properties at the replica group level. This enables complex multi-component services like NVIDIA Dynamo, where different replicas require different runtime environments.
Exports
Gateways
Gateways can now be exported and shared across projects, enabling centralized gateway management in multi-project setups.
$ dstack export --project main create my-export --gateway shared-gateway --importer team
 NAME       FLEETS  GATEWAYS        IMPORTERS
 my-export  -       shared-gateway  team

Now, if you list gateways in the team project, you'll see the exported gateway:
$ dstack gateway --project team
 NAME                 BACKEND          HOSTNAME        DOMAIN                 DEFAULT  STATUS
 main/shared-gateway  aws (eu-west-1)  108.131.126.35  gtw.mycompany.example           running

Additionally, gateway domains now support optional project name interpolation using ${{ run.project_name }}, allowing different projects to use different domains on the same shared gateway.
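To make the effect of the placeholder concrete, here is a small Python sketch of how a domain template could resolve for a given project. It is an illustration only: resolve_gateway_domain is a hypothetical helper, it matches the placeholder as an exact string, and dstack's actual interpolation logic may differ (for example, in how it treats whitespace inside the braces).

```python
PLACEHOLDER = "${{ run.project_name }}"


def resolve_gateway_domain(domain_template: str, project_name: str) -> str:
    """Substitute the project-name placeholder in a gateway domain.

    Templates without the placeholder are returned unchanged, which matches
    the interpolation being optional.
    """
    return domain_template.replace(PLACEHOLDER, project_name)


# With the shared gateway imported into the "team" project:
print(resolve_gateway_domain("${{ run.project_name }}.mycompany.example", "team"))
```

A gateway configuration that uses the placeholder in its domain looks like this: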
type: gateway
name: shared-gateway
backend: aws
region: eu-west-1
domain: ${{ run.project_name }}.mycompany.example

Global exports
Users with global admin privileges can now export SSH fleets and gateways to all projects at once, enabling organization-wide resource sharing.
$ dstack export create global-export --gateway shared-gateway --global
 NAME           FLEETS  GATEWAYS        IMPORTERS
 global-export  -       shared-gateway  *

AWS
EFA clusters
Previously, fleets that used EFA (Elastic Fabric Adapter) with multiple network interfaces required public_ips: False. With this release, dstack allows creating such fleets with public IPs. This simplifies the use of interconnected clusters on AWS by removing the need to run the dstack server and CLI inside a private VPC.
Kubernetes
Permissions
dstack now requires the watch permission for pods within the namespace. See Required permissions for up-to-date ClusterRole and Role manifests.
Backend configuration
The namespace property of the kubernetes backend configuration is now formally deprecated. It still takes effect and remains the source of truth in this version, but future versions will read the namespace from the current kubeconfig context instead.
Migration guide
- If namespace is unset or set to default in both the backend config and the kubeconfig, no action is required; default continues to be used.
- If namespace is set to the same value (e.g. ns-a) in both the backend config and the kubeconfig, no action is required.
- If namespace is set to ns-a in the backend config but the kubeconfig has a different value (or none), set the namespace to ns-a in your kubeconfig context to prepare for future versions.
- It is only safe to remove namespace from the backend config if its value is default.
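The rules above can be read as a small decision table. The Python sketch below encodes them under two assumptions that go slightly beyond the text: an unset namespace is treated as default on both sides, and the case where the backend config is unset or default while the kubeconfig names another namespace is handled like the third rule (align the kubeconfig with the backend config). The function names are hypothetical and not part of dstack.

```python
from typing import Optional

DEFAULT_NS = "default"


def effective(ns: Optional[str]) -> str:
    """Treat an unset (None or empty) namespace as 'default'."""
    return ns or DEFAULT_NS


def migration_action(backend_ns: Optional[str], kubeconfig_ns: Optional[str]) -> str:
    """Return the recommended action for a backend/kubeconfig namespace pair."""
    backend = effective(backend_ns)
    kube = effective(kubeconfig_ns)
    if backend == kube:
        # Rules 1 and 2: both sides already agree.
        return "no action required"
    # Rule 3 (and, by assumption, the mirrored case): the backend config is
    # the current source of truth, so the kubeconfig should be aligned to it.
    return f"set namespace to {backend!r} in the kubeconfig context"


def safe_to_remove_backend_namespace(backend_ns: Optional[str]) -> bool:
    """Rule 4: removal is safe only when the backend value is 'default'."""
    return effective(backend_ns) == DEFAULT_NS


print(migration_action("ns-a", None))
print(safe_to_remove_backend_namespace("ns-a"))
```

When the kubeconfig needs updating, the namespace of the current context can be set with kubectl config set-context --current --namespace=ns-a.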
What's changed
- [Services] Allow to specify image, docker, python, nvcc, privileged at replica group level by @Bihan in #3832
- [Internal]: Delete some unused classes by @jvstme in #3842
- [Internal] Fix pyright failing in CI by @jvstme in #3846
- [Internal] Update RunpodApiClient by @un-def in #3847
- [Internal] Fix openai SDK failing in tests by @jvstme in #3849
- [RunPod] Handle deleting non-existent volume by @r4victor in #3853
- [Runpod] Fix broken registry_auth support by @un-def in #3844
- [UX] Raise ImportError on Python 3.14 or later by @r4victor in #3855
- [Exports] Gateway support by @jvstme in #3845
- [Internal] Rename docs/ to mkdocs/, move examples under /docs/, inline source by @peterschmidt85 in #3859
- [Kubernetes] Deprecate namespace in backend config by @un-def in #3858
- [Gateways] Allow setting imported gateway as project default by @jvstme in #3860
- [Internal] Forbid exporting the built-in dstack Sky gateway by @jvstme in #3864
- [AWS] Support multi-EFA instances with public IPs by @r4victor in #3865
- [Internal] Add server-side validation for fleet configuration subtypes by @un-def in #3848
- [Verda] Optimize terminating Verda instances by @jvstme in #3811
- [Internal] Introduce GatewayModel.forbid_new_services by @jvstme in #3863
- [Docs] Introduce CLI & API guide; rework the HTTP API reference page by @peterschmidt85 in #3869
- [Internal] Add script to set up Kubernetes cluster for dstack backend by @un-def in #3866
- Fix Pyright errors with requests==2.34.0 by @jvstme in #3873
- Add project name interpolation in gateway domains by @jvstme in #3870
- [Bugfix] Fix duplicate headers with in-server proxy by @jvstme in #3872
- [Docs]: Gateway Exports by @jvstme in #3862
- [Kubernetes] Fail fast if job pod was not scheduled by @un-def in #3874
- [Exports] Global exports support by @jvstme in #3879
- [Services] Support PD with NVIDIA Dynamo by @Bihan in #3868
- [Internal] Update text regarding billing based on the project type by @peterschmidt85 in #3876
- [Docs] Add NVIDIA Dynamo docs by @Bihan in #3877
- [Internal] Fix unreleased global_exports lock on Postgres by @jvstme in #3882
Full changelog: 0.20.19...0.20.20