Release 0.20.20-v1 · dstackai/dstack-enterprise

Services

NVIDIA Dynamo

This update adds support for Prefill-Decode (PD) disaggregated inference with NVIDIA Dynamo.

Previously, dstack supported PD disaggregation only with Shepherd Model Gateway as the router and SGLang as the inference engine for workers. With this update, a replica group can declare router: { type: dynamo }, allowing workers to use inference engines such as SGLang, vLLM, or TensorRT-LLM.

type: service
name: dynamo-pd

env:
  - HF_TOKEN
  - MODEL_ID=zai-org/GLM-4.5-Air-FP8

replicas:
  - count: 1
    docker: true
    commands:
      - apt-get update
      - apt-get install -y python3-dev python3-venv
      - python3 -m venv ~/dyn-venv
      - source ~/dyn-venv/bin/activate
      - pip install -U pip
      - pip install "ai-dynamo[sglang]==1.1.1"
      - git clone https://github.com/ai-dynamo/dynamo.git
      # Brings up the NATS / etcd compose stack and runs the Dynamo HTTP frontend.
      - docker compose -f dynamo/deploy/docker-compose.yml up -d
      - |
        python3 -m dynamo.frontend \
          --http-host 0.0.0.0 --http-port 8000 \
          --discovery-backend etcd --router-mode kv \
          --kv-cache-block-size 64
    resources:
      cpu: 4
    router:
      type: dynamo

  - count: 1..4
    scaling:
      metric: rps
      target: 3
    python: "3.12"
    nvcc: true
    commands:
      # dstack injects DSTACK_ROUTER_INTERNAL_IP after the router replica
      # is provisioned. Compose the etcd/NATS endpoints from it.
      - export ETCD_ENDPOINTS="http://$DSTACK_ROUTER_INTERNAL_IP:2379"
      - export NATS_SERVER="nats://$DSTACK_ROUTER_INTERNAL_IP:4222"
      # Set to enable /health endpoint required by dstack probes.
      - export DYN_SYSTEM_PORT="8000"
      # Wait until the router's etcd and NATS ports are actually accepting connections.
      - |
        until (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/2379) 2>/dev/null \
           && (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/4222) 2>/dev/null; do
          echo "waiting for etcd/NATS on $DSTACK_ROUTER_INTERNAL_IP..."; sleep 3
        done
      - pip install "ai-dynamo[sglang]==1.1.1"
      - |
        python3 -m dynamo.sglang \
          --model-path $MODEL_ID --served-model-name $MODEL_ID \
          --discovery-backend etcd --host 0.0.0.0 \
          --page-size 64 \
          --disaggregation-mode prefill --disaggregation-transfer-backend nixl
    resources:
      gpu: H200

  - count: 1..8
    scaling:
      metric: rps
      target: 2
    python: "3.12"
    nvcc: true
    commands:
      - export ETCD_ENDPOINTS="http://$DSTACK_ROUTER_INTERNAL_IP:2379"
      - export NATS_SERVER="nats://$DSTACK_ROUTER_INTERNAL_IP:4222"
      - export DYN_SYSTEM_PORT="8000"
      - |
        until (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/2379) 2>/dev/null \
           && (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/4222) 2>/dev/null; do
          echo "waiting for etcd/NATS on $DSTACK_ROUTER_INTERNAL_IP..."; sleep 3
        done
      - pip install "ai-dynamo[sglang]==1.1.1"
      - |
        python3 -m dynamo.sglang \
          --model-path $MODEL_ID --served-model-name $MODEL_ID \
          --discovery-backend etcd --host 0.0.0.0 \
          --page-size 64 \
          --disaggregation-mode decode --disaggregation-transfer-backend nixl
    resources:
      gpu: H200

port: 8000
model: zai-org/GLM-4.5-Air-FP8

# Custom probe is required for PD disaggregation.
probes:
  - type: http
    url: /health
    interval: 15s

dstack provisions the router replica, injects DSTACK_ROUTER_INTERNAL_IP into non-router replicas, and lets Dynamo workers connect directly to the router’s etcd and NATS services.
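The worker commands above compose the etcd and NATS endpoints from DSTACK_ROUTER_INTERNAL_IP and then poll both ports using bash's /dev/tcp. As a rough sketch only, the same two steps could be written in Python, for instance where a bash-specific loop is inconvenient. The helper names router_endpoints and wait_for_port, and the timeout values, are hypothetical and not part of dstack or Dynamo; the ports 2379 and 4222 are taken from the configuration above.

```python
import os
import socket
import time


def router_endpoints(router_ip: str) -> dict[str, str]:
    """Compose the etcd and NATS endpoints the same way the worker commands do."""
    return {
        "ETCD_ENDPOINTS": f"http://{router_ip}:2379",
        "NATS_SERVER": f"nats://{router_ip}:4222",
    }


def wait_for_port(host: str, port: int, timeout: float = 300.0, interval: float = 3.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means the service is accepting connections.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


if __name__ == "__main__":
    # Only runs when the variable is present, i.e. on a non-router replica.
    ip = os.environ.get("DSTACK_ROUTER_INTERNAL_IP")
    if ip:
        for port in (2379, 4222):
            print(f"{ip}:{port} ready:", wait_for_port(ip, port))
        print(router_endpoints(ip))
```

Unlike the shell loop, this sketch gives up after a timeout rather than waiting indefinitely; whether that is preferable depends on how the replica should behave when the router never comes up.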

Refer to the Dynamo example for full deployment instructions.

Replica groups

The image, docker, python, nvcc, and privileged properties can now be configured per replica group. This enables multi-component services such as NVIDIA Dynamo, where different replicas require different runtime environments.

Exports

Gateways

Gateways can now be exported and shared across projects, enabling centralized gateway management in multi-project setups.

$ dstack export --project main create my-export --gateway shared-gateway --importer team
 NAME       FLEETS  GATEWAYS        IMPORTERS 
 my-export  -       shared-gateway  team

Now, if you list gateways in the team project, you'll see the exported gateway, prefixed with the name of the exporting project:

$ dstack gateway --project team
 NAME                 BACKEND          HOSTNAME        DOMAIN                 DEFAULT  STATUS  
 main/shared-gateway  aws (eu-west-1)  108.131.126.35  gtw.mycompany.example           running

Additionally, gateway domains now support optional project name interpolation using ${{ run.project_name }}, allowing different projects to use different domains on the same shared gateway.

type: gateway
name: shared-gateway

backend: aws
region: eu-west-1

domain: ${{ run.project_name }}.mycompany.example
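To illustrate the effect of the placeholder, the sketch below assumes plain textual substitution of ${{ run.project_name }} with the project name. It is not dstack's implementation; resolve_domain is a hypothetical helper, and the project names main and team are borrowed from the export example above.

```python
import re

# Matches ${{ run.project_name }} with optional whitespace inside the braces.
_PLACEHOLDER = re.compile(r"\$\{\{\s*run\.project_name\s*\}\}")


def resolve_domain(template: str, project_name: str) -> str:
    """Substitute the project name into a gateway domain template."""
    # A callable replacement avoids interpreting backslashes in the name.
    return _PLACEHOLDER.sub(lambda _match: project_name, template)


template = "${{ run.project_name }}.mycompany.example"
for project in ("main", "team"):
    print(project, "->", resolve_domain(template, project))
```

Under this assumption, the main and team projects would end up with main.mycompany.example and team.mycompany.example respectively on the same shared gateway.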

Global exports

Users with global admin privileges can now export SSH fleets and gateways to all projects at once, enabling organization-wide resource sharing.

$ dstack export create global-export --gateway shared-gateway --global
 NAME           FLEETS  GATEWAYS        IMPORTERS
 global-export  -       shared-gateway  *

AWS

EFA clusters

Previously, fleets that used EFA (Elastic Fabric Adapter) with multiple network interfaces required public_ips: False. With this release, dstack allows creating such fleets with public IPs. This simplifies the use of interconnected clusters on AWS by removing the need to run the dstack server and CLI inside a private VPC.

Kubernetes

Backend configuration

The namespace property of the kubernetes backend configuration is now formally deprecated. It still takes effect and remains the source of truth in this version, but future versions will read the namespace from the current kubeconfig context instead.

Migration guide

If namespace is unset or set to default in both the backend config and the kubeconfig, no action is required — default continues to be used.
If namespace is set to the same value (e.g. ns-a) in both the backend config and the kubeconfig, no action is required.
If namespace is set to ns-a in the backend config but the kubeconfig has a different value (or none), set the namespace to ns-a in your kubeconfig context to prepare for future versions.
It is only safe to remove namespace from the backend config if its value is default.
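The rules above can be summarized as a small decision helper. This is a sketch under stated assumptions, not code from dstack: the function names are hypothetical, and an unset namespace is treated as default, as the first rule implies.

```python
from typing import Optional


def effective(ns: Optional[str]) -> str:
    """Treat an unset namespace as "default", as the first rule does."""
    return ns or "default"


def kubeconfig_namespace_to_set(
    backend_ns: Optional[str], kubeconfig_ns: Optional[str]
) -> Optional[str]:
    """Return the namespace to set in the kubeconfig context, or None if no action is needed."""
    if effective(backend_ns) == effective(kubeconfig_ns):
        return None
    # The values differ: preserve the current behavior by copying the backend value.
    # The guide spells this out for a non-default backend value; applying it when
    # the backend value is unset or default is an extrapolation.
    return effective(backend_ns)


def safe_to_remove_backend_namespace(backend_ns: Optional[str]) -> bool:
    """Per the last rule, removal is safe only when the value is "default"."""
    return backend_ns == "default"


print(kubeconfig_namespace_to_set("ns-a", None))    # ns-a
print(kubeconfig_namespace_to_set("ns-a", "ns-a"))  # None
print(safe_to_remove_backend_namespace("ns-a"))     # False
```

When the helper returns a namespace, one way to apply it with kubectl is kubectl config set-context --current --namespace=ns-a, run against the context that dstack uses.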

What's changed

[Services] Allow to specify image, docker, python, nvcc, privileged at replica group level by @Bihan in dstackai/dstack#3832
[Internal]: Delete some unused classes by @jvstme in dstackai/dstack#3842
[Internal] Fix pyright failing in CI by @jvstme in dstackai/dstack#3846
[Internal] Update RunpodApiClient by @un-def in dstackai/dstack#3847
[Internal] Fix openai SDK failing in tests by @jvstme in dstackai/dstack#3849
[RunPod] Handle deleting non-existent volume by @r4victor in dstackai/dstack#3853
[Runpod] Fix broken registry_auth support by @un-def in dstackai/dstack#3844
[UX] Raise ImportError on Python 3.14 or later by @r4victor in dstackai/dstack#3855
[Exports] Gateway support by @jvstme in dstackai/dstack#3845
[Internal] Rename docs/ to mkdocs/, move examples under /docs/, inline source by @peterschmidt85 in dstackai/dstack#3859
[Kubernetes] Deprecate namespace in backend config by @un-def in dstackai/dstack#3858
[Gateways] Allow setting imported gateway as project default by @jvstme in dstackai/dstack#3860
[Internal] Forbid exporting the built-in dstack Sky gateway by @jvstme in dstackai/dstack#3864
[AWS] Support multi-EFA instances with public IPs by @r4victor in dstackai/dstack#3865
[Internal] Add server-side validation for fleet configuration subtypes by @un-def in dstackai/dstack#3848
[Verda] Optimize terminating Verda instances by @jvstme in dstackai/dstack#3811
[Internal] Introduce GatewayModel.forbid_new_services by @jvstme in dstackai/dstack#3863
[Docs] Introduce CLI & API guide; rework the HTTP API reference page by @peterschmidt85 in dstackai/dstack#3869
[Internal] Add script to set up Kubernetes cluster for dstack backend by @un-def in dstackai/dstack#3866
Fix Pyright errors with requests==2.34.0 by @jvstme in dstackai/dstack#3873
Add project name interpolation in gateway domains by @jvstme in dstackai/dstack#3870
[Bugfix] Fix duplicate headers with in-server proxy by @jvstme in dstackai/dstack#3872
[Docs]: Gateway Exports by @jvstme in dstackai/dstack#3862
[Kubernetes] Fail fast if job pod was not scheduled by @un-def in dstackai/dstack#3874
[Exports] Global exports support by @jvstme in dstackai/dstack#3879
[Services] Support PD with NVIDIA Dynamo by @Bihan in dstackai/dstack#3868
[Internal] Update text regarding billing based on the project type by @peterschmidt85 in dstackai/dstack#3876
[Docs] Add NVIDIA Dynamo docs by @Bihan in dstackai/dstack#3877
[Internal] Fix unreleased global_exports lock on Postgres by @jvstme in dstackai/dstack#3882

Full changelog: dstackai/dstack@0.20.19...0.20.20
