0.20.20

@jvstme jvstme released this 15 May 11:45
· 9 commits to master since this release
90c00cf

Services

NVIDIA Dynamo

This update adds support for Prefill-Decode (PD) disaggregated inference with NVIDIA Dynamo.

Previously, dstack supported PD disaggregation only with Shepherd Model Gateway as the router and SGLang as the inference engine for workers. With this update, a replica group can declare router: { type: dynamo }, allowing workers to use inference engines such as SGLang, vLLM, or TensorRT-LLM.

type: service
name: dynamo-pd

env:
  - HF_TOKEN
  - MODEL_ID=zai-org/GLM-4.5-Air-FP8

replicas:
  - count: 1
    docker: true
    commands:
      - apt-get update
      - apt-get install -y python3-dev python3-venv
      - python3 -m venv ~/dyn-venv
      - source ~/dyn-venv/bin/activate
      - pip install -U pip
      - pip install "ai-dynamo[sglang]==1.1.1"
      - git clone https://github.com/ai-dynamo/dynamo.git
      # Brings up the NATS / etcd compose stack and runs the Dynamo HTTP frontend.
      - docker compose -f dynamo/deploy/docker-compose.yml up -d
      - |
        python3 -m dynamo.frontend \
          --http-host 0.0.0.0 --http-port 8000 \
          --discovery-backend etcd --router-mode kv \
          --kv-cache-block-size 64
    resources:
      cpu: 4
    router:
      type: dynamo

  - count: 1..4
    scaling:
      metric: rps
      target: 3
    python: "3.12"
    nvcc: true
    commands:
      # dstack injects DSTACK_ROUTER_INTERNAL_IP after the router replica
      # is provisioned. Compose the etcd/NATS endpoints from it.
      - export ETCD_ENDPOINTS="http://$DSTACK_ROUTER_INTERNAL_IP:2379"
      - export NATS_SERVER="nats://$DSTACK_ROUTER_INTERNAL_IP:4222"
      # Set DYN_SYSTEM_PORT to enable the /health endpoint required by dstack probes.
      - export DYN_SYSTEM_PORT="8000"
      # Wait until the router's etcd and NATS ports are actually accepting connections.
      - |
        until (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/2379) 2>/dev/null \
           && (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/4222) 2>/dev/null; do
          echo "waiting for etcd/NATS on $DSTACK_ROUTER_INTERNAL_IP..."; sleep 3
        done
      - pip install "ai-dynamo[sglang]==1.1.1"
      - |
        python3 -m dynamo.sglang \
          --model-path $MODEL_ID --served-model-name $MODEL_ID \
          --discovery-backend etcd --host 0.0.0.0 \
          --page-size 64 \
          --disaggregation-mode prefill --disaggregation-transfer-backend nixl
    resources:
      gpu: H200

  - count: 1..8
    scaling:
      metric: rps
      target: 2
    python: "3.12"
    nvcc: true
    commands:
      - export ETCD_ENDPOINTS="http://$DSTACK_ROUTER_INTERNAL_IP:2379"
      - export NATS_SERVER="nats://$DSTACK_ROUTER_INTERNAL_IP:4222"
      - export DYN_SYSTEM_PORT="8000"
      - |
        until (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/2379) 2>/dev/null \
           && (echo > /dev/tcp/$DSTACK_ROUTER_INTERNAL_IP/4222) 2>/dev/null; do
          echo "waiting for etcd/NATS on $DSTACK_ROUTER_INTERNAL_IP..."; sleep 3
        done
      - pip install "ai-dynamo[sglang]==1.1.1"
      - |
        python3 -m dynamo.sglang \
          --model-path $MODEL_ID --served-model-name $MODEL_ID \
          --discovery-backend etcd --host 0.0.0.0 \
          --page-size 64 \
          --disaggregation-mode decode --disaggregation-transfer-backend nixl
    resources:
      gpu: H200

port: 8000
model: zai-org/GLM-4.5-Air-FP8

# Custom probe is required for PD disaggregation.
probes:
  - type: http
    url: /health
    interval: 15s

dstack provisions the router replica, injects DSTACK_ROUTER_INTERNAL_IP into non-router replicas, and lets Dynamo workers connect directly to the router’s etcd and NATS services.
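
The worker commands in the configuration above wait for etcd and NATS indefinitely. Where a bounded wait is preferable, so that a misconfigured router fails the replica instead of hanging it, a minimal sketch could look like the following. The `wait_for_port` helper, its attempt limit, and its delay argument are hypothetical and not part of dstack or Dynamo; the sketch assumes bash, since it relies on the same `/dev/tcp` redirection used in the example.

```shell
# Hypothetical helper (assumes bash for /dev/tcp support).
# Usage: wait_for_port HOST PORT [MAX_ATTEMPTS] [DELAY_SECONDS]
# Returns 0 once HOST:PORT accepts a TCP connection, 1 after MAX_ATTEMPTS failures.
wait_for_port() {
  local host="$1" port="$2" max_attempts="${3:-100}" delay="${4:-3}"
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # The subshell exits 0 only if the TCP connection can be opened.
    if (echo > "/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    echo "waiting for $host:$port (attempt $attempt/$max_attempts)..." >&2
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  return 1
}
```

In a worker's commands, this could replace the `until` loop with `wait_for_port "$DSTACK_ROUTER_INTERNAL_IP" 2379 && wait_for_port "$DSTACK_ROUTER_INTERNAL_IP" 4222`, which fails the command if either port never opens.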

Refer to the Dynamo example for full deployment instructions.

Replica groups

It's now possible to configure the image, docker, python, nvcc, and privileged properties at the replica group level. This enables complex multi-component services like NVIDIA Dynamo, where different replicas require different runtime environments.

Exports

Gateways

Gateways can now be exported and shared across projects, enabling centralized gateway management in multi-project setups.

$ dstack export --project main create my-export --gateway shared-gateway --importer team
 NAME       FLEETS  GATEWAYS        IMPORTERS 
 my-export  -       shared-gateway  team      

Now, if you list gateways in the team project, you'll see the exported gateway:

$ dstack gateway --project team
 NAME                 BACKEND          HOSTNAME        DOMAIN                 DEFAULT  STATUS  
 main/shared-gateway  aws (eu-west-1)  108.131.126.35  gtw.mycompany.example           running

Additionally, gateway domains now support optional project name interpolation using ${{ run.project_name }}, allowing different projects to use different domains on the same shared gateway.

type: gateway
name: shared-gateway

backend: aws
region: eu-west-1

domain: ${{ run.project_name }}.mycompany.example
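
To illustrate the expected effect of the placeholder, the sketch below substitutes a project name into the domain template with plain shell tools. It only mimics the result; dstack performs the interpolation itself, and the `render_domain` helper is hypothetical.

```shell
# Hypothetical helper that mimics the effect of project name interpolation.
# Illustration only: it is not how dstack implements the substitution.
# Assumes the project name contains no "/" or "&" characters.
render_domain() {
  template="$1"
  project="$2"
  # Replace the literal placeholder with the project name.
  printf '%s\n' "$template" | sed "s/\${{ run.project_name }}/$project/"
}

render_domain '${{ run.project_name }}.mycompany.example' team   # prints team.mycompany.example
render_domain '${{ run.project_name }}.mycompany.example' main   # prints main.mycompany.example
```

Under this reading, the `team` and `main` projects would each get their own subdomain on the same shared gateway.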

Global exports

Users with global admin privileges can now export SSH fleets and gateways to all projects at once, enabling organization-wide resource sharing.

$ dstack export create global-export --gateway shared-gateway --global
 NAME           FLEETS  GATEWAYS        IMPORTERS
 global-export  -       shared-gateway  *

AWS

EFA clusters

Previously, fleets that used EFA (Elastic Fabric Adapter) with multiple network interfaces required public_ips: False. With this release, dstack allows creating such fleets with public IPs. This simplifies the use of interconnected clusters on AWS by removing the need to run the dstack server and CLI inside a private VPC.

Kubernetes

Permissions

dstack now requires the watch permission for pods within the namespace. See Required permissions for up-to-date ClusterRole and Role manifests.
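
Before upgrading, it may help to confirm that the credentials in use already have this permission. The sketch below wraps the standard `kubectl auth can-i` check; the `can_watch_pods` helper is hypothetical, and it assumes the current kubeconfig context uses the same identity that dstack is configured with.

```shell
# Hypothetical helper: report whether the current kubeconfig identity may
# watch pods in the given namespace (defaults to "default").
can_watch_pods() {
  namespace="${1:-default}"
  # `kubectl auth can-i` exits 0 when the action is allowed and non-zero otherwise.
  if kubectl auth can-i watch pods --namespace "$namespace" >/dev/null 2>&1; then
    echo "watch on pods allowed in $namespace"
  else
    echo "watch on pods NOT allowed in $namespace"
    return 1
  fi
}
```

For example, `can_watch_pods ns-a` checks the `ns-a` namespace and returns a non-zero status if the permission is missing.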

Backend configuration

The namespace property of the kubernetes backend configuration is now formally deprecated. It still takes effect and remains the source of truth in this version, but future versions will read the namespace from the current kubeconfig context instead.

Migration guide

  • If namespace is unset or set to default in both the backend config and the kubeconfig, no action is required; default continues to be used.
  • If namespace is set to the same value (e.g. ns-a) in both the backend config and the kubeconfig, no action is required.
  • If namespace is set to ns-a in the backend config but the kubeconfig has a different value (or none), set the namespace to ns-a in your kubeconfig context to prepare for future versions.
  • It is only safe to remove namespace from the backend config if its value is default.
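
The cases above can be sketched as a small decision helper. The `migration_action` function is hypothetical; it treats an unset namespace as default on both sides. It also handles one case the guide does not list, a default backend namespace with a different kubeconfig namespace, by aligning the kubeconfig with the backend value, which is an assumption.

```shell
# Hypothetical helper encoding the migration cases listed above.
# $1: namespace from the dstack backend config ("" if unset)
# $2: namespace from the current kubeconfig context ("" if unset)
migration_action() {
  backend_ns="${1:-default}"   # an unset namespace is treated as "default"
  kube_ns="${2:-default}"
  if [ "$backend_ns" = "$kube_ns" ]; then
    echo "no action required"
  else
    # Align the kubeconfig context with the value from the backend config.
    echo "run: kubectl config set-context --current --namespace=$backend_ns"
  fi
}

migration_action "" ""       # prints: no action required
migration_action ns-a ns-a   # prints: no action required
migration_action ns-a ns-b   # prints: run: kubectl config set-context --current --namespace=ns-a
```

The namespace of the current context can usually be read with `kubectl config view --minify --output 'jsonpath={..namespace}'`, which prints nothing when no namespace is set.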

What's changed

  • [Services] Allow to specify image, docker, python, nvcc, privileged at replica group level by @Bihan in #3832
  • [Internal]: Delete some unused classes by @jvstme in #3842
  • [Internal] Fix pyright failing in CI by @jvstme in #3846
  • [Internal] Update RunpodApiClient by @un-def in #3847
  • [Internal] Fix openai SDK failing in tests by @jvstme in #3849
  • [RunPod] Handle deleting non-existent volume by @r4victor in #3853
  • [Runpod] Fix broken registry_auth support by @un-def in #3844
  • [UX] Raise ImportError on Python 3.14 or later by @r4victor in #3855
  • [Exports] Gateway support by @jvstme in #3845
  • [Internal] Rename docs/ to mkdocs/, move examples under /docs/, inline source by @peterschmidt85 in #3859
  • [Kubernetes] Deprecate namespace in backend config by @un-def in #3858
  • [Gateways] Allow setting imported gateway as project default by @jvstme in #3860
  • [Internal] Forbid exporting the built-in dstack Sky gateway by @jvstme in #3864
  • [AWS] Support multi-EFA instances with public IPs by @r4victor in #3865
  • [Internal] Add server-side validation for fleet configuration subtypes by @un-def in #3848
  • [Verda] Optimize terminating Verda instances by @jvstme in #3811
  • [Internal] Introduce GatewayModel.forbid_new_services by @jvstme in #3863
  • [Docs] Introduce CLI & API guide; rework the HTTP API reference page by @peterschmidt85 in #3869
  • [Internal] Add script to set up Kubernetes cluster for dstack backend by @un-def in #3866
  • Fix Pyright errors with requests==2.34.0 by @jvstme in #3873
  • Add project name interpolation in gateway domains by @jvstme in #3870
  • [Bugfix] Fix duplicate headers with in-server proxy by @jvstme in #3872
  • [Docs]: Gateway Exports by @jvstme in #3862
  • [Kubernetes] Fail fast if job pod was not scheduled by @un-def in #3874
  • [Exports] Global exports support by @jvstme in #3879
  • [Services] Support PD with NVIDIA Dynamo by @Bihan in #3868
  • [Internal] Update text regarding billing based on the project type by @peterschmidt85 in #3876
  • [Docs] Add NVIDIA Dynamo docs by @Bihan in #3877
  • [Internal] Fix unreleased global_exports lock on Postgres by @jvstme in #3882

Full changelog: 0.20.19...0.20.20