
0.19.12-v1


r4victor released this 04 Jun 11:18

Clusters

Simplified use of MPI

startup_order and stop_criteria

This release introduces two new run configuration properties:

  • startup_order: any/master-first/workers-first specifies the order in which the master and worker jobs are started.
  • stop_criteria: all-done/master-done specifies when a multi-node run is considered finished.

These properties simplify running certain multi-node workloads. For example, MPI requires the workers to be up and running when the master executes mpirun, so you'd use startup_order: workers-first. An MPI workload can be considered done as soon as the master is done, so you'd use stop_criteria: master-done, and dstack won't wait for the workers to exit.

DSTACK_MPI_HOSTFILE

dstack now automatically creates an MPI hostfile and exposes its path via the DSTACK_MPI_HOSTFILE environment variable, so you can pass it directly to mpirun: mpirun --hostfile $DSTACK_MPI_HOSTFILE.
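Before launching mpirun, it can help to check how many slots the hostfile declares. The sketch below is a hedged illustration, not part of dstack: it assumes the hostfile uses Open MPI's `<host> slots=<N>` syntax (the exact format dstack writes is not stated in these notes), and the helper name `total_slots` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: sum the MPI slots declared in a hostfile.
# ASSUMPTION: entries use Open MPI's "<host> slots=<N>" syntax; the exact
# format dstack writes to $DSTACK_MPI_HOSTFILE is not documented here.
# total_slots is a hypothetical helper, not a dstack command.
total_slots() {
  awk '
    /^[ \t]*$/ || /^[ \t]*#/ { next }   # skip blank and comment lines
    {
      # Add up every "slots=N" field found on the line.
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^slots=[0-9]+$/) {
          split($i, kv, "=")
          sum += kv[2]
        }
      }
    }
    END { print sum + 0 }
  ' "$1"
}

# Hypothetical usage inside a dstack job; skipped when the variable is unset.
if [ -n "${DSTACK_MPI_HOSTFILE:-}" ]; then
  echo "hostfile declares $(total_slots "$DSTACK_MPI_HOSTFILE") slots"
  cat "$DSTACK_MPI_HOSTFILE"
fi
```

Comparing this number with the -n value passed to mpirun is a quick sanity check, since Open MPI by default refuses to launch more processes than there are available slots.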

CLI

We've also updated how the CLI displays run and job status. Previously, the CLI showed the internal status code, which was hard to interpret. Now, the STATUS column in dstack ps and dstack apply shows a human-readable status that makes it easy to understand why a run or job was terminated.

dstack ps -n 10
 NAME               BACKEND             RESOURCES                            PRICE    STATUS        SUBMITTED
 oom-task                                                                             no offers     yesterday
 oom-task           nebius (eu-north1)  cpu=2 mem=8GB disk=100GB             $0.0496  exited (127)  yesterday
 oom-task           nebius (eu-north1)  cpu=2 mem=8GB disk=100GB             $0.0496  exited (127)  yesterday
 heavy-wolverine-1                                                                    done          yesterday
   replica=0 job=0  aws (us-east-1)     cpu=4 mem=16GB disk=100GB T4:16GB:1  $0.526   exited (0)    yesterday
   replica=0 job=1  aws (us-east-1)     cpu=4 mem=16GB disk=100GB T4:16GB:1  $0.526   exited (0)    yesterday
 cursor             nebius (eu-north1)  cpu=2 mem=8GB disk=100GB             $0.0496  stopped       yesterday
 cursor             nebius (eu-north1)  cpu=2 mem=8GB disk=100GB             $0.0496  error         yesterday
 cursor             nebius (eu-north1)  cpu=2 mem=8GB disk=100GB             $0.0496  interrupted   yesterday
 cursor             nebius (eu-north1)  cpu=2 mem=8GB disk=100GB             $0.0496  aborted       yesterday
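The number in parentheses after exited is presumably the exit status of the job's main process. As a reading aid, the sketch below decodes common POSIX shell exit-code conventions; the helper name `describe_exit_code` is hypothetical and not part of the dstack CLI.

```shell
#!/usr/bin/env bash
# Sketch: turn an exit code (as in "exited (N)") into a rough hint,
# based on common POSIX shell conventions. Not part of dstack.
describe_exit_code() {
  local code=$1
  if [ "$code" -eq 0 ]; then
    echo "success"
  elif [ "$code" -eq 126 ]; then
    echo "command found but not executable"
  elif [ "$code" -eq 127 ]; then
    echo "command not found"
  elif [ "$code" -gt 128 ] && [ "$code" -le 192 ]; then
    # Shells report death by signal N as 128+N, e.g. 137 = SIGKILL (9),
    # which is the signal the kernel OOM killer sends.
    local sig=$((code - 128))
    echo "killed by signal $sig ($(kill -l "$sig"))"
  else
    echo "application-specific error"
  fi
}
```

For example, `describe_exit_code 127` prints "command not found", while an out-of-memory kill would typically surface as 137.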

Examples

Simplified NCCL tests

With these improvements, running MPI workloads with dstack is much easier. For example, NCCL tests can now be run with the following configuration:

type: task
name: nccl-tests

nodes: 2
startup_order: workers-first
stop_criteria: master-done

image: dstackai/efa
env:
  - NCCL_DEBUG=INFO
commands:
  - cd /root/nccl-tests/build
  - |
    if [ ${DSTACK_NODE_RANK} -eq 0 ]; then
      mpirun \
        --allow-run-as-root --hostfile $DSTACK_MPI_HOSTFILE \
        -n ${DSTACK_GPUS_NUM} \
        -N ${DSTACK_GPUS_PER_NODE} \
        --mca btl_tcp_if_exclude lo,docker0 \
        --bind-to none \
        ./all_reduce_perf -b 8 -e 8G -f 2 -g 1
    else
      sleep infinity
    fi

resources:
  gpu: nvidia:4:16GB
  shm_size: 16GB

See the updated NCCL tests example for more details.
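To make the mpirun and all_reduce_perf arguments in the configuration above concrete, the sketch below redoes the arithmetic in plain shell. It assumes that DSTACK_GPUS_NUM is the total number of GPUs across all nodes and that nccl-tests interprets the G suffix as 1024^3 bytes; the local variable names are illustrative only.

```shell
#!/usr/bin/env bash
# Worked example for the NCCL tests configuration above.
# ASSUMPTIONS: DSTACK_GPUS_NUM = total GPUs across all nodes;
# nccl-tests treats "G" as 1024^3 bytes.
nodes=2                               # nodes: 2
gpus_per_node=4                       # gpu: nvidia:4:16GB -> 4 GPUs per node
gpus_num=$((nodes * gpus_per_node))   # value behind -n ${DSTACK_GPUS_NUM}
echo "ranks: $gpus_num total, $gpus_per_node per node, 1 GPU each (-g 1)"

# all_reduce_perf -b 8 -e 8G -f 2: start at 8 bytes and double up to 8 GiB.
begin=8
end=$((8 * 1024 * 1024 * 1024))
factor=2
steps=0
size=$begin
while [ "$size" -le "$end" ]; do
  steps=$((steps + 1))
  size=$((size * factor))
done
echo "message sizes tested: $steps"
```

Under those assumptions, mpirun launches 8 ranks (4 per node, one GPU each), and all_reduce_perf sweeps 31 message sizes, from 8 B (2^3) up to 8 GiB (2^33).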

Distributed training

TRL

The new TRL example walks you through running distributed fine-tuning with TRL, Accelerate, and DeepSpeed.

Axolotl

The new Axolotl example walks you through running distributed fine-tuning with Axolotl and dstack.

What's changed

Full changelog: dstackai/dstack@0.19.11...0.19.12