docs/blog/posts/mpi.md (3 additions & 3 deletions)
@@ -86,10 +86,10 @@ resources:

</div>

-The first worker node (`DSTACK_NODE_RANK=0`) generates a `hostfile` listing all node IPs and waits until all nodes are
+The master node (`DSTACK_NODE_RANK=0`) generates a `hostfile` listing all node IPs and waits until all nodes are
reachable via MPI. Once confirmed, it launches the `/root/nccl-tests/build/all_reduce_perf` benchmark across all available GPUs in the cluster.

-The other worker nodes remain blocked until they receive a termination signal from the master node via a FIFO pipe.
+Non-master nodes remain blocked until they receive a termination signal from the master node via a FIFO pipe.

With this, now you can use such a task to run both NCCL or RCCL tests on both cloud and SSH fleets,
as well as use MPI for other tasks.
@@ -102,4 +102,4 @@ as well as use MPI for other tasks.

!!! info "What's next?"
1. Learn more about [dev environments](../../docs/concepts/dev-environments.md), [tasks](../../docs/concepts/tasks.md), [services](../../docs/concepts/services.md), and [fleets](../../docs/concepts/fleets.md)
2. Check the [NCCL tests](../../examples/clusters/nccl-tests/index.md) example
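For orientation while reading this hunk, the flow it describes can be sketched as a dstack task like the one below. This is a minimal sketch rather than the post's exact configuration: the task name, process count, FIFO path, and `mpirun` flags are illustrative, and it assumes `DSTACK_NODES_IPS` holds one IP address per line.

```yaml
type: task
name: nccl-tests-sketch       # illustrative name
nodes: 2

commands:
  - |
    if [ "$DSTACK_NODE_RANK" -eq 0 ]; then
      # Master node: turn the node IPs injected by dstack into an MPI hostfile.
      echo "$DSTACK_NODES_IPS" > hostfile
      # Retry a no-op MPI run until every node in the hostfile is reachable.
      until mpirun --allow-run-as-root --hostfile hostfile -N 1 true; do sleep 5; done
      # Run the benchmark across the cluster (-np 16 assumes 2 nodes x 8 GPUs).
      mpirun --allow-run-as-root --hostfile hostfile -np 16 \
        /root/nccl-tests/build/all_reduce_perf -b 8 -e 8G -f 2 -g 1
      # Unblock the other nodes by writing into their FIFO pipes.
      mpirun --allow-run-as-root --hostfile hostfile -N 1 sh -c 'echo done > /tmp/mpirun_done'
    else
      # Non-master nodes: block on a FIFO pipe until the master signals completion.
      mkfifo /tmp/mpirun_done
      cat /tmp/mpirun_done
    fi
```

Applied with `dstack apply`, the same commands run on every node, and `DSTACK_NODE_RANK` is what separates the master branch from the waiting branch.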
docs/docs/concepts/fleets.md (18 additions & 15 deletions)
@@ -63,33 +63,34 @@ Once the status of instances changes to `idle`, they can be used by dev environm

To ensure instances are interconnected (e.g., for
[distributed tasks](tasks.md#distributed-tasks)), set `placement` to `cluster`.
-This ensures all instances are provisioned in the same backend and region with optimal inter-node connectivity
+This ensures all instances are provisioned with optimal inter-node connectivity.

??? info "AWS"
-When you create a cloud fleet with `aws`, [Elastic Fabric Adapter networking :material-arrow-top-right-thin:{ .external }](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa.html){:target="_blank"} is automatically configured if it’s supported for the corresponding instance type.
+When you create a cloud fleet with AWS, [Elastic Fabric Adapter networking :material-arrow-top-right-thin:{ .external }](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa.html){:target="_blank"} is automatically configured if it’s supported for the corresponding instance type.
Note, EFA requires the `public_ips` to be set to `false` in the `aws` backend configuration.
Otherwise, instances are only connected by the default VPC subnet.

Refer to the [EFA](../../blog/posts/efa.md) example for more details.

??? info "GCP"
-When you create a cloud fleet with `gcp`, for the A3 Mega and A3 High instance types, [GPUDirect-TCPXO and GPUDirect-TCPX :material-arrow-top-right-thin:{ .external }](https://cloud.google.com/kubernetes-engine/docs/how-to/gpu-bandwidth-gpudirect-tcpx-autopilot){:target="_blank"} networking is automatically configured.
+When you create a cloud fleet with GCP, for the A3 Mega and A3 High instance types, [GPUDirect-TCPXO and GPUDirect-TCPX :material-arrow-top-right-thin:{ .external }](https://cloud.google.com/kubernetes-engine/docs/how-to/gpu-bandwidth-gpudirect-tcpx-autopilot){:target="_blank"} networking is automatically configured.

!!! info "Backend configuration"
Note, GPUDirect-TCPXO and GPUDirect-TCPX require `extra_vpcs` to be configured in the `gcp` backend configuration.
Refer to the [A3 Mega](../../examples/clusters/a3mega/index.md) and
[A3 High](../../examples/clusters/a3high/index.md) examples for more details.

??? info "Nebius"
-When you create a Nebius cloud fleet with `placement: cluster`, [InfiniBand networking :material-arrow-top-right-thin:{ .external }](https://docs.nebius.com/compute/clusters/gpu){:target="_blank"} is automatically configured if it’s supported for the corresponding instance type.
+When you create a cloud fleet with Nebius, [InfiniBand networking :material-arrow-top-right-thin:{ .external }](https://docs.nebius.com/compute/clusters/gpu){:target="_blank"} is automatically configured if it’s supported for the corresponding instance type.
Otherwise, instances are only connected by the default VPC subnet.

-An InfiniBand fabric for the cluster is selected automatically.
-If you prefer to use some specific fabrics, configure them in the
+An InfiniBand fabric for the cluster is selected automatically. If you prefer to use some specific fabrics, configure them in the
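As a point of reference for the `placement` option this hunk touches, a cluster fleet configuration might look like the sketch below; the fleet name, node count, backend, and GPU spec are placeholders rather than values from the docs.

```yaml
type: fleet
name: my-cluster-fleet    # placeholder name
nodes: 2
placement: cluster        # provision all instances with optimal inter-node connectivity

backends: [aws]           # on AWS, GCP, and Nebius, fast interconnect (EFA, GPUDirect-TCPXO/TCPX, InfiniBand)
                          # is configured automatically where the instance type supports it

resources:
  gpu: H100:8             # placeholder GPU spec
```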
docs/docs/guides/clusters.md (6 additions & 6 deletions)
@@ -18,22 +18,22 @@ Cloud fleets allow to provision interconnected clusters across supported backend

For cloud fleets, fast interconnect is currently supported only on the `aws`, `gcp`, and `nebius` backends.

=== "AWS"
-When you create a cloud fleet with `aws`, [Elastic Fabric Adapter :material-arrow-top-right-thin:{ .external }](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa.html){:target="_blank"} networking is automatically configured if it’s supported for the corresponding instance type.
+When you create a cloud fleet with AWS, [Elastic Fabric Adapter :material-arrow-top-right-thin:{ .external }](https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa.html){:target="_blank"} networking is automatically configured if it’s supported for the corresponding instance type.

!!! info "Backend configuration"
-Note, EFA requires the `public_ips` to set to `false` in the `aws` backend configuration.
+Note, EFA requires the `public_ips` to be set to `false` in the `aws` backend configuration.
Refer to the [EFA](../../blog/posts/efa.md) example for more details.

=== "GCP"
-When you create a cloud fleet with `gcp`, for the A3 Mega and A3 High instance types, [GPUDirect-TCPXO and GPUDirect-TCPX :material-arrow-top-right-thin:{ .external }](https://cloud.google.com/kubernetes-engine/docs/how-to/gpu-bandwidth-gpudirect-tcpx-autopilot){:target="_blank"} networking is automatically configured.
+When you create a cloud fleet with GCP, for the A3 Mega and A3 High instance types, [GPUDirect-TCPXO and GPUDirect-TCPX :material-arrow-top-right-thin:{ .external }](https://cloud.google.com/kubernetes-engine/docs/how-to/gpu-bandwidth-gpudirect-tcpx-autopilot){:target="_blank"} networking is automatically configured.

!!! info "Backend configuration"
Note, GPUDirect-TCPXO and GPUDirect-TCPX require `extra_vpcs` to be configured in the `gcp` backend configuration.
Refer to the [A3 Mega](../../examples/clusters/a3mega/index.md) and
-[A3 Mega](../../examples/clusters/a3high/index.md) examples for more details.
+[A3 High](../../examples/clusters/a3high/index.md) examples for more details.

=== "Nebius"
-When you create a cloud fleet with `nebius`, [InfiniBand :material-arrow-top-right-thin:{ .external }](https://docs.nebius.com/compute/clusters/gpu){:target="_blank"} networking is automatically configured if it’s supported for the corresponding instance type.
+When you create a cloud fleet with Nebius, [InfiniBand :material-arrow-top-right-thin:{ .external }](https://docs.nebius.com/compute/clusters/gpu){:target="_blank"} networking is automatically configured if it’s supported for the corresponding instance type.

> To request fast interconnect support for a other backends,
file an [issue :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/issues){:target="_ blank"}.
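To make the backend requirements in these tabs concrete, the relevant options might appear in the `dstack` server's `config.yml` roughly as sketched below. Only `public_ips` and `extra_vpcs` come from the text above; the project name, credentials blocks, project ID, and VPC names are placeholders, so check the backend reference for the actual schema.

```yaml
projects:
  - name: main
    backends:
      - type: aws
        creds:
          type: default
        public_ips: false                                # required for EFA
      - type: gcp
        project_id: my-gcp-project                       # placeholder
        creds:
          type: default
        extra_vpcs: ["dstack-vpc-1", "dstack-vpc-2"]     # required for GPUDirect-TCPXO/TCPX
```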
@@ -47,7 +47,7 @@ To test the interconnect of a created fleet, ensure you run [NCCL](../../example

A distributed task is a task with `nodes` set to a value greater than `2`. In this case, `dstack` first ensures a
suitable fleet is available, then starts the master node and runs the task container on it. Once the master is up,
-`dstack` starts worker nodes and runs the task container on each worker node.
+`dstack` starts the rest of the nodes and runs the task container on each of them.

Within the task's `commands`, it's possible to use `DSTACK_MASTER_NODE_IP`, `DSTACK_NODES_IPS`, `DSTACK_NODE_RANK`, and other
[system environment variables](../concepts/tasks.md#system-environment-variables) for inter-node communication.
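The variables named in this hunk are what a per-node launcher would consume. A hedged sketch with a torchrun-style entry point follows; the script name, port, and GPU counts are assumptions, not values from the guide.

```yaml
type: task
name: train-distrib            # placeholder name
nodes: 2

commands:
  - |
    # dstack injects DSTACK_NODE_RANK and DSTACK_MASTER_NODE_IP on every node,
    # so all nodes can join the same rendezvous without extra wiring.
    torchrun \
      --nnodes=2 --nproc-per-node=8 \
      --node-rank="$DSTACK_NODE_RANK" \
      --master-addr="$DSTACK_MASTER_NODE_IP" \
      --master-port=29500 \
      train.py                 # placeholder training script

resources:
  gpu: H100:8                  # placeholder GPU spec
```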
examples/clusters/nccl-tests/README.md (2 additions & 2 deletions)
@@ -63,10 +63,10 @@ resources:

!!! info "MPI"
NCCL tests rely on MPI to run on multiple processes. The master node (`DSTACK_NODE_RANK=0`) generates `hostfile` (using `DSTACK_NODES_IPS`)
-and waits until worker nodes are accessible via MPI.
+and waits until other nodes are accessible via MPI.
Then, it executes `/nccl-tests/build/all_reduce_perf` across all GPUs.

-Worker nodes use a `FIFO` pipe to wait for until the MPI run is finished.
+Non-master nodes use a `FIFO` pipe to wait for until the MPI run is finished.

There is an open [issue :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/issues/2467){:target="_blank"} to simplify the use of MPI with distributed tasks.
RCCL tests rely on MPI to run on multiple processes. The master node (`DSTACK_NODE_RANK=0`) generates `hostfile` (using `DSTACK_NODES_IPS`)
-and waits until worker nodes are accessible via MPI.
+and waits until other nodes are accessible via MPI.
Then, it executes `/rccl-tests/build/all_reduce_perf` across all GPUs.

-Worker nodes use a `FIFO` pipe to wait for until the MPI run is finished.
+Other nodes use a `FIFO` pipe to wait for until the MPI run is finished.

There is an open [issue :material-arrow-top-right-thin:{ .external }](https://github.com/dstackai/dstack/issues/2467){:target="_blank"} to simplify the use of MPI with distributed tasks.