From 6cf064e4ad057104cb5ba8575deb12d6e83378c2 Mon Sep 17 00:00:00 2001
From: Jvst Me
Date: Thu, 15 May 2025 22:28:14 +0200
Subject: [PATCH 1/2] [Docs]: Mention SSH fleet networking requirements

---
 docs/docs/concepts/fleets.md | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/docs/docs/concepts/fleets.md b/docs/docs/concepts/fleets.md
index 64f424833c..03f54f742f 100644
--- a/docs/docs/concepts/fleets.md
+++ b/docs/docs/concepts/fleets.md
@@ -275,6 +275,10 @@ Define a fleet configuration as a YAML file in your project directory. The file
     3. The user specified should have passwordless `sudo` access.
+    4. The SSH server should be running and configured with `AllowTcpForwarding yes` in `/etc/ssh/sshd_config`.
+
+    5. The firewall should allow SSH and forbid any other connections from external networks.
+
 To create or update the fleet, pass the fleet configuration to [`dstack apply`](../reference/cli/dstack/apply.md):
@@ -331,10 +335,10 @@ divided into, allowing multiple jobs to use these blocks concurrently.
       hosts:
         - hostname: 3.255.177.51
           blocks: 4
-        - hostaname: 3.255.177.52
+        - hostname: 3.255.177.52
           # As many as possible, according to numbers of GPUs and CPUs
           blocks: auto
-        - hostaname: 3.255.177.53
+        - hostname: 3.255.177.53
           # Do not sclice. This is the default value, may be omitted
           blocks: 1
     ```

From 3ed48eb0865e954e052fb11a771626e2cc7fc2e2 Mon Sep 17 00:00:00 2001
From: Jvst Me
Date: Fri, 16 May 2025 16:57:21 +0200
Subject: [PATCH 2/2] Change `should`->`must` and mention clusters

---
 docs/docs/concepts/fleets.md | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/docs/docs/concepts/fleets.md b/docs/docs/concepts/fleets.md
index 03f54f742f..f4442869e9 100644
--- a/docs/docs/concepts/fleets.md
+++ b/docs/docs/concepts/fleets.md
@@ -254,30 +254,30 @@ Define a fleet configuration as a YAML file in your project directory. The file
 ??? info "Requirements"
-    1. Hosts should be pre-installed with Docker.
+    1. Hosts must be pre-installed with Docker.
     === "NVIDIA"
-        2. Hosts with NVIDIA GPUs should also be pre-installed with CUDA 12.1 and
+        2. Hosts with NVIDIA GPUs must also be pre-installed with CUDA 12.1 and
         [NVIDIA Container Toolkit :material-arrow-top-right-thin:{ .external }](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html).
     === "AMD"
-        2. Hosts with AMD GPUs should also be pre-installed with AMDGPU-DKMS kernel driver (e.g. via
+        2. Hosts with AMD GPUs must also be pre-installed with AMDGPU-DKMS kernel driver (e.g. via
         [native package manager :material-arrow-top-right-thin:{ .external }](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/native-install/index.html) or [AMDGPU installer :material-arrow-top-right-thin:{ .external }](https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/amdgpu-install.html).)
     === "Intel Gaudi"
-        2. Hosts with Intel Gaudi accelerators should be pre-installed with [Gaudi software and drivers](https://docs.habana.ai/en/latest/Installation_Guide/Driver_Installation.html#driver-installation).
-        This should include the drivers, `hl-smi`, and Habana Container Runtime.
+        2. Hosts with Intel Gaudi accelerators must be pre-installed with [Gaudi software and drivers](https://docs.habana.ai/en/latest/Installation_Guide/Driver_Installation.html#driver-installation).
+        This must include the drivers, `hl-smi`, and Habana Container Runtime.
     === "Tenstorrent"
-        2. Hosts with Tenstorrent accelerators should be pre-installed with [Tenstorrent software](https://docs.tenstorrent.com/getting-started/README.html#software-installation).
-        This should include the drivers, `tt-smi`, and HugePages.
+        2. Hosts with Tenstorrent accelerators must be pre-installed with [Tenstorrent software](https://docs.tenstorrent.com/getting-started/README.html#software-installation).
+        This must include the drivers, `tt-smi`, and HugePages.
-    3. The user specified should have passwordless `sudo` access.
+    3. The user specified must have passwordless `sudo` access.
-    4. The SSH server should be running and configured with `AllowTcpForwarding yes` in `/etc/ssh/sshd_config`.
+    4. The SSH server must be running and configured with `AllowTcpForwarding yes` in `/etc/ssh/sshd_config`.
-    5. The firewall should allow SSH and forbid any other connections from external networks.
+    5. The firewall must allow SSH and should forbid any other connections from external networks. For `placement: cluster` fleets, it should also allow any communication between fleet nodes.
 To create or update the fleet, pass the fleet configuration to [`dstack apply`](../reference/cli/dstack/apply.md):
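
Requirement 4 in the patch above (`AllowTcpForwarding yes` in `/etc/ssh/sshd_config`) can be checked mechanically before a host is added to a fleet. The following is a minimal sketch and not part of dstack. The function names `effective_allow_tcp_forwarding` and `meets_requirement` are hypothetical. It assumes the usual OpenSSH behavior that the first occurrence of a keyword wins and that the default is `yes`. It ignores `Include` files and anything inside `Match` blocks, and treats only the literal value `yes` as satisfying the requirement, following the wording of the patch.

```python
def effective_allow_tcp_forwarding(config_text: str) -> str:
    """Return the global AllowTcpForwarding value found in sshd_config text.

    Assumptions of this sketch: the first occurrence wins, the OpenSSH
    default is "yes", and parsing stops at the first Match block.
    Include directives are not followed.
    """
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        # Tolerate the "keyword=value" form as well as "keyword value".
        parts = line.replace("=", " ", 1).split()
        keyword = parts[0].lower()  # keywords are case-insensitive
        if keyword == "match":
            break  # conditional blocks are out of scope here
        if keyword == "allowtcpforwarding" and len(parts) > 1:
            return parts[1]
    return "yes"  # nothing set globally, so the default applies


def meets_requirement(config_text: str) -> bool:
    """True if the global setting is exactly `yes`, as the docs require."""
    return effective_allow_tcp_forwarding(config_text) == "yes"
```

On a host, the text could be read with `open("/etc/ssh/sshd_config").read()` and passed to `meets_requirement`. Because this sketch skips `Include` and `Match`, running `sshd -T` as root and inspecting its `allowtcpforwarding` line is the more authoritative check.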