See the [Local Workstation](./local-workstation.md) guide.
-### I have access to a Slurm cluster
+### I Have Access to a Slurm Cluster
Add a `slurm:` section to your YAML config and submit with the same `automodel` command. The CLI generates the `torchrun` invocation and calls `sbatch` for you:
Use the same `skypilot:` launcher, but set `cloud: kubernetes`. This is a good fit when your team already has a GPU-backed Kubernetes cluster and you want SkyPilot to handle job submission and multi-node orchestration:
docs/launcher/skypilot.md: 5 additions & 5 deletions
@@ -106,7 +106,7 @@ model:
## Cloud Examples
-### AWS - On-demand A10G
+### AWS — On-Demand A10G
```yaml
skypilot:
@@ -118,7 +118,7 @@ skypilot:
hf_token: ${HF_TOKEN}
```
-### GCP - Spot V100, 8 GPUs (single node)
+### GCP — Spot V100, 8 GPUs (Single Node)
```yaml
skypilot:
@@ -130,7 +130,7 @@ skypilot:
hf_token: ${HF_TOKEN}
```
-### Multi-node distributed training (2 x 8 x A100)
+### Multi-Node Distributed Training (2 x 8 x A100)
```yaml
skypilot:
@@ -142,7 +142,7 @@ skypilot:
hf_token: ${HF_TOKEN}
```
-For multi-node jobs the launcher automatically adds the SkyPilot rendezvous environment variables (`$SKYPILOT_NODE_RANK`, `$SKYPILOT_NUM_NODES`, `$SKYPILOT_NODE_IPS`) to the `torchrun` command.
+For multi-node jobs, the launcher automatically adds the SkyPilot rendezvous environment variables (`$SKYPILOT_NODE_RANK`, `$SKYPILOT_NUM_NODES`, `$SKYPILOT_NODE_IPS`) to the `torchrun` command.
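
To make the multi-node rendezvous sentence concrete, here is a minimal Python sketch of how those three SkyPilot variables could map onto `torchrun` flags. It is an illustration only, not the launcher's actual implementation: the helper name `torchrun_args`, the port `29500`, the script name `train.py`, and the sample IP addresses are all hypothetical. It assumes `SKYPILOT_NODE_IPS` holds one IP address per line and that the first entry is used as the rendezvous host.

```python
import shlex


def torchrun_args(env, nproc_per_node, script, port=29500):
    """Build a torchrun argument list from SkyPilot's per-node variables.

    `env` is any mapping that contains the three SKYPILOT_* variables.
    The helper name, default port, and script argument are hypothetical.
    """
    # Assumption: SKYPILOT_NODE_IPS is whitespace/newline separated,
    # and the first address is treated as the rendezvous (master) node.
    node_ips = env["SKYPILOT_NODE_IPS"].split()
    return [
        "torchrun",
        f"--nnodes={env['SKYPILOT_NUM_NODES']}",
        f"--node_rank={env['SKYPILOT_NODE_RANK']}",
        f"--nproc_per_node={nproc_per_node}",
        f"--master_addr={node_ips[0]}",
        f"--master_port={port}",
        script,
    ]


# Made-up values for the second node of a 2 x 8 GPU job.
demo_env = {
    "SKYPILOT_NODE_RANK": "1",
    "SKYPILOT_NUM_NODES": "2",
    "SKYPILOT_NODE_IPS": "10.0.0.1\n10.0.0.2",
}
print(shlex.join(torchrun_args(demo_env, nproc_per_node=8, script="train.py")))
```

On a real node, `os.environ` could be passed in place of `demo_env`, since SkyPilot sets these variables for each node of the job.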
If you want to run on a Kubernetes cluster, use `cloud: kubernetes` and follow the dedicated [SkyPilot + Kubernetes tutorial](./skypilot-kubernetes.md). That guide includes: