
Commit 511fac2

Merge pull request #25 from LCSB-BioCore/mk-slurm-mwe
Minimal working example for usage in slurm
2 parents: 5d92b48 + e9eaa62

5 files changed

Lines changed: 214 additions & 1 deletion

README.md

Lines changed: 10 additions & 0 deletions
````diff
@@ -132,3 +132,13 @@ julia> gather_array(dataset) # download the data from workers to a sing
  0.610183 1.12165 0.722438
 ```
+
+## Using DistributedData.jl in HPC environments
+
+You can use the
+[`ClusterManagers`](https://github.com/JuliaParallel/ClusterManagers.jl)
+package to add distributed workers from many different workload managers and
+task scheduling environments, such as Slurm, PBS, LSF, and others.
+
+See the documentation for an [example of using Slurm to run
+DistributedData](https://lcsb-biocore.github.io/DistributedData.jl/stable/slurm/).
````
docs/slurm-example/run-analysis.batch

Lines changed: 10 additions & 0 deletions

```sh
#!/bin/bash -l
#SBATCH -n 128
#SBATCH -c 1
#SBATCH -t 60
#SBATCH --mem-per-cpu 4G
#SBATCH -J MyDistributedJob

module load lang/Julia/1.3.0

julia run-analysis.jl
```

docs/slurm-example/run-analysis.jl

Lines changed: 19 additions & 0 deletions
```julia
using Distributed, ClusterManagers, DistributedData

# read the number of available workers from the environment and start the worker processes
n_workers = parse(Int, ENV["SLURM_NTASKS"])
addprocs_slurm(n_workers, topology = :master_worker)

# load the required packages on all workers
@everywhere using DistributedData

# generate a random dataset on all workers
dataset = dtransform((), _ -> randn(10000, 10000), workers(), :myData)

# for demonstration, sum the whole dataset
totalResult = dmapreduce(dataset, sum, +)

# do not forget to save the results!
f = open("result.txt", "w")
println(f, totalResult)
close(f)
```

docs/src/index.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -19,7 +19,7 @@ on worker-local storage (e.g. to prevent memory exhaustion) and others.
 To start quickly, you can read the tutorial:

 ```@contents
-Pages=["tutorial.md"]
+Pages=["tutorial.md", "slurm.md"]
 ```

 ### Functions
````

docs/src/slurm.md

Lines changed: 174 additions & 0 deletions
# DistributedData.jl on Slurm clusters

DistributedData scales well to moderately large computer clusters. As a
practical example, you can see the script that was used to process relatively
large datasets for GigaSOM, documented in the Supplementary file of the
[GigaSOM
article](https://academic.oup.com/gigascience/article/9/11/giaa127/5987271)
(click the `giaa127_Supplemental_File` link below on the page, and find
*Listing S1* at the end of the PDF).

Using DistributedData with [Slurm](https://slurm.schedmd.com/overview.html) is
similar to using it with many other distributed computing systems:

1. You install Julia and the required packages.
2. You submit a batch (or interactive) task, which runs your Julia script on a
   node and gives it some information about where to find the other worker
   nodes.
3. In the Julia script, you use the
   [`ClusterManagers`](https://github.com/JuliaParallel/ClusterManagers.jl)
   function `addprocs_slurm` to add the processes, just as with normal
   `addprocs`. Similar functions exist for many other task schedulers,
   including the popular PBS and LSF.
4. The rest of the workflow is unchanged; all functions from `DistributedData`,
   such as `save_at` and `dmapreduce`, will work in the cluster just as they
   worked locally. Performance will vary, though: you may want to optimize
   your algorithm to use as much parallelism as possible, and to load more
   data into memory (clusters usually provide much more total memory than a
   single computer), but keep an eye on the communication overhead,
   transferring only the minimal required amount of data, as seldom as
   possible.

### Preparing the packages

The easiest way to install the packages is using a single-machine interactive
job. On the access node of your HPC, run this command to get a 60-minute
interactive session:
```sh
srun --pty -t60 /bin/bash -
```

(Depending on your cluster setup, it may be beneficial to also specify a
partition to which the job should belong; many clusters provide an interactive
partition where the interactive jobs get scheduled faster. To do that, add the
option `-p interactive` to the parameters of `srun`.)
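
Put together, a time-limited interactive request on such a cluster might look
like this (partition names vary between clusters; `sinfo` lists the ones
available on yours):

```sh
# request a 60-minute interactive shell on the "interactive" partition
srun -p interactive --pty -t60 /bin/bash -
```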

When the shell opens (the prompt should change), you can load the Julia module,
usually with a command such as this:

```sh
module load lang/Julia/1.3.0
```

(You may consult `module spider julia` for other available Julia versions.)

After that, start `julia` and press `]` to open the packaging prompt (you
should see `(v1.3) pkg>` instead of `julia>`). There you can download and
install the required packages:
```
add DistributedData, ClusterManagers
```

You may also want to load the packages to precompile them, which saves precious
time later in the workflows. Press backspace to return to the "normal" Julia
shell, and type:
```julia
using DistributedData, ClusterManagers
```

Depending on the package, this may take a while, but should be done in under a
minute for most existing packages. Finally, press `Ctrl+D` twice to exit both
Julia and the interactive Slurm job shell.

### Slurm batch script

An example Slurm batch script
([download](https://github.com/LCSB-BioCore/DistributedData.jl/blob/master/docs/slurm-example/run-analysis.batch))
is listed below. Save it as `run-analysis.batch` on your Slurm access node,
in a directory that is shared with the workers (usually a "scratch" directory;
try `cd $SCRATCH`).
```sh
#!/bin/bash -l
#SBATCH -n 128
#SBATCH -c 1
#SBATCH -t 10
#SBATCH --mem-per-cpu 4G
#SBATCH -J MyDistributedJob

module load lang/Julia/1.3.0

julia run-analysis.jl
```

The parameters in the script have the following meanings, in order:
- the batch spawns 128 "tasks" (i.e., 128 separate processes)
- each task uses 1 CPU (you may want more CPUs per task if you work with
  actual threads and shared memory)
- the whole batch may take at most 10 minutes
- each CPU (in our case, each process) will be allocated 4 gigabytes of RAM
- the job will be visible in the queue as `MyDistributedJob`
- the script loads the Julia 1.3.0 module on the workers, so that the `julia`
  executable is available (you may want to check the available versions with
  your HPC administrators)
- finally, it runs the Julia script `run-analysis.jl`
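
The same header can also be written with Slurm's long-form options, which some
find easier to read. The sketch below uses what I believe are the long-form
equivalents of the short flags (consult `man sbatch` on your cluster to
confirm); it writes the variant to a hypothetical file and counts its
directives as a quick sanity check:

```sh
# sketch: the same batch header using long-form Slurm options
# (assumed equivalents of -n, -c, -t, and -J; verify against `man sbatch`)
cat > run-analysis-long.batch <<'EOF'
#!/bin/bash -l
#SBATCH --ntasks=128
#SBATCH --cpus-per-task=1
#SBATCH --time=10
#SBATCH --mem-per-cpu=4G
#SBATCH --job-name=MyDistributedJob

module load lang/Julia/1.3.0

julia run-analysis.jl
EOF
grep -c '^#SBATCH' run-analysis-long.batch   # prints 5
```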

### Julia script

The `run-analysis.jl` script
([download](https://github.com/LCSB-BioCore/DistributedData.jl/blob/master/docs/slurm-example/run-analysis.jl))
may look as follows:
```julia
using Distributed, ClusterManagers, DistributedData

# read the number of available workers from the environment and start the worker processes
n_workers = parse(Int, ENV["SLURM_NTASKS"])
addprocs_slurm(n_workers, topology = :master_worker)

# load the required packages on all workers
@everywhere using DistributedData

# generate a random dataset on all workers
dataset = dtransform((), _ -> randn(10000, 10000), workers(), :myData)

# for demonstration, sum the whole dataset
totalResult = dmapreduce(dataset, sum, +)

# do not forget to save the results!
f = open("result.txt", "w")
println(f, totalResult)
close(f)
```

Finally, you can start the whole thing with the `sbatch` command executed on
the access node:
```sh
sbatch run-analysis.batch
```

### Collecting the results

After your task gets queued, executed, and finished successfully, you will
find the result in `result.txt`. In the meantime, you can entertain yourself
by watching `squeue` to see, e.g., the expected execution time of your batch.

Note that you may want to run the analysis in a separate directory, because
the logs from all workers are collected in the current path by default. The
resulting files may look like this:
```
0 [user@access1 test]$ ls
job0000.out job0019.out job0038.out job0057.out job0076.out job0095.out job0114.out
job0001.out job0020.out job0039.out job0058.out job0077.out job0096.out job0115.out
job0002.out job0021.out job0040.out job0059.out job0078.out job0097.out job0116.out
job0003.out job0022.out job0041.out job0060.out job0079.out job0098.out job0117.out
job0004.out job0023.out job0042.out job0061.out job0080.out job0099.out job0118.out
job0005.out job0024.out job0043.out job0062.out job0081.out job0100.out job0119.out
job0006.out job0025.out job0044.out job0063.out job0082.out job0101.out job0120.out
job0007.out job0026.out job0045.out job0064.out job0083.out job0102.out job0121.out
job0008.out job0027.out job0046.out job0065.out job0084.out job0103.out job0122.out
job0009.out job0028.out job0047.out job0066.out job0085.out job0104.out job0123.out
job0010.out job0029.out job0048.out job0067.out job0086.out job0105.out job0124.out
job0011.out job0030.out job0049.out job0068.out job0087.out job0106.out job0125.out
job0012.out job0031.out job0050.out job0069.out job0088.out job0107.out job0126.out
job0013.out job0032.out job0051.out job0070.out job0089.out job0108.out job0127.out
job0014.out job0033.out job0052.out job0071.out job0090.out job0109.out result.txt <-- here is the result!
job0015.out job0034.out job0053.out job0072.out job0091.out job0110.out run-analysis.jl
job0016.out job0035.out job0054.out job0073.out job0092.out job0111.out run-analysis.sbatch
job0017.out job0036.out job0055.out job0074.out job0093.out job0112.out slurm-2237171.out
job0018.out job0037.out job0056.out job0075.out job0094.out job0113.out
```

The files `job*.out` contain the information collected from the individual
workers' standard outputs, such as the output of `println` or `@info`. For
complicated programs, this is the easiest way to get the debugging information
out, and a simple but often sufficient way to collect benchmarking output
(using e.g. `@time`).
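
As a sketch of how one might skim that benchmarking output afterwards, the
following greps the timing lines across all worker logs (the two `job*.out`
files here are fabricated stand-ins for real logs, and the exact `@time`
output format depends on your Julia version):

```sh
# sketch: collect `@time`-style timing lines from all worker logs
mkdir -p timing-demo
printf '  1.234567 seconds (5 allocations: 1.2 KiB)\n' > timing-demo/job0000.out
printf '  2.345678 seconds (7 allocations: 3.4 KiB)\n' > timing-demo/job0001.out
# -h suppresses the file-name prefix, so only the timing lines remain
grep -h 'seconds' timing-demo/job*.out | wc -l   # prints 2
```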
