# DistributedData.jl on Slurm clusters

DistributedData scales well to moderately large computer clusters. As a
practical example, you can see the script that was used to process relatively
large datasets for GigaSOM, documented in the Supplementary file of the
[GigaSOM
article](https://academic.oup.com/gigascience/article/9/11/giaa127/5987271)
(click the `giaa127_Supplemental_File` link below on the page, and find
*Listing S1* at the end of the PDF).

Using DistributedData with [Slurm](https://slurm.schedmd.com/overview.html) is
similar to using it with many other distributed computing systems:

1. You install Julia and the required packages.
2. You submit a batch (or interactive) task, which runs your Julia script on a
   node and gives it some information about where to find other worker nodes.
3. In the Julia script, you use the
   [`ClusterManagers`](https://github.com/JuliaParallel/ClusterManagers.jl)
   function `addprocs_slurm` to add the processes, just as with normal
   `addprocs`. Similar functions exist for many other task schedulers,
   including the popular PBS and LSF.
4. The rest of the workflow is unchanged; all functions from `DistributedData`
   such as `save_at` and `dmapreduce` will work in the cluster just as they
   worked locally. Performance will vary, though -- you may want to optimize
   your algorithm to use as much parallelism as possible, and to load more
   data into memory (clusters usually offer much more total memory than a
   single computer), while keeping an eye on the communication overhead,
   transferring only the minimal required amount of data as seldom as
   possible.

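Steps 3 and 4 can be sketched in a few lines (a sketch only, assuming a Slurm
allocation that exports `SLURM_NTASKS` into the environment; a complete script
appears later in this document):

```julia
using Distributed, ClusterManagers, DistributedData

# On a single machine you would call addprocs(n); under Slurm, the
# drop-in replacement reads the allocation size from the environment:
addprocs_slurm(parse(Int, ENV["SLURM_NTASKS"]))

# From here on, DistributedData works exactly as it does locally,
# e.g. storing a value on the first worker:
@everywhere using DistributedData
save_at(workers()[1], :x, :(1 + 1))
```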
### Preparing the packages

The easiest way to install the packages is to use a single-machine interactive
job. On the access node of your HPC, run this command to get a 60-minute
interactive session:
```sh
srun --pty -t60 /bin/bash -
```

(Depending on your cluster setup, it may be beneficial to also specify a
partition to which the job should belong; many clusters provide an
interactive partition where the interactive jobs get scheduled faster. To do
that, add the option `-p interactive` to the parameters of `srun`.)

When the shell opens (the prompt should change), you can load the Julia module,
usually with a command such as this:

```sh
module load lang/Julia/1.3.0
```

(You may consult `module spider julia` for other possible Julia versions.)

After that, start `julia` and press `]` to open the packaging prompt (you
should see `(v1.3) pkg>` instead of `julia>`). There you can download and
install the required packages:
```
add DistributedData, ClusterManagers
```

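Equivalently, the installation can be scripted without the interactive prompt,
which is handy e.g. in provisioning scripts (a sketch using the standard `Pkg`
API):

```julia
import Pkg

# install both packages in one call
Pkg.add(["DistributedData", "ClusterManagers"])
```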
You may also want to load the packages to precompile them, which saves precious
time later in the workflows. Press backspace to return to the "normal" Julia
shell, and type:
```julia
using DistributedData, ClusterManagers
```

Depending on the package, this may take a while, but should be done in under a
minute for most existing packages. Finally, press `Ctrl+D` twice to exit both
Julia and the interactive Slurm job shell.

### Slurm batch script

An example Slurm batch script
([download](https://github.com/LCSB-BioCore/DistributedData.jl/blob/master/docs/slurm-example/run-analysis.batch))
is listed below -- save it as `run-analysis.batch` to your Slurm access node,
in a directory that is shared with the workers (usually a "scratch" directory;
try `cd $SCRATCH`).
```sh
#!/bin/bash -l
#SBATCH -n 128
#SBATCH -c 1
#SBATCH -t 10
#SBATCH --mem-per-cpu 4G
#SBATCH -J MyDistributedJob

module load lang/Julia/1.3.0

julia run-analysis.jl
```

The parameters in the script have this meaning, in order:
- the batch spawns 128 "tasks" (i.e., 128 separate processes)
- each task uses 1 CPU (you may want more CPUs if you work with actual
  threads and shared memory)
- the whole batch takes at most 10 minutes
- each CPU (in our case, each process) will be allocated 4 gigabytes of RAM
- the job will be visible in the queue as `MyDistributedJob`
- it will load the Julia 1.3.0 module on the workers, so that the `julia`
  executable is available (you may want to check the available versions with
  your HPC administrators)
- finally, it will run the Julia script `run-analysis.jl`
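Slurm exports these allocation parameters into the job environment, so the
Julia script can double-check that it received the expected resources. A small
sketch (the variable names are illustrative; `SLURM_NTASKS` and
`SLURM_CPUS_PER_TASK` are standard Slurm environment variables):

```julia
# values set by Slurm in the job environment
ntasks = parse(Int, ENV["SLURM_NTASKS"])                          # -n, 128 above
cpus_per_task = parse(Int, get(ENV, "SLURM_CPUS_PER_TASK", "1"))  # -c, 1 above

@info "Slurm allocation" ntasks cpus_per_task
```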

### Julia script

The `run-analysis.jl` script
([download](https://github.com/LCSB-BioCore/DistributedData.jl/blob/master/docs/slurm-example/run-analysis.jl))
may look as follows:
```julia
using Distributed, ClusterManagers, DistributedData

# read the number of available workers from the environment and start the worker processes
n_workers = parse(Int, ENV["SLURM_NTASKS"])
addprocs_slurm(n_workers, topology = :master_worker)

# load the required packages on all workers
@everywhere using DistributedData

# generate a random dataset on all workers
dataset = dtransform((), _ -> randn(10000, 10000), workers(), :myData)

# for demonstration, sum the whole dataset
totalResult = dmapreduce(dataset, sum, +)

# do not forget to save the results!
f = open("result.txt", "w")
println(f, totalResult)
close(f)
```
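As an optional addition (not part of the script above), you can release the
worker processes explicitly once the result is saved, using the standard
`Distributed` API; otherwise they are simply terminated when the batch job
ends:

```julia
# free the Slurm-allocated worker processes
rmprocs(workers())
```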

Finally, you can start the whole thing with the `sbatch` command executed on
the access node:
```sh
sbatch run-analysis.batch
```

### Collecting the results

After your task gets queued, executed, and finishes successfully, you may see
the result in `result.txt`. In the meantime, you can entertain yourself by
watching `squeue`, to see e.g. the expected execution time of your batch.

Note that you may want to run the analysis in a separate directory, because the
logs from all workers are collected in the current path by default. The
resulting files may look like this:
```
0 [user@access1 test]$ ls
job0000.out job0019.out job0038.out job0057.out job0076.out job0095.out job0114.out
job0001.out job0020.out job0039.out job0058.out job0077.out job0096.out job0115.out
job0002.out job0021.out job0040.out job0059.out job0078.out job0097.out job0116.out
job0003.out job0022.out job0041.out job0060.out job0079.out job0098.out job0117.out
job0004.out job0023.out job0042.out job0061.out job0080.out job0099.out job0118.out
job0005.out job0024.out job0043.out job0062.out job0081.out job0100.out job0119.out
job0006.out job0025.out job0044.out job0063.out job0082.out job0101.out job0120.out
job0007.out job0026.out job0045.out job0064.out job0083.out job0102.out job0121.out
job0008.out job0027.out job0046.out job0065.out job0084.out job0103.out job0122.out
job0009.out job0028.out job0047.out job0066.out job0085.out job0104.out job0123.out
job0010.out job0029.out job0048.out job0067.out job0086.out job0105.out job0124.out
job0011.out job0030.out job0049.out job0068.out job0087.out job0106.out job0125.out
job0012.out job0031.out job0050.out job0069.out job0088.out job0107.out job0126.out
job0013.out job0032.out job0051.out job0070.out job0089.out job0108.out job0127.out
job0014.out job0033.out job0052.out job0071.out job0090.out job0109.out result.txt <-- here is the result!
job0015.out job0034.out job0053.out job0072.out job0091.out job0110.out run-analysis.jl
job0016.out job0035.out job0054.out job0073.out job0092.out job0111.out run-analysis.batch
job0017.out job0036.out job0055.out job0074.out job0093.out job0112.out slurm-2237171.out
job0018.out job0037.out job0056.out job0075.out job0094.out job0113.out
```

The files `job*.out` contain the information collected from the individual
workers' standard outputs, such as the output of `println` or `@info`. For
complicated programs, this is the easiest way to extract debugging
information, and a simple but often sufficient way to collect benchmarking
output (using e.g. `@time`).
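
For instance, timing the map step of the `dmapreduce` call from the script
above would send each worker's measurement to its own `job*.out` file (a
sketch; `dataset` refers to the value created earlier):

```julia
# @time prints the timing to the worker's standard output and returns
# the value of sum(d), so the reduction result is unchanged
totalResult = dmapreduce(dataset, d -> @time(sum(d)), +)
```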