Commit bf372a3

docs(parallel): document Dask and SLURM arguments
1 parent b3ef78f commit bf372a3

1 file changed

Lines changed: 274 additions & 10 deletions

docs/getting_started.rst

@@ -186,9 +186,10 @@ The ``top_traj_file`` argument is required; other arguments have default values.
      - Enable verbose output.
      - ``False``
      - ``bool``
-   * - ``--outfile``
-     - Name of the JSON output file to write results to (filename only). Defaults to ``outfile.json``.
-     - ``outfile.json``
+   * - ``--output_file``
+     - Name of the JSON output file to write results to (filename only). Defaults to
+       ``output_file.json``.
+     - ``output_file.json``
      - ``str``
    * - ``--force_partitioning``
      - Factor for partitioning forces when there are weak correlations.
@@ -202,22 +203,285 @@ The ``top_traj_file`` argument is required; other arguments have default values.
      - How to group molecules for averaging.
      - ``molecules``
      - ``str``
-   * - ``--kcal_force_units``
-     - Set input units as kcal/mol
-     - ``False``
-     - ``bool``
    * - ``--combined_forcetorque``
-     - Use the combined force-torque covariance matrix for the highest level to match the 2019 paper
+     - Use the combined force-torque covariance matrix for the highest level to match the
+       2019 paper.
      - ``True``
      - ``bool``
    * - ``--customised_axes``
-     - Use custom bonded axes to get COM, MOI and PA that match the 2019 paper
+     - Use custom bonded axes to get COM, MOI and PA that match the 2019 paper.
      - ``True``
      - ``bool``
    * - ``--search_type``
-     - Method for finding neighbouring molecules
+     - Method for finding neighbouring molecules.
      - ``RAD``
      - ``str``
+   * - ``--parallel_frames``
+     - Execute frame-local covariance calculations in parallel. When enabled, frame-level
+       work is submitted to Dask and reduced in the parent process.
+     - ``False``
+     - ``bool``
+   * - ``--use_dask``
+     - Enable local Dask frame parallelism. This is useful for running frame-level work
+       across local worker processes.
+     - ``False``
+     - ``bool``
+   * - ``--dask_workers``
+     - Number of local Dask worker processes to use for parallel frame execution. If unset,
+       Dask chooses a default.
+     - ``None``
+     - ``int``
+   * - ``--dask_threads_per_worker``
+     - Number of threads per local Dask worker. ``1`` is recommended for trajectory safety
+       with MDAnalysis.
+     - ``1``
+     - ``int``
+   * - ``--hpc``
+     - Use a SLURM-backed Dask cluster for parallel frame execution.
+     - ``False``
+     - ``bool``
+   * - ``--submit``
+     - Submit a master SLURM job and exit instead of running immediately in the current
+       process. This is intended for HPC batch submission.
+     - ``False``
+     - ``bool``
+   * - ``--hpc_queue``
+     - SLURM partition or queue to use for Dask worker jobs.
+     - ``None``
+     - ``str``
+   * - ``--hpc_nodes``
+     - Number of SLURM Dask worker jobs to launch.
+     - ``1``
+     - ``int``
+   * - ``--hpc_cores``
+     - Number of CPU cores requested per Dask worker job.
+     - ``1``
+     - ``int``
+   * - ``--hpc_processes``
+     - Number of Dask worker processes per SLURM job.
+     - ``1``
+     - ``int``
+   * - ``--hpc_memory``
+     - Memory requested per Dask worker job, for example ``4GB`` or ``16GB``.
+     - ``4GB``
+     - ``str``
+   * - ``--hpc_walltime``
+     - Walltime requested for each Dask worker job, formatted as ``HH:MM:SS``.
+     - ``01:00:00``
+     - ``str``
+   * - ``--hpc_account``
+     - Optional SLURM account or project code.
+     - ``None``
+     - ``str``
+   * - ``--hpc_qos``
+     - Optional SLURM QoS value.
+     - ``None``
+     - ``str``
+   * - ``--hpc_constraint``
+     - Optional SLURM node constraint.
+     - ``None``
+     - ``str``
+   * - ``--conda_path``
+     - Path to the conda executable used in the SLURM worker prologue.
+     - ``conda``
+     - ``str``
+   * - ``--conda_exec``
+     - Conda-compatible executable to use for environment activation, usually ``conda`` or
+       ``mamba``.
+     - ``conda``
+     - ``str``
+   * - ``--conda_env``
+     - Conda environment name to activate on SLURM workers.
+     - ``None``
+     - ``str``
+
+Parallel Frame Execution
+------------------------
+
+CodeEntropy can optionally process trajectory frames in parallel using Dask. This is
+most useful for larger trajectories where the frame-local covariance calculations are
+one of the slowest parts of the workflow.
+
+The parallel implementation works as a map/reduce workflow:
+
+* each Dask worker processes one frame at a time;
+* each worker returns a frame-local covariance result;
+* the parent process reduces those frame-local results into the final running
+  covariance averages;
+* the entropy graph runs after frame reduction has completed.
+
+This means workers do not directly modify the shared covariance accumulators. The
+parent process remains responsible for reduction, which keeps the parallel execution
+consistent with the sequential workflow.
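The map/reduce pattern described above can be illustrated with a minimal sketch. It is not CodeEntropy's implementation: a standard-library ``ThreadPoolExecutor`` stands in for the Dask workers, the "covariance" is simply the outer product of a small force vector, and the names ``frame_covariance``, ``reduce_running_average`` and ``parallel_average`` are hypothetical. The point is only that workers return frame-local results while the parent alone updates the running average.

```python
from concurrent.futures import ThreadPoolExecutor


def frame_covariance(forces):
    """Map step: frame-local outer product f f^T (a stand-in for the real result)."""
    return [[a * b for b in forces] for a in forces]


def reduce_running_average(frame_results):
    """Reduce step: fold frame-local matrices into a running average in the parent."""
    average = None
    for count, matrix in enumerate(frame_results, start=1):
        if average is None:
            average = [row[:] for row in matrix]
            continue
        for i, row in enumerate(matrix):
            for j, value in enumerate(row):
                # Incremental mean update: avg += (x - avg) / n
                average[i][j] += (value - average[i][j]) / count
    return average


def parallel_average(frames, workers=2):
    # Workers only compute and return results; they never touch the accumulator.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(frame_covariance, frames))  # order is preserved
    return reduce_running_average(results)


def sequential_average(frames):
    return reduce_running_average(frame_covariance(f) for f in frames)


frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
# Same reduction order in both paths, so the results are identical.
assert parallel_average(frames) == sequential_average(frames)
```

Because the reduction happens in one place and in frame order, the parallel and sequential paths in this sketch perform exactly the same arithmetic.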
+
+Local Dask Execution
+^^^^^^^^^^^^^^^^^^^^
+
+For local workstation or laptop use, enable ``parallel_frames`` and ``use_dask`` in
+``config.yaml``:
+
+.. code-block:: yaml
+
+    ---
+
+    run1:
+      top_traj_file: ["md_A4_dna.tpr", "md_A4_dna_xf.trr"]
+      selection_string: "all"
+      start: 0
+      end: 100
+      step: 1
+
+      parallel_frames: true
+      use_dask: true
+      dask_workers: 4
+      dask_threads_per_worker: 1
+
+The recommended value for ``dask_threads_per_worker`` is ``1``. This keeps each worker
+process independent and avoids thread-safety issues when reading trajectory data.
+
+The same run can also be started from the command line:
+
+.. code-block:: bash
+
+    CodeEntropy \
+      --parallel_frames true \
+      --use_dask true \
+      --dask_workers 4 \
+      --dask_threads_per_worker 1
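To show how the YAML keys correspond to the command-line flags, here is a small sketch that renders a flat options mapping in the ``--key value`` form used in the example above. ``to_cli`` is a hypothetical helper, not part of CodeEntropy. Writing booleans as ``true`` follows the example, while ``false`` for disabled switches and skipping ``None`` values are assumptions.

```python
import shlex


def to_cli(options, program="CodeEntropy"):
    """Render a flat options mapping as a ``--key value`` command line."""
    argv = [program]
    for key, value in options.items():
        if value is None:
            continue  # assumption: unset options are left to their defaults
        if isinstance(value, bool):
            # "true" mirrors the documented example; "false" is assumed by symmetry
            value = "true" if value else "false"
        argv += [f"--{key}", str(value)]
    return shlex.join(argv)


command = to_cli({
    "parallel_frames": True,
    "use_dask": True,
    "dask_workers": 4,
    "dask_threads_per_worker": 1,
})
print(command)
```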
+
+For very small systems or short trajectories, local Dask may not be faster than the
+sequential path because there is overhead in starting workers and transferring frame
+data. It is best suited to larger calculations with many frames.
+
+SLURM / HPC Dask Execution
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+On a SLURM-based HPC system, CodeEntropy can create a Dask cluster using SLURM worker
+jobs. This is enabled with ``hpc: true``.
+
+Example ``config.yaml``:
+
+.. code-block:: yaml
+
+    ---
+
+    run1:
+      top_traj_file: ["1AKI_prod_new.tpr", "1AKI_prod_new.trr"]
+      selection_string: "all"
+      start: 0
+      end: 500
+      step: 1
+
+      parallel_frames: true
+      hpc: true
+
+      hpc_queue: standard
+      hpc_nodes: 4
+      hpc_cores: 8
+      hpc_processes: 1
+      hpc_memory: 16GB
+      hpc_walltime: "02:00:00"
+
+      hpc_account: null
+      hpc_qos: null
+      hpc_constraint: null
+
+      conda_path: conda
+      conda_exec: conda
+      conda_env: codeentropy
+
+The important HPC options are:
+
+* ``hpc_queue``: SLURM partition or queue.
+* ``hpc_nodes``: number of Dask worker jobs to launch.
+* ``hpc_cores``: number of CPU cores requested per Dask worker job.
+* ``hpc_processes``: number of Dask worker processes per SLURM job.
+* ``hpc_memory``: memory requested per Dask worker job.
+* ``hpc_walltime``: walltime requested for each worker job.
+* ``conda_env``: environment to activate on the worker jobs.
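Since ``hpc_cores``, ``hpc_processes`` and ``hpc_memory`` are all per worker job, the total request scales with ``hpc_nodes``. The sketch below works through that arithmetic for the example configuration. ``HpcRequest`` is a hypothetical container whose defaults are taken from the arguments table. The walltime check is an assumed reading of the documented ``HH:MM:SS`` format, not CodeEntropy's own validation.

```python
import re
from dataclasses import dataclass


@dataclass
class HpcRequest:
    """Hypothetical container mirroring the ``hpc_*`` options and their defaults."""
    nodes: int = 1
    cores: int = 1
    processes: int = 1
    memory: str = "4GB"
    walltime: str = "01:00:00"

    def validate(self):
        for name in ("nodes", "cores", "processes"):
            if getattr(self, name) < 1:
                raise ValueError(f"hpc_{name} must be at least 1")
        # Assumed check: HH:MM:SS with minutes and seconds in the range 00-59.
        if not re.fullmatch(r"\d{2,}:[0-5]\d:[0-5]\d", self.walltime):
            raise ValueError("hpc_walltime must look like HH:MM:SS")

    def totals(self):
        """Aggregate the per-job values over all worker jobs."""
        return {
            "worker_jobs": self.nodes,
            "worker_processes": self.nodes * self.processes,
            "cores": self.nodes * self.cores,
        }


request = HpcRequest(nodes=4, cores=8, processes=1, memory="16GB", walltime="02:00:00")
request.validate()
print(request.totals())  # {'worker_jobs': 4, 'worker_processes': 4, 'cores': 32}
```

For the example above this gives four worker jobs, four Dask worker processes and 32 cores in total, which is worth checking against the limits of the target partition before submitting.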
+
+Submitting a Master SLURM Job
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+If you want CodeEntropy to submit a master SLURM job and then exit, set
+``submit: true`` as well as ``hpc: true``:
+
+.. code-block:: yaml
+
+    ---
+
+    run1:
+      top_traj_file: ["1AKI_prod.tpr", "1AKI_prod.trr"]
+      selection_string: "all"
+      start: 0
+      end: 500
+      step: 1
+
+      submit: true
+      parallel_frames: true
+      hpc: true
+
+      hpc_queue: standard
+      hpc_nodes: 4
+      hpc_cores: 8
+      hpc_processes: 1
+      hpc_memory: 16GB
+      hpc_walltime: "02:00:00"
+
+      hpc_account: null
+      hpc_qos: null
+      hpc_constraint: null
+
+      conda_path: conda
+      conda_exec: conda
+      conda_env: codeentropy
+
+Run CodeEntropy from the working directory containing ``config.yaml``:
+
+.. code-block:: bash
+
+    CodeEntropy
+
+In submit mode, CodeEntropy writes and submits a master SLURM script, then exits from
+the current process. The submitted master job starts CodeEntropy again on the cluster,
+where the SLURM-backed Dask workers are then launched.
+
+Choosing a Parallel Mode
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+Use sequential execution for small tests and debugging:
+
+.. code-block:: yaml
+
+    parallel_frames: false
+    use_dask: false
+    hpc: false
+    submit: false
+
+Use local Dask when running on a workstation:
+
+.. code-block:: yaml
+
+    parallel_frames: true
+    use_dask: true
+    dask_workers: 4
+    dask_threads_per_worker: 1
+
+Use HPC Dask when running inside an allocated HPC session or batch job:
+
+.. code-block:: yaml
+
+    parallel_frames: true
+    hpc: true
+
+Use submit mode when you want CodeEntropy to create and submit the master SLURM job
+for you:
+
+.. code-block:: yaml
+
+    submit: true
+    parallel_frames: true
+    hpc: true
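The four combinations above can be summarised as a small decision function. This is only a sketch of the documented combinations: ``execution_mode`` is a hypothetical name, and treating any combination the text does not cover (for example ``parallel_frames: true`` with neither ``use_dask`` nor ``hpc``) as sequential is a placeholder assumption, not confirmed CodeEntropy behaviour.

```python
def execution_mode(parallel_frames=False, use_dask=False, hpc=False, submit=False):
    """Return the mode name implied by the four documented switches."""
    if not parallel_frames:
        return "sequential"
    if hpc:
        # submit only changes how an HPC run is launched
        return "submit" if submit else "hpc"
    if use_dask:
        return "local"
    # Assumption: combinations not covered by the text fall back to sequential.
    return "sequential"


print(execution_mode())                                              # sequential
print(execution_mode(parallel_frames=True, use_dask=True))           # local
print(execution_mode(parallel_frames=True, hpc=True))                # hpc
print(execution_mode(parallel_frames=True, hpc=True, submit=True))   # submit
```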
 
 Averaging
 ---------
