Author: Lester Hedges
Email: lester.hedges@bristol.ac.uk
The companion notebook for this section can be found here
The previous section showed you how to write a node to perform minimisation of a molecular system within an interactive Jupyter notebook. Here we introduce you to some of the other ways of running BioSimSpace nodes, showing how the same script can be used in several different ways.
The typical way of interacting with BioSimSpace is by running a workflow component, or node, from the command-line. A node is just a normal Python script that is run using the Python interpreter. Let's use the molecular minimisation example from the previous notebook, which we've provided as a Python script called minimise.py within the nodes directory. (This is just the previous notebook, downloaded as a regular Python script.)
From the command-line, we can query the node to see what it does and get information about the inputs:
python nodes/minimise.py --help
usage: minimise.py [-h] [-c CONFIG] [-v [VERBOSE]] [--export-cwl [EXPORT_CWL]]
[--strict-file-naming [STRICT_FILE_NAMING]] --files FILES
[FILES ...] [--steps STEPS]
[--engine {Amber,Gromacs,Namd,OpenMM,Somd,auto}]
A node to perform energy minimisation and save the minimised molecular
configuration to file.
Args that start with '--' (e.g. --arg) can also be set in a config file
(specified via -c). The config file uses YAML syntax and must represent a YAML
'mapping' (for details, see http://learn.getgrav.org/advanced/yaml). If an arg
is specified in more than one place, then commandline values override config
file values which override defaults.
Output:
minimised: FileSet The minimised molecular system.
Required arguments:
--files FILES [FILES ...]
A set of molecular input files.
Optional arguments:
-h, --help Show this help message and exit.
-c CONFIG, --config CONFIG
Path to configuration file.
-v [VERBOSE], --verbose [VERBOSE]
Print verbose error messages.
--export-cwl [EXPORT_CWL]
Export Common Workflow Language (CWL) wrapper and exit.
--strict-file-naming [STRICT_FILE_NAMING]
Enforce that the prefix of any file based output matches its name.
--steps STEPS The number of minimisation steps.
default=10000
min=0, max=1000000
--engine {Amber,Gromacs,Namd,OpenMM,Somd,auto}
The molecular dynamics engine.
default=auto
In the previous section, input was achieved via a graphical user interface where the user could configure options and upload files. On the command-line, inputs must be set as command-line arguments. From the information provided in the node itself, i.e. the description and the definition of inputs and outputs, BioSimSpace has autogenerated a nicely formatted argparse help message that describes how the node works. The message lists all of the inputs and outputs, lets us know which inputs are optional, and specifies any default values or constraints.
Note that it's possible to pass options to the node in various ways, e.g. directly on the command-line, using a YAML configuration file, or even using environment variables. This provides a lot of flexibility in the way in which BioSimSpace nodes can be run. For now we'll just pass arguments on the command-line.
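As a rough illustration of the configuration-file route, the sketch below renders a set of options as a YAML mapping and builds the matching command line. It assumes that the keys in the file mirror the long option names without the leading dashes; the helper names and the file name minimise.yaml are hypothetical and are not part of BioSimSpace.

```python
import shlex


def render_config(options):
    """Render a flat dict of node options as a YAML mapping (one key per line)."""
    lines = []
    for key, value in options.items():
        if isinstance(value, (list, tuple)):
            # Lists are written using YAML flow-sequence syntax.
            items = ", ".join(str(v) for v in value)
            lines.append(f"{key}: [{items}]")
        else:
            lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"


def build_command(node_path, config_path):
    """Build the command that runs a node with a config file passed via -c."""
    return ["python", node_path, "-c", config_path]


# Assumption: keys match the long option names shown in the help message.
config_text = render_config(
    {"files": ["inputs/ala.crd", "inputs/ala.top"], "steps": 1000}
)
command = build_command("nodes/minimise.py", "minimise.yaml")

print(config_text)
print(shlex.join(command))
```

Saving the printed mapping as minimise.yaml and running the printed command should then be equivalent to passing the same values directly as arguments. According to the help message, any value given on the command-line would still override the one in the file.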
Try running the node without any arguments and seeing what the output is:
python nodes/minimise.py
usage: minimise.py [-h] [-c CONFIG] [-v [VERBOSE]] [--export-cwl [EXPORT_CWL]]
[--strict-file-naming [STRICT_FILE_NAMING]] --files FILES
[FILES ...] [--steps STEPS]
[--engine {Amber,Gromacs,Namd,OpenMM,Somd,auto}]
minimise.py: error: the following arguments are required: --files
Thankfully we've provided some files for you. As before, these are found in the inputs directory.
ls inputs/ala*
inputs/ala.crd inputs/ala.top
(The files define a solvated alanine dipeptide system in AMBER format.)
Let's now run the minimisation node using these files as input. In the interests of time, let's also reduce the number of steps to 1000. The files can be passed to the script in various ways. All of the following are allowed:
python nodes/minimise.py --steps=1000 --files="inputs/ala.crd, inputs/ala.top"
python nodes/minimise.py --steps=1000 --files inputs/ala.crd inputs/ala.top
python nodes/minimise.py --steps=1000 --files inputs/ala.*
Running the last of these:
python nodes/minimise.py --steps=1000 --files inputs/ala.*
We should find that the minimised molecular system has been written to the working directory.
ls minimised.*
minimised.prm7 minimised.rst7
Note that the files have been written in the same format as the original molecular system, i.e. AMBER.
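To see why the three forms of --files listed earlier are interchangeable, it can help to mimic the parsing with plain argparse. The following is only a sketch under the assumption that the node accepts one or more values and additionally splits any comma-separated string; it is not BioSimSpace's actual parser, and parse_files is a hypothetical helper.

```python
import argparse


def parse_files(argv):
    """Parse --files and --steps, accepting space- or comma-separated files."""
    parser = argparse.ArgumentParser(prog="minimise.py")
    # nargs="+" collects one or more values, which is also what the shell
    # produces after expanding a wildcard such as inputs/ala.*
    parser.add_argument("--files", nargs="+", required=True)
    parser.add_argument("--steps", type=int, default=10000)
    args = parser.parse_args(argv)

    files = []
    for token in args.files:
        # Split a quoted "a, b" value into its individual file names.
        files.extend(part.strip() for part in token.split(",") if part.strip())
    return files, args.steps


# A single quoted, comma-separated string.
quoted = parse_files(["--steps=1000", "--files=inputs/ala.crd, inputs/ala.top"])
# Separate arguments, as typed by hand or produced by shell wildcard expansion.
separate = parse_files(
    ["--steps=1000", "--files", "inputs/ala.crd", "inputs/ala.top"]
)

print(quoted)
print(quoted == separate)
```

Both calls yield the same list of two files, which is why the choice between the forms is purely a matter of convenience.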
We have also provided some GROMACS format input files.
ls inputs/kigaki.*
inputs/kigaki.gro inputs/kigaki.top
Let's now run the node using these files as input. This is a larger system so the minimisation will take a little longer.
python nodes/minimise.py --steps=1000 --files inputs/kigaki.*
There should now be two additional GROMACS format output files in the working directory.
ls minimised.*
minimised.gro minimised.prm7 minimised.rst7 minimised.top
BioSimSpace also provides functionality for running nodes internally. This allows you to call a node from within a script, thereby using existing nodes as building blocks for more complicated workflows. To activate nodes you can point BioSimSpace to a directory in which they are contained. As such, you can maintain your own internal nodes and make them available to users when needed.
For example:
import BioSimSpace as BSS
BSS.Node.setNodeDirectory("nodes")
BSS.Node.list()
['equilibrate', 'minimise', 'parameterise', 'solvate']
To get information about a particular node we can pass its name to the help function:
BSS.Node.help("minimise")
usage: minimise.py [-h] [-c CONFIG] [-v [VERBOSE]] [--export-cwl [EXPORT_CWL]]
[--strict-file-naming [STRICT_FILE_NAMING]] --files FILES
[FILES ...] [--steps STEPS]
[--engine {Amber,Gromacs,Namd,OpenMM,Somd,auto}]
A node to perform energy minimisation and save the minimised molecular
configuration to file.
Args that start with '--' (e.g. --arg) can also be set in a config file
(specified via -c). The config file uses YAML syntax and must represent a YAML
'mapping' (for details, see http://learn.getgrav.org/advanced/yaml). If an arg
is specified in more than one place, then commandline values override config
file values which override defaults.
Output:
minimised: FileSet The minimised molecular system.
Required arguments:
--files FILES [FILES ...]
A set of molecular input files.
Optional arguments:
-h, --help Show this help message and exit.
-c CONFIG, --config CONFIG
Path to configuration file.
-v [VERBOSE], --verbose [VERBOSE]
Print verbose error messages.
--export-cwl [EXPORT_CWL]
Export Common Workflow Language (CWL) wrapper and exit.
--strict-file-naming [STRICT_FILE_NAMING]
Enforce that the prefix of any file based output matches its name.
--steps STEPS The number of minimisation steps.
default=10000
min=0, max=1000000
--engine {Amber,Gromacs,Namd,OpenMM,Somd,auto}
The molecular dynamics engine.
default=auto
To execute a node we use the run function. This takes a dictionary of input values and returns another dictionary containing the outputs. Let's generate a valid input dictionary:
input = {"files" : ["inputs/ala.crd", "inputs/ala.top"],
         "steps" : 1000
        }
We can now run the minimise node, passing the dictionary from above:
output = BSS.Node.run("minimise", input)
Finally, let's print the output dictionary to see the result of running the node:
print(output)
{'minimised': ['/home/lester/Code/BioSimSpaceTutorials/01_introduction/minimised.prm7', '/home/lester/Code/BioSimSpaceTutorials/01_introduction/minimised.rst7']}
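When building input dictionaries programmatically, it can be useful to check them against the constraints listed in the help message before calling the node. The sketch below does this for the minimise node using only the limits shown above (steps between 0 and 1000000, and the listed engines). The helper check_minimise_input is hypothetical, and treating engine names as case-sensitive is an assumption; BioSimSpace performs its own validation when the node runs.

```python
# Engine choices copied from the node's help message.
ENGINES = {"Amber", "Gromacs", "Namd", "OpenMM", "Somd", "auto"}


def check_minimise_input(values):
    """Return a list of problems found in an input dictionary (empty if none)."""
    problems = []

    # --files is the only required argument.
    if not values.get("files"):
        problems.append("'files' is required and must not be empty")

    # --steps defaults to 10000 and must lie within [0, 1000000].
    steps = values.get("steps", 10000)
    if not isinstance(steps, int) or not 0 <= steps <= 1000000:
        problems.append("'steps' must be an integer between 0 and 1000000")

    # --engine defaults to "auto" (assumed case-sensitive here).
    engine = values.get("engine", "auto")
    if engine not in ENGINES:
        problems.append(f"'engine' must be one of {sorted(ENGINES)}")

    return problems


good = {"files": ["inputs/ala.crd", "inputs/ala.top"], "steps": 1000}
bad = {"steps": 2000000, "engine": "Lammps"}

print(check_minimise_input(good))
print(check_minimise_input(bad))
```

A dictionary that passes the check can then be handed to BSS.Node.run("minimise", ...) as before.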
BioSimSpace nodes can also autogenerate their own Common Workflow Language (CWL) tool wrappers, allowing them to be plugged into any workflow engine that supports the standard. To generate a wrapper, simply pass the --export-cwl argument when running the node, e.g.:
python nodes/equilibrate.py --export-cwl
Let's examine the wrapper:
cat nodes/equilibrate.cwl
cwlVersion: v1.0
class: CommandLineTool
baseCommand: ["/home/lester/.conda/envs/biosimspace-dev/bin/python", "/home/lester/Code/BioSimSpaceTutorials/01_introduction/nodes/equilibrate.py", "--strict-file-naming"]
inputs:
  files:
    type:
      - type: array
        items: File
    inputBinding:
      prefix: --files
      separate: true
  runtime:
    type: string?
    default: 0.02 nanosecond
    inputBinding:
      prefix: --runtime
      separate: true
  temperature_start:
    type: string?
    default: 0.0 kelvin
    inputBinding:
      prefix: --temperature_start
      separate: true
  temperature_end:
    type: string?
    default: 300.0 kelvin
    inputBinding:
      prefix: --temperature_end
      separate: true
  restraint:
    type: string?
    default: none
    inputBinding:
      prefix: --restraint
      separate: true
  report_interval:
    type: int?
    default: 100
    inputBinding:
      prefix: --report_interval
      separate: true
  restart_interval:
    type: int?
    default: 500
    inputBinding:
      prefix: --restart_interval
      separate: true
  engine:
    type: string?
    default: auto
    inputBinding:
      prefix: --engine
      separate: true
outputs:
  equilibrated:
    type:
      type: array
      items: File
    outputBinding:
      glob: "equilibrated.*"
  trajectory:
    type: File
    outputBinding:
      glob: "trajectory.*"
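A CWL runner needs a job file that supplies values for the inputs declared in the wrapper. As a minimal sketch, the snippet below builds such a job description in JSON, using the standard CWL convention of describing each file as an object with class File and a path. The input names (files, restraint, runtime) are taken from the wrapper above; the helper make_job, the file name equilibrate-job.json, and the choice of cwltool as the runner are assumptions made for illustration.

```python
import json


def make_job(files, **options):
    """Build a CWL job description for a node that takes a 'files' array."""
    # CWL represents each input file as an object with class "File".
    job = {"files": [{"class": "File", "path": path} for path in files]}
    # Remaining inputs are passed through by name, e.g. restraint="heavy".
    job.update(options)
    return job


job = make_job(
    ["minimised.prm7", "minimised.rst7"],
    restraint="heavy",
    runtime="0.02 nanosecond",
)
job_text = json.dumps(job, indent=2)
print(job_text)

# After saving job_text as equilibrate-job.json (hypothetical name), a runner
# such as cwltool could be invoked along the lines of:
#   cwltool nodes/equilibrate.cwl equilibrate-job.json
```

Any input omitted from the job file falls back to the default recorded in the wrapper, since those inputs are declared as optional.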
As a simple example of chaining BioSimSpace nodes in a command-line workflow, consider the following script:
#!/usr/bin/env bash
# scripts/workflow.sh
# Exit immediately on error.
set -e
echo "Parameterising..."
python nodes/parameterise.py --pdb inputs/methanol.pdb --forcefield gaff
echo "Solvating..."
python nodes/solvate.py --files parameterised.* --water_model tip3p
echo "Minimising..."
python nodes/minimise.py --files solvated.* --steps 1000
echo "Equilibrating..."
python nodes/equilibrate.py --files minimised.* --restraint heavy
echo "Done!"
Starting from a PDB topology, this script calls each of the nodes in sequence, passing the output of one as the input to the next. The output of the final node is a set of files representing the equilibrated molecular system, as well as a trajectory and PDB file that can be visualised with, e.g., the Visual Molecular Dynamics (VMD) program.
Let's run the workflow:
bash scripts/workflow.sh
Parameterising...
Solvating...
Minimising...
Equilibrating...
Done!
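The same chain could in principle be expressed in Python using the dictionary-in, dictionary-out interface of BSS.Node.run described earlier. The sketch below is written against a generic run callable so that it can be exercised without BioSimSpace installed. The input names come from the shell script, but the output keys parameterised and solvated are assumptions inferred from the file names it uses, and fake_run is a purely hypothetical stand-in.

```python
def run_workflow(run, pdb_file):
    """Chain the four nodes, feeding each node's output into the next."""
    parameterised = run("parameterise", {"pdb": pdb_file, "forcefield": "gaff"})
    solvated = run(
        "solvate",
        {"files": parameterised["parameterised"], "water_model": "tip3p"},
    )
    minimised = run("minimise", {"files": solvated["solvated"], "steps": 1000})
    return run(
        "equilibrate", {"files": minimised["minimised"], "restraint": "heavy"}
    )


# Hypothetical stand-in for BSS.Node.run: records each call and returns a
# made-up output dictionary keyed by the assumed output name.
OUTPUT_KEYS = {
    "parameterise": "parameterised",
    "solvate": "solvated",
    "minimise": "minimised",
    "equilibrate": "equilibrated",
}
calls = []


def fake_run(name, inputs):
    calls.append((name, inputs))
    key = OUTPUT_KEYS[name]
    return {key: [f"{key}.out"]}


result = run_workflow(fake_run, "inputs/methanol.pdb")
print([name for name, _ in calls])
print(result)

# With BioSimSpace available, the real runner would be passed instead, e.g.
# after BSS.Node.setNodeDirectory("nodes"):
#   run_workflow(BSS.Node.run, "inputs/methanol.pdb")
```

Passing the runner as an argument keeps the chaining logic separate from the execution backend, which makes the wiring between nodes easy to check in isolation.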