Author: Lester Hedges
Email:   lester.hedges@bristol.ac.uk

Running nodes

The companion notebook for this section can be found here

The previous section showed you how to write a node to perform minimisation of a molecular system within an interactive Jupyter notebook. Here we introduce you to some of the other ways of running BioSimSpace nodes, showing how the same script can be used in several different ways.

Running nodes on the command-line

The typical way of interacting with BioSimSpace is by running a workflow component, or node, from the command-line. A node is just a normal Python script that is run using the python interpreter. Let's use the molecular minimisation example from the previous notebook, which we've provided as a Python script called minimise.py within the nodes directory. (This is just the previous notebook, downloaded as a regular Python script.)

From the command-line, we can query the node to see what it does and get information about the inputs:

python nodes/minimise.py --help
usage: minimise.py [-h] [-c CONFIG] [-v [VERBOSE]] [--export-cwl [EXPORT_CWL]]
                   [--strict-file-naming [STRICT_FILE_NAMING]] --files FILES
                   [FILES ...] [--steps STEPS]
                   [--engine {Amber,Gromacs,Namd,OpenMM,Somd,auto}]

A node to perform energy minimisation and save the minimised molecular
configuration to file.

Args that start with '--' (e.g. --arg) can also be set in a config file
(specified via -c). The config file uses YAML syntax and must represent a YAML
'mapping' (for details, see http://learn.getgrav.org/advanced/yaml). If an arg
is specified in more than one place, then commandline values override config
file values which override defaults.

Output:
  minimised: FileSet    The minimised molecular system.

Required arguments:
  --files FILES [FILES ...]
                        A set of molecular input files.

Optional arguments:
  -h, --help            Show this help message and exit.
  -c CONFIG, --config CONFIG
                        Path to configuration file.
  -v [VERBOSE], --verbose [VERBOSE]
                        Print verbose error messages.
  --export-cwl [EXPORT_CWL]
                        Export Common Workflow Language (CWL) wrapper and exit.
  --strict-file-naming [STRICT_FILE_NAMING]
                        Enforce that the prefix of any file based output matches its name.
  --steps STEPS         The number of minimisation steps.
                          default=10000
                          min=0, max=1000000
  --engine {Amber,Gromacs,Namd,OpenMM,Somd,auto}
                        The molecular dynamics engine.
                          default=auto

In the previous section, input was achieved via a graphical user interface where the user could configure options and upload files. On the command-line, inputs must instead be passed as command-line arguments. From the information provided in the node itself (its description and the definitions of its inputs and outputs), BioSimSpace has autogenerated a nicely formatted argparse help message that describes how the node works. The message lists all of the inputs and outputs, lets us know which inputs are optional, and specifies any default values or constraints.
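To make the mapping between the help message and the node's inputs concrete, here is a minimal sketch that rebuilds the same interface by hand with the standard argparse module. It is only an illustration: the real node builds its parser automatically from the input definitions in the previous notebook, and also supports config files, which this plain version omits. The helper names bounded_int and make_parser are hypothetical.

```python
import argparse

def bounded_int(minimum, maximum):
    """Return an argparse type that accepts integers in [minimum, maximum]."""
    def parse(text):
        value = int(text)
        if not minimum <= value <= maximum:
            raise argparse.ArgumentTypeError(
                f"{value} is outside the range [{minimum}, {maximum}]")
        return value
    return parse

def make_parser():
    """Hand-written stand-in for the parser BioSimSpace generates."""
    parser = argparse.ArgumentParser(
        prog="minimise.py",
        description="A node to perform energy minimisation and save the "
                    "minimised molecular configuration to file.")
    # Required input: one or more molecular files.
    parser.add_argument("--files", nargs="+", required=True,
                        help="A set of molecular input files.")
    # Optional input with a default and a min/max constraint.
    parser.add_argument("--steps", type=bounded_int(0, 1000000), default=10000,
                        help="The number of minimisation steps.")
    # Optional input restricted to a fixed set of choices.
    parser.add_argument("--engine", default="auto",
                        choices=["Amber", "Gromacs", "Namd", "OpenMM", "Somd", "auto"],
                        help="The molecular dynamics engine.")
    return parser

args = make_parser().parse_args(
    ["--steps=1000", "--files", "inputs/ala.crd", "inputs/ala.top"])
print(args.files, args.steps, args.engine)
```

Omitting --files makes parse_args exit with the same "the following arguments are required: --files" error shown for the real node.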

Note that it's possible to pass options to the node in various ways, e.g. directly on the command-line, using a YAML configuration file, or even using environment variables. This provides a lot of flexibility in the way in which BioSimSpace nodes can be run. For now we'll just pass arguments on the command-line.
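The help message states the precedence rule: command-line values override config file values, which override defaults. The sketch below illustrates that rule with plain dictionaries and renders the config-file portion as a YAML mapping. It is not how the node resolves options internally; the option values are illustrative, and it assumes config keys are the long option names without the leading dashes.

```python
# Illustrative option values for the minimise node, split by source.
defaults = {"steps": 10000, "engine": "auto"}   # defaults from the help text
config = {"files": ["inputs/ala.crd", "inputs/ala.top"], "steps": 5000}
cli = {"steps": 1000}                            # e.g. --steps=1000

def resolve(defaults, config, cli):
    """Later sources win: command line > config file > defaults."""
    return {**defaults, **config, **cli}

def to_yaml(mapping):
    """Render a flat mapping as YAML, using flow sequences for lists."""
    lines = []
    for key, value in mapping.items():
        if isinstance(value, list):
            value = "[" + ", ".join(value) + "]"
        lines.append(f"{key}: {value}")
    return "\n".join(lines) + "\n"

print(to_yaml(config), end="")
print(resolve(defaults, config, cli))
```

Saving the YAML text as, say, config.yaml and running `python nodes/minimise.py -c config.yaml --steps=1000` should then use the files from the config file but 1000 steps from the command-line.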

Try running the node without any arguments and seeing what the output is:

python nodes/minimise.py
usage: minimise.py [-h] [-c CONFIG] [-v [VERBOSE]] [--export-cwl [EXPORT_CWL]]
                   [--strict-file-naming [STRICT_FILE_NAMING]] --files FILES
                   [FILES ...] [--steps STEPS]
                   [--engine {Amber,Gromacs,Namd,OpenMM,Somd,auto}]
minimise.py: error: the following arguments are required: --files

Thankfully, we've provided some files for you. As before, these are found in the inputs directory.

ls inputs/ala*
inputs/ala.crd	inputs/ala.top

(The files define a solvated alanine dipeptide system in AMBER format.)

Let's now run the minimisation node using these files as input. In the interests of time, let's also reduce the number of steps to 1000. The files can be passed to the script in various ways. All of the following are allowed:

python nodes/minimise.py --steps=1000 --files="inputs/ala.crd, inputs/ala.top"
python nodes/minimise.py --steps=1000 --files inputs/ala.crd inputs/ala.top
python nodes/minimise.py --steps=1000 --files inputs/ala.*
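Why are these three forms equivalent? In the glob form the shell expands inputs/ala.* before Python runs, so the script receives the same two separate paths as in the second form. The quoted form arrives as a single comma-separated string. Here is a minimal sketch of how such values could be flattened into one list; the function normalise_files is hypothetical and is not BioSimSpace's own parsing code.

```python
def normalise_files(values):
    """Flatten --files values, splitting any comma-separated entries."""
    files = []
    for value in values:
        files.extend(part.strip() for part in value.split(",") if part.strip())
    return files

# Form 1: a single quoted, comma-separated string.
form1 = normalise_files(["inputs/ala.crd, inputs/ala.top"])
# Forms 2 and 3: separate arguments (the shell has already expanded the glob).
form2 = normalise_files(["inputs/ala.crd", "inputs/ala.top"])
print(form1 == form2)  # True
```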

We should find that the minimised molecular system has been written to the working directory.

ls minimised.*
minimised.prm7	minimised.rst7

Note that the files have been written in the same format as the original molecular system, i.e. AMBER.

We have also provided some GROMACS format input files.

ls inputs/kigaki.*
inputs/kigaki.gro  inputs/kigaki.top

Let's now run the node using these files as input. This is a larger system, so the minimisation will take a little longer.

python nodes/minimise.py --steps=1000 --files inputs/kigaki.*

There should now be two additional GROMACS format output files in the working directory.

ls minimised.*
minimised.gro  minimised.prm7  minimised.rst7  minimised.top

Running nodes from within BioSimSpace

BioSimSpace also provides functionality for running nodes internally. This allows you to call a node from within a script, thereby using existing nodes as building blocks for more complicated workflows. To make nodes available, point BioSimSpace at the directory that contains them. This means you can maintain your own collection of nodes and make them available to users when needed.

For example:

import BioSimSpace as BSS

BSS.Node.setNodeDirectory("nodes")
BSS.Node.list()
['equilibrate', 'minimise', 'parameterise', 'solvate']
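The list above matches the script names in the nodes directory. As a rough, stdlib-only sketch of the same idea, assuming each node is a .py file named after the node (how BSS.Node.list works internally is not shown here), the names can be collected with pathlib. The function list_nodes is hypothetical; note that non-Python files such as exported .cwl wrappers are skipped.

```python
from pathlib import Path

def list_nodes(directory):
    """Return the sorted names of the Python scripts in a node directory."""
    return sorted(path.stem for path in Path(directory).glob("*.py"))

print(list_nodes("nodes"))
```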

To get information about a particular node we can pass its name to the help function:

BSS.Node.help("minimise")
usage: minimise.py [-h] [-c CONFIG] [-v [VERBOSE]] [--export-cwl [EXPORT_CWL]]
                   [--strict-file-naming [STRICT_FILE_NAMING]] --files FILES
                   [FILES ...] [--steps STEPS]
                   [--engine {Amber,Gromacs,Namd,OpenMM,Somd,auto}]

A node to perform energy minimisation and save the minimised molecular
configuration to file.

Args that start with '--' (e.g. --arg) can also be set in a config file
(specified via -c). The config file uses YAML syntax and must represent a YAML
'mapping' (for details, see http://learn.getgrav.org/advanced/yaml). If an arg
is specified in more than one place, then commandline values override config
file values which override defaults.

Output:
  minimised: FileSet    The minimised molecular system.

Required arguments:
  --files FILES [FILES ...]
                        A set of molecular input files.

Optional arguments:
  -h, --help            Show this help message and exit.
  -c CONFIG, --config CONFIG
                        Path to configuration file.
  -v [VERBOSE], --verbose [VERBOSE]
                        Print verbose error messages.
  --export-cwl [EXPORT_CWL]
                        Export Common Workflow Language (CWL) wrapper and exit.
  --strict-file-naming [STRICT_FILE_NAMING]
                        Enforce that the prefix of any file based output matches its name.
  --steps STEPS         The number of minimisation steps.
                          default=10000
                          min=0, max=1000000
  --engine {Amber,Gromacs,Namd,OpenMM,Somd,auto}
                        The molecular dynamics engine.
                          default=auto

To execute a node we use the run function. This takes a dictionary of input values and returns another dictionary containing the outputs. Let's generate a valid input dictionary:

input = {"files" : ["inputs/ala.crd", "inputs/ala.top"],
         "steps" : 1000
        }

We can now run the minimise node, passing the dictionary from above:

output = BSS.Node.run("minimise", input)

Finally, let's print the output dictionary to see the result of running the node:

print(output)
{'minimised': ['/home/lester/Code/BioSimSpaceTutorials/01_introduction/minimised.prm7', '/home/lester/Code/BioSimSpaceTutorials/01_introduction/minimised.rst7']}
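The input dictionary holds the same information as the command-line arguments used earlier. The sketch below makes that correspondence explicit by converting the dictionary into an argument list and, optionally, running the script as a subprocess. It is an illustration only: the helpers to_cli_args and run_node_script are hypothetical, and unlike BSS.Node.run they do not collect the node's outputs into a dictionary.

```python
import subprocess
import sys

def to_cli_args(inputs):
    """Convert a {name: value} dictionary into command-line arguments."""
    args = []
    for name, value in inputs.items():
        args.append(f"--{name}")
        if isinstance(value, (list, tuple)):
            args.extend(str(item) for item in value)
        else:
            args.append(str(value))
    return args

def run_node_script(script, inputs):
    """Run a node script with the current interpreter; raise on failure."""
    command = [sys.executable, script, *to_cli_args(inputs)]
    return subprocess.run(command, check=True)

print(to_cli_args({"files": ["inputs/ala.crd", "inputs/ala.top"], "steps": 1000}))
# ['--files', 'inputs/ala.crd', 'inputs/ala.top', '--steps', '1000']
```

For example, `run_node_script("nodes/minimise.py", {"files": ["inputs/ala.crd", "inputs/ala.top"], "steps": 1000})` should be equivalent to the second command-line form shown earlier.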

BioSimSpace nodes can also autogenerate their own Common Workflow Language (CWL) tool wrappers, allowing them to be plugged into any workflow engine that supports the standard. To generate a wrapper, simply pass the --export-cwl argument when running the node, e.g.:

python nodes/equilibrate.py --export-cwl

Let's examine the wrapper:

cat nodes/equilibrate.cwl
cwlVersion: v1.0
class: CommandLineTool
baseCommand: ["/home/lester/.conda/envs/biosimspace-dev/bin/python", "/home/lester/Code/BioSimSpaceTutorials/01_introduction/nodes/equilibrate.py", "--strict-file-naming"]

inputs:
  files:
    type:
      - type: array
        items: File
    inputBinding:
      prefix: --files
      separate: true

  runtime:
    type: string?
    default: 0.02 nanosecond
    inputBinding:
      prefix: --runtime
      separate: true

  temperature_start:
    type: string?
    default: 0.0 kelvin
    inputBinding:
      prefix: --temperature_start
      separate: true

  temperature_end:
    type: string?
    default: 300.0 kelvin
    inputBinding:
      prefix: --temperature_end
      separate: true

  restraint:
    type: string?
    default: none
    inputBinding:
      prefix: --restraint
      separate: true

  report_interval:
    type: int?
    default: 100
    inputBinding:
      prefix: --report_interval
      separate: true

  restart_interval:
    type: int?
    default: 500
    inputBinding:
      prefix: --restart_interval
      separate: true

  engine:
    type: string?
    default: auto
    inputBinding:
      prefix: --engine
      separate: true

outputs:
  equilibrated:
    type:
      type: array
      items: File
    outputBinding:
      glob: "equilibrated.*"
  trajectory:
    type: File
    outputBinding:
      glob: "trajectory.*"
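To run the wrapper with a CWL runner you need a job (input) file whose keys match the wrapper's input names. Array-of-File inputs such as files take objects of the form {"class": "File", "path": ...}, and optional inputs that are omitted fall back to the default values declared in the wrapper. Here is a minimal sketch that builds such a job as JSON (valid YAML, so CWL runners accept it); the file names and the restraint value are illustrative, taken from elsewhere in this section.

```python
import json

def cwl_file_list(paths):
    """Wrap paths as CWL File objects for an array-of-File input."""
    return [{"class": "File", "path": path} for path in paths]

# Input names match the wrapper above; the values are illustrative.
job = {
    "files": cwl_file_list(["minimised.prm7", "minimised.rst7"]),
    "restraint": "heavy",
}

print(json.dumps(job, indent=2))
```

Assuming the reference runner cwltool is installed, saving this output as equilibrate-job.json and running `cwltool nodes/equilibrate.cwl equilibrate-job.json` should execute the node through the wrapper.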

As a simple example of chaining BioSimSpace nodes in a command-line workflow, consider the following script:

#!/usr/bin/env bash
# scripts/workflow.sh

# Exit immediately on error.
set -e

echo "Parameterising..."
python nodes/parameterise.py --pdb inputs/methanol.pdb --forcefield gaff

echo "Solvating..."
python nodes/solvate.py --files parameterised.* --water_model tip3p

echo "Minimising..."
python nodes/minimise.py --files solvated.* --steps 1000

echo "Equilibrating..."
python nodes/equilibrate.py --files minimised.* --restraint heavy

echo "Done!"

Starting from a PDB file, this script calls each of the nodes in sequence, passing the output of one as the input to the next. The output of the final node is a set of files representing the equilibrated molecular system, as well as a trajectory and PDB file that can be visualised with, e.g., the Visual Molecular Dynamics (VMD) program.

Let's run the workflow:

bash scripts/workflow.sh
Parameterising...
Solvating...
Minimising...
Equilibrating...
Done!
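The same chain can also be driven from Python. The sketch below mirrors the bash script, with one difference to note: because subprocess does not go through a shell, the patterns such as minimised.* have to be expanded with glob, and this must happen just before each stage runs, after the previous stage has written its files. Passing check=True stops the chain on the first failing node, much like set -e. The names STAGES, expand, build_command and run_workflow are hypothetical.

```python
import glob
import subprocess
import sys

def expand(pattern):
    """Mimic the shell's glob expansion, sorted for a stable order."""
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise FileNotFoundError(f"no files match {pattern!r}")
    return matches

# Each stage: (message, node script, extra arguments, input-file glob or None).
STAGES = [
    ("Parameterising...", "nodes/parameterise.py",
     ["--pdb", "inputs/methanol.pdb", "--forcefield", "gaff"], None),
    ("Solvating...", "nodes/solvate.py", ["--water_model", "tip3p"], "parameterised.*"),
    ("Minimising...", "nodes/minimise.py", ["--steps", "1000"], "solvated.*"),
    ("Equilibrating...", "nodes/equilibrate.py", ["--restraint", "heavy"], "minimised.*"),
]

def build_command(script, args, pattern):
    """Build the argument list for one stage, expanding its input glob."""
    command = [sys.executable, script]
    if pattern is not None:
        command += ["--files", *expand(pattern)]
    return command + args

def run_workflow():
    """Run every stage in order, stopping at the first failure."""
    for message, script, args, pattern in STAGES:
        print(message)
        subprocess.run(build_command(script, args, pattern), check=True)
    print("Done!")
```

Calling run_workflow() from the tutorial directory should print the same progress messages as the bash version.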