
Commit 5836c87

Tidying up and expanding docs
1 parent 1f55a2e commit 5836c87

4 files changed

Lines changed: 243 additions & 397 deletions

File tree

README.md

Lines changed: 93 additions & 48 deletions
@@ -5,76 +5,118 @@
55
A utility to automatically prepare structures from the PDB for molecular dynamics simulation and perform minimisations and simple MD simulations.
66

77
## Features
8-
* [X] Automatically download structures, sequences and metadata from the PDB, PDB-REDO, EMDB and UNIPROT
9-
* [X] Automatically fill missing loops with modeller
10-
* [X] Automatically add missing atoms and fix non-standard residues with pdbfixer
11-
* [X] Automatically resolve steric clashes and minimise structures
12-
* [X] Automatically trim together structures to be the same length
13-
* [X] Run simple MD simulations for testing, validation and minimisation
14-
* [X] Create 'morph' trajectories with metadynamics
15-
* [X] Automatically extract and fix hetatms\ligands
16-
* [X] Output PQR files
17-
* [ ] Automatically propagate metadata through to finalised structure files
18-
* [ ] AIIDA integration
8+
* Automatically download structures, sequences and metadata from the PDB, PDB-REDO, EMDB and UNIPROT
9+
* Automatically fill missing loops with MODELLER
10+
* Automatically add missing atoms and fix non-standard residues with pdbfixer
11+
* Automatically resolve steric clashes and minimise structures
12+
* Automatically align and trim together structures to be the same length
13+
* Automatically extract and prepare HETATMs/ligands for simulation
14+
* Easily run simple MD simulations for testing, validation and minimisation
15+
* Create 'morph' trajectories with metadynamics
16+
* Coming soon: integration with other MD/EM workflows!
1917

2018
## Installation
19+
20+
### Install via conda
21+
* Install [Conda](https://github.com/conda-forge/miniforge?tab=readme-ov-file#install) (if you don't already have it)
22+
* Recommended: create a new conda environment: `conda create --name prepmd && conda activate prepmd`
23+
* Install prepmd from the CCPBioSim conda channel: `conda install -c conda-forge -c prepmd`
24+
* Add your [modeller license key](https://salilab.org/modeller/registration.html) by running `prep-license <your license key>`
25+
26+
### Manual install
2127
* Install [Conda](https://github.com/conda-forge/miniforge?tab=readme-ov-file#install) (if you don't already have it)
2228
* Clone this repo and enter the folder: `git clone https://github.com/CCPBioSim/prepmd.git && cd prepmd`
2329
* Run `conda env create --name prepmd --file environment.yaml && conda activate prepmd && pip install .`
24-
* For the MODELLER part of the workflow to work, you need to get a [modeller license key](https://salilab.org/modeller/registration.html) and add it to modeller's config.py file. If you use conda, the key will be in `envs/prep/lib/modeller-10.7/modlib/modeller/config.py` relative to the path where conda is installed.
30+
* For the MODELLER part of the workflow to work, you need a [modeller license key](https://salilab.org/modeller/registration.html), which must be added to MODELLER's `config.py` file. If you use conda, this file will be at `envs/prep/lib/modeller-10.7/modlib/modeller/config.py` relative to the path where conda is installed.
2531
* After installing, run `pytest` to run tests.
2632

27-
## Preparing structures from the PDB for simulation
33+
## Quickstart
34+
35+
`prepmd 6xov 6xov_processed.cif` will download the structure for PDB entry `6xov`, process it and write it to `6xov_processed.cif`. If you have a local structure file, you can use the `--structure` parameter, though you'll still need to supply a PDB code (it doesn't matter what the code is, but some of the file formats used by prepmd require one to be present).
36+
Note: .pdb support is provided for legacy compatibility, but using the mmCif format is strongly recommended, as the pdb format is deprecated.
37+
`runmd 6xov_processed.cif --traj_out traj.xtc --md_steps 5000` will minimise and run a simulation of `6xov_processed.cif` using OpenMM for 5000 steps, writing a trajectory to `traj.xtc`. By default, `runmd` uses a minimal set of simulation parameters, which aren't likely to be accurate. Check the `runmd` section of this documentation for more options.
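
If you'd rather drive the two quickstart commands from a script, a minimal sketch might look like the following. It only uses the command-line arguments shown above; the helper names `quickstart_commands` and `run_quickstart` are hypothetical and not part of prepmd. By default it prints the commands without running them.

```python
import shlex
import shutil
import subprocess


def quickstart_commands(pdb_code, md_steps=5000):
    """Build the two quickstart command lines as argument lists."""
    processed = f"{pdb_code}_processed.cif"
    prep_cmd = ["prepmd", pdb_code, processed]
    run_cmd = ["runmd", processed, "--traj_out", "traj.xtc",
               "--md_steps", str(md_steps)]
    return [prep_cmd, run_cmd]


def run_quickstart(pdb_code, md_steps=5000, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in quickstart_commands(pdb_code, md_steps):
        print(shlex.join(cmd))
        if not dry_run:
            if shutil.which(cmd[0]) is None:
                raise FileNotFoundError(f"{cmd[0]} is not on PATH")
            # check=True stops the script at the first failing command
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_quickstart("6xov")
```

Call `run_quickstart("6xov", dry_run=False)` once prepmd is installed and the environment is active.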
38+
39+
## Preparing structure files for simulation with prepmd
40+
41+
### prepmd workflow
42+
Steps in the `prepmd` workflow:
43+
* The structure file(s) are downloaded (if not supplied) into a working directory. PDB and mmCif are supported, though mmCif is recommended. `prepmd` automatically infers the file format from the file extension of the input/output files.
44+
* `prepmd` extracts the sequence from the residues in the structure file directly and compares it to a reference sequence. By default this is the sequence described in the SEQRES entries of the structure file. The two sequences are aligned and [MODELLER](https://salilab.org/modeller/) is used to fill in the missing residues.
45+
* Optionally, multiple models can be created, and scored based on MODELLER's internal metrics or their similarity to a reference EM density map.
46+
* HETATMs are extracted from the structure file and saved to .sdf files. [rdkit](https://www.rdkit.org/) is used to add hydrogens and correct the geometry of the ligands.
47+
* PDBFixer is used to add missing hydrogens and remove nonstandard residues.
48+
* Optionally, at this point, a PQR file can be output using PDB2PQR.
49+
* Finally, OpenMM is used to perform a test minimisation and simulation. This step ensures that the resulting file is ready for simulation and that there are no steric clashes. If the minimisation or test simulation fails, it will be retried with OpenMM's variable langevin integrator. In testing, this has successfully minimised structure files with high clash scores.
50+
* The final, minimised structure file will be written out. Note: if ligands are present, the non-minimised structure will be written instead. This allows the user to choose which ligand files to include in their final structure, which can then be minimised using `runmd`.
51+
52+
### prepmd command-line reference
53+
* Use `prepmd --help` for a full list of parameters.
2854

29-
### Basic example:
30-
`prepmd 6xov 6xov_processed.pdb` will download the structure for PDB entry `6xov`, process it and write it to `6xov_processed.pdb`.
31-
### Using a local structure file:
32-
`prepmd --structure 6xov_input.pdb 6xov 6xov_processed.pdb`. You still need to supply a PDB code, as the various file formats used by prepmd require one to be present.
33-
### Generate multiple structure files:
34-
`prepmd 6xov 6xov_processed.pdb -n 5` will generate 5 candidate structures and select the best one as determined by MODELLER's internal metrics. Alternatively, `prepmd 6xov 6xov_processed.pdb -n 5 -em 22281 --contour 0.01` will download EMD-22281, the EMDB entry associated with 6XOV, and score the generated models based on their agreement with the EM density map.
35-
### Use refined structures from PDB-REDO:
55+
### Worked examples
56+
57+
#### Using a local structure file
58+
`prepmd --structure 6xov_input.pdb 6xov 6xov_processed.pdb`. You still need to supply a PDB code, as some of the file formats used by prepmd require one to be present. The code doesn't have to be a 'real' PDB code, e.g. 'AAAA' will work fine. When using this setting, the input and output files must be in the same format - prepmd doesn't perform implicit conversions!
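
To catch a format mismatch before starting a long run, you could check the file extensions yourself. The sketch below is an illustration of the rule described above, not prepmd's own code; the extension table and the function names are assumptions.

```python
from pathlib import Path

# Extension-to-format table. The exact set of extensions that prepmd
# recognises is an assumption made for this example.
FORMATS = {".pdb": "pdb", ".cif": "mmcif", ".mmcif": "mmcif"}


def infer_format(filename):
    """Guess the structure file format from the file extension."""
    suffix = Path(filename).suffix.lower()
    if suffix not in FORMATS:
        raise ValueError(f"unrecognised structure file extension: {filename}")
    return FORMATS[suffix]


def check_same_format(structure_in, structure_out):
    """Return the shared format, or raise if a conversion would be needed."""
    fmt_in = infer_format(structure_in)
    fmt_out = infer_format(structure_out)
    if fmt_in != fmt_out:
        raise ValueError(
            f"input is {fmt_in} but output is {fmt_out}; "
            "use the same format for both files"
        )
    return fmt_in


if __name__ == "__main__":
    print(check_same_format("6xov_input.pdb", "6xov_processed.pdb"))
```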
59+
#### Generate multiple structure files
60+
`prepmd 6xov 6xov_processed.pdb -n 5` will generate 5 candidate structures and select the best one as determined by MODELLER's internal metrics. Alternatively, `prepmd 6xov 6xov_processed.pdb -n 5 -em 22281 --contour 0.01` will download EMD-22281, the EMDB entry associated with 6XOV, and score the generated models based on their agreement with the EM density map (using the iterative closest point algorithm). The -em setting can also point to a map file.
61+
#### Use refined structures from PDB-REDO
3662
`prepmd 1cbs 1cbs_processed.pdb --redo` will download a refined structure from PDB-REDO, if it is available. Note: not all PDB entries have corresponding PDB-REDO entries.
37-
### Use your own alignments and sequences to fill missing loops:
38-
By default, `prepmd` will read missing residues from the pdb/mmcif metadata, attempt to align the missing residues with the currently present residues, and then build missing loops. You can manually provide a FASTA file containing the alignment data with `--fasta`. You can also ask prepmd to get the sequence data from UNIPROT instead, with `--download`, though this is not recommended, as the raw sequence data can be different from the PDB and cause the alignment to fail.
39-
### Other usage notes
40-
* `prepmd` will attempt to guess the correct file format from the filenames it's given. It won't perform implicit conversions, so make sure to start and end with the same file type.
41-
* By default, `prepmd` removes ligands and other molecules from the input and saves each residue to a separate SDF file. You can disable this behaviour with the `--ignore_hettams` flag.
63+
#### Use your own alignments and sequences to fill missing loops
64+
By default, `prepmd` will read missing residues from the pdb/mmcif SEQRES records, attempt to align the missing residues with the currently present residues, and then build missing loops with MODELLER. You can manually provide an aligned FASTA file containing the complete and incomplete sequences with `--fasta`. You can also ask prepmd to get the sequence data from UNIPROT instead, with `--download`, though this is not recommended, as the raw sequence data can be substantially different from the PDB and cause the alignment to fail.
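
To see which loops an alignment implies before handing it to `--fasta`, a small helper can list the gapped regions. This is a sketch under stated assumptions: both sequences are already aligned to the same length, `-` marks a gap, and the function name `missing_segments` is hypothetical. It is not how prepmd or MODELLER process the alignment internally.

```python
def missing_segments(complete, incomplete, gap="-"):
    """List the residue ranges that are absent from `incomplete`.

    Both arguments are aligned sequences of equal length. Each returned
    (start, end) pair is 1-based and inclusive, numbered along the
    complete sequence.
    """
    if len(complete) != len(incomplete):
        raise ValueError("aligned sequences must have equal length")
    segments = []
    start = None
    pos = 0  # 1-based residue number in the complete sequence
    for ref, obs in zip(complete, incomplete):
        if ref == gap:
            continue  # gap in the reference: no residue to number here
        pos += 1
        if obs == gap:
            if start is None:
                start = pos  # a missing stretch begins
        elif start is not None:
            segments.append((start, pos - 1))  # the stretch ended one back
            start = None
    if start is not None:
        segments.append((start, pos))  # stretch runs to the C-terminus
    return segments


if __name__ == "__main__":
    print(missing_segments("MKTAYIAKQR", "MKT---AKQR"))
```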
65+
#### Handling ligands
66+
* By default, `prepmd` removes ligands and other molecules from the input and saves each HETATM residue to its own SDF file. You can disable this behaviour with the `--ignore_hetatams` flag. The co-ordinates inside the SDF files correspond to the co-ordinates of the ligands in the structure file, so the ligands can be added back into the original structure easily. `prepmd` uses [rdkit](https://www.rdkit.org/) to add hydrogens and correct the geometry of small molecules.
67+
#### Working directory
4268
* By default, `prepmd` will leave intermediate files in a randomly-named temporary directory. You can set the name of this directory: `prepmd --wdir 6xov_temp 6xov 6xov.cif`.
43-
* While both pdb and mmCif are supported, using the mmCif format is strongly recommended, as the pdb format has been deprecated since 2024.
44-
* Use `prepmd --help` for a full list of parameters.
4569

46-
## Running MD simulations
47-
`runmd` can run MD simulations using OpenMM.
48-
### A Basic Example
49-
`runmd structure.cif --min_out structure_minimised.cif --traj_out traj.xtc --md_steps 5000 --step 100` will minimise and run a simulation of structure.cif using OpenMM, writing a trajectory to `traj_out.xtc`, for 5000 steps, saving one trajectory frame every 100 steps.
50-
If you already have a minimised structure, you can skip minimisation: `runmd structure.cif --traj_out traj.xtc --md_steps 5000 --step 100 -nomin -notest`
51-
### Explicit solvent:
52-
`runmd structure.cif -o structure_minimised.cif --traj_out traj.xtc --md_steps 500 --step 10 -solv tip4pew` will run a simulation with the tip4pew solvent. tip3p, tip4pew and spce are supported. You can also add pressure coupling with `--pressure 1.0` (for 1 bar). By default, simulations run with an implicit solvent equivalent to AMBER's `igb=8` option.
53-
### Force Fields:
54-
`runmd structure.cif -o structure_minimised.cif --traj_out traj.xtc --md_steps 500 --step 50 -ff amber14` runs with amber14. charmm36, amoeba, amber14 and amber19 are available, with charmm36 being the default.
55-
### Equilibrate side chains:
70+
## Running MD simulations with runmd
71+
Steps in the `runmd` workflow:
72+
* Validate user input - runmd will attempt to infer the best parameters and halt if incompatible/impossible settings are used.
73+
* Create an OpenMM system object. If small molecules are present, `runmd` will also load the OpenFF Sage small molecule force field.
74+
* If there is explicit solvent, set up the simulation box and solvate the system.
75+
* If the run is a metadynamics run, set up bias variables and forces using [openmmtools](https://github.com/choderalab/openmmtools).
76+
* Attempt to minimise and run the simulation with OpenMM. If the run/minimisation crashes, the numerical integrator will automatically be switched to the variable langevin integrator and the simulation will be restarted.
77+
* If the run is a metadynamics run, and the metadynamics collective variables aren't minimised, the simulation will restart.
78+
79+
## runmd command-line reference
80+
* Use `runmd --help` for a full list of parameters.
81+
82+
### Output files
83+
`runmd 6xov_processed.cif -o structure_minimised.cif --traj_out traj.xtc --md_steps 5000` writes out a trajectory file and a structure file (mmcif or pdb) containing the minimised system. If the system has been solvated, this structure file also contains the solvent molecules. The trajectory can be written in DCD or XTC format, which is detected from the filename. The xtc format results in smaller files but with less precision.
84+
### Explicit solvent
85+
`runmd structure.cif -o structure_minimised.cif --traj_out traj.xtc --md_steps 500 --step 10 -solv tip4pew` will run a simulation with the tip4pew solvent. tip3p, tip4pew and spce are supported. By default, simulations run with an implicit solvent equivalent to AMBER's `igb=8` option.
86+
### Temperature and pressure
87+
The default settings result in a rather loose coupling to the heat bath. You can change this with the `-f` or `--friction` argument, which specifies the friction coefficient coupling the system to the heat bath. Running a simulation with explicit solvent will result in tighter coupling. You can also add pressure coupling with `--pressure 1.0` (for 1 bar).
88+
### Change force field
89+
`runmd structure.cif -o structure_minimised.cif --traj_out traj.xtc --md_steps 500 --step 50 -ff amber14` runs with amber14. charmm36, amoeba, amber14 and amber19 are available, with charmm36 being the default. The force field is one of the most important MD parameters, and the best force field to use is normally system-dependent.
90+
### Equilibrate only side chains
5691
`runmd structure.cif -o structure_minimised.cif --fix_backbone -solv tip4pew --notest` will fix the backbone in place and only equilibrate side chains.
57-
### Add ligands:
58-
`runmd structure.cif -l LIG.sdf -ff amber14` runs a simulation with a ligand. You can add multiple ligands by using the `-l` argument multiple times. Ligands are simulated using OpenFF. OpenFF has limited compatibility with force fields and solvent models, so ligand simulations only run with the amber14 force field and explicit solvent. By default, ligand simulations also run with a smaller timestep.
59-
### Create a morph trajectory:
60-
`runmd pre.cif -m post.cif -o minimised_out.pdb` will create a trajectory that smoothly transitions between pre.cif and post.cif. This trajectory is created using OpenMM's metadynamics features. Note: this should only be used for visualisation/illustration as trajectories created this way are arbitrary representations of structural transitions that aren't guaranteed to represent the underlying physics and biology.
92+
### Add ligands
93+
`runmd structure.cif -l LIG.sdf -ff amber14` runs a simulation with a ligand. You can add multiple ligands by using the `-l` argument multiple times. `runmd` supports small molecules using openff's Sage force field, which has limited compatibility with other force fields and solvent models, so ligand simulations only run with the amber14 force field and explicit solvent. By default, ligand simulations also run with a smaller timestep.
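
The ligand restrictions above can be checked up front in your own scripts. The sketch below encodes only the rules stated in this section (amber14 plus one of the supported explicit solvents); the function name and the use of `None` for implicit solvent are assumptions, and `runmd` performs its own validation independently of this.

```python
# Explicit solvent models listed in this README.
EXPLICIT_SOLVENTS = {"tip3p", "tip4pew", "spce"}


def check_ligand_settings(ligands, forcefield, solvent):
    """Return a list of problems; an empty list means the settings are compatible.

    `solvent` is None for implicit solvent (an assumption of this sketch).
    """
    problems = []
    if not ligands:
        return problems  # no ligands, so no extra restrictions apply
    if forcefield != "amber14":
        problems.append(f"ligands need the amber14 force field, got {forcefield}")
    if solvent not in EXPLICIT_SOLVENTS:
        problems.append("ligands need explicit solvent (tip3p, tip4pew or spce)")
    return problems


if __name__ == "__main__":
    for problem in check_ligand_settings(["LIG.sdf"], "charmm36", None):
        print(problem)
```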
94+
### Create a morph trajectory
95+
`runmd pre.cif -m post.cif -o minimised_out.pdb` will create a trajectory that smoothly transitions between the structures in pre.cif and post.cif. This trajectory is created using openmmtools' metadynamics features. The metadynamics run applies arbitrary biasing forces to perform the transition, so this should only be used for visualisation/illustration, and may not represent the underlying physics and biology.
6196
If you have two files for the same structure which aren't aligned (e.g. they have slightly different starting/ending residues), you can trim the ends to align them: `aligntogether pre.cif post.cif pre_cropped.cif post_cropped.cif`
62-
### Other usage notes:
63-
* Set the numerical integrator with the `-i` flag. This can be either `VariableLangevinIntegrator` or `LangevinMiddleIntegrator`. By default, `runmd` will attempt to use the latter, and fall back to the former if the simulation becomes numerically unstable.
64-
* The default settings result in a rather loose coupling to the heat bath. You can change this with the `-f` or `--friction` argument, which specified the friction coefficient coupling the system to the heat bath. Running a simulation with explicit solvent will also result in tighter coupling.
97+
### Numerical integrators
98+
* Set the numerical integrator with the `-i` flag. This can be either `VariableLangevinIntegrator` or `LangevinMiddleIntegrator`. By default, `runmd` will attempt to use the latter, and fall back to the former if the simulation becomes numerically unstable. The parameter `--minimise-err` sets the error tolerance of the variable Langevin integrator. Its value is arbitrary: 0.001 is a good starting point, and increasing it will make the simulation run faster at the expense of accuracy.
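
The fallback behaviour described above follows a simple try-then-retry pattern, which can be sketched in plain Python. This is an illustration only, not `runmd`'s implementation: `simulate` stands in for whatever function builds and runs the simulation, and `unstable_then_ok` is a fake used to demonstrate the control flow without OpenMM.

```python
def run_with_fallback(simulate,
                      integrators=("LangevinMiddleIntegrator",
                                   "VariableLangevinIntegrator")):
    """Try each integrator in order and return (name, result) for the first success."""
    errors = []
    for name in integrators:
        try:
            return name, simulate(name)
        except Exception as exc:  # e.g. a NaN coordinate during the run
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all integrators failed: " + "; ".join(errors))


def unstable_then_ok(integrator_name):
    """Fake simulation: fails with the fixed-step integrator, succeeds otherwise."""
    if integrator_name == "LangevinMiddleIntegrator":
        raise ValueError("particle coordinate is NaN")
    return "finished"


if __name__ == "__main__":
    print(run_with_fallback(unstable_then_ok))
```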
99+
### Other settings
65100
* By default, `runmd` will try to select the best nonbonded interaction method, but this can be overridden with `-nb` or `--nonbonded`, which can be one of `PME`, `CutoffPeriodic`, or `CutoffNonPeriodic`.
66101
* By default, `runmd` will constrain the length of all bonds involving a hydrogen atom, which can allow for longer timesteps at the cost of some accuracy. This can be disabled by setting `-c None` or `--constraints None`. This setting is also disabled if the backbone is fixed.
67-
* Use `runmd --help` for a full list of parameters.
68102

69-
### What next?
70-
* Though you can run simple MD simulations with prepmd, for more in-depth MD we recommend using real MD software such as GROMACS, AMBER, NAMD or OpenMM.
103+
## What next?
104+
* Though you can run simple MD simulations, minimisations and validation with prepmd, for more in-depth MD we recommend using software such as GROMACS, AMBER, NAMD and OpenMM.
71105
* If you're looking to generate an atomistic structure file that matches your EM map as closely as possible, you can use a flexible fitting tool such as [TEMPy-ReFF](https://gitlab.com/topf-lab/tempy-reff).
72106

107+
## Python API
108+
prepmd is also accessible via a Python API:
109+
```
110+
from prepmd.prep import prep, run
111+
prep.prep("6xov", "6xov.cif", "working_dir")
112+
run.run("6xov.cif", traj_out="traj.xtc")
113+
```
114+
73115
## Licence
74116
AGPLv3
75117

76118
## Contributors
77-
prepmd is developed by Rob Welch. Thanks to Harry Swift for helping set up the CI. This project is funded by [DRIIMB](https://driimb.org/).
119+
`prepmd` is developed by Rob Welch. Thanks to Harry Swift for helping set up the CI. This project is funded by [DRI-IMB](https://driimb.org/). The repo is managed by CCPBioSim.
78120

79121
## Dependencies
80122
* OpenMM
@@ -85,3 +127,6 @@ prepmd is developed by Rob Welch. Thanks to Harry Swift for helping set up the C
85127
* mrcfile
86128
* icp
87129
* mdanalysis
130+
* openmmtools
131+
* openff-toolkit
132+
* rdkit
