Skip to content

Commit 8f4d391

Browse files
committed
Generate multiple structures and score them, bugfixes, docs updates
1 parent d74c128 commit 8f4d391

13 files changed

Lines changed: 428 additions & 56 deletions

README.md

Lines changed: 54 additions & 19 deletions
Original file line numberDiff line numberDiff line change
@@ -1,46 +1,81 @@
1+
12
# prepmd
23
[![prepmd CI](https://github.com/CCPBioSim/mdprep/actions/workflows/python-app.yml/badge.svg)](https://github.com/CCPBioSim/mdprep/actions/workflows/python-app.yml)
34

4-
A utility to automatically prepare structures from the PDB for molecular dynamics simulation.
5+
A utility to automatically prepare structures from the PDB for molecular dynamics simulation and perform minimisations and simple MD simulations.
56

67
## Features
78
* [X] Automatically download structures, sequences and metadata from the PDB and UNIPROT
89
* [X] Automatically fill missing loops with modeller
910
* [X] Automatically add missing atoms and fix non-standard residues with pdbfixer
10-
* [ ] Automatically propagate metadata through to finalised structure files
1111
* [X] Automatically resolve steric clashes and minimise structures
1212
* [X] Automatically trim together structures to be the same length
1313
* [X] Run simple MD simulations for testing, validation and minimisation
1414
* [X] Create 'morph' trajectories with metadynamics
15+
* [ ] Automatically propagate metadata through to finalised structure files
1516
* [ ] AIIDA integration
1617

1718
## Installation
18-
* Install [Conda](https://conda-forge.org/download/) (if you don't already have it)
19-
* Clone this repo and enter the folder: `git clone https://github.com/CCPBioSim/mdprep.git && cd prepmd`
19+
* Install [Conda](https://github.com/conda-forge/miniforge?tab=readme-ov-file#install) (if you don't already have it)
20+
* Clone this repo and enter the folder: `git clone https://github.com/CCPBioSim/prepmd.git && cd prepmd`
2021
* Run `conda env create --name prepmd --file environment.yaml && conda activate prepmd && pip install .`
21-
* For the modeller part of the workflow to work, you need to get a [modeller license key](https://salilab.org/modeller/registration.html) and add it to modeller's config.py file. If you use conda, the key will be in `envs/prep/lib/modeller-10.7/modlib/modeller/config.py` relative to the path where conda is installed.
22+
* For the MODELLER part of the workflow to work, you need to get a [modeller license key](https://salilab.org/modeller/registration.html) and add it to modeller's config.py file. If you use conda, the key will be in `envs/prep/lib/modeller-10.7/modlib/modeller/config.py` relative to the path where conda is installed.
2223
* After installing, run `pytest` to run tests.
2324

2425
## Preparing structures from the PDB for simulation
25-
* A basic example: `prepmd 6xov 6xov_processed.pdb` will download the structure for PDB entry `6xov`, process it and write it to `6xov_processed.pdb`.
26-
* If you already have a pdb file, you can instead run: `prepmd --structure 6xov_input.pdb 6xov 6xov_processed.pdb`. You still need to supply a PDB code, as the various file formats used by prepmd require one to be present.
27-
* `prepmd` will attempt to guess the correct file formats from the filenames it's given. It won't perform implicit conversions, so make sure to start and end with the same file type.
26+
27+
### Basic example:
28+
`prepmd 6xov 6xov_processed.pdb` will download the structure for PDB entry `6xov`, process it and write it to `6xov_processed.pdb`.
29+
### Using a local structure file:
30+
`prepmd --structure 6xov_input.pdb 6xov 6xov_processed.pdb`. You still need to supply a PDB code, as the various file formats used by prepmd require one to be present.
31+
### Generate multiple structure files:
32+
`prepmd 6xov 6xov_processed.pdb -n 5` will generate 5 candidate structures and select the best one as determined by MODELLER's internal metrics. Alternatively, `prepmd 6xov 6xov_processed.pdb -n 5 -em 22281 --contour 0.01` will download EMD-22281, the EMDB entry associated with 6XOV, and score the generated models based on their agreement with the EM density map.
33+
### Use refined structures from PDB-REDO:
34+
`prepmd 1cbs 1cbs_processed.pdb --redo` will download a refined structure from PDB-REDO, if it is available. Note: not all PDB entries have corresponding PDB-REDO entries.
35+
### Use your own alignments and sequences to fill missing loops:
36+
By default, `prepmd` will read missing residues from the pdb/mmcif metadata, attempt to align the missing residues with the currently present residues, and then build missing loops. You can manually provide a FASTA file containing the alignment data with `--fasta`. You can also ask prepmd to get the sequence data from UNIPROT instead, with `--download`, though this is not recommended, as the raw sequence data can be different from the PDB and cause the alignment to fail.
37+
### Other usage notes
38+
* `prepmd` will attempt to guess the correct file format from the filenames it's given. It won't perform implicit conversions, so make sure to start and end with the same file type.
2839
* By default, `prepmd` will leave intermediate files in a randomly-named temporary directory. You can set the name of this directory: `prepmd --wdir 6xov_temp 6xov 6xov.cif`.
29-
* By default, `prepmd` will read missing residues from the pdb/mmcif metadata, attempt to align the missing residues with the currently present residues, and then build missing loops. You can manually provide a FASTA file containing the alignment data with `--fasta`. You can also ask mdprep to get the sequence data from UNIPROT instead, with `--download`, though this is not recommended, as the raw sequence data can be very different from the PDB and cause the alignment to fail.
30-
* Note: while both pdb and mmCif are supported, using the mmCif format is strongly recommended, as the pdb format has been deprecated since 2024.
40+
* While both pdb and mmCif are supported, using the mmCif format is strongly recommended, as the pdb format has been deprecated since 2024.
3141
* Use `prepmd --help` for a full list of parameters.
3242

3343
## Running MD simulations
34-
* `runmd` can run MD simulations using OpenMM.
35-
* A basic example: `runmd structure.cif --min_out structure_minimised.cif --traj_out traj.xtc --md_steps 5000 --step 100` will minimise and run a simulation of structure.cif, writing a trajectory to `traj_out.xtc`, for 5000 steps, saving one trajectory frame every 100 steps.
36-
* If you already have a minimised structure, you can skip minimisation: `runmd structure.cif --traj_out traj.xtc --md_steps 5000 --step 100 -nomin -notest`
37-
* Solvate the simulation box: `runmd structure.cif -o structure_minimised.cif --traj_out traj.xtc --md_steps 500 --step 10 -solv tip4pew`. tip3p, tip4pew and spce are supported. You can also add pressure coupling with `--pressure 1.0` (for 1 bar)
38-
* Run with different force fields: `runmd structure.cif -o structure_minimised.cif --traj_out traj.xtc --md_steps 500 --step 50 -ff amber14` runs with amber14. AMOEBA is also available, and amber19 is available if you have a recent version of OpenMM.
39-
* Fix the backbone in place and just equilibrate the side chains: `runmd structure.cif -o structure_minimised.cif --fix_backbone -solv tip4pew --notest`
40-
* Use metadynamics to create a (non-physical!) guided md morph trajectory between two structures: `runmd pre.cif -m post.cif -o minimised_out.pdb`
41-
* Note: if you have two files for the same structure which aren't aligned (e.g. they have slightly different starting/ending residues), you can trim the ends to align them: `aligntogether pre.cif post.cif pre_cropped.cif post_cropped.cif`
44+
`runmd` can run MD simulations using OpenMM.
45+
### A Basic Example
46+
`runmd structure.cif --min_out structure_minimised.cif --traj_out traj.xtc --md_steps 5000 --step 100` will minimise and run a simulation of structure.cif using OpenMM, writing a trajectory to `traj_out.xtc`, for 5000 steps, saving one trajectory frame every 100 steps.
47+
If you already have a minimised structure, you can skip minimisation: `runmd structure.cif --traj_out traj.xtc --md_steps 5000 --step 100 -nomin -notest`
48+
### Explicit solvent:
49+
`runmd structure.cif -o structure_minimised.cif --traj_out traj.xtc --md_steps 500 --step 10 -solv tip4pew` will run a simulation with the tip4pew solvent. tip3p, tip4pew and spce are supported. You can also add pressure coupling with `--pressure 1.0` (for 1 bar). By default, simulations run with an implicit solvent equivalent to AMBER's `igb=8` option.
50+
### Force Fields:
51+
`runmd structure.cif -o structure_minimised.cif --traj_out traj.xtc --md_steps 500 --step 50 -ff amber14` runs with amber14. charmm36, amoeba, amber14 and amber19 are available, with charmm36 being the default.
52+
### Equilibrate side chains:
53+
`runmd structure.cif -o structure_minimised.cif --fix_backbone -solv tip4pew --notest` will fix the backbone in place and only equilibrate side chains.
54+
### Create a morph trajectory:
55+
`runmd pre.cif -m post.cif -o minimised_out.pdb` will create a trajectory that smoothly transitions between pre.cif and post.cif. This trajectory is created using OpenMM's metadynamics features. Note: this should only be used for visualisation/illustration as trajectories created this way are arbitrary representations of structural transitions that aren't guaranteed to represent the underlying physics and biology.
56+
If you have two files for the same structure which aren't aligned (e.g. they have slightly different starting/ending residues), you can trim the ends to align them: `aligntogether pre.cif post.cif pre_cropped.cif post_cropped.cif`
57+
### Other usage notes:
58+
* Set the numerical integrator with the `-i` flag. This can be either `VariableLangevinIntegrator` or `LangevinMiddleIntegrator`. By default, `runmd` will attempt to use the latter, and fall back to the former if the simulation becomes numerically unstable.
59+
* The default settings result in a rather loose coupling to the heat bath. You can change this with the `-f` or `--friction` argument, which specified the friction coefficient coupling the system to the heat bath. Running a simulation with explicit solvent will also result in tighter coupling.
60+
* By default, `runmd` will try to select the most optimal nonbonded interaction method, but this can be overridden with `-nb` or `--nonbonded`, which can be one of `PME`, `CutoffPeriodic`, or `CutoffNonPeriodic`
61+
* By default, `runmd` will constrain the length of all bonds involving a hydrogen atom, which can allow for longer timesteps at the cost of some accuracy. This can be disabled by setting `-c None` or `--constraints None`. This setting is also disabled if the backbone is fixed.
4262
* Use `runmd --help` for a full list of parameters.
4363

44-
## License
64+
### What next?
65+
* Though you can run simple MD simulations with prepmd, for more in-depth MD we recommend using real MD software such as GROMACS, AMBER, NAMD or OpenMM.
66+
* If you're looking to generate an atomistic structure file that matches your EM map as closely as possible, you can use a flexible fitting tool such as [TEMPy-ReFF](https://gitlab.com/topf-lab/tempy-reff).
4567

68+
## Licence
4669
AGPLv3
70+
71+
## Contributors
72+
prepmd is developed by Rob Welch. Thanks to Harry Swift for helping set up the CI. This project is funded by [DRIIMB](https://driimb.org/). prepmd makes use of
73+
74+
## Dependencies
75+
* OpenMM
76+
* PDBFixer
77+
* BioPython
78+
* MODELLER
79+
* pdb2pqr
80+
* mrcfile
81+
* icp

environment.yaml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -12,4 +12,4 @@ dependencies:
1212
- modeller
1313
- biopython
1414
- pytest
15-
- mdanalysis
15+
- mrcfile

prepmd/__init__.py

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -8,4 +8,6 @@
88
from . import metadynamics
99
from . import align_together
1010
from . import add_modeller_license
11+
from . import point_cloud
12+
from . import lib
1113
__version__ = "1.0"

prepmd/download.py

Lines changed: 30 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,28 @@
99
import requests
1010

1111

12-
def get_structure(pdb_id, directory, file_format="mmCif"):
12+
def get_em_map(emdb_id, directory):
13+
"""
14+
Download a structure from the EMDB.
15+
Args:
16+
emdb_id: id of the em map to download, a string
17+
directory: directory to download the file into, a string
18+
returns:
19+
path to the downloaded file.
20+
"""
21+
emdb_id = str(emdb_id).replace("EMD-", "").replace("emd-", "")
22+
url = "https://ftp.ebi.ac.uk/pub/databases/emdb/structures/EMD-"+str(emdb_id)+"/map/emd_"+str(emdb_id)+".map.gz"
23+
destination = directory+sep+str(emdb_id)+".map.gz"
24+
try:
25+
urllib.request.urlretrieve(url, destination)
26+
except urllib.error.HTTPError as e:
27+
if e.code == 404:
28+
msg = "EMDB entry with ID "+emdb_id+" not found."
29+
raise IOError(msg)
30+
return directory+sep+str(emdb_id)+".map.gz"
31+
32+
33+
def get_structure(pdb_id, directory, file_format="mmCif", redo=False):
1334
"""
1435
Download a structure from the PDB.
1536
Args:
@@ -25,10 +46,17 @@ def get_structure(pdb_id, directory, file_format="mmCif"):
2546
if file_format == "pdb":
2647
format_str = "pdb"
2748
try:
28-
url = "https://files.rcsb.org/download/"+pdb_id+"."+format_str
49+
if redo:
50+
url = "https://pdb-redo.eu/db/"+pdb_id+"/"+pdb_id+"_final"+"."+format_str
51+
print(url)
52+
else:
53+
url = "https://files.rcsb.org/download/"+pdb_id+"."+format_str
2954
destination = directory+sep+pdb_id+"."+format_str
3055
urllib.request.urlretrieve(url, destination)
3156
except urllib.error.HTTPError as e:
57+
if e.code == 404 and redo:
58+
msg = "PDB with ID "+pdb_id+" not found in PDB-REDO."
59+
raise IOError(msg)
3260
r = requests.get(url.replace(".pdb", ".cif"))
3361
if r.status_code == 200:
3462
msg = "No PDB for "+pdb_id + \

prepmd/model.py

Lines changed: 70 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -15,6 +15,7 @@
1515
import pathlib
1616
import shutil
1717
from prepmd import get_residues
18+
from prepmd import point_cloud
1819
import sys
1920

2021
placeholder = "sequence:::::::::"
@@ -96,7 +97,20 @@ def fasta(fasta_name, include_metadata=False):
9697
return out
9798

9899
def get_alignment_info(alignmentout):
100+
"""
101+
For a FASTA-formatted alignment file, get the number of residues filled,
102+
number of gaps, and largest gap.
103+
104+
Args:
105+
alignmentout: path to an alignment file, a string
99106
107+
Returns:
108+
total_resididues_filled - total number of residues to be added
109+
total_gaps_filled - number of gaps to be filled
110+
filled_residues - missing residues
111+
filled_gaps - gaps
112+
max_gap - largest gap
113+
"""
100114
def get_info(aln):
101115
total_missing = 0
102116
total_gaps = 0
@@ -131,23 +145,19 @@ def get_info(aln):
131145
return total_residues_filled, total_gaps_filled, filled_residues, filled_gaps, max_gap
132146

133147

134-
135-
136-
def get_best_pdb(directory, exts=["pdb", "cif", "mmcif", "mmCif"]):
148+
def get_objective_functions(pdbs):
137149
"""
138-
For a directory, find all of the pdbs generated by modeller and select
139-
the one with the highest objective function (arbitrary metric used by
140-
modeller).
150+
For a list of PDB or mmCif files generated by modeller, get the objective
151+
function (which measures the quality of the model) and similarity (of the
152+
model sequence and the sequence used to fill the missing loops) for each
153+
pdb.
154+
141155
Args:
142-
directory: a string, the directory to scan
143-
ext: file extensions to check for (a list of strings)
156+
pdbs: a list of strings, paths to pdb files
144157
Returns:
145-
path to the file with the highest objective function, a string
158+
scores, similarities, two dictionaries keyed by file path containing
159+
the scores and similarities.
146160
"""
147-
148-
pdbs = []
149-
for ext in exts:
150-
pdbs += list(pathlib.Path(directory).glob('*.'+ext))
151161
scores = {}
152162
similarities = {}
153163
for pdb in pdbs:
@@ -167,6 +177,34 @@ def get_best_pdb(directory, exts=["pdb", "cif", "mmcif", "mmCif"]):
167177
# mmcif
168178
if "_modeller.best_template_pct_seq_id" in line:
169179
similarities[pdb] = float(line.split()[-1])
180+
return scores, similarities
181+
182+
183+
def get_best_pdb(directory, exts=["pdb", "cif", "mmcif", "mmCif"],
184+
em_map=None, em_contour_level=None):
185+
"""
186+
For a directory, find all of the pdbs generated by modeller and select
187+
the one with the highest objective function (arbitrary metric used by
188+
modeller).
189+
Args:
190+
directory: a string, the directory to scan
191+
ext: file extensions to check for (a list of strings)
192+
em_map: path to an EM density map file (a string). If this is set,
193+
the best PDB will be picked based on similarity to the map.
194+
em_contour_level: contour level for the EM map, a float.
195+
Returns:
196+
path to the file with the highest objective function, a string
197+
"""
198+
199+
pdbs = []
200+
for ext in exts:
201+
pdbs += list(pathlib.Path(directory).glob('*.'+ext))
202+
scores, similarities = get_objective_functions(pdbs)
203+
if em_map:
204+
em_scores = {}
205+
for pdb in pdbs:
206+
em_scores[pdb] = point_cloud.score_pdb_map(pdb, em_map,
207+
em_contour_level)
170208
max_sim = max(similarities.values())
171209
if max_sim >= 97:
172210
print("Similiarity: "+str(max_sim)+"%.")
@@ -181,12 +219,19 @@ def get_best_pdb(directory, exts=["pdb", "cif", "mmcif", "mmCif"]):
181219
"loops. Please double-check your sequence data (by "
182220
"default these are the SEQRES records from the input "
183221
"structure.")
184-
return str(max(scores, key=scores.get))
222+
if em_map:
223+
em_err = min(em_scores.values())
224+
print("EM map alignemnt error: "+str(round(em_err, 2))+"A")
225+
if em_err > 15:
226+
raise ValueError("EM map and PDB are not the same structure.")
227+
return str(min(em_scores, key=em_scores.get))
228+
else:
229+
return str(max(scores, key=scores.get))
185230

186231

187232
# note: the pdb file isn't a parameter, it must be called code.pdb
188233
def fix_missing_residues(code, fastafile, alignmentout, inmodel, outmodel,
189-
wdir):
234+
wdir, num_models=1, em_map=None, em_contour=None):
190235
"""
191236
For a given structure, fill in missing loops using modeller.
192237
Args:
@@ -197,6 +242,7 @@ def fix_missing_residues(code, fastafile, alignmentout, inmodel, outmodel,
197242
this must be named according to the pdb id!
198243
outmodel: output structure file, a string
199244
wdir: working directory, a string, will be created if it doesn't exist
245+
num_models: how many models to generate, an int.
200246
Returns:
201247
nothing, but writes out outmodel and wdir.
202248
"""
@@ -265,6 +311,8 @@ def fix_missing_residues(code, fastafile, alignmentout, inmodel, outmodel,
265311

266312
env.io.atom_files_directory = pdb_dirs
267313
print("Modelling missing loops...")
314+
if num_models > 4:
315+
print("(Creating "+str(num_models)+" models - might be slow!)")
268316
old_stdout = sys.stdout
269317
f = open(os.devnull, 'w')
270318
sys.stdout = f
@@ -275,11 +323,17 @@ def fix_missing_residues(code, fastafile, alignmentout, inmodel, outmodel,
275323
if ".mmCif" in inmodel or ".cif" in inmodel or "Cif" in inmodel:
276324
a.set_output_model_format("MMCIF")
277325

326+
if num_models > 1:
327+
a.starting_model= 1
328+
a.ending_model = num_models
329+
278330
a.make()
279331

280332
sys.stdout = old_stdout
281333

282-
best_pdb = get_best_pdb(wdir)
334+
if em_map:
335+
print("Ranking PDBs by similarity to EM map...")
336+
best_pdb = get_best_pdb(wdir, em_map=em_map, em_contour_level=em_contour)
283337

284338
print("Finished modelling missing loops.")
285339
residues, gaps, remain_res, remain_gaps, max_gap = get_alignment_info(alignmentout)

prepmd/pdb2pqr.py

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,18 @@
1+
#!/usr/bin/env python3
2+
# -*- coding: utf-8 -*-
3+
"""
4+
Created on Mon Feb 9 14:04:25 2026
5+
6+
@author: rob
7+
"""
8+
9+
#import pdb2pqr
10+
11+
# steps:
12+
# run as normal but not including test simulation and fixing
13+
# apply pdb2pqr before fixing
14+
# then fix, and especially add missing atoms AND remove hetatms
15+
# then run as normal
16+
17+
#pdb2pqr.run_pdb2pqr("A")
18+
# example usage: UBQ.pdb 1UBQ.pqr --titration-state-method=propka --with-ph=7 --ff=CHARMM --ffout=CHARMM

0 commit comments

Comments
 (0)