Skip to content

Commit 1d3d352

Browse files
committed
Bug fixes, docs, simulation tools
1 parent f5618aa commit 1d3d352

7 files changed

Lines changed: 646 additions & 111 deletions

File tree

README.md

Lines changed: 24 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -4,14 +4,32 @@ A utility to automatically prepare structures from the PDB for molecular dynamic
44
## Features
55
* [X] Automatically download structures, sequences and metadata from the PDB and UNIPROT
66
* [X] Automatically fill missing loops with modeller
7-
* [X] Automatically add missing atoms and convert non-standard residues (using pdbfixer)
7+
* [X] Automatically add missing atoms and convert non-standard residues with pdbfixer
88
* [X] Automatically propagate metadata through to finalised structure files
9-
* [X] Automatically resolve steric clashes
10-
* [ ] Automatically dock ligands
9+
* [X] Automatically resolve steric clashes and minimise structures
10+
* [X] A simple command-line interface for running molecular dynamics with OpenMM
1111
* [ ] AIIDA integration
1212

1313
## Installation
14-
* Download this repo and enter the root folder
14+
* Install [Conda](https://conda-forge.org/download/) (if you don't already have it)
15+
* Clone this repo and enter the folder: `git clone git@github.com:HECBioSim/prepmd.git && cd prepmd`
1516
* Run `conda env create --name prepmd --file environment.yaml && conda activate prepmd && pip install .`
16-
* For the modeller part of the workflow to work, you need to get a [modeller license key](https://salilab.org/modeller/registration.html) and add it to modeller's config.py file. If you use conda, the key will be in envs/prepmd/lib/modeller-10.7/modlib/modeller/config.py relative to your conda install directory.
17-
* Run `prempmd` from the prepmd environment. A basic example: `prepmd 6xov 6xov_processed.pdb` will download the structure for PDB entry 6xov, process it and write it to `6xov_processed.pdb`.
17+
* For the modeller part of the workflow to work, you need to get a [modeller license key](https://salilab.org/modeller/registration.html) and add it to modeller's config.py file. If you use conda, the key will be in `envs/prepmd/lib/modeller-10.7/modlib/modeller/config.py` relative to the path where conda is installed.
18+
19+
## Preparing structures from the PDB for simulation
20+
* A basic example: `prepmd 6xov 6xov_processed.pdb` will download the structure for PDB entry `6xov`, process it and write it to `6xov_processed.pdb`.
21+
* If you already have a pdb file, you can instead run: `prepmd --structure 6xov_input.pdb 6xov 6xov_processed.pdb`. You still need to supply a PDB code, as the various file formats used by prepmd require one to be present.
22+
* `prepmd` will attempt to guess the correct file formats from the filenames it's given. It won't perform implicit conversions, so make sure to start and end with the same file type.
23+
* By default, `prepmd` will leave intermediate files in a randomly-named temporary directory. You can set the name of this directory: `prepmd --wdir 6xov_temp 6xov 6xov.cif`.
24+
* By default, `prepmd` will read missing residues from the pdb/mmcif metadata, attempt to align the missing residues with the currently present residues, and then build missing loops. You can manually provide a FASTA file containing the alignment data with `--fasta`. You can also ask mdprep to get the sequence data from UNIPROT instead, with `--download`, though this is not recommended, as the raw sequence data can be very different from the PDB and cause the alignment to fail.
25+
* Note: while both pdb and mmCif are supported, using the mmCif format is strongly recommended, as the pdb format has been deprecated since 2024.
26+
* Use `prepmd --help` for a full list of parameters.
27+
28+
## Running MD simulations
29+
* `runmd` can run MD simulations using OpenMM.
30+
* A basic example: `runmd structure.cif --min_out structure_minimised.cif --traj_out traj.xtc --md_steps 5000 --step 100` will minimise and run a simulation of structure.cif, writing a trajectory to `traj_out.xtc`, for 5000 steps, saving one trajectory frame every 100 steps.
31+
* If you already have a minimised structure, you can skip minimisation: `runmd structure.cif --traj_out traj.xtc --md_steps 5000 --step 100 -nomin -notest`
32+
* Solvate the simulation box: `runmd structure.cif -o structure_minimised.cif --traj_out traj.xtc --md_steps 500 --step 10 -solv tip4pew`. tip3p, tip4pew and spce are supported. You can also add pressure coupling with `--pressure 1.0` (for 1 bar)
33+
* Run with different force fields: `runmd structure.cif -o structure_minimised.cif --traj_out traj.xtc --md_steps 500 --step 50 -ff amber14` runs with amber14. AMOEBA is also available, and amber19 is available if you have a recent version of OpenMM.
34+
* Finally, you may wish to fix the backbone in place and just equilibrate the side chains: `runmd structure.cif -o structure_minimised.cif --fix_backbone -solv tip4pew --notest`
35+
* Use `runmd --help` for a full list of parameters.

prepmd/download.py

Lines changed: 14 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -20,13 +20,23 @@ def get_structure(pdb_id, directory, file_format="mmCif"):
2020
returns:
2121
path to the downloaded file.
2222
"""
23-
if file_format == "mmCif":
23+
24+
if file_format == "mmCif" or file_format == "cif":
2425
format_str = "cif"
2526
if file_format == "pdb":
2627
format_str = "pdb"
27-
urllib.request.urlretrieve("https://files.rcsb.org/download/" +
28-
pdb_id+"."+format_str, directory+sep+pdb_id +
29-
"."+format_str)
28+
try:
29+
url = "https://files.rcsb.org/download/"+pdb_id+"."+format_str
30+
destination = directory+sep+pdb_id+"."+format_str
31+
urllib.request.urlretrieve(url, destination)
32+
except urllib.error.HTTPError as e:
33+
r = requests.get(url.replace(".pdb", ".cif"))
34+
if r.status_code == 200:
35+
msg = "No PDB for "+pdb_id+" exists (but an mmcif structure does). "
36+
"Run with --fmt cif to use it."
37+
raise IOError(msg)
38+
else:
39+
raise e
3040
return directory+sep+pdb_id+"."+format_str
3141

3242

prepmd/model.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -187,7 +187,7 @@ def fix_missing_residues(code, fastafile, alignmentout, inmodel, outmodel,
187187
print("Trying to use the sequence data in the input file...")
188188

189189
if not fastafile:
190-
print("No fasta file. Getting missing residues from input file...")
190+
print("Getting missing residues from input file...")
191191
pdb_res = get_residues.get_residues_pdb(inmodel, code)
192192
pdb_fullseq = get_residues.get_fullseq_pdb(inmodel, code)
193193
with open(code+".seq", "w") as file:

prepmd/prep.py

Lines changed: 69 additions & 29 deletions
Original file line numberDiff line numberDiff line change
@@ -11,6 +11,7 @@
1111
import random
1212
import string
1313
import sys
14+
import shutil
1415

1516
from prepmd import download
1617
from prepmd import run
@@ -34,10 +35,10 @@
3435
parser.add_argument("-a", "--alignmentout",
3536
help="Alignment output file from sequences aligned for loop filling",
3637
default="alignment_out.fasta")
37-
parser.add_argument("-fmt", "--format",
38+
parser.add_argument("-fmt", "--dlformat",
3839
help="Structure format to download (only used when no structure file is "
3940
"provided)",
40-
default="mmCif")
41+
default=None)
4142
parser.add_argument(
4243
"-q", "--quiet", help="Do not print debug info", action="store_true")
4344
parser.add_argument("-e", "--fixstart",
@@ -62,17 +63,8 @@ def id_generator(size=6, chars=string.ascii_uppercase + string.digits):
6263
return ''.join(random.choice(chars) for _ in range(size))
6364

6465

65-
#def run_test(code, wdir):
66-
# params = {"code": code,
67-
# 'outmodel': "/home/rob/temp/missingresiduestestrun/"+code+"_ready.pdb",
68-
# "workingdir": "/home/rob/temp/missingresiduestestrun/"+code+"_"+id_generator(6)
69-
# }
70-
# prep(params["code"], params["outmodel"],
71-
# params["workingdir"], structure_format="pdb")
72-
73-
7466
def prep(code, outmodel, workingdir, folder=None, fastafile=None, inmodel=None,
75-
alignmentout="alignment_out.fasta", structure_format="mmCif",
67+
alignmentout="alignment_out.fasta", download_format="mmCif",
7668
quiet=False, fix_after=True, download_sequence=False,
7769
fix_missing_atoms=True):
7870
"""
@@ -89,7 +81,7 @@ def prep(code, outmodel, workingdir, folder=None, fastafile=None, inmodel=None,
8981
inmodel: path to input structure file, a string.
9082
alignmentout: output file from the alignment of the sequence from the
9183
structure file and the original sequence.
92-
structure_format: format of the structure file, a string. Can be
84+
download_format: format of the structure file, a string. Can be
9385
'mmCif' or 'pdb'.
9486
quiet: if true, don't print anything. boolean
9587
fix_after: if true, fix the pdb file after loop building. If false,
@@ -102,10 +94,36 @@ def prep(code, outmodel, workingdir, folder=None, fastafile=None, inmodel=None,
10294
Returns:
10395
nothing, but writes out a file to outmodel.
10496
"""
97+
98+
# infer download format from output format
99+
if not download_format:
100+
if (".pdb") in outmodel:
101+
download_format = "pdb"
102+
elif (".cif") in outmodel or ".mmcif" in outmodel or (
103+
".pdbx") in outmodel:
104+
download_format = "mmCif"
105+
106+
# check that input/output are specified in the same format
107+
# i'm not against converting the files but it shouldn't happen implicitly
108+
in_string = lambda substr, text : text == None or substr in text.lower()
109+
if in_string(".pdb", inmodel) and in_string(
110+
".pdb", outmodel) and in_string("pdb", download_format):
111+
pass
112+
elif in_string(".cif", inmodel) and in_string(
113+
".cif", outmodel) and in_string("cif", download_format):
114+
pass
115+
else:
116+
raise IOError("Inputs and outputs are in different formats! Please "
117+
"use only one format (ideally cif)" )
105118

106119
if not os.path.isdir(workingdir):
107120
pathlib.Path(workingdir).mkdir(parents=True, exist_ok=True)
108121

122+
if inmodel:
123+
# todo: copy input file to working directory
124+
shutil.copyfile(inmodel, workingdir+os.path.sep+inmodel)
125+
126+
run_dir = os.getcwd()
109127
os.chdir(workingdir)
110128

111129
# check folder for strucutre/sequence files
@@ -129,7 +147,7 @@ def prep(code, outmodel, workingdir, folder=None, fastafile=None, inmodel=None,
129147
if code and inmodel == None and folder == None:
130148
print("Downloading structure file")
131149
inmodel = download.get_structure(
132-
code, workingdir, file_format=structure_format)
150+
code, workingdir, file_format=download_format)
133151

134152
# fix
135153
if not fix_after:
@@ -174,9 +192,12 @@ def prep(code, outmodel, workingdir, folder=None, fastafile=None, inmodel=None,
174192
if ".cif" in inmodel or ".mmcif" in inmodel:
175193
print("Metadata restoration not implemented for mmCif (yet)")
176194

177-
print("Testing "+code)
195+
print("Simulating "+code)
178196
run.test_sim(outmodel)
179-
print("Wrote "+outmodel)
197+
print("Done.")
198+
199+
if not os.path.isabs(outmodel):
200+
shutil.copyfile(outmodel, run_dir+os.path.sep+outmodel)
180201

181202
def entry_point():
182203
"CLI entry point function. Uses sys.argv and argparse args object."
@@ -189,7 +210,7 @@ def entry_point():
189210
args.wdir = args.code+"_"+id_generator(6)
190211
prep(args.code, args.out, os.getcwd()+os.path.sep+args.wdir,
191212
folder=args.directory, fastafile=args.fasta, inmodel=args.structure,
192-
alignmentout=args.alignmentout, structure_format=args.format,
213+
alignmentout=args.alignmentout, download_format=args.dlformat,
193214
quiet=args.quiet, fix_after=fix_after,
194215
download_sequence=args.download, fix_missing_atoms=args.leavemissing)
195216

@@ -199,7 +220,7 @@ def test_mmcif_support():
199220
prep("6xov",
200221
os.getcwd()+os.path.sep+"6xov"+"_"+genid+".cif",
201222
os.getcwd()+os.path.sep+"testout"+os.path.sep+"6xov"+"_"+genid,
202-
structure_format="mmCif")
223+
download_format="mmCif")
203224

204225
def test_minimise():
205226
genid = id_generator(6)
@@ -221,29 +242,48 @@ class style():
221242
RESET = '\033[0m'
222243

223244
#testinputs = ["6xov", "9CS5", "8RM8", ]#"8VV2", "9B8B", "8CAE", "8QZA", "8RTL", "8RTO", "9A9W"] # regular pdbs
224-
testinputs = ["6TY4"] # clash
245+
#testinputs = ["6TY4", "6XOV", "9CS5", "8CAE", "8QZA", "8RTO"]
246+
247+
tests = [
248+
{"id": "6TY4", "format":"pdb"},
249+
{"id": "6XOV", "format":"pdb"},
250+
{"id": "9CS5", "format":"pdb"},
251+
{"id": "8CAE", "format":"pdb"},
252+
{"id": "8QZA", "format":"pdb"},
253+
{"id": "8RTO", "format":"mmCif"},
254+
{"id": "7IB8", "format":"mmCif"},
255+
{"id": "9A9G", "format":"mmCif"},
256+
{"id": "9I3U", "format":"pdb"},
257+
#test_minimise,
258+
#test_mmcif_support
259+
]
260+
#testinputs = ["8rto"]
225261

226262
results = {}
227263
state = 0
228264
cwd = os.getcwd()
229265

230-
for test in range(len(testinputs)):
266+
for test in range(len(tests)):
231267
try:
232268
os.chdir(cwd)
233-
code = testinputs[test]
234-
print(f"Testing {code} ({test}/{len(testinputs)})")
269+
code = tests[test]["id"]
270+
curr_format = tests[test]["format"]
271+
print(f"Testing {code} ({test}/{len(tests)})")
235272
genid = id_generator(6)
236273
pathlib.Path(os.getcwd()+os.path.sep+"testout").mkdir(
237274
parents=True, exist_ok=True)
238-
prep(code,
239-
os.getcwd()+os.path.sep+testinputs[test]+"_"+genid+".pdb",
240-
os.getcwd()+os.path.sep+"testout"+os.path.sep+code+"_"+genid,
241-
structure_format="pdb")
275+
if type(tests[test]) == dict:
276+
prep(code,
277+
os.getcwd()+os.path.sep+code+"_"+genid+"."+"cif",
278+
os.getcwd()+os.path.sep+"testout"+os.path.sep+code+"_"+genid,
279+
download_format=curr_format)
280+
elif callable(tests[test]):
281+
test()
242282
print(f"{style.GREEN}PASSED: {test} {style.RESET}")
243-
results[testinputs[test]] = "PASS"
283+
results[code] = "PASS"
244284
except Exception as e:
245-
print(f"{style.RED}FAILED: {testinputs[test]}{style.RESET}")
246-
results[testinputs[test]] = e
285+
print(f"{style.RED}FAILED: {test}{style.RESET}")
286+
results[code] = e
247287
state = 1
248288
print("")
249289
print("RESULTS:")

0 commit comments

Comments
 (0)