CCPBioSim
diff --git a/README.md b/README.md
Lines changed: 54 additions & 19 deletions
diff --git a/environment.yaml b/environment.yaml
Lines changed: 1 addition & 1 deletion
diff --git a/prepmd/__init__.py b/prepmd/__init__.py
Lines changed: 2 additions & 0 deletions
diff --git a/prepmd/download.py b/prepmd/download.py
Lines changed: 30 additions & 2 deletions
diff --git a/prepmd/model.py b/prepmd/model.py
Lines changed: 70 additions & 16 deletions
diff --git a/prepmd/pdb2pqr.py b/prepmd/pdb2pqr.py
Lines changed: 18 additions & 0 deletions
@@ -1,46 +1,81 @@
+
 # prepmd
 [![prepmd CI](https://github.com/CCPBioSim/mdprep/actions/workflows/python-app.yml/badge.svg)](https://github.com/CCPBioSim/mdprep/actions/workflows/python-app.yml)
 
-A utility to automatically prepare structures from the PDB for molecular dynamics simulation.
+A utility to automatically prepare structures from the PDB for molecular dynamics simulation and perform minimisations and simple MD simulations.
 
 ## Features
 * [X] Automatically download structures, sequences and metadata from the PDB and UNIPROT
 * [X] Automatically fill missing loops with modeller
 * [X] Automatically add missing atoms and fix non-standard residues with pdbfixer
-* [ ] Automatically propagate metadata through to finalised structure files
 * [X] Automatically resolve steric clashes and minimise structures
 * [X] Automatically trim together structures to be the same length
 * [X] Run simple MD simulations for testing, validation and minimisation
 * [X] Create 'morph' trajectories with metadynamics
+* [ ] Automatically propagate metadata through to finalised structure files
 * [ ] AIIDA integration
 
 ## Installation
-* Install [Conda](https://conda-forge.org/download/) (if you don't already have it)
-* Clone this repo and enter the folder: `git clone https://github.com/CCPBioSim/mdprep.git && cd prepmd` 
+* Install [Conda](https://github.com/conda-forge/miniforge?tab=readme-ov-file#install) (if you don't already have it)
+* Clone this repo and enter the folder: `git clone https://github.com/CCPBioSim/prepmd.git && cd prepmd` 
 * Run `conda env create --name prepmd --file environment.yaml && conda activate prepmd && pip install .`
-* For the modeller part of the workflow to work, you need to get a [modeller license key](https://salilab.org/modeller/registration.html) and add it to modeller's config.py file. If you use conda, the key will be in `envs/prep/lib/modeller-10.7/modlib/modeller/config.py` relative to the path where conda is installed.
+* For the MODELLER part of the workflow to work, you need to get a [MODELLER license key](https://salilab.org/modeller/registration.html) and add it to MODELLER's `config.py` file. If you use conda, this file will be at `envs/prep/lib/modeller-10.7/modlib/modeller/config.py` relative to the path where conda is installed.
 * After installing, run `pytest` to run tests.
 
 ## Preparing structures from the PDB for simulation
-* A basic example: `prepmd 6xov 6xov_processed.pdb` will download the structure for PDB entry `6xov`, process it and write it to `6xov_processed.pdb`.
-* If you already have a pdb file, you can instead run: `prepmd --structure 6xov_input.pdb 6xov 6xov_processed.pdb`. You still need to supply a PDB code, as the various file formats used by prepmd require one to be present.
-* `prepmd` will attempt to guess the correct file formats from the filenames it's given. It won't perform implicit conversions, so make sure to start and end with the same file type.
+
+### Basic example: 
+`prepmd 6xov 6xov_processed.pdb` will download the structure for PDB entry `6xov`, process it and write it to `6xov_processed.pdb`.
+### Using a local structure file:
+`prepmd --structure 6xov_input.pdb 6xov 6xov_processed.pdb` will process a local structure file instead of downloading one. You still need to supply a PDB code, as the various file formats used by prepmd require one to be present.
+### Generate multiple structure files:
+`prepmd 6xov 6xov_processed.pdb -n 5` will generate 5 candidate structures and select the best one as determined by MODELLER's internal metrics. Alternatively, `prepmd 6xov 6xov_processed.pdb -n 5 -em 22281 --contour 0.01` will download EMD-22281, the EMDB entry associated with 6XOV, and score the generated models based on their agreement with the EM density map.
+### Use refined structures from PDB-REDO:
+`prepmd 1cbs 1cbs_processed.pdb --redo` will download a refined structure from PDB-REDO, if it is available. Note: not all PDB entries have corresponding PDB-REDO entries.
+### Use your own alignments and sequences to fill missing loops:
+By default, `prepmd` will read missing residues from the pdb/mmcif metadata, attempt to align the missing residues with the currently present residues, and then build missing loops. You can manually provide a FASTA file containing the alignment data with `--fasta`. You can also ask prepmd to get the sequence data from UNIPROT instead, with `--download`, though this is not recommended, as the raw sequence data can be different from the PDB and cause the alignment to fail.
+### Other usage notes
+* `prepmd` will attempt to guess the correct file format from the filenames it's given. It won't perform implicit conversions, so make sure to start and end with the same file type.
 * By default, `prepmd` will leave intermediate files in a randomly-named temporary directory. You can set the name of this directory: `prepmd --wdir 6xov_temp 6xov 6xov.cif`.
-* By default, `prepmd` will read missing residues from the pdb/mmcif metadata, attempt to align the missing residues with the currently present residues, and then build missing loops. You can manually provide a FASTA file containing the alignment data with `--fasta`. You can also ask mdprep to get the sequence data from UNIPROT instead, with `--download`, though this is not recommended, as the raw sequence data can be very different from the PDB and cause the alignment to fail.
-* Note: while both pdb and mmCif are supported, using the mmCif format is strongly recommended, as the pdb format has been deprecated since 2024.
+* While both pdb and mmCif are supported, using the mmCif format is strongly recommended, as the pdb format has been deprecated since 2024.
 * Use `prepmd --help` for a full list of parameters. 
 
 ## Running MD simulations
-* `runmd` can run MD simulations using OpenMM.
-* A basic example: `runmd structure.cif --min_out structure_minimised.cif --traj_out traj.xtc --md_steps 5000 --step 100` will minimise and run a simulation of structure.cif, writing a trajectory to `traj_out.xtc`, for 5000 steps, saving one trajectory frame every 100 steps.
-* If you already have a minimised structure, you can skip minimisation: `runmd structure.cif --traj_out traj.xtc --md_steps 5000 --step 100 -nomin -notest`
-* Solvate the simulation box: `runmd structure.cif -o structure_minimised.cif --traj_out traj.xtc --md_steps 500 --step 10 -solv tip4pew`. tip3p, tip4pew and spce are supported. You can also add pressure coupling with `--pressure 1.0` (for 1 bar)
-* Run with different force fields: `runmd structure.cif -o structure_minimised.cif --traj_out traj.xtc --md_steps 500 --step 50 -ff amber14` runs with amber14. AMOEBA is also available, and amber19 is available if you have a recent version of OpenMM.
-* Fix the backbone in place and just equilibrate the side chains: `runmd structure.cif -o structure_minimised.cif --fix_backbone -solv tip4pew --notest`
-* Use metadynamics to create a (non-physical!) guided md morph trajectory between two structures: `runmd pre.cif -m post.cif -o minimised_out.pdb` 
-* Note: if you have two files for the same structure which aren't aligned (e.g. they have slightly different starting/ending residues), you can trim the ends to align them: `aligntogether pre.cif post.cif pre_cropped.cif post_cropped.cif`
+`runmd` can run MD simulations using OpenMM.
+### A Basic Example
+`runmd structure.cif --min_out structure_minimised.cif --traj_out traj.xtc --md_steps 5000 --step 100` will minimise structure.cif and then run a simulation of it for 5000 steps, writing a trajectory to `traj.xtc` and saving one trajectory frame every 100 steps.
+
+If you already have a minimised structure, you can skip minimisation: `runmd structure.cif --traj_out traj.xtc --md_steps 5000 --step 100 -nomin -notest`
+### Explicit solvent:
+`runmd structure.cif -o structure_minimised.cif --traj_out traj.xtc --md_steps 500 --step 10 -solv tip4pew` will run a simulation in explicit solvent using the tip4pew water model. tip3p, tip4pew and spce are supported. You can also add pressure coupling with `--pressure 1.0` (for 1 bar). By default, simulations run with an implicit solvent equivalent to AMBER's `igb=8` option.
+### Force Fields:
+`runmd structure.cif -o structure_minimised.cif --traj_out traj.xtc --md_steps 500 --step 50 -ff amber14` runs with amber14. charmm36, amoeba, amber14 and amber19 are available, with charmm36 being the default.
+### Equilibrate side chains:
+`runmd structure.cif -o structure_minimised.cif --fix_backbone -solv tip4pew --notest` will fix the backbone in place and only equilibrate side chains.
+### Create a morph trajectory:
+`runmd pre.cif -m post.cif -o minimised_out.pdb`  will create a trajectory that smoothly transitions between pre.cif and post.cif. This trajectory is created using OpenMM's metadynamics features. Note: this should only be used for visualisation/illustration as trajectories created this way are arbitrary representations of structural transitions that aren't guaranteed to represent the underlying physics and biology.
+If you have two files for the same structure which aren't aligned (e.g. they have slightly different starting/ending residues), you can trim the ends to align them: `aligntogether pre.cif post.cif pre_cropped.cif post_cropped.cif`
+### Other usage notes:
+* Set the numerical integrator with the `-i` flag. This can be either `VariableLangevinIntegrator` or `LangevinMiddleIntegrator`. By default, `runmd` will attempt to use the latter, and fall back to the former if the simulation becomes numerically unstable.
+* The default settings result in a rather loose coupling to the heat bath. You can change this with the `-f` or `--friction` argument, which specifies the friction coefficient coupling the system to the heat bath. Running a simulation with explicit solvent will also result in tighter coupling.
+* By default, `runmd` will try to select the most suitable nonbonded interaction method, but this can be overridden with `-nb` or `--nonbonded`, which can be one of `PME`, `CutoffPeriodic`, or `CutoffNonPeriodic`.
+* By default, `runmd` will constrain the length of all bonds involving a hydrogen atom, which can allow for longer timesteps at the cost of some accuracy. This can be disabled by setting `-c None` or `--constraints None`. This setting is also disabled if the backbone is fixed.
 * Use `runmd --help` for a full list of parameters. 
 
-## License
+### What next?
+* Though you can run simple MD simulations with prepmd, for more in-depth MD we recommend using a dedicated MD package such as GROMACS, AMBER, NAMD or OpenMM.
+* If you're looking to generate an atomistic structure file that matches your EM map as closely as possible, you can use a flexible fitting tool such as [TEMPy-ReFF](https://gitlab.com/topf-lab/tempy-reff).
 
+## Licence
 AGPLv3
+
+## Contributors
+prepmd is developed by Rob Welch. Thanks to Harry Swift for helping set up the CI. This project is funded by [DRIIMB](https://driimb.org/). prepmd makes use of the packages listed under Dependencies.
+
+## Dependencies
+* OpenMM
+* PDBFixer
+* BioPython
+* MODELLER
+* pdb2pqr
+* mrcfile
+* icp
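
The `prepmd` invocations described in the README can also be assembled from Python. The following is a minimal sketch and not part of prepmd: `build_prepmd_args` is a hypothetical helper that uses only the flags quoted in the README (`--structure`, `--wdir`, `-n`, `-em`, `--contour`, `--redo`). It builds the argument list without running anything; handing the list to `subprocess.run` is assumed to work once `prepmd` is installed and on the PATH.

```python
from typing import List, Optional


def build_prepmd_args(
    code: str,
    output: str,
    structure: Optional[str] = None,
    num_models: Optional[int] = None,
    em_id: Optional[str] = None,
    contour: Optional[float] = None,
    redo: bool = False,
    wdir: Optional[str] = None,
) -> List[str]:
    """Assemble a prepmd command line (hypothetical helper; flags taken from the README)."""
    args = ["prepmd"]
    if structure is not None:
        # Use a local structure file; the PDB code is still required.
        args += ["--structure", structure]
    if wdir is not None:
        # Name the directory that holds intermediate files.
        args += ["--wdir", wdir]
    args += [code, output]
    if num_models is not None:
        args += ["-n", str(num_models)]
    if em_id is not None:
        args += ["-em", str(em_id)]
    if contour is not None:
        args += ["--contour", str(contour)]
    if redo:
        args.append("--redo")
    return args


# Reproduces the EM-scoring example from the README.
cmd = build_prepmd_args("6xov", "6xov_processed.pdb", num_models=5,
                        em_id="22281", contour=0.01)
print(" ".join(cmd))
```

Keeping the arguments as a list rather than a single string avoids shell quoting problems if the list is later passed to `subprocess.run(cmd, check=True)`.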
@@ -12,4 +12,4 @@ dependencies:
   - modeller
   - biopython
   - pytest
-  - mdanalysis
+  - mrcfile
@@ -8,4 +8,6 @@
 from . import metadynamics
 from . import align_together
 from . import add_modeller_license
+from . import point_cloud
+from . import lib
 __version__ = "1.0"
@@ -9,7 +9,28 @@
 import requests
 
 
-def get_structure(pdb_id, directory, file_format="mmCif"):
+def get_em_map(emdb_id, directory):
+    """
+    Download an EM density map from the EMDB.
+    Args:
+        emdb_id: id of the em map to download, a string
+        directory: directory to download the file into, a string
+    Returns:
+        path to the downloaded file.
+    """
+    emdb_id = str(emdb_id).replace("EMD-", "").replace("emd-", "")
+    url = "https://ftp.ebi.ac.uk/pub/databases/emdb/structures/EMD-"+str(emdb_id)+"/map/emd_"+str(emdb_id)+".map.gz"
+    destination = directory+sep+str(emdb_id)+".map.gz"
+    try:
+        urllib.request.urlretrieve(url, destination)
+    except urllib.error.HTTPError as e:
+        if e.code == 404:
+            msg = "EMDB entry with ID "+emdb_id+" not found."
+            raise IOError(msg)
+        # Re-raise other HTTP errors rather than returning a path to a
+        # file that was never downloaded.
+        raise
+    return destination
+
+
+def get_structure(pdb_id, directory, file_format="mmCif", redo=False):
     """
     Download a structure from the PDB.
     Args:
@@ -25,10 +46,17 @@ def get_structure(pdb_id, directory, file_format="mmCif"):
     if file_format == "pdb":
         format_str = "pdb"
     try:
-        url = "https://files.rcsb.org/download/"+pdb_id+"."+format_str
+        if redo:
+            url = "https://pdb-redo.eu/db/"+pdb_id+"/"+pdb_id+"_final"+"."+format_str
+            print(url)
+        else:
+            url = "https://files.rcsb.org/download/"+pdb_id+"."+format_str
         destination = directory+sep+pdb_id+"."+format_str
         urllib.request.urlretrieve(url, destination)
     except urllib.error.HTTPError as e:
+        if e.code == 404 and redo:
+            msg = "PDB with ID "+pdb_id+" not found in PDB-REDO."
+            raise IOError(msg)
         r = requests.get(url.replace(".pdb", ".cif"))
         if r.status_code == 200:
             msg = "No PDB for "+pdb_id + \
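
The ID handling and URL construction in `get_em_map` can be separated from the download so that they are testable without network access. The following is a minimal sketch under stated assumptions: `normalise_emdb_id` and `emdb_map_url` are hypothetical names, not part of prepmd, and the digits-only check is an added assumption about EMDB accession numbers. The URL pattern is the one used in the diff above.

```python
EMDB_BASE = "https://ftp.ebi.ac.uk/pub/databases/emdb/structures"


def normalise_emdb_id(emdb_id) -> str:
    """Strip an optional 'EMD-' prefix (any case) and surrounding whitespace."""
    text = str(emdb_id).strip()
    if text.upper().startswith("EMD-"):
        text = text[4:]
    # Assumption: the remaining accession number consists of digits only.
    if not text.isdigit():
        raise ValueError("Not a valid EMDB ID: " + repr(emdb_id))
    return text


def emdb_map_url(emdb_id) -> str:
    """Build the map download URL, following the pattern used in get_em_map."""
    number = normalise_emdb_id(emdb_id)
    return f"{EMDB_BASE}/EMD-{number}/map/emd_{number}.map.gz"


print(emdb_map_url("EMD-22281"))
```

Validating the ID before building the URL means a malformed ID fails with a clear message instead of an HTTP error from the server.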
 
@@ -15,6 +15,7 @@
 import pathlib
 import shutil
 from prepmd import get_residues
+from prepmd import point_cloud
 import sys
 
 placeholder = "sequence:::::::::"
@@ -96,7 +97,20 @@ def fasta(fasta_name, include_metadata=False):
     return out
 
 def get_alignment_info(alignmentout):
+    """
+    For a FASTA-formatted alignment file, get the number of residues filled,
+    number of gaps, and largest gap.
+    
+    Args:
+        alignmentout: path to an alignment file, a string
     
+    Returns:
+        total_residues_filled - total number of residues to be added
+        total_gaps_filled - number of gaps to be filled
+        filled_residues - missing residues
+        filled_gaps - gaps
+        max_gap - largest gap
+    """
     def get_info(aln):
         total_missing = 0
         total_gaps = 0
@@ -131,23 +145,19 @@ def get_info(aln):
     return total_residues_filled, total_gaps_filled, filled_residues, filled_gaps, max_gap
 
 
-
-
-def get_best_pdb(directory, exts=["pdb", "cif", "mmcif", "mmCif"]):
+def get_objective_functions(pdbs):
     """
-    For a directory, find all of the pdbs generated by modeller and select
-    the one with the highest objective function (arbitrary metric used by
-    modeller).
+    For a list of PDB or mmCif files generated by modeller, get the objective
+    function (which measures the quality of the model) and similarity (of the
+    model sequence and the sequence used to fill the missing loops) for each
+    pdb.
+    
     Args:
-        directory: a string, the directory to scan
-        ext: file extensions to check for (a list of strings)
+        pdbs: a list of strings, paths to pdb files
     Returns:
-        path to the file with the highest objective function, a string
+        scores, similarities, two dictionaries keyed by file path containing
+        the scores and similarities.
     """
-    
-    pdbs = []
-    for ext in exts:
-        pdbs += list(pathlib.Path(directory).glob('*.'+ext))
     scores = {}
     similarities = {}
     for pdb in pdbs:
@@ -167,6 +177,34 @@ def get_best_pdb(directory, exts=["pdb", "cif", "mmcif", "mmCif"]):
                 # mmcif
                 if "_modeller.best_template_pct_seq_id" in line:
                     similarities[pdb] = float(line.split()[-1])
+    return scores, similarities
+
+
+def get_best_pdb(directory, exts=["pdb", "cif", "mmcif", "mmCif"],
+                 em_map=None, em_contour_level=None):
+    """
+    For a directory, find all of the pdbs generated by modeller and select
+    the best one. By default this is the one with the highest objective
+    function (arbitrary metric used by modeller). If an EM map is given, it
+    is instead the one with the lowest alignment error against the map.
+    Args:
+        directory: a string, the directory to scan
+        exts: file extensions to check for (a list of strings)
+        em_map: path to an EM density map file (a string). If this is set,
+        the best PDB will be picked based on similarity to the map.
+        em_contour_level: contour level for the EM map, a float.
+    Returns:
+        path to the best file, a string
+    """
+    
+    pdbs = []
+    for ext in exts:
+        pdbs += list(pathlib.Path(directory).glob('*.'+ext))
+    scores, similarities = get_objective_functions(pdbs)
+    if em_map:
+        em_scores = {}
+        for pdb in pdbs:
+            em_scores[pdb] = point_cloud.score_pdb_map(pdb, em_map, 
+                                                       em_contour_level)
     max_sim = max(similarities.values())
     if max_sim >= 97:
         print("Similiarity: "+str(max_sim)+"%.")
@@ -181,12 +219,19 @@ def get_best_pdb(directory, exts=["pdb", "cif", "mmcif", "mmCif"]):
                          "loops. Please double-check your sequence data (by "
                          "default these are the SEQRES records from the input "
                          "structure.")
-    return str(max(scores, key=scores.get))
+    if em_map:
+        em_err = min(em_scores.values())
+        print("EM map alignment error: "+str(round(em_err, 2))+"A")
+        if em_err > 15:
+            raise ValueError("EM map and PDB are not the same structure.")
+        return str(min(em_scores, key=em_scores.get))
+    else:
+        return str(max(scores, key=scores.get))
 
 
 # note: the pdb file isn't a parameter, it must be called code.pdb
 def fix_missing_residues(code, fastafile, alignmentout, inmodel, outmodel,
-                         wdir):
+                         wdir, num_models=1, em_map=None, em_contour=None):
     """
     For a given structure, fill in missing loops using modeller.
     Args:
@@ -197,6 +242,7 @@ def fix_missing_residues(code, fastafile, alignmentout, inmodel, outmodel,
         this must be named according to the pdb id!
         outmodel: output structure file, a string
         wdir: working directory, a string, will be created if it doesn't exist
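+        num_models: how many models to generate, an int.
+        em_map: path to an EM density map file, a string (optional). If set,
+        the generated models are ranked by similarity to the map.
+        em_contour: contour level for the EM map, a float (optional).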
     Returns:
         nothing, but writes out outmodel and wdir.
     """
@@ -265,6 +311,8 @@ def fix_missing_residues(code, fastafile, alignmentout, inmodel, outmodel,
 
     env.io.atom_files_directory = pdb_dirs
     print("Modelling missing loops...")
+    if num_models > 4:
+        print("(Creating "+str(num_models)+" models - might be slow!)")
     old_stdout = sys.stdout
     f = open(os.devnull, 'w')
     sys.stdout = f
@@ -275,11 +323,17 @@ def fix_missing_residues(code, fastafile, alignmentout, inmodel, outmodel,
     if ".mmCif" in inmodel or ".cif" in inmodel or "Cif" in inmodel:
         a.set_output_model_format("MMCIF")
 
+    if num_models > 1:
+        a.starting_model = 1
+        a.ending_model = num_models
+
     a.make()
 
     sys.stdout = old_stdout
 
-    best_pdb = get_best_pdb(wdir)
+    if em_map:
+        print("Ranking PDBs by similarity to EM map...")
+    best_pdb = get_best_pdb(wdir, em_map=em_map, em_contour_level=em_contour)
 
     print("Finished modelling missing loops.")
     residues, gaps, remain_res, remain_gaps, max_gap = get_alignment_info(alignmentout)
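
The selection rule in `get_best_pdb` can be summarised in isolation. The following is a minimal sketch, assuming the scores have already been collected into dictionaries keyed by file path; `pick_best_model` and its parameter names are hypothetical, and the threshold of 15 is the value used in the diff above. It picks the lowest EM error when EM scores are given, and otherwise the highest objective function.

```python
from typing import Dict, Optional


def pick_best_model(
    scores: Dict[str, float],
    em_errors: Optional[Dict[str, float]] = None,
    max_em_error: float = 15.0,
) -> str:
    """Return the path of the preferred model (hypothetical helper)."""
    if em_errors:
        # Lower error means better agreement with the EM map.
        best = min(em_errors, key=em_errors.get)
        if em_errors[best] > max_em_error:
            raise ValueError("EM map and PDB are not the same structure.")
        return best
    # Without an EM map, fall back to the highest objective function.
    return max(scores, key=scores.get)


# Illustrative values only, not real MODELLER output.
scores = {"model1.cif": 1200.0, "model2.cif": 1500.0}
em_errors = {"model1.cif": 2.1, "model2.cif": 3.4}
print(pick_best_model(scores))
print(pick_best_model(scores, em_errors))
```

With the illustrative values, the first call selects `model2.cif` (highest objective function) and the second selects `model1.cif` (lowest EM error), which shows that the two criteria can disagree.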
 
@@ -0,0 +1,18 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+"""
+Created on Mon Feb  9 14:04:25 2026
+
+@author: rob
+"""
+
+#import pdb2pqr
+
+# steps:
+    # run as normal but not including test simulation and fixing
+    # apply pdb2pqr before fixing
+    # then fix, and especially add missing atoms AND remove hetatms
+    # then run as normal
+
+#pdb2pqr.run_pdb2pqr("A")
+# example usage: UBQ.pdb 1UBQ.pqr --titration-state-method=propka --with-ph=7 --ff=CHARMM --ffout=CHARMM