CCPBioSim
diff --git a/‎docking_mcl1/MTH_Docking_Part1_Prepare_2.0.ipynb‎
Lines changed: 278 additions & 0 deletions b/‎docking_mcl1/MTH_Docking_Part1_Prepare_2.0.ipynb‎
Lines changed: 278 additions & 0 deletions
@@ -0,0 +1,278 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# **The Molecular Treasure Hunt: Part 1: In the beginning - making your life easier and preparing your input files carefully**"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "*An &#8491;ngstrom sized adventure by Sarah Harris (Leeds Physics) and Geoff Wells (UCL Pharmacy)*"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "\"*It's a dangerous business, going out your door. You step onto the road, and if you don't keep your feet, there's no knowing where you might be swept off to.*\" J.R.R. Tolkien, The Fellowship of the Ring."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# References and weblinks\n",
+    "These are links to software used for preparing docking files, running the docking and visualising the output.\n",
+    "\n",
+    "## Autodock Vina:\n",
+    "\n",
+    "For version 1.1.0: O. Trott, A. J. Olson. AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization and multithreading, Journal of Computational Chemistry 2010, 31, 455-461. doi:10.1002/jcc.21334\n",
+    "Download: http://vina.scripps.edu/download.html\n",
+    "\n",
+    "For version 1.2.0: J. Eberhardt, D. Santos-Martins, A.F. Tillack, S. Forli. AutoDock Vina 1.2.0: New Docking Methods, Expanded Force Field, and Python Bindings. Journal of Chemical Information and Modeling 2021, 61, 8, 3891–3898. doi:10.1021/acs.jcim.1c00203\n",
+    "\n",
+    "\n",
+    "## Open Drug Discovery Toolkit:\n",
+    "Incorporating openbabel\n",
+    "\n",
+    "N. M. O'Boyle, M. Banck, C. A. James, C. Morley, T. Vandermeersch, G. R. Hutchison. Open Babel: An open chemical toolbox. Journal of Cheminformatics, 2011, 3, 33. doi:10.1186/1758-2946-3-33\n",
+    "\n",
+    "M. Wójcikowski, P. Zielenkiewicz, P. Siedlecki. Open Drug Discovery Toolkit (ODDT): a new open-source player in the drug discovery field. Journal of Cheminformatics 2015, 7, 26. doi:10.1186/s13321-015-0078-2\n",
+    "\n",
+    "\n",
+    "## UCSF Chimera:\n",
+    "\n",
+    "E. F. Pettersen, T. D. Goddard, C. C. Huang, G. S. Couch, D. M. Greenblatt, E. C. Meng, T. E. Ferrin. UCSF Chimera - a visualization system for exploratory research and analysis.  J Comput Chem 2004, 25, 1605-12. doi:10.1002/jcc.20084\n",
+    "\n",
+    "**This is not preinstalled, so you need to download it!**\n",
+    "\n",
+    "Download: https://www.cgl.ucsf.edu/chimera/download.html\n",
+    "\n",
+    "\n",
+    "### MGL Tools (integrated within Chimera):\n",
+    "S. Forli, R. Huey, M. E. Pique, M. Sanner, D. S. Goodsell, and A. J. Olson Computational protein-ligand docking and virtual drug screening with the AutoDock suite. Nat Protoc. 2016, 11, 905–919. doi:10.1038/nprot.2016.051\n",
+    "\n",
+    "\n",
+    "## DataWarrior (fun for chemists - but optional for everyone else!!):\n",
+    "\n",
+    "Thomas Sander, Joel Freyss, Modest von Korff, Christian Rufener. DataWarrior: An Open-Source Program For Chemistry Aware Data Visualization And Analysis. J Chem Inf Model 2015, 55, 460-473, doi:10.1021/ci500588j\n",
+    "\n",
+    "**This is not preinstalled, so you need to download it!**\n",
+    "\n",
+    "Download: http://www.openmolecules.org/datawarrior/download.html\n",
+    "\n",
+    "## Editorial Notes: \n",
+    "You may want to save a pdf version of this notebook for your records. You can do this from the File > Download as > PDF via Latex menu."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# What we hope to learn\n",
+    "The focus of these notebooks (and associated code) is the evaluation of ligand binding to multiple protein structures. The workflow is designed to make this as user friendly as possible, but in order to achieve this your files should ideally be organised and set up very carefully. This is a research project, so it is not a simple exercise that you can work through blindly. You need to think about:\n",
+    "\n",
+    "- Your initial biological question or hypothesis.\n",
+    "- How your protein/receptor may interact with ligands at the atomic level (always look carefully at your structures and complexes using Chimera - we really mean at every possible stage!).\n",
+    "- The thermodynamics, the computational and chemical steps you perform during your analysis, how these may be interrelated and the approximations that are made.\n",
+    "- The significance of what your results might mean e.g. are you trying to design a drug, or understand a biological interaction, or think about the physical interactions and dynamics that influence molecular recognition.\n",
+    "\n",
+    "All of these points are important to getting the most out of your project!\n",
+    "\n",
+    "You may feel that you only understand one or two of these areas, but by going through this process you should hopefully appreciate the spectrum of different disciplines that are necessary to address the problem as a whole. You should also know that the most difficult problems are solved through collaboration with people from different disciplines and by combining theoretical studies such as this with experimental observations.\n",
+    "\n",
+    "**You do not need to know or understand everything!!! This project is about finding out - not about what you might already know!! It has been the same for us while we wrote it.**\n",
+    "\n",
+    "## Protein targets you could dock against\n",
+    "As long as your protein conformers are properly aligned (we tell you how to do this below), you have the freedom to dock against structures from, for example:\n",
+    "\n",
+    "- an MD trajectory\n",
+    "- NMR conformers\n",
+    "- Variants or mutants of a protein\n",
+    "- Subtypes of a particular receptor\n",
+    "- X-ray crystal structures from different sources, species, of different resolutions or obtained after re-refinement (e.g. PDB-redo)\n",
+    "- Cryo-EM structures at 'atomistic' resolution\n",
+    "\n",
+    "**Caveat**: when you come to analyse your results in Part 5 ('Saving Your Greatest Hits') we have the option of 'animating' our docking results for particular ligands to see how protein conformation affects binding. In order to do this your protein structures should contain **exactly** the same atoms and atom types (so the files should preferably come from the same source e.g. an MD trajectory or an ensemble from an NMR experiment or careful choice of your protein conformers!). This is a limitation, but we are thinking about ways to get around it.\n",
+    "\n",
+    "\n",
+    "# Preparing our input files\n",
+    "First of all we need to check that the protein files that we are working with contain all of the necessary information for the docking and scoring calculations. We are also going to define a single grid box that describes where on or within the protein our ligands will be 'allowed' to dock (often a cavity or groove or an experimentally observed binding pocket). Therefore our proteins must be very carefully aligned to make sure that this box is always in the right place.\n",
+    "\n",
+    "We use Chimera to prepare and visualise our input and output structures because they have been designed to be compatible with Autodock Vina. In fact, if you explore the Tools menu in Chimera you will see that you can run simple single docking calculations using the Surface/Binding Analysis > Autodock Vina menu. You may wish to use one of your protein structures and ligands to explore this. However, this quickly becomes impractical with multiple protein conformations and many ligands, so the automated approach that we use here is important for feasibility.\n",
+    "\n",
+    "We will use some of this Chimera/Autodock Vina functionality to build our docking 'gridbox' configuration file a little later in this notebook. \n",
+    "\n",
+    "\n",
+    "## Prerequisites for your protein .pdb files\n",
+    "Your protein files **must** be protonated i.e. have H atoms added. This might be necessary if your protein conformer(s) came from a crystal structure which do not resolve hydrogen atoms. We discuss how to add hydrogen atoms below if this is necessary. If you have a series of structures from an NMR experiment or a series of protein conformers from a molecular dynamics (MD) simulation, the protein structures should already be protonated.\n",
+    "\n",
+    "## How to align your protein structures\n",
+    "You can align proteins using Chimera as follows:\n",
+    "\n",
+    "- Open Chimera.\n",
+    "- Open your pdb files (all of them!) using the File > Open menu.\n",
+    "- If necessary you can protonate your pdb files at this stage by using the Tools > Structure Editing > AddH menu and following the instructions.\n",
+    "\n",
+    "Chimera will ask you a lot of annoying questions. Just agree to everything, unless you think you know better. If you have done something ambitious, like including post-translational modifications (e.g. glycosylation or phosphorylation sites), cofactors, metal coordination sites, etc. then it is likely that you will have to think carefully at this stage. \n",
+    "\n",
+    "- Align your files using the Tools > Structure Comparison > Match Maker.\n",
+    "- From the Match Maker menu pick the protein that you would like to use as the template then select all of the other structures to align to it (using cntrl-click to select them).\n",
+    "- Click OK to align the structures.\n",
+    "- You will then need to save each structure into a separate pdb file, you can do this by opening the File > Save PDB menu.\n",
+    "- Enter a file name for the first file that you would like to save and select the matching file from the Save Models list. Make sure that the 'Save relative to model:' box is ticked. Then click on Save at the bottom of the window.\n",
+    "- You will need to do this for each file in turn giving each file a new name and picking the next model in the sequence to save.\n",
+    "- It would be a **very** good idea to give your pdb files a consistent name e.g. myprotein_00.pdb for the first file then myprotein_01.pdb for the second etc. if they come from a group of related (MD or NMR) stuctures. This will make life much easier later!!\n",
+    "\n",
+    "If you are fortunate enough to have many more protein conformers, e.g. around 100, then you will need an automated method to protonate and/or align pdb files. This is possible within Chimera, but we haven't done it for you. However, for you to be lucky enough to have this amount of protein receptor dynamics information, then you have probably obtained this from an MD simulation. If this is the case, then the MD simulation software will have it's own facilties for performing the alignment, that are far more computationally efficient (which becomes increasingly important for large data sets). And of course your structures must already be chemically correct (e.g. protontated) for the simulation to have been peformed."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import sys\n",
+    "import os\n",
+    "from glob import glob"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "#This defines our receptor structures (all of these files happen to have the same prefix, the starting structure (xal structure) is xxx_00.pdb)\n",
+    "receptor_filenames = glob('../receptor/*.pdb')\n",
+    "receptor_filenames.sort()\n",
+    "print(receptor_filenames)\n",
+    "print(\"Number of proteins is\", len(receptor_filenames))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## How to define and visualise your protein receptor binding site (the 'gridbox')\n",
+    "We will use Chimera to help us define the position and size of the space on or within the protein structures into which we will dock our ligands:\n",
+    "\n",
+    "- First open one or more of your protein structures in Chimera using the File > Open menu\n",
+    "- Then open the Tools > Surface/Binding Analysis > Autodock Vina menu\n",
+    "- Tick the box 'Resize search volume using button 2' (this is the middle button of your mouse - change this using the dropdown menu if you have a minimalist mouse or a laptop trackpad!)\n",
+    "- When you press the middle button of your mouse in the molecule display window you will see a green edged box appear.\n",
+    "- You can rotate the protein using the left button of your mouse and resize the box using the middle button. To adjust the box size and position, place your cursor over the side of the box you wish to move and press the middle button of your mouse while you drag the box to make it bigger or smaller. **This might take some practice and a bit of trial and error!**\n",
+    "- It may be helpful to show the protein in a surface representation as you do this so that you can check that the gridbox covers all of your binding site (you need to know where your binding site is!!)\n",
+    "- Your gridbox should be large enough to cover your binding site (and a margin either side), but large box sizes will slow your calculations down (a lot!) and may result in ligands binding at multiple sites (which you might not want).\n",
+    "- As you adjust the box size and position you will see that the numbers for the center and size of the gridbox will change in the Chimera Autodock Vina window.\n",
+    "- It is a **good idea** to load in a few of your protein structures to check that your grid box covers the binding site in all of your protein conformers!\n",
+    "- Once you are happy with your gridbox you can read off the numbers for the grid center (these are x, y and z coordinates) and the grid size (these are x, y and z dimensions in &#8491;). These numbers should be inserted below e.g. the center_x number should be the first grid center value and the size_x value should be the first grid size number.\n",
+    "- When you have done this you can execute the piece of code below!\n",
+    "\n",
+    "### What to do if you don't know where your binding site is\n",
+    "You have a few options:\n",
+    "\n",
+    "- You could run a docking calculation using the whole protein by making your gridbox cover the entire volume of the protein (this could be very slow and result in poor sampling of the search space, but there is a version of Autodock Vina that is optimised for this type of calculation called qvina-w). This approach is called 'blind docking'.\n",
+    "- You could use a webserver to identify cavities within your protein that may be potential ligand binding sites. For example FTMap (https://ftmap.bu.edu/login.php), is a web server that provides you with information on how a series of probe ligands are predicted to interact with your protein. The clusters of sites that the probes occupy can help you to identify potential binding sites on your protein. This will provide you with a pdb file that you can download and visualise using Chimera. You can then try to match your grid box position with the location of the probe ligand clusters.\n",
+    "- You could use software to identify cavities within your protein that may be potential ligand binding sites. Examples include fpocket (https://github.com/Discngine/fpocket) a command line program, or CAVER (https://caver.cz/) a Pymol plugin. You will need to read the manuals, webpages and articles if you wish to use these approaches.\n",
+    "- You could use the sites at which co-factors or substrates bind to your protein (e.g. ATP binding sites in molecular motors or kinase enzymes). You may be able to infer the location of substrate or ligand binding sites from bioinformatics, mutation studies or by analogy with related proteins."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Where to put your files\n",
+    "Your aligned protein files should be in a folder called 'receptor' in the parent folder for all of your docking calculations for a particular protein (the parent folder is the one below this one!).\n",
+    "The conf.txt file that we make in the next part will also be stored here."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "#We write out our docking configuration file here - this is read by Autodock Vina and defines the gridbox and some other parameters\n",
+    "#It is called conf.txt\n",
+    "\n",
+    "center_x = 7.3\n",
+    "center_y = 17.7\n",
+    "center_z = 1.5\n",
+    "size_x = 15\n",
+    "size_y = 17\n",
+    "size_z = 17\n",
+    "#Probably best not to play around with these unless you know what you are doing!\n",
+    "energy_range = 4\n",
+    "exhaustiveness = 20\n",
+    "#Fixing the number of docked poses to 1 (num_modes = 1) will give us only the lowest energy docked conformation of each ligand in our output file\n",
+    "num_modes = 1\n",
+    "\n",
+    "with open(\"../receptor/conf.txt\",'w') as contents:\n",
+    "    contents.write(\"center_x = \" + str(center_x) + '\\n' + \"center_y = \" + str(center_y) + '\\n' + \"center_z = \" + str(center_z) + '\\n' + '\\n' + \"size_x = \" + str(size_x) + '\\n' + \"size_y = \" + str(size_y) + '\\n' + \"size_z = \" + str(size_z) + '\\n' + '\\n' + \"energy_range = \" + str(energy_range) + '\\n' + \"exhaustiveness = \" + str(exhaustiveness) + '\\n' \"num_modes = \" + str(num_modes) + '\\n')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Your conf.txt file that contains the grid box information and a few other parameters will look like this:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {
+    "scrolled": true
+   },
+   "outputs": [],
+   "source": [
+    "with open(\"../receptor/conf.txt\", 'r') as f:\n",
+    "        for line in f:\n",
+    "            print(line)\n",
+    "f.close()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The first part is finished. You are now ready to move on to Part 2, which is the biggest step in your journey!\n",
+    "\n",
+    "\"*And here I have found what I sought not indeed, but finding I would possess for ever. For it is above all gold and silver, and beyond all jewels. Neither rock, nor steel, nor the fires of Morgoth, nor all the powers of the Elf-kingdoms, shall keep from me the treasure that I desire.*\" J.R.R. Tolkien, The Silmarillion.\n",
+    "\n",
+    "It is very important to prepare carefully before any big adventure!\n",
+    "\n",
+    "Sarah and Geoff\n",
+    "\n",
+    "(an \"out-of-office studios\" $O^{3}S$ production)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.16"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}