diff --git a/education/molmod_online/docking.md b/education/molmod_online/docking.md index 4ea83855..21d233fd 100644 --- a/education/molmod_online/docking.md +++ b/education/molmod_online/docking.md @@ -153,7 +153,7 @@ Click on 'Tools' next to the canonical sequence and select 'BLAST'. Next, a new [window](https://www.uniprot.org/blast/){:target="_blank"} will open with the BLAST search. One can enter either a protein or a nucleotide sequence or a UniProt identifier. -Change the number of hits to 50 in advanced parameters (for an easy alignement). Then proceed to run BLAST. +Change the number of hits to 50 in advanced parameters (for an easy alignment). Then proceed to run BLAST. This step might take a few moments since our sequence is being compared to the UniProtKB reference proteomes plus SwissProt databases. Once the run is finished, we can see a list of orthologous sequences from different organisms ordered by sequence identity. @@ -169,7 +169,7 @@ Select all 50 sequences and click Tools >> Align selected results, proceed to ru -The easiest way to visualize the alignment to identifiy which positions are more conserved is by generating a sequence *logo*. For each +The easiest way to visualize the alignment to identify which positions are more conserved is by generating a sequence *logo*. For each position in the sequence, the logo identifies the most frequently occurring residues and scales its one-letter code according to a conservation score. We will be using the [WebLogo server](http://weblogo.threeplusone.com/create.cgi){:target="_blank"}, in order the generate the sequence @@ -260,7 +260,7 @@ ARCTIC-3D will provide a structure of the human MDM2, and add the residue contac spectrum b, cyan_red -Because the current residue indices match the human canonical sequence, you will have to run a structural alignment of you mouse MDM2 model onto this structure and define the residue mapping by hand. 
+Because the current residue indices match the human canonical sequence, you will have to run a structural alignment of your mouse MDM2 model onto this structure and define the residue mapping by hand. For this, you need to load your mouse MDM2 model on the same PyMOL session and then perform a structural alignment on the human one. The `align` [function](https://pymolwiki.org/index.php/Align){:target="_blank"}, already implemented in PyMOL, is easy to use and well suited for this task. @@ -328,7 +328,7 @@ To start the job submission, click on `Submit a new job`. ### Submission and validation of structures -For this we will make us of the [HADDOCK 2.4 submission interface](https://wenmr.science.uu.nl/haddock2.4/submit/1){:target="_blank"} of the HADDOCK web server. +For this we will make use of the [HADDOCK 2.4 submission interface](https://wenmr.science.uu.nl/haddock2.4/submit/1){:target="_blank"} of the HADDOCK web server. @@ -524,7 +524,7 @@ Double the number of steps for all four stages of the semi-flexible refinement: ### Job submission -This interface allows us to modify many parameters that control the behavior of HADDOCK but in our case the default values are all appropriate. It also allows us to download the input structures of the docking run (in the form of a `.tgz` archive) and a parameter file which contains all the settings and input structures for our run (in `.json` format). We strongly recommend to download this file as it will allow you to repeat the run afterwards by uploading into the [file upload inteface](https://wenmr.science.uu.nl/haddock2.4/submit_file){:target="_blank"} of the HADDOCK web server. It can serve as input reference for the run and added to the suplementary material of your publications. This file can also be manually edited. +This interface allows us to modify many parameters that control the behavior of HADDOCK but in our case the default values are all appropriate. 
It also allows us to download the input structures of the docking run (in the form of a `.tgz` archive) and a parameter file which contains all the settings and input structures for our run (in `.json` format). We strongly recommend downloading this file, as it will allow you to repeat the run afterwards by uploading it into the [file upload interface](https://wenmr.science.uu.nl/haddock2.4/submit_file){:target="_blank"} of the HADDOCK web server. It can serve as input reference for the run and be added to the supplementary material of your publications. This file can also be manually edited. * **Step 14:** Click on the `Submit` button at the bottom left of the interface. diff --git a/education/molmod_online/modelling.md b/education/molmod_online/modelling.md index 8febab72..c35f5ba6 100644 --- a/education/molmod_online/modelling.md +++ b/education/molmod_online/modelling.md @@ -22,7 +22,7 @@ procedure. The last decades of scientific advances in the fields of protein biology revealed the extent of both the protein sequence and structure universes. Protein sequences databases currently hold hundreds of -millions entries ([source](https://www.ebi.ac.uk/uniprot/TrEMBLstats){:target="_blank"}) and are foreseen to continue +millions of entries ([source](https://www.ebi.ac.uk/uniprot/TrEMBLstats){:target="_blank"}) and are foreseen to continue growing exponentially, driven by high-throughput sequencing efforts. On the other hand, the number of experimental protein structures is three orders of magnitude smaller ([source](https://www.rcsb.org/stats/growth/growth-released-structures){:target="_blank"}), and that of unique folds has remained virtually unchanged since 2008. @@ -43,10 +43,10 @@ This apparent stagnation of the protein structure universe is a boon for structu There are many computational methods for predicting the three-dimensional structure of proteins from their sequence, most of which fall in one of four broad categories. 
Of this quadrumvirate, homology modelling is one of the -most reliable class of methods, with an estimated accuracy close to a low-resolution experimental +most reliable classes of methods, with an estimated accuracy close to a low-resolution experimental structure ([source](https://salilab.org/modeller/downloads/marc-bozi.pdf){:target="_blank"}). Two others, molecular threading and _ab initio_ modelling, are usually of interest only if homology modelling is not an option. -Finally, since 2021, machine-leaning methods have show to be able to handle protein structure prediction with high accuracy. +Finally, since 2021, machine-learning methods have been shown to be able to handle protein structure prediction with high accuracy. Homology modelling is then a structure prediction method \- worth noting, not exclusively for proteins @@ -89,12 +89,10 @@ Take the time to browse through the UniProt page of mouse MDM2. The header of th protein, gene, and organism names for this particular entry, as well as its unique UniProt accession code. On the left, below the header, there is a sidebar listing the several sections of the page. You can use these to navigate directly to the **Structure** section to verify if there are -already published experimental structures for mouse MDM2 (not a predicted model by AlphaFold2 !). -Fortunately, there aren't any _yet_; otherwise this tutorial would end here. +already published experimental structures for mouse MDM2. - -Similarly as man, no protein is an island, entire of itself, every protein is a piece of the cell, a part of the main. Thus if we imagine the cytoplasm as a thick molecular soup, proteins are constantly in contact, interacting and exchanging information. Currently, predicting the entire cell interactome is close to impossible, however UniProt offers us a possibility to see experimentally confirmed interaction partners of proteins. 
-Under **Interaction** you can see the available information about the interaction partners of MDM2. The 'Binary Interaction' subsection shows which is taken and regularly updated from the [InAct database](https://www.ebi.ac.uk/intact/){:target="_blank"}. These interactions represent only those binary interactions, which were proven by more than one experiment. The complete IntAct set can be accessed using the link in the *Cross-references* section. +Similarly to humans, no protein is an island, entire of itself; every protein is a piece of the cell, a part of the main. Thus if we imagine the cytoplasm as a thick molecular soup, proteins are constantly in contact, interacting and exchanging information. Currently, predicting the entire cell interactome is close to impossible; however, UniProt offers us the possibility to see experimentally confirmed interaction partners of proteins. +Under **Interaction** you can see the available information about the interaction partners of MDM2. The 'Binary Interaction' subsection shows data taken and regularly updated from the [IntAct database](https://www.ebi.ac.uk/intact/){:target="_blank"}. It lists only those binary interactions that were confirmed by more than one experiment. The complete IntAct set can be accessed using the link in the *Cross-references* section. Which proteins does MDM2 interact with and which interaction was most frequently confirmed? Where does the interaction with p53 take place? @@ -114,7 +112,7 @@ Besides reporting on experimental structures, UniProt links to portals such as t sequence and structure databases in order to build homology models. These automated protocols are configured to create models only under certain conditions, such as sufficient sequence identity and coverage. Still, the template identification, target/template alignment, and modelling options are unsupervised, which may lead to severe errors in some cases. 
-In general, these models offer a quick peek of what fold(s) a particular sequence can adapt and may as well serve as a starting point for further refinement and analyses. +In general, these models offer a quick peek of what fold(s) a particular sequence can adopt and may as well serve as a starting point for further refinement and analyses. Nevertheless, if the model will be a central part of a larger study, it might be worth to invest time and effort in modelling a particular protein of interest with a set of dedicated protocols. The following tab, **Family & Domains**, lists structural and domain information derived either from experiments or by similarity to other entries. @@ -270,7 +268,7 @@ Each aligned residue pair is marked with symbols: * ` ` - quite different Below, there is an example of an alignment of the full mouse MDM2 sequence aligned to the human MDM2 in *Clustal* format. -This kind of alignment can be generated by UniProt, upon selecting organisms or isoforms you are interested it. +This kind of alignment can be generated by UniProt, upon selecting organisms or isoforms you are interested in.
@@ -325,8 +323,8 @@ This is not the scenario we will use in this course, however if you want to use
 ### 2. Template search
 
 After you inserted the amino-acid sequence, which serves as *query* for *template* search, on the next page there will be all found templates listed.
-SWISS-MODEL uses its own database [STML](https://www.ncbi.nlm.nih.gov/pubmed/24782522){:target="_blank"} to search against when looking for related protein structure for this query.
-STML [https://swissmodel.expasy.org/templates/](https://swissmodel.expasy.org/templates/){:target="_blank"} is a curated template library updated regularly with the new PDB release, containing templates for more than 120000 unique protein sequences.
+SWISS-MODEL uses its own database [SMTL](https://www.ncbi.nlm.nih.gov/pubmed/24782522){:target="_blank"} to search against when looking for protein structures related to this query.
+SMTL [https://swissmodel.expasy.org/templates/](https://swissmodel.expasy.org/templates/){:target="_blank"} is a curated template library updated regularly with the new PDB release, containing templates for more than 120000 unique protein sequences.
 
 SWISS-MODEL uses two databases to search through: fast and accurate [BLAST](https://www.ncbi.nlm.nih.gov/pubmed/9254694){:target="_blank"}, mostly used for closely related templates and more sensitive and time consuming [HHblits](https://www.ncbi.nlm.nih.gov/pubmed/22198341){:target="_blank"}, in cases of remote homology.
 
@@ -367,7 +365,7 @@ After clicking on the arrow `﹀` on the left a short preview of the template wi
 
 The **oligomeric state** is predicted for each template and user can modify it manually under "target prediction". A warning sign appears if the oligomeric state of the model doesn't exactly match the one of the template (for example not all chains of the biounit included in the model).
 
-As a rule of thumb, in homology modelling it is recommended to use X-ray crystal structures with a resolution lower than $$2.2Å$$ as templates. One has to often compromise between high sequence identity/similarity and **template resolution**. In general structures determined by X-ray crystallography are preferred over averaged NMR structures and structures determined with electron microscopy, as the latter determines the overall shape of the molecule not individual atoms locations.
+As a rule of thumb, in homology modelling it is recommended to use X-ray crystal structures with a resolution lower than $$2.2Å$$ as templates. One often has to compromise between high sequence identity/similarity and **template resolution**. In general, structures determined by X-ray crystallography are preferred over averaged NMR structures. Nowadays, cryo-EM can also reach near-atomic resolution and can support atomic model building.
 
 
  **Sequence similarity** between the sequence and the template is calculated from a normalized [BLOSUM62](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC50453/){:target="_blank"} substitution matrix and similarly as the QSQE score, it ranged between 0 and 1 with 1 as 100% sequence similarity and vice versa. Note that gaps are not taken into account while calculating the sequence similarity.
@@ -384,8 +382,8 @@ Have a look at found templates and their properties.
 
 
 The NGL viewer offers an option to toggle between different protein representations as well as to create and save template figures.
-Notice how you can see residues names after you hover over them with your cursor.
-One of the coloring options is by `Bfactor Range`.
+Notice how you can see residue names after you hover over them with your cursor.
+One of the coloring options is by `B-factor Range`.
 The B-factor, or the temperature factor, refers to the displacement of atoms from their mean position in a crystal structure and reach the value between 0 and 1.
 It describes the local mobility of the macromolecule, with 0 being the least mobile parts, and in this case marked blue.
 
@@ -445,7 +443,7 @@ The Global Quality Estimate consists of four individual terms: Cβ atoms only, a
 Here again, the lower values indicate that the models scores lower than the experimental structure (red) and higher values indicate, that the model scores higher than the experimental structure (blue).
 
 SWISS-MODEL uses another method [QMEAN](https://pubmed.ncbi.nlm.nih.gov/21134891/){:target="_blank"} to estimate the quality of freshly built models.
-QMEAN quantifies model accuracy as well as modelling errors per residues and globally - for the entire model.
+QMEAN quantifies model accuracy as well as modelling errors per residue and globally - for the entire model.
 This is done using statistical potentials of mean force.
 
 
@@ -453,7 +451,7 @@ The QMEAN Z-score or the normalized QMEAN score shows the "*degree of nativeness
 
 
 QMEAN score per residue is shown in the *Local Quality Estimate* plot. The [QMEANDisCo](https://doi.org/10.1093/bioinformatics/btz828){:target="_blank"} method is used in this step. QMEANDisCo compares interatomic distances in the model with ensemble information extracted from experimentally determined protein structures of target sequence homologues. The score shows similarity of the residues to the experimental structure and if it drops below 0.6, modelled residues are in general of low quality.
-Different chains are showed in different colours and the residue modelling-quality can be viewed in 3D by selecting *Confidence (gradient)* as the coloring method in the NGL viewer.
+Different chains are shown in different colours and the residue modelling-quality can be viewed in 3D by selecting *Confidence (gradient)* as the coloring method in the NGL viewer.
 
 The comparison plot shows the QMEAN score of our model (red star) within all QMEAN scores of experimentally determined structures compared to their size (number of residues). Here the Z-score is equivalent to the standard deviation of the mean.
 
@@ -471,7 +469,7 @@ For more detailed structure information, one can click on the `Structure Assessm
 Investigate a selected model and its structural properties. What is the percentage of Ramachandran favoured residues?
 
 
-A Ramachandran plot is a way to visualize backbone dihedral angles of amino acid residues in the model against energetically favored regions of dihedrals of amino acids in general. These favored regions were obtained from more than 12000 experimental structures from [PISCES](https://pubmed.ncbi.nlm.nih.gov/12912846/){:target="_blank"}. Moreover the model is validated by [Molprobity](https://molprobity.biochem.duke.edu){:target="_blank"} both locally and globally. The quality of the structure is then expressed in Molprobity score, which should be as low as possible, and the percentage of Ramachandran Favoured residues, ideally above 98%. Clash score, outliers and bad angles and bonds should be as well as low as possible. More about structure assessment can be found in its [documentation](https://swissmodel.expasy.org/assess/help){:target="_blank"}. Examples of Ramachadran plots for all residues below:
+A Ramachandran plot is a way to visualize backbone dihedral angles of amino acid residues in the model against energetically favored regions of dihedrals of amino acids in general. These favored regions were obtained from more than 12000 experimental structures from [PISCES](https://pubmed.ncbi.nlm.nih.gov/12912846/){:target="_blank"}. Moreover, the model is validated by [MolProbity](https://molprobity.biochem.duke.edu){:target="_blank"} both locally and globally. The quality of the structure is then expressed as the MolProbity score, which should be as low as possible, and the percentage of Ramachandran Favoured residues, ideally above 98%. The clash score and the numbers of outliers, bad angles and bad bonds should likewise be as low as possible. More about structure assessment can be found in its [documentation](https://swissmodel.expasy.org/assess/help){:target="_blank"}. Examples of Ramachandran plots for all residues are shown below:
 
 
 
diff --git a/education/molmod_online/simulation.md b/education/molmod_online/simulation.md
index 2ea3b3a8..47537a6c 100644
--- a/education/molmod_online/simulation.md
+++ b/education/molmod_online/simulation.md
@@ -753,7 +753,7 @@ used to generate initial velocities. Pick an unlikely number for the random seed
 
 
 The inclusion of velocity in this system caused the particles and the system to gain kinetic energy.
-This information is stored in an binary file format with extension `.edr`, which can be read using the GROMACS utility `energy`.
+This information is stored in a binary file format with extension `.edr`, which can be read using the GROMACS utility `energy`.
 This utility extracts the information from the energy file into tabular files that can then be turned into plots.
 Select the terms of interest by typing their numbers sequentially followed by `Enter`.
 To quit, type `0` and `Enter`. Use the `xvg_plot.py` utility to plot the resulting `.xvg` file, passing the `-i` flag to have an interactive session open.
@@ -1218,7 +1218,7 @@ remained stable while others didn't?
 
 Feel free to play around with Pymol. Zoom in on specific regions, such as where the peptide is most
 rigid or most flexible, and check the side chain conformations (`show sticks`). Feel free to waste
-some (CPU) time on making an nice image, using `ray` and `png`. Do mind that scenes that are too
+some (CPU) time on making a nice image, using `ray` and `png`. Do mind that scenes that are too
 complex may cause the built-in ray-tracer of Pymol to crash, so in that case you can only get the
 image as you have it on screen using `png` directly. Check out the
 [Pymol Gallery](https://pymolwiki.org/index.php/Gallery){:target="_blank"} for inspiration, or ask your instructors for tips. If you
@@ -1700,7 +1700,7 @@ with the centroids, or representatives, of each cluster.
 
 
   Cluster the RMSD matrix using the GROMOS method to quantitatively extract representative
-structures of the simulation. Choose peptide backbone for fitting and all-atoms of peptide as output. This is important, since we have will use the output structures for docking.
+structures of the simulation. Choose the peptide backbone for fitting and all atoms of the peptide as output. This is important, since we will use the output structures for docking.
 
 
 
@@ -1752,7 +1752,7 @@ these clusters are meaningful, i.e. contain only similar structures?
 
## Picking representatives of the simulation -The aim of this simulation exercise was the sample the conformational landscape of the p53 +The aim of this simulation exercise was to sample the conformational landscape of the p53 N-terminal transactivation peptide, in order to extract representatives that could be used to generate models of its interaction with the MDM2 protein. The last step of clustering provides an unbiased method to select structures that were sampled throughout most of the trajectory (large