small grammar changes

danielskatz · web-flow · commit 18ae6e9631f0 · 2020-05-27T14:51:26.000-05:00
diff --git a/paper/paper.md b/paper/paper.md
@@ -20,20 +20,20 @@ bibliography: paper.bibtex
 
 # Summary
 
-The analysis of biomolecular structures is a crucial task for a wide range of applications ranging from drug design to protein engineering. The Protein Data Bank (PDB) file format [@pdb] is the most popular format to describe biomolecular structures such as proteins and nucleic acids. In this text-based format, each line represents a given atom and entails its main properties such as atom name and identifier, residue name and identifier, chain identifier, coordinates, etc. Several solutions have been developed to parse PDB files in dedicated objects that facilitate the analysis and manipulation of biomolecular structures. This is, for example, the case of the ``BioPython``  parser [@biopython,@biopdb] that loads PDB files in a nested dictionary whose structure mimics the hierarchical nature of the biomolecular structure. Selecting a given sub-part of the biomolecule can then be done by going through the dictionary and selecting the required atoms. Other packages, such as ``ProDy`` [@prody], ``BioJava`` [@biojava], ``MMTK`` [@mmtk] and ``MDAnalysis`` [@mdanalysis] to cite a few, also offer solutions to parse PDB files. However, these parsers are embedded in large codebases that are sometimes difficult to integrate with new applications and are often geared toward the analysis of molecular dynamics simulations. Light-weight applications such as ``pdb-tools`` [@pdbtools] lack the capabilities to manipulate coordinates.
+The analysis of biomolecular structures is a crucial task for a wide range of applications ranging from drug design to protein engineering. The Protein Data Bank (PDB) file format [@pdb] is the most popular format to describe biomolecular structures such as proteins and nucleic acids. In this text-based format, each line represents a given atom and entails its main properties such as atom name and identifier, residue name and identifier, chain identifier, coordinates, etc. Several solutions have been developed to parse PDB files into dedicated objects that facilitate the analysis and manipulation of biomolecular structures. This is, for example, the case for the ``BioPython``  parser [@biopython,@biopdb] that loads PDB files into a nested dictionary, the structure of which mimics the hierarchical nature of the biomolecular structure. Selecting a given sub-part of the biomolecule can then be done by going through the dictionary and selecting the required atoms. Other packages, such as ``ProDy`` [@prody], ``BioJava`` [@biojava], ``MMTK`` [@mmtk] and ``MDAnalysis`` [@mdanalysis] to cite a few, also offer solutions to parse PDB files. However, these parsers are embedded in large codebases that are sometimes difficult to integrate with new applications and are often geared toward the analysis of molecular dynamics simulations. Lightweight applications such as ``pdb-tools`` [@pdbtools] lack the capabilities to manipulate coordinates.
 
 
 
-We present here the Python package ``pdb2sql``, which loads individual PDB files in a relational database. Among different solutions the Structured Query Language (SQL) is a very popular solution to query a given database. However SQL queries are complex and domain scientists such as bioinformaticians are usually not familiar with them. This represents an important barrier for the adoption of SQL technology in bioinformatics. ``pdb2sql`` exposes complex SQL queries through simple Python methods that are intuitive for end users.  As such, our package leverages the power of SQL queries and remove the barrier that SQL complexity represents. In addition, several advanced modules have also been built, for example to rotate or translate biomolecular structures, to characterize interface contacts, and to measure structure similarity between two protein complexes. Additional modules can easily be developed following the same scheme. As a consequence, ``pdb2sql`` is a light-weight and versatile PDB tool that is easy to extend and to integrate with new applications.
+We present here the Python package ``pdb2sql``, which loads individual PDB files into a relational database. Among different solutions, the Structured Query Language (SQL) is a very popular solution to query a given database. However SQL queries are complex and domain scientists such as bioinformaticians are usually not familiar with them. This represents an important barrier to the adoption of SQL technology in bioinformatics. ``pdb2sql`` exposes complex SQL queries through simple Python methods that are intuitive for end users.  As such, our package leverages the power of SQL queries and removes the barrier that SQL complexity represents. In addition, several advanced modules have also been built, for example, to rotate or translate biomolecular structures, to characterize interface contacts, and to measure structure similarity between two protein complexes. Additional modules can easily be developed following the same scheme. As a consequence, ``pdb2sql`` is a lightweight and versatile PDB tool that is easy to extend and to integrate with new applications.
 
 
 # Capabilities of ``pdb2sql``
 
-``pdb2sql`` allows to query, manipulate and process PDB files through a series of dedicated classes. We give an overview of these features and illustrate them with snippets of code. More examples can be found in the documentation (https://pdb2sql.readthedocs.io).
+``pdb2sql`` allows a user to query, manipulate, and process PDB files through a series of dedicated classes. We give an overview of these features and illustrate them with snippets of code. More examples can be found in the documentation (https://pdb2sql.readthedocs.io).
 
 ## Extracting data from PDB files
 
-``pdb2sql`` allows to simply query the database using the ``get(attr, **kwargs)`` method. The attribute ``attr`` is here a list of or a single column name of the ``SQL`` database, see Table 1 for available attributes. The keyword argument ``kwargs`` can then be used to specify a sub-selection of atoms.
+``pdb2sql`` allows a user to simply query the database using the ``get(attr, **kwargs)`` method. The attribute ``attr`` here is a list of or a single column name of the ``SQL`` database; see Table 1 for available attributes. The keyword argument ``kwargs`` can then be used to specify a sub-selection of atoms.
 
 Table 1. Atom attributes and associated definitions in ``pdb2sql``
 
@@ -55,7 +55,7 @@ Table 1. Atom attributes and associated definitions in ``pdb2sql``
 | model     | Model serial number                       |
 
 
-Every attribute name can be used to select specific atoms and multiple conditions can be easily combined. For example, let's consider the following example :
+Every attribute name can be used to select specific atoms and multiple conditions can be easily combined. For example, let's consider the following example:
 
 ```python
 from pdb2sql import pdb2sql
@@ -66,7 +66,7 @@ atoms = pdb.get('x,y,z',
                 chainID='A')
 ```
 
-This snippet extracts the coordinates of the carbon and hydrogen atoms that belong to all the valine and leucine residues of the chain labelled `A` in the PDB file. Atoms can also be excluded from the selection by appending the prefix ``no_`` to the attribute name. This is the case in the following example :
+This snippet extracts the coordinates of the carbon and hydrogen atoms that belong to all the valine and leucine residues of the chain labelled `A` in the PDB file. Atoms can also be excluded from the selection by appending the prefix ``no_`` to the attribute name. This is the case in the following example:
 
 ```python
 from pdb2sql import pdb2sql
@@ -78,7 +78,7 @@ This snippet extracts the atom and residue names of all atoms except those belon
 
 ## Manipulating PDB files
 
-The data contained in the SQL database can also be modified using the ``update(attr, vals, **kwargs)`` method. The attributes and keyword arguments are identical to those in the ``get`` method. The ``vals`` argument should contain a `numpy` array whose dimension should match the selection criteria.  For example :
+The data contained in the SQL database can also be modified using the ``update(attr, vals, **kwargs)`` method. The attributes and keyword arguments are identical to those in the ``get`` method. The ``vals`` argument should contain a `numpy` array whose dimension should match the selection criteria.  For example:
 
 ```python
 import numpy as np
@@ -103,7 +103,7 @@ trans_vec = np.array([0,5,0])
 transform.translation(pdb, trans_vec, resSeq=1, chainID='A')
 ```
 
-One can also rotate a given selection around a given axis with the `rotate_axis` method :
+One can also rotate a given selection around a given axis with the `rotate_axis` method:
 
 ```python
 angle = np.pi
@@ -113,7 +113,7 @@ transform.rot_axis(pdb, axis, angle, resSeq=1, chainID='A')
 
 ## Identifying interface
 
-The ``interface`` class is derived from the ``pdb2sql`` class and offers functionalities to identify contact atoms or residues between two different chains with a given contact distance. It is useful for extracting and analysing the interface of e.g. protein-protein complexes. The following example snippet returns all the atoms and all the residues of the interface of '1AK4.pdb' defined by a contact distance of 6 Å.
+The ``interface`` class is derived from the ``pdb2sql`` class and offers functionality to identify contact atoms or residues between two different chains with a given contact distance. It is useful for extracting and analysing the interface of, e.g., protein-protein complexes. The following example snippet returns all the atoms and all the residues of the interface of '1AK4.pdb' defined by a contact distance of 6 Å.
 
 ```python
 from pdb2sql import interface
@@ -138,7 +138,7 @@ res = pdbitf.get_contact_residues(cutoff=6.0)
 
 ## Computing Structure Similarity
 
-The ``StructureSimilarity`` class allows to compute similarity measures between two protein-protein complexes. Several popular measures used to classify qualities of protein complex structures in the CAPRI (Critical Assessment of PRedicted Interactions) challenges [@capri] have been implemented: interface rmsd, ligand rmsd,  fraction of native contacts and DockQ[@dockq]. The approach implemented to compute the interface rmsd and ligand rmsd is identical to the well-known package ``ProFit`` [@profit]. All the methods required to superimpose structures have been implemented in the ``transform`` class and therefore relies on no external dependencies. The following snippet shows how to compute these measures:
+The ``StructureSimilarity`` class allows a user to compute similarity measures between two protein-protein complexes. Several popular measures used to classify qualities of protein complex structures in the CAPRI (Critical Assessment of PRedicted Interactions) challenges [@capri] have been implemented: interface rmsd, ligand rmsd,  fraction of native contacts and DockQ [@dockq]. The approach implemented to compute the interface rmsd and ligand rmsd is identical to the well-known package ``ProFit`` [@profit]. All the methods required to superimpose structures have been implemented in the ``transform`` class and therefore this relies on no external dependencies. The following snippet shows how to compute these measures:
 
 ```python
 from pdb2sql import StructureSimilarity
@@ -154,7 +154,7 @@ dockQ = sim.compute_DockQScore(fnat, lrmsd, irmsd)
 
 
 # Application
-``psb2sql`` has been used at the Netherlands eScience center for bioinformatics projects. This is, for example, the case of ``iScore`` [@iscore] that uses graph kernels and support vector machines to rank protein-protein interface. We illustrate here the use of the package by computing the interface rmsd and ligand rmsd of a series of structural models using the experimental structure as a reference. This is a common task for protein-protein docking where a large number of docked conformations are generated and have then to be compared to ground truth to identify the best-generated poses. This calculation is usually done using the ProFit software and we, therefore, compare our results with those obtained with ProFit. The code does compute the similarity measure for different decoys is simple:
+``psb2sql`` has been used at the Netherlands eScience center for bioinformatics projects. This is, for example, the case of ``iScore`` [@iscore], which uses graph kernels and support vector machines to rank protein-protein interfaces. We illustrate the use of the package here by computing the interface rmsd and ligand rmsd of a series of structural models using the experimental structure as a reference. This is a common task for protein-protein docking, where a large number of docked conformations are generated and have then to be compared to ground truth to identify the best-generated poses. This calculation is usually done using the ProFit software and we, therefore, compare our results with those obtained with ProFit. The code to compute the similarity measure for different decoys is simple:
 
 ```python
 from pdb2sql import StructureSimilarity
@@ -168,13 +168,13 @@ for d in decoys:g
     irmsd[d] = sim.compute_irmsd_fast(method='svd', izone='1AK4.izone')
 ```
 
-Note that the method will compute the i-zone, i.e. the zone of the proteins that form the interface in a similar way than ProFit. This is done for the first calculations and the i-zone is then reused for the subsequent calculations. The comparison of our interface rmsd values to those given by ProFit is shown in Fig 1.
+Note that the method will compute the i-zone, i.e., the zone of the proteins that form the interface in a similar way to ProFit. This is done for the first calculations and the i-zone is then reused for the subsequent calculations. The comparison of our interface rmsd values to those given by ProFit is shown in Fig 1.
 
 ![Example figure.](sim.png)
 Figure 1. Left - Superimposed model (green) and reference (cyan) structures. Right - comparison of interface rmsd values given by `pdb2sql` and by `ProFit`.
 
 # Acknowledgements
-We acknowledge contributions from Li Xue, Sonja Georgievska and Lars Ridder.
+We acknowledge contributions from Li Xue, Sonja Georgievska, and Lars Ridder.
 
 
 # References