Commit 18ae6e9

small grammar changes
1 parent 7a6d7cd commit 18ae6e9

1 file changed

Lines changed: 13 additions & 13 deletions

paper/paper.md
@@ -20,20 +20,20 @@ bibliography: paper.bibtex
2020

2121
# Summary
2222

23-
The analysis of biomolecular structures is a crucial task for a wide range of applications ranging from drug design to protein engineering. The Protein Data Bank (PDB) file format [@pdb] is the most popular format to describe biomolecular structures such as proteins and nucleic acids. In this text-based format, each line represents a given atom and entails its main properties such as atom name and identifier, residue name and identifier, chain identifier, coordinates, etc. Several solutions have been developed to parse PDB files in dedicated objects that facilitate the analysis and manipulation of biomolecular structures. This is, for example, the case of the ``BioPython`` parser [@biopython,@biopdb] that loads PDB files in a nested dictionary whose structure mimics the hierarchical nature of the biomolecular structure. Selecting a given sub-part of the biomolecule can then be done by going through the dictionary and selecting the required atoms. Other packages, such as ``ProDy`` [@prody], ``BioJava`` [@biojava], ``MMTK`` [@mmtk] and ``MDAnalysis`` [@mdanalysis] to cite a few, also offer solutions to parse PDB files. However, these parsers are embedded in large codebases that are sometimes difficult to integrate with new applications and are often geared toward the analysis of molecular dynamics simulations. Light-weight applications such as ``pdb-tools`` [@pdbtools] lack the capabilities to manipulate coordinates.
23+
The analysis of biomolecular structures is a crucial task for a wide range of applications ranging from drug design to protein engineering. The Protein Data Bank (PDB) file format [@pdb] is the most popular format to describe biomolecular structures such as proteins and nucleic acids. In this text-based format, each line represents a given atom and entails its main properties such as atom name and identifier, residue name and identifier, chain identifier, coordinates, etc. Several solutions have been developed to parse PDB files into dedicated objects that facilitate the analysis and manipulation of biomolecular structures. This is, for example, the case for the ``BioPython`` parser [@biopython,@biopdb] that loads PDB files into a nested dictionary, the structure of which mimics the hierarchical nature of the biomolecular structure. Selecting a given sub-part of the biomolecule can then be done by going through the dictionary and selecting the required atoms. Other packages, such as ``ProDy`` [@prody], ``BioJava`` [@biojava], ``MMTK`` [@mmtk] and ``MDAnalysis`` [@mdanalysis] to cite a few, also offer solutions to parse PDB files. However, these parsers are embedded in large codebases that are sometimes difficult to integrate with new applications and are often geared toward the analysis of molecular dynamics simulations. Lightweight applications such as ``pdb-tools`` [@pdbtools] lack the capabilities to manipulate coordinates.
2424

2525

2626

27-
We present here the Python package ``pdb2sql``, which loads individual PDB files in a relational database. Among different solutions the Structured Query Language (SQL) is a very popular solution to query a given database. However SQL queries are complex and domain scientists such as bioinformaticians are usually not familiar with them. This represents an important barrier for the adoption of SQL technology in bioinformatics. ``pdb2sql`` exposes complex SQL queries through simple Python methods that are intuitive for end users. As such, our package leverages the power of SQL queries and remove the barrier that SQL complexity represents. In addition, several advanced modules have also been built, for example to rotate or translate biomolecular structures, to characterize interface contacts, and to measure structure similarity between two protein complexes. Additional modules can easily be developed following the same scheme. As a consequence, ``pdb2sql`` is a light-weight and versatile PDB tool that is easy to extend and to integrate with new applications.
27+
We present here the Python package ``pdb2sql``, which loads individual PDB files into a relational database. Among the available options, the Structured Query Language (SQL) is a very popular way to query a database. However, SQL queries are complex, and domain scientists such as bioinformaticians are usually not familiar with them. This represents an important barrier to the adoption of SQL technology in bioinformatics. ``pdb2sql`` exposes complex SQL queries through simple Python methods that are intuitive for end users. As such, our package leverages the power of SQL queries and removes the barrier that SQL complexity represents. In addition, several advanced modules have also been built, for example, to rotate or translate biomolecular structures, to characterize interface contacts, and to measure structure similarity between two protein complexes. Additional modules can easily be developed following the same scheme. As a consequence, ``pdb2sql`` is a lightweight and versatile PDB tool that is easy to extend and to integrate with new applications.
2828

2929

3030
# Capabilities of ``pdb2sql``
3131

32-
``pdb2sql`` allows to query, manipulate and process PDB files through a series of dedicated classes. We give an overview of these features and illustrate them with snippets of code. More examples can be found in the documentation (https://pdb2sql.readthedocs.io).
32+
``pdb2sql`` allows a user to query, manipulate, and process PDB files through a series of dedicated classes. We give an overview of these features and illustrate them with snippets of code. More examples can be found in the documentation (https://pdb2sql.readthedocs.io).
3333

3434
## Extracting data from PDB files
3535

36-
``pdb2sql`` allows to simply query the database using the ``get(attr, **kwargs)`` method. The attribute ``attr`` is here a list of or a single column name of the ``SQL`` database, see Table 1 for available attributes. The keyword argument ``kwargs`` can then be used to specify a sub-selection of atoms.
36+
``pdb2sql`` allows a user to simply query the database using the ``get(attr, **kwargs)`` method. The attribute ``attr`` here is a list of or a single column name of the ``SQL`` database; see Table 1 for available attributes. The keyword argument ``kwargs`` can then be used to specify a sub-selection of atoms.
3737

3838
Table 1. Atom attributes and associated definitions in ``pdb2sql``
3939

@@ -55,7 +55,7 @@ Table 1. Atom attributes and associated definitions in ``pdb2sql``
5555
| model | Model serial number |
5656

5757

58-
Every attribute name can be used to select specific atoms and multiple conditions can be easily combined. For example, let's consider the following example :
58+
Every attribute name can be used to select specific atoms and multiple conditions can be easily combined. For example, let's consider the following example:
5959

6060
```python
6161
from pdb2sql import pdb2sql
@@ -66,7 +66,7 @@ atoms = pdb.get('x,y,z',
6666
chainID='A')
6767
```
6868

69-
This snippet extracts the coordinates of the carbon and hydrogen atoms that belong to all the valine and leucine residues of the chain labelled `A` in the PDB file. Atoms can also be excluded from the selection by appending the prefix ``no_`` to the attribute name. This is the case in the following example :
69+
This snippet extracts the coordinates of the carbon and hydrogen atoms that belong to all the valine and leucine residues of the chain labelled `A` in the PDB file. Atoms can also be excluded from the selection by appending the prefix ``no_`` to the attribute name. This is the case in the following example:
7070

7171
```python
7272
from pdb2sql import pdb2sql
@@ -78,7 +78,7 @@ This snippet extracts the atom and residue names of all atoms except those belon
7878

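The selection mechanism described in this section can be pictured as a thin layer that turns keyword arguments into a SQL `WHERE` clause. The following is a minimal sketch of that idea using only Python's standard `sqlite3` module. The reduced table layout, the sample atoms, and the helpers `build_db` and `get_atoms` are hypothetical names chosen for illustration; this is not the actual ``pdb2sql`` implementation.

```python
import sqlite3

# Hypothetical, reduced atom table; a table built from a real PDB file has more columns.
COLUMNS = ("name", "resName", "chainID", "resSeq", "x", "y", "z")
ATOMS = [
    ("CA", "VAL", "A", 1, 1.0, 2.0, 3.0),
    ("CB", "LEU", "A", 2, 4.0, 5.0, 6.0),
    ("CA", "GLY", "B", 1, 7.0, 8.0, 9.0),
]


def build_db(rows):
    """Create an in-memory SQLite database holding one ATOM table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ATOM (name TEXT, resName TEXT, chainID TEXT, "
        "resSeq INTEGER, x REAL, y REAL, z REAL)"
    )
    conn.executemany("INSERT INTO ATOM VALUES (?,?,?,?,?,?,?)", rows)
    return conn


def get_atoms(conn, attr, **kwargs):
    """Return the columns listed in `attr` for atoms matching `kwargs`.

    A keyword prefixed with `no_` excludes the given values instead of
    selecting them (illustrative re-implementation, see lead-in).
    """
    cols = [c.strip() for c in attr.split(",")]
    conditions, params = [], []
    for key, value in kwargs.items():
        negate = key.startswith("no_")
        col = key[3:] if negate else key
        cols_to_check = cols + [col]
        if any(c not in COLUMNS for c in cols_to_check):
            raise ValueError(f"unknown column in {cols_to_check}")
        values = list(value) if isinstance(value, (list, tuple)) else [value]
        placeholders = ",".join("?" * len(values))
        operator = "NOT IN" if negate else "IN"
        conditions.append(f"{col} {operator} ({placeholders})")
        params.extend(values)
    query = f"SELECT {', '.join(cols)} FROM ATOM"
    if conditions:
        query += " WHERE " + " AND ".join(conditions)
    return conn.execute(query, params).fetchall()


conn = build_db(ATOMS)
coords = get_atoms(conn, "x,y,z", resName=["VAL", "LEU"], chainID="A")
names = get_atoms(conn, "name,resName", no_resName=["GLY"])
```

Passing the values as bound parameters (`?`) rather than formatting them into the query string keeps the sketch safe against malformed input; only column names, which are checked against a fixed list, are interpolated.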
7979
## Manipulating PDB files
8080

81-
The data contained in the SQL database can also be modified using the ``update(attr, vals, **kwargs)`` method. The attributes and keyword arguments are identical to those in the ``get`` method. The ``vals`` argument should contain a `numpy` array whose dimension should match the selection criteria. For example :
81+
The data contained in the SQL database can also be modified using the ``update(attr, vals, **kwargs)`` method. The attributes and keyword arguments are identical to those in the ``get`` method. The ``vals`` argument should contain a `numpy` array whose dimension should match the selection criteria. For example:
8282

8383
```python
8484
import numpy as np
@@ -103,7 +103,7 @@ trans_vec = np.array([0,5,0])
103103
transform.translation(pdb, trans_vec, resSeq=1, chainID='A')
104104
```
105105

106-
One can also rotate a given selection around a given axis with the `rotate_axis` method :
106+
One can also rotate a given selection around a given axis with the `rot_axis` method:
107107

108108
```python
109109
angle = np.pi
@@ -113,7 +113,7 @@ transform.rot_axis(pdb, axis, angle, resSeq=1, chainID='A')
113113

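For reference, a rotation of a set of coordinates about an axis can be written with Rodrigues' rotation formula. The sketch below uses plain `numpy`. The function name `rotate_about_axis` and the choice of the selection centroid as the default rotation center are assumptions made for illustration, not the package's documented behaviour.

```python
import numpy as np


def rotate_about_axis(xyz, axis, angle, center=None):
    """Rotate an (N, 3) array of coordinates by `angle` radians about `axis`.

    The axis passes through `center`; if no center is given, the centroid
    of the coordinates is used (an assumption of this sketch).
    """
    xyz = np.asarray(xyz, dtype=float)
    k = np.asarray(axis, dtype=float)
    k = k / np.linalg.norm(k)  # unit vector along the rotation axis
    if center is None:
        center = xyz.mean(axis=0)
    # Cross-product matrix of k, so that K @ v == np.cross(k, v)
    K = np.array([
        [0.0, -k[2], k[1]],
        [k[2], 0.0, -k[0]],
        [-k[1], k[0], 0.0],
    ])
    # Rodrigues' formula: R = I + sin(a) K + (1 - cos(a)) K^2
    R = np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)
    return (xyz - center) @ R.T + center


# Rotate a single point a quarter turn about the z axis through the origin.
rotated = rotate_about_axis([[1.0, 0.0, 0.0]], axis=[0, 0, 1],
                            angle=np.pi / 2, center=np.zeros(3))
```

Because the rotation matrix is orthogonal, distances between the rotated atoms are unchanged, which is a convenient sanity check when applying such a transformation to a selection.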
114114
## Identifying interface
115115

116-
The ``interface`` class is derived from the ``pdb2sql`` class and offers functionalities to identify contact atoms or residues between two different chains with a given contact distance. It is useful for extracting and analysing the interface of e.g. protein-protein complexes. The following example snippet returns all the atoms and all the residues of the interface of '1AK4.pdb' defined by a contact distance of 6 Å.
116+
The ``interface`` class is derived from the ``pdb2sql`` class and offers functionality to identify contact atoms or residues between two different chains with a given contact distance. It is useful for extracting and analysing the interface of, e.g., protein-protein complexes. The following example snippet returns all the atoms and all the residues of the interface of '1AK4.pdb' defined by a contact distance of 6 Å.
117117

118118
```python
119119
from pdb2sql import interface
@@ -138,7 +138,7 @@ res = pdbitf.get_contact_residues(cutoff=6.0)
138138

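The notion of contact atoms used in this section can be illustrated with a brute-force distance test between the coordinates of two chains. The following is a minimal `numpy` sketch under stated assumptions: the function name `contact_indices` is hypothetical, the comparison uses "less than or equal to" the cutoff, and nothing here reproduces how the ``interface`` class performs the search internally.

```python
import numpy as np


def contact_indices(xyz_a, xyz_b, cutoff=6.0):
    """Return indices of atoms in each chain lying within `cutoff` of the other chain.

    xyz_a and xyz_b are (N, 3) and (M, 3) coordinate arrays in Angstrom.
    """
    xyz_a = np.asarray(xyz_a, dtype=float)
    xyz_b = np.asarray(xyz_b, dtype=float)
    # Pairwise distance matrix of shape (N, M) via broadcasting
    diff = xyz_a[:, None, :] - xyz_b[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    close = dist <= cutoff
    idx_a = np.flatnonzero(close.any(axis=1))  # atoms of A near any atom of B
    idx_b = np.flatnonzero(close.any(axis=0))  # atoms of B near any atom of A
    return idx_a, idx_b


chain_a = [[0.0, 0.0, 0.0], [20.0, 0.0, 0.0]]
chain_b = [[3.0, 0.0, 0.0], [50.0, 0.0, 0.0]]
idx_a, idx_b = contact_indices(chain_a, chain_b, cutoff=6.0)
```

The full distance matrix costs memory proportional to the product of the two chain sizes; for large complexes a spatial index such as a k-d tree would be the usual alternative.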
139139
## Computing Structure Similarity
140140

141-
The ``StructureSimilarity`` class allows to compute similarity measures between two protein-protein complexes. Several popular measures used to classify qualities of protein complex structures in the CAPRI (Critical Assessment of PRedicted Interactions) challenges [@capri] have been implemented: interface rmsd, ligand rmsd, fraction of native contacts and DockQ[@dockq]. The approach implemented to compute the interface rmsd and ligand rmsd is identical to the well-known package ``ProFit`` [@profit]. All the methods required to superimpose structures have been implemented in the ``transform`` class and therefore relies on no external dependencies. The following snippet shows how to compute these measures:
141+
The ``StructureSimilarity`` class allows a user to compute similarity measures between two protein-protein complexes. Several popular measures used to classify the quality of protein complex structures in the CAPRI (Critical Assessment of PRedicted Interactions) challenges [@capri] have been implemented: interface rmsd, ligand rmsd, fraction of native contacts and DockQ [@dockq]. The approach implemented to compute the interface rmsd and ligand rmsd is identical to that of the well-known package ``ProFit`` [@profit]. All the methods required to superimpose structures have been implemented in the ``transform`` class, so the calculation relies on no external dependencies. The following snippet shows how to compute these measures:
142142

143143
```python
144144
from pdb2sql import StructureSimilarity
@@ -154,7 +154,7 @@ dockQ = sim.compute_DockQScore(fnat, lrmsd, irmsd)
154154

155155

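For readers unfamiliar with DockQ, the score combines the three other measures into a single value between 0 and 1, following the definition given in the DockQ publication [@dockq]: the fraction of native contacts is averaged with the two rmsd values after each has been rescaled as 1 / (1 + (rmsd / d)^2), with d = 8.5 Å for the ligand rmsd and d = 1.5 Å for the interface rmsd. The sketch below is a standalone illustration; the function names `scaled_rmsd` and `dockq_score` are hypothetical and the code is not taken from ``pdb2sql``.

```python
def scaled_rmsd(rmsd, d):
    """Map an rmsd in Angstrom onto (0, 1]; 1 means a perfect match."""
    return 1.0 / (1.0 + (rmsd / d) ** 2)


def dockq_score(fnat, lrmsd, irmsd, d_lrmsd=8.5, d_irmsd=1.5):
    """Average of fnat and the rescaled ligand and interface rmsd values."""
    return (fnat + scaled_rmsd(lrmsd, d_lrmsd) + scaled_rmsd(irmsd, d_irmsd)) / 3.0


# A model with half of the native contacts and rmsd values equal to the
# scaling constants: each of the three terms is 0.5.
example = dockq_score(fnat=0.5, lrmsd=8.5, irmsd=1.5)
```

The argument order `(fnat, lrmsd, irmsd)` mirrors the call shown in the snippet above.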
156156
# Application
157-
``psb2sql`` has been used at the Netherlands eScience center for bioinformatics projects. This is, for example, the case of ``iScore`` [@iscore] that uses graph kernels and support vector machines to rank protein-protein interface. We illustrate here the use of the package by computing the interface rmsd and ligand rmsd of a series of structural models using the experimental structure as a reference. This is a common task for protein-protein docking where a large number of docked conformations are generated and have then to be compared to ground truth to identify the best-generated poses. This calculation is usually done using the ProFit software and we, therefore, compare our results with those obtained with ProFit. The code does compute the similarity measure for different decoys is simple:
157+
``pdb2sql`` has been used at the Netherlands eScience Center for bioinformatics projects. This is, for example, the case of ``iScore`` [@iscore], which uses graph kernels and support vector machines to rank protein-protein interfaces. We illustrate the use of the package here by computing the interface rmsd and ligand rmsd of a series of structural models using the experimental structure as a reference. This is a common task in protein-protein docking, where a large number of docked conformations are generated and must then be compared to the ground truth to identify the best generated poses. This calculation is usually done using the ProFit software, and we therefore compare our results with those obtained with ProFit. The code to compute the similarity measure for different decoys is simple:
158158

159159
```python
160160
from pdb2sql import StructureSimilarity
@@ -168,13 +168,13 @@ for d in decoys:
168168
irmsd[d] = sim.compute_irmsd_fast(method='svd', izone='1AK4.izone')
169169
```
170170

171-
Note that the method will compute the i-zone, i.e. the zone of the proteins that form the interface in a similar way than ProFit. This is done for the first calculations and the i-zone is then reused for the subsequent calculations. The comparison of our interface rmsd values to those given by ProFit is shown in Fig 1.
171+
Note that the method will compute the i-zone, i.e., the zone of the proteins that form the interface in a similar way to ProFit. This is done for the first calculations and the i-zone is then reused for the subsequent calculations. The comparison of our interface rmsd values to those given by ProFit is shown in Fig 1.
172172

173173
![Example figure.](sim.png)
174174
Figure 1. Left - Superimposed model (green) and reference (cyan) structures. Right - comparison of interface rmsd values given by `pdb2sql` and by `ProFit`.
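The `method='svd'` argument in the snippet above suggests a superposition based on singular value decomposition. The standard algorithm of that kind is the Kabsch algorithm, sketched below with `numpy` as a generic illustration of how an rmsd after optimal superposition can be obtained. The function name `kabsch_rmsd` is hypothetical, and the sketch does not handle the i-zone selection or claim to match the package's or ProFit's code.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """Rmsd between two (N, 3) coordinate sets after optimal superposition.

    Row i of P must correspond to row i of Q.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove the translation by centering both sets on their centroids
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # Covariance matrix and its singular value decomposition
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so the result is a proper rotation
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    P_rot = Pc @ R.T  # rotate P onto Q
    return float(np.sqrt(np.mean(np.sum((P_rot - Qc) ** 2, axis=1))))


# A rigidly rotated and translated copy should superimpose exactly.
P = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([1.0, 2.0, 3.0])
rmsd_rigid = kabsch_rmsd(P, Q)
```

The determinant check matters in practice: without it, the optimal orthogonal matrix can be a reflection, which would superimpose a structure onto its mirror image.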
175175

176176
# Acknowledgements
177-
We acknowledge contributions from Li Xue, Sonja Georgievska and Lars Ridder.
177+
We acknowledge contributions from Li Xue, Sonja Georgievska, and Lars Ridder.
178178

179179

180180
# References
