1 | 1 | # PDB2SQL |
2 | 2 |
3 | | -pdb2sql lets you use SQL queries to handle PDB files. |
4 | | -The project grew out of the development of DeepRank and is still under active development. |
5 | | - |
6 | | -At the moment two strategies are being developed: one uses SQLite3 and the other SQLAlchemy. |
7 | | -SQLAlchemy allows an object-oriented approach but seems a bit slower. |
8 | | - |
9 | 3 | [](https://doi.org/10.5281/zenodo.3232888) |
10 | | - |
11 | | - |
12 | 4 | [](https://travis-ci.org/DeepRank/pdb2sql) |
13 | 5 | [](https://coveralls.io/github/DeepRank/pdb2sql) |
14 | 6 | [](https://www.codacy.com/manual/CunliangGeng/pdb2sql?utm_source=github.com&utm_medium=referral&utm_content=DeepRank/pdb2sql&utm_campaign=Badge_Grade) |
15 | | -[](https://pdb2sql.readthedocs.io/) |
16 | | - |
17 | | - |
18 | | -## Installation |
19 | | -<!-- |
20 | | - 1. Clone the repository : `git clone https://github.com/DeepRank/pdb2sql` |
21 | | -
22 | | - 2. Go in the repo and type : `pip install -e ./` |
23 | | -
24 | | - 3. Test by going in the test folder and type : `pytest` --> |
25 | | - |
26 | | -`pip install pdb2sql` |
27 | | - |
28 | | -## pdb2sql |
29 | | - |
30 | | -The following script loads the PDB file '1AK4.pdb' (which must be in the same folder as the script) into a SQLite3 database in about 0.02 seconds. You can query the database using the ```pdb2sql.get(attribute,**kwargs)``` method. |
31 | | - |
32 | | -```python |
33 | | -from pdb2sql.pdb2sqlcore import pdb2sql |
34 | | - |
35 | | -#create the database |
36 | | -db = pdb2sql('1AK4.pdb') |
37 | | -print('SQL %f' %(time()-t0)) |
38 | | - |
39 | | -# get the xyz of all the atoms |
40 | | -xyz = db.get('x,y,z',model=0) |
41 | | - |
42 | | -# get the xyz of all the CA, C, O, N atoms of all VAL and LEU residues of chain A |
43 | | -xyz = db.get('x,y,z',chainID='A',resName=['VAL','LEU'],name=['CA','C','O','N']) |
44 | | - |
45 | | -# move residue 1 of chain A so that its centroid is at the origin |
46 | | -xyz = db.get('x,y,z',chainID='A',resSeq=1) |
47 | | -xyz = np.array(xyz) |
48 | | -xyz -= np.mean(xyz, axis=0) |
49 | | -db.update('x,y,z',xyz,chainID='A',resSeq=1) |
50 | | - |
51 | | -``` |
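The idea of loading a PDB file into a SQLite3 database can be illustrated with the standard library alone. The sketch below is not the pdb2sql implementation: `format_atom` and `parse_atom` are hypothetical helpers, the three atoms are made-up sample data, and the table holds only a subset of the ATOM fields. It slices the fixed columns of each ATOM record, stores them in an in-memory table, and runs a plain SQL selection.

```python
import sqlite3


def format_atom(serial, name, res_name, chain, res_seq, x, y, z):
    """Build a minimal fixed-column ATOM record (hypothetical helper, sample data only)."""
    return (f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain}"
            f"{res_seq:>4}    {x:8.3f}{y:8.3f}{z:8.3f}")


def parse_atom(line):
    """Slice the fixed columns of an ATOM record into a tuple (hypothetical helper)."""
    return (
        int(line[6:11]),       # atom serial number, columns 7-11
        line[12:16].strip(),   # atom name, columns 13-16
        line[17:20].strip(),   # residue name, columns 18-20
        line[21],              # chain identifier, column 22
        int(line[22:26]),      # residue sequence number, columns 23-26
        float(line[30:38]),    # x coordinate, columns 31-38
        float(line[38:46]),    # y coordinate, columns 39-46
        float(line[46:54]),    # z coordinate, columns 47-54
    )


lines = [
    format_atom(1, "N", "VAL", "A", 1, 10.0, 20.0, 30.0),
    format_atom(2, "CA", "VAL", "A", 1, 11.0, 21.0, 31.0),
    format_atom(3, "CA", "LEU", "B", 2, 12.0, 22.0, 32.0),
]

# Store the parsed records in an in-memory SQLite3 table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ATOM (serial INTEGER, name TEXT, resName TEXT, "
    "chainID TEXT, resSeq INTEGER, x REAL, y REAL, z REAL)"
)
conn.executemany(
    "INSERT INTO ATOM VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    [parse_atom(line) for line in lines],
)

# Plain SQL selection: coordinates of the CA atoms of chain A.
ca_xyz = conn.execute(
    "SELECT x, y, z FROM ATOM WHERE chainID = ? AND name = ?", ("A", "CA")
).fetchall()
print(ca_xyz)
```

Once the atoms are rows in a table, every selection becomes a `WHERE` clause, which is what makes a keyword-based `get` method convenient to build on top.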
52 | | - |
53 | | - |
54 | | - |
55 | | -#### SQL Queries |
56 | | - |
57 | | -SQL queries are quite versatile and can be used to return any attribute of the atoms with rather complex selections. As an example: |
58 | | - |
59 | | -```python |
60 | | -xyz = db.get('x,y,z',chainID='A',resName=['VAL','LEU'],name=['CA','C','O','N']) |
61 | | -``` |
62 | | - |
63 | | -returns the positions of the CA, C, N and O atoms of all the 'VAL' and 'LEU' residues of chain A. Any other attribute (chainID, resName, name, ...) can be returned by passing it in the first argument. For example: |
64 | | - |
65 | | -```python |
66 | | -data = db.get('name,resSeq,resName',chainID='A') |
67 | | -``` |
68 | | -returns the name, residue number and residue name of all the atoms in chain A. |
69 | | - |
70 | | -#### Negative conditions |
71 | | - |
72 | | -Negative conditions can also be used to exclude some specific atoms from the selection. For example: |
73 | | - |
74 | | -```python |
75 | | -data = db.get('name,resSeq,resName',chainID='A',no_name=['H','N']) |
76 | | -``` |
77 | | - |
78 | | -returns the name, residue number and residue name of all the atoms in chain A **except the atoms named H and N**. All conditions starting with ```no_``` are treated as negations. Therefore: |
79 | | - |
80 | | -```python |
81 | | -data = db.get('name,resSeq,resName',chainID='A',no_resName=['VAL','LEU']) |
82 | | -``` |
83 | | - |
84 | | -will exclude the LEU and VAL residues from the selection. |
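One way to picture how keyword selections and the `no_` prefix could map onto SQL is sketched below. This is an assumption-laden illustration, not the pdb2sql code: `build_where` and `ALLOWED_COLUMNS` are hypothetical names, and the table contains only three columns of made-up data. Positive conditions become `IN (...)` and negated ones become `NOT IN (...)`, with values passed as bound parameters.

```python
import sqlite3

# Hypothetical whitelist: column names are interpolated into the SQL text,
# so they must be checked against known columns first.
ALLOWED_COLUMNS = {"name", "resName", "chainID"}


def build_where(**kwargs):
    """Turn keyword selections into a parameterized WHERE clause (hypothetical helper).

    A key starting with 'no_' negates the condition on that column.
    """
    clauses, params = [], []
    for key, value in kwargs.items():
        negate = key.startswith("no_")
        column = key[3:] if negate else key
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unknown column: {column}")
        values = value if isinstance(value, (list, tuple)) else [value]
        placeholders = ", ".join("?" for _ in values)
        operator = "NOT IN" if negate else "IN"
        clauses.append(f"{column} {operator} ({placeholders})")
        params.extend(values)
    return " AND ".join(clauses), params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ATOM (name TEXT, resName TEXT, chainID TEXT)")
conn.executemany(
    "INSERT INTO ATOM VALUES (?, ?, ?)",
    [("CA", "VAL", "A"), ("H", "VAL", "A"), ("CA", "LEU", "A"), ("CA", "GLY", "B")],
)

# Atoms of chain A whose name is neither H nor N.
where, params = build_where(chainID="A", no_name=["H", "N"])
rows = conn.execute(
    f"SELECT name, resName FROM ATOM WHERE {where} ORDER BY rowid", params
).fetchall()
print(where)
print(rows)
```

Binding the values with `?` placeholders keeps the selection values out of the SQL string itself; only the whitelisted column names are interpolated.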
85 | | - |
86 | | -#### Modify the database |
87 | | - |
88 | | -The values in the database can also be updated with the ```pdb2sql.update(attribute,values,**kwargs)``` method. For example: |
89 | | - |
90 | | -```python |
91 | | -xyz = db.get('x,y,z',chainID='A',resSeq=1) |
92 | | -xyz = np.array(xyz) |
93 | | -xyz -= np.mean(xyz, axis=0) |
94 | | -db.update('x,y,z',xyz,chainID='A',resSeq=1) |
95 | | -``` |
96 | | - |
97 | | -This translates residue 1 of chain A so that its centroid sits at the origin of the coordinate system. Note that a dedicated module called transform.py can handle translations, rotations, etc. of xyz coordinates. |
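Centering a set of atoms means subtracting the mean of each coordinate axis separately. With NumPy that corresponds to `xyz - xyz.mean(axis=0)`; calling `np.mean(xyz)` without `axis` averages every value into a single scalar and would shift x, y and z by the same amount. The dependency-free sketch below shows the per-axis version; `center` is a hypothetical helper and the coordinates are made-up values.

```python
def center(coords):
    """Translate a list of [x, y, z] rows so that their centroid is the origin."""
    n = len(coords)
    # One mean per axis: x, y and z are averaged independently.
    centroid = [sum(row[i] for row in coords) / n for i in range(3)]
    return [[row[i] - centroid[i] for i in range(3)] for row in coords]


xyz = [[1.0, 2.0, 3.0], [3.0, 4.0, 7.0]]
centered = center(xyz)
print(centered)
```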
98 | | - |
99 | | -## pdb2sqlAlchemy |
100 | | - |
101 | | -SQLAlchemy combines SQL queries and object-oriented programming. Therefore pdb2sqlAlchemy works in the same way as pdb2sqlcore but returns arrays of objects instead of nested lists. It is, however, a bit slower. |
102 | | - |
103 | | -```python |
104 | | -from pdb2sql.pdb2sqlAlchemy import pdb2sql_alchemy |
105 | | - |
106 | | -#create the database |
107 | | -db = pdb2sql_alchemy('1AK4.pdb') |
108 | | - |
109 | | -# extract the xyz position of all atoms of model 0 (the chain and residue filters are commented out) |
110 | | -xyz = db.get('x,y,z',model=0) #chainID='A',resName=['VAL','LEU'],no_name=['H']) |
111 | | - |
112 | | -# put the data back |
113 | | -db.update('x,y,z',xyz) |
114 | | - |
115 | | -# extract atoms |
116 | | -atoms = db.get(chainID='A',resName=['VAL','LEU'],no_name=['H']) |
117 | | - |
118 | | -for at in atoms: |
119 | | - print(at.name,at.x,at.y,at.z) |
120 | | -``` |
121 | | - |
122 | | -Here as well you can read values from the database and write values back to it with the .get() and .update() methods. The syntax is identical to that of pdb2sqlcore: |
123 | | - |
124 | | - |
125 | | -```python |
126 | | -# extract the xyz position of all VAL and LEU residues of chain A but not the H atoms |
127 | | -xyz = db.get('x,y,z',chainID='A',resName=['VAL','LEU'],no_name=['H']) |
128 | | - |
129 | | -# put the data back |
130 | | -db.update('x,y,z',xyz,chainID='A',resName=['VAL','LEU'],no_name=['H']) |
131 | | -``` |
132 | | - |
133 | | -#### Return ATOM objects |
134 | | - |
135 | | -The main difference is the possibility to return ATOM objects. This is achieved when no attributes are specified in the .get() call: |
136 | | - |
137 | | -```python |
138 | | -atoms = db.get(chainID='A',resName=['VAL','LEU'],no_name=['H']) |
139 | | -``` |
140 | | - |
141 | | -This returns a list of ATOM objects. The ATOM class is also defined in pdb2sqlAlchemy.py. We can then extract information about these atoms by accessing their attributes: |
142 | | - |
143 | | -```python |
144 | | -for at in atoms: |
145 | | - print(at.name,at.x,at.y,at.z) |
146 | | -``` |
147 | | - |
148 | | -## Interface |
149 | | - |
150 | | -The module interface.py contains a class that subclasses pdb2sqlcore (tests with pdb2sqlAlchemy are not done yet). It analyzes the properties of the interface between two chains contained in the PDB file and makes it easy to extract the contact atoms and contact residues of the conformation. |
151 | | - |
152 | | -```python |
153 | | -from pdb2sql.interface import interface |
154 | | - |
155 | | -db = interface('1AK4.pdb') |
156 | | -contact_atoms = db.get_contact_atoms() |
157 | | -contact_residues = db.get_contact_residues() |
158 | | -``` |
159 | | - |
160 | | -Here the get_contact_atoms() method returns the rowID of the contact atoms. A few options are available to define the interface. |
161 | | - |
162 | | -## Structure Similarity |
163 | | - |
164 | | -The StructureSimilarity module computes the `irmsd`, `lrmsd`, `Fnat` and `dockQ` scores of a given conformation with respect to its native structure. The native can be any other conformation as long as the sequences are aligned. |
165 | | - |
166 | | -```python |
167 | | -from pdb2sql.StructureSimilarity import StructureSimilarity |
168 | | - |
169 | | -# create the class instance |
170 | | -sim = StructureSimilarity('1AK4_300w.pdb','1AK4.pdb') |
171 | | - |
172 | | -# compute the irmsd with the two different methods |
173 | | -irmsd_fast = sim.compute_irmsd_fast(method='svd',izone='1AK4.izone') |
174 | | -irmsd = sim.compute_irmsd_pdb2sql(method='svd',izone='1AK4.izone') |
| 7 | +[](https://pdb2sql.readthedocs.io/en/latest/?badge=latest) |
175 | 8 |
176 | | -# compute the lrmsd with the two different methods |
177 | | -lrmsd_fast = sim.compute_lrmsd_fast(method='svd',lzone='1AK4.lzone',check=True) |
178 | | -lrmsd = sim.compute_lrmsd_pdb2sql(exportpath=None,method='svd') |
179 | 9 |
180 | | -# compute the Fnat with the two different methods |
181 | | -Fnat_fast = sim.compute_Fnat_fast(ref_pairs='1AK4.ref_pairs') |
182 | | -Fnat = sim.compute_Fnat_pdb2sql() |
| 10 | +PDB2SQL is a Python package that lets you use SQL queries to handle PDB files. |
| 11 | +This project grew out of the development of DeepRank. |
183 | 12 |
184 | | -# compute the DOCKQ |
185 | | -dockQ = sim.compute_DockQScore(Fnat_fast,lrmsd_fast,irmsd_fast) |
186 | | -``` |
| 13 | +- Source code: https://github.com/DeepRank/pdb2sql |
| 14 | +- Documentation: https://pdb2sql.readthedocs.io |
187 | 15 |
188 | | -As you can see, two methods are available for the calculation of each quantity. We recommend using the **fast** methods, which are faster and better tested. |
| 16 | +It provides: |
| 17 | +- a powerful `pdb2sql` object to manipulate PDB data in an SQL database |
| 18 | +- structure transformation functions (rotations, translations...) |
| 19 | +- useful capabilities to |
| 20 | + - calculate structure interface (contact atoms and residues) |
| 21 | + - calculate structure similarity (iRMSD, lRMSD, FNAT, DockQ...) |
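As an illustration of how the similarity measures listed above relate to each other, the sketch below combines Fnat, lRMSD and iRMSD into a DockQ score using the commonly published definition, with scaling constants of 8.5 Å for lRMSD and 1.5 Å for iRMSD. It is a stand-alone sketch: `dockq_score` is a hypothetical function, and whether pdb2sql uses exactly these constants and argument order is an assumption to check against its documentation.

```python
def dockq_score(fnat, lrmsd, irmsd, d_lrmsd=8.5, d_irmsd=1.5):
    """Combine Fnat with scaled lRMSD and iRMSD (in angstroms) into a score between 0 and 1."""
    # Each RMSD is mapped to (0, 1]: 1 for a perfect match, 0.5 at the scaling distance.
    scaled_lrmsd = 1.0 / (1.0 + (lrmsd / d_lrmsd) ** 2)
    scaled_irmsd = 1.0 / (1.0 + (irmsd / d_irmsd) ** 2)
    return (fnat + scaled_lrmsd + scaled_irmsd) / 3.0


perfect = dockq_score(fnat=1.0, lrmsd=0.0, irmsd=0.0)
halfway = dockq_score(fnat=0.5, lrmsd=8.5, irmsd=1.5)
print(perfect, halfway)
```

Averaging the three terms keeps the score bounded, so models of different complexes can be compared on the same scale.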