This repository was archived by the owner on Mar 4, 2025. It is now read-only.

Commit 61f2f8e

2 parents f5e54b8 + c0f39f9 commit 61f2f8e

6 files changed

+163
-56
lines changed

README.md

Lines changed: 163 additions & 56 deletions
Original file line number | Diff line number | Diff line change
@@ -6,7 +6,7 @@ This is the implementation of the GLL-based context-free path querying (CFPQ) al
66

77
## Performance
88

9-
The proposed solution has been evaluated on several real-world graphs for both the all pairs and the multiple sources scenarios. The evaluation shows that the proposed solution is more than 25 times faster than the previous solution for Neo4j and is comparable, in some cases, with the linear algebra based solution for RedisGraph.
9+
The proposed solution has been evaluated on several real-world graphs for both the all pairs and the multiple sources scenarios. The evaluation shows that the proposed solution is more than 45 times faster than the previous solution for Neo4j and, in some scenarios, is comparable with the linear-algebra-based solution.
1010

1111
**Machine configuration**: PC with Ubuntu 20.04, Intel Core i7-6700 3.40GHz CPU, DDR4 64 GB RAM.
1212

@@ -30,6 +30,8 @@ A detailed description of the graphs is listed bellow.
3030

3131
| Graph name | \|*V*\| | \|*E*\| | #subClassOf | #type | #broaderTransitive |
3232
|:-------------|:----------:|:----------:|:-----------:|:---------:|:------------------:|
33+
| Core | 1 323 | 2 752 | 178 | 0 | 0 |
34+
| Pathways | 6 238 | 12 363 | 3 117 | 3 118 | 0 |
3335
| Go hierarchy | 45 007 | 490 109 | 490 109 | 0 | 0 |
3436
| Enzyme | 48 815 | 86 543 | 8 163 | 14 989 | 8 156 |
3537
| Eclass_514en | 239 111 | 360 248 | 90 962 | 72 517 | 0 |
@@ -41,9 +43,21 @@ A detailed description of the graphs is listed bellow.
4143

4244
| Graph name | \|*V*\| | \|*E*\| | #a | #d |
4345
|:-------------|:----------:|:----------:|:---------:|:---------:|
44-
| Init | 2 446 224 | 2 112 809 | 481 994 | 1 630 815 |
45-
| Drivers | 4 273 803 | 3 707 769 | 858 568 | 2 849 201 |
46-
| Kernel | 11 254 434 | 9 484 213 | 1 981 258 | 7 502 955 |
46+
| Apache | 1 721 418 | 1 510 411 | 362 799 | 1 147 612 |
47+
| Block | 3 423 234 | 2 951 393 | 669 238 | 2 282 155 |
48+
| Fs | 4 177 416 | 3 609 373 | 824 430 | 2 784 943 |
49+
| Ipc | 3 401 022 | 2 931 498 | 664 151 | 2 267 347 |
50+
| Lib | 3 401 355 | 2 931 880 | 664 311 | 2 267 569 |
51+
| Mm | 2 538 243 | 2 191 079 | 498 918 | 1 692 161 |
52+
| Net | 4 039 470 | 3 500 141 | 807 162 | 2 692 979 |
53+
| Postgre | 5 203 419 | 4 678 543 | 1 209 597 | 3 468 946 |
54+
| Security | 3 479 982 | 3 003 326 | 683 339 | 2 319 987 |
55+
| Sound | 3 528 861 | 3 049 732 | 697 159 | 2 352 573 |
56+
| Init | 2 446 224 | 2 112 809 | 481 994 | 1 630 815 |
57+
| Arch | 3 448 422 | 2 970 242 | 671 295 | 2 298 947 |
58+
| Crypto | 3 464 970 | 2 988 387 | 678 408 | 2 309 979 |
59+
| Drivers | 4 273 803 | 3 707 769 | 858 568 | 2 849 201 |
60+
| Kernel | 11 254 434 | 9 484 213 | 1 981 258 | 7 502 955 |
4761

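For the rows copied below from the static code analysis table, the two label counts (#a and #d) add up to |E|. The following is a minimal Python sketch of that consistency check; the helper names `parse_count` and `label_counts_match` are hypothetical and not part of this repository.

```python
def parse_count(cell: str) -> int:
    """Parse a table cell such as '1 510 411' (space-grouped digits) into an int."""
    return int(cell.replace(" ", ""))


# (graph, |V|, |E|, #a, #d) rows copied from the table above.
ROWS = [
    ("Apache", "1 721 418", "1 510 411", "362 799", "1 147 612"),
    ("Init", "2 446 224", "2 112 809", "481 994", "1 630 815"),
    ("Drivers", "4 273 803", "3 707 769", "858 568", "2 849 201"),
    ("Kernel", "11 254 434", "9 484 213", "1 981 258", "7 502 955"),
]


def label_counts_match(row) -> bool:
    """Return True if #a + #d equals |E| for one table row."""
    _, _, edges, a_count, d_count = row
    return parse_count(a_count) + parse_count(d_count) == parse_count(edges)


for row in ROWS:
    print(row[0], label_counts_match(row))
```

A check like this can catch digit-grouping typos in a row, since a misplaced space does not change the parsed value but a wrong digit does.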
4862
### Grammars
4963

@@ -81,8 +95,10 @@ Grammar used for **static code analysis** graphs:
8195
```
8296

8397
### Results
84-
85-
The results of the **all pairs reachability** queries evaluation are presented in the table below.
98+
99+
The results of the **all pairs reachability** queries evaluation on graphs related to **RDF analysis** are listed below.
100+
101+
The sign ’–’ in cells means that the respective query is not applicable to the graph, so time is not measured.
86102

87103
<table>
88104
<thead>
@@ -91,7 +107,6 @@ The results of the **all pairs reachability** queries evaluation are presented i
91107
<th colspan="2" align="center">G<sub>1</sub></th>
92108
<th colspan="2" align="center">G<sub>2</sub></th>
93109
<th colspan="2" align="center">Geo</th>
94-
<th colspan="2" align="center">PointsTo</th>
95110
</tr>
96111
<tr>
97112
<td align="center">time (sec)</td>
@@ -100,119 +115,210 @@ The results of the **all pairs reachability** queries evaluation are presented i
100115
<td align="center">#answer</td>
101116
<td align="center">time (sec)</td>
102117
<td align="center">#answer</td>
103-
<td align="center">time (sec)</td>
104-
<td align="center">#answer</td>
105118
</tr>
106119
</thead>
107120
<tbody>
121+
<tr>
122+
<td align="left">Core</td>
123+
<td align="center">0,02</td>
124+
<td>204</td>
125+
<td align="center">0,01</td>
126+
<td align="center">214</td>
127+
<td align="center">–</td>
128+
<td align="center">–</td>
129+
</tr><tr>
130+
<td align="left">Pathways</td>
131+
<td align="center">0,07</td>
132+
<td>884</td>
133+
<td align="center">0,04</td>
134+
<td align="center">3117</td>
135+
<td align="center">–</td>
136+
<td align="center">–</td>
137+
</tr>
108138
<tr>
109139
<td align="left">Go hierarchy</td>
110-
<td align="center">564,72</td>
140+
<td align="center">3,68</td>
111141
<td>588 976</td>
112-
<td align="center">2813,50</td>
142+
<td align="center">5,42</td>
113143
<td align="center">738 937</td>
114144
<td align="center">–</td>
115145
<td align="center">–</td>
116-
<td align="center">–</td>
117-
<td align="center">–</td>
118146
</tr>
119147
<tr>
120148
<td align="left">Enzyme</td>
121-
<td align="center">0,19</td>
149+
<td align="center">0,22</td>
122150
<td>396</td>
123151
<td align="center">0,17</td>
124152
<td align="center">8163</td>
125-
<td align="center">8,54</td>
153+
<td align="center">5,7</td>
126154
<td align="center">14 267 542</td>
127-
<td align="center">–</td>
128-
<td align="center">–</td>
129155
</tr>
130156
<tr>
131-
<td align="left">Eclass_514en</td>
132-
<td align="center">295,06</td>
157+
<td align="left">Eclass</td>
158+
<td align="center">1,5</td>
133159
<td>90 994</td>
134-
<td align="center">279,80</td>
160+
<td align="center">0,97</td>
135161
<td align="center">96 163</td>
136162
<td align="center">–</td>
137163
<td align="center">–</td>
138-
<td align="center">–</td>
139-
<td align="center">–</td>
140164
</tr>
141165
<tr>
142166
<td align="left">Geospecies</td>
143-
<td align="center">2,64</td>
167+
<td align="center">2,89</td>
144168
<td>85</td>
145-
<td align="center">2,00</td>
169+
<td align="center">2,65</td>
146170
<td align="center">0</td>
147-
<td align="center">256,86</td>
171+
<td align="center">145,8</td>
148172
<td align="center">226 669 749</td>
149-
<td align="center">–</td>
150-
<td align="center">–</td>
151173
</tr>
152174
<tr>
153175
<td align="left">Go</td>
154-
<td align="center">11,18</td>
176+
<td align="center">5,56</td>
155177
<td>640 316</td>
156-
<td align="center">10,00</td>
178+
<td align="center">4,24</td>
157179
<td align="center">659 501</td>
158180
<td align="center">–</td>
159181
<td align="center">–</td>
160-
<td align="center">–</td>
161-
<td align="center">–</td>
162182
</tr>
163183
<tr>
164184
<td align="left">Taxonomy</td>
165-
<td align="center">43,72</td>
185+
<td align="center">45,47</td>
166186
<td>151 706</td>
167-
<td align="center">29,58</td>
187+
<td align="center">36,06</td>
168188
<td align="center">2 112 637</td>
169189
<td align="center">–</td>
170190
<td align="center">–</td>
191+
</tr>
192+
</tbody>
193+
</table>
194+
195+
The evaluation results for **single source** CFPQ on graphs related to **RDF analysis** with the **G<sub>1</sub>**, **G<sub>2</sub>**, and **Geo** grammars, in the **reachability** and **all paths** scenarios:
196+
197+
![time](https://github.com/JetBrains-Research/GLL4Graph/blob/master/docs/pictures/ss-g1.png)
198+
![time](https://github.com/JetBrains-Research/GLL4Graph/blob/master/docs/pictures/ss-g2.png)
199+
![time](https://github.com/JetBrains-Research/GLL4Graph/blob/master/docs/pictures/ss-geo.png)
200+
201+
202+
The results for graphs related to static code analysis are compared to results of Azimov’s CFPQ algorithm based on matrix operations. [The implementation](https://github.com/JetBrains-Research/CFPQ_PyAlgo/blob/master/src/problems/Base/algo/matrix_base/matrix_base.py)
203+
from [CFPQ_PyAlgo](https://github.com/JetBrains-Research/CFPQ_PyAlgo) was taken as the implementation of the matrix CFPQ algorithm. This library contains implementations for both scenarios: all pairs reachability and single source reachability. Matrix operations are performed with pygraphblas. [Pygraphblas](https://github.com/Graphegon/pygraphblas) is a Python wrapper over the SuiteSparse library, which is based on the [GraphBLAS](http://graphblas.org/index.php?title=Graph_BLAS_Forum) framework.
204+
205+
The results of the **all pairs reachability** queries evaluation on graphs related to **static code analysis** are listed below.
206+
207+
The sign ’–’ in a cell means that the respective query on that graph requires a considerable amount of memory during algorithm execution, which makes the time to get the result unpredictable.
208+
209+
<table>
210+
<thead>
211+
<tr>
212+
<th rowspan="2" align="left">Graph name</th>
213+
<th colspan="3" align="center">PointsTo</th>
214+
</tr>
215+
<tr>
216+
<td align="center">Neo4j time (sec)</td>
217+
<td align="center"> GraphBLAS time (sec)</td>
218+
<td>#answer</td>
219+
</tr>
220+
</thead>
221+
<tbody>
222+
<tr>
223+
<td align="left">Apache</td>
171224
<td align="center">–</td>
225+
<td align="center">536,7</td>
226+
<td align="center">92 806 768</td>
227+
</tr>
228+
<tr>
229+
<td align="left">Block</td>
230+
<td align="center">113,01</td>
231+
<td align="center">123,88</td>
232+
<td align="center">5 351 409</td>
233+
</tr>
234+
<tr>
235+
<td align="left">Fs</td>
236+
<td align="center">167,73</td>
237+
<td align="center">105,72</td>
238+
<td align="center">9 646 475</td>
239+
</tr>
240+
<tr>
241+
<td align="left">Ipc</td>
242+
<td align="center">109,43</td>
243+
<td align="center">79,52</td>
244+
<td align="center">5 249 389</td>
245+
</tr>
246+
<tr>
247+
<td align="left">Lib</td>
248+
<td align="center">111,09</td>
249+
<td align="center">121,79</td>
250+
<td align="center">5 276 303</td>
251+
</tr>
252+
<tr>
253+
<td align="left">Mm</td>
254+
<td align="center">77,92</td>
255+
<td align="center">84,15</td>
256+
<td align="center">3 990 305</td>
257+
</tr>
258+
<tr>
259+
<td align="left">Net</td>
260+
<td align="center">160,64</td>
261+
<td align="center">206,29</td>
262+
<td align="center">8 833 403</td>
263+
</tr>
264+
<tr>
265+
<td align="left">Postgre</td>
172266
<td align="center">–</td>
267+
<td align="center">969,88</td>
268+
<td align="center"> 90 661 446</td>
269+
</tr>
270+
<tr>
271+
<td align="left">Security</td>
272+
<td align="center">115,75</td>
273+
<td align="center">181,7</td>
274+
<td align="center">5 593 387</td>
275+
</tr>
276+
<tr>
277+
<td align="left">Sound</td>
278+
<td align="center">120,14</td>
279+
<td align="center">133,64</td>
280+
<td align="center">6 085 269</td>
173281
</tr>
174282
<tr>
175283
<td align="left">Init</td>
176-
<td align="center">–</td>
177-
<td>–</td>
178-
<td align="center">–</td>
179-
<td align="center">–</td>
180-
<td align="center">–</td>
181-
<td align="center">–</td>
182-
<td align="center">113,35</td>
284+
<td align="center">87,25</td>
285+
<td align="center">45,84</td>
183286
<td align="center">3 783 769</td>
184287
</tr>
288+
<tr>
289+
<td align="left">Arch</td>
290+
<td align="center">130,77</td>
291+
<td align="center">119,92</td>
292+
<td align="center">5 339 563</td>
293+
</tr>
294+
<tr>
295+
<td align="left">Crypto</td>
296+
<td align="center">128,8</td>
297+
<td align="center">122,09</td>
298+
<td align="center">5 428 237</td>
299+
</tr>
185300
<tr>
186301
<td align="left">Drivers</td>
187-
<td align="center">–</td>
188-
<td>–</td>
189-
<td align="center">–</td>
190-
<td align="center">–</td>
191-
<td align="center">–</td>
192-
<td align="center">–</td>
193-
<td align="center">736,81</td>
302+
<td align="center">371,18</td>
303+
<td align="center">279,39</td>
194304
<td align="center">18 825 025</td>
195305
</tr>
196306
<tr>
197307
<td align="left">Kernel</td>
198-
<td align="center">–</td>
199-
<td>–</td>
200-
<td align="center">–</td>
201-
<td align="center">–</td>
202-
<td align="center">–</td>
203-
<td align="center">–</td>
204-
<td align="center">850,46</td>
308+
<td align="center">614,047</td>
309+
<td align="center">378,05</td>
205310
<td align="center">16 747 731</td>
206311
</tr>
207312
</tbody>
208313
</table>
209314

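The times in the table above use a decimal comma, and '–' marks a missing measurement. The following is a minimal Python sketch of how the two time columns could be compared row by row under those conventions; the helper names `parse_seconds` and `time_ratio` are hypothetical, and only a few rows are copied from the table.

```python
from typing import Optional


def parse_seconds(cell: str) -> Optional[float]:
    """Parse a time cell such as '113,01'; return None for the '–' placeholder."""
    cell = cell.strip()
    if cell in {"–", "-"}:
        return None
    return float(cell.replace(" ", "").replace(",", "."))


# (graph, Neo4j time, GraphBLAS time) rows copied from the table above.
ROWS = [
    ("Apache", "–", "536,7"),
    ("Block", "113,01", "123,88"),
    ("Fs", "167,73", "105,72"),
    ("Init", "87,25", "45,84"),
]


def time_ratio(neo4j_cell: str, graphblas_cell: str) -> Optional[float]:
    """Return Neo4j time divided by GraphBLAS time, or None if either is missing."""
    neo4j = parse_seconds(neo4j_cell)
    graphblas = parse_seconds(graphblas_cell)
    if neo4j is None or graphblas is None:
        return None
    return neo4j / graphblas


for name, neo4j_cell, graphblas_cell in ROWS:
    ratio = time_ratio(neo4j_cell, graphblas_cell)
    print(name, "n/a" if ratio is None else f"{ratio:.2f}")
```

A ratio below 1 means the value in the Neo4j column is the smaller of the two for that graph; a ratio above 1 means the GraphBLAS column is smaller.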
210315
<br/>
211316

212-
The evaluation results for **multiple source** CFPQ for **Geospecies** graph and **Geo** grammar in reachability and all path scenarious are listed bellow.
213317

214-
![time](https://github.com/YaccConstructor/iguana/blob/GLL-for-graph/docs/pictures/geospecies_chunks.svg?raw=true&sanitize=true)
318+
The evaluation results for **single source** CFPQ on graphs related to **static code analysis** with the **pointsTo** grammar, in the **reachability** and **all paths** scenarios:
215319

320+
![time](https://github.com/JetBrains-Research/GLL4Graph/blob/master/docs/pictures/stat-m.png)
321+
216322
## Download and build
217323

218324
The project is built with Maven.
@@ -271,3 +377,4 @@ graph_loader.py --graph core --relationships subClassOf,type
271377

272378
This project is licensed under OpenBSD License. License text can be found in the
273379
[license file](https://github.com/JetBrains-Research/GLL4Graph/blob/GLL-for-graph/LICENSE.md).
380+

docs/pictures/ss-g1.png (78.2 KB)

docs/pictures/ss-g2.png (76 KB)

docs/pictures/ss-geo.png (54.5 KB)

docs/pictures/ss-stat.png (99.6 KB)

docs/pictures/stat-m.png (129 KB)
