graph-database-benchmark/benchmark/tigergraph/README at master · RedisGraph/graph-database-benchmark · GitHub

############################################################
# Copyright (c)  2015-now, TigerGraph Inc.
# All rights reserved
# It is provided as is for benchmark reproducibility purposes.
# Anyone can use it for benchmark purposes with
# acknowledgement to TigerGraph.
# Author: Mingxi Wu mingxi.wu@tigergraph.com
############################################################

This article documents how to reproduce the graph database benchmark results on TigerGraph Developer Edition.

Data Sets
===========

- graph500 edge file: http://service.tigergraph.com/download/benchmark/dataset/graph500-22/graph500-22

- twitter edge file: http://service.tigergraph.com/download/benchmark/dataset/twitter/twitter_rv.tar.gz
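
The two downloads can be scripted. The following is a minimal sketch, assuming wget and tar are installed and that the raw data lives under /ebs/raw (the location used in the loading examples later in this README). The names fetch_dataset, run and DRY_RUN are illustrative and not part of the benchmark scripts. By default the commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch only: fetch_dataset, run and DRY_RUN are illustrative names.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# fetch_dataset URL TARGET_DIR
# Downloads URL into TARGET_DIR and unpacks it when it is a .tar.gz archive.
fetch_dataset() {
  local url=$1 dir=$2
  local file=${url##*/}          # last path component of the URL
  run mkdir -p "$dir"
  run wget -c -P "$dir" "$url"   # -c resumes a partial download
  case "$file" in
    *.tar.gz) run tar -xzf "$dir/$file" -C "$dir" ;;
  esac
}

fetch_dataset http://service.tigergraph.com/download/benchmark/dataset/graph500-22/graph500-22 /ebs/raw/graph500-22
fetch_dataset http://service.tigergraph.com/download/benchmark/dataset/twitter/twitter_rv.tar.gz /ebs/raw/twitter_rv
```

Run it with DRY_RUN=0 to perform the downloads. The target folders are an assumption; adjust them to wherever your raw data should go.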


Hardware & Major environment
================================
- Amazon EC2 machine r4.8xlarge.
- OS Ubuntu 14.04.5 LTS
- Java build 1.8.0_144-b01 (follow http://www.webupd8.org/2012/09/install-oracle-java-8-in-ubuntu-via-ppa.html)
- You need to install the following Python modules:
$ sudo apt-get install python-pip python-dev build-essential
$ sudo pip install --upgrade pip
$ sudo pip install --upgrade virtualenv
$ sudo pip install tornado
$ sudo pip install neo4j-driver
$ sudo pip install requests
$ sudo apt-get install libcurl4-openssl-dev
$ sudo apt-get install libssl-dev
$ sudo pip install pycurl


- 32 vCPUs
- 244 GiB memory
- An attached 250G EBS-optimized Provisioned IOPS SSD (io1), with IOPS set to 50 IOPS/GiB.
  Raw data and the neo4j binary are stored on this SSD.
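
Provisioning the volume requires an absolute IOPS figure rather than a ratio. As a small worked calculation, assuming the 250G volume size is counted in GiB:

```shell
# Worked arithmetic (assumption: the 250G volume is 250 GiB).
volume_gib=250
iops_per_gib=50
echo "provisioned IOPS: $((volume_gib * iops_per_gib))"   # 250 * 50 = 12500
```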

TigerGraph Version
===================
- TigerGraph Developer Edition downloaded from https://www.tigergraph.com/download

Install TigerGraph
===================

- Installation video reference: https://www.youtube.com/watch?v=q-vAioBUwkI&t=3s

- Put ExprFunctions.hpp under the tigergraph/dev/gdk/gsql/src/QueryUdf/ folder.
  It contains a UDF used by the WCC query.

Open READ and WRITE permission for TigerGraph User
==================================================
If you installed TigerGraph with the default user, the user name is tigergraph.

Grant READ and WRITE permission to the tigergraph user on
- the raw data folder.
- the script folder.
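
One way to grant these permissions is to make the tigergraph user the owner of both folders. The sketch below assumes /ebs/raw for the raw data (as used in the loading examples) and a hypothetical /ebs/benchmark/tigergraph for the scripts. grant_rw, TG_USER and DRY_RUN are illustrative names. By default the commands are only printed.

```shell
#!/usr/bin/env bash
# Sketch only: grant_rw, TG_USER, DRY_RUN and the example paths are illustrative.
# Commands are printed rather than executed unless DRY_RUN=0 is set.
DRY_RUN=${DRY_RUN:-1}
TG_USER=${TG_USER:-tigergraph}

# grant_rw DIR...
# Makes TG_USER the owner of each DIR and gives the owner read/write access.
# X adds execute on directories (and on files that are already executable),
# so the folders can be traversed.
grant_rw() {
  local dir
  for dir in "$@"; do
    if [ "$DRY_RUN" = "1" ]; then
      echo "sudo chown -R $TG_USER $dir"
      echo "sudo chmod -R u+rwX $dir"
    else
      sudo chown -R "$TG_USER" "$dir"
      sudo chmod -R u+rwX "$dir"
    fi
  done
}

# /ebs/benchmark/tigergraph is a hypothetical script folder.
grant_rw /ebs/raw /ebs/benchmark/tigergraph
```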

Configure timeout
===================
Set the timeout to 180 s.
Use the following command to change Restpp.timeout_seconds to 3600:
 gadmin --configure timeout_seconds

Use the following two commands to apply the config change.
 gadmin config-apply
 gadmin restart

- Use the following command to check the TigerGraph services:
gadmin status


Graph500
==================
Setup schema
----------------
In a bash shell, switch to the user you specified during TigerGraph installation.
By default, the user is tigergraph and the password is tigergraph.

When all TigerGraph services are up, install the schema, loading job and queries from
the bash command line.

gsql graph500_setup.gsql

Note: this setup can take several minutes.

Load data
---------------
In a bash shell, make sure you have permission to READ the raw edge file. Then run the loading job with

nohup  gsql -g graph500 'run loading job load_graph500_edge using file1="/ebs/raw/graph500-22/graph500-22"'

Note: In the above command, replace the value of file1 with your raw data path, i.e. file1="your_raw_data_path".
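
Because the load fails if the edge file is not readable, it can help to check that first. The sketch below wraps the command above in a hypothetical helper, load_edges, that refuses to start when the file cannot be read. The helper and DRY_RUN are illustrative, not part of the benchmark scripts. By default the gsql command is only printed.

```shell
#!/usr/bin/env bash
# Sketch only: load_edges and DRY_RUN are illustrative names.
DRY_RUN=${DRY_RUN:-1}

# load_edges GRAPH JOB EDGE_FILE
# Returns 1 when EDGE_FILE is not readable by the current user. Otherwise
# prints the gsql command, or runs it when DRY_RUN=0.
load_edges() {
  local graph=$1 job=$2 file=$3
  if [ ! -r "$file" ]; then
    echo "cannot read $file as $(whoami)" >&2
    return 1
  fi
  local stmt="run loading job $job using file1=\"$file\""
  if [ "$DRY_RUN" = "1" ]; then
    echo "gsql -g $graph '$stmt'"
  else
    nohup gsql -g "$graph" "$stmt"
  fi
}

load_edges graph500 load_graph500_edge /ebs/raw/graph500-22/graph500-22 || true
```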


Run the benchmark
--------------------
In a bash shell, run

./benchmark-graph500.sh "path_to_graph500_raw_data_folder"
e.g.

./benchmark-graph500.sh "/ebs/raw/graph500/"

A random seed folder for the k-hop-path-neighbor-count query will be generated.
A result folder will be generated to keep the benchmark results.
The shell script runs the following workload in sequence.

1. k=1 for the k-hop-path-neighbor-count query over 1000 randomly selected vertices.
2. k=2 for the k-hop-path-neighbor-count query over 1000 randomly selected vertices.
3. 3 runs of the WCC algorithm over the graph500 data.
4. 3 runs of the PageRank algorithm, 10 iterations each, over the graph500 data.
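
The benchmark script generates its own seeds, and its method is not described here. As an illustration of what picking random seed vertices can look like, the sketch below samples distinct vertex ids from an edge file. It assumes one whitespace-separated "source target" pair per line and GNU shuf; sample_seeds is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch only: sample_seeds is an illustrative helper; the benchmark script's
# own seed generation may differ.
# sample_seeds EDGE_FILE COUNT
# Prints up to COUNT distinct vertex ids chosen at random from the source and
# target columns of a whitespace-separated edge file.
sample_seeds() {
  local edge_file=$1 count=$2
  awk 'NF >= 2 {
         # Emit each vertex id the first time it is seen.
         if (!($1 in seen)) { seen[$1] = 1; print $1 }
         if (!($2 in seen)) { seen[$2] = 1; print $2 }
       }' "$edge_file" | shuf -n "$count"
}
```

For example, sample_seeds /ebs/raw/graph500-22/graph500-22 1000 > seeds.txt would write 1000 seed ids to a file, assuming the edge file has the layout described above.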

Twitter
==================
Setup schema
----------------
In a bash shell, switch to the user you specified during TigerGraph installation.
By default, the user is tigergraph and the password is tigergraph.

When all TigerGraph services are up, install the schema, loading job and queries from
the bash command line.

gsql twitter_setup.gsql

Note: this setup can take several minutes.

Load data
---------------
In a bash shell, make sure you have permission to READ the raw edge file. Then run the loading job with

nohup  gsql -g twitter 'run loading job load_twitter_edge using file1="/ebs/raw/twitter_rv/twitter_rv.net"'

Note: In the above command, replace the value of file1 with your raw data path, i.e. file1="your_raw_data_path".


Run the benchmark
--------------------
In a bash shell, run

./benchmark-twitter.sh "path_to_twitter_raw_data_folder"
e.g.

./benchmark-twitter.sh "/ebs/raw/twitter_rv/"

A random seed folder for the k-hop-path-neighbor-count query will be generated.
A result folder will be generated to keep the benchmark results.
The shell script runs the following workload in sequence.

1. k=1 for the k-hop-path-neighbor-count query over 1000 randomly selected vertices.
2. k=2 for the k-hop-path-neighbor-count query over 1000 randomly selected vertices.
3. 3 runs of the WCC algorithm over the twitter data.
4. 3 runs of the PageRank algorithm, 10 iterations each, over the twitter-rv data.
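
To make the first two workload items concrete, here is a small reference sketch of a k-hop neighbor count that can be run on a tiny edge file to sanity-check results. It is not the GSQL query installed by the setup script. It assumes that the count means the number of distinct vertices reachable from the seed by following 1 to k outgoing edges, excluding the seed itself, and that the edge file has one whitespace-separated "source target" pair per line. khop_count is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch only: khop_count is an illustrative reference, not the benchmark query.
# Assumption: the count is the number of distinct vertices reachable from SEED
# by following 1..K outgoing edges, not counting SEED itself.
# khop_count EDGE_FILE SEED K
khop_count() {
  local edge_file=$1 seed=$2 k=$3
  awk -v seed="$seed" -v k="$k" '
    # Adjacency list: adj[src] is a space-separated list of targets.
    NF >= 2 { adj[$1] = adj[$1] " " $2 }
    END {
      seen[seed] = 1
      frontier[seed] = 1
      count = 0
      # Breadth-first expansion, one hop per iteration.
      for (hop = 1; hop <= k; hop++) {
        split("", next_frontier)            # clear the array
        for (v in frontier) {
          n = split(adj[v], targets, " ")
          for (i = 1; i <= n; i++) {
            t = targets[i]
            if (!(t in seen)) { seen[t] = 1; next_frontier[t] = 1; count++ }
          }
        }
        split("", frontier)
        for (v in next_frontier) frontier[v] = 1
      }
      print count
    }' "$edge_file"
}
```

This holds the whole graph in awk memory, so it is only meant for small test inputs, not for the twitter or graph500 files.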


Multiple machines
=====================================
r4.2xlarge 8 vCPUs 61 GiB

Set up TigerGraph Enterprise Edition; you need a license key from TigerGraph.
- Edit cluster_config.json to add the node IPs and the ssh private key file.
- Make sure all ports are open.
- Then install:
./install.sh -cn

Set up the schema, loading job and queries.
gsql twitter_setup.gsql

Then run the loading job on one of the nodes.

gsql -g twitter 'run loading job load_twitter_edge using file1="all:/ebs/raw/split_twitter_rv.net/"'

This launches a loader on each machine, so the loading time should be much shorter than loading from a single machine.

Setup configuration
--------------------
- Set the segment size to 2^17. In GSQL, enter "set segsize_in_bits=17"
- Set the timeout to 3600 s

Use the following command to change "Restpp.timeout_seconds" to  3600
 gadmin --configure timeout_seconds

Use the following two commands to apply the config change.
 gadmin config-apply
 gadmin restart

 - Set up libtcmalloc. Find the bin folder under your installation folder, e.g. /ebs/install/tigergraph/bin
 gadmin --config runtime
 GPE: "LD_PRELOAD=/ebs/install/tigergraph/bin/libtcmalloc.so "
 gadmin restart all

 - Make sure all ports are open, as the installer deploys binaries to each cluster node.

 - The following parallel ssh commands are convenient for preparing the nodes.

  parallel-ssh -h server-list  -l ubuntu -P -x "-oStrictHostKeyChecking=no  -i /home/ubuntu/qa_release"  'sudo mount /dev/xvdb /ebs'

  parallel-ssh -h server-list  -l ubuntu -P -x "-oStrictHostKeyChecking=no  -i /home/ubuntu/qa_release"  'sudo apt-get install policycoreutils -y'

  parallel-ssh -h server-list  -l ubuntu -P -x "-oStrictHostKeyChecking=no  -i /home/ubuntu/qa_release"  'sudo mkdir  -p /ebs/raw/twitter'

  parallel-ssh -h server-list  -l ubuntu -P -x "-oStrictHostKeyChecking=no  -i /home/ubuntu/qa_release"  'sudo chown -R  tigergraph /ebs/raw'

Loading preparation
-------------------------
Put all node info in a server-list file.
parallel-ssh -h server-list  -l ubuntu -P -x "-oStrictHostKeyChecking=no  -i /home/ubuntu/ssh-private-key"  'sudo mkdir  -p /ebs/split'
 ./split_distribute.sh  ./raw   /ebs/split  net
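
split_distribute.sh itself is not reproduced in this README. As a rough illustration of the splitting half of such a step, the sketch below cuts one edge file into a given number of chunks on line boundaries. It assumes GNU split (coreutils). split_edges and the part- file prefix are hypothetical and may not match what the real script produces.

```shell
#!/usr/bin/env bash
# Sketch only: split_edges is an illustrative helper, not the benchmark's
# split_distribute.sh. Requires GNU split (coreutils).
# split_edges EDGE_FILE PARTS OUT_DIR
# Cuts EDGE_FILE into PARTS chunks without breaking lines and writes them to
# OUT_DIR as part-00.net, part-01.net, ...
split_edges() {
  local edge_file=$1 parts=$2 out_dir=$3
  mkdir -p "$out_dir"
  split -n "l/$parts" -d --additional-suffix=.net "$edge_file" "$out_dir/part-"
}
```

Each chunk would then be copied to /ebs/split on one of the nodes listed in server-list, for example with scp, so that every machine loads only its own share of the edges.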

Run the following to test PageRank:
nohup gsql twitter_setup.gsql;nohup gsql -g twitter 'run loading job load_twitter_edge using file1="all:/ebs/split"';nohup python pg.py twitter-rv tigergraph 10 3