|
1 | | -Bytehouse is a derivative of ClickHouse. |
2 | | -It is based on very old ClickHouse version (20.4.54418) and many features are unsupported. |
| 1 | +# ByteHouse ClickBench Reproduction |
3 | 2 | |
4 | | -## Status |
| 3 | +## Reproduce the Results |
5 | 4 | |
6 | | -ByteHouse's international cloud (bytehouse.cloud) is no longer reachable |
7 | | -from outside the China region. The service still operates within China |
8 | | -via Volcengine. All existing results in this directory were collected |
9 | | -against the international cloud and have been re-tagged with |
10 | | -`"historical"`. Future submissions running against a self-managed |
11 | | -ByteHouse instance (or via Volcengine) should not be tagged historical. |
| 5 | +To reproduce the benchmark results, email [gaoyuanning@bytedance.com](mailto:gaoyuanning@bytedance.com) to request the EC2 login information. |
12 | 6 | |
13 | | -https://bytehouse.cloud/signup |
| 7 | +After logging in to the EC2 instance: |
14 | 8 | |
15 | | -Sign Up. Only Asia-Pacific South-East 1 AWS region is available. Verify email. |
16 | | - |
17 | | -Create virtual warehouse. Size L. |
18 | | - |
19 | | -Go to "Databases" and create database "test". |
20 | | - |
21 | | -Go to "SQL Worksheet" and copy-paste create.sql query there. |
22 | | - |
23 | | -Note: S3 import does not support public buckets. And it requires pasting secret access key, which we are not going to do. So, switch to using CLI. |
24 | | - |
25 | | -Create a machine in ap-southeast-1 region and install Bytehouse CLI: |
26 | | - |
27 | | -``` |
28 | | -wget --continue --progress=dot:giga https://github.com/bytehouse-cloud/cli/releases/download/v1.5.34/bytehouse-cli_1.5.34_Linux_x86_64.tar.gz |
29 | | -tar xvf bytehouse-cli_1.5.34_Linux_x86_64.tar.gz |
30 | | -``` |
31 | | - |
32 | | -``` |
33 | | -export user='...' |
34 | | -export password='...' |
35 | | -export account='AWS...' |
36 | | -export warehouse='test' |
37 | | -``` |
38 | | - |
39 | | -``` |
40 | | -wget --continue --progress=dot:giga 'https://datasets.clickhouse.com/hits_compatible/hits.csv.gz' |
41 | | -gzip -d -f hits.csv.gz |
| 9 | +```bash |
| 10 | +git clone <github_repo_url> |
| 11 | +cd ClickBench/bytehouse |
| 12 | +./benchmark.sh |
42 | 13 | ``` |
43 | 14 | |
44 | | -Load the data: |
45 | | - |
46 | | -``` |
47 | | -echo -n "Load time: " |
48 | | -command time -f '%e' ./bytehouse-cli --user "$user" --account "$account" --password "$password" --region ap-southeast-1 --secure --warehouse "$warehouse" --query "INSERT INTO test.hits FORMAT CSV" < hits.csv |
49 | | -``` |
50 | | - |
51 | | -``` |
52 | | -99,997,497 total rows sent, 0 rows/s (81.14 GB, 0.00 B/s) |
53 | | -total rows sent: 99,997,497, average speed = 134,320 rows/s |
54 | | -Elapsed: 12m24.754608947s. 81.14 GB (108.94 MB/s). |
55 | | -─── End of Execution ─── |
56 | | - |
57 | | -real 12m25.310s |
58 | | -``` |
59 | | - |
60 | | -Run the benchmark: |
61 | | - |
62 | | -``` |
63 | | -./run.sh 2>&1 | tee log.txt |
64 | | - |
65 | | -cat log.txt | grep --text -F 'Elapsed' | |
66 | | - grep --text -oP 'Elapsed: [\d\.]+(ms|s)\. Processed: \d+ row' | |
67 | | - sed -r -e 's/Elapsed: ([0-9\.]+)(ms|s)\. Processed: ([0-9]+) row/\1 \2 \3/' | |
68 | | - awk '{ if ($3 == 0) { print "null" } else if ($2 == "ms") { print $1 / 1000 } else { print $1 } }' | |
69 | | - awk '{ if (i % 3 == 0) { printf "[" }; printf $1; if (i % 3 != 2) { printf "," } else { print "]," }; ++i; }' |
70 | | -``` |
| 15 | +## Notes |
71 | 16 | |
72 | | -Note: cluster size L is the maximum that can be created. |
73 | | -An attempt to create XL gives "Failed AWAITING RESOURCES". |
| 17 | +- Use the EC2 environment provided by email for reproduction. |
| 18 | +- Run the benchmark inside the `bytehouse` directory. |
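The removed README turned `log.txt` into ClickBench result triples with a `grep`/`sed`/`awk` pipeline. This change does not show whether `benchmark.sh` still does that, so the block below is only a condensed sketch of that step, assuming the same `Elapsed: <n>(ms|s). Processed: <n> row` line format. The function names `parse_elapsed` and `sample_log` are hypothetical, and the sample lines are made up for illustration; they are not real benchmark output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read CLI log lines on stdin and print one
# "[t1,t2,t3]," triple (seconds) per three "Elapsed" lines.
parse_elapsed() {
  # Keep only lines that carry timing info, reduced to "<time> <unit> <rows>".
  sed -n -E 's/.*Elapsed: ([0-9.]+)(ms|s)\. Processed: ([0-9]+) row.*/\1 \2 \3/p' |
    # Zero processed rows is treated as a failed query (null); ms becomes seconds.
    awk '{ if ($3 == 0) { print "null" } else if ($2 == "ms") { print $1 / 1000 } else { print $1 } }' |
    # Group every three timings into one bracketed triple.
    awk '{ if (i % 3 == 0) { printf "[" }; printf "%s", $1; if (i % 3 != 2) { printf "," } else { print "]," }; ++i }'
}

# Made-up sample lines in the assumed format (not real measurements).
sample_log() {
  printf '%s\n' \
    'Elapsed: 250ms. Processed: 100 rows' \
    'Elapsed: 1.5s. Processed: 100 rows' \
    'Elapsed: 2s. Processed: 0 rows'
}

sample_log | parse_elapsed
```

With the sample input this prints `[0.25,1.5,null],`. Unlike the removed pipeline, the sketch uses `sed -n -E` instead of `grep -oP`, so it does not depend on GNU grep's PCRE support.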