Commit 4b3b53d

committed
update Bytehouse
1 parent f280945 commit 4b3b53d

20 files changed

Lines changed: 846 additions & 2347 deletions

File tree

bytehouse/NOTES.md

Lines changed: 0 additions & 284 deletions
This file was deleted.

bytehouse/README.md

Lines changed: 11 additions & 66 deletions
@@ -1,73 +1,18 @@
-Bytehouse is a derivative of ClickHouse.
-It is based on very old ClickHouse version (20.4.54418) and many features are unsupported.
+# ByteHouse ClickBench Reproduction
 
-## Status
+## Reproduce The Result
 
-ByteHouse's international cloud (bytehouse.cloud) is no longer reachable
-from outside the China region. The service still operates within China
-via Volcengine. All existing results in this directory were collected
-against the international cloud and have been re-tagged with
-`"historical"`. Future submissions running against a self-managed
-ByteHouse instance (or via Volcengine) should not be tagged historical.
+If you want to reproduce the benchmark result, please send an email to [gaoyuanning@bytedance.com](mailto:gaoyuanning@bytedance.com) to get the EC2 login information.
 
-https://bytehouse.cloud/signup
+After logging in to the EC2 instance:
 
-Sign Up. Only Asia-Pacific South-East 1 AWS region is available. Verify email.
-
-Create virtual warehouse. Size L.
-
-Go to "Databases" and create database "test".
-
-Go to "SQL Worksheet" and copy-paste create.sql query there.
-
-Note: S3 import does not support public buckets. And it requires pasting secret access key, which we are not going to do. So, switch to using CLI.
-
-Create a machine in ap-southeast-1 region and install Bytehouse CLI:
-
-```
-wget --continue --progress=dot:giga https://github.com/bytehouse-cloud/cli/releases/download/v1.5.34/bytehouse-cli_1.5.34_Linux_x86_64.tar.gz
-tar xvf bytehouse-cli_1.5.34_Linux_x86_64.tar.gz
-```
-
-```
-export user='...'
-export password='...'
-export account='AWS...'
-export warehouse='test'
-```
-
-```
-wget --continue --progress=dot:giga 'https://datasets.clickhouse.com/hits_compatible/hits.csv.gz'
-gzip -d -f hits.csv.gz
+```bash
+git clone <github_repo_url>
+cd ClickBench/bytehouse
+./benchmark.sh
 ```
 
-Load the data:
-
-```
-echo -n "Load time: "
-command time -f '%e' ./bytehouse-cli --user "$user" --account "$account" --password "$password" --region ap-southeast-1 --secure --warehouse "$warehouse" --query "INSERT INTO test.hits FORMAT CSV" < hits.csv
-```
-
-```
-99,997,497 total rows sent, 0 rows/s (81.14 GB, 0.00 B/s)
-total rows sent: 99,997,497, average speed = 134,320 rows/s
-Elapsed: 12m24.754608947s. 81.14 GB (108.94 MB/s).
-─── End of Execution ───
-
-real 12m25.310s
-```
-
-Run the benchmark:
-
-```
-./run.sh 2>&1 | tee log.txt
-
-cat log.txt | grep --text -F 'Elapsed' |
-grep --text -oP 'Elapsed: [\d\.]+(ms|s)\. Processed: \d+ row' |
-sed -r -e 's/Elapsed: ([0-9\.]+)(ms|s)\. Processed: ([0-9]+) row/\1 \2 \3/' |
-awk '{ if ($3 == 0) { print "null" } else if ($2 == "ms") { print $1 / 1000 } else { print $1 } }' |
-awk '{ if (i % 3 == 0) { printf "[" }; printf $1; if (i % 3 != 2) { printf "," } else { print "]," }; ++i; }'
-```
+## Notes
 
-Note: cluster size L is the maximum that can be created.
-An attempt to create XL gives "Failed AWAITING RESOURCES".
+- Please use the EC2 environment provided through email for reproduction.
+- Run the benchmark inside the `bytehouse` directory.
