Commit ebc59a6

1 parent ccbf4c5 commit ebc59a6

4 files changed

Lines changed: 94 additions & 0 deletions

datafusion-vortex-partitioned/README.md

Lines changed: 6 additions & 0 deletions
@@ -15,6 +15,12 @@ bash benchmark.sh
 
 The benchmark script builds `vortex-datafusion-cli`, downloads the partitioned Parquet files, converts each `partitioned/hits_N.parquet` file into exactly one `vortex/hits_N.vortex` file, and runs the query set.
 
+You can update/preview the results by running:
+
+```bash
+./make-json.sh <machine-name> # Example. ./make-json.sh c6a.xlarge
+```
+
 ## Parquet to Vortex conversion
 
 Each input file is converted independently through `vortex-datafusion-cli`:
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+#!/bin/bash
+
+# This script converts the raw `result.csv` data from `benchmark.sh` into the
+# final JSON format used by the benchmark dashboard.
+#
+# usage : ./make-json.sh <machine>
+#
+# example (save results/c6a.4xlarge.json)
+# ./make-json.sh c6a.4xlarge
+
+MACHINE=$1
+OUTPUT_FILE="results/${MACHINE}.json"
+SYSTEM_NAME="DataFusion (Vortex, partitioned)"
+DATE=$(date +%Y-%m-%d)
+LOAD_TIME=${LOAD_TIME:-$(cat load-time.txt 2>/dev/null || echo null)}
+DATA_SIZE=${DATA_SIZE:-$(du -bcs vortex/*.vortex 2>/dev/null | awk '/total/ { print $1 }')}
+DATA_SIZE=${DATA_SIZE:-null}
+
+mkdir -p results
+
+# Read the CSV and build the result array using awk
+RESULT_ARRAY=$(awk -F, '{arr[$1]=arr[$1]","$3} END {for (i=1;i<=length(arr);i++) {gsub(/^,/, "", arr[i]); printf " ["arr[i]"]"; if (i<length(arr)) printf ",\n"}}' result.csv)
+
+# Form the final JSON structure from the template
+cat <<EOF > "$OUTPUT_FILE"
+{
+"system": "$SYSTEM_NAME",
+"date": "$DATE",
+"machine": "$MACHINE",
+"cluster_size": 1,
+"proprietary": "no",
+"tuned": "no",
+"hardware": "cpu",
+"tags": ["Rust","column-oriented","embedded","stateless"],
+"load_time": $LOAD_TIME,
+"data_size": $DATA_SIZE,
+"result": [
+$RESULT_ARRAY
+]
+}
+EOF

datafusion-vortex/README.md

Lines changed: 6 additions & 0 deletions
@@ -15,6 +15,12 @@ bash benchmark.sh
 
 The benchmark script builds `vortex-datafusion-cli`, downloads `hits.parquet`, converts it to `vortex/hits.vortex`, and runs the query set.
 
+You can update/preview the results by running:
+
+```bash
+./make-json.sh <machine-name> # Example. ./make-json.sh c6a.xlarge
+```
+
 ## Parquet to Vortex conversion
 
 The conversion intentionally goes through the DataFusion CLI path:
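
The README usage passes the machine name as the only argument, and `make-json.sh` uses `$1` without checking it, so running the script with no argument would write `results/.json`. A guard along the following lines could catch that. This is only a sketch: the function name `output_path_for` is hypothetical and is not part of the commit.

```shell
#!/bin/bash

# Print the output path for a machine name, or fail with a usage
# message on stderr when the name is missing or empty.
output_path_for() {
  if [ -z "${1:-}" ]; then
    echo "usage: ./make-json.sh <machine>" >&2
    return 1
  fi
  printf 'results/%s.json\n' "$1"
}

output_path_for c6a.4xlarge
```

In a script, the call would typically be written as `OUTPUT_FILE=$(output_path_for "$1") || exit 1` so that a missing argument stops the run before any file is created.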

datafusion-vortex/make-json.sh

Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
+#!/bin/bash
+
+# This script converts the raw `result.csv` data from `benchmark.sh` into the
+# final JSON format used by the benchmark dashboard.
+#
+# usage : ./make-json.sh <machine>
+#
+# example ./make-json.sh c6a.4xlarge # saves results/c6a.4xlarge.json
+#
+
+MACHINE=$1
+OUTPUT_FILE="results/${MACHINE}.json"
+SYSTEM_NAME="DataFusion (Vortex, single)"
+DATE=$(date +%Y-%m-%d)
+LOAD_TIME=${LOAD_TIME:-$(cat load-time.txt 2>/dev/null || echo null)}
+DATA_SIZE=${DATA_SIZE:-$(du -bcs vortex/*.vortex 2>/dev/null | awk '/total/ { print $1 }')}
+DATA_SIZE=${DATA_SIZE:-null}
+
+mkdir -p results
+
+# Read the CSV and build the result array using awk
+RESULT_ARRAY=$(awk -F, '{arr[$1]=arr[$1]","$3} END {for (i=1;i<=length(arr);i++) {gsub(/^,/, "", arr[i]); printf " ["arr[i]"]"; if (i<length(arr)) printf ",\n"}}' result.csv)
+
+# Form the final JSON structure from the template
+cat <<EOF > "$OUTPUT_FILE"
+{
+"system": "$SYSTEM_NAME",
+"date": "$DATE",
+"machine": "$MACHINE",
+"cluster_size": 1,
+"proprietary": "no",
+"tuned": "no",
+"hardware": "cpu",
+"tags": ["Rust","column-oriented","embedded","stateless"],
+"load_time": $LOAD_TIME,
+"data_size": $DATA_SIZE,
+"result": [
+$RESULT_ARRAY
+]
+}
+EOF
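
The `LOAD_TIME` line resolves its value in three steps: an environment override if one is set and non-empty, otherwise the contents of `load-time.txt`, otherwise the JSON literal `null`. The sketch below isolates that same `${VAR:-$(...)}` pattern in a function so the fallback order can be tried without running the benchmark. The function name `load_time_from` and the paths are hypothetical.

```shell
#!/bin/bash

# $1: explicit override (may be empty); $2: path to a load-time file.
# Mirrors the ${VAR:-$(cat file 2>/dev/null || echo null)} pattern.
load_time_from() {
  local override="$1" file="$2"
  echo "${override:-$(cat "$file" 2>/dev/null || echo null)}"
}

# No override and no readable file: falls back to null.
load_time_from "" /nonexistent/load-time.txt

# An override wins even when the file is missing.
load_time_from 12.5 /nonexistent/load-time.txt
```

One edge case this pattern does not cover: if the file exists but is empty, `cat` succeeds with no output, so the result is an empty string rather than `null`, and the generated JSON would then be invalid.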
