Follow the same EC2 setup used by [datafusion-partitioned](../datafusion-partitioned/README.md), then run:

```bash
cd ClickBench/datafusion-vortex-partitioned
bash benchmark.sh
```

The shared benchmark harness builds `vortex-datafusion-cli`, downloads the partitioned Parquet files, converts each `partitioned/hits_N.parquet` file into exactly one `vortex/hits_N.vortex` file, and runs the query set.

The `install` script checks out `vortex-datafusion-cli` tag `0.70.0-53.1.0`. CLI tags use `<vortex-version>-<df-version>`, where the first component is the `vortex-datafusion` crate version and the second is the DataFusion/DataFusion CLI version.
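
The tag format can be taken apart on its first hyphen. The following is a minimal sketch, assuming neither version component contains a hyphen itself; the variable names are illustrative and are not taken from the `install` script.

```bash
# Illustrative only: split a CLI tag of the form <vortex-version>-<df-version>.
# Variable names are hypothetical, not from the install script.
tag="0.70.0-53.1.0"
vortex_version="${tag%%-*}"   # everything before the first hyphen
df_version="${tag#*-}"        # everything after the first hyphen
echo "vortex-datafusion: ${vortex_version}, DataFusion: ${df_version}"
```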
`binary_as_string=true` works around the incorrect Parquet logical annotation at conversion time, before the Vortex files are written. The resulting Vortex files store those fields as strings, so benchmark reads need only the Vortex table registration.
-c "COPY (SELECT * EXCEPT (\"EventDate\"), CAST(CAST(\"EventDate\" AS INTEGER) AS DATE) AS \"EventDate\" FROM hits_parquet) TO 'vortex/hits_{}.vortex' STORED AS VORTEX;"
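
The one-to-one mapping from a Parquet partition to its Vortex output can be sketched as a path rewrite. This is a hedged illustration only: `vortex_path_for` is a hypothetical helper, not part of the benchmark scripts, which appear to substitute the partition number through the `{}` placeholder instead.

```bash
# Illustrative only: derive the Vortex output path for one Parquet partition.
# vortex_path_for is a hypothetical helper, not part of the benchmark scripts.
vortex_path_for() {
  local parquet_path="$1"
  local base
  base="$(basename "$parquet_path" .parquet)"   # strips directory and extension
  echo "vortex/${base}.vortex"
}

vortex_path_for "partitioned/hits_0.parquet"
vortex_path_for "partitioned/hits_99.parquet"
```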
datafusion-vortex-partitioned/queries.sql
Lines changed: 2 additions & 2 deletions
@@ -16,7 +16,7 @@ SELECT "SearchEngineID", "SearchPhrase", COUNT(*) AS c FROM hits WHERE "SearchPh
 SELECT "UserID", COUNT(*) FROM hits GROUP BY "UserID" ORDER BY COUNT(*) DESC LIMIT 10;
 SELECT "UserID", "SearchPhrase", COUNT(*) FROM hits GROUP BY "UserID", "SearchPhrase" ORDER BY COUNT(*) DESC LIMIT 10;
 SELECT "UserID", "SearchPhrase", COUNT(*) FROM hits GROUP BY "UserID", "SearchPhrase" LIMIT 10;
-SELECT "UserID", extract(minute FROM "EventTime") AS m, "SearchPhrase", COUNT(*) FROM hits GROUP BY "UserID", m, "SearchPhrase" ORDER BY COUNT(*) DESC LIMIT 10;
+SELECT "UserID", extract(minute FROM to_timestamp_seconds("EventTime")) AS m, "SearchPhrase", COUNT(*) FROM hits GROUP BY "UserID", m, "SearchPhrase" ORDER BY COUNT(*) DESC LIMIT 10;
 SELECT COUNT(*) FROM hits WHERE "URL" LIKE '%google%';
 SELECT "SearchPhrase", MIN("URL"), COUNT(*) AS c FROM hits WHERE "URL" LIKE '%google%' AND "SearchPhrase" <> '' GROUP BY "SearchPhrase" ORDER BY c DESC LIMIT 10;
@@ -40,4 +40,4 @@ SELECT "URL", COUNT(*) AS PageViews FROM hits WHERE "CounterID" = 62 AND "EventD
 SELECT "TraficSourceID", "SearchEngineID", "AdvEngineID", CASE WHEN ("SearchEngineID" = 0 AND "AdvEngineID" = 0) THEN "Referer" ELSE '' END AS Src, "URL" AS Dst, COUNT(*) AS PageViews FROM hits WHERE "CounterID" = 62 AND "EventDate" >= '2013-07-01' AND "EventDate" <= '2013-07-31' AND "IsRefresh" = 0 GROUP BY "TraficSourceID", "SearchEngineID", "AdvEngineID", Src, Dst ORDER BY PageViews DESC LIMIT 10 OFFSET 1000;
 SELECT "URLHash", "EventDate", COUNT(*) AS PageViews FROM hits WHERE "CounterID" = 62 AND "EventDate" >= '2013-07-01' AND "EventDate" <= '2013-07-31' AND "IsRefresh" = 0 AND "TraficSourceID" IN (-1, 6) AND "RefererHash" = 3594120000172545465 GROUP BY "URLHash", "EventDate" ORDER BY PageViews DESC LIMIT 10 OFFSET 100;
 SELECT "WindowClientWidth", "WindowClientHeight", COUNT(*) AS PageViews FROM hits WHERE "CounterID" = 62 AND "EventDate" >= '2013-07-01' AND "EventDate" <= '2013-07-31' AND "IsRefresh" = 0 AND "DontCountHits" = 0 AND "URLHash" = 2868770270353813622 GROUP BY "WindowClientWidth", "WindowClientHeight" ORDER BY PageViews DESC LIMIT 10 OFFSET 10000;
-SELECT DATE_TRUNC('minute', "EventTime") AS M, COUNT(*) AS PageViews FROM hits WHERE "CounterID" = 62 AND "EventDate" >= '2013-07-14' AND "EventDate" <= '2013-07-15' AND "IsRefresh" = 0 AND "DontCountHits" = 0 GROUP BY DATE_TRUNC('minute', "EventTime") ORDER BY DATE_TRUNC('minute', M) LIMIT 10 OFFSET 1000;
+SELECT DATE_TRUNC('minute', to_timestamp_seconds("EventTime")) AS M, COUNT(*) AS PageViews FROM hits WHERE "CounterID" = 62 AND "EventDate" >= '2013-07-14' AND "EventDate" <= '2013-07-15' AND "IsRefresh" = 0 AND "DontCountHits" = 0 GROUP BY DATE_TRUNC('minute', to_timestamp_seconds("EventTime")) ORDER BY DATE_TRUNC('minute', M) LIMIT 10 OFFSET 1000;
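
Both changed queries wrap `"EventTime"` in `to_timestamp_seconds`, which suggests the column reaches the queries as integer seconds since the Unix epoch rather than as a timestamp; that is an inference from the diff, not something it states. Under that assumption, the arithmetic behind `extract(minute FROM ...)` and `DATE_TRUNC('minute', ...)` can be sketched as follows, using a hypothetical sample value.

```bash
# Illustrative only: minute extraction and minute truncation on epoch seconds.
# event_time is a hypothetical sample value, not taken from the dataset.
event_time=1373847425
minute_of_hour=$(( (event_time / 60) % 60 ))    # analogue of extract(minute FROM ...)
truncated=$(( event_time - event_time % 60 ))   # analogue of DATE_TRUNC('minute', ...)
echo "minute=${minute_of_hour} truncated=${truncated}"
```

Applying either function to the raw integer would skip the interpretation step, which is presumably why the conversion is made explicit in the query text.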