Commit cff42de

Add ClickBench URL prefix filter benchmark
1 parent 8f033e4 commit cff42de

2 files changed

Lines changed: 20 additions & 0 deletions

benchmarks/queries/clickbench/README.md

Lines changed: 16 additions & 0 deletions
@@ -241,8 +241,24 @@ These queries test the performance of the `FIRST_VALUE` aggregation function wit
| Q12 | `WatchID` | `Int64` | `OS` | `Int16` | 91 |

### Q13: Filter-only URL prefix match

**Question**: "Which counters have the most page views with URLs that look like HTTP URLs?"

**Important Query Properties**: Filter-only string prefix match. The `URL`
column is used only by the pushed-down filter and is not projected or
aggregated. This makes the query useful for measuring optimizations that can
skip RowFilter evaluation when Parquet row group statistics prove that all rows
in a row group satisfy the prefix predicate.

```sql
SELECT "CounterID", COUNT(*) AS page_views
FROM hits
WHERE "URL" LIKE 'http%'
GROUP BY "CounterID"
ORDER BY page_views DESC
LIMIT 10;
```

## Data Notes


Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
-- Must set for ClickBench hits_partitioned dataset. See https://github.com/apache/datafusion/issues/16591
-- set datafusion.execution.parquet.binary_as_string = true

SELECT "CounterID", COUNT(*) AS page_views FROM hits WHERE "URL" LIKE 'http%' GROUP BY "CounterID" ORDER BY page_views DESC LIMIT 10;
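
The Q13 README text refers to row group statistics proving that every row satisfies the prefix predicate. The sketch below illustrates one way such a proof can work; it is not taken from this commit or from DataFusion. `StringStats`, `PrefixCheck`, and `check_prefix` are hypothetical names, and the sketch assumes that the statistics are valid bounds under bytewise ordering and that the prefix contains no `LIKE` wildcards.

```rust
/// Outcome of checking `col LIKE 'prefix%'` against row-group statistics.
#[derive(Debug, PartialEq)]
enum PrefixCheck {
    /// Every row matches, so per-row filter evaluation could be skipped.
    AllRowsMatch,
    /// No row can match, so the row group could be pruned.
    NoRowsMatch,
    /// Statistics are inconclusive; the filter must be evaluated per row.
    Unknown,
}

/// Hypothetical stand-in for one string column's statistics in one row
/// group. This is not a real DataFusion or parquet crate type.
struct StringStats<'a> {
    min: &'a str,
    max: &'a str,
    null_count: u64,
}

fn check_prefix(stats: &StringStats, prefix: &str) -> PrefixCheck {
    let min = stats.min.as_bytes();
    let max = stats.max.as_bytes();
    let p = prefix.as_bytes();

    // Strings sharing a prefix form one contiguous range in bytewise order.
    // If both bounds lie in that range, every non-null value does too.
    if min.starts_with(p) && max.starts_with(p) {
        // `NULL LIKE 'p%'` evaluates to NULL, which WHERE rejects, so any
        // null row prevents the "all rows match" conclusion.
        return if stats.null_count == 0 {
            PrefixCheck::AllRowsMatch
        } else {
            PrefixCheck::Unknown
        };
    }

    // Every string starting with `p` sorts at or after `p` itself, so a
    // maximum below `p` rules out any match.
    if max < p {
        return PrefixCheck::NoRowsMatch;
    }

    // A minimum that sorts after `p` without starting with it lies beyond
    // the whole prefix range, which also rules out any match.
    if min > p && !min.starts_with(p) {
        return PrefixCheck::NoRowsMatch;
    }

    PrefixCheck::Unknown
}
```

Under these assumptions, a row group whose `URL` minimum is `http://a.example/` and maximum is `https://z.example/`, with no nulls, yields `AllRowsMatch` for the prefix `http`. Q13 is designed to exercise that case.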
