Conversation
|
Benchmark results: it seems like we are 25x faster for u16 bitmap-based accumulators (or I am sleepy :) )
|
I think we can do the same for 16-bit types: it is just 65,536 bytes as a bool array, or 8,192 bytes if we use a bitmap.
|
Oh wait, you're already doing that :) |
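The size arithmetic in the comment above (65,536 possible u16 values, one bit each, so 8,192 bytes) can be sketched as follows. This is a minimal illustration only, not the code in this PR: the type `U16BitmapDistinctCount` and its methods are hypothetical names, and it uses only the standard library rather than DataFusion's `Accumulator` trait.

```rust
/// Hypothetical sketch of a bitmap-backed distinct counter for u16 values.
/// 65_536 possible values / 64 bits per word = 1_024 words = 8_192 bytes.
pub struct U16BitmapDistinctCount {
    words: Box<[u64; 1024]>,
}

impl U16BitmapDistinctCount {
    pub fn new() -> Self {
        Self { words: Box::new([0u64; 1024]) }
    }

    /// Mark `v` as seen by setting its bit.
    pub fn insert(&mut self, v: u16) {
        let word = (v >> 6) as usize; // which 64-bit word holds the bit
        let bit = v & 63; // position of the bit inside that word
        self.words[word] |= 1u64 << bit;
    }

    /// Combine two partial states with a bitwise OR.
    pub fn merge(&mut self, other: &Self) {
        for (a, b) in self.words.iter_mut().zip(other.words.iter()) {
            *a |= *b;
        }
    }

    /// The number of distinct values is the number of set bits.
    pub fn count(&self) -> u64 {
        self.words.iter().map(|w| u64::from(w.count_ones())).sum()
    }

    /// Size of the bitmap itself, excluding the Box pointer.
    pub fn size_in_bytes(&self) -> usize {
        std::mem::size_of::<[u64; 1024]>()
    }
}
```

Unlike a hash set, the memory here is fixed regardless of how many values are inserted, and merging partial states is a plain OR over 1,024 words.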
|
Query 0 in the clickbench_extended dataset (which uses [...]). Other queries are faster too, but I believe that is mostly variance.
|
cc: @neilconway, @alamb, @martin-g. Please take a look whenever you get a chance.
|
run benchmarks |
alamb
left a comment
This looks like a great idea. Thank you @coderfender
harness = false

[[bench]]
name = "count_distinct"
Can you please add this benchmark in a separate PR (so we can use our standard benchmark runner to confirm the results)?
|
🤖 Benchmark running (GKE): comparing optimize_count_distinct (93acd98) to 4b1901f (merge-base) using tpch
|
🤖 Benchmark running (GKE): comparing optimize_count_distinct (93acd98) to 4b1901f (merge-base) using clickbench_partitioned
|
🤖 Benchmark running (GKE): comparing optimize_count_distinct (93acd98) to 4b1901f (merge-base) using tpcds
/// Uses 256 bytes to track all possible u8 values.
#[derive(Debug)]
pub struct BoolArray256DistinctCountAccumulator {
    seen: Box<[bool; 256]>,
I think you can probably use a BooleanBuffer from Arrow to make this significantly faster (I think `[bool]` uses a byte for each boolean) 🤔
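To illustrate the layout difference behind this suggestion, here is a rough sketch comparing a byte-per-flag array with a bit-packed one for u8 values. It deliberately uses only the standard library as a stand-in and does not call Arrow's `BooleanBuffer` API. The types `BoolArrayU8` and `PackedU8` are hypothetical, and whether the packed form is actually faster would need to be confirmed by benchmarks.

```rust
/// Byte-per-flag layout, the same shape as the `[bool; 256]` field under review.
pub struct BoolArrayU8 {
    seen: [bool; 256],
}

/// Bit-packed layout: 256 flags stored in four 64-bit words (32 bytes).
pub struct PackedU8 {
    words: [u64; 4],
}

impl BoolArrayU8 {
    pub fn new() -> Self {
        Self { seen: [false; 256] }
    }

    pub fn insert(&mut self, v: u8) {
        self.seen[v as usize] = true;
    }

    /// Counting requires visiting all 256 bytes.
    pub fn count(&self) -> usize {
        self.seen.iter().filter(|&&b| b).count()
    }
}

impl PackedU8 {
    pub fn new() -> Self {
        Self { words: [0; 4] }
    }

    pub fn insert(&mut self, v: u8) {
        // The high 2 bits select the word, the low 6 bits select the bit.
        self.words[(v >> 6) as usize] |= 1u64 << (v & 63);
    }

    /// Counting is four popcount operations.
    pub fn count(&self) -> usize {
        self.words.iter().map(|w| w.count_ones() as usize).sum()
    }
}
```

The packed form is 8x smaller (32 bytes versus 256), so the whole state fits in a single cache line on typical hardware.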
|
🤖 Benchmark completed (GKE): tpcds, base (merge-base) vs. branch
|
🤖 Benchmark completed (GKE): clickbench_partitioned, base (merge-base) vs. branch
Which issue does this PR close?
Replace the HashSet-based accumulators for smaller integer data types with bitmaps. Follow-up to #21453.
Rationale for this change
What changes are included in this PR?
Are these changes tested?
Are there any user-facing changes?