feat: Reduce allocations for aggregating Statistics (#20768)

alamb merged 12 commits into apache:main
I verified this has a 5x speed-up for numeric primitive values using a small benchmark. It felt unnecessary to add the benchmark since it is just a regular vectorization optimization.
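The idea behind the optimization can be sketched roughly like this (a hedged simplification over plain `i64`s; the real code operates on `ScalarValue` and Arrow kernels):

```rust
// Hedged sketch, not the actual DataFusion code: instead of one fallible
// scalar addition per pair of statistics, gather the inner primitive
// values once and sum them in a single loop the compiler can vectorize.

/// Per-pair path: one Option check (and, in the real code, one enum
/// match plus allocations) per element.
fn sum_pairwise(values: &[Option<i64>]) -> Option<i64> {
    values
        .iter()
        .copied()
        .try_fold(0i64, |acc, v| v.map(|v| acc.wrapping_add(v)))
}

/// Gathered path: unwrap once up front, then run a tight loop over
/// plain i64s.
fn sum_gathered(values: &[Option<i64>]) -> Option<i64> {
    let gathered: Option<Vec<i64>> = values.iter().copied().collect();
    gathered.map(|v| v.into_iter().fold(0i64, |a, b| a.wrapping_add(b)))
}
```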
@Dandandan cleaned up the implementation. The performance hit came from using
Dandandan left a comment
I think it looks good, it could benefit though from a benchmark somewhere.
Main benchmark · Speed up benchmark

Added bench. Looking at around a 10-12x improvement.
asolimando left a comment
LGTM, left a couple of minor comments and questions
```rust
($lhs:expr, $rhs:expr, $VARIANT:ident) => {
    match ($lhs, $rhs) {
        (ScalarValue::$VARIANT(Some(a)), ScalarValue::$VARIANT(Some(b))) => {
            Ok(ScalarValue::$VARIANT(Some(a.wrapping_add(*b))))
```
Correct me if I am wrong, but it seems that `wrapping_add` would wrap the sum upon overflow, producing a negative number. I am not sure this is what we want for statistics; I propose we either mark it `Absent` or at least mark it as `Inexact` (which doesn't seem to be happening), as done in `Precision::add`.
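The wrap-around behavior being discussed is easy to demonstrate on a plain integer type:

```rust
fn main() {
    // wrapping_add wraps on overflow: the sum of two large positive
    // values comes back negative, which is misleading for a statistic.
    assert_eq!(i64::MAX.wrapping_add(1), i64::MIN);
    // saturating_add clamps at the type bound instead; combined with
    // marking the value Inexact, that is closer to what Precision::add
    // aims for.
    assert_eq!(i64::MAX.saturating_add(1), i64::MAX);
}
```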
Not in scope for this PR, but on the subject I also think `pub sum_value: Precision<ScalarValue>`, unlike `min` and `max`, should not have matched the data type of the column, but a wide type like `Int64` to minimize the chances of overflow.
> Not in scope for this PR, but on the subject I also think `pub sum_value: Precision<ScalarValue>`, unlike `min` and `max`, should not have matched the data type of the column, but a wide type like `Int64` to minimize the chances of overflow.
Good point, filed in #20826
@asolimando Added the fix! 256 needs to be dealt with differently, so set up a small function for that
Checked the new commit, thanks for addressing the comment and filing the issue, marking as resolved!
EDIT: it looks like I lack privileges; I will let you do that if you want.
asolimando left a comment
LGTM (modulo fixing the cargo doc failure)
```rust
ScalarValue::Float64(_) => add_float!(lhs, rhs, Float64),
ScalarValue::Decimal32(_, _, _) => add_decimal!(lhs, rhs, Decimal32),
ScalarValue::Decimal64(_, _, _) => add_decimal!(lhs, rhs, Decimal64),
ScalarValue::Decimal128(_, _, _) => add_decimal!(lhs, rhs, Decimal128),
```
For `Decimal128(Some(a), p, s)`, saturating to `i128::MAX` doesn't respect the precision constraint. A `Decimal128` with precision 5 can't hold `i128::MAX`.

For example:

```rust
let a = Decimal128(Some(99999), 5, 2); // 999.99
let b = Decimal128(Some(99999), 5, 2); // 999.99
// a + b = 1999.98, which needs precision 6 and no longer fits in (5, 2)
```
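The constraint on the unscaled value can be sketched like this (`fits_precision` is a hypothetical helper for illustration, not a DataFusion API):

```rust
// A Decimal128 with precision p can only hold unscaled values with at
// most p decimal digits, regardless of what fits in the i128 itself.
fn fits_precision(unscaled: i128, precision: u32) -> bool {
    unscaled.unsigned_abs() < 10u128.pow(precision)
}

fn main() {
    let a: i128 = 99_999; // 999.99 at (precision 5, scale 2)
    let sum = a + a; // 199_998, i.e. 1999.98
    assert!(fits_precision(a, 5));
    assert!(!fits_precision(sum, 5)); // the sum needs precision 6
    assert!(!fits_precision(i128::MAX, 5)); // saturating violates p = 5 too
}
```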
```rust
// Licensed to the Apache Software Foundation (ASF) under one
```
Not sure if we can put this at a higher level; it is possible that other nodes may use this code in the future, not only aggregation.
run benchmark sql_planner
```rust
/// converts the result back — 3 heap allocations per call.
///
/// For non-primitive types, falls back to `ScalarValue::add`.
pub(crate) fn scalar_add(lhs: &ScalarValue, rhs: &ScalarValue) -> Result<ScalarValue> {
```
I bet you can make this even faster by making it mutate `lhs` rather than make a new one:

```rust
pub(crate) fn scalar_add(lhs: &mut ScalarValue, rhs: &ScalarValue) -> Result<()> {
```
Filed a tracking ticket
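The suggested mutating variant could look roughly like this (a hedged sketch with a hypothetical `Scalar` stand-in for `ScalarValue`, which lives in `datafusion-common`):

```rust
#[derive(Debug, PartialEq)]
enum Scalar {
    Int64(Option<i64>),
}

// Writing the sum back into lhs avoids constructing a fresh enum value
// (and any allocation it carries) on every merge step.
fn scalar_add_in_place(lhs: &mut Scalar, rhs: &Scalar) {
    match (lhs, rhs) {
        (Scalar::Int64(a), Scalar::Int64(b)) => {
            *a = match (*a, *b) {
                (Some(x), Some(y)) => Some(x.wrapping_add(y)),
                _ => None,
            };
        }
    }
}
```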
```rust
pub(crate) fn precision_add(
    lhs: &Precision<ScalarValue>,
    rhs: &Precision<ScalarValue>,
) -> Precision<ScalarValue> {
```
Ditto here -- reusing lhs is probably even faster
```rust
//!
//! Provides a cheap pairwise [`ScalarValue`] addition that directly
//! extracts inner primitive values, avoiding the expensive
//! `ScalarValue::add` path (which round-trips through Arrow arrays).
```
Why not just add this special case to ScalarValue::add ?
🤖: Benchmark completed
Thanks again @jonathanc-n @Dandandan and @asolimando
## Which issue does this PR close?

- Closes apache#15809.

## Rationale for this change

## What changes are included in this PR?

Vectorize aggregations for combining statistics by gathering all values then calling kernels once.

## Are these changes tested?

Unit tests + existing tests

## Are there any user-facing changes?

Removed `merge_iter`
(#20865)

## Which issue does this PR close?

- Closes #20826.

## Rationale for this change

As discussed in the review thread on #20768 and tracked by #20826, `sum_value` should not keep narrow integer column types during stats aggregation, because merge/multiply paths can overflow before values are widened.

## What changes are included in this PR?

This PR updates statistics `sum_value` arithmetic to match SUM-style widening for small integer types, and applies that behavior consistently across merge and multiplication paths.

## Are these changes tested?

Yes

## Are there any user-facing changes?
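The widening itself amounts to this (a hedged sketch on plain integers; the real change routes through `ScalarValue` arithmetic):

```rust
// Accumulate an i8 column in i64, the way SUM widens small integer
// types, so the running statistics sum cannot overflow early.
fn sum_widened(values: &[i8]) -> i64 {
    values.iter().map(|&v| i64::from(v)).sum()
}
```

For example, summing four `i8::MAX` values yields 508, which would already have overflowed `i8`.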
## Which issue does this PR close?

Follow up of #20768.

## Rationale for this change

`Precision::min/max` allocates a lot of new `ScalarValue`s, and it can be done in place. While running the `sql_planner` benchmark, it seems like for clickbench `Statistics::try_merge_iter` is a significant part of the runtime, and this PR improves that part by about 20-25% locally.

## What changes are included in this PR?

Introduces a couple of new internal functions to calculate the min/max of a `Precision` in-place.

## Are these changes tested?

Existing general tests, and a few new unit tests.

## Are there any user-facing changes?

None

---------

Signed-off-by: Adam Gutglick <adamgsal@gmail.com>
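The in-place min idea can be sketched like this (a hypothetical simplified `Precision` over `i64`; the real type is generic and lives in `datafusion-common`):

```rust
#[derive(Debug, Clone, PartialEq)]
enum Precision<T> {
    Exact(T),
    Inexact(T),
    Absent,
}

// Fold the minimum into the accumulator instead of allocating a new
// Precision per merged pair; exactness downgrades if either side is
// inexact, and is lost entirely if either side is absent.
fn min_in_place(acc: &mut Precision<i64>, other: &Precision<i64>) {
    use Precision::*;
    *acc = match (&*acc, other) {
        (Exact(a), Exact(b)) => Exact(*a.min(b)),
        (Exact(a) | Inexact(a), Exact(b) | Inexact(b)) => Inexact(*a.min(b)),
        _ => Absent,
    };
}
```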