Reorder agg kernel dispatch, and have Combined use inner accumulators#7889
Conversation
Signed-off-by: Nicholas Gates <nick@nickgates.com>
Signed-off-by: Nicholas Gates <nick@nickgates.com>
Merging this PR will not alter performance
|
| if self.vtable.try_accumulate(&mut self.partial, batch, ctx)? { | ||
| return Ok(()); | ||
| } |
There was a problem hiding this comment.
we later want to move this also to the session?
There was a problem hiding this comment.
As in, we want even canonical dispatch to be via the registry?
Signed-off-by: Nicholas Gates <nick@nickgates.com>
There was a problem hiding this comment.
where did this go do we instead execute with columnar?
There was a problem hiding this comment.
We were always supposed to. This only existed I think because of the stats lookup
There was a problem hiding this comment.
|
|
||
| // 1. Kernel registry first: a registered `(encoding, aggregate_fn)` kernel is strictly | ||
| // more specific than the vtable's `try_accumulate` short-circuit. Checking the | ||
| // registry first gives kernels for `Combined<V>` aggregates a chance to fire — | ||
| // `Combined::try_accumulate` always returns true, so a later kernel check would be | ||
| // unreachable. | ||
| { | ||
| let kernels_r = kernels.read(); | ||
| let batch_id = batch.encoding_id(); | ||
| let kernel = kernels_r | ||
| .get(&(batch_id, Some(self.aggregate_fn.id()))) | ||
| .or_else(|| kernels_r.get(&(batch_id, None))) | ||
| .copied(); | ||
| drop(kernels_r); | ||
| if let Some(kernel) = kernel | ||
| && let Some(result) = kernel.aggregate(&self.aggregate_fn, batch, ctx)? | ||
| { | ||
| vortex_ensure!( | ||
| result.dtype() == &self.partial_dtype, | ||
| "Aggregate kernel returned {}, expected {}", | ||
| result.dtype(), | ||
| self.partial_dtype, | ||
| ); | ||
| self.vtable.combine_partials(&mut self.partial, result)?; | ||
| return Ok(()); | ||
| } | ||
| } | ||
|
|
||
| // 2. Allow the vtable to short-circuit on the raw array before decompression. | ||
| if self.vtable.try_accumulate(&mut self.partial, batch, ctx)? { | ||
| return Ok(()); | ||
| } |
There was a problem hiding this comment.
why is this whole block different from the block below?
Aggregate kernel dispatch previously tried the vtable before the plugin registry, meaning plugins could not intercept the aggregate function as they can with array execution.
This change also fixed Combined to use inner accumulators, instead of just inner partials. This fixes the kernel dispatch for the delegated aggregate functions. e.g. a custom Sum<->Dict kernel previously would not have been used for Combiner<->Dict, even though it delegates to Sum + Count.