      throw new ISE("Aggregator state exceeds 2 GB; cardinality too high for HeapVectorGrouper");
    }
    int newCapacity = aggStateBuffer.capacity();
    while (newCapacity < neededCapacity) {
  private void growBuffer(final long neededCapacity)
  {
    if (neededCapacity > Integer.MAX_VALUE) {
We probably want to make this limit configurable.
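One way to read this suggestion is sketched below. It is a minimal, self-contained illustration, not the PR's code: the class name `GrowableAggBuffer`, the `maxCapacityBytes` constructor parameter, and the copy strategy are all assumptions made for the example. The cap is clamped to `Integer.MAX_VALUE` because a heap `ByteBuffer` cannot be larger than that.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for the grouper's buffer handling; names are illustrative only.
final class GrowableAggBuffer
{
  // Assumed to come from configuration (for example, a query context value).
  private final long maxCapacityBytes;
  private ByteBuffer aggStateBuffer;

  GrowableAggBuffer(final int initialCapacity, final long maxCapacityBytes)
  {
    if (initialCapacity <= 0 || maxCapacityBytes <= 0) {
      throw new IllegalArgumentException("Capacities must be positive");
    }
    // A heap ByteBuffer is indexed by int, so the configured cap cannot exceed 2 GB.
    this.maxCapacityBytes = Math.min(maxCapacityBytes, Integer.MAX_VALUE);
    this.aggStateBuffer = ByteBuffer.allocate(initialCapacity);
  }

  void growBuffer(final long neededCapacity)
  {
    if (neededCapacity > maxCapacityBytes) {
      throw new IllegalStateException(
          "Aggregator state needs " + neededCapacity + " bytes; configured limit is " + maxCapacityBytes
      );
    }
    // Use long arithmetic so that doubling cannot overflow an int.
    long newCapacity = aggStateBuffer.capacity();
    while (newCapacity < neededCapacity) {
      newCapacity = Math.min(newCapacity * 2, maxCapacityBytes);
    }
    if (newCapacity == aggStateBuffer.capacity()) {
      return; // Already large enough.
    }
    final ByteBuffer newBuffer = ByteBuffer.allocate((int) newCapacity);
    // Copy the full contents of the old buffer, independent of its current position.
    final ByteBuffer source = aggStateBuffer.duplicate();
    source.clear();
    newBuffer.put(source);
    newBuffer.clear();
    aggStateBuffer = newBuffer;
  }

  int capacity()
  {
    return aggStateBuffer.capacity();
  }
}
```

With a configurable cap, operators could fail fast on high-cardinality queries well before the heap is exhausted, rather than only at the hard 2 GB boundary.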
      }
    }

    return true;
[P1] canVectorize admits unsupported object/COMPLEX dimensions.
VectorTopNEngine.canVectorize only filters out decorated specs, arrays, and multi-value columns, then returns true for any remaining type whose output type matches the column capabilities. But TopNVectorColumnProcessorFactory.makeObjectProcessor only handles STRING object selectors and throws for every other object/COMPLEX type. That means a query using a DefaultDimensionSpec over a nested/COMPLEX column can be marked vectorizable here and then fail at runtime in makeObjectProcessor instead of falling back to the row path. canVectorize should reject non-STRING object/COMPLEX dimensions up front so capability checks match actual factory support.
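The up-front rejection described in this comment could look roughly like the sketch below. It deliberately uses a simplified, hypothetical model (`ValueType`, `DimensionInfo`, `canVectorizeDimension`) rather than Druid's real capability classes, so it only illustrates the shape of the check: allow-list the types the processor factory is known to support and reject everything else.

```java
// Hypothetical, simplified model of the capability check; not Druid's actual API.
final class VectorizeCheckSketch
{
  enum ValueType { STRING, LONG, FLOAT, DOUBLE, COMPLEX, ARRAY }

  static final class DimensionInfo
  {
    final ValueType type;
    final boolean multiValue;
    final boolean decorated;

    DimensionInfo(final ValueType type, final boolean multiValue, final boolean decorated)
    {
      this.type = type;
      this.multiValue = multiValue;
      this.decorated = decorated;
    }
  }

  static boolean canVectorizeDimension(final DimensionInfo dimension)
  {
    // Existing exclusions named in the review: decorated specs and multi-value columns.
    if (dimension.decorated || dimension.multiValue) {
      return false;
    }
    // Allow-list only the types the vector processor factory is assumed to handle.
    switch (dimension.type) {
      case STRING:
      case LONG:
      case FLOAT:
      case DOUBLE:
        return true;
      default:
        // COMPLEX, ARRAY, and any future type fall back to the row-based path.
        return false;
    }
  }
}
```

An allow-list has the advantage that a newly introduced column type is non-vectorizable by default, so the capability check cannot silently drift ahead of what the factory supports.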
      return Sequences.filter(
          VectorTopNEngine.process(query, timeBoundaryInspector, cursorHolder, bufHolder.get()),
          Predicates.notNull()
      ).withBaggage(resourceCloser);
[P2] Vectorized TopN bypasses existing query metrics reporting.
The new early return into VectorTopNEngine.process skips the row-path bookkeeping that reports TopN metrics today. In the non-vector path this method records queryMetrics.cursor(...), then getMapFn records dimensionCardinality(...) and algorithm selection, and TopNMapFn records selector and pass-size metrics. None of that runs when shouldVectorize is true, so enabling vectorization changes the emitted TopN metrics and removes operational visibility into algorithm choice and cardinality. If that loss is intentional, it should be called out explicitly; otherwise this is a regression and the metrics should be wired back in.
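One possible shape for restoring the metrics is to route both paths through a shared reporting step before they diverge. The sketch below is illustrative only: `RecordingMetrics`, `reportCommon`, and `run` are hypothetical names, and the recorded strings merely stand in for the cursor, cardinality, and algorithm metrics mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; these types stand in for the real query metrics plumbing.
final class TopNMetricsSketch
{
  static final class RecordingMetrics
  {
    final List<String> events = new ArrayList<>();

    void record(final String event)
    {
      events.add(event);
    }
  }

  // Shared bookkeeping that both the row path and the vector path call.
  static void reportCommon(final RecordingMetrics metrics, final int dimensionCardinality, final String algorithm)
  {
    metrics.record("cursor");
    metrics.record("dimensionCardinality=" + dimensionCardinality);
    metrics.record("algorithm=" + algorithm);
  }

  static String run(final RecordingMetrics metrics, final boolean shouldVectorize, final int dimensionCardinality)
  {
    if (shouldVectorize) {
      // Report before the early return so the vector path stays observable.
      reportCommon(metrics, dimensionCardinality, "vectorized");
      return "vector";
    }
    reportCommon(metrics, dimensionCardinality, "row");
    return "row";
  }
}
```

Keeping the reporting in one helper means a metric added later is emitted on both paths unless someone deliberately opts out.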
Description
Vectorizes the TopN native query engine. More work is needed to make TopN spill like GroupBy, so that it can use a more efficient, fixed-size, off-heap buffer to back the aggregation state. For now, this uses a heap-backed grouper similar to HashVectorGrouper in the GroupBy engine. I'm also not sure about the state of things in Dart/MSQe. Speedups are roughly 20% across the board.
Benchmarks
Run on my Apple M3 Pro (12 physical cores, 18 GB memory).
Release note
Vectorize TopN native query engine.
This PR has: