perf: vectorize topN native engine #19353

Draft
jtuglu1 wants to merge 1 commit into apache:master from jtuglu1:vectorize-top-n-engine

@jtuglu1 (Contributor) commented Apr 18, 2026

Description

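Vectorizes the TopN native query engine. More work is needed to make TopN spill like GroupBy does, so that it can use a more efficient, fixed-size, off-heap buffer to back the aggregation memory. For now, I'm using a heap-backed grouper similar to HashVectorGrouper in the GroupBy engine. I'm also not sure about the state of things in Dart/MSQe. In the benchmarks below, average time per op drops by about 17.5% across the twelve vectorize=false/force pairs for the QueryableIndex benchmarks (roughly 9% to 34% for ten of them), while the hour-granularity numericSort and alphanumericSort queries on a single QueryableIndex show no measurable change.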
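To illustrate what vectorized aggregation into a heap-backed grouper means here, the following is a minimal Java sketch. It is not the PR's HeapVectorGrouper or Druid's HashVectorGrouper; it assumes the dimension is already dictionary-encoded into int ids and that the only aggregator is a long sum. The class and method names (HeapTopNSketch, aggregateVector, topN) are hypothetical. Each call consumes one vector (batch) of rows, and the top N groups are selected at the end with a bounded min-heap.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical sketch: heap-backed grouper fed one vector of rows per call. */
final class HeapTopNSketch
{
  // Maps a dictionary id to its slot in the per-group state arrays.
  private final Map<Integer, Integer> slotByDictId = new HashMap<>();
  private int[] dictIdBySlot = new int[16];
  private long[] sums = new long[16];
  private int numSlots = 0;

  /** Aggregates rows [0, numRows): dictIds[i] is the dimension value, metrics[i] is added to its sum. */
  void aggregateVector(int[] dictIds, long[] metrics, int numRows)
  {
    for (int i = 0; i < numRows; i++) {
      Integer slot = slotByDictId.get(dictIds[i]);
      if (slot == null) {
        slot = newSlot(dictIds[i]);
      }
      sums[slot] += metrics[i];
    }
  }

  private int newSlot(int dictId)
  {
    if (numSlots == sums.length) {
      // On-heap state simply grows by doubling; there is no fixed-size buffer and no spilling.
      sums = Arrays.copyOf(sums, numSlots * 2);
      dictIdBySlot = Arrays.copyOf(dictIdBySlot, numSlots * 2);
    }
    int slot = numSlots++;
    slotByDictId.put(dictId, slot);
    dictIdBySlot[slot] = dictId;
    return slot;
  }

  /** Returns up to n dictionary ids ordered by descending sum. */
  List<Integer> topN(int n)
  {
    // Min-heap of slots keyed by sum: the smallest retained sum is evicted first.
    PriorityQueue<Integer> heap = new PriorityQueue<>((a, b) -> Long.compare(sums[a], sums[b]));
    for (int slot = 0; slot < numSlots; slot++) {
      heap.offer(slot);
      if (heap.size() > n) {
        heap.poll();
      }
    }
    List<Integer> result = new ArrayList<>();
    while (!heap.isEmpty()) {
      result.add(dictIdBySlot[heap.poll()]);
    }
    Collections.reverse(result);
    return result;
  }
}
```

Because the state lives in growable on-heap arrays, memory use grows with cardinality; a fixed-size off-heap buffer would bound memory instead, at the cost of needing a spill path.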

Benchmarks

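Run on my Apple M3 Pro (12 physical cores, 18 GB of memory). Scores are JMH average time per operation (avgt) in microseconds (us/op); lower is better.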

Benchmark                                  (indexType)  (numSegments)  (queryGranularity)  (rowsPerSegment)        (schemaAndQuery)  (threshold)  (vectorize)  Mode  Cnt        Score        Error  Units
TopNBenchmark.queryMultiQueryableIndex             N/A              1                 all            750000                 basic.A           10        false  avgt    5    69122.980 ±   4462.770  us/op
TopNBenchmark.queryMultiQueryableIndex             N/A              1                 all            750000                 basic.A           10        force  avgt    5    54046.641 ±   4367.487  us/op
TopNBenchmark.queryMultiQueryableIndex             N/A              1                 all            750000       basic.numericSort           10        false  avgt    5   103399.888 ±  44324.233  us/op
TopNBenchmark.queryMultiQueryableIndex             N/A              1                 all            750000       basic.numericSort           10        force  avgt    5    87322.720 ±  23678.746  us/op
TopNBenchmark.queryMultiQueryableIndex             N/A              1                 all            750000  basic.alphanumericSort           10        false  avgt    5    75924.740 ±  14120.944  us/op
TopNBenchmark.queryMultiQueryableIndex             N/A              1                 all            750000  basic.alphanumericSort           10        force  avgt    5    66410.182 ±  11979.505  us/op
TopNBenchmark.queryMultiQueryableIndex             N/A              1                hour            750000                 basic.A           10        false  avgt    5   133947.144 ±  50106.898  us/op
TopNBenchmark.queryMultiQueryableIndex             N/A              1                hour            750000                 basic.A           10        force  avgt    5    96217.492 ±   8871.746  us/op
TopNBenchmark.queryMultiQueryableIndex             N/A              1                hour            750000       basic.numericSort           10        false  avgt    5   557642.770 ± 314507.572  us/op
TopNBenchmark.queryMultiQueryableIndex             N/A              1                hour            750000       basic.numericSort           10        force  avgt    5   389809.531 ±  33972.861  us/op
TopNBenchmark.queryMultiQueryableIndex             N/A              1                hour            750000  basic.alphanumericSort           10        false  avgt    5   435965.653 ± 137802.112  us/op
TopNBenchmark.queryMultiQueryableIndex             N/A              1                hour            750000  basic.alphanumericSort           10        force  avgt    5   386555.540 ±   6790.380  us/op
TopNBenchmark.querySingleIncrementalIndex       onheap            N/A                 all            750000                 basic.A           10        false  avgt    5  2082680.574 ± 656078.762  us/op
TopNBenchmark.querySingleIncrementalIndex       onheap            N/A                 all            750000       basic.numericSort           10        false  avgt    5   165785.912 ±  22669.944  us/op
TopNBenchmark.querySingleIncrementalIndex       onheap            N/A                 all            750000  basic.alphanumericSort           10        false  avgt    5   155416.663 ±  77528.414  us/op
TopNBenchmark.querySingleIncrementalIndex       onheap            N/A                hour            750000                 basic.A           10        false  avgt    5  2102892.546 ± 634413.985  us/op
TopNBenchmark.querySingleIncrementalIndex       onheap            N/A                hour            750000       basic.numericSort           10        false  avgt    5   302456.169 ±  88204.908  us/op
TopNBenchmark.querySingleIncrementalIndex       onheap            N/A                hour            750000  basic.alphanumericSort           10        false  avgt    5   347785.978 ± 247907.083  us/op
TopNBenchmark.querySingleQueryableIndex            N/A              1                 all            750000                 basic.A           10        false  avgt    5    78131.986 ±  12065.964  us/op
TopNBenchmark.querySingleQueryableIndex            N/A              1                 all            750000                 basic.A           10        force  avgt    5    56115.802 ±   4901.144  us/op
TopNBenchmark.querySingleQueryableIndex            N/A              1                 all            750000       basic.numericSort           10        false  avgt    5    44441.229 ±  17818.964  us/op
TopNBenchmark.querySingleQueryableIndex            N/A              1                 all            750000       basic.numericSort           10        force  avgt    5    29320.550 ±   2044.562  us/op
TopNBenchmark.querySingleQueryableIndex            N/A              1                 all            750000  basic.alphanumericSort           10        false  avgt    5    40142.959 ±  20420.132  us/op
TopNBenchmark.querySingleQueryableIndex            N/A              1                 all            750000  basic.alphanumericSort           10        force  avgt    5    32129.217 ±   2168.716  us/op
TopNBenchmark.querySingleQueryableIndex            N/A              1                hour            750000                 basic.A           10        false  avgt    5    86430.593 ±   8643.281  us/op
TopNBenchmark.querySingleQueryableIndex            N/A              1                hour            750000                 basic.A           10        force  avgt    5    78887.059 ±  31017.995  us/op
TopNBenchmark.querySingleQueryableIndex            N/A              1                hour            750000       basic.numericSort           10        false  avgt    5   132638.263 ±  11458.490  us/op
TopNBenchmark.querySingleQueryableIndex            N/A              1                hour            750000       basic.numericSort           10        force  avgt    5   132583.761 ±  24221.660  us/op
TopNBenchmark.querySingleQueryableIndex            N/A              1                hour            750000  basic.alphanumericSort           10        false  avgt    5   180629.884 ±  59802.573  us/op
TopNBenchmark.querySingleQueryableIndex            N/A              1                hour            750000  basic.alphanumericSort           10        force  avgt    5   181480.345 ±   6770.964  us/op
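The summary figures in the description can be reproduced from this table with a small helper like the one below (a sketch; the class name and the row labels are made up). It pairs each vectorize=false row with its vectorize=force counterpart for the two QueryableIndex benchmarks (querySingleIncrementalIndex has no force rows), computes (baseline - vectorized) / baseline, and averages the twelve pairs. It ignores the Error column, and several baseline runs have wide error bars, so the per-pair numbers are rough.

```java
import java.util.Locale;

/** Relative reduction in JMH average time per op between vectorize=false and vectorize=force runs. */
final class BenchmarkDelta
{
  // "multi" = queryMultiQueryableIndex, "single" = querySingleQueryableIndex.
  static final String[] LABELS = {
      "multi all A", "multi all numericSort", "multi all alphanumericSort",
      "multi hour A", "multi hour numericSort", "multi hour alphanumericSort",
      "single all A", "single all numericSort", "single all alphanumericSort",
      "single hour A", "single hour numericSort", "single hour alphanumericSort"
  };

  // Scores in us/op copied from the TopNBenchmark table: {vectorize=false, vectorize=force}.
  static final double[][] SCORES = {
      {69122.980, 54046.641}, {103399.888, 87322.720}, {75924.740, 66410.182},
      {133947.144, 96217.492}, {557642.770, 389809.531}, {435965.653, 386555.540},
      {78131.986, 56115.802}, {44441.229, 29320.550}, {40142.959, 32129.217},
      {86430.593, 78887.059}, {132638.263, 132583.761}, {180629.884, 181480.345}
  };

  /** (baseline - vectorized) / baseline; positive means the vectorized run was faster. */
  static double reduction(double baseline, double vectorized)
  {
    return (baseline - vectorized) / baseline;
  }

  /** Unweighted mean of the per-pair reductions. */
  static double meanReduction()
  {
    double total = 0;
    for (double[] pair : SCORES) {
      total += reduction(pair[0], pair[1]);
    }
    return total / SCORES.length;
  }

  public static void main(String[] args)
  {
    for (int i = 0; i < SCORES.length; i++) {
      System.out.printf(Locale.ROOT, "%-30s %6.1f%%%n", LABELS[i], 100 * reduction(SCORES[i][0], SCORES[i][1]));
    }
    System.out.printf(Locale.ROOT, "mean reduction: %.1f%%%n", 100 * meanReduction());
  }
}
```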

Release note

Vectorize the TopN native query engine.


This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

@jtuglu1 jtuglu1 force-pushed the vectorize-top-n-engine branch from 49c7b37 to 40d0c26 Compare April 18, 2026 03:29
@jtuglu1 jtuglu1 requested review from clintropolis and gianm April 20, 2026 17:59

    private void growBuffer(final long neededCapacity)
    {
      if (neededCapacity > Integer.MAX_VALUE) {
        throw new ISE("Aggregator state exceeds 2 GB; cardinality too high for HeapVectorGrouper");
      }
      int newCapacity = aggStateBuffer.capacity();
      while (newCapacity < neededCapacity) {

Review comment (Contributor Author):

probably want to make this limit configurable
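One way to act on this, sketched in plain Java under the assumption that aggregator state lives in a heap ByteBuffer addressed by absolute offsets: pass the maximum size in (for example from a query context parameter or server config, neither of which exists in the PR) and cap the doubling at that limit. GrowableAggStateBuffer and its constructor arguments are hypothetical, and IllegalStateException stands in for Druid's ISE.

```java
import java.nio.ByteBuffer;

/** Hypothetical growable aggregator-state buffer with a configurable size cap. */
final class GrowableAggStateBuffer
{
  private final long maxCapacityBytes;
  private ByteBuffer aggStateBuffer;

  GrowableAggStateBuffer(int initialCapacityBytes, long maxCapacityBytes)
  {
    // A heap ByteBuffer cannot exceed Integer.MAX_VALUE bytes, so the configured cap cannot either.
    if (initialCapacityBytes <= 0 || maxCapacityBytes > Integer.MAX_VALUE || initialCapacityBytes > maxCapacityBytes) {
      throw new IllegalArgumentException("invalid capacities");
    }
    this.maxCapacityBytes = maxCapacityBytes;
    this.aggStateBuffer = ByteBuffer.allocate(initialCapacityBytes);
  }

  ByteBuffer buffer()
  {
    return aggStateBuffer;
  }

  /** Ensures capacity of at least neededCapacity bytes, doubling but never beyond the configured cap. */
  void growBuffer(long neededCapacity)
  {
    if (neededCapacity > maxCapacityBytes) {
      throw new IllegalStateException(
          "Aggregator state needs " + neededCapacity + " bytes, above the configured limit of " + maxCapacityBytes);
    }
    long newCapacity = aggStateBuffer.capacity();
    while (newCapacity < neededCapacity) {
      // Computed in long to avoid int overflow; terminates because neededCapacity <= maxCapacityBytes.
      newCapacity = Math.min(newCapacity * 2, maxCapacityBytes);
    }
    if (newCapacity == aggStateBuffer.capacity()) {
      return;
    }
    ByteBuffer bigger = ByteBuffer.allocate((int) newCapacity);
    ByteBuffer old = aggStateBuffer.duplicate();
    old.clear();      // copy the whole old buffer; state is assumed to use absolute offsets only
    bigger.put(old);
    bigger.clear();   // reset position so absolute-offset access starts from a clean state
    aggStateBuffer = bigger;
  }
}
```

Failing with a message that names the configured limit also tells the operator which setting to raise, rather than only reporting that cardinality is too high.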

@jtuglu1 jtuglu1 force-pushed the vectorize-top-n-engine branch from 8b4c736 to 05c35ff Compare April 21, 2026 21:11
@jtuglu1 jtuglu1 force-pushed the vectorize-top-n-engine branch from 05c35ff to 5b1b1c5 Compare April 21, 2026 23:12
      }
    }

    return true;

Review comment (Member):

[P1] canVectorize admits unsupported object/COMPLEX dimensions.

VectorTopNEngine.canVectorize only filters out decorated specs, arrays, and multi-value columns, then returns true for any remaining type whose output type matches the column capabilities. But TopNVectorColumnProcessorFactory.makeObjectProcessor only handles STRING object selectors and throws for every other object/COMPLEX type. That means a query using a DefaultDimensionSpec over a nested/COMPLEX column can be marked vectorizable here and then fail at runtime in makeObjectProcessor instead of falling back to the row path. canVectorize should reject non-STRING object/COMPLEX dimensions up front so capability checks match actual factory support.
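A minimal sketch of the allowlist-style check this comment asks for. It uses a hypothetical, simplified DimType enum rather than Druid's real ColumnType/ColumnCapabilities API, and it assumes the vector processor factory handles STRING plus the primitive numeric types. Anything else, including nested/COMPLEX columns, returns false so the query falls back to the row-based engine instead of failing in makeObjectProcessor.

```java
/** Hypothetical, simplified column types; not Druid's ColumnType / ValueType API. */
enum DimType { STRING, LONG, FLOAT, DOUBLE, ARRAY, COMPLEX }

final class TopNVectorizeCheck
{
  /**
   * Returns true only for undecorated, single-valued dimensions whose output type matches the
   * column type and whose type is one the vector processor factory is assumed to support.
   */
  static boolean canVectorizeDimension(DimType columnType, DimType outputType, boolean decorated, boolean multiValue)
  {
    if (decorated || multiValue || columnType != outputType) {
      return false;
    }
    switch (columnType) {
      case STRING:
      case LONG:
      case FLOAT:
      case DOUBLE:
        return true;
      default:
        // ARRAY and COMPLEX (including nested columns) are rejected up front.
        return false;
    }
  }
}
```

An allowlist keeps the capability check and the factory in step: a newly added type is rejected by default until someone deliberately adds a processor for it, instead of passing the check and failing at runtime.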

    return Sequences.filter(
        VectorTopNEngine.process(query, timeBoundaryInspector, cursorHolder, bufHolder.get()),
        Predicates.notNull()
    ).withBaggage(resourceCloser);

Review comment (Member):

[P2] Vectorized TopN bypasses existing query metrics reporting.

The new early return into VectorTopNEngine.process skips the row-path bookkeeping that reports TopN metrics today. In the non-vector path this method records queryMetrics.cursor(...), then getMapFn records dimensionCardinality(...) and algorithm selection, and TopNMapFn records selector and pass-size metrics. None of that runs when shouldVectorize is true, so enabling vectorization changes emitted TopN metrics and removes operational visibility into algorithm choice and cardinality. If that loss is intended it should be wired back explicitly; otherwise this is a regression.
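If the metrics should be preserved, the vectorized branch could report them before delegating to the engine. The sketch below is hypothetical: TopNMetricsSink is a two-method stand-in, not Druid's TopNQueryMetrics, the "vectorized" algorithm label is an assumption, and cursor and selector metrics are omitted. It only shows the intended ordering: record cardinality and algorithm, then run the engine.

```java
import java.util.function.Supplier;

/** Hypothetical minimal metrics sink; Druid's real TopNQueryMetrics has a different, larger API. */
interface TopNMetricsSink
{
  void dimensionCardinality(int cardinality);

  void algorithm(String algorithmName);
}

final class VectorTopNMetricsSketch
{
  /** Reports row-path-style metrics, then runs the (vectorized) engine supplied by the caller. */
  static <T> T runWithMetrics(TopNMetricsSink metrics, int dimensionCardinality, Supplier<T> engine)
  {
    metrics.dimensionCardinality(dimensionCardinality);
    metrics.algorithm("vectorized"); // hypothetical label distinguishing the vector engine
    return engine.get();
  }
}
```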


3 participants