perf: vectorize topN native engine by jtuglu1 · Pull Request #19353 · apache/druid

jtuglu1 · 2026-04-18T03:25:15Z

Description

Vectorizes the TopN native query engine. More work is needed to make TopN spill like GroupBy, so that it can back aggregation state with a more efficient, fixed-size, off-heap buffer. For now, I'm using a heap-backed grouper similar to HashVectorGrouper in the GroupBy engine. I'm also not sure about the state of things in Dart/MSQe. Speedups are roughly 20% in most of the benchmarks below; a few hour-granularity cases on a single queryable index are flat.

Benchmarks

Run on my Apple M3 Pro (12 physical cores, 18 GB memory).

Benchmark                                  (indexType)  (numSegments)  (queryGranularity)  (rowsPerSegment)        (schemaAndQuery)  (threshold)  (vectorize)  Mode  Cnt        Score        Error  Units
TopNBenchmark.queryMultiQueryableIndex             N/A              1                 all            750000                 basic.A           10        false  avgt    5    69122.980 ±   4462.770  us/op
TopNBenchmark.queryMultiQueryableIndex             N/A              1                 all            750000                 basic.A           10        force  avgt    5    54046.641 ±   4367.487  us/op
TopNBenchmark.queryMultiQueryableIndex             N/A              1                 all            750000       basic.numericSort           10        false  avgt    5   103399.888 ±  44324.233  us/op
TopNBenchmark.queryMultiQueryableIndex             N/A              1                 all            750000       basic.numericSort           10        force  avgt    5    87322.720 ±  23678.746  us/op
TopNBenchmark.queryMultiQueryableIndex             N/A              1                 all            750000  basic.alphanumericSort           10        false  avgt    5    75924.740 ±  14120.944  us/op
TopNBenchmark.queryMultiQueryableIndex             N/A              1                 all            750000  basic.alphanumericSort           10        force  avgt    5    66410.182 ±  11979.505  us/op
TopNBenchmark.queryMultiQueryableIndex             N/A              1                hour            750000                 basic.A           10        false  avgt    5   133947.144 ±  50106.898  us/op
TopNBenchmark.queryMultiQueryableIndex             N/A              1                hour            750000                 basic.A           10        force  avgt    5    96217.492 ±   8871.746  us/op
TopNBenchmark.queryMultiQueryableIndex             N/A              1                hour            750000       basic.numericSort           10        false  avgt    5   557642.770 ± 314507.572  us/op
TopNBenchmark.queryMultiQueryableIndex             N/A              1                hour            750000       basic.numericSort           10        force  avgt    5   389809.531 ±  33972.861  us/op
TopNBenchmark.queryMultiQueryableIndex             N/A              1                hour            750000  basic.alphanumericSort           10        false  avgt    5   435965.653 ± 137802.112  us/op
TopNBenchmark.queryMultiQueryableIndex             N/A              1                hour            750000  basic.alphanumericSort           10        force  avgt    5   386555.540 ±   6790.380  us/op
TopNBenchmark.querySingleIncrementalIndex       onheap            N/A                 all            750000                 basic.A           10        false  avgt    5  2082680.574 ± 656078.762  us/op
TopNBenchmark.querySingleIncrementalIndex       onheap            N/A                 all            750000       basic.numericSort           10        false  avgt    5   165785.912 ±  22669.944  us/op
TopNBenchmark.querySingleIncrementalIndex       onheap            N/A                 all            750000  basic.alphanumericSort           10        false  avgt    5   155416.663 ±  77528.414  us/op
TopNBenchmark.querySingleIncrementalIndex       onheap            N/A                hour            750000                 basic.A           10        false  avgt    5  2102892.546 ± 634413.985  us/op
TopNBenchmark.querySingleIncrementalIndex       onheap            N/A                hour            750000       basic.numericSort           10        false  avgt    5   302456.169 ±  88204.908  us/op
TopNBenchmark.querySingleIncrementalIndex       onheap            N/A                hour            750000  basic.alphanumericSort           10        false  avgt    5   347785.978 ± 247907.083  us/op
TopNBenchmark.querySingleQueryableIndex            N/A              1                 all            750000                 basic.A           10        false  avgt    5    78131.986 ±  12065.964  us/op
TopNBenchmark.querySingleQueryableIndex            N/A              1                 all            750000                 basic.A           10        force  avgt    5    56115.802 ±   4901.144  us/op
TopNBenchmark.querySingleQueryableIndex            N/A              1                 all            750000       basic.numericSort           10        false  avgt    5    44441.229 ±  17818.964  us/op
TopNBenchmark.querySingleQueryableIndex            N/A              1                 all            750000       basic.numericSort           10        force  avgt    5    29320.550 ±   2044.562  us/op
TopNBenchmark.querySingleQueryableIndex            N/A              1                 all            750000  basic.alphanumericSort           10        false  avgt    5    40142.959 ±  20420.132  us/op
TopNBenchmark.querySingleQueryableIndex            N/A              1                 all            750000  basic.alphanumericSort           10        force  avgt    5    32129.217 ±   2168.716  us/op
TopNBenchmark.querySingleQueryableIndex            N/A              1                hour            750000                 basic.A           10        false  avgt    5    86430.593 ±   8643.281  us/op
TopNBenchmark.querySingleQueryableIndex            N/A              1                hour            750000                 basic.A           10        force  avgt    5    78887.059 ±  31017.995  us/op
TopNBenchmark.querySingleQueryableIndex            N/A              1                hour            750000       basic.numericSort           10        false  avgt    5   132638.263 ±  11458.490  us/op
TopNBenchmark.querySingleQueryableIndex            N/A              1                hour            750000       basic.numericSort           10        force  avgt    5   132583.761 ±  24221.660  us/op
TopNBenchmark.querySingleQueryableIndex            N/A              1                hour            750000  basic.alphanumericSort           10        false  avgt    5   180629.884 ±  59802.573  us/op
TopNBenchmark.querySingleQueryableIndex            N/A              1                hour            750000  basic.alphanumericSort           10        force  avgt    5   181480.345 ±   6770.964  us/op
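To read the table, the relative improvement for a configuration can be computed as (false score - force score) / false score. The sketch below is only an illustration of that arithmetic; the class and method names are hypothetical, and the two input pairs are copied from the rows above. Given the wide error bars on several rows, the resulting percentages should be treated as rough.

```java
// Hypothetical helper for reading the JMH table above; not part of the PR.
final class SpeedupSketch
{
  // Percentage reduction in average time per op when moving from the
  // non-vectorized score to the vectorized score (both in us/op).
  static double speedupPercent(final double baselineUs, final double vectorizedUs)
  {
    return (baselineUs - vectorizedUs) / baselineUs * 100.0;
  }

  public static void main(final String[] args)
  {
    // queryMultiQueryableIndex, granularity "all", basic.A
    System.out.printf("all/basic.A: %.1f%%%n", speedupPercent(69122.980, 54046.641));
    // querySingleQueryableIndex, granularity "hour", basic.numericSort
    System.out.printf("hour/numericSort: %.1f%%%n", speedupPercent(132638.263, 132583.761));
  }
}
```

The first pair works out to roughly 22%, while the second is effectively flat.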

Release note

Vectorize TopN native query engine.

This PR has:

+      throw new ISE("Aggregator state exceeds 2 GB; cardinality too high for HeapVectorGrouper");
+    }
+    int newCapacity = aggStateBuffer.capacity();
+    while (newCapacity < neededCapacity) {


jtuglu1 · 2026-04-21T06:30:47Z

+
+  private void growBuffer(final long neededCapacity)
+  {
+    if (neededCapacity > Integer.MAX_VALUE) {


We probably want to make this limit configurable.
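A minimal sketch of what a configurable limit could look like, assuming the cap is passed in at construction time (for example from query context or runtime properties; where it actually comes from is not decided here). GrowableAggBuffer and maxCapacityBytes are hypothetical names for illustration, not classes or fields from this PR.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for the aggregation-state buffer; not Druid code.
final class GrowableAggBuffer
{
  private final long maxCapacityBytes; // assumed to be supplied by configuration
  private ByteBuffer buffer;

  GrowableAggBuffer(final int initialCapacity, final long maxCapacityBytes)
  {
    if (initialCapacity <= 0) {
      throw new IllegalArgumentException("initialCapacity must be positive");
    }
    // A heap ByteBuffer cannot exceed Integer.MAX_VALUE bytes, so clamp the configured limit.
    this.maxCapacityBytes = Math.min(maxCapacityBytes, Integer.MAX_VALUE);
    this.buffer = ByteBuffer.allocate(initialCapacity);
  }

  void ensureCapacity(final long neededCapacity)
  {
    if (neededCapacity > maxCapacityBytes) {
      throw new IllegalStateException(
          "Aggregator state needs " + neededCapacity + " bytes; configured limit is " + maxCapacityBytes
      );
    }
    // Double until large enough, never going past the configured limit.
    long newCapacity = buffer.capacity();
    while (newCapacity < neededCapacity) {
      newCapacity = Math.min(newCapacity * 2, maxCapacityBytes);
    }
    if (newCapacity > buffer.capacity()) {
      final ByteBuffer grown = ByteBuffer.allocate((int) newCapacity);
      final ByteBuffer src = buffer.duplicate();
      src.clear(); // position 0, limit = capacity, so the whole old buffer is copied
      grown.put(src);
      grown.clear();
      buffer = grown;
    }
  }

  ByteBuffer buffer()
  {
    return buffer;
  }

  int capacity()
  {
    return buffer.capacity();
  }
}
```

With a limit below 2 GB, a high-cardinality query would fail with a clear error before the heap is exhausted, rather than only at the hard ByteBuffer ceiling.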

FrankChen021 · 2026-04-23T08:44:16Z

+      }
+    }
+
+    return true;


[P1] canVectorize admits unsupported object/COMPLEX dimensions.

VectorTopNEngine.canVectorize only filters out decorated specs, arrays, and multi-value columns, then returns true for any remaining type whose output type matches the column capabilities. But TopNVectorColumnProcessorFactory.makeObjectProcessor only handles STRING object selectors and throws for every other object/COMPLEX type. That means a query using a DefaultDimensionSpec over a nested/COMPLEX column can be marked vectorizable here and then fail at runtime in makeObjectProcessor instead of falling back to the row path. canVectorize should reject non-STRING object/COMPLEX dimensions up front so capability checks match actual factory support.
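The shape of the suggested check can be sketched as follows. This is a simplified illustration under stated assumptions: SketchValueType and SketchDimension are hypothetical stand-ins for Druid's column type and capabilities, and the real canVectorize would inspect the actual dimension spec and column capabilities instead.

```java
// Hypothetical, simplified stand-ins; not Druid's ColumnType or ColumnCapabilities.
enum SketchValueType { STRING, LONG, FLOAT, DOUBLE, ARRAY, COMPLEX }

final class SketchDimension
{
  final SketchValueType type;
  final boolean multiValue;
  final boolean decorated;

  SketchDimension(final SketchValueType type, final boolean multiValue, final boolean decorated)
  {
    this.type = type;
    this.multiValue = multiValue;
    this.decorated = decorated;
  }
}

final class CanVectorizeSketch
{
  // Accept only the types the processor factory is known to handle;
  // everything else falls back to the row-based path.
  static boolean canVectorize(final SketchDimension dim)
  {
    if (dim.decorated || dim.multiValue) {
      return false;
    }
    switch (dim.type) {
      case STRING:
      case LONG:
      case FLOAT:
      case DOUBLE:
        return true;
      default:
        // ARRAY, COMPLEX, and any future object type are rejected up front.
        return false;
    }
  }
}
```

Using an allow-list rather than a deny-list keeps the capability check aligned with factory support if new object types are added later.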

FrankChen021 · 2026-04-23T08:44:16Z

+          return Sequences.filter(
+              VectorTopNEngine.process(query, timeBoundaryInspector, cursorHolder, bufHolder.get()),
+              Predicates.notNull()
+          ).withBaggage(resourceCloser);


[P2] Vectorized TopN bypasses existing query metrics reporting.

The new early return into VectorTopNEngine.process skips the row-path bookkeeping that reports TopN metrics today. In the non-vector path, this method records queryMetrics.cursor(...), then getMapFn records dimensionCardinality(...) and the algorithm selection, and TopNMapFn records selector and pass-size metrics. None of that runs when shouldVectorize is true, so enabling vectorization changes the emitted TopN metrics and removes operational visibility into algorithm choice and cardinality. If these metrics are still expected on the vectorized path, they should be wired back explicitly; as it stands, this is a regression.
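One possible shape for a fix is to record the shared metrics before branching, so both paths emit them. The sketch below is illustrative only: SketchTopNMetrics, RecordingMetrics, and EngineSketch are hypothetical names, and the method signatures are simplified assumptions rather than the real TopN metrics API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified metrics interface; not Druid's actual API.
interface SketchTopNMetrics
{
  void cursor();

  void dimensionCardinality(int cardinality);

  void algorithm(String name);
}

// Test double that remembers which metrics were reported, in order.
final class RecordingMetrics implements SketchTopNMetrics
{
  final List<String> events = new ArrayList<>();

  @Override
  public void cursor()
  {
    events.add("cursor");
  }

  @Override
  public void dimensionCardinality(final int cardinality)
  {
    events.add("cardinality=" + cardinality);
  }

  @Override
  public void algorithm(final String name)
  {
    events.add("algorithm=" + name);
  }
}

final class EngineSketch
{
  // Shared bookkeeping happens before the vectorized early return,
  // so neither path can skip it.
  static String run(final boolean shouldVectorize, final int cardinality, final SketchTopNMetrics metrics)
  {
    metrics.cursor();
    metrics.dimensionCardinality(cardinality);
    if (shouldVectorize) {
      metrics.algorithm("vectorized");
      return "vector";
    }
    metrics.algorithm("row");
    return "row";
  }
}
```

A test along these lines, asserting that the same metric names appear with and without vectorization, would also guard against the two paths drifting apart again.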

jtuglu1 force-pushed the vectorize-top-n-engine branch from 49c7b37 to 40d0c26 Compare April 18, 2026 03:29

github-advanced-security AI found potential problems Apr 18, 2026

Comment thread processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/HeapVectorGrouper.java

jtuglu1 added the Performance label Apr 18, 2026

jtuglu1 force-pushed the vectorize-top-n-engine branch from 40d0c26 to 16b8c6e Compare April 18, 2026 05:07

github-actions Bot added the Area - Querying label Apr 18, 2026

jtuglu1 mentioned this pull request Apr 18, 2026

Query Performance Degradation with Null Handling #19344

Open

jtuglu1 force-pushed the vectorize-top-n-engine branch from 16b8c6e to 8b4c736 Compare April 20, 2026 06:33

github-advanced-security AI found potential problems Apr 20, 2026

Comment thread processing/src/test/java/org/apache/druid/query/topn/vector/TopNVectorColumnSelectorTest.java Dismissed

Comment thread processing/src/test/java/org/apache/druid/query/topn/vector/TopNVectorColumnSelectorTest.java Dismissed

jtuglu1 requested review from clintropolis and gianm April 20, 2026 17:59

jtuglu1 commented Apr 21, 2026

jtuglu1 force-pushed the vectorize-top-n-engine branch from 8b4c736 to 05c35ff Compare April 21, 2026 21:11

perf: vectorize topN native engine

5b1b1c5

jtuglu1 force-pushed the vectorize-top-n-engine branch from 05c35ff to 5b1b1c5 Compare April 21, 2026 23:12

FrankChen021 reviewed Apr 23, 2026

jtuglu1 wants to merge 1 commit into apache:master from jtuglu1:vectorize-top-n-engine
