Bug description
Description
When sort_by_metric is enabled on a Word Cloud chart, buildQuery.ts still appends a secondary ORDER BY on the series column (ascending) after the metric sort whenever a series is set. On Apache Druid, a multi-column ORDER BY prevents the native TopN query optimization, forcing a full GroupBy scan over the entire dataset before LIMIT is applied. On high-cardinality dimensions this can cause severe query slowdowns and timeouts.
Root Cause
In superset-frontend/plugins/plugin-chart-word-cloud/src/plugin/buildQuery.ts:
    if (sort_by_metric && metric) {
      orderby.push([metric, false]);
    }
    if (series) {
      orderby.push([series, true]); // ← always added, even when sort_by_metric is true
    }
When sort_by_metric=true, this generates:
ORDER BY term_count DESC, search_term ASC
Druid's TopN algorithm requires ordering by a single aggregate metric. The secondary dimension sort forces the full GroupBy execution path regardless of dataset size.
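To make the current behavior concrete, here is a minimal standalone sketch of the ordering logic. The function name currentOrderBy, the OrderByEntry type, and the reduced input shape are assumptions for illustration only; the real buildQuery.ts operates on the full form data and query object.

```typescript
// Hypothetical stand-in for the ordering logic quoted above (not the real module).
// Each entry is [column, isAscending], as in the buildQuery.ts snippet.
type OrderByEntry = [string, boolean];

interface WordCloudSortInput {
  metric?: string;
  series?: string;
  sort_by_metric?: boolean;
}

function currentOrderBy({
  metric,
  series,
  sort_by_metric,
}: WordCloudSortInput): OrderByEntry[] {
  const orderby: OrderByEntry[] = [];
  if (sort_by_metric && metric) {
    orderby.push([metric, false]); // metric DESC
  }
  if (series) {
    orderby.push([series, true]); // series ASC, added whether or not metric sort is on
  }
  return orderby;
}

// Column names taken from the ORDER BY example above.
const withMetricSort = currentOrderBy({
  metric: 'term_count',
  series: 'search_term',
  sort_by_metric: true,
});

// Logs two entries: [['term_count', false], ['search_term', true]]
console.log(withMetricSort);
```

The two-entry result is what becomes the two-column ORDER BY that keeps Druid off the TopN path.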
Steps to Reproduce
- Create a Word Cloud chart backed by a Druid datasource with a high-cardinality string dimension
- Enable Sort by metric
- Load the chart. On large datasets it times out with "Unknown error (Unknown)"
Without "Sort by metric", the chart loads correctly (Druid uses TopN). With it enabled, the same query triggers a full GroupBy scan.
Proposed Fix
Make the secondary dimension sort mutually exclusive with sort_by_metric:
    if (sort_by_metric && metric) {
      orderby.push([metric, false]);
    } else if (series) {
      orderby.push([series, true]);
    }
This preserves alphabetical ordering when sort_by_metric is disabled, while allowing Druid to use TopN optimization when metric sorting is requested. The secondary sort is also unnecessary for word cloud rendering, since word size is determined by metric value rather than series order.
Additional Notes
The existing test/buildQuery.test.ts does not cover sort_by_metric or orderby behavior. A PR for this fix should add test coverage for both cases.
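As a rough sketch of the cases such tests could cover, the table below exercises the proposed branching in isolation. fixedOrderBy, SortInput, and the case list are hypothetical names for illustration; an actual test would call the plugin's buildQuery and inspect the orderby of the generated query, using whatever test framework the repository already uses.

```typescript
// Hypothetical stand-in mirroring the proposed fix (not the real buildQuery.ts).
type OrderByEntry = [string, boolean];

interface SortInput {
  metric?: string;
  series?: string;
  sort_by_metric?: boolean;
}

function fixedOrderBy({ metric, series, sort_by_metric }: SortInput): OrderByEntry[] {
  const orderby: OrderByEntry[] = [];
  if (sort_by_metric && metric) {
    orderby.push([metric, false]); // metric DESC only, so a single sort column
  } else if (series) {
    orderby.push([series, true]); // alphabetical fallback when metric sort is off
  }
  return orderby;
}

const cases: { name: string; input: SortInput; expected: OrderByEntry[] }[] = [
  {
    name: 'sort_by_metric on: metric only',
    input: { metric: 'term_count', series: 'search_term', sort_by_metric: true },
    expected: [['term_count', false]],
  },
  {
    name: 'sort_by_metric off: series only',
    input: { metric: 'term_count', series: 'search_term', sort_by_metric: false },
    expected: [['search_term', true]],
  },
  {
    name: 'sort_by_metric on but no metric: falls back to series',
    input: { series: 'search_term', sort_by_metric: true },
    expected: [['search_term', true]],
  },
  {
    name: 'nothing set: empty orderby',
    input: {},
    expected: [],
  },
];

// Compare by JSON since the entries are plain arrays of primitives.
const failures = cases
  .filter(c => JSON.stringify(fixedOrderBy(c.input)) !== JSON.stringify(c.expected))
  .map(c => c.name);

console.log(failures.length === 0 ? 'all cases pass' : `failed: ${failures.join(', ')}`);
```

The third case is worth keeping: with the else-if form, enabling sort_by_metric without a metric still yields the alphabetical sort rather than no ordering at all.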
Screenshots/recordings
No response
Superset version
master / latest-dev
Python version
3.9
Node version
16
Browser
Chrome
Additional context
No response