Skip to content

fix(chart): word cloud secondary sort prevents Druid TopN optimization when sort_by_metric enabled#39073

Merged
Vitor-Avila merged 5 commits intoapache:masterfrom
bdonovan1:fix/word-cloud-druid-topn-sort
Apr 20, 2026
Merged

fix(chart): word cloud secondary sort prevents Druid TopN optimization when sort_by_metric enabled#39073
Vitor-Avila merged 5 commits intoapache:masterfrom
bdonovan1:fix/word-cloud-druid-topn-sort

Conversation

@bdonovan1
Copy link
Copy Markdown
Contributor

@bdonovan1 bdonovan1 commented Apr 2, 2026

SUMMARY

When sort_by_metric is enabled on a Word Cloud chart, buildQuery.ts unconditionally appended a secondary ORDER BY [series, ASC] alongside the metric sort. On Apache Druid, any multi-column ORDER BY prevents the native TopN query optimization, forcing a full GroupBy scan over the entire dataset before applying LIMIT. On high-cardinality dimensions this can cause dramatic query slowdowns and timeouts.

Root cause — in buildQuery.ts, the series sort was always appended when series was set, regardless of sort_by_metric:

if (sort_by_metric && metric) {
  orderby.push([metric, false]);
}
if (series) {
  orderby.push([series, true]);  // ← always added
}

Fix — add an independent sort_by_series control so users can choose whether to include the secondary dimension sort:

if (sort_by_metric && metric) {
  orderby.push([metric, false]);
}
if (sort_by_series && series) {
  orderby.push([series, true]);
}

This gives users full control over sorting behavior:

  • Metric only (new default for new charts): enables Druid TopN optimization
  • Series only: alphabetical ordering by dimension
  • Both: metric sort with series tiebreaker (previous behavior, preserved for existing charts)
  • Neither: no explicit ordering

Changes:

  • buildQuery.ts: sort_by_metric and sort_by_series are now independent controls
  • controlPanel.tsx: new "Sort by series" checkbox (default: false for new charts)
  • buildQuery.test.ts: test coverage for all four sort combinations
  • DB migration: sets sort_by_series=true for all existing word_cloud charts to preserve backward compatibility

Fixes #39072

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

Not applicable — no visual change to the word cloud rendering. The new "Sort by series" checkbox appears in the Query section of the word cloud control panel, below "Sort by metric".

TESTING INSTRUCTIONS

  1. Create a new Word Cloud chart — verify "Sort by series" is unchecked by default
  2. Enable "Sort by metric" only — verify the generated query has a single ORDER BY <metric> DESC
  3. Enable both "Sort by metric" and "Sort by series" — verify the query has ORDER BY <metric> DESC, <series> ASC
  4. Open an existing word cloud chart — verify "Sort by series" is checked (set by migration)

Unit tests:

cd superset-frontend
npm run test -- plugins/plugin-chart-word-cloud/test/buildQuery.test.ts --no-coverage

Migration test:

superset db upgrade   # sets sort_by_series=true for existing word_cloud charts
superset db downgrade # removes sort_by_series from word_cloud charts

ADDITIONAL INFORMATION

Runtime estimate: Migration processes only word_cloud charts (typically a small number). Tested against 6 charts in < 1 second. No downtime required.

@bito-code-review
Copy link
Copy Markdown
Contributor

bito-code-review Bot commented Apr 2, 2026

Code Review Agent Run #e84fc6

Actionable Suggestions - 0
Review Details
  • Files reviewed - 2 · Commit Range: 131beef..131beef
    • superset-frontend/plugins/plugin-chart-word-cloud/src/plugin/buildQuery.ts
    • superset-frontend/plugins/plugin-chart-word-cloud/test/buildQuery.test.ts
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • Eslint (Linter) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

@dosubot dosubot Bot added data:connect:druid Related to Druid explore:performance Related to performance of Explore viz:charts:wordcloud Related to the Wordcloud chart labels Apr 2, 2026
@Vitor-Avila
Copy link
Copy Markdown
Contributor

Hey @bdonovan1,

Thank you so much for the complete issue description with steps to repro + a fix with test coverage! I just approved CI here, but note you need a rebase.

This makes sense to me, but I'm also tagging @michael-s-molina, @villebro, @rusackas and @betodealmeida to see if they have any concerns.

Technically this could change existing charts for users that have a row limit smaller than the amount of data (and multiple dimensions with the same metric result), but I think it's ok.

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 64.47%. Comparing base (bf7ec85) to head (2e1f747).
⚠️ Report is 4 commits behind head on master.

Additional details and impacted files
@@           Coverage Diff           @@
##           master   #39073   +/-   ##
=======================================
  Coverage   64.47%   64.47%           
=======================================
  Files        2558     2558           
  Lines      133241   133241           
  Branches    30955    30955           
=======================================
+ Hits        85901    85902    +1     
+ Misses      45849    45848    -1     
  Partials     1491     1491           
Flag Coverage Δ
hive 39.85% <ø> (ø)
javascript 66.32% <100.00%> (+<0.01%) ⬆️
mysql 60.46% <ø> (ø)
postgres 60.54% <ø> (ø)
presto 41.63% <ø> (ø)
python 62.12% <ø> (ø)
sqlite 60.17% <ø> (ø)
unit 100.00% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@bdonovan1 bdonovan1 force-pushed the fix/word-cloud-druid-topn-sort branch from 131beef to 573c520 Compare April 2, 2026 23:10
@bito-code-review
Copy link
Copy Markdown
Contributor

bito-code-review Bot commented Apr 3, 2026

Code Review Agent Run #42a88f

Actionable Suggestions - 0
Review Details
  • Files reviewed - 2 · Commit Range: 573c520..573c520
    • superset-frontend/plugins/plugin-chart-word-cloud/src/plugin/buildQuery.ts
    • superset-frontend/plugins/plugin-chart-word-cloud/test/buildQuery.test.ts
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • Eslint (Linter) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

Copy link
Copy Markdown
Contributor

@alexandrusoare alexandrusoare left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM but waiting on the other reviewers mentioned to see if they have any concerns

orderby.push([metric, false]);
}
if (series) {
} else if (series) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simple change, but I don't know the side effects of this change.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possible side-effects:

Unstable ordering among ties (same metric value) If multiple “series” values have the exact same metric value, then without the secondary series ASC tiebreaker, the ordering between those tied rows becomes database/engine dependent (and may vary between runs).

LIMIT / row limit can return different members of a “tied” group If the query has a row limit (Top-N) and there are ties at the cutoff boundary, the previous secondary sort gave a consistent way to decide which tied rows make it into the result set. Without it, you may get different tied rows.

I just want to make sure we don't get another issue about ties not being sorted as desired.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah that's what I thought too, so I think it's important to get thoughts from others. The fact is with the additional sorting, while it was consistent, users had no control over it I think? Like unsure if you can choose to be asc, desc, etc.

Do we need to look into make this configurable? One, the other, or both?

Copy link
Copy Markdown
Contributor Author

@bdonovan1 bdonovan1 Apr 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @rusackas @alexandrusoare and @Vitor-Avila , thanks for taking a look at this! It sounds like there are concerns about consistently handling tie-breakers when two words have the same popularity, which is valid. I am fine with making this configurable with a second checkbox if necessary. Just one thing I would like to understand:

What is the use case for sorting only by series? Based on my own observations, this will return the top N unique words alphabetically, regardless of popularity in the dataset. That is the current default behavior, but it isn't really a "word cloud" in the traditional sense. In high cardinality datasets, this returns long-tail results that are a bit arbitrary and nonsensical. E.g. in our use case, we are generating a word cloud of most popular terms that end users typed into a particular search box. Sort-only-by-series gives us the few random users who typed something like !!, !., !.., etc (all with a count of 1), rather than actual popular search terms.

I almost feel like it should ALWAYS sort by metric, and sort by series should be optional for handling tie-breakers when two words have the exact same count. But I am happy to go either route:

  1. Add a second checkbox, so you can independently decide to sort by metric and/or series.
  2. Keep one checkbox, but change the meaning / description. Default behavior is to only sort by metric (DESC). But if you check the box, it will sort by metric (DESC), series (ASC).

Of those two choices (1) is more backwards-compatible, but (2) is more intuitive. Please advise on which choice we should make.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I totally agree with you, not sure what's the purpose of sorting only by dimension, but can see users wanting to either doing only metric, or metric + dimension.

@yousoph @kasiazjc some product input would be greatly helpful here :) What are your thoughts on a sort control for Word Cloud? Currently, it always sort by dimension, but there's a checkbox to also sort by metric. The purpose of this PR is to be able to sort only for metric (as sorting by both is inefficient for some DBs) while keeping backwards compatibility. Some options:

  1. Keep the "Sort by Metric" checkbox in there, add a new "Sort by Series" checkbox that's enabled by default (for backwards compatibility).
  2. Have a dropdown selector to choose between sorting options (Similar to the X-Axis sort, or the un-aggregated table "Ordering" dropdown).

THank you!

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think option 1 makes sense to me (keep "Sort by "Metric", add "Sort by Series")
But one change from what you said Vitor - I think it's enabled by default for existing charts but ideally we could not check it for new Word Cloud charts going forward

(The "Ordering" dropdown in tables is multi select with both asc and desc options that you can select both on, that seems like it could be confusing here... and separately may be worth looking into on those charts, but that's a different problem :) )

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks, @yousoph!

@bdonovan1 do you think it would be possible to add a second checkbox, and have "Sort by Series" enabled for existing charts, but not checked for new charts? I think that might need to require a DB migration.

Thank you!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the discussion and quick consensus everyone!

I have updated the approach to add a separate, independent checkbox for "sort by series". This allows users to consistently handle tiebreakers (with a performance cost on some DBs like Druid). The default for this new checkbox is False - new charts will need to explicitly check the box if they want alphabetic tiebreakers.

However, as suggested by @Vitor-Avila , I also added a DB migration so existing charts will not be affected. If you apply the DB upgrade, it will automatically set this value to True for existing word cloud charts. So, existing users will still get the alphabetic tiebreakers without having to modify their charts.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes Word Cloud query generation so that enabling sort_by_metric no longer adds a secondary series sort, which can prevent Apache Druid from using its native TopN optimization and cause severe performance regressions on high-cardinality dimensions.

Changes:

  • Make series ordering mutually exclusive with sort_by_metric in the Word Cloud buildQuery implementation.
  • Add unit tests to cover orderby behavior for both sort_by_metric=true and sort_by_metric=false.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
superset-frontend/plugins/plugin-chart-word-cloud/src/plugin/buildQuery.ts Updates orderby construction to avoid multi-column ordering when metric sorting is enabled.
superset-frontend/plugins/plugin-chart-word-cloud/test/buildQuery.test.ts Adds test coverage validating the resulting orderby output in both sorting modes.

@bito-code-review
Copy link
Copy Markdown
Contributor

The change modifies the ordering logic in buildQuery.ts to make it mutually exclusive: if sort_by_metric is true, it orders by the metric descending; otherwise, if series is present, it orders by series ascending. Previously, both could apply simultaneously. Side effects include altered query results when both conditions are met—now only the primary sort (metric) takes precedence, potentially changing data display order in word clouds if users expect secondary sorting by series.

superset-frontend/plugins/plugin-chart-word-cloud/src/plugin/buildQuery.ts

if (sort_by_metric && metric) {
    orderby.push([metric, false]);
  } else if (series) {
    orderby.push([series, true]);
  }

@bdonovan1 bdonovan1 force-pushed the fix/word-cloud-druid-topn-sort branch from 573c520 to 427d94b Compare April 13, 2026 19:47
@pull-request-size pull-request-size Bot added size/L and removed size/M labels Apr 13, 2026
@github-actions github-actions Bot added the risk:db-migration PRs that require a DB migration label Apr 13, 2026
@bdonovan1
Copy link
Copy Markdown
Contributor Author

bdonovan1 commented Apr 13, 2026

I have manually tested the change with a local superset installation. I verified that the new "sort by series" checkbox appears, and defaults to False on new charts. I also ran the DB upgrade / downgrade command, verified that it succeeds, and that existing word cloud charts default to True.

Screenshot 2026-04-13 at 12 46 02 PM

@bito-code-review
Copy link
Copy Markdown
Contributor

bito-code-review Bot commented Apr 13, 2026

Code Review Agent Run #2b28a2

Actionable Suggestions - 0
Additional Suggestions - 1
  • superset/migrations/versions/2026-04-13_19-28_fd0c8583b46d_add_sort_by_series_to_word_cloud_charts.py - 1
    • Type Hint Specificity · Line 47-53
      To align with the project's type hinting standards requiring proper typing for all new Python code, update the dict type hints to dict[str, Any] for better specificity and MyPy compliance. This matches patterns in similar recent migrations.
      Code suggestion
       @@ -25,6 +25,7 @@
        from alembic import op
        from sqlalchemy import Column, Integer, String, Text
        from sqlalchemy.ext.declarative import declarative_base
      + from typing import Any
        from superset import db
        from superset.migrations.shared.utils import paginated_update
        from superset.utils import json
       @@ -47,2 +48,2 @@
      - def upgrade_params(params: dict) -> dict:
      + def upgrade_params(params: dict[str, Any]) -> dict[str, Any]:
       @@ -53,2 +54,2 @@
      - def downgrade_params(params: dict) -> dict:
      + def downgrade_params(params: dict[str, Any]) -> dict[str, Any]:
Review Details
  • Files reviewed - 4 · Commit Range: d56d31b..427d94b
    • superset-frontend/plugins/plugin-chart-word-cloud/src/plugin/buildQuery.ts
    • superset-frontend/plugins/plugin-chart-word-cloud/src/plugin/controlPanel.tsx
    • superset-frontend/plugins/plugin-chart-word-cloud/test/buildQuery.test.ts
    • superset/migrations/versions/2026-04-13_19-28_fd0c8583b46d_add_sort_by_series_to_word_cloud_charts.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • Eslint (Linter) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

Comment on lines +65 to +66
params = json.loads(slc.params or "{}")
slc.params = json.dumps(upgrade_params(params))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: The migration upgrade path assumes every params value is valid JSON object data. If any word_cloud slice has malformed JSON or a non-object JSON payload, json.loads/dictionary mutation will raise and abort the migration for the whole release. Guard parsing and skip invalid rows so one bad record does not break migration execution. [possible bug]

Severity Level: Critical 🚨
- ❌ Database upgrade can fail on single corrupt word_cloud slice.
- ⚠️ Prevents deploying code changes relying on this migration.
- ⚠️ Inconsistent with other migrations that guard slice JSON parsing.
Suggested change
params = json.loads(slc.params or "{}")
slc.params = json.dumps(upgrade_params(params))
try:
params = json.loads(slc.params or "{}")
if not isinstance(params, dict):
continue
slc.params = json.dumps(upgrade_params(params))
except (json.JSONDecodeError, TypeError):
continue
Steps of Reproduction ✅
1. Start Superset against a database where at least one `word_cloud` chart row in
`slices.params` is malformed JSON or a non-object value (for example, `params='\"just a
string\"'` or `params='not-json'`) in the `slices` table; these rows are what `Slice`
models in
`superset/migrations/versions/2026-04-13_19-28_fd0c8583b46d_add_sort_by_series_to_word_cloud_charts.py:40-45`
read.

2. Run the database upgrade (e.g. `superset db upgrade` / Alembic `upgrade head`), which
executes the `upgrade()` function at
`superset/migrations/versions/2026-04-13_19-28_fd0c8583b46d_add_sort_by_series_to_word_cloud_charts.py:58-67`.

3. Inside `upgrade()`, the loop over
`paginated_update(session.query(Slice).filter(Slice.viz_type == "word_cloud"))` at lines
62-64 yields the malformed slice, and line 65 executes `params = json.loads(slc.params or
"{}")` without any `try/except`.

4. For a non-JSON value this raises `json.JSONDecodeError`, or for a non-dict JSON (e.g.
string/array) `upgrade_params(params)` at lines 47-50/66 will hit
`params["sort_by_series"] = True` and raise `TypeError` ("list indices must be integers or
slices, not str"), propagating out of the loop and causing the whole migration (and thus
the overall upgrade command) to fail instead of skipping just the bad slice; this risk is
reinforced by older migrations like
`2017-01-24_12-31_db0c65b146bd_update_slice_model_json.py:6-13` and
`2018-04-10_11-19_bf706ae5eb46_cal_heatmap_metric_to_metrics.py:5-14`, which already wrap
similar `json.loads(slc.params or "{}")` calls in `try/except` to tolerate bad slice data.
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** superset/migrations/versions/2026-04-13_19-28_fd0c8583b46d_add_sort_by_series_to_word_cloud_charts.py
**Line:** 65:66
**Comment:**
	*Possible Bug: The migration upgrade path assumes every `params` value is valid JSON object data. If any `word_cloud` slice has malformed JSON or a non-object JSON payload, `json.loads`/dictionary mutation will raise and abort the migration for the whole release. Guard parsing and skip invalid rows so one bad record does not break migration execution.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
👍 | 👎

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@betodealmeida do you know if we need this? I imagine we have schema validation to prevent malformed charts from being created but not 100% sure

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need it, but it wouldn't hurt. What would be better is to define these columns as JSON, like we've been doing for newer models, since then you get the validation for free.

For this PR I think it's fine, it's very unlikely that this would be needed.

Comment on lines +77 to +78
params = json.loads(slc.params or "{}")
slc.params = json.dumps(downgrade_params(params))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: The downgrade path has the same unguarded JSON parsing/object assumption, so a single invalid params value can raise and stop downgrade entirely. Add safe parsing with type checks to keep downgrade resilient across corrupted rows. [possible bug]

Severity Level: Critical 🚨
- ❌ Database downgrade can be blocked by one bad word_cloud slice.
- ⚠️ Makes rollbacks brittle in presence of corrupted chart params.
- ⚠️ Diverges from established resilient patterns in similar migrations.
Suggested change
params = json.loads(slc.params or "{}")
slc.params = json.dumps(downgrade_params(params))
try:
params = json.loads(slc.params or "{}")
if not isinstance(params, dict):
continue
slc.params = json.dumps(downgrade_params(params))
except (json.JSONDecodeError, TypeError):
continue
Steps of Reproduction ✅
1. Use a database where at least one `word_cloud` slice row has malformed or non-object
JSON in `slices.params` (same concrete shape as in the upgrade case, e.g.
`params='not-json'` or `params='\"string\"'`) referenced by the `Slice` model at
`superset/migrations/versions/2026-04-13_19-28_fd0c8583b46d_add_sort_by_series_to_word_cloud_charts.py:40-45`.

2. Run a database downgrade that crosses this revision (e.g. `superset db downgrade` /
Alembic `downgrade`), which calls the `downgrade()` function at
`superset/migrations/versions/2026-04-13_19-28_fd0c8583b46d_add_sort_by_series_to_word_cloud_charts.py:70-79`.

3. In `downgrade()`, the loop over
`paginated_update(session.query(Slice).filter(Slice.viz_type == "word_cloud"))` at lines
74-76 yields the malformed slice, and line 77 runs `params = json.loads(slc.params or
"{}")` with no error handling.

4. A malformed `params` value triggers `json.JSONDecodeError` at line 77, or a non-dict
JSON causes `downgrade_params(params)` at lines 53-55/78 to call
`params.pop("sort_by_series", None)` and raise `AttributeError` for non-mapping types,
bubbling up and aborting the entire downgrade operation; other slice-migration files like
`2024-03-01_10-47_be1b217cd8cd_big_number_kpi_single_metric.py:14-27,37-50` and
`2025-04-13_22-10_378cecfdba9f_merge_x_axis_sort_series_with_x_axis_.py:7-27,37-55`
demonstrate that migrations touching `Slice.params` typically wrap
`json.loads(slc.params)` in `try/except` specifically to avoid this class of failure.
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** superset/migrations/versions/2026-04-13_19-28_fd0c8583b46d_add_sort_by_series_to_word_cloud_charts.py
**Line:** 77:78
**Comment:**
	*Possible Bug: The downgrade path has the same unguarded JSON parsing/object assumption, so a single invalid `params` value can raise and stop downgrade entirely. Add safe parsing with type checks to keep downgrade resilient across corrupted rows.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
👍 | 👎

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the two comments are related: addressing one covers the other

Comment on lines +46 to +48
'Include a secondary sort by series name (ascending). '
+ 'This produces consistent ordering when multiple series have the '
+ 'same metric value, but may reduce query performance on some databases.',
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we're calling it a "secondary" sorting rule, but the controls actually allow independent selection, right? So someone would be able to enable just it, making it "primary"?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, maybe we can remove "secondary" from this wording.

@Vitor-Avila
Copy link
Copy Markdown
Contributor

@bdonovan1 this LGTM! Left a minor comment and approved CI. Will reach out to a codeowner too since it's needed here

Copy link
Copy Markdown
Contributor

@Vitor-Avila Vitor-Avila left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a minor suggestion on display text. Otherwise LGTM

@rusackas
Copy link
Copy Markdown
Member

@bdonovan1 thanks for dialing this in! Ready to click the merge button, if you're willing to fix pre-commit and address the last little details in comments above :) Thanks in advance!

@netlify
Copy link
Copy Markdown

netlify Bot commented Apr 15, 2026

Deploy Preview for superset-docs-preview ready!

Name Link
🔨 Latest commit 864a07d
🔍 Latest deploy log https://app.netlify.com/projects/superset-docs-preview/deploys/69e118e51374d70008821627
😎 Deploy Preview https://deploy-preview-39073--superset-docs-preview.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

@bdonovan1
Copy link
Copy Markdown
Contributor Author

Thanks everyone! I addressed the two comments in the review, and I believe I also fixed the code style, which was getting caught by Prettier and failing the pre-commit check. If all checks pass, then I think we are ready to merge!

@bito-code-review
Copy link
Copy Markdown
Contributor

bito-code-review Bot commented Apr 15, 2026

Code Review Agent Run #0ffd47

Actionable Suggestions - 0
Review Details
  • Files reviewed - 2 · Commit Range: 427d94b..933dc88
    • superset-frontend/plugins/plugin-chart-word-cloud/src/plugin/controlPanel.tsx
    • superset/migrations/versions/2026-04-13_19-28_fd0c8583b46d_add_sort_by_series_to_word_cloud_charts.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

@bdonovan1 bdonovan1 closed this Apr 16, 2026
@bdonovan1 bdonovan1 reopened this Apr 16, 2026
Brian Donovan added 5 commits April 20, 2026 11:09
…n when sort_by_metric enabled

When sort_by_metric is true, buildQuery unconditionally appended a
secondary ORDER BY series ASC alongside the metric sort. On Druid, any
multi-column ORDER BY prevents the native TopN query optimization,
forcing a full GroupBy scan that can cause dramatic slowdowns and
timeouts on high-cardinality dimensions.

Make the series sort mutually exclusive with sort_by_metric, and add
test coverage for both sort behaviors.

Fixes apache#39072
Make sort_by_metric and sort_by_series independent controls instead of
the previous behavior where series sort was unconditionally appended
alongside the metric sort. This allows users to choose metric-only
sorting (enabling Druid TopN optimization) while optionally adding a
secondary series sort for tie-breaking.

Changes:
- buildQuery.ts: sort_by_metric and sort_by_series are now independent
- controlPanel.tsx: add sort_by_series checkbox (default: false)
- buildQuery.test.ts: cover all four sort combinations
- DB migration: set sort_by_series=true for existing word_cloud charts
  to preserve backward compatibility

Fixes apache#39072
Remove "secondary" from the description since sort_by_series and
sort_by_metric are independent controls — either can be used alone.
Clarify that the tiebreaker behavior only applies when both are enabled.

Fixes prettier formatting (+ at end of line, not beginning).
Guard against malformed or non-dict params in word_cloud slices
during upgrade/downgrade. Skips invalid rows instead of aborting
the entire migration.
@bdonovan1 bdonovan1 force-pushed the fix/word-cloud-druid-topn-sort branch from 864a07d to 2e1f747 Compare April 20, 2026 18:11
@Vitor-Avila Vitor-Avila merged commit 78fb096 into apache:master Apr 20, 2026
85 of 86 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

data:connect:druid Related to Druid explore:performance Related to performance of Explore plugins risk:db-migration PRs that require a DB migration size/L viz:charts:wordcloud Related to the Wordcloud chart

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Word Cloud unconditional secondary sort prevents Druid TopN optimization, causing query timeout

7 participants