console: push cluster filter down in replica utilization history query #37323
Draft
jubrad wants to merge 1 commit into
The cluster-detail page recomputes the whole-fleet utilization rollup on every load and only filters to the viewed cluster at the very end. Because the cluster predicate enters at the final join, the optimizer can't push it into the shared replica_utilization_history_binned CTE that the five Top-1 argmax CTEs all read, so the metrics aggregate and all five Top-1 passes run over every replica in the deployment before the last step discards other clusters.

Three changes scope the heavy work to the requested cluster(s):

- filter replica_history to the cluster up front (both UNION branches), so the cluster predicate reaches the source reads instead of the final join;
- restrict the offline-event scan to that replica set, turning a full status-history scan into a lookup;
- drop the redundant replica_history re-join in the binned CTE (the rollup is already derived from it, and it could fan out on a replica that ever had two sizes).

Output is unchanged (verified row-for-row on synthetic fleets); only execution differs. On a synthetic fleet the per-cluster page-load latency drops ~8-15x and, unlike before, no longer grows with total fleet size.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
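The filter-late versus filter-early shape described above can be sketched on toy data. This is a minimal illustration using Python's `sqlite3`, not the console's actual query: the `metrics` table, its columns, the `scoped` and `binned` CTE names, and the sample rows are all hypothetical, and SQLite's planner behaves differently from Materialize's. It only shows that moving the cluster predicate ahead of the aggregate leaves the result rows the same while shrinking the aggregate's input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE replica_history (replica_id TEXT, cluster_id TEXT, size TEXT);
CREATE TABLE metrics (replica_id TEXT, bin INTEGER, cpu REAL);  -- hypothetical
INSERT INTO replica_history VALUES
  ('r1', 'c1', 'small'), ('r2', 'c1', 'small'), ('r3', 'c2', 'large');
INSERT INTO metrics VALUES
  ('r1', 0, 0.2), ('r1', 0, 0.6), ('r2', 0, 0.4), ('r3', 0, 0.9), ('r3', 1, 0.1);
""")

# Filter late: aggregate every replica, then keep one cluster at the final join.
FILTER_LATE = """
WITH binned AS (
  SELECT replica_id, bin, MAX(cpu) AS max_cpu
  FROM metrics
  GROUP BY replica_id, bin
)
SELECT h.cluster_id, b.replica_id, b.bin, b.max_cpu
FROM binned b
JOIN replica_history h ON h.replica_id = b.replica_id
WHERE h.cluster_id = ?
ORDER BY b.replica_id, b.bin
"""

# Filter early: scope the replica set first, so the aggregate only sees
# metrics rows that belong to the requested cluster.
FILTER_EARLY = """
WITH scoped AS (
  SELECT replica_id, cluster_id FROM replica_history WHERE cluster_id = ?
),
binned AS (
  SELECT s.cluster_id, m.replica_id, m.bin, MAX(m.cpu) AS max_cpu
  FROM metrics m
  JOIN scoped s ON s.replica_id = m.replica_id
  GROUP BY s.cluster_id, m.replica_id, m.bin
)
SELECT cluster_id, replica_id, bin, max_cpu
FROM binned
ORDER BY replica_id, bin
"""

def run(query, cluster_id):
    """Run one of the two query shapes for a single cluster."""
    return conn.execute(query, (cluster_id,)).fetchall()

late = run(FILTER_LATE, "c1")
early = run(FILTER_EARLY, "c1")
print(late == early)  # row-for-row comparison of the two shapes
```

Comparing the two result lists directly is the same kind of row-for-row check the commit message mentions, applied here to a three-replica toy fleet.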
bfef2b5 to
f55904e
The cluster-detail page recomputes the whole-fleet replica-utilization rollup on every load and only filters to the viewed cluster at the very end. Because the `cluster_id` predicate enters at the final join, the optimizer can't push it into the shared `replica_utilization_history_binned` CTE that the five Top-1 argmax CTEs all read, so the metrics aggregate and all five Top-1 passes run over every replica in the deployment before the last step discards other clusters.

This pushes the filter to the front:

- filter `replica_history` to the cluster in both UNION branches, so the predicate reaches the source reads instead of the final join;
- restrict the offline-event scan to that replica set, turning a full status-history scan into a lookup;
- drop the redundant `replica_history` re-join in the binned CTE (the rollup is already derived from it, and it could fan out on a replica that ever had two sizes).

Output is unchanged, verified row-for-row against the original query on synthetic fleets. Only execution differs: `EXPLAIN` shows the `cluster_id` filter pushed down to the `replica_history` source reads, with the five Top-1 passes now scoped to one cluster.

Console-only; no catalog or index changes. On a synthetic single-worker fleet (matching `mz_catalog_server`) the per-cluster page-load p50 drops ~8–15× and, unlike before, no longer grows with total fleet size.

The gain widens with concurrent viewers and with fleet size. In production the win is larger still: these synthetic tables are unindexed, whereas `mz_cluster_replica_metrics_history` / `mz_cluster_replica_status_history` are indexed on `replica_id`, so the pushed-down replica set turns the residual full scans into lookups.
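The fan-out risk behind dropping the redundant re-join can be shown with a small sketch. This assumes a hypothetical `rollup` table standing in for the binned CTE and a `replica_history` in which one replica appears under two sizes; the names and rows are illustrative only and are not taken from the actual query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE replica_history (replica_id TEXT, cluster_id TEXT, size TEXT);
CREATE TABLE rollup (replica_id TEXT, bin INTEGER, max_cpu REAL);  -- hypothetical
-- r1 was resized once, so it appears twice in the history.
INSERT INTO replica_history VALUES
  ('r1', 'c1', 'small'), ('r1', 'c1', 'large'), ('r2', 'c1', 'small');
INSERT INTO rollup VALUES ('r1', 0, 0.6), ('r2', 0, 0.4);
""")

# Re-joining the rollup to replica_history on replica_id alone repeats
# every rollup row once per history row for that replica.
with_rejoin = conn.execute("""
  SELECT r.replica_id, r.bin, r.max_cpu
  FROM rollup r
  JOIN replica_history h ON h.replica_id = r.replica_id
  ORDER BY r.replica_id
""").fetchall()

# Reading the rollup directly keeps one row per (replica, bin).
without_rejoin = conn.execute(
    "SELECT replica_id, bin, max_cpu FROM rollup ORDER BY replica_id"
).fetchall()

print(len(with_rejoin), len(without_rejoin))
```

In this toy case the re-join duplicates the row for the resized replica, which is the kind of multiplication the description says the extra join could introduce.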