
console: push cluster filter down in replica utilization history query #37323

Draft
jubrad wants to merge 1 commit into MaterializeInc:main from jubrad:console-replica-utilization-pushdown

jubrad commented Jun 27, 2026

The cluster-detail page recomputes the whole-fleet replica-utilization rollup on every load and only filters to the viewed cluster at the very end. Because the cluster_id predicate enters at the final join, the optimizer can't push it into the shared replica_utilization_history_binned CTE that all five Top-1 argmax CTEs read. As a result, the metrics aggregate and all five Top-1 passes run over every replica in the deployment before the last step discards the other clusters.
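
To make the cost argument concrete, here is a minimal pure-Python model rather than SQL. The `Binned` row type, its columns, and the five metric slots are hypothetical stand-ins for the binned CTE, and `top1` plays the role of one Top-1 argmax per replica. It does not model the Materialize optimizer; it only counts how many binned rows the five passes read when the cluster filter comes last versus first.

```python
from typing import Dict, List, NamedTuple, Tuple

N_PASSES = 5  # the PR describes five Top-1 argmax CTEs


class Binned(NamedTuple):
    """One row of a toy binned-utilization CTE (hypothetical columns)."""
    cluster_id: str
    replica_id: str
    bin_index: int
    metrics: Tuple[int, ...]  # one value per Top-1 pass


def top1(rows: List[Binned], metric: int) -> Dict[str, Binned]:
    """Keep the row with the highest metrics[metric] per replica (first wins ties)."""
    best: Dict[str, Binned] = {}
    for row in rows:
        cur = best.get(row.replica_id)
        if cur is None or row.metrics[metric] > cur.metrics[metric]:
            best[row.replica_id] = row
    return best


def late_filter(rows, cluster):
    """Old shape: every pass reads the whole fleet; the cluster filter comes last."""
    results, rows_read = [], 0
    for metric in range(N_PASSES):
        rows_read += len(rows)
        winners = top1(rows, metric)
        results.append({k: v for k, v in winners.items() if v.cluster_id == cluster})
    return results, rows_read


def early_filter(rows, cluster):
    """New shape: scope the rows to one cluster once, then run every pass."""
    scoped = [row for row in rows if row.cluster_id == cluster]
    results, rows_read = [], 0
    for metric in range(N_PASSES):
        rows_read += len(scoped)
        results.append(top1(scoped, metric))
    return results, rows_read


# Toy fleet: 100 clusters x 2 replicas x 4 bins = 800 binned rows.
fleet = [
    Binned(f"c{c}", f"c{c}-r{r}", b, tuple((c + r + b + m) % 7 for m in range(N_PASSES)))
    for c in range(100)
    for r in range(2)
    for b in range(4)
]

old, old_rows_read = late_filter(fleet, "c3")
new, new_rows_read = early_filter(fleet, "c3")
assert old == new
print(old_rows_read, new_rows_read)  # 4000 40
```

Both shapes pick the same winners for `c3`, but the late-filter version reads every replica's rows in every pass, so its work tracks fleet size, while the early-filter version's work tracks the size of a single cluster.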

This pushes the filter to the front:

  • filter replica_history to the cluster in both UNION branches, so the predicate reaches the source reads instead of the final join;
  • restrict the offline-event scan to that replica set (a full status-history scan becomes a lookup);
  • drop the redundant replica_history re-join in the binned CTE (the rollup is already derived from it, and it could fan out on a replica that ever had two sizes).
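
The three changes can be sketched on a toy SQLite schema. Only the name replica_history comes from the PR; the other tables, the columns (`size`, `bin`, `cpu`, `status`), and the query text are hypothetical simplifications, and the sketch leaves out the two UNION branches and the Top-1 passes. It shows the replica set being derived from the cluster filter up front, the offline-event scan restricted to that set, and the fan-out that the old re-join produces for a replica with two sizes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE replica_history (replica_id TEXT, cluster_id TEXT, size TEXT);
CREATE TABLE metrics_history (replica_id TEXT, bin INTEGER, cpu REAL);
CREATE TABLE status_history (replica_id TEXT, occurred_at INTEGER, status TEXT);
INSERT INTO replica_history VALUES
  ('r1', 'c1', 'small'),
  ('r1', 'c1', 'large'),  -- r1 was resized, so it has two history rows
  ('r2', 'c2', 'small');
INSERT INTO metrics_history VALUES ('r1', 0, 0.5), ('r1', 1, 0.9), ('r2', 0, 0.7);
INSERT INTO status_history VALUES ('r1', 5, 'offline'), ('r2', 7, 'offline');
""")

# Old shape: bin every replica in the fleet, re-join replica_history to
# recover cluster_id, and only then filter to the requested cluster.
OLD_BINNED = """
WITH binned AS (
  SELECT replica_id, bin, max(cpu) AS cpu
  FROM metrics_history
  GROUP BY replica_id, bin
)
SELECT b.replica_id, b.bin, b.cpu
FROM binned b JOIN replica_history h ON h.replica_id = b.replica_id
WHERE h.cluster_id = :cluster
ORDER BY b.replica_id, b.bin
"""

# New shape: derive the cluster's replica set first, bin only those
# replicas, and skip the re-join.
NEW_BINNED = """
WITH scoped AS (
  SELECT DISTINCT replica_id FROM replica_history WHERE cluster_id = :cluster
)
SELECT replica_id, bin, max(cpu) AS cpu
FROM metrics_history
WHERE replica_id IN (SELECT replica_id FROM scoped)
GROUP BY replica_id, bin
ORDER BY replica_id, bin
"""

# Offline events restricted to the same replica set.
NEW_OFFLINE = """
SELECT replica_id, occurred_at
FROM status_history
WHERE status = 'offline'
  AND replica_id IN (
    SELECT replica_id FROM replica_history WHERE cluster_id = :cluster
  )
"""

params = {"cluster": "c1"}
old_rows = conn.execute(OLD_BINNED, params).fetchall()
new_rows = conn.execute(NEW_BINNED, params).fetchall()
offline = conn.execute(NEW_OFFLINE, params).fetchall()

print(old_rows)  # [('r1', 0, 0.5), ('r1', 0, 0.5), ('r1', 1, 0.9), ('r1', 1, 0.9)]
print(new_rows)  # [('r1', 0, 0.5), ('r1', 1, 0.9)]
print(offline)   # [('r1', 5)]
```

In this toy the old shape returns each `r1` bin twice because `r1` has two history rows; the distinct rows are the same in both shapes, only the duplication differs.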

Output is unchanged — verified row-for-row against the original query on synthetic fleets. Only execution differs: EXPLAIN shows the cluster_id filter pushed down to the replica_history source reads, with the five Top-1 passes now scoped to one cluster.
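
The PR does not show how the row-for-row check was done. One way to do it is a multiset comparison, sketched below; `diff_rows`, `mismatched_clusters`, and the `run_old`/`run_new` callables are hypothetical names, and the float rounding is an assumption meant to absorb last-bit differences in aggregates. Comparing multisets rather than sets means duplicated rows, such as those a fanned-out join would produce, count as a mismatch.

```python
from collections import Counter
from typing import Callable, Iterable, List, Sequence, Tuple

Row = Sequence[object]


def _normalize(row: Row, ndigits: int) -> Tuple[object, ...]:
    """Round floats so harmless last-bit differences don't count as mismatches."""
    return tuple(round(v, ndigits) if isinstance(v, float) else v for v in row)


def diff_rows(old: Iterable[Row], new: Iterable[Row], ndigits: int = 9):
    """Multiset diff of two result sets, ignoring row order but not duplicates.

    Returns (only_in_old, only_in_new); both empty means row-for-row equal.
    """
    old_c = Counter(_normalize(r, ndigits) for r in old)
    new_c = Counter(_normalize(r, ndigits) for r in new)
    return old_c - new_c, new_c - old_c


def mismatched_clusters(
    run_old: Callable[[str], List[Row]],
    run_new: Callable[[str], List[Row]],
    clusters: Iterable[str],
) -> List[str]:
    """Run both query versions for every cluster and list those that differ."""
    bad = []
    for cluster in clusters:
        only_old, only_new = diff_rows(run_old(cluster), run_new(cluster))
        if only_old or only_new:
            bad.append(cluster)
    return bad


# Equal after rounding, even though 0.1 + 0.2 != 0.3 exactly.
only_old, only_new = diff_rows([("r1", 0.1 + 0.2)], [("r1", 0.3)])
assert not only_old and not only_new

# A duplicated row is reported rather than silently ignored.
only_old, _ = diff_rows([("r1", 1), ("r1", 1)], [("r1", 1)])
print(only_old)  # Counter({('r1', 1): 1})
```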

Console-only; no catalog or index changes. On a synthetic single-worker fleet (matching mz_catalog_server), the per-cluster page-load p50 drops ~8–15× and grows far more slowly with total fleet size than before:

| Clusters | Old p50 (1 viewer) | New p50 (1 viewer) | Speedup |
|---:|---:|---:|---:|
| 25 | 3.7 s | 0.48 s | 7.7× |
| 100 | 14.8 s | 1.2 s | 12.4× |
| 200 | 29.3 s | 2.2 s | 13.0× |
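
The scaling behaviour can be read straight off the table's own numbers. The short calculation below uses only the p50 values above: going from 25 to 200 clusters (8× the fleet), the old p50 grows about 7.9×, roughly in proportion to fleet size (about 0.15 s per cluster at every size), while the new p50 grows about 4.6× over the same range.

```python
# p50 page-load latency in seconds with one viewer, copied from the table:
# clusters -> (old query, new query)
P50 = {25: (3.7, 0.48), 100: (14.8, 1.2), 200: (29.3, 2.2)}

smallest, largest = min(P50), max(P50)
fleet_growth = largest / smallest                # 8.0x more clusters
old_growth = P50[largest][0] / P50[smallest][0]  # ~7.9x
new_growth = P50[largest][1] / P50[smallest][1]  # ~4.6x

# Old query cost per cluster in the fleet: roughly constant, i.e. linear scaling.
old_per_cluster = {n: old / n for n, (old, _) in P50.items()}

print(f"fleet {fleet_growth:.1f}x, old {old_growth:.1f}x, new {new_growth:.1f}x")
for n, cost in old_per_cluster.items():
    print(f"{n:>3} clusters: {cost * 1000:.0f} ms of old p50 per cluster")
```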

The gain widens with concurrent viewers and with fleet size. In production the win is larger still: these synthetic tables are unindexed, whereas mz_cluster_replica_metrics_history / mz_cluster_replica_status_history are indexed on replica_id, so the pushed-down replica set turns the residual full scans into lookups.
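
The scan-versus-lookup distinction can be illustrated with SQLite's planner. This is only an analogy: Materialize's EXPLAIN output and index machinery look different, and the table and index names below are hypothetical. Filtering an unindexed table by a small replica set still reads the whole table, while the same filter on a table indexed by replica_id becomes an index search.

```python
import sqlite3


def plan_details(conn, sql, params=()):
    """Return the 'detail' column of SQLite's EXPLAIN QUERY PLAN output."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


conn = sqlite3.connect(":memory:")
# Two copies of a toy status-history table: one without and one with an
# index on replica_id.
conn.execute("CREATE TABLE status_unindexed (replica_id TEXT, occurred_at INTEGER, status TEXT)")
conn.execute("CREATE TABLE status_indexed (replica_id TEXT, occurred_at INTEGER, status TEXT)")
conn.execute("CREATE INDEX status_indexed_replica ON status_indexed (replica_id)")

# The pushed-down replica set, bound as parameters.
replica_set = ("r1", "r2")
query = "SELECT occurred_at FROM {} WHERE replica_id IN (?, ?)"

unindexed_plan = plan_details(conn, query.format("status_unindexed"), replica_set)
indexed_plan = plan_details(conn, query.format("status_indexed"), replica_set)

print(unindexed_plan)  # a full SCAN of the table
print(indexed_plan)    # a SEARCH using status_indexed_replica
```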

The cluster-detail page recomputes the whole-fleet utilization rollup on
every load and only filters to the viewed cluster at the very end. Because
the cluster predicate enters at the final join, the optimizer can't push it
into the shared replica_utilization_history_binned CTE that the five Top-1
argmax CTEs all read, so the metrics aggregate and all five Top-1 passes run
over every replica in the deployment before the last step discards other
clusters.

Three changes scope the heavy work to the requested cluster(s):

- filter replica_history to the cluster up front (both UNION branches), so
  the cluster predicate reaches the source reads instead of the final join;
- restrict the offline-event scan to that replica set, turning a full
  status-history scan into a lookup;
- drop the redundant replica_history re-join in the binned CTE (the rollup
  is already derived from it, and it could fan out on a replica that ever
  had two sizes).

Output is unchanged (verified row-for-row on synthetic fleets); only
execution differs. On a synthetic fleet the per-cluster page-load latency
drops ~8-15x and grows far more slowly with total fleet size than
before.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

jubrad force-pushed the console-replica-utilization-pushdown branch from bfef2b5 to f55904e on June 27, 2026 03:31