doc: design for user-defined cluster replica sizes by jubrad · Pull Request #36186 · MaterializeInc/materialize

jubrad · 2026-04-21T16:15:40Z

Summary

Design document for making cluster replica sizes durable catalog objects with SQL DDL support.

Prototype implementation: catalog: add durable cluster replica sizes with SQL DDL #35971

- Problem statement: env-var-only sizes require restarts, allow no ad-hoc experimentation, and offer no safety guarantees
- Success criteria: no-restart create/drop, SQL visibility, in-use protection, backwards compatibility
- Solution: durable ClusterReplicaSize catalog object with ClusterReplicaSizeId, builtin sync, CREATE/DROP CLUSTER REPLICA SIZE DDL
- Alternatives: string key, mutable sizes with drift tracking

🤖 Generated with Claude Code

Add ClusterReplicaSize as a new durable catalog object type. This lays the groundwork for making cluster replica sizes persistent and user-definable via SQL DDL.

The new type follows the established pattern (NetworkPolicy, Cluster, etc.):

- Proto Key/Value definitions in catalog-protos with Arbitrary derives
- DurableType impl converting between ReplicaAllocation and raw proto-compatible fields (the Value stores raw u64/string fields rather than ReplicaAllocation directly, since Numeric doesn't implement the Eq/Ord needed by TableTransaction)
- StateUpdateKind variant with collection type mapping
- Transaction CRUD methods (insert, remove, get)
- Snapshot field and persist read/write support
- Memory StateUpdateKind variant applied in pre-cluster ordering
- Debug trace support
- No-op v81→v82 catalog migration (existing catalogs have no ClusterReplicaSize entries; builtins will be populated in a follow-up commit)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
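The commit message above notes that the durable Value stores raw u64/string fields because the numeric credit type lacks Eq/Ord. A minimal sketch of that idea follows. It is not the code from the PR: `AllocationSketch`, `SizeValueSketch`, `into_value`, and `from_value` are hypothetical names, the field set is reduced, and `f64` stands in for Numeric only because it likewise implements neither `Eq` nor `Ord`.

```rust
use std::num::ParseFloatError;

/// Hypothetical in-memory allocation. `f64` is a stand-in for the numeric
/// credit type: it implements neither `Eq` nor `Ord`, so this struct cannot
/// derive them either.
#[derive(Debug, Clone, PartialEq)]
struct AllocationSketch {
    credits_per_hour: f64,
    workers: u64,
    scale: u64,
    memory_limit_bytes: Option<u64>,
}

/// Hypothetical durable value. It holds only raw integers and strings, so
/// `Eq` and `Ord` can be derived and the value can live in ordered
/// collections.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct SizeValueSketch {
    /// The credit rate kept in its string form.
    credits_per_hour: String,
    workers: u64,
    scale: u64,
    memory_limit_bytes: Option<u64>,
}

/// Convert the in-memory form into the durable, orderable form.
fn into_value(alloc: &AllocationSketch) -> SizeValueSketch {
    SizeValueSketch {
        credits_per_hour: alloc.credits_per_hour.to_string(),
        workers: alloc.workers,
        scale: alloc.scale,
        memory_limit_bytes: alloc.memory_limit_bytes,
    }
}

/// Convert back, failing if the stored string is not a valid number.
fn from_value(value: &SizeValueSketch) -> Result<AllocationSketch, ParseFloatError> {
    Ok(AllocationSketch {
        credits_per_hour: value.credits_per_hour.parse()?,
        workers: value.workers,
        scale: value.scale,
        memory_limit_bytes: value.memory_limit_bytes,
    })
}
```

The trade-off in this sketch is that the conversion back can fail at read time, so the string has to be validated when the value is first written.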

Add SQL DDL for creating and dropping user-defined cluster replica sizes. Sizes are immutable once created; drop and recreate to change.

Syntax:

    CREATE CLUSTER REPLICA SIZE <name> (
        CREDITS PER HOUR = '<numeric>',  -- required
        WORKERS = <n>,                   -- default 1
        SCALE = <n>,                     -- default 1
        MEMORY LIMIT = '<size>',         -- e.g. '4GiB', '512MiB', '1GB'
        CPU LIMIT = '<cpu>',             -- e.g. '0.5' (cores), '500m' (millicpus)
        DISK LIMIT = '<size>',
        CPU EXCLUSIVE = <bool>,
        DISABLED = <bool>,
        IS CC = <bool>,                  -- default true
        SWAP ENABLED = <bool>,
        NODE SELECTORS = '<json>'        -- e.g. '{"kubernetes.io/arch": "arm64"}'
    );

    DROP CLUSTER REPLICA SIZE <name>;

Access control:

- Gated behind enable_custom_cluster_replica_sizes feature flag
- mz_system bypasses the feature flag (always allowed)
- RBAC requires superuser for both CREATE and DROP
- Cannot drop builtin sizes (from env var) or sizes in use by replicas
- Cannot create a size with a name that already exists

Human-readable units:

- Memory/disk: GiB, MiB, GB, MB, kB, or raw bytes
- CPU: cores (0.5), millicpus (500m), or raw nanocpus

Structured errors for drop rejection (ClusterReplicaSizeInUse, ReadOnlyClusterReplicaSize). Audit log events for create/drop with ObjectType::ClusterReplicaSize.

Tests:

- SLT end-to-end test covering feature flag gating, create, use with cluster, in-use drop rejection, builtin drop rejection, human-readable units (GiB, millicpus), and cleanup
- Parser roundtrip test
- Updated snapshot tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
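The access-control and drop-protection rules listed in this commit message can be sketched as a small in-memory model. This is an illustration under stated assumptions, not the PR's implementation: `SizeCatalog`, `Session`, and `SizeDdlError` are hypothetical, the order of the checks is a guess, and only the variant names `ClusterReplicaSizeInUse` and `ReadOnlyClusterReplicaSize` come from the commit message.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical error type. Only the last two variant names appear in the
/// commit message; the rest are made up for this sketch.
#[derive(Debug, PartialEq)]
enum SizeDdlError {
    FeatureDisabled,
    NotSuperuser,
    AlreadyExists(String),
    UnknownSize(String),
    ReadOnlyClusterReplicaSize(String),
    ClusterReplicaSizeInUse(String),
}

/// Hypothetical session attributes relevant to the checks.
struct Session {
    is_mz_system: bool,
    is_superuser: bool,
}

/// Hypothetical catalog state: size name -> "is builtin", plus the set of
/// size names currently referenced by at least one replica.
struct SizeCatalog {
    feature_enabled: bool,
    sizes: BTreeMap<String, bool>,
    in_use: BTreeSet<String>,
}

impl SizeCatalog {
    /// mz_system is always allowed; everyone else needs the feature flag
    /// and superuser. (Assumed check order.)
    fn check_access(&self, session: &Session) -> Result<(), SizeDdlError> {
        if session.is_mz_system {
            return Ok(());
        }
        if !self.feature_enabled {
            return Err(SizeDdlError::FeatureDisabled);
        }
        if !session.is_superuser {
            return Err(SizeDdlError::NotSuperuser);
        }
        Ok(())
    }

    /// Create a user-defined size, rejecting duplicate names.
    fn create_size(&mut self, session: &Session, name: &str) -> Result<(), SizeDdlError> {
        self.check_access(session)?;
        if self.sizes.contains_key(name) {
            return Err(SizeDdlError::AlreadyExists(name.to_string()));
        }
        self.sizes.insert(name.to_string(), false);
        Ok(())
    }

    /// Drop a size, rejecting builtin sizes and sizes still in use.
    fn drop_size(&mut self, session: &Session, name: &str) -> Result<(), SizeDdlError> {
        self.check_access(session)?;
        match self.sizes.get(name).copied() {
            None => return Err(SizeDdlError::UnknownSize(name.to_string())),
            Some(true) => {
                return Err(SizeDdlError::ReadOnlyClusterReplicaSize(name.to_string()))
            }
            Some(false) => {}
        }
        if self.in_use.contains(name) {
            return Err(SizeDdlError::ClusterReplicaSizeInUse(name.to_string()));
        }
        self.sizes.remove(name);
        Ok(())
    }
}
```

Because sizes are immutable, this model needs no update path: changing a size is a `drop_size` followed by a `create_size`, and the in-use check keeps a running replica from losing its definition in between.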

Add a new internal system table that exposes the full cluster replica size configuration, including fields not shown in the public mz_catalog.mz_cluster_replica_sizes table:

- node_selectors (JSONB): Kubernetes node selectors for scheduling
- is_cc (bool): whether this is a modern cc-style size
- swap_enabled (bool): whether swap is the spill-to-disk mechanism
- cpu_exclusive (bool): exclusive CPU access per process
- disabled (bool): whether the size is blocked from new replicas
- builtin (bool): whether the size comes from the env var or was user-defined via SQL

The table is in mz_internal with access restricted to mz_support (and mz_system as superuser), keeping internal scheduling details out of the public catalog.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Design document for making cluster replica sizes durable catalog objects with SQL DDL support (CREATE/DROP CLUSTER REPLICA SIZE).

Covers: problem statement, success criteria, durable storage design with ClusterReplicaSizeId, builtin sync, SQL DDL syntax, access control, credit enforcement, system tables, and alternatives considered.

Prototype: MaterializeInc#35971

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

jubrad and others added 4 commits April 13, 2026 19:43

jubrad wants to merge 4 commits into MaterializeInc:main from jubrad:jubrad/cluster-replica-sizes-design
