doc: design for user-defined cluster replica sizes #36186

Draft
jubrad wants to merge 4 commits into MaterializeInc:main from jubrad:jubrad/cluster-replica-sizes-design

@jubrad commented Apr 21, 2026

Summary

Design document for making cluster replica sizes durable catalog objects with SQL DDL support.

Contents

  • Problem statement: env-var-only sizes require restarts, rule out ad-hoc experimentation, and offer no safety guarantees
  • Success criteria: no-restart create/drop, SQL visibility, in-use protection, backwards compatibility
  • Solution: durable ClusterReplicaSize catalog object with ClusterReplicaSizeId, builtin sync, CREATE/DROP CLUSTER REPLICA SIZE DDL
  • Alternatives: string key, mutable sizes with drift tracking

🤖 Generated with Claude Code

jubrad and others added 4 commits April 13, 2026 19:43
Add ClusterReplicaSize as a new durable catalog object type. This
lays the groundwork for making cluster replica sizes persistent and
user-definable via SQL DDL.

The new type follows the established pattern (NetworkPolicy, Cluster,
etc.):

- Proto Key/Value definitions in catalog-protos with Arbitrary derives
- DurableType impl converting between ReplicaAllocation and raw
  proto-compatible fields (the Value stores raw u64/string fields
  rather than ReplicaAllocation directly, since Numeric doesn't
  implement the Eq/Ord traits that TableTransaction requires)
- StateUpdateKind variant with collection type mapping
- Transaction CRUD methods (insert, remove, get)
- Snapshot field and persist read/write support
- Memory StateUpdateKind variant applied in pre-cluster ordering
- Debug trace support
- No-op v81→v82 catalog migration (existing catalogs have no
  ClusterReplicaSize entries; builtins will be populated in a
  follow-up commit)
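
For readers unfamiliar with the pattern, the raw-field conversion described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `Allocation`, `SizeValue`, `to_value`, and `from_value` are hypothetical names, only three fields are shown, and an `f64` stands in for Numeric purely because it also lacks `Eq`/`Ord`.

```rust
/// Hypothetical in-memory form. `credits_per_hour` is an f64 here only to
/// mimic a type (like Numeric) that implements neither `Eq` nor `Ord`.
#[derive(Debug, Clone, PartialEq)]
pub struct Allocation {
    pub memory_limit_bytes: Option<u64>,
    pub workers: u64,
    pub credits_per_hour: f64,
}

/// Hypothetical durable form: only raw integers and strings, so the
/// total-order traits can be derived.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
pub struct SizeValue {
    pub memory_limit_bytes: Option<u64>,
    pub workers: u64,
    pub credits_per_hour: String,
}

impl Allocation {
    /// Lower the in-memory form into raw, orderable fields.
    pub fn to_value(&self) -> SizeValue {
        SizeValue {
            memory_limit_bytes: self.memory_limit_bytes,
            workers: self.workers,
            credits_per_hour: self.credits_per_hour.to_string(),
        }
    }

    /// Rebuild the in-memory form; fails if the stored string is not a number.
    pub fn from_value(v: &SizeValue) -> Result<Allocation, std::num::ParseFloatError> {
        Ok(Allocation {
            memory_limit_bytes: v.memory_limit_bytes,
            workers: v.workers,
            credits_per_hour: v.credits_per_hour.parse()?,
        })
    }
}
```

Keeping the durable struct to integers and strings is what lets `Eq` and `Ord` be derived; the cost is a fallible conversion on the way back.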

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add SQL DDL for creating and dropping user-defined cluster replica
sizes. Sizes are immutable once created; drop and recreate to change.

Syntax:
  CREATE CLUSTER REPLICA SIZE <name> (
      CREDITS PER HOUR = '<numeric>',  -- required
      WORKERS = <n>,                   -- default 1
      SCALE = <n>,                     -- default 1
      MEMORY LIMIT = '<size>',         -- e.g. '4GiB', '512MiB', '1GB'
      CPU LIMIT = '<cpu>',             -- e.g. '0.5' (cores), '500m' (millicpus)
      DISK LIMIT = '<size>',
      CPU EXCLUSIVE = <bool>,
      DISABLED = <bool>,
      IS CC = <bool>,                  -- default true
      SWAP ENABLED = <bool>,
      NODE SELECTORS = '<json>'        -- e.g. '{"kubernetes.io/arch": "arm64"}'
  );
  DROP CLUSTER REPLICA SIZE <name>;
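
A rough sketch of how the documented defaults might be resolved during planning. `RawOptions`, `SizeOptions`, and `resolve` are hypothetical names introduced for illustration, and only the options whose defaults are stated above are modeled.

```rust
/// Options as written by the user; `None` means the option was omitted.
/// Hypothetical type, not taken from the PR.
#[derive(Debug, Default)]
pub struct RawOptions {
    pub credits_per_hour: Option<String>,
    pub workers: Option<u64>,
    pub scale: Option<u64>,
    pub is_cc: Option<bool>,
}

/// Options after defaults have been applied. Hypothetical type.
#[derive(Debug, Clone, PartialEq)]
pub struct SizeOptions {
    pub credits_per_hour: String,
    pub workers: u64,
    pub scale: u64,
    pub is_cc: bool,
}

/// Apply the documented defaults: WORKERS = 1, SCALE = 1, IS CC = true.
/// CREDITS PER HOUR has no default and must be supplied.
pub fn resolve(raw: RawOptions) -> Result<SizeOptions, String> {
    let credits_per_hour = raw
        .credits_per_hour
        .ok_or_else(|| "CREDITS PER HOUR is required".to_string())?;
    Ok(SizeOptions {
        credits_per_hour,
        workers: raw.workers.unwrap_or(1),
        scale: raw.scale.unwrap_or(1),
        is_cc: raw.is_cc.unwrap_or(true),
    })
}
```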

Access control:
- Gated behind enable_custom_cluster_replica_sizes feature flag
- mz_system bypasses the feature flag (always allowed)
- RBAC requires superuser for both CREATE and DROP
- Cannot drop builtin sizes (from env var) or sizes in use by replicas
- Cannot create a size with a name that already exists
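
The gating rules above could be sketched like this. `Session`, `Denied`, and `authorize` are hypothetical names, and the order in which the feature flag and the superuser requirement are checked is an assumption; the commit message does not specify it.

```rust
/// Hypothetical view of the session issuing the DDL.
pub struct Session {
    pub role: String,
    pub is_superuser: bool,
}

/// Hypothetical rejection reasons.
#[derive(Debug, PartialEq)]
pub enum Denied {
    FeatureDisabled,
    NotSuperuser,
}

/// Decide whether CREATE/DROP CLUSTER REPLICA SIZE may proceed.
pub fn authorize(session: &Session, flag_enabled: bool) -> Result<(), Denied> {
    // mz_system is always allowed, regardless of the feature flag.
    if session.role == "mz_system" {
        return Ok(());
    }
    // Everyone else needs enable_custom_cluster_replica_sizes turned on...
    if !flag_enabled {
        return Err(Denied::FeatureDisabled);
    }
    // ...and must be a superuser.
    if !session.is_superuser {
        return Err(Denied::NotSuperuser);
    }
    Ok(())
}
```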

Human-readable units:
- Memory/disk: GiB, MiB, GB, MB, kB, or raw bytes
- CPU: cores (0.5), millicpus (500m), or raw nanocpus
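
As an illustration of the unit handling, here is a minimal parsing sketch. The function names are hypothetical, and several points are assumptions rather than facts from this PR: binary suffixes (MiB, GiB) are taken as powers of 1024 and decimal ones (kB, MB, GB) as powers of 1000, sizes are whole numbers, and the raw-nanocpus form is left out because the commit message does not say how it is told apart from whole cores.

```rust
/// Parse a memory or disk size such as "4GiB", "512MiB", "1GB", or "1024"
/// (raw bytes) into a byte count. Returns None on unknown units or overflow.
pub fn parse_bytes(s: &str) -> Option<u64> {
    let s = s.trim();
    // Split at the first character that is not an ASCII digit.
    let split = s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len());
    let (digits, unit) = s.split_at(split);
    let n: u64 = digits.parse().ok()?;
    let factor: u64 = match unit.trim() {
        "" => 1,
        "kB" => 1_000,
        "MB" => 1_000_000,
        "GB" => 1_000_000_000,
        "MiB" => 1 << 20,
        "GiB" => 1 << 30,
        _ => return None,
    };
    n.checked_mul(factor)
}

/// Parse a CPU limit given as millicpus ("500m") or cores ("0.5") into
/// nanocpus. The raw-nanocpus form is intentionally not handled here.
pub fn parse_nanocpus(s: &str) -> Option<u64> {
    let s = s.trim();
    if let Some(milli) = s.strip_suffix('m') {
        let m: u64 = milli.parse().ok()?;
        return m.checked_mul(1_000_000);
    }
    let cores: f64 = s.parse().ok()?;
    if !cores.is_finite() || cores < 0.0 {
        return None;
    }
    Some((cores * 1e9).round() as u64)
}
```

Under these assumptions `'0.5'` and `'500m'` both resolve to 500,000,000 nanocpus, and `'4GiB'` to 4,294,967,296 bytes.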

Structured errors for drop rejection (ClusterReplicaSizeInUse,
ReadOnlyClusterReplicaSize). Audit log events for create/drop with
ObjectType::ClusterReplicaSize.
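
The drop checks might look roughly like the following. Only the two variant names `ClusterReplicaSizeInUse` and `ReadOnlyClusterReplicaSize` come from the commit message; `DropError`, `UnknownSize`, `SizeEntry`, `check_drop`, and the order of the checks are assumptions made for this sketch.

```rust
use std::collections::HashSet;

/// Hypothetical error type; only the last two variant names appear in the PR.
#[derive(Debug, PartialEq)]
pub enum DropError {
    UnknownSize(String),
    ReadOnlyClusterReplicaSize(String),
    ClusterReplicaSizeInUse(String),
}

/// Hypothetical catalog entry for a size.
pub struct SizeEntry {
    pub name: String,
    /// True when the size came from the env var rather than SQL.
    pub builtin: bool,
}

/// Validate DROP CLUSTER REPLICA SIZE against the catalog state.
/// `in_use` holds the names of sizes referenced by existing replicas.
pub fn check_drop(
    name: &str,
    sizes: &[SizeEntry],
    in_use: &HashSet<String>,
) -> Result<(), DropError> {
    let entry = sizes
        .iter()
        .find(|s| s.name == name)
        .ok_or_else(|| DropError::UnknownSize(name.to_string()))?;
    // Builtin sizes are read-only.
    if entry.builtin {
        return Err(DropError::ReadOnlyClusterReplicaSize(name.to_string()));
    }
    // A size referenced by any replica cannot be dropped.
    if in_use.contains(name) {
        return Err(DropError::ClusterReplicaSizeInUse(name.to_string()));
    }
    Ok(())
}
```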

Tests:
- SLT end-to-end test covering feature flag gating, create, use with
  cluster, in-use drop rejection, builtin drop rejection,
  human-readable units (GiB, millicpus), and cleanup
- Parser roundtrip test
- Updated snapshot tests

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Add a new internal system table that exposes the full cluster replica
size configuration, including fields not shown in the public
mz_catalog.mz_cluster_replica_sizes table:

  - node_selectors (JSONB): Kubernetes node selectors for scheduling
  - is_cc (bool): whether this is a modern cc-style size
  - swap_enabled (bool): whether swap is the spill-to-disk mechanism
  - cpu_exclusive (bool): exclusive CPU access per process
  - disabled (bool): whether the size is blocked from new replicas
  - builtin (bool): whether the size is builtin (from env var) rather than user-defined via SQL

The table is in mz_internal with access restricted to mz_support
(and mz_system as superuser), keeping internal scheduling details
out of the public catalog.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Design document for making cluster replica sizes durable catalog
objects with SQL DDL support (CREATE/DROP CLUSTER REPLICA SIZE).

Covers: problem statement, success criteria, durable storage design
with ClusterReplicaSizeId, builtin sync, SQL DDL syntax, access
control, credit enforcement, system tables, and alternatives
considered.

Prototype: MaterializeInc#35971

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>