Emit Prometheus metrics for RestateCluster effective configuration #123

@lukebond

Motivation

The operator reconciles RestateCluster CRs and has access to their full parsed spec during each reconcile. Today, none of this configuration is exposed as Prometheus metrics. Operators who want to understand how their clusters are configured — retention durations, resource limits, concurrency settings, partition counts — have to query each CR individually via kubectl.

Exposing the effective configuration as Prometheus gauges makes this information queryable, alertable, and dashboardable alongside the runtime metrics that Restate already emits.

Proposal

Add per-RestateCluster Prometheus gauges, updated during each successful reconcile. All gauges are labelled by namespace, and their series are removed when the CR is deleted (via the finalizer).

Configuration gauges (parsed from spec.compute.env[] and resource fields):

| Metric | Source | Notes |
| --- | --- | --- |
| restate_operator_cluster_journal_retention_seconds | RESTATE_DEFAULT_JOURNAL_RETENTION | Duration string → seconds |
| restate_operator_cluster_journal_retention_max_seconds | RESTATE_MAX_JOURNAL_RETENTION | Duration string → seconds |
| restate_operator_cluster_concurrency_limit | RESTATE_WORKER__INVOKER__CONCURRENT_INVOCATIONS_LIMIT | |
| restate_operator_cluster_throttle_rate | RESTATE_WORKER__INVOKER__ACTION_THROTTLING__RATE | |
| restate_operator_cluster_throttle_capacity | RESTATE_WORKER__INVOKER__ACTION_THROTTLING__CAPACITY | |
| restate_operator_cluster_compaction_threads | RESTATE_STORAGE_LOW_PRIORITY_BG_THREADS | |
| restate_operator_cluster_rocksdb_cache_bytes | RESTATE_ROCKSDB_TOTAL_MEMORY_SIZE | Quantity string → bytes |
| restate_operator_cluster_query_engine_memory_bytes | RESTATE_ADMIN__QUERY_ENGINE__MEMORY_SIZE | Quantity string → bytes |
| restate_operator_cluster_query_engine_parallelism | RESTATE_ADMIN__QUERY_ENGINE__QUERY_PARALLELISM | |
| restate_operator_cluster_num_partitions | RESTATE_DEFAULT_NUM_PARTITIONS | Immutable after creation |
| restate_operator_cluster_cpu_limit_millicores | spec.compute.resources.limits.cpu | Quantity string → millicores |
| restate_operator_cluster_memory_limit_bytes | spec.compute.resources.limits.memory | Quantity string → bytes |
| restate_operator_cluster_storage_request_bytes | spec.storage.storageRequestBytes | Already a number |
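
The "Quantity string → bytes" and "Quantity string → millicores" conversions could look roughly like the sketch below. It uses only the standard library, the function names are hypothetical, and it assumes the values follow Kubernetes quantity syntax with whole-number byte quantities; fractional byte quantities and exponent suffixes are deliberately rejected rather than guessed at.

```rust
/// Parse a Kubernetes-style quantity such as "4Gi", "500M" or "1024" into bytes.
/// Returns None for anything this sketch does not recognise (e.g. "1.5Gi").
pub fn parse_quantity_bytes(s: &str) -> Option<u64> {
    let s = s.trim();
    // Split the leading digits from the unit suffix.
    let split = s.find(|c: char| !c.is_ascii_digit()).unwrap_or(s.len());
    let (digits, suffix) = s.split_at(split);
    let value: u64 = digits.parse().ok()?;
    let multiplier: u64 = match suffix {
        "" => 1,
        // Decimal (SI) suffixes.
        "k" => 1_000,
        "M" => 1_000_000,
        "G" => 1_000_000_000,
        "T" => 1_000_000_000_000,
        // Binary suffixes.
        "Ki" => 1 << 10,
        "Mi" => 1 << 20,
        "Gi" => 1 << 30,
        "Ti" => 1 << 40,
        _ => return None,
    };
    // Refuse to wrap around on absurdly large inputs.
    value.checked_mul(multiplier)
}

/// Parse a CPU quantity such as "100m", "2" or "0.5" into millicores.
pub fn parse_cpu_millicores(s: &str) -> Option<u64> {
    let s = s.trim();
    // "100m" is already expressed in millicores.
    if let Some(milli) = s.strip_suffix('m') {
        return milli.parse().ok();
    }
    // Whole or fractional cores: convert to millicores.
    let cores: f64 = s.parse().ok()?;
    if !cores.is_finite() || cores < 0.0 {
        return None;
    }
    Some((cores * 1000.0).round() as u64)
}
```

Returning Option keeps unparseable values distinguishable from a genuine zero, so the caller can skip the gauge instead of publishing a misleading 0.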

Info metric (string labels):

| Metric | Labels |
| --- | --- |
| restate_operator_cluster_info | namespace, image, replicas |

Status gauges:

| Metric | Source |
| --- | --- |
| restate_operator_cluster_ready | status.conditions[type=Ready].status as 1/0 |
| restate_operator_cluster_generation_drift | metadata.generation - status.observedGeneration |

A generation_drift that stays non-zero for more than a few seconds means a spec change hasn't been reconciled: it is either still pending or failing silently. This complements the Ready condition, which may stay True during non-disruptive spec changes that the operator hasn't picked up yet.
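
As a minimal sketch of how the two status gauge values might be derived (function names are hypothetical, and both fallbacks are assumptions: a missing observedGeneration is treated as 0, and a missing or non-"True" Ready condition is reported as 0):

```rust
/// Value for the generation-drift gauge: metadata.generation minus
/// status.observedGeneration. A status that has never been written
/// (None) is treated as generation 0, which is an assumption of this sketch.
pub fn generation_drift(generation: i64, observed_generation: Option<i64>) -> i64 {
    generation.saturating_sub(observed_generation.unwrap_or(0))
}

/// Value for the ready gauge: 1 when the Ready condition's status is "True",
/// 0 for "False", "Unknown", or when the condition is missing entirely.
pub fn ready_gauge_value(ready_status: Option<&str>) -> f64 {
    match ready_status {
        Some("True") => 1.0,
        _ => 0.0,
    }
}
```

For example, a CR at generation 5 whose status last recorded generation 3 yields a drift of 2, and the drift returns to 0 once the operator records the latest generation.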

These metrics give cluster operators the same level of visibility into a cluster's declared configuration that the Restate server already provides for runtime state.

Why this is valuable

Alert correlation. Write alerts that reference configuration alongside runtime behaviour:

  • "Storage > 80% AND journal_retention > 86400s" → retention is likely the cause, not unbounded growth
  • "Memory > 90% AND concurrency_limit > 500" → fan-out is driving memory pressure
  • "CPU > 75% AND compaction_threads < 4 AND compaction_pending > 0" → compaction is bottlenecked

Today these correlations require a human to cross-reference kubectl with Grafana.

Fleet dashboards. A single Grafana table query shows every cluster's effective configuration. Useful for:

  • Which clusters are on non-default retention?
  • What's the distribution of concurrency limits across the fleet?
  • Are any clusters still on an old image version?
  • Which clusters have single-digit partition counts (potential hot-partition risk)?

Version management. Alert on clusters running outdated images: restate_operator_cluster_info{image!~".*:1\\.6\\..*"} matches every cluster whose image tag is not 1.6.x (taking 1.6 as the current release). Useful for tracking rollout progress or finding stragglers.

Capacity planning. Aggregate resource allocation across the fleet: total CPU requested, total storage provisioned, distribution of memory limits by tier. Queryable without kubectl.

Incident triage. When investigating a customer issue, the on-call engineer can query the cluster's full configuration from Grafana without needing kubectl access or a port-forward. Reduces time-to-context during incidents.

Prior art

Flux CD's controllers emit reconciliation and configuration metrics for their CRs (GitRepository, Kustomization, HelmRelease). This gives operators fleet-wide visibility via Prometheus without needing to query individual resources. The pattern is: controllers emit what they know about the resources they manage.

Implementation notes

  • The prometheus crate and /metrics endpoint are already wired up (src/metrics.rs)
  • Use GaugeVec with namespace label for per-CR metrics
  • Parse values during the reconcile loop (spec is already deserialized)
  • On CR deletion (finalizer path), call remove_label_values to avoid stale series
  • Duration parsing: Restate accepts human-readable durations ("24h", "1 week"); "1 week" is not valid Go duration syntax, so use the humantime crate or a small parser
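  • Quantity parsing: K8s resource quantities ("4Gi", "100m") need converting to numbers; k8s-openapi's Quantity appears to be a plain string wrapper, so this likely means a small parser or a dedicated crate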
  • Env var extraction: iterate spec.compute.env once per reconcile, match by name, parse values. Missing env vars → metric not emitted (not 0)
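
The duration parsing and env var extraction steps above can be sketched with the standard library alone. Everything named here is hypothetical: EnvVar is a stand-in for the Kubernetes type with only the two fields used, the accepted unit spellings are an assumption rather than Restate's exact grammar, and a real implementation may prefer the humantime crate.

```rust
/// Stand-in for the Kubernetes EnvVar type (hypothetical; only the fields used here).
pub struct EnvVar {
    pub name: String,
    pub value: Option<String>,
}

/// Parse durations such as "24h", "1 week" or "1h 30m" into whole seconds.
/// A bare number without a unit is rejected.
pub fn parse_duration_secs(input: &str) -> Option<u64> {
    let mut total: u64 = 0;
    let mut rest = input.trim();
    if rest.is_empty() {
        return None;
    }
    while !rest.is_empty() {
        // Leading digits of the next span.
        let num_end = rest.find(|c: char| !c.is_ascii_digit()).unwrap_or(rest.len());
        let value: u64 = rest[..num_end].parse().ok()?;
        rest = rest[num_end..].trim_start();
        // The unit runs until the next digit or whitespace.
        let unit_end = rest
            .find(|c: char| c.is_ascii_digit() || c.is_whitespace())
            .unwrap_or(rest.len());
        let unit_secs: u64 = match &rest[..unit_end] {
            "s" | "sec" | "secs" | "second" | "seconds" => 1,
            "m" | "min" | "mins" | "minute" | "minutes" => 60,
            "h" | "hr" | "hour" | "hours" => 3_600,
            "d" | "day" | "days" => 86_400,
            "w" | "week" | "weeks" => 604_800,
            _ => return None,
        };
        total = total.checked_add(value.checked_mul(unit_secs)?)?;
        rest = rest[unit_end..].trim_start();
    }
    Some(total)
}

/// Look up one variable by name; None when it is absent or has no literal value
/// (for example when it is populated via valueFrom).
pub fn env_value<'a>(env: &'a [EnvVar], name: &str) -> Option<&'a str> {
    env.iter().find(|e| e.name == name)?.value.as_deref()
}

/// Seconds to publish for the journal retention gauge, or None to skip emitting it.
pub fn journal_retention_secs(env: &[EnvVar]) -> Option<u64> {
    parse_duration_secs(env_value(env, "RESTATE_DEFAULT_JOURNAL_RETENTION")?)
}
```

Because every step returns Option, a missing or unparseable variable naturally results in no sample being set, which matches the "metric not emitted (not 0)" rule.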
