postgres controllers metrics#1811
limak9182 wants to merge 5 commits into feature/database-controllers from
Conversation
CLA Assistant Lite bot: I have read the CLA Document and I hereby sign the CLA. You can retrigger this bot by commenting `recheck` in this Pull Request.
@@ -0,0 +1,33 @@
package metrics
If we're doing ports and adapters, this seems to be a secondary port for the core of both pgcluster and database.
Placing it here means we have a contract/port for the recorder.
Placing it inside, let's say, postgresql/shared/ports/metrics.go, named Metrics, would be a better fit, I think.
I.e. in the business core we use metrics, we provide metrics. A recorder is what metrics depends on; it's an implementation detail of the metrics area, which could stay outside of the core. You see what I mean?
I get it, thanks. I'll align it with the approach we agreed on.
Scheme:         mgr.GetScheme(),
Recorder:       mgr.GetEventRecorderFor("postgresdatabase-controller"),
Metrics:        pgRecorder,
FleetCollector: pgFleetCollector,
Is there a way to use naming that shares more intent? We have `Metrics` mapped to `pgRecorder` and `FleetCollector` mapped to `pgFleetCollector`; it might be hard for a reader to understand the purpose of each.
return clustercore.PostgresClusterService(ctx, rc, req)
rc := &clustercore.ReconcileContext{Client: r.Client, Scheme: r.Scheme, Recorder: r.Recorder, Metrics: r.Metrics}
result, err := clustercore.PostgresClusterService(ctx, rc, req)
r.FleetCollector.CollectClusterMetrics(ctx, r.Client, r.Metrics)
}
func (p *PrometheusRecorder) SetDatabasePhases(phases map[string]float64) {
	databases.Reset()
clusters = prometheus.NewGaugeVec(prometheus.GaugeOpts{
	Name: "splunk_operator_postgres_clusters",
	Help: "Current number of PostgresCluster resources by status phase.",
}, []string{"phase", "pooler_enabled"})
Those are labels, aren't they? I wonder if we shouldn't split the pooler from the phases, as they are not really related?
Description
Adds comprehensive Prometheus metrics for the PostgreSQL controllers using a hexagonal (ports & adapters) pattern: the domain code depends only on a `Recorder` interface, never on Prometheus directly.
New package `pkg/postgresql/metrics/`:

- `ports.go`: `Recorder` interface (the port). Core service packages import only this.
- `prometheus.go`: `PrometheusRecorder` adapter: 6 metric families with the `splunk_operator_postgres_` prefix, registered against the controller-runtime metrics registry.
- `noop.go`: `NoopRecorder` for unit tests.
- `collector.go`: `FleetCollector` that recomputes fleet-state gauges from the informer cache after each reconcile.

Three-layer metrics collection:
1. Controller-runtime built-ins: `controller_runtime_reconcile_total`, `_time_seconds`, `_errors_total`.
2. `FleetCollector` lists CRs from the cache and sets the gauges.
3. `IncStatusTransition()` is called automatically inside `persistStatus`/`setStatus`: zero manual metric calls in service code.

Custom metrics (6 families):
- `status_transitions_total`
- `clusters`
- `databases`
- `managed_users`
- `poolers`
- `pooler_instances`

Design decisions:
- `IncStatusTransition` is called inside `persistStatus`/`setStatus`, so every condition write is automatically captured with no explicit calls scattered through service code.
- Bounded label cardinality (`controller`, `condition`, `status`, `reason`, `phase`): no per-resource `name`/`namespace` labels.
- Ports & adapters keeps the core testable (`NoopRecorder`) and the adapter swappable.
- `pkg/splunk/client/metrics/` is untouched.

Key Changes
- `pkg/postgresql/metrics/`: new package with the port interface, Prometheus adapter, noop adapter, and fleet collector.
- `pkg/postgresql/cluster/core/cluster.go`: `setStatus`, `syncPoolerStatus`, and `syncStatus` now accept a `Recorder` and emit `IncStatusTransition` automatically.
- `pkg/postgresql/database/core/database.go`: `persistStatus` now accepts a `Recorder` and emits `IncStatusTransition` automatically. Also adds 2 missing `updateStatus` calls on error paths (role patch failure, database reconcile failure).
- `pkg/postgresql/{cluster,database}/core/types.go`: `Metrics pgmetrics.Recorder` field added to `ReconcileContext`.
- `internal/controller/postgres{cluster,database}_controller.go`: inject `Metrics` into `ReconcileContext`, call the fleet collector after each reconcile.
- `cmd/main.go`: create the `PrometheusRecorder`, register it with the controller-runtime metrics registry, and pass it to the controllers.

Testing and Verification
Setting up Grafana + Prometheus on KIND
1. Install the monitoring stack
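The install command itself isn't shown above. Assuming the community kube-prometheus-stack Helm chart with release name `kube-prometheus` (which would produce the `kube-prometheus-grafana` service used below), the setup might look like:

```shell
# Assumption: kube-prometheus-stack chart, release name "kube-prometheus"
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install kube-prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace
```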
kubectl port-forward svc/kube-prometheus-grafana -n monitoring 3000:80
Open http://localhost:3000 — login: admin / admin
The Prometheus datasource is auto-configured. Query any metric with the splunk_operator_postgres_ prefix.
Related Issues
Jira tickets, GitHub issues, Support tickets...
PR Checklist