
Commit 2b15596

lmiccini and claude committed
Add metrics Service and TLS support for InstanceHA

- Add a Kubernetes Service exposing the InstanceHA Prometheus metrics endpoint, with labels for automatic discovery by the telemetry operator's ScrapeConfig.
- Add MetricsTLS field (tls.SimpleService) to the InstanceHa API, allowing TLS certificate configuration for the metrics endpoint.
- Mount TLS certificate secret into the deployment and pass cert/key paths via environment variables when MetricsTLS is enabled.
- Validate the MetricsTLS secret in the controller with hash tracking for automatic pod rollout on certificate rotation.
- Add field indexer for the metrics TLS secret so the controller reconciles on secret changes.
- Update the Python health/metrics server to wrap the HTTP socket with TLS when certificate environment variables are present.
- Add RBAC annotation for Services to the InstanceHA controller.
- Add functional tests for the metrics Service creation.
- Update documentation for Prometheus metrics integration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent fc58bcd commit 2b15596

14 files changed

Lines changed: 414 additions & 27 deletions


apis/bases/instanceha.openstack.org_instancehas.yaml

Lines changed: 12 additions & 0 deletions
@@ -100,6 +100,18 @@ spec:
                 default: 7410
                 format: int32
                 type: integer
+              metricsTLS:
+                description: MetricsTLS - Parameters related to TLS for the metrics
+                  endpoint
+                properties:
+                  caBundleSecretName:
+                    description: CaBundleSecretName - holding the CA certs in a pre-created
+                      bundle file
+                    type: string
+                  secretName:
+                    description: SecretName - holding the cert, key for the service
+                    type: string
+                type: object
               networkAttachments:
                 description: |-
                   NetworkAttachments is a list of NetworkAttachment resource names to expose

apis/instanceha/v1beta1/instanceha_types.go

Lines changed: 5 additions & 0 deletions
@@ -115,6 +115,11 @@ type InstanceHaSpec struct {
 	// +kubebuilder:validation:Optional
 	// Auth - Parameters related to authentication
 	Auth AuthSpec `json:"auth,omitempty"`
+
+	// +kubebuilder:validation:Optional
+	//+operator-sdk:csv:customresourcedefinitions:type=spec
+	// MetricsTLS - Parameters related to TLS for the metrics endpoint
+	MetricsTLS tls.SimpleService `json:"metricsTLS,omitempty"`
 }
 
 // InstanceHaStatus defines the observed state of InstanceHa

apis/instanceha/v1beta1/zz_generated.deepcopy.go

Lines changed: 1 addition & 0 deletions
Some generated files are not rendered by default.

config/crd/bases/instanceha.openstack.org_instancehas.yaml

Lines changed: 12 additions & 0 deletions
@@ -100,6 +100,18 @@ spec:
                 default: 7410
                 format: int32
                 type: integer
+              metricsTLS:
+                description: MetricsTLS - Parameters related to TLS for the metrics
+                  endpoint
+                properties:
+                  caBundleSecretName:
+                    description: CaBundleSecretName - holding the CA certs in a pre-created
+                      bundle file
+                    type: string
+                  secretName:
+                    description: SecretName - holding the cert, key for the service
+                    type: string
+                type: object
               networkAttachments:
                 description: |-
                   NetworkAttachments is a list of NetworkAttachment resource names to expose

docs/instanceha_guide.md

Lines changed: 3 additions & 1 deletion
@@ -190,7 +190,9 @@ groups:
 
 #### Scraping Configuration
 
-The InstanceHA pod exposes metrics on TCP port 8080. To scrape with Prometheus, create a `PodMonitor` or `ServiceMonitor`:
+The InstanceHA pod exposes metrics on TCP port 8080. The infra-operator automatically creates a Kubernetes Service (`<instance-name>-metrics`) with the labels `metrics: enabled` and `service: instanceha`, which the telemetry-operator discovers and scrapes via the COO Prometheus. **No manual configuration is needed when the telemetry-operator is deployed.**
+
+For environments using OpenShift user workload monitoring instead of (or in addition to) the telemetry-operator, create a `PodMonitor`:
 
 ```yaml
 apiVersion: monitoring.coreos.com/v1

docs/instanceha_prometheus.md

Lines changed: 10 additions & 24 deletions
@@ -482,37 +482,23 @@ When the [telemetry-operator](https://github.com/openstack-k8s-operators/telemet
 | OpenShift user workload monitoring | `prometheus-user-workload` in `openshift-user-workload-monitoring` | `thanos-querier` route in `openshift-monitoring` |
 | telemetry-operator (COO) | `prometheus-metric-storage` in `openstack` | `metric-storage-prometheus.openstack.svc:9090` |
 
-The PodMonitor approach described above places InstanceHA metrics in the OpenShift user workload Prometheus. If you want InstanceHA metrics alongside other OpenStack metrics (Ceilometer, RabbitMQ, node-exporter, OVN) in the COO Prometheus, create a `ScrapeConfig` CR instead.
+### Automatic Discovery (default)
 
-### Creating a ScrapeConfig for COO Prometheus
+The telemetry-operator **automatically discovers and scrapes InstanceHA metrics** — no manual configuration is required. The infra-operator creates a Kubernetes Service (`<instance-name>-metrics`) with the labels `metrics: enabled` and `service: instanceha`. The telemetry-operator's `MetricStorage` controller watches for Services with these labels and automatically generates a `ScrapeConfig` CR named `telemetry-instanceha` targeting port 8080.
 
-The COO Prometheus only picks up CRs with the label `service: metricStorage`. Create a `ScrapeConfig` targeting the InstanceHA pod:
+This works the same way as the OVN metrics integration. When a `MetricStorage` CR exists in the namespace:
 
-```yaml
-apiVersion: monitoring.rhobs/v1alpha1
-kind: ScrapeConfig
-metadata:
-  name: instanceha-metrics
-  namespace: openstack
-  labels:
-    service: metricStorage
-spec:
-  scrapeInterval: 30s
-  metricsPath: /metrics
-  staticConfigs:
-    - targets:
-        - "<instanceha-pod-ip>:8080"
-```
+1. The telemetry-operator discovers the InstanceHA metrics Service via label selectors
+2. A `ScrapeConfig` CR is created with the target `<service-name>.<namespace>.svc:8080`
+3. The COO Prometheus picks up the `ScrapeConfig` and begins scraping
+4. If the InstanceHA Service is deleted or recreated, the `ScrapeConfig` is automatically reconciled
 
-To discover the pod IP dynamically:
+To verify the automatic scrapeconfig was created:
 
 ```bash
-POD_IP=$(oc get pod -n openstack -l service=instanceha -o jsonpath='{.items[0].status.podIP}')
-echo "Target: ${POD_IP}:8080"
+oc get scrapeconfig -n openstack telemetry-instanceha -o yaml
 ```
 
-> **Note**: The COO `ScrapeConfig` uses static targets (IP:port), not label-based pod discovery like a `PodMonitor`. If the InstanceHA pod is rescheduled and gets a new IP, the `ScrapeConfig` must be updated. For automatic discovery, consider requesting native InstanceHA support in the telemetry-operator — the OVN metrics integration uses a label-based service discovery pattern that could be extended to InstanceHA.
-
 
 ### Alert Rules for COO Prometheus
 
 The alert rules from the [Alert Rules](#alert-rules) section use the `monitoring.coreos.com/v1` API group, which is picked up by OpenShift's built-in Prometheus Operator. To use these alerts with the COO Prometheus instead, change the API group and add the `service: metricStorage` label:
@@ -532,7 +518,7 @@ spec:
 ### Which Approach to Use
 
 - **OpenShift user workload monitoring only** (no telemetry-operator): Use the PodMonitor approach from [Enabling Scraping](#enabling-scraping). This is simpler and uses automatic pod discovery.
-- **telemetry-operator deployed**: Use the ScrapeConfig approach if you want all OpenStack metrics in a single Prometheus. You can also use both approaches simultaneously — the PodMonitor and ScrapeConfig target different Prometheus instances and do not conflict.
+- **telemetry-operator deployed** (default): InstanceHA metrics are automatically scraped by the COO Prometheus alongside other OpenStack metrics (Ceilometer, RabbitMQ, node-exporter, OVN). No manual configuration needed. You can also deploy the PodMonitor simultaneously — it targets the OpenShift user workload Prometheus and does not conflict with the COO scrapeconfig.
 - **Querying across both**: OpenShift's `thanos-querier` route aggregates the cluster and user workload Prometheus instances. The COO Prometheus is separate and must be queried directly at `metric-storage-prometheus.openstack.svc:9090`.
 
 ---
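The discovery flow described in the updated docs boils down to a label match followed by building an in-cluster DNS target. The stdlib-only sketch below is a simplified, hypothetical model of that flow, not the telemetry-operator's code: `matchesSelector` implements Kubernetes equality-based selector semantics over plain maps, and `scrapeTarget` formats the `<service-name>.<namespace>.svc:8080` address the docs mention. The instance name `instanceha` is a placeholder.

```go
package main

import "fmt"

// matchesSelector reports whether every key/value pair in selector is present
// in labels (Kubernetes equality-based selector semantics).
func matchesSelector(labels, selector map[string]string) bool {
	for k, v := range selector {
		if got, ok := labels[k]; !ok || got != v {
			return false
		}
	}
	return true
}

// scrapeTarget builds the in-cluster DNS target "<service>.<namespace>.svc:<port>".
func scrapeTarget(service, namespace string, port int) string {
	return fmt.Sprintf("%s.%s.svc:%d", service, namespace, port)
}

func main() {
	// Labels documented for the metrics Service; other labels may also be set.
	svcLabels := map[string]string{"metrics": "enabled", "service": "instanceha"}
	// Hypothetical selector a discovering controller might use.
	selector := map[string]string{"metrics": "enabled", "service": "instanceha"}

	if matchesSelector(svcLabels, selector) {
		// "instanceha" is a placeholder instance name.
		fmt.Println(scrapeTarget("instanceha-metrics", "openstack", 8080))
	}
}
```

Because the target is a Service DNS name rather than a pod IP, it stays valid when the InstanceHA pod is rescheduled, which is the limitation of the removed static-target `ScrapeConfig`.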

internal/controller/instanceha/instanceha_controller.go

Lines changed: 61 additions & 2 deletions
@@ -55,6 +55,7 @@ import (
 
 	commondeployment "github.com/openstack-k8s-operators/lib-common/modules/common/deployment"
 	"github.com/openstack-k8s-operators/lib-common/modules/common/secret"
+	commonservice "github.com/openstack-k8s-operators/lib-common/modules/common/service"
 	"github.com/openstack-k8s-operators/lib-common/modules/common/util"
 
 	networkv1 "github.com/k8snetworkplumbingwg/network-attachment-definition-client/pkg/apis/k8s.cni.cncf.io/v1"
@@ -80,6 +81,7 @@ func (r *Reconciler) GetLogger(ctx context.Context) logr.Logger {
 // +kubebuilder:rbac:groups=instanceha.openstack.org,resources=instancehas/finalizers,verbs=update;patch
 // +kubebuilder:rbac:groups=core,resources=configmaps,verbs=get;list;watch;
 // +kubebuilder:rbac:groups=core,resources=secrets,verbs=get;list;watch;
+// +kubebuilder:rbac:groups=core,resources=services,verbs=get;list;watch;create;update;patch;delete
 // +kubebuilder:rbac:groups=k8s.cni.cncf.io,resources=network-attachment-definitions,verbs=get;list;watch
 // service account, role, rolebinding
 // +kubebuilder:rbac:groups="",resources=serviceaccounts,verbs=get;list;watch;create;update;patch
@@ -164,6 +166,7 @@ func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) (result ct
 		condition.UnknownCondition(condition.RoleReadyCondition, condition.InitReason, condition.RoleReadyInitMessage),
 		condition.UnknownCondition(condition.RoleBindingReadyCondition, condition.InitReason, condition.RoleBindingReadyInitMessage),
 		condition.UnknownCondition(condition.NetworkAttachmentsReadyCondition, condition.InitReason, condition.NetworkAttachmentsReadyInitMessage),
+		condition.UnknownCondition(condition.CreateServiceReadyCondition, condition.InitReason, condition.CreateServiceReadyInitMessage),
 	)
 	instance.Status.Conditions.Init(&cl)
 	instance.Status.ObservedGeneration = instance.Generation
@@ -369,8 +372,6 @@ func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) (result ct
 		)
 		if err != nil {
 			if k8s_errors.IsNotFound(err) {
-				// Since the CA cert secret should have been manually created by the user and provided in the spec,
-				// we treat this as a warning because it means that the service will not be able to start.
 				instance.Status.Conditions.Set(condition.FalseCondition(
 					condition.TLSInputReadyCondition,
 					condition.ErrorReason,
@@ -390,6 +391,28 @@ func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) (result ct
 		configVars[instance.Spec.CaBundleSecretName] = env.SetValue(secretHash)
 	}
 
+	if instance.Spec.MetricsTLS.Enabled() {
+		hash, err := instance.Spec.MetricsTLS.ValidateCertSecret(ctx, helper, instance.Namespace)
+		if err != nil {
+			if k8s_errors.IsNotFound(err) {
+				instance.Status.Conditions.Set(condition.FalseCondition(
+					condition.TLSInputReadyCondition,
+					condition.RequestedReason,
+					condition.SeverityInfo,
+					condition.TLSInputReadyWaitingMessage, err.Error()))
+				return ctrl.Result{}, nil
+			}
+			instance.Status.Conditions.Set(condition.FalseCondition(
+				condition.TLSInputReadyCondition,
+				condition.ErrorReason,
+				condition.SeverityWarning,
+				condition.TLSInputErrorMessage,
+				err.Error()))
+			return ctrl.Result{}, err
+		}
+		configVars[tls.TLSHashName+"_metrics"] = env.SetValue(hash)
+	}
+
 	// all cert input checks out so report InputReady
 	instance.Status.Conditions.MarkTrue(condition.TLSInputReadyCondition, condition.InputReadyMessage)
 
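The `configVars[tls.TLSHashName+"_metrics"]` entry is what ties certificate rotation to a pod rollout: `configVarsHash` is later passed to `instanceha.Deployment(...)`, presumably derived from `configVars`, so a new secret hash changes the Deployment spec. The sketch below illustrates the idea with a hypothetical `hashSecretData` helper that hashes secret data in sorted-key order; lib-common's `ValidateCertSecret` may compute its hash differently, and the `metricsHashKey` constant is an illustrative stand-in for the real key.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// metricsHashKey is a hypothetical stand-in for tls.TLSHashName+"_metrics".
const metricsHashKey = "tlsHash_metrics"

// hashSecretData returns a hex SHA-256 over a secret's data. Keys are hashed
// in sorted order because Go map iteration order is randomized.
func hashSecretData(data map[string][]byte) string {
	keys := make([]string, 0, len(data))
	for k := range data {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	h := sha256.New()
	for _, k := range keys {
		h.Write([]byte(k))
		h.Write([]byte{0}) // NUL separator between key and value
		h.Write(data[k])
		h.Write([]byte{0}) // NUL separator between entries
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	configVars := map[string]string{}
	secret := map[string][]byte{"tls.crt": []byte("cert-v1"), "tls.key": []byte("key-v1")}

	configVars[metricsHashKey] = hashSecretData(secret)
	before := configVars[metricsHashKey]

	// Simulate a certificate rotation updating the Secret in place.
	secret["tls.crt"] = []byte("cert-v2")
	configVars[metricsHashKey] = hashSecretData(secret)

	// A changed entry changes any hash computed over configVars; assuming that
	// hash reaches the pod template, the Deployment rolls its pods.
	fmt.Println(before != configVars[metricsHashKey]) // true
}
```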
@@ -505,6 +528,28 @@ func (r *Reconciler) Reconcile(ctx context.Context, req ctrl.Request) (result ct
 		// remove LastAppliedTopology from the .Status
 		instance.Status.LastAppliedTopology = nil
 	}
+	commonsvc, err := commonservice.NewService(instanceha.MetricsService(instance), time.Duration(5)*time.Second, nil)
+	if err != nil {
+		instance.Status.Conditions.Set(condition.FalseCondition(
+			condition.CreateServiceReadyCondition,
+			condition.ErrorReason,
+			condition.SeverityWarning,
+			condition.CreateServiceReadyErrorMessage,
+			err.Error()))
+		return ctrl.Result{}, err
+	}
+	sres, serr := commonsvc.CreateOrPatch(ctx, helper)
+	if serr != nil {
+		instance.Status.Conditions.Set(condition.FalseCondition(
+			condition.CreateServiceReadyCondition,
+			condition.ErrorReason,
+			condition.SeverityWarning,
+			condition.CreateServiceReadyErrorMessage,
+			serr.Error()))
+		return sres, serr
+	}
+	instance.Status.Conditions.MarkTrue(condition.CreateServiceReadyCondition, condition.CreateServiceReadyMessage)
+
 	deployment := commondeployment.NewDeployment(instanceha.Deployment(instance, deploymentLabels, serviceAnnotations, cloud, configVarsHash, containerImage, topology, acSecretName), time.Duration(5)*time.Second)
 	sfres, sferr := deployment.CreateOrPatch(ctx, helper)
 	if sferr != nil {
@@ -558,6 +603,7 @@ const (
 	instanceHaConfigMapField = ".spec.instanceHaConfigMap"
 	topologyField = ".spec.topologyRef.Name"
 	acSecretField = ".spec.auth.applicationCredentialSecret" // #nosec G101
+	metricsTLSField = ".spec.metricsTLS.secretName" // #nosec G101
 )
 
 var allWatchFields = []string{
@@ -568,6 +614,7 @@ var allWatchFields = []string{
 	instanceHaConfigMapField,
 	topologyField,
 	acSecretField,
+	metricsTLSField,
 }
 
 // SetupWithManager sets up the controller with the Manager.
@@ -649,9 +696,21 @@ func (r *Reconciler) SetupWithManager(mgr ctrl.Manager) error {
 		return err
 	}
 
+	// index metricsTLSField
+	if err := mgr.GetFieldIndexer().IndexField(context.Background(), &instancehav1.InstanceHa{}, metricsTLSField, func(rawObj client.Object) []string {
+		cr := rawObj.(*instancehav1.InstanceHa)
+		if cr.Spec.MetricsTLS.SecretName == nil {
+			return nil
+		}
+		return []string{*cr.Spec.MetricsTLS.SecretName}
+	}); err != nil {
+		return err
+	}
+
 	return ctrl.NewControllerManagedBy(mgr).
 		For(&instancehav1.InstanceHa{}).
 		Owns(&appsv1.Deployment{}).
+		Owns(&corev1.Service{}).
 		Owns(&corev1.ServiceAccount{}).
 		Owns(&rbacv1.Role{}).
 		Owns(&rbacv1.RoleBinding{}).

internal/instanceha/const.go

Lines changed: 10 additions & 0 deletions
@@ -0,0 +1,10 @@
+package instanceha
+
+const (
+	// MetricsCertPath is the path to the metrics certificate file
+	MetricsCertPath = "/etc/pki/tls/certs/metrics.crt"
+	// MetricsKeyPath is the path to the metrics private key file
+	MetricsKeyPath = "/etc/pki/tls/private/metrics.key"
+	// DefaultMetricsCertSecret is the default secret name for the metrics TLS certificate
+	DefaultMetricsCertSecret = "cert-instanceha-metrics" //nolint:gosec
+)

internal/instanceha/funcs.go

Lines changed: 22 additions & 0 deletions
@@ -17,6 +17,7 @@ import (
 	instancehav1 "github.com/openstack-k8s-operators/infra-operator/apis/instanceha/v1beta1"
 	topologyv1 "github.com/openstack-k8s-operators/infra-operator/apis/topology/v1beta1"
 	env "github.com/openstack-k8s-operators/lib-common/modules/common/env"
+	"github.com/openstack-k8s-operators/lib-common/modules/common/tls"
 
 	"fmt"
 	appsv1 "k8s.io/api/apps/v1"
@@ -103,6 +104,27 @@ func Deployment(
 		volumeMounts = append(volumeMounts, instance.Spec.CreateVolumeMounts(nil)...)
 	}
 
+	// add metrics TLS cert if defined
+	if instance.Spec.MetricsTLS.Enabled() {
+		certSecretName := DefaultMetricsCertSecret
+		if instance.Spec.MetricsTLS.SecretName != nil && *instance.Spec.MetricsTLS.SecretName != "" {
+			certSecretName = *instance.Spec.MetricsTLS.SecretName
+		}
+		metricsSvc := tls.Service{
+			SecretName: certSecretName,
+			CertMount:  ptr.To(MetricsCertPath),
+			KeyMount:   ptr.To(MetricsKeyPath),
+		}
+		volumes = append(volumes, metricsSvc.CreateVolume("metrics-certs"))
+		volumeMounts = append(volumeMounts, metricsSvc.CreateVolumeMounts("metrics-certs")...)
+
+		envVars["METRICS_TLS_CERT"] = env.SetValue(MetricsCertPath)
+		envVars["METRICS_TLS_KEY"] = env.SetValue(MetricsKeyPath)
+
+		livenessProbe.HTTPGet.Scheme = corev1.URISchemeHTTPS
+		readinessProbe.HTTPGet.Scheme = corev1.URISchemeHTTPS
+	}
+
 	dep := &appsv1.Deployment{
 		ObjectMeta: metav1.ObjectMeta{
 			Name: instance.Name,
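When `MetricsTLS` is enabled, the Deployment mounts the certificate at `MetricsCertPath`/`MetricsKeyPath`, exports `METRICS_TLS_CERT` and `METRICS_TLS_KEY`, and switches both probes to HTTPS, presumably because the health/metrics server then speaks only TLS. Per the commit message, the Python server (not rendered on this page) wraps its socket with TLS when those variables are present. The Go sketch below shows an analogous consumer for illustration only: `metricsTLSFromEnv` and `serveMetrics` are hypothetical names, and requiring both variables to be non-empty is an assumption.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
)

// metricsTLSFromEnv returns the cert/key paths and ok=true only when both
// METRICS_TLS_CERT and METRICS_TLS_KEY are non-empty (an assumption).
func metricsTLSFromEnv(getenv func(string) string) (cert, key string, ok bool) {
	cert, key = getenv("METRICS_TLS_CERT"), getenv("METRICS_TLS_KEY")
	return cert, key, cert != "" && key != ""
}

// serveMetrics serves handler on addr over HTTPS when TLS paths are configured,
// and over plain HTTP otherwise. It blocks until the server fails.
func serveMetrics(addr string, handler http.Handler) error {
	if cert, key, ok := metricsTLSFromEnv(os.Getenv); ok {
		return http.ListenAndServeTLS(addr, cert, key, handler)
	}
	return http.ListenAndServe(addr, handler)
}

func main() {
	// Only report the decision; call serveMetrics(":8080", handler) to serve.
	if cert, key, ok := metricsTLSFromEnv(os.Getenv); ok {
		fmt.Printf("would serve metrics over HTTPS using %s and %s\n", cert, key)
	} else {
		fmt.Println("would serve metrics over plain HTTP")
	}
}
```

Passing `getenv` as a parameter keeps the decision logic testable without touching the real process environment.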

internal/instanceha/service.go

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+/*
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+http://www.apache.org/licenses/LICENSE-2.0
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+*/
+
+package instanceha
+
+import (
+	instancehav1 "github.com/openstack-k8s-operators/infra-operator/apis/instanceha/v1beta1"
+	common "github.com/openstack-k8s-operators/lib-common/modules/common"
+	labels "github.com/openstack-k8s-operators/lib-common/modules/common/labels"
+	service "github.com/openstack-k8s-operators/lib-common/modules/common/service"
+	corev1 "k8s.io/api/core/v1"
+)
+
+// MetricsService exposes the InstanceHA metrics endpoint for Prometheus scraping
+func MetricsService(instance *instancehav1.InstanceHa) *corev1.Service {
+	svcLabels := labels.GetLabels(instance, labels.GetGroupLabel("instanceha"), map[string]string{
+		common.AppSelector: "instanceha",
+		"metrics":          "enabled",
+	})
+
+	details := &service.GenericServiceDetails{
+		Name:      instance.GetName() + "-metrics",
+		Namespace: instance.GetNamespace(),
+		Labels:    svcLabels,
+		Selector: map[string]string{
+			common.AppSelector: "instanceha",
+		},
+		Port: service.GenericServicePort{
+			Name:     "metrics",
+			Port:     8080,
+			Protocol: "TCP",
+		},
+	}
+
+	return service.GenericService(details)
+}
