Add metrics Service and TLS support for InstanceHA

- Add a Kubernetes Service exposing the InstanceHA Prometheus metrics
  endpoint, with labels for automatic discovery by the telemetry
  operator's ScrapeConfig.
- Add MetricsTLS field (tls.SimpleService) to the InstanceHa API,
  allowing TLS certificate configuration for the metrics endpoint.
- Mount TLS certificate secret into the deployment and pass cert/key
  paths via environment variables when MetricsTLS is enabled.
- Validate the MetricsTLS secret in the controller with hash tracking
  for automatic pod rollout on certificate rotation.
- Add field indexer for the metrics TLS secret so the controller
  reconciles on secret changes.
- Update the Python health/metrics server to wrap the HTTP socket with
  TLS when certificate environment variables are present.
- Add RBAC annotation for Services to the InstanceHA controller.
- Add functional tests for the metrics Service creation.
- Update documentation for Prometheus metrics integration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
docs/instanceha_guide.md: 3 additions, 1 deletion
@@ -190,7 +190,9 @@ groups:

 #### Scraping Configuration

-The InstanceHA pod exposes metrics on TCP port 8080. To scrape with Prometheus, create a `PodMonitor` or `ServiceMonitor`:
+The InstanceHA pod exposes metrics on TCP port 8080. The infra-operator automatically creates a Kubernetes Service (`<instance-name>-metrics`) with the labels `metrics: enabled` and `service: instanceha`, which the telemetry-operator discovers and scrapes via the COO Prometheus. **No manual configuration is needed when the telemetry-operator is deployed.**
+
+For environments using OpenShift user workload monitoring instead of (or in addition to) the telemetry-operator, create a `PodMonitor`:
docs/instanceha_prometheus.md: 53 additions, 31 deletions
@@ -6,6 +6,8 @@ InstanceHA exposes Prometheus metrics at `:8080/metrics` on the workload pod, co

 The metrics are served by the `prometheus_client` Python library on the same HTTP server used for liveness and readiness probes. No sidecar or additional container is needed.

+When pod-level TLS is enabled, the metrics endpoint serves over **HTTPS**. The openstack-operator creates a cert-manager Certificate producing a TLS secret (`cert-instanceha-metrics`), which the infra-operator mounts into the pod. The Python HTTP server wraps its socket with TLS automatically when the certificate files are present.
+
 ---

 ## Prerequisites
@@ -17,6 +19,34 @@ The metrics are served by the `prometheus_client` Python library on the same HTT

 ---

+## TLS Configuration
+
+When `OpenStackControlPlane` has pod-level TLS enabled (`spec.tls.podLevel.enabled: true`), the openstack-operator automatically provisions a cert-manager Certificate for the InstanceHA metrics endpoint. This produces a Kubernetes TLS secret (`cert-instanceha-metrics`) containing `tls.crt`, `tls.key`, and `ca.crt`.
+
+The infra-operator InstanceHA controller **auto-detects** this secret: if the default secret `cert-instanceha-metrics` exists in the namespace, TLS is enabled automatically without any configuration on the InstanceHa CR. The controller:
+1. Validates the TLS secret exists and is well-formed
+2. Mounts the certificate at `/etc/pki/tls/certs/metrics.crt` and the key at `/etc/pki/tls/private/metrics.key`
+3. Sets `METRICS_TLS_CERT` and `METRICS_TLS_KEY` environment variables
+4. Switches liveness and readiness probes to HTTPS
+
+The Python process detects these environment variables and wraps the HTTP server socket with TLS. A single wildcard certificate (`*.NAMESPACE.svc`) covers all InstanceHA instances in a namespace.
+
+To use a custom TLS secret instead of the auto-detected default, set `metricsTLS.secretName` in the InstanceHa CR:
+
+```yaml
+apiVersion: instanceha.openstack.org/v1beta1
+kind: InstanceHa
+metadata:
+  name: instanceha
+spec:
+  metricsTLS:
+    secretName: my-custom-metrics-cert
+```
+
+When the telemetry-operator is deployed, its `ScrapeConfig` automatically switches to `scheme: HTTPS` with the appropriate TLS configuration when `PrometheusTLS` is enabled; no manual changes are needed.
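
Reviewer note: the hunk above says the Python process wraps its HTTP server socket with TLS when `METRICS_TLS_CERT` and `METRICS_TLS_KEY` are set. The following is a minimal sketch of that pattern using only the standard library, not the code from this commit. `HealthHandler`, `tls_paths`, and `build_server` are hypothetical names, and checking that both files exist before enabling TLS is an assumption.

```python
import os
import ssl
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the real health/metrics handler."""

    def do_GET(self):
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def tls_paths(environ):
    """Return (cert, key) when both variables are set and both files exist, else None.

    The file-existence check is an assumption of this sketch.
    """
    cert = environ.get("METRICS_TLS_CERT")
    key = environ.get("METRICS_TLS_KEY")
    if cert and key and os.path.isfile(cert) and os.path.isfile(key):
        return cert, key
    return None


def build_server(environ, host="127.0.0.1", port=0):
    """Create the HTTP server and wrap its listening socket with TLS if configured."""
    server = HTTPServer((host, port), HealthHandler)
    paths = tls_paths(environ)
    if paths is not None:
        context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
        context.load_cert_chain(certfile=paths[0], keyfile=paths[1])
        server.socket = context.wrap_socket(server.socket, server_side=True)
    return server, ("https" if paths is not None else "http")
```

Calling `build_server(os.environ)` returns the server together with the scheme it will serve. When neither variable is set, the server stays on plain HTTP, which matches the non-TLS behaviour described earlier in the document.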
@@ -482,37 +518,23 @@ When the [telemetry-operator](https://github.com/openstack-k8s-operators/telemet
 | OpenShift user workload monitoring | `prometheus-user-workload` in `openshift-user-workload-monitoring` | `thanos-querier` route in `openshift-monitoring` |
 | telemetry-operator (COO) | `prometheus-metric-storage` in `openstack` | `metric-storage-prometheus.openstack.svc:9090` |

-The PodMonitor approach described above places InstanceHA metrics in the OpenShift user workload Prometheus. If you want InstanceHA metrics alongside other OpenStack metrics (Ceilometer, RabbitMQ, node-exporter, OVN) in the COO Prometheus, create a `ScrapeConfig` CR instead.
+### Automatic Discovery (default)

-### Creating a ScrapeConfig for COO Prometheus
+The telemetry-operator **automatically discovers and scrapes InstanceHA metrics**; no manual configuration is required. The infra-operator creates a Kubernetes Service (`<instance-name>-metrics`) with the labels `metrics: enabled` and `service: instanceha`. The telemetry-operator's `MetricStorage` controller watches for Services with these labels and automatically generates a `ScrapeConfig` CR named `telemetry-instanceha` targeting port 8080.

-The COO Prometheus only picks up CRs with the label `service: metricStorage`. Create a `ScrapeConfig` targeting the InstanceHA pod:
+This works the same way as the OVN metrics integration. When a `MetricStorage` CR exists in the namespace:

-```yaml
-apiVersion: monitoring.rhobs/v1alpha1
-kind: ScrapeConfig
-metadata:
-  name: instanceha-metrics
-  namespace: openstack
-  labels:
-    service: metricStorage
-spec:
-  scrapeInterval: 30s
-  metricsPath: /metrics
-  staticConfigs:
-    - targets:
-        - "<instanceha-pod-ip>:8080"
-```
+1. The telemetry-operator discovers the InstanceHA metrics Service via label selectors
+2. A `ScrapeConfig` CR is created with the target `<service-name>.<namespace>.svc:8080`
+3. The COO Prometheus picks up the `ScrapeConfig` and begins scraping
+4. If the InstanceHA Service is deleted or recreated, the `ScrapeConfig` is automatically reconciled

-To discover the pod IP dynamically:
+To verify the automatic scrapeconfig was created:

 ```bash
-POD_IP=$(oc get pod -n openstack -l service=instanceha -o jsonpath='{.items[0].status.podIP}')
-echo "Target: ${POD_IP}:8080"
+oc get scrapeconfig -n openstack telemetry-instanceha -o yaml
 ```

-> **Note**: The COO `ScrapeConfig` uses static targets (IP:port), not label-based pod discovery like a `PodMonitor`. If the InstanceHA pod is rescheduled and gets a new IP, the `ScrapeConfig` must be updated. For automatic discovery, consider requesting native InstanceHA support in the telemetry-operator; the OVN metrics integration uses a label-based service discovery pattern that could be extended to InstanceHA.
-
 ### Alert Rules for COO Prometheus

 The alert rules from the [Alert Rules](#alert-rules) section use the `monitoring.coreos.com/v1` API group, which is picked up by OpenShift's built-in Prometheus Operator. To use these alerts with the COO Prometheus instead, change the API group and add the `service: metricStorage` label:
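
Reviewer note: the automatic discovery steps added in this hunk (match Services by label, then target `<service-name>.<namespace>.svc:8080`) can be illustrated with a small sketch. This is not the telemetry-operator's implementation, which is a Kubernetes controller. The `Service` dataclass and the `matches` and `scrape_targets` functions are hypothetical, and only the label values, the port, and the target format come from the documentation in this diff.

```python
from dataclasses import dataclass, field

# Labels and port as stated in the documentation in this diff.
REQUIRED_LABELS = {"metrics": "enabled", "service": "instanceha"}
METRICS_PORT = 8080


@dataclass
class Service:
    """Hypothetical, minimal stand-in for a Kubernetes Service object."""
    name: str
    namespace: str
    labels: dict = field(default_factory=dict)


def matches(service):
    """True when the Service carries every required label with the expected value."""
    return all(service.labels.get(k) == v for k, v in REQUIRED_LABELS.items())


def scrape_targets(services, tls_enabled=False):
    """Build the scheme and static targets for all matching Services."""
    targets = sorted(
        f"{s.name}.{s.namespace}.svc:{METRICS_PORT}" for s in services if matches(s)
    )
    return {"scheme": "HTTPS" if tls_enabled else "HTTP", "targets": targets}
```

Because the target is the Service DNS name rather than a pod IP, rescheduling the InstanceHA pod does not require updating the target, which is the limitation the removed note described for the static `ScrapeConfig`.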
@@ -532,7 +554,7 @@ spec:
 ### Which Approach to Use

 - **OpenShift user workload monitoring only** (no telemetry-operator): Use the PodMonitor approach from [Enabling Scraping](#enabling-scraping). This is simpler and uses automatic pod discovery.
-- **telemetry-operator deployed**: Use the ScrapeConfig approach if you want all OpenStack metrics in a single Prometheus. You can also use both approaches simultaneously; the PodMonitor and ScrapeConfig target different Prometheus instances and do not conflict.
+- **telemetry-operator deployed** (default): InstanceHA metrics are automatically scraped by the COO Prometheus alongside other OpenStack metrics (Ceilometer, RabbitMQ, node-exporter, OVN). No manual configuration needed. You can also deploy the PodMonitor simultaneously; it targets the OpenShift user workload Prometheus and does not conflict with the COO scrapeconfig.
 - **Querying across both**: OpenShift's `thanos-querier` route aggregates the cluster and user workload Prometheus instances. The COO Prometheus is separate and must be queried directly at `metric-storage-prometheus.openstack.svc:9090`.