Commit 62fcab1

improve: remove otel from metrics e2e because of stability issues

Signed-off-by: Attila Mészáros <a_meszaros@apple.com>

1 parent 5daec11, commit 62fcab1

8 files changed, 125 additions and 270 deletions

docs/content/en/blog/releases/v5-3-release.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -98,7 +98,7 @@ A ready-to-use **Grafana dashboard** is included at
 
 The
 [`metrics-processing` sample operator](https://github.com/java-operator-sdk/java-operator-sdk/tree/main/sample-operators/metrics-processing)
-provides a complete end-to-end setup with Prometheus, Grafana, and an OpenTelemetry Collector,
+provides a complete end-to-end setup with Prometheus and Grafana,
 installable via `observability/install-observability.sh`. This is a good starting point for
 verifying metrics in a real cluster.
 
```

docs/content/en/docs/documentation/observability.md

Lines changed: 4 additions & 6 deletions
```diff
@@ -128,9 +128,7 @@ All meters use `controller.name` as their primary tag. Counters optionally carry
 \* `namespace` tag is only included when `withNamespaceAsTag()` is enabled.
 
 The execution timer uses explicit boundaries (10ms, 50ms, 100ms, 250ms, 500ms, 1s, 2s, 5s, 10s, 30s) to ensure
-compatibility with `histogram_quantile()` queries in Prometheus. This is important when using the OpenTelemetry Protocol (OTLP) registry, where
-`publishPercentileHistogram()` would otherwise produce Base2 Exponential Histograms that are incompatible with classic
-`_bucket` queries.
+compatibility with `histogram_quantile()` queries in Prometheus.
 
 > **Note on Prometheus metric names**: The exact Prometheus metric name suffix depends on the `MeterRegistry` in use.
 > For `PrometheusMeterRegistry` the timer is exposed as `reconciliations_execution_duration_seconds_*`. For
```
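The execution timer's fixed boundaries matter because `histogram_quantile()` can only interpolate within classic `le` buckets. The stdlib-only Java sketch below illustrates that query-side math under stated assumptions. `BucketQuantileSketch` is a hypothetical class, not part of JOSDK or Micrometer. It counts observations into cumulative buckets using the boundaries listed in the docs, then estimates a quantile the way Prometheus does: it locates the bucket containing the target rank, interpolates linearly inside it, and reports the highest finite bound if the rank falls into `+Inf`. It does not model how Micrometer records timers.

```java
/**
 * Hypothetical sketch: a fixed-boundary cumulative histogram plus a quantile
 * estimate that mirrors Prometheus' histogram_quantile() interpolation.
 */
final class BucketQuantileSketch {

    // Upper bounds in seconds: 10ms, 50ms, 100ms, 250ms, 500ms, 1s, 2s, 5s, 10s, 30s.
    static final double[] BOUNDS = {0.01, 0.05, 0.1, 0.25, 0.5, 1, 2, 5, 10, 30};

    // cumulative[i] counts observations <= BOUNDS[i]; the extra last slot is le="+Inf".
    private final long[] cumulative = new long[BOUNDS.length + 1];

    void record(double seconds) {
        for (int i = 0; i < BOUNDS.length; i++) {
            if (seconds <= BOUNDS[i]) {
                cumulative[i]++;
            }
        }
        cumulative[BOUNDS.length]++; // the +Inf bucket counts every observation
    }

    /** Estimates the q-quantile (0 < q <= 1); returns NaN for an empty histogram or invalid q. */
    double quantile(double q) {
        long total = cumulative[BOUNDS.length];
        if (total == 0 || q <= 0 || q > 1) {
            return Double.NaN;
        }
        double rank = q * total;
        // First finite bucket whose cumulative count reaches the rank.
        int b = 0;
        while (b < BOUNDS.length && cumulative[b] < rank) {
            b++;
        }
        if (b == BOUNDS.length) {
            // Rank falls into +Inf: report the highest finite bound.
            return BOUNDS[BOUNDS.length - 1];
        }
        double lower = (b == 0) ? 0.0 : BOUNDS[b - 1];
        long below = (b == 0) ? 0 : cumulative[b - 1];
        long inBucket = cumulative[b] - below; // > 0 because rank > below here
        // Linear interpolation inside [lower, BOUNDS[b]].
        return lower + (BOUNDS[b] - lower) * ((rank - below) / inBucket);
    }

    public static void main(String[] args) {
        BucketQuantileSketch h = new BucketQuantileSketch();
        for (int i = 0; i < 50; i++) h.record(0.03); // 30 ms
        for (int i = 0; i < 50; i++) h.record(0.2);  // 200 ms
        System.out.println("p50 ~ " + h.quantile(0.5) + " s");
        System.out.println("p75 ~ " + h.quantile(0.75) + " s");
    }
}
```

With 50 observations at 30 ms and 50 at 200 ms, `quantile(0.5)` lands on the 50 ms bound and `quantile(0.75)` interpolates to 175 ms inside the 100 ms to 250 ms bucket. The estimate can never be more precise than the bucket it falls in, which is one reason to choose the boundaries deliberately.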
```diff
@@ -144,8 +142,8 @@ A ready-to-use Grafana dashboard is available at
 It visualizes all of the metrics listed above, including reconciliation throughput, error rates, queue depth, active
 executions, resource counts, and execution duration histograms and heatmaps.
 
-The dashboard is designed to work with metrics exported via OpenTelemetry Collector to Prometheus, as set up by the
-observability sample (see below).
+The dashboard is designed to work with metrics scraped directly by Prometheus from the operator's `/metrics` endpoint,
+as set up by the observability sample (see below).
 
 #### Exploring metrics end-to-end
 
```
```diff
@@ -155,7 +153,7 @@ includes a full end-to-end test,
 [`MetricsHandlingE2E`](https://github.com/java-operator-sdk/java-operator-sdk/blob/main/sample-operators/metrics-processing/src/test/java/io/javaoperatorsdk/operator/sample/metrics/MetricsHandlingE2E.java),
 that:
 
-1. Installs a local observability stack (Prometheus, Grafana, OpenTelemetry Collector) via
+1. Installs a local observability stack (Prometheus, Grafana) via
 `observability/install-observability.sh`. That imports also the Grafana dashboards.
 2. Runs two reconcilers that produce both successful and failing reconciliations over a sustained period
 3. Verifies that the expected metrics appear in Prometheus
```
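Step 3 of that list depends on timing: a metric only appears in Prometheus after the operator has emitted it and Prometheus has scraped it, so a check has to poll rather than query once. The stdlib-only Java sketch below shows one way to structure that. `PrometheusPoll`, `queryUrl` and `awaitTrue` are hypothetical names, and this is not how `MetricsHandlingE2E` is actually implemented. It builds an instant-query URL for the Prometheus HTTP API (`/api/v1/query`) and re-evaluates a check until it passes or a timeout elapses.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.time.Duration;
import java.util.function.BooleanSupplier;

/** Hypothetical helpers for an "eventually the metric appears" style check. */
final class PrometheusPoll {

    /** Builds an instant-query URL, e.g. against a port-forwarded http://localhost:9090. */
    static String queryUrl(String prometheusBaseUrl, String promQl) {
        // URLEncoder form-encodes (spaces become '+'), which the Prometheus API accepts.
        return prometheusBaseUrl + "/api/v1/query?query="
            + URLEncoder.encode(promQl, StandardCharsets.UTF_8);
    }

    /**
     * Re-runs {@code check} every {@code pollInterval} until it returns true or
     * {@code timeout} has elapsed. Returns whether the check ever passed.
     */
    static boolean awaitTrue(BooleanSupplier check, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (check.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // timed out without the check passing
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }
}
```

In a real test the `check` would fetch something like `queryUrl("http://localhost:9090", "reconciliations_execution_duration_seconds_count")` (port 9090 as in the install script's `kubectl port-forward` hint) and inspect the JSON `data.result` array for a non-empty value. That metric name is inferred from the docs' `PrometheusMeterRegistry` note and may differ with other registries.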

observability/install-observability.sh

Lines changed: 7 additions & 156 deletions
```diff
@@ -25,7 +25,7 @@ NC='\033[0m' # No Color
 
 echo -e "${GREEN}========================================${NC}"
 echo -e "${GREEN}Installing Observability Stack${NC}"
-echo -e "${GREEN}OpenTelemetry + Prometheus + Grafana${NC}"
+echo -e "${GREEN}Prometheus + Grafana${NC}"
 echo -e "${GREEN}========================================${NC}"
 
 # Check if helm is installed, download locally if not
```
```diff
@@ -47,51 +47,15 @@ fi
 
 # Add Helm repositories
 echo -e "\n${YELLOW}Adding Helm repositories...${NC}"
-helm repo add jetstack https://charts.jetstack.io
-helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
 helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
 helm repo update
 echo -e "${GREEN}✓ Helm repositories added${NC}"
 
-echo -e "\n${GREEN}========================================${NC}"
-echo -e "${GREEN}Installing Components (Parallel)${NC}"
-echo -e "${GREEN}========================================${NC}"
-echo -e "The following will be installed:"
-echo -e " • cert-manager"
-echo -e " • OpenTelemetry Operator"
-echo -e " • Prometheus & Grafana"
-echo -e " • OpenTelemetry Collector"
-echo -e " • Service Monitors"
-echo -e "\n${YELLOW}All resources will be applied first, then we'll wait for them to become ready.${NC}\n"
-
-# Install cert-manager (required for OpenTelemetry Operator)
-echo -e "\n${YELLOW}Installing cert-manager...${NC}"
-helm upgrade --install cert-manager jetstack/cert-manager \
-  --namespace cert-manager \
-  --create-namespace \
-  --set crds.enabled=true
-echo -e "${GREEN}✓ cert-manager installation or upgrade started${NC}"
-
 # Create observability namespace
 echo -e "\n${YELLOW}Creating observability namespace...${NC}"
 kubectl create namespace observability --dry-run=client -o yaml | kubectl apply -f -
 echo -e "${GREEN}✓ observability namespace ready${NC}"
 
-# Install OpenTelemetry Operator
-echo -e "\n${YELLOW}Installing OpenTelemetry Operator...${NC}"
-
-if helm list -n observability | grep -q opentelemetry-operator; then
-  echo -e "${YELLOW}OpenTelemetry Operator already installed, upgrading...${NC}"
-  helm upgrade opentelemetry-operator open-telemetry/opentelemetry-operator \
-    --namespace observability \
-    --set "manager.collectorImage.repository=otel/opentelemetry-collector-contrib"
-else
-  helm install opentelemetry-operator open-telemetry/opentelemetry-operator \
-    --namespace observability \
-    --set "manager.collectorImage.repository=otel/opentelemetry-collector-contrib"
-fi
-echo -e "${GREEN}✓ OpenTelemetry Operator installation started${NC}"
-
 # Install kube-prometheus-stack (includes Prometheus + Grafana)
 echo -e "\n${YELLOW}Installing Prometheus and Grafana stack...${NC}"
 if helm list -n observability | grep -q kube-prometheus-stack; then
```
```diff
@@ -110,115 +74,12 @@ else
 fi
 echo -e "${GREEN}✓ Prometheus and Grafana installation started${NC}"
 
-# Create OpenTelemetry Collector instance
-echo -e "\n${YELLOW}Creating OpenTelemetry Collector...${NC}"
-cat <<EOF | kubectl apply -f -
-apiVersion: opentelemetry.io/v1beta1
-kind: OpenTelemetryCollector
-metadata:
-  name: otel-collector
-  namespace: observability
-spec:
-  mode: deployment
-  config:
-    receivers:
-      otlp:
-        protocols:
-          grpc:
-            endpoint: 0.0.0.0:4317
-          http:
-            endpoint: 0.0.0.0:4318
-      prometheus:
-        config:
-          scrape_configs:
-            - job_name: 'otel-collector'
-              scrape_interval: 10s
-              static_configs:
-                - targets: ['0.0.0.0:8888']
-
-    processors:
-      batch:
-        timeout: 10s
-      memory_limiter:
-        check_interval: 1s
-        limit_percentage: 75
-        spike_limit_percentage: 15
-
-    exporters:
-      prometheus:
-        endpoint: "0.0.0.0:8889"
-        namespace: ""
-        send_timestamps: true
-        metric_expiration: 5m
-        resource_to_telemetry_conversion:
-          enabled: true
-      debug:
-        verbosity: detailed
-        sampling_initial: 5
-        sampling_thereafter: 200
-
-    service:
-      pipelines:
-        metrics:
-          receivers: [otlp, prometheus]
-          processors: [memory_limiter, batch]
-          exporters: [prometheus, debug]
-        traces:
-          receivers: [otlp]
-          processors: [memory_limiter, batch]
-          exporters: [debug]
-EOF
-echo -e "${GREEN}✓ OpenTelemetry Collector created${NC}"
-
-# Create ServiceMonitor for OpenTelemetry Collector
-echo -e "\n${YELLOW}Creating ServiceMonitor for OpenTelemetry...${NC}"
-cat <<EOF | kubectl apply -f -
-apiVersion: v1
-kind: Service
-metadata:
-  name: otel-collector-prometheus
-  namespace: observability
-  labels:
-    app: otel-collector
-spec:
-  ports:
-    - name: prometheus
-      port: 8889
-      targetPort: 8889
-      protocol: TCP
-  selector:
-    app.kubernetes.io/name: otel-collector-collector
----
-apiVersion: monitoring.coreos.com/v1
-kind: ServiceMonitor
-metadata:
-  name: otel-collector
-  namespace: observability
-  labels:
-    app: otel-collector
-    release: kube-prometheus-stack
-spec:
-  jobLabel: app
-  selector:
-    matchLabels:
-      app: otel-collector
-  endpoints:
-    - port: prometheus
-      interval: 30s
-EOF
-echo -e "${GREEN}✓ ServiceMonitor created${NC}"
-
 # Wait for all pods to be ready
 echo -e "\n${GREEN}========================================${NC}"
 echo -e "${GREEN}All resources have been applied!${NC}"
 echo -e "${GREEN}========================================${NC}"
 echo -e "\n${YELLOW}Waiting for all pods to become ready (this may take 2-3 minutes)...${NC}"
 
-# Wait for cert-manager pods
-echo -e "${YELLOW}Checking cert-manager pods...${NC}"
-kubectl wait --for=condition=ready pod --all -n cert-manager --timeout=300s 2>/dev/null || echo -e "${YELLOW}cert-manager already running or skipped${NC}"
-
-# Wait for observability pods
 echo -e "${YELLOW}Checking observability pods...${NC}"
 kubectl wait --for=condition=ready pod --all -n observability --timeout=300s
 
```
```diff
@@ -276,18 +137,11 @@ echo -e "\n${YELLOW}Prometheus:${NC}"
 echo -e " Access with: ${GREEN}kubectl port-forward -n observability svc/kube-prometheus-stack-prometheus 9090:9090${NC}"
 echo -e " Then open: ${GREEN}http://localhost:9090${NC}"
 
-echo -e "\n${YELLOW}OpenTelemetry Collector:${NC}"
-echo -e " OTLP gRPC endpoint: ${GREEN}otel-collector-collector.observability.svc.cluster.local:4317${NC}"
-echo -e " OTLP HTTP endpoint: ${GREEN}otel-collector-collector.observability.svc.cluster.local:4318${NC}"
-echo -e " Prometheus metrics: ${GREEN}http://otel-collector-prometheus.observability.svc.cluster.local:8889/metrics${NC}"
-
-echo -e "\n${YELLOW}Configure your Java Operator to use OpenTelemetry:${NC}"
-echo -e " Add dependency: ${GREEN}io.javaoperatorsdk:operator-framework-opentelemetry-support${NC}"
-echo -e " Set environment variables:"
-echo -e " ${GREEN}OTEL_SERVICE_NAME=your-operator-name${NC}"
-echo -e " ${GREEN}OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector-collector.observability.svc.cluster.local:4318${NC}"
-echo -e " ${GREEN}OTEL_METRICS_EXPORTER=otlp${NC}"
-echo -e " ${GREEN}OTEL_TRACES_EXPORTER=otlp${NC}"
+echo -e "\n${YELLOW}Configure your Java Operator metrics:${NC}"
+echo -e " Add dependency: ${GREEN}io.javaoperatorsdk:micrometer-support${NC}"
+echo -e " Add dependency: ${GREEN}io.micrometer:micrometer-registry-prometheus${NC}"
+echo -e " Expose a ${GREEN}/metrics${NC} endpoint using PrometheusMeterRegistry"
+echo -e " Create a ServiceMonitor to let Prometheus scrape your operator"
 
 echo -e "\n${GREEN}========================================${NC}"
 echo -e "${GREEN}Grafana Dashboards${NC}"
```
```diff
@@ -303,10 +157,7 @@ echo -e "\n${YELLOW}Note:${NC} Dashboards may take 30-60 seconds to appear in Gr
 
 echo -e "\n${YELLOW}To uninstall:${NC}"
 echo -e " kubectl delete configmap -n observability jvm-metrics-dashboard josdk-operator-metrics-dashboard"
-echo -e " kubectl delete -n observability OpenTelemetryCollector otel-collector"
 echo -e " helm uninstall -n observability kube-prometheus-stack"
-echo -e " helm uninstall -n observability opentelemetry-operator"
-echo -e " helm uninstall -n cert-manager cert-manager"
-echo -e " kubectl delete namespace observability cert-manager"
+echo -e " kubectl delete namespace observability"
 
 echo -e "\n${GREEN}Done!${NC}"
```

sample-operators/metrics-processing/k8s/operator.yaml

Lines changed: 4 additions & 0 deletions
```diff
@@ -39,6 +39,10 @@ spec:
       - name: operator
         image: metrics-processing-operator
         imagePullPolicy: Never
+        ports:
+          - name: metrics
+            containerPort: 8080
+            protocol: TCP
 
 ---
 apiVersion: rbac.authorization.k8s.io/v1
```
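The install script now tells users to expose a `/metrics` endpoint backed by `PrometheusMeterRegistry`, and `operator.yaml` declares a container port named `metrics` on 8080. Below is a minimal, stdlib-only Java sketch of such an endpoint on the JDK's built-in HTTP server. `MetricsEndpoint` is a hypothetical class. The response body comes from a `Supplier<String>` so the sketch runs without Micrometer on the classpath; in an operator you would most likely pass `registry::scrape` from a `PrometheusMeterRegistry`, whose `scrape()` returns the text exposition output. How the metrics-processing sample actually wires its endpoint is not visible in these hunks.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;

/** Hypothetical minimal /metrics endpoint on the JDK's built-in HTTP server. */
final class MetricsEndpoint {

    private final HttpServer server;

    MetricsEndpoint(int port, Supplier<String> scrapeBody) throws IOException {
        server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/metrics", exchange -> {
            byte[] body = scrapeBody.get().getBytes(StandardCharsets.UTF_8);
            // Content type of the Prometheus text exposition format (version 0.0.4).
            exchange.getResponseHeaders()
                .set("Content-Type", "text/plain; version=0.0.4; charset=utf-8");
            // -1 signals "no body"; 0 would mean chunked transfer encoding.
            exchange.sendResponseHeaders(200, body.length == 0 ? -1 : body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
    }

    void start() {
        server.start();
    }

    void stop() {
        server.stop(0);
    }

    /** Actual bound port (useful when constructed with port 0). */
    int port() {
        return server.getAddress().getPort();
    }
}
```

Usage along the lines of `new MetricsEndpoint(8080, registry::scrape).start();` would line up with the `containerPort: 8080` above. Prometheus would still need a Service and a ServiceMonitor that target the named `metrics` port, as the install script's hint says.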

sample-operators/metrics-processing/pom.xml

Lines changed: 1 addition & 7 deletions
```diff
@@ -60,13 +60,7 @@
     </dependency>
     <dependency>
       <groupId>io.micrometer</groupId>
-      <artifactId>micrometer-registry-otlp</artifactId>
-      <version>${micrometer-core.version}</version>
-    </dependency>
-    <dependency>
-      <groupId>org.yaml</groupId>
-      <artifactId>snakeyaml</artifactId>
-      <version>2.6</version>
+      <artifactId>micrometer-registry-prometheus</artifactId>
     </dependency>
     <dependency>
       <groupId>org.apache.logging.log4j</groupId>
```
