Commit 374e732

committed
wip
Signed-off-by: Attila Mészáros <a_meszaros@apple.com>
1 parent c8a444b commit 374e732

9 files changed: +315 -1 lines changed

Lines changed: 104 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,104 @@
1+
---
2+
title: Health Probes
3+
weight: 75
4+
---
5+
6+
Operators running in Kubernetes should expose health probe endpoints so that the kubelet can detect startup
7+
failures and runtime degradation. JOSDK provides the building blocks through its
8+
[`RuntimeInfo`](https://github.com/java-operator-sdk/java-operator-sdk/blob/main/operator-framework-core/src/main/java/io/javaoperatorsdk/operator/RuntimeInfo.java)
9+
API.
10+
11+
## RuntimeInfo
12+
13+
`RuntimeInfo` is available via `operator.getRuntimeInfo()` and exposes:
14+
15+
| Method | Purpose |
16+
|---|---|
17+
| `isStarted()` | `true` once the operator and all its controllers have fully started |
18+
| `allEventSourcesAreHealthy()` | `true` when every registered event source (informers, polling sources, etc.) reports a healthy status |
19+
| `unhealthyEventSources()` | returns a map of controller name → unhealthy event sources, useful for diagnostics |
20+
21+
These map naturally to Kubernetes probes:
22+
23+
- **Startup probe** → `isStarted()`: fails until all informers have synced and the operator is ready to
24+
reconcile.
25+
- **Readiness probe** → `allEventSourcesAreHealthy()`: fails if an informer loses its watch connection
26+
or any event source reports an unhealthy status.
27+
28+
## Setting Up Probe Endpoints
29+
30+
The example below uses [Jetty](https://eclipse.dev/jetty/) to expose health probe endpoints. Any HTTP
31+
server library works — the key is calling the `RuntimeInfo` methods to determine the response code.
32+
33+
```java
34+
import org.eclipse.jetty.server.Server;
35+
import org.eclipse.jetty.server.handler.ContextHandler;
36+
import org.eclipse.jetty.server.handler.ContextHandlerCollection;
37+
38+
Operator operator = new Operator();
39+
operator.register(new MyReconciler());
40+
operator.start();
41+
42+
var startup = new ContextHandler(new StartupHandler(operator), "/startup");
43+
var readiness = new ContextHandler(new ReadinessHandler(operator), "/ready");
44+
Server server = new Server(8080);
45+
server.setHandler(new ContextHandlerCollection(startup, readiness));
46+
server.start();
47+
```
48+
49+
Where `StartupHandler` and `ReadinessHandler` extend `org.eclipse.jetty.server.Handler.Abstract` and
50+
check `operator.getRuntimeInfo().isStarted()` and
51+
`operator.getRuntimeInfo().allEventSourcesAreHealthy()` respectively.
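
Since any HTTP server library works, the same idea can be sketched without Jetty. The example below is a minimal, dependency-free sketch that uses the JDK's built-in `com.sun.net.httpserver.HttpServer`. The class name `ProbeServer`, its helper methods, and the use of `BooleanSupplier` as a stand-in for the `RuntimeInfo` checks are assumptions made for illustration; they are not part of JOSDK. It also answers with 503 on failure, where the sample operator uses 400; Kubernetes treats any status outside the 200 to 399 range as a failed probe, so either works.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.function.BooleanSupplier;

// Hypothetical helper, not part of JOSDK: serves /startup and /ready from two boolean checks.
public class ProbeServer {

  // Kubernetes counts any status >= 200 and < 400 as success; 503 signals "not yet".
  static int statusFor(boolean ok) {
    return ok ? 200 : 503;
  }

  // Writes a short plain-text body with the status matching the check result.
  static void respond(HttpExchange exchange, boolean ok, String okMsg, String failMsg)
      throws IOException {
    byte[] body = (ok ? okMsg : failMsg).getBytes(StandardCharsets.UTF_8);
    exchange.getResponseHeaders().set("Content-Type", "text/plain; charset=utf-8");
    exchange.sendResponseHeaders(statusFor(ok), body.length);
    try (OutputStream out = exchange.getResponseBody()) {
      out.write(body);
    }
  }

  // The suppliers are evaluated on every request, so the response tracks the current state.
  static HttpServer start(int port, BooleanSupplier started, BooleanSupplier healthy)
      throws IOException {
    HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
    server.createContext(
        "/startup", ex -> respond(ex, started.getAsBoolean(), "started", "not started yet"));
    server.createContext(
        "/ready", ex -> respond(ex, healthy.getAsBoolean(), "ready", "not ready"));
    server.start();
    return server;
  }

  public static void main(String[] args) throws IOException {
    // In an operator, the suppliers would wrap the RuntimeInfo calls, for example
    // () -> operator.getRuntimeInfo().isStarted(). Fixed values are used here instead.
    HttpServer server = start(0, () -> true, () -> false);
    System.out.println("probe server listening on port " + server.getAddress().getPort());
    server.stop(0);
  }
}
```

Passing suppliers rather than the `Operator` itself keeps the sketch runnable on its own and makes the handlers easy to exercise in a test with fixed values.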
52+
53+
See the
54+
[`operations` sample operator](https://github.com/java-operator-sdk/java-operator-sdk/tree/main/sample-operators/operations)
55+
for a complete working example.
56+
57+
## Kubernetes Deployment Configuration
58+
59+
Once your operator exposes probe endpoints, configure them in your Deployment manifest:
60+
61+
```yaml
62+
containers:
63+
- name: operator
64+
ports:
65+
- name: probes
66+
containerPort: 8080
67+
startupProbe:
68+
httpGet:
69+
path: /startup
70+
port: probes
71+
initialDelaySeconds: 1
72+
periodSeconds: 3
73+
failureThreshold: 20
74+
readinessProbe:
75+
httpGet:
76+
path: /ready
77+
port: probes
78+
initialDelaySeconds: 5
79+
periodSeconds: 5
80+
failureThreshold: 3
81+
```
82+
83+
The startup probe gives the operator time to start (roughly 60 s with the settings above: `failureThreshold` 20 × `periodSeconds` 3 s). Once the startup
84+
probe succeeds, the readiness probe takes over and will mark the pod as not-ready if any event source
85+
becomes unhealthy.
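
The startup budget implied by the probe settings can be estimated with simple arithmetic. The sketch below is only an approximation: it ignores `timeoutSeconds` and kubelet scheduling jitter, and the class and method names are hypothetical.

```java
// Hypothetical helper for estimating how long a container may take to start
// before the kubelet restarts it, given startup probe settings.
public class ProbeBudget {

  // Approximate upper bound: the initial delay plus one period per allowed failure.
  static int maxStartupSeconds(int initialDelaySeconds, int periodSeconds, int failureThreshold) {
    return initialDelaySeconds + periodSeconds * failureThreshold;
  }

  public static void main(String[] args) {
    // Values from the manifest above: initialDelaySeconds=1, periodSeconds=3, failureThreshold=20.
    System.out.println(maxStartupSeconds(1, 3, 20)); // prints 61
  }
}
```

If the operator watches many resources and informer sync can take longer than this, raising `failureThreshold` is usually gentler than raising `periodSeconds`, since a short period still lets the pod be marked started soon after it is ready.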
86+
87+
## Helm Chart Support
88+
89+
The [generic Helm chart](/docs/documentation/operations/helm-chart) supports health probes out of the box.
90+
Enable them in your `values.yaml`:
91+
92+
```yaml
93+
probes:
94+
port: 8080
95+
startup:
96+
enabled: true
97+
path: /startup
98+
readiness:
99+
enabled: true
100+
path: /ready
101+
```
102+
103+
All probe timing parameters (`initialDelaySeconds`, `periodSeconds`, `failureThreshold`) have sensible
104+
defaults and can be overridden.

helm/generic-helm-chart/templates/deployment.yaml

Lines changed: 24 additions & 0 deletions
@@ -54,6 +54,30 @@ spec:
5454
{{- toYaml .Values.securityContext | nindent 12 }}
5555
image: "{{ required "A valid .Values.image.repository is required" .Values.image.repository }}:{{ include "generic-operator.imageTag" . }}"
5656
imagePullPolicy: {{ .Values.image.pullPolicy }}
57+
{{- if or .Values.probes.startup.enabled .Values.probes.readiness.enabled }}
58+
ports:
59+
- name: probes
60+
containerPort: {{ .Values.probes.port }}
61+
protocol: TCP
62+
{{- end }}
63+
{{- if .Values.probes.startup.enabled }}
64+
startupProbe:
65+
httpGet:
66+
path: {{ .Values.probes.startup.path }}
67+
port: probes
68+
initialDelaySeconds: {{ .Values.probes.startup.initialDelaySeconds }}
69+
periodSeconds: {{ .Values.probes.startup.periodSeconds }}
70+
failureThreshold: {{ .Values.probes.startup.failureThreshold }}
71+
{{- end }}
72+
{{- if .Values.probes.readiness.enabled }}
73+
readinessProbe:
74+
httpGet:
75+
path: {{ .Values.probes.readiness.path }}
76+
port: probes
77+
initialDelaySeconds: {{ .Values.probes.readiness.initialDelaySeconds }}
78+
periodSeconds: {{ .Values.probes.readiness.periodSeconds }}
79+
failureThreshold: {{ .Values.probes.readiness.failureThreshold }}
80+
{{- end }}
5781
env:
5882
- name: OPERATOR_NAMESPACE
5983
valueFrom:

helm/generic-helm-chart/tests/deployment_test.yaml

Lines changed: 54 additions & 0 deletions
@@ -288,3 +288,57 @@ tests:
288288
- equal:
289289
path: spec.template.spec.serviceAccountName
290290
value: my-operator
291+
292+
- it: should not include probes by default
293+
asserts:
294+
- isNull:
295+
path: spec.template.spec.containers[0].startupProbe
296+
- isNull:
297+
path: spec.template.spec.containers[0].readinessProbe
298+
299+
- it: should add startup probe when enabled
300+
documentSelector:
301+
path: kind
302+
value: Deployment
303+
set:
304+
probes.startup.enabled: true
305+
asserts:
306+
- equal:
307+
path: spec.template.spec.containers[0].startupProbe.httpGet.path
308+
value: /startup
309+
- equal:
310+
path: spec.template.spec.containers[0].startupProbe.httpGet.port
311+
value: probes
312+
- contains:
313+
path: spec.template.spec.containers[0].ports
314+
content:
315+
name: probes
316+
containerPort: 8080
317+
protocol: TCP
318+
319+
- it: should add readiness probe when enabled
320+
documentSelector:
321+
path: kind
322+
value: Deployment
323+
set:
324+
probes.readiness.enabled: true
325+
asserts:
326+
- equal:
327+
path: spec.template.spec.containers[0].readinessProbe.httpGet.path
328+
value: /ready
329+
- equal:
330+
path: spec.template.spec.containers[0].readinessProbe.httpGet.port
331+
value: probes
332+
333+
- it: should add both probes when both enabled
334+
documentSelector:
335+
path: kind
336+
value: Deployment
337+
set:
338+
probes.startup.enabled: true
339+
probes.readiness.enabled: true
340+
asserts:
341+
- isNotNull:
342+
path: spec.template.spec.containers[0].startupProbe
343+
- isNotNull:
344+
path: spec.template.spec.containers[0].readinessProbe

helm/generic-helm-chart/values.yaml

Lines changed: 16 additions & 0 deletions
@@ -128,3 +128,19 @@ extraVolumeMounts: []
128128
# RBAC configuration
129129
rbac:
130130
create: true
131+
132+
# Health probes configuration
133+
probes:
134+
port: 8080
135+
startup:
136+
enabled: false
137+
path: /startup
138+
initialDelaySeconds: 1
139+
periodSeconds: 3
140+
failureThreshold: 20
141+
readiness:
142+
enabled: false
143+
path: /ready
144+
initialDelaySeconds: 5
145+
periodSeconds: 5
146+
failureThreshold: 3

sample-operators/operations/pom.xml

Lines changed: 5 additions & 0 deletions
@@ -82,6 +82,11 @@
8282
<artifactId>awaitility</artifactId>
8383
<scope>compile</scope>
8484
</dependency>
85+
<dependency>
86+
<groupId>org.eclipse.jetty</groupId>
87+
<artifactId>jetty-server</artifactId>
88+
<version>12.1.0</version>
89+
</dependency>
8590
<dependency>
8691
<groupId>io.javaoperatorsdk</groupId>
8792
<artifactId>operator-framework-junit</artifactId>

sample-operators/operations/src/main/java/io/javaoperatorsdk/operator/sample/metrics/MetricsHandlingSampleOperator.java

Lines changed: 11 additions & 1 deletion
@@ -23,6 +23,9 @@
2323
import java.util.HashMap;
2424
import java.util.Map;
2525

26+
import org.eclipse.jetty.server.Server;
27+
import org.eclipse.jetty.server.handler.ContextHandler;
28+
import org.eclipse.jetty.server.handler.ContextHandlerCollection;
2629
import org.jspecify.annotations.NonNull;
2730
import org.jspecify.annotations.Nullable;
2831
import org.slf4j.Logger;
@@ -59,7 +62,7 @@ public class MetricsHandlingSampleOperator {
5962
* Based on env variables a different flavor of Reconciler is used, showcasing how the same logic
6063
* can be implemented using the low level and higher level APIs.
6164
*/
62-
public static void main(String[] args) {
65+
public static void main(String[] args) throws Exception {
6366
log.info("Metrics Handling Sample Operator starting!");
6467

6568
var configProviders = new ArrayList<ConfigProvider>();
@@ -77,6 +80,13 @@ public static void main(String[] args) {
7780
new MetricsHandlingReconciler2(),
7881
configLoader.applyControllerConfigs(MetricsHandlingReconciler2.NAME));
7982
operator.start();
83+
84+
var startup = new ContextHandler(new StartupHandler(operator), "/startup");
85+
var readiness = new ContextHandler(new ReadinessHandler(operator), "/ready");
86+
Server server = new Server(8080);
87+
server.setHandler(new ContextHandlerCollection(startup, readiness));
88+
server.start();
89+
log.info("Health probe server started on port 8080");
8090
}
8191

8292
public static @NonNull Metrics initOTLPMetrics(boolean localRun) {
Lines changed: 44 additions & 0 deletions
@@ -0,0 +1,44 @@
1+
/*
2+
* Copyright Java Operator SDK Authors
3+
*
4+
* Licensed under the Apache License, Version 2.0 (the "License");
5+
* you may not use this file except in compliance with the License.
6+
* You may obtain a copy of the License at
7+
*
8+
* http://www.apache.org/licenses/LICENSE-2.0
9+
*
10+
* Unless required by applicable law or agreed to in writing, software
11+
* distributed under the License is distributed on an "AS IS" BASIS,
12+
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13+
* See the License for the specific language governing permissions and
14+
* limitations under the License.
15+
*/
16+
package io.javaoperatorsdk.operator.sample.metrics;
17+
18+
import org.eclipse.jetty.server.Handler;
19+
import org.eclipse.jetty.server.Request;
20+
import org.eclipse.jetty.server.Response;
21+
import org.eclipse.jetty.util.Callback;
22+
23+
import io.javaoperatorsdk.operator.Operator;
24+
25+
import static io.javaoperatorsdk.operator.sample.metrics.StartupHandler.sendMessage;
26+
27+
public class ReadinessHandler extends Handler.Abstract {
28+
29+
private final Operator operator;
30+
31+
public ReadinessHandler(Operator operator) {
32+
this.operator = operator;
33+
}
34+
35+
@Override
36+
public boolean handle(Request request, Response response, Callback callback) {
37+
if (operator.getRuntimeInfo().allEventSourcesAreHealthy()) {
38+
sendMessage(response, 200, "ready", callback);
39+
} else {
40+
sendMessage(response, 400, "not ready: an event source is not healthy", callback);
41+
}
42+
return true;
43+
}
44+
}
Lines changed: 51 additions & 0 deletions
@@ -0,0 +1,51 @@
1+
/*
2+
* Copyright Java Operator SDK Authors
3+
*
4+
* Licensed under the Apache License, Version 2.0 (the "License");
5+
* you may not use this file except in compliance with the License.
6+
* You may obtain a copy of the License at
7+
*
8+
* http://www.apache.org/licenses/LICENSE-2.0
9+
*
10+
* Unless required by applicable law or agreed to in writing, software
11+
* distributed under the License is distributed on an "AS IS" BASIS,
12+
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13+
* See the License for the specific language governing permissions and
14+
* limitations under the License.
15+
*/
16+
package io.javaoperatorsdk.operator.sample.metrics;
17+
18+
import java.nio.ByteBuffer;
19+
import java.nio.charset.StandardCharsets;
20+
21+
import org.eclipse.jetty.server.Handler;
22+
import org.eclipse.jetty.server.Request;
23+
import org.eclipse.jetty.server.Response;
24+
import org.eclipse.jetty.util.Callback;
25+
26+
import io.javaoperatorsdk.operator.Operator;
27+
28+
public class StartupHandler extends Handler.Abstract {
29+
30+
private final Operator operator;
31+
32+
public StartupHandler(Operator operator) {
33+
this.operator = operator;
34+
}
35+
36+
@Override
37+
public boolean handle(Request request, Response response, Callback callback) {
38+
if (operator.getRuntimeInfo().isStarted()) {
39+
sendMessage(response, 200, "started", callback);
40+
} else {
41+
sendMessage(response, 400, "not started yet", callback);
42+
}
43+
return true;
44+
}
45+
46+
static void sendMessage(Response response, int code, String message, Callback callback) {
47+
response.setStatus(code);
48+
response.getHeaders().put("Content-Type", "text/plain; charset=utf-8");
49+
response.write(true, ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8)), callback);
50+
}
51+
}

sample-operators/operations/src/test/resources/helm-values.yaml

Lines changed: 6 additions & 0 deletions
@@ -33,3 +33,9 @@ primaryResources:
3333
- metricshandlingcustomresource1s
3434
- metricshandlingcustomresource2s
3535

36+
probes:
37+
startup:
38+
enabled: true
39+
readiness:
40+
enabled: true
41+
