Commit 8d36842

docs(operator): add network tuning and idle connection timeout docs (kroxylicious#3533)
1 parent 5f85337 commit 8d36842

11 files changed

Lines changed: 205 additions & 49 deletions
Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
1+
:_mod-docs-content-type: ASSEMBLY
2+
3+
// file included in the following:
4+
//
5+
// kroxylicious-operator/index.adoc
6+
7+
8+
[id='assembly-operator-advanced-proxy-tuning-{context}']
9+
= Advanced proxy tuning
10+
11+
[role="_abstract"]
12+
Configure advanced network and connection settings for the proxy.
13+
14+
include::../_modules/configuring/con-kafkaproxy-network-settings.adoc[leveloffset=+1]
15+
16+
include::../_modules/configuring/con-kafkaproxy-idle-timeouts.adoc[leveloffset=+1]

kroxylicious-docs/docs/_modules/configuring/con-configuring-idle-timeouts.adoc

Lines changed: 11 additions & 40 deletions
@@ -7,20 +7,19 @@
77
The proxy can automatically disconnect idle client connections to reclaim resources.
88
Idle timeout configuration is completely optional and disabled by default, allowing you to opt in only when needed for your deployment.
99

10-
== Two-stage timeout mechanism
10+
== When to enable idle timeouts
11+
12+
include::../../_snippets/snip-idle-timeout-when-to-enable.adoc[]
13+
14+
== When not to enable idle timeouts
1115

12-
The proxy supports two independent idle timeout settings that apply at different stages of the connection lifecycle:
16+
include::../../_snippets/snip-idle-timeout-when-not-to-enable.adoc[]
1317

14-
* **Unauthenticated timeout** (`unauthenticatedIdleTimeout`) - Applies to connections where the proxy cannot detect the completion of authentication. The proxy considers authentication to be complete if either of the following hold true:
15-
1. A transport subject builder creates a subject with an identity. Usually this would be from a client TLS certificate.
16-
2. A SASL inspection or termination filter has invoked `io.kroxylicious.proxy.filter.FilterContext.clientSaslAuthenticationSuccess` method.
17-
* **Authenticated timeout** (`authenticatedIdleTimeout`) - Applies to connections where an identity can be established. This timeout applies for the remainder of the connection's lifetime.
18+
== How idle timeouts work
1819

19-
Both timeout settings are optional and have no default values.
20-
You can configure one, both, or neither depending on your requirements.
21-
Timeout values use Go-style duration format (for example, `30s` for 30 seconds, `5m` for 5 minutes, `1h` for 1 hour).
22-
Supported units are: `d` (days), `h` (hours), `m` (minutes), `s` (seconds), `ms` (milliseconds), `μs` or `us` (microseconds), and `ns` (nanoseconds).
23-
Units can be combined, such as `1h30m` or `90s`.
20+
include::../../_snippets/snip-idle-timeout-how-it-works.adoc[]
21+
22+
include::../../_snippets/snip-idle-timeout-duration-format.adoc[]
2423

2524
== Configuration examples
2625

@@ -59,34 +58,6 @@ virtualClusters:
5958
<1> Disconnect unauthenticated connections after 30 seconds of inactivity.
6059
<2> Disconnect authenticated connections after 10 minutes of inactivity.
6160

62-
== When to enable idle timeouts
63-
64-
Consider enabling idle timeouts in the following scenarios:
65-
66-
* **Misbehaving clients** - Clients that abandon connections without properly closing them, leaving resources allocated unnecessarily.
67-
* **High-scale deployments** - Environments with many clients where connection resources (memory, file descriptors) are constrained.
68-
* **Connection exhaustion prevention** - Deployments approaching operating system or network limits on concurrent connections.
69-
* **Network infrastructure requirements** - Environments where network infrastructure (firewalls, load balancers) drops idle connections, and you want the proxy to disconnect gracefully first.
70-
* **Different security postures** - Scenarios where unauthenticated connections require stricter timeouts than authenticated connections for security reasons.
71-
72-
== When not to enable idle timeouts
73-
74-
Avoid enabling idle timeouts in the following scenarios:
75-
76-
* **Legitimate idle connections** - Applications where clients maintain long-lived connections with extended idle periods, such as consumers with long poll timeouts or applications using connection pooling.
77-
* **Stable network infrastructure** - Environments with reliable network infrastructure and no issues with idle connection management.
78-
* **Minimal overhead desired** - Deployments where the proxy's monitoring overhead should be kept to an absolute minimum.
79-
* **No resource constraints** - Systems with ample connection resources and no risk of connection exhaustion.
80-
8161
== Monitoring idle disconnects
8262

83-
The proxy tracks idle disconnects using the `kroxylicious_client_to_proxy_disconnects_total` metric with `cause="idle_timeout"`.
84-
This counter is incremented each time a connection is closed due to exceeding the configured idle timeout.
85-
86-
The `kroxylicious_client_to_proxy_disconnects_total` metric also tracks other disconnect scenarios:
87-
88-
* `cause="idle_timeout"` - Connection exceeded the configured idle timeout duration
89-
* `cause="client_closed"` - The downstream client initiated the connection close
90-
* `cause="server_closed"` - The upstream node closed the connection, causing the proxy to close the client connection
91-
92-
For more information about connection metrics, see xref:con-prometheus-metrics-proxy-{context}[Overview of proxy metrics].
63+
include::../../_snippets/snip-idle-timeout-monitoring.adoc[]

kroxylicious-docs/docs/_modules/configuring/con-configuring-network-settings.adoc

Lines changed: 19 additions & 9 deletions
@@ -13,19 +13,29 @@ These settings control low-level Netty behavior and are optional, with sensible
1313
network:
1414
proxy: # <1>
1515
workerThreadCount: 8 # <2>
16-
shutdownQuietPeriodSeconds: 10 # <3>
17-
management: # <4>
16+
shutdownQuietPeriod: 5s # <3>
17+
shutdownTimeout: 20s # <4>
18+
management: # <5>
1819
workerThreadCount: 2
19-
shutdownQuietPeriodSeconds: 5
20+
shutdownQuietPeriod: 5s
21+
shutdownTimeout: 20s
2022
virtualClusters:
2123
# ...
2224
----
23-
<1> Network settings for the proxy endpoints that handle client connections.
25+
<1> Network settings for the proxy listener that handles client connections.
2426
<2> Optional: Number of Netty worker threads for handling concurrent connections. Defaults to twice the number of available processors.
25-
<3> Optional: Grace period in seconds during which the proxy continues processing existing connections before shutting down. Defaults to 0.
26-
<4> Network settings for the management HTTP endpoints. Can be configured independently from proxy settings.
27+
<3> Optional: Grace period during which the proxy continues to accept and complete in-flight requests before shutting down. If no new requests arrive during this window, shutdown proceeds. Defaults to `2s` if not specified. Uses Go-style duration format (for example, `30s`, `5m`).
28+
<4> Optional: Maximum time allowed for the proxy to complete shutdown. If shutdown does not complete within this period, it is forced. Defaults to `15s` if not specified. Uses Go-style duration format.
29+
<5> Network settings for the management HTTP server. Can be configured independently from the proxy listener settings.
30+
31+
Supported duration units are: `h` (hours), `m` (minutes), `s` (seconds), `ms` (milliseconds), `μs` or `us` (microseconds), and `ns` (nanoseconds).
32+
Units can be combined, for example `1m30s`.
2733

28-
* All network settings are optional. The proxy will use sensible defaults if not specified.
2934
* The `workerThreadCount` setting allows tuning for high-concurrency deployments. Increasing this value can improve throughput when handling many simultaneous client connections.
30-
* The `shutdownQuietPeriodSeconds` setting provides a graceful shutdown window, allowing in-flight requests to complete before the proxy terminates.
31-
* Proxy and management endpoints can have different thread pool sizes and shutdown behaviors based on their different workload characteristics.
35+
* The `shutdownQuietPeriod` and `shutdownTimeout` settings together control graceful shutdown behavior. Set `shutdownQuietPeriod` to a value less than or equal to `shutdownTimeout`.
36+
37+
[NOTE]
38+
====
39+
The `shutdownQuietPeriodSeconds` property (an integer number of seconds) is deprecated and will be removed in a future release.
40+
Use `shutdownQuietPeriod` with a Go-style duration string instead (for example, `shutdownQuietPeriod: 2s`).
41+
====
Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
1+
:_mod-docs-content-type: CONCEPT
2+
3+
// file included in the following:
4+
//
5+
// kroxylicious-operator/_assemblies/assembly-operator-deploy-a-proxy.adoc
6+
7+
[id='con-kafkaproxy-idle-timeouts-{context}']
8+
= Configuring idle connection timeouts
9+
10+
[role="_abstract"]
11+
The proxy can automatically disconnect idle client connections to reclaim resources.
12+
Idle timeout configuration is optional and disabled by default. Enable it only when required for your deployment.
13+
Idle timeouts are configured under `spec.network.proxy` in the `KafkaProxy` resource and apply only to the proxy listener; they are not available for the management HTTP server.
14+
When enabled, idle disconnects are observable via the `kroxylicious_client_to_proxy_disconnects_total` metric.
15+
16+
== When to enable idle timeouts
17+
18+
include::../../_snippets/snip-idle-timeout-when-to-enable.adoc[]
19+
20+
== When not to enable idle timeouts
21+
22+
include::../../_snippets/snip-idle-timeout-when-not-to-enable.adoc[]
23+
24+
== How idle timeouts work
25+
26+
include::../../_snippets/snip-idle-timeout-how-it-works.adoc[]
27+
28+
include::../../_snippets/snip-idle-timeout-duration-format.adoc[]
29+
30+
== Configuration examples
31+
32+
.Example: Unauthenticated timeout only
33+
[source,yaml]
34+
----
35+
kind: KafkaProxy
36+
apiVersion: kroxylicious.io/v1alpha1
37+
metadata:
38+
namespace: my-proxy
39+
name: simple
40+
spec:
41+
network:
42+
proxy:
43+
unauthenticatedIdleTimeout: 30s # <1>
44+
----
45+
<1> Disconnect unauthenticated connections after 30 seconds of inactivity.
46+
47+
.Example: Both timeouts configured
48+
[source,yaml]
49+
----
50+
kind: KafkaProxy
51+
apiVersion: kroxylicious.io/v1alpha1
52+
metadata:
53+
namespace: my-proxy
54+
name: simple
55+
spec:
56+
network:
57+
proxy:
58+
unauthenticatedIdleTimeout: 30s # <1>
59+
authenticatedIdleTimeout: 10m # <2>
60+
----
61+
<1> Disconnect unauthenticated connections after 30 seconds of inactivity.
62+
<2> Disconnect authenticated connections after 10 minutes of inactivity.
63+
64+
== Monitoring idle disconnects
65+
66+
include::../../_snippets/snip-idle-timeout-monitoring.adoc[]
Lines changed: 43 additions & 0 deletions
@@ -0,0 +1,43 @@
1+
:_mod-docs-content-type: CONCEPT
2+
3+
// file included in the following:
4+
//
5+
// kroxylicious-operator/_assemblies/assembly-operator-deploy-a-proxy.adoc
6+
7+
[id='con-kafkaproxy-network-settings-{context}']
8+
= Advanced network tuning for a proxy
9+
10+
[role="_abstract"]
11+
The `spec.network` section of a `KafkaProxy` resource provides low-level tuning options for the proxy listener and management HTTP server.
12+
The defaults are suitable for most deployments, so configure these settings only if you have a specific operational reason to do so.
13+
14+
.Example `KafkaProxy` with network tuning applied
15+
[source,yaml]
16+
----
17+
kind: KafkaProxy
18+
apiVersion: kroxylicious.io/v1alpha1
19+
metadata:
20+
namespace: my-proxy
21+
name: simple
22+
spec:
23+
network:
24+
proxy:
25+
workerThreadCount: 8
26+
shutdownQuietPeriod: 5s
27+
shutdownTimeout: 20s
28+
management:
29+
workerThreadCount: 2
30+
shutdownQuietPeriod: 5s
31+
shutdownTimeout: 20s
32+
----
33+
where:
34+
35+
* `spec.network.proxy` configures network tuning for the proxy listener that handles Kafka client connections. All fields are optional.
36+
* `workerThreadCount` is the number of threads available to process requests across client connections. Each connection is pinned to one thread, but a single thread can serve many connections. More threads increase parallelism but also CPU consumption. Tune this in conjunction with the pod's CPU limits and validate under realistic load. Defaults to twice the number of available processors.
37+
* `shutdownQuietPeriod` is the grace period during which the proxy continues to accept and complete in-flight requests before shutting down. If no new requests arrive during this window, shutdown proceeds. Defaults to `2s` if not specified.
38+
* `shutdownTimeout` is the maximum time allowed for the proxy to complete shutdown, including the quiet period. If shutdown does not complete within this period, it is forced. Defaults to `15s` if not specified. Set this to a value less than the pod's `terminationGracePeriodSeconds` (Kubernetes default: `30s`) to ensure the proxy can finish gracefully before Kubernetes forcibly terminates the pod.
39+
* `spec.network.management` configures network tuning for the management HTTP server that serves metrics and health endpoints. Supports the same settings as the proxy listener, and can be configured independently.
40+
41+
Duration values use a string-based Go-style duration format (for example, `30s`, `5m`).
42+
Supported units are: `m` (minutes), `s` (seconds), `ms` (milliseconds), `μs` or `us` (microseconds), and `ns` (nanoseconds).
43+
Units can be combined, for example `1m30s`.
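The following Go sketch illustrates the threading model described for `workerThreadCount`: a default of twice the available processors, with each connection pinned to a single worker while one worker serves many connections. The round-robin assignment is an assumption made for illustration (Netty's default event-loop chooser cycles through its event loops in this way), and the sketch is not the proxy's code. `runtime.NumCPU` stands in for the JVM's available-processor count, which inside a pod may reflect the container's CPU limit rather than the node's processors.

```go
package main

import (
	"fmt"
	"runtime"
)

// defaultWorkerThreads mirrors the documented default:
// twice the number of available processors.
func defaultWorkerThreads(processors int) int {
	return 2 * processors
}

// assignRoundRobin is a hypothetical model that pins each new connection to
// one worker, cycling through the workers in order. Entry i is the index of
// the worker serving connection i; a worker can serve many connections.
func assignRoundRobin(connections, workers int) []int {
	pinned := make([]int, connections)
	for i := range pinned {
		pinned[i] = i % workers
	}
	return pinned
}

func main() {
	fmt.Println("default workerThreadCount on this machine:", defaultWorkerThreads(runtime.NumCPU()))
	// Five connections shared by the two management workers from the example.
	fmt.Println("worker per connection:", assignRoundRobin(5, 2))
}
```

Because each connection stays on its worker, adding threads increases parallelism only when there are enough concurrent connections to spread across them.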
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
1+
:_mod-docs-content-type: SNIPPET
2+
3+
Both timeout settings are optional and have no default values.
4+
You can configure one, both, or neither depending on your requirements.
5+
Timeout values use a string-based duration format, following Go conventions (for example, `30s`, `5m`).
6+
Supported units are: `h` (hours), `m` (minutes), `s` (seconds), `ms` (milliseconds), `μs` or `us` (microseconds), and `ns` (nanoseconds).
7+
Units can be combined, for example `1m30s`.
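Because the format follows Go conventions, Go's standard `time.ParseDuration` function is a convenient way to check what a value means before putting it into a configuration. This sketch assumes the proxy interprets values the same way Go does; it is a checking aid, not the proxy's parser, and `seconds` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"time"
)

// seconds parses a Go-style duration string and returns its length in seconds.
func seconds(s string) (float64, error) {
	d, err := time.ParseDuration(s)
	if err != nil {
		return 0, err
	}
	return d.Seconds(), nil
}

func main() {
	// "1d" is included to show that Go has no day unit.
	for _, s := range []string{"30s", "5m", "1m30s", "90s", "1500ms", "1d"} {
		if v, err := seconds(s); err != nil {
			fmt.Printf("%s: invalid (%v)\n", s, err)
		} else {
			fmt.Printf("%s = %g seconds\n", s, v)
		}
	}
}
```

For example, `1m30s` and `90s` describe the same 90-second timeout, while Go rejects `1d` because it has no day unit.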
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
1+
:_mod-docs-content-type: SNIPPET
2+
3+
The proxy supports two independent timeout settings that apply at different stages of the connection lifecycle:
4+
5+
* *Unauthenticated timeout* (`unauthenticatedIdleTimeout`): applies to connections where the proxy has not yet detected completed authentication. The proxy considers authentication complete if either of the following is true:
6+
** A transport subject builder (a component that extracts an authenticated identity from transport-layer attributes) creates a subject with an identity (for example, from a client TLS certificate).
7+
** A SASL inspection or termination filter invokes `io.kroxylicious.proxy.filter.FilterContext.clientSaslAuthenticationSuccess`.
8+
* *Authenticated timeout* (`authenticatedIdleTimeout`): applies to connections where an identity has been established, for the remainder of the connection's lifetime.
9+
10+
[NOTE]
11+
====
12+
For the proxy to detect authentication completion, you must configure either TLS client certificate authentication or a SASL inspection or termination filter.
13+
Without one of these, all connections remain in the unauthenticated state for their entire lifetime, and `authenticatedIdleTimeout` has no effect.
14+
For more information, see the link:{SASLInspectionGuideUrl}[SASL inspection filter guide].
15+
====
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
1+
:_mod-docs-content-type: SNIPPET
2+
3+
The proxy tracks idle disconnects using the `kroxylicious_client_to_proxy_disconnects_total` metric with `cause="idle_timeout"`.
4+
This counter increments each time a connection is closed after exceeding the configured idle timeout.
5+
6+
The `kroxylicious_client_to_proxy_disconnects_total` metric also tracks other disconnect scenarios:
7+
8+
* `cause="idle_timeout"` - Connection exceeded the configured idle timeout duration
9+
* `cause="client_closed"` - The downstream client initiated the connection close
10+
* `cause="server_closed"` - The upstream node closed the connection, causing the proxy to close the client connection
11+
12+
For more information about connection metrics, see xref:con-prometheus-metrics-proxy-{context}[Overview of proxy metrics].
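To inspect the counter without a Prometheus server, for example while testing a configuration, the following Go sketch sums `kroxylicious_client_to_proxy_disconnects_total` by its `cause` label from text in the Prometheus exposition format. The sample series and the `virtual_cluster` label are invented for illustration, and the naive label parsing assumes label values contain no commas or escaped quotes.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

const disconnectMetric = "kroxylicious_client_to_proxy_disconnects_total"

// disconnectsByCause sums the disconnect counter for each cause label found
// in Prometheus text-format output. Other labels on a series are ignored.
func disconnectsByCause(exposition string) map[string]float64 {
	totals := map[string]float64{}
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if !strings.HasPrefix(line, disconnectMetric+"{") {
			continue // skips # HELP / # TYPE comments and other metrics
		}
		closing := strings.LastIndex(line, "}")
		if closing < 0 {
			continue
		}
		labels := line[len(disconnectMetric)+1 : closing]
		fields := strings.Fields(line[closing+1:]) // value, then an optional timestamp
		if len(fields) == 0 {
			continue
		}
		value, err := strconv.ParseFloat(fields[0], 64)
		if err != nil {
			continue
		}
		for _, pair := range strings.Split(labels, ",") {
			if k, v, ok := strings.Cut(pair, "="); ok && strings.TrimSpace(k) == "cause" {
				totals[strings.Trim(v, `"`)] += value
			}
		}
	}
	return totals
}

func main() {
	sample := `# TYPE kroxylicious_client_to_proxy_disconnects_total counter
kroxylicious_client_to_proxy_disconnects_total{cause="idle_timeout",virtual_cluster="demo"} 3
kroxylicious_client_to_proxy_disconnects_total{cause="client_closed",virtual_cluster="demo"} 7
`
	fmt.Println(disconnectsByCause(sample))
}
```

Against a Prometheus server, a comparable query is `sum by (cause) (rate(kroxylicious_client_to_proxy_disconnects_total[5m]))`, which shows how quickly disconnects of each cause are occurring.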
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
1+
:_mod-docs-content-type: SNIPPET
2+
3+
Avoid enabling idle timeouts in the following scenarios:
4+
5+
* *Legitimate idle connections*: applications that maintain long-lived connections with extended idle periods, such as consumers with long poll timeouts or applications using connection pooling.
6+
* *Stable network infrastructure*: environments with reliable network infrastructure and no issues with idle connection management.
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
1+
:_mod-docs-content-type: SNIPPET
2+
3+
Consider enabling idle timeouts in the following scenarios:
4+
5+
* *Security posture*: unauthenticated connections can be closed quickly to limit the window for abuse, while authenticated connections get a more generous timeout.
6+
* *Unclosed connections*: clients that abandon connections without properly closing them, leaving resources allocated unnecessarily.
7+
* *Network infrastructure requirements*: environments where firewalls or load balancers drop idle connections, and you want the proxy to disconnect gracefully before they do.
