feat: Add Valkey Sentinel & HAProxy support in High Availability setup. by khtee · Pull Request #137 · valkey-io/valkey-helm

khtee · 2026-02-06T08:56:32Z

Feat #22

dmaes · 2026-02-11T13:03:11Z

Nice work!

I have some notes/questions:

You'll want to disable the default statefulset when sentinel is enabled.
Do you plan on adding a HAProxy deployment, so clients don't have to be aware of sentinel? ( See for example https://arystech.com/blog/setting-up-haproxy-with-redis-sentinel-for-high-availability-on-microk8s-kubernetes )

khtee · 2026-02-11T13:07:44Z

Good points! Will work on both improvements.

yoannrt · 2026-02-11T15:17:39Z

Good job,

I have some questions too,
I assume that to discover which node is the master, you have to ask Sentinel with SENTINEL GET-MASTER-ADDR-BY-NAME mymaster?
(Currently, the replica service is pinned to pod-0 in the STS.)

Also, would that make sense to run sentinel as a side container in the replica STS pods ?
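
For readers following along, the Sentinel lookup mentioned above can be sketched in shell. This is only an illustration: `parse_master_addr` is a hypothetical helper name, and it assumes the CLI prints the reply as two lines (IP, then port), which is how valkey-cli prints an array reply when its output is piped.

```shell
#!/bin/sh
# Hypothetical helper (not part of the chart): turn the two-line reply of
# SENTINEL GET-MASTER-ADDR-BY-NAME (IP on line 1, port on line 2) into host:port.
parse_master_addr() {
  tr -d '\r' | {
    host=""
    port=""
    read -r host || true
    read -r port || true
    # Succeed only when both fields were present in the reply.
    [ -n "$host" ] && [ -n "$port" ] && printf '%s:%s\n' "$host" "$port"
  }
}

# Example usage (assumes a Sentinel reachable on its default port 26379
# and a monitored master named "mymaster"):
#   valkey-cli -h "$SENTINEL_HOST" -p 26379 \
#     SENTINEL GET-MASTER-ADDR-BY-NAME mymaster | parse_master_addr
```

An empty reply (no master known under that name) makes the helper return a non-zero status instead of printing a partial address.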

amontalban · 2026-02-13T22:47:56Z

Thank you @khtee can't wait to get this merged in 🙏

dmaes · 2026-02-14T06:50:58Z

Also, would that make sense to run sentinel as a side container in the replica STS pods ?

You'll want to disable the default statefulset when sentinel is enabled

I got confused, thinking valkey-sentinel also runs the server process; it's been a while since I used redis/valkey. But yes, @khtee, you'll want to run sentinel and the server side by side in the same pod; don't split them out into different statefulsets.

yoannrt · 2026-02-17T15:38:51Z

Do you plan on adding a HAProxy deployment, so clients don't have to be aware of sentinel?

@dmaes I guess when you suggest to implement HAproxy, it's for sentinel incompatible clients workloads ?
So this would be optional and the default implementation would be a valkey/sentinel HA setup ?

khtee · 2026-02-19T06:14:04Z

Added the following enhancements:

- Sentinel now runs as a sidecar in the Valkey pod.
- HAProxy with a sidecar watcher to track failover.

dmaes · 2026-02-19T07:58:51Z

Do you plan on adding a HAProxy deployment, so clients don't have to be aware of sentinel?

@dmaes I guess when you suggest to implement HAproxy, it's for sentinel incompatible clients workloads ? So this would be optional and the default implementation would be a valkey/sentinel HA setup ?

That's correct.

The truly kubernetes-native way would probably be to have a sentinel-master Service, using a sentinel.valkey.io/master: "true" label selector, and then some watcher that updates that label on the correct Pod,
but a HAproxy deployment is the easier option to implement, and is how most other Redis charts do it.
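
A rough sketch of what the label-updating watcher described above might compute on each failover. Everything here is an assumption for illustration: `plan_master_labels` is a hypothetical name, the pod names are made up, and the function only prints the `kubectl label` commands instead of running them. The label key is the one proposed in the comment.

```shell
#!/bin/sh
# Hypothetical dry-run planner: given the current master pod and the full pod
# list, print the kubectl commands that would move the master label.
plan_master_labels() {
  master="$1"
  shift
  for pod in "$@"; do
    if [ "$pod" = "$master" ]; then
      # Mark the current master so a Service selector can match it.
      echo "kubectl label pod $pod sentinel.valkey.io/master=true --overwrite"
    else
      # A trailing "-" asks kubectl to remove the label from the pod.
      echo "kubectl label pod $pod sentinel.valkey.io/master-"
    fi
  done
}

# Example usage (pod names are assumptions):
#   plan_master_labels valkey-1 valkey-0 valkey-1 valkey-2
```

A real watcher would also need RBAC permission to patch Pods and a way to learn about failovers, which is part of why the HAProxy route is described as the easier option.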

Allow HAProxy to retry DNS resolution during startup while waiting for the Valkey nodes to start. Essentially it does the following:
- Try to use the last known IP.
- If there is none, query the libc resolver (DNS).
- If that fails, resolve to none (meaning the server has no IP address yet, but HAProxy won't crash) and wait for the runtime resolver health checks to pick up the DNS correctly.

Signed-off-by: KHTee <teekahhui@hotmail.com>

lazariv · 2026-03-02T12:50:57Z

Waiting for this to be merged. Thanks for your work!

asjonos · 2026-03-26T11:32:33Z

+          command:
+            - /bin/sh
+            - -c
+            - |
+              apk add --no-cache socat
+              exec sh /scripts/sentinel-watcher.sh


This syntax triggers a "Potential reverse shell detected" incident with Microsoft Defender for Cloud.

Can this be mitigated by removing the pipe?

Please note that in an air-gapped environment, the apk command cannot function. Generally, installing software at runtime is not a best practice.

I am not an expert with Valkey and/or Redis and especially not Sentinel.
But maybe the approach from DandyDeveloper/redis-ha can help here, which seems to do this entirely within the haproxy config:
https://github.com/DandyDeveloper/charts/blob/f4aadb7523966fd3852589627c18db937415a439/charts/redis-ha/templates/_configs.tpl#L542-L671

Please note that in an air-gapped environment, the apk command cannot function. Generally, installing software at runtime is not a best practice.

Neither will it work with readOnlyRootFilesystem deployments

@khtee in the values.yaml you mention not needing any additional tools, because valkey alpine image comes with nc -U, but then you seem to not be using nc -U, and instead you install socat as an additional tool.
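
One way to act on the points above, sketched under assumptions: instead of installing packages when the container starts, a script could verify that the tools it needs already exist in the image and fail fast otherwise. `require_tools` is a hypothetical helper name; `command -v` is the POSIX way to test for a command.

```shell
#!/bin/sh
# Hypothetical helper: check that every named tool is already on PATH.
# Nothing is installed, so this also works air-gapped and with a
# read-only root filesystem.
require_tools() {
  missing=""
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  if [ -n "$missing" ]; then
    echo "ERROR: missing required tools:$missing" >&2
    return 1
  fi
  return 0
}

# Example usage at the top of a watcher script (tool name is an assumption):
#   require_tools nc || exit 1
```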

I've been testing a different HAProxy approach.
Instead of the Sentinel watcher, socat, and runtime socket updates, I think HAProxy can detect the writable master directly using native TCP health checks against the Valkey pods (INFO replication -> role:master).

This ConfigMap is therefore almost completely different: it replaces most of the current HAProxy integration.

It also removes the runtime package installs and socat.

I'm pasting the full replacement ConfigMap below so the complete approach can be reviewed.

HAProxy ConfigMap proposal

{{- if .Values.haproxy.enabled }}
apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "valkey.fullname" . }}-haproxy
  labels:
    {{- include "valkey.labels" . | nindent 4 }}
data:
  haproxy.cfg.tpl: |
    global
      log stdout format raw local0
      maxconn 1024
      ssl-server-verify none

    # Resolve Kubernetes DNS dynamically
    resolvers k8s
      parse-resolv-conf
      hold valid 1s

    defaults
      log global
      timeout connect {{ .Values.haproxy.config.timeout.connect }}
      timeout client {{ .Values.haproxy.config.timeout.client }}
      timeout server {{ .Values.haproxy.config.timeout.server }}
      timeout tunnel {{ .Values.haproxy.config.timeout.tunnel | default "0s" }}
      retries 3

    frontend valkey_frontend_write
      bind *:{{ .Values.haproxy.service.port | default 6379 }}
      mode tcp
      option tcplog
      default_backend valkey_backend_master

    frontend valkey_frontend_read
      bind *:{{ .Values.haproxy.service.readPort | default 6380 }}
      mode tcp
      option tcplog
      default_backend valkey_backend_read

    backend valkey_backend_master
      mode tcp
      option tcp-check
      timeout check 5s
      default-server inter {{ .Values.haproxy.config.checkInterval }} fall 1 rise 2
      {{- if .Values.tls.enabled }}
      tcp-check connect port {{ .Values.service.port }} ssl
      {{- else }}
      tcp-check connect port {{ .Values.service.port }}
      {{- end }}
      {{- if .Values.auth.enabled }}
      tcp-check send "AUTH HAPROXY_WATCHER_USER HAPROXY_WATCHER_PASS\r\n"
      tcp-check expect string +OK
      {{- end }}
      tcp-check send "INFO replication\r\n"
      tcp-check expect rstring role:master
      tcp-check send "QUIT\r\n"
      tcp-check expect string +OK
      {{- range $i := until (add (int .Values.replica.replicas) 1 | int) }}
      server valkey-{{ $i }} {{ include "valkey.fullname" $ }}-{{ $i }}.{{ include "valkey.headlessServiceName" $ }}.{{ $.Release.Namespace }}.svc.{{ $.Values.clusterDomain }}:{{ $.Values.service.port }} check init-addr last,libc,none resolvers k8s on-marked-down shutdown-sessions
      {{- end }}

    backend valkey_backend_read
      mode tcp
      option tcp-check
      timeout check 5s
      {{- if .Values.tls.enabled }}
      tcp-check connect port {{ .Values.service.port }} ssl
      {{- else }}
      tcp-check connect port {{ .Values.service.port }}
      {{- end }}
      {{- if .Values.auth.enabled }}
      tcp-check send "AUTH HAPROXY_WATCHER_USER HAPROXY_WATCHER_PASS\r\n"
      tcp-check expect string +OK
      {{- end }}
      tcp-check send "PING\r\n"
      tcp-check expect string +PONG
      tcp-check send "QUIT\r\n"
      tcp-check expect string +OK
      {{- range $i := until (add (int .Values.replica.replicas) 1 | int) }}
      server valkey-{{ $i }} {{ include "valkey.fullname" $ }}-{{ $i }}.{{ include "valkey.headlessServiceName" $ }}.{{ $.Release.Namespace }}.svc.{{ $.Values.clusterDomain }}:{{ $.Values.service.port }} check inter 15s fall 3 rise 2 init-addr last,libc,none resolvers k8s
      {{- end }}

  entrypoint.sh: |
    #!/bin/sh
    set -eu

    CFG_TPL="/config/haproxy.cfg.tpl"
    CFG_OUT="/tmp/haproxy.cfg"
    cp "${CFG_TPL}" "${CFG_OUT}"

    {{- if .Values.auth.enabled }}
    {{- $watcherUser := .Values.haproxy.watcherUser | default "default" }}
    {{- $watcherUserObj := index .Values.auth.aclUsers $watcherUser | default (dict "passwordKey" "") }}
    {{- $passKey := $watcherUserObj.passwordKey | default $watcherUser }}
    WATCHER_USER="{{ $watcherUser }}"
    WATCHER_PASS=""
    if [ -f "/valkey-users-secret/{{ $passKey }}" ]; then
      WATCHER_PASS=$(cat "/valkey-users-secret/{{ $passKey }}")
    elif [ -f "/valkey-auth-secret/{{ $passKey }}-password" ]; then
      WATCHER_PASS=$(cat "/valkey-auth-secret/{{ $passKey }}-password")
    elif [ -n "${HAPROXY_WATCHER_PASS:-}" ]; then
      WATCHER_PASS="${HAPROXY_WATCHER_PASS}"
    else
      echo "ERROR: Could not resolve password for watcher user '${WATCHER_USER}'." >&2
      echo " Mount the secret as a volume or set the HAPROXY_WATCHER_PASS env var." >&2
      exit 1
    fi
    sed -i "s|HAPROXY_WATCHER_USER|${WATCHER_USER}|g" "${CFG_OUT}"
    sed -i "s|HAPROXY_WATCHER_PASS|${WATCHER_PASS}|g" "${CFG_OUT}"
    {{- end }}

    haproxy -c -V -f "${CFG_OUT}" || { echo "ERROR: haproxy config validation failed" >&2; exit 1; }
    exec haproxy -W -f "${CFG_OUT}"
{{- end }}
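
To make the master-detection idea easier to try by hand, here is a minimal shell sketch of the same decision the `valkey_backend_master` health check makes: a node counts as the writable master only if its INFO replication output reports role:master. The function name `is_writable_master` is hypothetical, and unlike HAProxy's `expect rstring`, which matches anywhere in the response, this version anchors the match to a whole line.

```shell
#!/bin/sh
# Hypothetical helper: read INFO replication output on stdin and succeed
# only when the node reports itself as master. Reply lines end in CRLF,
# so carriage returns are stripped before matching.
is_writable_master() {
  tr -d '\r' | grep -q '^role:master$'
}

# Example usage against a single pod (host and port are assumptions):
#   if valkey-cli -h valkey-0.valkey-headless -p 6379 INFO replication \
#       | is_writable_master; then
#     echo "valkey-0 is the master"
#   fi
```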

Co-authored-by: Tim Karger <49390121+tkarger@users.noreply.github.com> Signed-off-by: khtee <75174583+khtee@users.noreply.github.com>

khtee

Fix code review comments

Co-authored-by: lazariv <lazariv.taras@gmail.com> Signed-off-by: khtee <75174583+khtee@users.noreply.github.com>

Signed-off-by: khtee <75174583+khtee@users.noreply.github.com>

Munken · 2026-03-31T19:50:12Z

    #   labels:
    #     severity: error
+
+haproxy:


Suggested change

haproxy:

# Enable haproxy in front of sentinel

# This allows running redis in sentinel mode even when the client is not compatible

# https://arystech.com/blog/setting-up-haproxy-with-redis-sentinel-for-high-availability-on-microk8s-kubernetes

haproxy:

fix(chart): resolve schema validation and template errors

jose-10000 · 2026-04-03T08:31:59Z

Hi @khtee

I’ve been testing this with ACLs enabled + TLS active and found some discovery gaps when the default user is restricted. I've sent a PR to @khtee's fork with the following hardening fixes:

Inter-Sentinel Auth: Added sentinel-user/pass to fix quorum discovery ("empty array" issue).
Role Segregation: Support for a dedicated monitorUser (separate from replication).
Security: Fixed credential leaks in logs (redirected to stderr) and added native SHA256 hashing.
Schema: Fixed values.schema.json and HAProxy tag types to pass helm lint.

I've verified that ACL + TLS now work seamlessly. I'm currently finishing validation for HAProxy with credentials and will update soon.

Thanks for your contribution!

- Updating HAProxy watcher for near-instant IP-based failover.
- Refactoring init scripts to support dynamic topology and universal auth/TLS injection.
- Adding smart L7 health checks in HAProxy to handle ACL-protected nodes.
- Fully parameterizing Service and ConfigMap ports for end-to-end flexibility.

…notations in haproxy-deployment

…schema Valkey Sentinel: Fixing Auth/ACL gaps in TLS environments and security hardening (Enhancements for PR valkey-io#137)

yoannrt · 2026-04-09T22:49:50Z

@@ -188,3 +188,34 @@ Validate replica authentication configuration
 {{- end }}
 {{- end -}}



Suggested change

{{/*
Validate haproxy is used in replica mode
*/}}
{{- define "valkey.validateHaproxyRequirements" -}}
{{- if and .Values.haproxy.enabled (not .Values.replica.enabled) }}
{{- fail "Haproxy is only relevant in replica mode with clients incompatible with Sentinel." }}
{{- end }}
{{- end -}}

Update deploy_valkey.yaml to fail if HAProxy is enabled in standalone mode.

Suggested change

{{- include "valkey.validateHaproxyRequirements" . }}

lazariv

Redis/Valkey pub/sub (SUBSCRIBE) connections are long-lived and idle by design — they block waiting for messages. HAProxy's timeout client/timeout server treats them as stale and drops them, causing clients to see ConnectionError: Connection closed by server and forcing reconnect loops.

Adding timeout tunnel to the HAProxy defaults section solves this cleanly. In HAProxy, timeout tunnel governs bidirectional connections after the initial handshake — exactly the pattern pub/sub uses. Setting it to 0 keeps SUBSCRIBE connections alive indefinitely while preserving normal client/server timeouts as a safety net for regular command connections.

Without this, users running any pub/sub workload (Socket.IO, Celery, Sidekiq, etc.) through HAProxy must either set client/server timeouts to 0 (triggering HAProxy warnings) or accept periodic disconnects.

lazariv · 2026-04-13T11:21:38Z

+      log global
+      timeout connect {{ .Values.haproxy.config.timeout.connect }}
+      timeout client {{ .Values.haproxy.config.timeout.client }}
+      timeout server {{ .Values.haproxy.config.timeout.server }}


Suggested change

timeout server {{ .Values.haproxy.config.timeout.server }}

timeout server {{ .Values.haproxy.config.timeout.server }}

timeout tunnel {{ .Values.haproxy.config.timeout.tunnel }}

option clitcpka

option srvtcpka

lazariv · 2026-04-13T11:23:05Z

+                    },
+                    "server": {
+                      "type": "string"
+                    }


Suggested change

}

},

"tunnel": {

"type": "string"

}

lazariv · 2026-04-13T11:24:30Z

+    timeout:
+      connect: 5s
+      client: 1m
+      server: 1m


Suggested change

server: 1m

server: 1m

# Timeout for long-lived bidirectional connections (e.g. pub/sub).

# Set to 0 to keep pub/sub SUBSCRIBE connections alive indefinitely.

tunnel: 0s

jose-10000 · 2026-04-27T06:56:55Z

I've implemented a preStop hook. I found that improper shutdowns during updates were tripping up Sentinel and risked data loss, so this change ensures a more graceful exit. I made a new pull request @khtee

jose-10000 · 2026-04-30T11:16:04Z

I've added the workloadAnnotations block to haproxy-deployment.yaml. I noticed that the upstream main branch recently introduced workloadAnnotations for other workloads, so adding it here ensures HAProxy is aligned with main once this PR is eventually merged.

Do you have a rough ETA for this merge? Please let me know if there is anything else I can help review or test to get this PR over the finish line. I'm planning to adopt these changes soon and would like to have an idea of when it might be merged. Thanks!

jose-10000 · 2026-05-18T07:02:40Z

I have some reservations about the approach of running Sentinel as a sidecar within the same pod as the Valkey instance.

The main concern is related to Helm upgrade workflows: when a pod is taken down during a rolling update or a Helm upgrade, the entire pod is terminated — including both the Valkey container and the Sentinel sidecar. This means we lose a Sentinel node during the upgrade process.

Losing a Sentinel during an upgrade can temporarily break the quorum. For example, in a typical 3-Sentinel setup, losing one during an upgrade leaves only 2 Sentinels, which may still meet quorum — but in smaller or less redundant setups, this can prevent failover from working correctly during the exact moment it may be needed most.

A more resilient architecture would deploy Sentinels as independent pods (e.g., via a separate StatefulSet), decoupled from the Valkey data pods. This ensures that Sentinel availability is not tied to the lifecycle of the data nodes, preserving quorum integrity throughout rolling updates and upgrades.
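
The quorum concern above can be made concrete with a little arithmetic. This is a simplified sketch, assuming the usual Sentinel rules as I understand them: the configured quorum is how many Sentinels must agree that the master is down, and a failover additionally needs authorization from a majority of all known Sentinels. The function names are hypothetical.

```shell
#!/bin/sh
# Smallest number of Sentinels that forms a majority of TOTAL.
majority() {
  echo $(( $1 / 2 + 1 ))
}

# can_failover TOTAL DOWN QUORUM
# Succeeds when the Sentinels still running can both reach the configured
# quorum and form a majority of all known Sentinels.
can_failover() {
  total=$1
  down=$2
  quorum=$3
  alive=$(( total - down ))
  [ "$alive" -ge "$quorum" ] && [ "$alive" -ge "$(majority "$total")" ]
}

# Example usage: 3 Sentinels, quorum 2, one pod (and its sidecar) restarting.
#   can_failover 3 1 2 && echo "failover still possible"
```

With 3 Sentinels and a quorum of 2, losing one sidecar still leaves a majority, while a 2-Sentinel setup cannot fail over once either one is gone.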

dmaes · 2026-05-18T08:18:06Z

@jose-10000 across all other best known redis sentinel helm charts, running in the same pod is the most common setup, and has been working just fine for most people. So I (and I think that goes for others too), never really considered using different statefulsets?

jose-10000 · 2026-05-18T08:43:45Z

Thanks for the context, @dmaes! Totally makes sense; I know most charts (like Bitnami) do it this way to keep things simpler. I just wanted to point out that coupling their lifecycles can be tricky for quorum during upgrades, but I'm fine keeping the current approach. Just wanted to share my two cents.

trabelsieyal · 2026-05-20T14:47:18Z

Any idea when this will be released? It would be amazing.

khtee force-pushed the main branch from a9d23ba to c9ef751 Compare February 9, 2026 00:12

dmaes mentioned this pull request Feb 11, 2026

Implement HA with Sentinel #22

Open

khtee marked this pull request as draft February 18, 2026 23:57

khtee changed the title ~~feat: Add Valkey Sentinel support in High Availability setup.~~ feat: Add Valkey Sentinel & HAProxy support in High Availability setup. Feb 19, 2026

khtee marked this pull request as ready for review February 19, 2026 01:19

khtee marked this pull request as draft February 19, 2026 01:30

khtee marked this pull request as ready for review February 19, 2026 06:11

dmaes suggested changes Feb 19, 2026

khtee force-pushed the main branch from 35e72a3 to 26d5117 Compare February 20, 2026 01:47

khtee and others added 5 commits February 20, 2026 09:53

feat: Add Valkey Sentinel support in High Availability setup.

4c7c981

Signed-off-by: KHTee <teekahhui@hotmail.com>

feat: Add Valkey Sentinel & HAProxy support in High Availability setup.

32de638

Signed-off-by: KHTee <teekahhui@hotmail.com>

Add securityContext to HAProxy & Sentinel watcher

708c707

Co-authored-by: Dieter Maes <dieter.maes@dmaes.be> Signed-off-by: khtee <75174583+khtee@users.noreply.github.com>

Use .Chart.AppVersion for sentinel watcher

adc429d

Signed-off-by: KHTee <teekahhui@hotmail.com>

khtee force-pushed the main branch from 26d5117 to 7184810 Compare February 20, 2026 01:55

khtee requested a review from dmaes February 20, 2026 01:56

LouisVallat mentioned this pull request Feb 25, 2026

Replace Bitnami Redis dependency and provide Helm chart support for Valkey passbolt/charts-passbolt#122

Open

tkarger reviewed Mar 11, 2026
