Commit 488250c

ops(rolling-update): add GOMEMLIMIT=1800MiB + --memory=2500m defaults (#617)
## Summary

Add two OOM-defense defaults to `scripts/rolling-update.sh`:

- `GOMEMLIMIT=1800MiB` (via new `DEFAULT_EXTRA_ENV`, merged into the existing `EXTRA_ENV` plumbing)
- `--memory=2500m` on the remote `docker run` (via new `CONTAINER_MEMORY_LIMIT`)

Both are env-var-controlled with empty-string opt-out (`${VAR-default}`, so unset uses the default, but an explicit empty string disables it).

## Motivation

2026-04-24 incident: all 4 live nodes were kernel-OOM-SIGKILLed 22-169 times in 24h under a traffic spike. Each kill risked WAL-tail truncation and triggered election storms, cascading into p99 GET spikes to 6-8s. The runtime defense was applied by hand during the incident; this PR makes it the script default so future rollouts inherit it.

- `GOMEMLIMIT`: the Go runtime GCs aggressively as the heap approaches the limit, keeping RSS below the container ceiling.
- `--memory` (cgroup hard limit): if Go can't keep up (e.g. non-heap growth), the kill is scoped to the container, not host processes like `qemu-guest-agent` or `systemd`.

## Behavior changes

| Variable | Default | Opt-out |
|----------|---------|---------|
| `DEFAULT_EXTRA_ENV` | `GOMEMLIMIT=1800MiB` | `DEFAULT_EXTRA_ENV=""` |
| `CONTAINER_MEMORY_LIMIT` | `2500m` | `CONTAINER_MEMORY_LIMIT=""` |

Operator-supplied `EXTRA_ENV` keys override matching keys in `DEFAULT_EXTRA_ENV` (e.g., `EXTRA_ENV="GOMEMLIMIT=3000MiB"` wins over the default).

## Related

Companion PRs (defense-in-depth):

- #612 `memwatch`: graceful shutdown before kernel OOM (prevents WAL corruption in the first place)
- #613 WAL auto-repair: recovers on startup when the above fails
- #616 rolling-update via GitHub Actions over Tailscale: consumes this script

## Test plan

- [x] `bash -n scripts/rolling-update.sh` passes
- [x] Deployed equivalents manually on all 4 live nodes during the incident (2026-04-24T07:44Z - 07:46Z); no OOM recurrence since
- [ ] Next rolling-update invocation should produce `docker run ... --memory=2500m ... -e GOMEMLIMIT=1800MiB ...` on each node

## Design doc reference

`docs/design/2026_04_24_proposed_resilience_roadmap.md` (item 1: capacity/runtime defenses).
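
The empty-string opt-out described in the Summary depends on the difference between `${VAR-default}` and `${VAR:-default}`. The following is a minimal sketch of that behavior, not code from the script; `resolve_limit` is a hypothetical helper introduced only for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of rolling-update.sh): prints the effective limit.
resolve_limit() {
  # ${VAR-default}: the default applies only when VAR is unset.
  printf '%s' "${CONTAINER_MEMORY_LIMIT-2500m}"
}

unset CONTAINER_MEMORY_LIMIT
unset_result="$(resolve_limit)"      # unset: default applies

CONTAINER_MEMORY_LIMIT=""
empty_result="$(resolve_limit)"      # explicit empty string is preserved (opt-out)

CONTAINER_MEMORY_LIMIT="4g"
custom_result="$(resolve_limit)"     # operator value is used as-is

# For contrast: ${VAR:-default} also replaces an empty string,
# which would make the opt-out impossible.
CONTAINER_MEMORY_LIMIT=""
colon_result="${CONTAINER_MEMORY_LIMIT:-2500m}"

printf 'unset=%s empty=%s custom=%s colon=%s\n' \
  "$unset_result" "$empty_result" "$custom_result" "$colon_result"
```

Only the colon-free form lets an operator distinguish "not configured" from "deliberately disabled".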
2 parents 6548e9a + 75db3bf commit 488250c

2 files changed

Lines changed: 111 additions & 2 deletions

scripts/rolling-update.env.example

Lines changed: 7 additions & 0 deletions
@@ -51,3 +51,10 @@ SSH_STRICT_HOST_KEY_CHECKING="accept-new"
 RAFTADMIN_REMOTE_BIN="/tmp/elastickv-raftadmin"
 RAFTADMIN_RPC_TIMEOUT_SECONDS="5"
 RAFTADMIN_ALLOW_INSECURE="true"
+
+# OOM defenses applied on 2026-04-24 after kernel OOM-SIGKILL cascades.
+# GOMEMLIMIT makes Go GC before the container hits --memory; --memory keeps
+# any kill scoped to the container, not host processes. Set either to "" to
+# opt out. User EXTRA_ENV keys override matching keys in DEFAULT_EXTRA_ENV.
+DEFAULT_EXTRA_ENV="GOMEMLIMIT=1800MiB"
+CONTAINER_MEMORY_LIMIT="2500m"
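
The two defaults are meant to work as a pair: the soft limit sits below the hard limit so the Go runtime has room to react. A small arithmetic sketch of the gap follows, assuming Docker's `m` suffix and Go's `MiB` suffix both denote 2^20 bytes; the variable names are hypothetical.

```shell
#!/usr/bin/env bash
# Illustrative arithmetic only; these variable names are not used by the script.
gomemlimit_mib=1800        # GOMEMLIMIT=1800MiB (Go runtime soft limit)
container_limit_mib=2500   # --memory=2500m (cgroup hard limit, treated as MiB)

# Memory available between the soft limit and the hard kill threshold.
headroom_mib=$(( container_limit_mib - gomemlimit_mib ))
# Soft limit as an integer percentage of the hard limit.
soft_pct=$(( gomemlimit_mib * 100 / container_limit_mib ))

printf 'headroom=%sMiB soft_limit=%s%% of hard limit\n' "$headroom_mib" "$soft_pct"
```

When overriding one value, it is probably worth keeping a comparable gap so memory outside the Go runtime's accounting still fits under the container ceiling.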

scripts/rolling-update.sh

Lines changed: 104 additions & 2 deletions
@@ -62,6 +62,18 @@ Optional environment:
 Each pair must be KEY=VALUE with a non-empty KEY; pairs themselves must not
 contain whitespace.
 
+DEFAULT_EXTRA_ENV defaults to "GOMEMLIMIT=1800MiB" (Go runtime soft memory
+ceiling; GC works harder before approaching the hard --memory limit so the
+kernel OOM killer is not triggered). Merged with EXTRA_ENV before forwarding;
+if a user-supplied EXTRA_ENV entry sets the same KEY, the user value wins.
+Set DEFAULT_EXTRA_ENV="" to disable the default.
+
+CONTAINER_MEMORY_LIMIT
+docker run --memory value (default: 2500m). Hard container-scoped memory
+ceiling; any OOM kill is contained to the elastickv container rather than
+cascading to host processes (e.g. qemu-guest-agent, systemd). Paired with
+GOMEMLIMIT=1800MiB so Go GC preempts the kill. Set to "" to disable.
+
 Notes:
 - If RAFT_TO_REDIS_MAP is unset, it is derived automatically from NODES,
 RAFT_PORT, and REDIS_PORT.
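
The usage text above says an empty `CONTAINER_MEMORY_LIMIT` disables the `--memory` flag. One common way to express an optional flag in Bash is an array that is either empty or holds the flag and its value. The sketch below illustrates that pattern under stated assumptions: `build_run_args` is a hypothetical helper that prints arguments instead of running Docker, and the `${arr[@]+...}` guard is an addition for Bash older than 4.4, where expanding an empty array under `set -u` is an error.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: prints the docker arguments rather than executing them.
build_run_args() {
  local limit="$1"
  local memory_flags=()
  if [[ -n "$limit" ]]; then
    memory_flags=(--memory "$limit")
  fi
  # ${arr[@]+"${arr[@]}"} expands to nothing for an empty array, so no stray
  # empty argument is passed and `set -u` does not trip on older Bash.
  local args=(run -d ${memory_flags[@]+"${memory_flags[@]}"} --network host)
  printf '%s\n' "${args[*]}"
}

with_limit="$(build_run_args "2500m")"
without_limit="$(build_run_args "")"

echo "docker $with_limit"
echo "docker $without_limit"
```

Using an array rather than a string keeps the flag and its value as separate words without relying on unquoted expansion.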
@@ -113,6 +125,9 @@ SSH_TARGETS="${SSH_TARGETS:-}"
 ROLLING_ORDER="${ROLLING_ORDER:-}"
 RAFT_TO_REDIS_MAP="${RAFT_TO_REDIS_MAP:-}"
 RAFT_TO_S3_MAP="${RAFT_TO_S3_MAP:-}"
+# Container OOM defenses. See usage() for rationale. Empty string disables.
+DEFAULT_EXTRA_ENV="${DEFAULT_EXTRA_ENV-GOMEMLIMIT=1800MiB}"
+CONTAINER_MEMORY_LIMIT="${CONTAINER_MEMORY_LIMIT-2500m}"
 
 if [[ -z "$NODES" ]]; then
   echo "NODES is required" >&2
@@ -427,6 +442,7 @@ update_one_node() {
     RAFT_TO_REDIS_MAP="$RAFT_TO_REDIS_MAP_Q" \
     RAFT_TO_S3_MAP="$RAFT_TO_S3_MAP_Q" \
     EXTRA_ENV="$EXTRA_ENV_Q" \
+    CONTAINER_MEMORY_LIMIT="$CONTAINER_MEMORY_LIMIT_Q" \
     bash -s <<'REMOTE'
 set -euo pipefail
 
@@ -707,10 +723,20 @@ run_container() {
     done
   fi
 
+  # Optional hard container-scoped memory limit. Keeps any OOM kill contained
+  # to the elastickv container rather than cascading to host processes
+  # (e.g. qemu-guest-agent, systemd). Pair with GOMEMLIMIT via EXTRA_ENV so
+  # the Go runtime GCs before the kernel kills the container.
+  local memory_flags=()
+  if [[ -n "${CONTAINER_MEMORY_LIMIT:-}" ]]; then
+    memory_flags=(--memory "$CONTAINER_MEMORY_LIMIT")
+  fi
+
   docker run -d \
     --name "$CONTAINER_NAME" \
     --restart unless-stopped \
     --network host \
+    "${memory_flags[@]}" \
     -v "$DATA_DIR:$DATA_DIR" \
     "${s3_creds_volume[@]}" \
     "${extra_env_flags[@]}" \
@@ -868,9 +894,85 @@ ensure_remote_raftadmin_binaries
 # CR handling additionally covers deploy.env files edited on Windows.
 # `${EXTRA_ENV:-}` is required because `set -u` is active and EXTRA_ENV
 # may be unset (the variable is optional in deploy.env).
-EXTRA_ENV_NORMALISED="${EXTRA_ENV:-}"
-EXTRA_ENV_NORMALISED="${EXTRA_ENV_NORMALISED//[$'\t\r\n']/ }"
+# Merge DEFAULT_EXTRA_ENV (operator-safety defaults like GOMEMLIMIT) with any
+# user-supplied EXTRA_ENV. User-supplied KEYs win over defaults for the same
+# KEY; the remote parser forwards pairs via `-e KEY=VALUE` so docker evaluates
+# the last occurrence, which means pre-pending defaults is correct: later user
+# entries override earlier defaults. We still de-duplicate here so the printed
+# command line stays clean.
+EXTRA_ENV_USER_NORMALISED="${EXTRA_ENV:-}"
+EXTRA_ENV_USER_NORMALISED="${EXTRA_ENV_USER_NORMALISED//[$'\t\r\n']/ }"
+EXTRA_ENV_DEFAULT_NORMALISED="${DEFAULT_EXTRA_ENV:-}"
+EXTRA_ENV_DEFAULT_NORMALISED="${EXTRA_ENV_DEFAULT_NORMALISED//[$'\t\r\n']/ }"
+
+merge_extra_env() {
+  local defaults="$1"
+  local user="$2"
+  # Portable across Bash 3.2 (macOS default) which lacks associative
+  # arrays: concatenate user KEYs into a space-padded string and match
+  # with " KEY " to test set membership. The EXTRA_ENV list is typically
+  # a handful of entries, so the linear check is negligible.
+  local -a user_pairs=()
+  local -a default_pairs=()
+  local pair key seen=" " merged=""
+
+  # Guard the here-strings: on Bash 3.2 (macOS default) `read` on an
+  # empty here-string returns non-zero, which trips `set -e`. Skip the
+  # read when the source string is empty; the empty array is the
+  # intended result either way.
+  # IFS is explicitly set per-read so a caller's surrounding IFS
+  # doesn't change how DEFAULT_EXTRA_ENV / EXTRA_ENV are split.
+  if [[ -n "$user" ]]; then
+    IFS=$' \t\n' read -r -a user_pairs <<< "$user"
+  fi
+  for pair in "${user_pairs[@]}"; do
+    [[ -n "$pair" ]] || continue
+    [[ "$pair" == *=* ]] || continue
+    key="${pair%%=*}"
+    seen+="${key} "
+  done
+
+  if [[ -n "$defaults" ]]; then
+    IFS=$' \t\n' read -r -a default_pairs <<< "$defaults"
+    # Unlike EXTRA_ENV (user-supplied, forgivable typos), DEFAULT_EXTRA_ENV
+    # is baked into deploy.env; a malformed token there means a
+    # safeguard we installed deliberately is silently ignored. Fail
+    # loudly instead of dropping it.
+    # Three failure modes to catch early:
+    # - no `=` at all (e.g. GOMEMLIMIT) -> malformed
+    # - empty key before `=` (e.g. =1800MiB) -> malformed
+    #   (the `*=*` pattern match alone accepts this)
+    # - empty pair (covered by the continue above)
+    for pair in "${default_pairs[@]}"; do
+      [[ -n "$pair" ]] || continue
+      if [[ "$pair" != *=* ]]; then
+        echo "rolling-update: malformed DEFAULT_EXTRA_ENV entry '$pair' (expected KEY=VALUE)" >&2
+        return 1
+      fi
+      if [[ "${pair%%=*}" == "" ]]; then
+        echo "rolling-update: malformed DEFAULT_EXTRA_ENV entry '$pair' (empty key)" >&2
+        return 1
+      fi
+    done
+  fi
+  for pair in "${default_pairs[@]}"; do
+    [[ -n "$pair" ]] || continue
+    [[ "$pair" == *=* ]] || continue
+    key="${pair%%=*}"
+    if [[ "$seen" != *" ${key} "* ]]; then
+      merged+="${merged:+ }$pair"
+    fi
+  done
+  for pair in "${user_pairs[@]}"; do
+    [[ -n "$pair" ]] || continue
+    merged+="${merged:+ }$pair"
+  done
+  printf '%s' "$merged"
+}
+
+EXTRA_ENV_NORMALISED="$(merge_extra_env "$EXTRA_ENV_DEFAULT_NORMALISED" "$EXTRA_ENV_USER_NORMALISED")"
 EXTRA_ENV_Q="$(printf '%q' "$EXTRA_ENV_NORMALISED")"
+CONTAINER_MEMORY_LIMIT_Q="$(printf '%q' "${CONTAINER_MEMORY_LIMIT:-}")"
 S3_CREDENTIALS_FILE_Q="$(printf '%q' "${S3_CREDENTIALS_FILE:-}")"
 IMAGE_Q="$(printf '%q' "$IMAGE")"
 DATA_DIR_Q="$(printf '%q' "$DATA_DIR")"
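
The merge rule in the hunk above (defaults first, user pairs last, defaults dropped when a user pair sets the same KEY) can be exercised in isolation. The following is a simplified sketch of the same space-padded membership technique, not the script's function: `merge_pairs` is a hypothetical stand-in that omits the validation of defaults, `LOG_LEVEL` is an illustrative variable name, and the `${arr[@]+...}` guard is an addition that may be needed where `set -u` is active on Bash older than 4.4.

```shell
#!/usr/bin/env bash
# Simplified sketch; merge_pairs is hypothetical and skips validation.
merge_pairs() {
  local defaults="$1" user="$2"
  local -a default_pairs=() user_pairs=()
  local pair key seen=" " merged=""

  # Split only non-empty inputs, mirroring the guard in the diff.
  if [[ -n "$user" ]]; then
    IFS=$' \t\n' read -r -a user_pairs <<< "$user"
  fi
  if [[ -n "$defaults" ]]; then
    IFS=$' \t\n' read -r -a default_pairs <<< "$defaults"
  fi

  # Record every user-supplied KEY in a space-padded string.
  for pair in ${user_pairs[@]+"${user_pairs[@]}"}; do
    [[ "$pair" == *=* ]] || continue
    seen+="${pair%%=*} "
  done

  # Keep a default only when no user pair sets the same KEY.
  for pair in ${default_pairs[@]+"${default_pairs[@]}"}; do
    key="${pair%%=*}"
    if [[ "$seen" != *" ${key} "* ]]; then
      merged+="${merged:+ }$pair"
    fi
  done

  # User pairs always go last, in their original order.
  for pair in ${user_pairs[@]+"${user_pairs[@]}"}; do
    merged+="${merged:+ }$pair"
  done
  printf '%s' "$merged"
}

only_default="$(merge_pairs "GOMEMLIMIT=1800MiB" "")"
overridden="$(merge_pairs "GOMEMLIMIT=1800MiB" "GOMEMLIMIT=3000MiB GOGC=50")"
added="$(merge_pairs "GOMEMLIMIT=1800MiB" "LOG_LEVEL=debug")"
opted_out="$(merge_pairs "" "LOG_LEVEL=debug")"

printf '%s\n' "$only_default" "$overridden" "$added" "$opted_out"
```

Padding both the set and the probe with spaces prevents a KEY from matching as a substring of a longer KEY.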
