Commit 92030f3

kvaps and claude authored
fix(e2e): unbreak LUKS cli-matrix cells — stdin to cryptsetup, diskful-only replica count (BUG-039) (#152)
* fix(e2e): forward stdin to cryptsetup in LUKS passphrase assert (BUG-039)

  assert_luks_passphrase_opens piped the master passphrase into on_node, but on_node runs kubectl exec without -i, so the pipe was never forwarded and cryptsetup read an empty key-file ("Nothing to read on input."). Every kernel-level passphrase assertion therefore failed on every stand (reported as BUG-039 'LUKS data-plane broken'), while the satellite had in fact formatted the backing device with the correct master passphrase (verified live: the operator passphrase opens the LUKS header once stdin is forwarded).

  Add an on_node_stdin helper (kubectl exec -i, same Running-pod selection) and route the assert through it. Keep cryptsetup stderr and print it on the failure path: the old 2>/dev/null swallowed the 'Nothing to read on input' tell and masked the root cause.

  Co-Authored-By: Claude <noreply@anthropic.com>
  Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

* fix(e2e): count diskful replicas only in LUKS clone/resize/snap-restore cells (BUG-039)

  The three data-bearing LUKS cells waited for exactly 2 Resource CRDs after --auto-place=2, but on a 3-worker stand the controller adds (and flaps) a DISKLESS TIE_BREAKER witness, so the all-CRD count oscillates 2-3-2 and the equality check times out spuriously with 'did not autoplace 2 replicas'. Count diskful replicas via linstor_diskful_nodes instead, the convention the sibling encryption-passphrase-luks-rd and luks-rd-create cells already use.

  With the counting fixed, luks-resize-encrypted goes green on a live stand; luks-clone-encrypted and luks-snapshot-restore-encrypted now surface the real blocker (cross-node snapshot ship fails in the clone/restore engine), which is tracked separately as BUG-038.

  Co-Authored-By: Claude <noreply@anthropic.com>
  Signed-off-by: Andrei Kvapil <kvapss@gmail.com>

---------

Signed-off-by: Andrei Kvapil <kvapss@gmail.com>
Co-authored-by: Claude <noreply@anthropic.com>
1 parent 7389580 commit 92030f3

5 files changed

Lines changed: 60 additions & 23 deletions

tests/e2e/cli-matrix/lib.sh

Lines changed: 12 additions & 3 deletions
@@ -439,15 +439,24 @@ wait_luks_header_present() {
 # replica of an encrypted RD so a Bug-175-class wire-injection / Bug-
 # 233-class wrong-passphrase regression is caught at the kernel level
 # rather than just at the REST envelope. Non-zero exit on failure.
+#
+# BUG-039: MUST go through on_node_stdin — plain on_node drops stdin
+# (kubectl exec without -i), so cryptsetup read an empty key-file and
+# this assert failed on every stand while the satellite had in fact
+# formatted with the correct master passphrase. The old 2>/dev/null
+# also swallowed cryptsetup's "Nothing to read on input." tell, so we
+# now keep stderr and print it on the failure path for triage.
 assert_luks_passphrase_opens() {
     local node=$1 dev=$2 passphrase=$3
-    # NUL on stdin avoids leaking the passphrase via `ps -ef` argv and
+    # Passphrase on stdin avoids leaking it via `ps -ef` argv and
     # also avoids re-quoting headaches if the passphrase contains shell
     # metachars (the e2e default has `!!` in it, which would trigger
     # bash history expansion inside `bash -c` without the heredoc).
-    if ! printf '%s' "$passphrase" | on_node "$node" \
-        cryptsetup luksOpen --test-passphrase --key-file=- "$dev" 2>/dev/null; then
+    local err
+    if ! err=$(printf '%s' "$passphrase" | on_node_stdin "$node" \
+        cryptsetup luksOpen --test-passphrase --key-file=- "$dev" 2>&1); then
         echo "assert_luks_passphrase_opens: passphrase does NOT open ${node}:${dev}" >&2
+        echo " cryptsetup said: $err" >&2
         return 1
     fi
     return 0
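
The separate `local err` line in this hunk is what lets the `if !` test see the command's real exit status. A minimal sketch of the difference, using a hypothetical stand-in `fake_cryptsetup` (not the real binary) that fails and writes the message quoted in the commit to stderr:

```bash
#!/usr/bin/env bash
# fake_cryptsetup is a hypothetical stand-in for illustration only: it
# fails and complains on stderr, as the commit describes cryptsetup
# doing when it is handed an empty key on stdin.
fake_cryptsetup() {
    echo "Nothing to read on input." >&2
    return 1
}

# Anti-pattern: `local` is itself a command and succeeds, so $? is 0
# and the failure of the command substitution is lost.
masked_status() {
    local err=$(fake_cryptsetup 2>&1)
    echo "$?"
}

# Declare first, assign second: the assignment's exit status is the
# exit status of the command substitution, and stderr is captured.
kept_status() {
    local err
    err=$(fake_cryptsetup 2>&1)
    echo "$? ${err}"
}

masked=$(masked_status)
kept=$(kept_status)
echo "masked: ${masked}"
echo "kept:   ${kept}"
```

With the combined form the assert could never take its failure branch, so splitting declaration and assignment is required here, not a style preference.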

tests/e2e/cli-matrix/luks-clone-encrypted.sh

Lines changed: 7 additions & 8 deletions
@@ -72,13 +72,14 @@ echo ">> [Bug 333] set cluster passphrase + create source encrypted RD"
 "${LCTL[@]}" volume-definition create "$RD_SRC" 64M >/dev/null
 "${LCTL[@]}" resource create --auto-place=2 --storage-pool="$POOL" "$RD_SRC" >/dev/null
 
+# BUG-039: count DISKFUL replicas only. auto-place=2 on a 3-worker
+# stand adds (and flaps) a DISKLESS TIE_BREAKER witness, so a naive
+# all-CRD count oscillates 2→3→2 and an `== 2` equality check times
+# out spuriously. Same convention as encryption-passphrase-luks-rd.
 deadline=$(( $(date +%s) + 60 ))
 placed_src=()
 while (( $(date +%s) < deadline )); do
-    mapfile -t placed_src < <(
-        kubectl get resources.blockstor.cozystack.io --no-headers 2>/dev/null \
-            | awk -v rd="$RD_SRC." '$1 ~ "^"rd {sub(rd, "", $1); print $1}'
-    )
+    mapfile -t placed_src < <(linstor_diskful_nodes "$RD_SRC")
     if (( ${#placed_src[@]} == 2 )); then break; fi
     sleep 2
 done
@@ -113,13 +114,11 @@ rm -f "$err_file"
 "${LCTL[@]}" resource create --auto-place=2 --storage-pool="$POOL" "$RD_DST" \
     >/dev/null 2>&1 || true
 
+# BUG-039: diskful-only count — see the placed_src loop above.
 deadline=$(( $(date +%s) + 120 ))
 placed_dst=()
 while (( $(date +%s) < deadline )); do
-    mapfile -t placed_dst < <(
-        kubectl get resources.blockstor.cozystack.io --no-headers 2>/dev/null \
-            | awk -v rd="$RD_DST." '$1 ~ "^"rd {sub(rd, "", $1); print $1}'
-    )
+    mapfile -t placed_dst < <(linstor_diskful_nodes "$RD_DST")
     if (( ${#placed_dst[@]} == 2 )); then break; fi
     sleep 2
 done
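
The body of `linstor_diskful_nodes` is not part of this commit, so the following is only a sketch of the counting difference, not the helper itself. It assumes a made-up listing of `<node> <flags>` lines and shows why counting every replica yields 3 while the witness is present, whereas a diskful-only filter stays at 2:

```bash
#!/usr/bin/env bash
# Made-up snapshot of one resource definition's replicas, one per line
# as "<node> <flags>". This is NOT real controller or kubectl output.
replicas=$'worker-1 -\nworker-2 -\nworker-3 DISKLESS,TIE_BREAKER'

# Naive count: every replica, tie-breaker witness included.
mapfile -t all_nodes < <(awk '{print $1}' <<<"$replicas")

# Diskful-only count: skip any replica whose flags mention DISKLESS.
mapfile -t diskful_nodes < <(awk '$2 !~ /DISKLESS/ {print $1}' <<<"$replicas")

echo "all:     ${#all_nodes[@]} (${all_nodes[*]})"
echo "diskful: ${#diskful_nodes[@]} (${diskful_nodes[*]})"
```

An `== 2` check against the first array passes only while the witness happens to be absent; against the second it is stable regardless of the flapping.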

tests/e2e/cli-matrix/luks-resize-encrypted.sh

Lines changed: 5 additions & 4 deletions
@@ -73,13 +73,14 @@ echo ">> [Bug 333] linstor rd c $RD -l drbd,luks,storage + vd c 64M + r c --auto
 # Resolve the placed pair so we can wait_uptodate against the actual
 # nodes the autoplacer picked rather than $WORKER_1+$WORKER_2 by
 # convention.
+# BUG-039: count DISKFUL replicas only. auto-place=2 on a 3-worker
+# stand adds (and flaps) a DISKLESS TIE_BREAKER witness, so a naive
+# all-CRD count oscillates 2→3→2 and an `== 2` equality check times
+# out spuriously. Same convention as encryption-passphrase-luks-rd.
 deadline=$(( $(date +%s) + 60 ))
 placed_nodes=()
 while (( $(date +%s) < deadline )); do
-    mapfile -t placed_nodes < <(
-        kubectl get resources.blockstor.cozystack.io --no-headers 2>/dev/null \
-            | awk -v rd="$RD." '$1 ~ "^"rd {sub(rd, "", $1); print $1}'
-    )
+    mapfile -t placed_nodes < <(linstor_diskful_nodes "$RD")
     if (( ${#placed_nodes[@]} == 2 )); then break; fi
     sleep 2
 done
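
The same deadline loop recurs in each of these cells. As an illustration only (the commit keeps the loops inline), the pattern could be factored into a helper. `wait_for_count` and `fake_lister` below are hypothetical names; `fake_lister` stands in for `linstor_diskful_nodes` and reports one node on its first call and two afterwards:

```bash
#!/usr/bin/env bash
# wait_for_count WANT TIMEOUT_SECONDS CMD [ARGS...]
# Hypothetical helper: polls CMD until it prints exactly WANT lines,
# echoes those lines and returns 0, or returns 1 once the deadline passes.
wait_for_count() {
    local want=$1 timeout=$2
    shift 2
    local deadline=$(( $(date +%s) + timeout ))
    local -a got=()
    while (( $(date +%s) < deadline )); do
        mapfile -t got < <("$@")
        if (( ${#got[@]} == want )); then
            printf '%s\n' "${got[@]}"
            return 0
        fi
        sleep 1
    done
    return 1
}

# Stand-in lister. The call counter lives in a temp file because the
# lister runs in a subshell (process substitution) on every poll.
state=$(mktemp)
echo 0 > "$state"
fake_lister() {
    local n
    n=$(<"$state")
    echo $(( n + 1 )) > "$state"
    echo node-a
    if (( n >= 1 )); then echo node-b; fi
}

if placed=$(wait_for_count 2 10 fake_lister); then
    echo "placed on:" $placed
else
    echo "did not reach 2 replicas in time" >&2
fi
rm -f "$state"
```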

tests/e2e/cli-matrix/luks-snapshot-restore-encrypted.sh

Lines changed: 7 additions & 8 deletions
@@ -80,13 +80,14 @@ echo ">> [Bug 333] create source encrypted RD"
 "${LCTL[@]}" resource create --auto-place=2 --storage-pool="$POOL" "$RD_SRC" >/dev/null
 
 # Resolve placed nodes.
+# BUG-039: count DISKFUL replicas only. auto-place=2 on a 3-worker
+# stand adds (and flaps) a DISKLESS TIE_BREAKER witness, so a naive
+# all-CRD count oscillates 2→3→2 and an `== 2` equality check times
+# out spuriously. Same convention as encryption-passphrase-luks-rd.
 deadline=$(( $(date +%s) + 60 ))
 placed_src=()
 while (( $(date +%s) < deadline )); do
-    mapfile -t placed_src < <(
-        kubectl get resources.blockstor.cozystack.io --no-headers 2>/dev/null \
-            | awk -v rd="$RD_SRC." '$1 ~ "^"rd {sub(rd, "", $1); print $1}'
-    )
+    mapfile -t placed_src < <(linstor_diskful_nodes "$RD_SRC")
     if (( ${#placed_src[@]} == 2 )); then break; fi
     sleep 2
 done
@@ -139,13 +140,11 @@ echo ">> [Bug 333] linstor r c $RD_DST --auto-place=2"
 # driven snapshot-restore-cross-node.sh.
 "${LCTL[@]}" resource create --auto-place=2 --storage-pool="$POOL" "$RD_DST" >/dev/null
 
+# BUG-039: diskful-only count — see the placed_src loop above.
 deadline=$(( $(date +%s) + 60 ))
 placed_dst=()
 while (( $(date +%s) < deadline )); do
-    mapfile -t placed_dst < <(
-        kubectl get resources.blockstor.cozystack.io --no-headers 2>/dev/null \
-            | awk -v rd="$RD_DST." '$1 ~ "^"rd {sub(rd, "", $1); print $1}'
-    )
+    mapfile -t placed_dst < <(linstor_diskful_nodes "$RD_DST")
     if (( ${#placed_dst[@]} == 2 )); then break; fi
     sleep 2
 done

tests/e2e/lib.sh

Lines changed: 29 additions & 0 deletions
@@ -109,6 +109,35 @@ on_node() {
     kubectl -n "$NS" exec "$pod" -- "$@"
 }
 
+# on_node_stdin runs CMD inside the satellite pod scheduled on NODE
+# with the caller's stdin forwarded into the container (`kubectl
+# exec -i`). Use this — never plain on_node — whenever the remote
+# command reads stdin (e.g. `cryptsetup --key-file=-`).
+#
+# BUG-039 root cause: on_node execs WITHOUT `-i`, so kubectl never
+# forwards the pipe and the remote command sees EOF on fd 0.
+# `printf pass | on_node ... cryptsetup luksOpen --key-file=-` fed
+# cryptsetup an EMPTY key ("Nothing to read on input"), making every
+# kernel-level passphrase assertion fail on every stand regardless of
+# what key the satellite actually formatted with. Kept separate from
+# on_node because an unconditional `-i` would let no-stdin callers
+# (poll loops, `while read` bodies) steal the calling script's stdin.
+on_node_stdin() {
+    local node=$1
+    shift
+    local pod
+    pod=$(kubectl -n "$NS" get pods -l app=blockstor-satellite \
+        --field-selector "spec.nodeName=${node},status.phase=Running" \
+        -o "jsonpath={.items[0].metadata.name}")
+
+    if [[ -z "$pod" ]]; then
+        echo "no Running satellite pod on node $node" >&2
+        return 1
+    fi
+
+    kubectl -n "$NS" exec -i "$pod" -- "$@"
+}
+
 # ---- k8s-native readers (preferred over drbdsetup-status grep) ----
 #
 # Replaces `kubectl exec satellite -- drbdsetup status ... | grep ...`
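
Both halves of the on_node_stdin comment can be reproduced locally without a cluster. The sketch below uses two hypothetical wrappers, `no_stdin` and `with_stdin`, which only model how the commit describes `kubectl exec` behaving without and with `-i`; they do not call kubectl:

```bash
#!/usr/bin/env bash
# Hypothetical local stand-ins, for illustration only:
no_stdin()   { "$@" </dev/null; }  # models `kubectl exec` without -i (EOF on fd 0)
with_stdin() { "$@"; }             # models `kubectl exec -i` (stdin passed through)

# 1. Piped key material: how many bytes reach the wrapped command?
dropped=$(printf '%s' 'secret' | no_stdin wc -c | tr -d ' ')
forwarded=$(printf '%s' 'secret' | with_stdin wc -c | tr -d ' ')
echo "bytes seen without stdin: ${dropped}"
echo "bytes seen with stdin:    ${forwarded}"

# 2. `while read` loop whose body runs a command through the wrapper.
#    If the body inherits the loop's stdin, it drains the remaining
#    lines and the loop ends after its first iteration.
loop_iterations() {
    local wrapper=$1 n=0 line
    while read -r line; do
        "$wrapper" cat >/dev/null
        n=$(( n + 1 ))
    done < <(printf '%s\n' a b c)
    echo "$n"
}

stolen=$(loop_iterations with_stdin)
safe=$(loop_iterations no_stdin)
echo "iterations when body inherits stdin: ${stolen}"
echo "iterations when body gets no stdin:  ${safe}"
```

The first half is the BUG-039 failure mode; the second is the reason given for keeping `on_node` without `-i` instead of changing it for every caller.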
