Skip to content

Commit 3da717b

Browse files
authored
Fix scheduled check aborting on transient curl/jq failures (#359)
## Summary Hotfix for the regression introduced by #357 — the scheduled check now aborts with `images[\$IMAGE]: bad array subscript` whenever any background `process_image` subshell hits a curl timeout. Example failing run: [24724122514](https://github.com/ethpandaops/eth-client-docker-image-builder/actions/runs/24724122514/job/72320487012#step:9:204). ## Root cause GitHub Actions invokes `run:` steps as `/usr/bin/bash -e /path/to/script.sh`. In that mode (unlike `#!/bin/bash -e` as a shebang or a `bash -c` invocation), a command substitution whose command fails aborts the surrounding subshell. The retry loop had this line: ```bash response=$(curl -s --max-time 20 -w $'\n%{http_code}' "$URL") if [ $? -ne 0 ] || [ -z "$response" ]; then ... fi ``` When curl timed out, `response=$(...)` killed the backgrounded `process_image &` subshell **before** `$?` could be checked — the function never reached its final `echo >> $imageOutput`, so its result file stayed empty. The aggregator then ran `images[\$IMAGE]=\$EXISTS` with an empty `$IMAGE` extracted from the empty file, which bash rejects as a bad subscript under `-e`. Same problem on `count=\$(echo "\$body" | jq -e ...)` whenever the body was not valid JSON (or jq -e output was null/false). ## Fix `.github/workflows/scheduled.yml`: - **`process_image`:** wrap the curl and jq -e command substitutions with `|| true` so transient failures drive the retry loop instead of aborting. Drop the now-redundant `\$? -ne 0` check — `curl -w '%{http_code}'` always writes at least `\n000`, so the empty-response guard is sufficient. - **Aggregator:** skip empty result files (`[ -s "\$file" ]`) and JSON missing `.image`, emitting a WARN line instead of crashing. This is belt-and-suspenders: a future regression in `process_image` can no longer take down the whole check job. ## Verification Reproduced and fixed under the exact GH Actions shell mode (`bash -e /tmp/extracted.sh`). End-to-end harness with 5 parallel calls — real tag, missing tag in existing repo, 404 on missing repo, DNS-fail, connection-refused — all produced valid result files and correct build decisions: ``` SKIP: ethpandaops/nimbus-eth2:epbs-devnet-1-a23ebfd (present) BUILD: ethpandaops/nimbus-eth2:definitely-missing-xxxxxxx BUILD: ethpandaops/not-a-real-repo:foo SKIP: some/bad:dns (inconclusive) SKIP: some/bad:connrefused (inconclusive) final exit: 0 ``` Also verified the aggregator's defensive guard: tossing empty / non-JSON / missing-image-field files into the mix produces WARN lines and still exits 0. ## Test plan - [ ] Next scheduled run completes the check job (no `bad array subscript`). - [ ] Nimbus-eth2 `epbs-devnet-1` is not re-queued for build since the image is present. - [ ] Any transient curl failure shows the WARN line and skips the build rather than crashing. Signed-off-by: Barnabas Busa <busa.barnabas@gmail.com>
1 parent bc11bbf commit 3da717b

1 file changed

Lines changed: 25 additions & 10 deletions

File tree

.github/workflows/scheduled.yml

Lines changed: 25 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -95,8 +95,13 @@ jobs:
9595
local exists="unknown"
9696
local attempt response http_code body count
9797
for attempt in 1 2 3; do
98-
response=$(curl -s --max-time 20 -w $'\n%{http_code}' "$URL")
99-
if [ $? -ne 0 ] || [ -z "$response" ]; then
98+
# `|| true` is load-bearing: the surrounding script runs under
99+
# `bash -e`, which aborts the subshell on a failed command
100+
# substitution — so a curl timeout here would kill the background
101+
# process_image mid-function and leave an empty output file,
102+
# breaking the aggregator with "bad array subscript".
103+
response=$(curl -s --max-time 20 -w $'\n%{http_code}' "$URL" || true)
104+
if [ -z "$response" ]; then
100105
sleep $((attempt * 2))
101106
continue
102107
fi
@@ -106,8 +111,10 @@ jobs:
106111
if [ "$http_code" = "200" ]; then
107112
# Require a well-formed tags payload and exact-match the tag
108113
# name so a substring hit (e.g. `foo-abc` vs `foo-abcdef`)
109-
# can't mask a missing tag.
110-
count=$(echo "$body" | jq -e --arg tag "$IMAGE_TAG" '[.results[]? | select(.name == $tag)] | length' 2>/dev/null)
114+
# can't mask a missing tag. `|| true` again: jq -e exits
115+
# non-zero on parse errors / null results and would otherwise
116+
# abort the subshell under bash -e.
117+
count=$(echo "$body" | jq -e --arg tag "$IMAGE_TAG" '[.results[]? | select(.name == $tag)] | length' 2>/dev/null || true)
111118
if [ -n "$count" ]; then
112119
if [ "$count" -gt 0 ]; then
113120
exists=true
@@ -169,13 +176,21 @@ jobs:
169176
170177
declare -A images
171178
172-
# Concatenate results, ensuring files exist before attempting to read
179+
# Concatenate results, ensuring files exist before attempting to read.
180+
# Skip empty / malformed files so a single aborted process_image subshell
181+
# can't crash the whole check job with "bad array subscript".
173182
for file in $TEMP_DIR/*_image.json; do
174-
if [ -f "$file" ]; then
175-
LINE=$(cat "$file" | jq -r '.line')
176-
IMAGE=$(cat "$file" | jq -r '.image')
177-
EXISTS=$(cat "$file" | jq -r '.exists')
178-
images[$IMAGE]=$EXISTS
183+
if [ -f "$file" ] && [ -s "$file" ]; then
184+
LINE=$(jq -r '.line // empty' "$file" 2>/dev/null || true)
185+
IMAGE=$(jq -r '.image // empty' "$file" 2>/dev/null || true)
186+
EXISTS=$(jq -r '.exists // empty' "$file" 2>/dev/null || true)
187+
if [ -n "$IMAGE" ]; then
188+
images[$IMAGE]=$EXISTS
189+
else
190+
echo "WARN: dropping malformed image-check result $(basename "$file"): $(head -c 200 "$file")" >&2
191+
fi
192+
else
193+
echo "WARN: image-check result $(basename "$file") is empty; skipping" >&2
179194
fi
180195
done
181196

0 commit comments

Comments
 (0)