Commit 3da717b
Fix scheduled check aborting on transient curl/jq failures (#359)
## Summary
Hotfix for the regression introduced by #357 — the scheduled check now
aborts with `images[$IMAGE]: bad array subscript` whenever any
background `process_image` subshell hits a curl timeout. Example failing
run:
[24724122514](https://github.com/ethpandaops/eth-client-docker-image-builder/actions/runs/24724122514/job/72320487012#step:9:204).
## Root cause
GitHub Actions invokes `run:` steps as `/usr/bin/bash -e
/path/to/script.sh`. In that mode (unlike `#!/bin/bash -e` as a shebang
or a `bash -c` invocation), a command substitution whose command fails
aborts the surrounding subshell. The retry loop had this line:
```bash
response=$(curl -s --max-time 20 -w $'\n%{http_code}' "$URL")
if [ $? -ne 0 ] || [ -z "$response" ]; then
...
fi
```
When curl timed out, `response=$(...)` killed the backgrounded
`process_image &` subshell **before** `$?` could be checked. The
function never reached its final `echo >> $imageOutput`, so its result
file stayed empty. The aggregator then ran `images[$IMAGE]=$EXISTS`
with an empty `$IMAGE` extracted from the empty file, which bash rejects
as a bad subscript under `-e`.
The same problem affected `count=$(echo "$body" | jq -e ...)` whenever
the body was not valid JSON (or the `jq -e` output was null/false).
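The failure mode above can be reproduced in isolation. The following is a minimal sketch, not the workflow's actual code: `worker` is a hypothetical stand-in for `process_image`, and `false` stands in for the curl call that times out. The script is written to a temporary file and run as `bash -e <file>` to match the invocation described above.

```bash
#!/usr/bin/env bash
# Hypothetical reproduction: `worker` and the messages are illustrative only.
script=$(mktemp)
cat > "$script" <<'EOF'
worker() {
  response=$(false)        # stand-in for the curl call that times out
  if [ $? -ne 0 ]; then    # never reached: -e already ended the subshell
    echo "retrying"
  fi
  echo "result written"    # never reached either
}
worker &
wait || true
echo "aggregator continues"
EOF

# Run the script the same way the root-cause section describes.
output=$(bash -e "$script") || true
rm -f "$script"
printf '%s\n' "$output"
```

Only the final line of the inner script is printed: the background subshell exits at the failed assignment, so neither the `$?` check nor the result write runs, while the parent carries on unaware.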
## Fix
`.github/workflows/scheduled.yml`:
- **`process_image`:** wrap the curl and jq -e command substitutions
with `|| true` so transient failures drive the retry loop instead of
aborting. Drop the now-redundant `$? -ne 0` check: `curl -w
'%{http_code}'` always writes at least `\n000`, so the empty-response
guard is sufficient.
- **Aggregator:** skip empty result files (`[ -s "$file" ]`) and JSON
missing `.image`, emitting a WARN line instead of crashing. This is
belt-and-suspenders: a future regression in `process_image` can no
longer take down the whole check job.
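The `|| true` pattern from the first bullet might look roughly like the sketch below. It is an illustration under stated assumptions, not the code in `scheduled.yml`: `fetch` is a hypothetical stand-in for the curl call (it fails twice with curl's timeout exit code 28, printing `\n000` as `curl -w` would, then succeeds), and `attempts_file` and the five-attempt bound are invented for the example.

```bash
#!/usr/bin/env bash
set -e

# Hypothetical stand-in for curl: fails twice, then returns a body and status.
attempts_file=$(mktemp)
echo 0 > "$attempts_file"
fetch() {
  local n
  n=$(( $(cat "$attempts_file") + 1 ))
  echo "$n" > "$attempts_file"
  if [ "$n" -lt 3 ]; then
    printf '\n000'          # mimics the -w output of a failed transfer
    return 28               # curl's exit code for a timeout
  fi
  printf '{"count":1}\n200'
}

status="000"
body=""
for attempt in 1 2 3 4 5; do
  # `|| true` keeps -e from aborting; the loop inspects the result instead.
  response=$(fetch) || true
  if [ -z "$response" ]; then
    continue
  fi
  status=${response##*$'\n'}   # last line: HTTP status code
  body=${response%$'\n'*}      # everything before it: response body
  if [ "$status" = "200" ]; then
    break
  fi
done

attempts=$(cat "$attempts_file")
rm -f "$attempts_file"
echo "status=$status attempts=$attempts body=$body"
```

With the failure swallowed at the assignment, the decision to retry moves into ordinary control flow, which is what lets a transient error reach the retry loop at all.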
## Verification
Reproduced and fixed under the exact GH Actions shell mode (`bash -e
/tmp/extracted.sh`). End-to-end harness with 5 parallel calls — real
tag, missing tag in existing repo, 404 on missing repo, DNS-fail,
connection-refused — all produced valid result files and correct build
decisions:
```
SKIP: ethpandaops/nimbus-eth2:epbs-devnet-1-a23ebfd (present)
BUILD: ethpandaops/nimbus-eth2:definitely-missing-xxxxxxx
BUILD: ethpandaops/not-a-real-repo:foo
SKIP: some/bad:dns (inconclusive)
SKIP: some/bad:connrefused (inconclusive)
final exit: 0
```
Also verified the aggregator's defensive guard: tossing empty / non-JSON
/ missing-image-field files into the mix produces WARN lines and still
exits 0.
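The defensive guard can be sketched along these lines. The file names, JSON layout, and WARN wording are assumptions made for the example, and `sed` is used here in place of jq only to keep the sketch free of external dependencies.

```bash
#!/usr/bin/env bash
set -e

# Hypothetical result files: one valid, one empty, one missing "image".
results=$(mktemp -d)
printf '{"image":"repo/a:tag","exists":"true"}\n' > "$results/ok.json"
: > "$results/empty.json"
printf '{"exists":"false"}\n' > "$results/noimage.json"

declare -A images
warnings=0
for file in "$results"/*.json; do
  name=$(basename "$file")
  # Guard 1: an empty file means the worker died before writing its result.
  if [ ! -s "$file" ]; then
    echo "WARN: $name is empty, skipping"
    warnings=$((warnings + 1))
    continue
  fi
  # Stand-in for a jq lookup of .image and .exists (sed exits 0 on no match).
  image=$(sed -n 's/.*"image":"\([^"]*\)".*/\1/p' "$file")
  exists=$(sed -n 's/.*"exists":"\([^"]*\)".*/\1/p' "$file")
  # Guard 2: never use an empty key as an associative-array subscript.
  if [ -z "$image" ]; then
    echo "WARN: $name has no image field, skipping"
    warnings=$((warnings + 1))
    continue
  fi
  images["$image"]=$exists
done

rm -rf "$results"
echo "tracked=${#images[@]} warnings=$warnings"
```

Both guards run before the array assignment, so a malformed result file costs one WARN line rather than the whole job.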
## Test plan
- [ ] Next scheduled run completes the check job (no `bad array
subscript`).
- [ ] Nimbus-eth2 `epbs-devnet-1` is not re-queued for build since the
image is present.
- [ ] Any transient curl failure shows the WARN line and skips the build
rather than crashing.
Signed-off-by: Barnabas Busa <busa.barnabas@gmail.com>

1 parent bc11bbf commit 3da717b
1 file changed
Lines changed: 25 additions & 10 deletions