fix(ci3): don't abort CI when redis is unavailable (redis_setexz broken pipe) #24351

Draft
AztecBot wants to merge 1 commit into next from cb/ci3-redis-noop-fix
@AztecBot

Collaborator

Root cause

The nightly barretenberg debug build failed instantly (~50ms, before any build work) with:

--- Run barretenberg-debug CI ---

gzip: stdout: Broken pipe
##[error]Process completed with exit code 1.

(failing run: AztecProtocol/aztec-claude run 28313796517)

ci.sh barretenberg-debug calls bootstrap_ec2, whose very first action logs a heartbeat to redis:

# ci3/bootstrap_ec2:24
echo "CI booting..." | redis_setexz "$CI_LOG_ID" 300

redis_setexz was:

function redis_setexz {
  gzip | redis_cli -x SETEX $1 $2 &>/dev/null
}

When redis is unavailable, redis_cli is a documented no-op ("Noop when redis is unavailable") that never reads stdin. gzip therefore compresses its input, writes to a pipe whose read end is already closed, fails with SIGPIPE (reported as "stdout: Broken pipe"), and exits non-zero. Under set -o pipefail plus set -e (both from ci3/source_options), that non-zero pipeline status aborts the entire CI run at the first log write, long before it reaches EC2. The &>/dev/null applies only to redis_cli, so gzip's stderr still leaks the confusing message.
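
The failure mode can be reproduced outside CI. The following is a minimal sketch, not the ci3 code itself: `noop_redis_cli` is a hypothetical stand-in that returns without reading stdin (the real redis_cli is not shown in this PR), and the input is large and incompressible so the broken pipe does not depend on timing. `set -e` is deliberately left off so the script survives long enough to print the status.

```bash
#!/usr/bin/env bash
# pipefail only; with set -e as well, the script would stop at the pipeline below.
set -uo pipefail

# Hypothetical stand-in for redis_cli when redis is down: ignores its
# arguments and returns without ever reading stdin.
noop_redis_cli() { return 0; }

# Same shape as the original redis_setexz, wired to the stand-in.
old_redis_setexz() {
  gzip | noop_redis_cli -x SETEX "$1" "$2" &>/dev/null
}

# About 2 MB of random data stays far larger than a typical pipe buffer after
# compression, so gzip ends up writing into a pipe whose reader has exited.
rc=0
head -c 2000000 /dev/urandom 2>/dev/null \
  | old_redis_setexz demo-key 300 2>/dev/null || rc=$?

echo "pipeline status: $rc"
```

The printed status should be non-zero. The exact value depends on whether gzip is killed by SIGPIPE or sees a failed write, which is why the sketch does not assert a specific number.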

This hit the aztec-claude run because its mirror of next is stale and predates the job guard now present in the aztec-packages workflow (if: github.event_name != 'schedule' || github.repository == 'AztecProtocol/aztec-packages'). The schedule therefore fired in an environment with no AWS creds and no reachable redis. The same bug is a latent landmine for the real aztec-packages nightly: if the redis tunnel ever drops, the whole run would abort here with a cryptic broken-pipe error instead of degrading gracefully.

Fix

Make redis_setexz honor the same "noop when redis is unavailable" contract as redis_cli, while still draining stdin so upstream writers in a pipeline don't get SIGPIPE:

function redis_setexz {
  if [ "$CI_REDIS_AVAILABLE" -eq 1 ]; then
    gzip | redis_cli -x SETEX $1 $2 &>/dev/null
  else
    cat >/dev/null
  fi
}

This protects every caller of redis_setexz (bootstrap_ec2, cache_log, denoise, run_test_cmd) in one place: CI log and cache writes now degrade gracefully when redis is down instead of aborting the run.
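
Why drain stdin rather than simply return? The sketch below compares the two options using hypothetical sinks (`sink_no_drain`, `sink_drain`) and a hypothetical `writer` helper, none of which exist in ci3. It assumes the upstream command writes more than a pipe buffer can hold, and records the per-stage exit codes through bash's `PIPESTATUS`.

```bash
#!/usr/bin/env bash
set -uo pipefail

# Two hypothetical no-op sinks: one returns immediately, one drains stdin first.
sink_no_drain() { return 0; }
sink_drain() { cat >/dev/null; }

# Upstream writer producing more data than a typical pipe buffer holds.
writer() { head -c 2000000 /dev/zero; }

# The sink exits at once, so the writer's output has nowhere to go.
writer 2>/dev/null | sink_no_drain
no_drain_status=("${PIPESTATUS[@]}")

# The sink consumes everything, so the writer finishes normally.
writer 2>/dev/null | sink_drain
drain_status=("${PIPESTATUS[@]}")

echo "without drain: writer=${no_drain_status[0]} sink=${no_drain_status[1]}"
echo "with drain:    writer=${drain_status[0]} sink=${drain_status[1]}"
```

Without the drain, the sink itself succeeds but the writer fails, and under pipefail that failure becomes the pipeline's status. With `cat >/dev/null`, both stages exit 0.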

Verification

Reproduced the original abort and confirmed the fix, under set -euo pipefail:

  • Before: echo ... | redis_setexz → pipeline exits 141 (SIGPIPE) / gzip non-zero → script aborts (matches CI).
  • After: redis-unavailable path returns cleanly (rc=0, stdin drained); redis-available path still pipes through gzip unchanged.
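
A check along these lines could exercise both paths of the fixed function. It is a sketch under stated assumptions rather than the script used for this PR: the `redis_cli` stub and the `capture` temp file are hypothetical, and the positional arguments are quoted here although the PR's version leaves them unquoted.

```bash
#!/usr/bin/env bash
set -euo pipefail

capture=$(mktemp)
trap 'rm -f "$capture"' EXIT

# Hypothetical stub for redis_cli: when redis is "available" it stores stdin
# in a file; otherwise it returns without reading, like the documented no-op.
redis_cli() {
  if [ "$CI_REDIS_AVAILABLE" -eq 1 ]; then
    cat >"$capture"
  fi
}

# The fixed function from this PR (arguments quoted for the sketch).
redis_setexz() {
  if [ "$CI_REDIS_AVAILABLE" -eq 1 ]; then
    gzip | redis_cli -x SETEX "$1" "$2" &>/dev/null
  else
    cat >/dev/null
  fi
}

# Redis unavailable: a failure here would stop the script because of set -e.
CI_REDIS_AVAILABLE=0
echo "CI booting..." | redis_setexz demo-key 300
unavailable_rc=$?

# Redis available: the payload should still arrive gzip-compressed.
CI_REDIS_AVAILABLE=1
echo "CI booting..." | redis_setexz demo-key 300
roundtrip=$(gzip -dc "$capture")

echo "unavailable rc=$unavailable_rc, round trip: $roundtrip"
```

Reaching the final `echo` at all shows that the unavailable path no longer trips `set -e`, and the round trip confirms the available path still compresses its input.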

Note on recurring duplicates

This nightly auto-dispatches a claudebox investigation on every failure, and prior daily runs left ~15 cb/*redis-setexz* branches on the remote with no associated PRs (the fix never actually landed). This PR is the consolidated fix; the stale branches can be pruned.


Created by claudebox · group: slackbot

@AztecBot added labels on Jun 28, 2026:
  • ci-draft: Run CI on draft PRs.
  • ci-no-fail-fast: Sets NO_FAIL_FAST in the CI so the run is not aborted on the first failure.
  • claudebox: Owned by claudebox. It can push to this PR.