
fix(setup.sh): guard SIGPIPE in site-agent pod lookup (spurious [7h] Code 141) #2676

Merged
osu merged 3 commits into NVIDIA:main from kirson-git:fix/setup-sigpipe-head
Jun 25, 2026

@kirson-git

Contributor

Problem

In the [7h] site-agent verification, the pod lookup is:

_POD="$(kubectl get pods -n nico-rest -l ... -o name 2>/dev/null | head -1)"

The script runs under set -euo pipefail (line 77). head -1 exits after printing the first line, which closes the read end of the pipe. If kubectl writes any more output after that, it is killed by SIGPIPE and the shell reports status 141 (128 + signal 13). pipefail propagates that status, the command substitution fails, and set -e aborts the phase with SETUP FAILED ... Code: 141, even though site-agent deployed fine (StatefulSet ready, Site CR HandshakeComplete). Reproduced on a clean v0.10.3 install; the same code is present on main.
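A minimal local reproduction of this failure mode, offered as a sketch rather than the actual [7h] code: it assumes yes as a stand-in for kubectl (any producer that keeps writing after head -1 exits behaves the same way) and assumes SIGPIPE is not ignored by the parent process. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: reproduce the [7h] failure mode without a cluster.
# Assumption: `yes` stands in for kubectl. It keeps writing after `head -1`
# has exited, so its next write hits a closed pipe and it is killed by SIGPIPE.
# Assumption: SIGPIPE is not ignored by the parent process.

# Pipeline status under pipefail: the producer's 141 is reported.
status_with_pipefail() {
  bash -c 'set -o pipefail; yes | head -1 >/dev/null; echo "$?"'
}

# Pipeline status without pipefail: only head's status (0) is reported.
status_without_pipefail() {
  bash -c 'yes | head -1 >/dev/null; echo "$?"'
}

# Exit status of a script shaped like the original lookup (hypothetical stand-in).
# The command substitution fails with 141 and set -e aborts before the echo.
strict_lookup_status() {
  local status=0
  bash -c 'set -euo pipefail; _POD="$(yes | head -1)"; echo "unreachable"' \
    >/dev/null || status=$?
  echo "$status"
}

status_with_pipefail     # prints 141 (128 + SIGPIPE's signal number 13)
status_without_pipefail  # prints 0
strict_lookup_status     # prints 141
```

Without pipefail the same pipeline reports head's status and the SIGPIPE goes unnoticed, which is why the failure only surfaces in a strict-mode script.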

Fix

Append || true so the SIGPIPE-induced 141 is tolerated.
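A sketch of the guard, assuming it sits inside the command substitution (the PR only says to append || true; writing _POD="$(...)" || true after the closing parenthesis would also keep set -e from firing). yes pod/site-agent-0 and seq are stand-ins for kubectl, and the pod name and function names are hypothetical. The second function shows the trade-off: || true also masks a genuine failure, such as kubectl being unable to reach the API server, and leaves _POD empty, so any later step that uses _POD should handle an empty value. The third function is an alternative that is not part of this PR: a reader such as sed -n 1p consumes all input, so the producer never sees SIGPIPE and real failures still propagate through pipefail.

```shell
#!/usr/bin/env bash
# Sketch of the `|| true` guard under set -euo pipefail.
# Assumption: `yes pod/site-agent-0` stands in for `kubectl get pods ... -o name`;
# the pod name and function names are hypothetical.

# Guarded lookup: the SIGPIPE status 141 is swallowed and _POD gets the first line.
lookup_with_guard() {
  bash -c '
    set -euo pipefail
    _POD="$(yes pod/site-agent-0 | head -1 || true)"
    echo "$_POD"
  '
}

# Trade-off: the guard also swallows a real failure (here `false`), leaving _POD empty.
lookup_failure_with_guard() {
  bash -c '
    set -euo pipefail
    _POD="$(false | head -1 || true)"
    echo "[${_POD}]"
  '
}

# Alternative (not what this PR does): `sed -n 1p` reads all input, so the
# producer is never cut off by SIGPIPE and pipefail still reports real errors.
# `seq` stands in for a producer whose output exceeds the pipe buffer.
lookup_reading_all_input() {
  bash -c '
    set -euo pipefail
    _POD="$(seq 1 200000 | sed -n 1p)"
    echo "$_POD"
  '
}

lookup_with_guard          # prints pod/site-agent-0
lookup_failure_with_guard  # prints []
lookup_reading_all_input   # prints 1
```

Because a | b || true parses as (a | b) || true, the guard covers the whole pipeline's status, not just head's.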

🤖 Generated with Claude Code

@kirson-git kirson-git requested review from a team and shayan1995 as code owners June 17, 2026 14:51
@copy-pr-bot

copy-pr-bot Bot commented Jun 17, 2026

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@coderabbitai

coderabbitai Bot commented Jun 17, 2026

Contributor

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 925adcb4-6eaf-4a45-916c-2ca04dacb2fd

📥 Commits

Reviewing files that changed from the base of the PR and between ad52d81 and b962d93.

📒 Files selected for processing (1)
  • helm-prereqs/setup.sh
✅ Files skipped from review due to trivial changes (1)
  • helm-prereqs/setup.sh

Summary by CodeRabbit

  • Bug Fixes
    • Improved setup script resilience during gRPC connection verification by tolerating the SIGPIPE exit status (141) from the kubectl | head -1 pod lookup, so a successful site-agent deployment is no longer reported as a setup failure.

Walkthrough

helm-prereqs/setup.sh now appends || true to the _POD lookup pipeline so the SIGPIPE exit status (141) that kubectl can hit when head -1 closes the pipe early does not abort the script under set -euo pipefail.

Changes

NICo REST Site-Agent gRPC Verification Loop

| Layer / File(s) | Summary |
| --- | --- |
| helm-prereqs/setup.sh | Guard kubectl pipeline against SIGPIPE (exit 141) from head -1 |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title accurately describes the setup.sh SIGPIPE workaround for the site-agent pod lookup failure. |
| Description check | ✅ Passed | The description matches the code change and clearly explains the 141 failure and the \|\| true fix. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

@mxh-0xbb

Contributor

/ok to test ad52d81

@github-actions

github-actions Bot commented Jun 21, 2026

🔍 Container Scan Summary

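| Service | Total | Critical | High | Medium | Low | Other |
| --- | --- | --- | --- | --- | --- | --- |
| boot-artifacts-aarch64 | 3 | 0 | 0 | 3 | 0 | 0 |
| boot-artifacts-x86_64 | 3 | 0 | 0 | 3 | 0 | 0 |
| forge-admin-cli-x86_64 | 265 | 6 | 24 | 98 | 7 | 130 |
| machine-validation-runner | 717 | 32 | 188 | 267 | 36 | 194 |
| machine_validation | 717 | 32 | 188 | 267 | 36 | 194 |
| machine_validation-aarch64 | 717 | 32 | 188 | 267 | 36 | 194 |
| nvmetal-carbide | 717 | 32 | 188 | 267 | 36 | 194 |
| TOTAL | 3139 | 134 | 776 | 1172 | 151 | 906 |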

Per-CVE detail lives in the per-service grype-* artifacts (JSON + SARIF). Severity counts only — no CVE IDs published here.

@mxh-0xbb

Contributor

/ok to test d781f9c

@shayan1995 shayan1995 left a comment

Contributor

LGTM, thanks for this.

@ajf

ajf commented Jun 22, 2026

Collaborator

@kirson-git ad52d81 is missing a GPG or SSH signature, which prevents merging.

kirson-git and others added 2 commits June 24, 2026 21:30
…lure)

`_POD="$(kubectl get pods ... -o name 2>/dev/null | head -1)"` runs under
`set -euo pipefail`. `head -1` closes the pipe early, kubectl gets SIGPIPE and
exits 141; pipefail propagates it, so the [7h] site-agent phase aborts with
'SETUP FAILED ... Code: 141' even though site-agent deployed successfully
(StatefulSet ready, Site CR HandshakeComplete). Add '|| true' so the pipe
result is tolerated.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: kirson-git <ekirson@nvidia.com>
@osu osu force-pushed the fix/setup-sigpipe-head branch from d781f9c to b962d93 June 25, 2026 04:30
@osu

osu commented Jun 25, 2026

Member

/ok to test 1ee1fce

@github-actions

@osu osu merged commit 852c517 into NVIDIA:main Jun 25, 2026
56 checks passed

6 participants