Commit 8022969

sbryngelson and claude committed
Guard squeue/sacct pipelines against set -euo pipefail
With pipefail, a transient squeue failure would exit the script instead of falling through to return UNKNOWN. Add || true to both pipelines. Also fix stale comment about tail stopping.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
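The failure mode described in the message can be reproduced without a Slurm installation. The following is a minimal sketch, not the script itself: `flaky_query` and `get_state_guarded` are hypothetical names, and `false` stands in for a `squeue` call that fails transiently. It assumes `bash` is on the PATH.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for a scheduler query that fails transiently.
flaky_query() {
    return 1
}

# Unguarded form, run in a separate bash process so errexit is fully active.
# With pipefail the pipeline reports the failure of its first stage, the
# assignment inherits that status, and errexit ends the process before echo.
unguarded_status=0
bash -c 'set -euo pipefail; state=$(false | head -n1 | tr -d " "); echo "not reached"' \
    >/dev/null 2>&1 || unguarded_status=$?

# Guarded form: "|| true" forces the command substitution to succeed, so an
# empty result falls through to the UNKNOWN branch instead of aborting.
get_state_guarded() {
    local state
    state=$(flaky_query 2>/dev/null | head -n1 | tr -d ' ' || true)
    if [ -n "$state" ]; then
        echo "$state"
        return
    fi
    echo "UNKNOWN"
}

result=$(get_state_guarded)
echo "unguarded exit status: $unguarded_status"
echo "guarded result: $result"
```

The unguarded case is run through `bash -c` because errexit is ignored inside anything on the left-hand side of `||`, including a subshell, which would hide the abort this sketch is meant to show.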
1 parent a82959e commit 8022969

1 file changed

Lines changed: 3 additions & 3 deletions

.github/scripts/monitor_slurm_job.sh

@@ -37,15 +37,15 @@ get_job_state() {
     local state
 
     # Try squeue first (fast, works for active jobs)
-    state=$(squeue -j "$jid" -h -o '%T' 2>/dev/null | head -n1 | tr -d ' ')
+    state=$(squeue -j "$jid" -h -o '%T' 2>/dev/null | head -n1 | tr -d ' ' || true)
     if [ -n "$state" ]; then
         echo "$state"
         return
     fi
 
     # Fallback to sacct (works for completed/historical jobs)
     if command -v sacct >/dev/null 2>&1; then
-        state=$(sacct -j "$jid" -n -X -P -o State 2>/dev/null | head -n1 | cut -d'|' -f1)
+        state=$(sacct -j "$jid" -n -X -P -o State 2>/dev/null | head -n1 | cut -d'|' -f1 || true)
         if [ -n "$state" ]; then
             echo "$state"
             return
@@ -164,7 +164,7 @@ exec 3<&-
 kill "${tail_pid}" 2>/dev/null || true
 tail_pid=""
 
-# Wait for output file to finish growing (stabilize) before stopping tail
+# Wait for output file to stabilize (NFS flush) before final read
 if [ -f "$output_file" ]; then
     last_size=-1
     same_count=0
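
The second hunk shows only the start of the stabilization wait (`last_size=-1`, `same_count=0`); the rest of the loop is outside the diff. The sketch below shows one way such a wait could work and is an assumption, not the script's actual code. The function name `wait_for_stable_size`, its parameters, and the use of `wc -c` to read the size are all hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeeds once the file size has been unchanged for
# `needed` consecutive polls, fails if the file is missing or if
# `max_polls` polls pass without the size settling.
wait_for_stable_size() {
    local file=$1 needed=${2:-3} interval=${3:-1} max_polls=${4:-30}
    local last_size=-1 same_count=0 polls=0 size

    [ -f "$file" ] || return 1

    while [ "$polls" -lt "$max_polls" ]; do
        # tr strips the padding some wc implementations add around the count.
        size=$(wc -c < "$file" | tr -d ' ')
        if [ "$size" -eq "$last_size" ]; then
            same_count=$((same_count + 1))
        else
            same_count=0
            last_size=$size
        fi
        if [ "$same_count" -ge "$needed" ]; then
            return 0
        fi
        polls=$((polls + 1))
        sleep "$interval"
    done
    return 1
}

# Usage: a file that is no longer being written settles after a few polls.
demo_file=$(mktemp)
printf 'line 1\n' > "$demo_file"
stable_status=0
wait_for_stable_size "$demo_file" 2 0 10 || stable_status=$?
rm -f "$demo_file"

# A missing file is reported as a failure rather than aborting the script.
missing_status=0
wait_for_stable_size "/nonexistent/output.log" 2 0 3 || missing_status=$?

echo "stable: $stable_status, missing: $missing_status"
```

Requiring several consecutive unchanged readings, rather than one, is a hedge against a network filesystem reporting a stale size between flushes. The poll count and interval here are placeholders.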
