
Commit 74e11ab

sbryngelson and claude committed
Fix review issues: PREEMPTED state, wait race, concurrency group, retry timeout
- Add PREEMPTED and REVOKED to the monitor_slurm_job.sh terminal states so preempted jobs don't hang the monitor loop indefinitely
- Wait for both build PIDs unconditionally to prevent orphaned processes racing with the on_retry_command clean
- Drop event_name from the concurrency group so PR and review events for the same branch properly cancel each other
- Reduce the retry timeout to 150 min so retries have room within the 480 min job timeout

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
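The timeout change is easiest to see as arithmetic. The sketch below assumes that timeout_minutes is a per-attempt limit (as in the nick-fields/retry action, whose input names the step appears to use), that every attempt runs to its timeout, and it ignores the time spent in on_retry_command. Under those assumptions, the old value of 480 could never fit a second attempt inside the 480 min job timeout, while 150 fits all three.

```shell
#!/usr/bin/env bash
# Worst-case wall-clock budget for the retried build step.
# Assumptions: timeout_minutes is per attempt, every attempt times out,
# and on_retry_command (the clean) takes negligible time.
set -euo pipefail

budget_minutes() {
  local attempts=$1 per_attempt_min=$2 wait_s=$3
  # (attempts - 1) waits between attempts, rounded up to whole minutes
  local waits_min=$(( ((attempts - 1) * wait_s + 59) / 60 ))
  echo $(( attempts * per_attempt_min + waits_min ))
}

job_timeout=480
for per_attempt in 480 150; do
  total=$(budget_minutes 3 "$per_attempt" 60)
  if (( total <= job_timeout )); then verdict="fits"; else verdict="exceeds"; fi
  echo "timeout_minutes=$per_attempt -> worst case ${total} min ($verdict ${job_timeout} min job timeout)"
done
```

With max_attempts: 3 and retry_wait_seconds: 60, this reports 1442 min for the old setting and 452 min for the new one.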
1 parent 9ef978f commit 74e11ab

2 files changed

Lines changed: 6 additions & 4 deletions


.github/scripts/monitor_slurm_job.sh

Lines changed: 1 addition & 1 deletion
@@ -58,7 +58,7 @@ get_job_state() {
 # Check if a state is terminal (job is done, for better or worse)
 is_terminal_state() {
 case "$1" in
-COMPLETED|FAILED|CANCELLED|CANCELLED+|TIMEOUT|OUT_OF_MEMORY|NODE_FAIL|BOOT_FAIL|DEADLINE)
+COMPLETED|FAILED|CANCELLED|CANCELLED+|TIMEOUT|OUT_OF_MEMORY|NODE_FAIL|BOOT_FAIL|DEADLINE|PREEMPTED|REVOKED)
 return 0 ;;
 *)
 return 1 ;;
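The hang this fixes comes from the polling loop only exiting once is_terminal_state succeeds. Below is a minimal sketch of such a loop, reusing the patched case pattern. The get_job_state here is a hypothetical stub that replays canned states instead of querying Slurm (the real implementation is outside this hunk), and max_polls is a demo-only cap. Before this commit, PREEMPTED fell through to the `*)` branch, so a loop like this would keep polling a preempted job forever.

```shell
#!/usr/bin/env bash
# Sketch of a monitor loop that stops on a terminal Slurm state.
# get_job_state is a hypothetical stub; max_polls exists only for the demo.
set -euo pipefail

is_terminal_state() {
  case "$1" in
    COMPLETED|FAILED|CANCELLED|CANCELLED+|TIMEOUT|OUT_OF_MEMORY|NODE_FAIL|BOOT_FAIL|DEADLINE|PREEMPTED|REVOKED)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Hypothetical canned sequence: the job runs, then is preempted.
STATES=(PENDING RUNNING RUNNING PREEMPTED)
poll=0
get_job_state() {
  # Past the end of the sequence, keep reporting the last state.
  local i=$(( poll < ${#STATES[@]} ? poll : ${#STATES[@]} - 1 ))
  echo "${STATES[$i]}"
}

max_polls=10
state=""
while (( poll < max_polls )); do
  state=$(get_job_state)
  poll=$(( poll + 1 ))
  if is_terminal_state "$state"; then
    echo "terminal state $state after $poll polls"
    break
  fi
  # a real monitor would sleep here before polling again
done
```

This prints `terminal state PREEMPTED after 4 polls`; with the pre-commit pattern it would instead exhaust the cap.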

.github/workflows/bench.yml

Lines changed: 5 additions & 3 deletions
@@ -7,7 +7,7 @@ on:
   workflow_dispatch:
 
 concurrency:
-  group: ${{ github.workflow }}-${{ github.ref }}-${{ github.event_name }}
+  group: ${{ github.workflow }}-${{ github.ref }}
   cancel-in-progress: true
 
 jobs:
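GitHub only cancels an in-progress run when a new run lands in the same concurrency group, so the group string decides which runs compete. The sketch below builds the old and new keys by plain string concatenation. The workflow name and ref are hypothetical example values; at run time GitHub substitutes the real github.workflow, github.ref and github.event_name.

```shell
#!/usr/bin/env bash
# Old vs new concurrency group keys for two events on the same PR.
# "Benchmark" and "refs/pull/123/merge" are hypothetical example values.
set -euo pipefail

workflow="Benchmark"
ref="refs/pull/123/merge"

old_key() { echo "${workflow}-${ref}-$1"; }  # $1 = event name
new_key() { echo "${workflow}-${ref}"; }

for event in pull_request pull_request_review; do
  echo "$event: old=$(old_key "$event") new=$(new_key)"
done
```

Under the old key the two events get different groups and run side by side; under the new key they share one group, so whichever starts later cancels the other.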
@@ -105,13 +105,15 @@ jobs:
   with:
     max_attempts: 3
     retry_wait_seconds: 60
-    timeout_minutes: 480
+    timeout_minutes: 150
     command: |
       (cd pr && ${{ matrix.build_script }}) &
       pid1=$!
       (cd master && ${{ matrix.build_script }}) &
       pid2=$!
-      wait $pid1 && wait $pid2
+      wait $pid1; e1=$?
+      wait $pid2; e2=$?
+      [ $e1 -eq 0 ] && [ $e2 -eq 0 ]
     on_retry_command: |
       (cd pr && ./mfc.sh clean) &
       (cd master && ./mfc.sh clean) &
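The race comes from `&&` short-circuiting: if the first build fails, `wait $pid2` never runs, the command returns while the second build is still alive, and the retry's clean can start underneath it. The sketch below reproduces both patterns with two hypothetical stand-in jobs (fast_fail and slow_ok), using fractional sleeps as supported by GNU and BSD sleep.

```shell
#!/usr/bin/env bash
# Old vs new wait patterns; fast_fail and slow_ok are hypothetical stand-ins
# for the two builds. No -e, so exit codes can be inspected explicitly.
set -uo pipefail

fast_fail() { sleep 0.1; return 1; }
slow_ok()   { sleep 1; return 0; }

# Old pattern: a failing first wait skips the second one.
fast_fail & pid1=$!
slow_ok & pid2=$!
if wait "$pid1" && wait "$pid2"; then old=0; else old=1; fi
if kill -0 "$pid2" 2>/dev/null; then echo "old: pid2 still running after wait returned"; fi
wait "$pid2"   # reap the demo's orphan before continuing

# New pattern: always wait for both, then combine the results.
fast_fail & pid1=$!
slow_ok & pid2=$!
wait "$pid1"; e1=$?
wait "$pid2"; e2=$?
if kill -0 "$pid2" 2>/dev/null; then echo "new: pid2 still running"; else echo "new: both jobs reaped"; fi
[ "$e1" -eq 0 ] && [ "$e2" -eq 0 ]; new=$?
echo "old status=$old new status=$new e1=$e1 e2=$e2"
```

Both patterns report failure, but only the new one guarantees neither build is still running when the command exits. Because the final `[ ... ] && [ ... ]` is the last command, its status becomes the command's exit status. If the shell running the command had `set -e` enabled, a failing `wait $pid1` would abort before the second wait, and a form such as `wait $pid1 || e1=$?` (with `e1=0` set beforehand) would be needed to keep the unconditional wait.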
