
Commit a99f333

Handle nvidia-smi mismatch errors for archdetect.
1 parent a9faaa7 commit a99f333

1 file changed

Lines changed: 6 additions & 0 deletions


init/eessi_archdetect.sh

@@ -181,6 +181,12 @@ accelpath() {
     nvidia_smi_out=$(mktemp -p /tmp nvidia_smi_out.XXXXX)
     nvidia-smi --query-gpu=gpu_name,count,driver_version,compute_cap --format=csv,noheader 2>&1 > $nvidia_smi_out
     if [[ $? -eq 0 ]]; then
+        if grep -q "Failed to initialize NVML: Driver/library version mismatch" $nvidia_smi_out; then
+            log "ERROR" "accelpath: nvidia-smi command failed with 'Failed to initialize NVML: Driver/library version mismatch'"
+            rm -f $nvidia_smi_out
+            exit 4
+        fi
+
         nvidia_smi_info=$(head -1 $nvidia_smi_out)
         cuda_cc=$(echo $nvidia_smi_info | sed 's/, /,/g' | cut -f4 -d, | sed 's/\.//g')
         log "DEBUG" "accelpath: CUDA compute capability '${cuda_cc}' derived from nvidia-smi output '${nvidia_smi_info}'"
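The check added in this commit can be sketched as a standalone function. `check_nvml_mismatch` and the `log` stub are hypothetical names introduced only for illustration; the real script has its own `log` helper, whose behaviour is not shown in this diff. As in the commit, a match on the NVML mismatch message logs an error, deletes the temporary output file, and exits with status 4. The sketch uses `grep -F` so the message is matched as a literal string.

```shell
#!/usr/bin/env bash
# Minimal sketch of the mismatch check from this commit, factored into a function.
# Assumption: 'log' below is a hypothetical stand-in for the script's own logger.

log() {
    # Hypothetical stub: print "LEVEL: message" to stderr.
    echo "$1: $2" >&2
}

# Hypothetical helper: if the captured nvidia-smi output reports an NVML
# driver/library version mismatch, log it, remove the file, and exit with 4.
# Otherwise return 0 and leave the file in place for further parsing.
check_nvml_mismatch() {
    local out_file="$1"
    local msg="Failed to initialize NVML: Driver/library version mismatch"
    # -q: no output, only the exit status; -F: match msg as a fixed string
    if grep -qF "$msg" "$out_file"; then
        log "ERROR" "accelpath: nvidia-smi command failed with '$msg'"
        rm -f "$out_file"
        exit 4
    fi
    return 0
}
```

Because the helper calls `exit`, a caller that needs to keep running (for example a test) should invoke it in a subshell, such as `( check_nvml_mismatch "$f" ) || status=$?`; inside archdetect itself, terminating the script is the intended outcome.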

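The new `grep` only sees what ended up in `$nvidia_smi_out`. With the redirection order used on the `nvidia-smi` line, `2>&1 > $nvidia_smi_out`, bash first points stderr at the current stdout and only then redirects stdout to the file, so only stdout is captured; `> $nvidia_smi_out 2>&1` would capture both streams. Whether the mismatch message lands in the file therefore depends on which stream `nvidia-smi` writes it to. The sketch below contrasts the two orders using a hypothetical `fake_cmd` in place of `nvidia-smi`.

```shell
#!/usr/bin/env bash
# Contrast two redirection orders. 'fake_cmd' is a hypothetical stand-in
# for nvidia-smi that writes one line to stdout and one line to stderr.
fake_cmd() {
    echo "to-stdout"
    echo "to-stderr" >&2
}

out_a=$(mktemp)
out_b=$(mktemp)

# Order used in the diff: stderr is duplicated onto the *original* stdout
# (e.g. the terminal) before stdout is redirected, so out_a only receives
# "to-stdout" while "to-stderr" is printed to the terminal.
fake_cmd 2>&1 > "$out_a"

# Reversed order: stdout goes to the file first, then stderr follows it,
# so out_b receives both lines.
fake_cmd > "$out_b" 2>&1

echo "out_a:"; cat "$out_a"
echo "out_b:"; cat "$out_b"
rm -f "$out_a" "$out_b"
```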