Commit 5a42430

[rocm-libraries] ROCm/rocm-libraries#5713 (commit e179279)
Adding New Notification Detection

## Motivation

Restrict one of the notification failure patterns so that it matches only the specific missing-driver log line. This will help reduce noise from erroneous log matches. Also add a new failure pattern to notify us of GitHub access issues.

## Technical Details

- Set the failure pattern to match the exact failure observed in the logs.
- Switched to a plain substring search so special characters are handled literally.
- Added a new failure pattern for GitHub access errors.

## Test Plan

- Force a failure using the known failure patterns.

## Test Result

The forced failures were triggered and caught by the notification system.

## Submission Checklist

- [ ] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
1 parent ba2fb02 commit 5a42430

1 file changed

Lines changed: 5 additions & 3 deletions

script/infra_helper/send_failure_notifications.sh

@@ -22,12 +22,13 @@ PATTERNS=(
 'login attempt to .* failed with status: 401 Unauthorized'
 'docker login failed'
 'HTTP request sent .* 404 Not Found'
-'cat: .* No such file or directory'
+'/sys/module/amdgpu/version: No such file or directory'
 'GPU not found'
 'Could not connect to Redis at .* Connection timed out'
 'unauthorized: your account must log in with a Personal Access Token'
 'sccache: error: Server startup failed: Address in use'
 'No space left on device'
+'Could not resolve host: github.com'
 )

 DESCRIPTIONS=(
@@ -40,10 +41,11 @@ DESCRIPTIONS=(
 "Docker login failed"
 "Sccache Error"
 "Device space error"
+"Unable to access Github"
 )

 # Indices into PATTERNS/DESCRIPTIONS for which a node name lookup is performed.
-NODE_PATTERN_INDICES=(3 4 8) # cat: No such file, GPU not found, No space left on device
+NODE_PATTERN_INDICES=(3 4 8 9)

 # ---------------------------------------------------------------------------
 # Fetch and scan the log.
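
Because PATTERNS, DESCRIPTIONS, and NODE_PATTERN_INDICES are parallel tables tied together by position, a change like this one has to touch all three consistently. The following is a minimal sketch of a sanity check that could catch a mismatch. It is not part of the commit: the `check_tables` function is hypothetical, the arrays are trimmed stand-ins for the real ones, and every description string except "Unable to access Github" is invented for illustration.

```shell
#!/usr/bin/env bash
# Trimmed, illustrative copies of the tables; the real script has more entries.
PATTERNS=(
  '/sys/module/amdgpu/version: No such file or directory'
  'GPU not found'
  'Could not resolve host: github.com'
)
DESCRIPTIONS=(
  "Missing amdgpu driver"     # hypothetical description text
  "GPU not found"             # hypothetical description text
  "Unable to access Github"
)
NODE_PATTERN_INDICES=(0 2)

# Hypothetical helper: verify that the two tables have the same length and
# that every node-lookup index points at an existing entry.
check_tables() {
  local status=0 idx
  if (( ${#PATTERNS[@]} != ${#DESCRIPTIONS[@]} )); then
    echo "PATTERNS has ${#PATTERNS[@]} entries, DESCRIPTIONS has ${#DESCRIPTIONS[@]}" >&2
    status=1
  fi
  for idx in "${NODE_PATTERN_INDICES[@]}"; do
    if (( idx < 0 || idx >= ${#PATTERNS[@]} )); then
      echo "NODE_PATTERN_INDICES entry $idx is out of range" >&2
      status=1
    else
      # Show which failure each node-lookup index refers to.
      echo "$idx: ${DESCRIPTIONS[idx]:-<no description>}"
    fi
  done
  return "$status"
}

check_tables
```

Printing the description next to each index also recovers the information that the removed trailing comment on NODE_PATTERN_INDICES used to carry.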
@@ -92,7 +94,7 @@ process_block() {
 if [[ "$node_idx" == "$i" ]]; then
 node_name=$(wget -q --no-check-certificate -O - "${BUILD_URL}consoleText" | awk '
 /NODE_NAME[[:space:]]*=/ { node = $NF }
-/'"$pattern"'/ { print node; exit }
+index($0, "'"$pattern"'") { print node; exit }
 ')
 break
 fi
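
To illustrate the difference between the old regex match and the new substring match, here is a small sketch run against a made-up log. The sample log, the node names, and the `find_node_regex` / `find_node_literal` helpers are assumptions for illustration only. The sketch also passes the search text through `ENVIRON` instead of splicing `$pattern` into the awk program as the commit does; that is a substitution on my part, chosen because a value spliced into a double-quoted awk string is still subject to backslash-escape processing and would break on an embedded double quote. None of the patterns currently used for node lookup contain either character.

```shell
#!/usr/bin/env bash
# Hypothetical sample log; node names and messages are invented.
sample_log='NODE_NAME = node-a
cat: /tmp/other.txt: No such file or directory
NODE_NAME = node-b
cat: /sys/module/amdgpu/version: No such file or directory'

# Old behaviour: treat $1 as a regular expression. Prints the most recent
# NODE_NAME value seen before the first matching line.
# ([ \t]* stands in here for the [[:space:]]* used in the script.)
find_node_regex() {
  printf '%s\n' "$sample_log" | NEEDLE="$1" awk '
    /NODE_NAME[ \t]*=/ { node = $NF }
    $0 ~ ENVIRON["NEEDLE"] { print node; exit }
  '
}

# New behaviour: treat $1 as literal text. index() returns 0 (false) when
# the text is absent, so regex metacharacters have no special meaning.
find_node_literal() {
  printf '%s\n' "$sample_log" | NEEDLE="$1" awk '
    /NODE_NAME[ \t]*=/ { node = $NF }
    index($0, ENVIRON["NEEDLE"]) { print node; exit }
  '
}

# The broad regex fires on an unrelated missing file first.
find_node_regex 'cat: .* No such file or directory'                        # prints node-a
# The exact substring only fires on the missing amdgpu driver line.
find_node_literal '/sys/module/amdgpu/version: No such file or directory'  # prints node-b
```

With the literal search, the `.*` in the old pattern would be matched character for character and find nothing, which is why the pattern itself had to change in the same commit.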
