Skip to content

ci(feat): refactor error handling#25610

Open
joshmarinacci wants to merge 20 commits into
mainfrom
failure-refactor
Open

ci(feat): refactor error handling#25610
joshmarinacci wants to merge 20 commits into
mainfrom
failure-refactor

Conversation

@joshmarinacci
Copy link
Copy Markdown
Contributor

@joshmarinacci joshmarinacci commented May 22, 2026

Description:

Create a common set-failure-mode workflow that can be used to set the failure mode variable of a single task, or aggregate a bunch of child tasks.

Related issue(s):

Partially addresses #25509 and #25513 and #25470

Notes for reviewer:

Checklist

  • Documented (Code comments, README, etc.)
  • Tested (unit, integration, etc.)

Signed-off-by: Josh Marinacci <joshua@marinacci.org>
Signed-off-by: Josh Marinacci <joshua@marinacci.org>
@joshmarinacci joshmarinacci requested a review from a team as a code owner May 22, 2026 21:45
@joshmarinacci joshmarinacci requested a review from nathanklick May 22, 2026 21:45
@joshmarinacci joshmarinacci self-assigned this May 22, 2026
@trunk-io
Copy link
Copy Markdown

trunk-io Bot commented May 22, 2026

Merging to main in this repository is managed by Trunk.

  • To merge this pull request, check the box to the left or comment /trunk merge below.

After your PR is submitted to the merge queue, this comment will be automatically updated with its status. If the PR fails, failure details will also be posted here

@lfdt-bot
Copy link
Copy Markdown

lfdt-bot commented May 22, 2026

Snyk checks have passed. No issues have been found so far.

Status Scan Engine Critical High Medium Low Total (0)
Open Source Security 0 0 0 0 0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

Signed-off-by: Josh Marinacci <joshua@marinacci.org>
Signed-off-by: Josh Marinacci <joshua@marinacci.org>
Signed-off-by: Josh Marinacci <joshua@marinacci.org>
Signed-off-by: Josh Marinacci <joshua@marinacci.org>
Signed-off-by: Josh Marinacci <joshua@marinacci.org>
Signed-off-by: Josh Marinacci <joshua@marinacci.org>
Signed-off-by: Josh Marinacci <joshua@marinacci.org>
Signed-off-by: Josh Marinacci <joshua@marinacci.org>
Signed-off-by: Josh Marinacci <joshua@marinacci.org>
Signed-off-by: Josh Marinacci <joshua@marinacci.org>
Signed-off-by: Josh Marinacci <joshua@marinacci.org>
@codecov

This comment has been minimized.

Signed-off-by: Josh Marinacci <joshua@marinacci.org>
@andrewb1269 andrewb1269 requested review from andrewb1269 and Copilot and removed request for nathanklick May 28, 2026 14:46
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR centralizes CI failure-mode calculation into a shared workflow support script and refactors several zxc-* workflows to use it, adding failed-tests aggregation where applicable.

Changes:

  • Adds .github/workflows/support/scripts/set-failure-mode.sh for step-outcome and job-aggregate failure classification.
  • Refactors multiple CI workflows to pass step/job status arrays into the shared script.
  • Updates MATS downstream categorization to treat the new cancelled failure mode as environment-related.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
.github/workflows/support/scripts/set-failure-mode.sh Adds shared failure-mode and failed-tests output logic.
.github/workflows/zxc-compile-and-spotless-check.yaml Replaces inline failure-mode logic with script inputs.
.github/workflows/zxc-dependency-module-check.yaml Replaces inline failure-mode logic with script inputs.
.github/workflows/zxc-execute-unit-tests.yaml Refactors unit-test failure-mode output to use shared script.
.github/workflows/zxc-execute-hapi-tests.yaml Refactors HAPI aggregate failure-mode logic and adds failed-tests job output.
.github/workflows/zxc-execute-otter-tests.yaml Refactors Otter aggregate failure-mode logic and adds failed-tests job output.
.github/workflows/zxc-mats-tests.yaml Refactors MATS aggregate failure-mode/failed-tests logic.
.github/workflows/zxc-verify-gradle-build-determinism.yaml Refactors Gradle determinism aggregate failure-mode logic.
.github/workflows/zxc-verify-docker-build-determinism.yaml Refactors Docker determinism aggregate failure-mode logic.
.github/workflows/zxc-xts-tests.yaml Refactors XTS aggregate failure-mode/failed-tests logic, including panel jobs.
.github/workflows/node-flow-build-application.yaml Maps MATS cancelled failure mode to environment category.

Comment thread .github/workflows/zxc-xts-tests.yaml
Comment thread .github/workflows/zxc-execute-unit-tests.yaml Outdated
Comment thread .github/workflows/support/scripts/set-failure-mode.sh
Comment thread .github/workflows/zxc-verify-docker-build-determinism.yaml Outdated
Signed-off-by: Josh Marinacci <joshua@marinacci.org>
Signed-off-by: Josh Marinacci <joshua@marinacci.org>
Signed-off-by: Josh Marinacci <joshua@marinacci.org>
@andrewb1269
Copy link
Copy Markdown
Contributor

We are trying to provide a reason for why XTS workflows are cancelled to our Slack channels. I suggest we provide it as a separate variable, rather than trying to include it into the failure_mode variable. If we provide a separate variable, we can populate that variable only when there is a case to check.

Comment on lines +39 to +44
if has_failure "${test_arr[@]}"; then
failure_mode="test"
fi
if has_failure "${workflow_arr[@]}"; then
failure_mode="workflow"
fi
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is a test failure and a workflow failure, then the test failure is overwritten by the workflow failure. Suggest changing to this to elif so if we have a test failure we do not worry about the workflow failure(s).

Comment thread .github/workflows/support/scripts/set-failure-mode.sh Outdated
Comment on lines +63 to +64
workflow) failure_mode="workflow" ;;
test) [[ "$failure_mode" != "workflow" ]] && failure_mode="test" ;;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will overwrite the failure_mode to workflow and hide potential test failure types. We want to prioritize test failures over workflow failures.

Comment thread .github/workflows/support/scripts/set-failure-mode.sh
esac
failed_tests="${failed_tests:+${failed_tests}, }${name}"
elif [[ "$status" == "cancelled" ]]; then
[[ "$failure_mode" == "none" ]] && failure_mode="cancelled"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

failure_mode can be workflow, test, or none. It should stay as none since there was not a failure, it was only a cancelled workflow.

failed_tests="${failed_tests:+${failed_tests}, }${name}"
elif [[ "$status" == "cancelled" ]]; then
[[ "$failure_mode" == "none" ]] && failure_mode="cancelled"
failed_tests="${failed_tests:+${failed_tests}, }${name} (cancelled)"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't know if it was a failed test or not, so we should not add it to the failed_tests list.

Comment thread .github/workflows/support/scripts/set-failure-mode.sh
Comment thread .github/workflows/zxc-xts-tests.yaml
Signed-off-by: Josh Marinacci <joshua@marinacci.org>
Signed-off-by: Josh Marinacci <joshua@marinacci.org>
Signed-off-by: Josh Marinacci <joshua@marinacci.org>
@andrewb1269
Copy link
Copy Markdown
Contributor

The remaining comments are regarding:

  1. Cancelled tests being processed vs. not being processed. I'd propose we do NOT use cancelled tests in this PR and can discuss for a follow-on PR.
  2. The priority of test failures vs. workflow failures. I think the order should be: test > workflow > none, then we will add cancelled in a future PR giving us test > workflow > cancelled > none

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants