| 1 | +# Example: Automated Test Failure Analysis |
| 2 | + |
| 3 | +This document walks through what the automated failure analysis produces when the weekly tests fail. |
| 4 | + |
| 5 | +## Scenario |
| 6 | + |
| 7 | +The CI pipeline runs every Monday at 05:01 UTC (cron schedule `1 5 * * 1`). In this scenario, one or more tests fail. |
| 8 | + |
| 9 | +## Automated Response |
| 10 | + |
| 11 | +### 1. Failure Detection |
| 12 | + |
| 13 | +The `auto_fix_failures.yml` workflow runs in response to a `workflow_run` event when all of the following hold: |
| 14 | +- The "CI pipeline for pySDC" workflow completes |
| 15 | +- The conclusion is "failure" |
| 16 | +- The trigger event was "schedule" (Monday morning run) |
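The three conditions above can be sketched as a small predicate over the `workflow_run` webhook payload. This is only an illustration: the function name `should_analyze` is hypothetical, and the real check may well live in an `if:` expression inside the workflow file rather than in Python. The fields `name`, `conclusion`, and `event` are the ones GitHub places under the `workflow_run` key of the payload.

```python
# Hypothetical helper, not part of the described scripts.
CI_WORKFLOW_NAME = "CI pipeline for pySDC"


def should_analyze(event: dict) -> bool:
    """Return True only if all three detection conditions hold."""
    run = event.get("workflow_run", {})
    return (
        run.get("name") == CI_WORKFLOW_NAME      # the CI workflow completed
        and run.get("conclusion") == "failure"   # it ended in failure
        and run.get("event") == "schedule"       # it was the scheduled run
    )
```

A failed run started by a push or a pull request is ignored under this check, so only the Monday schedule produces a report.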
| 17 | + |
| 18 | +### 2. Analysis Process |
| 19 | + |
| 20 | +The workflow: |
| 21 | +1. Checks out the repository |
| 22 | +2. Installs Python and required dependencies (requests, PyGithub) |
| 23 | +3. Runs `analyze_failures.py` which: |
| 24 | + - Fetches all jobs from the failed workflow run via GitHub API |
| 25 | + - Downloads logs for each failed job |
| 26 | + - Extracts error messages, tracebacks, and failure patterns |
| 27 | + - Generates a detailed markdown report |
| 28 | + - Saves both markdown and JSON versions |
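The log-processing part of these steps might look roughly like the sketch below. It is not the actual `analyze_failures.py`: the names `select_failed_jobs`, `extract_errors`, and `ERROR_PATTERNS` are hypothetical, the patterns are illustrative rather than exhaustive, and the code assumes each log line may start with an ISO timestamp. The job dictionaries are assumed to carry the `conclusion` field returned by the GitHub REST API for workflow jobs.

```python
import re

# Assumption: log lines may be prefixed with a timestamp such as
# "2024-01-01T05:09:12.0000000Z "; strip it before matching.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:\.\d+)?Z\s+")

# Illustrative patterns: pytest summary lines, pytest "E" detail lines,
# and the final line of a Python traceback.
ERROR_PATTERNS = [
    re.compile(r"^FAILED .+"),
    re.compile(r"^E\s+.+"),
    re.compile(r"^\w*(Error|Exception): .+"),
]


def select_failed_jobs(jobs: list) -> list:
    """Keep only the jobs whose conclusion is 'failure'."""
    return [job for job in jobs if job.get("conclusion") == "failure"]


def extract_errors(log_text: str, max_errors: int = 10) -> list:
    """Collect up to max_errors log lines that look like error messages."""
    errors = []
    for raw in log_text.splitlines():
        line = TIMESTAMP.sub("", raw).strip()
        if any(pattern.match(line) for pattern in ERROR_PATTERNS):
            errors.append(line)
            if len(errors) >= max_errors:
                break
    return errors
```

Capping the number of collected lines keeps the generated report readable when a single job emits hundreds of failures; the full logs stay available through the job link.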
| 29 | + |
| 30 | +### 3. Branch and PR Creation |
| 31 | + |
| 32 | +The workflow: |
| 33 | +1. Creates a new branch named `auto-fix/test-failure-YYYYMMDD-HHMMSS` |
| 34 | +2. Commits the failure analysis files |
| 35 | +3. Runs `create_failure_pr.py` which: |
| 36 | + - Creates a pull request from the new branch to master |
| 37 | + - Includes a comprehensive description with links and instructions |
| 38 | + - Adds labels: `automated`, `test-failure`, `needs-investigation` |
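The branch naming scheme from step 1 can be illustrated with a short sketch. The helper name `make_branch_name` is hypothetical, and the use of UTC is an assumption; the workflow may equally build the name with a shell `date` call.

```python
from datetime import datetime, timezone
from typing import Optional


def make_branch_name(now: Optional[datetime] = None) -> str:
    """Build a branch name of the form auto-fix/test-failure-YYYYMMDD-HHMMSS."""
    # Assumption: timestamps are taken in UTC, matching the cron schedule.
    now = now or datetime.now(timezone.utc)
    return f"auto-fix/test-failure-{now:%Y%m%d-%H%M%S}"
```

Including the time down to the second keeps branch names unique even if the analysis is re-run on the same day.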
| 39 | + |
| 40 | +## Example Output |
| 41 | + |
| 42 | +### Example PR Title |
| 43 | +``` |
| 44 | +🔴 Auto-fix: Weekly test failures (12345678) |
| 45 | +``` |
| 46 | + |
| 47 | +### Example PR Description |
| 48 | +```markdown |
| 49 | +## 🔴 Automated Test Failure Report |
| 50 | + |
| 51 | +This PR was automatically created in response to test failures in the weekly CI run. |
| 52 | + |
| 53 | +### Summary |
| 54 | +- **Workflow Run:** https://github.com/Parallel-in-Time/pySDC/actions/runs/12345678 |
| 55 | +- **Failed Jobs:** 3 out of 25 |
| 56 | +- **Trigger:** Weekly scheduled run (Monday morning) |
| 57 | + |
| 58 | +### What This PR Contains |
| 59 | + |
| 60 | +This PR includes an automated analysis of the test failures. The detailed report can be found in the committed `failure_analysis.md` file. |
| 61 | + |
| 62 | +### Next Steps |
| 63 | + |
| 64 | +1. **Review the Analysis:** Check the `failure_analysis.md` file for detailed error information |
| 65 | +2. **Investigate Root Cause:** Review the workflow logs and error messages |
| 66 | +3. **Apply Fixes:** If you identify the issue, commit fixes to this branch |
| 67 | +4. **Test Locally:** Reproduce and verify the fix before merging |
| 68 | +5. **Update CI:** Ensure the fix resolves the weekly test failures |
| 69 | + |
| 70 | +### How to Fix Issues |
| 71 | + |
| 72 | +You can push commits directly to this branch: |
| 73 | + |
| 74 | +```bash |
| 75 | +git fetch origin |
| 76 | +git checkout auto-fix/test-failure-20240101-050500 |
| 77 | +# Make your changes |
| 78 | +git add . |
| 79 | +git commit -m "Fix: describe your fix" |
| 80 | +git push origin auto-fix/test-failure-20240101-050500 |
| 81 | +``` |
| 82 | + |
| 83 | +### Alternative Actions |
| 84 | + |
| 85 | +- If this is a **transient failure**, you can close this PR |
| 86 | +- If this requires **more investigation**, convert this PR to an issue |
| 87 | +- If this is a **known issue**, link it to existing issues/PRs |
| 88 | + |
| 89 | +--- |
| 90 | + |
| 91 | +**Note:** This is an automated PR. Please review carefully before merging. |
| 92 | +``` |
| 93 | + |
| 94 | +### Example `failure_analysis.md` Content |
| 95 | + |
| 96 | +```markdown |
| 97 | +# Automated Test Failure Analysis |
| 98 | + |
| 99 | +**Generated:** 2024-01-01T05:15:00Z |
| 100 | +**Workflow Run:** https://github.com/Parallel-in-Time/pySDC/actions/runs/12345678 |
| 101 | + |
| 102 | +## Summary |
| 103 | + |
| 104 | +- Total Jobs: 25 |
| 105 | +- Failed Jobs: 3 |
| 106 | + |
| 107 | +## Failed Jobs |
| 108 | + |
| 109 | +### 1. user_cpu_tests_linux (base, 3.10) |
| 110 | + |
| 111 | +- **Job ID:** 23456789 |
| 112 | +- **Started:** 2024-01-01T05:02:00Z |
| 113 | +- **Completed:** 2024-01-01T05:10:00Z |
| 114 | +- **Logs:** [View Job Logs](https://github.com/Parallel-in-Time/pySDC/actions/runs/12345678/jobs/23456789) |
| 115 | + |
| 116 | +#### Error Details |
| 117 | + |
| 118 | +**Error 1:** |
| 119 | +``` |
| 120 | +FAILED pySDC/tests/test_something.py::test_feature - AssertionError: assert 5 == 6 |
| 121 | +E assert 5 == 6 |
| 122 | +``` |
| 123 | + |
| 124 | +**Error 2:** |
| 125 | +``` |
| 126 | +Traceback (most recent call last): |
| 127 | + File "pySDC/core/something.py", line 142, in method |
| 128 | + result = self.compute() |
| 129 | + File "pySDC/core/something.py", line 200, in compute |
| 130 | + value = dependency.get_value() |
| 131 | +AttributeError: 'NoneType' object has no attribute 'get_value' |
| 132 | +``` |
| 133 | + |
| 134 | +### 2. user_cpu_tests_linux (pytorch, 3.13) |
| 135 | + |
| 136 | +- **Job ID:** 23456790 |
| 137 | +- **Started:** 2024-01-01T05:02:00Z |
| 138 | +- **Completed:** 2024-01-01T05:12:00Z |
| 139 | +- **Logs:** [View Job Logs](https://github.com/Parallel-in-Time/pySDC/actions/runs/12345678/jobs/23456790) |
| 140 | + |
| 141 | +#### Error Details |
| 142 | + |
| 143 | +**Error 1:** |
| 144 | +``` |
| 145 | +ModuleNotFoundError: No module named 'torch' |
| 146 | +ERROR: Could not import pytorch dependencies |
| 147 | +``` |
| 148 | + |
| 149 | +### 3. project_cpu_tests_linux (RDC) |
| 150 | + |
| 151 | +- **Job ID:** 23456791 |
| 152 | +- **Started:** 2024-01-01T05:05:00Z |
| 153 | +- **Completed:** 2024-01-01T05:14:00Z |
| 154 | +- **Logs:** [View Job Logs](https://github.com/Parallel-in-Time/pySDC/actions/runs/12345678/jobs/23456791) |
| 155 | + |
| 156 | +#### Error Details |
| 157 | + |
| 158 | +**Error 1:** |
| 159 | +``` |
| 160 | +ImportError: cannot import name 'RDC_Controller' from 'pySDC.implementations.controllers' |
| 161 | +``` |
| 162 | + |
| 163 | +## Recommended Actions |
| 164 | + |
| 165 | +1. Review the error messages above |
| 166 | +2. Check if this is a known issue in recent commits |
| 167 | +3. Review the full logs linked above for complete context |
| 168 | +4. Consider if this is related to: |
| 169 | + - Dependency updates (check recent dependency changes) |
| 170 | + - Environment configuration issues |
| 171 | + - Test infrastructure problems |
| 172 | + - Flaky tests that need to be fixed |
| 173 | +5. If needed, manually investigate and apply fixes to this PR |
| 174 | + |
| 175 | +## How to Use This PR |
| 176 | + |
| 177 | +This PR was automatically created to help investigate test failures. You can: |
| 178 | + |
| 179 | +- Use this PR to track the investigation |
| 180 | +- Add commits with fixes directly to this branch |
| 181 | +- Close this PR if the issue is resolved elsewhere |
| 182 | +- Convert this to an issue if it needs more discussion |
| 183 | +``` |
| 184 | + |
| 185 | +## Benefits |
| 186 | + |
| 187 | +1. **Immediate Notification**: The team is notified through a PR rather than by email alone |
| 188 | +2. **Centralized Tracking**: All failure information in one place |
| 189 | +3. **Actionable**: PR branch can be used to apply fixes directly |
| 190 | +4. **Historical Record**: PRs remain in history for future reference |
| 191 | +5. **Reduced Manual Work**: No need to manually dig through CI logs |
| 192 | +6. **Easy Collaboration**: Team members can comment and contribute |
| 193 | + |
| 194 | +## Workflow Permissions |
| 195 | + |
| 196 | +The workflow uses these permissions: |
| 197 | +- `contents: write` - To create branches and commit files |
| 198 | +- `pull-requests: write` - To create and label PRs |
| 199 | +- `issues: write` - To add labels (GitHub manages PR labels through the Issues API) |
| 200 | +- `actions: read` - To read workflow run and job information |
| 201 | + |
| 202 | +All of these are granted to the default `GITHUB_TOKEN`; no additional secrets are needed. |