Commit 896ddc1

2 parents 20957fd + 558d47b commit 896ddc1

215 files changed

Lines changed: 16891 additions & 543 deletions

.github/scripts/EXAMPLE.md

Lines changed: 202 additions & 0 deletions
@@ -0,0 +1,202 @@
# Example: Automated Test Failure Analysis

This document shows what the automated failure analysis looks like when the weekly tests fail.

## Scenario

The CI pipeline runs every Monday at 05:01 UTC (cron schedule `'1 5 * * 1'`). One or more tests fail.

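As a reading aid for the cron expression, the following minimal sketch checks whether a timestamp falls on the scheduled minute. The function name `matches_weekly_schedule` is hypothetical and is not part of the repository's scripts. Scheduled runs on GitHub Actions can start later than the nominal time, so this only illustrates how the five cron fields are read.

```python
from datetime import datetime, timezone


def matches_weekly_schedule(moment: datetime) -> bool:
    """Return True if `moment` falls on the minute selected by '1 5 * * 1'.

    Cron fields are: minute, hour, day of month, month, day of week.
    Cron counts Monday as 1, while datetime.weekday() counts Monday as 0.
    """
    utc = moment.astimezone(timezone.utc)
    return utc.minute == 1 and utc.hour == 5 and utc.weekday() == 0


monday = datetime(2024, 1, 1, 5, 1, tzinfo=timezone.utc)   # 2024-01-01 is a Monday
tuesday = datetime(2024, 1, 2, 5, 1, tzinfo=timezone.utc)
print(matches_weekly_schedule(monday), matches_weekly_schedule(tuesday))
```
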
## Automated Response

### 1. Failure Detection

The `auto_fix_failures.yml` workflow is triggered by a `workflow_run` event when all of the following hold:

- The "CI pipeline for pySDC" workflow completes
- The conclusion is "failure"
- The trigger event was "schedule" (the Monday morning run)

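The three conditions can be sketched as a single predicate over the `workflow_run` object of the event payload. This is only an illustration: the function name `should_analyze` is hypothetical, the field names (`name`, `status`, `conclusion`, `event`) are assumed to mirror the `workflow_run` payload, and the actual workflow may well express the same check as an `if:` condition in YAML instead.

```python
def should_analyze(workflow_run: dict) -> bool:
    """Return True when a workflow_run payload matches all trigger conditions."""
    return (
        workflow_run.get("name") == "CI pipeline for pySDC"
        and workflow_run.get("status") == "completed"
        and workflow_run.get("conclusion") == "failure"
        and workflow_run.get("event") == "schedule"
    )


# A failed scheduled run qualifies; a failed run triggered by a push does not.
scheduled_failure = {
    "name": "CI pipeline for pySDC",
    "status": "completed",
    "conclusion": "failure",
    "event": "schedule",
}
push_failure = dict(scheduled_failure, event="push")
print(should_analyze(scheduled_failure), should_analyze(push_failure))
```
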
### 2. Analysis Process

The workflow:

1. Checks out the repository
2. Installs Python and the required dependencies (`requests`, `PyGithub`)
3. Runs `analyze_failures.py`, which:
   - Fetches all jobs from the failed workflow run via the GitHub API
   - Downloads the logs for each failed job
   - Extracts error messages, tracebacks, and failure patterns
   - Generates a detailed Markdown report
   - Saves both Markdown and JSON versions of the report

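The extraction step might look roughly like the sketch below. The patterns and the name `extract_errors` are assumptions made for illustration; the real `analyze_failures.py` may use different rules. Raw GitHub Actions logs typically prefix each line with a timestamp, so the sketch assumes that prefix has already been stripped.

```python
import re

# Hypothetical patterns, not taken from analyze_failures.py.
ERROR_PATTERNS = [
    # pytest short summary lines such as "FAILED path::test - Reason"
    re.compile(r"^FAILED .+$", re.MULTILINE),
    # Final traceback lines such as "AttributeError: message"
    re.compile(r"^\w+(?:\.\w+)*(?:Error|Exception): .+$", re.MULTILINE),
]


def extract_errors(log_text: str) -> list[str]:
    """Collect lines that look like test failures or exceptions, in log order."""
    matches = []
    for pattern in ERROR_PATTERNS:
        matches.extend((m.start(), m.group(0)) for m in pattern.finditer(log_text))

    # Sort by position in the log and drop duplicates while keeping order.
    seen, errors = set(), []
    for _, line in sorted(matches):
        if line not in seen:
            seen.add(line)
            errors.append(line)
    return errors


sample_log = """\
collected 3 items
FAILED pySDC/tests/test_something.py::test_feature - AssertionError: assert 5 == 6
ModuleNotFoundError: No module named 'torch'
3 passed
"""
for error in extract_errors(sample_log):
    print(error)
```

A regex pass like this is cheap and easy to extend, but it only finds lines that match a known shape; anything unusual still needs the full log.
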
### 3. Branch and PR Creation

The workflow:

1. Creates a new branch named `auto-fix/test-failure-YYYYMMDD-HHMMSS`
2. Commits the failure analysis files
3. Runs `create_failure_pr.py`, which:
   - Creates a pull request from the new branch to `master`
   - Includes a comprehensive description with links and instructions
   - Adds the labels `automated`, `test-failure`, and `needs-investigation`

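A minimal sketch of these steps, assuming the timestamp in the branch name is taken in UTC and that the PR is opened through PyGithub. The helper names (`branch_name`, `pr_title`, `open_failure_pr`) are hypothetical, and `repo` is assumed to be a PyGithub `Repository` object; the real `create_failure_pr.py` may be structured differently.

```python
from datetime import datetime, timezone

LABELS = ["automated", "test-failure", "needs-investigation"]


def branch_name(now: datetime) -> str:
    """Build a branch name of the form auto-fix/test-failure-YYYYMMDD-HHMMSS."""
    return now.strftime("auto-fix/test-failure-%Y%m%d-%H%M%S")


def pr_title(run_id: int) -> str:
    """Build the PR title, with the workflow run ID in parentheses."""
    return f"🔴 Auto-fix: Weekly test failures ({run_id})"


def open_failure_pr(repo, branch: str, run_id: int, body: str):
    """Open the PR against master and label it.

    `repo` is assumed to be a PyGithub Repository; create_pull and
    add_to_labels are the PyGithub calls for these two steps.
    """
    pr = repo.create_pull(title=pr_title(run_id), body=body, head=branch, base="master")
    pr.add_to_labels(*LABELS)
    return pr


print(branch_name(datetime(2024, 1, 1, 5, 5, 0, tzinfo=timezone.utc)))
```

Keeping the name and title builders separate from the API call makes them easy to check without network access.
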
## Example Output

### Example PR Title
```
🔴 Auto-fix: Weekly test failures (12345678)
```

### Example PR Description
````markdown
## 🔴 Automated Test Failure Report

This PR was automatically created in response to test failures in the weekly CI run.

### Summary
- **Workflow Run:** https://github.com/Parallel-in-Time/pySDC/actions/runs/12345678
- **Failed Jobs:** 3 out of 25
- **Trigger:** Weekly scheduled run (Monday morning)

### What This PR Contains

This PR includes an automated analysis of the test failures. The detailed report can be found in the committed `failure_analysis.md` file.

### Next Steps

1. **Review the Analysis:** Check the `failure_analysis.md` file for detailed error information
2. **Investigate Root Cause:** Review the workflow logs and error messages
3. **Apply Fixes:** If you identify the issue, commit fixes to this branch
4. **Test Locally:** Reproduce and verify the fix before merging
5. **Update CI:** Ensure the fix resolves the weekly test failures

### How to Fix Issues

You can push commits directly to this branch:

```bash
git fetch origin
git checkout auto-fix/test-failure-20240101-050500
# Make your changes
git add .
git commit -m "Fix: describe your fix"
git push origin auto-fix/test-failure-20240101-050500
```

### Alternative Actions

- If this is a **transient failure**, you can close this PR
- If this requires **more investigation**, convert this PR to an issue
- If this is a **known issue**, link it to existing issues/PRs

---

**Note:** This is an automated PR. Please review carefully before merging.
````

### Example `failure_analysis.md` Content

````markdown
# Automated Test Failure Analysis

**Generated:** 2024-01-01T05:15:00Z
**Workflow Run:** https://github.com/Parallel-in-Time/pySDC/actions/runs/12345678

## Summary

- Total Jobs: 25
- Failed Jobs: 3

## Failed Jobs

### 1. user_cpu_tests_linux (base, 3.10)

- **Job ID:** 23456789
- **Started:** 2024-01-01T05:02:00Z
- **Completed:** 2024-01-01T05:10:00Z
- **Logs:** [View Job Logs](https://github.com/Parallel-in-Time/pySDC/actions/runs/12345678/jobs/23456789)

#### Error Details

**Error 1:**
```
FAILED pySDC/tests/test_something.py::test_feature - AssertionError: assert 5 == 6
E assert 5 == 6
```

**Error 2:**
```
Traceback (most recent call last):
  File "pySDC/core/something.py", line 142, in method
    result = self.compute()
  File "pySDC/core/something.py", line 200, in compute
    value = dependency.get_value()
AttributeError: 'NoneType' object has no attribute 'get_value'
```

### 2. user_cpu_tests_linux (pytorch, 3.13)

- **Job ID:** 23456790
- **Started:** 2024-01-01T05:02:00Z
- **Completed:** 2024-01-01T05:12:00Z
- **Logs:** [View Job Logs](https://github.com/Parallel-in-Time/pySDC/actions/runs/12345678/jobs/23456790)

#### Error Details

**Error 1:**
```
ModuleNotFoundError: No module named 'torch'
ERROR: Could not import pytorch dependencies
```

### 3. project_cpu_tests_linux (RDC)

- **Job ID:** 23456791
- **Started:** 2024-01-01T05:05:00Z
- **Completed:** 2024-01-01T05:14:00Z
- **Logs:** [View Job Logs](https://github.com/Parallel-in-Time/pySDC/actions/runs/12345678/jobs/23456791)

#### Error Details

**Error 1:**
```
ImportError: cannot import name 'RDC_Controller' from 'pySDC.implementations.controllers'
```

## Recommended Actions

1. Review the error messages above
2. Check if this is a known issue in recent commits
3. Review the full logs linked above for complete context
4. Consider if this is related to:
   - Dependency updates (check recent dependency changes)
   - Environment configuration issues
   - Test infrastructure problems
   - Flaky tests that need to be fixed
5. If needed, manually investigate and apply fixes to this PR

## How to Use This PR

This PR was automatically created to help investigate test failures. You can:

- Use this PR to track the investigation
- Add commits with fixes directly to this branch
- Close this PR if the issue is resolved elsewhere
- Convert this to an issue if it needs more discussion
````

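The summary and per-job sections of such a report could be produced along these lines. This is a sketch under assumptions: the function name `render_summary` is hypothetical, and each job is assumed to be a dictionary with the `name`, `id`, `conclusion`, `started_at`, `completed_at`, and `html_url` fields that the GitHub API returns for workflow jobs. The sample values are illustrative only.

```python
def render_summary(jobs: list[dict]) -> str:
    """Render the summary and one section per failed job as Markdown."""
    failed = [job for job in jobs if job.get("conclusion") == "failure"]
    lines = [
        "## Summary",
        "",
        f"- Total Jobs: {len(jobs)}",
        f"- Failed Jobs: {len(failed)}",
        "",
        "## Failed Jobs",
        "",
    ]
    for index, job in enumerate(failed, start=1):
        lines += [
            f"### {index}. {job['name']}",
            "",
            f"- **Job ID:** {job['id']}",
            f"- **Started:** {job['started_at']}",
            f"- **Completed:** {job['completed_at']}",
            f"- **Logs:** [View Job Logs]({job['html_url']})",
            "",
        ]
    return "\n".join(lines)


# Illustrative job data, shaped like entries from the jobs listing of a run.
sample_jobs = [
    {"name": "lint", "id": 1, "conclusion": "success",
     "started_at": "2024-01-01T05:01:30Z", "completed_at": "2024-01-01T05:02:00Z",
     "html_url": "https://example.invalid/jobs/1"},
    {"name": "project_cpu_tests_linux (RDC)", "id": 23456791, "conclusion": "failure",
     "started_at": "2024-01-01T05:05:00Z", "completed_at": "2024-01-01T05:14:00Z",
     "html_url": "https://example.invalid/jobs/23456791"},
]
print(render_summary(sample_jobs))
```
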
## Benefits

1. **Immediate Notification**: The team is notified via a PR instead of just an email
2. **Centralized Tracking**: All failure information is in one place
3. **Actionable**: The PR branch can be used to apply fixes directly
4. **Historical Record**: PRs remain in the history for future reference
5. **Reduced Manual Work**: No need to dig through CI logs manually
6. **Easy Collaboration**: Team members can comment and contribute

## Workflow Permissions

The workflow uses these permissions:

- `contents: write` - To create branches and commit files
- `pull-requests: write` - To create and label PRs
- `issues: write` - To add labels
- `actions: read` - To read workflow run and job information

All of these are granted to the default `GITHUB_TOKEN`; no additional secrets are needed.
