This directory contains scripts and workflows for automatically handling test failures in the pySDC CI pipeline.
When the weekly CI tests (scheduled for Monday mornings) fail, the automated workflow will:
- Detect the failure
- Analyze the failed jobs and extract error information
- Create a Pull Request with:
  - A detailed failure analysis report
  - Links to the failed workflow run and job logs
  - Recommended actions for investigation and fixes
  - Instructions on how to apply fixes
- `auto_fix_failures.yml`: Main workflow that triggers on CI pipeline failures
  - Only activates for scheduled (Monday morning) runs that fail
  - Analyzes failures and creates a PR automatically
- `analyze_failures.py`: Analyzes workflow run failures
  - Fetches job information from the GitHub API
  - Extracts error messages and tracebacks from logs
  - Generates a detailed markdown report
  - Saves the analysis as JSON for further processing
- `create_failure_pr.py`: Creates a Pull Request for failures
  - Uses the analysis from `analyze_failures.py`
  - Creates a formatted PR with all relevant information
  - Adds appropriate labels for easy identification
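`analyze_failures.py` starts by fetching job information from the GitHub API. The following is a minimal sketch, not the script's actual code: it uses only the standard library (the real script may use `requests`), calls the public REST endpoint `GET /repos/{owner}/{repo}/actions/runs/{run_id}/jobs`, and the function names `fetch_jobs` and `failed_jobs` are hypothetical.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"


def fetch_jobs(owner, repo, run_id, token):
    """Return the jobs of one workflow run (first page, up to 100 jobs)."""
    url = f"{API_ROOT}/repos/{owner}/{repo}/actions/runs/{run_id}/jobs?per_page=100"
    request = urllib.request.Request(
        url,
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )
    with urllib.request.urlopen(request) as response:
        # The response body is a JSON object with a "jobs" array.
        return json.load(response)["jobs"]


def failed_jobs(jobs):
    """Keep only failed jobs, reduced to the fields a report needs."""
    return [
        {"id": job["id"], "name": job["name"], "url": job["html_url"]}
        for job in jobs
        if job.get("conclusion") == "failure"
    ]
```

Keeping the filtering in a separate pure function makes it easy to test without network access; the job `id` is what the per-job log endpoint needs when extracting tracebacks.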
- Detection: The `workflow_run` trigger monitors the "CI pipeline for pySDC" workflow
- Filtering: Only runs that failed AND were triggered by schedule (Monday cron job) activate the auto-fix workflow
- Analysis: The workflow checks out the code and runs the analysis script
- Reporting: A new branch is created with the failure analysis
- PR Creation: An automated PR is opened with the analysis and instructions
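The detection and filtering steps amount to a predicate over the `workflow_run` event payload. This Python sketch only illustrates the logic; `should_handle` is a hypothetical name. In the workflow itself, the name check is handled by the trigger's workflow filter, and the other two checks are handled by the `if:` condition.

```python
def should_handle(event):
    """Decide whether a workflow_run event should start the auto-fix workflow.

    All three documented conditions must hold: the monitored workflow is
    "CI pipeline for pySDC", it failed, and it was started by the schedule
    (not by a push or pull request).
    """
    run = event.get("workflow_run", {})
    return (
        run.get("name") == "CI pipeline for pySDC"
        and run.get("conclusion") == "failure"
        and run.get("event") == "schedule"
    )
```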
When you receive an automated failure PR:
- Review the `failure_analysis.md` file in the PR
- Check the linked workflow run and job logs
- Investigate the root cause of the failures
- Apply fixes by pushing commits to the PR branch
- Test your fixes locally or wait for CI to run on the PR
- Merge when the issue is resolved
The workflow requires the following permissions (already configured):
- `contents: write` - To create branches and commit files
- `pull-requests: write` - To create PRs
- `issues: write` - To add labels
- `actions: read` - To read workflow run information
You can customize the behavior by editing:
- Trigger conditions in `auto_fix_failures.yml`:
  ```yaml
  if: >-
    ${{ github.event.workflow_run.conclusion == 'failure' &&
    github.event.workflow_run.event == 'schedule' }}
  ```
- Error patterns in `analyze_failures.py`:
  ```python
  error_patterns = [
      'ERROR:',
      'FAILED',
      # Add more patterns here
  ]
  ```
- PR labels in `create_failure_pr.py`:
  ```python
  labels_data = {'labels': ['automated', 'test-failure', 'needs-investigation']}
  ```
An automated PR will include:
- Title: `🔴 Auto-fix: Weekly test failures (run_id)`
- Body: Summary of failures, workflow run link, next steps
- Files: `failure_analysis.md` with detailed error information
- Labels: `automated`, `test-failure`, `needs-investigation`
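A PR with this shape could be assembled as the JSON body for the REST endpoint `POST /repos/{owner}/{repo}/pulls`, which takes `title`, `head`, `base` and `body`. This is a hedged sketch. The helper `build_pr_payload`, the exact body wording, and the branch and base names used in testing are hypothetical, and the code assumes the `run_id` placeholder in the title is filled with the numeric run ID.

```python
# Labels as documented for create_failure_pr.py.
labels_data = {'labels': ['automated', 'test-failure', 'needs-investigation']}


def build_pr_payload(run_id, run_url, branch, base, n_failed):
    """Assemble the JSON body for `POST /repos/{owner}/{repo}/pulls`."""
    body = "\n".join([
        "## Summary",
        f"{n_failed} job(s) failed in the scheduled CI run.",
        "",
        f"Workflow run: {run_url}",
        "",
        "## Next steps",
        "1. Review `failure_analysis.md` in this PR.",
        "2. Push fixes to this branch and let CI run again.",
        "3. Merge once the failures are resolved.",
    ])
    return {
        "title": f"🔴 Auto-fix: Weekly test failures ({run_id})",
        "head": branch,  # branch holding failure_analysis.md
        "base": base,    # branch the PR targets
        "body": body,
    }
```

Because a pull request is also an issue, `labels_data` can then be sent to `POST /repos/{owner}/{repo}/issues/{number}/labels`. This matches the `issues: write` permission listed for adding labels.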
If the workflow does not trigger:
- Check that the CI pipeline workflow is named exactly "CI pipeline for pySDC"
- Verify that the run was triggered by schedule (not push/PR)
- Ensure the workflow run actually failed

If the analysis fails:
- Check the workflow logs for the "Analyze test failures" step
- Verify that `GITHUB_TOKEN` has sufficient permissions
- Check whether the API rate limit was exceeded

If the PR is not created:
- Ensure `GITHUB_TOKEN` has write permissions
- Check whether a PR already exists for the branch
- Verify that there are actual changes to commit
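Two of these checks map directly onto REST API calls. The first is `GET /rate_limit`, whose `resources.core` object reports remaining calls and a reset timestamp. The second is `GET /repos/{owner}/{repo}/pulls` filtered by `head=owner:branch`, where a non-empty result means a PR already exists. The helper names below are hypothetical, and this is a sketch for manual debugging, not part of the shipped scripts.

```python
import urllib.parse
from datetime import datetime, timezone


def core_rate_limit(payload):
    """Extract remaining REST calls and the reset time from a `GET /rate_limit` response."""
    core = payload["resources"]["core"]
    reset_at = datetime.fromtimestamp(core["reset"], tz=timezone.utc)
    return core["remaining"], reset_at


def open_pr_url(owner, repo, branch):
    """URL that lists open PRs whose head is `owner:branch`."""
    query = urllib.parse.urlencode({"head": f"{owner}:{branch}", "state": "open"})
    return f"https://api.github.com/repos/{owner}/{repo}/pulls?{query}"
```

If `remaining` is 0, the analysis has to wait until `reset_at`. If the pulls query returns any entries, the existing PR should be updated instead of opening a new one.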
The scripts require:
- Python 3.7+
- `requests` library
- `PyGithub` library (for potential future enhancements)
These are installed automatically in the workflow.
Potential improvements:
- Automatic fix suggestions using AI/LLM
- Pattern recognition for common failures
- Integration with issue tracking
- Notification to relevant maintainers
- Automatic retry of flaky tests
- Historical failure analysis and trends