Commit bed7306

StackRox Automation and claude committed
Remove test-specific logic and skip Slack on PR label tests
Removed:
- TESTING_ONCALL.md (separate test documentation)
- is-test and test-comment inputs from workflow
- All [TEST MODE] and test-specific messages
- Color coding (warning vs failure) based on test mode
- is_test outputs and conditionals

Changed:
- Notify job skips when triggered by PR label (github.event_name != 'pull_request')
- PR label trigger uses descriptive workflow name: "Test Workflow (Label Trigger)"
- Updated README with simplified testing instructions

Testing with PR label now:
- Runs Claude analysis only
- Uploads report artifact
- Skips Slack notification (no spam)
- Verify by downloading artifact from workflow run

Production behavior unchanged:
- Real test failures trigger full workflow
- Posts to Slack with @acs-collector-oncall
- Uses 'failure' color consistently

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
1 parent ed6f26e commit bed7306
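
As a rough illustration of the gating described in the commit message, the notify decision reduces to a check on the triggering event. The function name `should_notify` and the sample event names are hypothetical; the workflow itself expresses this as `if: always() && github.event_name != 'pull_request'`.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the notify job's condition:
# notify on every event except a pull_request (label) trigger.
should_notify() {
  local event_name="$1"
  [ "$event_name" != "pull_request" ]
}

# Sample event names, chosen for illustration only.
for event in push schedule pull_request; do
  if should_notify "$event"; then
    echo "$event: post to Slack"
  else
    echo "$event: skip Slack, keep the report artifact"
  fi
done
```

One caveat worth checking: a reusable workflow sees the caller's `github` context, so a caller that was itself triggered by `pull_request` would presumably skip the notification as well.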

3 files changed

Lines changed: 77 additions & 144 deletions

.github/scripts/README.md

Lines changed: 68 additions & 37 deletions
@@ -45,59 +45,30 @@ Uses `claude-code-base-action` to execute the `/analyze-test-failures` skill:
 - Creates `analysis-report.md` with actionable insights
 
 **Claude has access to:**
+- `Skill` - Load and execute the analysis skill
 - `Read` - View source files
 - `Grep` - Search codebase
 - `Glob` - Find files
 - `Bash` - Execute git commands, create reports
-- `Skill` - Load and execute the analysis skill
 
 ### 4. Notify
-Posts to Slack with:
+Posts to Slack (#team-acs-collector-oncall) with:
 - AI-generated root cause analysis
 - Evidence from code and logs
 - Platform-specific patterns detected
 - Actionable recommendations with file:line references
 
 Falls back to simple notification if analysis fails.
 
-## Required Secrets
-
-### Already Configured ✅
-- `SLACK_COLLECTOR_ONCALL_WEBHOOK`
-- `GCP_CLAUDE_SERVICE_ACCOUNT_KEY`
-- `GCP_CLAUDE_PROJECT_ID`
-
 ## Files
 
 ### Workflows
 - `.github/workflows/integration-tests.yml` - Main integration test workflow
 - `.github/workflows/analyze-and-notify.yml` - Reusable analysis workflow
-- `.github/workflows/test-oncall-analysis.yml` - Test workflow with synthetic failures
 
 ### Skill
 - `.claude/commands/analyze-test-failures.md` - Claude skill defining analysis logic
 
-### Documentation
-- `.github/workflows/TESTING_ONCALL.md` - How to test the workflow
-
-## Testing
-
-### Manual Test Run
-
-1. Go to Actions → "Test On-Call Analysis Workflow"
-2. Click "Run workflow"
-3. Select branch: `add-test-analysis-job`
-4. Check #team-acs-collector-oncall for [TEST] Slack message
-
-See `.github/workflows/TESTING_ONCALL.md` for details.
-
-### Local Skill Development
-
-```bash
-# Test the skill locally (requires Claude CLI)
-claude /analyze-test-failures test-artifacts/ "Integration Tests" "rhcos-arm64,cos"
-```
-
 ## Example Output
 
 **Slack message with AI analysis:**
@@ -143,35 +114,67 @@ Integration tests failed.
 - Links recent git changes to failures
 - Provides concrete next steps
 
+## Testing
+
+### Test on a PR
+
+Add the label `test-oncall-workflow` to any PR to trigger the workflow.
+
+**What happens:**
+- Workflow runs with empty test artifacts
+- Claude analyzes and generates a report
+- Report is uploaded as artifact
+- **Slack notification is skipped** (only runs on actual test failures)
+
+**Use case:** Verify Claude analysis executes without spamming Slack.
+
+**To verify it worked:**
+1. Check the workflow run in Actions tab
+2. Download the `failure-analysis` artifact to see the generated report
+
+### Test with Real Failures
+
+The best test is observing the workflow on actual test failures:
+1. Wait for integration tests to fail naturally
+2. Check #team-acs-collector-oncall for the AI analysis
+3. Verify the analysis is helpful and actionable
+
 ## Configuration
 
 ### Vertex AI Region
 Set in `.github/workflows/analyze-and-notify.yml`:
 ```yaml
 env:
-  CLOUD_ML_REGION: us-east5 # Or your preferred region
+  CLOUD_ML_REGION: us-east5
 ```
 
+### Required Secrets
+
+Already configured:
+- `GCP_CLAUDE_SERVICE_ACCOUNT_KEY` - Service account JSON for Vertex AI
+- `GCP_CLAUDE_PROJECT_ID` - GCP project ID
+- `SLACK_COLLECTOR_ONCALL_WEBHOOK` - Slack webhook URL
+
 ### Allowed Tools
+
 Claude has access to these tools for investigation:
 ```yaml
 allowed_tools: "Skill,Read,Grep,Glob,Bash"
 ```
 
 ### Reusable Workflow Inputs
+
 The `analyze-and-notify.yml` workflow accepts:
 - `failed-jobs` - Comma-separated list of failed job names
 - `workflow-name` - Name of the workflow that failed
-- `is-test` - Whether this is a test run (adds [TEST MODE] prefix)
-- `test-comment` - Optional comment for test runs
 
 ## Troubleshooting
 
 ### No Analysis Report Generated
 
 **Check:**
 1. Claude action step logs - did it execute successfully?
-2. "Check if analysis report was created" step - file exists?
+2. "Check if analysis report was created" step - does file exist?
 3. Skill file exists at `.claude/commands/analyze-test-failures.md`
 4. `Skill` tool is in `allowed_tools`
 
@@ -189,13 +192,41 @@ Check Claude action logs for specific error details.
 
 **Check:**
 1. `SLACK_COLLECTOR_ONCALL_WEBHOOK` secret is set
-2. Notify job logs show the download step succeeded
+2. Notify job logs show download step succeeded
 3. Webhook URL is valid
 
+### Analysis Quality Issues
+
+**If Claude's analysis is not helpful:**
+1. Check that test artifacts are being uploaded correctly
+2. Verify JUnit XML format is valid
+3. Update skill instructions in `.claude/commands/analyze-test-failures.md`
+4. The skill can be iterated on independently of the workflow
+
+## Local Development
+
+### Test the Skill Locally
+
+```bash
+# Requires Claude CLI installed
+claude /analyze-test-failures test-artifacts/ "Integration Tests" "rhcos-arm64,cos"
+```
+
+### Update the Skill
+
+Edit `.claude/commands/analyze-test-failures.md` to:
+- Change analysis instructions
+- Update report format
+- Add new investigation steps
+- Modify recommendations structure
+
+Changes take effect on the next workflow run - no workflow YAML changes needed.
+
 ## Future Enhancements
 
 - [ ] Correlate failures with specific PR/commit
-- [ ] Track failure patterns over time
+- [ ] Track failure patterns over time
 - [ ] Link to similar historical failures
 - [ ] Auto-create issues for recurring failures
 - [ ] Support for other test frameworks beyond JUnit XML
+- [ ] Integration with test retries/flakiness detection
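
The label-based test flow in the updated README (add the label, then download the `failure-analysis` artifact) could be scripted with the GitHub CLI. The sketch below is an assumption about how one might do that, not part of the commit: the `run`, `trigger_label_test`, and `fetch_report` helpers are hypothetical, and the PR number and run ID are placeholders. It prints the commands by default rather than executing them.

```bash
#!/usr/bin/env bash
# Sketch: trigger the label test and fetch the report with the GitHub CLI.
# DRY_RUN defaults to 1, so commands are printed rather than executed.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Add the trigger label to a PR (the PR number is supplied by the caller).
trigger_label_test() {
  run gh pr edit "$1" --add-label test-oncall-workflow
}

# Download the failure-analysis artifact from a given workflow run.
fetch_report() {
  run gh run download "$1" --name failure-analysis --dir "$2"
}

# Placeholder PR number and run ID, for illustration only.
trigger_label_test 123
fetch_report 456789 ./report
```

Setting `DRY_RUN=0` would execute the commands for real, which assumes `gh` is installed and authenticated and that the label already exists in the repository.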

.github/workflows/TESTING_ONCALL.md

Lines changed: 0 additions & 81 deletions
This file was deleted.

.github/workflows/analyze-and-notify.yml

Lines changed: 9 additions & 26 deletions
@@ -11,16 +11,6 @@ on:
         description: 'Name of the workflow that failed'
         required: true
         type: string
-      is-test:
-        description: 'Whether this is a test run'
-        required: false
-        type: boolean
-        default: false
-      test-comment:
-        description: 'Optional comment for test runs'
-        required: false
-        type: string
-        default: ''
   pull_request:
     types: [labeled]
 
@@ -35,21 +25,16 @@ jobs:
     outputs:
       workflow_name: ${{ steps.params.outputs.workflow_name }}
       failed_jobs: ${{ steps.params.outputs.failed_jobs }}
-      is_test: ${{ steps.params.outputs.is_test }}
     steps:
       - name: Set workflow parameters
         id: params
         run: |
           if [ "${{ github.event_name }}" = "pull_request" ]; then
-            echo "failed_jobs=rhcos-arm64,cos-logs" >> $GITHUB_OUTPUT
-            echo "workflow_name=Integration Tests" >> $GITHUB_OUTPUT
-            echo "is_test=true" >> $GITHUB_OUTPUT
-            echo "artifact_name=test-failure-artifacts" >> $GITHUB_OUTPUT
+            echo "failed_jobs=test-label-trigger" >> $GITHUB_OUTPUT
+            echo "workflow_name=Test Workflow (Label Trigger)" >> $GITHUB_OUTPUT
           else
             echo "failed_jobs=${{ inputs.failed-jobs }}" >> $GITHUB_OUTPUT
             echo "workflow_name=${{ inputs.workflow-name }}" >> $GITHUB_OUTPUT
-            echo "is_test=${{ inputs.is-test }}" >> $GITHUB_OUTPUT
-            echo "artifact_name=" >> $GITHUB_OUTPUT
           fi
 
       - name: Checkout repository
@@ -80,8 +65,6 @@ jobs:
           prompt: |
             /analyze-test-failures test-artifacts/ "${{ steps.params.outputs.workflow_name }}" "${{ steps.params.outputs.failed_jobs }}"
 
-            ${{ steps.params.outputs.is_test == 'true' && 'Add [TEST MODE] prefix to the report title.' || '' }}
-
       - name: Check if analysis report was created
         id: check-report
         if: always()
@@ -115,7 +98,7 @@ jobs:
   notify:
     runs-on: ubuntu-24.04
     needs: analyze-failures
-    if: always()
+    if: always() && github.event_name != 'pull_request'
     steps:
       - name: Download analysis report
         uses: actions/download-artifact@v4
@@ -143,12 +126,12 @@ jobs:
         env:
           SLACK_WEBHOOK: ${{ secrets.SLACK_COLLECTOR_ONCALL_WEBHOOK }}
           SLACK_CHANNEL: team-acs-collector-oncall
-          SLACK_COLOR: ${{ needs.analyze-failures.outputs.is_test == 'true' && 'warning' || 'failure' }}
+          SLACK_COLOR: failure
           SLACK_LINK_NAMES: true
-          SLACK_TITLE: "${{ needs.analyze-failures.outputs.is_test == 'true' && '[TEST] ' || '' }}${{ needs.analyze-failures.outputs.workflow_name }} failed"
+          SLACK_TITLE: "${{ needs.analyze-failures.outputs.workflow_name }} failed"
           MSG_MINIMAL: actions url,commit
           SLACK_MESSAGE: |
-            ${{ needs.analyze-failures.outputs.is_test == 'true' && '**This is a test of the oncall analysis workflow - please ignore**' || '@acs-collector-oncall' }}
+            @acs-collector-oncall
 
             ${{ steps.read-analysis.outputs.analysis }}
 
@@ -158,11 +141,11 @@ jobs:
         env:
           SLACK_WEBHOOK: ${{ secrets.SLACK_COLLECTOR_ONCALL_WEBHOOK }}
           SLACK_CHANNEL: team-acs-collector-oncall
-          SLACK_COLOR: ${{ needs.analyze-failures.outputs.is_test == 'true' && 'warning' || 'failure' }}
+          SLACK_COLOR: failure
           SLACK_LINK_NAMES: true
-          SLACK_TITLE: "${{ needs.analyze-failures.outputs.is_test == 'true' && '[TEST] ' || '' }}${{ needs.analyze-failures.outputs.workflow_name }} failed"
+          SLACK_TITLE: "${{ needs.analyze-failures.outputs.workflow_name }} failed"
           MSG_MINIMAL: actions url,commit
           SLACK_MESSAGE: |
-            ${{ needs.analyze-failures.outputs.is_test == 'true' && '**This is a test - AI analysis unavailable**' || '@acs-collector-oncall' }}
+            @acs-collector-oncall
 
             AI analysis unavailable. Check workflow logs.
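
To try the simplified "Set workflow parameters" logic outside of Actions, a minimal sketch along these lines may help. The `set_params` wrapper and its arguments are hypothetical stand-ins for the `${{ github.event_name }}` and `${{ inputs.* }}` expressions, and a temporary file stands in for the `$GITHUB_OUTPUT` file that the runner normally provides.

```bash
#!/usr/bin/env bash
# Sketch of the "Set workflow parameters" step, runnable outside Actions.
# set_params is a hypothetical wrapper; in the workflow, the values come
# from ${{ github.event_name }} and ${{ inputs.* }} expressions.
set_params() {
  local event_name="$1" input_jobs="$2" input_name="$3"
  if [ "$event_name" = "pull_request" ]; then
    # Label-triggered test run: fixed, descriptive placeholder values.
    echo "failed_jobs=test-label-trigger" >> "$GITHUB_OUTPUT"
    echo "workflow_name=Test Workflow (Label Trigger)" >> "$GITHUB_OUTPUT"
  else
    # Real failure: pass through whatever the caller supplied.
    echo "failed_jobs=${input_jobs}" >> "$GITHUB_OUTPUT"
    echo "workflow_name=${input_name}" >> "$GITHUB_OUTPUT"
  fi
}

# Stand-in for the file that Actions provides via $GITHUB_OUTPUT.
GITHUB_OUTPUT="$(mktemp)"
set_params pull_request "" ""
cat "$GITHUB_OUTPUT"
```

Unlike the workflow step, this sketch quotes `"$GITHUB_OUTPUT"` and takes the values as function arguments instead of interpolating expressions into the script body. That is a design choice of the sketch, not something the commit does.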
