Remove test-specific logic and skip Slack on PR label tests

StackRox Automation · claude · StackRox Automation · commit bed7306b018e · 2026-05-20T23:48:28.000Z
Removed:
- TESTING_ONCALL.md (separate test documentation)
- is-test and test-comment inputs from workflow
- All [TEST MODE] and test-specific messages
- Color coding (warning vs failure) based on test mode
- is_test outputs and conditionals

Changed:
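- Notify job is skipped when triggered by a PR label (condition is now: always() && github.event_name != 'pull_request')
- PR label trigger uses a descriptive workflow name, "Test Workflow (Label Trigger)", and the placeholder failed job "test-label-trigger"
- Updated README: simplified testing instructions, moved Required Secrets under Configuration, added Analysis Quality Issues and Local Development sections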
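The sketch below models the changed parameter selection and notify gating locally, which helps when reasoning about what a label-triggered run produces. It mirrors the "Set workflow parameters" step and the notify job's if: condition. The function names and the INPUT_FAILED_JOBS / INPUT_WORKFLOW_NAME variables are hypothetical stand-ins for the ${{ inputs.* }} values, and always() is not modeled. When the workflow is called from another workflow, github.event_name is the caller's event (for example push), not pull_request.

```bash
#!/usr/bin/env bash
# Hypothetical local sketch of the "Set workflow parameters" step after this change.
# INPUT_FAILED_JOBS / INPUT_WORKFLOW_NAME stand in for ${{ inputs.failed-jobs }}
# and ${{ inputs.workflow-name }}; GITHUB_OUTPUT must point to a writable file.

set_workflow_params() {
  local event_name="$1"
  if [ "$event_name" = "pull_request" ]; then
    # PR label trigger: fixed placeholder values, since no real jobs failed
    echo "failed_jobs=test-label-trigger" >> "$GITHUB_OUTPUT"
    echo "workflow_name=Test Workflow (Label Trigger)" >> "$GITHUB_OUTPUT"
  else
    # Called from a failing workflow: pass the caller's inputs through unchanged
    echo "failed_jobs=${INPUT_FAILED_JOBS}" >> "$GITHUB_OUTPUT"
    echo "workflow_name=${INPUT_WORKFLOW_NAME}" >> "$GITHUB_OUTPUT"
  fi
}

# Mirrors the notify job's `if: always() && github.event_name != 'pull_request'`.
# always() is not modeled here; in Actions it only keeps notify from being
# skipped when analyze-failures fails.
should_notify() {
  [ "$1" != "pull_request" ]
}

# Example: simulate a label-triggered run
GITHUB_OUTPUT="$(mktemp)"
set_workflow_params pull_request
cat "$GITHUB_OUTPUT"
if should_notify pull_request; then echo "notify: yes"; else echo "notify: no"; fi
```

A label-triggered run therefore always produces the same placeholder parameters and never reaches Slack. A real failure passes its inputs through and notifies.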

Testing with the PR label now:
- Runs Claude analysis only
- Uploads the report artifact
- Skips the Slack notification (no spam)
- Verify by downloading the failure-analysis artifact from the workflow run
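The helper below checks the downloaded report from a shell. It is a minimal sketch that assumes the failure-analysis artifact unpacks analysis-report.md at its top level; the upload path is not shown in this diff. check_report is a hypothetical helper and is not part of the workflow.

```bash
#!/usr/bin/env bash
# Hypothetical helper: verify a downloaded failure-analysis artifact contains a
# non-empty analysis-report.md. Assumes the report sits at the artifact's top level.
check_report() {
  local report="$1/analysis-report.md"
  if [ -s "$report" ]; then
    echo "report found: $report"
    return 0
  fi
  echo "report missing or empty: $report" >&2
  return 1
}
```

To use it with the GitHub CLI, first run `gh run download <run-id> --name failure-analysis --dir ./failure-analysis` to fetch the artifact. Then `check_report ./failure-analysis` confirms the report is present and non-empty.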

Production behavior unchanged:
- Real test failures trigger the full workflow
- Posts to Slack with @acs-collector-oncall
- Uses 'failure' color consistently

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
diff --git a/.github/scripts/README.md b/.github/scripts/README.md
@@ -45,59 +45,30 @@ Uses `claude-code-base-action` to execute the `/analyze-test-failures` skill:
 - Creates `analysis-report.md` with actionable insights
 
 **Claude has access to:**
+- `Skill` - Load and execute the analysis skill
 - `Read` - View source files
 - `Grep` - Search codebase
 - `Glob` - Find files
 - `Bash` - Execute git commands, create reports
-- `Skill` - Load and execute the analysis skill
 
 ### 4. Notify
-Posts to Slack with:
+Posts to Slack (#team-acs-collector-oncall) with:
 - AI-generated root cause analysis
 - Evidence from code and logs
 - Platform-specific patterns detected
 - Actionable recommendations with file:line references
 
 Falls back to simple notification if analysis fails.
 
-## Required Secrets
-
-### Already Configured ✅
-- `SLACK_COLLECTOR_ONCALL_WEBHOOK`
-- `GCP_CLAUDE_SERVICE_ACCOUNT_KEY`
-- `GCP_CLAUDE_PROJECT_ID`
-
 ## Files
 
 ### Workflows
 - `.github/workflows/integration-tests.yml` - Main integration test workflow
 - `.github/workflows/analyze-and-notify.yml` - Reusable analysis workflow
-- `.github/workflows/test-oncall-analysis.yml` - Test workflow with synthetic failures
 
 ### Skill
 - `.claude/commands/analyze-test-failures.md` - Claude skill defining analysis logic
 
-### Documentation
-- `.github/workflows/TESTING_ONCALL.md` - How to test the workflow
-
-## Testing
-
-### Manual Test Run
-
-1. Go to Actions → "Test On-Call Analysis Workflow"
-2. Click "Run workflow"
-3. Select branch: `add-test-analysis-job`
-4. Check #team-acs-collector-oncall for [TEST] Slack message
-
-See `.github/workflows/TESTING_ONCALL.md` for details.
-
-### Local Skill Development
-
-```bash
-# Test the skill locally (requires Claude CLI)
-claude /analyze-test-failures test-artifacts/ "Integration Tests" "rhcos-arm64,cos"
-```
-
 ## Example Output
 
 **Slack message with AI analysis:**
@@ -143,35 +114,67 @@ Integration tests failed.
 - Links recent git changes to failures
 - Provides concrete next steps
 
+## Testing
+
+### Test on a PR
+
+Add the label `test-oncall-workflow` to any PR to trigger the workflow.
+
+**What happens:**
+- Workflow runs with empty test artifacts
+- Claude analyzes and generates a report
+- Report is uploaded as artifact
+- **Slack notification is skipped** (only runs on actual test failures)
+
+**Use case:** Verify Claude analysis executes without spamming Slack.
+
+**To verify it worked:**
+1. Check the workflow run in Actions tab
+2. Download the `failure-analysis` artifact to see the generated report
+
+### Test with Real Failures
+
+The best test is observing the workflow on actual test failures:
+1. Wait for integration tests to fail naturally
+2. Check #team-acs-collector-oncall for the AI analysis
+3. Verify the analysis is helpful and actionable
+
 ## Configuration
 
 ### Vertex AI Region
 Set in `.github/workflows/analyze-and-notify.yml`:
 ```yaml
 env:
-  CLOUD_ML_REGION: us-east5  # Or your preferred region
+  CLOUD_ML_REGION: us-east5
 ```
 
+### Required Secrets
+
+Already configured:
+- `GCP_CLAUDE_SERVICE_ACCOUNT_KEY` - Service account JSON for Vertex AI
+- `GCP_CLAUDE_PROJECT_ID` - GCP project ID
+- `SLACK_COLLECTOR_ONCALL_WEBHOOK` - Slack webhook URL
+
 ### Allowed Tools
+
 Claude has access to these tools for investigation:
 ```yaml
 allowed_tools: "Skill,Read,Grep,Glob,Bash"
 ```
 
 ### Reusable Workflow Inputs
+
 The `analyze-and-notify.yml` workflow accepts:
 - `failed-jobs` - Comma-separated list of failed job names
 - `workflow-name` - Name of the workflow that failed
-- `is-test` - Whether this is a test run (adds [TEST MODE] prefix)
-- `test-comment` - Optional comment for test runs
 
 ## Troubleshooting
 
 ### No Analysis Report Generated
 
 **Check:**
 1. Claude action step logs - did it execute successfully?
-2. "Check if analysis report was created" step - file exists?
+2. "Check if analysis report was created" step - does file exist?
 3. Skill file exists at `.claude/commands/analyze-test-failures.md`
 4. `Skill` tool is in `allowed_tools`
 
@@ -189,13 +192,41 @@ Check Claude action logs for specific error details.
 
 **Check:**
 1. `SLACK_COLLECTOR_ONCALL_WEBHOOK` secret is set
-2. Notify job logs show the download step succeeded
+2. Notify job logs show download step succeeded
 3. Webhook URL is valid
 
+### Analysis Quality Issues
+
+**If Claude's analysis is not helpful:**
+1. Check that test artifacts are being uploaded correctly
+2. Verify JUnit XML format is valid
+3. Update skill instructions in `.claude/commands/analyze-test-failures.md`
+4. The skill can be iterated on independently of the workflow
+
+## Local Development
+
+### Test the Skill Locally
+
+```bash
+# Requires Claude CLI installed
+claude /analyze-test-failures test-artifacts/ "Integration Tests" "rhcos-arm64,cos"
+```
+
+### Update the Skill
+
+Edit `.claude/commands/analyze-test-failures.md` to:
+- Change analysis instructions
+- Update report format
+- Add new investigation steps
+- Modify recommendations structure
+
+Changes take effect on the next workflow run - no workflow YAML changes needed.
+
 ## Future Enhancements
 
 - [ ] Correlate failures with specific PR/commit
-- [ ] Track failure patterns over time
+- [ ] Track failure patterns over time  
 - [ ] Link to similar historical failures
 - [ ] Auto-create issues for recurring failures
 - [ ] Support for other test frameworks beyond JUnit XML
+- [ ] Integration with test retries/flakiness detection
diff --git a/.github/workflows/TESTING_ONCALL.md b/.github/workflows/TESTING_ONCALL.md
diff --git a/.github/workflows/analyze-and-notify.yml b/.github/workflows/analyze-and-notify.yml
@@ -11,16 +11,6 @@ on:
         description: 'Name of the workflow that failed'
         required: true
         type: string
-      is-test:
-        description: 'Whether this is a test run'
-        required: false
-        type: boolean
-        default: false
-      test-comment:
-        description: 'Optional comment for test runs'
-        required: false
-        type: string
-        default: ''
   pull_request:
     types: [labeled]
 
@@ -35,21 +25,16 @@ jobs:
     outputs:
       workflow_name: ${{ steps.params.outputs.workflow_name }}
       failed_jobs: ${{ steps.params.outputs.failed_jobs }}
-      is_test: ${{ steps.params.outputs.is_test }}
     steps:
       - name: Set workflow parameters
         id: params
         run: |
           if [ "${{ github.event_name }}" = "pull_request" ]; then
-            echo "failed_jobs=rhcos-arm64,cos-logs" >> $GITHUB_OUTPUT
-            echo "workflow_name=Integration Tests" >> $GITHUB_OUTPUT
-            echo "is_test=true" >> $GITHUB_OUTPUT
-            echo "artifact_name=test-failure-artifacts" >> $GITHUB_OUTPUT
+            echo "failed_jobs=test-label-trigger" >> $GITHUB_OUTPUT
+            echo "workflow_name=Test Workflow (Label Trigger)" >> $GITHUB_OUTPUT
           else
             echo "failed_jobs=${{ inputs.failed-jobs }}" >> $GITHUB_OUTPUT
             echo "workflow_name=${{ inputs.workflow-name }}" >> $GITHUB_OUTPUT
-            echo "is_test=${{ inputs.is-test }}" >> $GITHUB_OUTPUT
-            echo "artifact_name=" >> $GITHUB_OUTPUT
           fi
 
       - name: Checkout repository
@@ -80,8 +65,6 @@ jobs:
           prompt: |
             /analyze-test-failures test-artifacts/ "${{ steps.params.outputs.workflow_name }}" "${{ steps.params.outputs.failed_jobs }}"
 
-            ${{ steps.params.outputs.is_test == 'true' && 'Add [TEST MODE] prefix to the report title.' || '' }}
-
       - name: Check if analysis report was created
         id: check-report
         if: always()
@@ -115,7 +98,7 @@ jobs:
   notify:
     runs-on: ubuntu-24.04
     needs: analyze-failures
-    if: always()
+    if: always() && github.event_name != 'pull_request'
     steps:
       - name: Download analysis report
         uses: actions/download-artifact@v4
@@ -143,12 +126,12 @@ jobs:
         env:
           SLACK_WEBHOOK: ${{ secrets.SLACK_COLLECTOR_ONCALL_WEBHOOK }}
           SLACK_CHANNEL: team-acs-collector-oncall
-          SLACK_COLOR: ${{ needs.analyze-failures.outputs.is_test == 'true' && 'warning' || 'failure' }}
+          SLACK_COLOR: failure
           SLACK_LINK_NAMES: true
-          SLACK_TITLE: "${{ needs.analyze-failures.outputs.is_test == 'true' && '[TEST] ' || '' }}${{ needs.analyze-failures.outputs.workflow_name }} failed"
+          SLACK_TITLE: "${{ needs.analyze-failures.outputs.workflow_name }} failed"
           MSG_MINIMAL: actions url,commit
           SLACK_MESSAGE: |
-            ${{ needs.analyze-failures.outputs.is_test == 'true' && '**This is a test of the oncall analysis workflow - please ignore**' || '@acs-collector-oncall' }}
+            @acs-collector-oncall
 
             ${{ steps.read-analysis.outputs.analysis }}
 
@@ -158,11 +141,11 @@ jobs:
         env:
           SLACK_WEBHOOK: ${{ secrets.SLACK_COLLECTOR_ONCALL_WEBHOOK }}
           SLACK_CHANNEL: team-acs-collector-oncall
-          SLACK_COLOR: ${{ needs.analyze-failures.outputs.is_test == 'true' && 'warning' || 'failure' }}
+          SLACK_COLOR: failure
           SLACK_LINK_NAMES: true
-          SLACK_TITLE: "${{ needs.analyze-failures.outputs.is_test == 'true' && '[TEST] ' || '' }}${{ needs.analyze-failures.outputs.workflow_name }} failed"
+          SLACK_TITLE: "${{ needs.analyze-failures.outputs.workflow_name }} failed"
           MSG_MINIMAL: actions url,commit
           SLACK_MESSAGE: |
-            ${{ needs.analyze-failures.outputs.is_test == 'true' && '**This is a test - AI analysis unavailable**' || '@acs-collector-oncall' }}
+            @acs-collector-oncall
 
             AI analysis unavailable. Check workflow logs.