Combine test and production workflows

StackRox Automation · claude · StackRox Automation · commit ed6f26e449e9 · 2026-05-20T23:22:32.000Z
Removed test-oncall-analysis.yml and made analyze-and-notify.yml
dual-purpose:
1. Called by integration-tests.yml when tests actually fail
2. Triggered by PR label 'test-oncall-workflow' for testing

Changes:
- Added pull_request trigger with label filter
- Set workflow parameters based on trigger type
  - workflow_call: uses inputs from caller
  - pull_request: uses test values (no artifacts expected)
- Outputs parameters from analyze-failures for notify job
- Removed separate test workflow
- Updated TESTING_ONCALL.md documentation
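
The parameter selection above can be sketched as a small shell
function. This is an illustration only, not the workflow step
itself: set_params and its argument order are hypothetical, and
the temp-file fallback for GITHUB_OUTPUT is an assumption made
so the sketch runs outside the Actions runner.

```shell
#!/bin/sh
# Sketch only: mirrors the trigger-based parameter selection.
# set_params is a hypothetical helper, not a name from the workflow.
set -eu

# $1 = event name, $2 = failed jobs, $3 = workflow name, $4 = is-test flag
set_params() {
  if [ "$1" = "pull_request" ]; then
    # Label-triggered run: fixed test values, no artifacts expected.
    echo "failed_jobs=rhcos-arm64,cos-logs"
    echo "workflow_name=Integration Tests"
    echo "is_test=true"
  else
    # Called by another workflow: pass the caller's inputs through.
    echo "failed_jobs=$2"
    echo "workflow_name=$3"
    echo "is_test=$4"
  fi
}

# The Actions runner provides GITHUB_OUTPUT; use a temp file locally.
GITHUB_OUTPUT="${GITHUB_OUTPUT:-$(mktemp)}"
set_params "pull_request" "" "" "" >> "$GITHUB_OUTPUT"
cat "$GITHUB_OUTPUT"
```

Note that inside a called workflow, github.event_name carries the
caller's event (for example push), so the else branch is the one
that handles workflow_call runs.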

Testing with label:
- No fake artifacts created
- Claude analyzes empty test-artifacts/ directory
- Generates appropriate report (likely "no failures found")
- Posts to Slack with [TEST] prefix

This simplifies the workflow structure while still allowing
easy testing via PR labels.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
diff --git a/.github/workflows/TESTING_ONCALL.md b/.github/workflows/TESTING_ONCALL.md
@@ -4,55 +4,78 @@ This document explains how to test the Claude AI test failure analysis workflow.
 
 ## How to Test
 
-1. Go to Actions → "Test On-Call Analysis Workflow"
-2. Click "Run workflow"
-3. Select branch: `add-test-analysis-job` (or whatever your PR branch is)
-4. Optionally add a comment (e.g., "Testing PR #3381")
-5. Click "Run workflow"
-6. Check the Slack channel for the [TEST] notification
+Add the label `test-oncall-workflow` to any PR.
 
-## What the test does
+The `analyze-and-notify` workflow will:
+1. Run automatically when the label is added
+2. Look for test artifacts (there won't be any on a PR without test failures)
+3. Run Claude analysis with empty/no artifacts
+4. Generate a report based on what it finds (or an empty report)
+5. Post to Slack with [TEST] prefix
 
-1. **Creates fake test failures**: Generates synthetic JUnit XML files with test failures
-   - The failures reference fake file paths and error messages
-   - This simulates the artifact structure from real integration test runs
+## What This Tests
 
-2. **Runs Claude analysis**: 
-   - Parses the XML test reports from artifacts
-   - Attempts to examine source code and git history
-   - Generates analysis report (may note that referenced files don't exist)
+✅ **Workflow execution**: All jobs run in correct sequence  
+✅ **Claude integration**: claude-code-base-action executes successfully  
+✅ **Skill loading**: `/analyze-test-failures` command is available  
+✅ **Slack notification**: Webhook delivers message to #team-acs-collector-oncall  
+✅ **Report generation**: Claude creates analysis-report.md (even if empty)  
 
-3. **Posts to Slack**:
-   - Sends notification to #team-acs-collector-oncall
-   - Prefixed with [TEST] to indicate test mode
-   - Includes the AI-generated analysis
+⚠️ **What it doesn't test**: Quality of analysis on real collector test failures (no real artifacts)
 
-## What this test validates
+## Expected Behavior
 
-✅ **Workflow execution**: All jobs run in correct sequence  
-✅ **Artifact handling**: Test reports are created, uploaded, and downloaded correctly  
-✅ **GCP authentication**: Vertex AI credentials work  
-✅ **Claude integration**: claude-code-base-action runs successfully  
-✅ **Slack notification**: Webhook delivers message to correct channel  
-✅ **Report generation**: analysis-report.md is created and included in Slack  
+**If you test on a PR without test failures:**
+
+Claude will analyze the empty `test-artifacts/` directory and should generate a report saying:
+```markdown
+**🤖 AI Analysis [TEST MODE]**
+
+**Root Cause**: No test failures found in artifacts directory.
+
+**Evidence**:
+• test-artifacts/ directory is empty or contains no JUnit XML files
+• No failed tests to analyze
 
-⚠️ **What it doesn't validate**: The quality of Claude's analysis on real collector test failures (since it uses synthetic data)
+**Recommendations**:
+• This is a test run with no actual test failures
+• The workflow is functioning correctly
+```
 
-## Expected Slack Message
+**Slack message:**
+```
+[TEST] Integration Tests failed
 
-You should receive a Slack message in #team-acs-collector-oncall with [TEST] prefix containing Claude's analysis of the synthetic test failures. The content will vary based on what Claude finds when analyzing the fake error messages.
+**This is a test of the oncall analysis workflow - please ignore**
+
+[Claude's report about no failures found]
+```
+
+## To Test with Real Failures
+
+Trigger the workflow on an actual test failure:
+1. Wait for integration tests to fail naturally, OR
+2. Intentionally break a test and push to a branch
+3. The workflow will run automatically with real artifacts
+4. Check Slack for analysis with actual root cause
 
 ## Cleanup
 
-After testing, you can:
+After testing:
 - Remove the `test-oncall-workflow` label from the PR
-- Delete the test workflow run from Actions
-- The Slack message will remain for reference
+- The Slack [TEST] message will remain for reference
 
 ## Troubleshooting
 
-If the test fails:
+**No workflow run triggered:**
+- Check that `.github/workflows/analyze-and-notify.yml` exists in the PR branch
+- `pull_request` workflows run from the PR's merge commit, so the file does not need to be on main first; a PR with merge conflicts will not trigger a run
+
+**No Slack notification:**
+- Check `SLACK_COLLECTOR_ONCALL_WEBHOOK` secret is set
+- Verify webhook URL is valid
 
-1. **No Slack message**: Check that `SLACK_COLLECTOR_ONCALL_WEBHOOK` secret is set
-2. **No analysis report**: Check the "Analyze test failures with Claude" step logs
-3. **Action not found**: Make sure this PR includes `.github/workflows/test-oncall-analysis.yml`
+**Claude fails:**
+- Check analyze-failures job logs for errors
+- Verify `GCP_CLAUDE_SERVICE_ACCOUNT_KEY` and `GCP_CLAUDE_PROJECT_ID` secrets are set
+- See "Troubleshooting" section in `.github/scripts/README.md`
diff --git a/.github/workflows/analyze-and-notify.yml b/.github/workflows/analyze-and-notify.yml
@@ -21,11 +21,37 @@ on:
         required: false
         type: string
         default: ''
+  pull_request:
+    types: [labeled]
 
 jobs:
   analyze-failures:
     runs-on: ubuntu-24.04
+    if: |
+      always() && (
+        github.event_name != 'pull_request' ||
+        (github.event_name == 'pull_request' && contains(github.event.pull_request.labels.*.name, 'test-oncall-workflow'))
+      )
+    outputs:
+      workflow_name: ${{ steps.params.outputs.workflow_name }}
+      failed_jobs: ${{ steps.params.outputs.failed_jobs }}
+      is_test: ${{ steps.params.outputs.is_test }}
     steps:
+      - name: Set workflow parameters
+        id: params
+        run: |
+          if [ "${{ github.event_name }}" = "pull_request" ]; then
+            echo "failed_jobs=rhcos-arm64,cos-logs" >> "$GITHUB_OUTPUT"
+            echo "workflow_name=Integration Tests" >> "$GITHUB_OUTPUT"
+            echo "is_test=true" >> "$GITHUB_OUTPUT"
+            echo "artifact_name=test-failure-artifacts" >> "$GITHUB_OUTPUT"
+          else
+            echo "failed_jobs=${{ inputs.failed-jobs }}" >> "$GITHUB_OUTPUT"
+            echo "workflow_name=${{ inputs.workflow-name }}" >> "$GITHUB_OUTPUT"
+            echo "is_test=${{ inputs.is-test }}" >> "$GITHUB_OUTPUT"
+            echo "artifact_name=" >> "$GITHUB_OUTPUT"
+          fi
+
       - name: Checkout repository
         uses: actions/checkout@v4
 
@@ -52,9 +78,9 @@ jobs:
           use_vertex: true
           allowed_tools: "Skill,Read,Grep,Glob,Bash"
           prompt: |
-            /analyze-test-failures test-artifacts/ "${{ inputs.workflow-name }}" "${{ inputs.failed-jobs }}"
+            /analyze-test-failures test-artifacts/ "${{ steps.params.outputs.workflow_name }}" "${{ steps.params.outputs.failed_jobs }}"
 
-            ${{ inputs.is-test && 'Add [TEST MODE] prefix to the report title.' || '' }}
+            ${{ steps.params.outputs.is_test == 'true' && 'Add [TEST MODE] prefix to the report title.' || '' }}
 
       - name: Check if analysis report was created
         id: check-report
@@ -117,13 +143,12 @@ jobs:
         env:
           SLACK_WEBHOOK: ${{ secrets.SLACK_COLLECTOR_ONCALL_WEBHOOK }}
           SLACK_CHANNEL: team-acs-collector-oncall
-          SLACK_COLOR: ${{ inputs.is-test && 'warning' || 'failure' }}
+          SLACK_COLOR: ${{ needs.analyze-failures.outputs.is_test == 'true' && 'warning' || 'failure' }}
           SLACK_LINK_NAMES: true
-          SLACK_TITLE: "${{ inputs.is-test && '[TEST] ' || '' }}${{ inputs.workflow-name }} failed"
+          SLACK_TITLE: "${{ needs.analyze-failures.outputs.is_test == 'true' && '[TEST] ' || '' }}${{ needs.analyze-failures.outputs.workflow_name }} failed"
           MSG_MINIMAL: actions url,commit
           SLACK_MESSAGE: |
-            ${{ inputs.is-test && '**This is a test of the oncall analysis workflow - please ignore**' || '@acs-collector-oncall' }}
-            ${{ inputs.test-comment && format('Comment: {0}', inputs.test-comment) || '' }}
+            ${{ needs.analyze-failures.outputs.is_test == 'true' && '**This is a test of the oncall analysis workflow - please ignore**' || '@acs-collector-oncall' }}
 
             ${{ steps.read-analysis.outputs.analysis }}
 
@@ -133,11 +158,11 @@ jobs:
         env:
           SLACK_WEBHOOK: ${{ secrets.SLACK_COLLECTOR_ONCALL_WEBHOOK }}
           SLACK_CHANNEL: team-acs-collector-oncall
-          SLACK_COLOR: ${{ inputs.is-test && 'warning' || 'failure' }}
+          SLACK_COLOR: ${{ needs.analyze-failures.outputs.is_test == 'true' && 'warning' || 'failure' }}
           SLACK_LINK_NAMES: true
-          SLACK_TITLE: "${{ inputs.is-test && '[TEST] ' || '' }}${{ inputs.workflow-name }} failed"
+          SLACK_TITLE: "${{ needs.analyze-failures.outputs.is_test == 'true' && '[TEST] ' || '' }}${{ needs.analyze-failures.outputs.workflow_name }} failed"
           MSG_MINIMAL: actions url,commit
           SLACK_MESSAGE: |
-            ${{ inputs.is-test && '**This is a test - AI analysis unavailable**' || '@acs-collector-oncall' }}
+            ${{ needs.analyze-failures.outputs.is_test == 'true' && '**This is a test - AI analysis unavailable**' || '@acs-collector-oncall' }}
 
             AI analysis unavailable. Check workflow logs.
diff --git a/.github/workflows/test-oncall-analysis.yml b/.github/workflows/test-oncall-analysis.yml