Merge pull request #3861 from AI-Hypercomputer:shralex_investigate

Google-ML-Automation · Google-ML-Automation · commit 39470c98953f · 2026-05-11T22:24:17.000-07:00
PiperOrigin-RevId: 914064691
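
For orientation, the `/investigate` branch that this change adds to the dispatch step (see the `gemini-dispatch.yml` hunk in this diff) can be sketched as a standalone function. This is a minimal sketch rather than the workflow's actual code: `parseInvestigate` is a hypothetical name, and the all-digits check on the run ID is an assumption of the sketch that the diff itself does not perform.

```javascript
// Hypothetical helper mirroring the dispatch step's "/investigate" branch.
// Returns null when the comment is not an investigate request.
function parseInvestigate(request) {
  const prefix = '@gemini-cli /investigate';
  if (!request.startsWith(prefix)) {
    return null;
  }
  // Same tokenization as the workflow: split on runs of whitespace and
  // treat the third token, if any, as the optional failed run ID.
  const parts = request.split(/\s+/);
  const candidate = parts.length > 2 ? parts[2] : '';
  // Assumption of this sketch (not in the diff): accept only all-digit IDs,
  // so anything else falls back to the workflow's own run lookup.
  const failedRunId = /^\d+$/.test(candidate) ? candidate : '';
  return { command: 'investigate', failedRunId, additionalContext: '' };
}

console.log(parseInvestigate('@gemini-cli /investigate 1234567890'));
console.log(parseInvestigate('@gemini-cli /investigate'));
console.log(parseInvestigate('@gemini-cli /review'));
```

An empty `failedRunId` corresponds to the case where the "Gather failed logs" step searches for a failed run by commit, then branch, then repository-wide.
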
diff --git a/.gemini/commands/gemini-investigate.toml b/.gemini/commands/gemini-investigate.toml
@@ -0,0 +1,77 @@
+description = "Automatically investigates and diagnoses CI test failures"
+prompt = """
+You are a world-class autonomous software diagnostics agent. Your purpose is to analyze failed CI/CD runs, pinpoint the root cause in the codebase, and write a highly actionable diagnostic comment.
+
+## Context Available:
+- **Failed Log Excerpt:** Available in `.gemini/failed_logs.txt`. Use `cat` to view it.
+- **Pull Request Ref / Diff:** Use the `pull_request_read` tool or explore files using shell tools.
+
+## Systematic Diagnostics Protocol:
+
+*Optimizing Investigation Efficiency:* Perform cheap, lightweight actions first (reading local log files, searching git history, checking previous issues/diffs) before initiating deep analysis of codebase modules or downloading heavy build artifacts.
+
+1. **Read & Parse failed_logs.txt:**
+   - Locate the failing test functions, classes, or scripts.
+   - Extract the exact error messages and tracebacks.
+   - **Group and Compare Failures:** If a run has multiple failures, determine if they are all part of a single cascade (sharing the same root cause) or if different independent root causes are at play. Focus deep-dive analysis on the most recent/representative failure, but explicitly note if multiple distinct failure modes were found.
+   - Limit detailed trace extraction to at most 3 representative examples to avoid cluttering the final report.
+
+2. **Explore Related Issues / Previous Runs (Cheap Search):**
+   - Check whether this is a recurring failure or a known flake by searching recent issues, discussions, or git history for similar error messages or failing test names.
+   - If a similar error has been encountered before, reference those occurrences or prior investigation outcomes.
+
+3. **Locate the Failing Component:**
+   - Search the codebase using `search_code` or look up the files where the failing tests or code reside.
+
+4. **Analyze Changes & Identify Culprits:**
+   - **PR runs:** Compare the failure traces with the recent code additions/deletions in the PR. Identify if the failures are due to syntax, logic, sharding changes, parameter mismatches, environment configuration, or infrastructure issues.
+     - *Fallback check:* If no clear link is found between the failure and the PR changes (low confidence), check the git log/blame of the failing component on `main`. Try to identify if a recent upstream PR or commit merged to the base branch (`main`) might be the actual culprit.
+   - **Scheduled runs (on main):** If investigating a scheduled failure on the `main` branch, inspect the git log history (e.g., `git log`, `git blame`) of the failing component/test file. Identify recent merges or commits that modified relevant paths, and try to identify the specific 'culprit PR' or commit that likely introduced the failure.
+
+5. **Calibrate Tone and Confidence:**
+   - State your confidence level: **low**, **moderate**, or **high**.
+   - **Codebase vs. Infrastructure Distinction:** Explicitly distinguish whether you believe the failure is a codebase regression (e.g., logical bugs, syntax, API mismatches) or an infrastructure/environment flake (e.g., TPU provisioning failures, GCS timeout errors, CUDA out-of-memory or driver issues).
+   - Default to "possible cause" or "hypothesis" language.
+   - Upgrade to "likely cause" only when multiple independent pieces of evidence converge (e.g., a suspicious commit + matching error signature + timing correlation).
+   - Use "confirmed cause" only when evidence is unambiguous.
+   - If inconclusive, say so. Partial findings and ruled-out causes are still valuable. Avoid assertive phrasing like "the root cause is" unless genuinely certain.
+
+6. **Formulate the Diagnostics Report:**
+   - Write a clean, professional, and precise markdown report matching the template below. Do not be overly wordy; get straight to the facts.
+   - **Save the Report**: You MUST write and save this formulated markdown report to `.gemini/findings.md`.
+   - **Keep it Concise:** If there are many failing tests due to the same error or infra issue, mention that a cascade occurred, list up to 3 representative examples, and explain the single root cause instead of repeating sections.
+
+## Report Template:
+
+````markdown
+### 🤖 CI Failure Investigation Report
+
+I have analyzed the recent test failures in the CI pipeline and identified the following:
+
+#### 🔍 What Failed
+*(If there are many failures, group them by root cause and list only up to 3 representative example test cases)*
+* **Job/Matrix**: `Matrix-Flavor-Name`
+* **Failing Test**: `test_filename.py::test_function_name`
+* **Error**: `TypeError: ...`
+
+#### 🪵 Error Details & Stack Trace
+```python
+[Short stack trace snippet showing where the error(s) occurred]
+```
+
+#### 💡 Root Cause Analysis & Context
+**Confidence:** [low / moderate / high] *(Calibrate based on whether this is a hypothesis, a likely cause, or a confirmed cause)*
+
+[Provide a clear explanation connecting the failure(s) to recent changes made in this PR, or to infrastructure issues. If you searched for previous occurrences or similar issues, summarize those findings here.]
+
+#### 🛠️ Recommended Fix *(Only include this section if Confidence is HIGH)*
+[Provide the recommended code block diff(s) or specific file edit(s) to fix the issue(s).]
+````
+
+7. **Post the Report:**
+   - **Determine Target Destination:**
+     - If the environment variable `PULL_REQUEST_NUMBER` is present and non-empty, post the report as a comment on that PR/issue using the `add_issue_comment` tool.
+     - If `PULL_REQUEST_NUMBER` is empty or not a valid number (such as in a scheduled CI run failure), use `gh issue list --state open` with the shell tool to locate the open failure notification issue for the "MaxText Package Tests" workflow. If found, post the report as a comment on that issue using the `gh issue comment <issue-number> --body-file .gemini/findings.md` command.
+     - If no target issue is found, verify that the findings are written to `.gemini/findings.md` so that they are preserved in the runner's artifacts.
+
+"""
diff --git a/.github/workflows/gemini-dispatch.yml b/.github/workflows/gemini-dispatch.yml
@@ -9,6 +9,10 @@ on:
   pull_request_review:
     types: ['submitted']
 
+  # Trigger when a comment is added to the main conversation of a PR/Issue
+  issue_comment:
+    types: ['created']
+
   # Trigger when any label is attached to the PR
   pull_request:
     types: ['labeled']
@@ -61,6 +65,7 @@ jobs:
       command: '${{ steps.extract_command.outputs.command }}'
       request: '${{ steps.extract_command.outputs.request }}'
       additional_context: '${{ steps.extract_command.outputs.additional_context }}'
+      failed_run_id: '${{ steps.extract_command.outputs.failed_run_id }}'
       issue_number: '${{ github.event.pull_request.number || github.event.issue.number }}'
     steps:
       - name: 'Mint identity token'
@@ -92,8 +97,13 @@ jobs:
               core.setOutput('command', 'review');
             } else if (request.startsWith("@gemini-cli /review")) {
               core.setOutput('command', 'review');
-              const additionalContext = request.replace(/^@gemini-cli \/review/, '').trim();
-              core.setOutput('additional_context', additionalContext);
+              core.setOutput('additional_context', '');
+            } else if (request.startsWith("@gemini-cli /investigate")) {
+              core.setOutput('command', 'investigate');
+              const parts = request.split(/\s+/);
+              const failedRunId = parts.length > 2 ? parts[2] : '';
+              core.setOutput('failed_run_id', failedRunId);
+              core.setOutput('additional_context', '');
             } else if (request.startsWith("@gemini-cli")) {
               const additionalContext = request.replace(/^@gemini-cli/, '').trim();
               core.setOutput('command', 'invoke');
@@ -142,11 +152,28 @@ jobs:
       additional_context: '${{ needs.dispatch.outputs.additional_context }}'
     secrets: 'inherit'
 
+  investigate:
+    needs: 'dispatch'
+    if: |-
+      ${{ needs.dispatch.outputs.command == 'investigate' }}
+    uses: './.github/workflows/gemini-investigate.yml'
+    permissions:
+      contents: 'read'
+      id-token: 'write'
+      issues: 'write'
+      pull-requests: 'write'
+      actions: 'read'
+    with:
+      additional_context: '${{ needs.dispatch.outputs.additional_context }}'
+      failed_run_id: '${{ needs.dispatch.outputs.failed_run_id }}'
+    secrets: 'inherit'
+
   fallthrough:
     needs:
       - 'dispatch'
       - 'review'
       - 'invoke'
+      - 'investigate'
     if: |-
       ${{ always() && !cancelled() && (failure() || needs.dispatch.outputs.command == 'fallthrough') }}
     runs-on: 'ubuntu-latest'
diff --git a/.github/workflows/gemini-investigate.yml b/.github/workflows/gemini-investigate.yml
@@ -0,0 +1,119 @@
+name: 'Gemini Failure Investigator'
+
+on:
+  workflow_call:
+    inputs:
+      additional_context:
+        type: 'string'
+        required: false
+      failed_run_id:
+        type: 'string'
+        required: false
+
+permissions:
+  contents: 'read'
+  id-token: 'write'
+  issues: 'write'
+  pull-requests: 'write'
+  actions: 'read' # Required to fetch workflow logs
+
+jobs:
+  investigate:
+    runs-on: 'ubuntu-latest'
+    steps:
+      - name: 'Checkout repository'
+        uses: 'actions/checkout@v4'
+        with:
+          persist-credentials: 'false'
+
+      - name: 'Gather failed logs'
+        env:
+          GH_TOKEN: '${{ secrets.GITHUB_TOKEN }}'
+          RUN_ID: '${{ github.event.workflow_run.id || inputs.failed_run_id }}'
+          REPO: '${{ github.repository }}'
+          BRANCH: '${{ github.event.pull_request.head.ref }}'
+          SHA: '${{ github.event.pull_request.head.sha }}'
+        run: |
+          mkdir -p .gemini
+          
+          # Determine target run ID
+          if [ -z "$RUN_ID" ]; then
+            # Fall back to the latest failed run for this PR's head commit
+            if [ -n "$SHA" ]; then
+              echo "Searching for failed runs for commit: $SHA"
+              RUN_ID=$(gh run list --workflow "MaxText Package Tests" --status failure --commit "$SHA" --limit 1 --json databaseId --jq '.[0].databaseId' --repo "$REPO")
+            fi
+            
+            # Fall back to the branch if no commit-specific run was found
+            if [ -z "$RUN_ID" ] && [ -n "$BRANCH" ]; then
+              echo "Searching for failed runs on branch: $BRANCH"
+              RUN_ID=$(gh run list --workflow "MaxText Package Tests" --status failure --branch "$BRANCH" --limit 1 --json databaseId --jq '.[0].databaseId' --repo "$REPO")
+            fi
+            
+            # Global fallback
+            if [ -z "$RUN_ID" ]; then
+              echo "Searching for latest failed run across the repository"
+              RUN_ID=$(gh run list --workflow "MaxText Package Tests" --status failure --limit 1 --json databaseId --jq '.[0].databaseId' --repo "$REPO")
+            fi
+          fi
+          
+          echo "Gathering logs for failed run: $RUN_ID"
+          
+          if [ -n "$RUN_ID" ]; then
+            # Fetch logs only for the failed steps to keep the excerpt within the model's token budget
+            gh run view "$RUN_ID" --log-failed --repo "$REPO" > .gemini/failed_logs.txt || true
+          else
+            echo "No failed runs found." > .gemini/failed_logs.txt
+          fi
+
+      - name: 'Run Gemini Failure Investigator'
+        uses: 'google-github-actions/run-gemini-cli@v0'
+        env:
+          GITHUB_TOKEN: '${{ secrets.GITHUB_TOKEN }}'
+          REPOSITORY: '${{ github.repository }}'
+          PULL_REQUEST_NUMBER: '${{ github.event.workflow_run.pull_requests[0].number || github.event.pull_request.number || github.event.issue.number }}'
+        with:
+          gcp_location: '${{ vars.GOOGLE_CLOUD_LOCATION }}'
+          gcp_project_id: '${{ vars.GOOGLE_CLOUD_PROJECT }}'
+          gcp_service_account: '${{ vars.SERVICE_ACCOUNT_EMAIL }}'
+          gcp_workload_identity_provider: '${{ vars.GCP_WIF_PROVIDER }}'
+          gemini_api_key: '${{ secrets.GEMINI_API_KEY }}'
+          gemini_cli_version: '${{ vars.GEMINI_CLI_VERSION }}'
+          gemini_model: '${{ vars.GEMINI_MODEL }}'
+          workflow_name: 'gemini-investigate'
+          settings: |-
+            {
+              "model": {
+                "maxSessionTurns": 15
+              },
+              "mcpServers": {
+                "github": {
+                  "command": "docker",
+                  "args": [
+                    "run",
+                    "-i",
+                    "--rm",
+                    "-e",
+                    "GITHUB_PERSONAL_ACCESS_TOKEN",
+                    "ghcr.io/github/github-mcp-server:v0.27.0"
+                  ],
+                  "includeTools": [
+                    "add_issue_comment",
+                    "pull_request_read",
+                    "search_code",
+                    "get_file_contents",
+                    "list_commits",
+                    "get_commit"
+                  ],
+                  "env": {
+                    "GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_TOKEN}"
+                  }
+                }
+              },
+              "tools": {
+                "shell": {
+                  "allowCommands": ["cat", "grep", "head", "tail", "gh", "git", "find"]
+                }
+              }
+            }
+          prompt: '/gemini-investigate'
diff --git a/.gitignore b/.gitignore
@@ -150,6 +150,7 @@ dmypy.json
 
 # Gemini CLI
 .gemini/
+!.gemini/commands/
 gha-creds-*.json
 
 # vscode workspace