Commit 39470c9

Merge pull request #3861 from AI-Hypercomputer:shralex_investigate
PiperOrigin-RevId: 914064691
2 parents e222db8 + 4721c23 commit 39470c9

4 files changed

Lines changed: 226 additions & 2 deletions

Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
+description = "Automatically investigates and diagnoses CI test failures"
+prompt = """
+You are a world-class autonomous software diagnostics agent. Your purpose is to analyze failed CI/CD runs, pinpoint the root cause in the codebase, and write a highly actionable diagnostic comment.
+
+## Context Available:
+- **Failed Log Excerpt:** Available in `.gemini/failed_logs.txt`. Use `cat` to view it.
+- **Pull Request Ref / Diff:** Use the `pull_request_read` tool or explore files using shell tools.
+
+## Systematic Diagnostics Protocol:
+
+*Optimizing Investigation Efficiency:* Perform cheap, lightweight actions first (reading local log files, searching git history, checking previous issues/diffs) before initiating deep analysis of codebase modules or downloading heavy build artifacts.
+
+1. **Read & Parse failed_logs.txt:**
+   - Locate the failing test functions, classes, or scripts.
+   - Extract the exact error messages and tracebacks.
+   - **Group and Compare Failures:** If a run has multiple failures, determine if they are all part of a single cascade (sharing the same root cause) or if different independent root causes are at play. Focus deep-dive analysis on the most recent/representative failure, but explicitly note if multiple distinct failure modes were found.
+   - Limit detailed trace extraction to up to 3 representative examples to avoid cluttering the final report.
+
+2. **Explore Related Issues / Previous Runs (Cheap Search):**
+   - Check if this is a recurring or known flake by searching recent issues, discussions, or git history for similar error messages or failing test names.
+   - If a similar error has been encountered before, reference those occurrences or prior investigation outcomes.
+
+3. **Locate the Failing Component:**
+   - Search the codebase using `search_code` or look up the files where the failing tests or code reside.
+
+4. **Analyze Changes & Identify Culprits:**
+   - **PR runs:** Compare the failure traces with the recent code additions/deletions in the PR. Identify if the failures are due to syntax, logic, sharding changes, parameter mismatches, environment configuration, or infrastructure issues.
+     - *Fallback check:* If no clear link is found between the failure and the PR changes (low confidence), check the git log/blame of the failing component on `main`. Try to identify if a recent upstream PR or commit merged to the base branch (`main`) might be the actual culprit.
+   - **Scheduled runs (on main):** If investigating a scheduled failure on the `main` branch, inspect the git log history (e.g., `git log`, `git blame`) of the failing component/test file. Identify recent merges or commits that modified relevant paths, and try to identify the specific 'culprit PR' or commit that likely introduced the failure.
+
+5. **Calibrate Tone and Confidence:**
+   - State your confidence level: **low**, **moderate**, or **high**.
+   - **Codebase vs. Infrastructure Distinction:** Explicitly distinguish whether you believe the failure is a codebase regression (e.g., logical bugs, syntax, API mismatches) or an infrastructure/environment flake (e.g., TPU provisioning failures, GCS timeout errors, CUDA out-of-memory or driver issues).
+   - Default to "possible cause" or "hypothesis" language.
+   - Upgrade to "likely cause" only when multiple independent pieces of evidence converge (e.g., a suspicious commit + matching error signature + timing correlation).
+   - Use "confirmed cause" only when evidence is unambiguous.
+   - If inconclusive, say so. Partial findings and ruling things out is still valuable. Avoid assertive phrasing like "the root cause is" unless genuinely certain.
+
+6. **Formulate the Diagnostics Report:**
+   - Write a clean, professional, and precise markdown report matching the template below. Do not be overly wordy; get straight to the facts.
+   - **Save the Report**: You MUST write and save this formulated markdown report to `.gemini/findings.md`.
+   - **Keep it Concise:** If there are many failing tests due to the same error or infra issue, mention that a cascade occurred, list up to 3 representative examples, and explain the single root cause instead of repeating sections.
+
+## Report Template:
+
+```markdown
+### 🤖 CI Failure Investigation Report
+
+I have analyzed the recent test failures in the CI pipeline and identified the following:
+
+#### 🔍 What Failed
+*(If there are many failures, group them by root cause and list only up to 3 representative example test cases)*
+* **Job/Matrix**: `Matrix-Flavor-Name`
+* **Failing Test**: `test_filename.py::test_function_name`
+* **Error**: `TypeError: ...`
+
+#### 🪵 Error Details & Stack Trace
+```python
+[Short stack trace snippet showing where the error(s) occurred]
+```
+
+#### 💡 Root Cause Analysis & Context
+**Confidence:** [low / moderate / high] *(Calibrate based on whether this is a hypothesis, a likely cause, or a confirmed cause)*
+
+[Provide a clear explanation connecting the failure(s) to recent changes made in this PR, or to infrastructure issues. If you searched for previous occurrences or similar issues, summarize those findings here.]
+
+#### 🛠️ Recommended Fix *(Only include this section if Confidence is HIGH)*
+[Provide the recommended code block diff(s) or specific file edit(s) to fix the issue(s).]
+```
+
+7. **Execute the Report:**
+   - **Determine Target Destination:**
+     - If the environment variable `PULL_REQUEST_NUMBER` is present and non-empty, post the report as a comment on that PR/issue using the `add_issue_comment` tool.
+     - If `PULL_REQUEST_NUMBER` is empty or not a valid number (such as in a scheduled CI run failure), use `gh issue list --state open` with the shell tool to locate the open failure notification issue for the "MaxText Package Tests" workflow. If found, post the report as a comment on that issue using the `gh issue comment <issue-number> --body-file .gemini/findings.md` command.
+     - If no target issue is found, verify that the findings are written to `.gemini/findings.md` so it is preserved in the runner's artifacts.
+
+"""

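Review note: step 7 of the prompt above describes a three-way choice of where the report ends up. The sketch below restates that decision as a small function, to make the branches easy to check. It is only an illustration: `chooseReportTarget` and the `kind` labels are hypothetical names, not part of the commit, and the "valid number" test (a positive integer) is an assumption about what the prompt means.

```javascript
// Hypothetical helper: decide where the report saved in `.gemini/findings.md` goes.
// `prNumber` stands in for the PULL_REQUEST_NUMBER environment variable and
// `openIssueNumber` for whatever the open-issue search returned (may be empty).
function chooseReportTarget(prNumber, openIssueNumber) {
  // Assumption: "valid number" means a positive integer written in digits.
  const isValidNumber = (value) => /^[1-9]\d*$/.test(String(value ?? '').trim());

  if (isValidNumber(prNumber)) {
    // Branch 1: comment on the PR/issue.
    return { kind: 'pr-comment', number: Number(prNumber) };
  }
  if (isValidNumber(openIssueNumber)) {
    // Branch 2: comment on the open failure-notification issue.
    return { kind: 'issue-comment', number: Number(openIssueNumber) };
  }
  // Branch 3: nothing to comment on; the findings file is the only output.
  return { kind: 'artifact-only', number: null };
}

console.log(chooseReportTarget('3861', ''));
console.log(chooseReportTarget('', '120'));
console.log(chooseReportTarget('', ''));
```
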
.github/workflows/gemini-dispatch.yml

Lines changed: 29 additions & 2 deletions
@@ -9,6 +9,10 @@ on:
   pull_request_review:
     types: ['submitted']
 
+  # Trigger when a comment is added to the main conversation of a PR/Issue
+  issue_comment:
+    types: ['created']
+
   # Trigger when any label is attached to the PR
   pull_request:
     types: ['labeled']
@@ -61,6 +65,7 @@ jobs:
       command: '${{ steps.extract_command.outputs.command }}'
       request: '${{ steps.extract_command.outputs.request }}'
       additional_context: '${{ steps.extract_command.outputs.additional_context }}'
+      failed_run_id: '${{ steps.extract_command.outputs.failed_run_id }}'
       issue_number: '${{ github.event.pull_request.number || github.event.issue.number }}'
     steps:
       - name: 'Mint identity token'
@@ -92,8 +97,13 @@ jobs:
               core.setOutput('command', 'review');
             } else if (request.startsWith("@gemini-cli /review")) {
               core.setOutput('command', 'review');
-              const additionalContext = request.replace(/^@gemini-cli \/review/, '').trim();
-              core.setOutput('additional_context', additionalContext);
+              core.setOutput('additional_context', '');
+            } else if (request.startsWith("@gemini-cli /investigate")) {
+              core.setOutput('command', 'investigate');
+              const parts = request.split(/\s+/);
+              const failedRunId = parts.length > 2 ? parts[2] : '';
+              core.setOutput('failed_run_id', failedRunId);
+              core.setOutput('additional_context', '');
             } else if (request.startsWith("@gemini-cli")) {
               const additionalContext = request.replace(/^@gemini-cli/, '').trim();
               core.setOutput('command', 'invoke');
@@ -142,11 +152,28 @@ jobs:
       additional_context: '${{ needs.dispatch.outputs.additional_context }}'
     secrets: 'inherit'
 
+  investigate:
+    needs: 'dispatch'
+    if: |-
+      ${{ needs.dispatch.outputs.command == 'investigate' }}
+    uses: './.github/workflows/gemini-investigate.yml'
+    permissions:
+      contents: 'read'
+      id-token: 'write'
+      issues: 'write'
+      pull-requests: 'write'
+      actions: 'read'
+    with:
+      additional_context: '${{ needs.dispatch.outputs.additional_context }}'
+      failed_run_id: '${{ needs.dispatch.outputs.failed_run_id }}'
+    secrets: 'inherit'
+
   fallthrough:
     needs:
       - 'dispatch'
       - 'review'
       - 'invoke'
+      - 'investigate'
     if: |-
       ${{ always() && !cancelled() && (failure() || needs.dispatch.outputs.command == 'fallthrough') }}
     runs-on: 'ubuntu-latest'
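
Review note: the request-string branches of the dispatch script can be tried outside the workflow. The sketch below mirrors only the branches visible in this diff; the event-type branch that precedes them is not shown here and is left out. `parseCommand` is a hypothetical wrapper (the workflow calls `core.setOutput` directly), and the final `fallthrough` return is an assumption based on the `fallthrough` job condition rather than on code shown in the hunk.

```javascript
// Hypothetical wrapper around the dispatch branches shown in the diff.
// Returns the values the workflow would pass to core.setOutput.
function parseCommand(request) {
  if (request.startsWith('@gemini-cli /review')) {
    // After this commit, /review no longer forwards trailing text as context.
    return { command: 'review', additionalContext: '', failedRunId: '' };
  }
  if (request.startsWith('@gemini-cli /investigate')) {
    // The third whitespace-separated token, if any, is taken as the run ID.
    const parts = request.split(/\s+/);
    const failedRunId = parts.length > 2 ? parts[2] : '';
    return { command: 'investigate', additionalContext: '', failedRunId };
  }
  if (request.startsWith('@gemini-cli')) {
    const additionalContext = request.replace(/^@gemini-cli/, '').trim();
    return { command: 'invoke', additionalContext, failedRunId: '' };
  }
  // Assumption: anything else is routed to the fallthrough job.
  return { command: 'fallthrough', additionalContext: '', failedRunId: '' };
}

console.log(parseCommand('@gemini-cli /investigate 12345'));
console.log(parseCommand('@gemini-cli /investigate'));
console.log(parseCommand('@gemini-cli explain this failure'));
```

Note that the order of the checks matters: the generic `@gemini-cli` prefix test has to come last, since both slash commands also start with it.
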
Lines changed: 119 additions & 0 deletions
@@ -0,0 +1,119 @@
+name: 'Gemini Failure Investigator'
+
+on:
+  workflow_call:
+    inputs:
+      additional_context:
+        type: 'string'
+        required: false
+      failed_run_id:
+        type: 'string'
+        required: false
+
+permissions:
+  contents: 'read'
+  id-token: 'write'
+  issues: 'write'
+  pull-requests: 'write'
+  actions: 'read' # Required to fetch workflow logs
+
+jobs:
+  investigate:
+    runs-on: 'ubuntu-latest'
+    steps:
+      - name: 'Checkout repository'
+        uses: 'actions/checkout@v4'
+        with:
+          persist-credentials: 'false'
+
+      - name: 'Gather failed logs'
+        env:
+          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          RUN_ID: ${{ github.event.workflow_run.id || inputs.failed_run_id }}
+          REPO: ${{ github.repository }}
+          BRANCH: ${{ github.event.pull_request.head.ref }}
+          SHA: ${{ github.event.pull_request.head.sha }}
+        run: |
+          mkdir -p .gemini
+
+          # Determine target run ID
+          if [ -z "$RUN_ID" ]; then
+            # Fallback to finding the latest failed run for this PR's specific commit
+            if [ -n "$SHA" ]; then
+              echo "Searching for failed runs for commit: $SHA"
+              RUN_ID=$(gh run list --workflow "MaxText Package Tests" --status failure --commit "$SHA" --limit 1 --json databaseId --jq '.[0].databaseId' --repo "$REPO")
+            fi
+
+            # Fallback to branch if commit-specific run wasn't found
+            if [ -z "$RUN_ID" ] && [ -n "$BRANCH" ]; then
+              echo "Searching for failed runs on branch: $BRANCH"
+              RUN_ID=$(gh run list --workflow "MaxText Package Tests" --status failure --branch "$BRANCH" --limit 1 --json databaseId --jq '.[0].databaseId' --repo "$REPO")
+            fi
+
+            # Global fallback
+            if [ -z "$RUN_ID" ]; then
+              echo "Searching for latest failed run across the repository"
+              RUN_ID=$(gh run list --workflow "MaxText Package Tests" --status failure --limit 1 --json databaseId --jq '.[0].databaseId' --repo "$REPO")
+            fi
+          fi
+
+          echo "Gathering logs for failed run: $RUN_ID"
+
+          if [ -n "$RUN_ID" ]; then
+            # Retrieve only the failing lines/jobs to avoid token limit overhead
+            gh run view "$RUN_ID" --log-failed --repo "$REPO" > .gemini/failed_logs.txt || true
+          else
+            echo "No failed runs found." > .gemini/failed_logs.txt
+          fi
+
+      - name: 'Run Gemini Failure Investigator'
+        uses: 'google-github-actions/run-gemini-cli@v0'
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+          REPOSITORY: ${{ github.repository }}
+          PULL_REQUEST_NUMBER: ${{ github.event.workflow_run.pull_requests[0].number || github.event.pull_request.number || github.event.issue.number }}
+        with:
+          gcp_location: '${{ vars.GOOGLE_CLOUD_LOCATION }}'
+          gcp_project_id: '${{ vars.GOOGLE_CLOUD_PROJECT }}'
+          gcp_service_account: '${{ vars.SERVICE_ACCOUNT_EMAIL }}'
+          gcp_workload_identity_provider: '${{ vars.GCP_WIF_PROVIDER }}'
+          gemini_api_key: '${{ secrets.GEMINI_API_KEY }}'
+          gemini_cli_version: '${{ vars.GEMINI_CLI_VERSION }}'
+          gemini_model: '${{ vars.GEMINI_MODEL }}'
+          workflow_name: 'gemini-investigate'
+          settings: |-
+            {
+              "model": {
+                "maxSessionTurns": 15
+              },
+              "mcpServers": {
+                "github": {
+                  "command": "docker",
+                  "args": [
+                    "run",
+                    "-i",
+                    "--rm",
+                    "-e",
+                    "GITHUB_PERSONAL_ACCESS_TOKEN",
+                    "ghcr.io/github/github-mcp-server:v0.27.0"
+                  ],
+                  "includeTools": [
+                    "add_issue_comment",
+                    "pull_request_read",
+                    "search_code",
+                    "get_file_contents",
+                    "list_commits",
+                    "get_commit"
+                  ],
+                  "env": {
+                    "GITHUB_PERSONAL_ACCESS_TOKEN": "${GITHUB_TOKEN}"
+                  }
+                }
+              },
+              "tools": {
+                "shell": {
+                  "allowCommands": ["cat", "grep", "head", "tail", "gh", "git", "find"]
+                }
+              }
+            }
+          prompt: '/gemini-investigate'
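
Review note: the "Gather failed logs" step resolves the run ID through a chain of fallbacks (explicit ID, then commit, then branch, then the whole repository). The sketch below restates that chain as a pure function so the order of the lookups is easy to follow and test. `resolveRunId` and `findFailedRun` are hypothetical names: `findFailedRun` stands in for the `gh run list ... --status failure --limit 1` calls and is assumed to return a run ID string, or an empty string when nothing matches.

```javascript
// Hypothetical restatement of the run-ID fallback chain from the shell step.
// `findFailedRun(filter)` is an assumed lookup that returns '' when no run matches.
function resolveRunId({ explicitRunId = '', sha = '', branch = '' }, findFailedRun) {
  // An explicit ID (workflow_run event or the failed_run_id input) wins outright.
  if (explicitRunId) return explicitRunId;

  let runId = '';
  // 1. Latest failed run for the PR's head commit, when a SHA is available.
  if (sha) runId = findFailedRun({ commit: sha });
  // 2. Latest failed run on the PR's branch, if the commit lookup found nothing.
  if (!runId && branch) runId = findFailedRun({ branch });
  // 3. Latest failed run anywhere in the repository.
  if (!runId) runId = findFailedRun({});
  return runId;
}

// Usage with a stub lookup: only the branch query finds a run.
const stubLookup = (filter) => (filter.branch === 'feature-x' ? '222' : '');
console.log(resolveRunId({ sha: 'abc123', branch: 'feature-x' }, stubLookup));
```

An empty return value corresponds to the shell step writing "No failed runs found." into `.gemini/failed_logs.txt`.
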

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -150,6 +150,7 @@ dmypy.json
 
 # Gemini CLI
 .gemini/
+!.gemini/commands/
 gha-creds-*.json
 
 # vscode workspace
