You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
chore: parallelize nightly evaluations and fix suite timeouts (#472)
This PR overhauls the nightly evaluations suite to run significantly
faster and eliminates several sources of non-deterministic timeouts and
endless loops.
### 🚀 Enhancements
* **Parallel Execution**: Refactored `evals-nightly.yml` to dynamically
generate a matrix of all `.eval.ts` files and run them concurrently,
drastically reducing the total suite execution time.
* **Aggregated Reporting**: Enhanced `scripts/aggregate_evals.ts` to
collect artifacts from parallel jobs and output a unified, structured
Markdown table with pass rates, latencies, and detailed collapsible
failure logs.
### 🛠️ Stability & Bug Fixes
* **Issue Fixer Search Loops**: Scaffolded missing mock source files
(e.g., `src/index.js`), a `package.json` with a valid `test` script, and
a mock `gh` CLI executable in the `TestRig`. This prevents the agent
from infinitely searching for non-existent files or trying to repair a
broken mock testing environment.
* **Assistant Polling Loops**: Injected the mock MCP server into
`gemini-assistant.eval.ts` so it can successfully execute
`add_issue_comment`. Updated the system prompt to explicitly instruct
the agent to exit immediately after posting its plan, rather than
infinitely polling the issue for an `@gemini-cli /approve` comment. Also
explicitly defined the typo in `fix-typo` to stop the agent from
hopelessly guessing.
* **JSON Quoting Flakes**: Updated `gemini-scheduled-triage.toml` to
output its JSON array to `$GITHUB_ENV` using a heredoc (`cat << 'EOF'`)
instead of `echo '...'`. This prevents bash syntax errors when the
model's generated text naturally contains single quotes.
* **Global Timeout Boundaries**: Increased the `TestRig` hard-kill
timeout from 3 to 10 minutes to safely accommodate complex, high-turn
fixes (e.g., `fix-flaky-test`).
* **Security Warnings**: Added a top-level `permissions: contents: read`
block to the nightly workflow to resolve CodeQL linting warnings.
Successful run:
https://github.com/google-github-actions/run-gemini-cli/actions/runs/22689405186
Copy file name to clipboardExpand all lines: .github/commands/gemini-invoke.toml
+1-1Lines changed: 1 addition & 1 deletion
Original file line number
Diff line number
Diff line change
@@ -82,7 +82,7 @@ Begin every task by building a complete picture of the situation.
82
82
Please review this plan. To approve, comment `@gemini-cli /approve` on this issue. To make changes, comment changes needed.
83
83
```
84
84
85
-
3. **Post the Plan**: You MUST use `add_issue_comment` to post your plan. The workflow should end only after this tool call has been successfully formulated.
85
+
3. **Post the Plan**: You MUST use `add_issue_comment` to post your plan. The workflow should end only after this tool call has been successfully formulated. Do not wait for human approval or check for comments; exit immediately after posting.
Copy file name to clipboardExpand all lines: .github/commands/gemini-scheduled-triage.toml
+9-3Lines changed: 9 additions & 3 deletions
Original file line number
Diff line number
Diff line change
@@ -85,9 +85,15 @@ Iterate through each issue object. For each issue:
85
85
86
86
### Step 5: Construct and Write Output
87
87
88
-
Assemble the results into a single JSON array, formatted as a string, according to the **Output Specification** below. Finally, execute the command to write this string to the output file, ensuring the JSON is enclosed in single quotes to prevent shell interpretation.
89
-
90
-
- Use the shell command to write: `echo 'TRIAGED_ISSUES=...' > "$GITHUB_ENV"` (Replace `...` with the final, minified JSON array string).
88
+
Assemble the results into a single JSON array, formatted as a string, according to the **Output Specification** below. Finally, execute the command to write this string to the output file.
89
+
90
+
- Use the shell command to write using a heredoc to prevent quote escaping issues:
91
+
```bash
92
+
cat << 'EOF' >> "$GITHUB_ENV"
93
+
TRIAGED_ISSUES=...
94
+
EOF
95
+
```
96
+
(Replace `...` with the final, minified JSON array string).
0 commit comments