tests/benchmarks/mcp_universe/README.md
Lines changed: 30 additions & 49 deletions
@@ -4,9 +4,9 @@ This directory contains the integration of the MCP-Universe repository managemen
 
 ## Overview
 
-MCP-Universe is a comprehensive benchmark from Salesforce AI Research that evaluates LLMs on realistic tasks using real-world MCP servers. This integration focuses on the **repository management domain** with:
+MCP-Universe is a comprehensive benchmark from Salesforce AI Research that evaluates LLMs on realistic tasks using real-world MCP servers. This integration focuses on the repository management domain with:
 
-- **28 pure GitHub tasks** (github_task_0001 through github_task_0030, excluding 0013 and 0020)
+- 28 GitHub tasks
 - Tests realistic GitHub operations including:
 - Creating repositories and branches
 - Managing files and commits
@@ -18,22 +18,21 @@ MCP-Universe is a comprehensive benchmark from Salesforce AI Research that evalu
 
 ### Prerequisites
 
-1. **Docker** - Required to run the GitHub MCP server
 Only the access token is passed to the Docker container. The account name is used locally by the evaluator for template substitution in task assertions (e.g., checking `{{GITHUB_PERSONAL_ACCOUNT_NAME}}/repo-name` exists).
-
-## Troubleshooting
-
-### "Docker not found"
-Ensure Docker Desktop is running and restart your terminal.
-
-### "GITHUB_PERSONAL_ACCESS_TOKEN environment variable not set"
-Export the required environment variables before running tests.
-
-### "repository doesn't exist" (false negative)
-GitHub's search API has indexing delays for newly created repos. The evaluator patches handle this with direct API calls, but occasional failures may occur.
-
-### Rate limiting
-If you hit GitHub API rate limits, wait a few minutes or use a token with higher limits.
-
-### Tests pass but some checks fail
-Review the `*_readable.log` files in the output directory for detailed execution traces.
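
The README text in this diff mentions that the account name is used for template substitution in task assertions, with `{{GITHUB_PERSONAL_ACCOUNT_NAME}}/repo-name` as the example. A minimal Python sketch of that idea follows. It is not the evaluator's actual code: `render_template` and `PLACEHOLDER` are hypothetical names, and the `{{NAME}}` syntax is assumed only from the example in the README.

```python
import re
from typing import Mapping

# Assumption: placeholders look like {{UPPER_SNAKE_CASE}}, as in the README example.
PLACEHOLDER = re.compile(r"\{\{\s*([A-Z0-9_]+)\s*\}\}")


def render_template(text: str, values: Mapping[str, str]) -> str:
    """Replace each {{NAME}} placeholder in text with values[NAME].

    Raises KeyError when a placeholder has no matching value, so a
    missing variable fails loudly instead of leaving a literal
    {{NAME}} in the assertion target.
    """

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        if name not in values:
            raise KeyError(f"{name} is not set")
        return values[name]

    return PLACEHOLDER.sub(substitute, text)


if __name__ == "__main__":
    # Hypothetical account name, used only for illustration.
    values = {"GITHUB_PERSONAL_ACCOUNT_NAME": "octocat"}
    print(render_template("{{GITHUB_PERSONAL_ACCOUNT_NAME}}/repo-name", values))
    # prints: octocat/repo-name
```

In a real run the mapping would presumably come from the environment (for example `dict(os.environ)`), which keeps the account name on the local side, consistent with the note that only the access token is passed to the Docker container.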