Commit 0e43911

zjwu0522 and claude authored
📝 docs: Gemini 3 reasoning-effort warning and beautify README News section (#232)
Co-authored-by: Claude <noreply@anthropic.com>
1 parent c3c92b9 commit 0e43911

2 files changed

Lines changed: 20 additions & 7 deletions

‎README.md‎

Lines changed: 7 additions & 7 deletions
@@ -18,13 +18,13 @@ MCPMark provides a reproducible, extensible benchmark for researchers and engine

 ## News

-- **02/Dec/2025** - Evaluated `gemini-3-pro-preview` (thinking: low): Pass@1 50.6% ± 2.3%, Pass@4 67.7%, Pass^4 31.5% - so close to `gpt-5-high` (51.6%)! Also `deepseek-v3.2-thinking` Pass@1 36.8% ± 1.8%, Pass@4 51.2%, Pass^4 21.3% and `deepseek-v3.2-chat` Pass@1 29.7% ± 1.5%, Pass@4 46.5%, Pass^4 13.4%
-- **02/Dec/2025** - Obfuscate GitHub @mentions to prevent notification spam during evaluation ([#229](https://github.com/eval-sys/mcpmark/pull/229))
-- **01/Dec/2025** - DeepSeek v3.2 release uses MCPMark! Kudos to the DeepSeek team on securing the best open-source model with a significant performance gain. [X Post](https://x.com/deepseek_ai/status/1995452650557763728) | [Technical Report](https://cas-bridge.xethub.hf.co/xet-bridge-us/692cfec93b25b81d09307b94/2d0aa38511b9df084d12a00fe04a96595496af772cb766c516c4e6aee1e21246?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=cas%2F20251203%2Fus-east-1%2Fs3%2Faws4_request&X-Amz-Date=20251203T145756Z&X-Amz-Expires=3600&X-Amz-Signature=31d39c39a42319dba189c1164f8a8bff69e4211b7520b75b7f3d4013a23b3022&X-Amz-SignedHeaders=host&X-Xet-Cas-Uid=634c72e6fe1bfa967d6c2b5c&response-content-disposition=inline%3B+filename*%3DUTF-8%27%27paper.pdf%3B+filename%3D%22paper.pdf%22%3B&response-content-type=application%2Fpdf&x-id=GetObject&Expires=1764777476&Policy=eyJTdGF0ZW1lbnQiOlt7IkNvbmRpdGlvbiI6eyJEYXRlTGVzc1RoYW4iOnsiQVdTOkVwb2NoVGltZSI6MTc2NDc3NzQ3Nn19LCJSZXNvdXJjZSI6Imh0dHBzOi8vY2FzLWJyaWRnZS54ZXRodWIuaGYuY28veGV0LWJyaWRnZS11cy82OTJjZmVjOTNiMjViODFkMDkzMDdiOTQvMmQwYWEzODUxMWI5ZGYwODRkMTJhMDBmZTA0YTk2NTk1NDk2YWY3NzJjYjc2NmM1MTZjNGU2YWVlMWUyMTI0NioifV19&Signature=HxFnQM7j%7EnuD9Qr81qqbXkunCc4nLLmTHv-5EosJu8EqlQ3VRyBibLNz0ur1d9h2SFp1Lvji3tNOQSWZsW%7EMS6wbmN5E4jjgbcXR40oxG4nhcq8Hy5jnHlEcQ9GyV9B0HTeXmQJ32AjkEDymEl9iVISRzEzwiu9J8wQL659QHSU5v81eexEk7LTfETikOdKCUQJy0uNqdDb3N%7Elfegq6XrxuZU5UawtlJYV57g1afkLln0ZYxqkYSEqxRdGwIAbfd1Te2Yi60I%7ELEB3qok4LM2%7E4gBWDBaB%7ESN902sbutiQYuvk6V5tFlSVq3MHaRJfJBCMTZiNtb5JAHKZSyVlGuw__&Key-Pair-Id=K2L8F4GPSG1IFC)
-- **17/Nov/2025** - Added 50 easy tasks (10 per MCP server) for smaller (<100B) open-source models ([#225](https://github.com/eval-sys/mcpmark/pull/225))
-- **31/Oct/2025** - Community PR from insforge: better MCP servers achieve better results with fewer tokens! ([#214](https://github.com/eval-sys/mcpmark/pull/214))
-- **13/Oct/2025** - Added ReAct agent support. PRs for new agent scaffolds welcome! ([#209](https://github.com/eval-sys/mcpmark/pull/209))
-- **10/Sep/2025** - `qwen-3-coder-plus` is the best open-source model! Kudos to Qwen team. [X Post](https://x.com/Alibaba_Qwen/status/1965457023438651532)
+- 🏅 **02 Dec** — Evaluated `gemini-3-pro-preview` (thinking: low): **Pass@1 50.6%** ± 2.3% — so close to `gpt-5-high` (51.6%)! Also `deepseek-v3.2-thinking` 36.8% and `deepseek-v3.2-chat` 29.7%
+- 🔥 **02 Dec** — Obfuscate GitHub @mentions to prevent notification spam during evaluation ([#229](https://github.com/eval-sys/mcpmark/pull/229))
+- 🏅 **01 Dec** — DeepSeek v3.2 uses MCPMark! Kudos on securing the best open-source model. [X Post](https://x.com/deepseek_ai/status/1995452650557763728) | [Technical Report](https://huggingface.co/deepseek-ai/DeepSeek-V3.2/resolve/main/assets/paper.pdf)
+- 🔥 **17 Nov** — Added 50 easy tasks (10 per MCP server) for smaller open-source models ([#225](https://github.com/eval-sys/mcpmark/pull/225))
+- 🤝 **31 Oct** — Community PR from insforge: better MCP servers achieve better results with fewer tokens! ([#214](https://github.com/eval-sys/mcpmark/pull/214))
+- 🔥 **13 Oct** — Added ReAct agent support. PRs for new agent scaffolds welcome! ([#209](https://github.com/eval-sys/mcpmark/pull/209))
+- 🏅 **10 Sep** — `qwen-3-coder-plus` is the best open-source model! Kudos to Qwen team. [X Post](https://x.com/Alibaba_Qwen/status/1965457023438651532)

 ---
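
Reviewer note: the News entries quote Pass@1, Pass@4, and Pass^4, but this diff does not define them. The sketch below shows one common reading, which may or may not match how MCPMark computes its numbers: Pass@1 as the mean per-run success rate, Pass@k as the share of tasks solved in at least one of k runs, and Pass^k as the share solved in all k runs. The function name `pass_metrics` and the toy data are hypothetical and are not taken from the repository.

```python
from typing import Dict, List


def pass_metrics(results: Dict[str, List[bool]]) -> Dict[str, float]:
    """Aggregate per-task run outcomes into three pass-rate metrics.

    `results` maps a task name to the success flag of each of its k runs.
    The definitions used here are assumptions, not MCPMark's documented ones.
    """
    n_tasks = len(results)
    # Mean success rate over runs, averaged over tasks.
    pass_at_1 = sum(sum(runs) / len(runs) for runs in results.values()) / n_tasks
    # A task counts if at least one run succeeded.
    pass_at_k = sum(any(runs) for runs in results.values()) / n_tasks
    # A task counts only if every run succeeded.
    pass_hat_k = sum(all(runs) for runs in results.values()) / n_tasks
    return {"pass@1": pass_at_1, "pass@k": pass_at_k, "pass^k": pass_hat_k}


# Toy data: four hypothetical tasks, four runs each.
toy_results = {
    "task_a": [True, True, True, True],
    "task_b": [True, False, False, False],
    "task_c": [False, False, False, False],
    "task_d": [True, True, False, True],
}
metrics = pass_metrics(toy_results)
print(metrics)
```

Under these assumed definitions Pass^k can never exceed Pass@1, and Pass@1 can never exceed Pass@k, which matches the ordering of the figures quoted in the removed lines (31.5% < 50.6% < 67.7%).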

‎src/agents/base_agent.py‎

Lines changed: 13 additions & 0 deletions
@@ -66,6 +66,14 @@ def __init__(
             self.litellm_input_model_name,
         )

+        # Warn if Gemini 3 model uses unsupported reasoning_effort value
+        if self._is_gemini_3_model() and self.reasoning_effort not in ["default", "low", "high"]:
+            logger.warning(
+                "Gemini 3 models only support reasoning_effort 'low' or 'high', "
+                "got '%s'. LiteLLM may map this to the nearest supported value.",
+                self.reasoning_effort,
+            )
+
     def __repr__(self) -> str:  # pragma: no cover - debug helper
         return (
             f"{self.__class__.__name__}(service='{self.mcp_service}', "
@@ -418,6 +426,11 @@ def _is_gemini_model(self) -> bool:
         model_lower = self.litellm_input_model_name.lower()
         return "gemini" in model_lower or "bison" in model_lower

+    def _is_gemini_3_model(self) -> bool:
+        """Check if this is a Gemini 3 series model."""
+        model_lower = self.litellm_input_model_name.lower()
+        return "gemini-3" in model_lower or "gemini/gemini-3" in model_lower
+
     def _simplify_schema_for_gemini(self, schema: Optional[Dict[str, Any]]) -> Dict[str, Any]:
         if not isinstance(schema, dict):
             return schema or {}
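
Reviewer note: the two hunks above add a model-name check and a warning at construction time. The standalone sketch below mirrors that logic outside the agent class so it can be exercised in isolation. It is an illustration under assumptions, not the repository's code: the free functions `is_gemini_3_model` and `warn_if_unsupported_effort`, the constant `SUPPORTED_GEMINI_3_EFFORTS`, and the logger name are hypothetical. It also uses a single substring test, since any name containing `"gemini/gemini-3"` already contains `"gemini-3"`, so the second clause in the commit appears to be redundant.

```python
import logging

logger = logging.getLogger("gemini3_effort_sketch")  # hypothetical logger name

# Values the commit's check accepts without warning for Gemini 3 models.
SUPPORTED_GEMINI_3_EFFORTS = ("default", "low", "high")


def is_gemini_3_model(model_name: str) -> bool:
    """Return True if the model name contains "gemini-3" (case-insensitive)."""
    # Covers both "gemini-3-pro-preview" and "gemini/gemini-3-pro-preview".
    return "gemini-3" in model_name.lower()


def warn_if_unsupported_effort(model_name: str, reasoning_effort: str) -> bool:
    """Log a warning and return True when the effort is outside the accepted set."""
    if is_gemini_3_model(model_name) and reasoning_effort not in SUPPORTED_GEMINI_3_EFFORTS:
        logger.warning(
            "Gemini 3 models only support reasoning_effort 'low' or 'high', "
            "got '%s'. LiteLLM may map this to the nearest supported value.",
            reasoning_effort,
        )
        return True
    return False


if __name__ == "__main__":
    logging.basicConfig(level=logging.WARNING)
    warn_if_unsupported_effort("gemini/gemini-3-pro-preview", "medium")  # warns
    warn_if_unsupported_effort("gemini-3-pro-preview", "low")            # silent
```

Returning a boolean is a choice made here to keep the sketch easy to test. The commit itself only logs and does not change `reasoning_effort`, so any remapping to a supported value is left to LiteLLM.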
