fix: return MCP connection errors to LLM instead of raising by jsonmp-k8 · Pull Request #1531 · kagent-dev/kagent

jsonmp-k8 · 2026-03-20T20:23:51Z

Summary

Wrap McpTool instances with ConnectionSafeMcpTool that catches persistent connection errors and returns them as error text to the LLM
Catches ConnectionError (stdlib), TimeoutError (stdlib), httpx.TransportError (httpx network/timeout/protocol errors), and McpError (MCP session stream drops and read timeouts)
The error message includes the tool name, error type, and instructs the LLM not to retry
KAgentMcpToolset.get_tools() automatically wraps all McpTool instances

Root cause

When an MCP HTTP tool call fails with "connection reset by peer", the error propagates up to the ADK flow handler, which sends it back to the LLM as a function error. The LLM interprets this as a transient failure and retries the same tool call — creating a tight loop of LLM call → tool call → connection error → LLM call for up to max_llm_calls (500) iterations, burning 100% CPU.

The MCP client wraps transport-level errors into McpError via mcp.shared.session.send_request() before they reach the tool, so catching only stdlib/httpx errors is insufficient — McpError must also be handled.

Testing

python -m pytest python/packages/kagent-adk/tests/unittests/test_mcp_connection_error_handling.py -v (10 tests)
python -m pytest python/packages/kagent-adk/tests/unittests/ -v (170 passed)

Test coverage:

ConnectionResetError, ConnectionRefusedError, TimeoutError — caught, returned as error dict
httpx.ConnectError, httpx.ReadError, httpx.ConnectTimeout — caught via httpx.TransportError
McpError (session read timeout) — caught, returned as error dict
ValueError, CancelledError — still raised (not connection errors)
KAgentMcpToolset.get_tools() wraps McpTool → ConnectionSafeMcpTool

Fixes #1530

Copilot

Pull request overview

Wraps MCP tools to gracefully surface persistent connection failures to the LLM as normal tool output (instead of raising), preventing tight retry loops and high CPU usage in static agent runs (Fixes #1530).

Changes:

Add ConnectionSafeMcpTool that catches connection-related exceptions and returns an error payload instructing the LLM not to retry.
Update KAgentMcpToolset.get_tools() to wrap returned McpTool instances with ConnectionSafeMcpTool.
Add unit tests covering connection vs non-connection error behavior (including CancelledError propagation).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File	Description
`python/packages/kagent-adk/src/kagent/adk/_mcp_toolset.py`	Introduces `ConnectionSafeMcpTool` and wraps MCP tools returned by `KAgentMcpToolset` to avoid raising persistent connection failures.
`python/packages/kagent-adk/tests/unittests/test_mcp_connection_error_handling.py`	Adds pytest coverage ensuring connection errors are returned as error text while other exceptions still propagate.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

python/packages/kagent-adk/src/kagent/adk/_mcp_toolset.py

python/packages/kagent-adk/tests/unittests/test_mcp_connection_error_handling.py

python/packages/kagent-adk/src/kagent/adk/_mcp_toolset.py

jsonmp-k8 · 2026-03-24T21:38:12Z

@EItanya can you please review this ?

EItanya

Thanks for the thorough root cause analysis and fix, @jsonmp-k8! The approach of catching connection errors at the tool level to prevent the retry spin loop makes sense. A few observations:

Concerns

1. `McpError` may be too broad

McpError is the general MCP protocol error class, not just connection/transport errors. Catching all McpError could silently swallow legitimate protocol-level errors (e.g., invalid tool arguments, server-side validation failures) that the LLM should actually see as actionable errors to correct its behavior. Could this be narrowed — e.g., by inspecting the error code or message to distinguish transport-level failures from protocol-level ones?

2. The `new` + `dict` copy pattern is fragile

The wrapping creates ConnectionSafeMcpTool instances by bypassing __init__ and copying __dict__. This works today since McpTool uses plain instance attributes, but it's tightly coupled to the internal implementation of McpTool (from google-adk). If upstream adds __slots__, properties, or post-init hooks, this will break silently. A composition approach (delegating to the inner tool rather than re-typing the instance) would be more robust against upstream changes.

3. Add `exc_info` to the error log

The logger.error(error_message) call loses the traceback, which makes debugging intermittent network issues harder for operators. Consider:

logger.error(error_message, exc_info=error)

4. Minor style: `Dict` import unnecessary

The file already has from __future__ import annotations, so dict[str, Any] works without importing Dict from typing.

Comment left by Claude on behalf of @EItanya

Address review feedback on kagent-dev#1531: - Only catch transport-level McpErrors (timeouts, stream drops) via keyword inspection; protocol-level McpErrors (invalid args, validation) now propagate so the LLM can correct its behavior - Replace fragile __new__ + __dict__ copy with composition pattern: store inner McpTool and delegate via __getattr__ - Add exc_info to logger.error() for operator-visible tracebacks - Remove unused Dict import, use lowercase dict[str, Any] - Simplify test setup: mock inner tool directly, remove patch boilerplate - Add test_protocol_mcp_error_still_raises to verify narrowing Signed-off-by: Jaison Paul <paul.jaison@gmail.com>

When an MCP HTTP tool call fails with a persistent connection error (e.g. "connection reset by peer"), the error propagates to the LLM as a function error. The LLM interprets this as transient and retries the same tool call, creating a tight loop that burns 100% CPU for up to max_llm_calls (500) iterations. Wrap McpTool instances with ConnectionSafeMcpTool that catches connection errors (ConnectionError, TimeoutError, httpx.TransportError, McpError) and returns them as error text. This lets the LLM inform the user about the failure instead of retrying indefinitely. Fixes kagent-dev#1530 Signed-off-by: Jaison Paul <paul.jaison@gmail.com>

Address review feedback on kagent-dev#1531: - Only catch transport-level McpErrors (timeouts, stream drops) via keyword inspection; protocol-level McpErrors (invalid args, validation) now propagate so the LLM can correct its behavior - Replace fragile __new__ + __dict__ copy with composition pattern: store inner McpTool and delegate via __getattr__ - Add exc_info to logger.error() for operator-visible tracebacks - Remove unused Dict import, use lowercase dict[str, Any] - Simplify test setup: mock inner tool directly, remove patch boilerplate - Add test_protocol_mcp_error_still_raises to verify narrowing Signed-off-by: Jaison Paul <paul.jaison@gmail.com>

jsonmp-k8 · 2026-04-07T04:16:34Z

Thanks for the detailed review, @EItanya! All four points addressed in 996ceb3:

1. McpError narrowing

Removed McpError from _CONNECTION_ERROR_TYPES and now catch it separately. Added _is_transport_mcp_error() that checks the error message for transport-related keywords (timeout, connection, reset, closed, stream, etc.). Protocol-level McpErrors (e.g., invalid tool arguments) are re-raised so the LLM can correct itself. Added test_protocol_mcp_error_still_raises to verify.

2. Composition over `new` + `dict`

ConnectionSafeMcpTool now stores the inner tool and delegates via __getattr__. Wrapping is just ConnectionSafeMcpTool(tool). This is robust against upstream __slots__, properties, or post-init changes.

3. `exc_info` on error log

Done — logger.error(error_message, exc_info=error).

4. `Dict` import

Removed — using lowercase dict[str, Any] with from __future__ import annotations.

Signed-off-by: Jaison Paul <paul.jaison@gmail.com>

Copilot AI review requested due to automatic review settings March 20, 2026 20:23

jsonmp-k8 requested review from EItanya, peterj and yuval-k as code owners March 20, 2026 20:23

Copilot started reviewing on behalf of jsonmp-k8 March 20, 2026 20:24 View session

Copilot AI reviewed Mar 20, 2026

View reviewed changes

jsonmp-k8 force-pushed the fix/1530-mcp-tool-call-cpu-spin branch 3 times, most recently from 41c3c17 to aa46632 Compare March 20, 2026 21:11

jmhbh previously approved these changes Mar 23, 2026

View reviewed changes

jsonmp-k8 force-pushed the fix/1530-mcp-tool-call-cpu-spin branch 2 times, most recently from 9fa4fe9 to ae91dd0 Compare March 24, 2026 21:37

jsonmp-k8 force-pushed the fix/1530-mcp-tool-call-cpu-spin branch 3 times, most recently from 5e717fc to f4a74cd Compare April 3, 2026 06:08

EItanya reviewed Apr 6, 2026

View reviewed changes

jsonmp-k8 dismissed jmhbh’s stale review via 996ceb3 April 7, 2026 04:14

jsonmp-k8 added 2 commits April 7, 2026 00:15

jsonmp-k8 force-pushed the fix/1530-mcp-tool-call-cpu-spin branch from 996ceb3 to 48764d8 Compare April 7, 2026 04:15

jsonmp-k8 and others added 2 commits April 7, 2026 00:18

fix: apply ruff formatting

fb5ecbd

Signed-off-by: Jaison Paul <paul.jaison@gmail.com>

Merge branch 'main' into fix/1530-mcp-tool-call-cpu-spin

544815a

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: return MCP connection errors to LLM instead of raising#1531

fix: return MCP connection errors to LLM instead of raising#1531
jsonmp-k8 wants to merge 4 commits intokagent-dev:mainfrom
jsonmp-k8:fix/1530-mcp-tool-call-cpu-spin

jsonmp-k8 commented Mar 20, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jsonmp-k8 commented Mar 24, 2026

Uh oh!

EItanya left a comment

Uh oh!

jsonmp-k8 commented Apr 7, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

jsonmp-k8 commented Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Root cause

Testing

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

jsonmp-k8 commented Mar 24, 2026

Uh oh!

EItanya left a comment

Choose a reason for hiding this comment

Concerns

1. McpError may be too broad

2. The __new__ + __dict__ copy pattern is fragile

3. Add exc_info to the error log

4. Minor style: Dict import unnecessary

Uh oh!

jsonmp-k8 commented Apr 7, 2026

1. McpError narrowing

2. Composition over __new__ + __dict__

3. exc_info on error log

4. Dict import

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

jsonmp-k8 commented Mar 20, 2026 •

edited

Loading

1. `McpError` may be too broad

2. The `new` + `dict` copy pattern is fragile

3. Add `exc_info` to the error log

4. Minor style: `Dict` import unnecessary

2. Composition over `new` + `dict`

3. `exc_info` on error log

4. `Dict` import