fix: resolve Windows stdio deadlocks caused by grpc.aio and google.auth#139
fix: resolve Windows stdio deadlocks caused by grpc.aio and google.auth#139gitrishiom wants to merge 2 commits intogoogleanalytics:mainfrom
Conversation
- Replaced grpc.aio async clients with synchronous variants offloaded via asyncio.to_thread to prevent transport starvation on Windows pipes. - Provided fallback logic for google.auth.default() to prevent blocking subprocess spawns during credential resolution.
ZLeventer
left a comment
There was a problem hiding this comment.
This is a real issue — grpc.aio and the Google auth library both try to manage their own event loops, and on Windows this creates deadlocks when the MCP server's stdio transport is also running an event loop.
The approach of dropping to sync clients and wrapping in asyncio.to_thread() is a pragmatic fix. It avoids the event loop collision by keeping gRPC off the async event loop entirely, while still presenting async interfaces to the MCP framework.
A few observations:
-
Import placement —
import asynciois repeated inside each function body. Since it's a stdlib module with negligible import cost, moving it to the top-level import block would be cleaner. Not a blocker, just style. -
Client creation per call —
create_admin_api_client_sync()is called fresh inside each_fetch()closure. If the sync client creation is expensive (auth token fetch, channel setup), consider caching the sync client the same way the async client was cached at module level. This matters for repeated tool calls in a single MCP session. -
Thread pool sizing —
to_threaduses the default executor. For an MCP server handling one request at a time this is fine, but worth noting if concurrent tool calls are ever supported.
The core fix is sound — this unblocks Windows users who were getting hung MCP servers.
ZLeventer
left a comment
There was a problem hiding this comment.
The asyncio.to_thread + sync client approach is a legitimate solution to the grpc.aio / Windows stdio deadlock. A few things worth addressing before merge:
Debug logging left in production code (blocker)
_create_credentials now contains multiple print(..., file=sys.stderr) statements that will write noise to every MCP client's stderr on every tool call:
print("_create_credentials started", file=sys.stderr)
print(f"Loading credentials directly from {path}", file=sys.stderr)These should be removed before merge.
Module-level imports moved inside function
import sys, import os, and import google.oauth2.credentials are now inside _create_credentials. Python caches imports so it's not a performance issue, but it's inconsistent with the rest of the codebase and harder to spot during review. Worth moving to module level.
Docstrings removed from client factories
The sync client factory functions dropped their docstrings. The originals documented the credential scope used — worth keeping.
Alternative worth considering
Rather than converting all clients to sync + asyncio.to_thread unconditionally, a targeted approach would only apply the workaround on Windows:
import sys
if sys.platform == 'win32':
return await asyncio.to_thread(_fetch_sync)
else:
response = await async_client.run_report(request)This avoids adding thread-pool overhead on Linux/macOS where grpc.aio works fine, and makes the Windows-specific workaround more visible.
The underlying diagnosis (grpc.aio + stdio deadlock on Windows) is correct and this is a real user-affecting bug. Happy to see it fixed — the approach just needs cleanup before it's mergeable.
- Remove all debug print() statements from _create_credentials (blocker) - Move import asyncio, sys, os, google.oauth2.credentials to top-level imports - Restore docstrings on all client factory functions - Keep both sync and async client factories; async clients remain for Linux/macOS - Apply Windows-only workaround via sys.platform == 'win32' conditional: uses sync client + asyncio.to_thread on Windows, native async on other platforms - Avoids thread-pool overhead on Linux/macOS where grpc.aio works correctly
|
Thanks for the detailed review @ZLeventer! I've pushed a follow-up commit addressing all your feedback:
|
Description
This PR resolves a critical starvation/deadlock issue when running the MCP server over standard I/O (stdio) on Windows.
The Problem
The Solution
This has been tested locally with Google Antigravity on Windows 11 with the analytics-mcp connected to a Gemini Code Assist instance, resulting in perfect stability where it previously hung indefinitely.