Skip to content

fix: resolve OAuth port conflict on re-auth in stdio mode#606

Open
Alexheath21 wants to merge 1 commit into
taylorwilsdon:mainfrom
Alexheath21:fix/oauth-port-conflict-reauth
Open

fix: resolve OAuth port conflict on re-auth in stdio mode#606
Alexheath21 wants to merge 1 commit into
taylorwilsdon:mainfrom
Alexheath21:fix/oauth-port-conflict-reauth

Conversation

@Alexheath21
Copy link
Copy Markdown

@Alexheath21 Alexheath21 commented Mar 24, 2026

Summary

  • Fixes the OAuth port conflict bug where MinimalOAuthServer fails with "port already in use" when re-authenticating against its own process in stdio mode
  • start() now detects a live server thread even when is_running is stale (set to False by an exception handler while the thread still holds the port)
  • The socket bind fallback probes is_actually_running() before erroring, recovering when the port is held by our own server
  • stop() no longer short-circuits on is_running=False, ensuring proper cleanup of the uvicorn server and thread

Fixes #546

Root Cause

In stdio mode, MinimalOAuthServer starts a uvicorn server on a daemon thread. When the server thread encounters an error, it sets is_running = False but the thread and port binding persist. On the next re-auth attempt:

  1. start() sees is_running=False and skips the "already running" check
  2. The socket bind check fails because the port is held by the previous server thread in our own process
  3. start() returns "Port 8000 is already in use" — blocking all subsequent MCP calls until restart

Fix Details

Three targeted changes to auth/oauth_callback_server.py:

  1. start() — thread-alive check before is_running flag: Checks server_thread.is_alive() first, which catches the stale-flag scenario. If the thread is alive and the port responds, reuses the server.

  2. start() — bind failure recovery: When the socket bind fails with EADDRINUSE, probes is_actually_running() before giving up. If the port is responding (our own server), marks it as running and succeeds.

  3. stop() — unconditional cleanup: Removed the if not self.is_running: return guard so stop() always signals should_exit and joins the thread, preventing zombie server threads.

Test Plan

  • Added test_start_reuses_server_when_thread_alive_and_port_responding — reproduces the exact re-auth bug scenario
  • Added test_start_recovers_when_port_in_use_by_own_server — covers the bind-failure recovery path
  • Added test_stop_cleans_up_even_when_is_running_false — verifies stop() works with stale flag
  • All 4 existing tests continue to pass
  • Full test suite: 309 passed (2 pre-existing failures unrelated to this change)

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes

    • Improved OAuth server startup reliability by better detecting and reusing running instances when the internal state flag is stale.
    • Enhanced port-binding failure handling to recover when a server instance already occupies the port.
    • Strengthened server shutdown to complete cleanly even with inconsistent internal state.
  • Tests

    • Added comprehensive test coverage for OAuth server lifecycle edge cases and error recovery scenarios.

…inst itself

In stdio mode, when a token expires and the MCP server needs to
re-authenticate, ensure_oauth_callback_available() calls start() on
MinimalOAuthServer. The server thread may still be alive and holding the
port even though is_running was set to False by an exception handler —
causing start() to fail with "port already in use" against its own
process.

Three fixes:
- start() now checks server_thread.is_alive() before the is_running flag,
  detecting and reusing a server whose flag is stale
- The socket bind fallback now probes is_actually_running() before
  returning an error, recovering when the port is held by our own server
- stop() no longer short-circuits on is_running=False, ensuring the
  uvicorn server and thread are properly cleaned up even when the flag
  is stale

Fixes taylorwilsdon#546

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Mar 24, 2026

📝 Walkthrough

Walkthrough

Modified MinimalOAuthServer.start() and stop() methods to tolerate stale is_running flags by verifying port responsiveness before reusing or restarting the server. Added fallback logic to detect and reuse an existing bound server when socket binding fails, and ensured stop operations complete regardless of the current flag state.

Changes

Cohort / File(s) Summary
OAuth Server Lifecycle Handling
auth/oauth_callback_server.py
Modified start() to check is_actually_running() when is_running flag is stale or bind fails, enabling server reuse instead of failure. Updated stop() to perform shutdown operations regardless of is_running state and set the flag to False even on exception.
OAuth Server Lifecycle Tests
tests/auth/test_oauth_callback_server.py
Added three test cases covering: stale is_running flag with live thread recovery, socket bind failure with active port detection, and stop operation with initial False flag state.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Poem

🐰 A stale flag once brought the server down,
But now we check the port all 'round!
When binding fails or threads won't cease,
TCP health brings blessed peace. ✨

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description is comprehensive and well-structured, covering summary, root cause analysis, fix details, and test plan. However, it does not follow the provided repository template (missing structured sections like Type of Change, Testing checkboxes, and Checklist items). Reformat the description to match the repository template, including Type of Change, Testing section with checkboxes, and the required Checklist including the 'Allow edits from maintainers' acknowledgment.
Docstring Coverage ⚠️ Warning Docstring coverage is 29.41% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'fix: resolve OAuth port conflict on re-auth in stdio mode' clearly describes the main change—a fix for the OAuth server port conflict issue that occurs during re-authentication in stdio mode, which is the primary objective of this PR.
Linked Issues check ✅ Passed The PR successfully addresses the primary coding requirement of issue #546 (Bug 1: Port conflict) by implementing thread-alive checks and bind-failure recovery logic. However, it does not address Bug 2 (OAuth state mismatch across processes), which is noted as a separate concern requiring persistent state sharing.
Out of Scope Changes check ✅ Passed All code changes directly relate to fixing the port conflict bug in MinimalOAuthServer. The modifications to start(), stop(), and related tests are all in scope for issue #546's Bug 1 (port conflict). No out-of-scope changes detected.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@auth/oauth_callback_server.py`:
- Around line 240-249: The except block currently catches all OSError instances;
update it to catch OSError as exc and only perform the "reuse" fallback when
exc.errno == errno.EADDRINUSE (matching the logic in is_actually_running()). If
errno matches, log the reuse message, set self.is_running = True and return
True, ""; otherwise handle the other errors normally (e.g., build the error_msg
using self.port and hostname or re-raise/return failure). Ensure errno is
imported where oauth_callback_server.py defines this logic.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 1c014115-734c-4b27-8581-38a8c3d4cc10

📥 Commits

Reviewing files that changed from the base of the PR and between c29ad41 and bc6229c.

📒 Files selected for processing (2)
  • auth/oauth_callback_server.py
  • tests/auth/test_oauth_callback_server.py

Comment on lines 240 to 249
except OSError:
# Before failing, check if the port is held by a previous server
# instance in our own process that we can still use for callbacks.
if self.is_actually_running():
logger.info(
f"Port {self.port} is in use by our own OAuth server, reusing"
)
self.is_running = True
return True, ""
error_msg = f"Port {self.port} is already in use on {hostname}. Cannot start minimal OAuth server."
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

fd oauth_callback_server.py

Repository: taylorwilsdon/google_workspace_mcp

Length of output: 148


🏁 Script executed:

head -20 auth/oauth_callback_server.py && wc -l auth/oauth_callback_server.py

Repository: taylorwilsdon/google_workspace_mcp

Length of output: 548


🏁 Script executed:

sed -n '230,260p' auth/oauth_callback_server.py

Repository: taylorwilsdon/google_workspace_mcp

Length of output: 1364


🏁 Script executed:

grep -n "import errno\|from errno" auth/oauth_callback_server.py

Repository: taylorwilsdon/google_workspace_mcp

Length of output: 92


🏁 Script executed:

sed -n '1,400p' auth/oauth_callback_server.py | grep -n "is_actually_running" -A 10 -B 2

Repository: taylorwilsdon/google_workspace_mcp

Length of output: 3314


🏁 Script executed:

sed -n '200,280p' auth/oauth_callback_server.py

Repository: taylorwilsdon/google_workspace_mcp

Length of output: 3461


🏁 Script executed:

grep -n "except OSError\|except " auth/oauth_callback_server.py

Repository: taylorwilsdon/google_workspace_mcp

Length of output: 407


🏁 Script executed:

sed -n '155,175p' auth/oauth_callback_server.py

Repository: taylorwilsdon/google_workspace_mcp

Length of output: 772


🏁 Script executed:

sed -n '135,170p' auth/oauth_callback_server.py

Repository: taylorwilsdon/google_workspace_mcp

Length of output: 1491


Narrow bind-recovery fallback to EADDRINUSE only.

Line 240 currently catches any OSError, but the reuse fallback should only apply to address-in-use conflicts. Other bind failures (e.g., permission denied, invalid address) should not be treated as reusable-running. This is inconsistent with the is_actually_running() method (line 163), which already checks if exc.errno == errno.EADDRINUSE for the same scenario.

💡 Suggested fix
-        except OSError:
+        except OSError as exc:
             # Before failing, check if the port is held by a previous server
             # instance in our own process that we can still use for callbacks.
-            if self.is_actually_running():
+            if exc.errno == errno.EADDRINUSE and self.is_actually_running():
                 logger.info(
                     f"Port {self.port} is in use by our own OAuth server, reusing"
                 )
                 self.is_running = True
                 return True, ""
-            error_msg = f"Port {self.port} is already in use on {hostname}. Cannot start minimal OAuth server."
+            if exc.errno == errno.EADDRINUSE:
+                error_msg = (
+                    f"Port {self.port} is already in use on {hostname}. "
+                    "Cannot start minimal OAuth server."
+                )
+            else:
+                error_msg = (
+                    f"Failed to bind minimal OAuth server on {hostname}:{self.port}: {exc}"
+                )
             logger.error(error_msg)
             return False, error_msg
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@auth/oauth_callback_server.py` around lines 240 - 249, The except block
currently catches all OSError instances; update it to catch OSError as exc and
only perform the "reuse" fallback when exc.errno == errno.EADDRINUSE (matching
the logic in is_actually_running()). If errno matches, log the reuse message,
set self.is_running = True and return True, ""; otherwise handle the other
errors normally (e.g., build the error_msg using self.port and hostname or
re-raise/return failure). Ensure errno is imported where
oauth_callback_server.py defines this logic.

@taylorwilsdon
Copy link
Copy Markdown
Owner

There was a PR that the author closed looking to do something similar in #590

Will take a look. Some of the coderabbit feedback looks actionable before merge!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: Multiple stdio processes cause OAuth callback server port conflict and state mismatch

2 participants