Skip to content

Commit 5b2872c

Browse files
alonelishclaudeRoiGlinik
authored
ROB-4013 MCP auth health check: auto-detect + enable GitHub context (supersedes HolmesGPT#2117) (HolmesGPT#2118)
> Split: the `/api/chat` raw-bytes-body fix that previously lived here has moved to HolmesGPT#2120. This PR is now MCP-only. Supersedes HolmesGPT#2117 (originally by @aantn); its commits are included. ## Summary Makes MCP toolset authentication validation work end-to-end. **From HolmesGPT#2117 (@aantn) — `health_check_tool` foundation** - Adds an opt-in `health_check_tool` config field to MCP toolsets. MCP servers (e.g. GitHub) return their full tool list even with an invalid credential, so listing tools never proves auth. When set, the named read-only tool is invoked during prerequisite evaluation; a failure (e.g. 401) marks the toolset disabled instead of "connected". Wires `get_me` / `get_current_user` into the helm chart for the built-in GitHub / GitLab addons. **Follow-up (this PR)** 1. **Auto-detect the health check tool.** Toolset configs generated outside the helm template (e.g. Robusta's `custom_toolset.yaml`) don't set `health_check_tool`, so the check was skipped. When unset, Holmes now falls back to an allowlist of well-known read-only identity tools (`get_me`, `get_current_user`, `get_authenticated_user`, `whoami`) and invokes the first one the server exposes. Explicit config still takes precedence. 2. **Enable the GitHub `context` toolset by default.** `get_me` only exists when the GitHub MCP server's `context` toolset is enabled, but the chart default (`repos,issues,pull_requests,actions`) excluded it — so auto-detection found nothing. Default is now `repos,issues,pull_requests,actions,context` (read-only/lightweight). Adds a debug log when no identity tool is exposed, so a skipped check is diagnosable. 3. **Detailed failure messages.** Health-check failures now include the tool name and params (`health check tool '<name>' with params {} failed - ...`), per the toolset error-detail contract (CodeRabbit). ## Testing `tests/test_mcp_toolset.py`: health-check, auto-detect, skipped-check-logging, and error-message tests (`TestMCPHealthCheckTool`). Full suite passing. 🤖 Generated with [Claude Code](https://claude.com/claude-code) <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Added authentication credential validation for MCP servers at startup via health-check tools * Health-check tools are auto-detected from identity tools or can be explicitly configured * Improved error messages when authentication validation fails * **Documentation** * New section documenting health-check validation, configuration, and troubleshooting * **Chores** * Updated default Helm values to include context tool for authentication validation <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Signed-off-by: Claude <noreply@anthropic.com> Signed-off-by: alonelish <alon.elish@gmail.com> Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Roi Glinik <groi.tech@gmail.com>
1 parent bf88efe commit 5b2872c

5 files changed

Lines changed: 515 additions & 2 deletions

File tree

docs/data-sources/remote-mcp-servers.md

Lines changed: 26 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -391,6 +391,32 @@ MCP servers can forward HTTP headers from the incoming request to the MCP backen
391391

392392
For full details on template syntax, blocked headers, precedence rules, and examples for other toolset types, see [HTTP Header Propagation](header-propagation.md).
393393

394+
**Validating Authentication at Startup (Health Check)**
395+
396+
Many MCP servers return their full tool list even when the configured credential is invalid (a bad token, expired key, etc.). Because of this, simply connecting and listing tools does not prove that authentication works — the toolset can appear "connected" while every real call would fail.
397+
398+
To catch this at startup, Holmes can invoke a single read-only tool during the toolset's health check. If the call fails (e.g. `401 Unauthorized`), the toolset is marked **disabled** with the underlying error surfaced, instead of showing as enabled.
399+
400+
- **Automatic:** if the server exposes a well-known identity tool — `get_me`, `get_current_user`, `get_authenticated_user`, or `whoami` — Holmes auto-detects and uses it. No configuration needed.
401+
- **Explicit:** for any other server, set `health_check_tool` to a tool of your choice. This takes precedence over auto-detection.
402+
403+
The chosen tool **must be read-only, take no required arguments** (it is called with empty arguments `{}`), and actually exercise the credential. If `health_check_tool` is unset and the server exposes none of the known identity tools, the auth health check is skipped and the toolset loads as long as listing tools succeeds.
404+
405+
```yaml
406+
mcp_servers:
407+
my_server:
408+
description: "My custom MCP server"
409+
config:
410+
url: "http://my-mcp:8000/mcp"
411+
mode: streamable-http
412+
headers:
413+
Authorization: "Bearer {{ env.MY_API_KEY }}"
414+
# Invoked with empty args at startup to verify the credential is valid.
415+
# Must be a read-only, no-argument tool exposed by your server.
416+
health_check_tool: get_current_user
417+
llm_instructions: "..."
418+
```
419+
394420
## Configuration Format Migration
395421
396422
The MCP server configuration format has been updated. The `url` field must now be inside the `config` section.

helm/holmes/templates/toolset-config.yaml

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -106,6 +106,7 @@ data:
106106
"url" (printf "http://%s-github-mcp-server.%s.svc.cluster.local:8000/mcp" .Release.Name .Release.Namespace)
107107
"mode" "streamable-http"
108108
"icon_url" "https://raw.githubusercontent.com/gilbarbara/logos/de2c1f96ff6e74ea7ea979b43202e8d4b863c655/logos/github.svg"
109+
"health_check_tool" "get_me"
109110
}}
110111
{{- $githubMcpServers := dict "github" (dict
111112
"description" "GitHub MCP Server - access repositories, pull requests, issues, and GitHub Actions. Debug CI failures, search code, and delegate tasks to Copilot."
@@ -119,6 +120,7 @@ data:
119120
"url" (printf "http://%s-github-mcp-server.%s.svc.cluster.local:8000/sse" .Release.Name .Release.Namespace)
120121
"mode" "sse"
121122
"icon_url" "https://raw.githubusercontent.com/gilbarbara/logos/de2c1f96ff6e74ea7ea979b43202e8d4b863c655/logos/github.svg"
123+
"health_check_tool" "get_me"
122124
}}
123125
{{- $githubMcpServers := dict "github" (dict
124126
"description" "GitHub MCP Server - access repositories, pull requests, issues, and GitHub Actions. Debug CI failures, search code, and delegate tasks to Copilot."
@@ -136,6 +138,7 @@ data:
136138
"url" (printf "http://%s-gitlab-mcp-server.%s.svc.cluster.local:8000/sse" .Release.Name .Release.Namespace)
137139
"mode" "sse"
138140
"icon_url" "https://raw.githubusercontent.com/gilbarbara/logos/de2c1f96ff6e74ea7ea979b43202e8d4b863c655/logos/gitlab.svg"
141+
"health_check_tool" "get_current_user"
139142
)
140143
"llm_instructions" (include "holmes.gitlabMcp.llmInstructions" . | trim)
141144
)

helm/holmes/values.yaml

Lines changed: 5 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -476,9 +476,12 @@ mcpAddons:
476476

477477
# Toolsets to enable (comma-separated list of toolset names)
478478
# Available: repos, issues, pull_requests, actions, code_security, users, context, etc.
479-
# Default: repos, issues, pull_requests, actions (covers most investigation needs)
479+
# Default: repos, issues, pull_requests, actions, context (covers most investigation needs).
480+
# 'context' exposes the read-only get_me tool, which Holmes uses to validate
481+
# the GitHub token at startup (a bad PAT marks the toolset disabled instead of
482+
# appearing connected). Keep 'context' enabled unless you have a reason not to.
480483
# Ignored when `tools` below is set (tools takes precedence as a hard allowlist).
481-
toolsets: "repos,issues,pull_requests,actions"
484+
toolsets: "repos,issues,pull_requests,actions,context"
482485

483486
# Individual tools to enable (comma-separated, optional).
484487
# When set, restricts Holmes to EXACTLY this list of tools — takes precedence

holmes/plugins/toolsets/mcp/toolset_mcp.py

Lines changed: 124 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -113,6 +113,20 @@ class MCPMode(str, Enum):
113113
STDIO = "stdio"
114114

115115

116+
# Well-known, read-only "who am I" tools used to verify MCP authentication when
117+
# no health_check_tool is explicitly configured. MCP servers commonly expose an
118+
# authenticated-identity endpoint (e.g. GitHub's get_me, GitLab's
119+
# get_current_user). Calling one with empty arguments is a side-effect-free way
120+
# to confirm credentials (such as an API token) are actually valid, since
121+
# list_tools succeeds even with a bad token. Order reflects matching priority.
122+
DEFAULT_HEALTH_CHECK_TOOLS: List[str] = [
123+
"get_me",
124+
"get_current_user",
125+
"get_authenticated_user",
126+
"whoami",
127+
]
128+
129+
116130

117131

118132

@@ -164,6 +178,14 @@ class MCPConfig(ToolsetConfig):
164178
title="OAuth",
165179
description="OAuth authorization_code configuration. When set, users authenticate via browser before tools can be used.",
166180
)
181+
health_check_tool: Optional[str] = Field(
182+
default=None,
183+
title="Health Check Tool",
184+
description="Name of a read-only tool to invoke during health check to verify authentication works. "
185+
"If set, this tool will be called with empty arguments after loading tools to ensure the "
186+
"connection is fully functional (e.g., API token is valid). Example: 'get_me' for GitHub MCP.",
187+
examples=["get_me", "get_current_user"],
188+
)
167189

168190
def get_lock_string(self) -> str:
169191
return str(self.url)
@@ -201,6 +223,14 @@ class StdioMCPConfig(ToolsetConfig):
201223
description="Icon URL for this MCP server, displayed in the UI for tool calls.",
202224
examples=["https://cdn.simpleicons.org/github/181717"],
203225
)
226+
health_check_tool: Optional[str] = Field(
227+
default=None,
228+
title="Health Check Tool",
229+
description="Name of a read-only tool to invoke during health check to verify authentication works. "
230+
"If set, this tool will be called with empty arguments after loading tools to ensure the "
231+
"connection is fully functional (e.g., API token is valid). Example: 'get_me' for GitHub MCP.",
232+
examples=["get_me", "get_current_user"],
233+
)
204234

205235
def get_lock_string(self) -> str:
206236
return str(self.command)
@@ -932,6 +962,20 @@ def prerequisites_callable(self, config) -> Tuple[bool, str]:
932962
if not self.tools:
933963
logging.warning("mcp server %s loaded 0 tools.", self.name)
934964

965+
# Invoke a read-only tool to verify authentication actually works.
966+
# MCP servers (e.g. GitHub) often return their tool list even with
967+
# invalid credentials, so listing tools alone doesn't prove auth.
968+
# Use the explicitly-configured tool if set, otherwise auto-detect a
969+
# well-known read-only identity tool from the loaded tools.
970+
health_check_tool_name = (
971+
self._mcp_config.health_check_tool
972+
or self._auto_detect_health_check_tool()
973+
)
974+
if health_check_tool_name:
975+
health_check_result = self._run_health_check_tool(health_check_tool_name)
976+
if not health_check_result[0]:
977+
return health_check_result
978+
935979
return (True, "")
936980
except Exception as e:
937981
error_detail = _extract_root_error_message(e)
@@ -941,6 +985,86 @@ def prerequisites_callable(self, config) -> Tuple[bool, str]:
941985
". If the server is still starting up, Holmes will retry automatically",
942986
)
943987

988+
def _auto_detect_health_check_tool(self) -> Optional[str]:
989+
"""Pick a default health-check tool from a known allowlist of read-only
990+
identity tools when none is explicitly configured.
991+
992+
Returns the name of the first allowlisted tool the server exposes, or
993+
None if it exposes none (in which case the auth health check is skipped).
994+
"""
995+
tool_names = {t.name for t in self.tools}
996+
for candidate in DEFAULT_HEALTH_CHECK_TOOLS:
997+
if candidate in tool_names:
998+
logging.info(
999+
"MCP server %s: no health_check_tool configured, auto-detected '%s' for auth validation",
1000+
self.name,
1001+
candidate,
1002+
)
1003+
return candidate
1004+
# No identity tool exposed — the auth health check is skipped. Log it so
1005+
# a silently-skipped check is diagnosable (e.g. GitHub MCP without its
1006+
# 'context' toolset enabled does not expose get_me). Set an explicit
1007+
# health_check_tool in config to validate auth via a different tool.
1008+
logging.debug(
1009+
"MCP server %s: no health_check_tool configured and none of %s are "
1010+
"exposed by the server; skipping auth health check",
1011+
self.name,
1012+
DEFAULT_HEALTH_CHECK_TOOLS,
1013+
)
1014+
return None
1015+
1016+
def _run_health_check_tool(self, tool_name: str) -> Tuple[bool, str]:
1017+
"""Invoke a tool to verify authentication actually works.
1018+
1019+
MCP servers may return a tool list even with invalid credentials (e.g., bad
1020+
GitHub token). This method calls a specified read-only tool with empty
1021+
arguments to verify the connection is fully functional.
1022+
"""
1023+
matching_tools = [t for t in self.tools if t.name == tool_name]
1024+
if not matching_tools:
1025+
available = [t.name for t in self.tools]
1026+
return (
1027+
False,
1028+
f"MCP server {self.name}: health_check_tool '{tool_name}' not found. "
1029+
f"Available tools: {', '.join(available[:10])}{'...' if len(available) > 10 else ''}",
1030+
)
1031+
1032+
try:
1033+
result = asyncio.run(self._call_health_check_tool_async(tool_name))
1034+
if result.isError:
1035+
error_chunks = [
1036+
RemoteMCPTool._extract_text_from_content_block(c)
1037+
for c in result.content
1038+
]
1039+
error_text = " ".join(t for t in error_chunks if t)
1040+
logging.warning(
1041+
"MCP server %s health check failed (tool: %s): %s",
1042+
self.name, tool_name, error_text or "unknown error"
1043+
)
1044+
return (
1045+
False,
1046+
f"MCP server {self.name}: health check tool '{tool_name}' with params {{}} "
1047+
f"failed - {error_text or 'unknown error'}",
1048+
)
1049+
logging.info("MCP server %s health check passed (tool: %s)", self.name, tool_name)
1050+
return (True, "")
1051+
except Exception as e:
1052+
error_detail = _extract_root_error_message(e)
1053+
logging.warning(
1054+
"MCP server %s health check exception (tool: %s): %s",
1055+
self.name, tool_name, error_detail
1056+
)
1057+
return (
1058+
False,
1059+
f"MCP server {self.name}: health check tool '{tool_name}' with params {{}} "
1060+
f"failed - {error_detail}",
1061+
)
1062+
1063+
async def _call_health_check_tool_async(self, tool_name: str):
1064+
"""Call a tool during health check (no request context available)."""
1065+
async with get_initialized_mcp_session(self, None) as session:
1066+
return await session.call_tool(tool_name, {})
1067+
9441068
def _check_oauth_server_reachable(self) -> Tuple[bool, str]:
9451069
"""For OAuth MCP servers, verify reachability without authenticating.
9461070

0 commit comments

Comments
 (0)