Skip to content

dynamic mcp: enforce loaded tool execution#774

Open
nickmisasi wants to merge 6 commits into
dynamic-mcp-03-registry-meta-toolsfrom
dynamic-mcp-04-context-toolrunner
Open

dynamic mcp: enforce loaded tool execution#774
nickmisasi wants to merge 6 commits into
dynamic-mcp-03-registry-meta-toolsfrom
dynamic-mcp-04-context-toolrunner

Conversation

@nickmisasi
Copy link
Copy Markdown
Collaborator

@nickmisasi nickmisasi commented May 23, 2026

Summary

Enforces strict loaded/unloaded tool visibility in LLM context and toolrunner flows, restores loaded MCP tools from conversation state, and hardens retry/approval execution behavior.

Stack: 4/6

Ticket Link

N/A

Screenshots

N/A

Release Note

Improved dynamic MCP tool execution safety by requiring tools to be loaded before use.

Made with Cursor

Summary by CodeRabbit

Release Notes

  • New Features

    • Added dynamic MCP tool loading with explicit load/unload capability per conversation
    • Enhanced tool information now includes schema, descriptions, and server metadata
  • Bug Fixes

    • Unknown tools now produce clear error messages instead of approval requests
    • Tool metadata properly preserved through approval workflows
    • Assistant reasoning no longer unnecessarily persisted in conversations
  • Improvements

    • Better error handling for unavailable and unloaded tools
    • Improved tool retry logic to avoid false failure streaks

Review Change Stack

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 23, 2026

🤖 LLM Evaluation Results

OpenAI

⚠️ Overall: 21/28 tests passed (75.0%)

Provider Total Passed Failed Pass Rate
⚠️ OPENAI 28 21 7 75.0%

❌ Failed Evaluations

Show 7 failures

OPENAI

1. TestReactEval/[openai]_react_cat_message

  • Score: 0.00
  • Rubric: The word/emoji is a cat emoji or a heart/love emoji
  • Reason: The output is the text string "smiley_cat", not an actual cat emoji (e.g., 😺/🐱) or a heart/love emoji (e.g., ❤️/💕).

2. TestConversationMentionHandling/[openai]_conversation_from_attribution_long_thread.json

  • Score: 0.00
  • Rubric: is a list of bugs
  • Reason: The output does not provide an actual list of bugs; it states it cannot access the database and asks the user to paste bug reports, providing only a blank table template.

3. TestConversationMentionHandling/[openai]_conversation_from_attribution_long_thread.json

  • Score: 0.00
  • Rubric: includes a description of each bug
  • Reason: The output does not include descriptions of any specific bugs; it only asks the user to paste bug reports and provides an empty table template.

4. TestConversationMentionHandling/[openai]_conversation_from_attribution_long_thread.json

  • Score: 0.00
  • Rubric: attributes each bug to a user
  • Reason: The output provides a blank table template and requests that the user paste bug details, including reporter name/username, but it does not actually attribute any specific bug to any user.

5. TestConversationMentionHandling/[openai]_conversation_from_attribution_long_thread.json

  • Score: 0.00
  • Rubric: attributes the bug about trying to save without a color and the save button not doing anything to @maria.nunez
  • Reason: The output does not mention the specific bug about trying to save without a color or the save button doing nothing, nor does it attribute any bug to @maria.nunez. It only provides a generic template and request for more information.

6. TestConversationMentionHandling/[openai]_conversation_from_attribution_long_thread.json

  • Score: 0.00
  • Rubric: the bug about the end user being able to change channel banner is attributed to @maria.nunez
  • Reason: The output does not mention any specific bug about an end user being able to change the channel banner, nor does it attribute such a bug to @maria.nunez. It only provides a template and asks the user to paste bug reports.

7. TestDirectMessageConversations/[openai]_bot_dm_tool_introspection

  • Score: 0.50
  • Rubric: mentions Github and refers to the documentation
  • Reason: The output refers to documentation (Mattermost docs link) but it does not mention GitHub anywhere, so it does not satisfy the 'mentions Github' requirement.

Anthropic

⚠️ Overall: 21/28 tests passed (75.0%)

Provider Total Passed Failed Pass Rate
⚠️ ANTHROPIC 28 21 7 75.0%

❌ Failed Evaluations

Show 7 failures

ANTHROPIC

1. TestReactEval/[anthropic]_react_cat_message

  • Score: 0.00
  • Rubric: The word/emoji is a cat emoji or a heart/love emoji
  • Reason: The output is the text string "heart_eyes_cat", not an actual cat emoji (e.g., 😺/🐱) or a heart/love emoji (e.g., ❤️/😍).

2. TestConversationMentionHandling/[anthropic]_conversation_from_attribution_long_thread.json

  • Score: 0.00
  • Rubric: is a list of bugs
  • Reason: The output does not provide a list of bugs; it states it cannot access bug tracking systems and asks the user to provide sources, offering to format them later.

3. TestConversationMentionHandling/[anthropic]_conversation_from_attribution_long_thread.json

  • Score: 0.00
  • Rubric: includes a description of each bug
  • Reason: The output does not include any bug descriptions; it states it cannot access bug trackers and asks the user to provide source material. While it mentions it could format bugs with a 'Description' field later, it does not actually provide descriptions of each bug.

4. TestConversationMentionHandling/[anthropic]_conversation_from_attribution_long_thread.json

  • Score: 0.00
  • Rubric: attributes each bug to a user
  • Reason: The output does not attribute any bugs to specific users; it only says it cannot access bug trackers and suggests providing sources, with a generic field list including 'Reporter' but no actual user attributions.

5. TestConversationMentionHandling/[anthropic]_conversation_from_attribution_long_thread.json

  • Score: 0.00
  • Rubric: attributes the bug about trying to save without a color and the save button not doing anything to @maria.nunez
  • Reason: The output explains lack of access to bug tracking data and suggests ways to provide bug reports, but it does not mention the specific bug (saving without a color / save button does nothing) nor attribute it to @maria.nunez.

6. TestConversationMentionHandling/[anthropic]_conversation_from_attribution_long_thread.json

  • Score: 0.00
  • Rubric: the bug about the end user being able to change channel banner is attributed to @maria.nunez
  • Reason: The output does not mention the specific bug about an end user being able to change the channel banner, nor does it attribute that bug to @maria.nunez. It only states lack of access and asks for sources.

7. TestDirectMessageConversations/[anthropic]_bot_dm_tool_introspection

  • Score: 0.00
  • Rubric: mentions Github and refers to the documentation
  • Reason: The output refers to documentation at docs.mattermost.com, satisfying the documentation part, but it does not mention GitHub anywhere.

This comment was automatically generated by the eval CI pipeline.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e882aa74f0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread llmcontext/llm_context.go
@nickmisasi
Copy link
Copy Markdown
Collaborator Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 23, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 23, 2026

📝 Walkthrough

Walkthrough

This PR implements conversation-scoped MCP dynamic tool loading with strict registries and loaded-tool persistence. It threads request context through MCP tool retrieval, extends tool-use serialization with approval metadata (schema, description, bare name), detects and errors unloaded/unknown tools early in execution, exempts meta-tool failures from retry limits, and strips persisted assistant reasoning from tool rounds.

Changes

MCP Tool Loading, Metadata Persistence, and Registry

Layer / File(s) Summary
MCP tool retrieval context parameter
api/api.go, api/api_mcp.go, mcp/client_manager.go, llmcontext/llm_context.go, api/*_test.go, mcp/*_test.go, conversations/*_test.go
GetToolsForUser now accepts context.Context as first argument, enabling request-scoped tool retrieval and cancellation across API, MCP manager, and llmcontext layers; all test mocks updated to match new signature.
Tool approval metadata serialization and redaction
conversation/content_block.go, conversation/content_block_test.go
ContentBlock struct gains optional JSON fields for tool ID, input schema, MCP bare name, and description; FilterForNonRequester redacts these fields alongside arguments when tool-use blocks are not shared; round-trip and redaction tests validate correctness.
Tool metadata propagation through conversation flows
conversation/convert.go, conversation/convert_test.go, conversation/helpers.go, conversation/helpers_test.go
Tool schema, server origin, MCP bare name, and description are preserved when converting between ContentBlock and llm.Post.ToolUse; marshalToolSchema helper normalizes schema values to json.RawMessage; toolUseBlocks enriches blocks with full metadata; conversion tests validate schema equality and metadata fidelity.
Retry exemption for MCP meta-tools and batch-skipped results
llm/tool_retry.go, llm/tool_retry_test.go
Adds BatchSkippedToolResult, IsBatchSkippedToolResult, and IsToolRetryExempt to mark special tool outcomes; trailingFailedToolCalls tracks exempt/skipped errors separately without counting toward consecutive-failure limits; 7 new tests validate meta-tool and batch-skip edge cases.
Strict MCP dynamic tool loading registry and context options
llmcontext/llm_context.go, server/main.go
Introduces MCPToolRetrievalOverrideProvider and LoadedMCPToolStore interfaces; extends Builder with loaded-tool store and telemetry injection; adds context options for conversation ID, request context, disabled servers, tool predicates, and preloaded tools; implements registry stashing, loading restoration, and late-bind conversation ID attachment; normalizes MCP server origins.
Strict MCP registry test infrastructure and coverage
llmcontext/llm_context_test.go, llmcontext/llm_context_strict_test.go
Adds staticMCPToolProvider, fakeLoadedMCPToolStore, and telemetry fakes; introduces 30+ tests covering initial tool visibility, preload lifecycle, restoration on conversation ID attach, retrieval override application, registry search filtering, loaded/unloaded marking semantics, and telemetry emission across strict vs flag-off modes.
Unloaded MCP tool error handling in execution
toolrunner/toolrunner.go
Detects unavailable/unloaded tools early and generates error results with "load_tool" hints for unloaded MCP tools; refactors tool resolution into explicit error-type switch distinguishing missing store, unloaded MCP tools, and unknown tools; records MCP dynamic search success telemetry; enriches approved tool calls with missing schema/bare name metadata before returning for approval.
Unknown tool and unloaded tool test suite
toolrunner/toolrunner_test.go, toolrunner/toolrunner_extended_test.go, conversations/dm_conversation_test.go
Expands toolrunner tests with unknown tool error handling, nil context edge cases, batch blocking when unknown/unloaded tools present; adds 12 extended tests for server origin preservation, approval gating, callback wiring, unloaded MCP error paths, telemetry emission, and retry-limit interaction; adds DM test validating unknown tool error persistence and status propagation.
DM conversation and assistant reasoning integration
conversation/service.go, conversation/service_test.go, conversations/dm_conversation_test.go
Strips persisted assistant reasoning and reasoning signature from tool-round posts in turnsToLLMPosts; tests validate reasoning is cleared while tool results preserved; DM tests updated to construct and pass tool stores explicitly, and to validate unknown-tool error results in conversation turns.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

Setup Cloud Test Server

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 13.04% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main change: enforcing loaded tool execution in the dynamic MCP feature, which is a core objective of the PR.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch dynamic-mcp-04-context-toolrunner

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@conversation/convert.go`:
- Around line 254-259: In the json.RawMessage branch of the schema handling,
detect and treat a raw "null" value as absent: after confirming schema is
json.RawMessage, check if len(raw)==0 or the trimmed bytes equal []byte("null")
(e.g., bytes.TrimSpace(raw) == []byte("null")); if so return nil, otherwise
return the copied raw with append(json.RawMessage(nil), raw...).

In `@llmcontext/llm_context.go`:
- Around line 407-410: Comparisons use raw ServerOrigin strings which can differ
by trailing slashes/whitespace; call normalizeMCPServerOrigin(...) on both
spec.ServerOrigin and tool.ServerOrigin before comparing or using as map keys so
normalized values are matched consistently (update the if check around
spec.ServerOrigin/tool.ServerOrigin and any storage/lookup of origin keys), and
apply the same normalization in the other block referenced (the code around
lines handling 640-649) to ensure allowlist/preload logic uses normalized
origins when calling llm.MCPToolNameMatches(tool.Name, spec.ToolName) and when
storing/looking up origin-based keys.
- Around line 262-267: requestContextFromLLMContext currently drops caller
request cancellation/deadline because it always falls back to Background; change
its signature to accept a fallback stdcontext.Context and return
c.RequestContext when set otherwise return the provided fallback (so callers can
pass the real request context into GetToolsForUser instead of losing
cancellation), update the call site that invokes GetToolsForUser to pass the
incoming request context through that new param; also make origin comparisons
deterministic by using normalizeMCPServerOrigin whenever comparing or building
keys in preloadMCPTools and retainOnlyAllowedMCPTools (replace raw uses of
spec.ServerOrigin and t.ServerOrigin with
normalizeMCPServerOrigin(spec.ServerOrigin) /
normalizeMCPServerOrigin(t.ServerOrigin) before equality checks and allowlist
key construction).

In `@mcp/client_manager.go`:
- Around line 184-187: The current GetToolsForUser method silently substitutes
context.Background() when ctx is nil, which masks cancellation and deadlines;
change GetToolsForUser to validate ctx at the start and return a clear *Errors
(or standard error wrapped in *Errors) indicating "nil context passed" (or
similar) instead of creating a background context, update the function
signature/return path to produce this error when ctx == nil, and adjust callers
to pass a non-nil context (or handle the new error) so context propagation is
preserved; reference the GetToolsForUser method and the Errors return type when
making the change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: 609b73c1-2fe3-442b-9f7c-59444be78519

📥 Commits

Reviewing files that changed from the base of the PR and between e6c9846 and e882aa7.

📒 Files selected for processing (29)
  • api/api.go
  • api/api_mcp.go
  • api/api_no_tools_test.go
  • api/api_test.go
  • conversation/content_block.go
  • conversation/content_block_test.go
  • conversation/convert.go
  • conversation/convert_test.go
  • conversation/helpers.go
  • conversation/helpers_test.go
  • conversation/service.go
  • conversation/service_test.go
  • conversations/agent_mention_reminder_test.go
  • conversations/conversations_test.go
  • conversations/dm_conversation_test.go
  • llm/tool_retry.go
  • llm/tool_retry_test.go
  • llmcontext/llm_context.go
  • llmcontext/llm_context_strict_test.go
  • llmcontext/llm_context_test.go
  • mcp/client_embedded_oauth_test.go
  • mcp/client_integration_test.go
  • mcp/client_manager.go
  • mcp/client_manager_test.go
  • mcp/client_test.go
  • server/main.go
  • toolrunner/toolrunner.go
  • toolrunner/toolrunner_extended_test.go
  • toolrunner/toolrunner_test.go

Comment thread conversation/convert.go
Comment thread llmcontext/llm_context.go
Comment thread llmcontext/llm_context.go Outdated
Comment thread mcp/client_manager.go
@nickmisasi nickmisasi force-pushed the dynamic-mcp-03-registry-meta-tools branch from e6c9846 to e5ef694 Compare May 23, 2026 02:52
@nickmisasi nickmisasi force-pushed the dynamic-mcp-04-context-toolrunner branch from e882aa7 to 80e5ad8 Compare May 23, 2026 02:52
@nickmisasi nickmisasi force-pushed the dynamic-mcp-03-registry-meta-tools branch from e5ef694 to 0928d23 Compare May 23, 2026 16:48
@nickmisasi nickmisasi force-pushed the dynamic-mcp-04-context-toolrunner branch from 80e5ad8 to c383e63 Compare May 23, 2026 16:48
Co-authored-by: Cursor <cursoragent@cursor.com>
@nickmisasi nickmisasi force-pushed the dynamic-mcp-04-context-toolrunner branch from c383e63 to 1c509aa Compare May 23, 2026 17:23
Co-authored-by: Cursor <cursoragent@cursor.com>
nickmisasi and others added 4 commits May 23, 2026 13:37
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
Copy link
Copy Markdown
Contributor

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security Review — No Findings

Reviewed the full diff (34 files, ~3.4k additions) for this PR which introduces strict loaded/unloaded tool visibility enforcement, MCP dynamic tool loading via search_tools/load_tool meta-tools, and hardened retry/approval behavior.

Areas analyzed

  • Meta-tool auto-approval (conversations/conversations.go): search_tools and load_tool bypass the normal policy-checker approval flow via isMCPMetaToolCall. Verified that:

    • The check correctly rejects tool calls with a ServerOrigin set, preventing a malicious MCP server from registering a same-named tool that gets auto-approved.
    • When llmContext.Tools exists, it confirms the tool has no server origin in the store — a second defense against name collisions.
    • In channel mode, applyBotChannelAutoEverywhereToolFilter removes meta-tools (empty ServerOrigin), so they are not reachable from channel contexts.
  • Tool registry scoping (llmcontext/llm_context.go): The ToolRegistry fed to meta-tools is built from mcpTools that have already been filtered by admin bot allowlist, user-disabled server origins, and channel predicates. load_tool can only materialize tools from this pre-filtered registry — it cannot bypass access controls.

  • Loaded-tool restoration (restoreLoadedMCPTools, AttachConversationID): Previously loaded tools are restored only if they still exist in the current (filtered) registry. Stale entries are deleted. After AttachConversationID, handleDMViaConversation applies a second RemoveToolsByServerOrigin pass, ensuring user-disabled server tools can't resurface through restoration.

  • Unknown/unavailable tool handling (toolrunner/toolrunner.go): The new containsUnavailableTools check runs BEFORE shouldExecute, preventing both execution and approval-prompt exposure for unknown tools. Mixed batches (some available, some not) correctly skip all tools and return batch-skipped errors.

  • Tool retry exemptions (llm/tool_retry.go): IsToolRetryExempt for meta-tools prevents search/load failures from exhausting the 3-failure limit. This is bounded by LLM token limits and natural convergence behavior — not exploitable for resource exhaustion beyond normal tool-call patterns.

  • Data redaction (conversation/content_block.go, conversation/convert.go): The new InputSchema, MCPBareName, and ToolDescription fields are properly cleared in FilterForNonRequester (for non-requesting users) and in the redact-unshared path of BlocksToPost.

  • allToolsAutoRunEverywhere with meta-tools: When a turn contains only meta-tool calls, the function returns true (shared=true). Meta-tool results contain only tool names/schemas (not user channel data), so sharing them is safe.

No medium, high, or critical vulnerabilities identified.

Open in Web View Automation 

Sent by Cursor Automation: Find vulnerabilities

Comment thread conversation/service.go
Comment on lines +482 to +490
if turn.Role == "assistant" {
// Anthropic signed thinking must be replayed byte-for-byte. Our stored
// content blocks intentionally normalize assistant output (merge text
// blocks, pair tool_use/result turns, redact private content), so the
// persisted thinking block is not safe to send back as provider history.
// Keep reasoning for UI/persistence, but omit it from rebuilt requests.
post.Reasoning = ""
post.ReasoningSignature = ""
}
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nickmisasi nickmisasi requested a review from crspeller May 25, 2026 15:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant