Skip to content

dynamic mcp: add registry and loading tools#773

Open
nickmisasi wants to merge 2 commits into
dynamic-mcp-02-client-catalogfrom
dynamic-mcp-03-registry-meta-tools
Open

dynamic mcp: add registry and loading tools#773
nickmisasi wants to merge 2 commits into
dynamic-mcp-02-client-catalogfrom
dynamic-mcp-03-registry-meta-tools

Conversation

@nickmisasi
Copy link
Copy Markdown
Collaborator

@nickmisasi nickmisasi commented May 23, 2026

Summary

Adds the dynamic MCP tool registry, BM25 search, and the search_tools / load_tool meta-tools used to discover and materialize MCP tools on demand.

Stack: 3/6

Ticket Link

N/A

Screenshots

N/A

Release Note

Added dynamic MCP search and load tools for on-demand MCP tool discovery.

Made with Cursor

Summary by CodeRabbit

Release Notes

  • New Features
    • Added dynamic MCP tool discovery and runtime loading capabilities, enabling language models to search for available tools and load them on-demand
    • Implemented relevance-based search with fallback matching to help locate tools when exact names are unknown
    • Tools can now be loaded incrementally during conversations, reducing initial context overhead

Review Change Stack

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 23, 2026

🤖 LLM Evaluation Results

OpenAI

⚠️ Overall: 21/28 tests passed (75.0%)

Provider Total Passed Failed Pass Rate
⚠️ OPENAI 28 21 7 75.0%

❌ Failed Evaluations

Show 7 failures

OPENAI

1. TestReactEval/[openai]_react_cat_message

  • Score: 0.00
  • Rubric: The word/emoji is a cat emoji or a heart/love emoji
  • Reason: The output is the text "smiley_cat", not an actual cat emoji (e.g., 🐱/😺) or a heart/love emoji (e.g., ❤️/💖).

2. TestConversationMentionHandling/[openai]_conversation_from_attribution_long_thread.json

  • Score: 0.00
  • Rubric: is a list of bugs
  • Reason: The output does not provide a list of bugs; it states inability to access bug tracker data and offers a template for the user to fill in.

3. TestConversationMentionHandling/[openai]_conversation_from_attribution_long_thread.json

  • Score: 0.00
  • Rubric: includes a description of each bug
  • Reason: The output does not include descriptions of each bug; it states it cannot enumerate bugs without source data and provides a blank template table.

4. TestConversationMentionHandling/[openai]_conversation_from_attribution_long_thread.json

  • Score: 0.00
  • Rubric: attributes each bug to a user
  • Reason: The output does not attribute any specific bugs to any users; it only asks the user to provide a bug list and offers a template with a 'Reported by' column left blank.

5. TestConversationMentionHandling/[openai]_conversation_from_attribution_long_thread.json

  • Score: 0.00
  • Rubric: attributes the bug about trying to save without a color and the save button not doing anything to @maria.nunez
  • Reason: The output does not mention the specific bug (trying to save without a color; save button not doing anything) nor does it attribute it to @maria.nunez. It only states it lacks access to bug data and provides a template.

6. TestConversationMentionHandling/[openai]_conversation_from_attribution_long_thread.json

  • Score: 0.00
  • Rubric: the bug about the end user being able to change channel banner is attributed to @maria.nunez
  • Reason: The output does not mention the specific bug about an end user being able to change the channel banner, nor does it attribute that bug to @maria.nunez. It only asks for source data and provides a generic template.

7. TestDirectMessageConversations/[openai]_bot_dm_tool_introspection

  • Score: 0.00
  • Rubric: mentions Github and refers to the documentation
  • Reason: The output refers to documentation (docs.mattermost.com) but does not mention GitHub anywhere. Since the rubric requires both mentioning GitHub and referring to documentation, it fails.

Anthropic

⚠️ Overall: 21/28 tests passed (75.0%)

Provider Total Passed Failed Pass Rate
⚠️ ANTHROPIC 28 21 7 75.0%

❌ Failed Evaluations

Show 7 failures

ANTHROPIC

1. TestReactEval/[anthropic]_react_cat_message

  • Score: 0.00
  • Rubric: The word/emoji is a cat emoji or a heart/love emoji
  • Reason: The output is the text string "heart_eyes_cat", not an actual cat emoji (e.g., 😺/🐱) or a heart/love emoji (e.g., ❤️/😍).

2. TestConversationMentionHandling/[anthropic]_conversation_from_attribution_long_thread.json

  • Score: 0.00
  • Rubric: is a list of bugs
  • Reason: The output does not provide any actual bug list; it states inability to access trackers and suggests how to create a list, but contains no bugs themselves.

3. TestConversationMentionHandling/[anthropic]_conversation_from_attribution_long_thread.json

  • Score: 0.00
  • Rubric: includes a description of each bug
  • Reason: The output does not describe any specific bugs; it states inability to access bug trackers and suggests how to compile a list, but provides no descriptions of each bug.

4. TestConversationMentionHandling/[anthropic]_conversation_from_attribution_long_thread.json

  • Score: 0.00
  • Rubric: attributes each bug to a user
  • Reason: The output states it cannot access bug tracking systems and suggests how to create a list, but it does not list any bugs or attribute any bug to a user.

5. TestConversationMentionHandling/[anthropic]_conversation_from_attribution_long_thread.json

  • Score: 0.00
  • Rubric: attributes the bug about trying to save without a color and the save button not doing anything to @maria.nunez
  • Reason: The output does not mention @maria.nunez and does not attribute any specific bug (saving without a color / save button not doing anything) to her; it only states lack of access and suggests how to gather bugs.

6. TestConversationMentionHandling/[anthropic]_conversation_from_attribution_long_thread.json

  • Score: 0.00
  • Rubric: the bug about the end user being able to change channel banner is attributed to @maria.nunez
  • Reason: The output does not mention the specific bug about an end user being able to change the channel banner, nor does it attribute that bug to @maria.nunez. It only states it cannot access bug tracking systems and suggests how to compile a list.

7. TestDirectMessageConversations/[anthropic]_bot_dm_tool_introspection

  • Score: 0.00
  • Rubric: mentions Github and refers to the documentation
  • Reason: The output refers to documentation (docs.mattermost.com) but does not mention GitHub. Since the rubric requires both mentioning GitHub and referring to the documentation, it fails.

This comment was automatically generated by the eval CI pipeline.

@nickmisasi
Copy link
Copy Markdown
Collaborator Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 23, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 23, 2026

📝 Walkthrough

Walkthrough

This PR implements a complete MCP tool discovery and dynamic loading system. It introduces a BM25 text search index, an in-memory tool registry using BM25 for relevance matching with name-based fallback, and two LLM-callable meta tools (search_tools and load_tool) that enable runtime tool discovery, ranking, and loading with telemetry.

Changes

MCP Tool Discovery and Loading System

Layer / File(s) Summary
BM25 Search Index
mcp/bm25.go, mcp/bm25_test.go
BM25 types (BM25Document, BM25Result, BM25Index), constructor builds term/document frequencies and document length statistics, Search tokenizes/deduplicates queries, scores documents using BM25 formula, filters non-positive scores, sorts deterministically by score then ID, and applies result limits. Tests verify relevance ranking, multi-part text matching, tie-breaking, empty/no-match handling, and non-Latin tokenization (Japanese, Chinese, namespaced snake-case identifiers).
Tool Registry with Search and Fallback
mcp/dynamic_registry.go, mcp/dynamic_registry_test.go
Registry types (ToolRegistry, ToolRegistryEntry, ToolSearchResult), introspection methods (Len, List, Lookup), and dual-path search (Search uses BM25; ClosestMatches falls back to name-based token/substring scoring when BM25 returns nil). Supports optional per-server retrieval-summary overrides. Tests cover deterministic top-N ordering, empty queries, retrieval overrides, duplicate tool handling (same namespace vs. different namespaces), BM25/fallback routing, nil-safety, and filtered-tool-only visibility.
Meta Tools: LLM Discovery Interface
mcp/meta_tools.go, mcp/meta_tools_test.go
Two meta tools (search_tools, load_tool) with JSON schemas, option plumbing, and resolvers that close over the registry. search_tools resolver trims query, optionally marks search state on llm.Context, returns top-N registry matches, emits search events. load_tool resolver validates name/context/registry, looks up tool by exact name, invokes optional recorder callback, adds tool to context, marks loaded, emits load events, returns schema and closest-match suggestions on miss. Tests verify meta-tool definitions, search result limiting and empty handling, load exact matching and miss fallback, recorder invocation and error propagation, retrieval-override application, nil-registry handling, telemetry emission across success/error scenarios, and context state markers.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 1.69% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately summarizes the main changes: adding a dynamic MCP tool registry and loading tools (search_tools and load_tool meta-tools).
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch dynamic-mcp-03-registry-meta-tools

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
mcp/dynamic_registry_test.go (1)

63-70: ⚡ Quick win

Use a table-driven structure for these two query cases.

This function exercises multiple test cases and should be table-driven per repository test rules.

As per coding guidelines, "**/*_test.go: Go tests must be table-driven when there is more than one test case".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mcp/dynamic_registry_test.go` around lines 63 - 70, The test
TestToolRegistrySearchEmptyQuery currently asserts two cases inline; convert it
to a table-driven test that iterates over cases for empty and whitespace-only
queries. Create a slice of structs (e.g., name and query) inside
TestToolRegistrySearchEmptyQuery, loop over them with t.Run(case.name, func(t
*testing.T) { ... }), construct the registry with NewToolRegistry and
testRegistryTool as before, call registry.Search(case.query, 10) and assert
require.Nil(t, ...) for each case; this keeps the same assertions but follows
the repository rule requiring table-driven tests.
mcp/bm25_test.go (1)

54-61: ⚡ Quick win

Convert this multi-case test to a table-driven test.

This test currently validates more than one case inline; please convert it to a table-driven shape to match test conventions.

As per coding guidelines, "**/*_test.go: Go tests must be table-driven when there is more than one test case".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@mcp/bm25_test.go` around lines 54 - 61, Replace the inline multi-case
assertions in TestBM25EmptyQueryReturnsNil with a table-driven subtest loop:
define a slice of test cases (each with a name and query string for the empty
and whitespace cases), construct the BM25 index once using NewBM25Index, then
iterate over cases calling t.Run(case.name, func(t *testing.T) { require.Nil(t,
idx.Search(case.query, 10)) }). Keep references to NewBM25Index and idx.Search
and ensure each case has its own name for clarity.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@mcp/dynamic_registry.go`:
- Around line 152-153: The method searchWithIndex currently calls r.bm25.Search
(and later may call ClosestMatches) without checking that r.bm25 is non-nil,
which will panic for a zero-value ToolRegistry; update searchWithIndex in
ToolRegistry to first guard if r.bm25 == nil (or otherwise not initialized) and
return an empty []ToolSearchResult (or a safe default) instead of calling
r.bm25.Search/ClosestMatches, ensuring any callers of searchWithIndex receive a
valid empty result when the index is absent.

---

Nitpick comments:
In `@mcp/bm25_test.go`:
- Around line 54-61: Replace the inline multi-case assertions in
TestBM25EmptyQueryReturnsNil with a table-driven subtest loop: define a slice of
test cases (each with a name and query string for the empty and whitespace
cases), construct the BM25 index once using NewBM25Index, then iterate over
cases calling t.Run(case.name, func(t *testing.T) { require.Nil(t,
idx.Search(case.query, 10)) }). Keep references to NewBM25Index and idx.Search
and ensure each case has its own name for clarity.

In `@mcp/dynamic_registry_test.go`:
- Around line 63-70: The test TestToolRegistrySearchEmptyQuery currently asserts
two cases inline; convert it to a table-driven test that iterates over cases for
empty and whitespace-only queries. Create a slice of structs (e.g., name and
query) inside TestToolRegistrySearchEmptyQuery, loop over them with
t.Run(case.name, func(t *testing.T) { ... }), construct the registry with
NewToolRegistry and testRegistryTool as before, call registry.Search(case.query,
10) and assert require.Nil(t, ...) for each case; this keeps the same assertions
but follows the repository rule requiring table-driven tests.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI (base), Organization UI (inherited)

Review profile: CHILL

Plan: Pro

Run ID: ad1bb676-7703-4a3a-81e5-493bb9d8f01e

📥 Commits

Reviewing files that changed from the base of the PR and between dbdf599 and e6c9846.

📒 Files selected for processing (6)
  • mcp/bm25.go
  • mcp/bm25_test.go
  • mcp/dynamic_registry.go
  • mcp/dynamic_registry_test.go
  • mcp/meta_tools.go
  • mcp/meta_tools_test.go

Comment thread mcp/dynamic_registry.go
@nickmisasi nickmisasi force-pushed the dynamic-mcp-02-client-catalog branch from dbdf599 to eb5e85f Compare May 23, 2026 02:52
@nickmisasi nickmisasi force-pushed the dynamic-mcp-03-registry-meta-tools branch from e6c9846 to e5ef694 Compare May 23, 2026 02:52
Co-authored-by: Cursor <cursoragent@cursor.com>
@nickmisasi nickmisasi force-pushed the dynamic-mcp-03-registry-meta-tools branch from e5ef694 to 0928d23 Compare May 23, 2026 16:48
Co-authored-by: Cursor <cursoragent@cursor.com>
Copy link
Copy Markdown
Contributor

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Security Review: No Vulnerabilities Found

Reviewed the dynamic MCP tool registry, BM25 search, and search_tools/load_tool meta-tools for security concerns.

Analysis

Scope of this PR: Pure data-structure and search logic. Introduces an immutable ToolRegistry built from a pre-provided list of tools, a BM25 text-search index, and two meta-tools that let the LLM discover and load tools at runtime.

Key security invariant: The ToolRegistry must be constructed with only tools the requesting user is authorized to use. This is enforced upstream by Context.KeepMCPTool and Context.DisabledMCPServerOrigins (applied "before strict registry construction" per field docs), with the integration layer in the next PR in the stack.

Checked and verified:

  • No permission bypass: load_tool only loads from the pre-filtered registry (exact name match required). No wildcard or fuzzy loading. The registry is immutable post-construction.
  • No data leakage: search_tools returns only names and summaries from the registry scope. Error responses from load_tool (closest matches) also only reference tools already in the user's filtered registry.
  • No secrets exposure: No credentials, tokens, or admin-only data in search/load responses. Tool schemas are MCP server-defined metadata, not sensitive.
  • No injection vectors: Search queries are only used for BM25 token scoring against pre-loaded data. No SQL, no shell, no template rendering.
  • No DoS risk: BM25 tokenization is O(n) with no backtracking. Index size is bounded by admin-configured MCP tool count.
  • Proper null safety: All code paths handle nil context, nil registry, nil tool store, and empty inputs gracefully with structured error responses.

The code delegates the permission-filtering responsibility to the caller (registry constructor), which is a sound architectural pattern. The filtering fields (KeepMCPTool, DisabledMCPServerOrigins, PreloadedMCPTools) are already defined in llm.Context with clear documentation of when they're applied.

Open in Web View Automation 

Sent by Cursor Automation: Find vulnerabilities

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant