
Backport today's MCP/search/audit fixes to 1.13 #28872

Closed
Vishnuujain wants to merge 16 commits into 1.13 from backport-mcp-today-1.13

Conversation

Vishnuujain (Contributor) commented Jun 9, 2026

Backports the 16 commits that were cherry-picked into 1.13.1 today onto 1.13. Because 1.13 is a protected branch, the backport goes through a PR.

Includes: #27982, #28352, #28383, #28622, #28618, #28658, #28633, #28632, #28669, #28512, #28698, #28743, #28764, #28758, #28821, #28776.

All applied cleanly (no conflicts). Backend reactor build green on 1.13.1 with identical content.

🤖 Generated with Claude Code


Summary by Gitar

  • MCP Tooling & Observability:
    • Integrated per-call latency tracking, error categorization, and client identification (e.g., Claude/Cursor/VSCode) via McpUsageResource and AuthEnrichedMcpContextExtractor.
    • Implemented response-size budgeting across tools to prevent LLM context overflow, with automated truncation and fallback envelopes.
  • Search & Vectorization:
    • Vectorized testSuite and testCase entities for improved hybrid search; added specific BodyTextContributor implementations to optimize semantic payload quality.
    • Added similarityScore to search_metadata results and improved index resolution to prevent cross-type leakage.
  • Audit Logging:
    • Improved EventSubscriptionScheduler to re-arm audit log consumer triggers on startup, resolving issues where abandoned or stale Quartz triggers would stop consumer processing.
  • Refactoring:
    • Centralized response-trimming logic and standardized exception-to-status mapping across all MCP tools to ensure consistent error reporting.

This will update automatically on new commits.

pmbrull and others added 16 commits June 9, 2026 18:29
* chore(mcp): add server.json for MCP Registry publishing

Adds metadata for publishing openmetadata-mcp to the official MCP
Registry (registry.modelcontextprotocol.io). Aggregators like PulseMCP
scrape the official registry, so this single entry surfaces the server
across the ecosystem.

The server is self-hosted per deployment, so the streamable-http URL
uses an {openmetadata_host} template variable that clients resolve to
their own OpenMetadata hostname.

* chore(mcp): align server.json description with #27975 messaging

Reframes the registry description to match the "trusted context and
business semantics for AI" positioning from the README rebrand in #27975.

Also tightens the description to satisfy the schema's 100-char cap on
the field (the prior 506-char copy would have failed validation at
publish time) and adds websiteUrl pointing to the MCP docs page.

* chore(mcp): mark server.json description as the official MCP

The registry namespace (io.github.open-metadata/*) is invisible to users
browsing aggregators like PulseMCP — they see only title and description.
Calling out "Official OpenMetadata MCP" differentiates this canonical
entry from any community wrappers people might publish under other
namespaces.

* chore(mcp): clarify host variable supports custom ports

Many self-hosted OpenMetadata deployments run on the default :8585
without a reverse proxy. Spell that out in the openmetadata_host
variable description so users know they can include a port.

* fix

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
(cherry picked from commit 10f2658)
(cherry picked from commit 9d4bf69)
* MCP Tool Usage

* Update generated TypeScript types

* Address PR review feedback on MCP usage tracking

Reorder UA heuristic so VS Code wins over Claude CLI for composite
User-Agents, refactor to a predicate list, and sanitise the resolved
client name (trim, strip control chars, cap at 64 chars). Bound the
schema field to match.
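
A minimal sketch of the ordered predicate list and name sanitising described above. The class name, the `Rule` helper, the matched substrings, and the `unknown` fallback are assumptions for illustration, not the code in this PR; only the ordering (VS Code before Claude) and the trim / strip-control-chars / 64-char cap come from the commit message.

```java
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

final class ClientNameSketch {
  static final int MAX_LEN = 64; // cap stated in the commit message
  static final String UNKNOWN = "unknown"; // hypothetical fallback value

  /** One heuristic: a predicate over the lower-cased User-Agent and the name it maps to. */
  static final class Rule {
    final Predicate<String> matches;
    final String name;

    Rule(Predicate<String> matches, String name) {
      this.matches = matches;
      this.name = name;
    }
  }

  // Order matters: the first matching rule wins, so VS Code is checked before Claude.
  // The substrings are illustrative guesses, not the real heuristics.
  static final List<Rule> RULES =
      List.of(
          new Rule(ua -> ua.contains("vscode") || ua.contains("visual studio code"), "VS Code"),
          new Rule(ua -> ua.contains("cursor"), "Cursor"),
          new Rule(ua -> ua.contains("claude"), "Claude"));

  /** Strips ASCII control characters, trims, and caps the result at MAX_LEN characters. */
  static String sanitize(String raw) {
    if (raw == null) {
      return UNKNOWN;
    }
    String cleaned = raw.replaceAll("\\p{Cntrl}", "").trim();
    if (cleaned.isEmpty()) {
      return UNKNOWN;
    }
    return cleaned.length() > MAX_LEN ? cleaned.substring(0, MAX_LEN) : cleaned;
  }

  /** Returns the first matching rule's name, else the sanitised raw User-Agent. */
  static String resolve(String userAgent) {
    if (userAgent == null) {
      return UNKNOWN;
    }
    String ua = userAgent.toLowerCase(Locale.ROOT);
    for (Rule rule : RULES) {
      if (rule.matches.test(ua)) {
        return rule.name;
      }
    }
    return sanitize(userAgent);
  }
}
```

Keeping the rules in one ordered list makes the precedence explicit: reordering two entries is the whole change needed to flip which client wins for a composite User-Agent.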

Bound the latency aggregation lists in McpUsageResource with reservoir
sampling so summary/per-tool percentile estimates stay valid without
unbounded heap growth. Skip null-timestamp rows in the history loop and
update the stale /history Swagger description to reflect the ok/fail
shape. Convert CallToolOutcome to a Java record and update the recorder
flow to use accessor methods.
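
The bounded latency lists can be illustrated with a small reservoir sampler. This is a sketch of the general technique (Algorithm R with a nearest-rank percentile), assuming a fixed capacity and millisecond latencies; `LatencyReservoir` and its methods are hypothetical names and do not reflect the actual McpUsageResource code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class LatencyReservoir {
  private final int capacity;
  private final Random random;
  private final List<Long> samples;
  private long seen = 0;

  LatencyReservoir(int capacity, long seed) {
    if (capacity <= 0) {
      throw new IllegalArgumentException("capacity must be positive");
    }
    this.capacity = capacity;
    this.random = new Random(seed);
    this.samples = new ArrayList<>(capacity);
  }

  /** Algorithm R: after n values, each one is in the reservoir with probability capacity/n. */
  void add(long latencyMs) {
    seen++;
    if (samples.size() < capacity) {
      samples.add(latencyMs);
      return;
    }
    long slot = (long) (random.nextDouble() * seen); // uniform in [0, seen)
    if (slot < capacity) {
      samples.set((int) slot, latencyMs);
    }
  }

  /** Nearest-rank percentile over the retained sample; p is in [0, 100]. */
  long percentile(double p) {
    if (samples.isEmpty()) {
      throw new IllegalStateException("no samples recorded");
    }
    if (p < 0 || p > 100) {
      throw new IllegalArgumentException("p must be in [0, 100]");
    }
    List<Long> sorted = new ArrayList<>(samples);
    Collections.sort(sorted);
    int rank = (int) Math.ceil(p / 100.0 * sorted.size());
    return sorted.get(Math.max(0, rank - 1));
  }

  int size() {
    return samples.size();
  }

  long seen() {
    return seen;
  }
}
```

Memory stays fixed at `capacity` entries regardless of how many rows are aggregated, while the retained values remain a uniform sample, which is why percentile estimates stay meaningful.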

Fix the pre-existing regression in McpImpersonationTest where the mock
still wired the legacy callTool path. Add DefaultToolContextTest with
direct coverage for classifyException (all four ErrorCategory buckets,
cause-chain walk, null message in chain) and the unknown-tool outcome.
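
For readers unfamiliar with the pattern, a cause-chain walk that tolerates null messages might look like the sketch below. The four bucket names, the exception-to-bucket mapping, and the depth limit are all assumptions; the commit message only states that there are four ErrorCategory buckets and that the classifier walks the cause chain.

```java
import java.util.Locale;
import java.util.concurrent.TimeoutException;

final class ErrorClassifierSketch {
  // Hypothetical bucket names; the real ErrorCategory values are not shown on this page.
  enum ErrorCategory {
    CLIENT_ERROR,
    AUTH_ERROR,
    TIMEOUT,
    INTERNAL_ERROR
  }

  private static final int MAX_DEPTH = 16; // guards against cyclic cause chains

  /** Walks the cause chain and returns the first specific category found. */
  static ErrorCategory classify(Throwable error) {
    int depth = 0;
    for (Throwable t = error; t != null && depth < MAX_DEPTH; t = t.getCause(), depth++) {
      if (t instanceof SecurityException) {
        return ErrorCategory.AUTH_ERROR;
      }
      if (t instanceof TimeoutException) {
        return ErrorCategory.TIMEOUT;
      }
      if (t instanceof IllegalArgumentException) {
        return ErrorCategory.CLIENT_ERROR;
      }
      String message = t.getMessage(); // may be null anywhere in the chain
      if (message != null && message.toLowerCase(Locale.ROOT).contains("timed out")) {
        return ErrorCategory.TIMEOUT;
      }
    }
    return ErrorCategory.INTERNAL_ERROR; // nothing more specific matched
  }
}
```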

(cherry picked from commit 1dcf8dd)
(cherry picked from commit 21aee7f)
…erflow (#28383)

* fix(mcp): cap search_metadata response size and truncate columnNames

* fix(mcp): remove column truncation, guide LLM to use smaller page sizes

(cherry picked from commit f18943d)
(cherry picked from commit 6a627c7)
(cherry picked from commit 5c852b9)
(cherry picked from commit ca7bf47)
)

(cherry picked from commit 2c709b2)
(cherry picked from commit 11116e7)
…_entity_lineage shape (#28658)

(cherry picked from commit 570f285)
(cherry picked from commit 71aada7)
… tools (#28633)

(cherry picked from commit d96e9ba)
(cherry picked from commit 8238b7d)
…28632)

* fix(mcp): slim root_cause_analysis payload to fit LLM context limits

* gitarbot feedback: bundle RCA params into RcaRequest record (<=5 param rule)

* address review feedback: null-safety + sqlQueryKey + hint envelope

(cherry picked from commit e92b7ce)
(cherry picked from commit cb610ee)
(cherry picked from commit e5c9b85)
(cherry picked from commit 794d3cf)
#28512)

* feat(mcp): return _score as similarityScore in search_metadata tool

* test: cover similarityScore mapping in search_metadata results

---------

Co-authored-by: Vishnu Jain <121681876+Vishnuujain@users.noreply.github.com>
Co-authored-by: Vishnu Jain <vishnujtimes@gmail.com>
(cherry picked from commit 5b34107)
(cherry picked from commit a04806f)
* Fixes #27796: resolve search_metadata index by entityType to prevent cross-type leak

* gitarbot feedback: validate entityType against index registry, fall back to dataAsset

* harden search_metadata param handling: tolerate non-string entityType/query/fields and object queryFilter
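
The three commits above amount to "look the entity type up in a registry, otherwise fall back to dataAsset". A rough sketch, assuming a static map as the registry: the class name, the index names in the map, and the choice to treat non-string input as unknown are illustrative assumptions, not the PR's implementation.

```java
import java.util.Map;

final class SearchIndexResolverSketch {
  static final String FALLBACK_INDEX = "dataAsset"; // fallback named in the commit message

  // Hypothetical registry: entity type -> search index. Entries are illustrative only.
  private static final Map<String, String> INDEX_BY_TYPE =
      Map.of(
          "table", "table_search_index",
          "dashboard", "dashboard_search_index",
          "testCase", "test_case_search_index");

  /** Accepts any parameter value; anything that is not a registered type uses the fallback. */
  static String resolveIndex(Object entityType) {
    if (!(entityType instanceof String)) {
      return FALLBACK_INDEX; // null, numbers, objects: assumed here to be treated as unknown
    }
    String key = ((String) entityType).trim();
    return INDEX_BY_TYPE.getOrDefault(key, FALLBACK_INDEX);
  }
}
```

Resolving to a type-specific index only for validated types is what keeps a query for one entity type from returning hits of another, while unknown input degrades to the broad fallback instead of failing.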

(cherry picked from commit 68d3368)
(cherry picked from commit 27d0642)
…adata (#28743)

(cherry picked from commit 671fd3d)
(cherry picked from commit b82c626)
…8764)

* refactor(mcp): shared response-trim + params utils, global size-budget net, error-message null-guards

* refactor(mcp): expose serializeWithinBudget so Collate dispatcher shares the size-budget net

* refactor(mcp): move McpParams/McpResponseTrim to util package, guard RCA error messages
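
One way to picture a global size-budget net is the sketch below: serialize, measure the UTF-8 size, drop trailing results until the payload fits, and return a small fallback envelope if nothing fits. Only the name `serializeWithinBudget` comes from the commit message; the signature, the envelope fields, and the drop-from-the-end strategy are assumptions. Items are assumed to be already-serialized JSON values, and a real implementation would likely use a JSON library.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

final class SizeBudgetSketch {
  // Hypothetical fallback envelope returned when not even one result fits the budget.
  static final String FALLBACK =
      "{\"truncated\":true,\"results\":[],\"hint\":\"Response exceeded the size budget; "
          + "request a smaller page.\"}";

  /** Returns the largest prefix of the results whose envelope fits within maxBytes. */
  static String serializeWithinBudget(List<String> jsonItems, int maxBytes) {
    for (int n = jsonItems.size(); n > 0; n--) {
      boolean truncated = n < jsonItems.size();
      String json = envelope(jsonItems.subList(0, n), truncated);
      if (utf8Length(json) <= maxBytes) {
        return json;
      }
    }
    return FALLBACK; // note: returned even if the budget is smaller than the envelope itself
  }

  private static String envelope(List<String> jsonItems, boolean truncated) {
    return "{\"truncated\":" + truncated + ",\"results\":[" + String.join(",", jsonItems) + "]}";
  }

  private static int utf8Length(String s) {
    return s.getBytes(StandardCharsets.UTF_8).length;
  }
}
```

Measuring bytes rather than characters matters for non-ASCII metadata, and the `truncated` flag gives the caller a signal to request a smaller page.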

(cherry picked from commit 01bd98c)
(cherry picked from commit 8c80406)
(cherry picked from commit 7373f2d)
(cherry picked from commit 8ce1f48)
…kipping when the job exists (#28821)

(cherry picked from commit 05551d5)
(cherry picked from commit 6a158fb)
* refactor(mcp): shared response-trim + params utils, global size-budget net, error-message null-guards

* refactor(mcp): expose serializeWithinBudget so Collate dispatcher shares the size-budget net

* feat(mcp): trim wide-table payload in get_entity_details (column descriptions, schema/model sql)

* docs(mcp): annotate intentional non-short-circuit operators in GetEntityTool
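
The last commit refers to non-short-circuit operators. As a hedged illustration of why `|` can be intentional when trimming a payload: every trim step has a side effect that must run, and the boolean result only records whether anything changed. The map keys (`description`, `sql`), the method names, and the 20-character limit are hypothetical and are not taken from GetEntityTool.

```java
import java.util.Map;

final class NonShortCircuitSketch {
  /** Shortens an over-long description in place; returns true if it changed anything. */
  static boolean trimDescription(Map<String, Object> entity, int maxChars) {
    Object value = entity.get("description");
    if (!(value instanceof String) || ((String) value).length() <= maxChars) {
      return false;
    }
    entity.put("description", ((String) value).substring(0, maxChars));
    return true;
  }

  /** Drops the bulky SQL field; returns true if the field was present. */
  static boolean dropSql(Map<String, Object> entity) {
    return entity.remove("sql") != null;
  }

  /** Runs every trim step and reports whether any of them modified the entity. */
  static boolean trimAll(Map<String, Object> entity) {
    // Intentional non-short-circuit '|': with '||', dropSql would be skipped
    // whenever trimDescription already returned true, leaving the SQL in place.
    return trimDescription(entity, 20) | dropSql(entity);
  }
}
```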

(cherry picked from commit b90c57f)
(cherry picked from commit 9ec4bab)
@Vishnuujain Vishnuujain requested a review from a team as a code owner June 9, 2026 13:02
github-actions Bot (Contributor) commented Jun 9, 2026

❌ PR checklist incomplete

This PR cannot be merged until the following are addressed on its linked issue:

  • No GitHub issue is linked. Link an issue in the Development section of the PR (or add Fixes #12345 to the description). For a same-org cross-repo issue, add Fixes open-metadata/<repo>#123 to the description.

The fields live on the linked issue in the Shipping project (open the issue → right sidebar → Projects). After you set them, re-run this check (or push a commit) — issue/project changes do not re-trigger it automatically.

Maintainers can bypass this check by adding the skip-pr-checks label.

github-actions Bot (Contributor) commented Jun 9, 2026

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@Vishnuujain Vishnuujain closed this Jun 9, 2026
@Vishnuujain Vishnuujain deleted the backport-mcp-today-1.13 branch June 9, 2026 13:04
gitar-bot Bot commented Jun 9, 2026

Code Review ✅ Approved

Backports MCP fixes and search improvements to version 1.13, including payload truncation, audit log consumer re-arming, and hybrid search vectorization. No issues found.




5 participants