fix(mcp): cap search_metadata response size to prevent LLM context overflow#28383

Merged
Shreyansh100704 merged 9 commits into main from fix/mcp-result-truncation
May 26, 2026
Conversation

@Shreyansh100704 Shreyansh100704 commented May 22, 2026

Contributor

Describe your changes:

  • Added a 100k-character safety cap on search_metadata responses. When the cap is exceeded, results are trimmed and the LLM is guided to narrow the search.
  • Updated the tool schema description to guide the LLM toward smaller page sizes for broad queries.
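
For readers who want to see the shape of the change, here is a minimal sketch of the cap-and-guide flow described above. It is not the code from SearchMetadataTool.java: the class name, the fields, the string-based "serialization", and the wording of the guidance message are all assumptions made for illustration. Only the MAX_RESPONSE_CHARS name and the 100k value come from this PR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; everything except MAX_RESPONSE_CHARS and its value
// is an assumption, not the PR's actual implementation.
final class CappedSearchResponse {
    static final int MAX_RESPONSE_CHARS = 100_000; // cap named in the PR

    final List<String> results; // each entry stands in for one serialized search hit
    final String message;       // guidance for the LLM, or null when nothing was trimmed

    CappedSearchResponse(List<String> results, String message) {
        this.results = results;
        this.message = message;
    }

    // Size of the payload if every result were returned as-is.
    static int serializedLength(List<String> results) {
        int total = 0;
        for (String r : results) {
            total += r.length();
        }
        return total;
    }

    static CappedSearchResponse cap(List<String> results) {
        int total = serializedLength(results);
        if (total <= MAX_RESPONSE_CHARS) {
            return new CappedSearchResponse(results, null); // under the cap, return untouched
        }
        // Keep a share of the results proportional to the budget, never fewer than one.
        int keep = (int) Math.max(1L, (long) results.size() * MAX_RESPONSE_CHARS / total);
        List<String> trimmed = new ArrayList<>(results.subList(0, keep));
        String message = "Showing " + keep + " of " + results.size()
            + " results because the response was too large."
            + " Narrow the query or request a smaller size.";
        return new CappedSearchResponse(trimmed, message);
    }
}
```

With five results of 30,000 characters each (150,000 in total), this sketch keeps three results and attaches the guidance message; a response that already fits is returned unchanged with no message.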

Fixes #28382

Type of change:

  • Bug fix
  • Improvement
  • New feature
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation

High-level design:

N/A — small change.

Tests:

Use cases covered

  • Unit tests
  • Backend integration tests
  • Ingestion integration tests
  • Playwright (UI) tests
  • Manual testing performed

UI screen recording / screenshots:

Not applicable.

Checklist:

  • I have read the CONTRIBUTING document.
  • My PR title is Fixes #28382: fix(mcp): cap search_metadata response size to prevent LLM context overflow
  • My PR is linked to a GitHub issue via Fixes #28382 above.
  • I have commented on my code, particularly in hard-to-understand areas.
  • For JSON Schema changes: N/A, no schema changes.
  • For UI changes: N/A, backend only.
  • Added 3 new unit tests covering trim trigger, no-trim, and floor-of-one cases.

Summary by Gitar

  • MCP logging & safety:
    • Downgraded response size log level to DEBUG to reduce log noise.
    • Applied a 0.8 safety factor to the character limit cap for more robust response trimming.
  • Testing:
    • Added testResponseTrimmedWhenExceedingCharLimit and related cases to SearchMetadataAggregationTest to verify payload truncation logic.
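
The arithmetic behind the safety factor can be sketched as follows. This is an illustration only: the summary gives the 0.8 value, but where exactly it enters the PR's formula is an assumption here, as are the class name, the method name, and the choice to trigger trimming at the full cap while targeting 80% of it.

```java
// Hypothetical sketch of the trim-target arithmetic. The 0.8 factor and the
// 100k cap come from the PR; their placement in this formula is assumed.
final class TrimTarget {
    static final int MAX_RESPONSE_CHARS = 100_000;
    static final double SAFETY_FACTOR = 0.8;

    // Number of results to keep for a payload of serializedLength characters.
    static int targetCount(int resultCount, int serializedLength) {
        if (serializedLength <= MAX_RESPONSE_CHARS) {
            return resultCount; // no-trim case: the response already fits
        }
        // Aim below the cap so larger-than-average results still fit: 80,000 chars.
        int budget = (int) (MAX_RESPONSE_CHARS * SAFETY_FACTOR);
        long target = (long) resultCount * budget / serializedLength;
        return (int) Math.max(1L, target); // floor of one: never return an empty page
    }
}
```

Under these assumptions, 50 results totalling 250,000 characters are cut to 16 (50 × 80,000 / 250,000), where an unscaled cap would keep 20. A single 500,000-character result computes to 0 and is raised to 1, and 10 results totalling 40,000 characters are left alone. These correspond to the trim-trigger, floor-of-one, and no-trim cases the new unit tests are said to cover.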

This will update automatically on new commits.

@Shreyansh100704 Shreyansh100704 requested review from Copilot and removed request for Copilot May 22, 2026 16:51
@github-actions

Contributor

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

@Shreyansh100704 Shreyansh100704 enabled auto-merge (squash) May 22, 2026 16:52
Comment thread openmetadata-mcp/src/main/java/org/openmetadata/mcp/tools/SearchMetadataTool.java Outdated
@Shreyansh100704 Shreyansh100704 requested review from Copilot and removed request for Copilot May 22, 2026 17:08

Copilot AI review requested due to automatic review settings May 25, 2026 04:36

@gitar-bot

gitar-bot Bot commented May 25, 2026

Code Review ✅ Approved 2 resolved / 2 findings

Limits search_metadata response size with a 100k character cap and 0.8 safety factor to prevent LLM context overflow. Resolves excessive log noise by downgrading response size messaging to DEBUG.

✅ 2 resolved
Performance: INFO-level log on every search response is noisy

📄 openmetadata-mcp/src/main/java/org/openmetadata/mcp/tools/SearchMetadataTool.java:345-348
Lines 345-348 log the response size at INFO level on every search_metadata invocation. In a production environment with frequent MCP calls, this generates significant log volume. Consider using DEBUG level for the per-request size logging and keeping WARN only for the trimming case, which is already at WARN.
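
A minimal sketch of the level split this finding suggests, using java.util.logging so that it runs without extra dependencies. The project's real logger is not shown in this thread, so the logger setup, class name, and message text are assumptions; in java.util.logging, FINE is the closest counterpart to DEBUG and WARNING to WARN.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical sketch; the PR's actual logging framework and message text
// are not shown in this thread.
final class ResponseSizeLogging {
    private static final Logger LOG =
        Logger.getLogger(ResponseSizeLogging.class.getName());
    static final int MAX_RESPONSE_CHARS = 100_000;

    // Quiet level for routine sizes, loud level only when trimming will happen.
    static Level levelFor(int responseChars) {
        return responseChars > MAX_RESPONSE_CHARS ? Level.WARNING : Level.FINE;
    }

    static void logResponseSize(int responseChars) {
        // The supplier form builds the message only if the level is enabled.
        LOG.log(levelFor(responseChars),
            () -> "search_metadata response size: " + responseChars + " chars");
    }
}
```

With the default java.util.logging configuration, FINE messages are not printed, so routine calls stay silent while an oversized response still surfaces as a warning.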

Edge Case: Proportional trimming may not bring response under the cap

📄 openmetadata-mcp/src/main/java/org/openmetadata/mcp/tools/SearchMetadataTool.java:350-354
The trimming logic assumes results are uniformly sized (targetCount = size * MAX / serialized.length()). If early results are significantly larger than average (e.g., entities with long descriptions or many tags), the trimmed response may still exceed MAX_RESPONSE_CHARS. Consider either re-serializing after trim to verify, or applying a small safety factor (e.g., multiply the denominator by 1.2).
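
The re-serialize-and-verify option mentioned in this finding could look roughly like the sketch below. It is an illustration rather than the merged fix (the summary says a 0.8 safety factor was applied instead); the class and method names are hypothetical, and plain strings stand in for serialized results.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of "trim proportionally, then re-measure"; not the
// approach merged in this PR.
final class VerifiedTrim {
    static int serializedLength(List<String> results) {
        int total = 0;
        for (String r : results) {
            total += r.length();
        }
        return total;
    }

    static List<String> trimToFit(List<String> results, int maxChars) {
        int total = serializedLength(results);
        if (total <= maxChars) {
            return results;
        }
        // First guess assumes uniformly sized results.
        int keep = (int) Math.max(1L, (long) results.size() * maxChars / total);
        List<String> trimmed = new ArrayList<>(results.subList(0, keep));
        // Re-measure and drop trailing results while the payload is still too big.
        while (trimmed.size() > 1 && serializedLength(trimmed) > maxChars) {
            trimmed.remove(trimmed.size() - 1);
        }
        return trimmed;
    }
}
```

To see why the check matters, take a cap of 100 characters and six results of lengths 90, 90, 5, 5, 5, and 5 (200 in total). The proportional guess keeps three results, which still add up to 185 characters; the verification loop then drops results until a single 90-character result remains. A lone result larger than the cap is still returned, matching the floor-of-one behavior.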

Copilot AI left a comment

Contributor

Pull request overview

This PR mitigates LLM context-window overflows caused by overly large search_metadata responses in the MCP module by adding a hard response-size cap and updating the tool schema guidance to encourage smaller result sets for broad queries.

Changes:

  • Added a MAX_RESPONSE_CHARS cap (100k) to search_metadata responses and trim results when exceeded.
  • Updated the tool schema (tools.json) to guide smaller size values for broad/generic queries and adjusted the pagination example default.
  • Refined the “many results” guidance message to push toward narrower queries.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| openmetadata-mcp/src/main/java/org/openmetadata/mcp/tools/SearchMetadataTool.java | Adds response-size measurement + trimming logic and updates user guidance messaging. |
| openmetadata-mcp/src/main/resources/json/data/mcp/tools.json | Updates the size parameter description and example to reduce broad-query payload sizes. |

Comment thread openmetadata-mcp/src/main/resources/json/data/mcp/tools.json Outdated

@Shreyansh100704 Shreyansh100704 requested review from Copilot and removed request for Copilot May 25, 2026 09:00
Comment thread openmetadata-mcp/src/main/java/org/openmetadata/mcp/tools/SearchMetadataTool.java Outdated

Copilot AI review requested due to automatic review settings May 25, 2026 09:10
@Shreyansh100704 Shreyansh100704 force-pushed the fix/mcp-result-truncation branch from 8ac97a1 to 41f37a6 Compare May 25, 2026 09:10
@Shreyansh100704 Shreyansh100704 removed the request for review from Copilot May 25, 2026 09:10

Copilot AI review requested due to automatic review settings May 26, 2026 05:27
@Shreyansh100704 Shreyansh100704 removed the request for review from Copilot May 26, 2026 05:27

Copilot AI review requested due to automatic review settings May 26, 2026 05:28
@Shreyansh100704 Shreyansh100704 removed the request for review from Copilot May 26, 2026 05:29

@pmbrull pmbrull added the safe to test Add this label to run secure Github workflows on PRs label May 26, 2026
@Shreyansh100704 Shreyansh100704 merged commit f18943d into main May 26, 2026
58 of 64 checks passed
@Shreyansh100704 Shreyansh100704 deleted the fix/mcp-result-truncation branch May 26, 2026 08:06
@github-actions

Contributor

🟡 Playwright Results — all passed (10 flaky)

✅ 4247 passed · ❌ 0 failed · 🟡 10 flaky · ⏭️ 87 skipped

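| Shard | Passed | Failed | Flaky | Skipped |
| --- | --- | --- | --- | --- |
| ✅ Shard 1 | 299 | 0 | 0 | 4 |
| ✅ Shard 2 | 805 | 0 | 0 | 8 |
| 🟡 Shard 3 | 792 | 0 | 4 | 8 |
| 🟡 Shard 4 | 844 | 0 | 1 | 12 |
| 🟡 Shard 5 | 718 | 0 | 1 | 47 |
| 🟡 Shard 6 | 789 | 0 | 4 | 8 |
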
🟡 10 flaky test(s) (passed on retry)
  • Features/KnowledgeCenterTextEditor.spec.ts › Rich Text Editor - Text Formatting (shard 3, 1 retry)
  • Features/KnowledgeCenterTextEditor.spec.ts › Rich Text Editor - Text Formatting (shard 3, 1 retry)
  • Features/Table.spec.ts › Table pagination with sorting should works (shard 3, 1 retry)
  • Flow/ExploreAggregationCountsMatching.spec.ts › should verify left panel counts and tab search results for normal search (shard 3, 2 retries)
  • Pages/CustomProperties.spec.ts › Set & Update all CP types on apiCollection (shard 4, 1 retry)
  • Pages/ExplorePageRightPanel_KnowledgeCenter.spec.ts › Should remove user owner for knowledgeCenter (shard 5, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/Lineage/LineageRightPanel.spec.ts › Verify custom properties tab IS visible for supported type: searchIndex (shard 6, 1 retry)
  • Pages/Lineage/PlatformLineage.spec.ts › Verify domain platform view (shard 6, 1 retry)
  • Pages/ServiceEntity.spec.ts › Inactive Announcement create & delete (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace

Shreyansh100704 added a commit that referenced this pull request May 26, 2026
…erflow (#28383)

* fix(mcp): cap search_metadata response size and truncate columnNames

* fix(mcp): remove column truncation, guide LLM to use smaller page sizes
Vishnuujain pushed a commit that referenced this pull request Jun 9, 2026
…erflow (#28383)

* fix(mcp): cap search_metadata response size and truncate columnNames

* fix(mcp): remove column truncation, guide LLM to use smaller page sizes

(cherry picked from commit f18943d)
Vishnuujain pushed a commit that referenced this pull request Jun 9, 2026
…erflow (#28383)

* fix(mcp): cap search_metadata response size and truncate columnNames

* fix(mcp): remove column truncation, guide LLM to use smaller page sizes

(cherry picked from commit f18943d)
(cherry picked from commit 6a627c7)

Labels

safe to test Add this label to run secure Github workflows on PRs

Development

Successfully merging this pull request may close these issues.

Broad MCP queries crashing LLM

3 participants