fix(mcp): add dynamic response truncation for oversized info tool responses by aminghadersohi · Pull Request #39107 · apache/superset

aminghadersohi · 2026-04-04T13:25:44Z

Summary

When MCP info tools (get_chart_info, get_dataset_info, get_dashboard_info, get_instance_info) return responses exceeding the configured token limit, the ResponseSizeGuardMiddleware now dynamically truncates large fields instead of blocking the response entirely with an error.

Problem

Individual object responses from info tools can exceed the 25K token limit when they contain:

Datasets with hundreds of columns/metrics
Dashboards with many charts, each with long descriptions
Charts with large form_data dicts or long descriptions
Dashboards with large json_metadata, position_json, or filter_state
Nested fields like charts[i].description inside dashboard responses

Previously, these responses were blocked outright, and because info tools don't support pagination, the LLM client had no fallback.

Solution

Five-phase progressive truncation that preserves partial data while fitting within the token budget:

Phase 1: Truncate long top-level string fields (description, css, json_metadata, position_json) to 500 chars
Phase 2: Truncate large list fields (columns, metrics, charts) to 30 items
Phase 3: Recursively truncate strings inside nested structures (e.g. charts[i].description, filter_state.dataMask)
Phase 4: Aggressively reduce lists to 10 items and summarize large dicts (>20 keys)
Phase 5: Nuclear — empty all list/dict fields, keeping only scalar metadata
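The phase ordering above can be sketched as a loop that applies each phase only while the payload is still over budget, so milder phases get a chance to succeed before destructive ones run. This is a simplified illustration, not the code in token_utils.py: `estimate_tokens`, `truncate_progressively`, `PHASES`, and the phase helpers are hypothetical names, the 4-characters-per-token estimate is an assumption, and Phase 3 and the dict summarization from Phase 4 are omitted.

```python
import json
from typing import Any, Callable, Optional

Phase = Callable[[dict], dict]


def estimate_tokens(obj: Any) -> int:
    # Assumed heuristic: roughly 4 characters of JSON per token.
    return len(json.dumps(obj, default=str)) // 4


def truncate_top_level_strings(resp: dict, max_chars: int = 500) -> dict:
    """Simplified Phase 1: cut long top-level strings to max_chars."""
    return {k: v[:max_chars] if isinstance(v, str) else v for k, v in resp.items()}


def make_list_truncator(max_items: int) -> Phase:
    """Simplified Phases 2 and 4: keep the first max_items of each top-level list."""
    def phase(resp: dict) -> dict:
        return {k: v[:max_items] if isinstance(v, list) else v for k, v in resp.items()}
    return phase


def empty_collections(resp: dict) -> dict:
    """Simplified Phase 5: keep scalar fields, empty every list and dict."""
    return {
        k: [] if isinstance(v, list) else {} if isinstance(v, dict) else v
        for k, v in resp.items()
    }


PHASES = [
    ("phase1_strings", truncate_top_level_strings),
    ("phase2_lists_30", make_list_truncator(30)),
    ("phase4_lists_10", make_list_truncator(10)),
    ("phase5_empty", empty_collections),
]


def truncate_progressively(
    resp: dict, token_limit: int, phases: list
) -> tuple[Optional[dict], list[str]]:
    """Apply phases in order until the response fits; report which ones ran."""
    current, applied = resp, []
    for name, phase in phases:
        if estimate_tokens(current) <= token_limit:
            break  # already fits: stop before discarding more data
        current = phase(current)
        applied.append(name)
    if estimate_tokens(current) <= token_limit:
        return current, applied
    return None, applied  # nothing fit; the caller falls back to blocking
```

With a 5,000-character description and 500 columns against a 1,000-token budget, only the first two phases run and the result keeps 30 columns.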

Design decisions

No inline markers in lists: Truncated lists are simply shortened without appending marker dicts, preserving typed list contracts (e.g. List[TableColumnInfo] stays homogeneous)
Recursive depth cap: Nested string truncation is capped at depth 10 to prevent runaway recursion
Info tools only: Non-info tools (list tools, execute_sql, get_chart_data) continue to be blocked with actionable error messages, because truncating tabular data would silently compromise its integrity
Complements PR #39101 (fix(mcp): strip json_metadata and position_json from get_dashboard_info response): that PR strips json_metadata/position_json at the schema level and uses DashboardChartSummary instead of the full ChartInfo, while this PR adds a safety net for any response that still exceeds the limit
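A minimal sketch of the "no inline markers" decision, assuming responses are plain dicts: list fields are sliced without appending marker entries, and the truncation is reported only through the top-level `_response_truncated` and `_truncation_notes` keys named under Monitoring. The function name and the note wording are hypothetical.

```python
from typing import Any


def truncate_lists_without_markers(response: dict[str, Any], max_items: int) -> dict[str, Any]:
    """Shorten oversized list fields; report truncation only at the top level."""
    result = dict(response)  # shallow copy so the caller's dict is untouched
    notes: list[str] = list(response.get("_truncation_notes", []))
    changed = False
    for key, value in response.items():
        if key.startswith("_") or not isinstance(value, list):
            continue  # skip metadata keys and non-list fields
        if len(value) > max_items:
            # Slice only: no {"_truncated": ...} marker is appended, so a list
            # declared as e.g. List[TableColumnInfo] keeps a single item shape.
            result[key] = value[:max_items]
            notes.append(f"{key}: kept {max_items} of {len(value)} items")
            changed = True
    if changed:
        result["_response_truncated"] = True
        result["_truncation_notes"] = notes
    return result
```

Keeping the notes out of the list means a client that validates each item against one schema never encounters a stray marker object.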

Monitoring

New event: mcp_response_truncated is logged when truncation succeeds, recording the tool name, the original and truncated token counts, and which fields were truncated
Truncated responses include _response_truncated: true and _truncation_notes metadata

Changes

superset/mcp_service/utils/token_utils.py: Added truncate_oversized_response(), _truncate_strings_recursive(), and helper functions
superset/mcp_service/middleware.py: Added _try_truncate_info_response() method to ResponseSizeGuardMiddleware
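The info-tools-only gating might look roughly like the sketch below. It is hedged: `guard_response`, `ResponseTooLargeError` (a stand-in for the ToolError the middleware raises), the token estimator, and the log call's `extra` fields are assumptions; only the four tool names and the `mcp_response_truncated` event name come from this PR.

```python
import json
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)

# The four info tools named in this PR; every other tool is blocked, not truncated.
INFO_TOOLS = frozenset(
    {"get_chart_info", "get_dataset_info", "get_dashboard_info", "get_instance_info"}
)

Truncator = Callable[[dict, int], Optional[dict]]


class ResponseTooLargeError(Exception):
    """Stand-in for the ToolError raised by the real middleware."""


def estimate_tokens(obj: Any) -> int:
    # Assumed heuristic: roughly 4 characters of JSON per token.
    return len(json.dumps(obj, default=str)) // 4


def guard_response(tool_name: str, response: dict, token_limit: int, truncate: Truncator) -> dict:
    original = estimate_tokens(response)
    if original <= token_limit:
        return response
    if tool_name in INFO_TOOLS:
        truncated = truncate(response, token_limit)
        if truncated is not None and estimate_tokens(truncated) <= token_limit:
            logger.info(
                "mcp_response_truncated",
                extra={
                    "tool_name": tool_name,
                    "original_tokens": original,
                    "truncated_tokens": estimate_tokens(truncated),
                },
            )
            return truncated
    # Non-info tools, and info responses that still do not fit, are blocked.
    raise ResponseTooLargeError(
        f"{tool_name} returned ~{original} tokens (limit {token_limit}); "
        "narrow the request, e.g. with filters or pagination."
    )
```

Passing the truncation function in keeps the gate independent of how the phases are implemented.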

Testing

69 unit tests pass (5 new tests for recursive truncation; all pre-existing tests still pass)
Tests cover: top-level truncation, nested truncation, list truncation without markers, dict summarization, progressive phases, dashboard edge case with 30 charts

When info tools (get_chart_info, get_dataset_info, get_dashboard_info, get_instance_info) return responses exceeding the token limit, the ResponseSizeGuardMiddleware now progressively truncates large fields instead of returning an error. Truncation is applied in four phases:

1. Truncate long string fields (description, css, sql)
2. Truncate large list fields (columns, metrics, charts)
3. Aggressively reduce lists and summarize large dicts
4. Replace all collections with summary markers

This preserves partial data for the LLM client while staying within the token budget. Non-info tools (list tools, execute_sql) continue to be blocked with actionable error messages as before.

bito-code-review · 2026-04-04T13:25:54Z

Code Review Agent Run #041747

Actionable Suggestions - 0

Filtered by Review Rules

Bito filtered these suggestions based on rules created automatically for your feedback.

superset/mcp_service/utils/token_utils.py - 1
- Incorrect truncation size reporting · Line 419-439

Review Details

Files reviewed - 4 · Commit Range: 24db1ad..24db1ad
- superset/mcp_service/middleware.py
- superset/mcp_service/utils/token_utils.py
- tests/unit_tests/mcp_service/test_middleware.py
- tests/unit_tests/mcp_service/utils/test_token_utils.py
Files skipped - 0
Tools
- Whispers (Secret Scanner) - ✔︎ Successful
- Detect-secrets (Secret Scanner) - ✔︎ Successful
- MyPy (Static Code Analysis) - ✔︎ Successful
- Astral Ruff (Static Code Analysis) - ✔︎ Successful

codecov · 2026-04-04T13:28:48Z

Codecov Report

❌ Patch coverage is 0% with 116 lines in your changes missing coverage. Please review.
✅ Project coverage is 64.46%. Comparing base (d796543) to head (0b3c2d0).
⚠️ Report is 84 commits behind head on master.

Files with missing lines	Patch %	Lines
superset/mcp_service/utils/token_utils.py	0.00%	89 Missing ⚠️
superset/mcp_service/middleware.py	0.00%	27 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #39107      +/-   ##
==========================================
- Coverage   64.52%   64.46%   -0.06%     
==========================================
  Files        2536     2536              
  Lines      131208   131324     +116     
  Branches    30457    30485      +28     
==========================================
  Hits        84661    84661              
- Misses      45084    45200     +116     
  Partials     1463     1463

Flag	Coverage Δ
hive	`39.96% <0.00%> (-0.09%)`	⬇️
mysql	`60.75% <0.00%> (-0.13%)`	⬇️
postgres	`60.83% <0.00%> (-0.13%)`	⬇️
presto	`39.97% <0.00%> (-0.09%)`	⬇️
python	`62.42% <0.00%> (-0.13%)`	⬇️
sqlite	`60.45% <0.00%> (-0.13%)`	⬇️
unit	`100.00% <ø> (ø)`

Flags with carried forward coverage won't be shown.

Copilot

Pull request overview

This PR improves MCP “info” tool reliability by adding a progressive response-truncation fallback when responses exceed the configured token budget, so the middleware can return partial metadata instead of hard-blocking with an error.

Changes:

Added truncate_oversized_response() plus helper truncation phases in token_utils.py, and introduced an INFO_TOOLS allowlist.
Updated ResponseSizeGuardMiddleware to attempt truncation for allowlisted info tools before raising ToolError, and to emit a new mcp_response_truncated monitoring event on success.
Added unit tests for truncation helpers and middleware integration tests covering truncation vs blocking behavior.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File	Description
`superset/mcp_service/utils/token_utils.py`	Adds INFO tool allowlist and progressive truncation helpers to fit oversized info responses within the token limit.
`superset/mcp_service/middleware.py`	Tries truncation for info tools before blocking, and logs `mcp_response_truncated` on successful truncation.
`tests/unit_tests/mcp_service/utils/test_token_utils.py`	Adds unit tests for truncation helpers and `truncate_oversized_response()`.
`tests/unit_tests/mcp_service/test_middleware.py`	Adds middleware tests ensuring info tools truncate while non-info tools still block.

Address review feedback: do not append marker dicts into truncated lists as this breaks typed list contracts (e.g. List[TableColumnInfo]). Truncation metadata is communicated through top-level _truncation_notes instead. Also remove misleading "use select_columns" suggestion from truncation messages since info tools don't accept that parameter.

Add Phase 3 (recursive string truncation) to handle strings inside nested structures like charts[i].description, filter_state.dataMask, and native_filters[i].config that top-level truncation misses. Without this, a dashboard with 30 charts each having a 10KB description would still blow the token limit even after list truncation, because the strings inside each chart item were untouched. The recursive walker is depth-capped at 10 to prevent runaway recursion.
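The recursive walk described in this commit can be sketched as below. It is an illustration under assumptions, not the PR's `_truncate_strings_recursive`: the function name, its signature, and the dotted path format used for reporting are hypothetical, and this version descends into lists as well as dicts.

```python
from typing import Any

MAX_DEPTH = 10  # depth cap from the PR; anything deeper is left as-is


def truncate_strings_recursive(
    value: Any, max_chars: int, truncated_paths: list[str], path: str = "", depth: int = 0
) -> Any:
    """Return a copy of value with every string longer than max_chars cut down."""
    if depth > MAX_DEPTH:
        return value  # stop descending to avoid runaway recursion
    if isinstance(value, str):
        if len(value) > max_chars:
            truncated_paths.append(path)
            return value[:max_chars]
        return value
    if isinstance(value, dict):
        return {
            k: truncate_strings_recursive(
                v, max_chars, truncated_paths, f"{path}.{k}" if path else str(k), depth + 1
            )
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [
            truncate_strings_recursive(v, max_chars, truncated_paths, f"{path}[{i}]", depth + 1)
            for i, v in enumerate(value)
        ]
    return value  # numbers, booleans and None pass through unchanged
```

For scale: 30 descriptions of about 10,000 characters each total roughly 300,000 characters. Under a rough 4-characters-per-token assumption that is about 75,000 tokens, three times a 25K budget. After cutting each description to 500 characters, the same descriptions total about 15,000 characters (about 3,750 tokens).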

bito-code-review · 2026-04-04T14:58:24Z

Code Review Agent Run #5a4c9a

Actionable Suggestions - 0

Filtered by Review Rules

Bito filtered these suggestions based on rules created automatically for your feedback.

superset/mcp_service/utils/token_utils.py - 1
- Recursive truncation misses strings in lists · Line 451-456

Review Details

Files reviewed - 2 · Commit Range: 24db1ad..0b3c2d0
- superset/mcp_service/utils/token_utils.py
- tests/unit_tests/mcp_service/utils/test_token_utils.py
Files skipped - 0
Tools
- Whispers (Secret Scanner) - ✔︎ Successful
- Detect-secrets (Secret Scanner) - ✔︎ Successful
- MyPy (Static Code Analysis) - ✔︎ Successful
- Astral Ruff (Static Code Analysis) - ✔︎ Successful

mistercrunch

LGTM

Copilot AI review requested due to automatic review settings April 4, 2026 13:25

pull-request-size Bot added the size/XL label Apr 4, 2026

Copilot started reviewing on behalf of aminghadersohi April 4, 2026 13:26

Copilot AI reviewed Apr 4, 2026

Comment thread superset/mcp_service/utils/token_utils.py Outdated

codeant-ai-for-open-source Bot reviewed Apr 4, 2026

Comment thread superset/mcp_service/utils/token_utils.py Outdated

aminghadersohi added 2 commits April 4, 2026 09:47

aminghadersohi changed the title ~~fix(mcp): add dynamic response truncation for info tools~~ fix(mcp): add dynamic response truncation for oversized info tool responses Apr 4, 2026

mistercrunch approved these changes Apr 6, 2026

aminghadersohi merged commit 83ad1ec into apache:master Apr 6, 2026
67 checks passed

michael-s-molina pushed a commit that referenced this pull request Apr 13, 2026

fix(mcp): add dynamic response truncation for oversized info tool responses (#39107) (cherry picked from commit 83ad1ec) · 4764eea

sadpandajoe added and then removed the 🎪 ⚡ showtime-trigger-start label (Create new ephemeral environment for this PR) Apr 15, 2026

aminghadersohi merged 3 commits into apache:master from aminghadersohi:aminghadersohi/response-size-guard-info-tools

4 participants
