Make doc preprocessing options configurable in MCP server #17913
Open
kanlac wants to merge 1 commit into
Add `--use_doc_orientation_classify` and `--use_doc_unwarping` CLI arguments (and corresponding environment variables) so that users can enable document preprocessing when needed. Previously, both options were hardcoded to `False`.
Summary
Add `--use_doc_orientation_classify` and `--use_doc_unwarping` CLI arguments (and the corresponding `PADDLEOCR_MCP_USE_DOC_ORIENTATION_CLASSIFY` / `PADDLEOCR_MCP_USE_DOC_UNWARPING` environment variables) to the MCP server.
Previously these options were hardcoded to `False`, with no way for users to override them. The default remains `False` to preserve backward compatibility.
Problem
When processing scanned PDF documents via the MCP server in service mode (`aistudio`, `self_hosted`), layout detection can miss entire text blocks: lines are silently dropped from the output with no error or warning. Enabling `useDocUnwarping` and `useDocOrientationClassify` in the API request resolves the issue.
Since these options are hardcoded to `False` and not exposed as configuration, users have no way to work around the problem without patching the source code.
Related: #17164
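For reference, here is a minimal sketch of a request body with both preprocessing options switched on. It assumes the service follows the PaddleX serving convention of a base64 `file` field, a `fileType` field (assumed `0` = PDF, `1` = image) and the camelCase `useDocOrientationClassify` / `useDocUnwarping` fields named above; `build_layout_parsing_payload` is a hypothetical helper, not part of the MCP server, so check the field names against the service's API reference.

```python
import base64
import json


def build_layout_parsing_payload(
    pdf_bytes: bytes,
    use_doc_orientation_classify: bool = False,
    use_doc_unwarping: bool = False,
) -> dict:
    """Build a JSON-serializable request body (hypothetical helper).

    Field names are assumed from the PaddleX serving convention.
    """
    return {
        # Base64-encoded document content.
        "file": base64.b64encode(pdf_bytes).decode("ascii"),
        # Assumed: 0 = PDF, 1 = image.
        "fileType": 0,
        # The two preprocessing switches this PR makes configurable.
        "useDocOrientationClassify": use_doc_orientation_classify,
        "useDocUnwarping": use_doc_unwarping,
    }


payload = build_layout_parsing_payload(
    b"%PDF-1.4 ...", use_doc_orientation_classify=True, use_doc_unwarping=True
)
# Print everything except the (long) base64 file content.
print(json.dumps({k: v for k, v in payload.items() if k != "file"}))
# {"fileType": 0, "useDocOrientationClassify": true, "useDocUnwarping": true}
```

With the previous hardcoded behavior, both fields would always be sent as `false`.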
Usage
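A hedged sketch of how a user might enable both options from an MCP client configuration, either through the new CLI arguments or through the new environment variables. Assumptions not confirmed by this PR: the server is launched with a `paddleocr_mcp` command, the client reads the common `{"mcpServers": {...}}` layout, the CLI arguments are boolean switches, and the environment variables accept the string `"true"`. The server name `paddleocr` is hypothetical, and other required settings (pipeline, service URL, credentials) are omitted.

```python
import json


def paddleocr_mcp_client_config(via_cli_flags: bool) -> dict:
    """Build a client config entry that enables both preprocessing steps.

    Assumed: `paddleocr_mcp` command, boolean CLI switches, "true" env values.
    """
    server = {"command": "paddleocr_mcp", "args": [], "env": {}}
    if via_cli_flags:
        # Option 1: pass the new CLI arguments.
        server["args"] += ["--use_doc_orientation_classify", "--use_doc_unwarping"]
    else:
        # Option 2: set the new environment variables instead.
        server["env"].update(
            {
                "PADDLEOCR_MCP_USE_DOC_ORIENTATION_CLASSIFY": "true",
                "PADDLEOCR_MCP_USE_DOC_UNWARPING": "true",
            }
        )
    # "paddleocr" is a hypothetical server name chosen by the user.
    return {"mcpServers": {"paddleocr": server}}


print(json.dumps(paddleocr_mcp_client_config(via_cli_flags=False), indent=2))
```

Leaving both out keeps the previous behavior, since the defaults stay `False`.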
Changes
- `mcp_server/paddleocr_mcp/__main__.py`: Add two new CLI arguments with environment variable fallbacks.
- `mcp_server/paddleocr_mcp/pipelines.py`: Replace hardcoded `False` with configurable instance variables.
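The diff itself is not reproduced here; the following is a minimal sketch of the pattern the two changes describe: a CLI argument whose default comes from an environment variable, and a pipeline that stores the values as instance variables instead of hardcoding `False`. The names `env_flag`, `build_parser` and `ServicePipeline`, the accepted truthy strings, and the use of `store_true` switches are all assumptions for illustration, not the PR's actual code.

```python
import argparse
import os
from typing import Mapping, Optional

# Assumed set of strings treated as "on"; the PR may accept a different set.
_TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name: str, default: bool = False, environ: Optional[Mapping[str, str]] = None) -> bool:
    """Read a boolean env var; an unset variable yields `default`."""
    env = os.environ if environ is None else environ
    value = env.get(name)
    if value is None:
        return default
    return value.strip().lower() in _TRUTHY


def build_parser(environ: Optional[Mapping[str, str]] = None) -> argparse.ArgumentParser:
    """Hypothetical parser: the env var sets the default, the CLI flag forces True."""
    parser = argparse.ArgumentParser(prog="paddleocr_mcp")
    parser.add_argument(
        "--use_doc_orientation_classify",
        action="store_true",
        default=env_flag("PADDLEOCR_MCP_USE_DOC_ORIENTATION_CLASSIFY", environ=environ),
        help="Enable document orientation classification (default: False).",
    )
    parser.add_argument(
        "--use_doc_unwarping",
        action="store_true",
        default=env_flag("PADDLEOCR_MCP_USE_DOC_UNWARPING", environ=environ),
        help="Enable document unwarping (default: False).",
    )
    return parser


class ServicePipeline:
    """Hypothetical stand-in for a service-mode pipeline in pipelines.py."""

    def __init__(self, use_doc_orientation_classify: bool = False, use_doc_unwarping: bool = False) -> None:
        # Stored on the instance instead of being hardcoded to False per request.
        self._use_doc_orientation_classify = use_doc_orientation_classify
        self._use_doc_unwarping = use_doc_unwarping

    def request_options(self) -> dict:
        """Preprocessing fields merged into each API request body."""
        return {
            "useDocOrientationClassify": self._use_doc_orientation_classify,
            "useDocUnwarping": self._use_doc_unwarping,
        }


# Env var enables unwarping; the CLI flag enables orientation classification.
args = build_parser({"PADDLEOCR_MCP_USE_DOC_UNWARPING": "true"}).parse_args(
    ["--use_doc_orientation_classify"]
)
pipeline = ServicePipeline(args.use_doc_orientation_classify, args.use_doc_unwarping)
print(pipeline.request_options())
# {'useDocOrientationClassify': True, 'useDocUnwarping': True}
```

The resulting precedence is CLI flag, then environment variable, then `False`. One trade-off of `store_true`: if an environment variable turns an option on, the command line cannot turn it off again; `argparse.BooleanOptionalAction` (Python 3.9+) would add a `--no-use_doc_unwarping` form if that matters.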