diff --git a/docs/version3.x/pipeline_usage/PP-StructureV3.en.md b/docs/version3.x/pipeline_usage/PP-StructureV3.en.md index ad52a51de80..e3f1bf7b5d9 100644 --- a/docs/version3.x/pipeline_usage/PP-StructureV3.en.md +++ b/docs/version3.x/pipeline_usage/PP-StructureV3.en.md @@ -1536,7 +1536,7 @@ If not set, the default is True. format_block_content Meaning:Whether to format the content in block_content as Markdown.
-Description: If not set, the initialized default value will be used, which is False by default. +Description: If not set, the initialized default value will be used, which is False by default. When set to True, the block_content of image-type blocks will contain image path information (e.g., <img src="..." />). When set to False (default), the block_content of image-type blocks will only contain OCR-recognized text content without image paths. To include image paths in JSON output, set this parameter to True. bool @@ -2302,7 +2302,7 @@ If set to None, the default value is True. format_block_content Meaning:Whether to format the content in block_content as Markdown.
-Description: If set to None, the default value is False. +Description: If set to None, the default value is False. When set to True, the block_content of image-type blocks will contain image path information (e.g., <img src="..." />). When set to False (default), the block_content of image-type blocks will only contain OCR-recognized text content without image paths. To include image paths in JSON output, set this parameter to True. bool|None @@ -2480,7 +2480,7 @@ If set to None, the instantiation value is used; otherwise, this pa format_block_content -Whether to format the content in block_content as Markdown. If set to None, the instantiation value is used; otherwise, this parameter takes precedence. +Whether to format the content in block_content as Markdown. If set to None, the instantiation value is used; otherwise, this parameter takes precedence. When set to True, the block_content of image-type blocks will contain image path information (e.g., <img src="..." />). When set to False (default), the block_content of image-type blocks will only contain OCR-recognized text content without image paths. To include image paths in JSON output, set this parameter to True. bool|None @@ -2778,7 +2778,7 @@ If enabled, the cell detection model will not be used, and only the table struct
  • use_seal_recognition: (bool) Whether to enable seal text recognition sub-pipeline
  • use_table_recognition: (bool) Whether to enable table recognition sub-pipeline
  • use_formula_recognition: (bool) Whether to enable formula recognition sub-pipeline
  • -
  • format_block_content: (bool) Controls whether to format the block_content into Markdown format
  • +
  • format_block_content: (bool) Controls whether to format the block_content into Markdown format. When set to True, the block_content of image-type blocks will contain image path information (e.g., <img src="..." />). When set to False (default), the block_content of image-type blocks will only contain OCR-recognized text content without image paths. To include image paths in JSON output, set this parameter to True.
  • markdown_ignore_labels: (List[str]) Labels of layout regions that need to be ignored in Markdown
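As a rough illustration of the difference described for `format_block_content`, the sketch below pulls image paths out of `block_content` strings with a regular expression. It assumes the blocks have already been loaded from a saved JSON result as dictionaries with `block_label` and `block_content` keys. The sample data and the helper name `image_paths_from_blocks` are hypothetical, and the markup actually emitted by the pipeline may carry extra attributes or wrapper tags.

```python
import re
from typing import Any, Dict, List

# Captures the src attribute of an <img ... /> tag, as in the documented example.
IMG_SRC_RE = re.compile(r'<img\b[^>]*?\bsrc="([^"]*)"', re.IGNORECASE)


def image_paths_from_blocks(blocks: List[Dict[str, Any]]) -> List[str]:
    """Collect every image path found in the block_content of the given blocks."""
    paths: List[str] = []
    for block in blocks:
        # block_content may be missing or None; treat both as empty text.
        content = block.get("block_content") or ""
        paths.extend(IMG_SRC_RE.findall(content))
    return paths


# Hypothetical blocks, shaped like a result produced with format_block_content=True.
formatted = [
    {"block_label": "image", "block_content": '<img src="imgs/figure_1.jpg" />'},
    {"block_label": "text", "block_content": "Some recognized paragraph."},
]

# Hypothetical blocks, shaped like a result produced with format_block_content=False:
# the image block only carries OCR-recognized text, so no path can be recovered.
unformatted = [
    {"block_label": "image", "block_content": "Text recognized inside the figure"},
]

print(image_paths_from_blocks(formatted))    # ['imgs/figure_1.jpg']
print(image_paths_from_blocks(unformatted))  # []
```

An empty result from a helper like this usually means the JSON was produced with the parameter left at its default of False.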
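  • diff --git a/docs/version3.x/pipeline_usage/PP-StructureV3.md b/docs/version3.x/pipeline_usage/PP-StructureV3.md index b708cac44ba..319d5ec0b06 100644 --- a/docs/version3.x/pipeline_usage/PP-StructureV3.md +++ b/docs/version3.x/pipeline_usage/PP-StructureV3.md @@ -1508,7 +1508,7 @@ paddleocr pp_structurev3 -i ./pp_structure_v3_demo.png --device gpu format_block_content -Meaning: Whether to format the content in block_content as Markdown.
    Description: If not set, the value this parameter was initialized with in the pipeline is used; it is initialized to False by default.
    +Meaning: Whether to format the content in block_content as Markdown.
    Description: If not set, the value this parameter was initialized with in the pipeline is used; it is initialized to False by default. When set to True, the block_content of image-type blocks will contain image path information (e.g., <img src="..." />); when set to False (default), the block_content of image-type blocks will contain only the OCR-recognized text, without image paths. To get image paths in the JSON output, set this parameter to True.
    bool @@ -2225,7 +2225,7 @@ for item in markdown_images: format_block_content -Whether to format the content in block_content as Markdown. If set to None, the value this parameter was initialized with in the pipeline is used; it is initialized to False by default. +Whether to format the content in block_content as Markdown. If set to None, the value this parameter was initialized with in the pipeline is used; it is initialized to False by default. When set to True, the block_content of image-type blocks will contain image path information (e.g., <img src="..." />); when set to False (default), the block_content of image-type blocks will contain only the OCR-recognized text, without image paths. To get image paths in the JSON output, set this parameter to True. bool|None @@ -2393,7 +2393,7 @@ for item in markdown_images: format_block_content Meaning: Whether to format the content in block_content as Markdown. -
    Description: Setting it to None means the instantiation parameter is used; otherwise, this parameter takes precedence.
    +
    Description: Setting it to None means the instantiation parameter is used; otherwise, this parameter takes precedence. When set to True, the block_content of image-type blocks will contain image path information (e.g., <img src="..." />); when set to False (default), the block_content of image-type blocks will contain only the OCR-recognized text, without image paths. To get image paths in the JSON output, set this parameter to True.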
    bool|None @@ -2664,7 +2664,7 @@ for item in markdown_images:
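  • use_seal_recognition: (bool) Controls whether to enable the seal text recognition sub-pipeline
  • use_table_recognition: (bool) Controls whether to enable the table recognition sub-pipeline
  • use_formula_recognition: (bool) Controls whether to enable the formula recognition sub-pipeline
  • -
  • format_block_content: (bool) Controls whether to format the content in block_content as Markdown
  • +
  • format_block_content: (bool) Controls whether to format the content in block_content as Markdown. When set to True, the block_content of image-type blocks will contain image path information (e.g., <img src="..." />); when set to False (default), the block_content of image-type blocks will contain only the OCR-recognized text, without image paths. To get image paths in the JSON output, set this parameter to True.
  • markdown_ignore_labels: (List[str]) Layout labels that need to be ignored in Markdown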
  • diff --git a/docs/version3.x/pipeline_usage/PaddleOCR-VL.en.md b/docs/version3.x/pipeline_usage/PaddleOCR-VL.en.md index 32eab6ae56c..12cb973a1bd 100644 --- a/docs/version3.x/pipeline_usage/PaddleOCR-VL.en.md +++ b/docs/version3.x/pipeline_usage/PaddleOCR-VL.en.md @@ -484,7 +484,7 @@ If not set, the initialized default value will be used, which is initialized to format_block_content Meaning: Controls whether to format the content within block_content as Markdown.
    Description: -If not set, the initialized default value will be used, which defaults to False. +If not set, the initialized default value will be used, which defaults to False. When set to True, the block_content of image-type blocks will contain image path information (e.g., <img src="..." />). When set to False (default), the block_content of image-type blocks will only contain OCR-recognized text content without image paths. To include image paths in JSON output, set this parameter to True. bool @@ -939,7 +939,7 @@ If set to None, the initialized default value will be used, which i format_block_content Meaning: Controls whether to format the content within block_content as Markdown.
    Description: -If set to None, the initialized default value will be used, which defaults to False. +If set to None, the initialized default value will be used, which defaults to False. When set to True, the block_content of image-type blocks will contain image path information (e.g., <img src="..." />). When set to False (default), the block_content of image-type blocks will only contain OCR-recognized text content without image paths. To include image paths in JSON output, set this parameter to True. bool|None None @@ -1181,8 +1181,8 @@ Setting it to None means using the instantiation parameter; otherwi format_block_content Meaning: The meaning is basically the same as the instantiation parameter.
    -Description: -Setting it to None means using the instantiation parameter; otherwise, this parameter takes precedence. +Description: +Setting it to None means using the instantiation parameter; otherwise, this parameter takes precedence. When set to True, the block_content of image-type blocks will contain image path information (e.g., <img src="..." />). When set to False (default), the block_content of image-type blocks will only contain OCR-recognized text content without image paths. To include image paths in JSON output, set this parameter to True. bool|None None
@@ -1411,7 +1411,7 @@ Setting it to None means using the instantiation parameter; otherwi
 - `use_doc_preprocessor`: `(bool)` Controls whether to enable the document preprocessing sub-pipeline.
 - `use_layout_detection`: `(bool)` Controls whether to enable the layout detection module.
 - `use_chart_recognition`: `(bool)` Controls whether to enable the chart recognition function.
- - `format_block_content`: `(bool)` Controls whether to save the formatted markdown content in `JSON`.
+ - `format_block_content`: `(bool)` Controls whether to save the formatted markdown content in `JSON`. When set to `True`, the `block_content` of image-type blocks will contain image path information (e.g., `<img src="..." />`). When set to `False` (default), the `block_content` of image-type blocks will only contain OCR-recognized text content without image paths. To include image paths in JSON output, set this parameter to `True`.
 - `markdown_ignore_labels`: `(List[str])` Labels of layout regions that need to be ignored in Markdown
 - `doc_preprocessor_res`: `(Dict[str, Union[List[float], str]])` A dictionary of document preprocessing results, which exists only when `use_doc_preprocessor=True`.
@@ -1438,7 +1438,7 @@ Setting it to None means using the instantiation parameter; otherwi
 - `use_doc_preprocessor`: `(bool)` Controls whether to enable the document preprocessing sub-pipeline.
 - `use_layout_detection`: `(bool)` Controls whether to enable the layout detection module.
 - `use_chart_recognition`: `(bool)` Controls whether to enable the chart recognition function.
- - `format_block_content`: `(bool)` Controls whether to save the formatted markdown content in `JSON`.
+ - `format_block_content`: `(bool)` Controls whether to save the formatted markdown content in `JSON`. When set to `True`, the `block_content` of image-type blocks will contain image path information (e.g., `<img src="..." />`). When set to `False` (default), the `block_content` of image-type blocks will only contain OCR-recognized text content without image paths. To include image paths in JSON output, set this parameter to `True`.
 - `doc_preprocessor_res`: `(Dict[str, Union[List[float], str]])` A dictionary of document preprocessing results, which exists only when `use_doc_preprocessor=True`.
 - `input_path`: `(str)` The image path accepted by the document preprocessing sub-pipeline. When the input is a `numpy.ndarray`, it is saved as `None`; here, it is `None`.
diff --git a/docs/version3.x/pipeline_usage/PaddleOCR-VL.md b/docs/version3.x/pipeline_usage/PaddleOCR-VL.md index c2fddb9adc4..9a3edf649ce 100644 --- a/docs/version3.x/pipeline_usage/PaddleOCR-VL.md +++ b/docs/version3.x/pipeline_usage/PaddleOCR-VL.md @@ -474,7 +474,7 @@ paddleocr doc_parser -i ./paddleocr_vl_demo.png --use_layout_detection False format_block_content Meaning: Controls whether to format the content in block_content as Markdown.
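    -Description: If not set, the initialized default value is used, which defaults to False. +Description: If not set, the initialized default value is used, which defaults to False. When set to True, the block_content of image-type blocks will contain image path information (e.g., <img src="..." />); when set to False (default), the block_content of image-type blocks will contain only the OCR-recognized text, without image paths. To get image paths in the JSON output, set this parameter to True. bool @@ -909,7 +909,7 @@ output = pipeline.predict(["imgs/file1.png", "imgs/file2.png", "imgs/file3.png"] format_block_content Meaning: Controls whether to format the content in block_content as Markdown.
    -Description: If set to None, the initialized default value is used, which defaults to False. +Description: If set to None, the initialized default value is used, which defaults to False. When set to True, the block_content of image-type blocks will contain image path information (e.g., <img src="..." />); when set to False (default), the block_content of image-type blocks will contain only the OCR-recognized text, without image paths. To get image paths in the JSON output, set this parameter to True. bool|None None @@ -1142,7 +1142,7 @@ output = pipeline.predict(["imgs/file1.png", "imgs/file2.png", "imgs/file3.png"] format_block_content Meaning: The meaning is basically the same as the instantiation parameter.
    Description: -Setting it to None means the instantiation parameter is used; otherwise, this parameter takes precedence. +Setting it to None means the instantiation parameter is used; otherwise, this parameter takes precedence. When set to True, the block_content of image-type blocks will contain image path information (e.g., <img src="..." />); when set to False (default), the block_content of image-type blocks will contain only the OCR-recognized text, without image paths. To get image paths in the JSON output, set this parameter to True. bool|None None @@ -1363,7 +1363,7 @@ output = pipeline.predict(["imgs/file1.png", "imgs/file2.png", "imgs/file3.png"]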
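  • use_doc_preprocessor: (bool) Controls whether to enable the document preprocessing sub-pipeline
  • use_layout_detection: (bool) Controls whether to enable the layout detection module
  • use_chart_recognition: (bool) Controls whether to enable the chart recognition function
  • -
  • format_block_content: (bool) Controls whether to save the formatted markdown content in JSON
  • +
  • format_block_content: (bool) Controls whether to save the formatted markdown content in JSON. When set to True, the block_content of image-type blocks will contain image path information (e.g., <img src="..." />); when set to False (default), the block_content of image-type blocks will contain only the OCR-recognized text, without image paths. To get image paths in the JSON output, set this parameter to True.
  • doc_preprocessor_res: (Dict[str, Union[str, Dict[str, bool], int]]) The output of the document preprocessing sub-pipeline. Present only when use_doc_preprocessor=True @@ -1399,7 +1399,7 @@ output = pipeline.predict(["imgs/file1.png", "imgs/file2.png", "imgs/file3.png"]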
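  • use_doc_preprocessor: (bool) Controls whether to enable the document preprocessing sub-pipeline
  • use_layout_detection: (bool) Controls whether to enable the layout detection module
  • use_chart_recognition: (bool) Controls whether to enable the chart recognition function
  • -
  • format_block_content: (bool) Controls whether to save the formatted markdown content in JSON
  • +
  • format_block_content: (bool) Controls whether to save the formatted markdown content in JSON. When set to True, the block_content of image-type blocks will contain image path information (e.g., <img src="..." />); when set to False (default), the block_content of image-type blocks will contain only the OCR-recognized text, without image paths. To get image paths in the JSON output, set this parameter to True.
  • doc_preprocessor_res: (Dict[str, Union[str, Dict[str, bool], int]]) The output of the document preprocessing sub-pipeline. Present only when use_doc_preprocessor=True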
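The precedence rule stated for the prediction-time `format_block_content` parameter (None falls back to the instantiation value, any other value wins) can be sketched as a small helper. This only illustrates the documented behavior and is not the library's code; `resolve_format_block_content` and its argument names are hypothetical.

```python
from typing import Optional


def resolve_format_block_content(
    init_value: bool = False,
    predict_value: Optional[bool] = None,
) -> bool:
    """Return the effective setting for one prediction call.

    init_value:    value given at instantiation (documented default: False).
    predict_value: value given at prediction time; None means "not specified".
    """
    # A prediction-time value takes precedence unless it is None.
    return init_value if predict_value is None else predict_value


# Nothing set anywhere: the default False applies, so no image paths in block_content.
print(resolve_format_block_content())                                       # False
# Enabled only for this call: the prediction-time value wins.
print(resolve_format_block_content(init_value=False, predict_value=True))   # True
# Enabled at instantiation but explicitly disabled for this call.
print(resolve_format_block_content(init_value=True, predict_value=False))   # False
```

Note the check against None rather than truthiness: an explicit False at prediction time must still override a True set at instantiation.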