23 changes: 22 additions & 1 deletion docs/version3.x/pipeline_usage/PP-ChatOCRv4.en.md
@@ -971,7 +971,7 @@ Before using the PP-ChatOCRv4-doc pipeline locally, ensure you have completed th

Please note: If you encounter issues such as the program becoming unresponsive, unexpected program termination, running out of memory resources, or extremely slow inference during execution, please try adjusting the configuration according to the documentation, such as disabling unnecessary features or using lighter-weight models.

Before performing model inference, you first need to prepare the API key for the large language model. PP-ChatOCRv4 supports large model services on the [Baidu Cloud Qianfan Platform](https://console.bce.baidu.com/qianfan/ais/console/onlineService) or the locally deployed standard OpenAI interface. If using the Baidu Cloud Qianfan Platform, refer to [Authentication and Authorization](https://cloud.baidu.com/doc/qianfan-api/s/ym9chdsy5) to obtain the API key. If using a locally deployed large model service, refer to the [PaddleNLP Large Model Deployment Documentation](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/llm) for deployment of the dialogue interface and vectorization interface for large models, and fill in the corresponding `base_url` and `api_key`. If you need to use a multimodal large model for data fusion, refer to the OpenAI service deployment in the [PaddleMIX Model Documentation](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/paddlemix/examples/ppdocbee2) for multimodal large model deployment, and fill in the corresponding `base_url` and `api_key`.
Before performing model inference, you first need to prepare the API key for the large language model. PP-ChatOCRv4 supports large model services on the [Baidu Cloud Qianfan Platform](https://console.bce.baidu.com/qianfan/ais/console/onlineService), [MiniMax Cloud](https://www.minimax.io/), or the locally deployed standard OpenAI interface. If using the Baidu Cloud Qianfan Platform, refer to [Authentication and Authorization](https://cloud.baidu.com/doc/qianfan-api/s/ym9chdsy5) to obtain the API key. If using MiniMax Cloud, obtain an API key from [MiniMax Platform](https://platform.minimaxi.com/) and pass it via `--minimax_api_key` or the `MINIMAX_API_KEY` environment variable. If using a locally deployed large model service, refer to the [PaddleNLP Large Model Deployment Documentation](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/llm) for deployment of the dialogue interface and vectorization interface for large models, and fill in the corresponding `base_url` and `api_key`. If you need to use a multimodal large model for data fusion, refer to the OpenAI service deployment in the [PaddleMIX Model Documentation](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/paddlemix/examples/ppdocbee2) for multimodal large model deployment, and fill in the corresponding `base_url` and `api_key`.

**Note**: If local deployment of a multimodal large model is restricted due to the local environment, you can comment out the lines containing the `mllm` variable in the code and only use the large language model for information extraction.

@@ -983,6 +983,9 @@ After updating the configuration file, you can complete quick inference using ju
```bash
paddleocr pp_chatocrv4_doc -i vehicle_certificate-1.png -k 驾驶室准乘人数 --qianfan_api_key your_api_key

# Use MiniMax Cloud as the LLM provider
paddleocr pp_chatocrv4_doc -i vehicle_certificate-1.png -k 驾驶室准乘人数 --minimax_api_key your_minimax_api_key

# Use a multimodal large model via --invoke_mllm and --pp_docbee_base_url
paddleocr pp_chatocrv4_doc -i vehicle_certificate-1.png -k 驾驶室准乘人数 --qianfan_api_key your_api_key --invoke_mllm True --pp_docbee_base_url http://127.0.0.1:8080/
```
@@ -1381,6 +1384,12 @@ Any float > <code>0</code></li>
<td></td>
</tr>
<tr>
<td><code>minimax_api_key</code></td>
<td><b>Meaning:</b>API key for <a href="https://platform.minimaxi.com/">MiniMax Cloud</a>. When set, uses MiniMax (MiniMax-M2.7, 204K context) as the LLM provider instead of Qianfan. Can also be set via the <code>MINIMAX_API_KEY</code> environment variable.</td>
<td><code>str</code></td>
<td></td>
</tr>
<tr>
<td><code>device</code></td>
<td><b>Meaning:</b>The device used for inference.<br/>
<b>Description:</b>
@@ -1479,6 +1488,18 @@ chat_bot_config = {
"api_key": "api_key", # your api_key
}

# Alternatively, use MiniMax Cloud as the LLM provider:
# from paddleocr._pipelines.llm_config import get_minimax_chat_bot_config
# chat_bot_config = get_minimax_chat_bot_config() # reads MINIMAX_API_KEY env var
# Or configure manually:
# chat_bot_config = {
# "module_name": "chat_bot",
# "model_name": "MiniMax-M2.7", # 204K context window
# "base_url": "https://api.minimax.io/v1",
# "api_type": "openai",
# "api_key": "your_minimax_api_key",
# }

retriever_config = {
"module_name": "retriever",
"model_name": "embedding-v1",
23 changes: 22 additions & 1 deletion docs/version3.x/pipeline_usage/PP-ChatOCRv4.md
@@ -861,7 +861,7 @@ devanagari_PP-OCRv3_mobile_rec_infer.tar">Inference Model</a>/<a href="https://padd

**Please note: if you encounter problems during execution such as the program becoming unresponsive, exiting abnormally, exhausting memory, or extremely slow inference, try adjusting the configuration as described in the documentation, for example by disabling unneeded features or using lighter-weight models.**

Before performing model inference, you first need to prepare the api_key for the large language model. PP-ChatOCRv4 supports large model services on the [Baidu Cloud Qianfan Platform](https://console.bce.baidu.com/qianfan/ais/console/onlineService) or a locally deployed large model service with a standard OpenAI interface. If using the Baidu Cloud Qianfan Platform, refer to [Authentication and Authorization](https://cloud.baidu.com/doc/qianfan-api/s/ym9chdsy5) to obtain an api_key. If using a locally deployed large model service, refer to the [PaddleNLP Large Model Deployment Documentation](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/llm) to deploy the dialogue interface and the vectorization interface, then fill in the corresponding base_url and api_key. If you need to use a multimodal large model for data fusion, refer to the OpenAI service deployment section of the [PaddleMIX Model Documentation](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/paddlemix/examples/ppdocbee2) to deploy the multimodal large model, then fill in the corresponding base_url and api_key.
Before performing model inference, you first need to prepare the api_key for the large language model. PP-ChatOCRv4 supports large model services on the [Baidu Cloud Qianfan Platform](https://console.bce.baidu.com/qianfan/ais/console/onlineService), the [MiniMax Open Platform](https://platform.minimaxi.com/), or a locally deployed large model service with a standard OpenAI interface. If using the Baidu Cloud Qianfan Platform, refer to [Authentication and Authorization](https://cloud.baidu.com/doc/qianfan-api/s/ym9chdsy5) to obtain an api_key. If using the MiniMax Open Platform, obtain an API key from the [MiniMax Open Platform](https://platform.minimaxi.com/) and pass it via `--minimax_api_key` or the `MINIMAX_API_KEY` environment variable. If using a locally deployed large model service, refer to the [PaddleNLP Large Model Deployment Documentation](https://github.com/PaddlePaddle/PaddleNLP/tree/develop/llm) to deploy the dialogue interface and the vectorization interface, then fill in the corresponding base_url and api_key. If you need to use a multimodal large model for data fusion, refer to the OpenAI service deployment section of the [PaddleMIX Model Documentation](https://github.com/PaddlePaddle/PaddleMIX/tree/develop/paddlemix/examples/ppdocbee2) to deploy the multimodal large model, then fill in the corresponding base_url and api_key.

**Note:** If local environment restrictions prevent deploying a multimodal large model locally, you can comment out the lines containing the `mllm` variable in the code and use only the large language model for information extraction.

@@ -872,6 +872,9 @@ devanagari_PP-OCRv3_mobile_rec_infer.tar">Inference Model</a>/<a href="https://padd
```bash
paddleocr pp_chatocrv4_doc -i vehicle_certificate-1.png -k 驾驶室准乘人数 --qianfan_api_key your_api_key

# Use the MiniMax Open Platform as the LLM provider
paddleocr pp_chatocrv4_doc -i vehicle_certificate-1.png -k 驾驶室准乘人数 --minimax_api_key your_minimax_api_key

# Use a multimodal large model via --invoke_mllm and --pp_docbee_base_url
paddleocr pp_chatocrv4_doc -i vehicle_certificate-1.png -k 驾驶室准乘人数 --qianfan_api_key your_api_key --invoke_mllm True --pp_docbee_base_url http://127.0.0.1:8080/
```
@@ -1239,6 +1242,12 @@ paddleocr pp_chatocrv4_doc -i vehicle_certificate-1.png -k 驾驶室准乘人数
<td></td>
</tr>
<tr>
<td><code>minimax_api_key</code></td>
<td><b>Meaning:</b>API key for the <a href="https://platform.minimaxi.com/">MiniMax Open Platform</a>. When set, MiniMax (MiniMax-M2.7, 204K context) is used as the LLM provider instead of Qianfan. Can also be set via the <code>MINIMAX_API_KEY</code> environment variable.</td>
<td><code>str</code></td>
<td></td>
</tr>
<tr>
<td><code>device</code></td>
<td><b>Meaning:</b>The device used for inference.<br/>
<b>Description:</b>Supports specifying a specific card number:
@@ -1332,6 +1341,18 @@ chat_bot_config = {
"api_key": "api_key", # your api_key
}

# Alternatively, use the MiniMax Open Platform as the LLM provider:
# from paddleocr._pipelines.llm_config import get_minimax_chat_bot_config
# chat_bot_config = get_minimax_chat_bot_config() # reads the MINIMAX_API_KEY env var
# Or configure manually:
# chat_bot_config = {
# "module_name": "chat_bot",
# "model_name": "MiniMax-M2.7", # 204K 上下文窗口
# "base_url": "https://api.minimax.io/v1",
# "api_type": "openai",
# "api_key": "your_minimax_api_key",
# }

retriever_config = {
"module_name": "retriever",
"model_name": "embedding-v1",
13 changes: 13 additions & 0 deletions docs/version3.x/pipeline_usage/PP-DocTranslation.en.md
@@ -693,6 +693,9 @@ You can download the [test file](https://paddle-model-ecology.bj.bcebos.com/padd

```bash
paddleocr pp_doctranslation -i vehicle_certificate-1.png --target_language en --qianfan_api_key your_api_key

# Use MiniMax Cloud as the LLM provider
paddleocr pp_doctranslation -i vehicle_certificate-1.png --target_language en --minimax_api_key your_minimax_api_key
```

<details><summary><b>Command line supports more parameter settings. Click to expand for detailed description of command line parameters</b></summary>
@@ -1244,6 +1247,12 @@ If not set, the pipeline initialized value will be used, default is <code>True</
<td></td>
</tr>
<tr>
<td><code>minimax_api_key</code></td>
<td><b>Meaning:</b>API key for <a href="https://platform.minimaxi.com/">MiniMax Cloud</a>. When set, uses MiniMax (MiniMax-M2.7, 204K context) as the LLM provider instead of Qianfan. Can also be set via the <code>MINIMAX_API_KEY</code> environment variable.</td>
<td><code>str</code></td>
<td></td>
</tr>
<tr>
<td><code>device</code></td>
<td><b>Meaning:</b>Device used for inference.<br/>
<b>Description:</b>
@@ -1342,6 +1351,18 @@ chat_bot_config = {
"api_key": "api_key", # your api_key
}

# Alternatively, use MiniMax Cloud as the LLM provider:
# from paddleocr._pipelines.llm_config import get_minimax_chat_bot_config
# chat_bot_config = get_minimax_chat_bot_config() # reads MINIMAX_API_KEY env var

if input_path.lower().endswith(".md"):
# Read markdown documents, supporting passing in directories and url links with the .md suffix
ori_md_info_list = pipeline.load_from_markdown(input_path)
13 changes: 13 additions & 0 deletions docs/version3.x/pipeline_usage/PP-DocTranslation.md
@@ -690,6 +690,9 @@ devanagari_PP-OCRv3_mobile_rec_infer.tar">Inference Model</a>/<a href="https://padd

```bash
paddleocr pp_doctranslation -i vehicle_certificate-1.png --target_language en --qianfan_api_key your_api_key

# Use the MiniMax Open Platform as the LLM provider
paddleocr pp_doctranslation -i vehicle_certificate-1.png --target_language en --minimax_api_key your_minimax_api_key
```

<details><summary><b>The command line supports more parameter settings; click to expand for a detailed description of the command-line parameters</b></summary>
@@ -1265,6 +1268,12 @@ paddleocr pp_doctranslation -i vehicle_certificate-1.png --target_language en --
<td></td>
</tr>
<tr>
<td><code>minimax_api_key</code></td>
<td><b>Meaning:</b>API key for the <a href="https://platform.minimaxi.com/">MiniMax Open Platform</a>. When set, MiniMax (MiniMax-M2.7, 204K context) is used as the LLM provider instead of Qianfan. Can also be set via the <code>MINIMAX_API_KEY</code> environment variable.</td>
<td><code>str</code></td>
<td></td>
</tr>
<tr>
<td><code>device</code></td>
<td><b>Meaning:</b>The device used for inference.<br/>
<b>Description:</b>
@@ -1366,6 +1375,10 @@ chat_bot_config = {
"api_key": "api_key", # your api_key
}

# Alternatively, use the MiniMax Open Platform as the LLM provider:
# from paddleocr._pipelines.llm_config import get_minimax_chat_bot_config
# chat_bot_config = get_minimax_chat_bot_config() # reads the MINIMAX_API_KEY env var

if input_path.lower().endswith(".md"):
# Read markdown documents, supporting passing in directories and URL links with the .md suffix
ori_md_info_list = pipeline.load_from_markdown(input_path)
63 changes: 63 additions & 0 deletions paddleocr/_pipelines/llm_config.py
@@ -0,0 +1,63 @@
# Copyright (c) 2025 PaddlePaddle Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

"""Helpers for building LLM chat_bot_config from third-party provider keys."""

import os


# MiniMax models and their context window sizes (tokens).
MINIMAX_MODELS = {
"MiniMax-M2.7": 204_800,
"MiniMax-M2.7-highspeed": 204_800,
}

MINIMAX_DEFAULT_MODEL = "MiniMax-M2.7"
MINIMAX_BASE_URL = "https://api.minimax.io/v1"


def get_minimax_chat_bot_config(api_key=None):
"""Return a ``chat_bot_config`` dict targeting MiniMax Cloud API.

Parameters
----------
api_key : str or None
MiniMax API key. When *None* the ``MINIMAX_API_KEY`` environment
variable is used as a fallback.

Returns
-------
dict
A config dict ready to be passed as ``chat_bot_config`` to any
pipeline that accepts it (PP-ChatOCRv4-doc, PP-DocTranslation, …).

Raises
------
ValueError
If no API key is provided and the environment variable is not set.
"""
api_key = api_key or os.environ.get("MINIMAX_API_KEY")
if not api_key:
raise ValueError(
"A MiniMax API key is required. Pass it via --minimax_api_key "
"or set the MINIMAX_API_KEY environment variable."
)

return {
"module_name": "chat_bot",
"model_name": MINIMAX_DEFAULT_MODEL,
"base_url": MINIMAX_BASE_URL,
"api_type": "openai",
"api_key": api_key,
}
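The helper resolves the key from its argument first and falls back to the `MINIMAX_API_KEY` environment variable, raising `ValueError` when neither is available. A standalone sketch of that key-resolution behavior (the function body is re-implemented here for illustration rather than imported from `paddleocr._pipelines.llm_config`):

```python
import os

# Mirrors get_minimax_chat_bot_config's key resolution: an explicit key
# wins over the environment variable, and a missing key raises ValueError.
def resolve_minimax_key(api_key=None):
    key = api_key or os.environ.get("MINIMAX_API_KEY")
    if not key:
        raise ValueError(
            "A MiniMax API key is required. Pass it via --minimax_api_key "
            "or set the MINIMAX_API_KEY environment variable."
        )
    return key

os.environ["MINIMAX_API_KEY"] = "env-key"
print(resolve_minimax_key())           # env-var fallback -> "env-key"
print(resolve_minimax_key("cli-key"))  # explicit key takes precedence -> "cli-key"
del os.environ["MINIMAX_API_KEY"]
try:
    resolve_minimax_key()
except ValueError as exc:
    print("raised:", exc)
```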
15 changes: 14 additions & 1 deletion paddleocr/_pipelines/pp_chatocrv4_doc.py
@@ -17,6 +17,7 @@
str2bool,
)
from .base import PaddleXPipelineWrapper, PipelineCLISubcommandExecutor
from .llm_config import get_minimax_chat_bot_config
from .utils import create_config_from_structure


@@ -680,15 +681,27 @@ def _update_subparser(self, subparser):
type=str,
help="Configuration for the multimodal large language model.",
)
subparser.add_argument(
"--minimax_api_key",
type=str,
help="API key for MiniMax Cloud. When set, uses MiniMax as the "
"LLM provider for information extraction instead of Qianfan. "
"Can also be set via the MINIMAX_API_KEY environment variable.",
)

def execute_with_args(self, args):
params = get_subcommand_args(args)
input = params.pop("input")
keys = params.pop("keys")
save_path = params.pop("save_path")
invoke_mllm = params.pop("invoke_mllm")
minimax_api_key = params.pop("minimax_api_key")
qianfan_api_key = params.pop("qianfan_api_key")
if qianfan_api_key is not None:
if minimax_api_key is not None:
params["chat_bot_config"] = get_minimax_chat_bot_config(
minimax_api_key
)
elif qianfan_api_key is not None:
params["retriever_config"] = {
"module_name": "retriever",
"model_name": "embedding-v1",
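Because the dispatch checks `minimax_api_key` before `qianfan_api_key`, MiniMax takes precedence when both keys are supplied. A minimal standalone sketch of that selection order (not the shipped CLI code; the Qianfan config is abbreviated to the `model_name` shown elsewhere in this diff, with the remaining fields omitted):

```python
# Illustration of the CLI's provider-selection order:
# MiniMax first, then Qianfan, else None (pipeline defaults apply).
def select_chat_bot_config(minimax_api_key=None, qianfan_api_key=None):
    if minimax_api_key is not None:
        return {
            "module_name": "chat_bot",
            "model_name": "MiniMax-M2.7",
            "base_url": "https://api.minimax.io/v1",
            "api_type": "openai",
            "api_key": minimax_api_key,
        }
    elif qianfan_api_key is not None:
        return {
            "module_name": "chat_bot",
            "model_name": "ernie-3.5-8k",  # remaining Qianfan fields omitted
            "api_key": qianfan_api_key,
        }
    return None

cfg = select_chat_bot_config(minimax_api_key="mk", qianfan_api_key="qk")
print(cfg["model_name"])  # MiniMax wins when both keys are set -> MiniMax-M2.7
```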
15 changes: 14 additions & 1 deletion paddleocr/_pipelines/pp_doctranslation.py
@@ -18,6 +18,7 @@
)
from .._utils.logging import logger
from .base import PaddleXPipelineWrapper, PipelineCLISubcommandExecutor
from .llm_config import get_minimax_chat_bot_config
from .utils import create_config_from_structure


@@ -906,14 +907,26 @@ def _update_subparser(self, subparser):
type=str,
help="Configuration for the embedding model.",
)
subparser.add_argument(
"--minimax_api_key",
type=str,
help="API key for MiniMax Cloud. When set, uses MiniMax as the "
"LLM provider for document translation instead of Qianfan. "
"Can also be set via the MINIMAX_API_KEY environment variable.",
)

def execute_with_args(self, args):
params = get_subcommand_args(args)
input = params.pop("input")
target_language = params.pop("target_language")
save_path = params.pop("save_path")
minimax_api_key = params.pop("minimax_api_key")
qianfan_api_key = params.pop("qianfan_api_key")
if qianfan_api_key is not None:
if minimax_api_key is not None:
params["chat_bot_config"] = get_minimax_chat_bot_config(
minimax_api_key
)
elif qianfan_api_key is not None:
params["chat_bot_config"] = {
"module_name": "chat_bot",
"model_name": "ernie-3.5-8k",