Commit 4f72347

[DataProcessor] Remove legacy vl_processor directories and deprecated files (#7422)
* delete code
* add unit test
* fix review
* delete code & fix unit test
* fix unit test
1 parent fbea241 commit 4f72347

46 files changed

Lines changed: 766 additions & 10468 deletions

docs/usage/code_overview.md

Lines changed: 4 additions & 4 deletions
@@ -169,17 +169,17 @@ The main entry file `fastdeploy/__init__.py` exports core classes:
 | File | Function | Development Guide |
 |------|----------|-------------------|
 | `text_processor.py` | `BaseDataProcessor` text processor base class | Input processing extension |
-| `ernie4_5_processor.py` | ERNIE 4.5 input processor | Baidu model input processing |
+| `multimodal_processor.py` | Unified multimodal processor | Multimodal input processing |
 | `ernie4_5_tokenizer.py` | ERNIE 4.5 tokenizer | Tokenization logic modification |
 | `preprocess.py` | Input preprocessing utilities | Preprocessing flow |

 **Multimodal Processing Subdirectories**:

 | Directory | Function |
 |-----------|----------|
-| `ernie4_5_vl_processor/` | ERNIE 4.5 VL image/video processing |
-| `qwen_vl_processor/` | Qwen VL multimodal processing |
-| `paddleocr_vl_processor/` | PaddleOCR VL processing |
+| `encodings/` | Model-specific encoding strategies (Ernie, Qwen, PaddleOCR) |
+| `image_processors/` | Image preprocessing (Adaptive, Qwen, Qwen3, PaddleOCR) |
+| `multimodal_processor.py` | Unified multimodal processor |

 ---

docs/zh/usage/code_overview.md

Lines changed: 4 additions & 4 deletions
@@ -169,17 +169,17 @@ FastDeploy/
 | File | Function | Development Guide |
 |------|----------|-------------------|
 | `text_processor.py` | `BaseDataProcessor` text processor base class | Input processing extension |
-| `ernie4_5_processor.py` | ERNIE 4.5 input processor | Baidu model input processing |
+| `multimodal_processor.py` | Unified multimodal processor | Multimodal input processing |
 | `ernie4_5_tokenizer.py` | ERNIE 4.5 tokenizer | Tokenization logic modification |
 | `preprocess.py` | Input preprocessing utilities | Preprocessing flow |

 **Multimodal Processing Subdirectories**:

 | Directory | Function |
 |-----------|----------|
-| `ernie4_5_vl_processor/` | ERNIE 4.5 VL image/video processing |
-| `qwen_vl_processor/` | Qwen VL multimodal processing |
-| `paddleocr_vl_processor/` | PaddleOCR VL processing |
+| `encodings/` | Model-specific encoding strategies (Ernie, Qwen, PaddleOCR) |
+| `image_processors/` | Image preprocessing (Adaptive, Qwen, Qwen3, PaddleOCR) |
+| `multimodal_processor.py` | Unified multimodal processor |

 ---

fastdeploy/input/base_processor.py

Lines changed: 7 additions & 11 deletions
@@ -15,11 +15,11 @@
 """Abstract base class for all data processors.

 Provides unified response-processing logic (ids2tokens, process_response_dict*,
-update_stop_seq, update_bad_words, pad_batch_data, …) extracted from the two
-existing concrete processors:
+update_stop_seq, update_bad_words, pad_batch_data, …) shared by all concrete
+processors:

-DataProcessor (fastdeploy/input/text_processor.py)
-Ernie4_5Processor (fastdeploy/input/ernie4_5_processor.py)
+TextProcessor (fastdeploy/input/text_processor.py)
+MultiModalProcessor (fastdeploy/input/multimodal_processor.py)

 Key design decisions
 --------------------
@@ -28,16 +28,12 @@
 of each subclass. Subclasses that do not call ``super().__init__()`` must
 initialise those three attributes themselves.

-* ``process_response_dict`` reads ``stream`` from ``kwargs`` (DataProcessor
-convention). Callers that previously passed ``stream`` as a positional
-argument (ERNIE convention) must be updated to use ``stream=`` keyword.
+* ``process_response_dict`` reads ``stream`` from ``kwargs`` (default: True).

-* EOS removal uses ``in self.eos_token_ids`` (list membership). ERNIE's
-``eos_token_ids`` contains exactly one element, so this is equivalent to the
-``==`` check it currently uses.
+* EOS removal uses ``in self.eos_token_ids`` (list membership).

 * tool_parser result never updates ``outputs["text"]``; only ``tool_calls`` is
-set. This matches DataProcessor behaviour.
+set.

 * ``ids2tokens`` always returns a three-tuple
 ``(delta_text, previous_token_ids, previous_texts)``. The HF-tokeniser
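
Two of the design decisions in this docstring (reading ``stream`` from ``kwargs`` with a default of True, and removing EOS by list membership) can be illustrated with a minimal sketch. This is not the FastDeploy implementation: `SketchProcessor`, `strip_eos`, the `outputs["token_ids"]` key, and the choice to strip only a trailing EOS token are assumptions made for illustration.

```python
from typing import Any, Dict, List


class SketchProcessor:
    """Hypothetical stand-in for a processor; not FastDeploy's base class."""

    def __init__(self, eos_token_ids: List[int]) -> None:
        # A list, so one or several EOS ids are handled the same way.
        self.eos_token_ids = eos_token_ids

    def strip_eos(self, token_ids: List[int]) -> List[int]:
        # List membership replaces an `==` check against a single EOS id.
        # Assumption: only a trailing EOS token is removed.
        if token_ids and token_ids[-1] in self.eos_token_ids:
            return token_ids[:-1]
        return token_ids

    def process_response_dict(self, response: Dict[str, Any], **kwargs: Any) -> Dict[str, Any]:
        # `stream` is keyword-only by construction and defaults to True.
        stream = kwargs.get("stream", True)
        # Assumption: token ids live under response["outputs"]["token_ids"].
        token_ids = self.strip_eos(response["outputs"]["token_ids"])
        return {"stream": stream, "token_ids": token_ids}


proc = SketchProcessor(eos_token_ids=[2, 7])
streamed = proc.process_response_dict({"outputs": {"token_ids": [5, 6, 7]}})
full = proc.process_response_dict({"outputs": {"token_ids": [5, 6]}}, stream=False)
```

With this signature, a caller that still passes ``stream`` positionally fails with a `TypeError` instead of being silently misread, which is one reason to prefer the keyword convention.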

fastdeploy/input/ernie4_5_processor.py

Lines changed: 0 additions & 42 deletions
This file was deleted.

fastdeploy/input/ernie4_5_vl_processor/__init__.py

Lines changed: 0 additions & 28 deletions
This file was deleted.
