Commit add79ed

fix vlm dataset processor with no_padding formate (#224) (#235)
1 parent 0a95003 commit add79ed

10 files changed

Lines changed: 34 additions & 18 deletions

README.md

Lines changed: 5 additions & 5 deletions
@@ -355,7 +355,7 @@ We evaluated the Eagle3 model trained by AngelSlim on tasks including code gener

 #### 1.1 Qwen3 Series Models

-Benchmark results for Qwen3 series models using Eagle3 speculative decoding on vLLM (v0.11.2) across **MT-bench**, **HumanEval**, **GSM8K** and **Alpaca**, using a single NVIDIA H20 GPU (**tp=1, ep=1, num_speculative_tokens=2, batch_size=1, output_len=1024**).
+Benchmark results for Qwen3 series models using Eagle3 speculative decoding on vLLM (v0.11.2) across **MT-bench**, **HumanEval**, **GSM8K** and **Alpaca**, using a single GPU (**tp=1, ep=1, num_speculative_tokens=2, batch_size=1, output_len=1024**).

 <table>
 <thead>
@@ -495,7 +495,7 @@ Benchmark results for Qwen3 series models using Eagle3 speculative decoding on v

 ##### 1.2.1 Qwen3-VL Series Models

-Benchmark results for Qwen3-VL series models using Eagle3 speculative decoding on vLLM (v0.12.0) across language and multimodal tasks, using a single NVIDIA H20 GPU (**tp=1, ep=1, num_speculative_tokens=4, batch_size=1, output_len=1024**).
+Benchmark results for Qwen3-VL series models using Eagle3 speculative decoding on vLLM (v0.12.0) across language and multimodal tasks, using a single GPU (**tp=1, ep=1, num_speculative_tokens=4, batch_size=1, output_len=1024**).

 <table><thead>
 <tr>
@@ -652,7 +652,7 @@ Benchmark results for Qwen3-VL series models using Eagle3 speculative decoding o

 ##### 1.2.2 HunyuanOCR Model

-Benchmark results for HunyuanOCR using Eagle3 speculative decoding on vLLM (v0.13.0) across **[OmniDocBench](https://huggingface.co/datasets/opendatalab/OmniDocBench)** dataset, using a single NVIDIA H20 GPU (**tp=1, ep=1, num_speculative_tokens=4, batch_size=1, output_len=1024**).
+Benchmark results for HunyuanOCR using Eagle3 speculative decoding on vLLM (v0.13.0) across **[OmniDocBench](https://huggingface.co/datasets/opendatalab/OmniDocBench)** dataset, using a single GPU (**tp=1, ep=1, num_speculative_tokens=4, batch_size=1, output_len=1024**).

 <table><thead>
 <tr>
@@ -685,7 +685,7 @@ Benchmark results for HunyuanOCR using Eagle3 speculative decoding on vLLM (v0.1

 ##### 1.3.1 Qwen2-Audio Model

-Benchmark results for Qwen2-Audio using Eagle3 speculative decoding on vLLM (v0.12.0) across **[LibriSpeech](https://www.openslr.org/12)** dataset, using a single NVIDIA H20 GPU (**tp=1, ep=1, num_speculative_tokens=4, batch_size=1, output_len=1024**).
+Benchmark results for Qwen2-Audio using Eagle3 speculative decoding on vLLM (v0.12.0) across **[LibriSpeech](https://www.openslr.org/12)** dataset, using a single GPU (**tp=1, ep=1, num_speculative_tokens=4, batch_size=1, output_len=1024**).

 <table><thead>
 <tr>
@@ -716,7 +716,7 @@ Benchmark results for Qwen2-Audio using Eagle3 speculative decoding on vLLM (v0.

 ##### 1.3.2 Fun-CosyVoice3 Model

-Benchmark results for Fun-CosyVoice3 using Eagle3 speculative decoding across **[LibriTTS](https://www.openslr.org/60/)** dataset, using a single NVIDIA H20 GPU (**tp=1, ep=1, num_speculative_tokens=4, batch_size=1, output_len=1024**).
+Benchmark results for Fun-CosyVoice3 using Eagle3 speculative decoding across **[LibriTTS](https://www.openslr.org/60/)** dataset, using a single GPU (**tp=1, ep=1, num_speculative_tokens=4, batch_size=1, output_len=1024**).

 <table><thead>
 <tr>

README_cn.md

Lines changed: 5 additions & 5 deletions
@@ -358,7 +358,7 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta

 #### 1.1 Qwen3 Series Models

-We used vLLM (v0.11.2) to evaluate the acceptance length and throughput of the Qwen3 series Eagle3 models on datasets including **MT-bench**, **HumanEval**, **GSM8K**, and **Alpaca**. All results were measured on a single H20 with the following settings: **tp=1, ep=1, num_speculative_tokens=2, batch_size=1, output_len=1024**
+We used vLLM (v0.11.2) to evaluate the acceptance length and throughput of the Qwen3 series Eagle3 models on datasets including **MT-bench**, **HumanEval**, **GSM8K**, and **Alpaca**. All results were measured on a single GPU with the following settings: **tp=1, ep=1, num_speculative_tokens=2, batch_size=1, output_len=1024**

 <table>
 <thead>
@@ -498,7 +498,7 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta

 ##### 1.2.1 Qwen3-VL Series Models

-We used vLLM (v0.12.0) to evaluate the acceptance length and throughput of the Qwen3-VL series Eagle3 models on language understanding and multimodal understanding tasks. All results were measured on a single H20 with the following settings: **tp=1, ep=1, num_speculative_tokens=4, batch_size=1, output_len=1024**
+We used vLLM (v0.12.0) to evaluate the acceptance length and throughput of the Qwen3-VL series Eagle3 models on language understanding and multimodal understanding tasks. All results were measured on a single GPU with the following settings: **tp=1, ep=1, num_speculative_tokens=4, batch_size=1, output_len=1024**

 <table><thead>
 <tr>
@@ -655,7 +655,7 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta

 ##### 1.2.2 HunyuanOCR Model

-We used vLLM (v0.13.0) to evaluate the acceptance length and throughput of the HunyuanOCR Eagle3 model on [OmniDocBench](https://huggingface.co/datasets/opendatalab/OmniDocBench). The results were measured on a single H20 with the following settings: **tp=1, ep=1, num_speculative_tokens=4, batch_size=1, output_len=1024**
+We used vLLM (v0.13.0) to evaluate the acceptance length and throughput of the HunyuanOCR Eagle3 model on [OmniDocBench](https://huggingface.co/datasets/opendatalab/OmniDocBench). The results were measured on a single GPU with the following settings: **tp=1, ep=1, num_speculative_tokens=4, batch_size=1, output_len=1024**

 <table><thead>
 <tr>
@@ -688,7 +688,7 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta

 ##### 1.3.1 Qwen2-Audio Model

-We used vLLM (v0.12.0) to evaluate the acceptance length and throughput of the Qwen2-Audio Eagle3 model on the [LibriSpeech](https://www.openslr.org/12) dataset. The results were measured on a single H20 with the following settings: **tp=1, ep=1, num_speculative_tokens=4, batch_size=1, output_len=1024**
+We used vLLM (v0.12.0) to evaluate the acceptance length and throughput of the Qwen2-Audio Eagle3 model on the [LibriSpeech](https://www.openslr.org/12) dataset. The results were measured on a single GPU with the following settings: **tp=1, ep=1, num_speculative_tokens=4, batch_size=1, output_len=1024**

 <table><thead>
 <tr>
@@ -718,7 +718,7 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta
 </table>

 ##### 1.3.2 Fun-CosyVoice3 Model
-We evaluated the acceptance length of the Fun-CosyVoice3 Eagle3 model on the [LibriTTS](https://www.openslr.org/60/) dataset. The results were measured on a single H20 with the following settings: **tp=1, ep=1, num_speculative_tokens=4, batch_size=1, output_len=1024**
+We evaluated the acceptance length of the Fun-CosyVoice3 Eagle3 model on the [LibriTTS](https://www.openslr.org/60/) dataset. The results were measured on a single GPU with the following settings: **tp=1, ep=1, num_speculative_tokens=4, batch_size=1, output_len=1024**

 <table><thead>
 <tr>

angelslim/data/dataloader.py

Lines changed: 2 additions & 0 deletions
@@ -43,6 +43,7 @@ def create_data_loader(
     inference_settings: Dict = None,
     use_audio_in_video: bool = False,
     model_name: str = None,
+    quantization_config: str = None,
 ) -> DataLoader:
     """
     Create appropriate DataLoader based on data source
@@ -94,6 +95,7 @@ def create_data_loader(
             data_source=data_source,
             is_hf_dataset=not os.path.isfile(data_source),
             model_name=model_name,
+            quantization_config=quantization_config,
         )
     elif data_type == "Text2ImageDataset":
         dataset = Text2ImageDataset(

angelslim/data/multimodal_dataset.py

Lines changed: 12 additions & 3 deletions
@@ -37,10 +37,12 @@ def __init__(
         data_source: Union[str, Dict] = None,
         is_hf_dataset: bool = False,
         model_name: str = None,
+        quantization_config: str = None,
     ):
         super().__init__(processor, device, max_length)
         self.is_hf_dataset = is_hf_dataset
         self.model_name = model_name
+        self.quant_algo = quantization_config.name if quantization_config else None

         if is_hf_dataset:
             self._load_hf_dataset(data_source, num_samples)
@@ -174,14 +176,21 @@ def _load_hf_dataset(self, dataset: str, num_samples: int):

     def _process_and_append(self, messages: List[Dict], tools=None):
         """Process messages and append to dataset"""
+
+        # max_length padding for int4 gptq, gptaq and awq
+        if "int4_" in self.quant_algo:
+            padding = "max_length"
+        else:
+            padding = True
+
         if self.model_name in ["Qwen3VL", "Qwen3VLMoE"]:
             inputs = self.processor.apply_chat_template(
                 messages,
                 tools=tools,
                 tokenize=True,
                 add_generation_prompt=True,
                 return_dict=True,
-                padding="max_length",
+                padding=padding,
                 truncation=True,
                 return_tensors="pt",
                 max_length=self.max_length,
@@ -196,7 +205,7 @@ def _process_and_append(self, messages: List[Dict], tools=None):
             inputs = self.processor(
                 text=[text],
                 images=image_inputs,
-                padding="max_length",
+                padding=padding,
                 truncation=True,
                 return_tensors="pt",
                 max_length=self.max_length,
@@ -214,7 +223,7 @@ def _process_and_append(self, messages: List[Dict], tools=None):
                 text=[text],
                 images=image_inputs,
                 videos=video_inputs,
-                padding="max_length",
+                padding=padding,
                 truncation=True,
                 return_tensors="pt",
                 max_length=self.max_length,
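
The padding choice in the `_process_and_append` hunk above can be sketched in isolation. This is a minimal illustration, not the project's code: `select_padding` and `pad_batch` are hypothetical helpers, and the algorithm names `int4_awq` and `fp8_static` are assumed examples. As committed, `"int4_" in self.quant_algo` would raise a `TypeError` whenever `quant_algo` is `None` (no quantization config passed), so the sketch adds a `None` guard that is not part of the commit. Because each processor call handles a single sample, `padding=True` (pad to the longest sequence in the batch) presumably leaves that sample unpadded, which seems to be what the commit title means by "no_padding".

```python
from typing import List, Optional, Union


def select_padding(quant_algo: Optional[str]) -> Union[bool, str]:
    """Pick the processor's padding argument from the quantization algorithm name.

    Mirrors the diff: names containing "int4_" pad to max_length, everything
    else pads to the longest sequence (padding=True). The None guard is an
    addition of this sketch.
    """
    if quant_algo is not None and "int4_" in quant_algo:
        return "max_length"
    return True


def pad_batch(
    batch: List[List[int]],
    padding: Union[bool, str],
    max_length: int,
    pad_id: int = 0,
) -> List[List[int]]:
    """Toy stand-in for tokenizer padding with truncation (hypothetical helper)."""
    truncated = [seq[:max_length] for seq in batch]
    if padding == "max_length":
        target = max_length
    else:
        # padding=True: pad only up to the longest sequence in this batch
        target = max(len(seq) for seq in truncated)
    return [seq + [pad_id] * (target - len(seq)) for seq in truncated]


if __name__ == "__main__":
    sample = [[11, 12, 13]]
    # Single-sample batch: padding=True adds nothing, "max_length" pads to 6.
    print(pad_batch(sample, select_padding("fp8_static"), max_length=6))
    print(pad_batch(sample, select_padding("int4_awq"), max_length=6))
```

Keeping the decision in one small function would also make the `None` case easy to cover with a unit test.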

angelslim/engine.py

Lines changed: 2 additions & 0 deletions
@@ -149,6 +149,7 @@ def prepare_data(
         inference_settings=None,
         use_audio_in_video=False,
         model_name=None,
+        quantization_config=None,
     ) -> Optional[Any]:
         """Prepare compression dataset"""
         if custom_dataloader is not None:
@@ -174,6 +175,7 @@ def prepare_data(
             inference_settings=inference_settings,
             use_audio_in_video=use_audio_in_video,
             model_name=model_name,
+            quantization_config=quantization_config,
         )
         self.max_seq_length = max_length

docs/source/features/speculative_decoding/eagle/audio_asr_eagle.md

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@
 This project covers Eagle3 training and benchmark testing, and open-sources the [Eagle3 weights](https://huggingface.co/collections/AngelSlim/eagle3) for Qwen2Audio.

 For the performance of our trained Qwen2Audio Eagle3 model, see the [benchmarks](../../performance/speculative_decoding/benchmarks.md).
-All data were obtained with vLLM inference on a single H20.
+All data were obtained with vLLM inference on a single GPU.

 ## 1. Supported Models
 - `Qwen2Audio`

docs/source/features/speculative_decoding/eagle/eagle.md

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@
 This project covers Eagle3 training and benchmark testing, and open-sources the [Eagle3 weights](https://huggingface.co/collections/AngelSlim/eagle3) for the Qwen3 and Hunyuan series.

 For the performance of our trained Qwen3 series Eagle3 models, see the [benchmarks](../../../performance/speculative_decoding/benchmarks.md).
-All data were obtained with vLLM inference on a single H20.
+All data were obtained with vLLM inference on a single GPU.

 ## 1. Data Generation

docs/source/features/speculative_decoding/eagle/index.md

Lines changed: 1 addition & 1 deletion
@@ -4,7 +4,7 @@
 This project covers Eagle3 training and benchmark testing, and open-sources the [Eagle3 weights](https://huggingface.co/collections/AngelSlim/eagle3) for models including Hunyuan, HunyuanOCR, Qwen3, Qwen3-VL, Qwen2Audio, and Fun-CosyVoice3.

 For the performance of our trained Eagle3 models, see the [benchmarks](../../../performance/speculative_decoding/benchmarks.md).
-All data were obtained with vLLM inference on a single H20.
+All data were obtained with vLLM inference on a single GPU.

 :::{toctree}
 :caption: Contents

docs/source/features/speculative_decoding/eagle/vlm_eagle.md

Lines changed: 4 additions & 2 deletions
@@ -4,7 +4,7 @@
 This project covers Eagle3 training and benchmark testing, and open-sources the [Eagle3 weights](https://huggingface.co/collections/AngelSlim/eagle3) for the HunyuanOCR and Qwen3-VL series.

 For the performance of our trained HunyuanOCR and Qwen3-VL series Eagle3 models, see the [benchmarks](../../../performance/speculative_decoding/benchmarks.md).
-All data were obtained with vLLM inference on a single H20.
+All data were obtained with vLLM inference on a single GPU.

 ## 1. Supported Models
 - `HunyuanOCR`
 - `Qwen3-VL`
@@ -88,7 +88,9 @@ bash scripts/speculative/hunyuan_ocr/generate_vlm_hidden_for_draft_model.sh
 # For Qwen3-VL series
 bash scripts/speculative/qwen3_vl/generate_vlm_hidden_for_draft_model.sh
 ```
-> Note: generating hidden states for qwen3_vl series models requires updating the transformers library: `pip install git+https://github.com/huggingface/transformers.git`
+> Note: generating hidden states for qwen3_vl series models requires transformers>=5.0.0,
+or cherry-picking https://github.com/huggingface/transformers/pull/42609;
+otherwise the captured hidden states are unusable!!!

 **Script parameters:**
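
The `transformers>=5.0.0` requirement stated in the vlm_eagle.md note above could be checked before generating hidden states. The following is a rough sketch using only the standard library; `parse_release`, `meets_minimum`, and `installed_transformers_version` are hypothetical helpers, not part of AngelSlim. It compares only the numeric release components, so a pre-release such as `5.0.0rc1` is treated as `5.0.0`, and it cannot detect a manually cherry-picked patch on an older version.

```python
import re
from importlib import metadata
from typing import Optional, Tuple


def parse_release(version: str) -> Tuple[int, ...]:
    """Extract leading numeric release components, e.g. '5.0.0.dev0' -> (5, 0, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component such as 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, required: str = "5.0.0") -> bool:
    """Return True if `installed` is at least `required` (release numbers only)."""
    a, b = parse_release(installed), parse_release(required)
    width = max(len(a), len(b))
    # Pad with zeros so that '5.0' compares equal to '5.0.0'.
    a = a + (0,) * (width - len(a))
    b = b + (0,) * (width - len(b))
    return a >= b


def installed_transformers_version() -> Optional[str]:
    """Look up the installed transformers version, or None if it is missing."""
    try:
        return metadata.version("transformers")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_transformers_version()
    if version is None:
        print("transformers is not installed")
    elif not meets_minimum(version):
        print(f"transformers {version} is older than 5.0.0; hidden states may be unusable")
    else:
        print(f"transformers {version} satisfies the >=5.0.0 requirement")
```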

tools/run.py

Lines changed: 1 addition & 0 deletions
@@ -169,6 +169,7 @@ def run(config):
         inference_settings=dataset_config.inference_settings,
         use_audio_in_video=model_config.use_audio_in_video,
         model_name=model_config.name,
+        quantization_config=compress_config.quantization,
     )

     # Step 5: Initialize compressor
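
Taken together, the hunks in tools/run.py, angelslim/engine.py, angelslim/data/dataloader.py, and angelslim/data/multimodal_dataset.py thread one new argument through four layers. The toy sketch below shows that flow with stand-in classes; `QuantConfig` and `ToyMultiModalDataset` are hypothetical, and all other parameters are omitted. The diff annotates `quantization_config` as `str` while reading `.name` from it, so the sketch assumes the value is actually a config object with a `name` attribute.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuantConfig:
    """Hypothetical stand-in for compress_config.quantization."""

    name: str


class ToyMultiModalDataset:
    """Stand-in for the dataset; keeps only the fields the diff touches."""

    def __init__(
        self,
        model_name: Optional[str] = None,
        quantization_config: Optional[QuantConfig] = None,
    ):
        self.model_name = model_name
        # Same expression as the diff: fall back to None without a config.
        self.quant_algo = quantization_config.name if quantization_config else None


def create_data_loader(model_name=None, quantization_config=None):
    # dataloader.py layer: forwards the argument to the dataset constructor.
    return ToyMultiModalDataset(
        model_name=model_name,
        quantization_config=quantization_config,
    )


def prepare_data(model_name=None, quantization_config=None):
    # engine.py layer: forwards the argument to create_data_loader.
    return create_data_loader(
        model_name=model_name,
        quantization_config=quantization_config,
    )


# tools/run.py layer: the config object enters the chain here.
dataset = prepare_data(
    model_name="Qwen3VL",
    quantization_config=QuantConfig(name="int4_awq"),
)
print(dataset.quant_algo)  # prints "int4_awq"
```

Since every layer defaults the argument to `None`, existing callers that do not pass it still construct the dataset; only the later padding check needs to tolerate a `None` algorithm name.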
