Merged
10 changes: 5 additions & 5 deletions README.md
@@ -371,7 +371,7 @@ We evaluated the Eagle3 model trained by AngelSlim on tasks including code gener

#### 1.1 Qwen3 Series Models

-Benchmark results for Qwen3 series models using Eagle3 speculative decoding on vLLM (v0.11.2) across **MT-bench**, **HumanEval**, **GSM8K** and **Alpaca**, using a single NVIDIA H20 GPU (**tp=1, ep=1, num_speculative_tokens=2, batch_size=1, output_len=1024**).
+Benchmark results for Qwen3 series models using Eagle3 speculative decoding on vLLM (v0.11.2) across **MT-bench**, **HumanEval**, **GSM8K** and **Alpaca**, using a single GPU (**tp=1, ep=1, num_speculative_tokens=2, batch_size=1, output_len=1024**).

<table>
<thead>
@@ -511,7 +511,7 @@ Benchmark results for Qwen3 series models using Eagle3 speculative decoding on v

##### 1.2.1 Qwen3-VL Series Models

-Benchmark results for Qwen3-VL series models using Eagle3 speculative decoding on vLLM (v0.12.0) across language and multimodal tasks, using a single NVIDIA H20 GPU (**tp=1, ep=1, num_speculative_tokens=4, batch_size=1, output_len=1024**).
+Benchmark results for Qwen3-VL series models using Eagle3 speculative decoding on vLLM (v0.12.0) across language and multimodal tasks, using a single GPU (**tp=1, ep=1, num_speculative_tokens=4, batch_size=1, output_len=1024**).

<table><thead>
<tr>
@@ -668,7 +668,7 @@ Benchmark results for Qwen3-VL series models using Eagle3 speculative decoding o

##### 1.2.2 HunyuanOCR Model

-Benchmark results for HunyuanOCR using Eagle3 speculative decoding on vLLM (v0.13.0), evaluated on the **[OmniDocBench](https://huggingface.co/datasets/opendatalab/OmniDocBench)** dataset with a single NVIDIA H20 GPU (**tp=1, ep=1, num_speculative_tokens=4, batch_size=1, output_len=1024**).
+Benchmark results for HunyuanOCR using Eagle3 speculative decoding on vLLM (v0.13.0), evaluated on the **[OmniDocBench](https://huggingface.co/datasets/opendatalab/OmniDocBench)** dataset with a single GPU (**tp=1, ep=1, num_speculative_tokens=4, batch_size=1, output_len=1024**).

<table><thead>
<tr>
@@ -701,7 +701,7 @@ Benchmark results for HunyuanOCR using Eagle3 speculative decoding on vLLM (v0.1

##### 1.3.1 Qwen2-Audio Model

-Benchmark results for Qwen2-Audio using Eagle3 speculative decoding on vLLM (v0.12.0), evaluated on the **[LibriSpeech](https://www.openslr.org/12)** dataset with a single NVIDIA H20 GPU (**tp=1, ep=1, num_speculative_tokens=4, batch_size=1, output_len=1024**).
+Benchmark results for Qwen2-Audio using Eagle3 speculative decoding on vLLM (v0.12.0), evaluated on the **[LibriSpeech](https://www.openslr.org/12)** dataset with a single GPU (**tp=1, ep=1, num_speculative_tokens=4, batch_size=1, output_len=1024**).

<table><thead>
<tr>
@@ -732,7 +732,7 @@ Benchmark results for Qwen2-Audio using Eagle3 speculative decoding on vLLM (v0.

##### 1.3.2 Fun-CosyVoice3 Model

-Benchmark results for Fun-CosyVoice3 using Eagle3 speculative decoding, evaluated on the **[LibriTTS](https://www.openslr.org/60/)** dataset with a single NVIDIA H20 GPU (**tp=1, ep=1, num_speculative_tokens=4, batch_size=1, output_len=1024**).
+Benchmark results for Fun-CosyVoice3 using Eagle3 speculative decoding, evaluated on the **[LibriTTS](https://www.openslr.org/60/)** dataset with a single GPU (**tp=1, ep=1, num_speculative_tokens=4, batch_size=1, output_len=1024**).

<table><thead>
<tr>
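The parenthesized benchmark settings in the README can be read as vLLM engine and sampling arguments. The sketch below is a minimal, hypothetical mapping, assuming a vLLM release whose `speculative_config` dict accepts `method="eagle3"`; the helper name `eagle3_engine_kwargs` and the model paths are placeholders, and pinning `batch_size=1` through `max_num_seqs` is our assumption, not necessarily what the AngelSlim benchmark script does.

```python
from typing import Any, Dict


def eagle3_engine_kwargs(
    target_model: str,
    draft_model: str,
    num_speculative_tokens: int,
    tp: int = 1,
    batch_size: int = 1,
) -> Dict[str, Any]:
    """Hypothetical helper: benchmark settings -> vLLM LLM(...) kwargs."""
    return {
        "model": target_model,
        "tensor_parallel_size": tp,  # tp=1: single GPU
        # ep=1: expert parallelism left disabled (vLLM's default).
        "max_num_seqs": batch_size,  # batch_size=1: one sequence at a time
        "speculative_config": {
            "method": "eagle3",
            "model": draft_model,
            "num_speculative_tokens": num_speculative_tokens,
        },
    }


# Placeholder paths, not real checkpoints.
kwargs = eagle3_engine_kwargs(
    "path/to/target-model", "path/to/eagle3-draft", num_speculative_tokens=2
)
print(kwargs["speculative_config"])

# With vLLM installed, these would be passed on as (not executed here):
#   from vllm import LLM, SamplingParams
#   llm = LLM(**kwargs)
#   llm.generate(prompts, SamplingParams(max_tokens=1024))  # output_len=1024
```

The Qwen3 rows use `num_speculative_tokens=2`, while the Qwen3-VL, HunyuanOCR and Qwen2-Audio rows use `4`. The Fun-CosyVoice3 results list no vLLM version, so this mapping may not apply there.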
10 changes: 5 additions & 5 deletions README_cn.md
@@ -372,7 +372,7 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta

#### 1.1 Qwen3 Series Models

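-We used vLLM (v0.11.2) to evaluate the acceptance length and throughput of the Qwen3-series Eagle3 models on datasets including **MT-bench**, **HumanEval**, **GSM8K** and **Alpaca**. All results were measured on a single H20 with the following settings: **tp=1, ep=1, num_speculative_tokens=2, batch_size=1, output_len=1024**.
+We used vLLM (v0.11.2) to evaluate the acceptance length and throughput of the Qwen3-series Eagle3 models on datasets including **MT-bench**, **HumanEval**, **GSM8K** and **Alpaca**. All results were measured on a single GPU with the following settings: **tp=1, ep=1, num_speculative_tokens=2, batch_size=1, output_len=1024**.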

<table>
<thead>
@@ -512,7 +512,7 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta

##### 1.2.1 Qwen3-VL Series Models

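-We used vLLM (v0.12.0) to evaluate the acceptance length and throughput of the Qwen3-VL-series Eagle3 models on language-understanding and multimodal-understanding tasks. All results were measured on a single H20 with the following settings: **tp=1, ep=1, num_speculative_tokens=4, batch_size=1, output_len=1024**.
+We used vLLM (v0.12.0) to evaluate the acceptance length and throughput of the Qwen3-VL-series Eagle3 models on language-understanding and multimodal-understanding tasks. All results were measured on a single GPU with the following settings: **tp=1, ep=1, num_speculative_tokens=4, batch_size=1, output_len=1024**.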

<table><thead>
<tr>
@@ -669,7 +669,7 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta

##### 1.2.2 HunyuanOCR Model

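-We used vLLM (v0.13.0) to evaluate the acceptance length and throughput of the HunyuanOCR Eagle3 model on [OmniDocBench](https://huggingface.co/datasets/opendatalab/OmniDocBench). The results were measured on a single H20 with the following settings: **tp=1, ep=1, num_speculative_tokens=4, batch_size=1, output_len=1024**.
+We used vLLM (v0.13.0) to evaluate the acceptance length and throughput of the HunyuanOCR Eagle3 model on [OmniDocBench](https://huggingface.co/datasets/opendatalab/OmniDocBench). The results were measured on a single GPU with the following settings: **tp=1, ep=1, num_speculative_tokens=4, batch_size=1, output_len=1024**.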

<table><thead>
<tr>
@@ -702,7 +702,7 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta

##### 1.3.1 Qwen2-Audio Model

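-We used vLLM (v0.12.0) to evaluate the acceptance length and throughput of the Qwen2-Audio Eagle3 model on the [LibriSpeech](https://www.openslr.org/12) dataset. The results were measured on a single H20 with the following settings: **tp=1, ep=1, num_speculative_tokens=4, batch_size=1, output_len=1024**.
+We used vLLM (v0.12.0) to evaluate the acceptance length and throughput of the Qwen2-Audio Eagle3 model on the [LibriSpeech](https://www.openslr.org/12) dataset. The results were measured on a single GPU with the following settings: **tp=1, ep=1, num_speculative_tokens=4, batch_size=1, output_len=1024**.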

<table><thead>
<tr>
@@ -732,7 +732,7 @@ bash scripts/deploy/lm_eval.sh -d 0,1 -t 2 -g 0.8 -r $RESULT_PATH -b "auto" --ta
</table>

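##### 1.3.2 Fun-CosyVoice3 Model
-We evaluated the acceptance length of the Fun-CosyVoice3 Eagle3 model on the [LibriTTS](https://www.openslr.org/60/) dataset. The results were measured on a single H20 with the following settings: **tp=1, ep=1, num_speculative_tokens=4, batch_size=1, output_len=1024**.
+We evaluated the acceptance length of the Fun-CosyVoice3 Eagle3 model on the [LibriTTS](https://www.openslr.org/60/) dataset. The results were measured on a single GPU with the following settings: **tp=1, ep=1, num_speculative_tokens=4, batch_size=1, output_len=1024**.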

<table><thead>
<tr>
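The Chinese README reports acceptance length alongside throughput, and `num_speculative_tokens` bounds it. As a rough mental model, and not how the published tables were computed, assume each of the `k = num_speculative_tokens` draft tokens is accepted independently with probability `alpha`, drafting stops at the first rejection, and the target model always contributes one token of its own. The expected number of tokens per target forward pass is then the geometric sum (1 - alpha^(k+1)) / (1 - alpha). The function name below is hypothetical.

```python
def expected_acceptance_length(alpha: float, k: int) -> float:
    """Expected tokens per target forward pass under an i.i.d. acceptance model.

    Assumes each of the k draft tokens is accepted independently with
    probability alpha, drafting stops at the first rejection, and the target
    model always adds one token of its own (correction or bonus token).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if k < 0:
        raise ValueError("k must be non-negative")
    if alpha == 1.0:
        return float(k + 1)  # every draft accepted, plus the bonus token
    # Geometric series 1 + alpha + ... + alpha**k
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


for k in (2, 4):  # num_speculative_tokens values used in the benchmark tables
    print(k, round(expected_acceptance_length(0.8, k), 4))
# 2 2.44
# 4 3.3616
```

If acceptance length is counted this way (including the target model's own token), it can never exceed `num_speculative_tokens + 1`: at most 3 for the Qwen3 rows and at most 5 for the rows that use `num_speculative_tokens=4`.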
2 changes: 2 additions & 0 deletions angelslim/data/dataloader.py
@@ -43,6 +43,7 @@ def create_data_loader(
inference_settings: Dict = None,
use_audio_in_video: bool = False,
model_name: str = None,
+quantization_config: str = None,
) -> DataLoader:
"""
Create appropriate DataLoader based on data source
@@ -94,6 +95,7 @@ def create_data_loader(
data_source=data_source,
is_hf_dataset=not os.path.isfile(data_source),
model_name=model_name,
+quantization_config=quantization_config,
)
elif data_type == "Text2ImageDataset":
dataset = Text2ImageDataset(
15 changes: 12 additions & 3 deletions angelslim/data/multimodal_dataset.py
@@ -37,10 +37,12 @@ def __init__(
data_source: Union[str, Dict] = None,
is_hf_dataset: bool = False,
model_name: str = None,
+quantization_config: str = None,
):
super().__init__(processor, device, max_length)
self.is_hf_dataset = is_hf_dataset
self.model_name = model_name
+self.quant_algo = quantization_config.name if quantization_config else None

if is_hf_dataset:
self._load_hf_dataset(data_source, num_samples)
@@ -174,14 +176,21 @@ def _load_hf_dataset(self, dataset: str, num_samples: int):

def _process_and_append(self, messages: List[Dict], tools=None):
"""Process messages and append to dataset"""

+# max_length padding for int4 GPTQ, GPTAQ and AWQ; quant_algo is None without a quantization config
+if self.quant_algo and "int4_" in self.quant_algo:
+padding = "max_length"
+else:
+padding = True

if self.model_name in ["Qwen3VL", "Qwen3VLMoE"]:
inputs = self.processor.apply_chat_template(
messages,
tools=tools,
tokenize=True,
add_generation_prompt=True,
return_dict=True,
-padding="max_length",
+padding=padding,
truncation=True,
return_tensors="pt",
max_length=self.max_length,
@@ -196,7 +205,7 @@ def _process_and_append(self, messages: List[Dict], tools=None):
inputs = self.processor(
text=[text],
images=image_inputs,
-padding="max_length",
+padding=padding,
truncation=True,
return_tensors="pt",
max_length=self.max_length,
@@ -214,7 +223,7 @@ def _process_and_append(self, messages: List[Dict], tools=None):
text=[text],
images=image_inputs,
videos=video_inputs,
-padding="max_length",
+padding=padding,
truncation=True,
return_tensors="pt",
max_length=self.max_length,
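The padding change in `_process_and_append` can be sketched in isolation. This is a simplified stand-in, not the AngelSlim processor path: `select_padding` mirrors the new branch (including a guard for `quant_algo` being `None` when no quantization config is passed), `pad_ids` is a hypothetical toy in place of the Hugging Face processor's `padding`/`truncation`/`max_length` handling, and the algorithm names `int4_gptq`, `int4_awq` and `fp8_static` are assumed examples.

```python
from typing import List, Optional, Union

Padding = Union[bool, str]


def select_padding(quant_algo: Optional[str]) -> Padding:
    """Mirror of the new branch in _process_and_append, with a None guard."""
    # Fixed-length ("max_length") padding only for int4 algorithms
    # (e.g. GPTQ, GPTAQ, AWQ); otherwise pad to the longest sequence.
    if quant_algo and "int4_" in quant_algo:
        return "max_length"
    return True


def pad_ids(
    batch: List[List[int]], padding: Padding, max_length: int, pad_id: int = 0
) -> List[List[int]]:
    """Toy stand-in for processor padding: truncate, then right-pad."""
    truncated = [ids[:max_length] for ids in batch]
    if padding == "max_length":
        target = max_length
    else:  # padding=True behaves like "longest"
        target = max(len(ids) for ids in truncated)
    return [ids + [pad_id] * (target - len(ids)) for ids in truncated]


print(pad_ids([[1, 2], [3, 4, 5]], select_padding("int4_gptq"), max_length=4))
# [[1, 2, 0, 0], [3, 4, 5, 0]]
print(pad_ids([[1, 2], [3, 4, 5]], select_padding(None), max_length=4))
# [[1, 2, 0], [3, 4, 5]]
```

Because `_process_and_append` passes one sample at a time (`text=[text]`), `padding=True` on a batch of one adds no pad tokens, so only the int4 path yields fixed-length `max_length` tensors.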
2 changes: 2 additions & 0 deletions angelslim/engine.py
@@ -149,6 +149,7 @@ def prepare_data(
inference_settings=None,
use_audio_in_video=False,
model_name=None,
+quantization_config=None,
) -> Optional[Any]:
"""Prepare compression dataset"""
if custom_dataloader is not None:
@@ -174,6 +175,7 @@ def prepare_data(
inference_settings=inference_settings,
use_audio_in_video=use_audio_in_video,
model_name=model_name,
+quantization_config=quantization_config,
)
self.max_seq_length = max_length

@@ -4,7 +4,7 @@
This project covers Eagle3 training and benchmarking, and open-sources the [Eagle3 weights](https://huggingface.co/collections/AngelSlim/eagle3) for Qwen2Audio.

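The performance of our trained Qwen2Audio Eagle3 model is reported in [benchmarks](../../performance/speculative_decoding/benchmarks.md),
-where all data were obtained with vLLM inference on a single H20.
+where all data were obtained with vLLM inference on a single GPU.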

## 1. Supported Models
- `Qwen2Audio`
2 changes: 1 addition & 1 deletion docs/source/features/speculative_decoding/eagle/eagle.md
@@ -4,7 +4,7 @@
This project covers Eagle3 training and benchmarking, and open-sources the [Eagle3 weights](https://huggingface.co/collections/AngelSlim/eagle3) for the Qwen3 and Hunyuan series.

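The performance of our trained Qwen3-series Eagle3 models is reported in [benchmarks](../../../performance/speculative_decoding/benchmarks.md),
-where all data were obtained with vLLM inference on a single H20.
+where all data were obtained with vLLM inference on a single GPU.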

## 1. Data Generation

2 changes: 1 addition & 1 deletion docs/source/features/speculative_decoding/eagle/index.md
@@ -4,7 +4,7 @@
This project covers Eagle3 training and benchmarking, and open-sources [Eagle3 weights](https://huggingface.co/collections/AngelSlim/eagle3) for models including Hunyuan, HunyuanOCR, Qwen3, Qwen3-VL, Qwen2Audio and Fun-CosyVoice3.

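The performance of our trained Eagle3 models is reported in [benchmarks](../../../performance/speculative_decoding/benchmarks.md),
-where all data were obtained with vLLM inference on a single H20.
+where all data were obtained with vLLM inference on a single GPU.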

:::{toctree}
:caption: Contents
6 changes: 4 additions & 2 deletions docs/source/features/speculative_decoding/eagle/vlm_eagle.md
@@ -4,7 +4,7 @@
This project covers Eagle3 training and benchmarking, and open-sources the [Eagle3 weights](https://huggingface.co/collections/AngelSlim/eagle3) for the HunyuanOCR and Qwen3-VL series.

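The performance of our trained HunyuanOCR and Qwen3-VL-series Eagle3 models is reported in [benchmarks](../../../performance/speculative_decoding/benchmarks.md),
-where all data were obtained with vLLM inference on a single H20.
+where all data were obtained with vLLM inference on a single GPU.
## 1. Supported Models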
- `HunyuanOCR`
- `Qwen3-VL`
@@ -88,7 +88,9 @@ bash scripts/speculative/hunyuan_ocr/generate_vlm_hidden_for_draft_model.sh
# For Qwen3-VL series
bash scripts/speculative/qwen3_vl/generate_vlm_hidden_for_draft_model.sh
```
-> Note: generating hidden states for qwen3_vl-series models requires updating the transformers library: `pip install git+https://github.com/huggingface/transformers.git`
+> Note: generating hidden states for qwen3_vl-series models requires transformers>=5.0.0,
+or cherry-picking https://github.com/huggingface/transformers/pull/42609;
+otherwise the captured hidden states are unusable!!!

**Script parameters:**

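To fail fast before capturing hidden states, a script could check the installed transformers version first. The guard below is a hypothetical, stdlib-only sketch (`release_tuple`, `meets_min_version` and `check_transformers` are illustrative names). It compares only the numeric release, so it cannot detect an older build that carries a cherry-picked copy of PR #42609. It also treats a `5.0.0.dev0` source build as 5.0.0, even though such a build may predate that PR. `packaging.version.Version` gives strict PEP 440 ordering if that dependency is available.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Tuple

MIN_TRANSFORMERS = (5, 0, 0)


def release_tuple(ver: str, width: int = 3) -> Tuple[int, ...]:
    """Leading numeric release components, e.g. '5.0.0.dev0' -> (5, 0, 0)."""
    parts = []
    for piece in ver.split("."):
        m = re.match(r"\d+", piece)
        if m is None:  # e.g. 'dev0': stop at the first non-numeric piece
            break
        parts.append(int(m.group()))
        if m.end() != len(piece):  # e.g. '0rc1': keep 0, then stop
            break
    parts += [0] * (width - len(parts))
    return tuple(parts[:width])


def meets_min_version(ver: str, minimum: Tuple[int, ...] = MIN_TRANSFORMERS) -> bool:
    return release_tuple(ver, len(minimum)) >= minimum


def check_transformers() -> None:
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        raise RuntimeError("transformers is not installed") from None
    if not meets_min_version(installed):
        raise RuntimeError(
            f"transformers {installed} found; Qwen3-VL hidden-state capture "
            "needs >=5.0.0 or a build containing PR #42609"
        )


if __name__ == "__main__":
    try:
        check_transformers()
    except RuntimeError as err:
        print(err)
```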
1 change: 1 addition & 0 deletions tools/run.py
@@ -169,6 +169,7 @@ def run(config):
inference_settings=dataset_config.inference_settings,
use_audio_in_video=model_config.use_audio_in_video,
model_name=model_config.name,
+quantization_config=compress_config.quantization,
)

# Step 5: Initialize compressor