add attn_implementation in model config;fix deepseekv3.1-terminus load error by ali-88123 · Pull Request #130 · Tencent/AngelSlim

ali-88123 · 2025-11-05T03:29:56Z

Add attn_implementation to the model config; for Qwen3-Omni quantization, it can now be passed via the config.
Add instructions for installing the FlashAttention2 package, which is needed when attn_implementation is set to flash_attention_2 for Qwen3-Omni.
Fix the data type mismatch error raised when loading LayerNorm weights for DeepSeekV3.1-Terminus with fp8 loading under a torchrun launch.
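
As a rough illustration of the first two items, the sketch below reads an attention backend from a config and fails early if flash_attention_2 is requested without the flash_attn package installed. It is not the code from this PR: the function name `resolve_attn_implementation`, the dict-shaped `model_config`, and the `sdpa` default are assumptions made for the example, and AngelSlim's actual config schema may differ.

```python
import importlib.util


def resolve_attn_implementation(model_config: dict, default: str = "sdpa") -> str:
    """Pick the attention backend to request when loading a model.

    `model_config` is a hypothetical dict parsed from a quantization config;
    the key name follows the PR title, not a confirmed AngelSlim schema.
    """
    # Fall back to the default when the key is missing or empty.
    requested = model_config.get("attn_implementation") or default

    # flash_attention_2 relies on the separately installed `flash_attn` package,
    # so check for it up front instead of failing deep inside model loading.
    if requested == "flash_attention_2":
        if importlib.util.find_spec("flash_attn") is None:
            raise ImportError(
                "attn_implementation='flash_attention_2' needs the flash-attn "
                "package, e.g. `pip install flash-attn --no-build-isolation`."
            )
    return requested


if __name__ == "__main__":
    # No key set: the assumed default is used.
    print(resolve_attn_implementation({}))
    # Explicit value: passed through unchanged.
    print(resolve_attn_implementation({"attn_implementation": "eager"}))
```

With Hugging Face Transformers, the resolved string would typically be forwarded as the `attn_implementation` keyword argument of `from_pretrained`.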

add attn_implementation in model config;fix deepseekv3.1-terminus load error

13b8497

yghstill approved these changes Nov 5, 2025

yghstill merged commit 1741e95 into Tencent:main Nov 5, 2025
5 checks passed

ali-88123 deleted the fix_qwen3_omni branch November 5, 2025 06:15

dawnranger pushed a commit to dawnranger/AngelSlim that referenced this pull request Mar 11, 2026

add attn_implementation in model config;fix deepseekv3.1-terminus load error (Tencent#130)

61233e3

yghstill merged 1 commit into Tencent:main from ali-88123:fix_qwen3_omni
