
DeepSeek: modify scale saving and quantization config; fix thinking-data calibration #64

Merged
ali-88123 merged 2 commits into Tencent:main from ali-88123:dev_cen on Sep 5, 2025

ali-88123 (Collaborator) commented Sep 5, 2025

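  • Changed how scales are saved: each layer's scales are now stored in the same safetensors file as that layer's weights.
  • Renamed the DeepSeek save classes: single-GPU runs save with `DeepSeekV3PTQSaveSingle`, and multi-GPU runs save with `DeepSeekV3PTQSaveMulti`.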
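```python
def get_save_func(self):
    # Saving is only implemented for the vLLM and TensorRT-LLM deploy backends.
    if self.deploy_backend in ["vllm", "trtllm"]:
        # Multi-node runs use the multi-device saver.
        if self.model.using_multi_nodes:
            return DeepSeekV3PTQSaveMulti
        # Otherwise fall back to the single-device saver.
        return DeepSeekV3PTQSaveSingle
    raise NotImplementedError(
        f"deploy_backend {self.deploy_backend} is not supported for saving."
    )
```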
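  • Updated `quantization_config`.
  • Fixed an issue where `TextDataset` could not handle samples whose `content` field contains data.
  • Deleted the temporary `merge_model` and `tp_model` staging folders.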
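As a rough illustration of the scale-saving change, the sketch below plans safetensors shards so that a layer's scale tensors always land in the same file as its weight. It works only on tensor names and byte sizes. The `.weight_scale`/`.input_scale` suffixes, the helper names `layer_prefix` and `plan_shards`, and the size-based sharding policy are assumptions for illustration, not the PR's implementation; the `model-XXXXX-of-YYYYY.safetensors` file naming follows the Hugging Face sharded-checkpoint convention.

```python
def layer_prefix(name: str) -> str:
    """Strip a known tensor suffix, e.g. 'layers.0.mlp.weight_scale' -> 'layers.0.mlp'."""
    # Assumed suffixes; the PR does not list the real scale key names.
    for suffix in (".weight_scale", ".input_scale", ".weight", ".bias"):
        if name.endswith(suffix):
            return name[: -len(suffix)]
    return name


def plan_shards(tensor_sizes: dict[str, int], max_shard_bytes: int) -> dict[str, str]:
    """Map each tensor name to a shard file, never splitting one layer's
    weight and scales across two files."""
    # Group tensor names by layer prefix (dicts keep insertion order).
    groups: dict[str, list[str]] = {}
    for name in tensor_sizes:
        groups.setdefault(layer_prefix(name), []).append(name)

    shards: list[list[str]] = [[]]
    current_bytes = 0
    for names in groups.values():
        group_bytes = sum(tensor_sizes[n] for n in names)
        # Open a new shard only between groups, so a layer is never split.
        if shards[-1] and current_bytes + group_bytes > max_shard_bytes:
            shards.append([])
            current_bytes = 0
        shards[-1].extend(names)
        current_bytes += group_bytes

    total = len(shards)
    weight_map: dict[str, str] = {}
    for idx, names in enumerate(shards, start=1):
        filename = f"model-{idx:05d}-of-{total:05d}.safetensors"
        for n in names:
            weight_map[n] = filename
    return weight_map


if __name__ == "__main__":
    sizes = {"a.weight": 100, "a.weight_scale": 4, "b.weight": 100, "b.weight_scale": 4}
    for tensor, shard in plan_shards(sizes, max_shard_bytes=110).items():
        print(tensor, "->", shard)
```

The returned mapping has the same shape as the `weight_map` in a `model.safetensors.index.json` file. Grouping by layer before splitting is what spares a loader from opening a second file just to find a weight's scale.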
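To exercise the save-class dispatch in isolation, the hypothetical harness below stubs out both saver classes and the `model` object. `SaverHost` is an invented name and the empty classes are stand-ins for the real AngelSlim savers; only the branching logic mirrors `get_save_func` from this PR.

```python
from types import SimpleNamespace


# Empty stand-ins for the real saver classes so the sketch runs on its own.
class DeepSeekV3PTQSaveSingle: ...


class DeepSeekV3PTQSaveMulti: ...


class SaverHost:
    """Hypothetical object exposing the two attributes get_save_func reads."""

    def __init__(self, deploy_backend: str, using_multi_nodes: bool):
        self.deploy_backend = deploy_backend
        self.model = SimpleNamespace(using_multi_nodes=using_multi_nodes)

    def get_save_func(self):
        # Same dispatch as in the PR: backend check first, then node count.
        if self.deploy_backend in ["vllm", "trtllm"]:
            if self.model.using_multi_nodes:
                return DeepSeekV3PTQSaveMulti
            return DeepSeekV3PTQSaveSingle
        raise NotImplementedError(
            f"deploy_backend {self.deploy_backend} is not supported for saving."
        )


if __name__ == "__main__":
    print(SaverHost("vllm", using_multi_nodes=False).get_save_func().__name__)
    print(SaverHost("trtllm", using_multi_nodes=True).get_save_func().__name__)
```

Returning the class rather than an instance lets the caller construct the saver with whatever arguments the chosen strategy needs.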
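Removing the intermediate folders could look roughly like the following, assuming (hypothetically) that `merge_model` and `tp_model` sit directly under the output directory. `remove_staging_dirs` is an invented helper, not part of the PR, and the PR does not say where these folders live.

```python
import shutil
from pathlib import Path

# Staging folder names from the PR; their location under output_dir is assumed.
STAGING_DIRS = ("merge_model", "tp_model")


def remove_staging_dirs(output_dir: str) -> list[str]:
    """Delete the temporary staging folders under output_dir after saving.

    Returns the names of the folders that were actually removed."""
    removed = []
    for name in STAGING_DIRS:
        path = Path(output_dir) / name
        # Skip folders that were never created (e.g. no tensor-parallel step).
        if path.is_dir():
            shutil.rmtree(path)
            removed.append(name)
    return removed
```

Calling it from a `finally` block after the save step would remove the staging folders even if saving fails partway; whether that is preferable to keeping them for debugging is a design choice the PR does not discuss.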

@ali-88123 ali-88123 requested a review from yghstill September 5, 2025 02:45
@ali-88123 ali-88123 merged commit bd5a894 into Tencent:main Sep 5, 2025
5 checks passed
@ali-88123 ali-88123 deleted the dev_cen branch September 10, 2025 11:12
dawnranger pushed a commit to dawnranger/AngelSlim that referenced this pull request Mar 11, 2026

2 participants