[Kunlunxin]support DS R1/V3 w4a8int8 per-channel quantization by lishaoao · Pull Request #226 · Tencent/AngelSlim

lishaoao · 2026-01-22T13:00:41Z

What's Changed

Add a new quantization field, w4a8i8, to enable the w4a8int8 quantization method for DeepSeek R1/V3 models running on Kunlunxin vLLM
Introduce the W4A8INT8 module, which inherits from GPTQ, to support DeepSeek R1/V3 w4a8int8 quantization with minimal code intrusion
Implement W4A8Int8QuantLinear to support quantized weight packing and a custom compressed-tensors format
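To illustrate what "quantized weight packing" involves, here is a minimal sketch that stores two signed int4 values per byte. The names `pack_int4` / `unpack_int4` and the nibble order (low nibble first) are hypothetical; this PR does not describe the actual layout used by W4A8Int8QuantLinear.

```python
# Hypothetical sketch: pack signed int4 values (range [-8, 7]) two per byte.
# The nibble order (even index -> low nibble) is an assumption for illustration.


def pack_int4(values: list[int]) -> bytes:
    """Pack signed int4 values, two per byte, low nibble first."""
    if len(values) % 2:
        raise ValueError("need an even number of int4 values")
    out = bytearray()
    for lo, hi in zip(values[0::2], values[1::2]):
        for v in (lo, hi):
            if not -8 <= v <= 7:
                raise ValueError(f"{v} is outside the int4 range [-8, 7]")
        # Masking with 0xF keeps the 4-bit two's-complement encoding.
        out.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(out)


def unpack_int4(packed: bytes) -> list[int]:
    """Inverse of pack_int4: recover the signed int4 values."""
    values = []
    for byte in packed:
        for nibble in (byte & 0xF, byte >> 4):
            # Sign-extend: nibbles 8..15 encode -8..-1.
            values.append(nibble - 16 if nibble >= 8 else nibble)
    return values
```

For example, `pack_int4([-8, 7, 0, -1])` yields the two bytes `0x78` and `0xF0`, halving storage relative to keeping each int4 code in its own int8.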

Key Details

The W4A8INT8 module inherits from GPTQ to minimize code changes and is adapted specifically for DeepSeek R1/V3 w4a8int8 quantization
Quantization rules: weights use per-channel symmetric int4 quantization; activations use per-token dynamic symmetric int8 quantization
Int4 per-channel symmetric quantization applies to the MoE layers, while the other components use int8 per-channel symmetric quantization, as configured in deepseek_r1_w4a8_int8.yaml
W4A8Int8QuantLinear core functions: quantized weight packing and custom compressed-tensors format support
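The quantization rules above can be sketched in plain Python standing in for torch. This is an illustration under stated assumptions, not the AngelSlim implementation: `weight_bits_for` picks the weight bit width from the module name, assuming MoE expert linears contain `.experts.` in their names (the real selection comes from the YAML config), and `quantize_rows_symmetric` applies one symmetric scale per row. Applied to a weight matrix whose rows are output channels, it gives per-channel quantization; applied at runtime to activations whose rows are tokens, it gives per-token dynamic quantization.

```python
# Hypothetical sketch of the w4a8int8 rules; names and the module-name
# pattern are assumptions, not AngelSlim APIs.


def weight_bits_for(module_name: str) -> int:
    """MoE expert weights -> int4; every other quantized linear -> int8."""
    return 4 if ".experts." in module_name else 8


def quantize_rows_symmetric(
    rows: list[list[float]], bits: int
) -> tuple[list[list[int]], list[float]]:
    """Symmetric quantization with one scale per row (channel or token)."""
    qmax = 2 ** (bits - 1) - 1  # 7 for int4, 127 for int8
    q_rows, scales = [], []
    for row in rows:
        abs_max = max(abs(v) for v in row)
        # Guard all-zero rows so the scale is never zero.
        scale = abs_max / qmax if abs_max > 0 else 1.0
        # Round, then clamp to the symmetric range [-qmax, qmax].
        q_rows.append([max(-qmax, min(qmax, round(v / scale))) for v in row])
        scales.append(scale)
    return q_rows, scales
```

Under this naming assumption, `model.layers.3.mlp.experts.12.down_proj` gets 4-bit weights and `model.layers.3.self_attn.q_proj` gets 8-bit weights, while activations always use `bits=8`.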

Testing

Tested the quantized DeepSeek V3.1 w4a8int8 model on the GPQA_Diamond dataset
Achieved a score of 74.44 against the official baseline of 74.9, closely matching the official performance
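For reference, the gap to the baseline works out as:

```latex
\frac{74.44}{74.9} \approx 0.9939, \qquad 74.9 - 74.44 = 0.46
```

That is, the quantized model retains about 99.4% of the baseline score, a drop of 0.46 points.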

Usage

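#!/usr/bin/env bash

# Resolve the directory containing this script; it is expected to live in the project root.
SCRIPT_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)

PROJECT_ROOT=$SCRIPT_DIR

# Make the project importable without leaving a trailing ':' when PYTHONPATH is unset.
export PYTHONPATH=$PROJECT_ROOT${PYTHONPATH:+:$PYTHONPATH}

# Single node with 8 worker processes.
NNODES=1
NPROC_PER_NODE=8
MASTER_ADDR=localhost

CONFIG=$PROJECT_ROOT/configs/deepseek_r1/w4a8_int8/deepseek_r1_w4a8_int8_kunlun.yaml

torchrun \
    --nnodes "$NNODES" \
    --nproc-per-node "$NPROC_PER_NODE" \
    --node-rank 0 \
    --master-addr "$MASTER_ADDR" \
    "$PROJECT_ROOT/tools/run.py" \
    --config "$CONFIG" \
    --multi-nodes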
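torchrun starts NPROC_PER_NODE worker processes per node and exports RANK, LOCAL_RANK and WORLD_SIZE to each of them. A minimal sketch of how a worker could read those values; the `DistInfo` / `read_dist_info` names are hypothetical, and this PR does not show how tools/run.py consumes them.

```python
# Hypothetical sketch: read the rank information torchrun exports to each worker.
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class DistInfo:
    rank: int        # global rank, 0 .. world_size - 1
    local_rank: int  # rank within this node, typically used to pick the device
    world_size: int  # NNODES * NPROC_PER_NODE


def read_dist_info(env: Mapping[str, str] = os.environ) -> DistInfo:
    """Read torchrun-provided variables, defaulting to a single process."""
    return DistInfo(
        rank=int(env.get("RANK", 0)),
        local_rank=int(env.get("LOCAL_RANK", 0)),
        world_size=int(env.get("WORLD_SIZE", 1)),
    )
```

With the script above (NNODES=1, NPROC_PER_NODE=8), WORLD_SIZE is 8 and RANK equals LOCAL_RANK, ranging from 0 to 7.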

yghstill · 2026-01-22T13:15:27Z

+        return None
+
+
+class DeepSeekV3W4A8Int8Save(DeepSeekV3PTQSaveMulti):


Place this class in core/save.py

lishaoao · 2026-01-23T03:43:08Z

+        scale = abs_max / qmax
+        quantized = torch.round(tensor / scale)
+        quantized = torch.clamp(quantized, -qmax, qmax)
+        return quantized.to(torch.int8), scale.to(torch.float32)


Put it in quant_func.py
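The snippet above returns integer codes together with a float32 scale. With per-token activation scales and per-channel weight scales, both scales are constant along the reduction dimension, so they factor out of the dot product: y[t][c] = sx[t] · sw[c] · Σ_k qx[t][k] · qw[c][k]. A minimal sketch in plain Python; the name `int_linear_dequant` is hypothetical, and real W4A8 kernels accumulate in int32 on the device, which this list-based version does not model.

```python
# Hypothetical sketch: W4A8 linear forward on already-quantized inputs.
# Integer products are summed first; the two scales are applied once per output.


def int_linear_dequant(
    qx: list[list[int]], sx: list[float],
    qw: list[list[int]], sw: list[float],
) -> list[list[float]]:
    """Compute y[t][c] = sx[t] * sw[c] * sum_k qx[t][k] * qw[c][k].

    qx: activation codes, one row per token (int8 range)
    sx: per-token activation scales
    qw: weight codes, one row per output channel (int4 range)
    sw: per-channel weight scales
    """
    out = []
    for x_row, s_t in zip(qx, sx):
        out.append([
            s_t * s_c * sum(a * b for a, b in zip(x_row, w_row))
            for w_row, s_c in zip(qw, sw)
        ])
    return out
```

Factoring the scales out this way keeps the inner loop purely integer, which is what lets hardware run the matmul on int8 units.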

lishaoao · 2026-01-23T03:45:50Z

+        return None
+
+
+class DeepSeekV3W4A8Int8Save(DeepSeekV3PTQSaveMulti):


Put it in save.py

lishaoao · 2026-01-23T03:49:56Z

+# Global configuration of pipeline
+global:
+  save_path: ./output
+  deploy_backend: vllm


File name changed to configs/deepseek_r1/w4a8_int8/deepseek_r1_w4a8_int8_kunlun.yaml

…t#226) Co-authored-by: lishaohao <lsh862702688@163.com>

support DS R1/V3 w4a8int8 per-channel quantization

351882c

yghstill reviewed Jan 23, 2026


lishaoao commented Jan 23, 2026


leolishaohao added 2 commits January 23, 2026 02:37

refine

8b60271

fix(gptq): restore cloning of layer weight buffer

7f576e2

yghstill approved these changes Jan 28, 2026


yghstill merged commit 1ea6e3d into Tencent:main Jan 28, 2026
5 checks passed

dawnranger pushed a commit to dawnranger/AngelSlim that referenced this pull request Mar 11, 2026

[Kunlunxin]support DS R1/V3 w4a8int8 per-channel quantization (Tencen…

35aaf32

…t#226) Co-authored-by: lishaohao <lsh862702688@163.com>

yghstill merged 3 commits into Tencent:main from lishaoao:ds_w4a8in8_quant


4 participants
