Commit 44e54aa

cjluo-nv and danielkorzekwa authored and committed
Fix DeepSeek PTQ script (#912)
## What does this PR do?

**Type of change:** Bug fix

**Overview:** Fix two bugs in the PTQ script

## Testing

Run DeepseekV3.2 PTQ and export

## Summary by CodeRabbit

* **Refactor**
  * Enhanced data type handling in quantization examples for bf16 operations
  * Updated internal dependencies for quantization utilities to improve modularity

Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
1 parent 8098f98 commit 44e54aa

File tree

2 files changed: +2 −2 lines changed


examples/deepseek/ptq.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -99,7 +99,7 @@ def linear(
             weight = weight_quantizer(weight)
         return F.linear(x, weight, bias)
     elif gemm_impl == "bf16":
-        weight = weight_dequant(weight, weight.scale)
+        weight = weight_dequant(weight, weight.scale, dtype=torch.bfloat16)
         if act_quantizer is not None:
             x = act_quantizer(x)
         if weight_quantizer is not None:
```
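The change above passes an explicit `dtype=torch.bfloat16` to `weight_dequant` so the `bf16` GEMM path produces bfloat16 weights instead of relying on the kernel's default output dtype. As a rough illustration of what a block-scaled dequantization with a `dtype` argument does, here is a minimal sketch (the function name `weight_dequant_sketch`, the 128-wide block size, and the per-block scale layout are assumptions for illustration only, not the actual `ds_kernel`/`modelopt` implementation):

```python
import torch


def weight_dequant_sketch(
    weight: torch.Tensor,
    scale: torch.Tensor,
    dtype: torch.dtype = torch.bfloat16,
    block_size: int = 128,
) -> torch.Tensor:
    """Upcast a block-quantized weight by its per-block scales.

    weight: 2-D quantized weight (upcast to fp32 for the multiply).
    scale:  one scale per (block_size x block_size) tile of `weight`.
    dtype:  output dtype; the PTQ fix pins this to torch.bfloat16.
    """
    w = weight.to(torch.float32)
    out = torch.empty_like(w)
    for i in range(0, w.shape[0], block_size):
        for j in range(0, w.shape[1], block_size):
            # Each tile is rescaled by its own scalar scale factor.
            out[i : i + block_size, j : j + block_size] = (
                w[i : i + block_size, j : j + block_size]
                * scale[i // block_size, j // block_size]
            )
    return out.to(dtype)
```

With the default left implicit, a caller could silently get fp32 weights back; passing `dtype=torch.bfloat16` keeps the dequantized tensor consistent with the rest of the bf16 GEMM path.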

examples/deepseek/quantize_to_nvfp4.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -44,11 +44,11 @@
 from typing import Any
 
 import torch
-from ds_kernel import weight_dequant
 from safetensors.torch import load_file, save_file
 from tqdm import tqdm
 
 from modelopt.torch.quantization.qtensor import NVFP4QTensor
+from modelopt.torch.quantization.triton import weight_dequant
 
 
 def _remap_key(key_dict: dict[str, Any]):
```
