Commit 44e54aa

cjluo-nv and danielkorzekwa authored and committed
Fix DeepSeek PTQ script (#912)
## What does this PR do?

**Type of change:** Bug fix

**Overview:** Fix two bugs in the PTQ script

## Testing

Run DeepseekV3.2 PTQ and export

## Summary by CodeRabbit

* **Refactor**
  * Enhanced data type handling in quantization examples for bf16 operations
  * Updated internal dependencies for quantization utilities to improve modularity

Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
1 parent 8098f98 commit 44e54aa

File tree

2 files changed: +2 −2 lines changed


examples/deepseek/ptq.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -99,7 +99,7 @@ def linear(
             weight = weight_quantizer(weight)
         return F.linear(x, weight, bias)
     elif gemm_impl == "bf16":
-        weight = weight_dequant(weight, weight.scale)
+        weight = weight_dequant(weight, weight.scale, dtype=torch.bfloat16)
         if act_quantizer is not None:
             x = act_quantizer(x)
         if weight_quantizer is not None:
```
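The change above passes an explicit `dtype=torch.bfloat16` to `weight_dequant` so the `bf16` GEMM path produces bfloat16 weights instead of relying on the kernel's default output dtype. As a rough illustration of what a block-scaled dequantization with a `dtype` argument does, here is a minimal sketch (the function name `weight_dequant_sketch`, the 128-wide block size, and the per-block scale layout are assumptions for illustration only, not the actual `ds_kernel`/`modelopt` implementation):

```python
import torch


def weight_dequant_sketch(
    weight: torch.Tensor,
    scale: torch.Tensor,
    dtype: torch.dtype = torch.bfloat16,
    block_size: int = 128,
) -> torch.Tensor:
    """Upcast a block-quantized weight by its per-block scales.

    weight: 2-D quantized weight (upcast to fp32 for the multiply).
    scale:  one scale per (block_size x block_size) tile of `weight`.
    dtype:  output dtype; the PTQ fix pins this to torch.bfloat16.
    """
    w = weight.to(torch.float32)
    out = torch.empty_like(w)
    for i in range(0, w.shape[0], block_size):
        for j in range(0, w.shape[1], block_size):
            # Each tile is rescaled by its own scalar scale factor.
            out[i : i + block_size, j : j + block_size] = (
                w[i : i + block_size, j : j + block_size]
                * scale[i // block_size, j // block_size]
            )
    return out.to(dtype)
```

With the default left implicit, a caller could silently get fp32 weights back; passing `dtype=torch.bfloat16` keeps the dequantized tensor consistent with the rest of the bf16 GEMM path.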

examples/deepseek/quantize_to_nvfp4.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -44,11 +44,11 @@
 from typing import Any
 
 import torch
-from ds_kernel import weight_dequant
 from safetensors.torch import load_file, save_file
 from tqdm import tqdm
 
 from modelopt.torch.quantization.qtensor import NVFP4QTensor
+from modelopt.torch.quantization.triton import weight_dequant
 
 
 def _remap_key(key_dict: dict[str, Any]):
```
