
Commit 65b3f88

h-guo18 and danielkorzekwa authored and committed
Fix: quant config error on quantized offline eagle (#925)
## What does this PR do?

**Type of change:** ?

**Overview:** ?

## Usage

```python
# Add a code snippet demonstrating how to use this
```

## Testing

## Before your PR is "*Ready for review*"

- **Make sure you read and follow [Contributor guidelines](https://github.com/NVIDIA/Model-Optimizer/blob/main/CONTRIBUTING.md)** and your commits are signed.
- **Is this change backward compatible?**: Yes/No
- **Did you write any new necessary tests?**: Yes/No
- **Did you add or update any necessary documentation?**: Yes/No
- **Did you update [Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?**: Yes/No

## Additional Information

## Summary by CodeRabbit

## Release Notes

* **Refactor**
  * Enhanced quantization configuration handling for transformer models through improved type validation, ensuring more robust processing of quantized model configurations.

Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
Signed-off-by: Daniel Korzekwa <dkorzekwa@nvidia.com>
1 parent 481cd83 commit 65b3f88

File tree

1 file changed: +4 additions, −7 deletions

modelopt/torch/speculative/plugins/transformers.py

Lines changed: 4 additions & 7 deletions
```diff
@@ -48,7 +48,7 @@
 )
 from transformers.trainer_pt_utils import LabelSmoother
 from transformers.utils import ModelOutput
-from transformers.utils.quantization_config import QuantizationMethod
+from transformers.utils.quantization_config import CompressedTensorsConfig

 from ..eagle.conversion import EagleDMRegistry
 from ..eagle.eagle_model import EagleModel
@@ -585,12 +585,9 @@ def modify(
         self.eagle_config._attn_implementation = "sdpa"

         # Patch for Kimi-K2-Thinking, avoid quantizing drafter
-        if (
-            hasattr(self.config, "quantization_config")
-            and self.config.quantization_config.quant_method
-            == QuantizationMethod.COMPRESSED_TENSORS
-        ):
-            self.config.quantization_config.quantization_config.ignore.append("re:.*eagle_module.*")
+        quant_config = getattr(self.config, "quantization_config", None)
+        if isinstance(quant_config, CompressedTensorsConfig):
+            quant_config.ignore.append("re:.*eagle_module.*")

         # Set default aux_hidden_state layers
         if (
```
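The fix swaps a `hasattr` check plus a `quant_method` comparison for a single `getattr`/`isinstance` test, which also avoids the double `quantization_config.quantization_config` lookup. Below is a minimal self-contained sketch of that pattern; the stub classes stand in for the real `transformers` quantization configs and the helper name is illustrative, not the actual Model-Optimizer code:

```python
# Stubs standing in for transformers.utils.quantization_config types,
# so this sketch runs without transformers installed.


class CompressedTensorsConfig:
    """Stand-in for the compressed-tensors quantization config."""

    def __init__(self):
        # Module-name patterns excluded from quantization.
        self.ignore = []


class OtherQuantConfig:
    """Any other quantization config type (e.g. bitsandbytes)."""


class ModelConfig:
    """Stand-in for a HF model config; quantization_config is optional."""

    def __init__(self, quantization_config=None):
        if quantization_config is not None:
            self.quantization_config = quantization_config


def exclude_eagle_module(config):
    """Keep the eagle drafter unquantized for compressed-tensors configs.

    getattr(..., None) covers configs with no quantization_config at all,
    and isinstance() skips config types that don't carry an `ignore` list,
    instead of probing attributes like `quant_method` on arbitrary types.
    """
    quant_config = getattr(config, "quantization_config", None)
    if isinstance(quant_config, CompressedTensorsConfig):
        quant_config.ignore.append("re:.*eagle_module.*")
    return config
```

The `isinstance` check makes the intent explicit: only configs whose type is known to expose the `ignore` list are mutated, while unquantized models and other quantization methods pass through untouched.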
