11 changes: 4 additions & 7 deletions modelopt/torch/speculative/plugins/transformers.py
@@ -48,7 +48,7 @@
)
from transformers.trainer_pt_utils import LabelSmoother
from transformers.utils import ModelOutput
-from transformers.utils.quantization_config import QuantizationMethod
+from transformers.utils.quantization_config import CompressedTensorsConfig
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check current transformers environment and CompressedTensorsConfig availability
python3 << 'EOF'
import transformers
print("=== Transformers Environment ===")
print(f"transformers version: {transformers.__version__}")

try:
    from transformers.utils.quantization_config import CompressedTensorsConfig
    print("CompressedTensorsConfig: AVAILABLE")
    print(f"Type: {type(CompressedTensorsConfig)}")
    print(f"Module: {CompressedTensorsConfig.__module__}")
except ImportError as e:
    print(f"CompressedTensorsConfig: NOT AVAILABLE")
    print(f"Error: {e}")

# Also check what's available in the quantization_config module
try:
    import transformers.utils.quantization_config as qc
    print("\n=== Available in transformers.utils.quantization_config ===")
    public_items = [item for item in dir(qc) if not item.startswith('_')]
    for item in sorted(public_items):
        print(f"  - {item}")
except Exception as e:
    print(f"Could not inspect quantization_config: {e}")
EOF

Repository: NVIDIA/Model-Optimizer

Length of output: 190


🌐 Web query:

transformers CompressedTensorsConfig introduction version history release notes

💡 Result:

What it is (introduction)

CompressedTensorsConfig is the 🤗 Transformers quantization config wrapper used to load models stored in the compressed-tensors checkpoint format (a safetensors extension that can represent quantization + sparsity layouts, produced e.g. via llm-compressor). [1][2]

“Introduced in Transformers” (earliest version)

  • Not present in v4.44.0 API docs (no CompressedTensorsConfig entry). [3]
  • Present in v4.45.1 API docs (class documented under Main classes → Quantization). [4]

So, it was introduced between v4.44.0 (Aug 6, 2024) and v4.45.1 (Sep 26, 2024)—i.e., v4.45.x. Release dates from PyPI: v4.45.0 (Sep 25, 2024) and v4.45.1 (Sep 26, 2024). [5]

Notable API evolution (version history highlights)

  • v4.45.1: CompressedTensorsConfig exists with core fields like config_groups, format, quantization_status, kv_cache_scheme, ignore, etc. [4]
  • By v4.51.1 (Apr 8, 2025): docs include run_compressed (controls whether to alter submodules to emulate compressed execution). [6][5]
  • v4.55.4 (Aug 22, 2025): run_compressed: bool = True is documented in the signature. [7]
  • v5.0.0rc0 (Dec 1, 2025): still present; signature includes run_compressed, and docs clarify interaction between format and run_compressed. [8]

Release notes

Transformers’ public release notes do not reliably call out CompressedTensorsConfig specifically; the most concrete “release-note-grade” trace is the appearance/change in the versioned API docs above (v4.45.1 introduction; later addition of run_compressed). [4][6][7]

Sources: [1]–[8] correspond to the cited pages.


Guard the import against older transformers versions.

CompressedTensorsConfig was introduced in transformers v4.45.0 (September 2024). Importing it unconditionally at module level will raise an ImportError for any installation using transformers < v4.45.0 — even when the compressed-tensors quantization path is never exercised.

🛡️ Proposed safe import
-from transformers.utils.quantization_config import CompressedTensorsConfig
+try:
+    from transformers.utils.quantization_config import CompressedTensorsConfig
+except ImportError:
+    CompressedTensorsConfig = None

Then tighten the guard at the call site:

-if isinstance(quant_config, CompressedTensorsConfig):
+if CompressedTensorsConfig is not None and isinstance(quant_config, CompressedTensorsConfig):
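Put together, the guarded import and the tightened call-site check behave like this minimal, self-contained sketch. The helper name `should_patch` is illustrative, not part of the module; the real class is imported when available and the branch simply evaluates to False otherwise:

```python
# Guarded import: on transformers < 4.45.0 (or with transformers absent)
# the symbol falls back to None instead of raising at import time.
try:
    from transformers.utils.quantization_config import CompressedTensorsConfig
except ImportError:
    CompressedTensorsConfig = None

def should_patch(quant_config) -> bool:
    """Return True only when the compressed-tensors branch applies."""
    # The None check short-circuits before isinstance() would fail on a
    # missing class, so the code path degrades gracefully.
    return CompressedTensorsConfig is not None and isinstance(
        quant_config, CompressedTensorsConfig
    )

# Objects that are not a CompressedTensorsConfig never trigger the patch,
# regardless of whether the import succeeded.
print(should_patch(object()))  # False
```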
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/speculative/plugins/transformers.py` at line 51, importing
CompressedTensorsConfig unconditionally will break users on transformers <
v4.45.0; wrap the import in a try/except ImportError (or
ImportError/ModuleNotFoundError) and set CompressedTensorsConfig = None on
failure, and then at the usage site (the function/method that references
CompressedTensorsConfig in this module) guard the code path by checking if
CompressedTensorsConfig is not None before using it and raise a clear
RuntimeError or skip the compressed-tensors branch with a helpful message if
it's missing. Ensure you reference the symbol name CompressedTensorsConfig in
both the guarded import and the conditional check so the call site fails
gracefully when the class isn’t available.


from ..eagle.conversion import EagleDMRegistry
from ..eagle.eagle_model import EagleModel
@@ -585,12 +585,9 @@ def modify(
self.eagle_config._attn_implementation = "sdpa"

# Patch for Kimi-K2-Thinking, avoid quantizing drafter
-        if (
-            hasattr(self.config, "quantization_config")
-            and self.config.quantization_config.quant_method
-            == QuantizationMethod.COMPRESSED_TENSORS
-        ):
-            self.config.quantization_config.quantization_config.ignore.append("re:.*eagle_module.*")
+        quant_config = getattr(self.config, "quantization_config", None)
+        if isinstance(quant_config, CompressedTensorsConfig):
+            quant_config.ignore.append("re:.*eagle_module.*")
Comment on lines +588 to +590

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

python << 'EOF'
try:
    from transformers.utils.quantization_config import CompressedTensorsConfig
    import inspect
    
    # Get the source code
    src = inspect.getsource(CompressedTensorsConfig)
    
    # Print the entire class definition (limit to first 200 lines to see init and field definitions)
    lines = src.splitlines()[:200]
    for i, line in enumerate(lines, 1):
        print(f"{i:3d}: {line}")
except Exception as e:
    print(f"Error: {e}")
    import traceback
    traceback.print_exc()
EOF

Repository: NVIDIA/Model-Optimizer

Length of output: 245


🏁 Script executed:

# Also check if there's a default_factory or post_init that initializes ignore
python << 'EOF'
try:
    from transformers.utils.quantization_config import CompressedTensorsConfig
    import inspect
    
    # Check the __init__ method specifically
    if hasattr(CompressedTensorsConfig, '__init__'):
        init_src = inspect.getsource(CompressedTensorsConfig.__init__)
        print("=== __init__ method ===")
        print(init_src)
    
    # Check if it's a dataclass
    print("\n=== Class annotations ===")
    if hasattr(CompressedTensorsConfig, '__annotations__'):
        for field, type_hint in CompressedTensorsConfig.__annotations__.items():
            if 'ignore' in field.lower():
                print(f"{field}: {type_hint}")
                # Try to get default value
                if hasattr(CompressedTensorsConfig, field):
                    print(f"  Default: {getattr(CompressedTensorsConfig, field)}")
except Exception as e:
    print(f"Error: {e}")
    import traceback
    traceback.print_exc()
EOF

Repository: NVIDIA/Model-Optimizer

Length of output: 245


🏁 Script executed:

# Let's also check the actual behavior - create an instance and inspect it
python << 'EOF'
try:
    from transformers.utils.quantization_config import CompressedTensorsConfig
    
    # Try creating an instance with minimal config
    config = CompressedTensorsConfig(quantization_config={})
    print(f"ignore attribute value: {repr(config.ignore)}")
    print(f"ignore attribute type: {type(config.ignore)}")
    print(f"ignore is None: {config.ignore is None}")
    
    # Check if we can safely append
    if config.ignore is not None:
        print("Can safely append to ignore")
    else:
        print("Cannot safely append - would cause AttributeError")
        
except Exception as e:
    print(f"Error: {e}")
    import traceback
    traceback.print_exc()
EOF

Repository: NVIDIA/Model-Optimizer

Length of output: 245


🏁 Script executed:

# Find imports and usage of CompressedTensorsConfig in the repository
rg "CompressedTensorsConfig" -n --context 5

Repository: NVIDIA/Model-Optimizer

Length of output: 2174


🏁 Script executed:

# Check the actual file where the code appears
cat -n modelopt/torch/speculative/plugins/transformers.py | sed -n '580,610p'

Repository: NVIDIA/Model-Optimizer

Length of output: 1440


🏁 Script executed:

# Search for any type hints or documentation about ignore field
rg "\.ignore" modelopt/torch/speculative/plugins/transformers.py -n -B 2 -A 2

Repository: NVIDIA/Model-Optimizer

Length of output: 536


🌐 Web query:

CompressedTensorsConfig ignore field transformers library default value

💡 Result:

In Hugging Face Transformers, transformers.CompressedTensorsConfig defines ignore as an optional list with the default value None (i.e., no explicit ignore list unless you pass one). [1]

Source(s):
[1] Transformers quantization docs showing the CompressedTensorsConfig signature (includes ignore: Optional[list[str]] = None). (hugging-face.cn)



🏁 Script executed:

# Verify if modify() can be called multiple times by searching for callers
rg "\.modify\(" modelopt/torch/speculative/plugins/transformers.py -n -B 2 -A 2

Repository: NVIDIA/Model-Optimizer

Length of output: 573


🏁 Script executed:

# Check if there are multiple instantiations or if modify is called in loops
rg "modify" modelopt/torch/speculative/plugins/transformers.py -n

Repository: NVIDIA/Model-Optimizer

Length of output: 277


quant_config.ignore may be None, causing AttributeError on .append().

The transformers library defines CompressedTensorsConfig.ignore as Optional[List[str]] with a default value of None. Checkpoints saved without explicit ignore entries deserialize with ignore = None, causing the direct .append() call on line 590 to fail.

Additionally, if modify() is called more than once, the pattern "re:.*eagle_module.*" will be appended multiple times without deduplication.

🛡️ Proposed fix
         quant_config = getattr(self.config, "quantization_config", None)
         if isinstance(quant_config, CompressedTensorsConfig):
-            quant_config.ignore.append("re:.*eagle_module.*")
+            if quant_config.ignore is None:
+                quant_config.ignore = []
+            pattern = "re:.*eagle_module.*"
+            if pattern not in quant_config.ignore:
+                quant_config.ignore.append(pattern)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-        quant_config = getattr(self.config, "quantization_config", None)
-        if isinstance(quant_config, CompressedTensorsConfig):
-            quant_config.ignore.append("re:.*eagle_module.*")
+        quant_config = getattr(self.config, "quantization_config", None)
+        if isinstance(quant_config, CompressedTensorsConfig):
+            if quant_config.ignore is None:
+                quant_config.ignore = []
+            pattern = "re:.*eagle_module.*"
+            if pattern not in quant_config.ignore:
+                quant_config.ignore.append(pattern)
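The None-safe, idempotent append can be demonstrated without transformers installed. `StubQuantConfig` below is a hypothetical stand-in whose `ignore` field mirrors the real class's `Optional[list[str]] = None` default; the helper name `add_ignore_pattern` is illustrative:

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for CompressedTensorsConfig: checkpoints saved
# without explicit ignore entries deserialize with ignore=None.
@dataclass
class StubQuantConfig:
    ignore: Optional[list] = None

def add_ignore_pattern(cfg: StubQuantConfig,
                       pattern: str = "re:.*eagle_module.*") -> None:
    # Initialize lazily, then deduplicate so repeated modify() calls
    # neither raise AttributeError nor accumulate duplicate entries.
    if cfg.ignore is None:
        cfg.ignore = []
    if pattern not in cfg.ignore:
        cfg.ignore.append(pattern)

cfg = StubQuantConfig()   # mimics a checkpoint with no ignore entries
add_ignore_pattern(cfg)
add_ignore_pattern(cfg)   # second call is a no-op, not a duplicate
print(cfg.ignore)         # ['re:.*eagle_module.*']
```

The naive `cfg.ignore.append(...)` would raise `AttributeError: 'NoneType' object has no attribute 'append'` on the first call here, which is the failure mode the review flags.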
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@modelopt/torch/speculative/plugins/transformers.py` around lines 588 - 590,
quant_config may have ignore==None and the code blindly calls .append(), causing
AttributeError and duplicate patterns on repeated calls; update the handling
where quant_config is obtained (the quant_config variable of type
CompressedTensorsConfig) to ensure quant_config.ignore is initialized to a list
when None and only add the pattern "re:.*eagle_module.*" if it is not already
present (i.e., check membership before append) so repeated calls to the modifier
do not duplicate the entry.


# Set default aux_hidden_state layers
if (