Fix: quant config error on quantized offline eagle #925
```diff
@@ -48,7 +48,7 @@
 )
 from transformers.trainer_pt_utils import LabelSmoother
 from transformers.utils import ModelOutput
-from transformers.utils.quantization_config import QuantizationMethod
+from transformers.utils.quantization_config import CompressedTensorsConfig

 from ..eagle.conversion import EagleDMRegistry
 from ..eagle.eagle_model import EagleModel
```
```diff
@@ -585,12 +585,9 @@ def modify(
         self.eagle_config._attn_implementation = "sdpa"

         # Patch for Kimi-K2-Thinking, avoid quantizing drafter
-        if (
-            hasattr(self.config, "quantization_config")
-            and self.config.quantization_config.quant_method
-            == QuantizationMethod.COMPRESSED_TENSORS
-        ):
-            self.config.quantization_config.quantization_config.ignore.append("re:.*eagle_module.*")
+        quant_config = getattr(self.config, "quantization_config", None)
+        if isinstance(quant_config, CompressedTensorsConfig):
+            quant_config.ignore.append("re:.*eagle_module.*")
```
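For context on why the guard changed shape: an `isinstance` check tolerates configs whose `quantization_config` is not the expected object, whereas attribute access fails on anything that lacks `quant_method`. The snippet below is an illustrative reconstruction, not taken from the PR; the dict scenario and the stub class are assumptions.

```python
# Illustrative failure mode (assumed): quantization_config may survive as a
# plain dict (e.g. loaded straight from config.json), so attribute access
# on it raises AttributeError.
raw_config = {"quant_method": "compressed-tensors"}

old_style_failed = False
try:
    # Old guard style: attribute access on a dict raises AttributeError.
    raw_config.quant_method
except AttributeError:
    old_style_failed = True
print(old_style_failed)  # → True

# New guard style: isinstance against a known class simply evaluates False
# for anything that is not an instance of that class. The stub below stands
# in for the real transformers CompressedTensorsConfig.
class CompressedTensorsConfigStub:
    pass

print(isinstance(raw_config, CompressedTensorsConfigStub))  # → False
```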
|
Comment on lines +588 to +590

Contributor

🧩 Analysis chain

🏁 Script executed:

```shell
python << 'EOF'
try:
    from transformers.utils.quantization_config import CompressedTensorsConfig
    import inspect
    # Get the source code
    src = inspect.getsource(CompressedTensorsConfig)
    # Print the entire class definition (limit to first 200 lines to see init and field definitions)
    lines = src.splitlines()[:200]
    for i, line in enumerate(lines, 1):
        print(f"{i:3d}: {line}")
except Exception as e:
    print(f"Error: {e}")
    import traceback
    traceback.print_exc()
EOF
```

Repository: NVIDIA/Model-Optimizer

Length of output: 245

🏁 Script executed:

```shell
# Also check if there's a default_factory or post_init that initializes ignore
python << 'EOF'
try:
    from transformers.utils.quantization_config import CompressedTensorsConfig
    import inspect
    # Check the __init__ method specifically
    if hasattr(CompressedTensorsConfig, '__init__'):
        init_src = inspect.getsource(CompressedTensorsConfig.__init__)
        print("=== __init__ method ===")
        print(init_src)
    # Check if it's a dataclass
    print("\n=== Class annotations ===")
    if hasattr(CompressedTensorsConfig, '__annotations__'):
        for field, type_hint in CompressedTensorsConfig.__annotations__.items():
            if 'ignore' in field.lower():
                print(f"{field}: {type_hint}")
                # Try to get default value
                if hasattr(CompressedTensorsConfig, field):
                    print(f"  Default: {getattr(CompressedTensorsConfig, field)}")
except Exception as e:
    print(f"Error: {e}")
    import traceback
    traceback.print_exc()
EOF
```

Length of output: 245

🏁 Script executed:

```shell
# Let's also check the actual behavior - create an instance and inspect it
python << 'EOF'
try:
    from transformers.utils.quantization_config import CompressedTensorsConfig
    # Try creating an instance with minimal config
    config = CompressedTensorsConfig(quantization_config={})
    print(f"ignore attribute value: {repr(config.ignore)}")
    print(f"ignore attribute type: {type(config.ignore)}")
    print(f"ignore is None: {config.ignore is None}")
    # Check if we can safely append
    if config.ignore is not None:
        print("Can safely append to ignore")
    else:
        print("Cannot safely append - would cause AttributeError")
except Exception as e:
    print(f"Error: {e}")
    import traceback
    traceback.print_exc()
EOF
```

Length of output: 245

🏁 Script executed:

```shell
# Find imports and usage of CompressedTensorsConfig in the repository
rg "CompressedTensorsConfig" -n --context 5
```

Length of output: 2174

🏁 Script executed:

```shell
# Check the actual file where the code appears
cat -n modelopt/torch/speculative/plugins/transformers.py | sed -n '580,610p'
```

Length of output: 1440

🏁 Script executed:

```shell
# Search for any type hints or documentation about ignore field
rg "\.ignore" modelopt/torch/speculative/plugins/transformers.py -n -B 2 -A 2
```

Length of output: 536

🌐 Web query:

💡 Result: In Hugging Face Transformers, …

🏁 Script executed:

```shell
# Verify if modify() can be called multiple times by searching for callers
rg "\.modify\(" modelopt/torch/speculative/plugins/transformers.py -n -B 2 -A 2
```

Length of output: 573

🏁 Script executed:

```shell
# Check if there are multiple instantiations or if modify is called in loops
rg "modify" modelopt/torch/speculative/plugins/transformers.py -n
```

Length of output: 277
The transformers library defines `CompressedTensorsConfig.ignore` with a default of `None`, so `quant_config.ignore.append(...)` raises an `AttributeError` whenever the checkpoint's config does not populate that field. Additionally, if `modify()` runs more than once, the same pattern is appended repeatedly.

🛡️ Proposed fix

```diff
 quant_config = getattr(self.config, "quantization_config", None)
 if isinstance(quant_config, CompressedTensorsConfig):
-    quant_config.ignore.append("re:.*eagle_module.*")
+    if quant_config.ignore is None:
+        quant_config.ignore = []
+    pattern = "re:.*eagle_module.*"
+    if pattern not in quant_config.ignore:
+        quant_config.ignore.append(pattern)
```
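The defensive pattern in the proposed fix can be sketched in isolation. The stand-in class below is hypothetical (it only mimics the `ignore` field), so the example runs without transformers installed:

```python
# Hypothetical stand-in for transformers' CompressedTensorsConfig, whose
# `ignore` field can default to None.
class FakeCompressedTensorsConfig:
    def __init__(self, ignore=None):
        self.ignore = ignore


def add_ignore_pattern(quant_config, pattern="re:.*eagle_module.*"):
    # Initialize the list when the checkpoint did not set `ignore`.
    if quant_config.ignore is None:
        quant_config.ignore = []
    # Deduplicate so repeated modify() calls do not stack entries.
    if pattern not in quant_config.ignore:
        quant_config.ignore.append(pattern)


cfg = FakeCompressedTensorsConfig()  # ignore starts as None
add_ignore_pattern(cfg)
add_ignore_pattern(cfg)  # second call is a no-op
print(cfg.ignore)  # → ['re:.*eagle_module.*']
```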
```diff
         # Set default aux_hidden_state layers
         if (
```
🧩 Analysis chain

🏁 Script executed:

Repository: NVIDIA/Model-Optimizer

Length of output: 190

🌐 Web query: transformers CompressedTensorsConfig introduction version history release notes

💡 Result:

What it is (introduction)

`CompressedTensorsConfig` is the 🤗 Transformers quantization config wrapper used to load models stored in the compressed-tensors checkpoint format (a safetensors extension that can represent quantization and sparsity layouts, produced e.g. via `llm-compressor`). [1][2]

"Introduced in Transformers" (earliest version)

The v4.44.0 API docs have no `CompressedTensorsConfig` entry, while v4.45.1 does. [3] So it was introduced between v4.44.0 (Aug 6, 2024) and v4.45.1 (Sep 26, 2024), i.e. in v4.45.x. Release dates from PyPI: v4.45.0 (Sep 25, 2024) and v4.45.1 (Sep 26, 2024). [5]

Notable API evolution (version history highlights)

At introduction, `CompressedTensorsConfig` exists with core fields like `config_groups`, `format`, `quantization_status`, `kv_cache_scheme`, `ignore`, etc. [4] Later versions add `run_compressed` (controls whether to alter submodules to emulate compressed execution); `run_compressed: bool = True` is documented in the signature, and the docs clarify the interaction between `format` and `run_compressed`. [6][7][8]

Release notes

Transformers' public release notes do not reliably call out `CompressedTensorsConfig` specifically; the most concrete "release-note-grade" trace is the appearance and subsequent change in the versioned API docs above (v4.45.1 introduction; later addition of `run_compressed`). [4][6][7]

Sources: [1]–[8] correspond to the cited pages.
Guard the import against older `transformers` versions. `CompressedTensorsConfig` was introduced in transformers v4.45.0 (September 2024). Importing it unconditionally at module level raises an `ImportError` on any installation with transformers < 4.45.0, even when the compressed-tensors quantization path is never exercised.

🛡️ Proposed safe import

Then tighten the guard at the call site.
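The reviewer's snippet is not reproduced above; the sketch below assumes the usual try/except pattern, and the helper name `should_skip_drafter_quant` is invented for illustration:

```python
# Version-tolerant import sketch (assumption, not the reviewer's verbatim
# suggestion): CompressedTensorsConfig only exists in transformers >= 4.45.0,
# so fall back to None on older installs (or when transformers is absent).
try:
    from transformers.utils.quantization_config import CompressedTensorsConfig
except ImportError:
    CompressedTensorsConfig = None


def should_skip_drafter_quant(config) -> bool:
    """True when the drafter should be excluded from compressed-tensors quantization."""
    quant_config = getattr(config, "quantization_config", None)
    # isinstance is only attempted when the class was importable.
    return CompressedTensorsConfig is not None and isinstance(
        quant_config, CompressedTensorsConfig
    )


print(should_skip_drafter_quant(object()))  # → False
```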