feat: support Qwen down_proj fallback for compressed-tensors ignored modules #1254
Conversation
Code Review
This pull request introduces an ignored_modules configuration to QuantArgs, allowing specific modules to bypass quantization using exact names or regex patterns. The DenseMLP and Qwen decoder layers were updated to support this filtering mechanism. Review feedback highlights a performance concern regarding repeated regex compilation within a loop, the use of a magic number for the regex prefix, and a logic bug where gate_up_proj and parent modules are not correctly checked against the ignore list.
```cpp
if (pattern.size() > 3 && pattern.rfind("re:", 0) == 0) {
  try {
    if (std::regex_match(module_name, std::regex(pattern.substr(3)))) {
      return true;
    }
  } catch (const std::regex_error&) {
  }
}
```
Creating a `std::regex` object inside a loop for every module check is very inefficient, since regex compilation is an expensive operation. This can noticeably slow down model initialization, especially for models with many layers and multiple ignore patterns. Additionally, the magic number `3` should be replaced with a named constant (e.g., `kRegexPrefixLength`). Consider pre-compiling the regexes once or using a more efficient matching strategy.
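One way to address both points is to compile each `re:` pattern once up front and reuse the compiled objects for every module check. The sketch below is illustrative, not the PR's actual implementation; the class name `IgnoreMatcher` and the constant `kRegexPrefixLength` are hypothetical names introduced here:

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Length of the "re:" prefix that marks a regex pattern in the ignore
// list; replaces the magic number 3 from the original snippet.
constexpr std::size_t kRegexPrefixLength = 3;

// Splits patterns into exact names and pre-compiled regexes at
// construction time, so per-module checks never re-compile a regex.
class IgnoreMatcher {
 public:
  explicit IgnoreMatcher(const std::vector<std::string>& patterns) {
    for (const auto& pattern : patterns) {
      if (pattern.size() > kRegexPrefixLength &&
          pattern.rfind("re:", 0) == 0) {
        try {
          regexes_.emplace_back(pattern.substr(kRegexPrefixLength));
        } catch (const std::regex_error&) {
          // Skip malformed patterns rather than failing initialization.
        }
      } else {
        exact_names_.push_back(pattern);
      }
    }
  }

  // Returns true if module_name is ignored, by exact match or by a
  // full regex match against any pre-compiled pattern.
  bool Matches(const std::string& module_name) const {
    for (const auto& name : exact_names_) {
      if (name == module_name) return true;
    }
    for (const auto& re : regexes_) {
      if (std::regex_match(module_name, re)) return true;
    }
    return false;
  }

 private:
  std::vector<std::string> exact_names_;
  std::vector<std::regex> regexes_;
};
```

With this shape, the matcher is built once when `QuantArgs` is parsed and each per-layer check is a cheap lookup over already-compiled regexes.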
Force-pushed from ccfd0a7 to fb643ec.
```json
      }
    }
  },
  "ignore": [
```
Is this rule applicable to all models?
This field is part of the quantization config schema, not a model-specific rule. The JSON (including `ignore`) is generated by the quantization tool; AngelSlim, at least, produces this field. So its applicability depends on which quant tool was used, not on the model itself.
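For context, a quantization config carrying an `ignore` list might look like the following sketch. The field layout follows the compressed-tensors convention of mixing exact module names with `re:`-prefixed regex patterns, but the exact schema and module names here are illustrative and depend on the quantization tool that emitted the file:

```json
{
  "quantization_config": {
    "config_groups": {
      "group_0": {
        "weights": { "num_bits": 8, "type": "int", "symmetric": true }
      }
    },
    "ignore": [
      "lm_head",
      "re:.*down_proj"
    ]
  }
}
```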