
feat: support Qwen down_proj fallback for compressed-tensors ignored modules.#1254

Merged
yingxudeng merged 1 commit into jd-opensource:release/v0.9.0 from yingxudeng:feat/compressed-tensors-ignore-modules on Apr 15, 2026

Conversation

@yingxudeng (Collaborator)

No description provided.

@gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request introduces an ignored_modules configuration to QuantArgs, allowing specific modules to bypass quantization using exact names or regex patterns. The DenseMLP and Qwen decoder layers were updated to support this filtering mechanism. Review feedback highlights a performance concern regarding repeated regex compilation within a loop, the use of a magic number for the regex prefix, and a logic bug where gate_up_proj and parent modules are not correctly checked against the ignore list.

Comment on lines +69 to +76
if (pattern.size() > 3 && pattern.rfind("re:", 0) == 0) {
try {
if (std::regex_match(module_name, std::regex(pattern.substr(3)))) {
return true;
}
} catch (const std::regex_error&) {
}
}
@gemini-code-assist (Bot, Contributor), severity: high

Creating a std::regex object inside a loop for every module check is extremely inefficient as regex compilation is an expensive operation. This can significantly slow down model initialization, especially for models with many layers and multiple ignore patterns. Additionally, the magic number 3 should be replaced with a named constant (e.g., kRegexPrefixLength). Consider pre-compiling the regexes or using a more efficient matching strategy.

Comment thread xllm/core/layers/common/dense_mlp.cpp
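
For reference, a minimal sketch of the reviewer's suggestion, assuming the patterns are compiled once up front rather than per module check. The class and constant names (IgnoreMatcher, kRegexPrefixLength, is_ignored) are hypothetical and not taken from the PR:

```cpp
// Sketch only: pre-compile "re:"-prefixed ignore patterns once instead of
// constructing a std::regex on every module-name lookup. Names illustrative.
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

namespace {
constexpr size_t kRegexPrefixLength = 3;  // strlen("re:")
}  // namespace

class IgnoreMatcher {
 public:
  explicit IgnoreMatcher(const std::vector<std::string>& patterns) {
    for (const auto& pattern : patterns) {
      if (pattern.size() > kRegexPrefixLength &&
          pattern.rfind("re:", 0) == 0) {
        try {
          // Compiled once at construction, reused for every lookup.
          regexes_.emplace_back(pattern.substr(kRegexPrefixLength));
        } catch (const std::regex_error&) {
          // Skip malformed patterns, mirroring the original silent catch.
        }
      } else {
        exact_names_.push_back(pattern);
      }
    }
  }

  bool is_ignored(const std::string& module_name) const {
    for (const auto& name : exact_names_) {
      if (name == module_name) return true;
    }
    for (const auto& re : regexes_) {
      if (std::regex_match(module_name, re)) return true;
    }
    return false;
  }

 private:
  std::vector<std::string> exact_names_;
  std::vector<std::regex> regexes_;
};
```

Constructing the matcher once per layer (or once per model) keeps regex compilation out of the per-module loop and replaces the magic number 3 with a named constant.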
@yingxudeng force-pushed the feat/compressed-tensors-ignore-modules branch from ccfd0a7 to fb643ec on April 10, 2026 09:24
@yingxudeng changed the title from "feat: support ignored modules in compressed-tensors quant config." to "feat: support Qwen down_proj fallback for compressed-tensors ignored modules." on Apr 10, 2026
@DragonFive (Collaborator) left a comment

LGTM

}
}
},
"ignore": [
Collaborator

Is this rule applicable to all models?

@yingxudeng (Collaborator, Author)

This field is part of the quantization config schema rather than a model-specific rule. The JSON (including the ignore list) is generated by the quantization tool; AngelSlim, for one, emits this field. So applicability depends on which quant tool produced the config, not on the model itself.
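
For context, an illustrative compressed-tensors quantization_config fragment of the kind a quant tool might emit. The group settings and module patterns below are made up for illustration; only the shape of the ignore field matters here, including the "re:" prefix that the C++ snippet above parses:

```json
{
  "quantization_config": {
    "quant_method": "compressed-tensors",
    "format": "int-quantized",
    "config_groups": {
      "group_0": {
        "targets": ["Linear"],
        "weights": { "num_bits": 8, "type": "int", "strategy": "channel" }
      }
    },
    "ignore": [
      "lm_head",
      "re:.*\\.mlp\\.down_proj$"
    ]
  }
}
```

Modules matching an entry in ignore (by exact name or regex) fall back to the unquantized path, which is what this PR wires up for Qwen down_proj.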

@yingxudeng merged commit fb1287d into jd-opensource:release/v0.9.0 on Apr 15, 2026
30 checks passed


3 participants