
Fix GGUF to Work Better with modules_to_not_convert / keep_in_fp32_modules#13697

Open
dg845 wants to merge 2 commits into main from gguf-fix-modules-not-to-convert

Conversation


@dg845 dg845 commented May 8, 2026

What does this PR do?

This PR contains several fixes so that GGUF loading and inference work better with modules_to_not_convert and _keep_in_fp32_modules.

Changelist

  1. src/diffusers/quantizers/gguf/utils.py
    1. _replace_with_gguf_linear: adds a check for whether any of the current module's named_children are in modules_to_not_convert, and if so, skips it. This allows us to skip containers, rather than only leaf-level nn.Linear submodules as in the current code. For example, TimestepEmbedding modules are commonly added to _keep_in_fp32_modules (e.g. time_embedder in WanTransformer3DModel's WanTimeTextImageEmbedding condition embedder), but since they themselves contain leaf nn.Linear submodules such as linear_1, the current code only checks against leaf modules such as linear_1 and incorrectly concludes that they should be converted.
    2. _fused_mul_mat_gguf: in the UNQUANTIZED_TYPES case, also cast the dequantized weight to the activation x's dtype before performing the matrix multiplication, which should prevent dtype errors for BF16 weights.
  2. src/diffusers/quantizers/gguf/gguf_quantizer.py
    1. GGUFQuantizer.create_quantized_param: handles modules_to_not_convert by dequantizing them, so that they end up in their original unquantized form. This is intended to handle the case where a module in self.modules_to_not_convert (or one of its children) is in the GGUF file. Since it is in the file, it will be converted to a GGUFParameter, but we don't want it to be quantized, so we convert it back here.
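The container-level check in item 1.1 can be sketched roughly as follows. This is a minimal illustration, not the actual diffusers code; the helper name `has_child_to_skip` and the substring-matching convention are assumptions for the example:

```python
import torch.nn as nn

def has_child_to_skip(prefix, module, modules_to_not_convert):
    """Illustrative check (hypothetical helper): return True if this module's
    qualified name, or the qualified name of any of its direct named_children,
    matches an entry in modules_to_not_convert."""
    # Gather the qualified names of this module and its direct children.
    candidates = [prefix] + [
        f"{prefix}.{name}" if prefix else name
        for name, _ in module.named_children()
    ]
    return any(
        pattern in candidate
        for pattern in modules_to_not_convert
        for candidate in candidates
    )

# A container (stand-in for a TimestepEmbedding) matches as a whole, even
# though only its children are leaf nn.Linear modules:
embed = nn.Sequential(nn.Linear(4, 4), nn.SiLU(), nn.Linear(4, 4))
print(has_child_to_skip("time_embedder", embed, ["time_embedder"]))  # True
print(has_child_to_skip("proj_out", embed, ["time_embedder"]))       # False
```

A leaf-only check would only compare names like `time_embedder.linear_1` and miss the container itself, which is the failure mode the PR describes.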
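The dtype fix in item 1.2 amounts to casting the dequantized weight to the activation's dtype before the matmul. A minimal sketch (the function name is hypothetical; in diffusers this lives in the UNQUANTIZED_TYPES branch of _fused_mul_mat_gguf):

```python
import torch

def fused_mul_mat_unquantized(x, weight):
    # The fix: cast the (already dequantized) weight to the activation's
    # dtype before the matmul, so e.g. a BF16 weight no longer raises a
    # dtype-mismatch error against an FP32/FP16 activation.
    weight = weight.to(x.dtype)
    return torch.matmul(x, weight.t())

x = torch.randn(2, 8)                        # float32 activation
w = torch.randn(4, 8, dtype=torch.bfloat16)  # e.g. a BF16 tensor from the GGUF file
out = fused_mul_mat_unquantized(x, w)
print(out.shape, out.dtype)  # torch.Size([2, 4]) torch.float32
```

Without the cast, `torch.matmul` on mixed float32/bfloat16 inputs raises a RuntimeError, which matches the BF16 dtype errors the PR aims to prevent.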
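The create_quantized_param change in item 2.1 can be illustrated with the following sketch. All names here are hypothetical stand-ins (the real method operates on GGUFParameter tensors and the quantizer's dequantization utilities); the point is only the control flow:

```python
def create_quantized_param_sketch(param_name, param, modules_to_not_convert, dequantize):
    """Sketch (hypothetical names): a tensor present in the GGUF file arrives
    as a quantized GGUFParameter, but if its module is listed in
    modules_to_not_convert we dequantize it back here, so it ends up in the
    model in its original unquantized form."""
    if any(skip in param_name for skip in modules_to_not_convert):
        return dequantize(param)   # restore the unquantized tensor
    return param                   # keep the quantized GGUFParameter

# Usage with a stand-in dequantize function:
deq = create_quantized_param_sketch(
    "time_embedder.linear_1.weight",
    "q8_blob",
    ["time_embedder"],
    dequantize=lambda p: f"dequantized({p})",
)
print(deq)  # dequantized(q8_blob)
```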

Inspired by GGUF debugging in #13551, in particular #13551 (comment).

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@DN6
@sayakpaul

@dg845 dg845 requested a review from DN6 May 8, 2026 09:23
@github-actions github-actions Bot added the size/S PR with diff < 50 LOC label May 8, 2026
@dg845 dg845 requested a review from sayakpaul May 8, 2026 09:23
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.


Labels

quantization size/S PR with diff < 50 LOC
