What’s the Correct Way to Quantize Qwen3.5 (MoE/Dense) to NVFP4? #1255

@seindum

Hi, I’m currently unable to find any up-to-date documentation or guidance on properly quantizing Qwen3.5 (MoE/Dense) to NVFP4.

This worked previously, around the time #897 was merged, but now I’m running into issues. I’ve tried both --qformat nvfp4_mlp_only and --qformat nvfp4_experts_only, but neither seems to apply the expected quantization: the exported weights are still roughly the same size as the BF16 checkpoint.
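For anyone trying to reproduce the size symptom, here is a minimal sketch of one way to check it, assuming the export directory contains standard `.safetensors` shards. It reads only each shard's JSON header and sums payload bytes per dtype. The function names `dtype_bytes` and `summarize` are made up for illustration and are not part of any library. As a rough expectation (an assumption about the export layout, not something confirmed here), NVFP4 stores 4 bits per weight plus one FP8 scale per 16-element block, so quantized layers should take about 4.5/16 ≈ 0.28× their BF16 size and should appear under a byte-sized dtype such as `U8` rather than `BF16`.

```python
import json
import struct
import sys
from collections import Counter
from pathlib import Path


def dtype_bytes(path):
    """Return total payload bytes per dtype for one .safetensors file.

    The safetensors layout is an 8-byte little-endian header length,
    followed by a JSON header that maps tensor names to their dtype,
    shape, and byte offsets into the data section.
    """
    totals = Counter()
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form metadata, not a tensor
            continue
        start, end = info["data_offsets"]
        totals[info["dtype"]] += end - start
    return totals


def summarize(export_dir):
    """Sum per-dtype payload bytes over every shard in export_dir."""
    totals = Counter()
    for shard in sorted(Path(export_dir).glob("*.safetensors")):
        totals.update(dtype_bytes(shard))
    return totals


if __name__ == "__main__":
    # Usage: python check_dtypes.py /path/to/exported_checkpoint
    totals = summarize(sys.argv[1])
    grand_total = sum(totals.values()) or 1
    for dtype, nbytes in totals.most_common():
        print(f"{dtype:>8}  {nbytes / 2**30:8.2f} GiB  {100 * nbytes / grand_total:5.1f}%")
```

If nearly all bytes are still reported as `BF16`, the quantizers were most likely never applied to the MLP or expert layers, which would match the unchanged checkpoint size.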

I’d really appreciate any guidance or pointers on what might have changed or what I might be missing. Thanks in advance!

@Edwardf0t1 @cjluo-nv
