What’s the Correct Way to Quantize Qwen3.5 (MoE/Dense) to NVFP4?

Hi, I’m currently unable to find any up-to-date documentation or guidance on properly quantizing Qwen3.5 (MoE/Dense) to NVFP4.

This was working around the time #897 was merged, but now I’m running into issues. I’ve tried both `--qformat nvfp4_mlp_only` and `--qformat nvfp4_experts_only`, but neither seems to apply the expected quantization: the exported weights are still roughly the same size as the BF16 checkpoint.
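For anyone who wants to reproduce the size check, here is a minimal sketch of how the exported checkpoint can be inspected. It assumes the export is a directory of `.safetensors` shards and reads only the file headers with the standard library; the function names `dtype_bytes` and `summarize` are hypothetical helpers, not part of any library. My expectation (an assumption about the export format, not something I have confirmed) is that a real NVFP4 export would show most bytes under a packed 8-bit dtype plus small scale tensors, whereas an unquantized export would be dominated by `BF16`.

```python
import json
import struct
import sys
from collections import Counter
from pathlib import Path


def dtype_bytes(path):
    """Return total tensor bytes per dtype for one .safetensors file."""
    totals = Counter()
    with open(path, "rb") as f:
        # safetensors layout: 8-byte little-endian header length, then a JSON header
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form metadata entry, not a tensor
            continue
        start, end = info["data_offsets"]
        totals[info["dtype"]] += end - start
    return totals


def summarize(export_dir):
    """Sum bytes per dtype across every shard in an export directory."""
    totals = Counter()
    for shard in sorted(Path(export_dir).glob("*.safetensors")):
        totals.update(dtype_bytes(shard))
    return totals


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_dtypes.py /path/to/exported_checkpoint
    for dtype, nbytes in summarize(sys.argv[1]).most_common():
        print(f"{dtype:>8}  {nbytes / 2**30:.2f} GiB")
```

If nearly all bytes are reported as `BF16`, the quantization config likely did not match any modules in the model.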

I’d really appreciate any guidance or pointers on what might have changed or what I might be missing. Thanks in advance!

@Edwardf0t1 @cjluo-nv

What’s the Correct Way to Quantize Qwen3.5 (MoE/Dense) to NVFP4? #1255