Hello,
First, thank you for releasing the code and sharing your great work!
I found it very interesting that the attention layers in DiT are critical points and that excluding them from the quantization process leads to better performance.
As I understand it, QuantVLA builds on DuQuant, with the LLM component fully quantized and the DiT only partially quantized. However, the DuQuant implementation in this repository appears to differ slightly from the original, as also noted in the Appendix of your paper.
For example, it seems that per-channel smoothing with a diagonal matrix is omitted in the current code implementation.
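To make sure we are talking about the same step, here is a minimal NumPy sketch of the kind of per-channel smoothing I mean (SmoothQuant-style: activation outliers are migrated into the weights via a diagonal scaling, leaving the layer output unchanged). All names and the `alpha` choice below are my own illustration, not from this repository:

```python
import numpy as np

def smooth(X, W, alpha=0.5, eps=1e-8):
    """Per-channel smoothing via a diagonal matrix diag(s):
    X' = X @ diag(1/s), W' = diag(s) @ W, so X' @ W' == X @ W.
    X: (tokens, in_features) activations; W: (in_features, out_features)."""
    act_max = np.abs(X).max(axis=0)              # per-channel activation range
    w_max = np.abs(W).max(axis=1)                # per-channel weight range
    s = (act_max ** alpha) / np.maximum(w_max ** (1 - alpha), eps)
    s = np.maximum(s, eps)                       # avoid division by zero
    return X / s, W * s[:, None]

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
X[:, 3] *= 50.0                                  # inject an outlier channel
W = rng.normal(size=(8, 5))
Xs, Ws = smooth(X, W)
assert np.allclose(X @ W, Xs @ Ws)               # output is mathematically unchanged
```

Is this roughly the step that was left out, or is it absorbed elsewhere (e.g. into the rotation matrices)?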
Could you clarify the key differences between the original DuQuant and the implementation used in QuantVLA? Specifically, were these omissions intentional, or are there alternative mechanisms in place?
Lastly, do you have any plans to release the code for the Pi0.5 configuration and the SmoothQuant implementation? It would be a great contribution to the community and to future research.
Thank you!