Hello,
First, thank you for releasing the code and sharing your great work!
I found it very interesting that the attention layers in DiT are critical points and that excluding them from the quantization process leads to better performance.
As I understand it, QuantVLA builds on DuQuant, with the LLM component fully quantized and the DiT only partially quantized. However, the DuQuant implementation in this repository appears to differ slightly from the original, as also noted in the Appendix of your paper.
For example, it seems that per-channel smoothing with a diagonal matrix is omitted in the current code implementation.
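To make sure we are talking about the same step, here is a minimal NumPy sketch of the kind of per-channel smoothing I mean (SmoothQuant-style: activation outliers are migrated into the weights via a diagonal scaling, leaving the layer output unchanged). All names and the `alpha` choice below are my own illustration, not from this repository:

```python
import numpy as np

def smooth(X, W, alpha=0.5, eps=1e-8):
    """Per-channel smoothing via a diagonal matrix diag(s):
    X' = X @ diag(1/s), W' = diag(s) @ W, so X' @ W' == X @ W.
    X: (tokens, in_features) activations; W: (in_features, out_features)."""
    act_max = np.abs(X).max(axis=0)              # per-channel activation range
    w_max = np.abs(W).max(axis=1)                # per-channel weight range
    s = (act_max ** alpha) / np.maximum(w_max ** (1 - alpha), eps)
    s = np.maximum(s, eps)                       # avoid division by zero
    return X / s, W * s[:, None]

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
X[:, 3] *= 50.0                                  # inject an outlier channel
W = rng.normal(size=(8, 5))
Xs, Ws = smooth(X, W)
assert np.allclose(X @ W, Xs @ Ws)               # output is mathematically unchanged
```

Is this roughly the step that was left out, or is it absorbed elsewhere (e.g. into the rotation matrices)?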
Could you clarify the key differences between the original DuQuant and the implementation used in QuantVLA? Specifically, were these omissions intentional, or are there alternative mechanisms in place?
Lastly, do you have any plans to release the code for the Pi0.5 configuration and the SmoothQuant implementation? It would be a great contribution to the community and to future research.
Thank you!