Context
The MatMul operation in OpenVINO assumes implicit shape alignment for its input arguments. It applies the transpositions specified by the optional transpose_a and transpose_b attributes: OV spec.
Currently, weight compression in NNCF does not support transpose_a=True.
Here's the check and test.
This potentially affects the Mixed-Precision, AWQ, Scale Estimation, and Lora Correction algorithms.
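To illustrate what transpose_a=True means for the shapes involved, here is a minimal NumPy sketch of the MatMul semantics described in the OV spec (shapes chosen arbitrarily for illustration):

```python
import numpy as np

# MatMul with transpose_a=True: the first input is transposed before the
# multiplication, so an activation of shape (K, M) multiplied by a weight
# of shape (K, N) yields an output of shape (M, N).
a = np.ones((16, 4))  # activations, shape (K=16, M=4)
b = np.ones((16, 8))  # weights,     shape (K=16, N=8)

out = a.T @ b         # equivalent to MatMul(a, b, transpose_a=True)
print(out.shape)      # (4, 8)
```

With transpose_a=False the same multiplication would instead expect activations of shape (M, K), which is why code that assumes a fixed activation layout breaks when the attribute is set.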
What needs to be done?
The task is to enable data-aware weight compression methods (Mixed-Precision, AWQ, Scale Estimation, Lora Correction) for models with transposed input matrix multiplications.
- At least one function, process_stats, should be corrected, and the transpose_a check removed.
- The test should pass and should be templated so that it covers the OV and Torch backends at once.
- Tests that use LMLinearModel (which has transpose_a=False by default) should also pass with transpose_a=True.
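One likely reason the transposed case needs a fix: with transpose_a=True the token and channel axes of the activation tensor swap, so any statistics reduction must swap its axis too. A hypothetical sketch (the function name and layout assumptions are illustrative, not the actual NNCF process_stats signature):

```python
import numpy as np

def mean_activation_stats(activations: np.ndarray, transpose_a: bool) -> np.ndarray:
    """Hypothetical helper: compute per-channel mean-absolute activation
    statistics, picking the reduction axis based on the matmul layout."""
    # transpose_a=False: layout is (tokens, channels) -> reduce over axis 0.
    # transpose_a=True:  layout is (channels, tokens) -> reduce over axis 1.
    axis = 1 if transpose_a else 0
    return np.mean(np.abs(activations), axis=axis)

# Both layouts yield one statistic per channel (16 channels here).
stats_t = mean_activation_stats(np.ones((16, 4)), transpose_a=True)
stats_n = mean_activation_stats(np.ones((4, 16)), transpose_a=False)
print(stats_t.shape, stats_n.shape)  # (16,) (16,)
```

A templated test exercising both backends would then parametrize over transpose_a in the same way.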
Example Pull Requests
#3179
#3129
Resources
Contact points
@ljaljushkin
Ticket
No response