Is your feature request related to a problem? Please describe.
Depending on the model, narrow distributions of INT weights can lead to underflow and accuracy degradation when running inference with limited accumulation lengths. The distributions could be adjusted automatically by changing the clip values (quantization boundaries) whenever a narrow weight distribution is detected.
Describe the solution you'd like
The INT weight distributions in a trained checkpoint need to be analyzed. If a distribution is deemed too narrow, recompute the clip values and INT weights using the SAWB quantizer, and save the modified checkpoint.
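A rough sketch of the check-and-requantize flow, to make the request concrete. Everything here is an assumption for illustration: the function names, the symmetric INT grid, the 99.9th-percentile narrowness measure, and the 0.5 threshold are hypothetical and not taken from the sq1e code. The `compute_clip` argument is a placeholder for the SAWB clip computation, which is not reimplemented here.

```python
import numpy as np


def int_range_usage(w_int, n_bits):
    """Fraction of the symmetric INT range covered by the weights.

    Uses the 99.9th percentile of |w_int| rather than the max so that a
    single outlier does not hide an otherwise narrow distribution.
    """
    qmax = 2 ** (n_bits - 1) - 1
    return float(np.percentile(np.abs(w_int), 99.9)) / qmax


def requantize_if_narrow(w_int, clip, n_bits, compute_clip, threshold=0.5):
    """Return (w_int, clip), recomputed only if the distribution is narrow.

    compute_clip: callable mapping float weights to a new clip value
    (stand-in for the SAWB quantizer's clip computation).
    threshold: hypothetical cutoff on the range usage.
    """
    if int_range_usage(w_int, n_bits) >= threshold:
        return w_int, clip  # wide enough, leave the checkpoint untouched

    qmax = 2 ** (n_bits - 1) - 1
    # Dequantize with the old clip value. If the checkpoint also stores the
    # original float weights, using those instead avoids compounding
    # rounding error.
    w_float = w_int.astype(np.float64) * (clip / qmax)

    new_clip = float(compute_clip(w_float))
    # Quantize again on the new, tighter grid.
    w_new = np.round(np.clip(w_float, -new_clip, new_clip) / new_clip * qmax)
    return w_new.astype(np.int32), new_clip


# Example: 8-bit weights that only occupy about 8% of the INT range.
narrow = np.arange(-10, 11, dtype=np.int32)
max_abs_clip = lambda w: np.abs(w).max()  # placeholder, NOT the SAWB formula
new_w, new_clip = requantize_if_narrow(narrow, 1.0, 8, max_abs_clip)
```

In a real implementation this would run per layer over the checkpoint's state dict, with the updated INT weights and clip values written back before saving.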
Additional context
This feature was present in the older sq1e repository and can be carried over.