
Question about Activation quantization #113

@Tfloow


Hello,

I am currently working on a fork of OmniQuant, and your work is truly brilliant. I have one question about activation quantization: what is the reason you still activate it?

To replicate your paper's results, I run:

CUDA_VISIBLE_DEVICES=0 python main.py \
--model /PATH/TO/LLaMA/llama-7b  \
--epochs 0 --output_dir ./log/test \
--eval_ppl --wbits 3 --abits 16 --group_size 128 --lwc \
--resume /PATH/TO/Pretrained/Parameters 

But what I find odd is that in omniquant.py you still activate activation quantization:

# init smooth parameters
set_quant_state(qlayer, weight_quant=False, act_quant=True)  # weight will be manually quantized before forward
qlayer.let = args.let
use_shift = True
if is_llama or args.abits == 16:
    use_shift = False  # deactivate channel-wise shifting for llama model and weight-only quantization
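
For context, here is a minimal sketch of what I understand set_quant_state to do: walk the layer's submodules and toggle the quantization flags on each quantized module. The class names are from the repo, but the body below is my paraphrase of the mechanism, not the exact implementation:

# Sketch (my paraphrase, not the repo's exact code): propagate the
# quantization flags to every QuantLinear / QuantMatMul submodule.
def set_quant_state(layer, weight_quant=False, act_quant=False):
    for module in layer.modules():
        if isinstance(module, (QuantLinear, QuantMatMul)):
            module.use_weight_quant = weight_quant
            module.use_act_quant = act_quant

So with act_quant=True, every QuantMatMul in the layer ends up with use_act_quant set, regardless of abits.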

This can seem harmless since we are using 16-bit activations, but wouldn't it force FP16 into INT16, resulting in a loss of information? Especially since you initialize QuantMatMul:

self.qkt_matmul = QuantMatMul(
args.q_quant_params, args.k_quant_params, matmul_func=torch.matmul
)
self.pv_matmul = QuantMatMul(
args.p_quant_params, args.v_quant_params, matmul_func=torch.matmul
)
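
To illustrate why I worry even at 16 bits: FP16 values do not generally sit on a uniform INT16 grid, so a round-trip through a uniform affine quantizer is not exact. Below is my own sketch of such a round-trip (an approximation of the mechanism, not the repo's UniformAffineQuantizer):

import torch

# Sketch (my approximation, not the repo's code): per-tensor uniform
# affine fake-quantization to n_bits, followed by dequantization.
def fake_quant(x, n_bits=16):
    qmax = 2 ** n_bits - 1
    xmin, xmax = x.min(), x.max()
    scale = (xmax - xmin).clamp(min=1e-8) / qmax
    zero_point = torch.round(-xmin / scale)
    x_int = torch.clamp(torch.round(x / scale) + zero_point, 0, qmax)
    return (x_int - zero_point) * scale

x = torch.randn(4, 4096, dtype=torch.float16).float()
print((fake_quant(x) - x).abs().max())  # nonzero: the round-trip is lossy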

And this will trigger the quantization procedure on every activation, as in:

# repeat k/v heads if n_kv_heads < n_heads
key_states = repeat_kv(key_states, self.num_key_value_groups)
value_states = repeat_kv(value_states, self.num_key_value_groups)
query_states = self.qkt_matmul.quant_x1(query_states)
key_states = self.qkt_matmul.quant_x2(key_states)
attn_weights = self.qkt_matmul(query_states, key_states.transpose(2, 3)) / math.sqrt(self.head_dim)
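
As I read it, quant_x1/quant_x2 apply the activation quantizer whenever use_act_quant is set. A sketch of that mechanism (my paraphrase; only the method names come from the repo):

# Sketch (my paraphrase): how quant_x1 gates on the use_act_quant flag.
class QuantMatMulSketch:
    def __init__(self, x1_quantizer, use_act_quant=True):
        self.x1_quantizer = x1_quantizer  # UniformAffineQuantizer in the repo
        self.use_act_quant = use_act_quant

    def quant_x1(self, x1):
        if self.use_act_quant:
            x1 = self.x1_quantizer(x1)  # runs on every forward pass
        return x1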

This will result in degraded performance, while the desired behavior would be to keep the activations in FP16. That is not what happens, because the use_act_quant flag is set for all QuantMatMul modules; every forward pass therefore calls UniformAffineQuantizer and produces a quantized INT16 form of the activations.
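
If activations are indeed meant to stay in FP16 for weight-only runs, a guard along these lines is what I would have expected (my suggestion, not code from the repo):

# Suggested guard (my sketch): only enable activation quantization
# when activations are actually compressed below 16 bits.
set_quant_state(qlayer, weight_quant=False, act_quant=(args.abits < 16))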
