Section 3.3 of your paper says that the regularization term is calculated from the unnormalized importance during sparsity training, and that the max-normalized importance is used only during pruning.
However, in your code, the max-normalized importance is used during both sparsity training and pruning.
Is there anything wrong?
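
To make the distinction concrete, here is a minimal sketch of the two quantities I am comparing. It is only an illustration, not the repository's code: it assumes the importance of a group is the L2 norm of its weights, and the function names and the toy `groups` data are hypothetical.

```python
import math


def unnormalized_importance(groups):
    """Raw importance per group, assumed here to be the L2 norm of its weights."""
    return [math.sqrt(sum(w * w for w in group)) for group in groups]


def max_normalized_importance(groups):
    """The same importance divided by the largest value, so the maximum becomes 1."""
    importance = unnormalized_importance(groups)
    peak = max(importance)
    if peak == 0:
        # All groups are zero; there is nothing to scale.
        return importance
    return [value / peak for value in importance]


# Toy weights for three groups (hypothetical values).
groups = [[3.0, 4.0], [0.0, 1.0], [0.0, 2.0]]
raw = unnormalized_importance(groups)
scaled = max_normalized_importance(groups)
```

The ranking of the groups is the same in both cases, but the magnitudes differ, so a regularization term built from `scaled` rather than `raw` would presumably weight the groups differently during sparsity training.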