Is your feature request related to a problem? Please describe.
In some cases, gradient clipping or normalization is needed to stabilize the training of networks.
Describe the solution you'd like
Allow gradient clipping or gradient normalization to be enabled via an argument passed at construction of the Workflows.
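A minimal sketch of what the Workflow would presumably run internally when such an option is enabled; the toy model, data, and hyperparameter values are illustrative assumptions, only the two torch.nn.utils calls are the actual PyTorch API:

```python
import torch
import torch.nn as nn

# Illustrative toy setup (not part of the proposal).
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()
inputs, targets = torch.randn(8, 10), torch.randn(8, 1)

optimizer.zero_grad()
loss = loss_fn(model(inputs), targets)
loss.backward()

# Gradient norm clipping ("normalization"): rescales all gradients in place
# so that their global norm does not exceed max_norm.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
# Alternatively, value clipping: clamps each gradient element in place.
# torch.nn.utils.clip_grad_value_(model.parameters(), clip_value=1.0)

optimizer.step()
```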
Describe alternatives you've considered
Registering a hook on each model parameter to handle the gradient clipping. This is messier and is not the main way PyTorch handles it. It would also rule out gradient normalization, since the PyTorch implementation is an in-place transformation over all parameters at once, and the non-in-place gradient clipping will be deprecated. Furthermore, with AMP the gradients need to be unscaled before normalizing them, as per https://pytorch.org/docs/stable/notes/amp_examples.html#gradient-clipping.
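For reference, a sketch of the AMP interaction following the pattern in the linked PyTorch docs; the toy model and values are illustrative assumptions (requires a CUDA device):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()
scaler = torch.cuda.amp.GradScaler()
inputs = torch.randn(8, 10, device="cuda")
targets = torch.randn(8, 1, device="cuda")

optimizer.zero_grad()
with torch.cuda.amp.autocast():
    loss = loss_fn(model(inputs), targets)
scaler.scale(loss).backward()

# Gradients must be unscaled before clipping, otherwise max_norm would be
# compared against the scaled gradients.
scaler.unscale_(optimizer)
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

scaler.step(optimizer)
scaler.update()

# The rejected per-parameter hook alternative only covers value clipping:
# for p in model.parameters():
#     p.register_hook(lambda grad: grad.clamp(-1.0, 1.0))
```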