setting grad None after training to avoid memory leak #1
kaloeffler wants to merge 1 commit into kach:main
Conversation
Hi there, thank you for writing! I'm not sure I fully understand where this leak comes from, but from your proposed code change it looks like you could equivalently just call `mw.begin()` at the end of each training loop. Could you try that and see if it works? (We would like to keep this repo as a frozen archive of the code from the paper, and will not be updating it unless there are severe issues without plausible workarounds.) Thanks!
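A minimal sketch of this workaround, based on the example training loop from the README (`model`, `loss_fn`, and `loader` are placeholders for the user's own objects):

```python
from gradient_descent_the_ultimate_optimizer import gdtuo

optim = gdtuo.Adam(optimizer=gdtuo.SGD(1e-5))
mw = gdtuo.ModuleWrapper(model, optimizer=optim)
mw.initialize()

for features, labels in loader:
    mw.begin()                        # re-enable gradient tracking for this step
    loss = loss_fn(mw.forward(features), labels)
    mw.zero_grad()
    loss.backward(create_graph=True)  # create_graph=True is required by the method
    mw.step()

# the suggested workaround: one extra begin() after training clears the
# .grad tensors, releasing the graphs they keep alive
mw.begin()
```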
|
Calling `mw.begin()` at the end of each training loop works as well. I understand that you would like to avoid making changes to the code; nevertheless, I would still encourage you to update the README with a hint about calling the `begin` method at the end of a training run when training several models. This would greatly help future users apply your code quickly and successfully, without wondering where the out-of-memory error comes from.
Hi everyone, thank you Kartik Chandra for creating this great repository. I have been using this code for a simple FNN and it works great; there seems to be no memory leak. However, when switching to a simple RNN, memory usage increases drastically after each training batch, and at some point I get an out-of-memory error.

As described above, this issue is caused by calling `loss.backward(create_graph=True)`. I have tried both solutions described for it: 1) setting `param.grad = None` at the end of training by calling the `end()` method created by kaloeffler, and 2) calling `mw.begin()` at the end of each training loop. Neither solution worked in my case. It also seems that the large increase in memory is caused by retaining the graph, since `retain_graph` is automatically set to `True` when creating the graph.

As I only have this issue when training an RNN model, and since an RNN model was trained with the ultimate optimizer in your paper, I was wondering if you had a similar issue and another fix for it? Thank you!
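For context on the `retain_graph` point: in PyTorch, `retain_graph` defaults to the value of `create_graph`, so a higher-order backward pass keeps its entire graph alive. A tiny stand-alone illustration (the RNN here is a generic stand-in, not code from this repository):

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=4, hidden_size=8, batch_first=True)

for step in range(3):
    x = torch.randn(2, 16, 4)         # (batch, seq_len, features)
    out, _ = rnn(x)
    loss = out.pow(2).mean()
    # equivalent to loss.backward(create_graph=True, retain_graph=True):
    # the graph over the whole unrolled sequence is retained, and the
    # .grad tensors produced here hold references into it
    loss.backward(create_graph=True)
```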
Hi,
during training I've observed a memory leak when training several models one after another. I've used your example code from the README to create an example, see below. I'm using Python 3.7 and PyTorch 1.13.1. Per training run I observe an increase of about 10 MiB in memory usage. This is especially an issue when training larger models, where it results in an out-of-memory error.
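A condensed sketch of the setup (the full example is not reproduced here; `make_model`, `loss_fn`, and `loader` are placeholders):

```python
from gradient_descent_the_ultimate_optimizer import gdtuo

for run in range(10):                    # several training runs, one after another
    model = make_model()                 # a fresh model per run (placeholder)
    optim = gdtuo.Adam(optimizer=gdtuo.SGD(1e-5))
    mw = gdtuo.ModuleWrapper(model, optimizer=optim)
    mw.initialize()
    for features, labels in loader:
        mw.begin()
        loss = loss_fn(mw.forward(features), labels)
        mw.zero_grad()
        loss.backward(create_graph=True)
        mw.step()
    # without clearing param.grad here, the graphs from this run stay
    # reachable and memory grows by roughly 10 MiB per run
```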
The memory leak is discussed in pytorch/pytorch#82528. In your code there is already a `begin` method in the `Optimizable` class which sets `param.grad` to `None`. However, it seems `param.grad = None` also needs to be set at the end of training a model to avoid leaking memory across several training runs. Hence, I suggest adding an `end` method which is called at the end of each training run.
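A sketch of the proposed `end` method, mirroring `begin` (the body below is an assumption based on the description above, not the exact committed diff; it assumes `Optimizable` tracks its parameters in a list such as `all_params_with_gradients`):

```python
class Optimizable:
    ...
    def end(self):
        # call once after a training run: dropping the .grad tensors releases
        # the graphs created by loss.backward(create_graph=True)
        for param in self.all_params_with_gradients:
            param.grad = None
        self.all_params_with_gradients.clear()
```

With this in place, a user would call `mw.end()` once after the training loop for each model.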