Hello,
The training process starts without any problem, but after some time it freezes like this:

The console shows no further output.
When I check the GPUs at that point, GPU-Util (not memory) is at 100% while the process is frozen, which I think is a clue to the problem:

I changed parameters such as batch_size, the number of workers, etc., but it doesn't help.
Can anyone help?
My environment is on miniconda3 with CUDA 11.8, and the versions are:
PyTorch 2.0.0
PyTorch Lightning 2.0.2
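
One way to find out where the process is stuck could be Python's built-in `faulthandler` module, which can write the Python stack of every thread to a file on a timer. The following is only a minimal sketch and not part of the actual training script: the helper names `arm_hang_watchdog` and `disarm_hang_watchdog`, the 600-second interval, and the log file name are all hypothetical choices for illustration.

```python
import faulthandler


def arm_hang_watchdog(timeout_s=600.0, log_path="hang_traceback.log"):
    """Dump the Python stack of all threads to log_path every timeout_s seconds.

    Hypothetical helper: the name, interval, and path are assumptions.
    The file must stay open, because faulthandler writes to its file descriptor.
    """
    log_file = open(log_path, "w")
    # repeat=True keeps dumping on every interval until cancelled.
    faulthandler.dump_traceback_later(timeout_s, repeat=True, file=log_file)
    return log_file


def disarm_hang_watchdog(log_file):
    """Stop the periodic dumps and close the log file."""
    faulthandler.cancel_dump_traceback_later()
    log_file.close()
```

The idea would be to call `arm_hang_watchdog()` before `trainer.fit(...)` and `disarm_hang_watchdog(...)` afterwards. The dumps are written on every interval whether or not training is stuck, so the entries written during the freeze are the relevant ones. They show only Python-level frames, not what the GPU kernel itself is doing.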