How to reduce the memory used when running an MD simulation with a DP potential #4567

### Summary

When running on four V100 GPUs in parallel, the memory used by the program keeps increasing. As a result, after minimization the program only runs for about 3000 steps of the relaxation phase before terminating with a CUDA out-of-memory error.
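To put numbers on the memory growth, per-GPU usage can be sampled while LAMMPS is running. The following is a minimal sketch, not part of DeePMD-kit or LAMMPS: the helper names `parse_memory_used` and `log_memory`, the sampling interval, and the sample values are all hypothetical. It assumes `nvidia-smi` is on the `PATH`.

```python
import subprocess
import time

# nvidia-smi query that prints one "index, used_MiB" line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_memory_used(csv_text):
    """Map GPU index -> used memory in MiB from the CSV text above."""
    usage = {}
    for line in csv_text.strip().splitlines():
        index, used = (field.strip() for field in line.split(","))
        usage[int(index)] = int(used)
    return usage


def log_memory(interval_s=10.0, samples=6):
    """Print a timestamped usage snapshot every `interval_s` seconds.

    Hypothetical helper: run it in a second terminal while the
    simulation is running. It is not called automatically here.
    """
    for _ in range(samples):
        out = subprocess.run(
            QUERY, capture_output=True, text=True, check=True
        ).stdout
        print(time.strftime("%H:%M:%S"), parse_memory_used(out))
        time.sleep(interval_s)


# Made-up sample in the same format, only to show the parsing step.
sample = "0, 28790\n1, 28112\n2, 27950\n3, 29880\n"
print(parse_memory_used(sample))
```

Comparing snapshots taken a few hundred MD steps apart would show whether usage climbs steadily on every rank or jumps on a single GPU.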

### DeePMD-kit Version

DeePMD-kit v3.0.1

### Backend and its version

PyTorch

### Python Version, CUDA Version, GCC Version, LAMMPS Version, etc

Python 3.12

### Details

```
ERROR on proc 3: DeePMD-kit C API Error: DeePMD-kit Error: DeePMD-kit PyTorch backend error: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
  File "code/__torch__/deepmd/pt/model/model/transform_output.py", line 161, in forward_lower
    vvi = split_vv1[_45]
    svvi = split_svv1[_45]
    _46 = _37(vvi, svvi, coord_ext, do_virial, do_atomic_virial, create_graph, )
          ~~~ <--- HERE
    ffi, aviri, = _46
    ffi0 = torch.unsqueeze(ffi, -2)
  File "code/__torch__/deepmd/pt/model/model/transform_output.py", line 196, in task_deriv_one
  faked_grad = torch.ones_like(energy)
  lst = annotate(List[Optional[Tensor]], [faked_grad])
  _53 = torch.autograd.grad([energy], [extended_coord], lst, True, create_graph)
        ~~~~~~~~~~~~~~~~~~~ <--- HERE
  extended_force = _53[0]
  if torch.__isnot__(extended_force, None):

Traceback of TorchScript, original code (most recent call last):
  File "/root/deepmd-kit/lib/python3.12/site-packages/deepmd/pt/model/model/transform_output.py", line 128, in forward_lower
    for vvi, svvi in zip(split_vv1, split_svv1):
        # nf x nloc x 3, nf x nloc x 9
        ffi, aviri = task_deriv_one(
                     ~~~~~~~~~~~~~~ <--- HERE
            vvi,
            svvi,
  File "/root/deepmd-kit/lib/python3.12/site-packages/deepmd/pt/model/model/transform_output.py", line 78, in task_deriv_one
    faked_grad = torch.ones_like(energy)
    lst = torch.jit.annotate(list[Optional[torch.Tensor]], [faked_grad])
    extended_force = torch.autograd.grad(
                     ~~~~~~~~~~~~~~~~~~~ <--- HERE
        [energy],
        [extended_coord],
RuntimeError: CUDA out of memory. Tried to allocate 4.36 GiB. GPU 3 has a total capacity of 31.74 GiB of which 2.55 GiB is free. Including non-PyTorch memory, this process has 29.18 GiB memory in use. Of the allocated memory 21.53 GiB is allocated by PyTorch, and 6.53 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables) (/home/conda/feedstock_root/build_artifacts/deepmd-kit_1735001361510/work/source/lmp/pair_deepmd.cpp:220)
Last command: run               10000
```
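The figures in the out-of-memory line can be checked against each other. The sketch below only does arithmetic on the numbers quoted in the error. The function names and the "fits in cached memory" test are illustrative assumptions, a rough heuristic rather than PyTorch's own diagnosis.

```python
import re

# The numeric part of the RuntimeError line from the log, copied verbatim.
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 4.36 GiB. GPU 3 has a total capacity "
    "of 31.74 GiB of which 2.55 GiB is free. Including non-PyTorch memory, this "
    "process has 29.18 GiB memory in use. Of the allocated memory 21.53 GiB is "
    "allocated by PyTorch, and 6.53 GiB is reserved by PyTorch but unallocated."
)

# One regular expression per quantity reported in the message.
PATTERNS = {
    "requested": r"Tried to allocate ([\d.]+) GiB",
    "capacity": r"total capacity of ([\d.]+) GiB",
    "free": r"of which ([\d.]+) GiB is free",
    "in_use": r"process has ([\d.]+) GiB memory in use",
    "allocated": r"([\d.]+) GiB is allocated by PyTorch",
    "reserved_unallocated": r"([\d.]+) GiB is reserved by PyTorch but unallocated",
}


def parse_oom(message):
    """Extract the GiB figures from a PyTorch CUDA OOM message."""
    stats = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"field {key!r} not found in message")
        stats[key] = float(match.group(1))
    return stats


def non_pytorch_gib(stats):
    """Memory held by the process outside PyTorch's allocator."""
    return stats["in_use"] - stats["allocated"] - stats["reserved_unallocated"]


def fragmentation_suspected(stats):
    """Heuristic (an assumption, not a PyTorch rule): the request does not fit
    in free device memory, yet it is no larger than the memory PyTorch has
    reserved but is not using."""
    return stats["free"] < stats["requested"] <= stats["reserved_unallocated"]


stats = parse_oom(OOM_MESSAGE)
print(f"non-PyTorch memory: {non_pytorch_gib(stats):.2f} GiB")
print("fragmentation suspected:", fragmentation_suspected(stats))
```

For this log, the 4.36 GiB request exceeds the 2.55 GiB that is free but is smaller than the 6.53 GiB reserved and unused, which is the case the message addresses with `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True`. That variable would need to be set in the environment of every MPI rank before LAMMPS starts. Whether it also stops the steady growth in memory use is a separate question that this check does not answer.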

