[Q&A] PyTorch backend compression of converted TF se_e2_a models causes unstable LAMMPS MD
#5438
Description:

Steps to Reproduce:

Approach 1: Direct Conversion and Compression

    dp convert-backend model5.pb model5.pth
    dp --pt compress -i model5.pth -o model5_compressed.pth

Approach 2: Conversion + 0-Step Initialization + Compression

    dp --pt train convert-pt.json --init-frz-model model5.pb
    dp --pt freeze -o model5.1.pth
    dp --pt compress -i model5.1.pth -o model5.1_compressed.pth

LAMMPS Failure Output (Compressed Models):

Workaround Limitation:
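For anyone re-running the two approaches side by side, the command sequences can be scripted so that each step stops on the first failure. This is only a minimal sketch: the `dp` commands and file names are copied from the steps to reproduce, while `APPROACHES` and `run_approach` are hypothetical helper names introduced here, not part of DeePMD-kit.

```python
import shlex
import subprocess

# Commands copied verbatim from the report; file names are the reporter's.
APPROACHES = {
    "approach1": [
        "dp convert-backend model5.pb model5.pth",
        "dp --pt compress -i model5.pth -o model5_compressed.pth",
    ],
    "approach2": [
        "dp --pt train convert-pt.json --init-frz-model model5.pb",
        "dp --pt freeze -o model5.1.pth",
        "dp --pt compress -i model5.1.pth -o model5.1_compressed.pth",
    ],
}


def run_approach(name, dry_run=True):
    """Return the argv lists of one approach; execute them in order unless dry_run."""
    argvs = [shlex.split(cmd) for cmd in APPROACHES[name]]
    if not dry_run:
        for argv in argvs:
            # check=True raises CalledProcessError as soon as a step fails,
            # so a later step never runs on a missing or partial model file.
            subprocess.run(argv, check=True)
    return argvs
```

Calling `run_approach("approach2", dry_run=False)` in the directory holding `model5.pb` and `convert-pt.json` would execute the three steps in order; the default `dry_run=True` only returns the parsed commands.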
Replies: 2 comments 2 replies
Hi @Dm0216, thanks for the very detailed report. I tried to reproduce on a fresh fp64

Suggested quick workaround first: please verify that your model is

If you'd like to understand the root cause regardless, the most likely structural cause for the unstable MD is runtime
The 7.6 GB file from

Quick diagnostic to confirm and localize: please run before LAMMPS, in pure Python, against the exact starting structure LAMMPS reads.

Important: if your LAMMPS run uses periodic boundary conditions (the default for

import numpy as np
from ase import Atoms
from deepmd.infer import DeepPot
# coord (natoms, 3), atype (natoms,), box (3, 3) from your LAMMPS data file
# Set pbc=True/False matching your LAMMPS `boundary` setting.
USE_PBC = True
dp_un = DeepPot("model5.1.pth")
dp_co = DeepPot("model5.1_compressed.pth")
box_arg = box.reshape(1, 9) if USE_PBC else None
e_un, f_un, _ = dp_un.eval(coord.reshape(1, -1), box_arg, atype)
e_co, f_co, _ = dp_co.eval(coord.reshape(1, -1), box_arg, atype)
print("uncomp |F| max:", float(np.max(np.abs(f_un))))
print("comp |F| max:", float(np.max(np.abs(f_co))))
print("max force diff:", float(np.max(np.abs(f_un - f_co))))
# PBC-aware minimum-image pair distance (ase handles the cell properly)
atoms = Atoms(
    positions=coord.reshape(-1, 3),
    cell=box.reshape(3, 3) if USE_PBC else None,
    pbc=USE_PBC,
)
d = atoms.get_all_distances(mic=USE_PBC)
np.fill_diagonal(d, np.inf) # mask self-distances
print("min pair distance (mic):", float(d.min()))
# And the min_nbor_dist baked into the .pth itself
import torch
m = torch.jit.load("model5.1.pth", map_location="cpu")
print("model min_nbor_dist: ", float(m.min_nbor_dist))

Three outcomes tell us where to look:
For reference, the internal indexing/bound calculations I traced in pt-side compression appear correct (table net-name ↔
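If `ase` is not installed, the minimum pair distance from the diagnostic above can be approximated with NumPy alone. The sketch below is an assumption-laden stand-in, not the code from the reply: `min_pair_distance` is a hypothetical helper name, `box` is assumed to hold the cell vectors as rows, and only the 27 nearest periodic images are searched, which is adequate for cells that are not strongly skewed. Memory grows as the square of the atom count, so it suits small test structures.

```python
import itertools

import numpy as np


def min_pair_distance(coord, box=None):
    """Smallest interatomic distance; with `box`, periodic images are included."""
    coord = np.asarray(coord, dtype=float).reshape(-1, 3)
    n = coord.shape[0]
    if box is None:
        shifts = np.zeros((1, 3))  # open boundaries: no image shifts
    else:
        cell = np.asarray(box, dtype=float).reshape(3, 3)  # rows = cell vectors
        idx = np.array(list(itertools.product((-1, 0, 1), repeat=3)), dtype=float)
        shifts = idx @ cell  # translation vectors of the 27 nearest images
    # delta[i, j, s] = r_j + shift_s - r_i
    delta = coord[None, :, None, :] + shifts[None, None, :, :] - coord[:, None, None, :]
    dist = np.linalg.norm(delta, axis=-1)
    # Mask only the true self-distance (same atom, zero shift); an atom's own
    # periodic image in a neighboring cell is kept as a legitimate neighbor.
    zero_shift = np.all(shifts == 0.0, axis=1)
    mask = np.eye(n, dtype=bool)[:, :, None] & zero_shift[None, None, :]
    dist[mask] = np.inf
    return float(dist.min())
```

With `coord` and `box` loaded as in the diagnostic, `min_pair_distance(coord, box if USE_PBC else None)` gives a value to set beside the printed `min_nbor_dist` of the model.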
Please see #5524 for the fix.