[Q&A] PyTorch backend compression of converted TF `se_e2_a` models causes unstable LAMMPS MD #5438

Dm0216 · 2026-05-08T06:40:15Z

Question

Title: PyTorch backend compression of converted TF se_e2_a models causes unstable LAMMPS MD

System Information:

OS: Kubuntu / Linux
Hardware: NVIDIA RTX A4000
DeePMD-kit Version: v3.1.4.dev69+g0a481dede (PyTorch backend)
PyTorch Version: v2.10.0+cu130
LAMMPS Version: 22 Jul 2025 (Update 2)
Descriptor Type: se_e2_a (Model: 5 elements Pb I C H N)

Description:
Compressing a legacy TensorFlow se_e2_a potential that has been converted to the PyTorch backend consistently produces a model that is unstable in LAMMPS: energy minimization stalls with large residual forces, and MD then fails immediately with a "Lost atoms" error. The uncompressed .pth models run flawlessly and maintain energy/force parity with the original TF model.

Steps to Reproduce:

Approach 1: Direct Conversion and Compression

dp convert-backend model5.pb model5.pth
dp --pt compress -i model5.pth -o model5_compressed.pth

Result: MD test runs are completely stable for the uncompressed model5.pth. MD test runs immediately explode for model5_compressed.pth.

Approach 2: Conversion + 0-Step Initialization + Compression
To ensure the compression tables had the correct spatial boundaries (d_low), a 0-step initialization was performed using the full dataset.

dp --pt train convert-pt.json --init-frz-model model5.pb
dp --pt freeze -o model5.1.pth
dp --pt compress -i model5.1.pth -o model5.1_compressed.pth

Result: MD test runs are completely stable for the uncompressed model5.1.pth. MD test runs immediately explode for model5.1_compressed.pth.

LAMMPS Failure Output (Compressed Models):

Minimization stats:
  Stopping criterion = linesearch alpha is zero
  Energy initial, next-to-last, final = 
     -1362.52296661576  -1362.53777500403  -1362.53777500403
  Force max component initial, final = 75.44272 75.401807
...
ERROR: Lost atoms: original 348 current 347 (src/thermo.cpp:526)

Workaround Limitation:
Attempting to force mathematically smooth splines to bypass the CG minimizer failure by increasing the grid resolution (-s 0.001 -e 10) generates a 7.6 GB .pth file. Loading this file into LAMMPS immediately triggers a C++ libtorch deserialization crash:
ERROR on proc 0: DeePMD-kit C API Error: PK (/home/dm/deepmd-kit-v3.1.4/source/lmp/pair_deepmd.cpp:572)

Answered by wanghan-iapcm

Jun 13, 2026

please see #5524 for the fix

wanghan-iapcm · 2026-05-17T07:52:20Z

Maintainer

Hi @Dm0216, thanks for the very detailed report. I tried to reproduce on a fresh fp64 se_e2_a model (TF type_one_side=False, 2 types, 100-step training on the example water dataset) — dp convert-backend → dp --pt compress works cleanly there, with compressed vs uncompressed forces matching to ~3e-14 across configurations whose minimum pair distance is 0.9 Å. So the pt compression code is not categorically broken; the failure looks specific to either your 5-type weight set or to a runtime configuration LAMMPS exercises that pure-Python eval does not.

Suggested quick workaround first: please verify that your model is type_one_side=True, and if it isn't, re-train (or fine-tune) with type_one_side=True. Convert-backend and compression both work much more reliably on the type_one_side=True path: there is one embedding network per neighbor type instead of one per (center, neighbor) pair, so the tabulation builds fewer, larger-coverage tables that are less sensitive to per-pair boundary effects. Many TF legacy models default to type_one_side=False; flipping it for the new training run usually has negligible accuracy impact for energy/force prediction. This may be the cleanest fix in your situation.

If you'd like to understand the root cause regardless, the most likely structural cause for the unstable MD is runtime ss exceeding the table's trained upper bound:

During compression, the table upper bound is computed from the model's stored min_nbor_dist (set at training time):
s_upper = ((1/min_nbor_dist) * sw - davg_s) / dstd_s
At runtime, ss = (1/r * sw - davg_s) / dstd_s. If LAMMPS hands the model an atom pair with r < min_nbor_dist (initial config, bad starting structure, or a thermal spike at the very first step), then ss > s_upper and the lookup falls into the extrapolation region.
The polynomial spline in the extrapolation region is fit to embedding-network output at xx values the network never saw during training, so both the spline values and their derivatives are essentially unconstrained — forces of arbitrary magnitude are possible. The 75 eV/Å your LAMMPS log shows is consistent with this failure mode but doesn't by itself prove it; the Python diagnostic below distinguishes.
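
The bound check described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not DeePMD-kit code: the function names are hypothetical, the switching function is assumed to be the smooth fifth-order polynomial from the se_e2_a descriptor definition, and rcut_smth, rcut, davg_s and dstd_s are placeholder values rather than statistics from any real model.

```python
def switch(r, rcut_smth, rcut):
    """Smooth switching function sw(r): 1 below rcut_smth, 0 beyond rcut (assumed form)."""
    if r < rcut_smth:
        return 1.0
    if r >= rcut:
        return 0.0
    u = (r - rcut_smth) / (rcut - rcut_smth)
    return u**3 * (-6.0 * u**2 + 15.0 * u - 10.0) + 1.0


def scaled_s(r, rcut_smth, rcut, davg_s, dstd_s):
    """Normalized radial component: (sw(r) / r - davg_s) / dstd_s."""
    return (switch(r, rcut_smth, rcut) / r - davg_s) / dstd_s


def exceeds_table(r, min_nbor_dist, rcut_smth=0.5, rcut=6.0, davg_s=0.1, dstd_s=0.2):
    """True if a pair at distance r lands above the table's upper bound s_upper."""
    # Hypothetical defaults: replace them with the values stored in your model.
    s_upper = scaled_s(min_nbor_dist, rcut_smth, rcut, davg_s, dstd_s)
    return scaled_s(r, rcut_smth, rcut, davg_s, dstd_s) > s_upper


# Distances in Å; both calls use the placeholder statistics above.
print(exceeds_table(r=0.80, min_nbor_dist=0.90))  # pair closer than min_nbor_dist
print(exceeds_table(r=1.00, min_nbor_dist=0.90))  # pair inside the tabulated range
```

Because sw(r)/r decreases monotonically with r (for positive dstd_s), the check reduces to r < min_nbor_dist; the sketch only makes explicit how the distance maps onto the table coordinate.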

The 7.6 GB file from -s 0.001 -e 10 failing to load is an unrelated LAMMPS-side libtorch deserialization issue (file too large for the legacy load path); the bound issue is what's causing the unstable MD with the default-stride compressed model.

Quick diagnostic to confirm and localize — please run before LAMMPS, in pure Python, against the exact starting structure LAMMPS reads. Important: if your LAMMPS run uses periodic boundary conditions (the default for boundary p p p), pass the cell to DeepPot.eval and compute the minimum distance with PBC accounted for; otherwise omit the box and use plain distances. Could you also confirm whether the run is PBC or non-periodic?

import numpy as np
from ase import Atoms
from deepmd.infer import DeepPot

# coord (natoms, 3), atype (natoms,), box (3, 3) from your LAMMPS data file
# Set pbc=True/False matching your LAMMPS `boundary` setting.
USE_PBC = True

dp_un = DeepPot("model5.1.pth")
dp_co = DeepPot("model5.1_compressed.pth")

box_arg = box.reshape(1, 9) if USE_PBC else None
e_un, f_un, _ = dp_un.eval(coord.reshape(1, -1), box_arg, atype)
e_co, f_co, _ = dp_co.eval(coord.reshape(1, -1), box_arg, atype)

print("uncomp |F| max:", float(np.max(np.abs(f_un))))
print("comp   |F| max:", float(np.max(np.abs(f_co))))
print("max force diff:", float(np.max(np.abs(f_un - f_co))))

# PBC-aware minimum-image pair distance (ase handles the cell properly)
atoms = Atoms(
    positions=coord.reshape(-1, 3),
    cell=box.reshape(3, 3) if USE_PBC else None,
    pbc=USE_PBC,
)
d = atoms.get_all_distances(mic=USE_PBC)
np.fill_diagonal(d, np.inf)  # mask self-distances
print("min pair distance (mic):", float(d.min()))

# And the min_nbor_dist baked into the .pth itself
import torch
m = torch.jit.load("model5.1.pth", map_location="cpu")
print("model min_nbor_dist:    ", float(m.min_nbor_dist))

Three outcomes tell us where to look:

comp |F| max matches the 75 eV/Å number and is much larger than uncomp |F| max → the bug is reproducible from Python eval on this single frame. If min pair distance (mic) is below model min_nbor_dist, the over-extrapolation hypothesis above is the cause. Workarounds: pre-relax with the uncompressed model, or freeze the model with a smaller min_nbor_dist (use dp neighbor-stat on a wider training set or set it manually before compress).
Diffs are O(1e-3) or larger but forces are reasonable → genuine compression accuracy regression for your specific weights. In that case it would be very useful if you could attach model5.1.pth and the read_data data file so we can reproduce locally; the compressed model is fully self-contained and reproduces deterministically.
Diffs are O(1e-10) or smaller → the bug is LAMMPS-side (pair_deepmd.cpp interpretation of compressed PT models), not in the compression itself. We'd then investigate the C++ dispatch path.
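
The three-way triage above could be written down as a small helper. This is only a sketch: the function name, the return strings, and the numeric cut-offs (a factor of 5 for "much larger", 1e-3 and 1e-10 for the force differences) are illustrative assumptions, and anything between the two difference thresholds is reported as inconclusive rather than forced into a category.

```python
def triage(f_un_max, f_co_max, max_diff, min_dist, min_nbor_dist,
           blowup_ratio=5.0, large_diff=1e-3, tiny_diff=1e-10):
    """Map the diagnostic numbers onto the outcomes listed above (thresholds are assumptions)."""
    if f_co_max > blowup_ratio * f_un_max:
        # Compressed forces are far larger than uncompressed ones in Python eval.
        if min_dist < min_nbor_dist:
            return "extrapolation beyond table bound"
        return "compressed forces blow up in Python eval"
    if max_diff >= large_diff:
        return "compression accuracy regression"
    if max_diff <= tiny_diff:
        return "LAMMPS-side issue"
    return "inconclusive"


# Made-up numbers (eV/Å for forces, Å for distances), one per branch.
print(triage(5.0, 75.4, 70.0, min_dist=0.50, min_nbor_dist=0.63))
print(triage(9.0, 9.0, 2e-3, min_dist=0.89, min_nbor_dist=0.63))
print(triage(9.0, 9.0, 1e-13, min_dist=0.89, min_nbor_dist=0.63))
```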

For reference, the internal indexing/bound calculations I traced in pt-side compression appear correct (table net-name ↔ embedding_net_nodes mapping, runtime (center, neighbor) ↔ embedding_idx mapping, and _get_env_mat_range extracting the s-component bound after the min/max collapse). The cosmetic over-extension of upper to the sx/sy/sz axis bounds (visible in the compression log as a 13.13 upper instead of the s-component's ~7.7) does not actually corrupt runtime values, because the table polynomials at queried indices are still computed from in-distribution vv/dd/d2.

2 replies

Dm0216 May 18, 2026
Author

Hi, thank you so much for the response. My model was set to "type_one_side": false, and I also think this looks specific to my 5-type weight set. Either way, I'm retraining the model in PyTorch now.

Dm0216 Jun 4, 2026
Author

Hi @wanghan-iapcm,

Thank you for the detailed feedback. I ran the diagnostics on our 5-type system (156 atoms from a reference cif structure) using both the original float32 models (model5 and model5.1) and a newly trained float64 (double-precision) model (model7).

The results reveal two distinct bugs in the custom tabulation operator code pathways that cause compressed models to fail in LAMMPS.
I also asked Gemini for help.

1. Summary of LAMMPS MD Runs (Step 0 Potential Energy)

| Run / Model | Precision | Model Path | Minimization | MD Step 0 PE (eV) | Trajectory / Outcome |
| --- | --- | --- | --- | --- | --- |
| model5 (uncomp) | float32 | `model5.pth` | Converged | -781.749 | Stable |
| model5 (comp) | float32 | `model5_comp.pth` | Stalled (alpha=0) | -725.05373 | Crashed (lost atoms) |
| model5.1 (uncomp) | float32 | `model5.1.pth` | Converged | -781.74955 | Stable |
| model5.1 (comp) | float32 | `model5.1_comp.pth` | Stalled (alpha=0) | -724.35919 | Crashed (lost atoms) |
| model7 (uncomp) | float64 | `model7.pth` | Converged | -781.88847 | Stable |
| model7 (comp) | float64 | `model7_comp.pth` | Stalled (alpha=0) | -619.56536 | Crashed (lost atoms) |

2. Pure Python Diagnostic Results (`DeepPot.eval`)

In pure Python (where inputs match model precision), the uncompressed and compressed models match almost perfectly.

model5 (float32 model on 156 atoms starting structure):

Min pair distance (MIC) is 0.889534 Å, model min_nbor_dist is 0.628418 Å (extrapolation ruled out).
model5 (uncomp): Energy = -738.800444 eV, Max Force Component = 9.121986 eV/A
model5 (comp): Energy = -738.801108 eV, Max Force Component = 9.122191 eV/A
Difference (uncomp - comp): Energy = 0.000664 eV, Max Force = 0.000432 eV/A

model5.1 (float32 model on 156 atoms starting structure):

Min pair distance (MIC) is 0.889534 Å, model min_nbor_dist is 0.628418 Å (extrapolation ruled out).
model5.1 (uncomp): Energy = -738.800444 eV, Max Force Component = 9.121986 eV/A
model5.1 (comp): Energy = -738.801108 eV, Max Force Component = 9.122191 eV/A
Difference (uncomp - comp): Energy = 0.000664 eV, Max Force = 0.000433 eV/A

model7 (float64 model on 156 atoms starting structure):

Min pair distance (MIC) is 0.889534 Å, model min_nbor_dist is 0.628418 Å (extrapolation ruled out).
model7 (uncomp): Energy = -739.422031 eV, Max Force Component = 8.982652 eV/A
model7 (comp): Energy = -739.422030 eV, Max Force Component = 8.982652 eV/A
Difference (uncomp - comp): Energy = -0.000000 eV, Max Force = 0.000001 eV/A

3. Root Cause Analysis (by Gemini 3.5 Flash)

Bug 1: Type Mismatch in Custom PT Operators (For float32 Models)

When running a float32 model inside LAMMPS, LAMMPS constructs inputs in double precision (float64). The PyTorch graph evaluates environment matrices em_tensor and em_x_tensor as double.
However, inside tabulate_multi_device.cc, the operators select their template based on table_tensor.dtype() (which is float):

bool type_flag = (table_tensor.dtype() == torch::kDouble) ? true : false;

It calls forward_t<float>, which extracts data pointers using .data_ptr<float>() on the double-precision inputs:

const FPTYPE* em = em_tensor.view({-1}).data_ptr<FPTYPE>(); // FPTYPE is float, tensor is double

This causes a raw pointer memory reinterpretation (reading 8-byte doubles as 4-byte floats), corrupting the input descriptors.
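
To illustrate what such a byte-level reinterpretation would do to the data, here is a minimal NumPy sketch. It is an analogy, not the operator code: the array contents are made up, and it has not been verified here whether libtorch's data_ptr<float>() actually allows this on a double tensor or raises a dtype error instead.

```python
import numpy as np

# Two float64 values standing in for environment-matrix entries (made-up data).
em = np.array([1.0, 2.0], dtype="<f8")

# Reinterpret the same bytes as little-endian float32, as a mistyped pointer would.
as_float = em.view("<f4")
print(as_float.tolist())

# A float kernel expecting em.size elements would read only the first em.size of these.
print(as_float[: em.size].tolist())

# Converting the values (instead of reinterpreting the bytes) keeps them intact.
print(em.astype("<f4").tolist())
```

Each 8-byte double splits into two unrelated 4-byte floats, so the kernel would see neither the right values nor the right number of elements.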

Bug 2: Warp Shuffle Undefined Behavior (For float64 Models)

Even when using the double-precision model7 (where weights/tables are float64, and dtypes match), the compressed model still fails in LAMMPS due to a second bug in tabulate.cu.
In tabulate_fusion_se_a_fifth_order_polynomial, the sentinel value ago is broadcast using the warp shuffle GpuShuffleSync:

FPTYPE ago = GpuShuffleSync(0xffffffff, em_x[block_idx * nnei + nnei - 1], 0);

The block size of this kernel is last_layer_size (120 threads).
Since 120 is not a multiple of 32, warp 3 is not fully active (only 24 active lanes).
Calling __shfl_sync with mask 0xffffffff when there are inactive lanes in the warp results in undefined behavior.
In double precision, this undefined behavior causes ago to be corrupted (returns 0 or garbage), breaking the early loop termination condition (xx == ago && em[...] == 0. && is_sorted) on real atoms, leading to premature termination and truncated neighbor list evaluations on the GPU.
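
The lane arithmetic behind this claim can be checked with a short Python sketch. It only reproduces the counting (how many lanes of each warp a 1-D block of a given size populates); the helper name is hypothetical, and whether a full 0xffffffff mask is really undefined behavior in this situation is the claim of the analysis above, not something this snippet demonstrates.

```python
WARP_SIZE = 32


def warp_layout(block_size, warp_size=WARP_SIZE):
    """Return (active_lanes, lane_mask) for each warp of a 1-D thread block."""
    layout = []
    for start in range(0, block_size, warp_size):
        active = min(warp_size, block_size - start)
        layout.append((active, (1 << active) - 1))
    return layout


# Block size 120, i.e. last_layer_size in the report above.
for warp_id, (active, mask) in enumerate(warp_layout(120)):
    print(f"warp {warp_id}: {active} lanes, mask 0x{mask:08x}")
```

For 120 threads this gives three full warps and a fourth with 24 populated lanes, whose own lane mask would be 0x00ffffff rather than 0xffffffff.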

4. Proposed Fixes

For Bug 1 (Type Mismatch):
In tabulate_multi_device.cc, proactively cast input tensors to match table_tensor.dtype() before forwarding to the template implementation:
```
torch::Tensor em_x_tensor_cast = em_x_tensor.to(table_tensor.dtype());
torch::Tensor em_tensor_cast = em_tensor.to(table_tensor.dtype());
```
For Bug 2 (Warp Shuffle UB):
In tabulate.cu, remove the unnecessary and unsafe warp shuffles for values that are block-constant (like ago or warp_idx). Since em_x[block_idx * nnei + nnei - 1] is in global memory and identical for all threads in a block, it can be read directly:
- Replace:
```
FPTYPE ago = GpuShuffleSync(0xffffffff, em_x[block_idx * nnei + nnei - 1], 0);
```
  with:
```
FPTYPE ago = em_x[block_idx * nnei + nnei - 1];
```
- Replace warp shuffles for local thread indices:
```
int warp_idx = GpuShuffleSync(0xffffffff, threadIdx.x / WARP_SIZE, 0);
```
  with:
```
int warp_idx = threadIdx.x / WARP_SIZE;
```

wanghan-iapcm · 2026-06-13T01:18:53Z

Maintainer

please see #5524 for the fix

0 replies
