Summary
DPA-2.4-7M is registered in the v3.1.3 pretrained model registry (deepmd/pretrained/registry.py, added via PR #5307), but deepmd-kit 3.1.3 cannot actually load this checkpoint. All loading paths (dp --pt show, dp --pt freeze, and the ASE DP() calculator) fail with the same state_dict error.
Error
RuntimeError: Error(s) in loading state_dict for ModelWrapper:
Missing key(s) in state_dict:
"model.Default.atomic_model.descriptor.repinit.type_embd_data",
"model.Default.atomic_model.descriptor.repinit_three_body.type_embd_data",
"model.Default.atomic_model.descriptor.repinit_three_body.compress_info.0",
"model.Default.atomic_model.descriptor.repinit_three_body.compress_data.0"
Reproduction
Environment: registry.dp.tech/dptech/deepmd-kit:3.1.3
deepmd-kit: 3.1.3
PyTorch: 2.10.0
GPU: NVIDIA RTX 4090 (issue is not GPU-related)
Model file: dpa-2.4-7M.pt (80.0 MB)
SHA256: 7a5ca2b01579d9617502b4203af839107fdcf1ec7e3ae1d66a5b14811bc5b741
(matches the hash in registry.py — confirmed same file)
Download source tested: https://bohrium.oss-cn-zhangjiakou.aliyuncs.com/13756/27666/store/upload/cd12300a-d3e6-4de9-9783-dd9899376cae/dpa-2.4-7M.pt
Steps to reproduce
# Each of these three paths fails:
# 1. dp show
dp --pt show dpa-2.4-7M.pt model-branch
# → RuntimeError: Missing key(s) in state_dict ...
# 2. dp freeze
dp --pt freeze -c dpa-2.4-7M.pt -o frozen.pth --head Omat24
# → same RuntimeError
# 3. ASE calculator
python3 -c "from deepmd.calculator import DP; calc = DP('dpa-2.4-7M.pt', head='Omat24')"
# → same RuntimeError
Control: DPA-3.1-3M works fine
In the same environment, DPA-3.1-3M loads, shows branches, freezes, and runs inference without any issues:
dp --pt show DPA-3.1-3M.pt model-branch # ✅ lists 31 branches
dp --pt freeze -c DPA-3.1-3M.pt -o frozen_dpa3.pth --head Omat24 # ✅ 18.0 MB
Analysis
The error occurs at deepmd/pt/infer/deep_eval.py:173:
self.dp.load_state_dict(state_dict)
The current ModelWrapper definition expects 4 keys related to repinit type embedding and compression data that do not exist in the DPA-2.4-7M checkpoint. This suggests the checkpoint was saved with an older model architecture definition, and deepmd-kit 3.1.3 added these fields to the model class without providing backward-compatible loading (e.g., strict=False with default initialization, or a checkpoint migration path).
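The mismatch can be confirmed independently of deepmd-kit by diffing the key names the model expects against those the checkpoint contains. The following is a minimal sketch, not deepmd-kit code: `diff_state_dict_keys` is a hypothetical helper, and the `descriptor.repinit.mean` key is an assumed placeholder for any key present on both sides. With a real checkpoint, the key lists would come from the model's `state_dict()` and from the loaded file; the file's internal layout is not assumed here.

```python
def diff_state_dict_keys(expected_keys, checkpoint_keys):
    """Return (missing, unexpected) key names, each sorted.

    missing    -> expected by the model class but absent from the checkpoint
    unexpected -> present in the checkpoint but unknown to the model class
    """
    expected, found = set(expected_keys), set(checkpoint_keys)
    return sorted(expected - found), sorted(found - expected)


PREFIX = "model.Default.atomic_model.descriptor."

# Assumed placeholder for a key that both sides share (hypothetical name).
shared = [PREFIX + "repinit.mean"]

# The four keys reported as missing in the RuntimeError above.
reported_missing = [
    PREFIX + "repinit.type_embd_data",
    PREFIX + "repinit_three_body.type_embd_data",
    PREFIX + "repinit_three_body.compress_info.0",
    PREFIX + "repinit_three_body.compress_data.0",
]

missing, unexpected = diff_state_dict_keys(shared + reported_missing, shared)
for key in missing:
    print("missing:", key)
```

An empty `unexpected` list alongside a non-empty `missing` list would be consistent with fields having been added to the model class after the checkpoint was saved, rather than keys having been renamed.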
Suggested Fix
One or more of:
- Use strict=False in load_state_dict and initialize the missing keys with sensible defaults (zeros / empty tensors)
- Add a checkpoint migration utility (dp --pt migrate) that patches old checkpoints
- Re-export DPA-2.4-7M with the new model class so the checkpoint includes the expected keys
- Add version metadata to checkpoints and auto-migrate on load
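The first option can be illustrated with plain PyTorch. This is a minimal sketch on toy modules, not the deepmd-kit implementation: `OldBlock` and `NewBlock` are hypothetical stand-ins, and the buffer name `type_embd_data` is borrowed from the error message only for illustration. It assumes the new field can safely fall back to its constructor default, which has not been verified for the real descriptor.

```python
import torch
from torch import nn


class OldBlock(nn.Module):
    """Stand-in for the architecture the checkpoint was saved with."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)


class NewBlock(nn.Module):
    """Same block after a later release added a persistent buffer."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 2)
        # Hypothetical new field, default-initialized to zeros.
        self.register_buffer("type_embd_data", torch.zeros(3, 2))


old_state = OldBlock().state_dict()
new_model = NewBlock()

# Strict loading (the default) rejects the old checkpoint.
try:
    new_model.load_state_dict(old_state)
    strict_failed = False
except RuntimeError:
    strict_failed = True

# Non-strict loading copies the matching keys and reports the rest.
result = new_model.load_state_dict(old_state, strict=False)
missing = list(result.missing_keys)
unexpected = list(result.unexpected_keys)
print("strict load failed:", strict_failed)
print("missing keys left at defaults:", missing)
```

Keys listed in `missing` keep whatever value the constructor gave them, so this approach is only safe when those defaults are physically meaningful. Logging the list rather than discarding it keeps silent mismatches visible.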
Impact
- Users cannot use DPA-2.4-7M with deepmd-kit 3.1.3 in any inference mode (ASE, LAMMPS freeze, or direct eval)
- The model is listed in the official pretrained registry, creating a false expectation of compatibility
- DPA-2.4-7M has domain-specific heads (e.g., H2O_H2O_PD, Electrolyte) that are not available in DPA-3.x models
Investigated and reported by MatMaster (AI agent for computational materials science)
Co-authored-by: @SchrodingersCattt