[Bug] DPA-2.4-7M checkpoint cannot be loaded by deepmd-kit v3.1.3 (registered in pretrained registry but incompatible) #5444

## Summary

`DPA-2.4-7M` is registered in the v3.1.3 pretrained model registry ([`deepmd/pretrained/registry.py`](https://github.com/deepmodeling/deepmd-kit/blob/v3.1.3/deepmd/pretrained/registry.py), added via PR #5307), but deepmd-kit 3.1.3 cannot actually load this checkpoint. All three loading paths tested (`dp --pt show`, `dp --pt freeze`, and the ASE `DP()` calculator) fail with the same `state_dict` error.

## Error

```
RuntimeError: Error(s) in loading state_dict for ModelWrapper:
    Missing key(s) in state_dict:
      "model.Default.atomic_model.descriptor.repinit.type_embd_data",
      "model.Default.atomic_model.descriptor.repinit_three_body.type_embd_data",
      "model.Default.atomic_model.descriptor.repinit_three_body.compress_info.0",
      "model.Default.atomic_model.descriptor.repinit_three_body.compress_data.0"
```

## Reproduction

**Environment**: `registry.dp.tech/dptech/deepmd-kit:3.1.3`  
**deepmd-kit**: 3.1.3  
**PyTorch**: 2.10.0  
**GPU**: NVIDIA RTX 4090 (issue is not GPU-related)

**Model file**: `dpa-2.4-7M.pt` (80.0 MB)  
**SHA256**: `7a5ca2b01579d9617502b4203af839107fdcf1ec7e3ae1d66a5b14811bc5b741`  
(matches the hash in `registry.py` — confirmed same file)
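
To repeat the hash check locally, here is a minimal sketch that uses only the Python standard library. The file name is the one used in this report, and the expected digest is copied from above (stated to match `registry.py`).

```python
import hashlib
from pathlib import Path

# Digest reported above for dpa-2.4-7M.pt (stated to match registry.py)
EXPECTED_SHA256 = "7a5ca2b01579d9617502b4203af839107fdcf1ec7e3ae1d66a5b14811bc5b741"


def sha256_of(path, chunk_size=1 << 20):
    """Stream the file in chunks so an 80 MB checkpoint is not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    actual = sha256_of(Path("dpa-2.4-7M.pt"))
    print("match" if actual == EXPECTED_SHA256 else f"mismatch: {actual}")
```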

**Download source tested**: `https://bohrium.oss-cn-zhangjiakou.aliyuncs.com/13756/27666/store/upload/cd12300a-d3e6-4de9-9783-dd9899376cae/dpa-2.4-7M.pt`

### Steps to reproduce

```bash
# Any of these three paths fail:

# 1. dp show
dp --pt show dpa-2.4-7M.pt model-branch
# → RuntimeError: Missing key(s) in state_dict ...

# 2. dp freeze
dp --pt freeze -c dpa-2.4-7M.pt -o frozen.pth --head Omat24
# → same RuntimeError

# 3. ASE calculator
python3 -c "from deepmd.calculator import DP; calc = DP('dpa-2.4-7M.pt', head='Omat24')"
# → same RuntimeError
```

### Control: DPA-3.1-3M works fine

In the **same environment**, DPA-3.1-3M loads, shows branches, freezes, and runs inference without any issues:

```bash
dp --pt show DPA-3.1-3M.pt model-branch   # ✅ lists 31 branches
dp --pt freeze -c DPA-3.1-3M.pt -o frozen_dpa3.pth --head Omat24  # ✅ 18.0 MB
```
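
To rerun the failing case and the control side by side, the following harness may help. It is only a sketch: it shells out to the same `dp --pt show ... model-branch` command used above and searches the output for the error text quoted in the "Error" section. It assumes `dp` is on `PATH` and that both checkpoints are in the working directory.

```python
import subprocess

# Marker text taken from the RuntimeError quoted in the "Error" section
STATE_DICT_ERROR = "Missing key(s) in state_dict"


def classify(output: str) -> str:
    """Label the combined stdout/stderr of one load attempt."""
    return "state_dict mismatch" if STATE_DICT_ERROR in output else "no state_dict error"


def try_show(checkpoint: str) -> str:
    """Run `dp --pt show <checkpoint> model-branch` and classify the result."""
    proc = subprocess.run(
        ["dp", "--pt", "show", checkpoint, "model-branch"],
        capture_output=True,
        text=True,
        check=False,  # a failing load is the expected outcome, not an exception
    )
    return f"{classify(proc.stdout + proc.stderr)} (exit code {proc.returncode})"


if __name__ == "__main__":
    for ckpt in ("dpa-2.4-7M.pt", "DPA-3.1-3M.pt"):
        print(ckpt, "->", try_show(ckpt))
```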

## Analysis

The error occurs at `deepmd/pt/infer/deep_eval.py:173`:
```python
self.dp.load_state_dict(state_dict)
```

The current `ModelWrapper` definition expects four keys that do not exist in the DPA-2.4-7M checkpoint: the `type_embd_data` buffers of `repinit` and `repinit_three_body`, and the `compress_info.0` / `compress_data.0` compression entries of `repinit_three_body`. This suggests that the checkpoint was saved with an older definition of the model architecture, and that the model class in deepmd-kit 3.1.3 gained these fields without a backward-compatible loading path (e.g., `strict=False` with default initialization, or a checkpoint migration step).
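
To confirm directly that these keys are absent from the file (rather than, say, renamed), the stored keys can be listed. This is a sketch under two assumptions: that the checkpoint unpickles with `torch.load(..., weights_only=True)`, and that the weights may be nested under a `"model"` key, as PyTorch-backend training checkpoints typically are. The helper names are hypothetical.

```python
# The four keys reported as missing in the RuntimeError above
REPORTED_MISSING = [
    "model.Default.atomic_model.descriptor.repinit.type_embd_data",
    "model.Default.atomic_model.descriptor.repinit_three_body.type_embd_data",
    "model.Default.atomic_model.descriptor.repinit_three_body.compress_info.0",
    "model.Default.atomic_model.descriptor.repinit_three_body.compress_data.0",
]


def unwrap(checkpoint):
    """Assumption: the weights may be nested under a "model" key."""
    if isinstance(checkpoint, dict) and "model" in checkpoint:
        return checkpoint["model"]
    return checkpoint


def absent_keys(state_dict, expected):
    """Return the expected keys that the checkpoint does not contain, in order."""
    return [k for k in expected if k not in state_dict]


def keys_under(state_dict, prefix="model.Default.atomic_model.descriptor.repinit"):
    """List stored keys under a prefix (this prefix also matches repinit_three_body)."""
    return sorted(k for k in state_dict if k.startswith(prefix))


if __name__ == "__main__":
    import torch  # only needed when inspecting the real file

    sd = unwrap(torch.load("dpa-2.4-7M.pt", map_location="cpu", weights_only=True))
    print("absent:", absent_keys(sd, REPORTED_MISSING))
    for key in keys_under(sd):
        print("stored:", key)
```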

## Suggested Fix

One or more of:
1. Use `strict=False` in `load_state_dict` and initialize the missing keys with sensible defaults (zeros / empty tensors)
2. Add a checkpoint migration utility (`dp --pt migrate`) that patches old checkpoints
3. Re-export DPA-2.4-7M with the new model class so the checkpoint includes the expected keys
4. Add version metadata to checkpoints and auto-migrate on load
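
Options 1 and 2 above amount to the same operation on the state dict: for every key that the current model class defines but the old checkpoint lacks, supply a value. A framework-agnostic sketch follows. It assumes the reference values come from a freshly constructed model of the 3.1.3 class, so that shapes and dtypes match what the class expects. Whether default-initialized buffers are actually correct for DPA-2.4-7M (for example, whether they only matter when compression is enabled) is an unverified assumption.

```python
from typing import Dict, List, Mapping, Tuple, TypeVar

V = TypeVar("V")


def fill_missing_from_reference(
    old: Mapping[str, V], reference: Mapping[str, V]
) -> Tuple[Dict[str, V], List[str]]:
    """Copy entries that exist in `reference` but not in `old`.

    `reference` is meant to be the state_dict of a freshly built model of the
    current class. Entries already present in `old` are never overwritten, and
    keys that exist only in `old` are kept as-is.
    """
    patched = dict(old)
    added = []
    for key, value in reference.items():
        if key not in patched:
            patched[key] = value
            added.append(key)
    return patched, added


# Hypothetical usage with PyTorch (not run here; model construction is assumed):
#   fresh = build_model_from_checkpoint_params(...)  # hypothetical helper
#   patched, added = fill_missing_from_reference(old_sd, fresh.state_dict())
#   fresh.load_state_dict(patched)  # strict load now sees every expected key
```

Compared with `load_state_dict(..., strict=False)`, which also leaves such buffers at their initialized values and reports them in the returned `missing_keys`, an explicit patch step keeps the list of filled keys visible. It also lets the final load stay strict, so any other mismatch still fails loudly.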

## Impact

- Users cannot use DPA-2.4-7M with deepmd-kit 3.1.3 in **any** of the tested modes (ASE calculator, freezing for LAMMPS, or inspection with `dp show`)
- The model is listed in the official pretrained registry, creating a false expectation of compatibility
- DPA-2.4-7M has domain-specific heads (e.g., `H2O_H2O_PD`, `Electrolyte`) that are not available in DPA-3.x models

Investigated and reported by MatMaster (an AI agent for computational materials science).
Co-authored-by: @SchrodingersCattt
