doc: update DPA3 doc and example (#4655)

iProzd · njzjz · web-flow · commit 0918b2218647 · 2025-03-29T05:04:23.000Z
## Summary by CodeRabbit

- **Documentation**
  - Updated documentation to correctly reference a DPA-2 example.
  - Introduced new documentation for the advanced DPA-3 model outlining
    its capabilities, training benchmarks, and installation requirements.
  - Expanded the documentation index and spin configuration sections to
    include DPA-3.

- **New Features**
  - Added a README with configuration details for training a 6-layer DPA-3
    model.
  - Provided a comprehensive JSON configuration file with training
    parameters.
  - Updated simulation instructions to support both DPA-2 and DPA-3.

- **Tests**
  - Extended testing to cover DPA-3 configurations.

---------

Signed-off-by: Duo <50307526+iProzd@users.noreply.github.com>
Co-authored-by: Jinzhe Zeng <jinzhe.zeng@rutgers.edu>
diff --git a/doc/model/dpa2.md b/doc/model/dpa2.md
@@ -26,7 +26,7 @@ When using the JAX backend, 2 or more MPI ranks are not supported. One must set
 atom_modify map yes
 ```
 
-See the example `examples/water/lmp/jax_dpa2.lammps`.
+See the example `examples/water/lmp/jax_dpa.lammps`.
 
 ## Data format
 
diff --git a/doc/model/dpa3.md b/doc/model/dpa3.md
@@ -0,0 +1,76 @@
+# Descriptor DPA-3 {{ pytorch_icon }} {{ jax_icon }} {{ dpmodel_icon }}
+
+:::{note}
+**Supported backends**: PyTorch {{ pytorch_icon }}, JAX {{ jax_icon }}, DP {{ dpmodel_icon }}
+:::
+
+DPA-3 is an advanced interatomic potential leveraging the message passing architecture.
+Designed as a large atomic model (LAM), DPA-3 is tailored to integrate and simultaneously train on datasets from various disciplines,
+encompassing diverse chemical and materials systems across different research domains.
+Its model design ensures exceptional fitting accuracy and robust generalization both within and beyond the training domain.
+Furthermore, DPA-3 maintains energy conservation and respects the physical symmetries of the potential energy surface,
+making it a dependable tool for a wide range of scientific applications.
+
+Reference: will be released soon.
+
+Training example: `examples/water/dpa3/input_torch.json`.
+
+## Hyperparameter tests
+
+We systematically conducted DPA-3 training on six representative DFT datasets (available at [AIS-Square](https://www.aissquare.com/datasets/detail?pageType=datasets&name=DPA3_hyperparameter_search&id=316)):
+metallic systems (`Alloy`, `AlMgCu`, `W`), a covalent material (`Boron`), a molecular system (`Drug`), and liquid water (`Water`).
+Under consistent training conditions (0.5M training steps, `batch_size` set to `"auto:128"`),
+we evaluated the impact of several key hyperparameters on validation accuracy.
+
+The comparison focuses on the average RMSEs (root mean square errors) of the energy, force, and virial predictions across all six systems,
+with results tabulated below to guide scenario-specific hyperparameter selection:
+
+| Model            | comment         | nlayers | n_dim   | e_dim  | a_dim | e_sel   | a_sel  | start_lr | stop_lr  | loss prefactors           | rmse_e (meV/atom) | rmse_f (meV/Å) | rmse_v (meV/atom) | Training wall time (h) |
+| ---------------- | --------------- | ------- | ------- | ------ | ----- | ------- | ------ | -------- | -------- | ------------------------- | ----------------- | -------------- | ----------------- | ---------------------- |
+| DPA3-L3          | Default         | 3       | 256     | 128    | 32    | 120     | 30     | 1e-3     | 3e-5     | 0.2\|20, 100\|60, 0.02\|1 | 5.74              | 85.4           | 43.1              | 9.8                    |
+|                  | Small dimension | 3       | **128** | **64** | 32    | 120     | 30     | 1e-3     | 3e-5     | 0.2\|20, 100\|60, 0.02\|1 | 6.99              | 93.6           | 46.7              | 8.0                    |
+|                  | Large sel       | 3       | 256     | 128    | 32    | **154** | **48** | 1e-3     | 3e-5     | 0.2\|20, 100\|60, 0.02\|1 | 5.70              | 83.7           | 43.4              | 14.1                   |
+| DPA3-L6          | Default         | 6       | 256     | 128    | 32    | 120     | 30     | 1e-3     | 3e-5     | 0.2\|20, 100\|60, 0.02\|1 | 4.85              | 79.9           | 39.7              | 19.2                   |
+|                  | Small dimension | 6       | **128** | **64** | 32    | 120     | 30     | 1e-3     | 3e-5     | 0.2\|20, 100\|60, 0.02\|1 | 5.11              | 77.7           | 41.2              | 14.1                   |
+|                  | Large sel       | 6       | 256     | 128    | 32    | **154** | **48** | 1e-3     | 3e-5     | 0.2\|20, 100\|60, 0.02\|1 | 4.76              | 78.4           | 40.2              | 31.8                   |
+| DPA2-L6 (medium) | Default         | 6       | -       | -      | -     | -       | -      | 1e-3     | 3.51e-08 | 0.02\|1, 1000\|1, 0.02\|1 | 12.12             | 109.3          | 83.1              | 12.2                   |
+
+The loss prefactors (0.2|20, 100|60, 0.02|1) correspond to (`start_pref_e`|`limit_pref_e`, `start_pref_f`|`limit_pref_f`, `start_pref_v`|`limit_pref_v`) respectively.
+Virial RMSEs were averaged exclusively for systems containing virial labels (`Alloy`, `AlMgCu`, `W`, and `Boron`).
+
+Note that all DPA-3 models use `float32` precision, whereas the other models use `float64` by default.
+
+## Requirements of installation from source code {{ pytorch_icon }}
+
+To run the DPA-3 model on LAMMPS via source code installation
+(users can skip this step if using [easy installation](../install/easy-install.md)),
+the custom OP library for Python interface integration must be compiled and linked
+during the [model freezing process](../freeze/freeze.md).
+
+The customized OP library for the Python interface can be installed by setting the environment variable {envvar}`DP_ENABLE_PYTORCH` to `1` during installation.
+
+If one runs LAMMPS with MPI, the customized OP library for the C++ interface should be compiled against the same MPI library as the runtime MPI.
+If one runs LAMMPS with MPI and CUDA devices, it is recommended to compile the customized OP library for the C++ interface with a [CUDA-Aware MPI](https://developer.nvidia.com/mpi-solutions-gpus) library and CUDA;
+otherwise, the communication between GPU cards falls back to the slower CPU implementation.
+
+## Limitations of the JAX backend with LAMMPS {{ jax_icon }}
+
+The JAX backend does not support running LAMMPS with 2 or more MPI ranks. One must also set `map` to `yes` using the [`atom_modify`](https://docs.lammps.org/atom_modify.html) command.
+
+```lammps
+atom_modify map yes
+```
+
+See the example `examples/water/lmp/jax_dpa.lammps`.
+
+## Data format
+
+DPA-3 supports both the [standard data format](../data/system.md) and the [mixed type data format](../data/system.md#mixed-type).
+
+## Type embedding
+
+Type embedding is built into this descriptor and has the same dimension as the node embedding, which is set by the {ref}`n_dim <model[standard]/descriptor[dpa3]/repflow/n_dim>` argument.
+
+## Model compression
+
+Model compression is not supported in this descriptor.
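
The hyperparameter table in `doc/model/dpa3.md` can be easier to read as ratios against a baseline. The following is a minimal sketch, not part of this commit, that copies the tabulated values and expresses each run relative to `DPA2-L6 (medium)`. The names `RESULTS` and `relative_to` are hypothetical and introduced only for illustration.

```python
# Validation RMSEs and wall times copied from the table in doc/model/dpa3.md.
# Tuple layout: (rmse_e [meV/atom], rmse_f [meV/Å], rmse_v [meV/atom], wall time [h])
RESULTS = {
    "DPA3-L3 default": (5.74, 85.4, 43.1, 9.8),
    "DPA3-L3 small dimension": (6.99, 93.6, 46.7, 8.0),
    "DPA3-L3 large sel": (5.70, 83.7, 43.4, 14.1),
    "DPA3-L6 default": (4.85, 79.9, 39.7, 19.2),
    "DPA3-L6 small dimension": (5.11, 77.7, 41.2, 14.1),
    "DPA3-L6 large sel": (4.76, 78.4, 40.2, 31.8),
    "DPA2-L6 (medium)": (12.12, 109.3, 83.1, 12.2),
}


def relative_to(baseline, name):
    """Return per-column ratios of run `name` over run `baseline`.

    A ratio below 1 means a lower RMSE (or a shorter wall time) than the baseline.
    """
    base = RESULTS[baseline]
    return tuple(value / ref for value, ref in zip(RESULTS[name], base))


if __name__ == "__main__":
    for run in RESULTS:
        e, f, v, t = relative_to("DPA2-L6 (medium)", run)
        print(f"{run:24s} rmse_e x{e:.2f}  rmse_f x{f:.2f}  rmse_v x{v:.2f}  time x{t:.2f}")
```

The wall-time ratios are not a like-for-like comparison: as the documentation notes, the DPA-3 runs use `float32` while the other models default to `float64`.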
diff --git a/doc/model/index.rst b/doc/model/index.rst
@@ -10,6 +10,7 @@ Model
    train-se-e3
    train-se-atten
    dpa2
+   dpa3
    train-hybrid
    sel
    train-energy
diff --git a/doc/model/train-energy-spin.md b/doc/model/train-energy-spin.md
@@ -51,6 +51,7 @@ In PyTorch/DP, the spin implementation is more flexible and so far supports the
 - `se_e2_a`
 - `dpa1`(`se_atten`)
 - `dpa2`
+- `dpa3`
 
 See `se_e2_a` examples in `$deepmd_source_dir/examples/spin/se_e2_a/input_torch.json`, the {ref}`spin <model/spin>` section is defined as the following with a much more clear interface:
 
diff --git a/examples/water/dpa3/README.md b/examples/water/dpa3/README.md
@@ -0,0 +1,4 @@
+# Input for the DPA-3 model
+
+This directory stores configuration files for training the 6-layer DPA-3 model.
+For guidance on hyperparameter selection, consult the [DPA-3 documentation](../../../doc/model/dpa3.md#hyperparameter-tests).
diff --git a/examples/water/dpa3/input_torch.json b/examples/water/dpa3/input_torch.json
@@ -0,0 +1,94 @@
+{
+  "_comment": "that's all",
+  "model": {
+    "type_map": [
+      "O",
+      "H"
+    ],
+    "descriptor": {
+      "type": "dpa3",
+      "repflow": {
+        "n_dim": 256,
+        "e_dim": 128,
+        "a_dim": 32,
+        "nlayers": 6,
+        "e_rcut": 6.0,
+        "e_rcut_smth": 3.0,
+        "e_sel": 120,
+        "a_rcut": 4.0,
+        "a_rcut_smth": 2.0,
+        "a_sel": 30,
+        "axis_neuron": 4,
+        "skip_stat": true,
+        "a_compress_rate": 1,
+        "a_compress_e_rate": 2,
+        "a_compress_use_split": true,
+        "update_angle": true,
+        "update_style": "res_residual",
+        "update_residual": 0.1,
+        "update_residual_init": "const"
+      },
+      "activation_function": "silut:10.0",
+      "use_tebd_bias": false,
+      "precision": "float32",
+      "concat_output_tebd": false
+    },
+    "fitting_net": {
+      "neuron": [
+        240,
+        240,
+        240
+      ],
+      "resnet_dt": true,
+      "precision": "float32",
+      "activation_function": "silut:10.0",
+      "seed": 1,
+      "_comment": " that's all"
+    },
+    "_comment": " that's all"
+  },
+  "learning_rate": {
+    "type": "exp",
+    "decay_steps": 5000,
+    "start_lr": 0.001,
+    "stop_lr": 3e-5,
+    "_comment": "that's all"
+  },
+  "loss": {
+    "type": "ener",
+    "start_pref_e": 0.2,
+    "limit_pref_e": 20,
+    "start_pref_f": 100,
+    "limit_pref_f": 60,
+    "start_pref_v": 0.02,
+    "limit_pref_v": 1,
+    "_comment": " that's all"
+  },
+  "training": {
+    "stat_file": "./dpa3.hdf5",
+    "training_data": {
+      "systems": [
+        "../data/data_0",
+        "../data/data_1",
+        "../data/data_2"
+      ],
+      "batch_size": 1,
+      "_comment": "that's all"
+    },
+    "validation_data": {
+      "systems": [
+        "../data/data_3"
+      ],
+      "batch_size": 1,
+      "_comment": "that's all"
+    },
+    "numb_steps": 1000000,
+    "warmup_steps": 0,
+    "gradient_max_norm": 5.0,
+    "seed": 10,
+    "disp_file": "lcurve.out",
+    "disp_freq": 100,
+    "save_freq": 2000,
+    "_comment": "that's all"
+  }
+}
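
To see how the `learning_rate` and `loss` sections of `input_torch.json` interact, the sketch below evaluates the schedule they imply. It is not part of this commit. It assumes the `exp` schedule is a staircase exponential decay tuned to reach `stop_lr` at `numb_steps`, and that each `ener` loss prefactor is interpolated linearly in `lr(t) / start_lr`, as described in the DeePMD-kit training documentation. The function names `learning_rate` and `prefactor` are hypothetical.

```python
import json
import math

# Values copied from the "learning_rate", "loss" and "training" sections above.
CONFIG = json.loads("""
{
  "learning_rate": {"decay_steps": 5000, "start_lr": 0.001, "stop_lr": 3e-5},
  "loss": {
    "start_pref_e": 0.2, "limit_pref_e": 20,
    "start_pref_f": 100, "limit_pref_f": 60,
    "start_pref_v": 0.02, "limit_pref_v": 1
  },
  "training": {"numb_steps": 1000000}
}
""")


def learning_rate(step, start_lr, stop_lr, decay_steps, numb_steps):
    """Staircase exponential decay (assumed form) that hits stop_lr at numb_steps."""
    decay_rate = math.exp(math.log(stop_lr / start_lr) / (numb_steps / decay_steps))
    return start_lr * decay_rate ** (step // decay_steps)


def prefactor(lr, start_lr, start_pref, limit_pref):
    """Interpolate between start_pref and limit_pref by lr / start_lr (assumed form)."""
    ratio = lr / start_lr
    return start_pref * ratio + limit_pref * (1.0 - ratio)


if __name__ == "__main__":
    lr_cfg = CONFIG["learning_rate"]
    loss = CONFIG["loss"]
    numb_steps = CONFIG["training"]["numb_steps"]
    for step in (0, numb_steps // 2, numb_steps):
        lr = learning_rate(step, lr_cfg["start_lr"], lr_cfg["stop_lr"],
                           lr_cfg["decay_steps"], numb_steps)
        prefs = {
            key: prefactor(lr, lr_cfg["start_lr"],
                           loss[f"start_pref_{key}"], loss[f"limit_pref_{key}"])
            for key in ("e", "f", "v")
        }
        print(step, f"lr={lr:.3e}", {k: round(p, 3) for k, p in prefs.items()})
```

Under these assumptions the prefactors only approach their `limit_pref_*` values, because the learning rate ends at `stop_lr` rather than zero.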
diff --git a/examples/water/lmp/jax_dpa.lammps b/examples/water/lmp/jax_dpa.lammps
@@ -5,7 +5,7 @@
 units           metal
 boundary        p p p
 atom_style      atomic
-# Below line is required when using DPA-2 with the JAX backend
+# Below line is required when using DPA-2/3 with the JAX backend
 atom_modify     map yes
 
 neighbor        2.0 bin
diff --git a/source/tests/common/test_examples.py b/source/tests/common/test_examples.py
@@ -58,6 +58,7 @@
     p_examples / "water" / "dpa2" / "input_torch_medium.json",
     p_examples / "water" / "dpa2" / "input_torch_large.json",
     p_examples / "water" / "dpa2" / "input_torch_compressible.json",
+    p_examples / "water" / "dpa3" / "input_torch.json",
     p_examples / "property" / "train" / "input_torch.json",
     p_examples / "water" / "se_e3_tebd" / "input_torch.json",
     p_examples / "hessian" / "single_task" / "input.json",