doc: add theory section for DPA-2 descriptor

OpenClaw · OpenClaw · commit 308552a17707 · 2026-02-24T07:54:00.000Z
Authored by OpenClaw (model: gpt-5.3-codex)
diff --git a/doc/model/dpa2.md b/doc/model/dpa2.md
@@ -8,6 +8,42 @@ The DPA-2 model implementation. See [DPA-2 paper](https://doi.org/10.1038/s41524
 
 Training example: `examples/water/dpa2/input_torch_medium.json`, see [README](../../examples/water/dpa2/README.md) for inputs in different levels.
 
+## Theory
+
+DPA-2 is an attention-based descriptor designed to learn expressive local atomic representations while preserving the physical symmetries required by interatomic potentials.
+
+### Local environment and representation
+
+For each central atom $\alpha$, neighbors $\beta \in \mathcal{N}(\alpha)$ are selected within a cutoff radius. DPA-2 encodes each local environment through geometric features (relative coordinates and derived invariants) and element/type information.
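+
+As a minimal sketch of the neighbor selection described above, the following plain-Python snippet collects relative coordinates and distances inside a cutoff. The helper name `build_local_environment`, the open (non-periodic) geometry, and the toy coordinates are assumptions made for illustration; this is not the DeePMD-kit neighbor-list code.
+
+```python
+import math
+
+
+def build_local_environment(coords, center, r_cut):
+    """Return (neighbor_index, relative_vector, distance) for atoms within r_cut.
+
+    Hypothetical helper: no periodic boundary conditions, brute-force search.
+    """
+    env = []
+    cx, cy, cz = coords[center]
+    for j, (x, y, z) in enumerate(coords):
+        if j == center:
+            continue
+        # Only relative coordinates enter the local environment.
+        rel = (x - cx, y - cy, z - cz)
+        dist = math.sqrt(sum(c * c for c in rel))
+        if dist < r_cut:
+            env.append((j, rel, dist))
+    return env
+
+
+coords = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (5.0, 5.0, 5.0)]
+env = build_local_environment(coords, center=0, r_cut=3.0)
+neighbor_ids = [j for j, _, _ in env]
+```
+
+Here the fourth atom lies outside the cutoff, so only the second and third atoms contribute to the environment of the first.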
+
+The descriptor is built hierarchically:
+
+1. **Initial embedding**: geometric and type features are projected into latent channels.
+1. **Attention-based interaction**: stacked attention layers model neighbor-neighbor and center-neighbor correlations in the local environment.
+1. **Output descriptor**: atom-wise latent features after the final layer are used as descriptor outputs for downstream fitting/model components.
+
+### Attention-based message passing
+
+DPA-2 uses attention to aggregate neighbor information with data-dependent weights. Conceptually, each layer computes:
+
+```math
+\mathbf{h}_\alpha^{(l+1)} = \mathbf{h}_\alpha^{(l)} + \mathrm{Attn}^{(l)}\left(\mathbf{h}_\alpha^{(l)}, \{\mathbf{h}_\beta^{(l)}\}_{\beta\in\mathcal{N}(\alpha)}, \{\mathbf{g}_{\alpha\beta}\}_{\beta\in\mathcal{N}(\alpha)}\right)
+```
+
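+where $\mathbf{h}_\alpha^{(l)}$ denotes the latent node feature of atom $\alpha$ at layer $l$ and $\mathbf{g}_{\alpha\beta}$ denotes geometry-conditioned pair features. Because each layer adds its attention output to its input (a residual update), many layers can be stacked stably.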
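+
+The residual update above can be illustrated with a deliberately simplified sketch. It assumes single-head dot-product attention without learned projections, and it reduces each pair feature $\mathbf{g}_{\alpha\beta}$ to a scalar bias added to the attention score. Both choices, and the names `attention_update` and `g_bias`, are assumptions for illustration rather than the layer used in DPA-2.
+
+```python
+import math
+
+
+def softmax(scores):
+    """Numerically stable softmax over a list of floats."""
+    m = max(scores)
+    exps = [math.exp(s - m) for s in scores]
+    total = sum(exps)
+    return [e / total for e in exps]
+
+
+def attention_update(h_center, h_neighbors, g_bias):
+    """One residual step: h_center + sum_b w_b * h_b.
+
+    Assumption: score_b = <h_center, h_b> / sqrt(d) + g_bias[b], where
+    g_bias[b] is a scalar standing in for the pair feature g_ab.
+    """
+    d = len(h_center)
+    scores = [
+        sum(q * k for q, k in zip(h_center, h_b)) / math.sqrt(d) + g
+        for h_b, g in zip(h_neighbors, g_bias)
+    ]
+    weights = softmax(scores)
+    # Data-dependent weighted sum over neighbors.
+    message = [
+        sum(w * h_b[c] for w, h_b in zip(weights, h_neighbors))
+        for c in range(d)
+    ]
+    # Residual connection: add the message to the input feature.
+    h_new = [hc + mc for hc, mc in zip(h_center, message)]
+    return h_new, weights
+
+
+h_center = [1.0, 0.0]
+h_neighbors = [[1.0, 0.0], [0.0, 1.0]]
+g_bias = [0.0, 0.0]
+h_new, weights = attention_update(h_center, h_neighbors, g_bias)
+```
+
+Since the weighted sum runs over an unordered set of neighbors, reordering `h_neighbors` together with `g_bias` leaves `h_new` unchanged in this sketch.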
+
+### Physical symmetries
+
+DPA-2 is constructed to satisfy key symmetry requirements of atomistic modeling:
+
+1. **Translational invariance**: only relative coordinates are used.
+1. **Rotational invariance**: the internal geometric constructions are designed so that the final scalar descriptor channels passed downstream do not change when the system is rotated.
+1. **Permutational invariance**: atoms of the same species are treated identically, so relabeling (permuting) them does not change the result.
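+
+These three properties can be checked numerically. The sketch below uses a toy scalar descriptor (a sum of Gaussians of neighbor distances) as a stand-in; `toy_descriptor`, `translate`, and `rotate_z` are hypothetical helpers, and the check only illustrates the invariances listed above, not how DPA-2 achieves them.
+
+```python
+import math
+
+
+def toy_descriptor(coords, center):
+    """Sum of Gaussians of neighbor distances (stand-in invariant channel)."""
+    cx, cy, cz = coords[center]
+    total = 0.0
+    for j, (x, y, z) in enumerate(coords):
+        if j != center:
+            r2 = (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2
+            total += math.exp(-r2)
+    return total
+
+
+def translate(coords, shift):
+    """Shift every atom by the same vector."""
+    return [tuple(c + s for c, s in zip(p, shift)) for p in coords]
+
+
+def rotate_z(coords, angle):
+    """Rotate every atom about the z axis by `angle` radians."""
+    ca, sa = math.cos(angle), math.sin(angle)
+    return [(ca * x - sa * y, sa * x + ca * y, z) for x, y, z in coords]
+
+
+coords = [(0.0, 0.0, 0.0), (1.0, 0.2, -0.3), (-0.5, 0.8, 0.4)]
+d_ref = toy_descriptor(coords, 0)
+d_trans = toy_descriptor(translate(coords, (3.0, -1.0, 2.0)), 0)
+d_rot = toy_descriptor(rotate_z(coords, 0.7), 0)
+# Swap the two neighbors (treated here as the same species).
+d_perm = toy_descriptor([coords[0], coords[2], coords[1]], 0)
+```
+
+All four values agree up to floating-point rounding, because the toy descriptor depends only on interatomic distances and sums over neighbors.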
+
+### Multi-task training context
+
+DPA-2 is commonly used in a multi-task setting. The descriptor is shared, while task-specific heads/objectives are handled downstream. See [Multi-task training](../train/multi-task-training.md) for framework details.
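+
+The split between a shared descriptor and task-specific heads can be pictured with the toy sketch below. The function `shared_descriptor`, the task names, and the linear head weights are all hypothetical; the actual configuration is described in the multi-task training documentation linked above.
+
+```python
+def shared_descriptor(distances):
+    """Hypothetical shared descriptor: two invariant channels from distances."""
+    return [
+        sum(1.0 / r for r in distances),
+        sum(1.0 / r**2 for r in distances),
+    ]
+
+
+# Task-specific linear heads with made-up weights; every task reuses the
+# same descriptor output instead of computing its own representation.
+heads = {
+    "task_a": lambda d: 0.5 * d[0] - 0.1 * d[1],
+    "task_b": lambda d: 0.2 * d[0] + 0.3 * d[1],
+}
+
+descriptor = shared_descriptor([1.0, 2.0])
+predictions = {name: head(descriptor) for name, head in heads.items()}
+```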
+
 ## Requirements of installation {{ pytorch_icon }}
 
 If one wants to run the DPA-2 model on LAMMPS, the customized OP library for the Python interface must be installed when [freezing the model](../freeze/freeze.md).