Commit cd67bbe

Authored by Han Wang (wanghan-iapcm) and Jinzhe Zeng (njzjz)
refact(dpmodel,pt_expt): embedding net (#5205)
# EmbeddingNet Refactoring: Factory Function to Concrete Class

## Summary

This refactoring converts `EmbeddingNet` from a factory-generated dynamic class into a concrete class in the dpmodel backend. The change lets the auto-detection registry mechanism in pt_expt work seamlessly with `EmbeddingNet` attributes. This PR should be considered after #5194 and #5204.

## Motivation

**Before**: `EmbeddingNet` was created by the factory function `make_embedding_network(NativeNet, NativeLayer)`, producing a dynamically typed class `make_embedding_network.<locals>.EN`. This caused two problems:

1. **Cannot be registered**: dynamic classes cannot be imported or registered at module import time in the pt_expt registry.
2. **Name-based hacks required**: pt_expt wrappers had to check explicitly for `name == "embedding_net"` in `__setattr__` instead of using the type-based auto-detection mechanism.

**After**: `EmbeddingNet` is now a concrete class that can be registered in the pt_expt auto-conversion registry, eliminating the need for name-based special cases.

## Changes
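The registration problem with factory-generated classes can be seen in a few lines of plain Python. This is a toy illustration, not deepmd code; `make_net`, `Base`, and `ConcreteNet` are hypothetical names:

```python
# Why a factory-generated class is hard to register by type: the class is
# defined inside a function body, so it cannot be imported at module level,
# and every factory call produces a distinct type object.

def make_net(base):
    class EN(base):
        pass
    return EN

class Base:
    pass

NetA = make_net(Base)
NetB = make_net(Base)

# The generated class advertises itself as a function-local:
assert "<locals>" in NetA.__qualname__
# Each call yields a distinct type, so a registry keyed on the class
# object cannot be populated once at import time:
assert NetA is not NetB

# A concrete module-level class has a single stable identity:
class ConcreteNet(Base):
    pass

registry = {ConcreteNet: "converter"}
assert registry[type(ConcreteNet())] == "converter"
```

This is exactly the property the refactoring restores: a single importable class object that a type-keyed registry can hold.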
### 1. dpmodel: Concrete `EmbeddingNet` class

**File**: `deepmd/dpmodel/utils/network.py`

- Replaced the factory-generated class with a concrete `EmbeddingNet(NativeNet)` class
- Moved the constructor logic from the factory into `__init__`
- Fixed `deserialize` to use `type(obj.layers[0])` instead of hardcoding `super(EmbeddingNet, obj)`, allowing the pt_expt subclass to preserve its converted torch layers
- Kept the `make_embedding_network` factory for the pt/pd backends, which use a different base class (MLP)

```python
class EmbeddingNet(NativeNet):
    """The embedding network."""

    def __init__(
        self,
        in_dim,
        neuron=[24, 48, 96],
        activation_function="tanh",
        resnet_dt=False,
        precision=DEFAULT_PRECISION,
        seed=None,
        bias=True,
        trainable=True,
    ):
        layers = []
        i_in = in_dim
        if isinstance(trainable, bool):
            trainable = [trainable] * len(neuron)
        for idx, ii in enumerate(neuron):
            i_ot = ii
            layers.append(
                NativeLayer(
                    i_in,
                    i_ot,
                    bias=bias,
                    use_timestep=resnet_dt,
                    activation_function=activation_function,
                    resnet=True,
                    precision=precision,
                    seed=child_seed(seed, idx),
                    trainable=trainable[idx],
                ).serialize()
            )
            i_in = i_ot
        super().__init__(layers)
        self.in_dim = in_dim
        self.neuron = neuron
        self.activation_function = activation_function
        self.resnet_dt = resnet_dt
        self.precision = precision
        self.bias = bias

    @classmethod
    def deserialize(cls, data):
        data = data.copy()
        check_version_compatibility(data.pop("@version", 1), 2, 1)
        data.pop("@class", None)
        layers = data.pop("layers")
        obj = cls(**data)
        # Use type(obj.layers[0]) to respect subclass layer types
        layer_type = type(obj.layers[0])
        obj.layers = type(obj.layers)(
            [layer_type.deserialize(layer) for layer in layers]
        )
        return obj
```
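The `type(obj.layers[0])` change is what lets a subclass keep its own layer type through a round trip. A minimal sketch of the mechanism with toy classes (`Layer`, `TorchLayer`, `Net`, `SubNet` are hypothetical stand-ins, not the deepmd API):

```python
class Layer:
    def __init__(self):
        self.w = 1.0

    def serialize(self):
        return {"w": self.w}

    @classmethod
    def deserialize(cls, data):
        obj = cls()
        obj.w = data["w"]
        return obj


class TorchLayer(Layer):
    """Stand-in for a backend-specific layer subclass."""


class Net:
    layer_cls = Layer

    def __init__(self):
        self.layers = [self.layer_cls()]

    def serialize(self):
        return {"layers": [layer.serialize() for layer in self.layers]}

    @classmethod
    def deserialize(cls, data):
        obj = cls()
        # The fix: dispatch on the layer type __init__ actually created,
        # instead of hardcoding the base Layer class.
        layer_type = type(obj.layers[0])
        obj.layers = [layer_type.deserialize(d) for d in data["layers"]]
        return obj


class SubNet(Net):
    layer_cls = TorchLayer  # subclass swaps in its own layer type


restored = SubNet.deserialize(SubNet().serialize())
# The subclass's layer type survives the round trip.
assert type(restored.layers[0]) is TorchLayer
```

Hardcoding `Layer.deserialize` here would have produced plain `Layer` objects and silently dropped the subclass behavior, which is the bug the real change avoids for pt_expt's torch layers.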
### 2. pt_expt: Wrapper and registration

**File**: `deepmd/pt_expt/utils/network.py`

- Created an `EmbeddingNet(EmbeddingNetDP, torch.nn.Module)` wrapper
- Converts dpmodel layers to pt_expt `NativeLayer` (torch modules) in `__init__`
- Registered in the auto-conversion registry

```python
class EmbeddingNet(EmbeddingNetDP, torch.nn.Module):
    def __init__(self, *args: Any, **kwargs: Any) -> None:
        torch.nn.Module.__init__(self)
        EmbeddingNetDP.__init__(self, *args, **kwargs)
        # Convert dpmodel layers to pt_expt NativeLayer
        self.layers = torch.nn.ModuleList(
            [NativeLayer.deserialize(layer.serialize()) for layer in self.layers]
        )

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        return torch.nn.Module.__call__(self, *args, **kwargs)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.call(x)


register_dpmodel_mapping(
    EmbeddingNetDP,
    lambda v: EmbeddingNet.deserialize(v.serialize()),
)
```

### 3. TypeEmbedNet: Simplified to use the registry

**File**: `deepmd/pt_expt/utils/type_embed.py`

- No longer needs the name-based `embedding_net` check in `__setattr__`
- Uses the common `dpmodel_setattr`, which auto-converts via the registry
- Imports the `network` module so that the `EmbeddingNet` registration happens first

```python
class TypeEmbedNet(TypeEmbedNetDP, torch.nn.Module):
    def __setattr__(self, name: str, value: Any) -> None:
        # Auto-converts embedding_net via registry
        handled, value = dpmodel_setattr(self, name, value)
        if not handled:
            super().__setattr__(name, value)
```

## Tests

### dpmodel tests

**File**: `source/tests/common/dpmodel/test_network.py`

Added to the `TestEmbeddingNet` class:

1. **`test_is_concrete_class`**: verifies `EmbeddingNet` is now a concrete class, not factory output
2. **`test_forward_pass`**: tests that the dpmodel forward pass produces correct shapes
3. **`test_trainable_parameter_variants`**: tests different trainable configurations (all trainable, all frozen, mixed)

(The existing `test_embedding_net` test already covers the serialization/deserialization round trip.)

### pt_expt integration tests

**File**: `source/tests/pt_expt/utils/test_network.py`

Created the `TestEmbeddingNetRefactor` suite with 8 tests:

1. **`test_pt_expt_embedding_net_wraps_dpmodel`**: verifies the pt_expt wrapper inherits correctly and converts layers
2. **`test_pt_expt_embedding_net_forward`**: tests that the pt_expt forward pass returns a `torch.Tensor`
3. **`test_serialization_round_trip_pt_expt`**: tests pt_expt serialize/deserialize
4. **`test_deserialize_preserves_layer_type`**: tests the key fix: `deserialize` uses `type(obj.layers[0])` to preserve pt_expt's torch layers
5. **`test_cross_backend_consistency`**: tests numerical consistency between dpmodel and pt_expt
6. **`test_registry_converts_dpmodel_to_pt_expt`**: tests that `try_convert_module` auto-converts dpmodel to pt_expt
7. **`test_auto_conversion_in_setattr`**: tests that `dpmodel_setattr` auto-converts `EmbeddingNet` attributes
8. **`test_trainable_parameter_handling`**: tests that trainable vs. frozen parameters work correctly in pt_expt

## Verification

All tests pass:

```bash
# dpmodel EmbeddingNet tests
python -m pytest source/tests/common/dpmodel/test_network.py::TestEmbeddingNet -v
# 4 passed in 0.41s

# pt_expt EmbeddingNet integration tests
python -m pytest source/tests/pt_expt/utils/test_network.py::TestEmbeddingNetRefactor -v
# 8 passed in 0.41s

# All pt_expt network tests
python -m pytest source/tests/pt_expt/utils/test_network.py -v
# 10 passed in 0.41s

# Descriptor tests (verify the refactoring doesn't break existing code)
python -m pytest source/tests/pt_expt/descriptor/test_se_e2_a.py -v -k consistency
# 1 passed
python -m pytest source/tests/universal/pt_expt/descriptor/test_descriptor.py -v
# 8 passed in 3.27s
```

## Benefits
1. **Type-based auto-detection**: no more name-based special cases in `__setattr__`
2. **Maintainability**: a single source of truth for `EmbeddingNet` in dpmodel
3. **Consistency**: same pattern as other dpmodel classes (`AtomExcludeMask`, `NetworkCollection`, etc.)
4. **Future-proof**: new attributes in dpmodel automatically work in pt_expt via the registry

## Backward Compatibility

- Serialization format unchanged (`@version` 2)
- All existing tests pass
- The `make_embedding_network` factory is kept for the pt/pd backends
- No changes to the public API

## Files Changed

### Modified

- `deepmd/dpmodel/utils/network.py`: concrete `EmbeddingNet` class + `deserialize` fix
- `deepmd/pt_expt/utils/network.py`: `EmbeddingNet` wrapper + registration
- `deepmd/pt_expt/utils/type_embed.py`: simplified to use the registry
- `source/tests/common/dpmodel/test_network.py`: added dpmodel `EmbeddingNet` tests (3 new tests)
- `source/tests/pt_expt/utils/test_network.py`: added pt_expt integration tests (8 new tests)

### No changes required

- All descriptor wrappers (se_e2_a, se_r, se_t, se_t_tebd) automatically work via the registry
- No changes to dpmodel logic or array_api_compat code

Signed-off-by: Jinzhe Zeng <jinzhe.zeng@ustc.edu.cn>
Co-authored-by: Han Wang <wang_han@iapcm.ac.cn>
Co-authored-by: Jinzhe Zeng <jinzhe.zeng@ustc.edu.cn>
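The type-keyed registry the PR relies on can be sketched in a few lines. The function bodies below are illustrative only; `register_dpmodel_mapping` and `try_convert_module` mimic the names used in the PR but are not the deepmd sources, and the two classes are stand-ins:

```python
# Minimal sketch of a type-keyed auto-conversion registry.
_DPMODEL_MAPPING = {}

def register_dpmodel_mapping(dp_cls, converter):
    """Register a converter for a concrete dpmodel class."""
    _DPMODEL_MAPPING[dp_cls] = converter

def try_convert_module(value):
    """Convert value if its exact type is registered, else return it as-is."""
    converter = _DPMODEL_MAPPING.get(type(value))
    return converter(value) if converter is not None else value

class EmbeddingNetDP:  # stand-in for the dpmodel class
    pass

class EmbeddingNet(EmbeddingNetDP):  # stand-in for the pt_expt wrapper
    @classmethod
    def from_dp(cls, dp):
        return cls()

register_dpmodel_mapping(EmbeddingNetDP, EmbeddingNet.from_dp)

# A dpmodel instance gets converted; the wrapper type passes through
# untouched, because registration is keyed on the exact dpmodel type.
assert isinstance(try_convert_module(EmbeddingNetDP()), EmbeddingNet)
assert type(try_convert_module(EmbeddingNet())) is EmbeddingNet
```

Keying on the exact type (rather than `isinstance`) is what makes a concrete, importable `EmbeddingNet` class a prerequisite: a factory-local class has no stable type object to register.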
1 parent d310532 commit cd67bbe

6 files changed

Lines changed: 503 additions & 6 deletions


deepmd/dpmodel/utils/network.py

Lines changed: 109 additions & 1 deletion
```diff
@@ -785,7 +785,115 @@ def deserialize(cls, data: dict) -> "EmbeddingNet":
         return EN
 
 
-EmbeddingNet = make_embedding_network(NativeNet, NativeLayer)
+class EmbeddingNet(NativeNet):
+    """The embedding network.
+
+    Parameters
+    ----------
+    in_dim
+        Input dimension.
+    neuron
+        The number of neurons in each layer. The output dimension
+        is the same as the dimension of the last layer.
+    activation_function
+        The activation function.
+    resnet_dt
+        Use time step at the resnet architecture.
+    precision
+        Floating point precision for the model parameters.
+    seed : int, optional
+        Random seed.
+    bias : bool, optional
+        Whether to use bias in the embedding layer.
+    trainable : bool or list[bool], optional
+        Whether the weights are trainable. If a list, each element
+        corresponds to a layer.
+    """
+
+    def __init__(
+        self,
+        in_dim: int,
+        neuron: list[int] = [24, 48, 96],
+        activation_function: str = "tanh",
+        resnet_dt: bool = False,
+        precision: str = DEFAULT_PRECISION,
+        seed: int | list[int] | None = None,
+        bias: bool = True,
+        trainable: bool | list[bool] = True,
+    ) -> None:
+        layers = []
+        i_in = in_dim
+        if isinstance(trainable, bool):
+            trainable = [trainable] * len(neuron)
+        for idx, ii in enumerate(neuron):
+            i_ot = ii
+            layers.append(
+                NativeLayer(
+                    i_in,
+                    i_ot,
+                    bias=bias,
+                    use_timestep=resnet_dt,
+                    activation_function=activation_function,
+                    resnet=True,
+                    precision=precision,
+                    seed=child_seed(seed, idx),
+                    trainable=trainable[idx],
+                ).serialize()
+            )
+            i_in = i_ot
+        super().__init__(layers)
+        self.in_dim = in_dim
+        self.neuron = neuron
+        self.activation_function = activation_function
+        self.resnet_dt = resnet_dt
+        self.precision = precision
+        self.bias = bias
+
+    def serialize(self) -> dict:
+        """Serialize the network to a dict.
+
+        Returns
+        -------
+        dict
+            The serialized network.
+        """
+        return {
+            "@class": "EmbeddingNetwork",
+            "@version": 2,
+            "in_dim": self.in_dim,
+            "neuron": self.neuron.copy(),
+            "activation_function": self.activation_function,
+            "resnet_dt": self.resnet_dt,
+            "bias": self.bias,
+            # make deterministic
+            "precision": np.dtype(PRECISION_DICT[self.precision]).name,
+            "layers": [layer.serialize() for layer in self.layers],
+        }
+
+    @classmethod
+    def deserialize(cls, data: dict) -> "EmbeddingNet":
+        """Deserialize the network from a dict.
+
+        Parameters
+        ----------
+        data : dict
+            The dict to deserialize from.
+        """
+        data = data.copy()
+        check_version_compatibility(data.pop("@version", 1), 2, 1)
+        data.pop("@class", None)
+        layers = data.pop("layers")
+        obj = cls(**data)
+        # Reinitialize layers from serialized data, using the same layer type
+        # that __init__ created (respects subclass overrides via MRO).
+        if obj.layers:
+            layer_type = type(obj.layers[0])
+            obj.layers = type(obj.layers)(
+                [layer_type.deserialize(layer) for layer in layers]
+            )
+        else:
+            obj.layers = type(obj.layers)([])
+        return obj
 
 
 def make_fitting_network(
```
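In the constructor above, `child_seed(seed, idx)` gives each layer its own reproducible seed derived from the parent seed. A hypothetical stand-in illustrating the idea (this is not the deepmd implementation of `child_seed`):

```python
import hashlib

def child_seed(seed, idx):
    """Hypothetical stand-in: derive a deterministic per-child seed.

    Returns None when no seed was given (fully random init); otherwise
    a stable value, so layer idx always receives the same seed.
    """
    if seed is None:
        return None
    digest = hashlib.sha256(repr((seed, idx)).encode()).hexdigest()
    return int(digest, 16) % (2**32)

# Same (seed, idx) always yields the same child seed...
assert child_seed(7, 0) == child_seed(7, 0)
# ...while different layers get different, independent seeds.
assert child_seed(7, 0) != child_seed(7, 1)
assert child_seed(None, 3) is None
```

Deriving per-layer seeds this way keeps weight initialization reproducible without forcing every layer to share one RNG stream.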

deepmd/pt/utils/env.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -34,7 +34,7 @@
     # only linux
     ncpus = len(os.sched_getaffinity(0))
 except AttributeError:
-    ncpus = os.cpu_count()
+    ncpus = os.cpu_count() or 1
 NUM_WORKERS = int(os.environ.get("NUM_WORKERS", min(4, ncpus)))
 if multiprocessing.get_start_method() != "fork":
     # spawn or forkserver does not support NUM_WORKERS > 0 for DataLoader
```

deepmd/pt_expt/utils/env.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -34,7 +34,7 @@
     # only linux
     ncpus = len(os.sched_getaffinity(0))
 except AttributeError:
-    ncpus = os.cpu_count()
+    ncpus = os.cpu_count() or 1
 NUM_WORKERS = int(os.environ.get("NUM_WORKERS", min(4, ncpus)))
 if multiprocessing.get_start_method() != "fork":
     # spawn or forkserver does not support NUM_WORKERS > 0 for DataLoader
```
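The `or 1` guard matters because `os.cpu_count()` is documented to return `None` when the CPU count cannot be determined, and `min(4, None)` raises a `TypeError`. A small sketch of the corrected logic (`num_workers` is a hypothetical helper mirroring the env.py expression, without the environment-variable lookup):

```python
def num_workers(cpu_count):
    """Mirror the env.py fallback: cap workers at 4, assume 1 core if unknown."""
    ncpus = cpu_count or 1  # os.cpu_count() may return None
    return min(4, ncpus)

assert num_workers(16) == 4   # capped at 4 workers
assert num_workers(2) == 2
assert num_workers(None) == 1  # without "or 1", min(4, None) raises TypeError
```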

deepmd/pt_expt/utils/network.py

Lines changed: 22 additions & 3 deletions
```diff
@@ -10,11 +10,11 @@
 from deepmd.dpmodel.common import (
     NativeOP,
 )
+from deepmd.dpmodel.utils.network import EmbeddingNet as EmbeddingNetDP
 from deepmd.dpmodel.utils.network import LayerNorm as LayerNormDP
 from deepmd.dpmodel.utils.network import NativeLayer as NativeLayerDP
 from deepmd.dpmodel.utils.network import NetworkCollection as NetworkCollectionDP
 from deepmd.dpmodel.utils.network import (
-    make_embedding_network,
     make_fitting_network,
     make_multilayer_network,
 )
@@ -91,8 +91,27 @@ def forward(self, x: torch.Tensor) -> torch.Tensor:
         return self.call(x)
 
 
-class EmbeddingNet(make_embedding_network(NativeNet, NativeLayer)):
-    pass
+class EmbeddingNet(EmbeddingNetDP, torch.nn.Module):
+    def __init__(self, *args: Any, **kwargs: Any) -> None:
+        torch.nn.Module.__init__(self)
+        EmbeddingNetDP.__init__(self, *args, **kwargs)
+        # EmbeddingNetDP.__init__ creates dpmodel NativeLayer instances.
+        # Convert to pt_expt NativeLayer and wrap in ModuleList.
+        self.layers = torch.nn.ModuleList(
+            [NativeLayer.deserialize(layer.serialize()) for layer in self.layers]
+        )
+
+    def __call__(self, *args: Any, **kwargs: Any) -> Any:
+        return torch.nn.Module.__call__(self, *args, **kwargs)
+
+    def forward(self, x: torch.Tensor) -> torch.Tensor:
+        return self.call(x)
+
+
+register_dpmodel_mapping(
+    EmbeddingNetDP,
+    lambda v: EmbeddingNet.deserialize(v.serialize()),
+)
 
 
 class FittingNet(make_fitting_network(EmbeddingNet, NativeNet, NativeLayer)):
```
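The `NativeLayer.deserialize(layer.serialize())` call in the wrapper uses the shared serialization format as a bridge between backends: any two classes that agree on the dict schema can be converted through it without importing each other. A toy sketch of the pattern (`NumpyLayer` and `TorchLikeLayer` are hypothetical names, not deepmd classes):

```python
class NumpyLayer:
    """Stand-in for a dpmodel layer holding plain arrays."""

    def __init__(self, w=0.5):
        self.w = w

    def serialize(self):
        return {"w": self.w}

    @classmethod
    def deserialize(cls, data):
        return cls(data["w"])


class TorchLikeLayer(NumpyLayer):
    """Stand-in for the pt_expt layer: same dict schema, different runtime type."""


# Round-tripping through the shared dict format converts the backend type
# while preserving the parameters.
dp_layer = NumpyLayer(w=1.5)
pt_layer = TorchLikeLayer.deserialize(dp_layer.serialize())
assert type(pt_layer) is TorchLikeLayer
assert pt_layer.w == 1.5
```

The design choice keeps the conversion logic in one place (the serialization schema) instead of scattering per-field copy code across every wrapper.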

source/tests/common/dpmodel/test_network.py

Lines changed: 108 additions & 0 deletions
```diff
@@ -180,6 +180,114 @@ def test_embedding_net(self) -> None:
         inp = np.ones([ni], dtype=get_xp_precision(np, prec))
         np.testing.assert_allclose(en0.call(inp), en1.call(inp))
 
+    def test_is_concrete_class(self) -> None:
+        """Verify EmbeddingNet is a concrete class, not factory-generated."""
+        in_dim = 4
+        neuron = [8, 16, 32]
+        net = EmbeddingNet(
+            in_dim=in_dim,
+            neuron=neuron,
+            activation_function="tanh",
+            resnet_dt=True,
+            precision="float64",
+        )
+        # Check it's the actual EmbeddingNet class, not a dynamic class
+        self.assertEqual(net.__class__.__name__, "EmbeddingNet")
+        self.assertEqual(net.__class__.__module__, "deepmd.dpmodel.utils.network")
+        # Verify it has the expected attributes
+        self.assertEqual(net.in_dim, in_dim)
+        self.assertEqual(net.neuron, neuron)
+        self.assertEqual(net.activation_function, "tanh")
+        self.assertEqual(net.resnet_dt, True)
+        self.assertEqual(len(net.layers), len(neuron))
+
+    def test_forward_pass(self) -> None:
+        """Test EmbeddingNet forward pass produces correct shapes."""
+        in_dim = 4
+        neuron = [8, 16, 32]
+        net = EmbeddingNet(
+            in_dim=in_dim,
+            neuron=neuron,
+            activation_function="tanh",
+            resnet_dt=True,
+            precision="float64",
+        )
+        rng = np.random.default_rng()
+        x = rng.standard_normal((5, in_dim))
+        out = net.call(x)
+        self.assertEqual(out.shape, (5, neuron[-1]))
+        self.assertEqual(out.dtype, np.float64)
+
+    def test_trainable_parameter_variants(self) -> None:
+        """Test EmbeddingNet with different trainable configurations."""
+        in_dim = 4
+        neuron = [8, 16]
+
+        # All trainable
+        net_trainable = EmbeddingNet(
+            in_dim=in_dim,
+            neuron=neuron,
+            trainable=True,
+        )
+        for layer in net_trainable.layers:
+            self.assertTrue(layer.trainable)
+
+        # All frozen
+        net_frozen = EmbeddingNet(
+            in_dim=in_dim,
+            neuron=neuron,
+            trainable=False,
+        )
+        for layer in net_frozen.layers:
+            self.assertFalse(layer.trainable)
+
+        # Mixed trainable
+        net_mixed = EmbeddingNet(
+            in_dim=in_dim,
+            neuron=neuron,
+            trainable=[True, False],
+        )
+        self.assertTrue(net_mixed.layers[0].trainable)
+        self.assertFalse(net_mixed.layers[1].trainable)
+
+    def test_empty_layers_round_trip(self) -> None:
+        """Test EmbeddingNet with empty neuron list (edge case for deserialize).
+
+        This tests the fix for IndexError when neuron=[] results in empty layers.
+        The deserialize method should handle this case without trying to access
+        layers[0] when the list is empty.
+        """
+        in_dim = 4
+        neuron = []  # Empty neuron list
+
+        # Create network with empty layers
+        net = EmbeddingNet(
+            in_dim=in_dim,
+            neuron=neuron,
+            activation_function="tanh",
+            resnet_dt=True,
+            precision="float64",
+        )
+
+        # Verify it has no layers
+        self.assertEqual(len(net.layers), 0)
+
+        # Serialize and deserialize
+        serialized = net.serialize()
+        net_restored = EmbeddingNet.deserialize(serialized)
+
+        # Verify restored network also has no layers
+        self.assertEqual(len(net_restored.layers), 0)
+        self.assertEqual(net_restored.in_dim, in_dim)
+        self.assertEqual(net_restored.neuron, neuron)
+
+        # Verify forward pass works (should return input unchanged)
+        rng = np.random.default_rng()
+        x = rng.standard_normal((5, in_dim))
+        out = net_restored.call(x)
+        # With no layers, output should equal input
+        np.testing.assert_allclose(out, x)
+
 
 class TestFittingNet(unittest.TestCase):
     def test_fitting_net(self) -> None:
```
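The empty-neuron edge case tested above holds because the forward pass is a fold over the layer list; with no layers, the input passes through unchanged. A toy version of that invariant (not the deepmd `call` implementation):

```python
def forward(layers, x):
    """Apply each layer in order; an empty network is the identity."""
    for layer in layers:
        x = layer(x)
    return x

double = lambda v: 2 * v
assert forward([double, double], 3) == 12
assert forward([], 3) == 3  # no layers: input returned unchanged
```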
