perf: use torch.embedding for type embedding (deepmodeling#4747)

caic99 · web-flow · commit 81761735682d · 2025-05-20T08:15:34.000Z
The backward pass of the indexing operation is costly on GPU. Using the dedicated `torch.embedding` operator mitigates this problem.

<details><summary>Profiling results</summary>
<p>

Before: 32 ms
<img width="612" alt="image" src="https://github.com/user-attachments/assets/6e2a4de1-433a-4b6a-8b59-a8458d66897c" />

---

After: 0.5 ms
<img width="334" alt="image" src="https://github.com/user-attachments/assets/199ac925-8382-4a43-a6bb-584bb60159b2" />

</p>
</details>

## Summary by CodeRabbit

- **Refactor**
  - Updated the type embedding lookup in the model to use `torch.embedding`. No changes to the user interface or method signatures.
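
A minimal sketch of why the swap is safe, assuming a plain 2-D weight table in place of the result of `self.embedding(atype.device)`. The helper names, table sizes, and index shape below are hypothetical and chosen only for illustration. It checks that advanced indexing and `torch.embedding` return the same values and the same gradients; it does not reproduce the GPU timings reported above.

```python
import torch


def lookup_by_index(table: torch.Tensor, atype: torch.Tensor) -> torch.Tensor:
    """Old path: advanced indexing into the embedding table."""
    return table[atype]


def lookup_by_embedding(table: torch.Tensor, atype: torch.Tensor) -> torch.Tensor:
    """New path: the dedicated embedding operator."""
    return torch.embedding(table, atype)


torch.manual_seed(0)
ntypes, dim = 4, 8  # hypothetical sizes, not taken from the PR
atype = torch.randint(0, ntypes, (2, 5))  # integer type indices

# Two independent copies of the same table so gradients can be compared.
table_a = torch.randn(ntypes, dim, requires_grad=True)
table_b = table_a.detach().clone().requires_grad_(True)

out_a = lookup_by_index(table_a, atype)
out_b = lookup_by_embedding(table_b, atype)

# Backpropagate the same scalar loss through both paths.
out_a.sum().backward()
out_b.sum().backward()

same_forward = torch.equal(out_a, out_b)
same_backward = torch.allclose(table_a.grad, table_b.grad)
print(same_forward, same_backward)
```

Both paths gather rows of the table by index, so only the kernels used for the forward and backward passes differ, which is consistent with the PR leaving method signatures unchanged.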
diff --git a/deepmd/pt/model/network/network.py b/deepmd/pt/model/network/network.py
@@ -286,7 +286,7 @@ def forward(self, atype):
         type_embedding:
 
         """
-        return self.embedding(atype.device)[atype]
+        return torch.embedding(self.embedding(atype.device), atype)
 
     def get_full_embedding(self, device: torch.device):
         """