Fix: Square atom_norm in non-Huber energy and virial loss calculations. #5332
base: master
@@ -90,6 +90,13 @@ class EnergyLoss(Loss):
         If true, use L2 norm of force vectors for loss calculation when loss_func='mae' or use_huber is True.
         Instead of computing loss on force components, computes loss on ||F_pred - F_label||_2.
         This treats the force vector as a whole rather than three independent components.
+    intensive : bool
+        If true, energy and virial losses are computed as intensive quantities,
+        normalized by the square of the number of atoms (1/N^2). This ensures the loss
+        value is independent of system size and consistent with per-atom RMSE reporting.
+        If false (default), uses the legacy normalization (1/N), which may cause the loss
+        to scale with system size. The default is false for backward compatibility with
+        models trained using deepmd-kit <= 3.0.1.
     **kwargs
         Other keyword arguments.
     """
@@ -116,6 +123,7 @@ def __init__(
         huber_delta: float | list[float] = 0.01,
         loss_func: str = "mse",
         f_use_norm: bool = False,
+        intensive: bool = False,
         **kwargs: Any,
     ) -> None:
         # Validate loss_func
@@ -155,6 +163,7 @@ def __init__(
         self.use_huber = use_huber
         self.huber_delta = huber_delta
         self.f_use_norm = f_use_norm
+        self.intensive = intensive
         if self.f_use_norm and not (self.use_huber or self.loss_func == "mae"):
             raise RuntimeError(
                 "f_use_norm can only be True when use_huber or loss_func='mae'."
@@ -256,11 +265,15 @@ def call(
         loss = 0
         more_loss = {}
+        # Normalization exponent controls loss scaling with system size:
+        # - norm_exp=2 (intensive=True): loss uses 1/N^2 scaling, making it independent of system size
+        # - norm_exp=1 (intensive=False, legacy): loss uses 1/N scaling, which varies with system size
+        norm_exp = 2 if self.intensive else 1
         if self.has_e:
             if self.loss_func == "mse":
                 l2_ener_loss = xp.mean(xp.square(energy - energy_hat))
                 if not self.use_huber:
-                    loss += atom_norm_ener * (pref_e * l2_ener_loss)
+                    loss += atom_norm_ener**norm_exp * (pref_e * l2_ener_loss)
             else:
                 l_huber_loss = custom_huber_loss(
                     atom_norm_ener * energy,
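As a hedged illustration (not code from this PR), the effect of the `norm_exp` toggle on the non-Huber MSE energy term can be sketched with NumPy. The helper name `mse_energy_term` and the synthetic numbers are assumptions for demonstration only:

```python
import numpy as np

def mse_energy_term(energy_pred, energy_label, natoms, intensive):
    """Hypothetical standalone version of the non-Huber MSE energy term.

    atom_norm = 1/natoms; norm_exp = 2 gives the intensive (1/N^2) scaling,
    norm_exp = 1 keeps the legacy (1/N) scaling.
    """
    atom_norm = 1.0 / natoms
    norm_exp = 2 if intensive else 1
    l2 = np.mean(np.square(energy_pred - energy_label))
    return atom_norm**norm_exp * l2

# A constant per-atom energy error of 0.1 in systems of different sizes:
for n in (10, 100):
    total_err = 0.1 * n  # total-energy error grows linearly with N
    legacy = mse_energy_term(np.array([total_err]), np.array([0.0]), n, intensive=False)
    intens = mse_energy_term(np.array([total_err]), np.array([0.0]), n, intensive=True)
    print(n, legacy, intens)
# legacy grows linearly with N (0.01 * N); intensive stays near 0.01 for both sizes
```

This shows why the squared normalization keeps the MSE term comparable across systems of different sizes, matching per-atom RMSE reporting.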
|
@@ -335,7 +348,7 @@ def call(
                     xp.square(virial_hat_reshape - virial_reshape),
                 )
                 if not self.use_huber:
-                    loss += atom_norm * (pref_v * l2_virial_loss)
+                    loss += atom_norm**norm_exp * (pref_v * l2_virial_loss)
                 else:
                     l_huber_loss = custom_huber_loss(
                         atom_norm * virial_reshape,
@@ -525,7 +538,7 @@ def serialize(self) -> dict:
         """
         return {
             "@class": "EnergyLoss",
-            "@version": 2,
+            "@version": 3,
             "starter_learning_rate": self.starter_learning_rate,
             "start_pref_e": self.start_pref_e,
             "limit_pref_e": self.limit_pref_e,
@@ -546,6 +559,7 @@ def serialize(self) -> dict:
             "huber_delta": self.huber_delta,
             "loss_func": self.loss_func,
             "f_use_norm": self.f_use_norm,
+            "intensive": self.intensive,
         }

     @classmethod
@@ -563,6 +577,10 @@ def deserialize(cls, data: dict) -> "Loss":
             The deserialized loss module
         """
         data = data.copy()
-        check_version_compatibility(data.pop("@version"), 2, 1)
+        version = data.pop("@version")
+        check_version_compatibility(version, 3, 1)
         data.pop("@class")
+        # Backward compatibility: version 1-2 used legacy normalization
+        if version < 3:
+            data.setdefault("intensive", False)
         return cls(**data)
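The version-gating pattern above can be sketched standalone. This is a minimal sketch, not the PR's code: `deserialize_loss` is a hypothetical helper, and the real `check_version_compatibility` is replaced by a plain range check:

```python
def deserialize_loss(data: dict) -> dict:
    """Sketch of version-gated deserialization with a backward-compatible default.

    Payloads serialized before version 3 predate the `intensive` flag, so the
    legacy (1/N) normalization is assumed for them.
    """
    data = data.copy()
    version = data.pop("@version")
    if not 1 <= version <= 3:  # stand-in for check_version_compatibility(version, 3, 1)
        raise ValueError(f"unsupported version {version}")
    data.pop("@class", None)
    if version < 3:
        data.setdefault("intensive", False)  # legacy normalization for old models
    return data

old = {"@class": "EnergyLoss", "@version": 2, "loss_func": "mse"}
print(deserialize_loss(old))  # {'loss_func': 'mse', 'intensive': False}
```

Using `setdefault` rather than unconditional assignment means an explicitly stored flag is never overwritten, while old payloads silently get the legacy behavior.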
(next file)
@@ -50,6 +50,13 @@ class EnergySpinLoss(Loss):
         if true, the energy will be computed as \sum_i c_i E_i
     loss_func : str
         Loss function type: 'mse' or 'mae'.
+    intensive : bool
+        If true, energy and virial losses are computed as intensive quantities,
+        normalized by the square of the number of atoms (1/N^2). This ensures the loss
+        value is independent of system size and consistent with per-atom RMSE reporting.
+        If false (default), uses the legacy normalization (1/N), which may cause the loss
+        to scale with system size. The default is false for backward compatibility with
+        models trained using deepmd-kit <= 3.0.1.
Suggested change (lines +54 to +59):

-    If true, energy and virial losses are computed as intensive quantities,
-    normalized by the square of the number of atoms (1/N^2). This ensures the loss
-    value is independent of system size and consistent with per-atom RMSE reporting.
-    If false (default), uses the legacy normalization (1/N), which may cause the loss to scale
-    with system size. The default is false for backward compatibility with models trained
-    using deepmd-kit <= 3.0.1.
+    If true, the MSE energy and virial terms use intensive normalization,
+    i.e. an additional normalization by the square of the number of atoms
+    (1/N^2) instead of the legacy (1/N) behavior. This keeps those MSE loss
+    terms consistent with per-atom RMSE reporting and less dependent on
+    system size. This option does not change the MAE formulation, which is
+    handled separately. The default is false for backward compatibility with
+    models trained using deepmd-kit <= 3.0.1.
Copilot AI commented on Apr 17, 2026:
Existing loss tests (e.g. source/tests/consistent/loss/test_ener_spin.py) don’t cover the new intensive option. Please extend tests to run with intensive=True and validate the expected 1/N² scaling for MSE energy/virial (and confirm serialize()/deserialize() preserve the flag/version bump).
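A regression test along the lines the reviewer asks for might look like the sketch below. This is a hedged sketch: `energy_loss_term` is a hypothetical standalone stand-in for the MSE energy term, not the actual EnergyLoss API:

```python
import numpy as np

def energy_loss_term(pred, label, natoms, intensive):
    # Hypothetical stand-in for the non-Huber MSE energy term.
    norm_exp = 2 if intensive else 1
    return (1.0 / natoms) ** norm_exp * np.mean((pred - label) ** 2)

def test_intensive_scaling():
    # Duplicating a system doubles N and the total-energy error.
    # The intensive (1/N^2) loss must be unchanged; the legacy (1/N) loss doubles.
    err, n = 0.5, 8
    small = energy_loss_term(np.array([err]), np.array([0.0]), n, intensive=True)
    big = energy_loss_term(np.array([2 * err]), np.array([0.0]), 2 * n, intensive=True)
    assert np.isclose(small, big)

    small_legacy = energy_loss_term(np.array([err]), np.array([0.0]), n, intensive=False)
    big_legacy = energy_loss_term(np.array([2 * err]), np.array([0.0]), 2 * n, intensive=False)
    assert np.isclose(big_legacy, 2 * small_legacy)

test_intensive_scaling()
```

Pinning the scaling behavior directly (rather than only comparing backends) would catch any future refactor that reverts the 1/N^2 normalization to 1/N.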
(next file)
@@ -61,6 +61,7 @@ def __init__(
         use_huber: bool = False,
         huber_delta: float | list[float] = 0.01,
         f_use_norm: bool = False,
+        intensive: bool = False,
         **kwargs: Any,
     ) -> None:
         r"""Construct a layer to compute loss on energy, force and virial.
@@ -119,6 +120,13 @@ def __init__(
     f_use_norm : bool
         If True, use L2 norm of force vectors for loss calculation.
         Not implemented in PD backend, only for serialization compatibility.
+    intensive : bool
+        If true, energy and virial losses are computed as intensive quantities,
+        normalized by the square of the number of atoms (1/N^2). This ensures the loss
+        value is independent of system size and consistent with per-atom RMSE reporting.
+        If false (default), uses the legacy normalization (1/N), which may cause the loss
+        to scale with system size. The default is false for backward compatibility with
+        models trained using deepmd-kit <= 3.0.1.
Suggested change (lines +124 to +129):

-    If true, energy and virial losses are computed as intensive quantities,
-    normalized by the square of the number of atoms (1/N^2). This ensures the loss
-    value is independent of system size and consistent with per-atom RMSE reporting.
-    If false (default), uses the legacy normalization (1/N), which may cause the loss to scale
-    with system size. The default is false for backward compatibility with models trained
-    using deepmd-kit <= 3.0.1.
+    Controls the normalization used for energy and virial losses in the
+    non-Huber MSE path. If true, those terms use intensive normalization
+    (1/N^2) instead of the legacy normalization (1/N), where N is the
+    number of atoms. This makes the corresponding MSE loss consistent with
+    per-atom RMSE reporting.
+    This flag does not change MAE normalization, which remains 1/N, and it
+    does not affect the Huber branch, which already operates on per-atom
+    residuals scaled by 1/N regardless of this setting. The default is false
+    for backward compatibility with models trained using deepmd-kit <= 3.0.1.
(next file)
@@ -61,6 +61,7 @@ def __init__(
         use_huber: bool = False,
         f_use_norm: bool = False,
         huber_delta: float | list[float] = 0.01,
+        intensive: bool = False,
         **kwargs: Any,
     ) -> None:
         r"""Construct a layer to compute loss on energy, force and virial.
@@ -120,6 +121,13 @@ def __init__(
         The threshold delta (D) used for Huber loss, controlling transition between
         L2 and L1 loss. It can be either one float shared by all terms or a list of
         three values ordered as [energy, force, virial].
+    intensive : bool
+        If true, energy and virial losses are computed as intensive quantities,
+        normalized by the square of the number of atoms (1/N^2). This ensures the loss
+        value is independent of system size and consistent with per-atom RMSE reporting.
+        If false (default), uses the legacy normalization (1/N), which may cause the loss
+        to scale with system size. The default is false for backward compatibility with
+        models trained using deepmd-kit <= 3.0.1.
Suggested change (lines +125 to +130):

-    If true, energy and virial losses are computed as intensive quantities,
-    normalized by the square of the number of atoms (1/N^2). This ensures the loss
-    value is independent of system size and consistent with per-atom RMSE reporting.
-    If false (default), uses the legacy normalization (1/N), which may cause the loss to scale
-    with system size. The default is false for backward compatibility with models trained
-    using deepmd-kit <= 3.0.1.
+    Controls size normalization for energy and virial loss terms. For the non-Huber
+    MSE path, setting this to true applies 1/N^2 scaling, while false uses the legacy
+    1/N scaling. For MAE, the normalization remains 1/N. For Huber loss, residuals are
+    first normalized by 1/N before applying the Huber formula, so this option does not
+    provide a pure 1/N versus 1/N^2 toggle in that path. The default is false for
+    backward compatibility with models trained using deepmd-kit <= 3.0.1.
Copilot AI commented on Mar 23, 2026:
There are tests comparing losses across backends, but they don’t assert the intended natoms-scaling explicitly. Since this change fixes a subtle normalization bug (MSE energy/virial terms should scale with 1/natoms^2), consider adding a small regression test that checks the loss contribution scales correctly when natoms changes, so future refactors can’t accidentally revert to 1/natoms.
@copilot apply changes based on this feedback
The `intensive` parameter doc says losses are normalized by 1/N² when enabled, but in code it only affects the non-Huber MSE energy/virial branches; MAE remains 1/N and the Huber path scales residuals by 1/N before applying Huber. Please narrow the wording so it accurately reflects which loss modes are impacted.