Fix: Square atom_norm in non-Huber energy and virial loss calculations. #5332

anyangml wants to merge 10 commits into deepmodeling:master from
Conversation
📝 Walkthrough

The PR updates the normalization factor in energy and virial loss calculations across four framework implementations (dpmodel, pd, pt, tf). For non-Huber MSE paths, the atomic normalization factor changes from `atom_norm` to `atom_norm**2` (i.e., from 1/N to 1/N²).
Estimated code review effort: 🎯 2 (Simple), ⏱️ ~10 minutes.

🚥 Pre-merge checks: ✅ 3 passed.
Pull request overview
Fixes normalization in energy/virial MSE loss terms so that per-atom scaling is consistent with the Huber-loss code paths and the reported per-atom RMSE metrics.
Changes:
- Square `atom_norm`/`atom_norm_ener` in non-Huber (MSE) energy loss accumulation.
- Square `atom_norm` in non-Huber (MSE) virial loss accumulation.
- Apply the same normalization fix across the TF, PyTorch, Paddle, and `dpmodel` backends for the updated loss implementations.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `deepmd/tf/loss/ener.py` | Squares atom normalization in non-Huber MSE energy/virial loss accumulation (`EnerStdLoss` path). |
| `deepmd/pt/loss/ener.py` | Squares `atom_norm` in non-Huber MSE energy and virial loss accumulation. |
| `deepmd/pd/loss/ener.py` | Squares `atom_norm` in non-Huber MSE energy and virial loss accumulation. |
| `deepmd/dpmodel/loss/ener.py` | Squares atom normalization in non-Huber MSE energy and virial loss accumulation in the array-API backend. |
```diff
 if not self.use_huber:
-    loss += atom_norm * (pref_e * l2_ener_loss)
+    loss += atom_norm**2 * (pref_e * l2_ener_loss)
 else:
```
There are tests comparing losses across backends, but they don’t assert the intended natoms-scaling explicitly. Since this change fixes a subtle normalization bug (MSE energy/virial terms should scale with 1/natoms^2), consider adding a small regression test that checks the loss contribution scales correctly when natoms changes, so future refactors can’t accidentally revert to 1/natoms.
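To make the requested scaling check concrete, here is a small standalone sketch (plain Python mirroring the names in the diff, not a call into the deepmd-kit API) of how the non-Huber MSE energy term behaves under the two normalizations for a fixed per-atom error:

```python
def ener_loss(natoms, per_atom_err, squared_norm):
    # For a fixed per-atom error, the total-energy error grows linearly
    # with system size, so its square grows as natoms**2.
    delta_e = natoms * per_atom_err
    l2_ener_loss = delta_e**2  # squared total-energy error
    atom_norm = 1.0 / natoms
    norm = atom_norm**2 if squared_norm else atom_norm
    return norm * l2_ener_loss

# Legacy 1/N scaling: the loss term still grows with system size.
assert ener_loss(100, 0.01, False) > 5 * ener_loss(10, 0.01, False)

# Squared 1/N**2 scaling: the loss term is intensive (size-independent).
assert abs(ener_loss(10, 0.01, True) - ener_loss(100, 0.01, True)) < 1e-12
```

Under the legacy scaling the term grows tenfold for a tenfold larger system; with the squared factor it stays constant, which is the behavior this PR enforces.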
```diff
@@ -370,7 +370,9 @@ def build(
     )
     if self.has_v:
         if not self.use_huber:
-            loss += global_cvt_2_ener_float(atom_norm * (pref_v * l2_virial_loss))
+            loss += global_cvt_2_ener_float(
+                atom_norm**2 * (pref_v * l2_virial_loss)
+            )
```
This updates EnerStdLoss's non-Huber MSE scaling to use `atom_norm**2` / `atom_norm_ener**2`, but the same file still has other non-Huber MSE loss accumulations using a linear `atom_norm` factor (e.g., `EnerSpinLoss.build` around `l2_loss += atom_norm_ener * ...` and the virial term with `atom_norm * ...`, and `EnerDipoleLoss.build` around `l2_loss += atom_norm_ener * ...`). That leaves the normalization inconsistent depending on which loss class is used, and it also conflicts with the per-atom RMSE reporting (`rmse_e` computed as `sqrt(l2_ener_loss)/natoms`). Please update those remaining energy/virial MSE paths to square the atom normalization as well, or clarify why those classes should behave differently.
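The consistency with the reported metric can be checked in a few lines; `rmse_e` below mirrors the `sqrt(l2_ener_loss)/natoms` reporting formula mentioned in the comment (a standalone sketch, not the deepmd-kit code):

```python
import math

natoms = 8
l2_ener_loss = 2.5  # mean squared total-energy error (arbitrary value)
pref_e = 1.0
atom_norm = 1.0 / natoms

# Reported per-atom RMSE, as in the review comment.
rmse_e = math.sqrt(l2_ener_loss) / natoms

# With the squared normalization, the accumulated loss term is exactly
# pref_e * rmse_e**2, so the loss and the reported metric agree.
loss_term = atom_norm**2 * (pref_e * l2_ener_loss)
assert math.isclose(loss_term, pref_e * rmse_e**2)
```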
Actionable comments posted: 1
🧹 Nitpick comments (1)

`deepmd/dpmodel/loss/ener.py` (1)

Line 248: Add a regression test for the new normalization rule. Because the displayed RMSE stays unchanged, this `1/N` vs `1/N^2` mistake is easy to reintroduce. A tiny `EnergyLoss.call()` case with `natoms > 1` and fixed energy/virial deltas would lock it down. Also applies to line 315.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@deepmd/dpmodel/loss/ener.py` at line 248, The energy normalization is currently prone to a 1/N vs 1/N^2 regression; add a unit/regression test targeting EnergyLoss.call() in deepmd/dpmodel/loss/ener.py that constructs a small batch with natoms > 1, fixed per-atom energy deltas and virials, and asserts the computed energy loss scales with 1/N (not 1/N^2) by comparing expected loss values for different natom counts; reference EnergyLoss.call() and the atom_norm_ener accumulation (the expression involving atom_norm_ener**2 * (pref_e * l2_ener_loss)) to ensure the test fails if the normalization is squared incorrectly and include an explicit check for both energy-only and energy+virial modes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@deepmd/tf/loss/ener.py`:
- Line 349: EnerSpinLoss.build and EnerDipoleLoss.build still use the old
single-factor 1/N normalization; update both to use the same squared per-atom
normalization pattern used in EnerStdLoss (i.e., multiply the loss term by
atom_norm_ener**2 * (pref_e * l2_ener_loss) instead of a single atom_norm_ener
factor) so all energy loss variants apply the corrected normalization
consistently; locate the energy-loss accumulation in EnerSpinLoss.build and
EnerDipoleLoss.build and replace the old single-factor scaling with the squared
atom_norm_ener**2 * (pref_e * l2_ener_loss) pattern used in EnerStdLoss.
---
Nitpick comments:
In `@deepmd/dpmodel/loss/ener.py`:
- Line 248: The energy normalization is currently prone to a 1/N vs 1/N^2
regression; add a unit/regression test targeting EnergyLoss.call() in
deepmd/dpmodel/loss/ener.py that constructs a small batch with natoms > 1, fixed
per-atom energy deltas and virials, and asserts the computed energy loss scales
with 1/N (not 1/N^2) by comparing expected loss values for different natom
counts; reference EnergyLoss.call() and the atom_norm_ener accumulation (the
expression involving atom_norm_ener**2 * (pref_e * l2_ener_loss)) to ensure the
test fails if the normalization is squared incorrectly and include an explicit
check for both energy-only and energy+virial modes.
ℹ️ Review info

⚙️ Run configuration: Repository UI · Review profile: CHILL · Plan: Pro · Run ID: a73a40cd-4fda-4353-8c33-d0685030359b
📒 Files selected for processing (4)

- `deepmd/dpmodel/loss/ener.py`
- `deepmd/pd/loss/ener.py`
- `deepmd/pt/loss/ener.py`
- `deepmd/tf/loss/ener.py`
Codecov Report

❌ Patch coverage is …

Additional details and impacted files:

```
@@            Coverage Diff            @@
##           master    #5332     +/-  ##
==========================================
- Coverage   80.35%   80.34%   -0.01%
==========================================
  Files         819      819
  Lines       85445    85491      +46
  Branches     4139     4139
==========================================
+ Hits        68661    68691      +30
- Misses      15508    15524      +16
  Partials     1276     1276
```

☔ View full report in Codecov by Sentry.
…alization Agent-Logs-Url: https://github.com/anyangml/deepmd-kit/sessions/98719546-6f9c-433e-a3de-dc65d98c68fb Co-authored-by: anyangml <137014849+anyangml@users.noreply.github.com>
Agent-Logs-Url: https://github.com/anyangml/deepmd-kit/sessions/98719546-6f9c-433e-a3de-dc65d98c68fb Co-authored-by: anyangml <137014849+anyangml@users.noreply.github.com>
Agent-Logs-Url: https://github.com/anyangml/deepmd-kit/sessions/8cff4193-b092-4923-afce-df5de29d63d3 Co-authored-by: anyangml <137014849+anyangml@users.noreply.github.com>
Agent-Logs-Url: https://github.com/anyangml/deepmd-kit/sessions/0be1bd64-1461-465b-8cc2-8bc019babc04 Co-authored-by: anyangml <137014849+anyangml@users.noreply.github.com>
Agent-Logs-Url: https://github.com/anyangml/deepmd-kit/sessions/6f9609a4-4571-42ca-b057-d44f7614df9c Co-authored-by: anyangml <137014849+anyangml@users.noreply.github.com>
…tion-fix feat(loss): add intensive parameter for backward-compatible RMSE normalization
Pull request overview
Copilot reviewed 7 out of 7 changed files in this pull request and generated 11 comments.
```python
"If true, energy and virial losses are computed as intensive quantities, "
"normalized by the square of the number of atoms (1/N^2). "
"This ensures the loss value is independent of system size and consistent with per-atom RMSE reporting. "
"If false (default), uses the legacy normalization (1/N), which may cause the loss to scale with system size. "
```
The new intensive flag doc implies energy/virial losses always switch between 1/N² and 1/N. In the implementations, the 1/N² scaling is only applied to the non-Huber MSE terms; MAE stays 1/N, and when use_huber=True the residual is already scaled by 1/N before applying Huber (so intensive may have no effect). Please tighten this doc to reflect the actual behavior and avoid confusing users.
Suggested change:

```python
"Controls intensive normalization for energy and virial loss terms in the current implementation. "
"For non-Huber MSE energy/virial terms, setting this to true uses 1/N^2 normalization instead of the legacy 1/N scaling. "
"This matches per-atom-style reporting more closely for those terms. "
"For MAE, the normalization remains 1/N. When `use_huber=True`, the residual is already scaled by 1/N before applying the Huber loss, "
"so this flag may have limited or no effect for those terms. "
```
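As background for why the Huber branch is largely unaffected, here is a standalone sketch of the standard Huber formula applied to a residual that has already been scaled by 1/N (this mirrors the behavior described in the comment, not the deepmd-kit implementation itself):

```python
def huber(residual, delta):
    # Standard Huber loss: quadratic near zero, linear in the tails.
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)

# The residual is pre-scaled by 1/N, making it a per-atom quantity, so for
# a fixed per-atom error the Huber term is already size-independent and an
# extra intensive toggle has nothing left to normalize.
per_atom_err = 0.3
for natoms in (10, 100):
    residual = (natoms * per_atom_err) / natoms  # the 1/N pre-scaling
    assert abs(huber(residual, 1.0) - huber(per_atom_err, 1.0)) < 1e-12
```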
```python
"If true, energy and virial losses are computed as intensive quantities, "
"normalized by the square of the number of atoms (1/N^2). "
"This ensures the loss value is independent of system size and consistent with per-atom RMSE reporting. "
"If false (default), uses the legacy normalization (1/N), which may cause the loss to scale with system size. "
```
Same as in loss_ener(): the intensive doc currently states 1/N² vs 1/N generally, but in code the scaling change applies to MSE energy/virial terms; MAE remains 1/N. Please update the help text so the normalization behavior is described accurately for both loss functions.
Suggested change:

```python
"Controls normalization of the energy and virial loss terms. "
"For `loss_func='mse'`, if true, energy and virial losses are computed as intensive quantities, "
"normalized by the square of the number of atoms (1/N^2); if false (default), the legacy normalization "
"(1/N) is used. "
"For `loss_func='mae'`, energy and virial losses remain normalized by the number of atoms (1/N). "
```
```text
If true, energy and virial losses are computed as intensive quantities,
normalized by the square of the number of atoms (1/N^2). This ensures the loss
value is independent of system size and consistent with per-atom RMSE reporting.
If false (default), uses the legacy normalization (1/N), which may cause the loss to scale
with system size. The default is false for backward compatibility with models trained
using deepmd-kit <= 3.0.1.
```
Docstring for intensive suggests it toggles legacy 1/N vs 1/N² generally. In this TF loss, the scaling change is only applied in the non-Huber MSE branch; with use_huber=True the residual is still normalized by 1/N before applying Huber (so intensive may not change behavior). Please adjust the docstring to match the actual normalization paths.
Suggested change:

```text
Controls the normalization used for energy and virial terms in the non-Huber
MSE branch of this TF loss. If true, that branch uses intensive normalization
by the square of the number of atoms (1/N^2); if false (default), it uses the
legacy normalization (1/N). When ``use_huber=True``, the residual is still
normalized by 1/N before applying the Huber loss, so ``intensive`` may not
change behavior in that path. The default is false for backward compatibility
with models trained using deepmd-kit <= 3.0.1.
```
```text
If true, energy and virial losses are computed as intensive quantities,
normalized by the square of the number of atoms (1/N^2). This ensures the loss
value is independent of system size and consistent with per-atom RMSE reporting.
If false (default), uses the legacy normalization (1/N), which may cause the loss to scale
with system size. The default is false for backward compatibility with models trained
using deepmd-kit <= 3.0.1.
```
The intensive docstring currently presents normalization as a simple 1/N² vs 1/N toggle. In practice here, the new exponent is only used for non-Huber MSE energy/virial; MAE remains 1/N and the Huber branch already works on per-atom residuals (scaled by 1/N) regardless. Please update the doc to reflect these specifics so users know when intensive actually changes the loss.
Suggested change:

```text
Controls the normalization used for energy and virial losses in the
non-Huber MSE path. If true, those terms use intensive normalization
(1/N^2) instead of the legacy normalization (1/N), where N is the
number of atoms. This makes the corresponding MSE loss consistent with
per-atom RMSE reporting.

This flag does not change MAE normalization, which remains 1/N, and it
does not affect the Huber branch, which already operates on per-atom
residuals scaled by 1/N regardless of this setting. The default is false
for backward compatibility with models trained using deepmd-kit <= 3.0.1.
```
```text
If true, energy and virial losses are computed as intensive quantities,
normalized by the square of the number of atoms (1/N^2). This ensures the loss
value is independent of system size and consistent with per-atom RMSE reporting.
If false (default), uses the legacy normalization (1/N), which may cause the loss to scale
with system size. The default is false for backward compatibility with models trained
using deepmd-kit <= 3.0.1.
```
The intensive parameter doc says losses are normalized by 1/N² when enabled, but in code it only affects the non-Huber MSE energy/virial branches; MAE remains 1/N and the Huber path scales residuals by 1/N before applying Huber. Please narrow the wording so it accurately reflects which loss modes are impacted.
Suggested change:

```text
If true, the non-Huber MSE energy and virial losses use intensive normalization,
i.e. a 1/N^2 factor instead of the legacy 1/N scaling. This matches per-atom
RMSE-style normalization for those terms. MAE and Huber modes use different
scaling and are not affected in the same way by this flag.
If false (default), the legacy normalization is used for the affected terms.
The default is false for backward compatibility with models trained using
deepmd-kit <= 3.0.1.
```
```python
# RMSE normalization exponent:
# - norm_exp=2 (intensive=True): loss uses 1/N² scaling, making it independent of system size
# - norm_exp=1 (intensive=False, legacy): loss uses 1/N scaling, which varies with system size
norm_exp = 2 if self.intensive else 1
pref_e = global_cvt_2_ener_float(
```
This comment says “RMSE normalization exponent”, but norm_exp is used to scale the loss terms (energy/virial) rather than the RMSE values. Renaming this comment to “loss normalization exponent” (or similar) would avoid confusion for future readers.
```diff
+# Normalization exponent controls loss scaling with system size:
+# - norm_exp=2 (intensive=True): loss uses 1/N² scaling, making it independent of system size
+# - norm_exp=1 (intensive=False, legacy): loss uses 1/N scaling, which varies with system size
+norm_exp = 2 if self.intensive else 1
 if self.has_e:
     if self.loss_func == "mse":
         l2_ener_loss = xp.mean(xp.square(energy - energy_hat))
         if not self.use_huber:
-            loss += atom_norm_ener * (pref_e * l2_ener_loss)
+            loss += atom_norm_ener**norm_exp * (pref_e * l2_ener_loss)
```
There are existing cross-backend loss consistency tests (e.g. source/tests/consistent/loss/test_ener.py) but they don’t exercise the new intensive normalization. Please add test coverage that toggles intensive=True and verifies the expected 1/N² scaling for the non-Huber MSE energy/virial terms (and that serialization round-trips preserve the flag).
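A regression test along the requested lines could use a stand-in for the accumulation formula; the name `mse_ener_term` and the pytest-style layout are illustrative, not the deepmd-kit API:

```python
def mse_ener_term(natoms, l2_ener_loss, pref_e, intensive):
    # Mirrors the quoted code path: atom_norm**norm_exp * (pref_e * l2)
    atom_norm = 1.0 / natoms
    norm_exp = 2 if intensive else 1
    return atom_norm**norm_exp * (pref_e * l2_ener_loss)

def test_intensive_scaling():
    per_atom_err = 0.05
    # For a fixed per-atom error, the total-energy MSE grows as natoms**2.
    for n_small, n_large in [(4, 8), (10, 100)]:
        small = mse_ener_term(n_small, (n_small * per_atom_err) ** 2, 1.0, True)
        large = mse_ener_term(n_large, (n_large * per_atom_err) ** 2, 1.0, True)
        # intensive=True: the term is independent of system size.
        assert abs(small - large) < 1e-12
    # intensive=False (legacy): the term grows with system size.
    legacy_small = mse_ener_term(4, (4 * per_atom_err) ** 2, 1.0, False)
    legacy_large = mse_ener_term(8, (8 * per_atom_err) ** 2, 1.0, False)
    assert legacy_large > legacy_small

test_intensive_scaling()
```

A real test against `EnergyLoss` would substitute the actual loss call for `mse_ener_term` while keeping the same scaling assertions.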
```text
If true, energy and virial losses are computed as intensive quantities,
normalized by the square of the number of atoms (1/N^2). This ensures the loss
value is independent of system size and consistent with per-atom RMSE reporting.
If false (default), uses the legacy normalization (1/N), which may cause the loss to scale
with system size. The default is false for backward compatibility with models trained
using deepmd-kit <= 3.0.1.
```
The intensive docstring says energy/virial loss is normalized by 1/N² when enabled. In this implementation, 1/N² scaling is applied to the non-Huber MSE energy/virial terms; MAE remains 1/N and the Huber path normalizes residuals by 1/N before applying Huber (so intensive does not provide a pure 1/N vs 1/N² toggle there). Please clarify the docstring accordingly.
Suggested change:

```text
Controls size normalization for energy and virial loss terms. For the non-Huber
MSE path, setting this to true applies 1/N^2 scaling, while false uses the legacy
1/N scaling. For MAE, the normalization remains 1/N. For Huber loss, residuals are
first normalized by 1/N before applying the Huber formula, so this option does not
provide a pure 1/N versus 1/N^2 toggle in that path. The default is false for
backward compatibility with models trained using deepmd-kit <= 3.0.1.
```
```text
If true, energy and virial losses are computed as intensive quantities,
normalized by the square of the number of atoms (1/N^2). This ensures the loss
value is independent of system size and consistent with per-atom RMSE reporting.
If false (default), uses the legacy normalization (1/N), which may cause the loss to scale
with system size. The default is false for backward compatibility with models trained
using deepmd-kit <= 3.0.1.
```
EnergySpinLoss supports MAE, but the intensive doc currently claims 1/N² normalization generally. In the code, intensive only changes the MSE energy/virial normalization via atom_norm**norm_exp; MAE is handled differently. Please update the docstring to state explicitly which loss functions/terms are affected.
Suggested change:

```text
If true, the MSE energy and virial terms use intensive normalization,
i.e. an additional normalization by the square of the number of atoms
(1/N^2) instead of the legacy (1/N) behavior. This keeps those MSE loss
terms consistent with per-atom RMSE reporting and less dependent on
system size. This option does not change the MAE formulation, which is
handled separately. The default is false for backward compatibility with
models trained using deepmd-kit <= 3.0.1.
```
```diff
@@ -130,7 +143,7 @@ def call(
     energy_pred = xp.sum(atom_ener_coeff * atom_ener_pred, axis=1)
     if self.loss_func == "mse":
         l2_ener_loss = xp.mean(xp.square(energy_pred - energy_label))
-        loss += atom_norm * (pref_e * l2_ener_loss)
+        loss += atom_norm**norm_exp * (pref_e * l2_ener_loss)
```
Existing loss tests (e.g. source/tests/consistent/loss/test_ener_spin.py) don’t cover the new intensive option. Please extend tests to run with intensive=True and validate the expected 1/N² scaling for MSE energy/virial (and confirm serialize()/deserialize() preserve the flag/version bump).
Some AI suggestions need to be resolved. Others LGTM.
This normalizes all L2 losses so that in multitask training the scale of the gradients becomes independent of system size. Do we want to update `pref` in the examples? @iProzd @njzjz
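The gradient-scale point can be illustrated with a toy one-parameter model where every atom contributes `theta` to the total energy (a hedged sketch: real gradients flow through a network, but the prefactor algebra is the same):

```python
def grad_wrt_theta(natoms, theta, theta_star, norm_exp):
    # Toy model: E_pred = natoms * theta, so dE_pred/dtheta = natoms.
    delta_e = natoms * (theta - theta_star)
    atom_norm = 1.0 / natoms
    # d/dtheta [ atom_norm**norm_exp * delta_e**2 ]
    return 2.0 * atom_norm**norm_exp * delta_e * natoms

# Legacy 1/N: the gradient grows linearly with system size.
g1_small = grad_wrt_theta(10, 0.2, 0.1, 1)
g1_large = grad_wrt_theta(100, 0.2, 0.1, 1)
assert abs(g1_large / g1_small - 10.0) < 1e-6

# Intensive 1/N**2: the gradient magnitude is size-independent, so
# multitask loss prefactors need not be retuned per system size.
g2_small = grad_wrt_theta(10, 0.2, 0.1, 2)
g2_large = grad_wrt_theta(100, 0.2, 0.1, 2)
assert abs(g2_small - g2_large) < 1e-9
```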
Summary by CodeRabbit
Bug Fixes