
Add OrbMol-v2 with learnable electrostatics #162

Merged
vsimkus merged 27 commits into orbital-materials:main from timduignan:orbmol-v2-port
May 26, 2026

Conversation

@timduignan
Contributor

@timduignan timduignan commented May 7, 2026

Summary

Adds OrbMol-v2, which extends the OrbMol architecture with learnable per-atom electrostatics. A LatentChargeHead predicts charges that satisfy the system total-charge constraint, and a new CoulombModule adds long-range Coulomb energy on top of the GNN: a bare 1/r direct sum for non-periodic systems, and Particle Mesh Ewald via nvalchemiops for periodic ones. The energy head (ChargeConditionedEnergyHead) is conditioned on the predicted per-atom charges and spins.

The published checkpoint is at https://huggingface.co/orbital-materials/orbmol-v2, verified to reproduce internal reference values for H₂O and Cu fcc to ≤1e-5 eV / eV/Å / eV/ų on both CPU and H100.
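
As a rough illustration of the two ingredients named in the summary, the sketch below shifts raw per-atom charges so that they sum to the system total charge, then evaluates a bare 1/r direct sum for a non-periodic system. It is a pure-Python stand-in, not the LatentChargeHead or CoulombModule code. The function names are hypothetical, the uniform shift is just one simple way to impose the constraint, and 14.3996 eV·Å is the standard value of e²/(4πε₀), assumed here to correspond to the module's COULOMB_CONSTANT.

```python
import math

# e^2 / (4 * pi * eps0) in eV * Angstrom; assumed to match COULOMB_CONSTANT.
COULOMB_CONSTANT = 14.3996


def enforce_total_charge(raw_charges, total_charge):
    """Shift every charge equally so the charges sum to total_charge."""
    shift = (sum(raw_charges) - total_charge) / len(raw_charges)
    return [q - shift for q in raw_charges]


def direct_coulomb_energy(positions, charges):
    """Bare 1/r sum over unique atom pairs (non-periodic systems only)."""
    energy = 0.0
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            r = math.dist(positions[i], positions[j])
            energy += COULOMB_CONSTANT * charges[i] * charges[j] / r
    return energy


# Illustrative water-like geometry (Angstrom) and made-up raw charges.
positions = [(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)]
charges = enforce_total_charge([-0.7, 0.4, 0.4], total_charge=0.0)
print(charges, direct_coulomb_energy(positions, charges))
```

The double loop is O(N²) and has no periodic images, which is why a mesh-based Ewald method is the natural choice for the periodic path.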

What changed

New files

  • orb_models/forcefield/models/coulomb_module.py: CoulombModule, direct + PME paths
  • tests/forcefield/test_orbmol_v2_smoke.py — opt-in network test (ORB_RUN_NETWORK_TESTS=1) checking H₂O / Cu predictions against reference values

New classes in existing files

  • forcefield_heads.py: ChargeConditionedEnergyHead, LatentChargeHead, LatentSpinHead
  • pretrained.py: orbmol_v2() loader (s3-hosted weights), registry entry

Surgical edits

  • conservative_regressor.py: new coulomb_module and pair_repulsion_node_aggregation kwargs; predicts charges/spins before energy; bifurcated forward path for ChargeConditionedEnergyHead; adds Coulomb energy and explicit forces/virial
  • pyproject.toml: bump nvalchemi-toolkit-ops to >=0.3.1,<0.4 (PME hybrid_forces API)
  • MODELS.md: documents orbmol-v2

Backwards compatibility

  • EnergyHead and other base heads are untouched — same forward, same predict(), same denormalize(). All existing v3 conservative omol/omat/mpa models behave identically.
  • ZBLBasis default changed to node_aggregation="sum". All existing architectures (orb-v3-conservative) keep their training-time aggregation ("mean").
  • 4 BC guard tests added.

Test plan

  • All 92 forcefield unit tests pass on CPU (88 existing + 4 new BC guards)
  • Network smoke test reproduces reference H₂O / Cu values from a fresh clone, on both Mac CPU and H100 GPU

Comment thread orb_models/forcefield/models/conservative_regressor.py Outdated
Comment thread orb_models/forcefield/models/conservative_regressor.py Outdated
Comment thread orb_models/forcefield/models/conservative_regressor.py Outdated
Comment thread orb_models/forcefield/models/conservative_regressor.py Outdated
Comment thread orb_models/forcefield/models/forcefield_heads.py
Comment thread orb_models/forcefield/models/conservative_regressor.py Outdated
Comment thread orb_models/forcefield/models/coulomb_module.py Outdated
Comment thread orb_models/forcefield/pretrained.py
Comment thread orb_models/forcefield/pretrained.py Outdated
Comment thread orb_models/forcefield/pretrained.py Outdated
Comment thread MODELS.md Outdated
Comment thread MODELS.md Outdated
Comment thread README.md Outdated
Comment thread README.md Outdated
timduignan and others added 8 commits May 7, 2026 19:28
Wholesale port from the reference codebase (after the ZBL-sum and Coulomb
constant fixes), targeting only the s11doh8x:v199 public release. Drops all
backwards compatibility with prior electrostatics_config checkpoints and skips
the internal-only Fukui, global_context, and self_message features.

New:
  - orb_models/forcefield/models/coulomb_module.py: CoulombModule + direct/PME
  - scripts/convert_orbmol_v2_ckpt.py: extract EMA-applied flat state_dict
    from wandb-format checkpoint (orbmol_v2() expects flat state_dicts per
    orb-models S3 convention)

Modified:
  - forcefield_heads.py: ChargeConditionedEnergyHead, LatentChargeHead,
    LatentSpinHead. Added EnergyHead.absolute_energy() helper.
  - conservative_regressor.py: coulomb_module field, latent_charges/spins
    predicted before energy, ChargeConditionedEnergyHead path, Coulomb
    energy + explicit forces/virial plumbing.
  - pair_repulsion.py: default node_aggregation "mean" -> "sum".
  - pretrained.py: orbmol_v2_architecture() and orbmol_v2() loader
    mirroring the source codebase. CoulombModule() defaults (no
    erf damping); enforce_total_charge=True; no coulomb_constant override.

Verified against internal reference (gold) values for s11doh8x:v199 (with EMA
applied, CPU fp32):
  - H2O energy: -2079.86339 eV (diff 2.7e-7)
  - H2O forces[0]: matches all components within 4e-6 eV/A
  - Cu fcc energy: -178549.38592 eV (relative diff ~6e-10)
  - Cu fcc stress (Voigt 6): matches within 5e-6 eV/A^3
All 88 existing forcefield tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- Revert pair_repulsion.py default to "mean" (preserves BC for older
  orb-v3 conservative models trained pre ZBL-sum cutoff).
- ConservativeForcefieldRegressor accepts pair_repulsion_node_aggregation
  kwarg (defaults to "mean"); orbmol_v2_architecture passes "sum" explicitly.
- EnergyHead.absolute_energy: drop fp64 arg, always do the addition in
  fp64 and return fp64. OMol references reach ~1e5 eV so kJ/mol resolution
  requires fp64; option only added confusion. Used only by ChargeConditioned
  path so legacy heads are unaffected.
- ConservativeForcefieldRegressor.predict drops fp64_energy arg accordingly.
- Delete scripts/convert_orbmol_v2_ckpt.py — core/scripts/misc/export_model.py
  is the existing tool used for all other public orb-models releases.
- Update orbmol_v2() docstring to point at core's export_model.py.

Re-verified gold values match (H2O 2.7e-7; Cu fp32-noise level on energy,
1e-6 on stress). All 88 forcefield tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
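
The fp64 argument in the commit above can be checked with a little arithmetic. The sketch below uses only the standard library to measure the fp32 grid spacing at an energy of roughly 1.8e5 eV and compares it with 1 kJ/mol (1 eV is about 96.485 kJ/mol). The reference and interaction values are illustrative, chosen to sit on the scale of the Cu fcc energy quoted in this PR; the helper names are hypothetical.

```python
import struct

EV_PER_KJ_PER_MOL = 1.0 / 96.485  # about 0.0104 eV


def to_float32(x):
    """Round a Python float (fp64) to the nearest fp32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def float32_spacing(x):
    """Gap between |x| rounded to fp32 and the next larger fp32 value."""
    bits = struct.unpack("<I", struct.pack("<f", abs(x)))[0]
    next_up = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return next_up - to_float32(abs(x))


reference = -178549.0  # illustrative reference energy in eV
interaction = 0.005    # hypothetical small learned contribution in eV

print(float32_spacing(reference))           # fp32 grid spacing at this magnitude
print(EV_PER_KJ_PER_MOL)                    # 1 kJ/mol expressed in eV
print(to_float32(reference + interaction))  # sum after rounding to fp32
print(reference + interaction)              # same sum kept in fp64
```

At this magnitude the fp32 spacing is coarser than 1 kJ/mol, so adding the reference energy in fp32 can discard chemically relevant differences, whereas the fp64 sum keeps them.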
- test_pair_repulsion_default_aggregation_is_mean: catches anyone
  changing the regressor default from "mean" to "sum", which would
  silently break all public orb-v3 conservative models on reload.
- test_pair_repulsion_sum_when_specified: confirms orbmol-v2 opt-in
  via the kwarg works.
- test_energy_head_does_not_have_absolute_energy: guards against the
  fp64-promoting helper migrating onto the base EnergyHead, where it
  would alter v3 conservative omol/omat/mpa predictions.
- test_orbmol_v2_architecture_uses_sum_zbl: integration check that
  the architecture wires CoulombModule + sum-aggregation ZBL +
  latent_charges/latent_spins heads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
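
For readers unfamiliar with this style of guard, here is a minimal sketch of how the first two tests could be shaped. `build_regressor` is a hypothetical stand-in for the real constructor, used so the example runs without orb-models installed; only the kwarg name and its two values come from this PR.

```python
import inspect


def build_regressor(pair_repulsion_node_aggregation="mean"):
    """Hypothetical stand-in for the regressor constructor."""
    return {"pair_repulsion_node_aggregation": pair_repulsion_node_aggregation}


def test_default_aggregation_is_mean():
    # Inspect the signature so the test fails if someone edits the default.
    params = inspect.signature(build_regressor).parameters
    assert params["pair_repulsion_node_aggregation"].default == "mean"


def test_sum_when_specified():
    # The newer architecture opts in explicitly instead of relying on a default.
    model = build_regressor(pair_repulsion_node_aggregation="sum")
    assert model["pair_repulsion_node_aggregation"] == "sum"
```

Checking the signature default, rather than a built model, pins the contract that older checkpoints silently rely on.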
- Default weights_path now points at HF (orbital-materials/orbmol-v2/resolve/main).
  Matches the date-in-filename versioning convention used by other public orb-models
  S3 ckpts; no separate revision pin (HF main behaves like a stable S3 path).
- Bump nvalchemi-toolkit-ops>=0.3.1: the orbmol_v2 PME path uses
  particle_mesh_ewald(..., hybrid_forces=True) which only exists on >=0.3.1.
- New tests/forcefield/test_orbmol_v2_smoke.py: end-to-end check that the
  published weights produce the gold H2O / Cu predictions from
  internal reference tests.
  Gated by ORB_RUN_NETWORK_TESTS=1 since it downloads ~100 MB; opt-in only.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Upstream bumped APIs between 0.3.0 and 0.3.1 (added the hybrid_forces
parameter we depend on for PME). Upper-bounding at the next minor
prevents 0.4.x from silently breaking the orbmol-v2 PME path on user
upgrades — matches the bound style of other deps internally.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds an entry under OrbMol Models describing the learnable electrostatics
extension (LatentChargeHead, LatentSpinHead, CoulombModule, and the
ChargeConditionedEnergyHead) with usage example.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…cy note

s11doh8x trains with CoulombModule defaults (no erf damping); the
non-periodic Coulomb path is bare 1/r, not erf-damped. Also drops the
size-consistency wording from the head description.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- README.md: add orbmol-v2 release note in the "What's new" section
- coulomb_module.py: fix misleading docstrings (the non-periodic path is
  bare 1/r when sigma is None, not erf-damped); update stale 14.33 →
  14.40 reference for COULOMB_CONSTANT; minor typo fix
- forcefield_heads.py: drop size-consistency claims from
  ChargeConditionedEnergyHead docstrings (separate concern; not asserted
  in the public release)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@vsimkus vsimkus force-pushed the orbmol-v2-port branch 3 times, most recently from a7587b9 to 7dfdd99 Compare May 7, 2026 22:37
- Don't assert per-atom spins are physical observables (we haven't
  validated this; reword as "auxiliary per-atom features that take the
  system's spin multiplicity into account") in MODELS.md and README.md.
- Add a note in the README orbmol-v2 update that energies are now fp64
  by default for kJ/mol resolution against OMol25-scale references; opt
  out via fp64_energy=False on predict().

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- MODELS.md and README.md: rephrase LatentChargeHead/LatentSpinHead as
  predicting per-atom *latent features* (constrained at the system level)
  rather than asserting per-atom charges and spins. Caveat blockquote
  added in the previous commit explains the emergent nature.
- pyproject.toml: bump dev `torch-sim-atomistic` from >=0.5.1 to >=0.6.0.
  The 0.6.0 release adds `SimState.has_extras()`, which the existing
  `test_forcefield_adapter_parses_spin_and_charge_from_simstate` test
  already calls; 0.5.2 (allowed by the old constraint) does not have
  this method. All 121 forcefield tests now pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread README.md Outdated
Comment thread README.md Outdated
Per Ben's review feedback (PR orbital-materials#162 thread):
- First bullet now leads with the CoulombModule, not the latent heads
- Adds Speed (H100 QPS at 1k/10k atoms, periodic systems) and Accuracy
  (GSCDB138 Normalized Error Ratio, ex single-atom-species reactions)
  highlights
- Moves the LatentChargeHead / LatentSpinHead detail and the Caution
  blockquote out of the README; they remain in MODELS.md
- Links out to MODELS.md for the full architecture description

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
timduignan and others added 9 commits May 22, 2026 16:05
aggregate_nodes(..., reduction="mean") now forwards n_node directly to
scatter_mean, which takes pre-computed group sizes instead of building
a divisor inside the graph from scatter_sum(ones, ...). The old form
triggered a torch.compile + autograd miscompile (~1 eV on H2O / ethanol
/ NaCl). Math is bit-identical in eager.

Drops segment_mean() and its test (unused after this change).

Ports orbital-materials/orb#3074.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
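
The difference between the two forms of the per-graph mean described above can be sketched in plain Python. This is not the torch implementation; the function names are hypothetical and lists stand in for tensors. It only shows that dividing by the known `n_node` counts gives the same result as rebuilding the divisor from a scatter-sum of ones.

```python
def scatter_sum(values, segment_ids, num_segments):
    """Sum values into one bucket per segment (graph)."""
    totals = [0.0] * num_segments
    for value, segment in zip(values, segment_ids):
        totals[segment] += value
    return totals


def mean_with_counted_divisor(values, segment_ids, num_segments):
    """Old form: the divisor is rebuilt by scatter-summing ones."""
    sums = scatter_sum(values, segment_ids, num_segments)
    counts = scatter_sum([1.0] * len(values), segment_ids, num_segments)
    return [s / c for s, c in zip(sums, counts)]


def mean_with_n_node(values, segment_ids, n_node):
    """New form: divide by the pre-computed per-graph node counts."""
    sums = scatter_sum(values, segment_ids, len(n_node))
    return [s / n for s, n in zip(sums, n_node)]


values = [1.0, 2.0, 3.0, 10.0, 20.0]
segment_ids = [0, 0, 0, 1, 1]  # node -> graph index
n_node = [3, 2]                # nodes per graph, already known from the batch

print(mean_with_counted_divisor(values, segment_ids, 2))
print(mean_with_n_node(values, segment_ids, n_node))
```

Because `n_node` is an input rather than something computed inside the traced graph, the second form removes one scatter operation from the region that the compiler and autograd have to handle.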
LatentChargeHead / LatentSpinHead centering now uses repeat_interleave
over n_node instead of gather-by-node_batch_index. Compiles cleanly
under dynamic shapes. conservative_regressor drops the in-place
interaction_energy += coulomb_energy. pair_repulsion casts the poly
cutoff exponent with p.to(torch.float32) instead of float(p), which
avoids a Tensor.item() graph break, and zeros stress entries beyond
1e10.

Ports orbital-materials/orb#3074.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Removes the backbone-only .compile() overrides on the
{Conservative,Direct}ForcefieldRegressor classes. model.compile(...)
now wraps the full forward via standard nn.Module.compile(). Note: the
GraphRegressor.compile() override was already absent in public, so no
change there.

CoulombModule's PME path is split into two @torch.compiler.disable
helpers (_particle_mesh_ewald and _estimate_pme_params_and_neighbors)
so dynamo skips nvalchemiops's ctypes-using PME entirely while the
rest of the regressor compiles.

End-to-end (per core PR): ~1.5-1.6x faster inference (1.62x at 10k
atoms, single H100); compiled-vs-eager numerical diff at float-noise
level.

Adds tests/forcefield/test_direct_regressor.py and
tests/expensive_tests/test_compilation.py; updates conftest and
test_conservative compile tests. Adds scripts/compile_numerical_check.py
for offline diagnosis (adapted to public pretrained.orbmol_v2 loader).

Ports orbital-materials/orb#3074.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Updates weights_path to the new teqabfhg checkpoint
(orbmol-v2-teqabfhg-20260523.ckpt) which has no LatentSpinHead. Adds
use_per_atom_spins to orb_v3_conservative_architecture (default False)
so the spin head and ChargeConditionedEnergyHead.use_spins are
controlled together. System-level charge/spin conditioning via
ChargeSpinConditioner is unchanged.

A future spin-enabled checkpoint can pass use_per_atom_spins=True from
its pretrained.* entry without further architecture changes.

Updates test_backwards_compatibility goldens to match the new
checkpoint:
- H2O: -2079.86339 -> -2079.86222 eV
- Cu fcc: -178549.3860 -> -178550.9810 eV; stress retuned
- forces/stress vectors regenerated on Mac arm64 CPU

Old checkpoint URL (orbmol-v2-s11doh8x-20260507.ckpt) was removed from
S3 and now 404s; users loading that file via state_dict manually will
hit a shape mismatch on the energy head MLP and the missing latent
spin head.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
orbmol_v2's default checkpoint (teqabfhg) has no per-atom spin head.
Updates the MODELS.md description to mention only LatentChargeHead and
clarifies that system-level total charge / spin multiplicity still flow
through the ChargeSpinConditioner. Caution paragraph updated to drop
the per-atom spin language.

README headline GSCDB138 numbers reflect the previous s11doh8x
checkpoint and will be refreshed once a full per-category breakdown for
teqabfhg is available.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Updates GSCDB138 numbers from the s11doh8x measurement to the new
teqabfhg checkpoint (same 109 evaluable subsets / 5152 reactions):

- Overall NER: 6.05 -> 1.62 (was 1.83 on s11doh8x)
- NC, TC, TM, BH, INC, ISO per-category numbers refreshed
- ISO regresses slightly (1.03 -> 1.36); other categories improve

Adds a second bullet covering GMTKN55 WTMAD-2 (5.41 -> 4.37 kcal/mol),
WIGGLE150 RMSE (1.23 -> 1.19), and the two new evaluation entries BEGDB
(MAE 0.235) and ACONFL (RMSE 0.40).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
orbmol_v2's ChargeSpinConditioner requires total_charge and
spin_multiplicity in system_features. Without them the adapter's
_get_charge_and_spin returns an empty dict and the conditioner errors.
Defaulting to charge=0, spin=1 (singlet) matches the convention used in
the internal speed harness.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
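
A minimal sketch of the defaulting behaviour described above, assuming the system features are held in a plain dict. The helper name is hypothetical; only the key names `total_charge` and `spin_multiplicity` and the defaults (0 and 1) come from the commit message.

```python
def with_default_charge_and_spin(system_features):
    """Return a copy with neutral-singlet defaults filled in where missing."""
    features = dict(system_features)  # do not mutate the caller's dict
    features.setdefault("total_charge", 0)
    features.setdefault("spin_multiplicity", 1)
    return features


print(with_default_charge_and_spin({}))
print(with_default_charge_and_spin({"total_charge": -1}))
```

Values supplied by the caller are left untouched, so only systems that omit charge or spin fall back to the neutral singlet.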
Replaces the s11doh8x-era "44 QPS at 1k atoms / 9 QPS at 10k atoms /
within ~5% of v1" claim with absolute ms timings from BENCH on the new
teqabfhg checkpoint (full-model torch.compile, single GPU, periodic
systems): 30/42/116/191 ms at 100/1k/5k/10k atoms. Notes that
the no-spin variant is 1-12% faster than the prior spin-having v2
development variant at every size we tested.

The v1 parity claim is removed pending a fresh v1-vs-v2 comparison run.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replaces the standalone v2-teqabfhg ms numbers with a v1-vs-v2 table at
100/1k/5k/10k atoms. v2 is slower at small sizes (electrostatics
overhead) and faster at large sizes; crossover around 5k atoms, 1.46x
speedup at 10k. v1 runs with backbone-only compile (its best
available), v2 with full-model compile thanks to the port of
orbital-materials/orb#3074.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@timduignan
Contributor Author

@vsimkus mind giving this another look? Since your earlier pass it now also includes:

  • Port of orbital-materials/orb#3074 (full-model torch.compile for orbmol-v2)
  • Switch of the orbmol_v2 default checkpoint to teqabfhg (no per-atom spin head) with use_per_atom_spins parameterized for the future spin-having checkpoint
  • Refreshed GSCDB138 + GMTKN55 / WIGGLE150 / BEGDB / ACONFL numbers
  • v1-vs-v2 speed table in the README

BC test goldens were regenerated on Mac arm64; Linux CI may want a small atol tweak if it triggers.

The full-model torch.compile changes ported from #3074 also benefit
OrbMol-v1, not just v2. New BENCH numbers show v1-full-compile is
1.3-1.8x faster than the old backbone-only workaround (biggest at 10k
atoms: 278 -> 158 ms). v2-teqabfhg is ~20-60% slower than v1
full-compile at the same system size, reflecting the real cost of the
PME / Coulomb path; the previous narrative ("v2 crosses over and
becomes faster than v1 at 5k+ atoms") was an artifact of comparing
v1-backbone-compile to v2-full-compile.

Also softens the lead-in line about LES being free; it isn't, but the
accuracy gain (3.7x lower NER) is the trade.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Comment thread scripts/compile_numerical_check.py Outdated
Comment thread MODELS.md Outdated
Comment thread README.md Outdated
Comment thread README.md Outdated
Comment thread tests/expensive_tests/test_backwards_compatibility.py Outdated
Previously is_conservative_model = "conservative" in args.base_model,
which misclassified orbmol_v2 as direct. Build the model first, then
check "grad_forces" in model.loss_weights, and push custom CLI weights
via model.loss_weights.update(...).
- Remove scripts/compile_numerical_check.py (covered by internal tests now)
- MODELS.md: reword charge/spin requirement to match orbmol-v1 phrasing
- README.md: drop internal commit ref; trim May 2026 update to two
  bullets (electrostatics+headline result; full-model compile speedup)
- test_backwards_compatibility.py: restore full-precision cu_energy_gold
  (-178550.98098)
@timduignan
Contributor Author

Thanks for the review @vsimkus — hopefully all addressed now:

  • Removed scripts/compile_numerical_check.py
  • MODELS.md: reworded to "Similar to orbmol-v1, system-level total charge and spin are required."
  • README.md: dropped the internal commit reference; trimmed to two bullets (electrostatics + GSCDB138 result; full-model compile speedup)
  • test_backwards_compatibility.py: restored cu_energy_gold = -178550.98098

Contributor

@vsimkus vsimkus left a comment

LGTM! 🥳 A few minor comments, but otherwise it's good.

Comment thread orb_models/forcefield/pretrained.py Outdated
Comment thread README.md Outdated
Comment thread finetune.py Outdated
- finetune.py: use isinstance(model, ConservativeForcefieldRegressor)
  instead of inspecting loss_weights; fix stale comment
- pretrained.py:418: drop "spins" from orbmol_v2 docstring
- README.md:27: drop "applies to v1 and v2" parenthetical (compile
  speedup applies to all models, the parenthetical was misleading)
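
The finetune.py change is easiest to see side by side. In the sketch below the two classes are empty stand-ins for the real regressors and the registry is hypothetical; it only illustrates why a substring check on the model name misses `orbmol_v2` while a type check does not.

```python
class DirectForcefieldRegressor:
    """Stand-in for the direct-force regressor."""


class ConservativeForcefieldRegressor:
    """Stand-in for the conservative (gradient-force) regressor."""


# Hypothetical name -> class mapping, for illustration only.
REGISTRY = {
    "orb_v3_direct_omol": DirectForcefieldRegressor,
    "orb_v3_conservative_omol": ConservativeForcefieldRegressor,
    "orbmol_v2": ConservativeForcefieldRegressor,
}


def is_conservative_by_name(base_model):
    """Original check: relies on the word appearing in the model name."""
    return "conservative" in base_model


def is_conservative_by_type(model):
    """Replacement check: asks the built model what it is."""
    return isinstance(model, ConservativeForcefieldRegressor)


for name, cls in REGISTRY.items():
    print(name, is_conservative_by_name(name), is_conservative_by_type(cls()))
```

The type check requires building the model first, which is the reordering the earlier finetune.py commit describes.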
Contributor

@benrhodes26 benrhodes26 left a comment

LGTM!

@timduignan
Contributor Author

Brilliant thanks guys!

@vsimkus vsimkus merged commit 05c86ea into orbital-materials:main May 26, 2026
5 checks passed

3 participants