
Batched workflows for TorchSim #1505

Merged
JaGeo merged 28 commits into materialsproject:main from akwarii:batching
Jul 1, 2026


akwarii (Contributor) commented Jun 26, 2026

Summary

Include a summary of major changes in bullet points:

  • When using TorchSim, the deformed structures needed to compute elastic properties are batched.
  • Similarly, the displaced structures in the phonon workflow are also batched.

Note that batching can be disabled by setting socket=False. I have not run any benchmarks yet to quantify the performance improvement introduced by this PR.
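To illustrate what batching changes, here is a minimal Python sketch, assuming a hypothetical evaluate_batch callable that runs the force field on a list of structures. The socket flag only mirrors the option named above; neither the helper nor its signature is the atomate2 or TorchSim API.

```python
from typing import Callable, List, Sequence, TypeVar

S = TypeVar("S")  # a structure (e.g. a deformed or displaced cell)
R = TypeVar("R")  # the result for one structure (energy, forces, stress, ...)


def evaluate_structures(
    structures: Sequence[S],
    evaluate_batch: Callable[[List[S]], List[R]],
    socket: bool = True,
) -> List[R]:
    """Evaluate structures in one batched call, or one call per structure.

    Hypothetical helper: evaluate_batch stands in for whatever runs the
    force field on a list of structures.
    """
    if socket:
        # Batched mode: all deformed/displaced structures go into one call.
        return list(evaluate_batch(list(structures)))
    # Serial mode: one call per structure, results kept in input order.
    results: List[R] = []
    for structure in structures:
        results.extend(evaluate_batch([structure]))
    return results
```

Both modes return the same results in the same order; the batched mode simply issues one model call instead of one per structure, which is where any speed-up on a GPU would come from.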

Additional dependencies introduced (if any)

I don't expect this PR to introduce any new dependencies.

TODO (if any)

If this is a work-in-progress, write something about what else needs to be done.

  • Debug the elastic workflow (it currently returns bulk and shear moduli of ~0.04 GPa instead of 9.7 GPa, as expected from the forcefields test)
  • Batch the phonon displacements (the workflow supports TorchSim (TS), but displacements are currently processed serially)

I'm not sure I will implement the other workflows for now, as I don't need them for my own work, but if someone is interested, feel free to get in touch or to contribute to this PR.

Checklist

Work-in-progress pull requests are encouraged, but please put [WIP] in the pull request title.

Before a pull request can be merged, the following items must be checked:

  • Code is in the standard Python style.
    The easiest way to handle this is to run the following in the correct sequence on
    your local machine. Start with running ruff and ruff format on your new code. This will
    automatically reformat your code to PEP8 conventions and fix many linting issues.
  • Docstrings have been added in the NumPy docstring format.
    Run ruff on your code.
  • Type annotations are highly encouraged. Run mypy to
    type check your code.
  • Tests have been added for any new functionality or bug fixes.
  • All linting and tests pass.

Note that the CI system will run all the above checks. But it will be much more
efficient if you already fix most errors prior to submitting the PR. It is highly
recommended that you use the pre-commit hook provided in the repository. Simply run
pre-commit install and a check will be run prior to allowing commits.

JaGeo (Member) commented Jun 26, 2026

@akwarii Please note that the elastic workflow depends strongly on the symmetry of the optimization step. If you can enforce symmetry there, it might stabilize the results. At least, this was true for our other force field implementations.

akwarii (Contributor Author) commented Jun 26, 2026

Yes, I saw the issue discussing this. Next week I plan to try to enforce symmetry using TorchSim's FixSymmetry filter, but I will probably have to update atomate2.torchsim.core.TorchSimOptimizeMaker to do so. As far as I know, the phonon workflow doesn't have this problem, right?
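Symmetry constraints such as ASE's FixSymmetry work, roughly, by projecting forces and stress onto the symmetric subspace at every step, so the optimizer cannot drift into a lower-symmetry cell. A minimal sketch of the stress part, assuming the point-group rotations are already known (here only the 3-fold rotation about [111], a subgroup of the cubic group of Si), averages R @ sigma @ R.T over the group. This is an illustration of the idea, not TorchSim's FixSymmetry implementation.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """3x3 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def transpose(a: Matrix) -> Matrix:
    return [[a[j][i] for j in range(3)] for i in range(3)]


def symmetrize_tensor(tensor: Matrix, rotations: List[Matrix]) -> Matrix:
    """Average R @ T @ R.T over a group of Cartesian rotation matrices.

    The result is invariant under every operation in the group, so a cell
    update driven by it cannot break that symmetry.
    """
    acc = [[0.0] * 3 for _ in range(3)]
    for rot in rotations:
        rotated = matmul(matmul(rot, tensor), transpose(rot))
        for i in range(3):
            for j in range(3):
                acc[i][j] += rotated[i][j]
    n = len(rotations)
    return [[acc[i][j] / n for j in range(3)] for i in range(3)]


IDENTITY: Matrix = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
# 3-fold rotation about [111]: maps x -> y -> z -> x.
C3: Matrix = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]
C3_GROUP = [IDENTITY, C3, matmul(C3, C3)]

stress = [[1.0, 0.2, 0.0], [0.2, 2.0, 0.1], [0.0, 0.1, 3.0]]
print(symmetrize_tensor(stress, C3_GROUP))  # diagonal becomes 2.0, 2.0, 2.0
```

With the full cubic group the off-diagonal terms of a cubic crystal's stress would also be projected out; the three-element subgroup is used here only to keep the example short.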

JaGeo (Member) commented Jun 26, 2026

@akwarii Yep, exactly. The phonon workflow is more robust in this regard.

I am fine with you updating the other optimizer as long as it is not a breaking change.

akwarii (Contributor Author) commented Jun 29, 2026

Note:

When using the test MACE model ({test_dir}/forcefields/mace/MACE.model), the output is very different from ASE, but bigger models such as MACE medium (the default from download_mace_mp_checkpoint) give similar results. I only tested si_structure.

Energies:

| ASE (mace test) | TS (mace test) | ASE (mace medium) | TS (mace medium) |
| --- | --- | --- | --- |
| -0.0710307133271508 | -0.07015640801005693 | -10.827855209161402 | -10.82781378519521 |
| -0.07112084574852455 | -0.06979949936921036 | -10.829091596667464 | -10.829070856677937 |
| -0.07112152875655092 | -0.06893745091158408 | -10.829097121116778 | -10.829116852080723 |
| -0.07103527927723186 | -0.068435834774206 | -10.827897623996325 | -10.827937286730187 |
| -0.07115043598034938 | -0.06937597101480968 | -10.82606770845616 | -10.826066640116384 |
| -0.0711505573759968 | -0.06938844285313492 | -10.828641285443723 | -10.828640736180976 |
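To put a number on "very different", the row-by-row deviation between the two backends can be computed directly from the energies in the table (the thread does not state their units). A short sketch:

```python
# Energies copied row by row from the energies table (units not stated).
ASE_TEST = [-0.0710307133271508, -0.07112084574852455, -0.07112152875655092,
            -0.07103527927723186, -0.07115043598034938, -0.0711505573759968]
TS_TEST = [-0.07015640801005693, -0.06979949936921036, -0.06893745091158408,
           -0.068435834774206, -0.06937597101480968, -0.06938844285313492]
ASE_MEDIUM = [-10.827855209161402, -10.829091596667464, -10.829097121116778,
              -10.827897623996325, -10.82606770845616, -10.828641285443723]
TS_MEDIUM = [-10.82781378519521, -10.829070856677937, -10.829116852080723,
             -10.827937286730187, -10.826066640116384, -10.828640736180976]


def max_abs_diff(a, b):
    """Largest row-wise |a - b|."""
    return max(abs(x - y) for x, y in zip(a, b))


def max_rel_diff(a, b):
    """Largest row-wise |a - b| / |a|, using the first list as reference."""
    return max(abs(x - y) / abs(x) for x, y in zip(a, b))


print("test model :", max_abs_diff(ASE_TEST, TS_TEST), max_rel_diff(ASE_TEST, TS_TEST))
print("mace medium:", max_abs_diff(ASE_MEDIUM, TS_MEDIUM), max_rel_diff(ASE_MEDIUM, TS_MEDIUM))
```

With these values the test model deviates by up to about 2.6e-3 (roughly 3.7% relative), while MACE medium deviates by at most about 4e-5 (a few parts per million).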

Elastic constants

| | ASE (mace test) | TS (mace test) | ASE (mace medium) | TS (mace medium) |
| --- | --- | --- | --- | --- |
| $C_{11}$ | 9.703 | 7.650 | 126.967 | 127.03 |
| $C_{12}$ | 9.699 | 7.647 | 65.841 | 65.89 |
| $C_{44}$ | 0.002 | 0.334 | 67.297 | 67.33 |
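For a cubic crystal such as Si, bulk and shear moduli can be estimated from C11, C12 and C44 with the Voigt averages K_V = (C11 + 2*C12)/3 and G_V = (C11 - C12 + 3*C44)/5. Assuming the constants above are in GPa, K_V for ASE with the test model comes out at about 9.70, consistent with the 9.7 GPa figure in the TODO list. Which average (Voigt, Reuss or Hill) the workflow actually reports is not stated in this thread, so treat this as a sanity check only.

```python
def voigt_bulk_cubic(c11: float, c12: float) -> float:
    """Voigt bulk modulus of a cubic crystal: (C11 + 2*C12) / 3."""
    return (c11 + 2.0 * c12) / 3.0


def voigt_shear_cubic(c11: float, c12: float, c44: float) -> float:
    """Voigt shear modulus of a cubic crystal: (C11 - C12 + 3*C44) / 5."""
    return (c11 - c12 + 3.0 * c44) / 5.0


# (C11, C12, C44) copied from the elastic-constants table.
TABLE = {
    "ASE (mace test)": (9.703, 9.699, 0.002),
    "TS (mace test)": (7.650, 7.647, 0.334),
    "ASE (mace medium)": (126.967, 65.841, 67.297),
    "TS (mace medium)": (127.03, 65.89, 67.33),
}

for label, (c11, c12, c44) in TABLE.items():
    k_v = voigt_bulk_cubic(c11, c12)
    g_v = voigt_shear_cubic(c11, c12, c44)
    print(f"{label:18s} K_V = {k_v:8.3f}  G_V = {g_v:8.3f}")
```

TS with the test model gives K_V of about 7.65, while both backends give about 86.2 to 86.3 with MACE medium, so the disagreement is confined to the small test model.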

@JaGeo any idea what the problem could be here?

JaGeo (Member) commented Jun 29, 2026

Is there maybe still a problem with the equilibrium structure? Have you compared the optimized structures? I am not familiar with the optimization in TorchSim.

akwarii (Contributor Author) commented Jun 29, 2026

Good call! The lattice parameters change a lot when using the test model with the ASE backend. The bigger MACE model doesn't have this problem. This seems weird to me, since both backends use the same algorithm to relax the box, i.e. the Frechet cell filter.

Original cell

Full Formula (Si2)
Reduced Formula: Si
abc   :   3.866975   3.866975   3.866975
angles:  60.000000  60.000000  60.000000
pbc   :       True       True       True
Sites (2)
  #  SP       a     b     c
---  ----  ----  ----  ----
  0  Si    0.75  0.75  0.75
  1  Si    0.5   0.5   0.5

TorchSim (test model)

Full Formula (Si2)
Reduced Formula: Si
abc   :   3.851698   3.851698   3.851698
angles:  60.000000  60.000000  60.000000
pbc   :       True       True       True
Sites (2)
  #  SP       a     b     c
---  ----  ----  ----  ----
  0  Si    0.75  0.75  0.75
  1  Si    0.5   0.5   0.5

ASE (test model)

Full Formula (Si2)
Reduced Formula: Si
abc   :   3.801120   3.801120   3.801120
angles:  60.000000  60.000000  60.000000
pbc   :       True       True       True
Sites (2)
  #  SP       a     b     c
---  ----  ----  ----  ----
  0  Si    0.75  0.75  0.75
  1  Si    0.5   0.5   0.5

TorchSim (mace medium)

Full Formula (Si2)
Reduced Formula: Si
abc   :   3.865821   3.865821   3.865821
angles:  60.000000  60.000000  60.000000
pbc   :       True       True       True
Sites (2)
  #  SP       a     b     c
---  ----  ----  ----  ----
  0  Si    0.75  0.75  0.75
  1  Si    0.5   0.5   0.5

ASE (mace medium)

Full Formula (Si2)
Reduced Formula: Si
abc   :   3.866058   3.866058   3.866058
angles:  60.000000  60.000000  60.000000
pbc   :       True       True       True
Sites (2)
  #  SP       a     b     c
---  ----  ----  ----  ----
  0  Si    0.75  0.75  0.75
  1  Si    0.5   0.5   0.5
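To quantify how much each relaxation changes the cell, each relaxed lattice parameter can be compared with the original 3.866975 (pymatgen reports lengths in Å). All cells here have a = b = c and 60° angles, so the volume follows from the standard formula for that cell shape. A short sketch using only the values printed above:

```python
import math

A_ORIGINAL = 3.866975  # original cell, a = b = c, angles 60 degrees

# Relaxed lattice parameters copied from the structures above.
A_RELAXED = {
    "TorchSim (test model)": 3.851698,
    "ASE (test model)": 3.801120,
    "TorchSim (mace medium)": 3.865821,
    "ASE (mace medium)": 3.866058,
}


def cell_volume(a: float, alpha_deg: float = 60.0) -> float:
    """Volume of a cell with a = b = c and alpha = beta = gamma."""
    c = math.cos(math.radians(alpha_deg))
    return a**3 * math.sqrt(1.0 - 3.0 * c**2 + 2.0 * c**3)


def relative_change(new: float, old: float) -> float:
    return (new - old) / old


for label, a in A_RELAXED.items():
    da = relative_change(a, A_ORIGINAL)
    dv = relative_change(cell_volume(a), cell_volume(A_ORIGINAL))
    print(f"{label:24s} da/a = {da:+.4%}  dV/V = {dv:+.4%}")
```

ASE with the test model shrinks a by about 1.7% (about 5% in volume) and TorchSim with the test model by about 0.4%, while both backends change it by about 0.03% or less with MACE medium.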

JaGeo (Member) commented Jun 29, 2026

Different floating point precision or stopping criterion?

akwarii (Contributor Author) commented Jun 29, 2026

In both tests I set fmax / force_tol=1e-5, and the dtype is the default torch.float64. I'm also running both on the CPU.

JaGeo (Member) commented Jun 29, 2026

Any other specifics of the optimizer? Do they differ in some way? Potentially, the small test MACE model has additional local minima that one algorithm falls into and the other does not.

akwarii (Contributor Author) commented Jun 30, 2026

I was unable to find any differences between the two after inspecting the class and function signatures. I also tried bumping TS to a newer commit, since they had an issue with constrained optimization (TorchSim/torch-sim#552), but it didn't change the results.

As you pointed out, the small model probably has more minima and might get stuck in one due to small implementation differences. However, I also ran TorchSim's test_optimizers_vs_ase.py using the same model and structure as here, and the test passed. At this point, I think someone with deeper knowledge of the differences between ASE and TorchSim will have to step in.

If we want to merge, I can update the TS test values to make them pass (since real production models still give the same results), but I think we should still see this issue through to the end.

JaGeo (Member) commented Jun 30, 2026

Thanks.

It might be an option to ask the TorchSim developers for help. After all, it should be in their interest to get the same results, or at least to have an explanation for the differences.
I personally don't have experience with TorchSim.

akwarii (Contributor Author) commented Jun 30, 2026

Side note: I just saw that my modifications from #1504 are also included here, but in any case it should be ready.

JaGeo (Member) commented Jul 1, 2026

Thanks! I will take a look by the beginning of next week!

Comment thread src/atomate2/common/jobs/phonons.py
assert task_doc.output.stress is not None
assert len(task_doc.output.stress) == 2
# Each stress should be a 3x3 matrix
for stress in task_doc.output.stress:

Member

Does it make sense to test at least one numerical value here?

JaGeo (Member) commented Jul 1, 2026

I think I have one real comment: check at least one of the computed results, to catch drastic implementation changes etc. Beyond this, I am happy!
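A minimal sketch of such a check, written as a plain helper so it runs without pytest: it repeats the shape assertions from the reviewed snippet (two stresses, each 3x3) and adds one numerical regression check. The helper name and the reference value are hypothetical; in the real test the reference would come from a trusted run of the workflow.

```python
import math
from typing import Sequence

Stress = Sequence[Sequence[float]]


def check_stress_output(
    stresses: Sequence[Stress],
    expected_count: int,
    reference_xx: float,
    rel_tol: float = 1e-3,
) -> None:
    """Hypothetical helper: shape checks plus one numerical regression check."""
    assert stresses is not None
    assert len(stresses) == expected_count
    for stress in stresses:
        # Each stress should be a 3x3 matrix.
        assert len(stress) == 3
        assert all(len(row) == 3 for row in stress)
    # One pinned value catches drastic implementation changes that the
    # shape checks above cannot see. reference_xx is a placeholder here.
    assert math.isclose(stresses[0][0][0], reference_xx, rel_tol=rel_tol)
```

In a pytest suite, pytest.approx would be the more idiomatic way to express the tolerance.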

akwarii (Contributor Author) commented Jul 1, 2026

I completely agree, I will add the check.

Also, it seems the problem with the elastic test was due to an oversight on my side: in TS, enabling a cell filter doesn't by default mean that the cell forces are used in the convergence check (see TorchSim/torch-sim#582).
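Why this matters can be sketched with a toy convergence test, assuming atomic forces and the stress-derived "cell forces" are each given as lists of 3-vectors. As far as we know, ASE's cell filters append the cell terms to the force array, so fmax covers both; the include_cell_forces switch below is hypothetical and is not the TorchSim option discussed in TorchSim/torch-sim#582. For diamond-structure Si the atomic forces vanish by symmetry, so a check on atomic forces alone can pass while the cell is still under stress.

```python
import math
from typing import Sequence

Vectors = Sequence[Sequence[float]]


def max_norm(vectors: Vectors) -> float:
    """Largest Euclidean norm among a list of 3-vectors."""
    return max(math.sqrt(sum(c * c for c in v)) for v in vectors)


def is_converged(
    atomic_forces: Vectors,
    cell_forces: Vectors,
    fmax: float,
    include_cell_forces: bool = True,
) -> bool:
    """Toy fmax criterion, optionally including the cell degrees of freedom."""
    converged = max_norm(atomic_forces) < fmax
    if include_cell_forces:
        converged = converged and max_norm(cell_forces) < fmax
    return converged


# Diamond Si: zero atomic forces by symmetry, but the cell is not relaxed.
atoms = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
cell = [[0.05, 0.0, 0.0], [0.0, 0.05, 0.0], [0.0, 0.0, 0.05]]
print(is_converged(atoms, cell, 1e-5, include_cell_forces=False))  # True
print(is_converged(atoms, cell, 1e-5, include_cell_forces=True))   # False
```

The first call reports convergence with the cell still strained, which is the kind of premature stop that would leave the test-model equilibrium structure, and hence the elastic constants, off.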

JaGeo (Member) commented Jul 1, 2026

Ah! That's great to know!

JaGeo (Member) commented Jul 1, 2026

One more point: do you think the current documentation is sufficient? Maybe you can use an LLM and one of your tests to provide more information on the socket implementation with TorchSim?

akwarii (Contributor Author) commented Jul 1, 2026

I can extend the docstring explanation a bit and add examples to the TorchSim tutorial notebook. Would that be alright?

JaGeo (Member) commented Jul 1, 2026

Yes! Absolutely!

JaGeo changed the title from [WIP] Batched workflows for TorchSim to Batched workflows for TorchSim on Jul 1, 2026
Comment thread tests/torchsim/flows/test_phonons.py
JaGeo merged commit 9c1ef51 into materialsproject:main on Jul 1, 2026 (18 checks passed)
JaGeo (Member) commented Jul 1, 2026

Thanks! Do you want to add yourself to the list of contributors? Also, is #1504 still needed?

akwarii (Contributor Author) commented Jul 1, 2026

#1504 was superseded by this PR, so it can be closed without any problem. As for the list of contributors, I would gladly be part of it.

JaGeo (Member) commented Jul 1, 2026

I will close the other PR. Please raise a short PR to add your details 😃

akwarii deleted the batching branch on July 1, 2026, 15:37

2 participants