Split vacancies benchmark by ThomasWarford · Pull Request #437 · ddmms/ml-peg

ThomasWarford · 2026-03-23T22:25:23Z

Pre-review checklist for PR author

PR author must check the checkboxes below when creating the PR.

I've confirmed the contribution guidelines.

Summary

Based on this paper: Identifying split vacancy defects with machine-learned foundation models and electrostatics

The metrics:

Formation energy of split-vacancy defects from fully ionised defects
Spearman's coefficient for ranking the energies of initial defect structures.
RMSD of MLIP relaxed structures vs DFT relaxed structures. (Note the initial structure for the MLIP relaxation is currently the DFT relaxed structure.)

Linked issue

Closes #335

Progress

Calculations
Analysis
Application
Documentation

Testing

New decorators/callbacks

ThomasWarford · 2026-03-23T22:36:14Z

@joehart2001 @ElliottKasoar do you think this belongs in a "defects" section rather than bulk crystals?

ThomasWarford · 2026-03-23T22:41:37Z

Here's how the table looks at the moment

joehart2001 · 2026-03-24T16:08:02Z

@joehart2001 @ElliottKasoar do you think this belongs in a "defects" section rather than bulk crystals?

Yes i totally agree. We have some other PRs that could maybe fit in a defects category, like #337 creates an interstitial category, so we could merge you two. but its something we can move around easily anyway

joehart2001 · 2026-03-24T16:09:32Z

Here's how the table looks at the moment

Looking good, potentially don't need rmsd as well as MAE? hopefully we'll have an option soon to switch between different types of errors automatically.

ThomasWarford · 2026-03-24T17:59:09Z

The RMSD metric is still subject to change, since it is sensitive to the accuracy of the bulk structure as well as the defect structure (which is what we are interested in).

kavanase · 2026-04-06T17:14:26Z

This looks very nice @ThomasWarford!

…chmark

kavanase · 2026-04-10T17:10:35Z

I had a look through this code @ThomasWarford. It's very nice!

One comment:
For the max_dist description:

Maximum atomic displacement between the MLIP-relaxed and DFT-relaxed matched
structures, normalised by :math:(N/V)^{1/3} (where :math:N is the number of
atoms and :math:V the cell volume) to give a unitless quantity comparable across
different supercell sizes. Only computed for structure pairs that pass the
StructureMatcher test. The match criterion itself is a normalised max dist below 0.3.

I would suggest changing "across different supercell sizes" to "across different crystal structures". 'Different supercell sizes' sounds like this number will differ for a defect in e.g. a 2x2x2 MgO supercell and a 4x4x4 supercell, which is not the case. Also just to check, is it not (V/N)^{1/3} ? i.e. average free length per atom (https://github.com/materialsproject/pymatgen/blob/v2025.6.14/src/pymatgen/analysis/structure_matcher.py#L534)

Fyi, if the StructureMatcher scans are very slow to run, the StructureMatcher_scan_stol function in doped gives the same functionalities and orders of magnitude faster.

…chmark

ThomasWarford · 2026-04-12T10:20:27Z

@kavanase thanks for taking a look, I've made the changes you suggested to the docs.

I'll profile this later and see how long the structure matching is taking (I suspect it's quite a small amount of time compared to the relaxations, but maybe not on GPU!). @joehart2001 @ElliottKasoar how would you suggest adding StructureMatcher_scan_stol from doped, and what's your attitude to dependencies?

It looks like doped.utils.efficiency doesn't depend much on the rest of doped so the code could be copied with proper attribution (MIT license)

kavanase · 2026-04-12T15:19:14Z

Ok cool! The structure matching stuff was just if it's annoyingly slow

ThomasWarford · 2026-04-12T15:41:22Z

It seems about ~1/4 of the time is spent structure matching, when running on GPU

joehart2001 · 2026-04-12T17:08:53Z

@kavanase thanks for taking a look, I've made the changes you suggested to the docs.

I'll profile this later and see how long the structure matching is taking (I suspect it's quite a small amount of time compared to the relaxations, but maybe not on GPU!). @joehart2001 @ElliottKasoar how would you suggest adding StructureMatcher_scan_stol from doped, and what's your attitude to dependencies?

It looks like doped.utils.efficiency doesn't depend much on the rest of doped so the code could be copied with proper attribution (MIT license)

In general, the fewer dependencies the better, but if this is more than a small helper and the rest of doped is likely to be useful elsewhere, adding it could be justified. @ElliottKasoar thoughts?

ThomasWarford · 2026-04-13T08:43:15Z

Here's the app:

Clicking on a violin plot point shows the MLIP relaxed and DFT relaxed structures.

ThomasWarford · 2026-04-13T08:46:17Z

@joehart2001 perhaps it's not worth it for a 25% speedup. If a future PR uses StructureMatcher heavily it might be worth looking at again.

kavanase · 2026-04-13T13:57:06Z

Looks very nice! One thing that was initially surprising to me looking at that table (before remembering the origins) -- the errors are larger for PBE than PBEsol, despite the models mostly all being PBE models (outsite the R2SCAN one). But this is more due to their compositions (PBE -> nitrides, PBEsol -> oxides here) right?

I see this is shown in the lower violin plot ("Oxides, PBEsol"), but I think it would be best if that could be included in the table headers, just to minimise confusion about this. I know this is more text to add so could make it messy, but if needed I think using "Oxides/Nitrides" in the headers would be better than "PBEsol/PBE" as it's the former that is the bigger difference here, whereas the slightly different functional choices are not so big factors?

ElliottKasoar · 2026-04-14T16:11:58Z

@kavanase thanks for taking a look, I've made the changes you suggested to the docs.
I'll profile this later and see how long the structure matching is taking (I suspect it's quite a small amount of time compared to the relaxations, but maybe not on GPU!). @joehart2001 @ElliottKasoar how would you suggest adding StructureMatcher_scan_stol from doped, and what's your attitude to dependencies?
It looks like doped.utils.efficiency doesn't depend much on the rest of doped so the code could be copied with proper attribution (MIT license)

In general, the fewer dependencies the better, but if this is more than a small helper and the rest of doped is likely to be useful elsewhere, adding it could be justified. @ElliottKasoar thoughts?

Generally I prefer adding a dependency over copying code, otherwise we miss out on any bug fixes, performance improvements, etc., but if it were say a single function we needed and licensing was fine, copying things over could be considered (I'd still reference it, even if it were not required).

Adding dependencies of course comes with potential conflicts etc. to worry about, but we can also make things optional and declare conflicting dependencies if we need to as well, so I also wouldn't overthink it.

(More generally, we need to formalise and potentially revisit some of our decisions with respect to test-specific dependencies, but that's not a job for this!)

ThomasWarford · 2026-04-14T16:19:20Z

@kavanase thanks for pointing this out, I changed it in the latest commit. Materials project is pretty oxide-heavy, which probably explains it.

ElliottKasoar · 2026-04-20T16:29:25Z

Thanks for this, @ThomasWarford, it's looking great! I think something may have gone slightly wrong with your merges, it looks like you've picked up some stray file changes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

ThomasWarford · 2026-04-20T17:07:52Z

Sorry about that, fixed in the latest commit @ElliottKasoar

joehart2001 · 2026-04-20T20:46:14Z

+from ml_peg.models.models import current_models
+
+MODELS = load_models(current_models)
+github_uri = "https://github.com/ThomasWarford/defect_data/raw/refs/heads/main/"


uri or url?

also would you prefer for us to upload the data to the s3 bucket (cant rememebr if we've already discussed this) or are you downloading from your link?

We use github_uri, so I imagine it's based on that. URLs are URIs, URI is just a broader term

@joehart2001 Yes, upload the data to S3 if you'd like. I didn't do it yet in case the data would have to change but I think it won't change now.

This is now uploaded as inputs/defects/split_vacancy/split_vacancy.zip (and unpacks to split_vacancy), so you can update this @ThomasWarford

ElliottKasoar

Hi @ThomasWarford, I'm running this for a couple of models to give more in depth feedback for the analysis and app, but it's looking great!

Sorry about the delayed review.

With regard to merging with the existing defect category, I'm also very happy to move those into your defects directory. defects is arguably a nicer name than defect anyway. You can use git mv in the same way as you'd use mv to move files and track them, if that helps!

ElliottKasoar · 2026-06-23T15:45:09Z

There is (now) a merged defect.rst file, so can you combine these two?

ElliottKasoar · 2026-06-23T15:45:23Z

    molecular_crystal
    molecular
    bulk_crystal
+    defects


ElliottKasoar · 2026-06-23T15:49:15Z

As above, this exists now, and should be merged

ElliottKasoar · 2026-06-23T15:54:25Z

Can this be renamed to calc_split_vacancy.py to work with the pattern matching (or change the directory name and name elsewhere)?

Also as above, merging of the defect/defects category

ElliottKasoar · 2026-06-23T15:56:18Z

+from ml_peg.models.get_models import get_model_names
+from ml_peg.models.models import current_models


Suggested change

from ml_peg.models.get_models import get_model_names

from ml_peg.models.models import current_models

from ml_peg.models import current_models

from ml_peg.models.get_models import get_model_names

This has moved, sorry!

ElliottKasoar · 2026-06-23T16:14:19Z

+
+                for atoms_path in atoms_paths:
+                    relaxed_atoms = []
+                    atoms_list = read(atoms_path, ":")


Can you add (default) spin (multiplicity) and charge to these structures? Some models (e.g. orb-v3-consv-omol) require these to be set

ElliottKasoar · 2026-06-23T16:23:22Z

+
+
+@pytest.mark.parametrize("mlip", MODELS.items())
+def test_relax_and_calculate_energy(mlip: tuple[str, Any]):


I think we probably want to mark this as slow?

ElliottKasoar · 2026-06-23T16:24:07Z

+# @pytest.fixture
+# @plot_violin(
+#     title="RMSD Distribution (Oxides, PBEsol)",
+#     y_label="RMSD / Å",
+#     filename=OUT_PATH / "figure_rmsd_pbesol.json",
+# )
+# def rmsd_pbesol_dist(build_results_pbesol) -> dict[str, list]:
+#     _, _, result_rmsd, _, _ = build_results_pbesol
+#     return result_rmsd


Can we delete this?

ElliottKasoar · 2026-06-23T16:24:19Z

+# @pytest.fixture
+# @plot_violin(
+#     title="RMSD Distribution (Nitrides, PBE(+U))",
+#     y_label="RMSD / Å",
+#     filename=OUT_PATH / "figure_rmsd_pbe.json",
+# )
+# def rmsd_pbe_dist(build_results_pbe) -> dict[str, list]:
+#     _, _, result_rmsd, _, _ = build_results_pbe
+#     return result_rmsd


Can we delete this?

ElliottKasoar · 2026-06-23T16:25:58Z

I'm in the process of running this to test, but can you check that this works with models that have no/missing data (e.g. NaNs, or models that weren't run)?

Also note with the new mock calculator (once you rebase) that you can potentially use that as the source of reference data etc. which can be safer than relying on the first model you happen to analyse.

ElliottKasoar · 2026-06-25T13:11:08Z

The directory name for this should be the benchmark name, so ml_peg/analysis/defects/split_vacancy/analyse_split_vacancy.py, otherwise it won't be automatically discovered

ElliottKasoar · 2026-06-25T13:19:56Z

+DATA_PATH = APP_ROOT / "data" / "defects" / "split_vacancy"
+
+# for dash, assets/ is equivalent to APP_ROOT/data/
+STRUCTS_URL = "assets/defects/split_vacancy"


Suggested change

STRUCTS_URL = "assets/defects/split_vacancy"

STRUCTS_URL = "/assets/defects/split_vacancy"

This is needed for the path to work with more recent changes

ElliottKasoar · 2026-06-25T13:21:12Z

+STRUCTS_URL = "assets/defects/split_vacancy"
+
+
+def struct_pair_from_violin(


It might be nice to be able to visualise from the scatters too, if the data is already there, and we have the callbacks already?

ThomasWarford added 21 commits February 3, 2026 13:43

WIP calculations

3b0f613

bugfixes

3c890aa

float32 to float64

4e3d27c

very WIP

ac29957

passed, still many TODOs

358b32b

rename

819918b

remove sv_from_nv_xyz_path for simplicity

b1f9098

Working app

ef2ddbc

update metric names

c1684d2

copy reference structures

8b0f0f4

add rmsd

8dd25b8

tidy

a784828

split by functional

f489af9

slow

fb72af9

calculate rmsd (expensive) in calculate stage

172c814

fix some bugs

2a84003

clean

0eb4a5e

structure matching

32d2575

update description

1d0ddd9

remove structure match metric (always 1)

76c55d4

merge models.yml

a65e1e8

ThomasWarford force-pushed the split_vacancies_benchmark branch from 5a053c7 to a65e1e8 Compare March 23, 2026 22:34

ElliottKasoar added the new benchmark Proposals and suggestions for new benchmarks label Apr 2, 2026

Merge remote-tracking branch 'upstream/main' into split_vacancies_ben…

1281163

…chmark

ThomasWarford added 5 commits April 10, 2026 19:44

NaN handling

ef03c40

more robust structure matching

0b7f942

matching description

7795251

show info for selected structure

900005d

Merge remote-tracking branch 'upstream/main' into split_vacancies_ben…

231a16f

…chmark

warning for unmatched structures

32ac38b

track convergence of relaxations

980a01c

ThomasWarford marked this pull request as ready for review April 13, 2026 08:43

pbesol/pbe -> oxide/nitride

9aff18e

Restore accidentally removed .github files and revert models.yml changes

eaad363

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Merge branch 'main' into split_vacancies_benchmark

f0d6c80

ThomasWarford force-pushed the split_vacancies_benchmark branch from 46afbd6 to f0d6c80 Compare April 20, 2026 17:52

remove unused import, end of line change

f646f79

joehart2001 reviewed Apr 20, 2026

View reviewed changes

ElliottKasoar reviewed Jun 23, 2026

View reviewed changes

ElliottKasoar reviewed Jun 25, 2026

View reviewed changes

		from ml_peg.models.get_models import get_model_names
		from ml_peg.models.models import current_models



		@pytest.mark.parametrize("mlip", MODELS.items())
		def test_relax_and_calculate_energy(mlip: tuple[str, Any]):

	STRUCTS_URL = "assets/defects/split_vacancy"
	STRUCTS_URL = "/assets/defects/split_vacancy"

		STRUCTS_URL = "assets/defects/split_vacancy"


		def struct_pair_from_violin(

Uh oh!

Conversation

ThomasWarford commented Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Pre-review checklist for PR author

Summary

Linked issue

Progress

Testing

New decorators/callbacks

Uh oh!

ThomasWarford commented Mar 23, 2026

Uh oh!

ThomasWarford commented Mar 23, 2026

Uh oh!

joehart2001 commented Mar 24, 2026

Uh oh!

joehart2001 commented Mar 24, 2026

Uh oh!

ThomasWarford commented Mar 24, 2026

Uh oh!

kavanase commented Apr 6, 2026

Uh oh!

kavanase commented Apr 10, 2026

Uh oh!

ThomasWarford commented Apr 12, 2026

Uh oh!

kavanase commented Apr 12, 2026

Uh oh!

ThomasWarford commented Apr 12, 2026

Uh oh!

joehart2001 commented Apr 12, 2026

Uh oh!

ThomasWarford commented Apr 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ThomasWarford commented Apr 13, 2026

Uh oh!

kavanase commented Apr 13, 2026

Uh oh!

ElliottKasoar commented Apr 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

ThomasWarford commented Apr 14, 2026

Uh oh!

ElliottKasoar commented Apr 20, 2026

Uh oh!

ThomasWarford commented Apr 20, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ElliottKasoar left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ThomasWarford commented Mar 23, 2026 •

edited

Loading

ThomasWarford commented Apr 13, 2026 •

edited

Loading

ElliottKasoar commented Apr 14, 2026 •

edited

Loading