Skip to content

Split vacancies benchmark#437

Open
ThomasWarford wants to merge 48 commits into
ddmms:mainfrom
ThomasWarford:split_vacancies_benchmark
Open

Split vacancies benchmark#437
ThomasWarford wants to merge 48 commits into
ddmms:mainfrom
ThomasWarford:split_vacancies_benchmark

Conversation

@ThomasWarford

@ThomasWarford ThomasWarford commented Mar 23, 2026

Copy link
Copy Markdown
Contributor

Pre-review checklist for PR author

PR author must check the checkboxes below when creating the PR.

Summary

Based on this paper: Identifying split vacancy defects with machine-learned foundation models and electrostatics

The metrics:

  • Formation energy of split-vacancy defects from fully ionised defects
  • Spearman's coefficient for ranking the energies of initial defect structures.
  • RMSD of MLIP relaxed structures vs DFT relaxed structures. (Note the initial structure for the MLIP relaxation is currently the DFT relaxed structure.)

Linked issue

Closes #335

Progress

  • Calculations
  • Analysis
  • Application
  • Documentation

Testing

New decorators/callbacks

@ThomasWarford ThomasWarford force-pushed the split_vacancies_benchmark branch from 5a053c7 to a65e1e8 Compare March 23, 2026 22:34
@ThomasWarford

Copy link
Copy Markdown
Contributor Author

@joehart2001 @ElliottKasoar do you think this belongs in a "defects" section rather than bulk crystals?

@ThomasWarford

Copy link
Copy Markdown
Contributor Author

Here's how the table looks at the moment

image

@joehart2001

Copy link
Copy Markdown
Collaborator

@joehart2001 @ElliottKasoar do you think this belongs in a "defects" section rather than bulk crystals?

Yes i totally agree. We have some other PRs that could maybe fit in a defects category, like #337 creates an interstitial category, so we could merge you two. but its something we can move around easily anyway

@joehart2001

Copy link
Copy Markdown
Collaborator

Here's how the table looks at the moment

image

Looking good, potentially don't need rmsd as well as MAE? hopefully we'll have an option soon to switch between different types of errors automatically.

@ThomasWarford

Copy link
Copy Markdown
Contributor Author

The RMSD metric is still subject to change, since it is sensitive to the accuracy of the bulk structure as well as the defect structure (which is what we are interested in).

@ElliottKasoar ElliottKasoar added the new benchmark Proposals and suggestions for new benchmarks label Apr 2, 2026
@kavanase

kavanase commented Apr 6, 2026

Copy link
Copy Markdown

This looks very nice @ThomasWarford!

@kavanase

Copy link
Copy Markdown

I had a look through this code @ThomasWarford. It's very nice!

One comment:
For the max_dist description:

Maximum atomic displacement between the MLIP-relaxed and DFT-relaxed matched
structures, normalised by :math:(N/V)^{1/3} (where :math:N is the number of
atoms and :math:V the cell volume) to give a unitless quantity comparable across
different supercell sizes. Only computed for structure pairs that pass the
StructureMatcher test. The match criterion itself is a normalised max dist below 0.3.

I would suggest changing "across different supercell sizes" to "across different crystal structures". 'Different supercell sizes' sounds like this number will differ for a defect in e.g. a 2x2x2 MgO supercell and a 4x4x4 supercell, which is not the case. Also just to check, is it not (V/N)^{1/3} ? i.e. average free length per atom (https://github.com/materialsproject/pymatgen/blob/v2025.6.14/src/pymatgen/analysis/structure_matcher.py#L534)

Fyi, if the StructureMatcher scans are very slow to run, the StructureMatcher_scan_stol function in doped gives the same functionalities and orders of magnitude faster.

@ThomasWarford

Copy link
Copy Markdown
Contributor Author

@kavanase thanks for taking a look, I've made the changes you suggested to the docs.

I'll profile this later and see how long the structure matching is taking (I suspect it's quite a small amount of time compared to the relaxations, but maybe not on GPU!). @joehart2001 @ElliottKasoar how would you suggest adding StructureMatcher_scan_stol from doped, and what's your attitude to dependencies?

It looks like doped.utils.efficiency doesn't depend much on the rest of doped so the code could be copied with proper attribution (MIT license)

@kavanase

Copy link
Copy Markdown

Ok cool! The structure matching stuff was just if it's annoyingly slow

@ThomasWarford

Copy link
Copy Markdown
Contributor Author

It seems about ~1/4 of the time is spent structure matching, when running on GPU

@joehart2001

Copy link
Copy Markdown
Collaborator

@kavanase thanks for taking a look, I've made the changes you suggested to the docs.

I'll profile this later and see how long the structure matching is taking (I suspect it's quite a small amount of time compared to the relaxations, but maybe not on GPU!). @joehart2001 @ElliottKasoar how would you suggest adding StructureMatcher_scan_stol from doped, and what's your attitude to dependencies?

It looks like doped.utils.efficiency doesn't depend much on the rest of doped so the code could be copied with proper attribution (MIT license)

In general, the fewer dependencies the better, but if this is more than a small helper and the rest of doped is likely to be useful elsewhere, adding it could be justified. @ElliottKasoar thoughts?

@ThomasWarford

ThomasWarford commented Apr 13, 2026

Copy link
Copy Markdown
Contributor Author

Here's the app:
image

Clicking on a violin plot point shows the MLIP relaxed and DFT relaxed structures.

@ThomasWarford ThomasWarford marked this pull request as ready for review April 13, 2026 08:43
@ThomasWarford

Copy link
Copy Markdown
Contributor Author

@joehart2001 perhaps it's not worth it for a 25% speedup. If a future PR uses StructureMatcher heavily it might be worth looking at again.

@kavanase

Copy link
Copy Markdown

Looks very nice! One thing that was initially surprising to me looking at that table (before remembering the origins) -- the errors are larger for PBE than PBEsol, despite the models mostly all being PBE models (outsite the R2SCAN one). But this is more due to their compositions (PBE -> nitrides, PBEsol -> oxides here) right?

I see this is shown in the lower violin plot ("Oxides, PBEsol"), but I think it would be best if that could be included in the table headers, just to minimise confusion about this. I know this is more text to add so could make it messy, but if needed I think using "Oxides/Nitrides" in the headers would be better than "PBEsol/PBE" as it's the former that is the bigger difference here, whereas the slightly different functional choices are not so big factors?

@ElliottKasoar

ElliottKasoar commented Apr 14, 2026

Copy link
Copy Markdown
Collaborator

@kavanase thanks for taking a look, I've made the changes you suggested to the docs.
I'll profile this later and see how long the structure matching is taking (I suspect it's quite a small amount of time compared to the relaxations, but maybe not on GPU!). @joehart2001 @ElliottKasoar how would you suggest adding StructureMatcher_scan_stol from doped, and what's your attitude to dependencies?
It looks like doped.utils.efficiency doesn't depend much on the rest of doped so the code could be copied with proper attribution (MIT license)

In general, the fewer dependencies the better, but if this is more than a small helper and the rest of doped is likely to be useful elsewhere, adding it could be justified. @ElliottKasoar thoughts?

Generally I prefer adding a dependency over copying code, otherwise we miss out on any bug fixes, performance improvements, etc., but if it were say a single function we needed and licensing was fine, copying things over could be considered (I'd still reference it, even if it were not required).

Adding dependencies of course comes with potential conflicts etc. to worry about, but we can also make things optional and declare conflicting dependencies if we need to as well, so I also wouldn't overthink it.

(More generally, we need to formalise and potentially revisit some of our decisions with respect to test-specific dependencies, but that's not a job for this!)

@ThomasWarford

Copy link
Copy Markdown
Contributor Author

@kavanase thanks for pointing this out, I changed it in the latest commit. Materials project is pretty oxide-heavy, which probably explains it.

@ElliottKasoar

Copy link
Copy Markdown
Collaborator

Thanks for this, @ThomasWarford, it's looking great! I think something may have gone slightly wrong with your merges, it looks like you've picked up some stray file changes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@ThomasWarford

Copy link
Copy Markdown
Contributor Author

Sorry about that, fixed in the latest commit @ElliottKasoar

@ThomasWarford ThomasWarford force-pushed the split_vacancies_benchmark branch from 46afbd6 to f0d6c80 Compare April 20, 2026 17:52
from ml_peg.models.models import current_models

MODELS = load_models(current_models)
github_uri = "https://github.com/ThomasWarford/defect_data/raw/refs/heads/main/"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

uri or url?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also would you prefer for us to upload the data to the s3 bucket (cant rememebr if we've already discussed this) or are you downloading from your link?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use github_uri, so I imagine it's based on that. URLs are URIs, URI is just a broader term

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joehart2001 Yes, upload the data to S3 if you'd like. I didn't do it yet in case the data would have to change but I think it won't change now.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now uploaded as inputs/defects/split_vacancy/split_vacancy.zip (and unpacks to split_vacancy), so you can update this @ThomasWarford

@ElliottKasoar ElliottKasoar left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ThomasWarford, I'm running this for a couple of models to give more in depth feedback for the analysis and app, but it's looking great!

Sorry about the delayed review.

With regard to merging with the existing defect category, I'm also very happy to move those into your defects directory. defects is arguably a nicer name than defect anyway. You can use git mv in the same way as you'd use mv to move files and track them, if that helps!

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is (now) a merged defect.rst file, so can you combine these two?

molecular_crystal
molecular
bulk_crystal
defects

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As above, this exists now, and should be merged

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be renamed to calc_split_vacancy.py to work with the pattern matching (or change the directory name and name elsewhere)?

Also as above, merging of the defect/defects category

Comment on lines +13 to +14
from ml_peg.models.get_models import get_model_names
from ml_peg.models.models import current_models

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
from ml_peg.models.get_models import get_model_names
from ml_peg.models.models import current_models
from ml_peg.models import current_models
from ml_peg.models.get_models import get_model_names

This has moved, sorry!


for atoms_path in atoms_paths:
relaxed_atoms = []
atoms_list = read(atoms_path, ":")

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add (default) spin (multiplicity) and charge to these structures? Some models (e.g. orb-v3-consv-omol) require these to be set



@pytest.mark.parametrize("mlip", MODELS.items())
def test_relax_and_calculate_energy(mlip: tuple[str, Any]):

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we probably want to mark this as slow?

Comment on lines +447 to +455
# @pytest.fixture
# @plot_violin(
# title="RMSD Distribution (Oxides, PBEsol)",
# y_label="RMSD / Å",
# filename=OUT_PATH / "figure_rmsd_pbesol.json",
# )
# def rmsd_pbesol_dist(build_results_pbesol) -> dict[str, list]:
# _, _, result_rmsd, _, _ = build_results_pbesol
# return result_rmsd

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we delete this?

Comment on lines +480 to +488
# @pytest.fixture
# @plot_violin(
# title="RMSD Distribution (Nitrides, PBE(+U))",
# y_label="RMSD / Å",
# filename=OUT_PATH / "figure_rmsd_pbe.json",
# )
# def rmsd_pbe_dist(build_results_pbe) -> dict[str, list]:
# _, _, result_rmsd, _, _ = build_results_pbe
# return result_rmsd

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we delete this?

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in the process of running this to test, but can you check that this works with models that have no/missing data (e.g. NaNs, or models that weren't run)?

Also note with the new mock calculator (once you rebase) that you can potentially use that as the source of reference data etc. which can be safer than relying on the first model you happen to analyse.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The directory name for this should be the benchmark name, so ml_peg/analysis/defects/split_vacancy/analyse_split_vacancy.py, otherwise it won't be automatically discovered

DATA_PATH = APP_ROOT / "data" / "defects" / "split_vacancy"

# for dash, assets/ is equivalent to APP_ROOT/data/
STRUCTS_URL = "assets/defects/split_vacancy"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
STRUCTS_URL = "assets/defects/split_vacancy"
STRUCTS_URL = "/assets/defects/split_vacancy"

This is needed for the path to work with more recent changes

STRUCTS_URL = "assets/defects/split_vacancy"


def struct_pair_from_violin(

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be nice to be able to visualise from the scatters too, if the data is already there, and we have the callbacks already?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

new benchmark Proposals and suggestions for new benchmarks

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add Split Vacancy Defect Tests

4 participants