Skip to content

Validate CONTCAR before copying to POSCAR to prevent corruption after forced VASP termination#416

Merged
Andrew-S-Rosen merged 2 commits into
materialsproject:masterfrom
janosh:only-copy-valid-contcar
Dec 14, 2025
Merged

Validate CONTCAR before copying to POSCAR to prevent corruption after forced VASP termination#416
Andrew-S-Rosen merged 2 commits into
materialsproject:masterfrom
janosh:only-copy-valid-contcar

Conversation

@janosh
Copy link
Copy Markdown
Member

@janosh janosh commented Dec 13, 2025

CONTCAR file validation before copying to POSCAR across all VASP error handlers. This is a follow-up to #414 which fixed Custodian to properly terminate all MPI worker processes (not just the parent) using os.killpg().

Motivation

When VASP is forcefully terminated via os.killpg() (as implemented in #414), the CONTCAR file may be left in an incomplete or corrupted state if VASP was interrupted mid-write. Previously, handlers would blindly copy CONTCAR to POSCAR without validation, potentially propagating empty, truncated, or malformed structure files to subsequent calculation restarts.

Changes

  1. New is_valid_poscar() utility (src/custodian/vasp/utils.py)

    • Checks file exists and is non-empty
    • Verifies file can be parsed as a valid VASP structure
    • Validates structure contains atoms
    • Logs warnings for invalid files
  2. Updated VASP handlers (src/custodian/vasp/handlers.py)

    • All handlers now validate CONTCAR before copying to POSCAR:
      • fexcf, brions, zbrent errors
      • eddrmm, edddav errors
      • zheev/eddiag errors
      • DriftErrorHandler
      • UnconvergedErrorHandler
      • StoppedRunHandler
    • Refactored duplicate try/except Poscar.from_file() patterns to use the new utility
  3. unit tests (tests/vasp/test_utils.py)

    • Valid POSCAR files return True
    • Empty, missing, malformed, and truncated files return False

Related

Closes the remaining issue from #414 where forced process termination could leave incomplete CONTCAR files that would corrupt subsequent restarts.

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Dec 13, 2025

Codecov Report

❌ Patch coverage is 47.50000% with 21 lines in your changes missing coverage. Please review.
✅ Project coverage is 53.57%. Comparing base (ea00e6c) to head (1cedf95).
⚠️ Report is 8 commits behind head on master.

Files with missing lines Patch % Lines
src/custodian/vasp/handlers.py 5.55% 17 Missing ⚠️
src/custodian/vasp/utils.py 81.81% 4 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #416      +/-   ##
==========================================
+ Coverage   53.33%   53.57%   +0.23%     
==========================================
  Files          39       39              
  Lines        3549     3567      +18     
==========================================
+ Hits         1893     1911      +18     
  Misses       1656     1656              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Member

@Andrew-S-Rosen Andrew-S-Rosen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a head's up that there is some leftover changes that should be reverted to get it to mergeable state.

Comment thread src/custodian/vasp/jobs.py
- Add is_valid_poscar() utility to check if POSCAR/CONTCAR files are valid
  and parseable before copying, preventing empty/truncated files from being
  used after VASP termination

- Update all VASP handlers to validate CONTCAR before copying to POSCAR:
  fexcf, brions, zbrent, eddrmm, edddav, zheev/eddiag, DriftErrorHandler,
  UnconvergedErrorHandler, and StoppedRunHandler

- Add tests for VaspJob.terminate() process group termination using
  os.killpg() to ensure all MPI workers are killed

- Add tests for is_valid_poscar() covering valid, empty, missing,
  malformed, and truncated POSCAR files
@janosh janosh force-pushed the only-copy-valid-contcar branch from 694895e to ddc879a Compare December 14, 2025 04:52
Comment thread src/custodian/vasp/utils.py Outdated
@Andrew-S-Rosen
Copy link
Copy Markdown
Member

Merging! Thanks!

@Andrew-S-Rosen Andrew-S-Rosen merged commit cafd514 into materialsproject:master Dec 14, 2025
8 checks passed
@Andrew-S-Rosen
Copy link
Copy Markdown
Member

Andrew-S-Rosen commented Dec 14, 2025

@janosh: This PR is a bit noisy in practice.

An example:

ERROR:VaspErrorHandler:WARNING in EDDRMM: call to ZHEGV failed
INFO:custodian.custodian:Terminating job
INFO:custodian.vasp.jobs:Sending SIGTERM to process group 581389
INFO:custodian.vasp.jobs:Process 581389 terminated gracefully
INFO:root:Backing up run to /scratch/gpfs/ROSENGROUP/asrosen/jobflow/vasp/21/8f/d7/218fd7c2-3cb2-4c29-9c85-cce1a074371e_1/tmp-quacc-2025-12-14-15-52-00-425751-51591/error.1.tar.gz
WARNING:custodian.vasp.utils:CONTCAR is empty
ERROR:custodian.custodian:VaspErrorHandler

Many of the errors that Custodian will catch will be in the first SCF loop, so the CONTCAR will be blank. This is expected. But the WARNING may give the user a false sense of concern, especially because it does not provide any context.

Custodian must have been handling 0-byte files fine before because I never had it mv CONTCAR POSCAR a blank file, but I can't find where in the logic that occurred. I do think the validity checker is still wise because it checks for corrupted files.

Anyway, as for my proposal, perhaps we remove the logging from this utility function? It should be fairly self evident that if the _file_copy action is not included in the custodian.json, then it's because the file was not suitable. But I am certainly open to other suggestions!

Andrew-S-Rosen added a commit that referenced this pull request Dec 14, 2025
In #416, after the rebase, a utility function `copy_contcar_to_poscar_if_valid` was added. But it is unused. We should remove the unused code.
Andrew-S-Rosen added a commit that referenced this pull request Dec 14, 2025
In #416, a utility function `copy_contcar_to_poscar_if_valid` was added, but it is unused. We should remove the unused code to prevent bloat.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants