Validate CONTCAR before copying to POSCAR to prevent corruption after forced VASP termination#416
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #416 +/- ##
==========================================
+ Coverage 53.33% 53.57% +0.23%
==========================================
Files 39 39
Lines 3549 3567 +18
==========================================
+ Hits 1893 1911 +18
Misses 1656 1656 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Andrew-S-Rosen
left a comment
There was a problem hiding this comment.
Just a head's up that there is some leftover changes that should be reverted to get it to mergeable state.
- Add is_valid_poscar() utility to check if POSCAR/CONTCAR files are valid and parseable before copying, preventing empty/truncated files from being used after VASP termination - Update all VASP handlers to validate CONTCAR before copying to POSCAR: fexcf, brions, zbrent, eddrmm, edddav, zheev/eddiag, DriftErrorHandler, UnconvergedErrorHandler, and StoppedRunHandler - Add tests for VaspJob.terminate() process group termination using os.killpg() to ensure all MPI workers are killed - Add tests for is_valid_poscar() covering valid, empty, missing, malformed, and truncated POSCAR files
694895e to
ddc879a
Compare
|
Merging! Thanks! |
|
@janosh: This PR is a bit noisy in practice. An example: Many of the errors that Custodian will catch will be in the first SCF loop, so the CONTCAR will be blank. This is expected. But the WARNING may give the user a false sense of concern, especially because it does not provide any context. Custodian must have been handling 0-byte files fine before because I never had it Anyway, as for my proposal, perhaps we remove the logging from this utility function? It should be fairly self evident that if the |
In #416, after the rebase, a utility function `copy_contcar_to_poscar_if_valid` was added. But it is unused. We should remove the unused code.
In #416, a utility function `copy_contcar_to_poscar_if_valid` was added, but it is unused. We should remove the unused code to prevent bloat.
CONTCARfile validation before copying toPOSCARacross all VASP error handlers. This is a follow-up to #414 which fixed Custodian to properly terminate all MPI worker processes (not just the parent) usingos.killpg().Motivation
When VASP is forcefully terminated via
os.killpg()(as implemented in #414), the CONTCAR file may be left in an incomplete or corrupted state if VASP was interrupted mid-write. Previously, handlers would blindly copy CONTCAR to POSCAR without validation, potentially propagating empty, truncated, or malformed structure files to subsequent calculation restarts.Changes
New
is_valid_poscar()utility (src/custodian/vasp/utils.py)Updated VASP handlers (
src/custodian/vasp/handlers.py)fexcf,brions,zbrenterrorseddrmm,edddaverrorszheev/eddiagerrorsDriftErrorHandlerUnconvergedErrorHandlerStoppedRunHandlerPoscar.from_file()patterns to use the new utilityunit tests (
tests/vasp/test_utils.py)TrueFalseRelated
Closes the remaining issue from #414 where forced process termination could leave incomplete
CONTCARfiles that would corrupt subsequent restarts.