Commit f1e144e
Han Wang
fix(test): explicit del lammps before MPI.Finalize() in mpirun runners
The mpirun-driven LAMMPS test runners called ``MPI.Finalize()`` at the
end of the script with the ``lammps`` Python object still alive. When
the interpreter then shut down, the LAMMPS C++ destructor ran in a
state where MPI was already finalized — and LAMMPS' ``Finish::end``,
fix/compute teardown, and the deep[m|spin] pair-style destructor chain
all issue MPI collectives (``MPI_Gather`` / ``MPI_Reduce``) during
cleanup. On the empty-subdomain rank (no local atoms but live ghost
atoms), the asymmetric MPI traffic during destruction occasionally
hit an MPI-after-Finalize error path and crashed the rank with SIGFPE,
manifesting in CUDA CI as ``exit status 136`` of the subprocess for
``test_pair_deepmd_mpi_dpa3_spin_empty_subdomain``.
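
The link between ``exit status 136`` and SIGFPE follows the usual POSIX shell convention of reporting a signal-killed child as 128 plus the signal number. The helper below is a minimal sketch of that arithmetic. The function name ``signal_from_exit_status`` is hypothetical and is not part of the test runners.

```python
import signal


def signal_from_exit_status(status):
    """Return the name of the signal implied by an exit status, or None.

    Shells conventionally report a signal-killed child as 128 + signum;
    Python's subprocess module reports it as -signum on POSIX.
    """
    if status < 0:
        signum = -status
    elif status > 128:
        signum = status - 128
    else:
        # Ordinary exit code, no signal involved.
        return None
    try:
        return signal.Signals(signum).name
    except ValueError:
        # Number does not map to a known signal on this platform.
        return None


# 136 - 128 = 8, which is SIGFPE on Linux.
print(signal_from_exit_status(136))
```

The same helper accepts the negative return codes that ``subprocess`` produces, so ``signal_from_exit_status(-8)`` names the same signal.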
The crash was intermittent (about 1 failure in 5 runs) on the GitHub Actions
CUDA runner and was not reproducible on a V100 dev box. PR #5446 (unrelated
to MPI / spin / CUDA code) hit the same flake — confirming it's a
pre-existing teardown race in the test runners, not a regression in
either PR.
The fix is mechanical and identical in all four runners: ``del lammps``
before ``MPI.Finalize()`` so the LAMMPS instance is torn down while
the communicator is still valid.

1 parent 4604131 commit f1e144e
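
The ordering problem described in the commit message can be illustrated without MPI or LAMMPS installed. The sketch below is a toy model only: ``ToyMPI``, ``ToyLammps`` and ``run_runner`` are hypothetical stand-ins, not the real ``mpi4py`` or ``lammps`` APIs, and the toy destructor merely records whether MPI had already been finalized instead of issuing collectives. It relies on CPython destroying an object as soon as its last reference is dropped.

```python
class ToyMPI:
    """Stand-in for the MPI runtime; only tracks whether Finalize() ran."""

    def __init__(self):
        self.finalized = False

    def Finalize(self):
        self.finalized = True


class ToyLammps:
    """Stand-in for the LAMMPS wrapper whose destructor needs live MPI."""

    def __init__(self, mpi, log):
        self._mpi = mpi
        self._log = log

    def __del__(self):
        # The real destructor issues MPI collectives; the toy version
        # only records whether MPI was still usable at that point.
        state = "after Finalize" if self._mpi.finalized else "before Finalize"
        self._log.append("teardown " + state)


def run_runner(explicit_del):
    """Simulate the end of a runner script and return the teardown log."""
    log = []
    mpi = ToyMPI()
    lammps = ToyLammps(mpi, log)
    if explicit_del:
        del lammps  # the fix: destroy the instance while MPI is alive
    mpi.Finalize()
    if not explicit_del:
        del lammps  # stands in for destruction at interpreter shutdown
    return log


print(run_runner(explicit_del=False))  # old ordering
print(run_runner(explicit_del=True))   # fixed ordering
```

In the toy model the old ordering logs a teardown after ``Finalize`` and the fixed ordering logs one before it, which is the property the explicit ``del lammps`` is meant to guarantee in the real runners.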
4 files changed
Lines changed: 19 additions & 0 deletions
| File (in page order) | Added lines (diff numbering) | Lines added |
|---|---|---|
| 1 | 65-67 | 3 |
| 2 | 228-231 | 4 |
| 3 | 65-67 | 3 |
| 4 | 147-155 | 9 |