fix(test): explicit del lammps before MPI.Finalize() in mpirun runners

Han Wang · Han Wang · commit f1e144ee92f2 · 2026-05-23T14:14:55.000+08:00
The mpirun-driven LAMMPS test runners called ``MPI.Finalize()`` at the end of the script while the ``lammps`` Python object was still alive. When the interpreter then shut down, the LAMMPS C++ destructor ran after MPI had already been finalized. LAMMPS' ``Finish::end``, fix/compute teardown, and the deepmd/deepspin pair-style destructor chain all issue MPI collectives (``MPI_Gather`` / ``MPI_Reduce``) during cleanup.

On the empty-subdomain rank (no local atoms but live ghost atoms), the asymmetric MPI traffic during destruction occasionally hit an MPI-after-Finalize error path and crashed the rank with SIGFPE. In CUDA CI this showed up as ``exit status 136`` (128 + 8, where 8 is the signal number of SIGFPE on Linux) from the subprocess for ``test_pair_deepmd_mpi_dpa3_spin_empty_subdomain``. The crash was intermittent (about 1 failure in 5 runs) on the GitHub Actions CUDA runner and was not reproducible on a V100 dev box. PR #5446 (unrelated to MPI / spin / CUDA code) hit the same flake, confirming that it is a pre-existing teardown race in the test runners, not a regression in either PR.

The fix is mechanical and identical in all four runners: ``del lammps`` before ``MPI.Finalize()`` so the LAMMPS instance is torn down while the communicator is still valid.
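The ordering problem can be illustrated without MPI or LAMMPS. The sketch below is only a model: ``FakeMPI``, ``FakeLammps``, and ``run`` are hypothetical stand-ins that are not part of the runners, and the ``holder.clear()`` call stands in for interpreter shutdown releasing whatever is still alive. It assumes CPython, where dropping the last reference runs ``__del__`` immediately.

```python
class FakeMPI:
    """Hypothetical stand-in for the MPI runtime; only tracks finalization."""

    def __init__(self, log):
        self.log = log
        self.finalized = False

    def Finalize(self):
        self.finalized = True
        self.log.append("MPI.Finalize")


class FakeLammps:
    """Hypothetical stand-in for a LAMMPS handle whose destructor needs MPI."""

    def __init__(self, mpi, log):
        self.mpi = mpi
        self.log = log

    def __del__(self):
        # A real destructor would issue MPI collectives here; this sketch
        # only records whether MPI was still usable at that moment.
        when = "after" if self.mpi.finalized else "before"
        self.log.append(f"lammps destroyed {when} Finalize")


def run(explicit_del):
    """Model one runner script and return the ordered teardown events."""
    log = []
    mpi = FakeMPI(log)
    holder = {"lammps": FakeLammps(mpi, log)}  # models the script's global
    if explicit_del:
        # Models ``del lammps``: the last reference goes away, so CPython
        # runs the destructor right here, before Finalize.
        del holder["lammps"]
    mpi.Finalize()
    # Models interpreter shutdown releasing any object that is still alive.
    holder.clear()
    return log


if __name__ == "__main__":
    print("without del:", run(explicit_del=False))
    print("with del:   ", run(explicit_del=True))
```

In this model, the run without the explicit delete logs the destructor after ``MPI.Finalize``, and the run with it logs the destructor first. Note that ``del`` only removes a name: if another reference to the object existed, the destructor would still be deferred, so the pattern relies on ``lammps`` being the only reference in each runner.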
diff --git a/source/lmp/tests/run_mpi_pair_deepmd.py b/source/lmp/tests/run_mpi_pair_deepmd.py
@@ -62,4 +62,7 @@
     pe = lammps.eval("pe")
     arr = [pe]
     np.savetxt(output, np.array(arr))
+# Tear down LAMMPS before MPI.Finalize() to avoid MPI-after-Finalize
+# in the LAMMPS destructor.  See run_mpi_pair_deepmd_spin_dpa3_pt2.py.
+del lammps
 MPI.Finalize()
diff --git a/source/lmp/tests/run_mpi_pair_deepmd_dpa3_pt2.py b/source/lmp/tests/run_mpi_pair_deepmd_dpa3_pt2.py
@@ -225,4 +225,8 @@
             row = np.concatenate([fi, vi])
             f.write(" ".join(f"{v:.16e}" for v in row) + "\n")
 
+# Tear down LAMMPS before MPI.Finalize() — see the matching comment in
+# ``run_mpi_pair_deepmd_spin_dpa3_pt2.py``.  Same teardown-order race
+# class; spin happens to hit it more often on CUDA CI.
+del lammps
 MPI.Finalize()
diff --git a/source/lmp/tests/run_mpi_pair_deepmd_spin.py b/source/lmp/tests/run_mpi_pair_deepmd_spin.py
@@ -62,4 +62,7 @@
     pe = lammps.eval("pe")
     arr = [pe]
     np.savetxt(output, np.array(arr))
+# Tear down LAMMPS before MPI.Finalize() to avoid MPI-after-Finalize
+# in the LAMMPS destructor.  See run_mpi_pair_deepmd_spin_dpa3_pt2.py.
+del lammps
 MPI.Finalize()
diff --git a/source/lmp/tests/run_mpi_pair_deepmd_spin_dpa3_pt2.py b/source/lmp/tests/run_mpi_pair_deepmd_spin_dpa3_pt2.py
@@ -144,4 +144,13 @@
             row = np.concatenate([fi, fmi, vi])
             f.write(" ".join(f"{v:.16e}" for v in row) + "\n")
 
+# Tear down the LAMMPS instance *before* ``MPI.Finalize()`` so its
+# destructor's MPI calls (fix/compute cleanup, timing reductions inside
+# ``Finish::end``, the deep-spin pair-style destructor chain, etc.) run
+# while the communicator is still valid.  Without this, Python keeps
+# ``lammps`` alive past ``MPI.Finalize()`` and only releases it during
+# interpreter shutdown — and the empty-subdomain rank then hits an
+# MPI-after-Finalize call which crashes with SIGFPE on some CUDA CI
+# runners (intermittent; not reproducible on V100).
+del lammps
 MPI.Finalize()