From e10fdfb7d9af98984d22df8013a6a9d8382db6ce Mon Sep 17 00:00:00 2001
From: Future-Outlier
Date: Fri, 8 May 2026 13:33:37 +0800
Subject: [PATCH 1/6] [docs] Slurm: require an init process as PID 1 when running symmetric-run in Docker
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

When ray symmetric-run finishes, ray stop sends SIGTERM to each Ray
process and waits for them via psutil.wait_procs. Reaping the resulting
zombies is the parent's job. On a normal Linux host PID 1 is systemd,
which reaps. Inside a containerized Slurm compute node PID 1 is slurmd,
which does not register a SIGCHLD handler — so the processes stay
zombies, psutil.wait_procs treats them as still alive, and ray stop
reports "Stopped 0 out of N".

Document the deployment-layer fix: give the container a real init
(tini / dumb-init / docker run --init / compose init: true), and link
to the root-cause analysis and confirmation in PR #62591.

Refs:
- https://github.com/ray-project/ray/pull/62591#issuecomment-4396615458
- https://github.com/ray-project/ray/pull/62591#issuecomment-4403602546

Signed-off-by: Future-Outlier
---
 .../vms/user-guides/community/slurm.rst | 67 +++++++++++++++++++
 1 file changed, 67 insertions(+)

diff --git a/doc/source/cluster/vms/user-guides/community/slurm.rst b/doc/source/cluster/vms/user-guides/community/slurm.rst
index 2a02aed01d66..68766525a383 100644
--- a/doc/source/cluster/vms/user-guides/community/slurm.rst
+++ b/doc/source/cluster/vms/user-guides/community/slurm.rst
@@ -125,6 +125,73 @@ After the training job is completed, the Ray cluster will be stopped automatical
 
 .. note:: The -u argument tells python to print to stdout unbuffered, which is important with how slurm deals with rerouting output. If this argument is not included, you may get strange printing behavior such as printed statements not being logged by slurm until the program has terminated.
 
+.. _ray-slurm-docker-init:
+
+Running inside Docker containers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+If your SLURM compute nodes run the job inside a Docker container, make sure
+PID 1 inside the container is a proper init process (``tini``, ``dumb-init``,
+or Docker's built-in ``--init``). Otherwise ``ray symmetric-run`` will leave
+Ray processes as zombies on teardown, and ``ray stop`` will report
+``Stopped 0 out of N``.
+
+Why this matters
+^^^^^^^^^^^^^^^^
+
+When ``ray symmetric-run`` finishes, it calls ``ray stop``, which sends
+``SIGTERM`` to each Ray process and then waits for them to exit using
+``psutil.wait_procs``. After a process exits, the kernel marks it as a zombie
+and sends ``SIGCHLD`` to its current parent — the parent must call
+``waitpid`` to reap the zombie.
+
+* On a normal Linux host, PID 1 is ``systemd``, which reaps orphaned
+  processes. Zombies disappear immediately and ``psutil.wait_procs`` reports
+  them as ``gone``.
+* Inside a containerized SLURM compute node where PID 1 is ``slurmd``,
+  ``slurmd`` does not register a ``SIGCHLD`` handler and does not reap
+  children. The dead Ray processes stay in the process table as zombies,
+  ``psutil.wait_procs`` classifies them as still alive, and ``ray stop``
+  reports ``Stopped 0 out of N``.
+
+This is a deployment-layer issue (a container without a proper init), not a
+Ray bug and not a SLURM bug.
+
+How to fix it
+^^^^^^^^^^^^^
+
+Give the container an init process that reaps zombies. Pick one:
+
+* ``docker run --init ...`` — Docker injects ``tini`` as PID 1.
+* ``init: true`` in ``docker-compose.yaml`` — same effect for Compose-managed
+  services.
+* Bake ``tini`` or ``dumb-init`` into the image and use it as the
+  ``ENTRYPOINT``.
+
+Example (Compose service for a containerized SLURM compute node):
+
+.. code-block:: yaml
+
+    services:
+      c1:
+        image: slurm-docker-cluster:25.11.2
+        init: true  # PID 1 becomes tini, which reaps zombies
+        command: ["slurmd"]
+
+After this change, ``ray symmetric-run`` teardown reports each Ray process
+with status ``terminated`` instead of ``zombie``, and ``ray stop`` reports
+``Stopped N out of N``.
+
+References
+^^^^^^^^^^
+
+* Root-cause analysis (PID 1 / ``SIGCHLD`` reaping in containerized SLURM):
+  `ray-project/ray#62591 (comment) `__
+* Confirmation that adding an init process resolves the issue:
+  `ray-project/ray#62591 (comment) `__
+* Docker docs on multi-service containers and ``--init``:
+  `docs.docker.com/engine/containers/multi-service_container `__
+
 .. _slurm-network-ray:
 
 SLURM networking caveats

From 4ea3f9d762520c2832718e25faec6228c42437bb Mon Sep 17 00:00:00 2001
From: Future-Outlier
Date: Fri, 8 May 2026 16:19:22 +0800
Subject: [PATCH 2/6] [docs] Slurm/Docker init: tighten wording and add official references

- Drop the imprecise "Docker's built-in --init" wording; --init is a
  flag that injects tini, not an init process itself.
- Replace the unverifiable claim about kernel signal delivery to zombies
  with the documented fact that only waitpid(2) removes a zombie from
  the process table (cited from wait(2)).
- Add inline citations to wait(2), signal(7), and the psutil source for
  wait_pid_posix's /proc/ polling fallback (which is what makes ray stop
  classify zombies as still alive).
- Link the SIGKILL escalation to its actual call site in
  python/ray/scripts/scripts.py.
- Add tini and dumb-init repository references with their stated
  purpose (zombie reaping; dumb-init also handles Linux's PID 1 signal
  special case).
- Group References by topic: Linux semantics, container init runtimes,
  ray stop tooling, and PR #62591 discussion.

Signed-off-by: Future-Outlier
---
 .../vms/user-guides/community/slurm.rst | 178 +++++++++++++-----
 1 file changed, 135 insertions(+), 43 deletions(-)

diff --git a/doc/source/cluster/vms/user-guides/community/slurm.rst b/doc/source/cluster/vms/user-guides/community/slurm.rst
index 68766525a383..a2655451cd67 100644
--- a/doc/source/cluster/vms/user-guides/community/slurm.rst
+++ b/doc/source/cluster/vms/user-guides/community/slurm.rst
@@ -130,67 +130,159 @@ After the training job is completed, the Ray cluster will be stopped automatical
 Running inside Docker containers
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-If your SLURM compute nodes run the job inside a Docker container, make sure
-PID 1 inside the container is a proper init process (``tini``, ``dumb-init``,
-or Docker's built-in ``--init``). Otherwise ``ray symmetric-run`` will leave
-Ray processes as zombies on teardown, and ``ray stop`` will report
-``Stopped 0 out of N``.
+If your SLURM compute nodes run the job inside a Docker container, make
+sure PID 1 inside the container is a real init process — ``tini``,
+``dumb-init``, or whatever Docker injects when you pass ``--init``
+(currently ``tini``). Otherwise the Ray processes that ``ray stop``
+terminates remain as zombies in the kernel's process table, and
+``ray stop`` reports ``Stopped only 0 out of N`` with every remaining
+process showing ``status='zombie'``.
+
+Symptom
+^^^^^^^
+
+When the container's PID 1 is not a reaper, ``ray symmetric-run`` (or a
+manual ``ray stop``) prints a warning like this at teardown:
+
+.. code-block:: text
+
+    WARN scripts.py:1392 -- Stopped only 0 out of 6 Ray processes within the grace period 16 seconds. Set `-v` to see more details. Remaining processes [psutil.Process(pid=2226, name='raylet', status='zombie'), psutil.Process(pid=2225, name='python3.12', status='zombie'), psutil.Process(pid=1761, name='python3.12', status='zombie'), psutil.Process(pid=1759, name='python3.12', status='zombie'), psutil.Process(pid=1760, name='python3.12', status='zombie'), psutil.Process(pid=1709, name='gcs_server', status='zombie')] will be forcefully terminated.
+    WARN scripts.py:1399 -- You can also use `--force` to forcefully terminate processes or set higher `--grace-period` to wait longer time for proper termination.
+
+What this actually means:
+
+* The Ray processes already exited in response to the ``SIGTERM`` that
+  ``ray stop`` sent — they are zombies, not still running. Per Linux
+  `wait(2) `__,
+  *"a child that terminates, but has not been waited for, becomes a
+  zombie"* and *"as long as a zombie is not removed from the system via a
+  wait, it will consume a slot in the kernel process table"*. Because the
+  container's PID 1 (e.g. ``slurmd``) never calls ``waitpid(2)``, the
+  zombies stay in that table.
+* ``ray stop`` waits for termination via
+  `psutil.wait_procs `__,
+  which on POSIX uses
+  `psutil._psposix.wait_pid_posix
+  `__.
+  For PIDs that aren't children of the caller it falls back to polling
+  whether ``/proc/`` still exists. ``/proc/`` exists for a
+  zombie, so the wait times out and the zombies land in the ``alive``
+  list. ``ray stop`` then counts them as "not stopped".
+* The "forcefully terminated" line in the warning refers to ``ray stop``
+  sending ``SIGKILL`` after the grace period — see the ``proc.kill()``
+  loop in
+  `python/ray/scripts/scripts.py `__
+  (``ray stop`` implementation). This does not change a zombie's state:
+  a zombie has no executing context, and only ``waitpid`` by its parent
+  removes it from the process table (``wait(2)``).
 
 Why this matters
 ^^^^^^^^^^^^^^^^
 
 When ``ray symmetric-run`` finishes, it calls ``ray stop``, which sends
-``SIGTERM`` to each Ray process and then waits for them to exit using
-``psutil.wait_procs``. After a process exits, the kernel marks it as a zombie
-and sends ``SIGCHLD`` to its current parent — the parent must call
-``waitpid`` to reap the zombie.
-
-* On a normal Linux host, PID 1 is ``systemd``, which reaps orphaned
-  processes. Zombies disappear immediately and ``psutil.wait_procs`` reports
-  them as ``gone``.
+``SIGTERM`` to each Ray process and then waits for them to exit. After a
+process exits, the kernel marks it as a zombie and delivers ``SIGCHLD``
+to its current parent (see
+`signal(7) `__:
+*"Child stopped, terminated, or continued"*). The parent must then call
+``waitpid`` (`wait(2) `__)
+to reap the zombie.
+
+* On a typical modern Linux distribution, PID 1 is ``systemd``, which
+  reaps orphaned children. Zombies disappear immediately and
+  ``psutil.wait_procs`` reports them as ``gone``.
 * Inside a containerized SLURM compute node where PID 1 is ``slurmd``,
-  ``slurmd`` does not register a ``SIGCHLD`` handler and does not reap
-  children. The dead Ray processes stay in the process table as zombies,
-  ``psutil.wait_procs`` classifies them as still alive, and ``ray stop``
-  reports ``Stopped 0 out of N``.
-
-This is a deployment-layer issue (a container without a proper init), not a
-Ray bug and not a SLURM bug.
+  ``slurmd`` registers handlers for ``SIGINT``, ``SIGTERM``, ``SIGQUIT``,
+  ``SIGHUP``, ``SIGUSR2``, ``SIGPIPE``, and ``SIGPROF`` — but not
+  ``SIGCHLD`` — and so does not reap re-parented orphan processes. (The
+  official `slurmd(8) `__ SIGNALS
+  section likewise omits ``SIGCHLD``; ``slurmd.c`` source references are
+  linked in the discussion comment cited under "References" below.) The
+  dead Ray processes stay in the process table with ``status='zombie'``,
+  ``psutil.wait_procs`` returns them in the ``alive`` list, and
+  ``ray stop`` reports ``Stopped only 0 out of N``.
+
+This is a deployment-layer issue (a container without a real init), not
+a Ray bug and not a SLURM bug.
 
 How to fix it
 ^^^^^^^^^^^^^
 
-Give the container an init process that reaps zombies. Pick one:
-
-* ``docker run --init ...`` — Docker injects ``tini`` as PID 1.
-* ``init: true`` in ``docker-compose.yaml`` — same effect for Compose-managed
-  services.
-* Bake ``tini`` or ``dumb-init`` into the image and use it as the
-  ``ENTRYPOINT``.
-
-Example (Compose service for a containerized SLURM compute node):
+Give the container a real init that reaps zombies (calls ``waitpid`` on
+exit). Pick the option that matches how you launch the container:
+
+* **Plain Docker** — pass ``--init`` to ``docker run``. Docker injects
+  ``tini`` as PID 1 for you. See
+  `docker run --init `__.
+* **Docker Compose** — set ``init: true`` on the service in your
+  ``docker-compose.yaml``. Same effect as ``docker run --init``. See
+  `Compose: init `__.
+* **Bake it into the image** — install
+  `tini `__ (or
+  `dumb-init `__) in the
+  ``Dockerfile`` and use it as the ``ENTRYPOINT``. ``tini`` exists, in
+  its own words, *"to protect you from software that accidentally
+  creates zombie processes, which can (over time!) starve your entire
+  system for PIDs"* — by reaping them. ``dumb-init`` does the same and
+  additionally addresses Linux's special signal-handling rules for
+  PID 1. This is the path Docker recommends for containers running
+  multiple processes: see
+  `Run multiple services in a container `__.
+
+Example — a ``docker-compose.yaml`` snippet for a containerized SLURM
+compute node. The line that fixes the zombie problem is ``init: true``:
 
 .. code-block:: yaml
+   :caption: docker-compose.yaml
 
-    services:
-      c1:
-        image: slurm-docker-cluster:25.11.2
-        init: true  # PID 1 becomes tini, which reaps zombies
-        command: ["slurmd"]
+   services:
+     c1:
+       image: slurm-docker-cluster:25.11.2
+       init: true  # PID 1 becomes tini, which reaps zombies
+       command: ["slurmd"]
 
-After this change, ``ray symmetric-run`` teardown reports each Ray process
-with status ``terminated`` instead of ``zombie``, and ``ray stop`` reports
-``Stopped N out of N``.
+After this change, ``ray stop`` sees each Ray process exit with status
+``terminated`` (not ``zombie``) and reports ``Stopped N out of N``.
 
 References
 ^^^^^^^^^^
 
-* Root-cause analysis (PID 1 / ``SIGCHLD`` reaping in containerized SLURM):
-  `ray-project/ray#62591 (comment) `__
-* Confirmation that adding an init process resolves the issue:
-  `ray-project/ray#62591 (comment) `__
-* Docker docs on multi-service containers and ``--init``:
-  `docs.docker.com/engine/containers/multi-service_container `__
+Linux process and signal semantics:
+
+* `wait(2) `__ —
+  zombie definition, ``waitpid``, and PID 1 reaping orphans.
+* `signal(7) `__ —
+  ``SIGCHLD`` semantics.
+
+Container init runtimes:
+
+* `tini `__ — the minimal init that
+  Docker bundles for ``--init``.
+* `dumb-init `__ — alternative
+  minimal init, same purpose.
+* `Docker: Run multiple services in a container
+  `__
+  — Docker's guidance on running an init in the container.
+
+Tooling used by ``ray stop``:
+
+* `psutil.wait_procs source
+  `__
+  — the function ``ray stop`` uses to wait for termination.
+* `python/ray/scripts/scripts.py
+  `__
+  — ``ray stop`` implementation, including the warning text and the
+  post-grace-period ``SIGKILL`` escalation.
+
+Discussion specific to this issue:
+
+* `ray-project/ray#62591 (root-cause analysis)
+  `__
+  — comparison of ``systemd`` vs. ``slurmd`` PID 1 behavior, with linked
+  ``slurmd.c`` source references.
+* `ray-project/ray#62591 (init fix confirmation)
+  `__
+  — confirms ``init: true`` resolves the issue.
 
 .. _slurm-network-ray:
 

From 712d99e918f43f8c2f6089495ffc3b2d14615cff Mon Sep 17 00:00:00 2001
From: Future-Outlier
Date: Fri, 8 May 2026 16:30:27 +0800
Subject: [PATCH 3/6] update

Signed-off-by: Future-Outlier
---
 doc/source/cluster/vms/user-guides/community/slurm.rst | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/doc/source/cluster/vms/user-guides/community/slurm.rst b/doc/source/cluster/vms/user-guides/community/slurm.rst
index a2655451cd67..c19a1fba6f0b 100644
--- a/doc/source/cluster/vms/user-guides/community/slurm.rst
+++ b/doc/source/cluster/vms/user-guides/community/slurm.rst
@@ -241,8 +241,13 @@ compute node. The line that fixes the zombie problem is ``init: true``:
        init: true  # PID 1 becomes tini, which reaps zombies
        command: ["slurmd"]
 
-After this change, ``ray stop`` sees each Ray process exit with status
-``terminated`` (not ``zombie``) and reports ``Stopped N out of N``.
+After this change, the Ray processes are properly reaped on teardown,
+so ``psutil.wait_procs`` no longer classifies them as alive and
+``ray stop`` prints its success message instead of the warning above:
+
+.. code-block:: text
+
+    SUCCESS scripts.py:1488 -- Stopped all 6 Ray processes.
 
 References
 ^^^^^^^^^^

From cb50a1161d1520a9dd6e587200a6d2baf9577e76 Mon Sep 17 00:00:00 2001
From: Future-Outlier
Date: Fri, 8 May 2026 16:40:36 +0800
Subject: [PATCH 4/6] [docs] Slurm-Docker init: use the proper noun "Slurm" in the new section

Spell "Slurm" (proper noun) consistently in the four occurrences inside
the new "Running inside Docker containers" section. Pre-existing "SLURM"
occurrences elsewhere in this file are intentionally left untouched.

Signed-off-by: Future-Outlier
---
 doc/source/cluster/vms/user-guides/community/slurm.rst | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/doc/source/cluster/vms/user-guides/community/slurm.rst b/doc/source/cluster/vms/user-guides/community/slurm.rst
index c19a1fba6f0b..44957a377405 100644
--- a/doc/source/cluster/vms/user-guides/community/slurm.rst
+++ b/doc/source/cluster/vms/user-guides/community/slurm.rst
@@ -130,7 +130,7 @@ After the training job is completed, the Ray cluster will be stopped automatical
 Running inside Docker containers
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-If your SLURM compute nodes run the job inside a Docker container, make
+If your Slurm compute nodes run the job inside a Docker container, make
 sure PID 1 inside the container is a real init process — ``tini``,
 ``dumb-init``, or whatever Docker injects when you pass ``--init``
 (currently ``tini``). Otherwise the Ray processes that ``ray stop``
@@ -191,7 +191,7 @@ to reap the zombie.
 * On a typical modern Linux distribution, PID 1 is ``systemd``, which
   reaps orphaned children. Zombies disappear immediately and
   ``psutil.wait_procs`` reports them as ``gone``.
-* Inside a containerized SLURM compute node where PID 1 is ``slurmd``,
+* Inside a containerized Slurm compute node where PID 1 is ``slurmd``,
   ``slurmd`` registers handlers for ``SIGINT``, ``SIGTERM``, ``SIGQUIT``,
   ``SIGHUP``, ``SIGUSR2``, ``SIGPIPE``, and ``SIGPROF`` — but not
   ``SIGCHLD`` — and so does not reap re-parented orphan processes. (The
@@ -203,7 +203,7 @@ to reap the zombie.
   ``psutil.wait_procs`` returns them in the ``alive`` list, and
   ``ray stop`` reports ``Stopped only 0 out of N``.
 
 This is a deployment-layer issue (a container without a real init), not
-a Ray bug and not a SLURM bug.
+a Ray bug and not a Slurm bug.
 
 How to fix it
 ^^^^^^^^^^^^^
@@ -229,7 +229,7 @@ exit). Pick the option that matches how you launch the container:
   multiple processes: see
   `Run multiple services in a container `__.
 
-Example — a ``docker-compose.yaml`` snippet for a containerized SLURM
+Example — a ``docker-compose.yaml`` snippet for a containerized Slurm
 compute node. The line that fixes the zombie problem is ``init: true``:
 
 .. code-block:: yaml

From 458338c8b80f2a175f2fd10a64e099510b410eb7 Mon Sep 17 00:00:00 2001
From: Future-Outlier
Date: Tue, 12 May 2026 22:55:58 +0800
Subject: [PATCH 5/6] simple is the best

Signed-off-by: Future-Outlier
---
 .../vms/user-guides/community/slurm.rst | 181 ++----------------
 1 file changed, 17 insertions(+), 164 deletions(-)

diff --git a/doc/source/cluster/vms/user-guides/community/slurm.rst b/doc/source/cluster/vms/user-guides/community/slurm.rst
index 44957a377405..12b04321fc59 100644
--- a/doc/source/cluster/vms/user-guides/community/slurm.rst
+++ b/doc/source/cluster/vms/user-guides/community/slurm.rst
@@ -125,170 +125,6 @@ After the training job is completed, the Ray cluster will be stopped automatical
 
 .. note:: The -u argument tells python to print to stdout unbuffered, which is important with how slurm deals with rerouting output. If this argument is not included, you may get strange printing behavior such as printed statements not being logged by slurm until the program has terminated.
 
-.. _ray-slurm-docker-init:
-
-Running inside Docker containers
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-If your Slurm compute nodes run the job inside a Docker container, make
-sure PID 1 inside the container is a real init process — ``tini``,
-``dumb-init``, or whatever Docker injects when you pass ``--init``
-(currently ``tini``). Otherwise the Ray processes that ``ray stop``
-terminates remain as zombies in the kernel's process table, and
-``ray stop`` reports ``Stopped only 0 out of N`` with every remaining
-process showing ``status='zombie'``.
-
-Symptom
-^^^^^^^
-
-When the container's PID 1 is not a reaper, ``ray symmetric-run`` (or a
-manual ``ray stop``) prints a warning like this at teardown:
-
-.. code-block:: text
-
-    WARN scripts.py:1392 -- Stopped only 0 out of 6 Ray processes within the grace period 16 seconds. Set `-v` to see more details. Remaining processes [psutil.Process(pid=2226, name='raylet', status='zombie'), psutil.Process(pid=2225, name='python3.12', status='zombie'), psutil.Process(pid=1761, name='python3.12', status='zombie'), psutil.Process(pid=1759, name='python3.12', status='zombie'), psutil.Process(pid=1760, name='python3.12', status='zombie'), psutil.Process(pid=1709, name='gcs_server', status='zombie')] will be forcefully terminated.
-    WARN scripts.py:1399 -- You can also use `--force` to forcefully terminate processes or set higher `--grace-period` to wait longer time for proper termination.
-
-What this actually means:
-
-* The Ray processes already exited in response to the ``SIGTERM`` that
-  ``ray stop`` sent — they are zombies, not still running. Per Linux
-  `wait(2) `__,
-  *"a child that terminates, but has not been waited for, becomes a
-  zombie"* and *"as long as a zombie is not removed from the system via a
-  wait, it will consume a slot in the kernel process table"*. Because the
-  container's PID 1 (e.g. ``slurmd``) never calls ``waitpid(2)``, the
-  zombies stay in that table.
-* ``ray stop`` waits for termination via
-  `psutil.wait_procs `__,
-  which on POSIX uses
-  `psutil._psposix.wait_pid_posix
-  `__.
-  For PIDs that aren't children of the caller it falls back to polling
-  whether ``/proc/`` still exists. ``/proc/`` exists for a
-  zombie, so the wait times out and the zombies land in the ``alive``
-  list. ``ray stop`` then counts them as "not stopped".
-* The "forcefully terminated" line in the warning refers to ``ray stop``
-  sending ``SIGKILL`` after the grace period — see the ``proc.kill()``
-  loop in
-  `python/ray/scripts/scripts.py `__
-  (``ray stop`` implementation). This does not change a zombie's state:
-  a zombie has no executing context, and only ``waitpid`` by its parent
-  removes it from the process table (``wait(2)``).
-
-Why this matters
-^^^^^^^^^^^^^^^^
-
-When ``ray symmetric-run`` finishes, it calls ``ray stop``, which sends
-``SIGTERM`` to each Ray process and then waits for them to exit. After a
-process exits, the kernel marks it as a zombie and delivers ``SIGCHLD``
-to its current parent (see
-`signal(7) `__:
-*"Child stopped, terminated, or continued"*). The parent must then call
-``waitpid`` (`wait(2) `__)
-to reap the zombie.
-
-* On a typical modern Linux distribution, PID 1 is ``systemd``, which
-  reaps orphaned children. Zombies disappear immediately and
-  ``psutil.wait_procs`` reports them as ``gone``.
-* Inside a containerized Slurm compute node where PID 1 is ``slurmd``,
-  ``slurmd`` registers handlers for ``SIGINT``, ``SIGTERM``, ``SIGQUIT``,
-  ``SIGHUP``, ``SIGUSR2``, ``SIGPIPE``, and ``SIGPROF`` — but not
-  ``SIGCHLD`` — and so does not reap re-parented orphan processes. (The
-  official `slurmd(8) `__ SIGNALS
-  section likewise omits ``SIGCHLD``; ``slurmd.c`` source references are
-  linked in the discussion comment cited under "References" below.) The
-  dead Ray processes stay in the process table with ``status='zombie'``,
-  ``psutil.wait_procs`` returns them in the ``alive`` list, and
-  ``ray stop`` reports ``Stopped only 0 out of N``.
-
-This is a deployment-layer issue (a container without a real init), not
-a Ray bug and not a Slurm bug.
-
-How to fix it
-^^^^^^^^^^^^^
-
-Give the container a real init that reaps zombies (calls ``waitpid`` on
-exit). Pick the option that matches how you launch the container:
-
-* **Plain Docker** — pass ``--init`` to ``docker run``. Docker injects
-  ``tini`` as PID 1 for you. See
-  `docker run --init `__.
-* **Docker Compose** — set ``init: true`` on the service in your
-  ``docker-compose.yaml``. Same effect as ``docker run --init``. See
-  `Compose: init `__.
-* **Bake it into the image** — install
-  `tini `__ (or
-  `dumb-init `__) in the
-  ``Dockerfile`` and use it as the ``ENTRYPOINT``. ``tini`` exists, in
-  its own words, *"to protect you from software that accidentally
-  creates zombie processes, which can (over time!) starve your entire
-  system for PIDs"* — by reaping them. ``dumb-init`` does the same and
-  additionally addresses Linux's special signal-handling rules for
-  PID 1. This is the path Docker recommends for containers running
-  multiple processes: see
-  `Run multiple services in a container `__.
-
-Example — a ``docker-compose.yaml`` snippet for a containerized Slurm
-compute node. The line that fixes the zombie problem is ``init: true``:
-
-.. code-block:: yaml
-   :caption: docker-compose.yaml
-
-   services:
-     c1:
-       image: slurm-docker-cluster:25.11.2
-       init: true  # PID 1 becomes tini, which reaps zombies
-       command: ["slurmd"]
-
-After this change, the Ray processes are properly reaped on teardown,
-so ``psutil.wait_procs`` no longer classifies them as alive and
-``ray stop`` prints its success message instead of the warning above:
-
-.. code-block:: text
-
-    SUCCESS scripts.py:1488 -- Stopped all 6 Ray processes.
-
-References
-^^^^^^^^^^
-
-Linux process and signal semantics:
-
-* `wait(2) `__ —
-  zombie definition, ``waitpid``, and PID 1 reaping orphans.
-* `signal(7) `__ —
-  ``SIGCHLD`` semantics.
-
-Container init runtimes:
-
-* `tini `__ — the minimal init that
-  Docker bundles for ``--init``.
-* `dumb-init `__ — alternative
-  minimal init, same purpose.
-* `Docker: Run multiple services in a container
-  `__
-  — Docker's guidance on running an init in the container.
-
-Tooling used by ``ray stop``:
-
-* `psutil.wait_procs source
-  `__
-  — the function ``ray stop`` uses to wait for termination.
-* `python/ray/scripts/scripts.py
-  `__
-  — ``ray stop`` implementation, including the warning text and the
-  post-grace-period ``SIGKILL`` escalation.
-
-Discussion specific to this issue:
-
-* `ray-project/ray#62591 (root-cause analysis)
-  `__
-  — comparison of ``systemd`` vs. ``slurmd`` PID 1 behavior, with linked
-  ``slurmd.c`` source references.
-* `ray-project/ray#62591 (init fix confirmation)
-  `__
-  — confirms ``init: true`` resolves the issue.
-
 .. _slurm-network-ray:
 
 SLURM networking caveats
@@ -426,3 +262,20 @@ Here are some community-contributed templates for using SLURM with Ray:
 .. _`YASPI`: https://github.com/albanie/yaspi
 
 .. _`Convenient python interface`: https://github.com/pengzhenghao/use-ray-with-slurm
+
+
+Troubleshooting
+---------------
+
+.. _ray-slurm-docker-init:
+
+Zombie processes in Docker containers
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Running Ray on Slurm inside a Docker container can produce zombie processes. See `this comment
+`_ for details.
+
+Two recommended fixes:
+
+1. Use ``docker run --init`` (`Docker CLI docs `_).
+2. Set ``init: true`` in ``docker-compose.yaml`` (`Docker Compose docs `_).

From 6e09ffc3f564d97ecb9d94cad59c17dafcca73be Mon Sep 17 00:00:00 2001
From: Future-Outlier
Date: Tue, 12 May 2026 23:13:18 +0800
Subject: [PATCH 6/6] Trigger CI

Signed-off-by: Future-Outlier
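
Illustrative sketch (not part of the patch series): the zombie behaviour that these commits document can be reproduced on Linux with the standard library alone. The helper names `proc_state` and `demo_zombie` are hypothetical, and reading the state letter from the `/proc` stat file is only an assumed way to observe the zombie; it is not how `ray stop` or psutil detect it. A child that has exited but has not been waited for shows state `Z`, and its entry disappears only once its parent calls `waitpid`. A real init such as `tini` does that job for processes re-parented to PID 1.

```python
import os
import time
from typing import Optional, Tuple


def proc_state(pid: int) -> Optional[str]:
    """Return the one-letter process state from /proc, or None if the PID is gone.

    Linux-only helper (hypothetical name). "Z" means zombie.
    """
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The state field is the first token after the parenthesised command name.
    return data.rsplit(")", 1)[1].split()[0]


def demo_zombie(timeout: float = 5.0) -> Tuple[Optional[str], Optional[str]]:
    """Fork a child that exits at once; return its state before and after reaping."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: exit immediately without running cleanup handlers

    # Parent: poll until the child has exited, but do NOT wait on it yet.
    deadline = time.monotonic() + timeout
    state = proc_state(pid)
    while state != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        state = proc_state(pid)
    before = state  # the unreaped child still has a /proc entry

    os.waitpid(pid, 0)  # reap: this is what removes the zombie
    after = proc_state(pid)  # the /proc entry is gone now
    return before, after


if __name__ == "__main__":
    before, after = demo_zombie()
    print(f"before waitpid: {before!r}, after waitpid: {after!r}")
```

The sketch keeps parent and child in a single process tree so it runs without Docker or Slurm. In the containerized case the parent that never waits is the container's PID 1, which is why adding an init fixes the symptom.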