15 changes: 15 additions & 0 deletions source/isaaclab/changelog.d/fix-openblas-fork-crash.rst
@@ -0,0 +1,15 @@
Fixed
^^^^^

* Fixed a ``SIGSEGV`` crash during Kit startup caused by NumPy's bundled
  OpenBLAS ``pthread_atfork`` handler. When ``import torch`` (or any
  transitive NumPy import) runs before :class:`AppLauncher` creates the
  :class:`~isaacsim.SimulationApp`, OpenBLAS spawns worker threads and
  registers ``blas_thread_shutdown_`` as a child-side ``atfork`` handler.
  Kit's ``libomni.platforminfo.plugin`` then calls ``fork()`` during
  startup; in the child process the handler tries to ``pthread_join``
  threads that no longer exist, causing a segmentation fault. The fix
  sets ``OPENBLAS_NUM_THREADS=1`` (via ``setdefault``) before the library
  is loaded so that no worker threads are created and the handler is a
  safe no-op. Both :mod:`app_launcher` (for standalone scripts) and
  ``tools/conftest.py`` (for CI test subprocesses) are patched.
10 changes: 10 additions & 0 deletions source/isaaclab/isaaclab/app/app_launcher.py
@@ -24,6 +24,16 @@
import sys
from typing import Any, Literal

# Prevent OpenBLAS fork-safety crash. NumPy/SciPy ship a bundled OpenBLAS
# that spawns worker threads and registers a pthread_atfork child handler
# (blas_thread_shutdown_). When Kit's platform-info plugin calls fork()
# during startup the handler runs in the child and tries to pthread_join
# threads that were not carried across the fork → SIGSEGV. Setting the
# thread count to 1 *before* the library is loaded avoids the crash because
# no worker threads are created and the atfork handler becomes a no-op.
# Uses setdefault so that an explicit user/CI setting is respected.
os.environ.setdefault("OPENBLAS_NUM_THREADS", "1")

P2 Single-threaded BLAS applied globally to all users

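OPENBLAS_NUM_THREADS=1 is applied at module scope via setdefault, so every process that imports app_launcher without an explicit OPENBLAS_NUM_THREADS already in its environment silently loses multi-threaded BLAS performance. That includes users running batch physics computations, inverse-kinematics solves, or any other heavy NumPy/SciPy workload. The fork-safety hazard only materialises during startup when Kit calls fork(), so on hardware where the issue does not reproduce, users pay the single-thread tax with no benefit. A narrower alternative would be to reset the env var (or the pool) only when a fork is about to occur, e.g. via os.register_at_fork. Two caveats apply: os.register_at_fork hooks are only guaranteed to run for forks initiated from Python (os.fork() and similar), not for a fork() issued by native code such as a Kit plugin, and changing the env var after OpenBLAS has been loaded is unlikely to affect a thread pool that already exists. If the performance cost is accepted for Isaac Lab's GPU-first workloads, this is fine as-is.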
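
Because the patch uses setdefault, a user who needs multi-threaded BLAS can presumably opt out by exporting OPENBLAS_NUM_THREADS before launch. A minimal sketch of that behaviour, using a plain dict in place of os.environ; the helper name apply_openblas_guard is hypothetical and not part of Isaac Lab:

```python
def apply_openblas_guard(env):
    """Hypothetical helper mirroring the setdefault call in the patch.

    `env` is any mutable mapping (os.environ or a plain dict copy of it).
    """
    # setdefault only writes the key when it is absent, so an explicit
    # user or CI value is left untouched.
    env.setdefault("OPENBLAS_NUM_THREADS", "1")
    return env


# No prior setting: the guard pins OpenBLAS to a single thread.
unset_env = apply_openblas_guard({})

# Explicit setting (e.g. exported in the shell before launch): preserved.
user_env = apply_openblas_guard({"OPENBLAS_NUM_THREADS": "8"})
```

Note that an explicit multi-thread value also brings back the fork hazard described in the patch comment, so opting out trades crash safety for BLAS throughput.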


with contextlib.suppress(ModuleNotFoundError):
    import isaacsim  # noqa: F401
    from isaacsim import SimulationApp
8 changes: 8 additions & 0 deletions tools/conftest.py
@@ -316,6 +316,14 @@ def run_individual_tests(test_files, workspace_root, isaacsim_ci):
file_name = os.path.basename(test_file)
env = os.environ.copy()
env["PYTHONFAULTHANDLER"] = "1"
# Prevent OpenBLAS fork-safety crash: when NumPy or SciPy is imported
# before Kit starts, OpenBLAS spawns a worker-thread pool and registers
# a pthread_atfork handler (blas_thread_shutdown_). Kit's platform-info
# plugin calls fork() during startup; in the child the handler tries to
# pthread_join threads that no longer exist → SIGSEGV. Limiting
# OpenBLAS to a single thread before the subprocess starts avoids the
# crash because no worker threads are created and the handler is a no-op.
env.setdefault("OPENBLAS_NUM_THREADS", "1")

P2 Coverage gap for directly-invoked test runs

The guard is injected only when tests are dispatched through run_individual_tests, which spawns each test file as a child process. Developers who run pytest tests/test_foo.py directly on their workstation (a common local workflow) do not get this protection — the parent process has no guarantee that OPENBLAS_NUM_THREADS is set before import torch at the top of a test module fires. The crash is an intermittent race, so this may go unnoticed for long stretches and then surface unexpectedly. Placing the same os.environ.setdefault("OPENBLAS_NUM_THREADS", "1") at the top of this conftest.py module (so pytest applies it before collecting test files) would close the gap for all invocation paths.
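
A rough sketch of what a module-scope guard could look like. The helper name guard_openblas and the GUARD_IN_TIME flag are hypothetical, and the sketch assumes that checking sys.modules for "numpy" is a good enough signal that OpenBLAS may already be loaded:

```python
import os
import sys


def guard_openblas(environ=None, modules=None):
    """Hypothetical module-scope guard; returns True if it ran in time.

    The environment variable only matters if it is set before OpenBLAS is
    loaded, so the return value reports whether NumPy had already been
    imported when the guard ran.
    """
    environ = os.environ if environ is None else environ
    modules = sys.modules if modules is None else modules
    numpy_already_loaded = "numpy" in modules
    # setdefault keeps any explicit user or CI setting.
    environ.setdefault("OPENBLAS_NUM_THREADS", "1")
    return not numpy_already_loaded


# Module-scope call, as it might appear near the top of a conftest.py,
# before any import that pulls in NumPy (torch, scipy, ...).
GUARD_IN_TIME = guard_openblas()
```

Whether this closes the gap depends on pytest actually loading this conftest.py for the directly-invoked test path, and on it being imported before any plugin or module that imports NumPy.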


timeout = test_settings.PER_TEST_TIMEOUTS.get(file_name, test_settings.DEFAULT_TIMEOUT)
