Skip to content

Fix pthreads normal-mutex debug implementation #26622

Open
slowriot wants to merge 10 commits intoemscripten-core:mainfrom
slowriot:pthread-mutex-debug-fix
Open

Fix pthreads normal-mutex debug implementation #26622
slowriot wants to merge 10 commits intoemscripten-core:mainfrom
slowriot:pthread-mutex-debug-fix

Conversation

@slowriot
Copy link
Copy Markdown
Contributor

@slowriot slowriot commented Apr 3, 2026

This PR fixes a regression in Emscripten's debug-only PTHREAD_MUTEX_NORMAL path that showed up under -pthread, and adds regression coverage for the two failure modes we reduced from the original application.

See #26619 for details on the regression.

Problem

In 4.0.11, Emscripten changed debug builds so PTHREAD_MUTEX_NORMAL no longer always takes the old fast a_cas(..., 0, EBUSY) path. The motivation was valid: in debug builds, a same-thread relock of a normal mutex should trap instead of hanging forever.

The problem is that this changed the observable semantics of the mutex lock word for normal mutexes in debug builds. In particular, normal mutexes started carrying owner-style state through the slow path instead of preserving the historical 0/EBUSY encoding.

That broke dlmalloc under -pthread debug system-library builds. dlmalloc uses normal pthread mutexes for its internal locks, and with the new debug path we were able to trigger allocator aborts with a minimal browser repro consisting only of:

  • main thread allocation churn
  • wasm worker allocation/deallocation churn
  • -pthread
  • -sWASM_WORKERS

No application logic, rendering, or complex containers were required.

The first part of the fix was therefore to restore historical _m_lock semantics for normal mutexes in debug Emscripten builds while keeping the new debug deadlock trapping behavior.

However, that exposed a second issue.

To preserve the debug deadlock check without using _m_lock, the intermediate fix stored normal-mutex ownership in _m_count. That was enough to fix the allocator regression, but it introduced a second false-positive deadlock under contention: _m_count is shared mutable state, so contending threads can observe stale owner information and incorrectly conclude that the current thread already owns the mutex.

This second issue showed up in the harness-facing repro as a deadlock assert during steady-state execution/reporting, even though the mutex was not actually being recursively acquired by the same thread.

Root Cause

There are really two related problems here:

  1. Debug normal mutexes were no longer preserving the historical 0/EBUSY _m_lock encoding.
    That broke internal users such as dlmalloc.

  2. _m_count is not a safe place to track debug-only ownership for normal mutexes under contention.
    It is shared state, so reading it from another thread is inherently racy and can produce stale-owner false positives.

So the correct design is:

  • keep _m_lock semantics unchanged for normal mutexes
  • keep debug deadlock detection
  • do not put debug normal-mutex ownership into shared mutex fields that contending threads will race on

Fix

This PR implements that design.

Changes:

  • system/lib/libc/musl/src/thread/pthread_mutex_trylock.c
  • system/lib/libc/musl/src/thread/pthread_mutex_unlock.c
  • system/lib/libc/musl/src/thread/pthread_mutex_timedlock.c
  • system/lib/libc/musl/src/internal/pthread_impl.h

What the fix does:

  • For debug Emscripten builds, PTHREAD_MUTEX_NORMAL still keeps _m_lock in the historical 0/EBUSY form.
  • Ownership for debug deadlock detection is tracked in a per-thread list anchored on struct pthread.
  • On successful normal-mutex lock, the mutex is added to that thread's list.
  • On normal-mutex unlock, it is removed from that thread's list.
  • The debug deadlock assertion for normal mutexes checks that per-thread ownership list, not _m_count and not _m_lock.

Why this design:

  • It preserves the intent of the 4.0.11 debug change: same-thread relock of a normal mutex still traps instead of silently hanging.
  • It preserves the old _m_lock semantics that dlmalloc depended on.
  • It avoids using shared mutable state for debug-only owner tracking, which was the source of the stale-owner false positives.
  • It leaves non-normal mutex behavior alone.
  • It leaves release/NDEBUG behavior alone.

A subtle but important implementation detail is that debug normal-mutex ownership is tracked in a per-thread list anchored on struct pthread, rather than in a separate _Thread_local variable or in shared mutex fields such as _m_lock / _m_count. This preserves the historical mutex lock-word semantics for normal mutexes, avoids racy shared ownership state, and also avoids introducing extra TLS state that broke other.test_threadprofiler during CI.

Why not simpler alternatives

I intentionally did not take these approaches:

  • Reverting the 4.0.11 debug change entirely.
    That would discard the intended "trap instead of deadlock" behavior that motivated the change.

  • Keeping _m_count as the owner field.
    That fixes the first failure but causes the second one because _m_count is shared and racy.

  • Disabling assertions / forcing NDEBUG.
    That works around the issue but does not fix the debug runtime.

The patch here is meant to be a real upstream fix, not a workaround.

Tests

This PR adds two browser regression tests in test/test_browser.py:

  • test/wasm_worker/pthread_mutex_debug_allocator_regression.cpp
    This captures the original allocator failure mode: debug -pthread + -sWASM_WORKERS plus trivial main/worker allocation traffic must stay clean.

  • test/wasm_worker/pthread_mutex_debug_reporting_teardown.cpp
    This captures the stale-owner false-positive that appeared after the first fix was applied.

I also verified that the existing deadlock behavior is still intact via:

  • other.test_pthread_mutex_deadlock
  • other.test_threadprofiler

So the final state is:

  • original allocator regression fixed
  • follow-on stale-owner deadlock fixed
  • intended debug deadlock trapping for real same-thread relock preserved
  • existing --threadprofiler startup behavior preserved

Validation

The fix is validated with:

  • browser.test_wasm_worker_pthread_mutex_debug_allocator_regression
  • browser.test_wasm_worker_pthread_mutex_debug_reporting_teardown
  • other.test_pthread_mutex_deadlock
  • other.test_threadprofiler

Copilot AI review requested due to automatic review settings April 3, 2026 21:34
@slowriot slowriot changed the title Fix Fix pthreads normal-mutex debug implementation Apr 3, 2026
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a debug-only regression in Emscripten’s musl pthread normal-mutex path under -pthread (notably affecting dlmalloc with -sWASM_WORKERS) by restoring historical _m_lock semantics while keeping same-thread relock deadlock trapping via debug-only thread-local ownership tracking. It also adds browser regression tests covering the allocator abort and the subsequent stale-owner deadlock false-positive.

Changes:

  • Restore PTHREAD_MUTEX_NORMAL debug behavior to keep _m_lock in the historical 0/EBUSY encoding while maintaining deadlock trapping.
  • Add debug-only thread-local tracking of held normal mutexes using _m_prev/_m_next, and use it for deadlock assertions.
  • Add two browser regression tests for the allocator regression and the reporting/teardown false-positive.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
system/lib/libc/musl/src/thread/pthread_mutex_trylock.c Implements debug normal-mutex lock-word preservation and hooks normal-mutex ownership tracking.
system/lib/libc/musl/src/thread/pthread_mutex_unlock.c Hooks normal-mutex unlock to remove from debug thread-local ownership tracking.
system/lib/libc/musl/src/thread/pthread_mutex_timedlock.c Updates debug deadlock assertions to consult thread-local normal-mutex ownership list.
system/lib/libc/musl/src/internal/pthread_impl.h Adds debug-only thread-local list + helpers for normal-mutex ownership tracking.
test/test_browser.py Registers two new browser regression tests under the wasm worker suite.
test/wasm_worker/pthread_mutex_debug_allocator_regression.cpp New browser repro focused on allocator churn under -pthread + wasm workers.
test/wasm_worker/pthread_mutex_debug_reporting_teardown.cpp New browser repro focused on the follow-on stale-owner deadlock assert.
AUTHORS Adds contributor entry.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@slowriot
Copy link
Copy Markdown
Contributor Author

slowriot commented Apr 5, 2026

It appears the failing test is failing on pip install on mac arm64 - doesn't seem to be related to this change set.

@sbc100
Copy link
Copy Markdown
Collaborator

sbc100 commented Apr 5, 2026

I'm not sure the root cause analysis here is correct/complete.

My guess is that the issue here is simply that pthead_mutex API was never really tested or guardnattes to work when called from a wasm worker. In particular IIUC the pthread_self() function always return NULL for all wasm workers, which is likely to confuse pretty much all pthread APIs. The fact that normal pthread_mutex lock happened to work before #24607 seems like more of a lucky coincidence perhaps?

In particular I don't under understand "The problem is that this changed the observable semantics of the mutex lock word for normal mutexes in debug builds". Surely the contents of the mutex word should be exterally observed by the user program? Its purely internal the pthread implementation right?

When you say "That broke dlmalloc under -pthread debug system-library builds".. I'm not sure I understand why it broke exactly? Do you know why it breaks in this one configuration? What exactly is the issue?

@sbc100
Copy link
Copy Markdown
Collaborator

sbc100 commented Apr 5, 2026

I think its rather just revert #24607 than add more complex emscripten patches to the internal pthread API. However, I think it might be better to try to make it so that calling pthead APIs from wasm workers is actually well defined.

@slowriot
Copy link
Copy Markdown
Contributor Author

slowriot commented Apr 6, 2026

@sbc100 apologies if I've been unclear - I'm only becoming familiar with this code as I'm working through it, and learning more as I go. My understanding at this point is:

  • 4.0.11 debug builds disable the old fast path for PTHREAD_MUTEX_NORMAL, so pthread_mutex_lock() always falls through to __pthread_mutex_timedlock() instead of doing the old direct a_cas(..., 0, EBUSY) acquire. See pthread_mutex_lock.c#L3 and pthread_mutex_timedlock.c#L62.
  • __pthread_mutex_timedlock() immediately calls __pthread_mutex_trylock(), which in this debug configuration always routes normal mutexes to __pthread_mutex_trylock_owner(). See pthread_mutex_timedlock.c#L74 and pthread_mutex_trylock.c#L95.
  • The first thing __pthread_mutex_trylock_owner() does is:
    • pthread_t self = __pthread_self();
    • int tid = self->tid;
      See pthread_mutex_trylock.c#L7. This is the first new UB point in 4.0.11.
  • On this target, __pthread_self() is just ((pthread_t)__get_tp()), and tid is a field inside struct pthread. See pthread_impl.h#L41 and pthread_impl.h#L175.
  • In the pthread runtime, __get_tp() returns the global thread_ptr, and __set_thread_state() writes that global. See emscripten_thread_state.S#L23 and emscripten_thread_state.S#L29.
  • In the mixed -pthread + -sWASM_WORKERS wasm-worker startup path, JS explicitly does ___set_thread_state(/*thread_ptr=*/0, ...). See libwasm_worker.js#L120. Source-wise, that means __get_tp() == 0 on that path, hence __pthread_self() == 0.
  • Therefore the new 4.0.11 slow path reads self->tid from address 0 + offsetof(struct pthread, tid), not from a real struct pthread. So this is where it all goes wrong - tid contains arbitrary garbage from memory.
  • The same invalid self is then used for more than just that initial load:
  • dlmalloc is the hot internal caller that keeps hitting this path:

The changes in this PR stop PTHREAD_MUTEX_NORMAL in debug builds from depending on __pthread_self()->tid at all. They restore the old owner-less 0/EBUSY lock-word behavior for normal mutexes, and move the debug deadlock check to separate per-thread bookkeeping instead of deriving ownership from the invalid wasm-worker pthread_t path.

The advantage over simply reverting, is that this change preserves the intended post-#24607 debug behaviour: same-thread relock of a normal mutex still traps instead of silently hanging forever. Simply reverting would discard that safety improvement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants