Fix pthreads normal-mutex debug implementation by slowriot · Pull Request #26622 · emscripten-core/emscripten

slowriot · 2026-04-03T21:34:47Z

This PR fixes a regression in Emscripten's debug-only PTHREAD_MUTEX_NORMAL path that showed up under -pthread, and adds regression coverage for the two failure modes we reduced from the original application.

See #26619 for details on the regression.

Problem

In 4.0.11, Emscripten changed debug builds so PTHREAD_MUTEX_NORMAL no longer always takes the old fast a_cas(..., 0, EBUSY) path. The motivation was valid: in debug builds, a same-thread relock of a normal mutex should trap instead of hanging forever.

The problem is that this changed the observable semantics of the mutex lock word for normal mutexes in debug builds. In particular, normal mutexes started carrying owner-style state through the slow path instead of preserving the historical 0/EBUSY encoding.

That broke dlmalloc under -pthread debug system-library builds. dlmalloc uses normal pthread mutexes for its internal locks, and with the new debug path we were able to trigger allocator aborts with a minimal browser repro consisting only of:

main thread allocation churn
wasm worker allocation/deallocation churn
-pthread
-sWASM_WORKERS

No application logic, rendering, or complex containers were required.

The first part of the fix was therefore to restore historical _m_lock semantics for normal mutexes in debug Emscripten builds while keeping the new debug deadlock trapping behavior.

However, that exposed a second issue.

To preserve the debug deadlock check without using _m_lock, the intermediate fix stored normal-mutex ownership in _m_count. That was enough to fix the allocator regression, but it introduced a second false-positive deadlock under contention: _m_count is shared mutable state, so contending threads can observe stale owner information and incorrectly conclude that the current thread already owns the mutex.

This second issue showed up in the harness-facing repro as a deadlock assert during steady-state execution/reporting, even though the mutex was not actually being recursively acquired by the same thread.

Root Cause

There are really two related problems here:

Debug normal mutexes were no longer preserving the historical 0/EBUSY _m_lock encoding.
That broke internal users such as dlmalloc.
_m_count is not a safe place to track debug-only ownership for normal mutexes under contention.
It is shared state, so reading it from another thread is inherently racy and can produce stale-owner false positives.

So the correct design is:

keep _m_lock semantics unchanged for normal mutexes
keep debug deadlock detection
do not put debug normal-mutex ownership into shared mutex fields that contending threads will race on

Fix

This PR implements that design.

Changes:

system/lib/libc/musl/src/thread/pthread_mutex_trylock.c
system/lib/libc/musl/src/thread/pthread_mutex_unlock.c
system/lib/libc/musl/src/thread/pthread_mutex_timedlock.c
system/lib/libc/musl/src/internal/pthread_impl.h

What the fix does:

For debug Emscripten builds, PTHREAD_MUTEX_NORMAL still keeps _m_lock in the historical 0/EBUSY form.
Ownership for debug deadlock detection is tracked in a per-thread list anchored on struct pthread.
On successful normal-mutex lock, the mutex is added to that thread's list.
On normal-mutex unlock, it is removed from that thread's list.
The debug deadlock assertion for normal mutexes checks that per-thread ownership list, not _m_count and not _m_lock.

Why this design:

It preserves the intent of the 4.0.11 debug change: same-thread relock of a normal mutex still traps instead of silently hanging.
It preserves the old _m_lock semantics that dlmalloc depended on.
It avoids using shared mutable state for debug-only owner tracking, which was the source of the stale-owner false positives.
It leaves non-normal mutex behavior alone.
It leaves release/NDEBUG behavior alone.

A subtle but important implementation detail is that debug normal-mutex ownership is tracked in a per-thread list anchored on struct pthread, rather than in a separate _Thread_local variable or in shared mutex fields such as _m_lock / _m_count. This preserves the historical mutex lock-word semantics for normal mutexes, avoids racy shared ownership state, and also avoids introducing extra TLS state that broke other.test_threadprofiler during CI.

Why not simpler alternatives

I intentionally did not take these approaches:

Reverting the 4.0.11 debug change entirely.
That would discard the intended "trap instead of deadlock" behavior that motivated the change.
Keeping _m_count as the owner field.
That fixes the first failure but causes the second one because _m_count is shared and racy.
Disabling assertions / forcing NDEBUG.
That works around the issue but does not fix the debug runtime.

The patch here is meant to be a real upstream fix, not a workaround.

Tests

This PR adds two browser regression tests in test/test_browser.py:

test/wasm_worker/pthread_mutex_debug_allocator_regression.cpp
This captures the original allocator failure mode: debug -pthread + -sWASM_WORKERS plus trivial main/worker allocation traffic must stay clean.
test/wasm_worker/pthread_mutex_debug_reporting_teardown.cpp
This captures the stale-owner false-positive that appeared after the first fix was applied.

I also verified that the existing deadlock behavior is still intact via:

other.test_pthread_mutex_deadlock
other.test_threadprofiler

So the final state is:

original allocator regression fixed
follow-on stale-owner deadlock fixed
intended debug deadlock trapping for real same-thread relock preserved
existing --threadprofiler startup behavior preserved

Validation

The fix is validated with:

browser.test_wasm_worker_pthread_mutex_debug_allocator_regression
browser.test_wasm_worker_pthread_mutex_debug_reporting_teardown
other.test_pthread_mutex_deadlock
other.test_threadprofiler

…mscripten-core#26619

…re#26619

Copilot

Pull request overview

This PR fixes a debug-only regression in Emscripten’s musl pthread normal-mutex path under -pthread (notably affecting dlmalloc with -sWASM_WORKERS) by restoring historical _m_lock semantics while keeping same-thread relock deadlock trapping via debug-only thread-local ownership tracking. It also adds browser regression tests covering the allocator abort and the subsequent stale-owner deadlock false-positive.

Changes:

Restore PTHREAD_MUTEX_NORMAL debug behavior to keep _m_lock in the historical 0/EBUSY encoding while maintaining deadlock trapping.
Add debug-only thread-local tracking of held normal mutexes using _m_prev/_m_next, and use it for deadlock assertions.
Add two browser regression tests for the allocator regression and the reporting/teardown false-positive.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
`system/lib/libc/musl/src/thread/pthread_mutex_trylock.c`	Implements debug normal-mutex lock-word preservation and hooks normal-mutex ownership tracking.
`system/lib/libc/musl/src/thread/pthread_mutex_unlock.c`	Hooks normal-mutex unlock to remove from debug thread-local ownership tracking.
`system/lib/libc/musl/src/thread/pthread_mutex_timedlock.c`	Updates debug deadlock assertions to consult thread-local normal-mutex ownership list.
`system/lib/libc/musl/src/internal/pthread_impl.h`	Adds debug-only thread-local list + helpers for normal-mutex ownership tracking.
`test/test_browser.py`	Registers two new browser regression tests under the wasm worker suite.
`test/wasm_worker/pthread_mutex_debug_allocator_regression.cpp`	New browser repro focused on allocator churn under `-pthread` + wasm workers.
`test/wasm_worker/pthread_mutex_debug_reporting_teardown.cpp`	New browser repro focused on the follow-on stale-owner deadlock assert.
`AUTHORS`	Adds contributor entry.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

system/lib/libc/musl/src/thread/pthread_mutex_trylock.c

test/wasm_worker/pthread_mutex_debug_allocator_regression.cpp

test/wasm_worker/pthread_mutex_debug_reporting_teardown.cpp

…/emscripten into pthread-mutex-debug-fix

slowriot · 2026-04-05T15:26:05Z

It appears the failing test is failing on pip install on mac arm64 - doesn't seem to be related to this change set.

sbc100 · 2026-04-05T17:18:51Z

I'm not sure the root cause analysis here is correct/complete.

My guess is that the issue here is simply that pthead_mutex API was never really tested or guardnattes to work when called from a wasm worker. In particular IIUC the pthread_self() function always return NULL for all wasm workers, which is likely to confuse pretty much all pthread APIs. The fact that normal pthread_mutex lock happened to work before #24607 seems like more of a lucky coincidence perhaps?

In particular I don't under understand "The problem is that this changed the observable semantics of the mutex lock word for normal mutexes in debug builds". Surely the contents of the mutex word should be exterally observed by the user program? Its purely internal the pthread implementation right?

When you say "That broke dlmalloc under -pthread debug system-library builds".. I'm not sure I understand why it broke exactly? Do you know why it breaks in this one configuration? What exactly is the issue?

sbc100 · 2026-04-05T17:20:10Z

I think its rather just revert #24607 than add more complex emscripten patches to the internal pthread API. However, I think it might be better to try to make it so that calling pthead APIs from wasm workers is actually well defined.

slowriot · 2026-04-06T00:01:58Z

@sbc100 apologies if I've been unclear - I'm only becoming familiar with this code as I'm working through it, and learning more as I go. My understanding at this point is:

4.0.11 debug builds disable the old fast path for PTHREAD_MUTEX_NORMAL, so pthread_mutex_lock() always falls through to __pthread_mutex_timedlock() instead of doing the old direct a_cas(..., 0, EBUSY) acquire. See pthread_mutex_lock.c#L3 and pthread_mutex_timedlock.c#L62.
__pthread_mutex_timedlock() immediately calls __pthread_mutex_trylock(), which in this debug configuration always routes normal mutexes to __pthread_mutex_trylock_owner(). See pthread_mutex_timedlock.c#L74 and pthread_mutex_trylock.c#L95.
The first thing __pthread_mutex_trylock_owner() does is:
- pthread_t self = __pthread_self();
- int tid = self->tid;
  See pthread_mutex_trylock.c#L7. This is the first new UB point in 4.0.11.
On this target, __pthread_self() is just ((pthread_t)__get_tp()), and tid is a field inside struct pthread. See pthread_impl.h#L41 and pthread_impl.h#L175.
In the pthread runtime, __get_tp() returns the global thread_ptr, and __set_thread_state() writes that global. See emscripten_thread_state.S#L23 and emscripten_thread_state.S#L29.
In the mixed -pthread + -sWASM_WORKERS wasm-worker startup path, JS explicitly does ___set_thread_state(/*thread_ptr=*/0, ...). See libwasm_worker.js#L120. Source-wise, that means __get_tp() == 0 on that path, hence __pthread_self() == 0.
Therefore the new 4.0.11 slow path reads self->tid from address 0 + offsetof(struct pthread, tid), not from a real struct pthread. So this is where it all goes wrong - tid contains arbitrary garbage from memory.
The same invalid self is then used for more than just that initial load:
- robust-list reads/writes at pthread_mutex_trylock.c#L28
- clearing pending at pthread_mutex_trylock.c#L49
- storing m->_m_count = self->tid for debug owner tracking at pthread_mutex_trylock.c#L69
- deadlock checks against __pthread_self()->tid at pthread_mutex_timedlock.c#L90
dlmalloc is the hot internal caller that keeps hitting this path:
- allocator locks are pthread_mutex_t at dlmalloc.c#L2087
- PREACTION/POSTACTION call pthread_mutex_lock/unlock at dlmalloc.c#L2827
- malloc hot paths enter PREACTION(gm) at dlmalloc.c#L4658

The changes in this PR stop PTHREAD_MUTEX_NORMAL in debug builds from depending on __pthread_self()->tid at all. They restore the old owner-less 0/EBUSY lock-word behavior for normal mutexes, and move the debug deadlock check to separate per-thread bookkeeping instead of deriving ownership from the invalid wasm-worker pthread_t path.

The advantage over simply reverting, is that this change preserves the intended post-#24607 debug behaviour: same-thread relock of a normal mutex still traps instead of silently hanging forever. Simply reverting would discard that safety improvement.

slowriot added 3 commits April 3, 2026 21:25

preserve normal mutex lock-word semantics in debug builds, resolves e…

69a9565

…mscripten-core#26619

add unit tests

99793de

track debug normal mutex ownership per thread, resolves emscripten-co…

88e5fa2

…re#26619

Copilot AI review requested due to automatic review settings April 3, 2026 21:34

Copilot started reviewing on behalf of slowriot April 3, 2026 21:35 View session

slowriot changed the title ~~Fix~~ Fix pthreads normal-mutex debug implementation Apr 3, 2026

Copilot AI reviewed Apr 3, 2026

View reviewed changes

system/lib/libc/musl/src/thread/pthread_mutex_trylock.c Outdated Show resolved Hide resolved

test/wasm_worker/pthread_mutex_debug_allocator_regression.cpp Show resolved Hide resolved

test/wasm_worker/pthread_mutex_debug_reporting_teardown.cpp Show resolved Hide resolved

slowriot and others added 6 commits April 3, 2026 23:00

clear robust_list pending before debug normal-mutex return

bd60bb0

rebaseline pthread codesize expectations

01cc266

store debug normal-mutex list on pthread struct

894731c

Merge branch 'main' into pthread-mutex-debug-fix

c57e7bc

rebaseline pthread struct info

2a934d5

Merge branch 'pthread-mutex-debug-fix' of https://github.com/slowriot…

8a55ca0

…/emscripten into pthread-mutex-debug-fix

Merge branch 'main' into pthread-mutex-debug-fix

b418a60

slowriot mentioned this pull request Apr 6, 2026

Regression 4.0.10 to 4.0.11: -pthread + -sWASM_WORKERS aborts on trivial allocation/deallocation workload #26619

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix pthreads normal-mutex debug implementation #26622

Fix pthreads normal-mutex debug implementation #26622
slowriot wants to merge 10 commits intoemscripten-core:mainfrom
slowriot:pthread-mutex-debug-fix

slowriot commented Apr 3, 2026 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

slowriot commented Apr 5, 2026

Uh oh!

sbc100 commented Apr 5, 2026

Uh oh!

sbc100 commented Apr 5, 2026

Uh oh!

slowriot commented Apr 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

slowriot commented Apr 3, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Root Cause

Fix

Why not simpler alternatives

Tests

Validation

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

slowriot commented Apr 5, 2026

Uh oh!

sbc100 commented Apr 5, 2026

Uh oh!

sbc100 commented Apr 5, 2026

Uh oh!

slowriot commented Apr 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

slowriot commented Apr 3, 2026 •

edited

Loading