Add docs/design directory along with one initial design doc. by sbc100 · Pull Request #26621 · emscripten-core/emscripten

sbc100 · 2026-04-03T18:29:41Z

No description provided.

sbc100 · 2026-04-06T16:02:35Z

I add some more context to the README. I didn't completely remove the reference to AI here, since that is one of the main motivations here, but I did re-phrase to make it clear its only part of the rationale.

kripken

lgtm in general, and also the first design doc seems generally good to me, though I'm not an expert on the locking stuff.

tlively · 2026-04-06T18:48:12Z

+The current implementation uses a 1ms wakeup interval for the main runtime
+thread and a 100ms interval for cancellable pthreads. This leads to unnecessary


This is going to be out of date once the new design lands. Should this be phrased as "the previous implementation" or edited out entirely to future-proof the doc?

I'm not really sure how to address this problem. The perhaps by updating these documents if/when the bugs get closed. i.e. perhaps when Draft is changed to Done or Complete?

tlively · 2026-04-06T18:51:23Z

+
+// A counter that is incremented every time the thread wakes up from a futex wait.
+// Used by wakers to ensure the target thread has actually acknowledged the wake.
+_Atomic(uint32_t) wait_counter;


Maybe this should be named wake_counter.

Its the waiter that increments it though. i.e. it counts how much times a given threads has called futex_wait.

Yes, but it increments upon waking. The current name suggests to me that it should be incremented when going to sleep, but in fact it is incremented when waking.

tlively · 2026-04-06T18:56:06Z

+1.  **Pre-check**: Check `wait_reasons`. If non-zero, handle the reasons (e.g., process mailbox or handle cancellation).
+2.  **Publish**: Set `waiting_on_address = addr`.
+3.  **Counter Snapshot**: Read `current_counter = wait_counter`.
+4.  **Double-check**: This is critical to avoid the race where a reason was added just before `waiting_on_address` was set. If `wait_reasons` is now non-zero, clear `waiting_on_address` and go to step 1.
+5.  **Wait**: Call `ret = __builtin_wasm_memory_atomic_wait32(addr, val, timeout)`.
+6.  **Unpublish**:
+    -   Set `waiting_on_address = NULL`.
+    -   Atomically increment `wait_counter`.
+7.  **Post-check**: Check `wait_reasons`. If non-zero, handle the reasons.
+8.  **Return**: Return the result of the wait to the caller.
+    -   If `ret == ATOMICS_WAIT_OK`, return `0`.
+    -   If `ret == ATOMICS_WAIT_TIMED_OUT`, return `-ETIMEDOUT`.
+    -   If `ret == ATOMICS_WAIT_NOT_EQUAL`, return `-EWOULDBLOCK`.


As a meta-comment, I'm not sure it's worth checking in a design doc if it is going to contain precise pseudocode listing like this. It might as well be real code.

tlively · 2026-04-06T18:57:47Z

+3.  If `target_addr` is not NULL:
+    -   Read `start_c = target->wait_counter` (`SEQ_CST`).
+    -   Enter a loop:
+        -   Call `emscripten_futex_wake(target_addr, 1)`.


Why wake just one waiter instead of all waiters at once? If all were woken at once, we could get rid of the wait_counter entirely, right? And maybe get rid of waiting_on_address as well?

I agree, I don't see any reason not to wake all waiters.

Regarding getting rid waiting_on_address .. I don't see how that is possible? If you need to wake a given thread you need to know what address its currently waiting on don't you?

Regarding getting rid of wait_counter, did you grok the reason why its actually needed? i.e. to fix the possible race between setting waiting_on_address and the actual atomic.wait instruction. Maybe you have some other idea about how to solve that race?

did you grok

Probably not :)

Ok, I got it. We need the waiting_on_address so we know what address to notify when we want to send a waiting thread a message. It's possible we could send the notification before the waiting thread actually goes to sleep, so to protect against that we notify it repeatedly until we observe it changing its wake count, at which point we know it will proceed to handle the notification.

But instead of a wake count, perhaps we could just have the waiting thread clear its wait_reasons when it reads them after waking up. Then the notifying thread can just observe the wait_reasons get cleared to know the target thread has woken up.

tlively · 2026-04-06T19:02:27Z

+### 5. Overlapping and Spurious Wakeups
+The design must handle cases where "real" wakeups (triggered by the application) and "side-channel" wakeups (cancellation/mailbox) occur simultaneously.
+
+1.  **Spurious Wakeups for Other Threads**: If multiple threads are waiting on the same address (e.g., a shared mutex), a side-channel `atomic_wake(addr, 1)` targeted at Thread A might be delivered by the kernel to Thread B.


There is no kernel in this case. Is this describing how native futexes work?

I guess "kernel" should be replaced by "host" here. Would that makes sense here. This sentence does make sense to me with that replacment.

Its referring to the fact that normal wakups can overlap with the side channel wakeups.

tlively · 2026-04-06T20:57:05Z

+
+1.  **Pre-check**: Check `wait_reasons`. If non-zero, handle the reasons (e.g., process mailbox or handle cancellation).
+2.  **Publish**: Set `waiting_on_address = addr`.
+3.  **Counter Snapshot**: Read `current_counter = wait_counter`.


This doesn't appear to be used for anything. (But it would be easier to be sure if looking at real code.)

tlively · 2026-04-06T20:58:04Z

+1.  **Pre-check**: Check `wait_reasons`. If non-zero, handle the reasons (e.g., process mailbox or handle cancellation).
+2.  **Publish**: Set `waiting_on_address = addr`.
+3.  **Counter Snapshot**: Read `current_counter = wait_counter`.
+4.  **Double-check**: This is critical to avoid the race where a reason was added just before `waiting_on_address` was set. If `wait_reasons` is now non-zero, clear `waiting_on_address` and go to step 1.


This could be slightly simplified by using the low bits of waiting_on_address as the wait_reason, so both could atomically checked and updated in a CAS loop.

Actually I think we can just remove wait_reasons completely.

That was something that Gemini added rather than me, and I don't think its actually safe since we could have multiple wakers trying to wake a sleeping thread for different reasons.

This is also why the counter is needed since it removes the multiple-wakers problem.

tlively · 2026-04-06T21:00:52Z

+3.  If `target_addr` is not NULL:
+    -   Read `start_c = target->wait_counter` (`SEQ_CST`).
+    -   Enter a loop:
+        -   Call `emscripten_futex_wake(target_addr, 1)`.


Ok, I got it. We need the waiting_on_address so we know what address to notify when we want to send a waiting thread a message. It's possible we could send the notification before the waiting thread actually goes to sleep, so to protect against that we notify it repeatedly until we observe it changing its wake count, at which point we know it will proceed to handle the notification.

But instead of a wake count, perhaps we could just have the waiting thread clear its wait_reasons when it reads them after waking up. Then the notifying thread can just observe the wait_reasons get cleared to know the target thread has woken up.

tlively · 2026-04-06T22:26:01Z

Proposed simpler protocol using a bit instead of a counter:

// Side-channel notify a target thread after enqueuing work.
addr = fetch_or(&target->wait_addr, NOTIFY_BIT);
if (addr == NULL) {
  // The thread is not waiting. We can leave the notify bit set.
  return;
}
if ((addr & NOTIFY_BIT) != 0) {
  // Some other thread is already notifying this target.
  return;
}
// We set the bit and are responsible for notifying the target. 
while (atomic_load(&target->wait_addr) == (addr | NOTIFY_BIT)) {
  // The target is not awake yet.
  notify_all(addr)
  sched_yield()
}

// Futex
while (!cas_weak(&self->wait_addr, NULL, addr)) {
  // Handle notifications
  atomic_store(&target->wait_addr, NULL)
  read_mailbox()
}
// From here, anyone trying to side-channel notify us will keep trying until they observe us wake up.
ret = atomic_wait(addr, expected, timeout)
if (atomic_exchange(&self->wait_addr, NULL) != addr) {
  // Handle notifications received while sleeping.
  read_mailbox();
}
return ret;

sbc100 · 2026-04-06T23:36:18Z

I like it!

sbc100 requested review from kripken and tlively April 3, 2026 18:32

sbc100 force-pushed the design_docs branch 3 times, most recently from 18b8ce5 to b08e859 Compare April 6, 2026 16:01

kripken reviewed Apr 6, 2026

View reviewed changes

Comment thread docs/design/01-precise-futex-wakeups.md Outdated

Comment thread docs/design/README.md Outdated

Comment thread docs/design/README.md Outdated

sbc100 force-pushed the design_docs branch 2 times, most recently from f0eefd3 to bb72808 Compare April 6, 2026 18:35

sbc100 requested a review from kripken April 6, 2026 18:35

Add docs/design directory along with one initial design doc.

d504a60

sbc100 force-pushed the design_docs branch from bb72808 to d504a60 Compare April 6, 2026 18:45

kripken approved these changes Apr 6, 2026

View reviewed changes

sbc100 merged commit ecb9e39 into emscripten-core:main Apr 6, 2026
8 of 20 checks passed

sbc100 deleted the design_docs branch April 6, 2026 18:46

tlively reviewed Apr 6, 2026

View reviewed changes

sbc100 added a commit to sbc100/emscripten that referenced this pull request Apr 6, 2026

Design doc feedback from emscripten-core#26621. NFC

9d00f95

sbc100 added a commit to sbc100/emscripten that referenced this pull request Apr 6, 2026

Design doc feedback from emscripten-core#26621. NFC

e5fc329

sbc100 added a commit to sbc100/emscripten that referenced this pull request Apr 6, 2026

Design doc feedback from emscripten-core#26621. NFC

bdf2483

sbc100 added a commit to sbc100/emscripten that referenced this pull request Apr 6, 2026

Design doc feedback from emscripten-core#26621. NFC

af86ae5

tlively reviewed Apr 6, 2026

View reviewed changes

sbc100 added a commit to sbc100/emscripten that referenced this pull request Apr 7, 2026

Design doc feedback from emscripten-core#26621. NFC

c7aef85

sbc100 mentioned this pull request Apr 7, 2026

Design doc feedback from #26621. NFC #26636

Merged

sbc100 added a commit to sbc100/emscripten that referenced this pull request Apr 7, 2026

Design doc feedback from emscripten-core#26621. NFC

5ba12da

sbc100 added a commit to sbc100/emscripten that referenced this pull request Apr 7, 2026

Design doc feedback from emscripten-core#26621. NFC

6c31932

sbc100 added a commit that referenced this pull request Apr 7, 2026

Design doc feedback from #26621. NFC (#26636)

426fb2d

dschuff mentioned this pull request Apr 28, 2026

Update test_duration in parallel testsuite #26804

Open

		The current implementation uses a 1ms wakeup interval for the main runtime
		thread and a 100ms interval for cancellable pthreads. This leads to unnecessary

Conversation

sbc100 commented Apr 3, 2026

Uh oh!

sbc100 commented Apr 6, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

kripken left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tlively commented Apr 6, 2026

Uh oh!

sbc100 commented Apr 6, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants