Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
174 changes: 86 additions & 88 deletions docs/design/01-precise-futex-wakeups.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# Design Doc: Precise Futex Wakeups in Emscripten
# Design Doc: Precise Futex Wakeups

- **Status**: Draft
- **Bug**: https://github.com/emscripten-core/emscripten/issues/26633
Expand All @@ -24,7 +24,7 @@ CPU wakeups and increased latency for events.
## Non-Goals
- **Main Browser Thread**: Changes to the busy-wait loop in `futex_wait_main_browser_thread` are out of scope.
- **Direct Atomics Usage**: Threads that call `atomic.wait` directly (bypassing `emscripten_futex_wait`) will remain un-interruptible.
- **Wasm Workers**: Wasm Worker do not have a `pthread` structure, they are not covered by this design.
- **Wasm Workers**: Wasm Workers do not have a `pthread` structure, so they are not covered by this design.

## Proposed Design

Expand All @@ -38,103 +38,101 @@ As part of this design we will need to explicitly state that
application.

### 1. `struct pthread` Extensions
We will add the following fields to `struct pthread` (in
`system/lib/libc/musl/src/internal/pthread_impl.h`). All operations on these
fields must use `memory_order_seq_cst` to ensure the handshake is robust.
We will add a single atomic `wait_addr` field to `struct pthread` (in
`system/lib/libc/musl/src/internal/pthread_impl.h`).

```c
// The address the thread is currently waiting on in emscripten_futex_wait.
// NULL if the thread is not currently in a futex wait.
_Atomic(void*) waiting_on_address;

// A counter that is incremented every time the thread wakes up from a futex wait.
// Used by wakers to ensure the target thread has actually acknowledged the wake.
_Atomic(uint32_t) wait_counter;

// A bitmask of reasons why the thread was woken for a side-channel event.
_Atomic(uint32_t) wait_reasons;

#define WAIT_REASON_CANCEL (1 << 0)
#define WAIT_REASON_MAILBOX (1 << 1)
//
// This field encodes the state using the following bitmask:
// - NULL: Not waiting, no pending notification.
// - NOTIFY_BIT (0x1): Not waiting, but a notification was sent.
// - addr: Waiting on `addr`, no pending notification.
// - addr | NOTIFY_BIT: Waiting on `addr`, notification sent.
//
// Since futex addresses must be 4-byte aligned, the low bit is safe to use.
_Atomic uintptr_t wait_addr;

#define NOTIFY_BIT (1 << 0)
```

### 2. Waiter Logic (`emscripten_futex_wait`)
The waiter will follow this logic (using `SEQ_CST` for all atomic accesses):

1. **Pre-check**: Check `wait_reasons`. If non-zero, handle the reasons (e.g., process mailbox or handle cancellation).
2. **Publish**: Set `waiting_on_address = addr`.
3. **Counter Snapshot**: Read `current_counter = wait_counter`.
4. **Double-check**: This is critical to avoid the race where a reason was added just before `waiting_on_address` was set. If `wait_reasons` is now non-zero, clear `waiting_on_address` and go to step 1.
5. **Wait**: Call `ret = __builtin_wasm_memory_atomic_wait32(addr, val, timeout)`.
6. **Unpublish**:
- Set `waiting_on_address = NULL`.
- Atomically increment `wait_counter`.
7. **Post-check**: Check `wait_reasons`. If non-zero, handle the reasons.
8. **Return**: Return the result of the wait to the caller.
- If `ret == ATOMICS_WAIT_OK`, return `0`.
- If `ret == ATOMICS_WAIT_TIMED_OUT`, return `-ETIMEDOUT`.
- If `ret == ATOMICS_WAIT_NOT_EQUAL`, return `-EWOULDBLOCK`.

Note: We do **not** loop internally if `ret == ATOMICS_WAIT_OK`. Even if we suspect the wake was caused by a side-channel event, we must return to the user to avoid "swallowing" a simultaneous real application wake that might not have changed the memory value.
The waiter will follow this logic:

1. **Notification Loop**:
```c
uintptr_t expected_null = 0;
while (!atomic_compare_exchange_strong(&self->wait_addr, &expected_null, (uintptr_t)addr)) {
// If the CAS failed, it means NOTIFY_BIT was set by another thread.
assert(expected_null == NOTIFY_BIT);
// Let the notifier know that we received the wakeup notification by
// resetting wait_addr.
self->wait_addr = 0;
handle_wakeup(); // Process mailbox or handle cancellation
// Reset expected_null because CAS updates it to the observed value on failure.
expected_null = 0;
}
```
2. **Wait**: Call `ret = __builtin_wasm_memory_atomic_wait32(addr, val, timeout)`.
3. **Unpublish & Check**:
```c
// Clear wait_addr and check if a notification arrived while we were sleeping.
if ((atomic_exchange(&self->wait_addr, 0) & NOTIFY_BIT) != 0) {
handle_wakeup();
}
```
4. **Return**: Return the result of the wait.

Note: We do **not** loop internally if `ret == ATOMICS_WAIT_OK`. Even if we
suspect the wake was caused by a side-channel event, we must return to the user
to avoid "swallowing" a simultaneous real application wake.

### 3. Waker Logic
When a thread needs to wake another thread for a side-channel event (e.g., in `pthread_cancel` or `em_task_queue_enqueue`):

1. Atomically OR the appropriate bit into the target thread's `wait_reasons` (`SEQ_CST`).
2. Read `target_addr = target->waiting_on_address` (`SEQ_CST`).
3. If `target_addr` is not NULL:
- Read `start_c = target->wait_counter` (`SEQ_CST`).
- Enter a loop:
- Call `emscripten_futex_wake(target_addr, 1)`.
- Exit loop if `target->wait_counter != start_c` OR `target->waiting_on_address != target_addr`.
- **Yield**: Call `sched_yield()` (or a small sleep) to allow the target thread to proceed if it is currently being scheduled.
When a thread needs to wake another thread for a side-channel event:

1. **Enqueue Work**: Add the task to the target's mailbox or set the cancellation flag.
2. **Signal**:
```c
uintptr_t addr = atomic_fetch_or(&target->wait_addr, NOTIFY_BIT);
if (addr == 0 || (addr & NOTIFY_BIT) != 0) {
// Either the thread wasn't waiting (it will see NOTIFY_BIT later),
// or someone else is already in the process of notifying it.
return;
}
// We set the bit and are responsible for waking the target.
// The target is currently waiting on `addr`.
while (target->wait_addr == (addr | NOTIFY_BIT)) {
emscripten_futex_wake((void*)addr, INT_MAX);
sched_yield();
}
```

### 4. Handling the Race Condition
The "Lost Wakeup" race is handled by the combination of:
- The waiter double-checking `wait_reasons` after publishing its `waiting_on_address`.
- The waker looping `atomic.wake` until the waiter increments its `wait_counter`.

Even if the waker's first `atomic.wake` occurs after the waiter's double-check
but *before* the waiter actually enters the `atomic.wait` instruction, the waker
will continue to loop and call `atomic.wake` again. The subsequent call(s) will
successfully wake the waiter once it is actually sleeping.

Multiple wakers can safely call this logic simultaneously; they will all exit
the loop as soon as the waiter acknowledges the wake by incrementing the
counter.

### 5. Overlapping and Spurious Wakeups
The design must handle cases where "real" wakeups (triggered by the application) and "side-channel" wakeups (cancellation/mailbox) occur simultaneously.

1. **Spurious Wakeups for Other Threads**: If multiple threads are waiting on the same address (e.g., a shared mutex), a side-channel `atomic_wake(addr, 1)` targeted at Thread A might be delivered by the kernel to Thread B.
- **Thread B's response**: It will wake up, increment its `wait_counter`, see that its `wait_reasons` are empty, and return `0` to its caller.
- **Thread C (the waker)**: It will see that Thread A's `wait_counter` has *not* changed and `waiting_on_address` is still `addr`. It will therefore continue its loop and call `atomic_wake` again until Thread A is finally woken.
- **Result**: Thread B experiences a "spurious" wakeup. This is acceptable and expected behavior for futex-based synchronization.
2. **Handling Side-Channel Success**: If Thread A is woken by the side-channel, it handles the event and returns `0`. The user's code will typically see that its own synchronization condition is not yet met and immediately call `emscripten_futex_wait` again. This effectively "resumes" the wait from the user's perspective while having allowed the side-channel event to be processed.
3. **No Lost "Real" Wakeups**: By returning to the caller whenever `atomic.wait` returns `OK`, we ensure that we never miss or swallow a real application-level `atomic.wake`.

### 6. Counter Wrap-around
The `wait_counter` is a `uint32_t` and will wrap around to zero after $2^{32}$ wakeups. This is safe because:
1. **Impossibility of Racing**: For the waker to "miss" a wake-up due to wrap-around, the waiter would have to wake up and re-enter a sleep state exactly $2^{32}$ times in the very brief window between the waker's `atomic_wake` and its subsequent check of `wait_counter`. Even at extreme wakeup frequencies (e.g., 1 million per second), this would take over an hour.
2. **Address Change Check**: The waker loop also checks `target->waiting_on_address != target_addr`. If the waiter wakes up and either stops waiting or starts waiting on a *different* address, the waker will exit the loop regardless of the counter value.

### 6. Benefits
- **Lower Power Consumption**: Threads can sleep indefinitely (or for the full duration of a user-requested timeout) without periodic wakeups.
- **Lower Latency**: Mailbox events and cancellation requests are processed immediately rather than waiting for the next 1ms or 100ms tick.
- **Simpler Loop**: The complex logic for calculating remaining timeout slices in `emscripten_futex_wait` is removed.
The protocol handles the "Lost Wakeup" race by having the waker loop until the
waiter clears its `wait_addr`. If the waker sets the `NOTIFY_BIT` just before
the waiter enters `atomic.wait`, the `atomic_wake` will be delivered once the
waiter is asleep. If the waiter wakes up for any reason (timeout, real wake, or
side-channel wake), its `atomic_exchange` will satisfy the waker's loop
condition.

## Benefits

- **Lower Power Consumption**: Threads can sleep indefinitely (or for the full duration of a user-requested timeout) without periodic wakeups.
- **Lower Latency**: Mailbox events and cancellation requests are processed immediately rather than waiting for the next 1ms or 100ms tick.
- **Simpler Loop**: The complex logic for calculating remaining timeout slices in `emscripten_futex_wait` is removed.

## Alternatives Considered
- **Signal-based wakeups**: Not currently feasible in Wasm as signals are not
implemented in a way that can interrupt `atomic.wait`.
- **A single global "wake-up" address per thread**: This would require the
waiter to wait on *two* addresses simultaneously (the user's futex and its
own wakeup address), which `atomic.wait` does not support. The proposed
design works around this by having the waker use the *user's* futex address.
- **Signal-based wakeups**: Not currently feasible in Wasm as signals are not
implemented in a way that can interrupt `atomic.wait`.
- **A single global "wake-up" address per thread**: This would require the
waiter to wait on *two* addresses simultaneously (the user's futex and its
own wakeup address), which `atomic.wait` does not support. The proposed
design works around this by having the waker use the *user's* futex address.

## Security/Safety Considerations
- The `waiting_on_address` must be managed carefully to ensure wakers don't
call `atomic.wake` on stale addresses. The `wait_counter` and clearing the
address upon wake mitigate this.
- The waker loop should have a reasonable fallback (like a yield) to prevent a
busy-wait deadlock if the waiter is somehow prevented from waking up (though
`atomic.wait` is generally guaranteed to wake if `atomic.wake` is called).
- **The `wait_addr` must be managed carefully** to ensure wakers don't
call `atomic.wake` on stale addresses. Clearing the address upon wake
mitigates this.
- **The waker loop should have a reasonable fallback** (like a yield) to prevent a
busy-wait deadlock if the waiter is somehow prevented from waking up (though
`atomic.wait` is generally guaranteed to wake if `atomic.wake` is called).
19 changes: 15 additions & 4 deletions docs/design/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -3,8 +3,19 @@
This directory contains design documents for emscripten features and major
changes/refactors.

We are experimenting with keeping these document here under source control with
We are experimenting with keeping these documents here under source control with
the hope that this will increase understandability of the codebase. This has
some advantages over doing all planning in Google Docs or GitHub issues. For
example, it allows us to track the history of designs and it allows them to be
searchable using standard tools like `git grep`.
some advantages over doing all our planning in Google Docs or GitHub issues.
For example, it allows us to track the history of designs and it allows them to
be searchable using standard tools like `git grep`.

## Document Format

Each document in this directory should be a markdown file. At the top of each
document should be a `Status` which can be either `Draft`, `Accepted`,
`Completed`.

When a document is marked as `Completed` it should also be updated such that
it is clear the work has been done, and is now in the past. For example,
phrases such as "The current behavior" should be replaced with "The previous
behavior".
Loading