Commit 12cb1b4

Rebase CABI onto explicit stack-switching interface (no behavior change)
1 parent 20db3d8 commit 12cb1b4

4 files changed

+885
-667
lines changed

design/mvp/CanonicalABI.md

Lines changed: 192 additions & 52 deletions
Original file line number | Diff line number | Diff line change
@@ -16,6 +16,7 @@ specified here.
1616
* [Component Instance State](#component-instance-state)
1717
* [Table State](#table-state)
1818
* [Resource State](#resource-state)
19+
* [Stack Switching](#stack-switching)
1920
* [Thread State](#thread-state)
2021
* [Waitable State](#waitable-state)
2122
* [Task State](#task-state)
@@ -587,69 +588,200 @@ class ResourceType(Type):
587588
```
588589

589590

590-
#### Thread State
591+
#### Stack Switching
591592

592-
As described in the [concurrency explainer], threads are created both
593-
*implicitly*, when calling a component export (in `canon_lift` below), and
594-
*explicitly*, when core wasm code calls the `thread.new-indirect` built-in (in
595-
`canon_thread_new_indirect` below). Threads are represented here by the
596-
`Thread` class and the [current thread] is represented by explicitly threading
597-
a reference to a `Thread` through all Core WebAssembly calls so that the
598-
`thread` parameter always points to "the current thread". The `Thread` class
599-
provides a set of primitive control-flow operations that are used by the rest
600-
of the Canonical ABI definitions.
601-
602-
While `Thread`s are semantically created for each component export call by the
603-
Python `canon_lift` code, an optimizing runtime should be able to allocate
604-
`Thread`s lazily, only when needed for actual thread switching operations,
605-
thereby avoiding cross-component call overhead for simple, short-running
606-
cross-component calls. To assist in this optimization, `Thread`s are put into
607-
their own per-component-instance `threads` table so that thread table indices
608-
and elements can be more-readily reused between calls without interference from
609-
the other kinds of handles.
610-
611-
`Thread` is implemented using the Python standard library's [`threading`]
612-
module. While a Python [`threading.Thread`] is a preemptively-scheduled [kernel
613-
thread], it is coerced to behave like a cooperatively-scheduled [fiber] by
614-
careful use of [`threading.Lock`]. If Python had built-in fibers (or algebraic
615-
effects), those could have been used instead since all that's needed is the
616-
ability to switch stacks. In any case, the use of `threading.Thread` is
617-
encapsulated by the `Thread` class so that the rest of the Canonical ABI can
618-
simply use `suspend`, `resume`, etc.
619-
620-
When a `Thread` is suspended and then resumed, it receives a `Cancelled`
621-
value indicating whether the caller has cooperatively requested that the thread
622-
cancel itself which is communicated to Core WebAssembly with the following
623-
integer values:
593+
Component Model concurrency support is defined in terms of the Core WebAssembly
594+
[stack-switching] proposal's `cont.new`, `resume` and `suspend` instructions.
595+
However, the Component Model only needs a limited subset of the full
596+
[stack-switching] proposal:
597+
598+
First, there are only two global [control tags] used with `suspend`:
599+
```wat
600+
(tag $block (param $switch-to (ref null $Thread)) (result $cancelled bool))
601+
(tag $current-thread (result (ref $Thread)))
602+
```
603+
The `$block` tag is used to suspend a [thread] until some future event. The
604+
parameters and results will be described in the next section, where they are
605+
used to define `Thread`. The `$current-thread` tag is used to retrieve the
606+
current `Thread`, which is semantically stored in the `resume` handler's local
607+
state (although an optimizing implementation would instead maintain the current
608+
thread in the VM's execution context so that it could be retrieved with a single
609+
load and/or kept in register state).
610+
611+
Second, there is only a single continuation type used with `resume`:
612+
```wat
613+
(type $ct (cont (func (param $cancelled bool) (result (ref null $Thread)))))
614+
```
615+
Thus, continuations are only produced for the `$block` event; the continuation
616+
produced for `$current-thread` is immediately resumed and so never "escapes".
617+
618+
Third, *every* `resume` performed by the Canonical ABI always handles *both*
619+
`$block` and `$current-thread` and thus every Canonical ABI `suspend` is always
620+
handled by the innermost Canonical ABI `resume` without a dynamic handler/tag
621+
search.
622+
623+
Given these restrictions, specialized versions of `cont.new`, `resume` and
624+
`suspend` that are "monomorphized" to the above types and tags can be easily
625+
implemented in terms of Python's standard preemptive threading primitives, using
626+
[`threading.Thread`] to provide a native stack, [`threading.Lock`] to only allow
627+
a single `threading.Thread` to execute at a time, and [`threading.local`] to
628+
maintain the dynamic handler scope using thread-local storage. This could have
629+
been implemented more directly and efficiently using [fibers], but the Python
630+
standard library doesn't have fibers. However, a realistic implementation is
631+
expected to use (a pool of) fibers.
632+
633+
Starting with `cont.new`, the monomorphized version takes a function type
634+
matching `$ct` above:
624635
```python
625636
class Cancelled(IntEnum):
626637
  FALSE = 0
627638
  TRUE = 1
628-
```
639+
640+
class Continuation:
641+
  lock: threading.Lock
642+
  handler: Handler
643+
  cancelled: Cancelled
644+
645+
class Handler:
646+
  tls = threading.local()
647+
  lock: threading.Lock
648+
  current: Thread
649+
  cont: Optional[Continuation]
650+
  switch_to: Optional[Thread]
651+
652+
def cont_new(f: Callable[[Cancelled], Optional[Thread]]) -> Continuation:
653+
  cont = Continuation()
654+
  cont.lock = threading.Lock()
655+
  cont.lock.acquire()
656+
  def wrapper():
657+
    cont.lock.acquire()
658+
    Handler.tls.value = cont.handler
659+
    switch_to = f(cont.cancelled)
660+
    handler = Handler.tls.value
661+
    handler.cont = None
662+
    handler.switch_to = switch_to
663+
    handler.lock.release()
664+
  threading.Thread(target = wrapper).start()
665+
  return cont
666+
```
667+
Here, `Continuation` is used to pass parameters from `resume` to the
668+
continuation's thread. These parameters are set on `Continuation` right before
669+
`resume` calls `Continuation.lock.release()` to transfer control flow to the
670+
continuation. The `Handler` object is created by `resume` with the expectation
671+
that `Handler.lock.release()` will be called to transfer control flow and
672+
results back to the `resume` handler. The `Handler` is stored in the thread-local
673+
storage of the internal `threading.Thread` to implement the dynamic scoping
674+
required by stack-switching. Because a single `threading.Thread` can be
675+
suspended and resumed many times (each time with a new `Continuation` /
676+
`Handler`), the `Handler` must be re-loaded from TLS after `f` returns since it
677+
may have changed.
678+
679+
Next, `resume` is monomorphized to take a continuation of type `$ct`, the
680+
`cancelled` argument passed to `$ct` and, lastly, the `current` `Thread` which
681+
is to be immediately returned by the `(on $current-thread)` handler. The
682+
`(on $block)` and "returned without suspending" cases are merged into a single
683+
return value, where the latter "returned without suspending" case produces
684+
`None` for the returned `Optional[Continuation]`.
685+
```python
686+
def resume(cont: Continuation, cancelled: Cancelled, current: Thread) -> \
687+
    tuple[Optional[Continuation], Optional[Thread]]:
688+
  handler = Handler()
689+
  handler.lock = threading.Lock()
690+
  handler.lock.acquire()
691+
  handler.current = current
692+
  cont.handler = handler
693+
  cont.cancelled = cancelled
694+
  cont.lock.release()
695+
  handler.lock.acquire()
696+
  return (handler.cont, handler.switch_to)
697+
```
698+
699+
Lastly, `suspend` is monomorphized into 2 functions for the `$block` and
700+
`$current-thread` tags shown above, so that their signatures and implementations
701+
can be specialized. Since `$current-thread` has a trivial handler that simply
702+
returns the `current` `Thread` passed to `resume`, it can simply return
703+
`Handler.current` directly without any stack switching.
704+
```python
705+
def block(switch_to: Optional[Thread]) -> Cancelled:
706+
  cont = Continuation()
707+
  cont.lock = threading.Lock()
708+
  cont.lock.acquire()
709+
  handler = Handler.tls.value
710+
  handler.cont = cont
711+
  handler.switch_to = switch_to
712+
  handler.lock.release()
713+
  cont.lock.acquire()
714+
  Handler.tls.value = cont.handler
715+
  return cont.cancelled
716+
717+
def current_thread() -> Thread:
718+
  return Handler.tls.value.current
719+
```
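
As a rough illustration of the lock hand-off described in this section, the sketch below re-declares simplified versions of `cont_new`, `resume`, `block` and `current_thread` and drives one continuation through a suspend/resume round trip. The `__init__` constructors, the `trace` list, the `body` function and the use of plain strings in place of `Thread` objects are assumptions made for this example only; they are not part of the definitions above.

```python
import threading
from enum import IntEnum

class Cancelled(IntEnum):
  FALSE = 0
  TRUE = 1

class Continuation:
  def __init__(self):
    self.lock = threading.Lock()
    self.lock.acquire()              # stays held until resume() releases it
    self.handler = None
    self.cancelled = Cancelled.FALSE

class Handler:
  tls = threading.local()            # per-threading.Thread handler scope
  def __init__(self, current):
    self.lock = threading.Lock()
    self.lock.acquire()              # stays held until the continuation yields
    self.current = current
    self.cont = None
    self.switch_to = None

def cont_new(f):
  cont = Continuation()
  def wrapper():
    cont.lock.acquire()              # wait for the first resume()
    Handler.tls.value = cont.handler
    switch_to = f(cont.cancelled)
    handler = Handler.tls.value      # reload: may have changed across block()
    handler.cont = None              # None signals "returned without suspending"
    handler.switch_to = switch_to
    handler.lock.release()
  threading.Thread(target=wrapper).start()
  return cont

def resume(cont, cancelled, current):
  handler = Handler(current)
  cont.handler = handler
  cont.cancelled = cancelled
  cont.lock.release()                # transfer control to the continuation
  handler.lock.acquire()             # wait until it blocks or returns
  return (handler.cont, handler.switch_to)

def block(switch_to):
  cont = Continuation()
  handler = Handler.tls.value
  handler.cont = cont
  handler.switch_to = switch_to
  handler.lock.release()             # transfer control back to resume()
  cont.lock.acquire()                # wait for the next resume()
  Handler.tls.value = cont.handler
  return cont.cancelled

def current_thread():
  return Handler.tls.value.current

# Hypothetical driver: strings stand in for Thread objects.
trace = []

def body(cancelled):
  trace.append(('start', cancelled, current_thread()))
  woke_with = block(None)
  trace.append(('woke', woke_with, current_thread()))
  return None

k = cont_new(body)
k, _ = resume(k, Cancelled.FALSE, 'thread-A')
trace.append(('suspended', k is not None))
k, _ = resume(k, Cancelled.TRUE, 'thread-B')
trace.append(('finished', k is None))
```

In this sketch only one `threading.Thread` makes progress at any time because every hand-off releases exactly one lock and then waits on another. It relies on `threading.Lock` permitting a release from a thread other than the one that acquired it.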
720+
721+
In the future, when Core WebAssembly gets [stack-switching], the Component Model
722+
`$block` and `$current-thread` tags would not be exposed to Core WebAssembly.
723+
Thus, an optimizing implementation would continue to be able to implement
724+
`block()` as a direct control flow transfer to the innermost `resume()` and
725+
`current_thread()` via implicit context, both without an O(n) handler-stack tag
726+
search. In particular, this avoids the pathological O(N<sup>2</sup>) behavior
727+
which would otherwise arise if Component Model cooperative threads were used in
728+
conjunction with deeply-nested Core WebAssembly handlers.
729+
730+
Additionally, once Core WebAssembly has stack switching, any unhandled events
731+
that originate in Core WebAssembly would turn into traps if they reach a
732+
component boundary (just like unhandled exceptions do now; see
733+
`call_and_trap_on_throw` below). Thus, all cross-component/cross-language stack
734+
switching would continue to be mediated by the Component Model's types and
735+
Canonical ABI, with Core WebAssembly stack-switching used to implement
736+
intra-component concurrency according to the language's own internal ABI, which
737+
can be different inside each component.
738+
739+
740+
#### Thread State
741+
742+
As described in the [concurrency explainer], threads are created both
743+
*implicitly*, when calling a component export (in `canon_lift` below), and
744+
*explicitly*, when core wasm code calls the `thread.new-indirect` built-in (in
745+
`canon_thread_new_indirect` below). While threads are *logically* created for
746+
each component export call, an optimizing runtime should be able to allocate
747+
threads lazily when needed for actual thread switching operations, thereby
748+
avoiding cross-component call overhead for simple, short-running cross-component
749+
calls. To assist in this optimization, threads are put into their own
750+
`ComponentInstance.threads` table to reduce interference from the other kinds of
751+
handles.
752+
753+
Threads are represented in the Canonical ABI by the `Thread` class defined in
754+
this section. The `Thread` class is implemented in terms of the `cont_new`,
755+
`resume`, `block` and `current_thread` stack-switching primitives defined in the
756+
previous section. `Thread` defines a set of higher-level concurrency operations
757+
that are used by all the other Canonical ABI definitions. In particular, a
758+
"thread" adds the higher-level concepts of:
759+
* [waiting on external I/O]
760+
* [async call stack]
761+
* [thread index]
762+
* [thread-local storage]
763+
* [cancellation]
629764

630765
Introducing the `Thread` class in chunks, a `Thread` has the following fields
631766
and can be in one of the following 3 states based on these fields:
632-
* `running`: actively executing with a "parent" thread that is waiting
633-
to run once the `running` thread suspends or returns
634-
* `suspended`: waiting to be `resume`d by another thread
635-
* `waiting`: waiting to be `resume`d by `Store.tick` once `ready`
767+
* `running`: actively executing on the stack
768+
* `suspended`: waiting to be resumed by some other thread `running` in
769+
the same component instance (via its `index`)
770+
* `pending`: waiting to be resumed by the host (in `Store.tick` once `ready`)
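
One way to read the three states is as mutually exclusive predicates over the `cont` and `ready_func` fields. The sketch below uses a hypothetical `ThreadStateSketch` class carrying only those two fields. The `pending` predicate is an assumption inferred from the field list (a non-`None` `ready_func` on a thread that is not running) and is not shown in this hunk.

```python
from typing import Callable, Optional

class ThreadStateSketch:
  # Hypothetical stand-in for Thread with only the state-bearing fields.
  def __init__(self) -> None:
    self.cont: Optional[object] = None
    self.ready_func: Optional[Callable[[], bool]] = None

  def running(self) -> bool:
    # No saved continuation: the thread is executing on the stack.
    return self.cont is None

  def suspended(self) -> bool:
    return not self.running() and self.ready_func is None

  def pending(self) -> bool:
    # Assumption: a pending thread is a non-running thread with a ready_func.
    return not self.running() and self.ready_func is not None

t = ThreadStateSketch()
states = [(t.running(), t.suspended(), t.pending())]
t.cont = object()                    # a continuation was saved
states.append((t.running(), t.suspended(), t.pending()))
t.ready_func = lambda: True          # now waiting on a host-checked condition
states.append((t.running(), t.suspended(), t.pending()))
```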
636771

637772
```python
638773
class Thread:
639-
  task: Task
640-
  fiber: threading.Thread
641-
  fiber_lock: threading.Lock
642-
  parent_lock: Optional[threading.Lock]
774+
  cont: Optional[Continuation]
643775
  ready_func: Optional[Callable[[], bool]]
644-
  cancellable: bool
645-
  cancelled: Cancelled
776+
  task: Task
646777
  index: Optional[int]
647778
  context: list[int]
779+
  cancellable: bool
648780

649781
  CONTEXT_LENGTH = 2
650782

651783
  def running(self):
652-
    return self.parent_lock is not None
784+
    return self.cont is None
653785

654786
  def suspended(self):
655787
    return not self.running() and self.ready_func is None
@@ -3494,10 +3626,11 @@ optimization to avoid allocating stacks for async languages that have avoided
34943626
the need for stackful coroutines by design (e.g., `async`/`await` in JS,
34953627
Python, C# and Rust).
34963628

3497-
Uncaught Core WebAssembly [exceptions] result in a trap at component
3498-
boundaries. Thus, if a component wishes to signal an error, it must use some
3499-
sort of explicit type such as `result` (whose `error` case particular language
3500-
bindings may choose to map to and from exceptions):
3629+
Uncaught Core WebAssembly [exceptions] or, in a future with [stack-switching],
3630+
unhandled events, result in a trap at component boundaries. Thus, if a component
3631+
wishes to signal an error, it must use some sort of explicit type such as
3632+
`result` (whose `error` case particular language bindings may choose to map to
3633+
and from exceptions):
35013634
```python
35023635
def call_and_trap_on_throw(callee, thread, args):
35033636
  try:
@@ -4981,16 +5114,22 @@ def canon_thread_available_parallelism():
49815114
[Shared-Everything Dynamic Linking]: examples/SharedEverythingDynamicLinking.md
49825115
[Concurrency Explainer]: Concurrency.md
49835116
[Suspended]: Concurrency#thread-built-ins
5117+
[Thread Index]: Concurrency#thread-built-ins
5118+
[Async Call Stack]: Concurrency.md#subtasks-and-supertasks
49845119
[Structured Concurrency]: Concurrency.md#subtasks-and-supertasks
49855120
[Recursive Reentrance]: Concurrency.md#subtasks-and-supertasks
49865121
[Backpressure]: Concurrency.md#backpressure
5122+
[Thread]: Concurrency.md#threads-and-tasks
5123+
[Threads]: Concurrency.md#threads-and-tasks
49875124
[Current Thread]: Concurrency.md#current-thread-and-task
49885125
[Current Task]: Concurrency.md#current-thread-and-task
49895126
[Block]: Concurrency.md#blocking
5127+
[Waiting on External I/O]: Concurrency.md#blocking
49905128
[Subtasks]: Concurrency.md#subtasks-and-supertasks
49915129
[Readable and Writable Ends]: Concurrency.md#streams-and-futures
49925130
[Readable or Writable End]: Concurrency.md#streams-and-futures
49935131
[Thread-Local Storage]: Concurrency.md#thread-local-storage
5132+
[Cancellation]: Concurrency.md#cancellation
49945133
[Subtask State Machine]: Concurrency.md#cancellation
49955134
[Stream Readiness]: Concurrency.md#stream-readiness
49965135

@@ -5013,6 +5152,7 @@ def canon_thread_available_parallelism():
50135152
[WASI]: https://github.com/webassembly/wasi
50145153
[Deterministic Profile]: https://github.com/WebAssembly/profiles/blob/main/proposals/profiles/Overview.md
50155154
[stack-switching]: https://github.com/WebAssembly/stack-switching
5155+
[Control Tags]: https://github.com/WebAssembly/stack-switching/blob/main/proposals/stack-switching/Explainer.md#declaring-control-tags
50165156
[`memaddr`]: https://webassembly.github.io/spec/core/exec/runtime.html#syntax-memaddr
50175157
[`memaddrs` table]: https://webassembly.github.io/spec/core/exec/runtime.html#syntax-moduleinst
50185158
[`memidx`]: https://webassembly.github.io/spec/core/syntax/modules.html#syntax-memidx
@@ -5028,8 +5168,7 @@ def canon_thread_available_parallelism():
50285168
[Code Units]: https://www.unicode.org/glossary/#code_unit
50295169
[Surrogate]: https://unicode.org/faq/utf_bom.html#utf16-2
50305170
[Name Mangling]: https://en.wikipedia.org/wiki/Name_mangling
5031-
[Kernel Thread]: https://en.wikipedia.org/wiki/Thread_(computing)#kernel_thread
5032-
[Fiber]: https://en.wikipedia.org/wiki/Fiber_(computer_science)
5171+
[Fibers]: https://en.wikipedia.org/wiki/Fiber_(computer_science)
50335172
[Asyncify]: https://emscripten.org/docs/porting/asyncify.html
50345173

50355174
[`import_name`]: https://clang.llvm.org/docs/AttributeReference.html#import-name
@@ -5040,7 +5179,8 @@ def canon_thread_available_parallelism():
50405179

50415180
[`threading`]: https://docs.python.org/3/library/threading.html
50425181
[`threading.Thread`]: https://docs.python.org/3/library/threading.html#thread-objects
5043-
[`threading.Lock`]: https://docs.python.org/3/library/threading.html#lock-objects
5182+
[`threading.Lock`]: https://docs.python.org/3/library/threading.html#lock-objects
5183+
[`threading.local`]: https://docs.python.org/3/library/threading.html#thread-local-data
50445184

50455185
[OIO]: https://en.wikipedia.org/wiki/Overlapped_I/O
50465186
[io_uring]: https://en.wikipedia.org/wiki/Io_uring

design/mvp/Concurrency.md

Lines changed: 1 addition & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -348,6 +348,7 @@ feature is necessary in any case (due to iloops and traps).
348348

349349
### Current Thread and Task
350350

351+
TODO
351352
At any point in time while executing Core WebAssembly code or a [canonical
352353
built-in] called by Core WebAssembly code, there is a well-defined **current
353354
thread** whose containing task is the **current task**. The "current thread" is
