Commit e408544

Rebase CABI onto explicit stack-switching interface (no behavior change)
1 parent 20db3d8 commit e408544

4 files changed

+895
-667
lines changed

design/mvp/CanonicalABI.md

Lines changed: 198 additions & 52 deletions
Original file line numberDiff line numberDiff line change
@@ -16,6 +16,7 @@ specified here.
1616
* [Component Instance State](#component-instance-state)
1717
* [Table State](#table-state)
1818
* [Resource State](#resource-state)
19+
* [Stack Switching](#stack-switching)
1920
* [Thread State](#thread-state)
2021
* [Waitable State](#waitable-state)
2122
* [Task State](#task-state)
@@ -587,69 +588,206 @@ class ResourceType(Type):
587588
```
588589

589590

590-
#### Thread State
591+
#### Stack Switching
591592

592-
As described in the [concurrency explainer], threads are created both
593-
*implicitly*, when calling a component export (in `canon_lift` below), and
594-
*explicitly*, when core wasm code calls the `thread.new-indirect` built-in (in
595-
`canon_thread_new_indirect` below). Threads are represented here by the
596-
`Thread` class and the [current thread] is represented by explicitly threading
597-
a reference to a `Thread` through all Core WebAssembly calls so that the
598-
`thread` parameter always points to "the current thread". The `Thread` class
599-
provides a set of primitive control-flow operations that are used by the rest
600-
of the Canonical ABI definitions.
601-
602-
While `Thread`s are semantically created for each component export call by the
603-
Python `canon_lift` code, an optimizing runtime should be able to allocate
604-
`Thread`s lazily, only when needed for actual thread switching operations,
605-
thereby avoiding cross-component call overhead for simple, short-running
606-
cross-component calls. To assist in this optimization, `Thread`s are put into
607-
their own per-component-instance `threads` table so that thread table indices
608-
and elements can be more-readily reused between calls without interference from
609-
the other kinds of handles.
610-
611-
`Thread` is implemented using the Python standard library's [`threading`]
612-
module. While a Python [`threading.Thread`] is a preemptively-scheduled [kernel
613-
thread], it is coerced to behave like a cooperatively-scheduled [fiber] by
614-
careful use of [`threading.Lock`]. If Python had built-in fibers (or algebraic
615-
effects), those could have been used instead since all that's needed is the
616-
ability to switch stacks. In any case, the use of `threading.Thread` is
617-
encapsulated by the `Thread` class so that the rest of the Canonical ABI can
618-
simply use `suspend`, `resume`, etc.
619-
620-
When a `Thread` is suspended and then resumed, it receives a `Cancelled`
621-
value indicating whether the caller has cooperatively requested that the thread
622-
cancel itself which is communicated to Core WebAssembly with the following
623-
integer values:
593+
Component Model concurrency is defined in terms of the Core WebAssembly
594+
[stack-switching] proposal's `cont.new`, `resume` and `suspend` instructions so
595+
that there is a clear composition story between Component Model and Core
596+
WebAssembly concurrency in the future. Since Python does not natively provide
597+
algebraic effects, `cont.new`, `resume` and `suspend` are implemented in this
598+
section in terms of other Python primitives. Since the Component Model only
599+
needs a limited subset of the full expressivity of stack-switching, only that
600+
subset is implemented, which simplifies things. In particular, the
601+
Component Model uses stack-switching in the following restricted manner:
602+
603+
First, there are only two global [control tags] used with `suspend`:
604+
```wat
605+
(tag $block (param $switch-to (ref null $Thread)) (result $cancelled bool))
606+
(tag $current-thread (result (ref $Thread)))
607+
```
608+
The `$block` tag is used to suspend a [thread] until some future event. The
609+
parameters and results will be described in the next section, where they are
610+
used to define `Thread`. The `$current-thread` tag is used to retrieve the
611+
current `Thread`, which is semantically stored in the `resume` handler's local
612+
state (although an optimizing implementation would instead maintain the current
613+
thread in the VM's execution context so that it could be retrieved with a single
614+
load and/or kept in register state).
615+
616+
Second, there is only a single type of continuation passed to `resume`:
617+
```wat
618+
(type $ct (cont (func (param $cancelled bool) (result (ref null $Thread)))))
619+
```
620+
Thus, continuations are only produced for the `$block` event; the
621+
`$current-thread` continuations are immediately resumed and never "escape".
622+
623+
Third, *every* `resume` performed by the Canonical ABI always handles *both*
624+
`$block` and `$current-thread` and *every* Canonical ABI `suspend` is, by
625+
construction, always scoped by a Canonical ABI `resume`. Thus, every Canonical
626+
ABI `suspend` unconditionally transfers control flow directly to the innermost
627+
enclosing Canonical ABI `resume` without a general handler/tag search.
628+
629+
Given this restricted usage, specialized versions of `cont.new`, `resume` and
630+
`suspend` that are "monomorphized" to the above types and tags can be easily
631+
implemented in terms of Python's standard preemptive threading primitives, using
632+
[`threading.Thread`] to provide a native stack, [`threading.Lock`] to only allow
633+
a single `threading.Thread` to execute at a time, and [`threading.local`] to
634+
maintain the dynamic handler scope using thread-local storage. This could have
635+
been implemented more directly and efficiently using [fibers], but the Python
636+
standard library doesn't have fibers. However, a realistic implementation is
637+
expected to use (a pool of) fibers.
638+
639+
Starting with `cont.new`, the monomorphized version takes a function type
640+
matching `$ct`, as defined above:
624641
```python
625642
class Cancelled(IntEnum):
626643
  FALSE = 0
627644
  TRUE = 1
628-
```
645+
646+
class Continuation:
647+
  lock: threading.Lock
648+
  handler: Handler
649+
  block_arg: Cancelled
650+
651+
class Handler:
652+
  thread_local = threading.local()
653+
  lock: threading.Lock
654+
  current: Thread
655+
  cont: Optional[Continuation]
656+
  block_result: Optional[Thread]
657+
658+
def new_already_acquired_lock() -> threading.Lock:
659+
  lock = threading.Lock()
660+
  lock.acquire()
661+
  return lock
662+
663+
def cont_new(f: Callable[[Cancelled], Optional[Thread]]) -> Continuation:
664+
  cont = Continuation()
665+
  cont.lock = new_already_acquired_lock()
666+
  def wrapper():
667+
    cont.lock.acquire()
668+
    Handler.thread_local.value = cont.handler
669+
    block_result = f(cont.block_arg)
670+
    handler = Handler.thread_local.value
671+
    handler.cont = None
672+
    handler.block_result = block_result
673+
    handler.lock.release()
674+
  threading.Thread(target = wrapper).start()
675+
  return cont
676+
```
677+
`Continuation.block_arg` and `Continuation.handler` are set by `resume` right
678+
before `resume` calls `Continuation.lock.release()` to transfer control flow to
679+
the continuation. After resuming the continuation, `resume` calls
680+
`Handler.lock.acquire()` to wait until the continuation signals suspension or
681+
return by calling `Handler.lock.release()`. The `Handler` is stored in the
682+
thread-local variable `Handler.thread_local.value` to implement the dynamic
683+
scoping that is needed by `suspend`. Because the thread created by `cont_new`
684+
can be suspended and resumed many times (each time with a new `Continuation` and
685+
`Handler`, resp.), `Handler` must be re-loaded from `Handler.thread_local.value`
686+
after `f` returns since it may have changed since the initial `resume`.
687+
688+
Next, `resume` is monomorphized to take: a continuation of type `$ct`, the
689+
argument to pass to the continuation, and the `current` `Thread` to use for
690+
`resume`'s `(on $current-thread)` handler. The remaining `(on $block)` and
691+
"returned" cases are merged to return a single value, with the `(on $block)`
692+
case returning a `Continuation` and the "returned" case returning `None`:
693+
```python
694+
def resume(cont: Continuation, block_arg: Cancelled, current: Thread) -> \
695+
    tuple[Optional[Continuation], Optional[Thread]]:
696+
  handler = Handler()
697+
  handler.lock = new_already_acquired_lock()
698+
  handler.current = current
699+
  cont.handler = handler
700+
  cont.block_arg = block_arg
701+
  cont.lock.release()
702+
  handler.lock.acquire()
703+
  return (handler.cont, handler.block_result)
704+
```
705+
706+
Lastly, `suspend` is monomorphized into 2 functions for the `$block` and
707+
`$current-thread` tags shown above. Since `$current-thread` has a trivial
708+
handler that immediately `resume`s with the `current` `Thread` passed to
709+
`resume` (in a loop), it can simply return `Handler.current` without any stack
710+
switching.
711+
```python
712+
def block(block_result: Optional[Thread]) -> Cancelled:
713+
  cont = Continuation()
714+
  cont.lock = new_already_acquired_lock()
715+
  handler = Handler.thread_local.value
716+
  handler.cont = cont
717+
  handler.block_result = block_result
718+
  handler.lock.release()
719+
  cont.lock.acquire()
720+
  Handler.thread_local.value = cont.handler
721+
  return cont.block_arg
722+
723+
def current_thread() -> Thread:
724+
  return Handler.thread_local.value.current
725+
```
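
As a hedged illustration (not part of the commit), the lock handoff between `cont_new`, `resume` and `block` shown above can be exercised end to end in a standalone script. The sketch below restates the four primitives with the same structure but substitutes plain strings for `Thread` objects; the names `body`, `log`, `k0`, `k1`, `k2` and the `"thread-A"`/`"thread-B"`/`"thread-C"` labels are hypothetical and exist only for the demonstration.

```python
import threading
from enum import IntEnum
from typing import Callable, Optional

class Cancelled(IntEnum):
  FALSE = 0
  TRUE = 1

class Continuation:
  lock: threading.Lock
  handler: "Handler"
  block_arg: Cancelled

class Handler:
  thread_local = threading.local()  # innermost handler, per OS thread
  lock: threading.Lock
  current: str                      # assumption: a string stands in for Thread
  cont: Optional[Continuation]
  block_result: Optional[str]

def new_already_acquired_lock() -> threading.Lock:
  lock = threading.Lock()
  lock.acquire()
  return lock

def cont_new(f: Callable[[Cancelled], Optional[str]]) -> Continuation:
  cont = Continuation()
  cont.lock = new_already_acquired_lock()
  def wrapper():
    cont.lock.acquire()                     # wait for the first resume
    Handler.thread_local.value = cont.handler
    block_result = f(cont.block_arg)
    handler = Handler.thread_local.value    # re-load: may differ from the first handler
    handler.cont = None                     # None tells resume that f returned
    handler.block_result = block_result
    handler.lock.release()                  # hand control back to resume
  threading.Thread(target=wrapper).start()
  return cont

def resume(cont: Continuation, block_arg: Cancelled, current: str) -> \
    tuple[Optional[Continuation], Optional[str]]:
  handler = Handler()
  handler.lock = new_already_acquired_lock()
  handler.current = current
  cont.handler = handler
  cont.block_arg = block_arg
  cont.lock.release()                       # let the continuation run
  handler.lock.acquire()                    # wait until it blocks or returns
  return (handler.cont, handler.block_result)

def block(block_result: Optional[str]) -> Cancelled:
  cont = Continuation()
  cont.lock = new_already_acquired_lock()
  handler = Handler.thread_local.value
  handler.cont = cont
  handler.block_result = block_result
  handler.lock.release()                    # wake the enclosing resume
  cont.lock.acquire()                       # sleep until resumed again
  Handler.thread_local.value = cont.handler
  return cont.block_arg

def current_thread() -> str:
  return Handler.thread_local.value.current

# Demonstration: one continuation that blocks once and then returns.
log = []

def body(cancelled: Cancelled) -> Optional[str]:
  log.append(("started", cancelled, current_thread()))
  cancelled = block("thread-C")             # value is handed to resume's caller
  log.append(("resumed", cancelled, current_thread()))
  return None

k0 = cont_new(body)
k1, switch_to1 = resume(k0, Cancelled.FALSE, "thread-A")  # runs body up to block()
k2, switch_to2 = resume(k1, Cancelled.TRUE, "thread-B")   # runs body to completion
print(log)
```

The first `resume` returns a fresh `Continuation` because `body` called `block`, and the second returns `None` in that position because `body` returned. Only one of the two OS threads makes progress at any time, since each side releases the other's lock immediately before acquiring its own.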
726+
727+
In the future, when Core WebAssembly gets [stack-switching], the Component Model
728+
`$block` and `$current-thread` tags would not be exposed to Core WebAssembly.
729+
Thus, an optimizing implementation would continue to be able to implement
730+
`block()` as a direct control flow transfer and `current_thread()` as implicit
731+
execution context, both without a general handler/tag search. In particular,
732+
this avoids the pathological O(N<sup>2</sup>) behavior which would otherwise
733+
arise if Component Model cooperative threads were used in conjunction with
734+
deeply-nested Core WebAssembly handlers.
735+
736+
Additionally, once Core WebAssembly has stack switching, any unhandled events
737+
that originate in Core WebAssembly would turn into traps if they reach a
738+
component boundary (just like unhandled exceptions do now; see
739+
`call_and_trap_on_throw` below). Thus, all cross-component/cross-language stack
740+
switching would continue to be mediated by the Component Model's types and
741+
Canonical ABI, with Core WebAssembly stack-switching used to implement
742+
intra-component concurrency according to the language's own internal ABI, which
743+
can be different in each component.
744+
745+
746+
#### Thread State
747+
748+
As described in the [concurrency explainer], threads are created both
749+
*implicitly*, when calling a component export (in `canon_lift` below), and
750+
*explicitly*, when core wasm code calls the `thread.new-indirect` built-in (in
751+
`canon_thread_new_indirect` below). While threads are *logically* created for
752+
each component export call, an optimizing runtime should be able to allocate
753+
threads lazily when needed for actual thread switching operations, thereby
754+
avoiding cross-component call overhead for simple, short-running cross-component
755+
calls. To assist in this optimization, threads are put into their own
756+
`ComponentInstance.threads` table to reduce interference from the other kinds of
757+
handles.
758+
759+
Threads are represented in the Canonical ABI by the `Thread` class defined in
760+
this section. The `Thread` class is implemented in terms of the `cont_new`,
761+
`resume`, `block` and `current_thread` stack-switching primitives defined in the
762+
previous section. `Thread` defines a set of higher-level concurrency operations
763+
that are used by all the other Canonical ABI definitions. In particular, a
764+
"thread" adds the higher-level concepts of:
765+
* [waiting on external I/O]
766+
* [async call stack]
767+
* [thread index]
768+
* [thread-local storage]
769+
* [cancellation]
629770

630771
Introducing the `Thread` class in chunks, a `Thread` has the following fields
631772
and can be in one of the following 3 states based on these fields:
632-
* `running`: actively executing with a "parent" thread that is waiting
633-
to run once the `running` thread suspends or returns
634-
* `suspended`: waiting to be `resume`d by another thread
635-
* `waiting`: waiting to be `resume`d by `Store.tick` once `ready`
773+
* `running`: actively executing on the stack
774+
* `suspended`: waiting to be resumed by some other thread `running` in
775+
the same component instance (via its `index`)
776+
* `pending`: waiting to be resumed by the host (in `Store.tick` once `ready`)
636777

637778
```python
638779
class Thread:
639-
  task: Task
640-
  fiber: threading.Thread
641-
  fiber_lock: threading.Lock
642-
  parent_lock: Optional[threading.Lock]
780+
  cont: Optional[Continuation]
643781
  ready_func: Optional[Callable[[], bool]]
644-
  cancellable: bool
645-
  cancelled: Cancelled
782+
  task: Task
646783
  index: Optional[int]
647784
  context: list[int]
785+
  cancellable: bool
648786

649787
  CONTEXT_LENGTH = 2
650788

651789
  def running(self):
652-
    return self.parent_lock is not None
790+
    return self.cont is None
653791

654792
  def suspended(self):
655793
    return not self.running() and self.ready_func is None
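
As a hedged illustration (not part of the commit), the state predicates in this hunk can be checked in isolation. The sketch below uses a hypothetical `ThreadStateSketch` class carrying only `cont` and `ready_func`. Its `running` and `suspended` methods follow the new-side lines above, while `pending` is an assumption inferred from the bullet list, since its definition falls outside the lines shown in this hunk.

```python
from typing import Callable, Optional

class ThreadStateSketch:
  """Hypothetical stand-in for Thread with only the state-deciding fields."""

  def __init__(self, cont: Optional[object] = None,
               ready_func: Optional[Callable[[], bool]] = None):
    self.cont = cont
    self.ready_func = ready_func

  def running(self) -> bool:
    return self.cont is None

  def suspended(self) -> bool:
    return not self.running() and self.ready_func is None

  def pending(self) -> bool:
    # Assumption: the remaining case, a stored continuation plus a readiness check.
    return not self.running() and self.ready_func is not None

def states_of(t: ThreadStateSketch) -> list[str]:
  # Collect every predicate that holds; exactly one is expected.
  return [name for name in ("running", "suspended", "pending") if getattr(t, name)()]

stored = object()  # placeholder for a Continuation
running_states = states_of(ThreadStateSketch())
suspended_states = states_of(ThreadStateSketch(cont=stored))
pending_states = states_of(ThreadStateSketch(cont=stored, ready_func=lambda: True))
print(running_states, suspended_states, pending_states)
```

Under these assumptions the three states are mutually exclusive for the inputs shown: a thread with no stored continuation is `running`, and a stored continuation is either `suspended` or `pending` depending on whether `ready_func` is set.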
@@ -3494,10 +3632,11 @@ optimization to avoid allocating stacks for async languages that have avoided
34943632
the need for stackful coroutines by design (e.g., `async`/`await` in JS,
34953633
Python, C# and Rust).
34963634

3497-
Uncaught Core WebAssembly [exceptions] result in a trap at component
3498-
boundaries. Thus, if a component wishes to signal an error, it must use some
3499-
sort of explicit type such as `result` (whose `error` case particular language
3500-
bindings may choose to map to and from exceptions):
3635+
Uncaught Core WebAssembly [exceptions] or, in a future with [stack-switching],
3636+
unhandled events, result in a trap at component boundaries. Thus, if a component
3637+
wishes to signal an error, it must use some sort of explicit type such as
3638+
`result` (whose `error` case particular language bindings may choose to map to
3639+
and from exceptions):
35013640
```python
35023641
def call_and_trap_on_throw(callee, thread, args):
35033642
  try:
@@ -4981,16 +5120,22 @@ def canon_thread_available_parallelism():
49815120
[Shared-Everything Dynamic Linking]: examples/SharedEverythingDynamicLinking.md
49825121
[Concurrency Explainer]: Concurrency.md
49835122
[Suspended]: Concurrency.md#thread-built-ins
5123+
[Thread Index]: Concurrency.md#thread-built-ins
5124+
[Async Call Stack]: Concurrency.md#subtasks-and-supertasks
49845125
[Structured Concurrency]: Concurrency.md#subtasks-and-supertasks
49855126
[Recursive Reentrance]: Concurrency.md#subtasks-and-supertasks
49865127
[Backpressure]: Concurrency.md#backpressure
5128+
[Thread]: Concurrency.md#threads-and-tasks
5129+
[Threads]: Concurrency.md#threads-and-tasks
49875130
[Current Thread]: Concurrency.md#current-thread-and-task
49885131
[Current Task]: Concurrency.md#current-thread-and-task
49895132
[Block]: Concurrency.md#blocking
5133+
[Waiting on External I/O]: Concurrency.md#blocking
49905134
[Subtasks]: Concurrency.md#subtasks-and-supertasks
49915135
[Readable and Writable Ends]: Concurrency.md#streams-and-futures
49925136
[Readable or Writable End]: Concurrency.md#streams-and-futures
49935137
[Thread-Local Storage]: Concurrency.md#thread-local-storage
5138+
[Cancellation]: Concurrency.md#cancellation
49945139
[Subtask State Machine]: Concurrency.md#cancellation
49955140
[Stream Readiness]: Concurrency.md#stream-readiness
49965141

@@ -5013,6 +5158,7 @@ def canon_thread_available_parallelism():
50135158
[WASI]: https://github.com/webassembly/wasi
50145159
[Deterministic Profile]: https://github.com/WebAssembly/profiles/blob/main/proposals/profiles/Overview.md
50155160
[stack-switching]: https://github.com/WebAssembly/stack-switching
5161+
[Control Tags]: https://github.com/WebAssembly/stack-switching/blob/main/proposals/stack-switching/Explainer.md#declaring-control-tags
50165162
[`memaddr`]: https://webassembly.github.io/spec/core/exec/runtime.html#syntax-memaddr
50175163
[`memaddrs` table]: https://webassembly.github.io/spec/core/exec/runtime.html#syntax-moduleinst
50185164
[`memidx`]: https://webassembly.github.io/spec/core/syntax/modules.html#syntax-memidx
@@ -5028,8 +5174,7 @@ def canon_thread_available_parallelism():
50285174
[Code Units]: https://www.unicode.org/glossary/#code_unit
50295175
[Surrogate]: https://unicode.org/faq/utf_bom.html#utf16-2
50305176
[Name Mangling]: https://en.wikipedia.org/wiki/Name_mangling
5031-
[Kernel Thread]: https://en.wikipedia.org/wiki/Thread_(computing)#kernel_thread
5032-
[Fiber]: https://en.wikipedia.org/wiki/Fiber_(computer_science)
5177+
[Fibers]: https://en.wikipedia.org/wiki/Fiber_(computer_science)
50335178
[Asyncify]: https://emscripten.org/docs/porting/asyncify.html
50345179

50355180
[`import_name`]: https://clang.llvm.org/docs/AttributeReference.html#import-name
@@ -5040,7 +5185,8 @@ def canon_thread_available_parallelism():
50405185

50415186
[`threading`]: https://docs.python.org/3/library/threading.html
50425187
[`threading.Thread`]: https://docs.python.org/3/library/threading.html#thread-objects
5043-
[`threading.Lock`]: https://docs.python.org/3/library/threading.html#lock-objects
5188+
[`threading.Lock`]: https://docs.python.org/3/library/threading.html#lock-objects
5189+
[`threading.local`]: https://docs.python.org/3/library/threading.html#thread-local-data
50445190

50455191
[OIO]: https://en.wikipedia.org/wiki/Overlapped_I/O
50465192
[io_uring]: https://en.wikipedia.org/wiki/Io_uring

design/mvp/Concurrency.md

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -348,6 +348,7 @@ feature is necessary in any case (due to iloops and traps).
348348

349349
### Current Thread and Task
350350

351+
TODO
351352
At any point in time while executing Core WebAssembly code or a [canonical
352353
built-in] called by Core WebAssembly code, there is a well-defined **current
353354
thread** whose containing task is the **current task**. The "current thread" is
