@@ -16,6 +16,7 @@ specified here.
1616 * [Component Instance State](#component-instance-state)
1717 * [Table State](#table-state)
1818 * [Resource State](#resource-state)
19+ * [Stack Switching](#stack-switching)
1920 * [Thread State](#thread-state)
2021 * [Waitable State](#waitable-state)
2122 * [Task State](#task-state)
@@ -587,69 +588,206 @@ class ResourceType(Type):
587588```
588589
589590
590- #### Thread State
591+ #### Stack Switching
591592
592- As described in the [ concurrency explainer] , threads are created both
593- * implicitly* , when calling a component export (in ` canon_lift ` below), and
594- * explicitly* , when core wasm code calls the ` thread.new-indirect ` built-in (in
595- ` canon_thread_new_indirect ` below). Threads are represented here by the
596- ` Thread ` class and the [ current thread] is represented by explicitly threading
597- a reference to a ` Thread ` through all Core WebAssembly calls so that the
598- ` thread ` parameter always points to "the current thread". The ` Thread ` class
599- provides a set of primitive control-flow operations that are used by the rest
600- of the Canonical ABI definitions.
601-
602- While ` Thread ` s are semantically created for each component export call by the
603- Python ` canon_lift ` code, an optimizing runtime should be able to allocate
604- ` Thread ` s lazily, only when needed for actual thread switching operations,
605- thereby avoiding cross-component call overhead for simple, short-running
606- cross-component calls. To assist in this optimization, ` Thread ` s are put into
607- their own per-component-instance ` threads ` table so that thread table indices
608- and elements can be more-readily reused between calls without interference from
609- the other kinds of handles.
610-
611- ` Thread ` is implemented using the Python standard library's [ ` threading ` ]
612- module. While a Python [ ` threading.Thread ` ] is a preemptively-scheduled [ kernel
613- thread] , it is coerced to behave like a cooperatively-scheduled [ fiber] by
614- careful use of [ ` threading.Lock ` ] . If Python had built-in fibers (or algebraic
615- effects), those could have been used instead since all that's needed is the
616- ability to switch stacks. In any case, the use of ` threading.Thread ` is
617- encapsulated by the ` Thread ` class so that the rest of the Canonical ABI can
618- simply use ` suspend ` , ` resume ` , etc.
619-
620- When a ` Thread ` is suspended and then resumed, it receives a ` Cancelled `
621- value indicating whether the caller has cooperatively requested that the thread
622- cancel itself which is communicated to Core WebAssembly with the following
623- integer values:
593+ Component Model concurrency is defined in terms of the Core WebAssembly
594+ [stack-switching] proposal's `cont.new`, `resume` and `suspend` instructions so
595+ that there is a clear composition story between Component Model and Core
596+ WebAssembly concurrency in the future. Since Python does not natively provide
597+ algebraic effects, `cont.new`, `resume` and `suspend` are implemented in this
598+ section in terms of other Python primitives. Since the Component Model only
599+ needs a limited subset of the full expressivity of stack-switching, only that
600+ subset is implemented, which simplifies things. In particular, the
601+ Component Model uses stack-switching in the following restricted manner:
602+
603+ First, there are only two global [control tags] used with `suspend`:
604+ ``` wat
605+ (tag $block (param $switch-to (ref null $Thread)) (result $cancelled bool))
606+ (tag $current-thread (result (ref $Thread)))
607+ ```
608+ The `$block` tag is used to suspend a [thread] until some future event. The
609+ parameters and results will be described in the next section, where they are
610+ used to define `Thread`. The `$current-thread` tag is used to retrieve the
611+ current `Thread`, which is semantically stored in the `resume` handler's local
612+ state (although an optimizing implementation would instead maintain the current
613+ thread in the VM's execution context so that it could be retrieved with a single
614+ load and/or kept in register state).
615+
616+ Second, there is only a single type of continuation passed to `resume`:
617+ ``` wat
618+ (type $ct (cont (func (param $cancelled bool) (result (ref null $Thread)))))
619+ ```
620+ Thus, continuations are only produced for the `$block` event; the
621+ `$current-thread` continuations are immediately resumed and never "escape".
622+
623+ Third, *every* `resume` performed by the Canonical ABI always handles *both*
624+ `$block` and `$current-thread` and *every* Canonical ABI `suspend` is, by
625+ construction, always scoped by a Canonical ABI `resume`. Thus, every Canonical
626+ ABI `suspend` unconditionally transfers control flow directly to the innermost
627+ enclosing Canonical ABI `resume` without a general handler/tag search.
628+
629+ Given this restricted usage, specialized versions of `cont.new`, `resume` and
630+ `suspend` that are "monomorphized" to the above types and tags can be easily
631+ implemented in terms of Python's standard preemptive threading primitives, using
632+ [`threading.Thread`] to provide a native stack, [`threading.Lock`] to only allow
633+ a single `threading.Thread` to execute at a time, and [`threading.local`] to
634+ maintain the dynamic handler scope using thread-local storage. This could have
635+ been implemented more directly and efficiently using [fibers], but the Python
636+ standard library doesn't have fibers. However, a realistic implementation is
637+ expected to use (a pool of) fibers.
638+
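The lock-based handoff described above can be illustrated in isolation. The following is a minimal sketch, not part of the Canonical ABI definitions: the names `locked`, `to_child`, `to_parent` and `log` are hypothetical, and it only shows how two pre-acquired `threading.Lock`s can force a preemptively-scheduled `threading.Thread` to alternate strictly with its parent, much like a fiber would:

```python
import threading

def locked() -> threading.Lock:
  # Hypothetical helper: a lock that starts out held, so the first
  # acquire() blocks until some other thread calls release().
  lock = threading.Lock()
  lock.acquire()
  return lock

to_child = locked()   # released by the parent to let the child run
to_parent = locked()  # released by the child to hand control back
log = []

def child():
  for i in range(3):
    to_child.acquire()        # wait until the parent switches to us
    log.append(f"child {i}")
    to_parent.release()       # switch back to the parent

worker = threading.Thread(target=child)
worker.start()
for i in range(3):
  log.append(f"parent {i}")
  to_child.release()          # switch to the child ...
  to_parent.acquire()         # ... and wait until it switches back
worker.join()
```

Although both threads are preemptively scheduled, only one of them is ever runnable at a time, so the entries in `log` strictly alternate between the parent and the child.
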
639+ Starting with `cont.new`, the monomorphized version takes a function type
640+ matching `$ct`, as defined above:
624641 ```python
625642 class Cancelled(IntEnum):
626643   FALSE = 0
627644   TRUE = 1
628- ```
645+
646+ class Continuation:
647+   lock: threading.Lock
648+   handler: Handler
649+   block_arg: Cancelled
650+
651+ class Handler:
652+   thread_local = threading.local()
653+   lock: threading.Lock
654+   current: Thread
655+   cont: Optional[Continuation]
656+   block_result: Optional[Thread]
657+
658+ def new_already_acquired_lock() -> threading.Lock:
659+   lock = threading.Lock()
660+   lock.acquire()
661+   return lock
662+
663+ def cont_new(f: Callable[[Cancelled], Optional[Thread]]) -> Continuation:
664+   cont = Continuation()
665+   cont.lock = new_already_acquired_lock()
666+   def wrapper():
667+     cont.lock.acquire()
668+     Handler.thread_local.value = cont.handler
669+     block_result = f(cont.block_arg)
670+     handler = Handler.thread_local.value
671+     handler.cont = None
672+     handler.block_result = block_result
673+     handler.lock.release()
674+   threading.Thread(target=wrapper).start()
675+   return cont
676+ ```
677+ `Continuation.block_arg` and `Continuation.handler` are set by `resume` right
678+ before `resume` calls `Continuation.lock.release()` to transfer control flow to
679+ the continuation. After resuming the continuation, `resume` calls
680+ `Handler.lock.acquire()` to wait until the continuation signals suspension or
681+ return by calling `Handler.lock.release()`. The `Handler` is stored in the
682+ thread-local variable `Handler.thread_local.value` to implement the dynamic
683+ scoping that is needed by `suspend`. Because the thread created by `cont_new`
684+ can be suspended and resumed many times (each time with a new `Continuation` and
685+ `Handler`), the `Handler` must be re-loaded from `Handler.thread_local.value`
686+ after `f` returns, because it may have changed since the initial `resume`.
687+
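A single class-level `threading.local()` is enough for this dynamic scoping because each Python thread sees its own `value` attribute on it. A minimal sketch of that behavior, using hypothetical names (`scope`, `worker`, `seen`, `main_value`) that are not part of the Canonical ABI definitions:

```python
import threading

scope = threading.local()  # one shared object, but per-thread attributes
seen = {}

def worker(name: str) -> None:
  # This assignment is visible only to the thread that performs it.
  scope.value = name
  seen[name] = scope.value

scope.value = "main"
workers = [threading.Thread(target=worker, args=(n,)) for n in ("a", "b")]
for w in workers:
  w.start()
for w in workers:
  w.join()

# The main thread's value is untouched by the writes made in the workers.
main_value = scope.value
```

This is why the `wrapper` in `cont_new` can store its `Handler` in `Handler.thread_local.value` without disturbing the handler seen by the thread of any other continuation.
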
688+ Next, `resume` is monomorphized to take: a continuation of type `$ct`, the
689+ argument to pass to the continuation, and the `current` `Thread` to use for
690+ `resume`'s `(on $current-thread)` handler. The remaining `(on $block)` and
691+ "returned" cases are merged into one returned pair whose first element is a
692+ `Continuation` in the `(on $block)` case and `None` in the "returned" case:
693+ ```python
694+ def resume(cont: Continuation, block_arg: Cancelled, current: Thread) -> \
695+     tuple[Optional[Continuation], Optional[Thread]]:
696+   handler = Handler()
697+   handler.lock = new_already_acquired_lock()
698+   handler.current = current
699+   cont.handler = handler
700+   cont.block_arg = block_arg
701+   cont.lock.release()
702+   handler.lock.acquire()
703+   return (handler.cont, handler.block_result)
704+ ```
705+
706+ Lastly, `suspend` is monomorphized into 2 functions for the `$block` and
707+ `$current-thread` tags shown above. Since `$current-thread` has a trivial
708+ handler that immediately `resume`s with the `current` `Thread` passed to
709+ `resume` (in a loop), it can simply return `Handler.current` without any stack
710+ switching.
711+ ```python
712+ def block(block_result: Optional[Thread]) -> Cancelled:
713+   cont = Continuation()
714+   cont.lock = new_already_acquired_lock()
715+   handler = Handler.thread_local.value
716+   handler.cont = cont
717+   handler.block_result = block_result
718+   handler.lock.release()
719+   cont.lock.acquire()
720+   Handler.thread_local.value = cont.handler
721+   return cont.block_arg
722+
723+ def current_thread() -> Thread:
724+   return Handler.thread_local.value.current
725+ ```
726+
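To see the three primitives cooperate, the definitions above can be assembled into one runnable file. The following sketch repeats them under two assumptions: plain strings stand in for `Thread` (which is only defined in the next section), and the type annotations are dropped for brevity. The `body` function, the `trace` list and the `"switch-to-B"` value are hypothetical scaffolding:

```python
import threading
from enum import IntEnum

class Cancelled(IntEnum):
  FALSE = 0
  TRUE = 1

class Continuation:
  pass  # `lock`, `handler` and `block_arg` are assigned dynamically

class Handler:
  thread_local = threading.local()  # innermost handler, per OS thread

def new_already_acquired_lock():
  lock = threading.Lock()
  lock.acquire()
  return lock

def cont_new(f):
  cont = Continuation()
  cont.lock = new_already_acquired_lock()
  def wrapper():
    cont.lock.acquire()                    # wait for the first resume
    Handler.thread_local.value = cont.handler
    block_result = f(cont.block_arg)
    handler = Handler.thread_local.value   # reload: may be a newer handler
    handler.cont = None                    # signals "returned"
    handler.block_result = block_result
    handler.lock.release()
  threading.Thread(target=wrapper).start()
  return cont

def resume(cont, block_arg, current):
  handler = Handler()
  handler.lock = new_already_acquired_lock()
  handler.current = current
  cont.handler = handler
  cont.block_arg = block_arg
  cont.lock.release()      # transfer control to the continuation
  handler.lock.acquire()   # wait until it blocks or returns
  return (handler.cont, handler.block_result)

def block(block_result):
  cont = Continuation()
  cont.lock = new_already_acquired_lock()
  handler = Handler.thread_local.value
  handler.cont = cont
  handler.block_result = block_result
  handler.lock.release()   # transfer control back to the resumer
  cont.lock.acquire()      # wait to be resumed again
  Handler.thread_local.value = cont.handler
  return cont.block_arg

def current_thread():
  return Handler.thread_local.value.current

trace = []

def body(cancelled):
  trace.append(("started", cancelled, current_thread()))
  cancelled = block("switch-to-B")  # suspend, passing a value to the resumer
  trace.append(("resumed", cancelled, current_thread()))
  return None

cont1 = cont_new(body)
cont2, result1 = resume(cont1, Cancelled.FALSE, "thread-A")  # runs until block()
cont3, result2 = resume(cont2, Cancelled.TRUE, "thread-B")   # runs until return
```

The first `resume` returns a fresh `Continuation` together with the value that `body` passed to `block`; the second returns `None` for the continuation because `body` ran to completion. Each `current_thread()` call observes the `current` value supplied by the most recent `resume`.
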
727+ In the future, when Core WebAssembly gets [stack-switching], the Component Model
728+ `$block` and `$current-thread` tags would not be exposed to Core WebAssembly.
729+ Thus, an optimizing implementation would continue to be able to implement
730+ `block()` as a direct control flow transfer and `current_thread()` as implicit
731+ execution context, both without a general handler/tag search. In particular,
732+ this avoids the pathological O(N<sup>2</sup>) behavior which would otherwise
733+ arise if Component Model cooperative threads were used in conjunction with
734+ deeply-nested Core WebAssembly handlers.
735+
736+ Additionally, once Core WebAssembly has stack switching, any unhandled events
737+ that originate in Core WebAssembly would turn into traps if they reach a
738+ component boundary (just like unhandled exceptions do now; see
739+ `call_and_trap_on_throw` below). Thus, all cross-component/cross-language stack
740+ switching would continue to be mediated by the Component Model's types and
741+ Canonical ABI, with Core WebAssembly stack-switching used to implement
742+ intra-component concurrency according to the language's own internal ABI, which
743+ can be different in each component.
744+
745+
746+ #### Thread State
747+
748+ As described in the [concurrency explainer], threads are created both
749+ *implicitly*, when calling a component export (in `canon_lift` below), and
750+ *explicitly*, when core wasm code calls the `thread.new-indirect` built-in (in
751+ `canon_thread_new_indirect` below). While threads are *logically* created for
752+ each component export call, an optimizing runtime should be able to allocate
753+ threads lazily when needed for actual thread switching operations, thereby
754+ avoiding cross-component call overhead for simple, short-running cross-component
755+ calls. To assist in this optimization, threads are put into their own
756+ `ComponentInstance.threads` table to reduce interference from the other kinds of
757+ handles.
758+
759+ Threads are represented in the Canonical ABI by the `Thread` class defined in
760+ this section. The `Thread` class is implemented in terms of the `cont_new`,
761+ `resume`, `block` and `current_thread` stack-switching primitives defined in the
762+ previous section. `Thread` defines a set of higher-level concurrency operations
763+ that are used by all the other Canonical ABI definitions. In particular, a
764+ "thread" adds the higher-level concepts of:
765+ * [waiting on external I/O]
766+ * [async call stack]
767+ * [thread index]
768+ * [thread-local storage]
769+ * [cancellation]
629770
630771 Introducing the `Thread` class in chunks, a `Thread` has the following fields
631772 and can be in one of the following 3 states based on these fields:
632- * ` running ` : actively executing with a "parent" thread that is waiting
633- to run once the ` running ` thread suspends or returns
634- * ` suspended ` : waiting to be ` resume ` d by another thread
635- * ` waiting ` : waiting to be ` resume ` d by ` Store.tick ` once ` ready `
773+ * `running`: actively executing on the stack
774+ * `suspended`: waiting to be resumed by some other thread `running` in
775+   the same component instance (via its `index`)
776+ * `pending`: waiting to be resumed by the host (in `Store.tick`) once `ready`
636777
637778 ```python
638779 class Thread:
639-   task: Task
640-   fiber: threading.Thread
641-   fiber_lock: threading.Lock
642-   parent_lock: Optional[threading.Lock]
780+   cont: Optional[Continuation]
643781   ready_func: Optional[Callable[[], bool]]
644-   cancellable: bool
645-   cancelled: Cancelled
782+   task: Task
646783   index: Optional[int]
647784   context: list[int]
785+   cancellable: bool
648786
649787   CONTEXT_LENGTH = 2
650788
651789   def running(self):
652-     return self.parent_lock is not None
790+     return self.cont is None
653791
654792   def suspended(self):
655793     return not self.running() and self.ready_func is None
@@ -3494,10 +3632,11 @@ optimization to avoid allocating stacks for async languages that have avoided
34943632 the need for stackful coroutines by design (e.g., `async`/`await` in JS,
34953633 Python, C# and Rust).
34963634
3497- Uncaught Core WebAssembly [ exceptions] result in a trap at component
3498- boundaries. Thus, if a component wishes to signal an error, it must use some
3499- sort of explicit type such as ` result ` (whose ` error ` case particular language
3500- bindings may choose to map to and from exceptions):
3635+ Uncaught Core WebAssembly [exceptions] or, in a future with [stack-switching],
3636+ unhandled events, result in a trap at component boundaries. Thus, if a component
3637+ wishes to signal an error, it must use some sort of explicit type such as
3638+ `result` (whose `error` case particular language bindings may choose to map to
3639+ and from exceptions):
35013640 ```python
35023641 def call_and_trap_on_throw(callee, thread, args):
35033642   try:
@@ -4981,16 +5120,22 @@ def canon_thread_available_parallelism():
49815120[ Shared-Everything Dynamic Linking ] : examples/SharedEverythingDynamicLinking.md
49825121[ Concurrency Explainer ] : Concurrency.md
49835122[ Suspended ] : Concurrency.md#thread-built-ins
5123+ [ Thread Index ] : Concurrency.md#thread-built-ins
5124+ [ Async Call Stack ] : Concurrency.md#subtasks-and-supertasks
49845125[ Structured Concurrency ] : Concurrency.md#subtasks-and-supertasks
49855126[ Recursive Reentrance ] : Concurrency.md#subtasks-and-supertasks
49865127[ Backpressure ] : Concurrency.md#backpressure
5128+ [ Thread ] : Concurrency.md#threads-and-tasks
5129+ [ Threads ] : Concurrency.md#threads-and-tasks
49875130[ Current Thread ] : Concurrency.md#current-thread-and-task
49885131[ Current Task ] : Concurrency.md#current-thread-and-task
49895132[ Block ] : Concurrency.md#blocking
5133+ [ Waiting on External I/O ] : Concurrency.md#blocking
49905134[ Subtasks ] : Concurrency.md#subtasks-and-supertasks
49915135[ Readable and Writable Ends ] : Concurrency.md#streams-and-futures
49925136[ Readable or Writable End ] : Concurrency.md#streams-and-futures
49935137[ Thread-Local Storage ] : Concurrency.md#thread-local-storage
5138+ [ Cancellation ] : Concurrency.md#cancellation
49945139[ Subtask State Machine ] : Concurrency.md#cancellation
49955140[ Stream Readiness ] : Concurrency.md#stream-readiness
49965141
@@ -5013,6 +5158,7 @@ def canon_thread_available_parallelism():
50135158[ WASI ] : https://github.com/webassembly/wasi
50145159[ Deterministic Profile ] : https://github.com/WebAssembly/profiles/blob/main/proposals/profiles/Overview.md
50155160[ stack-switching ] : https://github.com/WebAssembly/stack-switching
5161+ [ Control Tags ] : https://github.com/WebAssembly/stack-switching/blob/main/proposals/stack-switching/Explainer.md#declaring-control-tags
50165162[ `memaddr` ] : https://webassembly.github.io/spec/core/exec/runtime.html#syntax-memaddr
50175163[ `memaddrs` table ] : https://webassembly.github.io/spec/core/exec/runtime.html#syntax-moduleinst
50185164[ `memidx` ] : https://webassembly.github.io/spec/core/syntax/modules.html#syntax-memidx
@@ -5028,8 +5174,7 @@ def canon_thread_available_parallelism():
50285174[ Code Units ] : https://www.unicode.org/glossary/#code_unit
50295175[ Surrogate ] : https://unicode.org/faq/utf_bom.html#utf16-2
50305176[ Name Mangling ] : https://en.wikipedia.org/wiki/Name_mangling
5031- [ Kernel Thread ] : https://en.wikipedia.org/wiki/Thread_(computing)#kernel_thread
5032- [ Fiber ] : https://en.wikipedia.org/wiki/Fiber_(computer_science)
5177+ [ Fibers ] : https://en.wikipedia.org/wiki/Fiber_(computer_science)
50335178[ Asyncify ] : https://emscripten.org/docs/porting/asyncify.html
50345179
50355180[ `import_name` ] : https://clang.llvm.org/docs/AttributeReference.html#import-name
@@ -5040,7 +5185,8 @@ def canon_thread_available_parallelism():
50405185
50415186[ `threading` ] : https://docs.python.org/3/library/threading.html
50425187[ `threading.Thread` ] : https://docs.python.org/3/library/threading.html#thread-objects
5043- [ `threading.Lock` ] : https://docs.python.org/3/library/threading.html#lock-objects
5188+ [ `threading.Lock` ] : https://docs.python.org/3/library/threading.html#lock-objects
5189+ [ `threading.local` ] : https://docs.python.org/3/library/threading.html#thread-local-data
50445190
50455191[ OIO ] : https://en.wikipedia.org/wiki/Overlapped_I/O
50465192[ io_uring ] : https://en.wikipedia.org/wiki/Io_uring