Skip to content

Latest commit

 

History

History
1685 lines (1185 loc) · 72 KB

File metadata and controls

1685 lines (1185 loc) · 72 KB
.. index:: ! soundness, type system

Type Soundness

The :ref:`type system <type-system>` of WebAssembly is sound, implying both type safety and memory safety with respect to the WebAssembly semantics. For example:

Soundness also is instrumental in ensuring additional properties, most notably, encapsulation of function and module scopes: no :ref:`locals <syntax-local>` can be accessed outside their own function and no :ref:`module <syntax-module>` components can be accessed outside their own module unless they are explicitly :ref:`exported <syntax-export>` or :ref:`imported <syntax-import>`.

The typing rules defining WebAssembly :ref:`validation <valid>` only cover the static components of a WebAssembly program. In order to state and prove soundness precisely, the typing rules must be extended to the dynamic components of the abstract :ref:`runtime <syntax-runtime>`, that is, the :ref:`store <syntax-store>`, :ref:`configurations <syntax-config>`, and :ref:`administrative instructions <syntax-instr-admin>`. [1]

.. index:: context, recursive type, recursive type index

Contexts

In order to check :ref:`rolled up <aux-roll-rectype>` recursive types, the :ref:`context <context>` is locally extended with an additional component that records the :ref:`sub type <syntax-subtype>` corresponding to each :ref:`recursive type index <syntax-rectypeidx>` within the current :ref:`recursive type <syntax-rectype>`:

\begin{array}{llll}
\production{context} & C &::=&
  \{~ \dots, \CRECS ~ \subtype^\ast ~\} \\
\end{array}
.. index:: value type, reference type, heap type, bottom type, sub type, recursive type, recursive type index

Types

Well-formedness for :ref:`extended type forms <type-ext>` is defined as follows.

  • The heap type is valid.
\frac{
}{
  C \vdashheaptype \BOTH : \OKheaptype
}
  • The recursive type index i must exist in C.\CRECS.
  • Then the heap type is valid.
\frac{
  C.\CRECS[i] = \subtype
}{
  C \vdashheaptype \REC~i : \OKheaptype
}
  • The value type is valid.
\frac{
}{
  C \vdashvaltype \BOT : \OKvaltype
}
\frac{
  C,\CRECS~\subtype^\ast \vdashrectype \TREC~\subtype^\ast : {\OKrectype}(x,0)
}{
  C \vdashrectype \TREC~\subtype^\ast : {\OKrectype}(x)
}
\frac{
}{
  C \vdashrectype \TREC~\epsilon : {\OKrectype}(x,i)
}
\qquad
\frac{
  C \vdashsubtype \subtype : {\OKsubtype}(x,i)
  \qquad
  C \vdashrectype \TREC~{\subtype'}^\ast : {\OKrectype}(x+1,i+1)
}{
  C \vdashrectype \TREC~\subtype~{\subtype'}^\ast : {\OKrectype}(x,i)
}

Note

These rules are a generalisation of the ones :ref:`previously given <valid-rectype>`.

:ref:`Sub types <syntax-subtype>` \TSUB~\TFINAL^?~\X{ht}^\ast~\comptype

\frac{
  \begin{array}{@{}c@{}}
  |\X{ht}^\ast| \leq 1
  \qquad
  (\X{ht} \prec x,i)^\ast
  \qquad
  (\unrollht_{C}(\X{ht}) = \TSUB~{\X{ht}'}^\ast~\comptype')^\ast
  \\
  C \vdashcomptype \comptype : \OKcomptype
  \qquad
  (C \vdashcomptypematch \comptype \subcomptypematch \comptype')^\ast
  \end{array}
}{
  C \vdashsubtype \TSUB~\TFINAL^?~\X{ht}^\ast~\comptype : {\OKsubtype}(x,i)
}

where:

\begin{array}{@{}lll@{}}
(\deftype \prec x,i) &=& {\F{true}} \\
(y \prec x,i) &=& y < x \\
(\REC~j \prec x,i) &=& j < i \\
[2ex]
\unrollht_{C}(\deftype) &=& \unrolldt(\deftype) \\
\unrollht_{C}(y) &=& \unrolldt(C.\CTYPES[y]) \\
\unrollht_{C}(\REC~j) &=& C.\CRECS[j] \\
\end{array}

Note

This rule is a generalisation of the ones :ref:`previously given <valid-subtype>`, which only allowed type indices as supertypes.

.. index:: defined type, recursive type, unroll, expand

$${rule-prose: Deftype_ok}

$${rule: Deftype_ok}

.. index:: heap type, recursive type, recursive type index

Subtyping

In a :ref:`rolled-up <aux-roll-rectype>` :ref:`recursive type <syntax-rectype>`, a :ref:`recursive type indices <syntax-rectypeidx>` \REC~i :ref:`matches <match-heaptype>` another :ref:`heap type <syntax-heaptype>` \X{ht} if:

\frac{
  C.\CRECS[i] = \TSUB~\TFINAL^?~(\X{ht}_1^\ast~\X{ht}~\X{ht}_2^\ast)~\comptype
}{
  C \vdashheaptypematch \REC~i \subheaptypematch \X{ht}
}
.. index:: value, value type, result, result type, trap, exception, throw

Results

:ref:`Results <syntax-result>` can be classified by :ref:`result types <syntax-resulttype>` as follows.

\frac{
  (S \vdashval \val : t)^\ast
}{
  S \vdashresult \val^\ast : [t^\ast]
}

:ref:`Results <syntax-result>` (\REFEXNADDR~a)~\THROWREF

\frac{
  S \vdashval \REFEXNADDR~a : \REF~\EXN
  \qquad
  \vdashresulttype [t^\ast] : \OKresulttype
}{
  S \vdashresult (\REFEXNADDR~a)~\THROWREF : [t^\ast]
}
\frac{
  \vdashresulttype [t^\ast] : \OKresulttype
}{
  S \vdashresult \TRAP : [t^\ast]
}

Store Validity

The following typing rules specify when a runtime :ref:`store <syntax-store>` S is valid. A valid store must consist of :ref:`tag <syntax-taginst>`, :ref:`global <syntax-globalinst>`, :ref:`memory <syntax-meminst>`, :ref:`table <syntax-tableinst>`, :ref:`function <syntax-funcinst>`, :ref:`data <syntax-datainst>`, :ref:`element <syntax-eleminst>`, :ref:`structure <syntax-structinst>`, :ref:`array <syntax-arrayinst>`, :ref:`exception <syntax-exninst>`, and :ref:`module <syntax-moduleinst>` instances that are themselves valid, relative to S.

To that end, each kind of instance is classified by a respective :ref:`tag <syntax-tagtype>`, :ref:`global <syntax-globaltype>`, :ref:`memory <syntax-memtype>`, :ref:`table <syntax-tabletype>`, :ref:`function <syntax-functype>`, or :ref:`element <syntax-eleminst>`, type, or just ${:OK} in the case of :ref:`data <syntax-datainst>` :ref:`structures <syntax-structinst>`, :ref:`arrays <syntax-arrayinst>`, or :ref:`exceptions <syntax-exninst>`. Module instances are classified by module contexts, which are regular :ref:`contexts <context>` repurposed as module types describing the :ref:`index spaces <syntax-index>` defined by a module.

.. index:: store, tag instance, global instance, memory instance, table instance, function instance, data instance, element instance, structure instance, array instance, tag type, global type, memory type, table type, function type, defined type, structure type, array type

~\\[-1ex]
\frac{
  \begin{array}{@{}c@{}}
  (S \vdashtaginst \taginst : \tagtype)^\ast
  \qquad
  (S \vdashglobalinst \globalinst : \globaltype)^\ast
  \\
  (S \vdashmeminst \meminst : \memtype)^\ast
  \qquad
  (S \vdashtableinst \tableinst : \tabletype)^\ast
  \\
  (S \vdashfuncinst \funcinst : \deftype)^\ast
  \qquad
  (S \vdashdatainst \datainst : \OKdatainst)^\ast
  \qquad
  (S \vdasheleminst \eleminst : \reftype)^\ast
  \\
  (S \vdashstructinst \structinst : \OKstructinst)^\ast
  \qquad
  (S \vdasharrayinst \arrayinst : \OKarrayinst)^\ast
  \qquad
  (S \vdashexninst \exninst : \OKexninst)^\ast
  \\
  S = \{
    \begin{array}[t]{@{}l@{}}
    \STAGS~\taginst^\ast,
    \SGLOBALS~\globalinst^\ast,
    \SMEMS~\meminst^\ast,
    \STABLES~\tableinst^\ast,
    \SFUNCS~\funcinst^\ast, \\
    \SDATAS~\datainst^\ast,
    \SELEMS~\eleminst^\ast,
    \SSTRUCTS~\structinst^\ast,
    \SARRAYS~\arrayinst^\ast,
    \SEXNS~\exninst^\ast \}
    \end{array}
  \\
  (S.\SSTRUCTS[a_{\F{s}}] = \structinst)^\ast
  \qquad
  ((\REFSTRUCTADDR~a_{\F{s}}) \not\gg^+_S (\REFSTRUCTADDR~a_{\F{s}}))^\ast
  \\
  (S.\SARRAYS[a_{\F{a}}] = \arrayinst)^\ast
  \qquad
  ((\REFARRAYADDR~a_{\F{a}}) \not\gg^+_S (\REFARRAYADDR~a_{\F{a}}))^\ast
  \\
  (S.\SEXNS[a_{\F{e}}] = \exninst)^\ast
  \qquad
  ((\REFEXNADDR~a_{\F{e}}) \not\gg^+_S (\REFEXNADDR~a_{\F{e}}))^\ast
  \end{array}
}{
  \vdashstore S : \OKstore
}
.. index:: reachability

where \val_1 \gg^+_S \val_2 denotes the transitive closure of the following immutable reachability relation on :ref:`values <syntax-val>`:

\begin{array}{@{}lcll@{}}
(\REFSTRUCTADDR~a) &\gg_S& S.\SSTRUCTS[a].\SIFIELDS[i]
  & \iff \expanddt(S.\SSTRUCTS[a].\SITYPE) = \TSTRUCT~\X{ft}_1^i~\X{st}~\X{ft}_2^\ast \\
(\REFARRAYADDR~a) &\gg_S& S.\SARRAYS[a].\AIFIELDS[i]
  & \iff \expanddt(S.\SARRAYS[a].\AITYPE) = \TARRAY~\X{st} \\
(\REFEXNADDR~a) &\gg_S& S.\SEXNS[a].\EIFIELDS[i] \\
(\REFEXTERN~\reff) &\gg_S& \reff \\
\end{array}

Note

The constraint on reachability through immutable fields prevents the presence of cyclic data structures that can not be constructed in the language. Cycles can only be formed using mutation.

.. index:: tag type, tag instance
\frac{
  \vdashtagtype \tagtype : \OKtagtype
}{
  S \vdashtaginst \{ \HITYPE~\tagtype \} : \tagtype
}
.. index:: global type, global instance, value, mutability

:ref:`Global Instances <syntax-globalinst>` \{ \GITYPE~\mut~t, \GIVALUE~\val \}

\frac{
  \vdashglobaltype \mut~t : \OKglobaltype
  \qquad
  S \vdashval \val : t'
  \qquad
  \vdashvaltypematch t' \subvaltypematch t
}{
  S \vdashglobalinst \{ \GITYPE~\mut~t, \GIVALUE~\val \} : \mut~t
}
.. index:: memory type, memory instance, limits, byte

:ref:`Memory Instances <syntax-meminst>` \{ \MITYPE~(\addrtype~\limits), \MIBYTES~b^\ast \}

\frac{
  \vdashmemtype \addrtype~[n\,{..}\,m] : \OKmemtype
  \qquad
  |b^\ast| = n \cdot 64\,\F{Ki}
}{
  S \vdashmeminst \{ \MITYPE~(\addrtype~[n\,{..}\,m]), \MIBYTES~b^\ast \} : \addrtype~[n\,{..}\,m]
}
.. index:: table type, table instance, limits, function address

:ref:`Table Instances <syntax-tableinst>` \{ \TITYPE~(\addrtype~\limits~t), \TIREFS~\reff^\ast \}

\frac{
  \vdashtabletype \addrtype~[n\,{..}\,m]~t : \OKtabletype
  \qquad
  |\reff^\ast| = n
  \qquad
  (S \vdash \reff : t')^\ast
  \qquad
  (\vdashreftypematch t' \subvaltypematch t)^\ast
}{
  S \vdashtableinst \{ \TITYPE~(\addrtype~[n\,{..}\,m]~t), \TIREFS~\reff^\ast \} : \addrtype~[n\,{..}\,m]~t
}
.. index:: function type, defined type, function instance

:ref:`Function Instances <syntax-funcinst>` \{\FITYPE~\deftype, \FIMODULE~\moduleinst, \FICODE~\func\}

\frac{
  \begin{array}{@{}c@{}}
  \vdashdeftype \deftype : \OKdeftype
  \qquad
  S \vdashmoduleinst \moduleinst : C
  \\
  C \vdashfunc \func : \deftype'
  \qquad
  C \vdashdeftypematch \deftype' \subdeftypematch \deftype
  \end{array}
}{
  S \vdashfuncinst \{\FITYPE~\deftype, \FIMODULE~\moduleinst, \FICODE~\func\} : \deftype
}
.. index:: function type, defined type, function instance, host function

:ref:`Host Function Instances <syntax-funcinst>` \{\FITYPE~\deftype, \FIHOSTFUNC~\X{hf}\}

\frac{
  \begin{array}[b]{@{}r@{}}
  \vdashdeftype \deftype : \OKdeftype
  \\
  \deftype \approx \TFUNC~ [t_1^\ast] \Tarrow [t_2^\ast]
  \end{array}
  \quad
  \begin{array}[b]{@{}l@{}}
  \forall S_1, \val^\ast,~
    {\vdashstore S_1 : \OKstore} \wedge
    {\vdashstoreextends S \extendsto S_1} \wedge
    {S_1 \vdashresult \val^\ast : [t_1^\ast]}
    \Longrightarrow {} \\ \qquad
    \X{hf}(S_1; \val^\ast) \supset \emptyset \wedge {} \\ \qquad
  \forall R \in \X{hf}(S_1; \val^\ast),~
    R = \bot \vee {} \\ \qquad\qquad
    \exists S_2, \result,~
    {\vdashstore S_2 : \OKstore} \wedge
    {\vdashstoreextends S_1 \extendsto S_2} \wedge
    {S_2 \vdashresult \result : [t_2^\ast]} \wedge
    R = (S_2; \result)
  \end{array}
}{
  S \vdashfuncinst \{\FITYPE~\deftype, \FIHOSTFUNC~\X{hf}\} : \deftype
}

Note

This rule states that, if appropriate pre-conditions about store and arguments are satisfied, then executing the host function must satisfy appropriate post-conditions about store and results. The post-conditions match the ones in the :ref:`execution rule <exec-invoke-host>` for invoking host functions.

Any store under which the function is invoked is assumed to be an extension of the current store. That way, the function itself is able to make sufficient assumptions about future stores.

.. index:: data instance, byte
  • The data instance is valid.
\frac{
}{
  S \vdashdatainst \{ \DIBYTES~b^\ast \} : \OKdatainst
}
.. index:: element instance, reference

:ref:`Element Instances <syntax-eleminst>` \{ \EITYPE~t, \EIREFS~\reff^\ast \}

\frac{
  \vdashreftype t : \OKreftype
  \qquad
  (S \vdashval \reff : t')^\ast
  \qquad
  (\vdashreftypematch t' \subvaltypematch t)^\ast
}{
  S \vdasheleminst \{ \EITYPE~t, \EIREFS~\reff^\ast \} : t
}
.. index:: structure instance, field value, field type, storage type, defined type

:ref:`Structure Instances <syntax-structinst>` \{ \SITYPE~\deftype, \SIFIELDS~\fieldval^\ast \}

\frac{
  \vdashdeftype \X{dt} : \OKdeftype
  \qquad
  \expanddt(\X{dt}) = \TSTRUCT~(\mut~\X{st})^\ast
  \qquad
  (S \vdashfieldval \X{fv} : \X{st})^\ast
}{
  S \vdashstructinst \{ \SITYPE~\X{dt}, \SIFIELDS~\X{fv}^\ast \} : \OKstructinst
}
.. index:: array instance, field value, field type, storage type, defined type

:ref:`Array Instances <syntax-arrayinst>` \{ \AITYPE~\deftype, \AIFIELDS~\fieldval^\ast \}

\frac{
  \vdashdeftype \X{dt} : \OKdeftype
  \qquad
  \expanddt(\X{dt}) = \TARRAY~(\mut~\X{st})
  \qquad
  (S \vdashfieldval \X{fv} : \X{st})^\ast
}{
  S \vdasharrayinst \{ \AITYPE~\X{dt}, \AIFIELDS~\X{fv}^\ast \} : \OKarrayinst
}
.. index:: field value, field type, validation, store, packed value, packed type
\frac{
}{
  S \vdashpackval \X{pt}.\PACK~i : \X{pt}
}
.. index:: exception instance, tag, tag address

:ref:`Exception Instances <syntax-exninst>` \{ \EITAG~a, \EIFIELDS~\val^\ast \}

\frac{
  S.\STAGS[a].\HITYPE \approx \TFUNC~ [t^\ast] \Tarrow []
  \qquad
  (S \vdashval \val : t)^\ast
}{
  S \vdashexninst \{ \EITAG~a, \EIFIELDS~\val^\ast \} : \OKexninst
}
.. index:: external type, export instance, name, external address

:ref:`Export Instances <syntax-exportinst>` \{ \XINAME~\name, \XIADDR~\externaddr \}

\frac{
  S \vdashexternaddr \externaddr : \externtype
}{
  S \vdashexportinst \{ \XINAME~\name, \XIADDR~\externaddr \} : \OKexportinst
}
.. index:: module instance, context
~\\[-1ex]
\frac{
  \begin{array}{@{}c@{}}
  (\vdashdeftype \deftype : \OKdeftype)^\ast
  \qquad
  (S \vdashexternaddr \XATAG~\tagaddr : \XTTAG~\tagtype)^\ast
  \\
  (S \vdashexternaddr \XAGLOBAL~\globaladdr : \XTGLOBAL~\globaltype)^\ast
  \qquad
  (S \vdashexternaddr \XAFUNC~\funcaddr : \XTFUNC~\deftype_{\K{F}})^\ast
  \\
  (S \vdashexternaddr \XAMEM~\memaddr : \XTMEM~\memtype)^\ast
  \qquad
  (S \vdashexternaddr \XATABLE~\tableaddr : \XTTABLE~\tabletype)^\ast
  \\
  (S \vdashdatainst S.\SDATAS[\dataaddr] : \X{ok})^\ast
  \qquad
  (S \vdasheleminst S.\SELEMS[\elemaddr] : \reftype)^\ast
  \\
  (S \vdashexportinst \exportinst : \OKexportinst)^\ast
  \qquad
  (\exportinst.\XINAME)^\ast ~\mbox{disjoint}
  \end{array}
}{
  S \vdashmoduleinst \{
    \begin{array}[t]{@{}l@{~}l@{}}
    \MITYPES & \deftype^\ast, \\
    \MITAGS & \tagaddr^\ast, \\
    \MIGLOBALS & \globaladdr^\ast, \\
    \MIMEMS & \memaddr^\ast, \\
    \MITABLES & \tableaddr^\ast, \\
    \MIFUNCS & \funcaddr^\ast, \\
    \MIDATAS & \dataaddr^\ast, \\
    \MIELEMS & \elemaddr^\ast, \\
    \MIEXPORTS & \exportinst^\ast ~\} : \{
      \begin{array}[t]{@{}l@{~}l@{}}
      \CTYPES & \deftype^\ast, \\
      \CTAGS & \tagtype^\ast, \\
      \CGLOBALS & \globaltype^\ast, \\
      \CMEMS & \memtype^\ast, \\
      \CTABLES & \tabletype^\ast, \\
      \CFUNCS & \deftype_{\K{F}}^\ast, \\
      \CDATAS & \X{ok}^\ast, \\
      \CELEMS & \reftype^\ast, \\
      \CREFS & 0 \dots (|\funcaddr^\ast|-1) ~\}
      \end{array}
    \end{array}
}
.. index:: configuration, administrative instruction, store, frame

Configuration Validity

To relate the WebAssembly :ref:`type system <valid>` to its :ref:`execution semantics <exec>`, the :ref:`typing rules for instructions <valid-instr>` must be extended to :ref:`configurations <syntax-config>` S;T, which relates the :ref:`store <syntax-store>` to execution :ref:`threads <syntax-thread>`.

Configurations and threads are classified by their :ref:`result type <syntax-resulttype>`. In addition to the store S, threads are typed under a return type \resulttype^?, which controls whether and with which type a |return| instruction is allowed. This type is absent (\epsilon) except for instruction sequences inside an administrative |FRAME| instruction.

Finally, :ref:`frames <syntax-frame>` are classified with frame contexts, which extend the :ref:`module contexts <module-context>` of a frame's associated :ref:`module instance <syntax-moduleinst>` with the :ref:`locals <syntax-local>` that the frame contains.

.. index:: result type, thread

\frac{
  \vdashstore S : \OKstore
  \qquad
  S; \epsilon \vdashthread T : [t^\ast]
}{
  \vdashconfig S; T : [t^\ast]
}
.. index:: thread, frame, instruction, result type, context
\frac{
  S \vdashframe F : C
  \qquad
  S; C,\CRETURN~\resulttype^? \vdashinstrs \instr^\ast : [] \to [t^\ast]
}{
  S; \resulttype^? \vdashthread F; \instr^\ast : [t^\ast]
}
.. index:: frame, local, module instance, value, value type, context

:ref:`Frames <syntax-frame>` \{\ALOCALS~\val^\ast, \AMODULE~\moduleinst\}

\frac{
  S \vdashmoduleinst \moduleinst : C
  \qquad
  (S \vdashval \val : t)^\ast
}{
  S \vdashframe \{\ALOCALS~\val^\ast, \AMODULE~\moduleinst\} : (C, \CLOCALS~t^\ast)
}
.. index:: administrative instruction, value type, context, store

Administrative Instructions

Typing rules for :ref:`administrative instructions <syntax-instr-admin>` are specified as follows. In addition to the :ref:`context <context>` C, typing of these instructions is defined under a given :ref:`store <syntax-store>` S.

To that end, all previous typing judgements C \vdash \X{prop} are generalized to include the store, as in S; C \vdash \X{prop}, by implicitly adding S to all rules -- S is never modified by the pre-existing rules, but it is accessed in the extra rules for :ref:`administrative instructions <valid-instr-admin>` given below.

.. index:: trap

\TRAP

\frac{
  C \vdashinstrtype [t_1^\ast] \to [t_2^\ast] : \OKinstrtype
}{
  S; C \vdashadmininstr \TRAP : [t_1^\ast] \to [t_2^\ast]
}
.. index:: value, value type

\val

\frac{
  S \vdashval \val : t
}{
  S; C \vdashadmininstr \val : [] \to [t]
}
.. index:: label, instruction, result type

\LABEL_n\{\instr_0^\ast\}~\instr^\ast

\frac{
  S; C \vdashinstrs \instr_0^\ast : [t_1^n] \to_{x^\ast} [t_2^*]
  \qquad
  S; C,\CLABELS\,[t_1^n] \vdashinstrs \instr^\ast : [] \to_{{x'}^\ast} [t_2^*]
}{
  S; C \vdashadmininstr \LABEL_n\{\instr_0^\ast\}~\instr^\ast : [] \to [t_2^*]
}
.. index:: frame, instruction, result type

\FRAME_n\{F\}~\instr^\ast

\frac{
  C \vdashresulttype [t^n] : \OKresulttype
  \qquad
  S; [t^n] \vdashinstrs F; \instr^\ast : [t^n]
}{
  S; C \vdashadmininstr \FRAME_n\{F\}~\instr^\ast : [] \to [t^n]
}
.. index:: handler, throw context

\HANDLER_n\{\catch^\ast\}~\instr^\ast

\frac{
  \begin{array}{c}
  (C \vdashcatch \catch : \OKcatch)^\ast
  \qquad
  S; C \vdashinstrs \instr^\ast : [t_1^\ast] \to [t_2^\ast] \\
  \end{array}
}{
  S; C \vdashadmininstr \HANDLER_n\{\catch^\ast\}~\instr^\ast : [t_1^\ast] \to [t_2^\ast]
}
.. index:: ! store extension, store

Store Extension

Programs can mutate the :ref:`store <syntax-store>` and its contained instances. Any such modification must respect certain invariants, such as not removing allocated instances or changing immutable definitions. While these invariants are inherent to the execution semantics of WebAssembly :ref:`instructions <exec-instr>` and :ref:`modules <exec-instantiation>`, :ref:`host functions <syntax-hostfunc>` do not automatically adhere to them. Consequently, the required invariants must be stated as explicit constraints on the :ref:`invocation <exec-invoke-host>` of host functions. Soundness only holds when the :ref:`embedder <embedder>` ensures these constraints.

The necessary constraints are codified by the notion of store extension: a store state S' extends state S, written S \extendsto S', when the following rules hold.

Note

Extension does not imply that the new store is valid, which is defined separately :ref:`above <valid-store>`.

.. index:: store, function instance, table instance, memory instance, global instance
\frac{
  \begin{array}{@{}ccc@{}}
  S_1.\STAGS = \taginst_1^\ast &
  S_2.\STAGS = {\taginst'_1}^\ast~\taginst_2^\ast &
  (\vdashtaginstextends \taginst_1 \extendsto \taginst'_1)^\ast
  \\
  S_1.\SGLOBALS = \globalinst_1^\ast &
  S_2.\SGLOBALS = {\globalinst'_1}^\ast~\globalinst_2^\ast &
  (\vdashglobalinstextends \globalinst_1 \extendsto \globalinst'_1)^\ast
  \\
  S_1.\SMEMS = \meminst_1^\ast &
  S_2.\SMEMS = {\meminst'_1}^\ast~\meminst_2^\ast &
  (\vdashmeminstextends \meminst_1 \extendsto \meminst'_1)^\ast
  \\
  S_1.\STABLES = \tableinst_1^\ast &
  S_2.\STABLES = {\tableinst'_1}^\ast~\tableinst_2^\ast &
  (\vdashtableinstextends \tableinst_1 \extendsto \tableinst'_1)^\ast \\
  S_1.\SFUNCS = \funcinst_1^\ast &
  S_2.\SFUNCS = {\funcinst'_1}^\ast~\funcinst_2^\ast &
  (\vdashfuncinstextends \funcinst_1 \extendsto \funcinst'_1)^\ast
  \\
  S_1.\SDATAS = \datainst_1^\ast &
  S_2.\SDATAS = {\datainst'_1}^\ast~\datainst_2^\ast &
  (\vdashdatainstextends \datainst_1 \extendsto \datainst'_1)^\ast
  \\
  S_1.\SELEMS = \eleminst_1^\ast &
  S_2.\SELEMS = {\eleminst'_1}^\ast~\eleminst_2^\ast &
  (\vdasheleminstextends \eleminst_1 \extendsto \eleminst'_1)^\ast
  \\
  S_1.\SSTRUCTS = \structinst_1^\ast &
  S_2.\SSTRUCTS = {\structinst'_1}^\ast~\structinst_2^\ast &
  (\vdashstructinstextends \structinst_1 \extendsto \structinst'_1)^\ast
  \\
  S_1.\SARRAYS = \arrayinst_1^\ast &
  S_2.\SARRAYS = {\arrayinst'_1}^\ast~\arrayinst_2^\ast &
  (\vdasharrayinstextends \arrayinst_1 \extendsto \arrayinst'_1)^\ast
  \\
  S_1.\SEXNS = \exninst_1^\ast &
  S_2.\SEXNS = {\exninst'_1}^\ast~\exninst_2^\ast &
  (\vdashexninstextends \exninst_1 \extendsto \exninst'_1)^\ast
  \\
  \end{array}
}{
  \vdashstoreextends S_1 \extendsto S_2
}
.. index:: tag instance
  • A tag instance must remain unchanged.
\frac{
}{
  \vdashtaginstextends \taginst \extendsto \taginst
}
.. index:: global instance, value, mutability
\frac{
  \mut = \TMUT \vee \val_1 = \val_2
}{
  \vdashglobalinstextends \{\GITYPE~(\mut~t), \GIVALUE~\val_1\} \extendsto \{\GITYPE~(\mut~t), \GIVALUE~\val_2\}
}
.. index:: memory instance
\frac{
  n_1 \leq n_2
}{
  \vdashmeminstextends \{\MITYPE~\X{mt}, \MIBYTES~b_1^{n_1}\} \extendsto \{\MITYPE~\X{mt}, \MIBYTES~b_2^{n_2}\}
}
.. index:: table instance
\frac{
  n_1 \leq n_2
}{
  \vdashtableinstextends \{\TITYPE~\X{tt}, \TIREFS~(\X{fa}_1^?)^{n_1}\} \extendsto \{\TITYPE~\X{tt}, \TIREFS~(\X{fa}_2^?)^{n_2}\}
}
.. index:: function instance
  • A function instance must remain unchanged.
\frac{
}{
  \vdashfuncinstextends \funcinst \extendsto \funcinst
}
.. index:: data instance
  • The list \datainst.\DIBYTES must:
    • either remain unchanged,
    • or shrink to length 0.
\frac{
}{
  \vdashdatainstextends \{\DIBYTES~b^\ast\} \extendsto \{\DIBYTES~b^\ast\}
}
\frac{
}{
  \vdashdatainstextends \{\DIBYTES~b^\ast\} \extendsto \{\DIBYTES~\epsilon\}
}
.. index:: element instance
\frac{
}{
  \vdasheleminstextends \{\EITYPE~t, \EIREFS~a^\ast\} \extendsto \{\EITYPE~t, \EIREFS~a^\ast\}
}
\frac{
}{
  \vdasheleminstextends \{\EITYPE~t, \EIREFS~a^\ast\} \extendsto \{\EITYPE~t, \EIREFS~\epsilon\}
}
.. index:: structure instance, field value, field type
\frac{
  (\mut = \TMUT \vee \fieldval_1 = \fieldval_2)^\ast
}{
  \vdashstructinstextends \{\SITYPE~(\mut~\X{st})^\ast, \SIFIELDS~\fieldval_1^\ast\} \extendsto \{\SITYPE~(\mut~\X{st})^\ast, \SIFIELDS~\fieldval_2^\ast\}
}
.. index:: array instance, field value, field type
\frac{
  \mut = \TMUT \vee \fieldval_1^\ast = \fieldval_2^\ast
}{
  \vdasharrayinstextends \{\AITYPE~(\mut~\X{st}), \AIFIELDS~\fieldval_1^\ast\} \extendsto \{\AITYPE~(\mut~\X{st}), \AIFIELDS~\fieldval_2^\ast\}
}
.. index:: exception instance
  • An exception instance must remain unchanged.
\frac{
}{
  \vdashexninstextends \exninst \extendsto \exninst
}
.. index:: ! preservation, ! progress, soundness, configuration, thread, terminal configuration, instantiation, invocation, validity, module

Theorems

Given the definition of :ref:`valid configurations <valid-config>`, the standard soundness theorems hold. [2] [3]

Theorem (Preservation). If a :ref:`configuration <syntax-config>` S;T is :ref:`valid <valid-config>` with :ref:`result type <syntax-resulttype>` [t^\ast] (i.e., \vdashconfig S;T : [t^\ast]), and steps to S';T' (i.e., S;T \stepto S';T'), then S';T' is a valid configuration with the same result type (i.e., \vdashconfig S';T' : [t^\ast]). Furthermore, S' is an :ref:`extension <extend-store>` of S (i.e., \vdashstoreextends S \extendsto S').

A terminal :ref:`thread <syntax-thread>` is one whose sequence of :ref:`instructions <syntax-instr>` is a :ref:`result <syntax-result>`. A terminal configuration is a configuration whose thread is terminal.

Theorem (Progress). If a :ref:`configuration <syntax-config>` S;T is :ref:`valid <valid-config>` (i.e., \vdashconfig S;T : [t^\ast] for some :ref:`result type <syntax-resulttype>` [t^\ast]), then either it is terminal, or it can step to some configuration S';T' (i.e., S;T \stepto S';T').

From Preservation and Progress the soundness of the WebAssembly type system follows directly.

Corollary (Soundness). If a :ref:`configuration <syntax-config>` S;T is :ref:`valid <valid-config>` (i.e., \vdashconfig S;T : [t^\ast] for some :ref:`result type <syntax-resulttype>` [t^\ast]), then it either diverges or takes a finite number of steps to reach a terminal configuration S';T' (i.e., S;T \stepto^\ast S';T') that is valid with the same result type (i.e., \vdashconfig S';T' : [t^\ast]) and where S' is an :ref:`extension <extend-store>` of S (i.e., \vdashstoreextends S \extendsto S').

In other words, every thread in a valid configuration either runs forever, traps, throws an exception, or terminates with a result that has the expected type. Consequently, given a :ref:`valid store <valid-store>`, no computation defined by :ref:`instantiation <exec-instantiation>` or :ref:`invocation <exec-invocation>` of a valid module can "crash" or otherwise (mis)behave in ways not covered by the :ref:`execution <exec>` semantics given in this specification.

[1]The formalization and theorems are derived from the following article: Andreas Haas, Andreas Rossberg, Derek Schuff, Ben Titzer, Dan Gohman, Luke Wagner, Alon Zakai, JF Bastien, Michael Holman. |PLDI2017|_. Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2017). ACM 2017.
[2]A machine-verified version of the formalization and soundness proof of the PLDI 2017 paper is described in the following article: Conrad Watt. |CPP2018|_. Proceedings of the 7th ACM SIGPLAN Conference on Certified Programs and Proofs (CPP 2018). ACM 2018.
[3]Machine-verified formalizations and soundness proofs of the semantics from the official specification are described in the following article: Conrad Watt, Xiaojia Rao, Jean Pichon-Pharabod, Martin Bodin, Philippa Gardner. |FM2021|_. Proceedings of the 24th International Symposium on Formal Methods (FM 2021). Springer 2021.
.. index:: type system

Type System Properties

.. index:: ! principal types, type system, subtyping, polymorphism, instruction, syntax, instruction type

Principal Types

The :ref:`type system <type-system>` of WebAssembly features both :ref:`subtyping <match>` and simple forms of :ref:`polymorphism <polymorphism>` for :ref:`instruction types <syntax-instrtype>`. That has the effect that every instruction or instruction sequence can be classified with multiple different instruction types.

However, the typing rules still allow deriving principal types for instruction sequences. That is, every valid instruction sequence has one particular type scheme, possibly containing some unconstrained place holder type variables, that is a subtype of all its valid instruction types, after substituting its type variables with suitable specific types.

Moreover, when deriving an instruction type in a "forward" manner, i.e., the input of the instruction sequence is already fixed to specific types, then it has a principal output type expressible without type variables, up to a possibly :ref:`polymorphic stack <polymorphism>` bottom representable with one single variable. In other words, "forward" principal types are effectively closed.

Note

For example, in isolation, the instruction \REFASNONNULL has the type [(\REF~\NULL~\X{ht})] \to [(\REF~\X{ht})] for any choice of valid :ref:`heap type <syntax-type>` \X{ht}. Moreover, if the input type [(\REF~\NULL~\X{ht})] is already determined, i.e., a specific \X{ht} is given, then the output type [(\REF~\X{ht})] is fully determined as well.

The implication of the latter property is that a validator for complete instruction sequences (as they occur in valid modules) can be implemented with a simple left-to-right :ref:`algorithm <algo-valid>` that does not require the introduction of type variables.

A typing algorithm capable of handling partial instruction sequences (as might be considered for program analysis or program manipulation) needs to introduce type variables and perform substitutions, but it does not need to perform backtracking or record any non-syntactic constraints on these type variables.

Technically, the :ref:`syntax <syntax-type>` of :ref:`heap <syntax-heaptype>`, :ref:`value <syntax-valtype>`, and :ref:`result <syntax-resulttype>` types can be enriched with type variables as follows:

\begin{array}{llll}
\production{nullability} & \X{null} &::=&
  \NULL^? ~|~ \alpha_{\X{null}} \\
\production{heap type} & \heaptype &::=&
  \dots ~|~ \alpha_{\heaptype} \\
\production{reference type} & \reftype &::=&
  \REF~\X{null}~\heaptype \\
\production{value type} & \valtype &::=&
  \dots ~|~ \alpha_{\valtype} ~|~ \alpha_{\X{numvectype}} \\
\production{result type} & \resulttype &::=&
  [\alpha_{\valtype^\ast}^?~\valtype^\ast] \\
\end{array}

where each \alpha_{\X{xyz}} ranges over a set of type variables for syntactic class \X{xyz}, respectively. The special class \X{numvectype} is defined as \numtype ~|~ \vectype ~|~ \BOT, and is only needed to handle unannotated |SELECT| instructions.

A type is closed when it does not contain any type variables, and open otherwise. A type substitution \sigma is a finite mapping from type variables to closed types of the respective syntactic class. When applied to an open type, it replaces the type variables \alpha from its domain with the respective \sigma(\alpha).

Theorem (Principal Types). If an instruction sequence \instr^\ast is :ref:`valid <valid-config>` with some closed :ref:`instruction type <syntax-instrtype>` \instrtype (i.e., C \vdashinstrs \instr^\ast : \instrtype), then it is also valid with a possibly open instruction type \instrtype_{\min} (i.e., C \vdashinstrs \instr^\ast : \instrtype_{\min}), such that for every closed type \instrtype' with which \instr^\ast is valid (i.e., for all C \vdashinstrs \instr^\ast : \instrtype'), there exists a substitution \sigma, such that \sigma(\instrtype_{\min}) is a subtype of \instrtype' (i.e., C \vdashinstrtypematch \sigma(\instrtype_{\min}) \subinstrtypematch \instrtype'). Furthermore, \instrtype_{\min} is unique up to the choice of type variables.

Theorem (Closed Principal Forward Types). If closed input type [t_1^\ast] is given and the instruction sequence \instr^\ast is :ref:`valid <valid-config>` with :ref:`instruction type <syntax-instrtype>` [t_1^\ast] \to_{x^\ast} [t_2^\ast] (i.e., C \vdashinstrs \instr^\ast : [t_1^\ast] \to_{x^\ast} [t_2^\ast]), then it is also valid with instruction type [t_1^\ast] \to_{x^\ast} [\alpha_{\valtype^\ast}~t^\ast] (i.e., C \vdashinstrs \instr^\ast : [t_1^\ast] \to_{x^\ast} [\alpha_{\valtype^\ast}~t^\ast]), where all t^\ast are closed, such that for every closed result type [{t'_2}^\ast] with which \instr^\ast is valid (i.e., for all C \vdashinstrs \instr^\ast : [t_1^\ast] \to_{x^\ast} [{t'_2}^\ast]), there exists a substitution \sigma, such that [{t'_2}^\ast] = [\sigma(\alpha_{\valtype^\ast})~t^\ast].

.. index:: ! type lattice, subtyping, least upper bound, greatest lower bound, instruction type

Type Lattice

The :ref:`Principal Types <principality>` property depends on the existence of a greatest lower bound for any pair of types.

Theorem (Greatest Lower Bounds for Value Types). For any two value types t_1 and t_2 that are :ref:`valid <valid-valtype>` (i.e., C \vdashvaltype t_1 : \OKvaltype and C \vdashvaltype t_2 : \OKvaltype), there exists a valid value type t that is a subtype of both t_1 and t_2 (i.e., C \vdashvaltype t : \OKvaltype and C \vdashvaltypematch t \subvaltypematch t_1 and C \vdashvaltypematch t \subvaltypematch t_2), such that every valid value type t' that also is a subtype of both t_1 and t_2 (i.e., for all C \vdashvaltype t' : \OKvaltype and C \vdashvaltypematch t' \subvaltypematch t_1 and C \vdashvaltypematch t' \subvaltypematch t_2), is a subtype of t (i.e., C \vdashvaltypematch t' \subvaltypematch t).

Note

The greatest lower bound of two types may be |BOT|.

Theorem (Conditional Least Upper Bounds for Value Types). Any two value types t_1 and t_2 that are :ref:`valid <valid-valtype>` (i.e., C \vdashvaltype t_1 : \OKvaltype and C \vdashvaltype t_2 : \OKvaltype) either have no common supertype, or there exists a valid value type t that is a supertype of both t_1 and t_2 (i.e., C \vdashvaltype t : \OKvaltype and C \vdashvaltypematch t_1 \subvaltypematch t and C \vdashvaltypematch t_2 \subvaltypematch t), such that every valid value type t' that also is a supertype of both t_1 and t_2 (i.e., for all C \vdashvaltype t' : \OKvaltype and C \vdashvaltypematch t_1 \subvaltypematch t' and C \vdashvaltypematch t_2 \subvaltypematch t'), is a supertype of t (i.e., C \vdashvaltypematch t \subvaltypematch t').

Note

If a top type was added to the type system, a least upper bound would exist for any two types.

Corollary (Type Lattice). Assuming the addition of a provisional top type, :ref:`value types <syntax-valtype>` form a lattice with respect to their :ref:`subtype <match-valtype>` relation.

Finally, value types can be partitioned into multiple disjoint hierarchies that are not related by subtyping, except through |BOT|.

Theorem (Disjoint Subtype Hierarchies). The greatest lower bound of two :ref:`value types <syntax-valtype>` is \BOT or \REF~\BOT if and only if they do not have a least upper bound.

In other words, types that do not have common supertypes, do not have common subtypes either (other than \BOT or \REF~\BOT), and vice versa.

Note

Types from disjoint hierarchies can safely be represented in mutually incompatible ways in an implementation, because their values can never flow to the same place.

.. index:: ! compositionality, instruction type, subtyping

Compositionality

:ref:`Valid <valid-instrs>` :ref:`instruction sequences <syntax-instr>` can be freely composed, as long as their types match up.

Theorem (Composition). If two instruction sequences \instr_1^\ast and \instr_2^\ast are valid with types [t_1^\ast] \to_{x_1^\ast} [t^\ast] and [t^\ast] \to_{x_2^\ast} [t_2^\ast], respectively (i.e., C \vdashinstrs \instr_1^\ast : [t_1^\ast] \to_{x_1^\ast} [t^\ast] and C \vdashinstrs \instr_1^\ast : [t^\ast] \to_{x_2^\ast} [t_2^\ast]), then the concatenated instruction sequence (\instr_1^\ast\;\instr_2^\ast) is valid with type [t_1^\ast] \to_{x_1^\ast\,x_2^\ast} [t_2^\ast] (i.e., C \vdashinstrs \instr_1^\ast\;\instr_2^\ast : [t_1^\ast] \to_{x_1^\ast\,x_2^\ast} [t_2^\ast]).

Note

More generally, instead of a shared type [t^\ast], it suffices if the output type of \instr_1^\ast is a :ref:`subtype <match-resulttype>` of the input type of \instr_1^\ast, since the subtype can always be weakened to its supertype by subsumption.

Inversely, valid instruction sequences can also freely be decomposed, that is, splitting them anywhere produces two instruction sequences that are both :ref:`valid <valid-instrs>`.

Theorem (Decomposition). If an instruction sequence \instr^\ast that is valid with type [t_1^\ast] \to_{x^\ast} [t_2^\ast] (i.e., C \vdashinstrs \instr^\ast : [t_1^\ast] \to_{x^\ast} [t_2^\ast]) is split into two instruction sequences \instr_1^\ast and \instr_2^\ast at any point (i.e., \instr^\ast = \instr_1^\ast\;\instr_2^\ast), then these are separately valid with some types [t_1^\ast] \to_{x_1^\ast} [t^\ast] and [t^\ast] \to_{x_2^\ast} [t_2^\ast], respectively (i.e., C \vdashinstrs \instr_1^\ast : [t_1^\ast] \to_{x_1^\ast} [t^\ast] and C \vdashinstrs \instr_1^\ast : [t^\ast] \to_{x_2^\ast} [t_2^\ast]), where x^\ast = x_1^\ast\;x_2^\ast.

Note

This property holds because validation is required even for unreachable code. Without that, \instr_2^\ast might not be valid in isolation.