# Interpreter

The interpreter is a virtual machine for executing MIR without compiling to
machine code.
It is usually invoked via `tcx.const_eval_*` functions.
The interpreter is shared between the compiler (for compile-time function
evaluation, CTFE) and the tool [Miri](https://github.com/rust-lang/miri/), which
uses the same virtual machine to detect Undefined Behavior in (unsafe) Rust
code.

The compiler needs to figure out the length of the array before being able to
create items that use the type (locals, constants, function arguments, ...).

To obtain the (in this case empty) parameter environment, one can call
`let param_env = tcx.param_env(length_def_id);`.
The `GlobalId` needed is:

```rust,ignore
let gid = GlobalId {
    // …
};
```

Invoking `tcx.const_eval(param_env.and(gid))` will now trigger the creation of
the MIR of the array length expression.
The MIR will look something like this:

```mir
Foo::{{constant}}#0: usize = {
    // …
}
```

Before the evaluation, a virtual memory location (in this case essentially a
`vec![u8; 4]` or `vec![u8; 8]`) is created for storing the evaluation result.

At the start of the evaluation, `_0` and `_1` are
`Operand::Immediate(Immediate::Scalar(ScalarMaybeUndef::Undef))`.
This is quite a mouthful: [`Operand`] can represent either data stored somewhere in the
[interpreter memory](#memory) (`Operand::Indirect`), or (as an optimization)
immediate data stored in-line.
[`Immediate`], in turn, is either a single
(potentially uninitialized) [scalar value][`Scalar`] (integer or thin pointer),
or a pair of two of them.
In our case, the single scalar value is *not* (yet) initialized.
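The nesting of these types is easier to see in code. The following is a toy model, not rustc's actual definitions. The type and variant names follow the text above. The `MemPlace` struct, the `size` field, and the helper functions are assumptions made for illustration.

```rust
// Toy model of the operand types described above (hypothetical and
// simplified; not rustc's real definitions).

/// A possibly-uninitialized scalar: raw integer data plus its size in bytes.
#[derive(Clone, Copy, Debug, PartialEq)]
enum ScalarMaybeUndef {
    Raw { data: u128, size: u8 },
    Undef,
}

/// Data stored in-line: one scalar, or a pair of scalars.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Immediate {
    Scalar(ScalarMaybeUndef),
    ScalarPair(ScalarMaybeUndef, ScalarMaybeUndef),
}

/// Hypothetical stand-in for a location in interpreter memory.
#[derive(Clone, Copy, Debug, PartialEq)]
struct MemPlace {
    alloc_id: u64,
    offset: u64,
}

/// Either in-line data or a reference into interpreter memory.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Operand {
    Immediate(Immediate),
    Indirect(MemPlace),
}

/// State of `_0` and `_1` when evaluation starts: one uninitialized scalar each.
fn initial_locals() -> Vec<Operand> {
    vec![Operand::Immediate(Immediate::Scalar(ScalarMaybeUndef::Undef)); 2]
}

/// True if the operand is an in-line scalar that is not initialized yet.
fn is_undef_scalar(op: &Operand) -> bool {
    matches!(op, Operand::Immediate(Immediate::Scalar(ScalarMaybeUndef::Undef)))
}

fn main() {
    let locals = initial_locals();
    // After the subtraction, `_1` holds a pair: (computed value, overflow flag).
    // Sizes assume a 64-bit target (`usize` = 8 bytes, `bool` = 1 byte).
    let after_sub = Operand::Immediate(Immediate::ScalarPair(
        ScalarMaybeUndef::Raw { data: 4054, size: 8 },
        ScalarMaybeUndef::Raw { data: 0, size: 1 },
    ));
    println!("{:?}", locals);
    println!("{:?}", after_sub);
}
```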

When the initialization of `_1` is invoked, the value of the `FOO` constant is
required, which triggers another call to `tcx.const_eval_*` that will not be shown
here.
If the evaluation of `FOO` is successful, `42` will be subtracted from its
value `4096` and the result stored in `_1` as
`Operand::Immediate(Immediate::ScalarPair(Scalar::Raw { data: 4054, .. },
Scalar::Raw { data: 0, .. }))`.
The first part of the pair is the computed value;
the second part is a `bool` that is `true` if an overflow happened.
A `Scalar::Raw`
also stores the size (in bytes) of this scalar value; we are eliding that here.

The next statement asserts that this boolean is `0` (`false`).
If the assertion
fails, its error message is used to report a compile-time error.
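The arithmetic in this step can be mirrored with the standard library's `usize::overflowing_sub`, which returns exactly such a (value, overflowed) pair. The sketch below is illustrative only. `EvalError` and its message are hypothetical stand-ins for how a failed assertion gets reported.

```rust
// Sketch of the subtraction plus overflow assertion from the example above.
// `usize::overflowing_sub` is real std; `EvalError` is hypothetical.

#[derive(Debug, PartialEq)]
enum EvalError {
    /// Placeholder for the assertion's compile-time error message.
    AssertFailed(&'static str),
}

/// Computes the (value, overflowed) pair that ends up in `_1`.
fn sub_with_overflow(a: usize, b: usize) -> (usize, bool) {
    a.overflowing_sub(b)
}

/// Evaluates `FOO - 42`: compute the pair, assert the flag is false, return the value.
fn eval_length(foo: usize) -> Result<usize, EvalError> {
    let (value, overflowed) = sub_with_overflow(foo, 42);
    if overflowed {
        // The assertion failed: report its message as a compile-time error.
        return Err(EvalError::AssertFailed("subtraction overflowed"));
    }
    // The assertion holds, so the first half of the pair becomes the result (`_0`).
    Ok(value)
}

fn main() {
    println!("{:?}", eval_length(4096)); // Ok(4054)
    println!("{:?}", eval_length(10)); // Err(AssertFailed("subtraction overflowed"))
}
```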

Since it does not fail, `Operand::Immediate(Immediate::Scalar(Scalar::Raw {
data: 4054, .. }))` is stored in the virtual memory that was allocated before the
evaluation.
`_0` always refers to that location directly.

After the evaluation is done, the return value is converted from [`Operand`] to
[`ConstValue`] by [`op_to_const`]: the former representation is geared towards
what is needed *during* const evaluation, while [`ConstValue`] is shaped by the
needs of the remaining parts of the compiler that consume the results of const
evaluation.
As part of this conversion, for types with scalar values, even if
the resulting [`Operand`] is `Indirect`, `op_to_const` returns an immediate
`ConstValue::Scalar(computed_value)` (instead of the usual `ConstValue::Indirect`).
This makes using the result much more efficient and also more convenient, as no
further queries need to be executed in order to get at something as simple as a
`usize`.
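The following is a minimal sketch of that conversion under heavy simplifying assumptions. Memory is a `HashMap` from allocation ids to byte vectors, and only scalar immediates exist. Whether the result type is a scalar is passed in as an optional byte size, and `ConstValue` has just two variants. This is not rustc's `op_to_const`. It only illustrates why a scalar result is read out of memory eagerly.

```rust
use std::collections::HashMap;

// Hypothetical, simplified model of the `Operand` -> `ConstValue` conversion.

/// Operand as seen during evaluation (only scalar immediates in this sketch).
enum Operand {
    Immediate(u128),
    Indirect { alloc_id: u64, offset: usize },
}

/// Result handed to the rest of the compiler (`Slice` omitted here).
#[derive(Debug, PartialEq)]
enum ConstValue {
    Scalar(u128),
    Indirect { alloc_id: u64, offset: usize },
}

/// `scalar_size` is `Some(n)` when the result type is a scalar of `n` bytes.
fn op_to_const(
    op: &Operand,
    scalar_size: Option<usize>,
    memory: &HashMap<u64, Vec<u8>>,
) -> ConstValue {
    match (op, scalar_size) {
        (Operand::Immediate(value), _) => ConstValue::Scalar(*value),
        // Scalar type stored in memory: read it out eagerly so consumers
        // do not need another query just to get at a `usize`.
        (Operand::Indirect { alloc_id, offset }, Some(size)) => {
            let bytes = &memory[alloc_id][*offset..*offset + size];
            // Assume little-endian byte order.
            let value = bytes
                .iter()
                .rev()
                .fold(0u128, |acc, &b| (acc << 8) | u128::from(b));
            ConstValue::Scalar(value)
        }
        // Anything else stays a reference to the allocation.
        (Operand::Indirect { alloc_id, offset }, None) => ConstValue::Indirect {
            alloc_id: *alloc_id,
            offset: *offset,
        },
    }
}

fn main() {
    let mut memory = HashMap::new();
    // The result `4054`, stored as 8 little-endian bytes in allocation 1.
    memory.insert(1, 4054u64.to_le_bytes().to_vec());
    let result = Operand::Indirect { alloc_id: 1, offset: 0 };
    println!("{:?}", op_to_const(&result, Some(8), &memory)); // Scalar(4054)
}
```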

… the interpreter, but just use the cached result.

The interpreter's outside-facing data structures can be found in
[rustc_middle/src/mir/interpret](https://github.com/rust-lang/rust/blob/HEAD/compiler/rustc_middle/src/mir/interpret).
These are mainly the error enum and the [`ConstValue`] and [`Scalar`] types.
A `ConstValue` can be either `Scalar` (a single `Scalar`, i.e., integer or thin
pointer), `Slice` (to represent byte slices and strings, as needed for pattern
matching) or `Indirect`, which is used for anything else and refers to a virtual
allocation.
These allocations can be accessed via the methods on `tcx.interpret_interner`.
A `Scalar` is either some `Raw` integer or a pointer;
see [the next section](#memory) for more on that.

If you are expecting a numeric result, you can use `eval_usize` (panics on
…
in an `Option<u64>` yielding the `Scalar` if possible.

## Memory

To support any kind of pointers, the interpreter needs to have a "virtual memory" that the
pointers can point to.
This is implemented in the [`Memory`] type.
In the simplest model, every global variable, stack variable and every dynamic
allocation corresponds to an [`Allocation`] in that memory.
(Actually using an
allocation for every MIR stack variable would be very inefficient; that's why we
have `Operand::Immediate` for stack variables that are both small and never have
their address taken.
But that is purely an optimization.)

Such an `Allocation` is basically just a sequence of `u8` storing the value of
each byte in this allocation.
(Plus some extra data, see below.)
Every `Allocation` has a globally unique `AllocId` assigned in `Memory`.
A [`Pointer`] then consists of a pair of an `AllocId` (indicating the allocation) and
an offset into the allocation (indicating which byte of the allocation the
pointer points to).
It may seem odd that a `Pointer` is not just an integer
address, but remember that during const evaluation, we cannot know at which
actual integer address the allocation will end up, so we use `AllocId`s as
symbolic base addresses, which means we need a separate offset.
(As an aside,
it turns out that pointers at run-time are
[more than just integers, too](https://rust-lang.github.io/unsafe-code-guidelines/glossary.html#pointer-provenance).)

These allocations exist so that references and raw pointers have something to
point to.
Instead of a global linear heap in which things are allocated, each
allocation (be it for a local variable, a static or a (future) heap allocation)
gets its own little memory with exactly the required size.
So if you have a
pointer to an allocation for a local variable `a`, there is no possible (no
matter how unsafe) operation that you can do that would ever change said pointer
to a pointer to a different local variable `b`.
Pointer arithmetic on `a` will only ever change its offset; the `AllocId` stays the same.
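Here is a toy model of this memory layout. All types are hypothetical simplifications: rustc's real `Memory`, `Allocation`, and `Pointer` carry much more information, and `offset_by` is an invented helper. The sketch shows two properties. Each allocation is a separate buffer, and pointer arithmetic cannot move a pointer into a different allocation.

```rust
use std::collections::HashMap;

// Toy model: every allocation is its own byte buffer, and a pointer is an
// (AllocId, offset) pair. Hypothetical types, not rustc's definitions.

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct AllocId(u64);

#[derive(Clone, Copy, Debug, PartialEq)]
struct Pointer {
    alloc_id: AllocId,
    offset: u64,
}

impl Pointer {
    /// Pointer arithmetic changes only the offset, never the allocation.
    fn offset_by(self, delta: u64) -> Pointer {
        Pointer { alloc_id: self.alloc_id, offset: self.offset.wrapping_add(delta) }
    }
}

#[derive(Default)]
struct Memory {
    allocations: HashMap<AllocId, Vec<u8>>,
    next_id: u64,
}

impl Memory {
    /// Creates an allocation of exactly `size` bytes with a fresh, unique `AllocId`.
    fn allocate(&mut self, size: usize) -> Pointer {
        let alloc_id = AllocId(self.next_id);
        self.next_id += 1;
        self.allocations.insert(alloc_id, vec![0; size]);
        Pointer { alloc_id, offset: 0 }
    }

    /// Reads a byte. Going past the end is an error; it never "spills" into
    /// whatever allocation happens to have been created next.
    fn read_u8(&self, ptr: Pointer) -> Result<u8, String> {
        let bytes = self.allocations.get(&ptr.alloc_id).ok_or("dangling pointer")?;
        bytes
            .get(ptr.offset as usize)
            .copied()
            .ok_or_else(|| format!("out-of-bounds read at offset {}", ptr.offset))
    }
}

fn main() {
    let mut memory = Memory::default();
    let a = memory.allocate(4); // local `a`
    let _b = memory.allocate(4); // local `b`
    // One past the end of `a`: still in `a`'s allocation, so this read fails.
    println!("{:?}", memory.read_u8(a.offset_by(4)));
}
```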

This, however, causes a problem when we want to store a `Pointer` into an
`Allocation`: we cannot turn it into a sequence of `u8` of the right length!
`AllocId` and offset together are twice as big as a pointer "seems" to be.
This is what the `relocation` field of `Allocation` is for: the byte offset of the
`Pointer` gets stored as a bunch of `u8`, while its `AllocId` gets stored
out-of-band.
The two are reassembled when the `Pointer` is read from memory.
The other bit of extra data an `Allocation` needs is `undef_mask` for keeping
track of which of its bytes are initialized.
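The following is a minimal sketch of that split, assuming a 64-bit target and using hypothetical types. The pointer's offset is written into the byte buffer. Its `AllocId` goes into a side table keyed by byte position, which plays the role of the `relocation` field. A per-byte flag vector plays the role of `undef_mask`. Real rustc data structures differ in detail.

```rust
use std::collections::BTreeMap;

// Hypothetical sketch of an allocation that can store pointers.
const PTR_SIZE: usize = 8; // assumes a 64-bit target

#[derive(Clone, Copy, Debug, PartialEq)]
struct Pointer {
    alloc_id: u64,
    offset: u64,
}

struct Allocation {
    bytes: Vec<u8>,
    /// Position of a stored pointer -> its `AllocId`, kept out-of-band.
    relocation: BTreeMap<usize, u64>,
    /// One flag per byte; `true` means the byte is initialized.
    undef_mask: Vec<bool>,
}

impl Allocation {
    fn new(size: usize) -> Self {
        Allocation {
            bytes: vec![0; size],
            relocation: BTreeMap::new(),
            undef_mask: vec![false; size],
        }
    }

    /// Stores the offset as bytes and the `AllocId` in the relocation table.
    fn write_ptr(&mut self, at: usize, ptr: Pointer) {
        self.bytes[at..at + PTR_SIZE].copy_from_slice(&ptr.offset.to_le_bytes());
        self.relocation.insert(at, ptr.alloc_id);
        self.undef_mask[at..at + PTR_SIZE].fill(true);
    }

    /// Reassembles a pointer. Returns `None` if the bytes are uninitialized
    /// or no pointer was stored at this position.
    fn read_ptr(&self, at: usize) -> Option<Pointer> {
        if !self.undef_mask[at..at + PTR_SIZE].iter().all(|&init| init) {
            return None;
        }
        let alloc_id = *self.relocation.get(&at)?;
        let mut buf = [0u8; PTR_SIZE];
        buf.copy_from_slice(&self.bytes[at..at + PTR_SIZE]);
        Some(Pointer { alloc_id, offset: u64::from_le_bytes(buf) })
    }
}

fn main() {
    let mut alloc = Allocation::new(16);
    alloc.write_ptr(8, Pointer { alloc_id: 7, offset: 3 });
    // Only the offset (3) is in the bytes; the AllocId (7) lives in `relocation`.
    println!("{:?}", &alloc.bytes[8..16]);
    println!("{:?}", alloc.read_ptr(8));
}
```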

### Global memory and exotic allocations

`Memory` exists only during evaluation; it gets destroyed when the
final value of the constant is computed.
If that constant contains any
pointers, those get "interned" and moved to a global "const eval memory" that is
part of `TyCtxt`.
These allocations stay around for the remaining computation
and get serialized into the final output (so that dependent crates can use
them).

Moreover, to support function pointers, the global memory in `TyCtxt` can
also contain "virtual allocations": instead of an `Allocation`, these contain an
`Instance`.
That allows a `Pointer` to point to either normal data or a
function, which is needed to evaluate casts from function pointers to
raw pointers.

Finally, the [`GlobalAlloc`] type used in the global memory also contains a
variant `Static` that points to a particular `const` or `static` item.
This is needed to support circular statics, where we need to have a `Pointer` to a
`static` for which we cannot yet have an `Allocation` as we do not know the
bytes of its value.
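The three kinds of global entries can be modeled as one enum. This is a hypothetical sketch. `Instance` and `DefId` are placeholder newtypes rather than rustc's real types, and `intern` and `bytes` are invented helpers. In this sketch, only plain data entries have bytes that can be read.

```rust
use std::collections::HashMap;

// Hypothetical model of the global memory's entries (not rustc's types).

/// Placeholder for a function instance.
#[derive(Clone, Debug, PartialEq)]
struct Instance(String);

/// Placeholder for the id of a `const` or `static` item.
#[derive(Clone, Copy, Debug, PartialEq)]
struct DefId(u32);

#[derive(Clone, Debug, PartialEq)]
enum GlobalAlloc {
    /// Ordinary data, e.g. interned from a finished evaluation.
    Memory(Vec<u8>),
    /// "Virtual allocation" that a function pointer points to.
    Function(Instance),
    /// A `const`/`static` item whose bytes may not be known yet.
    Static(DefId),
}

#[derive(Default)]
struct GlobalMemory {
    allocs: HashMap<u64, GlobalAlloc>,
    next_id: u64,
}

impl GlobalMemory {
    /// Stores an entry under a fresh `AllocId` and returns that id.
    fn intern(&mut self, alloc: GlobalAlloc) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.allocs.insert(id, alloc);
        id
    }

    /// Only `Memory` entries have bytes behind them in this sketch.
    fn bytes(&self, id: u64) -> Result<&[u8], &'static str> {
        match self.allocs.get(&id) {
            Some(GlobalAlloc::Memory(bytes)) => Ok(bytes.as_slice()),
            Some(GlobalAlloc::Function(_)) => Err("function pointers have no bytes"),
            Some(GlobalAlloc::Static(_)) => Err("bytes of this static are not available here"),
            None => Err("dangling AllocId"),
        }
    }
}

fn main() {
    let mut global = GlobalMemory::default();
    let data = global.intern(GlobalAlloc::Memory(vec![1, 2, 3]));
    let func = global.intern(GlobalAlloc::Function(Instance("foo".to_string())));
    let stat = global.intern(GlobalAlloc::Static(DefId(0)));
    for id in [data, func, stat] {
        println!("{id}: {:?}", global.bytes(id));
    }
}
```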

### Pointer values vs Pointer types

One common cause of confusion in the interpreter is that being a pointer *value* and having
a pointer *type* are entirely independent properties.
By "pointer value", we
refer to a `Scalar::Ptr` containing a `Pointer` and thus pointing somewhere into
the interpreter's virtual memory.
This is in contrast to `Scalar::Raw`, which is just some concrete integer.

However, a variable of pointer or reference *type*, such as `*const T` or `&T`,
does not have to have a pointer *value*: it could be obtained by casting or
transmuting an integer to a pointer.
Similarly, when casting or transmuting a reference to some
actual allocation to an integer, we end up with a pointer *value*
(`Scalar::Ptr`) at integer *type* (`usize`).
This is a problem because we
cannot meaningfully perform integer operations such as division on pointer
values, whose actual integer address is unknown during const evaluation.
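The following small sketch shows why this matters. It uses a hypothetical `Scalar` with `Raw` and `Ptr` variants and a `div` function that can only proceed when both operands are raw integers. The error strings and the function are illustrative, not rustc's.

```rust
// Hypothetical sketch: the same `usize`-typed value may be a raw integer
// or a pointer value, and only the former supports division.

#[derive(Clone, Copy, Debug, PartialEq)]
enum Scalar {
    /// A concrete integer.
    Raw(u64),
    /// A pointer value: an allocation plus an offset, with no known address.
    Ptr { alloc_id: u64, offset: u64 },
}

fn div(lhs: Scalar, rhs: Scalar) -> Result<Scalar, &'static str> {
    match (lhs, rhs) {
        (Scalar::Raw(_), Scalar::Raw(0)) => Err("division by zero"),
        (Scalar::Raw(a), Scalar::Raw(b)) => Ok(Scalar::Raw(a / b)),
        // The integer address behind `alloc_id` is unknown, so there is
        // nothing meaningful to divide.
        _ => Err("cannot perform integer division on a pointer value"),
    }
}

fn main() {
    // A reference cast to `usize` still holds a pointer value.
    let addr_as_usize = Scalar::Ptr { alloc_id: 1, offset: 8 };
    println!("{:?}", div(Scalar::Raw(4054), Scalar::Raw(2)));
    println!("{:?}", div(addr_as_usize, Scalar::Raw(2)));
}
```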

Although the `tcx.const_eval_*` functions are the main entry point to constant
evaluation, there are additional functions in
[rustc_const_eval/src/const_eval](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_const_eval/index.html)
that allow accessing the fields of a `ConstValue` (`Indirect` or otherwise).
You should
never have to access an `Allocation` directly except for translating it to the
compilation target (at the moment just LLVM).

The interpreter starts by creating a virtual stack frame for the current constant that is
being evaluated.
There's essentially no difference between a constant and a
function with no arguments, except that constants do not allow local (named)
variables at the time of writing this guide.

A stack frame is defined by the `Frame` type in
[rustc_const_eval/src/interpret/eval_context.rs](https://github.com/rust-lang/rust/blob/HEAD/compiler/rustc_const_eval/src/interpret/eval_context.rs)
and contains the memory for all local variables (`None` at the start of evaluation).
Each frame refers to the
evaluation of either the root constant or subsequent calls to `const fn`.
The evaluation of another constant simply calls `tcx.const_eval_*`, which produces an
entirely new and independent stack frame.

The frames are just a `Vec<Frame>`; there is no way to actually refer to a
`Frame`'s memory even if horrible shenanigans are done via unsafe code.
The only memory that can be referred to is memory in `Allocation`s.

The interpreter now calls the `step` method (in
[rustc_const_eval/src/interpret/step.rs](https://github.com/rust-lang/rust/blob/HEAD/compiler/rustc_const_eval/src/interpret/step.rs))
until it either returns an error or has no further statements to execute.
Each statement will now initialize or modify the locals or the virtual memory
referred to by a local.
This might require evaluating other constants or
statics, which just recursively invokes `tcx.const_eval_*`.
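To tie the frame and `step` descriptions together, here is a heavily simplified, hypothetical interpreter loop. The statement kinds, the `u64` locals, the `run` helper, and the `String` errors are all assumptions for illustration. The real `Frame` and `step` in `rustc_const_eval` operate on MIR and are far more involved.

```rust
// Hypothetical, heavily simplified interpreter loop (not rustc's code).

#[derive(Clone, Copy)]
enum Statement {
    /// `_dest = const value`
    Const { dest: usize, value: u64 },
    /// `_dest = _src - value`, failing on overflow
    SubConst { dest: usize, src: usize, value: u64 },
}

struct Frame {
    statements: Vec<Statement>,
    /// Index of the next statement to execute.
    next: usize,
    /// Memory for the locals; `None` until a local is initialized.
    locals: Vec<Option<u64>>,
}

struct Interpreter {
    /// The frames are just a vector.
    stack: Vec<Frame>,
}

impl Interpreter {
    /// Executes one statement of the topmost frame.
    /// Returns `Ok(false)` when there is nothing left to execute.
    fn step(&mut self) -> Result<bool, String> {
        let frame = self.stack.last_mut().ok_or("empty stack")?;
        let stmt = match frame.statements.get(frame.next) {
            Some(stmt) => *stmt,
            None => return Ok(false),
        };
        frame.next += 1;
        match stmt {
            Statement::Const { dest, value } => frame.locals[dest] = Some(value),
            Statement::SubConst { dest, src, value } => {
                let lhs = frame.locals[src].ok_or(format!("_{src} is uninitialized"))?;
                let result = lhs.checked_sub(value).ok_or("subtraction overflowed")?;
                frame.locals[dest] = Some(result);
            }
        }
        Ok(true)
    }

    /// Steps until done or an error occurs, then returns `_0` of the top frame.
    fn run(&mut self) -> Result<Option<u64>, String> {
        while self.step()? {}
        Ok(self.stack.last().and_then(|frame| frame.locals[0]))
    }
}

fn main() {
    // Roughly `_1 = 4096; _0 = _1 - 42`, i.e. the running `FOO - 42` example.
    let frame = Frame {
        statements: vec![
            Statement::Const { dest: 1, value: 4096 },
            Statement::SubConst { dest: 0, src: 1, value: 42 },
        ],
        next: 0,
        locals: vec![None, None],
    };
    let mut interp = Interpreter { stack: vec![frame] };
    println!("{:?}", interp.run()); // Ok(Some(4054))
}
```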