diff --git a/src/appendix/code-index.md b/src/appendix/code-index.md index 6c5f85da4..25c770d4f 100644 --- a/src/appendix/code-index.md +++ b/src/appendix/code-index.md @@ -1,8 +1,8 @@ # Code Index -rustc has a lot of important data structures. This is an attempt to give some -guidance on where to learn more about some of the key data structures of the -compiler. +rustc has a lot of important data structures. +This is an attempt to give some guidance on where to learn more +about some of the key data structures of the compiler. Item | Kind | Short description | Chapter | Declaration ----------------|----------|-----------------------------|--------------------|------------------- diff --git a/src/const-eval/interpret.md b/src/const-eval/interpret.md index afd6c4f4c..0462876ef 100644 --- a/src/const-eval/interpret.md +++ b/src/const-eval/interpret.md @@ -1,8 +1,9 @@ # Interpreter The interpreter is a virtual machine for executing MIR without compiling to -machine code. It is usually invoked via `tcx.const_eval_*` functions. The -interpreter is shared between the compiler (for compile-time function +machine code. +It is usually invoked via `tcx.const_eval_*` functions. +The interpreter is shared between the compiler (for compile-time function evaluation, CTFE) and the tool [Miri](https://github.com/rust-lang/miri/), which uses the same virtual machine to detect Undefined Behavior in (unsafe) Rust code. @@ -26,7 +27,8 @@ The compiler needs to figure out the length of the array before being able to create items that use the type (locals, constants, function arguments, ...). To obtain the (in this case empty) parameter environment, one can call -`let param_env = tcx.param_env(length_def_id);`. The `GlobalId` needed is +`let param_env = tcx.param_env(length_def_id);`. 
+The `GlobalId` needed is ```rust,ignore let gid = GlobalId { @@ -36,7 +38,8 @@ let gid = GlobalId { ``` Invoking `tcx.const_eval(param_env.and(gid))` will now trigger the creation of -the MIR of the array length expression. The MIR will look something like this: +the MIR of the array length expression. +The MIR will look something like this: ```mir Foo::{{constant}}#0: usize = { @@ -59,35 +62,43 @@ Before the evaluation, a virtual memory location (in this case essentially a `vec![u8; 4]` or `vec![u8; 8]`) is created for storing the evaluation result. At the start of the evaluation, `_0` and `_1` are -`Operand::Immediate(Immediate::Scalar(ScalarMaybeUndef::Undef))`. This is quite +`Operand::Immediate(Immediate::Scalar(ScalarMaybeUndef::Undef))`. +This is quite a mouthful: [`Operand`] can represent either data stored somewhere in the [interpreter memory](#memory) (`Operand::Indirect`), or (as an optimization) -immediate data stored in-line. And [`Immediate`] can either be a single +immediate data stored in-line. +And [`Immediate`] can either be a single (potentially uninitialized) [scalar value][`Scalar`] (integer or thin pointer), -or a pair of two of them. In our case, the single scalar value is *not* (yet) -initialized. +or a pair of two of them. +In our case, the single scalar value is *not* (yet) initialized. When the initialization of `_1` is invoked, the value of the `FOO` constant is required, and triggers another call to `tcx.const_eval_*`, which will not be shown -here. If the evaluation of FOO is successful, `42` will be subtracted from its +here. +If the evaluation of FOO is successful, `42` will be subtracted from its value `4096` and the result stored in `_1` as `Operand::Immediate(Immediate::ScalarPair(Scalar::Raw { data: 4054, .. }, -Scalar::Raw { data: 0, .. })`. The first part of the pair is the computed value, -the second part is a bool that's true if an overflow happened. A `Scalar::Raw` +Scalar::Raw { data: 0, .. })`. 
+The first part of the pair is the computed value, +the second part is a bool that's true if an overflow happened. +A `Scalar::Raw` also stores the size (in bytes) of this scalar value; we are eliding that here. -The next statement asserts that said boolean is `0`. In case the assertion +The next statement asserts that said boolean is `0`. +In case the assertion fails, its error message is used for reporting a compile-time error. Since it does not fail, `Operand::Immediate(Immediate::Scalar(Scalar::Raw { data: 4054, .. }))` is stored in the virtual memory it was allocated before the -evaluation. `_0` always refers to that location directly. +evaluation. +`_0` always refers to that location directly. After the evaluation is done, the return value is converted from [`Operand`] to [`ConstValue`] by [`op_to_const`]: the former representation is geared towards what is needed *during* const evaluation, while [`ConstValue`] is shaped by the needs of the remaining parts of the compiler that consume the results of const -evaluation. As part of this conversion, for types with scalar values, even if +evaluation. +As part of this conversion, for types with scalar values, even if the resulting [`Operand`] is `Indirect`, it will return an immediate `ConstValue::Scalar(computed_value)` (instead of the usual `ConstValue::Indirect`). This makes using the result much more efficient and also more convenient, as no @@ -107,12 +118,13 @@ the interpreter, but just use the cached result. The interpreter's outside-facing datastructures can be found in [rustc_middle/src/mir/interpret](https://github.com/rust-lang/rust/blob/HEAD/compiler/rustc_middle/src/mir/interpret). -This is mainly the error enum and the [`ConstValue`] and [`Scalar`] types. A -`ConstValue` can be either `Scalar` (a single `Scalar`, i.e., integer or thin +This is mainly the error enum and the [`ConstValue`] and [`Scalar`] types. 
+A `ConstValue` can be either `Scalar` (a single `Scalar`, i.e., integer or thin pointer), `Slice` (to represent byte slices and strings, as needed for pattern matching) or `Indirect`, which is used for anything else and refers to a virtual -allocation. These allocations can be accessed via the methods on -`tcx.interpret_interner`. A `Scalar` is either some `Raw` integer or a pointer; +allocation. +These allocations can be accessed via the methods on `tcx.interpret_interner`. +A `Scalar` is either some `Raw` integer or a pointer; see [the next section](#memory) for more on that. If you are expecting a numeric result, you can use `eval_usize` (panics on @@ -122,29 +134,38 @@ in an `Option` yielding the `Scalar` if possible. ## Memory To support any kind of pointers, the interpreter needs to have a "virtual memory" that the -pointers can point to. This is implemented in the [`Memory`] type. In the -simplest model, every global variable, stack variable and every dynamic -allocation corresponds to an [`Allocation`] in that memory. (Actually using an +pointers can point to. +This is implemented in the [`Memory`] type. +In the simplest model, every global variable, stack variable and every dynamic +allocation corresponds to an [`Allocation`] in that memory. +(Actually using an allocation for every MIR stack variable would be very inefficient; that's why we have `Operand::Immediate` for stack variables that are both small and never have -their address taken. But that is purely an optimization.) +their address taken. +But that is purely an optimization.) Such an `Allocation` is basically just a sequence of `u8` storing the value of -each byte in this allocation. (Plus some extra data, see below.) Every -`Allocation` has a globally unique `AllocId` assigned in `Memory`. With that, a +each byte in this allocation. +(Plus some extra data, see below.) Every +`Allocation` has a globally unique `AllocId` assigned in `Memory`. 
+With that, a [`Pointer`] consists of a pair of an `AllocId` (indicating the allocation) and an offset into the allocation (indicating which byte of the allocation the -pointer points to). It may seem odd that a `Pointer` is not just an integer +pointer points to). +It may seem odd that a `Pointer` is not just an integer address, but remember that during const evaluation, we cannot know at which actual integer address the allocation will end up -- so we use `AllocId` as -symbolic base addresses, which means we need a separate offset. (As an aside, +symbolic base addresses, which means we need a separate offset. +(As an aside, it turns out that pointers at run-time are [more than just integers, too](https://rust-lang.github.io/unsafe-code-guidelines/glossary.html#pointer-provenance).) These allocations exist so that references and raw pointers have something to -point to. There is no global linear heap in which things are allocated, but each +point to. +There is no global linear heap in which things are allocated, but each allocation (be it for a local variable, a static or a (future) heap allocation) -gets its own little memory with exactly the required size. So if you have a +gets its own little memory with exactly the required size. +So if you have a pointer to an allocation for a local variable `a`, there is no possible (no matter how unsafe) operation that you can do that would ever change said pointer to a pointer to a different local variable `b`. @@ -152,31 +173,35 @@ Pointer arithmetic on `a` will only ever change its offset; the `AllocId` stays This, however, causes a problem when we want to store a `Pointer` into an `Allocation`: we cannot turn it into a sequence of `u8` of the right length! -`AllocId` and offset together are twice as big as a pointer "seems" to be. This -is what the `relocation` field of `Allocation` is for: the byte offset of the +`AllocId` and offset together are twice as big as a pointer "seems" to be. 
+This is what the `relocation` field of `Allocation` is for: the byte offset of the `Pointer` gets stored as a bunch of `u8`, while its `AllocId` gets stored -out-of-band. The two are reassembled when the `Pointer` is read from memory. +out-of-band. +The two are reassembled when the `Pointer` is read from memory. The other bit of extra data an `Allocation` needs is `undef_mask` for keeping track of which of its bytes are initialized. ### Global memory and exotic allocations `Memory` exists only during evaluation; it gets destroyed when the -final value of the constant is computed. In case that constant contains any +final value of the constant is computed. +In case that constant contains any pointers, those get "interned" and moved to a global "const eval memory" that is -part of `TyCtxt`. These allocations stay around for the remaining computation +part of `TyCtxt`. +These allocations stay around for the remaining computation and get serialized into the final output (so that dependent crates can use them). Moreover, to also support function pointers, the global memory in `TyCtxt` can also contain "virtual allocations": instead of an `Allocation`, these contain an -`Instance`. That allows a `Pointer` to point to either normal data or a +`Instance`. +That allows a `Pointer` to point to either normal data or a function, which is needed to be able to evaluate casts from function pointers to raw pointers. Finally, the [`GlobalAlloc`] type used in the global memory also contains a -variant `Static` that points to a particular `const` or `static` item. This is -needed to support circular statics, where we need to have a `Pointer` to a +variant `Static` that points to a particular `const` or `static` item. +This is needed to support circular statics, where we need to have a `Pointer` to a `static` for which we cannot yet have an `Allocation` as we do not know the bytes of its value. @@ -188,17 +213,19 @@ bytes of its value. 
### Pointer values vs Pointer types One common cause of confusion in the interpreter is that being a pointer *value* and having -a pointer *type* are entirely independent properties. By "pointer value", we +a pointer *type* are entirely independent properties. +By "pointer value", we refer to a `Scalar::Ptr` containing a `Pointer` and thus pointing somewhere into -the interpreter's virtual memory. This is in contrast to `Scalar::Raw`, which is just some -concrete integer. +the interpreter's virtual memory. +This is in contrast to `Scalar::Raw`, which is just some concrete integer. However, a variable of pointer or reference *type*, such as `*const T` or `&T`, does not have to have a pointer *value*: it could be obtained by casting or -transmuting an integer to a pointer. +transmuting an integer to a pointer. And similarly, when casting or transmuting a reference to some actual allocation to an integer, we end up with a pointer *value* -(`Scalar::Ptr`) at integer *type* (`usize`). This is a problem because we +(`Scalar::Ptr`) at integer *type* (`usize`). +This is a problem because we cannot meaningfully perform integer operations such as division on pointer values. @@ -207,30 +234,33 @@ values. Although the main entry point to constant evaluation is the `tcx.const_eval_*` functions, there are additional functions in [rustc_const_eval/src/const_eval](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_const_eval/index.html) -that allow accessing the fields of a `ConstValue` (`Indirect` or otherwise). You should +that allow accessing the fields of a `ConstValue` (`Indirect` or otherwise). +You should never have to access an `Allocation` directly except for translating it to the compilation target (at the moment just LLVM). The interpreter starts by creating a virtual stack frame for the current constant that is -being evaluated. There's essentially no difference between a constant and a +being evaluated. 
+There's essentially no difference between a constant and a function with no arguments, except that constants do not allow local (named) variables at the time of writing this guide. A stack frame is defined by the `Frame` type in [rustc_const_eval/src/interpret/eval_context.rs](https://github.com/rust-lang/rust/blob/HEAD/compiler/rustc_const_eval/src/interpret/eval_context.rs) -and contains all the local -variables memory (`None` at the start of evaluation). Each frame refers to the -evaluation of either the root constant or subsequent calls to `const fn`. The -evaluation of another constant simply calls `tcx.const_eval_*`, which produce an +and contains the memory of all local variables (`None` at the start of evaluation). +Each frame refers to the +evaluation of either the root constant or subsequent calls to `const fn`. +The evaluation of another constant simply calls `tcx.const_eval_*`, which produces an entirely new and independent stack frame. The frames are just a `Vec`, there's no way to actually refer to a -`Frame`'s memory even if horrible shenanigans are done via unsafe code. The only -memory that can be referred to are `Allocation`s. +`Frame`'s memory even if horrible shenanigans are done via unsafe code. +The only memory that can be referred to is that of `Allocation`s. The interpreter now calls the `step` method (in [rustc_const_eval/src/interpret/step.rs](https://github.com/rust-lang/rust/blob/HEAD/compiler/rustc_const_eval/src/interpret/step.rs) -) until it either returns an error or has no further statements to execute. Each -statement will now initialize or modify the locals or the virtual memory -referred to by a local. This might require evaluating other constants or +) until it either returns an error or has no further statements to execute. +Each statement will now initialize or modify the locals or the virtual memory +referred to by a local. +This might require evaluating other constants or statics, which just recursively invokes `tcx.const_eval_*`. 
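The `src/const-eval/interpret.md` hunks above describe the interpreter's memory model. A `Pointer` is an `AllocId` plus an offset. Each allocation gets its own exactly-sized memory. A pointer stored inside an `Allocation` keeps its offset in the bytes and its `AllocId` out-of-band, to be reassembled on read. The following toy model sketches that idea under stated assumptions; it is not rustc's implementation.

All of `AllocId`, `Pointer`, `Allocation`, `Memory`, `PTR_SIZE`, `write_ptr`, `read_ptr` and `ptr_offset` are hypothetical simplifications. The sketch also assumes 8-byte little-endian pointers and does not model initialization tracking or overlapping writes.

```rust
use std::collections::HashMap;

/// Toy stand-in for `AllocId`: a symbolic base address, never a real one.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct AllocId(u64);

/// A pointer value: which allocation it points into, plus a byte offset.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Pointer {
    alloc_id: AllocId,
    offset: u64,
}

/// Assumed pointer size for this sketch (a 64-bit target).
const PTR_SIZE: usize = 8;

/// Toy allocation: the raw bytes, plus "relocations" mapping the byte offset
/// of each stored pointer to its `AllocId`, which is kept out-of-band.
struct Allocation {
    bytes: Vec<u8>,
    relocations: HashMap<usize, AllocId>,
}

impl Allocation {
    fn new(size: usize) -> Self {
        Allocation { bytes: vec![0; size], relocations: HashMap::new() }
    }

    /// Stores `ptr` at byte `at`: only the offset goes into the bytes.
    /// (Toy: plain-byte writes that clobber a pointer are not handled.)
    fn write_ptr(&mut self, at: usize, ptr: Pointer) {
        self.bytes[at..at + PTR_SIZE].copy_from_slice(&ptr.offset.to_le_bytes());
        self.relocations.insert(at, ptr.alloc_id);
    }

    /// Reassembles a pointer from the offset bytes and the relocation.
    /// Returns `None` if no pointer was stored at `at`.
    fn read_ptr(&self, at: usize) -> Option<Pointer> {
        let alloc_id = *self.relocations.get(&at)?;
        let mut buf = [0u8; PTR_SIZE];
        buf.copy_from_slice(&self.bytes[at..at + PTR_SIZE]);
        Some(Pointer { alloc_id, offset: u64::from_le_bytes(buf) })
    }
}

/// Toy `Memory`: each allocation gets a fresh, globally unique `AllocId`
/// and exactly the requested size; there is no shared linear heap.
#[derive(Default)]
struct Memory {
    allocs: HashMap<AllocId, Allocation>,
    next_id: u64,
}

impl Memory {
    fn allocate(&mut self, size: usize) -> Pointer {
        let id = AllocId(self.next_id);
        self.next_id += 1;
        self.allocs.insert(id, Allocation::new(size));
        Pointer { alloc_id: id, offset: 0 }
    }
}

/// Pointer arithmetic changes only the offset, never the `AllocId`.
fn ptr_offset(ptr: Pointer, n: u64) -> Pointer {
    Pointer { offset: ptr.offset + n, ..ptr }
}
```

For example, suppose you allocate a 4-byte local `a`, store a pointer to `a` at offset 2 into a second allocation, and read it back. You get the same `AllocId` with offset 2. The second allocation's bytes hold only the offset, because the `AllocId` lives in its relocations.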
diff --git a/src/hir.md b/src/hir.md index 250b02902..c2e227dc9 100644 --- a/src/hir.md +++ b/src/hir.md @@ -1,13 +1,15 @@ # The HIR The HIR – "High-Level Intermediate Representation" – is the primary IR used -in most of rustc. It is a compiler-friendly representation of the abstract +in most of rustc. +It is a compiler-friendly representation of the abstract syntax tree (AST) that is generated after parsing, macro expansion, and name resolution (see [Lowering](./hir/lowering.md) for how the HIR is created). Many parts of HIR resemble Rust surface syntax quite closely, with the exception that some of Rust's expression forms have been desugared away. For example, `for` loops are converted into a `loop` and do not appear in -the HIR. This makes HIR more amenable to analysis than a normal AST. +the HIR. +This makes HIR more amenable to analysis than a normal AST. This chapter covers the main concepts of the HIR. @@ -30,7 +32,8 @@ cargo rustc -- -Z unpretty=hir The top-level data-structure in the HIR is the [`Crate`], which stores the contents of the crate currently being compiled (we only ever -construct HIR for the current crate). Whereas in the AST the crate +construct HIR for the current crate). +Whereas in the AST the crate data structure basically just contains the root module, the HIR `Crate` structure contains a number of maps and other things that serve to organize the content of the crate for easier access. @@ -39,8 +42,8 @@ serve to organize the content of the crate for easier access. For example, the contents of individual items (e.g. modules, functions, traits, impls, etc) in the HIR are not immediately -accessible in the parents. So, for example, if there is a module item -`foo` containing a function `bar()`: +accessible in the parents. 
+So, for example, if there is a module item `foo` containing a function `bar()`: ```rust mod foo { @@ -49,8 +52,8 @@ mod foo { ``` then in the HIR the representation of module `foo` (the [`Mod`] -struct) would only have the **`ItemId`** `I` of `bar()`. To get the -details of the function `bar()`, we would lookup `I` in the +struct) would only have the **`ItemId`** `I` of `bar()`. +To get the details of the function `bar()`, we would lookup `I` in the `items` map. [`Mod`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/hir/struct.Mod.html @@ -62,9 +65,11 @@ There are similar maps for things like trait items and impl items, as well as "bodies" (explained below). The other reason to set up the representation this way is for better -integration with incremental compilation. This way, if you gain access +integration with incremental compilation. +This way, if you gain access to an [`&rustc_hir::Item`] (e.g. for the mod `foo`), you do not immediately -gain access to the contents of the function `bar()`. Instead, you only +gain access to the contents of the function `bar()`. +Instead, you only gain access to the **id** for `bar()`, and you must invoke some function to lookup the contents of `bar()` given its id; this gives the compiler a chance to observe that you accessed the data for @@ -79,23 +84,27 @@ the compiler a chance to observe that you accessed the data for The HIR uses a bunch of different identifiers that coexist and serve different purposes. - A [`DefId`], as the name suggests, identifies a particular definition, or top-level - item, in a given crate. It is composed of two parts: a [`CrateNum`] which identifies + item, in a given crate. + It is composed of two parts: a [`CrateNum`] which identifies the crate the definition comes from, and a [`DefIndex`] which identifies the definition - within the crate. Unlike [`HirId`]s, there isn't a [`DefId`] for every expression, which + within the crate. 
+ Unlike [`HirId`]s, there isn't a [`DefId`] for every expression, which makes them more stable across compilations. - A [`LocalDefId`] is basically a [`DefId`] that is known to come from the current crate. This allows us to drop the [`CrateNum`] part, and use the type system to ensure that only local definitions are passed to functions that expect a local definition. -- A [`HirId`] uniquely identifies a node in the HIR of the current crate. It is composed - of two parts: an `owner` and a `local_id` that is unique within the `owner`. This - combination makes for more stable values which are helpful for incremental compilation. +- A [`HirId`] uniquely identifies a node in the HIR of the current crate. + It is composed of two parts: + an `owner` and a `local_id` that is unique within the `owner`. + This combination makes for more stable values which are helpful for incremental compilation. Unlike [`DefId`]s, a [`HirId`] can refer to [fine-grained entities][Node] like expressions, but stays local to the current crate. -- A [`BodyId`] identifies a HIR [`Body`] in the current crate. It is currently only - a wrapper around a [`HirId`]. For more info about HIR bodies, please refer to the +- A [`BodyId`] identifies a HIR [`Body`] in the current crate. + It is currently only a wrapper around a [`HirId`]. + For more info about HIR bodies, please refer to the [HIR chapter][hir-bodies]. These identifiers can be converted into one another through the `TyCtxt`. @@ -112,8 +121,8 @@ These identifiers can be converted into one another through the `TyCtxt`. ## HIR Operations -Most of the time when you are working with the HIR, you will do so via -`TyCtxt`. It contains a number of methods, defined in the `hir::map` module and +Most of the time when you are working with the HIR, you will do so via `TyCtxt`. 
+It contains a number of methods, defined in the `hir::map` module and mostly prefixed with `hir_`, to convert between IDs of various kinds and to lookup data associated with a HIR node. @@ -126,8 +135,10 @@ You need a `LocalDefId`, rather than a `DefId`, since only local items have HIR [local_def_id_to_hir_id]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.TyCtxt.html#method.local_def_id_to_hir_id Similarly, you can use [`tcx.hir_node(n)`][hir_node] to lookup the node for a -[`HirId`]. This returns a `Option>`, where [`Node`] is an enum -defined in the map. By matching on this, you can find out what sort of +[`HirId`]. +This returns a `Option>`, where [`Node`] is an enum +defined in the map. +By matching on this, you can find out what sort of node the `HirId` referred to and also get a pointer to the data itself. Often, you know what sort of node `n` is – e.g. if you know that `n` must be some HIR expression, you can do @@ -148,8 +159,8 @@ calls like [`tcx.parent_hir_node(n)`][parent_hir_node]. ## HIR Bodies A [`rustc_hir::Body`] represents some kind of executable code, such as the body -of a function/closure or the definition of a constant. Bodies are -associated with an **owner**, which is typically some kind of item +of a function/closure or the definition of a constant. +Bodies are associated with an **owner**, which is typically some kind of item (e.g. an `fn()` or `const`), but could also be a closure expression (e.g. `|x, y| x + y`). You can use the `TyCtxt` to find the body associated with a given def-id ([`hir_maybe_body_owned_by`]) or to find diff --git a/src/rustdoc-internals.md b/src/rustdoc-internals.md index 9fd05aa0b..f3fd47812 100644 --- a/src/rustdoc-internals.md +++ b/src/rustdoc-internals.md @@ -108,19 +108,20 @@ Here is the list of passes as of March 2023: - `calculate-doc-coverage` calculates information used for the `--show-coverage` flag. -- `check-doc-test-visibility` runs `doctest` visibility–related `lint`s. 
This pass - runs before `strip-private`, which is why it needs to be separate from `run-lints`. +- `check-doc-test-visibility` runs `doctest` visibility–related `lint`s. + This pass runs before `strip-private`, + which is why it needs to be separate from `run-lints`. - `collect-intra-doc-links` resolves [intra-doc links](https://doc.rust-lang.org/nightly/rustdoc/write-documentation/linking-to-items-by-name.html). -- `collect-trait-impls` collects `trait` `impl`s for each item in the crate. For - example, if we define a `struct` that implements a `trait`, this pass will note - that the `struct` implements that `trait`. +- `collect-trait-impls` collects `trait` `impl`s for each item in the crate. + For example, if we define a `struct` that implements a `trait`, + this pass will note that the `struct` implements that `trait`. - `propagate-doc-cfg` propagates `#[doc(cfg(...))]` to child items. -- `run-lints` runs some of `rustdoc`'s `lint`s, defined in `passes/lint`. This is - the last pass to run. +- `run-lints` runs some of `rustdoc`'s `lint`s, defined in `passes/lint`. + This is the last pass to run. - `bare_urls` detects links that are not linkified, e.g., in Markdown such as `Go to https://example.com/.` It suggests wrapping the link with angle brackets: @@ -233,7 +234,8 @@ is complicated from two other constraints that `rustdoc` runs under: configurations, such as `libstd` having a single package of docs that cover all supported operating systems. This means `rustdoc` has to be able to generate docs from `HIR`. -* Docs can inline across crates. Since crate metadata doesn't contain `HIR`, +* Docs can inline across crates. + Since crate metadata doesn't contain `HIR`, it must be possible to generate inlined docs from the `rustc_middle` data. The "clean" [`AST`][ast] acts as a common output format for both input formats. 
diff --git a/src/tests/adding.md b/src/tests/adding.md index 10483265c..7e2b9015f 100644 --- a/src/tests/adding.md +++ b/src/tests/adding.md @@ -2,12 +2,13 @@ **In general, we expect every PR that fixes a bug in rustc to come accompanied by a regression test of some kind.** This test should fail in `main` but pass -after the PR. These tests are really useful for preventing us from repeating the +after the PR. +These tests are really useful for preventing us from repeating the mistakes of the past. -The first thing to decide is which kind of test to add. This will depend on the -nature of the change and what you want to exercise. Here are some rough -guidelines: +The first thing to decide is which kind of test to add. +This will depend on the nature of the change and what you want to exercise. +Here are some rough guidelines: - The majority of compiler tests are done with [compiletest]. - The majority of compiletest tests are [UI](ui.md) tests in the [`tests/ui`] @@ -24,14 +25,17 @@ guidelines: `library/${crate}tests/lib.rs`. - If the code is part of an isolated system, and you are not testing compiler output, consider using a [unit or integration test](intro.md#package-tests). -- Need to run rustdoc? Prefer a `rustdoc` or `rustdoc-ui` test. Occasionally - you'll need `rustdoc-js` as well. +- Need to run rustdoc? + Prefer a `rustdoc` or `rustdoc-ui` test. + Occasionally you'll need `rustdoc-js` as well. - Other compiletest test suites are generally used for special purposes: - - Need to run gdb or lldb? Use the `debuginfo` test suite. - - Need to inspect LLVM IR or MIR IR? Use the `codegen` or `mir-opt` test - suites. - - Need to inspect the resulting binary in some way? Or if all the other test - suites are too limited for your purposes? Then use `run-make`. + - Need to run gdb or lldb? + Use the `debuginfo` test suite. + - Need to inspect LLVM IR or MIR IR? + Use the `codegen` or `mir-opt` test suites. + - Need to inspect the resulting binary in some way? 
+ Or if all the other test suites are too limited for your purposes? + Then use `run-make`. - Use `run-make-cargo` if you need to exercise in-tree `cargo` in conjunction with in-tree `rustc`. - Check out the [compiletest] chapter for more specialized test suites. @@ -47,14 +51,16 @@ modified several years later, how can we make it easier for them?). ## UI test walkthrough The following is a basic guide for creating a [UI test](ui.md), which is one of -the most common compiler tests. For this tutorial, we'll be adding a test for an -async error message. +the most common compiler tests. +For this tutorial, we'll be adding a test for an async error message. ### Step 1: Add a test file The first step is to create a Rust source file somewhere in the [`tests/ui`] -tree. When creating a test, do your best to find a good location and name (see -[Test organization](ui.md#test-organization) for more). Since naming is the +tree. +When creating a test, do your best to find a good location and name (see +[Test organization](ui.md#test-organization) for more). +Since naming is the hardest part of development, everything should be downhill from here! Let's place our async test at `tests/ui/async-await/await-without-async.rs`: @@ -77,19 +83,23 @@ A few things to notice about our test: - The top should start with a short comment that [explains what the test is for](#explanatory_comment). - The `//@ edition:2018` comment is called a [directive](directives.md) which - provides instructions to compiletest on how to build the test. Here we need to + provides instructions to compiletest on how to build the test. + Here we need to set the edition for `async` to work (the default is edition 2015). -- Following that is the source of the test. Try to keep it succinct and to the - point. This may require some effort if you are trying to minimize an example +- Following that is the source of the test. + Try to keep it succinct and to the point. 
+ This may require some effort if you are trying to minimize an example from a bug report. -- We end this test with an empty `fn main` function. This is because the default +- We end this test with an empty `fn main` function. + This is because the default for UI tests is a `bin` crate-type, and we don't want the "main not found" - error in our test. Alternatively, you could add `#![crate_type="lib"]`. + error in our test. + Alternatively, you could add `#![crate_type="lib"]`. ### Step 2: Generate the expected output -The next step is to create the expected output snapshots from the compiler. This -can be done with the `--bless` option: +The next step is to create the expected output snapshots from the compiler. +This can be done with the `--bless` option: ```sh ./x test tests/ui/async-await/await-without-async.rs --bless @@ -99,8 +109,8 @@ This will build the compiler (if it hasn't already been built), compile the test, and place the output of the compiler in a file called `tests/ui/async-await/await-without-async.stderr`. -However, this step will fail! You should see an error message, something like -this: +However, this step will fail! +You should see an error message, something like this: > error: /rust/tests/ui/async-await/await-without-async.rs:7: unexpected > error: '7:10: 7:16: `await` is only allowed inside `async` functions and @@ -112,7 +122,8 @@ annotations in the source file. ### Step 3: Add error annotations Every error needs to be annotated with a comment in the source with the text of -the error. In this case, we can add the following comment to our test file: +the error. +In this case, we can add the following comment to our test file: ```rust,ignore fn bar() { @@ -136,9 +147,10 @@ It should now pass, yay! ### Step 4: Review the output Somewhat hand-in-hand with the previous step, you should inspect the `.stderr` -file that was created to see if it looks like how you expect. 
If you are adding -a new diagnostic message, now would be a good time to also consider how readable -the message looks overall, particularly for people new to Rust. +file that was created to see if it looks like how you expect. +If you are adding a new diagnostic message, +now would be a good time to also consider how readable the message looks overall, +particularly for people new to Rust. Our example `tests/ui/async-await/await-without-async.stderr` file should look like this: @@ -161,9 +173,9 @@ You may notice some things look a little different than the regular compiler output. - The `$DIR` removes the path information which will differ between systems. -- The `LL` values replace the line numbers. That helps avoid small changes in - the source from triggering large diffs. See the - [Normalization](ui.md#normalization) section for more. +- The `LL` values replace the line numbers. + That helps avoid small changes in the source from triggering large diffs. + See the [Normalization](ui.md#normalization) section for more. Around this stage, you may need to iterate over the last few steps a few times to tweak your test, re-bless the test, and re-review the output. @@ -171,8 +183,10 @@ to tweak your test, re-bless the test, and re-review the output. ### Step 5: Check other tests Sometimes when adding or changing a diagnostic message, this will affect other -tests in the test suite. The final step before posting a PR is to check if you -have affected anything else. Running the UI suite is usually a good start: +tests in the test suite. +The final step before posting a PR is to check if you +have affected anything else. +Running the UI suite is usually a good start: ```sh ./x test tests/ui @@ -188,16 +202,18 @@ You may also need to re-bless the output with the `--bless` flag. ## Comment explaining what the test is about The first comment of a test file should **summarize the point of the test**, and -highlight what is important about it. 
If there is an issue number associated -with the test, include the issue number. +highlight what is important about it. +If there is an issue number associated with the test, include the issue number. -This comment doesn't have to be super extensive. Just something like "Regression -test for #18060: match arms were matching in the wrong order." might already be -enough. +This comment doesn't have to be super extensive. +Just something like the following might be enough: +"Regression test for #18060: match arms were matching in the wrong order". These comments are very useful to others later on when your test breaks, since -they often can highlight what the problem is. They are also useful if for some +they often can highlight what the problem is. +They are also useful if for some reason the tests need to be refactored, since they let others know which parts -of the test were important. Often a test must be rewritten because it no longer +of the test were important. +Often a test must be rewritten because it no longer tests what it was meant to test, and then it's useful to know what it *was* meant to test exactly. diff --git a/src/traits/separate-projection-bounds.md b/src/traits/separate-projection-bounds.md index 07e88c37b..144e27316 100644 --- a/src/traits/separate-projection-bounds.md +++ b/src/traits/separate-projection-bounds.md @@ -1,20 +1,29 @@ # Having separate `Trait` and `Projection` bounds -Given `T: Foo<AssocA = u32, AssocB = i32>` where-bound, we currently lower it to a `Trait(Foo<T>)` and separate `Projection(<T as Foo>::AssocA, u32)` and `Projection(<T as Foo>::AssocB, i32)` bounds. Why do we not represent this as a single `Trait(Foo[T], [AssocA = u32, AssocB = u32]` bound instead? +Given a `T: Foo<AssocA = u32, AssocB = i32>` where-bound, we currently lower it to a `Trait(Foo<T>)` and separate `Projection(<T as Foo>::AssocA, u32)` and `Projection(<T as Foo>::AssocB, i32)` bounds. +Why do we not represent this as a single `Trait(Foo[T], [AssocA = u32, AssocB = i32])` bound instead?
The way we prove `Projection` bounds directly relies on proving the corresponding `Trait` bound: [old solver](https://github.com/rust-lang/rust/blob/461e9738a47e313e4457957fa95ff6a19a4b88d4/compiler/rustc_trait_selection/src/traits/project.rs#L898) [new solver](https://github.com/rust-lang/rust/blob/461e9738a47e313e4457957fa95ff6a19a4b88d4/compiler/rustc_next_trait_solver/src/solve/normalizes_to/mod.rs#L37-L41). It feels like it might make more sense to just have a single implementation which checks whether a trait is implemented and returns (a way to compute) its associated types. -This is unfortunately quite difficult, as we may use a different candidate for norm than for the corresponding trait bound. See [alias-bound vs where-bound](https://rustc-dev-guide.rust-lang.org/solve/candidate-preference.html#we-always-consider-aliasbound-candidates) and [global where-bound vs impl](https://rustc-dev-guide.rust-lang.org/solve/candidate-preference.html#we-prefer-global-where-bounds-over-impls). +This is unfortunately quite difficult, as we may use a different candidate for normalization than for the corresponding trait bound. +See [alias-bound vs where-bound](https://rustc-dev-guide.rust-lang.org/solve/candidate-preference.html#we-always-consider-aliasbound-candidates) and [global where-bound vs impl](https://rustc-dev-guide.rust-lang.org/solve/candidate-preference.html#we-prefer-global-where-bounds-over-impls). -There are also some other subtle reasons for why we can't do so. The most stupid is that for rigid aliases, trying to normalize them does not consider any lifetime constraints from proving the trait bound. This is necessary due to a lack of assumptions on binders - https://github.com/rust-lang/trait-system-refactor-initiative/issues/177 - and should be fixed longterm. +There are also some other subtle reasons why we can't do so.
+The most stupid is that, for rigid aliases, +trying to normalize them does not consider any lifetime constraints from proving the trait bound. +This is necessary due to a lack of assumptions on binders (https://github.com/rust-lang/trait-system-refactor-initiative/issues/177) and should be fixed long-term. -A separate issue is that right now, fetching the `type_of` associated types for `Trait` goals or in shadowed `Projection` candidates can cause query cycles for RPITIT. See https://github.com/rust-lang/trait-system-refactor-initiative/issues/185. +A separate issue is that, right now, +fetching the `type_of` associated types for `Trait` goals or in shadowed `Projection` candidates can cause query cycles for RPITIT. +See https://github.com/rust-lang/trait-system-refactor-initiative/issues/185. There are also slight differences between candidates for some of the builtin impls, these do all seem generally undesirable and I consider them to be bugs which would be fixed if we had a unified approach here. -Finally, not having this split makes lowering where-clauses more annoying. With the current system having duplicate where-clauses is not an issue and it can easily happen when elaborating super trait bounds. We now need to make sure we merge all associated type constraints, e.g. +Finally, not having this split makes lowering where-clauses more annoying. +With the current system, having duplicate where-clauses is not an issue, and it can easily happen when elaborating super trait bounds.
+We now need to make sure we merge all associated type constraints, e.g.: ```rust trait Super { @@ -36,4 +45,3 @@ trait Trait<'a>: Super<'a, A = i32> {} // how to elaborate // T: Trait<'a> + for<'b> Super<'b, B = u32> ``` - diff --git a/src/ty-module/generic-arguments.md b/src/ty-module/generic-arguments.md index cbf7b2c68..295b654e0 100644 --- a/src/ty-module/generic-arguments.md +++ b/src/ty-module/generic-arguments.md @@ -24,12 +24,14 @@ Adt(&'tcx AdtDef, GenericArgs<'tcx>) There are two parts: - The [`AdtDef`][adtdef] references the struct/enum/union but without the values for its type - parameters. In our example, this is the `MyStruct` part *without* the argument `u32`. + parameters. + In our example, this is the `MyStruct` part *without* the argument `u32`. (Note that in the HIR, structs, enums and unions are represented differently, but in `ty::Ty`, they are all represented using `TyKind::Adt`.) - The [`GenericArgs`] is a list of values that are to be substituted -for the generic parameters. In our example of `MyStruct<u32>`, we would end up with a list like -`[u32]`. We’ll dig more into generics and substitutions in a little bit. +for the generic parameters. + In our example of `MyStruct<u32>`, we would end up with a list like `[u32]`. +We’ll dig more into generics and substitutions in a little bit. [adtdef]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.AdtDef.html [`GenericArgs`]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/type.GenericArgs.html @@ -37,25 +39,29 @@ for the generic parameters. In our example of `MyStruct<u32>`, we would end up ### **`AdtDef` and `DefId`** For every type defined in the source code, there is a unique `DefId` (see [this -chapter](../hir.md#identifiers-in-the-hir)). This includes ADTs and generics. In the `MyStruct` -definition we gave above, there are two `DefId`s: one for `MyStruct` and one for `T`.
Notice that -the code above does not generate a new `DefId` for `u32` because it is not defined in that code (it -is only referenced). - -`AdtDef` is more or less a wrapper around `DefId` with lots of useful helper methods. There is -essentially a one-to-one relationship between `AdtDef` and `DefId`. You can get the `AdtDef` for a -`DefId` with the [`tcx.adt_def(def_id)` query][adtdefq]. `AdtDef`s are all interned, as shown -by the `'tcx` lifetime. +chapter](../hir.md#identifiers-in-the-hir)). +This includes ADTs and generics. +In the `MyStruct` definition we gave above, +there are two `DefId`s: one for `MyStruct` and one for `T`. +Notice that the code above does not generate a new `DefId` for `u32` +because it is not defined in that code (it is only referenced). + +`AdtDef` is more or less a wrapper around `DefId` with lots of useful helper methods. +There is essentially a one-to-one relationship between `AdtDef` and `DefId`. +You can get the `AdtDef` for a `DefId` with the [`tcx.adt_def(def_id)` query][adtdefq]. +`AdtDef`s are all interned, as shown by the `'tcx` lifetime. [adtdefq]: https://doc.rust-lang.org/nightly/nightly-rustc/rustc_middle/ty/struct.TyCtxt.html#method.adt_def ## Question: Why not substitute “inside” the `AdtDef`? -Recall that we represent a generic struct with `(AdtDef, args)`. So why bother with this scheme? +Recall that we represent a generic struct with `(AdtDef, args)`. +So why bother with this scheme? Well, the alternate way we could have chosen to represent types would be to always create a new, -fully-substituted form of the `AdtDef` where all the types are already substituted. This seems like -less of a hassle. However, the `(AdtDef, args)` scheme has some advantages over this. +fully-substituted form of the `AdtDef` where all the types are already substituted. +This seems like less of a hassle. +However, the `(AdtDef, args)` scheme has some advantages over this. 
First, `(AdtDef, args)` scheme has an efficiency win: @@ -68,7 +74,8 @@ struct MyStruct { ``` in an example like this, we can instantiate `MyStruct<A>` as `MyStruct<B>` (and so on) very cheaply, -by just replacing the one reference to `A` with `B`. But if we eagerly instantiated all the fields, +by just replacing the one reference to `A` with `B`. +But if we eagerly instantiated all the fields, that could be a lot more work because we might have to go through all of the fields in the `AdtDef` and update all of their types. @@ -83,7 +90,9 @@ definition of that name, and not carried along “within” the type itself). Given a generic type `MyType`, we have to store the list of generic arguments for `MyType`. -In rustc this is done using [`GenericArgs`]. `GenericArgs` is a thin pointer to a slice of [`GenericArg`] representing a list of generic arguments for a generic item. For example, given a `struct HashMap<K, V>` with two type parameters, `K` and `V`, the `GenericArgs` used to represent the type `HashMap<i32, u32>` would be represented by `&'tcx [tcx.types.i32, tcx.types.u32]`. +In rustc this is done using [`GenericArgs`]. +`GenericArgs` is a thin pointer to a slice of [`GenericArg`] representing a list of generic arguments for a generic item. +For example, given a `struct HashMap<K, V>` with two type parameters, `K` and `V`, the `GenericArgs` used to represent the type `HashMap<i32, u32>` would be represented by `&'tcx [tcx.types.i32, tcx.types.u32]`. `GenericArg` is conceptually an `enum` with three variants, one for type arguments, one for const arguments and one for lifetime arguments.
In practice that is actually represented by [`GenericArgKind`] and [`GenericArg`] is a more space efficient version that has a method to @@ -146,7 +155,8 @@ The construct `MyStruct::::func::` is represented by a tuple: a The [`ty::Generics`] type (returned by the [`generics_of`] query) contains the information of how a nested hierarchy gets flattened down to a list, and lets you figure out which index in the `GenericArgs` list corresponds to which -generic. The general theme of how it works is outermost to innermost (`T` before `T2` in the example), left to right +generic. +The general theme of how it works is outermost to innermost (`T` before `T2` in the example), left to right (`T2` before `T3`), but there are several complications: - Traits have an implicit `Self` generic parameter which is the first (i.e. 0th) generic parameter. Note that `Self` doesn't mean a generic parameter in all situations, see [Res::SelfTyAlias](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/def/enum.Res.html#variant.SelfTyAlias) and [Res::SelfCtor](https://doc.rust-lang.org/nightly/nightly-rustc/rustc_hir/def/enum.Res.html#variant.SelfCtor).