interpret: properly check for inhabitedness of nested references#156977
interpret: properly check for inhabitedness of nested references#156977RalfJung wants to merge 1 commit into
Conversation
| ty::Coroutine(..) => { | ||
| true // FIXME should these really be trivially inhabited? | ||
| } | ||
| ty::CoroutineClosure(..) => { | ||
| true // FIXME should these really be trivially inhabited? | ||
| } |
There was a problem hiding this comment.
I wasn't sure how to recurse into these. Are coroutines always inhabited via a trivial start state, or can they be uninhabited due to capturing ! as an "upvar"? How do coroutine closures work?
There was a problem hiding this comment.
Looks like yes they can be uninhabited
Coroutine(DefId(0:13 ~ diverges[9ce3]::async_let::{closure#0}), [(), std::future::ResumeTy, (), !, (Void,)]) is ABI-uninhabited but not opsem-uninhabited?
There was a problem hiding this comment.
I now made them all check the upvar_tys, but I am not entirely sure if that is enough.
There was a problem hiding this comment.
For coroutines, checking upvars is enough. The coroutine state is pretty much struct { upvars, enum { Unresumed, Returned, Panicked, Suspend0 { .. }, Suspent1 { .. }, .. } }
Note that #135527 proposes to change this to enum { Unresumed { upvars }, Returned, Panicked, Suspend0 { .. }, Suspent1 { .. }, .. }. Starting from that PR, the enum state will be inhabited in all cases.
Coroutine closures work exactly like closures.
| len.try_to_target_usize(tcx).unwrap() == 0 | ||
| || is_opsem_inhabited_recursor(elem, tcx, root, adt_handler) | ||
| } | ||
| ty::Pat(inner, _pat) => is_opsem_inhabited_recursor(inner, tcx, root, adt_handler), |
There was a problem hiding this comment.
I guess in theory the pattern could make a type uninhabited... so technically if we ever want to use that for the opsem we have to add it has a check here before pattern types get stabilized.
There was a problem hiding this comment.
none of the current patterns can, but yes, that may be possible in the future
| /// | ||
| /// When we git an ADT, we call `adt_handler`, giving it as its last argument a closure that it | ||
| /// can invoke to continue the recursion. | ||
| fn is_opsem_inhabited_recursor<'tcx>( |
There was a problem hiding this comment.
Do we need to do something to bound this recursion? We already stop when encountering the same ADT again, so recursion is bounded by the depth of ADT field types until it comes back to the original type.
Do we need to do some stack growing magic?
There was a problem hiding this comment.
it's possible this can't be hit due to other stack growth protections, but theoretically you can create a very nested tuple, or just &&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&T with a lot of repetitions of the reference. I'd honestly just wait for an issue
This comment has been minimized.
This comment has been minimized.
4e90bd5 to
e6d1440
Compare
This comment has been minimized.
This comment has been minimized.
e6d1440 to
33dc892
Compare
This comment has been minimized.
This comment has been minimized.
33dc892 to
846fa4f
Compare
This comment has been minimized.
This comment has been minimized.
846fa4f to
8d272fd
Compare
This comment has been minimized.
This comment has been minimized.
8d272fd to
06e6db9
Compare
This comment has been minimized.
This comment has been minimized.
06e6db9 to
f328265
Compare
| /// | ||
| /// When we git an ADT, we call `adt_handler`, giving it as its last argument a closure that it | ||
| /// can invoke to continue the recursion. | ||
| fn is_opsem_inhabited_recursor<'tcx>( |
There was a problem hiding this comment.
it's possible this can't be hit due to other stack growth protections, but theoretically you can create a very nested tuple, or just &&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&T with a lot of repetitions of the reference. I'd honestly just wait for an issue
| len.try_to_target_usize(tcx).unwrap() == 0 | ||
| || is_opsem_inhabited_recursor(elem, tcx, root, adt_handler) | ||
| } | ||
| ty::Pat(inner, _pat) => is_opsem_inhabited_recursor(inner, tcx, root, adt_handler), |
There was a problem hiding this comment.
none of the current patterns can, but yes, that may be possible in the future
f328265 to
bffa3cd
Compare
| // Bailing out here is unfortuante as it means that the recursion limit affects the | ||
| // operational semantics... but what else could we do? | ||
| return true; | ||
| } |
There was a problem hiding this comment.
I pushed a proper recursion check including a recursion limit check.
But... I am not sure it's sound. This means that depending on the recursion limit, different crates might consider the same type inhabited or not.
OTOH if we just hard-code e.g. 256 here, then with a sufficiently high query recursion limit one can write a 257-level-nested type (Wrap<Wrap<...<Wrap<!>>...>>) that layout will still consider uninhabited but this check will give up.
There was a problem hiding this comment.
Why exactly is this check soundness-critical? I don't quite follow...
As far as I can tell, it seems like this is just a courtesy best-effort check for UB in consteval, and a sanity check after computing layout?
There was a problem hiding this comment.
This is also used by Miri so it is supposed to define what is and isn't UB.
Also our query system relies on cross-crate-consistent results, doesn't it?
There was a problem hiding this comment.
@camsteffen proposed to do something more like inhabited_predicate_adt. I must admit that I do not understand how that query works -- I can see that it produces a predicate and ships it to the trait solver, but how is that helpful for recursive types?
There was a problem hiding this comment.
Sorry, that was kind of a red herring.
More to the point is check_representability. That query uses query cycle detection to detect an infinite sized type and abort compilation.
So I think we could have a check_opsem_inhabited query which also has query cycle detection, but instead of aborting, return OpsemInhabited::False or something like that for the query result.
check_representability uses params_in_repr, so check_opsem_inhabited could use something similar like params_in_repr_or_ref to include references.
There was a problem hiding this comment.
Another idea, that lcnr proposed, is that we can error on overflow. I.e. when computing inhabitedness of a type, if we reach the recursion limit, emit a hard error.
Since an error prevents the code from being compiled, there is no problem with linking two different crates with different recursion limits -- they either both get the same result, or one of them doesn't compile.
We can still support (assume inhabited) types which refer to themselves directly like
struct A(&'static A);And error only if the same type is mentioned with different generic parameters and overflows:
struct B<T: 'static>(&'static B<(T, T)>);This is technically a breaking change, but it seems fine not to support types which infinitely grow like this (or am I missing something?).
There was a problem hiding this comment.
But this seems fine, considering how much of an edge case this is?
Given,
struct Wrap<T>(T);I believe that the algorithm you proposed would say that &Wrap<Wrap<!>> is inhabited, which doesn't seem like a particularly strange edge case. @WaffleLapkin
There was a problem hiding this comment.
This is technically a breaking change, but it seems fine not to support types which infinitely grow like this (or am I missing something?).
Given that #138599 fixed multiple issues, it seems that there exist some use cases where people want such strange recursive types. We can't know how much that is before we run crater though.
There was a problem hiding this comment.
I believe that the algorithm you proposed would say that
&Wrap<Wrap<!>>is inhabited, which doesn't seem like a particularly strange edge case. @WaffleLapkin
I personally think that it's "fine", especially given that this improves over the status quo. But I suppose we could also allow recursing through the same definition a set number of times (i.e. setting a recursion limit higher than 1)...
There was a problem hiding this comment.
@WaffleLapkin I like that first version, and I think I can combine it with an idea I had to avoid the problem @theemathas mentioned. This avoids any dependency on the recursion limit while also still satisfying the desired implication "layout inhabited => opsem inhabited". I pushed that now.
This comment has been minimized.
This comment has been minimized.
bffa3cd to
2552ff6
Compare
This comment has been minimized.
This comment has been minimized.
|
👌 Experiment ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more |
|
@rustbot ready |
| | ty::Param(..) | ||
| | ty::Alias(..) | ||
| | ty::CoroutineWitness(..) => { | ||
| bug!("non-normalized type in `is_opsem_uninhabited_raw::rec`: `{ty}`") |
There was a problem hiding this comment.
| bug!("non-normalized type in `is_opsem_uninhabited_raw::rec`: `{ty}`") | |
| bug!("non-normalized type in `is_opsem_uninhabited_recursor`: `{ty}`") |
84e3817 to
19c1da7
Compare
| // ADTs need a special handler to avoid infinite recursion. That handler is meant to | ||
| // call back into the recursor. Ideally it'd just call `is_opsem_inhabited_recursor` but | ||
| // then it would have to pass itself as the adt_handler argument which is not possible | ||
| // in Rust... so we provide the handler with a callback that it can use to continue the | ||
| // recurison with the same `adt_handler`. |
There was a problem hiding this comment.
Maybe it would be cleaner to have a struct that manually implements a trait? I kinda hate that less than having this confusing knot of callbacks.
There was a problem hiding this comment.
Alternatively, we could have a function that takes some arguments and returns the needed adt_handler. Each time we need an adt_handler, we call this function instead of trying (and failing) to use the closure itself.
There was a problem hiding this comment.
Returning a closure is awkward and requires Box. And personally I prefer the current approach over a new trait. But I can add a trait if that becomes a blocking concern.
This comment has been minimized.
This comment has been minimized.
19c1da7 to
09db86e
Compare
|
@craterbot cancel See #157814 |
|
🗑️ Experiment ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more |
|
👌 Experiment ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more |
|
🚧 Experiment ℹ️ Crater is a tool to run experiments across parts of the Rust ecosystem. Learn more |
|
🎉 Experiment
Footnotes
|
|
Crater looks good, so @WaffleLapkin this is ready for review. :) |
| ty::Pat(inner, _pat) => { | ||
| is_opsem_inhabited_recursor(inner, tcx, seen, stop_at_ref, adt_handler) | ||
| } |
There was a problem hiding this comment.
When pattern types start supporting enums, we'll need to decide if type X here is inhabited or not:
#![feature(pattern_types)]
#![feature(pattern_type_macro)]
#![feature(never_type)]
enum E {
A,
B(!),
}
type X = pattern_type!(E is E::B(_));| } | ||
| } | ||
|
|
||
| fn is_opsem_inhabited_raw<'tcx>( |
There was a problem hiding this comment.
I feel like the code is fairly confusing with the closures & mutual recursion.
This query is only called for ADTs. I think if you change it to be is_adt_opsem_inhabited(tcx, adt_def, adt_args, seen) the code will become a lot clearer.
Then you can have
is_opsem_inhabitedcallsis_opsem_inhabited_recursoris_opsem_inhabited_recursorcallsis_adt_opsem_inhabitedfor adtsis_adt_opsem_inhabitedcallsis_opsem_inhabited_recursordirectly
I think this should work & make the code a lot easier to follow.
| // If we have seen this ADT before, stop at the next reference to avoid infinite | ||
| // recursion. We can't stop here since we have to ensure that "layout inhabited" | ||
| // implies "opsem inhabited". | ||
| let stop_at_ref = !new_adt; |
There was a problem hiding this comment.
Q: wasn't the thing that we want "layout uninhabited" => "opsem uninhabited"?
|
|
||
| /// Recurse over a type to determine whether it is inhabited on the opsem level. | ||
| /// Key constraints are: | ||
| /// - if a type's validity invariant is satisfiable, it must be opsem-inhabited. | ||
| /// - if a type's layout is marked uninhabited, it must be opsem-uninhabited. | ||
| /// | ||
| /// Beyond that, the value returned by this function is not a stable guarantee. |
There was a problem hiding this comment.
I feel like this should be documented on the public function.
| @@ -3,7 +3,7 @@ use std::mem::{forget, transmute}; | |||
|
|
|||
| fn main() { | |||
| unsafe { | |||
| let x: Box<!> = transmute(&mut 42); //~ERROR: encountered a box pointing to uninhabited type ! | |||
| let x: Box<!> = transmute(&mut 42); //~ERROR: encountered a box pointing to uninhabited type `!` | |||
There was a problem hiding this comment.
Q: Do we need to special case Box in is_opsem_inhabited?
| @@ -4,6 +4,6 @@ enum Void {} | |||
|
|
|||
| fn main() { | |||
| unsafe { | |||
| let _x: &(i32, Void) = transmute(&42); //~ERROR: encountered a reference pointing to uninhabited type (i32, Void) | |||
| let _x: &&(i32, Void) = transmute(&&42); //~ERROR: encountered a reference pointing to uninhabited type `&(i32, Void)` | |||
There was a problem hiding this comment.
Q: why change this?
| // If we just unfold this type going down the first variant of every enum, we'll never stop; we'll | ||
| // never even encounter the same type a second time. | ||
| struct S<T: 'static>(&'static S<(T, T)>, PhantomData<T>); | ||
| const C: &Result<S<()>, ()> = &Err(()); |
There was a problem hiding this comment.
"every enum"?
View all comments
This implements the opsem from the ongoing FCP in rust-lang/unsafe-code-guidelines#413. The bit we were previously missing is that transmuting a
&&!into existence was not caught as being immediate UB -- only the&!case behaved as expected.I did not adjust the layout computation because when we compute the layout of
&T, we cannot know the layout ofT(as that might be recursive).r? @oli-obk