match: Use an aggregate equality comparison for constant array/slice patterns when possible#155216
match: Use an aggregate equality comparison for constant array/slice patterns when possible#155216jakubadamw wants to merge 10 commits intorust-lang:mainfrom
Conversation
|
Some changes occurred in match lowering cc @Nadrieril |
|
rustbot has assigned @JonathanBrouwer. Use Why was this reviewer chosen?The reviewer was selected based on:
|
This comment has been minimized.
This comment has been minimized.
|
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
match: Use an aggregate equality comparison for constant array/slice patterns when possible
This comment has been minimized.
This comment has been minimized.
|
Finished benchmarking commit (5a6dba6): comparison URL. Overall result: ❌✅ regressions and improvements - no action neededBenchmarking means the PR may be perf-sensitive. It's automatically marked not fit for rolling up. Overriding is possible but disadvised: it risks changing compiler perf. @bors rollup=never Instruction countOur most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.
Max RSS (memory usage)Results (primary -0.0%, secondary 0.5%)A less reliable metric. May be of interest, but not used to determine the overall result above.
CyclesResults (primary -2.3%, secondary 14.4%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Binary sizeResults (primary -0.1%, secondary 0.1%)A less reliable metric. May be of interest, but not used to determine the overall result above.
Bootstrap: 491.114s -> 490.988s (-0.03%) |
|
@rustbot reroll |
| if let PatKind::Constant { value } = pat.kind { | ||
| Some(ty::Const::new_value(tcx, value.valtree, value.ty)) | ||
| } else { | ||
| None | ||
| } |
There was a problem hiding this comment.
It might also be worth reconstructing aggregate constants for arrays/slices of arrays of constants, etc.? I'm not a specialization expert, but it looks like arrays of bytewise-comparable things are also bytewise-comparable, at least for common array lengths1. Since array and slice equality are specialized based on their element types' bytewise-comparability, we should be able to get better codegen for nested array patterns too (as long as the inner arrays are of one of those common lengths), I think?
Footnotes
There was a problem hiding this comment.
@dianne, interesting. I’ll look into this next! 🙂
| // When there is no `..`, all elements are constants, and | ||
| // there are at least two of them, collapse the individual | ||
| // element subpairs into a single aggregate comparison that | ||
| // is performed after the length check. | ||
| if slice.is_none() |
There was a problem hiding this comment.
An additional possibility: even if there is a .., the comparisons for the sub-slices before and after the .. could be done via aggregate equality when applicable. Credit to #121540, which I think did this?
Edit: assuming prefixes and suffixes are typically small and hand-written, it's probably not worth the trouble to use aggregate equality for them.
Even if only handling the case with no .., it might be worth moving the special-casing into prefix_slice_suffix to share it between PatKind::Slice and PatKind::Array, since that's where the commonalities live.
Edit: after prefix_slice_suffix's cleanup in #154943, I don't think it makes much sense to put this in there. I still think the logic for deciding whether to use aggregate equality is complex enough that it could be worth factoring out, but that's probably not the way to do it.
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
The general approach feels unfortunate: we transformed constants into patterns in const_to_pat, and now we try to reverse that transformation. Have we tried keeping the original constant around in the output of const_to_pat and using it at runtime? Tho this has the same issue of const-dependent MIR lowering that @dianne pointed out.
There was a problem hiding this comment.
If we do that, maybe we could also synthesize a constant during THIR building for hand-written array/slice patterns, like this PR currently does in MIR building? That way, we wouldn't end up worse codegen for hand-written array patterns than for const array items used as patterns.
There was a problem hiding this comment.
I dunno, small handwritten arrays behave pretty much like tuples, it could be better codegen sometimes not to make them into constants:
match foo {
// compiles to three nested `if`s today, would become 8 sequential `if`s if turned into constants
[false, false, false] => ...,
[false, false, true] => ...,
[false, true, false] => ...,
...
}It's admittedly a stretch but I'm tempted to err on the side of respecting user intent especially before the MIR boundary.
There was a problem hiding this comment.
Oh, right. Maybe we should have a test for that in mir-opt/building/match/sort_candidates.rs or such if we start optimizing array/slice comparisons? I think as-is this PR may also turn that into 8 sequential tests, each a TestKind::AggregateEq for a different constant.
There was a problem hiding this comment.
@Nadrieril, as an alternative to being guided solely by user “intent”, would it be sensible to raise the threshold on the number of elements in an array pattern where we would use the aggregate equality? Right now it’s > 2. Perhaps 4 would work better? I suppose a quantitative comparison with benchmarks could be of use here, but sadly I can’t commit to that with my present schedule.
There was a problem hiding this comment.
On further thought, I think this is an unsound optimization, for example:
/// Safety: if the first byte is zero the rest of the slice
/// must be initialized; otherwise all bets are off.
unsafe fn foo(x: *const [u8]) {
unsafe { match *x {
[0, 1] => ...,
_ => ...,
} }
}This function is sound if patterns were guaranteed to match left-to-right (this isn't set in stone yet, but likely).
Here if you replace this with if PartialEq::eq(&*x, &[0, 1]) not only are you taking a reference which itself has opsem consequences (this may make some later foreign writes invalid), but also may be reading uninitialized data.
I think this convinces me we shouldn't even attempt to do this. Preserving "user intent" actually means preserving correct semantics here.
|
@dianne do you want to take over review here? |
There was a problem hiding this comment.
I don't have much context on the ctfe or const traits side of things to evaluate the approach. cc @oli-obk maybe? It feels like we have to make some sort of compromise here to avoid calling anything const-unstable. Possibly we could block on const_cmp stabilizing, or possibly we could have some workaround until then if we want to land this first.
I think I can handle the technical review, at least. Tentatively, r? me (though feel free to steal the assignment if that'd be easier ^^)
There was a problem hiding this comment.
If we do that, maybe we could also synthesize a constant during THIR building for hand-written array/slice patterns, like this PR currently does in MIR building? That way, we wouldn't end up worse codegen for hand-written array patterns than for const array items used as patterns.
0cd52bb to
3cc9f73
Compare
This comment has been minimized.
This comment has been minimized.
|
@dianne, @Nadrieril, thank you very much for your review and feedback, I really appreciate it. I’ve rebased and addressed the comments. The refactor where we would preserve a trace of the original constant in |
…s when possible
When every element in an array or slice pattern is a constant and there
is no `..` subpattern, the match builder now emits a single call to
`PartialEq::eq` instead of comparing each element one by one.
This drastically reduces the number of MIR basic blocks for large
constant-array matches – e.g. a 64-element `[u8; 64]` match previously
generated 64 separate comparison blocks and now generates just one
`PartialEq::eq` call that LLVM can lower to a `memcmp()`
The optimisation is gated on having at least two constant elements.
Single-element arrays still use a plain scalar comparison.
Example:
```rust
const FOO: [u8; 64] = *b"0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef";
pub fn foo(x: &[u8; 64]) -> bool {
// Before: 64 basic blocks, one per byte.
// After: a single `PartialEq::eq()` call.
matches!(x, &FOO)
}
```
…on a large fixed-length array
…t contexts This is unless the `const_cmp` feature is enabled, in which case `PartialEq` becomes available in said contexts.
…n't get captured by this logic
6de0afc to
a287bc9
Compare
|
This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed. Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers. |
View all comments
When every element in an array or slice pattern is a constant and there is no
..subpattern, the match builder will now emit a single call toPartialEq::eqinstead of comparing each element from the value one by one against the respective constant in the pattern.This drastically reduces the number of MIR basic blocks for large constant-array matches – e.g. a 64-element
[u8; 64]match previously generated 64 separate comparison blocks and now generates just onePartialEq::eqcall that LLVM can lower to amemcmp(). The optimisation is gated on having at least two constant elements, meaning single-element arrays will still use a plain scalar comparison.Example:
Closes #103073.
Closes #110870.