Add unstable loop unrolling hint attributes #156816

Open
saethlin wants to merge 3 commits into rust-lang:main from saethlin:loop-attributes

Conversation

@saethlin
Member

@saethlin saethlin commented May 22, 2026

Tracking issue: #156874

This adds a new attribute with four forms: #[unroll], #[unroll(full)], #[unroll(never)], and #[unroll(16)] (or any other u32).

#[unroll] is behind a new feature gate, #![feature(loop_hints)], because I intend to add an attribute for loop vectorization as well. If a user turns off loop unrolling to locally minimize code size, LLVM may still vectorize the loop even though it isn't unrolled, which can produce a similar code size explosion.
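
For readers unfamiliar with the transformation being hinted at, here is a minimal sketch in stable Rust of what an unroll count of 4 roughly asks the optimizer to do. The attribute itself appears only in a comment, since it needs a compiler built from this branch; its placement on a `for` loop and the function names `sum_rolled` and `sum_unrolled_by_4` are illustrative assumptions, not code from the PR.

```rust
// Hypothetical use of the attribute from this PR, shown as a comment only
// because it requires the unstable `loop_hints` feature from this branch:
//
//     #![feature(loop_hints)]
//     #[unroll(4)]
//     for &x in data { total = total.wrapping_add(x); }

/// Rolled form: one element per loop iteration.
fn sum_rolled(data: &[u32]) -> u32 {
    let mut total = 0u32;
    for &x in data {
        total = total.wrapping_add(x);
    }
    total
}

/// Hand-unrolled by 4: roughly the loop shape an unroll count of 4 requests.
fn sum_unrolled_by_4(data: &[u32]) -> u32 {
    let mut total = 0u32;
    let mut chunks = data.chunks_exact(4);
    for c in &mut chunks {
        // Four copies of the loop body per iteration.
        total = total
            .wrapping_add(c[0])
            .wrapping_add(c[1])
            .wrapping_add(c[2])
            .wrapping_add(c[3]);
    }
    // Remainder loop for the last 0 to 3 elements.
    for &x in chunks.remainder() {
        total = total.wrapping_add(x);
    }
    total
}

fn main() {
    let data: Vec<u32> = (1..=10).collect();
    assert_eq!(sum_rolled(&data), sum_unrolled_by_4(&data));
    println!("{}", sum_unrolled_by_4(&data));
}
```

The unrolled form trades code size for fewer branch instructions, which is why a way to turn unrolling off locally is useful when size matters.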

@rustbot rustbot added A-attributes Area: Attributes (`#[…]`, `#![…]`) A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels May 22, 2026
@saethlin saethlin force-pushed the loop-attributes branch 2 times, most recently from 9c8f21c to 22381f6 Compare May 24, 2026 21:15
@saethlin saethlin force-pushed the loop-attributes branch 2 times, most recently from 4d232fd to 9c8493b Compare May 31, 2026 20:58
@saethlin saethlin marked this pull request as ready for review May 31, 2026 20:58
@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label May 31, 2026
@rustbot
Collaborator

rustbot commented May 31, 2026

Some changes occurred in compiler/rustc_passes/src/check_attr.rs

cc @jdonszelmann, @JonathanBrouwer

Some changes occurred in compiler/rustc_attr_parsing

cc @jdonszelmann, @JonathanBrouwer

Some changes occurred in compiler/rustc_hir/src/attrs

cc @jdonszelmann, @JonathanBrouwer

Some changes occurred to MIR optimizations

cc @rust-lang/wg-mir-opt

Some changes occurred in match lowering

cc @Nadrieril

Some changes occurred in coverage instrumentation.

cc @Zalathar

@rustbot rustbot removed the S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. label May 31, 2026
@rustbot
Collaborator

rustbot commented May 31, 2026

r? @folkertdev

rustbot has assigned @folkertdev.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

Why was this reviewer chosen?

The reviewer was selected based on:

  • Owners of files modified in this PR: compiler
  • compiler expanded to 73 candidates
  • Random selection from 17 candidates

@saethlin saethlin changed the title Prototype loop unrolling hint attributes Add unstable loop unrolling hint attributes May 31, 2026
@JonathanBrouwer
Contributor

(would like to take a look at this as well, should have time in the next few days)

@JonathanBrouwer JonathanBrouwer self-assigned this May 31, 2026
Contributor

@folkertdev folkertdev left a comment

Looks reasonable, I'll let Jonathan take a closer look at the attribute stuff.

Maybe I'm missing something or am just used to different parts of the code base, but some style things seemed off to me. Feel free to disregard that though, I guess.

Comment thread compiler/rustc_middle/src/mir/terminator.rs Outdated
Comment thread compiler/rustc_middle/src/mir/mod.rs Outdated
Comment thread compiler/rustc_codegen_llvm/src/builder.rs Outdated
Comment thread compiler/rustc_codegen_llvm/src/builder.rs Outdated
Comment thread compiler/rustc_codegen_llvm/src/builder.rs Outdated
Comment thread compiler/rustc_attr_parsing/src/attributes/unroll.rs
Comment thread compiler/rustc_attr_parsing/src/attributes/unroll.rs Outdated
Comment thread compiler/rustc_mir_build/src/thir/cx/expr.rs Outdated
#[no_mangle]
pub fn unroll_count() {
// CHECK-LABEL: @unroll_count
// CHECK: !llvm.loop ![[COUNT:[0-9]+]]
Contributor

is checking the actual number tricky? or does it not map one-to-one?

Member Author

I am checking the number; it's at the very bottom of this file. Loop metadata looks like this:

bb7:                                              ; preds = %bb3
  call void @maybe_has_side_effect() #5
  br label %bb1, !llvm.loop !11 
...
!11 = distinct !{!11, !12}
!12 = !{!"llvm.loop.unroll.count", i32 5}
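
As a rough sketch of how the attribute forms could correspond to loop metadata like the above: the model below renders one metadata node per hint. Only the `llvm.loop.unroll.count` case is confirmed by the IR in this thread; the other three strings are metadata names documented by LLVM, and pairing them with `#[unroll]`, `#[unroll(full)]`, and `#[unroll(never)]` is an assumption. `UnrollHint` and `metadata_node` are hypothetical names, not the PR's types.

```rust
/// Hypothetical model of the four attribute forms from the PR description.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum UnrollHint {
    Enable,     // #[unroll]
    Full,       // #[unroll(full)]
    Never,      // #[unroll(never)]
    Count(u32), // #[unroll(N)]
}

/// Renders the metadata node a hint might lower to, in LLVM textual IR syntax.
/// Only the `Count` arm is confirmed by the IR quoted in this thread.
fn metadata_node(id: u32, hint: UnrollHint) -> String {
    match hint {
        UnrollHint::Enable => format!("!{id} = !{{!\"llvm.loop.unroll.enable\"}}"),
        UnrollHint::Full => format!("!{id} = !{{!\"llvm.loop.unroll.full\"}}"),
        UnrollHint::Never => format!("!{id} = !{{!\"llvm.loop.unroll.disable\"}}"),
        UnrollHint::Count(n) => {
            format!("!{id} = !{{!\"llvm.loop.unroll.count\", i32 {n}}}")
        }
    }
}

fn main() {
    // Reproduces the `!12` node from the IR excerpt above.
    println!("{}", metadata_node(12, UnrollHint::Count(5)));
    println!("{}", metadata_node(12, UnrollHint::Never));
}
```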

@folkertdev
Contributor

It might be due to layout changes: because some types got bigger, maybe something is now aligned that wasn't before? Still, that is a massive change.

Contributor

@JonathanBrouwer JonathanBrouwer left a comment

I've not reviewed everything yet, but I think these are the biggest points.
PR looks good in general and happy to see this new feature :)

MacroCall,
Crate,
Delegation { mac: bool },
ForLoop,
Contributor

Could you add this information as a field of Target::Expr, rather than its own target type?
I think this is confusing because now not all expressions produce Target::Expr

Contributor

Furthermore, does it make sense to combine these three to just Loop, or do they need to be separate targets?

Contributor

For #[loop_match] it's kind of nice to just have a Loop one corresponding to loop {}, but that's validated separately further down the line too.

Contributor

@JonathanBrouwer JonathanBrouwer Jun 1, 2026

Ah right, if that attribute is only valid on Loop then keeping the separate targets is perfectly reasonable.

use super::*;
// tidy-alphabetical-start
-static_assert_size!(BasicBlockData<'_>, 152);
+static_assert_size!(BasicBlockData<'_>, 160);
Contributor

The perf result of (probably) this change is a bit sad, can we improve that?

Member Author

I looked through the cachegrind diffs and I'm pretty sure the perf impact is mostly caused by adding a new field to an encoded struct (MIR Terminators), not by the size increase of the struct.

I did think about where to stash the data for a while. In THIR I was able to create a single collection for each Body, which means that the effect when not used is a single empty collection (or a single zero byte when encoding/decoding). But in MIR, I want this to have a chance of surviving MIR optimizations, so putting attributes in something like an FxHashMap<BasicBlock, Vec<Attribute>> would mean that we'd need to repair that mapping any time a basic block was added or removed by a MIR transform. That sounds hard to maintain. So I think the only viable approach is to attach this to the Goto Terminators somehow, and there are often many terminators per body (one per basic block), so it's not shocking that the overhead surfaces here.

There are things I could do here so I'll try one, but you might not like it 😛
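
To make the maintenance concern concrete, here is a minimal sketch contrasting the two storage options. It uses a plain `Vec` of blocks and a std `HashMap<usize, u32>` as stand-ins for MIR basic blocks and the `FxHashMap<BasicBlock, Vec<Attribute>>` mentioned above; `Block`, `remove_block`, and `repair_side_table` are hypothetical names, and real MIR transforms are considerably more involved.

```rust
use std::collections::HashMap;

/// Stand-in for a MIR basic block. With the inline option, the hint is
/// stored with the block's terminator and moves wherever the block moves.
#[derive(Debug, Clone, PartialEq)]
struct Block {
    name: &'static str,
    unroll_count: Option<u32>,
}

/// Removes the block at `index`, as a transform deleting a dead block might.
/// Every later block shifts down by one position.
fn remove_block(blocks: &mut Vec<Block>, index: usize) {
    blocks.remove(index);
}

/// The repair a side table keyed by block index needs after each removal.
fn repair_side_table(table: &HashMap<usize, u32>, removed: usize) -> HashMap<usize, u32> {
    let mut repaired = HashMap::new();
    for (&index, &hint) in table {
        if index == removed {
            continue; // the hinted block itself was deleted
        }
        let new_index = if index > removed { index - 1 } else { index };
        repaired.insert(new_index, hint);
    }
    repaired
}

fn main() {
    let mut blocks = vec![
        Block { name: "entry", unroll_count: None },
        Block { name: "dead", unroll_count: None },
        Block { name: "loop_latch", unroll_count: Some(5) },
    ];
    // Side-table option: the hint for block 2 lives outside the block.
    let side_table: HashMap<usize, u32> = HashMap::from([(2, 5)]);

    remove_block(&mut blocks, 1);

    // The side table is now stale: there is no block 2 any more.
    assert!(blocks.get(2).is_none());
    // It only becomes correct again after an explicit repair pass.
    assert_eq!(repair_side_table(&side_table, 1).get(&1), Some(&5));
    // The inline hint needed no repair at all.
    assert_eq!(blocks[1].name, "loop_latch");
    assert_eq!(blocks[1].unroll_count, Some(5));
}
```

The inline option pays a per-terminator cost even when no hint is present, which matches the overhead described in the comment above.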

}
}
ArgParser::NameValue(_) => {
cx.adcx().warn_ill_formed_attribute_input(ILL_FORMED_ATTRIBUTE_INPUT);
Contributor

This can be .expected_list_or_no_args

Contributor

We should start linting against weird parsing practices :3 we totally can detect these. @JonathanBrouwer

}
}

match l.meta_item().and_then(|i| i.path().word_sym()) {
Contributor

This forgets to check whether l has any arguments; use the new meta_item_no_args method introduced in #155193 (which is in the queue atm).

print_tup!(A B C D E F G H);
print_skip!(Span, (), ErrorGuaranteed, AttrId);
-print_disp!(u8, u16, u32, u128, usize, bool, NonZero<u32>, Limit);
+print_disp!(u8, u16, u32, u64, u128, usize, bool, NonZero<u32>, Limit);
Contributor

Is the u64 used anywhere?

Member Author

Whoops. Previously I was accepting any u64 in the unroll count, but I changed to u32 because that's what clang does.
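
A small sketch of what narrowing the count to u32 means for argument handling: values that would have fit in a u64 are rejected once the count is parsed as a u32. This works on plain strings rather than the compiler's attribute parser, and `UnrollArg` and `parse_unroll_arg` are hypothetical names, not the PR's implementation.

```rust
/// Hypothetical model of the argument inside `#[unroll(...)]`.
#[derive(Debug, PartialEq, Eq)]
enum UnrollArg {
    Full,
    Never,
    Count(u32),
}

/// Parses the text between the parentheses. Anything that is not `full`,
/// `never`, or an integer that fits in a u32 is reported as an error.
fn parse_unroll_arg(arg: &str) -> Result<UnrollArg, String> {
    match arg {
        "full" => Ok(UnrollArg::Full),
        "never" => Ok(UnrollArg::Never),
        other => other
            .parse::<u32>()
            .map(UnrollArg::Count)
            .map_err(|e| format!("invalid unroll argument `{other}`: {e}")),
    }
}

fn main() {
    println!("{:?}", parse_unroll_arg("16"));
    println!("{:?}", parse_unroll_arg("full"));
    // One past u32::MAX: valid as a u64, rejected as a u32.
    println!("{:?}", parse_unroll_arg("4294967296"));
}
```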

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jun 1, 2026
@saethlin
Member Author

saethlin commented Jun 1, 2026

I think the image perf result is just something broken in collection. https://rust-lang.zulipchat.com/#narrow/channel/247081-t-compiler.2Fperformance/topic/sus.20perf.20results/near/599190868

@Kobzol
Member

Kobzol commented Jun 1, 2026

Let's try again to see if it persists.

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 1, 2026
rust-bors Bot pushed a commit that referenced this pull request Jun 1, 2026
Add unstable loop unrolling hint attributes
@rust-bors
Contributor

rust-bors Bot commented Jun 1, 2026

☀️ Try build successful (CI)
Build commit: 57c302e (57c302e0d23431b9eb5bf77a5403230fb921e506, parent: 968d50ad35115bc2c8c19cb9039f7ed3dfe56a81)

@rust-timer
Collaborator

Finished benchmarking commit (57c302e): comparison URL.

Overall result: ❌✅ regressions and improvements - please read:

Benchmarking means the PR may be perf-sensitive. It's automatically marked not fit for rolling up. Overriding is possible but disadvised: it risks changing compiler perf.

Next, please: If you can, justify the regressions found in this try perf run in writing along with @rustbot label: +perf-regression-triaged. If not, fix the regressions and do another perf run. Neutral or positive results will clear the label automatically.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | 0.2% | [0.1%, 0.3%] | 21 |
| Regressions ❌ (secondary) | 0.2% | [0.0%, 0.3%] | 21 |
| Improvements ✅ (primary) | -36.8% | [-77.8%, -11.1%] | 4 |
| Improvements ✅ (secondary) | -0.4% | [-0.6%, -0.1%] | 3 |
| All ❌✅ (primary) | -5.7% | [-77.8%, 0.3%] | 25 |

Max RSS (memory usage)

Results (primary -2.1%, secondary 1.6%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | 1.7% | [0.6%, 4.6%] | 10 |
| Regressions ❌ (secondary) | 1.6% | [0.5%, 2.8%] | 12 |
| Improvements ✅ (primary) | -14.8% | [-24.2%, -1.9%] | 3 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -2.1% | [-24.2%, 4.6%] | 13 |

Cycles

Results (primary -37.5%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | - | - | 0 |
| Improvements ✅ (primary) | -37.5% | [-78.5%, -11.0%] | 4 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -37.5% | [-78.5%, -11.0%] | 4 |

Binary size

Results (primary 0.3%, secondary 0.4%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | 0.3% | [0.0%, 0.7%] | 87 |
| Regressions ❌ (secondary) | 0.4% | [0.0%, 1.4%] | 70 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | 0.3% | [0.0%, 0.7%] | 87 |

Bootstrap: 511.125s -> 514.465s (0.65%)
Artifact size: 400.79 MiB -> 401.10 MiB (0.08%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 1, 2026
@rustbot
Collaborator

rustbot commented Jun 2, 2026

This PR was rebased onto a different main commit. Here's a range-diff highlighting what actually changed.

Rebasing is a normal part of keeping PRs up to date, so no action is needed—this note is just to help reviewers.

@saethlin
Member Author

saethlin commented Jun 2, 2026

@bors try @rust-timer queue

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 2, 2026
rust-bors Bot pushed a commit that referenced this pull request Jun 2, 2026
Add unstable loop unrolling hint attributes
@rust-log-analyzer
Collaborator

The job x86_64-gnu-tools failed! Check out the build log: (web) (plain enhanced) (plain)

Click to see the possible cause of the failure (guessed by this bot)
[TIMING:end] compile::StdLink { compiler: Compiler { stage: 0, host: x86_64-unknown-linux-gnu, forced_compiler: false }, target_compiler: Compiler { stage: 0, host: x86_64-unknown-linux-gnu, forced_compiler: false }, target: x86_64-unknown-linux-gnu, crates: [], force_recompile: false } -- 0.001
##[group]Building stage1 compiler artifacts (stage0 -> stage1, x86_64-unknown-linux-gnu)
error: process didn't exit successfully: `sccache /checkout/obj/build/bootstrap/debug/rustc -vV` (exit status: 2)
--- stderr
sccache: error: Timed out waiting for server startup. Maybe the remote service is unreachable?
Run with SCCACHE_LOG=debug SCCACHE_NO_DAEMON=1 to get more information

Bootstrap failed while executing `build --stage 2 compiler rustdoc`
Build completed unsuccessfully in 0:00:30
  local time: Tue Jun  2 02:51:46 UTC 2026
  network time: Tue, 02 Jun 2026 02:51:46 GMT

@rust-bors
Contributor

rust-bors Bot commented Jun 2, 2026

☀️ Try build successful (CI)
Build commit: c1e5e0f (c1e5e0ffb239a664235b939ffad67af5426f4d9f, parent: 4a31759ad18b3c29c5ec99ca23c4764a8bedcf52)

@rust-timer
Collaborator

Finished benchmarking commit (c1e5e0f): comparison URL.

Overall result: ❌ regressions - please read:

Benchmarking means the PR may be perf-sensitive. It's automatically marked not fit for rolling up. Overriding is possible but disadvised: it risks changing compiler perf.

Next, please: If you can, justify the regressions found in this try perf run in writing along with @rustbot label: +perf-regression-triaged. If not, fix the regressions and do another perf run. Neutral or positive results will clear the label automatically.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | 0.2% | [0.1%, 0.3%] | 17 |
| Regressions ❌ (secondary) | 0.2% | [0.0%, 0.6%] | 13 |
| Improvements ✅ (primary) | - | - | 0 |
| Improvements ✅ (secondary) | -0.1% | [-0.1%, -0.1%] | 1 |
| All ❌✅ (primary) | 0.2% | [0.1%, 0.3%] | 17 |

Max RSS (memory usage)

Results (primary 0.1%, secondary 2.9%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | 1.3% | [0.5%, 2.2%] | 3 |
| Regressions ❌ (secondary) | 4.7% | [1.4%, 6.6%] | 3 |
| Improvements ✅ (primary) | -3.7% | [-3.7%, -3.7%] | 1 |
| Improvements ✅ (secondary) | -2.7% | [-2.7%, -2.7%] | 1 |
| All ❌✅ (primary) | 0.1% | [-3.7%, 2.2%] | 4 |

Cycles

Results (primary -2.1%, secondary 5.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | - | - | 0 |
| Regressions ❌ (secondary) | 5.1% | [2.9%, 7.3%] | 2 |
| Improvements ✅ (primary) | -2.1% | [-2.1%, -2.1%] | 1 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -2.1% | [-2.1%, -2.1%] | 1 |

Binary size

Results (primary -0.0%, secondary -0.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

| | mean | range | count |
| --- | --- | --- | --- |
| Regressions ❌ (primary) | 0.0% | [0.0%, 0.0%] | 12 |
| Regressions ❌ (secondary) | 0.0% | [0.0%, 0.0%] | 11 |
| Improvements ✅ (primary) | -0.0% | [-0.1%, -0.0%] | 26 |
| Improvements ✅ (secondary) | -0.1% | [-0.1%, -0.0%] | 20 |
| All ❌✅ (primary) | -0.0% | [-0.1%, 0.0%] | 38 |

Bootstrap: 511.17s -> 513.02s (0.36%)
Artifact size: 400.72 MiB -> 400.98 MiB (0.06%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jun 2, 2026
@rust-bors
Contributor

rust-bors Bot commented Jun 2, 2026

☔ The latest upstream changes (presumably #157303) made this pull request unmergeable. Please resolve the merge conflicts.

Labels

A-attributes Area: Attributes (`#[…]`, `#![…]`) A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. perf-regression Performance regression. S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

9 participants