clippy: warn for large_stack_arrays lint #11868
|
GNU testsuite comparison: |
Merging this PR will improve performance by 27.43%
|
4edfb02 to 10bb935
10bb935 to 5a39ee5
@drinkcat Interesting that we see improvements for |
I see this when clicking the links. I'm not sure I understand how the simulation environment works (I haven't contributed to coreutils in a... while):
If this is a simulation, I'd be curious to see numbers on real hardware. The fact that syscalls are not modeled is a concern too: the main advantage of a larger buffer should be fewer system calls.
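To make the buffer-size argument concrete, here is a minimal sketch (not code from this PR) that counts how many `read` calls are needed to drain the same input with different buffer sizes. `CountingReader` and `read_calls` are hypothetical names introduced only for illustration, and an in-memory slice stands in for a file, so each `read` call here corresponds to what would be one `read(2)` system call on a real file descriptor.

```rust
use std::io::{self, Read};

/// Hypothetical wrapper that counts how many times `read` is called.
struct CountingReader<R> {
    inner: R,
    calls: usize,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.calls += 1;
        self.inner.read(buf)
    }
}

/// Drains `input` through a buffer of `buf_size` bytes and returns the
/// number of `read` calls, including the final call that reports EOF.
fn read_calls(input: &[u8], buf_size: usize) -> io::Result<usize> {
    let mut reader = CountingReader { inner: input, calls: 0 };
    let mut buf = vec![0u8; buf_size];
    while reader.read(&mut buf)? != 0 {}
    Ok(reader.calls)
}
```

For a 1 MiB input, a 64 KiB buffer needs 17 calls (16 full reads plus the EOF read), while a 256 KiB buffer needs 5. A benchmark environment that does not account for system calls would miss this difference.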
|
Is it easy to change the config? There is a warning about syscalls:
|
5c989de to b55bfc5
|
@oech3 I did see a small ~1% regression locally (statistical outliers were detected, though, as I don't have taskset available):
$ seq 10000000000000 1 inf | head -n 100000000 > seq100M
$ cargo build -r -p uu_wc && hyperfine --warmup 10 -L wc ./wc.main,target/release/wc "{wc} -l seq100M"
0419264 to bd96bbd
bd96bbd to e9cb33d
|
@drinkcat CodSpeed still reports a significant performance difference even with a consistent runtime environment. However, the size of the improvement doesn’t seem credible, so I think you’re right that CodSpeed is likely flawed due to not modeling syscalls. |
@oech3 This benchmark had 139 system calls with e9cb33d but only 41 before, so the reported 27.43% improvement is not meaningful due to missing instrumentation. |
|
Thanks! |
|
|
If |
|
GNU allocates on stack with |
This comment was marked as off-topic.
|
@oech3 Would you run this benchmark on your system to double-check performance? I had statistical outliers on macOS (taskset was not available):
$ seq 10000000000000 1 inf | head -n 100000000 > seq100M
$ taskset -c 0 hyperfine --warmup 10 -L wc ./wc.40070cadf,./wc.4704caee0 "{wc} -l seq100M"
This comment was marked as resolved.
This should be valid for all platforms since |
|
Are you sure the benchmark direction is correct? A ~14% improvement with a smaller buffer size seems unexpected. |
|
Rust always zero-fills the stack buffer.
|
|
You have one extra 0 in the head count, which masks the O(1) zero-fill overhead.
|
|
Set array-size-threshold to ~64 KiB. This is relatively large, but matches existing usage in some areas.
CodSpeed results are not reliable due to the absence of syscall measurement. In local benchmarks using hyperfine, a ~1% regression was observed: #11868 (comment).
https://rust-lang.github.io/rust-clippy/rust-1.95.0/index.html#large_stack_arrays
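As an illustration of what the lint targets, the sketch below shows the same newline-counting loop with a zero-initialized stack array and with a heap-allocated buffer. This is not the code changed in this PR: `BUF_SIZE`, `newlines`, `count_lines_stack`, and `count_lines_heap` are hypothetical names, and the 64 KiB size is assumed only to mirror the threshold mentioned in the description.

```rust
use std::io::{self, ErrorKind, Read};

/// Hypothetical buffer size (64 KiB), mirroring the threshold discussed above.
const BUF_SIZE: usize = 64 * 1024;

/// Counts newline bytes in `buf`.
fn newlines(buf: &[u8]) -> usize {
    buf.iter().filter(|&&b| b == b'\n').count()
}

/// Reads through a zero-initialized stack array. Local arrays larger than
/// the configured `array-size-threshold` are what `large_stack_arrays` reports.
fn count_lines_stack<R: Read>(mut reader: R) -> io::Result<usize> {
    let mut buf = [0u8; BUF_SIZE];
    let mut lines = 0;
    loop {
        match reader.read(&mut buf) {
            Ok(0) => return Ok(lines),
            Ok(n) => lines += newlines(&buf[..n]),
            // Retry reads that were interrupted by a signal.
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
}

/// Same loop with the buffer on the heap, which keeps the stack frame small
/// at the cost of one allocation.
fn count_lines_heap<R: Read>(mut reader: R) -> io::Result<usize> {
    let mut buf = vec![0u8; BUF_SIZE];
    let mut lines = 0;
    loop {
        match reader.read(&mut buf) {
            Ok(0) => return Ok(lines),
            Ok(n) => lines += newlines(&buf[..n]),
            Err(e) if e.kind() == ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
}
```

Both variants zero-fill the buffer once before reading, so that cost is fixed per call and does not grow with the input. Which variant is preferable for a given utility is a judgment call that depends on stack limits and measured performance.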