clippy: warn for large_stack_arrays lint by xtqqczze · Pull Request #11868 · uutils/coreutils

xtqqczze · 2026-04-16T20:26:30Z

Set array-size-threshold to ~64 KiB. This is relatively large, but matches existing usage in some areas.

CodSpeed results are not reliable due to the absence of syscall measurement. In local benchmarks using hyperfine, a ~1% regression was observed: #11868 (comment).

https://rust-lang.github.io/rust-clippy/rust-1.95.0/index.html#large_stack_arrays
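For readers unfamiliar with the lint, here is a minimal sketch of the kind of code it is concerned with. The function names and both constants are hypothetical and only mirror the sizes discussed in this PR (64 KiB and 256 KiB); whether a given array is flagged depends on the `array-size-threshold` value configured for clippy.

```rust
// Hypothetical sizes mirroring the ones discussed in this PR.
const SMALL: usize = 64 * 1024;
const LARGE: usize = 256 * 1024;

/// A fixed-size array local lives in the function's stack frame.
/// Arrays above the configured threshold are what the lint reports.
fn stack_buffer_len() -> usize {
    let buf = [0u8; SMALL];
    buf.len()
}

/// One common way to keep a large buffer off the stack is a heap allocation.
fn heap_buffer_len() -> usize {
    let buf = vec![0u8; LARGE];
    buf.len()
}
```

Moving a buffer to the heap is only one possible response to the lint; shrinking the buffer, as this PR does, is another.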

github-actions · 2026-04-16T20:36:45Z

GNU testsuite comparison:

Skip an intermittent issue tests/pr/bounded-memory (fails in this run but passes in the 'main' branch)
Skip an intermittent issue tests/tail/symlink (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/date/date-locale-hour (passes in this run but fails in the 'main' branch)
Skipping an intermittent issue tests/rm/isatty (passes in this run but fails in the 'main' branch)
Note: The gnu test tests/tail/tail-n0f is now being skipped but was previously passing.
Congrats! The gnu test tests/expand/bounded-memory is now passing!
Congrats! The gnu test tests/seq/seq-epipe is now passing!
Congrats! The gnu test tests/tail/pipe-f is now passing!
Note: The gnu test tests/env/env-signal-handler was skipped on 'main' but is now failing.

codspeed-hq · 2026-04-16T20:51:16Z

Merging this PR will improve performance by 27.43%

⚡ 4 improved benchmarks
✅ 305 untouched benchmarks
⏩ 46 skipped benchmarks¹

Performance Changes

	Mode	Benchmark	`BASE`	`HEAD`	Efficiency
⚡	Simulation	`wc_chars_large_line_count[100000]`	908.4 µs	712.9 µs	+27.43%
⚡	Simulation	`wc_lines_variable_length[(50, 500)]`	3.3 ms	3.2 ms	+5.66%
⚡	Simulation	`wc_lines_large_line_count[500000]`	2.8 ms	2.6 ms	+7%
⚡	Simulation	`wc_lines_extreme_line_lengths[(100000, 200)]`	1.6 ms	1.4 ms	+13.41%

Comparing xtqqczze:clippy/large_stack_arrays (e9cb33d) with main (01b7177)

¹ 46 benchmarks were skipped, so the baseline results were used instead.

xtqqczze · 2026-04-16T22:08:50Z

⚡ Simulation wc_chars_large_line_count[100000] 913.3 µs 713.7 µs +27.97%
⚡ Simulation wc_bytes_synthetic[500] 86.1 µs 82.4 µs +4.56%
⚡ Simulation wc_lines_extreme_line_lengths[(100000, 200)] 1.6 ms 1.4 ms +13.72%
⚡ Simulation wc_lines_variable_length[(50, 500)] 3.3 ms 3.2 ms +5.79%
⚡ Simulation wc_bytes_synthetic[1] 86.7 µs 82.9 µs +4.6%
⚡ Simulation wc_lines_large_line_count[500000] 2.8 ms 2.6 ms +7.15%

@drinkcat Interesting that we see improvements for wc when decreasing buffer size from 256 KiB to 64 KiB, after it was increased in #7934.

drinkcat · 2026-04-17T02:09:31Z

I see this when clicking the links. I'm not sure I understand how the simulation environment works (I haven't contributed to coreutils in a... while):

This benchmark shows a significant performance change and was compared across different runtime environments. Results may be affected.
AMD EPYC 9V74 80-Core Processor → AMD EPYC 7763 64-Core Processor

If this is a simulation, I'd be curious to see numbers on real hardware. The fact that syscalls are not modeled is a concern too: the main advantage of a larger buffer should be fewer system calls.
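To make the buffer-size versus syscall-count relationship concrete, here is a minimal sketch that counts `read` calls over an in-memory input. `CountingReader` and `reads_needed` are hypothetical names made up for illustration, not part of uutils. An in-memory slice always fills the buffer, whereas a real file or pipe may return short reads, so the numbers are a lower bound rather than a measurement.

```rust
use std::io::{self, Read};

/// Wraps a reader and counts how many times `read` is called on it.
struct CountingReader<R> {
    inner: R,
    calls: u64,
}

impl<R: Read> Read for CountingReader<R> {
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        self.calls += 1;
        self.inner.read(buf)
    }
}

/// Drains `data` through a buffer of `buf_size` bytes and returns the number
/// of `read` calls, including the final call that reports end of input.
fn reads_needed(data: &[u8], buf_size: usize) -> io::Result<u64> {
    let mut reader = CountingReader { inner: data, calls: 0 };
    let mut buf = vec![0u8; buf_size];
    while reader.read(&mut buf)? != 0 {}
    Ok(reader.calls)
}
```

For a 1 MiB input this gives 17 calls with a 64 KiB buffer and 5 with a 256 KiB buffer, which is the kind of difference an instrument that excludes syscalls would not account for.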

oech3 · 2026-04-17T03:18:41Z

Is it easy to change the config? There is a warning about syscalls:

This benchmark contains * system calls, totalling * of execution time. Since they cannot be consistently instrumented, those calls are not included in the measure. Please switch to the Walltime instrument to accurately measure system calls. Learn more about measurement and system calls.

xtqqczze · 2026-04-18T17:40:47Z

@oech3 I did see a small ~1% regression locally (though statistical outliers were detected, as I don't have taskset available):

$ seq 10000000000000 1 inf | head -n 100000000 > seq100M
$ cargo build -r -p uu_wc && hyperfine --warmup 10 -L wc ./wc.main,target/release/wc "{wc} -l seq100M"

https://rust-lang.github.io/rust-clippy/rust-1.95.0/index.html#large_stack_arrays

xtqqczze · 2026-04-18T20:29:18Z

@drinkcat CodSpeed still reports a significant performance difference even with a consistent runtime environment. However, the size of the improvement doesn’t seem credible, so I think you’re right that CodSpeed is likely flawed due to not modeling syscalls.

xtqqczze · 2026-04-18T20:49:03Z

⚡ Simulation wc_chars_large_line_count[100000] 908.4 µs 712.9 µs +27.43%

@oech3 This benchmark had 139 system calls with e9cb33d but only 41 before, so the reported 27.43% improvement is not meaningful due to missing instrumentation.

cakebaker · 2026-04-19T12:50:44Z

Thanks!

oech3 · 2026-04-19T13:27:57Z

wc_bytes_synthetic

wc's byte count does not use a buffer on Linux, since we splice() to /dev/full. So we should avoid the allocation on the splice fast path.

oech3 · 2026-04-19T13:47:43Z

If wc reads the size from metadata in most cases, it might be better to always avoid the stack buffer.
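A minimal sketch of the metadata idea, assuming a regular file whose reported length can be trusted. `size_from_metadata` is a hypothetical helper, not the function uutils uses. Some special files (for example under /proc) can report a length that does not match their content, so a real implementation would still need a read-based fallback.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns the byte count from file metadata when `path` is a regular file,
/// or `None` when the caller should fall back to reading the input.
fn size_from_metadata(path: &Path) -> io::Result<Option<u64>> {
    let meta = fs::metadata(path)?;
    Ok(meta.is_file().then(|| meta.len()))
}
```

On this path no read buffer is needed at all, which is the point of the suggestion above.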

xtqqczze · 2026-04-19T14:50:28Z

GNU allocates on stack with IO_BUFSIZE = 256 * 1024, so we could look at increasing array-size-threshold if justified by performance. However, CodSpeed is not reliable for evaluating allocation changes.

xtqqczze · 2026-04-19T16:01:15Z

@oech3 Would you run this benchmark on your system to double-check performance? I had statistical outliers on macOS (taskset was not available):

$ seq 10000000000000 1 inf | head -n 100000000 > seq100M
$ taskset -c 0 hyperfine --warmup 10 -L wc ./wc.40070cadf,/wc.4704caee0 "{wc} -l seq100M"

oech3 · 2026-04-19T16:13:26Z

$ seq 10000000000000 1 inf | head -n 10000000 > s
$ taskset -c 0 hyperfine -N --warmup 10 "target/release/wc64 -l s" "target/release/wc256 -l s"
Benchmark 1: target/release/wc64 -l s
  Time (mean ± σ):      20.3 ms ±   1.0 ms    [User: 4.4 ms, System: 15.5 ms]
  Range (min … max):    19.7 ms …  28.6 ms    148 runs 
Benchmark 2: target/release/wc256 -l s
  Time (mean ± σ):      23.1 ms ±   0.5 ms    [User: 5.7 ms, System: 17.0 ms]
  Range (min … max):    22.5 ms …  26.5 ms    129 runs 
Summary
  target/release/wc64 -l s ran
    1.14 ± 0.06 times faster than target/release/wc256 -l s

This should be valid for all platforms since -l cannot use splice().

xtqqczze · 2026-04-19T16:22:00Z

Are you sure the benchmark direction is correct? A ~14% improvement with a smaller buffer size seems unexpected.

oech3 · 2026-04-19T16:23:21Z

Rust always zero-fills the stack buffer.
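A minimal sketch of where that zero-fill happens in a line-counting loop. `count_lines` and `BUF_SIZE` are hypothetical and this is not the uutils implementation; it only shows that safe Rust needs the array initialized before `read` can write into it, so the initialization cost is paid once per call regardless of input size.

```rust
use std::io::{self, Read};

// Hypothetical buffer size; the thread compares 64 KiB against 256 KiB.
const BUF_SIZE: usize = 64 * 1024;

/// Counts b'\n' bytes in `reader` using a fixed-size stack buffer.
fn count_lines<R: Read>(mut reader: R) -> io::Result<u64> {
    // The whole array is written with zeros here, before any input is read.
    let mut buf = [0u8; BUF_SIZE];
    let mut lines = 0u64;
    loop {
        let n = match reader.read(&mut buf) {
            Ok(0) => return Ok(lines),
            Ok(n) => n,
            // Retry if the read was interrupted by a signal.
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        lines += buf[..n].iter().filter(|&&b| b == b'\n').count() as u64;
    }
}
```

Because the fill is a fixed cost, it weighs more on short runs than on long ones.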

xtqqczze · 2026-04-19T16:40:18Z

$ seq 10000000000000 1 inf | head -n 100000000 > s
$ hyperfine -N --warmup 10 "target/release/wc64 -l s" "target/release/wc256 -l s"
Benchmark 1: target/release/wc64 -l s
  Time (mean ± σ):      69.4 ms ±   0.5 ms    [User: 11.2 ms, System: 57.8 ms]
  Range (min … max):    68.1 ms …  71.3 ms    43 runs
 
Benchmark 2: target/release/wc256 -l s
  Time (mean ± σ):      68.5 ms ±   0.2 ms    [User: 12.1 ms, System: 56.0 ms]
  Range (min … max):    68.1 ms …  69.4 ms    43 runs
 
Summary
  target/release/wc256 -l s ran
    1.01 ± 0.01 times faster than target/release/wc64 -l s

oech3 · 2026-04-19T17:11:58Z

Your head count has one more zero than mine (100000000 vs. 10000000 lines), so the longer run masks the O(1) zero-fill overhead.

oech3 · 2026-04-20T06:37:46Z

wc -c does not need to read the contents of the buffer to count bytes. In that case, it might be possible to avoid the zero-fill completely, without nightly or unsafe, even on platforms other than Linux.
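One possible way to sketch this in stable, safe Rust is to let the standard library drain the reader into a sink and return the byte total. `count_bytes` is a hypothetical helper, not code from this PR. The assumption here is that `std::io::copy` manages its own internal buffer, so the caller never allocates or zero-fills one; how std initializes that buffer, and how large it is, are implementation details that would need checking (and benchmarking) before relying on this.

```rust
use std::io::{self, Read};

/// Counts the bytes in `reader` without declaring a buffer in user code.
/// `io::copy` returns the total number of bytes it transferred.
fn count_bytes<R: Read>(mut reader: R) -> io::Result<u64> {
    io::copy(&mut reader, &mut io::sink())
}
```

If std's internal buffer is smaller than 64 KiB, this would trade the zero-fill for more read calls, so it is a starting point for measurement rather than a recommendation.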

xtqqczze mentioned this pull request Apr 16, 2026

wc: Speed optimization #7934

Merged

xtqqczze force-pushed the clippy/large_stack_arrays branch from 4edfb02 to 10bb935 Compare April 16, 2026 21:28

xtqqczze marked this pull request as draft April 16, 2026 21:29

xtqqczze force-pushed the clippy/large_stack_arrays branch from 10bb935 to 5a39ee5 Compare April 16, 2026 21:31

xtqqczze force-pushed the clippy/large_stack_arrays branch 2 times, most recently from 5c989de to b55bfc5 Compare April 18, 2026 17:40

xtqqczze force-pushed the clippy/large_stack_arrays branch 2 times, most recently from 0419264 to bd96bbd Compare April 18, 2026 18:03

xtqqczze added 2 commits April 18, 2026 20:52

clippy: warn for large_stack_arrays lint

d1301d9

https://rust-lang.github.io/rust-clippy/rust-1.95.0/index.html#large_stack_arrays

clippy: lower array-size-threshold to ~64 KiB

e9cb33d

xtqqczze force-pushed the clippy/large_stack_arrays branch from bd96bbd to e9cb33d Compare April 18, 2026 20:07

xtqqczze marked this pull request as ready for review April 18, 2026 20:53

cakebaker merged commit 4704cae into uutils:main Apr 19, 2026
170 checks passed

xtqqczze deleted the clippy/large_stack_arrays branch April 19, 2026 16:01
