Skip to content

tee: improve throughput by raw syscall#12131

Open
oech3 wants to merge 1 commit intouutils:mainfrom
oech3:tee-throughput
Open

tee: improve throughput by raw syscall#12131
oech3 wants to merge 1 commit intouutils:mainfrom
oech3:tee-throughput

Conversation

@oech3
Copy link
Copy Markdown
Contributor

@oech3 oech3 commented May 3, 2026

Remove flush overhead and use raw syscall for POSIX tee requirement.

$ truncate -s 1PB huge
$ gnu9.11/tee<huge|pv>/dev/null
7.13GiB 0:00:02 [3.61GiB/s]
$ target/release/tee<huge|pv>/dev/null
2.56GiB 0:00:01 [2.56GiB/s]
$ target/release/tee-raw<huge|pv>/dev/null
3.35GiB 0:00:01 [3.35GiB/s]

$ cat huge|gnu9.11/tee|pv>/dev/null
2.15GiB 0:00:01 [2.15GiB/s]
$ cat huge|target/release/tee|pv>/dev/null
2.31GiB 0:00:01 [2.31GiB/s]
$ cat huge|target/release/tee-raw|pv>/dev/null
3.00GiB 0:00:01 [3.00GiB/s]

I was really surprised that std::io has such overhead.

@oech3 oech3 marked this pull request as ready for review May 3, 2026 09:07
@sylvestre
Copy link
Copy Markdown
Contributor

is it possible to have a codspeed benchmark to evaluate the win?

@oech3

This comment was marked as resolved.

@sylvestre
Copy link
Copy Markdown
Contributor

i asked on discord

@oech3

This comment was marked as outdated.

@oech3

This comment was marked as off-topic.

Comment thread src/uu/tee/src/tee.rs Outdated
@oech3

This comment was marked as outdated.

@oech3 oech3 force-pushed the tee-throughput branch from 76b5121 to 6dc2d9f Compare May 3, 2026 14:06
@oech3

This comment was marked as outdated.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 3, 2026

GNU testsuite comparison:

Skip an intermittent issue tests/cut/bounded-memory (fails in this run but passes in the 'main' branch)
Skipping an intermittent issue tests/date/date-locale-hour (passes in this run but fails in the 'main' branch)

Comment thread src/uu/tee/src/tee.rs Outdated
@oech3 oech3 force-pushed the tee-throughput branch 2 times, most recently from 72a2adc to e81feb1 Compare May 5, 2026 11:11
@xtqqczze
Copy link
Copy Markdown
Contributor

xtqqczze commented May 5, 2026

The Write implementation is somewhat unintuitive; it could be removed in favor of a simpler approach with a single method:

fn write(&mut self, buf: &[u8]) -> Result<()>

Additionally, avoiding Write may be preferable if we plan to support a MaybeUninit buffer in the future.

@oech3

This comment was marked as resolved.

@oech3
Copy link
Copy Markdown
Contributor Author

oech3 commented May 5, 2026

This impl is because rustix::io::write does not support writing to stdout on Windows. If we can override write by similar way, it is preffered.

@xtqqczze
Copy link
Copy Markdown
Contributor

xtqqczze commented May 5, 2026

I still think its's confusing to implement Write on Unix, but not Windows...

@oech3

This comment was marked as resolved.

@xtqqczze
Copy link
Copy Markdown
Contributor

xtqqczze commented May 5, 2026

I was thinking something like this: b0edaae

@oech3
Copy link
Copy Markdown
Contributor Author

oech3 commented May 5, 2026

No. Because write syscall does not guarantee writing everything. Needs loop and take sum of returned values.
(also rustix::io::write_all is not provided...)

@oech3 oech3 force-pushed the tee-throughput branch from e81feb1 to 687a7cb Compare May 5, 2026 15:18
@xtqqczze
Copy link
Copy Markdown
Contributor

xtqqczze commented May 5, 2026

It would solve many problems if we could use std::io::stdio::StdoutRaw. Perhaps this functionality could be implemented as a crate.

@oech3
Copy link
Copy Markdown
Contributor Author

oech3 commented May 5, 2026

Private? Is it need to copy and paste unsafe code to uucore?

@xtqqczze
Copy link
Copy Markdown
Contributor

xtqqczze commented May 5, 2026

The issue we’re working around is that std::io::Stdout uses buffering, which doesn’t match our requirements. Ideally, we should provide our own Stdout implementation that performs unbuffered I/O instead. But that would be a much larger piece of work.

@oech3
Copy link
Copy Markdown
Contributor Author

oech3 commented May 5, 2026

@xtqqczze
Copy link
Copy Markdown
Contributor

xtqqczze commented May 5, 2026

We already have a solution in uucore::io::OwnedFileDescriptorOrHandle, see #10235.

@oech3

This comment was marked as resolved.

@oech3
Copy link
Copy Markdown
Contributor Author

oech3 commented May 5, 2026

We cannot use it #12157

@oech3 oech3 force-pushed the tee-throughput branch from 687a7cb to bcca96f Compare May 5, 2026 22:04
@codspeed-hq
Copy link
Copy Markdown

codspeed-hq Bot commented May 5, 2026

Merging this PR will improve performance by ×17

⚡ 1 improved benchmark
✅ 316 untouched benchmarks
⏩ 46 skipped benchmarks1

Performance Changes

Mode Benchmark BASE HEAD Efficiency
Simulation tee_stdin_file[10000000] 3,970.8 µs 232.8 µs ×17

Comparing oech3:tee-throughput (cfef3d7) with main (94b06d9)

Open in CodSpeed

Footnotes

  1. 46 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@oech3
Copy link
Copy Markdown
Contributor Author

oech3 commented May 5, 2026

+syscall time is still better

@oech3

This comment was marked as resolved.

@oech3 oech3 force-pushed the tee-throughput branch 2 times, most recently from fe34eec to fe4aba2 Compare May 7, 2026 01:48
@oech3
Copy link
Copy Markdown
Contributor Author

oech3 commented May 7, 2026

Windows example (with unsafe) is rust-lang/rust#58326 (comment) and rust-lang/rust#58326 (comment)

@oech3

This comment was marked as outdated.

@oech3 oech3 force-pushed the tee-throughput branch 3 times, most recently from fc1e944 to ce98ec5 Compare May 8, 2026 08:33
@xtqqczze
Copy link
Copy Markdown
Contributor

xtqqczze commented May 8, 2026

I would have expected uucore::io::write_all_raw to take a RawFd. Could you please rename it to write_all?

@oech3
Copy link
Copy Markdown
Contributor Author

oech3 commented May 8, 2026

raw means raw-syscall.

@oech3
Copy link
Copy Markdown
Contributor Author

oech3 commented May 8, 2026

Considering most API is buffered, I think write_all_unbuffered is the best effort.

@xtqqczze
Copy link
Copy Markdown
Contributor

xtqqczze commented May 8, 2026

raw means raw-syscall.

It would be better to move it to a new sys module.

@oech3
Copy link
Copy Markdown
Contributor Author

oech3 commented May 8, 2026

I think new module for 1 fn is overkill and causing confusion as it is clearly io related operation.

@xtqqczze
Copy link
Copy Markdown
Contributor

xtqqczze commented May 8, 2026

How naming it write_all_fd, and exposing it via an extension trait?

pub trait WriteFdExt: AsFd {
    fn write_all_fd(&self, buf: &[u8]) -> io::Result<()> { ... }
}

impl<T: AsFd + ?Sized> WriteFdExt for T {}

@oech3
Copy link
Copy Markdown
Contributor Author

oech3 commented May 8, 2026

Why? I'm OK to have it as a method. But I prefer write_all_unbuffered to clarify usecase.

@oech3
Copy link
Copy Markdown
Contributor Author

oech3 commented May 8, 2026

I'll rename it to something after several PRs depending on it were merged (to avoid CI breakage).

@xtqqczze
Copy link
Copy Markdown
Contributor

xtqqczze commented May 9, 2026

I'm trying a different approach with ManuallyDrop<File> in #12207

@oech3 oech3 force-pushed the tee-throughput branch from ce98ec5 to e8d7844 Compare May 10, 2026 13:18
@oech3
Copy link
Copy Markdown
Contributor Author

oech3 commented May 10, 2026

I minimized diff by sharing code with cat. #12207 can be closed.
I think we should wait std's change for Windows if it requires additional unsafe.

@sylvestre Would you review this? Benchmark is already enabled.
This PR blocks #11943 to make tee similar speed with RAM's bandwidth.

@oech3 oech3 force-pushed the tee-throughput branch from e8d7844 to cfef3d7 Compare May 10, 2026 14:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants