Skip to content

perf: Use Rich Status for batched spinner rendering instead of per-line updates#14026

Open
KRRT7 wants to merge 13 commits into
pypa:mainfrom
KRRT7:ui-refactor-perf
Open

perf: Use Rich Status for batched spinner rendering instead of per-line updates#14026
KRRT7 wants to merge 13 commits into
pypa:mainfrom
KRRT7:ui-refactor-perf

Conversation

@KRRT7
Copy link
Copy Markdown
Contributor

@KRRT7 KRRT7 commented Jun 2, 2026

in pip/_internal/utils/subprocess.py pip currently reads the child process’s stdout one line at a time (proc.stdout.readline() inside a while True loop).
Every call to readline() triggers a system call and a Python‑level buffer check. When the subprocess produces many small lines (e.g., downloading a large wheel), the loop runs thousands of times

The cumulative overhead of those syscalls and Python function calls dominates the wall‑clock time, even though the actual work (downloading files) is I/O‑bound.

The legacy open_spinner / InteractiveSpinner implementation writes raw back‑space characters to the terminal and rewrites the whole line for every spinner tick.
When showing_subprocess is false (the usual case for quiet installs), the spinner updates on every line of subprocess output, not on a timed interval. That means dozens or hundreds of terminal writes per second, each of which forces the terminal emulator to repaint the line. The repeated cursor‑hiding/showing logic (hidden_cursor) adds extra writes.

The original code used a custom _PipRichSpinner that wrapped Rich’s Live panel. It still performed the low‑level terminal writes itself rather than letting Rich’s own rendering engine batch updates. This prevented Rich from applying its own optimisations (e.g., double‑buffered drawing, minimal refresh rates).

KRRT7 added 5 commits June 2, 2026 04:40
Move spinners.py and progress_bars.py functionality into a new unified
ui.py module. This consolidates UI concerns and prepares for switching
from manual spinner management to Rich's Status context manager for
proper buffering and rate-limited rendering.
Add _detect_no_color() function to check --no-color flag and environment
variables. In setup_logging(), detect if stderr is a TTY and auto-disable
colors when writing to non-TTY streams (like in tests).

Set force_terminal only when stderr is actually a TTY, preventing ANSI
codes from appearing in test output while maintaining proper terminal
rendering in interactive use.
Spinners are now managed through Rich's Status context manager rather
than being passed to call_subprocess. Remove the spinner parameter and
related code that updated the spinner on each line.

This eliminates per-line spinner updates and lets Rich's Status handle
rendering with proper batching and rate limiting, addressing the
syscall and terminal write overhead of the old implementation.
Replace imports of spinners.py and progress_bars.py with ui module.
Update build_env.py and subprocess.py to use status() context manager
instead of open_spinner/open_rich_spinner.

Add logger.info() call to status() function to ensure status messages
(like 'Installing build dependencies') appear in output through the
logging system rather than transient spinner display.
Remove the old spinners.py and progress_bars.py files now that their
functionality has been consolidated into ui.py.

Update test imports to use the new ui module instead of spinners module.
@sepehr-rs
Copy link
Copy Markdown
Member

Hi @KRRT7, thank you for your contribution.
Could you first open an issue for this so we can discuss it there? This PR is also missing a news entry. You can read more about news entries here.

Please let me know if you need any help fixing the failing tests. Also, please note that pip is a volunteer-driven project and reviewer capacity is limited, so it may take some time before a maintainer can take a look at your PR.

@KRRT7
Copy link
Copy Markdown
Contributor Author

KRRT7 commented Jun 2, 2026

Hey @sepehr-rs I'll open an issue shortly, though from what I saw, there's a few open PRs I could just link to that this closes.

@KRRT7
Copy link
Copy Markdown
Contributor Author

KRRT7 commented Jun 2, 2026

closes #14028

@sepehr-rs
Copy link
Copy Markdown
Member

closes #14028

Great! Please update your news file to reflect the issue number you're closing. (e.g. 14028.feature.rst)

@sepehr-rs
Copy link
Copy Markdown
Member

@KRRT7 The news entry filename should be 14028.feature.rst, not 14028.ui-refactor-perf.feature.rst.

Copy link
Copy Markdown
Member

@sepehr-rs sepehr-rs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I benchmarked the spinner path under a TTY using runner_with_spinner_message() and a subprocess producing output for ~5 seconds.

In my measurements:

  • main: 434 stdout writes, ~2 KB terminal output
  • PR: 671 stdout writes, ~20 KB terminal output
  • wall-clock time was effectively unchanged

Based on these results, I wasn't able to reproduce the performance improvements described in the PR description. Could you share the benchmark used to support those claims, or clarify whether they refer to a different code path (for example download progress bars rather than subprocess spinners)?

Separately, the PR appears to include a number of changes that don't seem directly related to the spinner/status refactor. Please consider reducing the scope or splitting out unrelated changes, as that would make the review process much easier.

@KRRT7
Copy link
Copy Markdown
Contributor Author

KRRT7 commented Jun 2, 2026

those numbers don't make sense to me, what do your benchmarks look like? I'll share mine in a bit

@sepehr-rs
Copy link
Copy Markdown
Member

sepehr-rs commented Jun 2, 2026

those numbers don't make sense to me, what do your benchmarks look like? I'll share mine in a bit

Sure. I benchmarked the runner_with_spinner_message() path under a TTY with INFO-level logging enabled, using a child process that emits 500 lines to stderr over ~2.5 seconds and running it 10 times.

I compared main and your PR using:

strace -f -e write -o main.trace python benchmark.py
strace -f -e write -o pr.trace python benchmark.py

and then:

grep 'write(1' main.trace | wc -l
grep 'write(1' pr.trace | wc -l

grep 'write(1' main.trace | awk -F',' '{print $3}' | awk '{sum += $1} END {print sum}'
grep 'write(1' pr.trace   | awk -F',' '{print $3}' | awk '{sum += $1} END {print sum}'

Results Results (after a re-run with a ~2.5s workload):

  • main: 236 stdout writes, 1259 bytes
  • PR: 356 stdout writes, 10449 bytes

Wall-clock time was essentially identical (~2.7s).

So for this workload I wasn't able to reproduce the performance improvements described in the PR. It's possible we're measuring different code paths, though, so I'd be interested in seeing the benchmark you used and comparing methodologies.

For reference, here's the benchmark script I used:
https://gist.github.com/sepehr-rs/791e2f13870d5b79b0fac9a59011749a

@KRRT7
Copy link
Copy Markdown
Contributor Author

KRRT7 commented Jun 2, 2026

that'd be why:

https://github.com/KRRT7/pip/tree/benchmarks-only/benchmarks

you're not running it for a long enough time where it gets worse

@KRRT7
Copy link
Copy Markdown
Contributor Author

KRRT7 commented Jun 2, 2026

the workloads is closer to a longer-running install path where subprocess output stays active for much longer, and that's important bc he cost I’m trying to reduce is cumulative, The benchmarks are useful for isolating progress-rendering behavior, which is where the issue surfaces, rich internally has a more efficient way of doing it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants