perf: Use Rich Status for batched spinner rendering instead of per-line updates#14026
perf: Use Rich Status for batched spinner rendering instead of per-line updates#14026KRRT7 wants to merge 13 commits into
Conversation
Move spinners.py and progress_bars.py functionality into a new unified ui.py module. This consolidates UI concerns and prepares for switching from manual spinner management to Rich's Status context manager for proper buffering and rate-limited rendering.
Add _detect_no_color() function to check --no-color flag and environment variables. In setup_logging(), detect if stderr is a TTY and auto-disable colors when writing to non-TTY streams (like in tests). Set force_terminal only when stderr is actually a TTY, preventing ANSI codes from appearing in test output while maintaining proper terminal rendering in interactive use.
Spinners are now managed through Rich's Status context manager rather than being passed to call_subprocess. Remove the spinner parameter and related code that updated the spinner on each line. This eliminates per-line spinner updates and lets Rich's Status handle rendering with proper batching and rate limiting, addressing the syscall and terminal write overhead of the old implementation.
Replace imports of spinners.py and progress_bars.py with ui module. Update build_env.py and subprocess.py to use status() context manager instead of open_spinner/open_rich_spinner. Add logger.info() call to status() function to ensure status messages (like 'Installing build dependencies') appear in output through the logging system rather than transient spinner display.
Remove the old spinners.py and progress_bars.py files now that their functionality has been consolidated into ui.py. Update test imports to use the new ui module instead of spinners module.
|
Hi @KRRT7, thank you for your contribution. Please let me know if you need any help fixing the failing tests. Also, please note that pip is a volunteer-driven project and reviewer capacity is limited, so it may take some time before a maintainer can take a look at your PR. |
|
Hey @sepehr-rs I'll open an issue shortly, though from what I saw, there's a few open PRs I could just link to that this closes. |
|
closes #14028 |
Great! Please update your news file to reflect the issue number you're closing. (e.g. |
|
@KRRT7 The news entry filename should be |
sepehr-rs
left a comment
There was a problem hiding this comment.
Hi, I benchmarked the spinner path under a TTY using runner_with_spinner_message() and a subprocess producing output for ~5 seconds.
In my measurements:
- main: 434 stdout writes, ~2 KB terminal output
- PR: 671 stdout writes, ~20 KB terminal output
- wall-clock time was effectively unchanged
Based on these results, I wasn't able to reproduce the performance improvements described in the PR description. Could you share the benchmark used to support those claims, or clarify whether they refer to a different code path (for example download progress bars rather than subprocess spinners)?
Separately, the PR appears to include a number of changes that don't seem directly related to the spinner/status refactor. Please consider reducing the scope or splitting out unrelated changes, as that would make the review process much easier.
|
those numbers don't make sense to me, what do your benchmarks look like? I'll share mine in a bit |
Sure. I benchmarked the I compared main and your PR using: strace -f -e write -o main.trace python benchmark.py
strace -f -e write -o pr.trace python benchmark.pyand then: grep 'write(1' main.trace | wc -l
grep 'write(1' pr.trace | wc -l
grep 'write(1' main.trace | awk -F',' '{print $3}' | awk '{sum += $1} END {print sum}'
grep 'write(1' pr.trace | awk -F',' '{print $3}' | awk '{sum += $1} END {print sum}'Results Results (after a re-run with a ~2.5s workload):
Wall-clock time was essentially identical (~2.7s). So for this workload I wasn't able to reproduce the performance improvements described in the PR. It's possible we're measuring different code paths, though, so I'd be interested in seeing the benchmark you used and comparing methodologies. For reference, here's the benchmark script I used: |
|
that'd be why: https://github.com/KRRT7/pip/tree/benchmarks-only/benchmarks you're not running it for a long enough time where it gets worse |
|
the workloads is closer to a longer-running install path where subprocess output stays active for much longer, and that's important bc he cost I’m trying to reduce is cumulative, The benchmarks are useful for isolating progress-rendering behavior, which is where the issue surfaces, rich internally has a more efficient way of doing it. |
in
pip/_internal/utils/subprocess.pypip currently reads the child process’s stdout one line at a time (proc.stdout.readline() inside a while True loop).Every call to readline() triggers a system call and a Python‑level buffer check. When the subprocess produces many small lines (e.g., downloading a large wheel), the loop runs thousands of times
The cumulative overhead of those syscalls and Python function calls dominates the wall‑clock time, even though the actual work (downloading files) is I/O‑bound.
The legacy open_spinner / InteractiveSpinner implementation writes raw back‑space characters to the terminal and rewrites the whole line for every spinner tick.
When showing_subprocess is false (the usual case for quiet installs), the spinner updates on every line of subprocess output, not on a timed interval. That means dozens or hundreds of terminal writes per second, each of which forces the terminal emulator to repaint the line. The repeated cursor‑hiding/showing logic (hidden_cursor) adds extra writes.
The original code used a custom _PipRichSpinner that wrapped Rich’s Live panel. It still performed the low‑level terminal writes itself rather than letting Rich’s own rendering engine batch updates. This prevented Rich from applying its own optimisations (e.g., double‑buffered drawing, minimal refresh rates).