feat(perf-monitor): aggregate CPU/RAM across descendant processes #1519
shm11C3 wants to merge 2 commits into
Conversation
Apps that spawn helper or renderer processes (e.g. Tauri's WebView2 host) put the bulk of their resource usage in children, so sampling only the parent PID under-reported CPU and RSS. Refresh all processes each tick and sum the metrics over the parent and its full descendant tree, with a visited-set guard so a pathological parent chain can't double-count or loop. https://claude.ai/code/session_01XnYxTyEBg9CarpcTbEArjn
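The visited-set guard described above can be sketched as follows. This is an illustrative stand-alone version, not the actual perf-monitor code: the `sum_subtree` name comes from the PR summary, but the plain-map inputs (`pid -> (cpu_percent, rss_bytes)`) and the exact signature are assumptions for the sketch.

```rust
use std::collections::{HashMap, HashSet};

// Cycle-safe DFS over the parent→children map: each PID is counted at most
// once, so PID reuse or an inconsistent snapshot cannot double-count or loop.
fn sum_subtree(
    root: u32,
    children: &HashMap<u32, Vec<u32>>,
    metrics: &HashMap<u32, (f32, u64)>,
) -> (f32, u64) {
    let mut visited = HashSet::new();
    let mut stack = vec![root];
    let (mut cpu, mut rss) = (0.0_f32, 0_u64);
    while let Some(pid) = stack.pop() {
        // insert() returns false if the PID was already visited.
        if !visited.insert(pid) {
            continue;
        }
        if let Some(&(c, r)) = metrics.get(&pid) {
            cpu += c;
            rss += r;
        }
        if let Some(kids) = children.get(&pid) {
            stack.extend(kids.iter().copied());
        }
    }
    (cpu, rss)
}

fn main() {
    // Parent 1 spawns 2 and 3; 3 spawns 4. A bogus edge 4 -> 1 forms a cycle.
    let children: HashMap<u32, Vec<u32>> =
        [(1, vec![2, 3]), (3, vec![4]), (4, vec![1])].into();
    let metrics: HashMap<u32, (f32, u64)> =
        [(1, (10.0, 100)), (2, (5.0, 50)), (3, (2.5, 25)), (4, (1.0, 10))].into();
    let (cpu, rss) = sum_subtree(1, &children, &metrics);
    println!("cpu={cpu} rss={rss}"); // each PID counted once despite the cycle
}
```

A sibling tree rooted elsewhere is naturally excluded because the DFS only follows edges reachable from the target PID.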
📝 Walkthrough
Aggregates CPU and RSS metrics across the target process and its descendants each tick, records per-PID samples, computes per-process avg/max/p95 CPU and memory, surfaces per-process breakdowns and failure reasons in text/JSON reports, and updates docs and CLI error text.
Changes: Process Tree Aggregation
Estimated Code Review Effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks | ✅ 4 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@perf-monitor/src/monitor.rs`:
- Around line 90-93: The branch handling aggregate_descendants returning None
currently discards process exit diagnostics and returns a generic error; change
it to re-check the child's try_wait (call the same logic used earlier) and, if
the child has exited, call format_exit_error to produce a detailed error
(including exit status/stderr) instead of the generic message; otherwise
propagate the original "process disappeared" error — update the match arm around
aggregate_descendants(&system, pid, num_cpus) to perform the try_wait check and
delegate to format_exit_error when appropriate so exit diagnostics are
preserved.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 7d29325f-1a22-4016-9ffb-4286720278cc
📒 Files selected for processing (2)
- perf-monitor/README.md
- perf-monitor/src/monitor.rs
```rust
match aggregate_descendants(&system, pid, num_cpus) {
    Some(metrics) => samples.push(metrics),
    None => return Err("Process disappeared during measurement".into()),
}
```
Preserve exit diagnostics when aggregation returns None.
If the process exits between the earlier try_wait and aggregate_descendants, this path drops exit status/stderr details and returns only a generic message. Re-checking try_wait here and delegating to format_exit_error improves debuggability.
Suggested patch

```diff
- match aggregate_descendants(&system, pid, num_cpus) {
-     Some(metrics) => samples.push(metrics),
-     None => return Err("Process disappeared during measurement".into()),
- }
+ match aggregate_descendants(&system, pid, num_cpus) {
+     Some(metrics) => samples.push(metrics),
+     None => {
+         if guard.0.try_wait()?.is_some() {
+             return Err(format_exit_error(&mut guard, "measurement"));
+         }
+         return Err("Process disappeared during measurement".into());
+     }
+ }
```
Pull request overview
This change extends perf-monitor's measurement target from "parent PID only" to "parent + all descendant processes", so that CPU/RSS is no longer under-reported for multi-process setups such as Tauri (WebView2).
Changes:
- Switch `refresh_processes` to `ProcessesToUpdate::All` so descendant processes spawned during measurement are also observed continuously
- Add `aggregate_descendants` / `sum_subtree`, building a parent-child map and summing CPU/RSS via a cycle-safe DFS
- Add unit tests for the descendant aggregation logic and update the Metrics description in the README
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| perf-monitor/src/monitor.rs | Adds CPU/RSS aggregation logic over the whole process tree, switches refresh to All, adds unit tests |
| perf-monitor/README.md | Documents that Metrics are now summed over the parent plus its descendant processes, and the rationale |
Surfaces what's actually consuming CPU/RSS so that GHA logs are diagnosable at a glance:
- Aggregate tree stats are now followed by a Per-Process Breakdown block (one entry per PID observed during the measurement window, with sample_count, avg/max/P95 CPU, and avg/max/P95 RSS). The root process is listed first and tagged "(root)"; descendants follow in descending order of average CPU.
- A Failure Reasons section enumerates exactly which thresholds were exceeded (and by how much) when the run fails, instead of leaving CI to diff the numbers against the config.
- The JSON output mirrors both: new "per_process" array and "failure_reasons" array.
- A pre-sampling abort (binary crashed during warmup, etc.) now prints under a "=== Performance Test Failed ===" banner with the captured child stderr so the cause is visible at the tail of the log.
https://claude.ai/code/session_01XnYxTyEBg9CarpcTbEArjn
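The avg/max/P95 rollup described above can be sketched like this. The `rollup` name and the nearest-rank P95 method are assumptions for illustration; the PR's actual statistics code lives in perf-monitor/src/report.rs and is not shown here.

```rust
// Given one CPU (or RSS) sample per tick for a PID, compute avg / max / p95.
// Returns None for an empty sample set (a PID that was never observed).
fn rollup(samples: &[f64]) -> Option<(f64, f64, f64)> {
    if samples.is_empty() {
        return None;
    }
    let avg = samples.iter().sum::<f64>() / samples.len() as f64;
    let max = samples.iter().cloned().fold(f64::MIN, f64::max);
    let mut sorted = samples.to_vec();
    sorted.sort_by(|a, b| a.partial_cmp(b).unwrap());
    // Nearest-rank P95: the ceil(0.95 * n)-th value, 1-based.
    let rank = ((0.95 * sorted.len() as f64).ceil() as usize).max(1);
    let p95 = sorted[rank - 1];
    Some((avg, max, p95))
}

fn main() {
    let samples: Vec<f64> = (1..=100).map(f64::from).collect();
    let (avg, max, p95) = rollup(&samples).unwrap();
    println!("avg={avg} max={max} p95={p95}"); // avg=50.5 max=100 p95=95
}
```

Sorting a copy keeps the original tick order intact, which matters if the same buffer also feeds a time-series view.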
Actionable comments posted: 1
♻️ Duplicate comments (1)
perf-monitor/src/monitor.rs (1)
126-129: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win
Preserve exit diagnostics when subtree capture fails
At Line 128, returning a generic error can drop useful exit status/stderr if the child exited between checks. Re-check `try_wait` here and delegate to `format_exit_error` before the fallback message.
Suggested patch

```diff
- let snapshots = match capture_subtree(&system, pid) {
-     Some(s) => s,
-     None => return Err("Process disappeared during measurement".into()),
- };
+ let snapshots = match capture_subtree(&system, pid) {
+     Some(s) => s,
+     None => {
+         if guard.0.try_wait()?.is_some() {
+             return Err(format_exit_error(&mut guard, "measurement"));
+         }
+         return Err("Process disappeared during measurement".into());
+     }
+ };
```

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate.

In `@perf-monitor/src/monitor.rs` around lines 126 - 129: When capture_subtree(&system, pid) returns None in monitor.rs, re-check the child process status by calling try_wait on the child/process handle and, if it indicates the child exited, call format_exit_error to produce a detailed error (including exit status/stderr) and return that; only fall back to the generic "Process disappeared during measurement" message if try_wait shows the child is still running or no exit info is available. Locate the failing branch around capture_subtree, use the same PID/child handle used elsewhere for try_wait, and replace the plain Err(...) return with logic that delegates to format_exit_error when try_wait reports an exit.
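The `try_wait` re-check the review asks for can be sketched with the standard library alone. This is an illustrative stand-in, not the crate's code: `format_exit_error` here is a hypothetical version of the helper the review names, and the real one also takes the process guard rather than a bare status.

```rust
use std::io::Read;
use std::process::{Command, Stdio};
use std::thread::sleep;
use std::time::Duration;

// Stand-in for the crate's format_exit_error: fold exit status and captured
// stderr into one detailed message instead of a generic "disappeared" error.
fn format_exit_error(status: std::process::ExitStatus, stderr: &str) -> String {
    format!("child exited early (status: {status}); stderr: {}", stderr.trim())
}

fn main() -> std::io::Result<()> {
    let mut child = Command::new("sh")
        .args(["-c", "echo boom >&2; exit 3"])
        .stderr(Stdio::piped())
        .spawn()?;

    // Suppose a sampling tick just failed to find the PID in the snapshot.
    // try_wait() is non-blocking: Some(status) means the child really exited,
    // so we can surface its status and stderr; None means it is still running.
    let msg = loop {
        match child.try_wait()? {
            Some(status) => {
                let mut stderr = String::new();
                if let Some(pipe) = child.stderr.as_mut() {
                    pipe.read_to_string(&mut stderr)?;
                }
                break format_exit_error(status, &stderr);
            }
            None => sleep(Duration::from_millis(10)), // keep polling briefly
        }
    };
    println!("{msg}");
    Ok(())
}
```

In the real code the fallback branch would still return "Process disappeared during measurement" when `try_wait` reports the child as running, matching the suggested patch above.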
🧹 Nitpick comments (1)
perf-monitor/src/main.rs (1)
75-78: ⚡ Quick win
Use project logging macros in the new failure path
These new direct `eprintln!` calls should use the repository logging macros for consistency with the Rust logging policy. As per coding guidelines, "Use logging macros (`log_debug!`, `log_info!`, `log_warn!`, `log_error!`) from `core/src/utils/logger.rs` instead of direct logging calls."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@perf-monitor/src/main.rs` around lines 75 - 78, The new direct eprintln! calls should be replaced with the project's logging macros: use log_error! (from core/src/utils/logger.rs) instead of eprintln! for the failure banner and reason/details; e.g. replace the three eprintln! invocations (the "=== Performance Test Failed ===", "Failure reason: ..." and "Details: {e}") with appropriate log_error! calls that include the same messages and the error variable {e} so logs follow the repository logging policy.
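For illustration only: the real macros live in `core/src/utils/logger.rs` and their definitions are not shown in this PR, so the following is a hypothetical minimal shape of what routing the failure path through a `log_error!`-style macro could look like.

```rust
// Hypothetical helper: builds the prefixed line so it can be unit-tested
// separately from the printing side effect.
fn error_line(msg: &str) -> String {
    format!("[ERROR] {msg}")
}

// Hypothetical minimal log_error!: same call shape as eprintln!, but every
// message goes through one place that applies the repository's log format.
macro_rules! log_error {
    ($($arg:tt)*) => {
        eprintln!("{}", error_line(&format!($($arg)*)))
    };
}

fn main() {
    let e = "Process disappeared during measurement";
    log_error!("=== Performance Test Failed ===");
    log_error!("Details: {e}");
}
```

The point of the nitpick is exactly this indirection: once the failure banner goes through the project macro, log level, formatting, and any future sink changes apply to it automatically.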
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@perf-monitor/src/main.rs`:
- Around line 77-78: The failure message printed by the eprintln! call is too
specific ("monitoring aborted before metrics could be collected") and can be
misleading for mid-run aborts; update the first eprintln! invocation used in
main (the eprintln!("Failure reason: ...") line) to neutral wording such as
"Failure reason: monitoring aborted before completion" (leave the separate
eprintln!("Details: {e}") unchanged so the actual error is still logged).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 968b52b3-e435-4ef4-9216-a38e1c05695f
📒 Files selected for processing (4)
- perf-monitor/README.md
- perf-monitor/src/main.rs
- perf-monitor/src/monitor.rs
- perf-monitor/src/report.rs
```rust
eprintln!("Failure reason: monitoring aborted before metrics could be collected");
eprintln!("Details: {e}");
```
Failure reason text can be misleading for mid-run aborts
At Line 77, the message says metrics could not be collected, but aborts can also happen after some ticks. Use neutral wording (e.g., “monitoring aborted before completion”) to avoid misleading CI triage.
Summary
- Switch `refresh_processes` from `Some(&[pid])` to `All` so children spawned mid-measurement are also picked up. On each tick we build a parent→children adjacency map and DFS from the root PID to sum metrics.
- `sum_subtree(root, children, metrics)` takes a `visited` set that guards against cycles caused by PID reuse / inconsistent snapshots.
Changes
- `perf-monitor/src/monitor.rs`
  - `aggregate_descendants`: builds per-PID metrics and a parent→children map from the `System` snapshot, then returns a `ProcessMetrics` summed over the subtree rooted at the target PID.
  - `sum_subtree`: pure, cycle-safe DFS that totals CPU and RSS.
  - Process refresh uses `ProcessesToUpdate::All` now.
- `perf-monitor/README.md`
Test plan
- `cargo test --manifest-path perf-monitor/Cargo.toml` — 13/13 PASS, including 5 new unit tests for the descendant aggregation (root only, multi-level descendants, sibling tree excluded, cyclic parent chain, missing root).
- `cargo clippy --manifest-path perf-monitor/Cargo.toml --all-targets -- -D warnings` — clean.
- `cargo build --manifest-path perf-monitor/Cargo.toml` — succeeds.
- Run the `perf-test` CI workflow on Windows / Linux / macOS to confirm the new aggregated values do not unexpectedly trip thresholds; adjust `perf-thresholds.toml` in a follow-up PR if needed.
https://claude.ai/code/session_01XnYxTyEBg9CarpcTbEArjn
Summary by CodeRabbit
- Documentation
- New Features
- Bug Fixes
- Tests