Optimize iTermProcessCache with coalesced incremental refresh#574
Optimize iTermProcessCache with coalesced incremental refresh#574chall37 wants to merge 9 commits into
Conversation
Major refactoring to reduce CPU usage from process monitoring with many tabs: - Add global coalescer using DISPATCH_SOURCE_TYPE_DATA_OR to merge concurrent monitor events into batched refresh cycles - Implement collection-first refresh that walks existing process collection instead of enumerating all PIDs via syscalls - Suspend monitors for background tabs; only foreground tabs get fast updates - Add epoch-based tracking to detect stale cache entries - Preserve dirty PID signaling for session title/status updates - Only remove cache entries when kill(pid,0) confirms process is dead, avoiding transient nil results from collection staleness Supporting changes: - iTermProcessMonitor: Add trackedRootPID property and pause/resume methods - iTermLSOF: Add childPidsForPid: using proc_listchildpids with retry loop - PseudoTerminal: Call setForegroundRootPIDs: on tab selection
- Throttle backgroundRefreshTick to 0.5s minimum intervals instead of running on every updateIfNeeded call - Batch dispatch_sync calls in refreshRootCollectionFirst: collect PIDs in NSMutableIndexSet, single sync at end instead of per-PID - Same batching fix for refreshRootWithKernelFallback - Remove unused markSeenForEpoch helper method - Fix property name: isForeground → isForegroundJob
- Add iTermProcessCacheTests covering background refresh cadence, foreground/background priority transitions, and coalescer behavior - Add iTermProcessMonitorTests for pause/resume, hierarchy tracking, and child monitor management - Add +Testing categories to expose internals for test verification - Add iTermProcessCacheTestHelper to bridge ObjC internals to Swift tests - Fix background roots being dropped from dirty set after refresh (re-add alive low-priority roots so they continue periodic updates)
Notify the process cache of foreground root PIDs when the current tab changes, enabling priority-based process monitoring. Also add the iTermProcessMonitor+Testing category files to the Xcode project.
When child monitors are removed from the tree, their dispatch sources must be invalidated before the monitor is deallocated. Without this, the dispatch source is still active when the monitor is freed, causing a crash in _dispatch_queue_xref_dispose.
a6ebba3 to
e97ca42
Compare
When pauseMonitoring was called before setProcessInfo (i.e., before a dispatch source existed), the paused state was not recorded. Later when setProcessInfo created the source, it would auto-resume, bypassing the intended suspension for background tabs. Changes: - pauseMonitoring/resumeMonitoring now record _isPaused even without a source - setProcessInfo respects pre-recorded pause state when creating sources - Added unit tests for pre-source pause behavior - Added test for child removal invalidation
Document queue affinity for setupCoalescer, processIsAlive, and backgroundRefreshTick methods.
…onitor Add a new ITAssertOnQueue macro that wraps dispatch_assert_queue() for debug builds. Use it in iTermProcessMonitor to enforce that methods marked "Called on _queue" are actually called on that queue. Fix tests to call setProcessInfo on the correct queue.
|
This is definitely an area that needs improvement. My first reaction is to wonder if this whole thing would benefit from a fresh start instead of trying to bolt on an optimization. It would be nice to call out the tradeoffs we want to make in terms of responsiveness vs CPU churn and to think explicitly about how to handle having a small number of sessions that update rarely (where we can afford to be more aggressive) vs having a large number of sessions with a few that are churning hard (what I think this change was aimed at). Here's what I've noticed so far (just read the code, haven't tried it yet): It looks like the fast path won't pick up new child processes until the half-second cadence tick:
So the "fast path" for the foreground tab actually delivers stale data for structural changes (fork/exec). It only reliably detects process death. The cache preserves old data rather than showing nothing, but tab titles / job indicators won't update promptly for new processes. setForegroundRootPids is global but PseudoTerminal assumes it is per-windowPseudoTerminal.m calls setForegroundRootPIDs: in both windowDidBecomeKey: and tabView:didSelectTabViewItem:. The implementation (setForegroundRootPIDsLQ:) iterates all roots and demotes anything not in the
If you have two windows side-by-side both showing compiles, only one gets fast updates at any time. _lastBackgroundRefreshTime has no synchronizationupdateIfNeeded is documented as runnable on "Any queue" and reads/writes _lastBackgroundRefreshTime without locks (iTermProcessCache.m new lines ~388-392). This is a data race. _currentEpoch races between _workQueue and _lockQueue_currentEpoch is incremented on _workQueue in refreshDirtyHighPriorityRoots but read under _lockQueue in backgroundRefreshTick. These are different serial queues with no ordering guarantee. evictUnseenNodesForRoot:epoch: is a no-opRemove it? cachedDescendants grows unboundedcachedDescendants grows monotonically -- PIDs of dead children are never removed. iTermTrackedRootInfo.monitor comment says "nil if suspended" but the code doesn't nil itsuspendMonitorForRootLQ: pauses the monitor but keeps the reference. resumeMonitorForRootLQ: checks if (info.monitor) and creates a new one if nil. The comment and code disagree about the design intent. Testing infrastructure compiled into productionCan we limit the +Testing.m files to the ModernTests target? |
| // Returns direct child PIDs for a given parent PID. | ||
| // Note: proc_listchildpids returns PID count (not bytes like proc_listpids). | ||
| + (NSArray<NSNumber *> *)childPidsForPid:(pid_t)parentPid { | ||
| int count = proc_listchildpids(parentPid, NULL, 0); |
There was a problem hiding this comment.
It might make sense to give yourself some headroom automatically to avoid having to loop in 99% of cases. Even doubling the count would have basically no performance or memory impact.
| } | ||
|
|
||
| NSMutableArray<NSNumber *> *picked = [NSMutableArray array]; | ||
| __block NSUInteger remaining = 2; // Process at most 2 background roots per tick |
There was a problem hiding this comment.
If you have a hundred tabs does this mean it'll take up to 100 seconds to refresh each tab? At some point doing a single global refresh becomes worthwhile since with a large number of tabs you'll probably have to look at the vast majority of processes on the system anyway.
Summary
Refactoring to reduce CPU usage, frequency and duration of blocking operations on main thread from process monitoring with many tabs open. The process monitoring code path dominates in workloads that generate process churn across multiple tabs (e.g., compiling in two or more visible tabs/windows).
Event Coalescing
monitors firing = 1 coalescer callback.
Foreground/Background Split
Incremental Refresh
proc_listchildpids() only if stale.
Root PID Tracking
Data Structures
Includes comprehensive unit tests
Supporting changes
iTermProcessMonitor: AddtrackedRootPIDproperty and pause/resume methodsiTermLSOF: AddchildPidsForPid:usingproc_listchildpidswith retry loopPseudoTerminal: CallsetForegroundRootPIDs:on tab selectionComparison using process-generating stress tests across 5 tabs over 30 seconds: