Skip to content

Optimize iTermProcessCache with coalesced incremental refresh#574

Open
chall37 wants to merge 9 commits into
gnachman:masterfrom
chall37:refactor/process_cache
Open

Optimize iTermProcessCache with coalesced incremental refresh#574
chall37 wants to merge 9 commits into
gnachman:masterfrom
chall37:refactor/process_cache

Conversation

@chall37
Copy link
Copy Markdown
Contributor

@chall37 chall37 commented Feb 4, 2026

Summary

Refactoring to reduce CPU usage, frequency and duration of blocking operations on main thread from process monitoring with many tabs open. The process monitoring code path dominates in workloads that generate process churn across multiple tabs (e.g., compiling in two or more visible tabs/windows).

  • Event Coalescing

    • Legacy: Each monitor callback can trigger a full refresh
    • Refactor: Uses DISPATCH_SOURCE_TYPE_DATA_OR coalescer - multiple events merge into one refresh cycle. 50
      monitors firing = 1 coalescer callback.
  • Foreground/Background Split

    • Legacy: All monitors run equally, no distinction
    • Refactor: Adds pauseMonitoring()/resumeMonitoring() + setForegroundRootPIDs():
      • Foreground tabs: Fast path via coalescer
      • Background tabs: Monitors paused (dispatch source suspended), updates via 0.5s cadence timer (only 2 roots per tick)
  • Incremental Refresh

    • Legacy: reallyUpdate() enumerates all processes via lsof on every update - O(all_pids)
    • Refactor: Collection-first fast path walks cached collection - O(depth). Falls back to
      proc_listchildpids() only if stale.
  • Root PID Tracking

    • Legacy: No trackedRootPID property
    • Refactor: Each monitor knows its root PID, enabling O(1) routing of callbacks to the right tab
  • Data Structures

    • Legacy: Flat _trackedPidsLQ dictionary
    • Refactor: Adds iTermTrackedRootInfo with epoch tracking, _dirtyHighRootsLQ/_dirtyLowRootsLQ split, staleness detection
  • Includes comprehensive unit tests

Supporting changes

  • iTermProcessMonitor: Add trackedRootPID property and pause/resume methods
  • iTermLSOF: Add childPidsForPid: using proc_listchildpids with retry loop
  • PseudoTerminal: Call setForegroundRootPIDs: on tab selection
  • Add comprehensive unit tests for process cache and process monitor

Comparison using process-generating stress tests across 5 tabs over 30 seconds:

Metric Legacy Refactored Change
proc_pidinfo 634,151 19,666 -96%
listAllPids 638 20 -96%
reallyUpdate 638 20 -96%
newProcessCollection 638 20 -96%
highPriorityRefresh 0 1,110 (new)
backgroundRefreshTick 0 15 (new)

Major refactoring to reduce CPU usage from process monitoring with many tabs:

- Add global coalescer using DISPATCH_SOURCE_TYPE_DATA_OR to merge concurrent
  monitor events into batched refresh cycles
- Implement collection-first refresh that walks existing process collection
  instead of enumerating all PIDs via syscalls
- Suspend monitors for background tabs; only foreground tabs get fast updates
- Add epoch-based tracking to detect stale cache entries
- Preserve dirty PID signaling for session title/status updates
- Only remove cache entries when kill(pid,0) confirms process is dead,
  avoiding transient nil results from collection staleness

Supporting changes:
- iTermProcessMonitor: Add trackedRootPID property and pause/resume methods
- iTermLSOF: Add childPidsForPid: using proc_listchildpids with retry loop
- PseudoTerminal: Call setForegroundRootPIDs: on tab selection
- Throttle backgroundRefreshTick to 0.5s minimum intervals instead of
  running on every updateIfNeeded call
- Batch dispatch_sync calls in refreshRootCollectionFirst: collect PIDs
  in NSMutableIndexSet, single sync at end instead of per-PID
- Same batching fix for refreshRootWithKernelFallback
- Remove unused markSeenForEpoch helper method
- Fix property name: isForeground → isForegroundJob
- Add iTermProcessCacheTests covering background refresh cadence,
  foreground/background priority transitions, and coalescer behavior
- Add iTermProcessMonitorTests for pause/resume, hierarchy tracking,
  and child monitor management
- Add +Testing categories to expose internals for test verification
- Add iTermProcessCacheTestHelper to bridge ObjC internals to Swift tests
- Fix background roots being dropped from dirty set after refresh
  (re-add alive low-priority roots so they continue periodic updates)
Notify the process cache of foreground root PIDs when the current tab
changes, enabling priority-based process monitoring. Also add the
iTermProcessMonitor+Testing category files to the Xcode project.
When child monitors are removed from the tree, their dispatch sources
must be invalidated before the monitor is deallocated. Without this,
the dispatch source is still active when the monitor is freed, causing
a crash in _dispatch_queue_xref_dispose.
@chall37 chall37 force-pushed the refactor/process_cache branch from a6ebba3 to e97ca42 Compare February 4, 2026 20:43
@chall37 chall37 changed the title Optimize iTermProcessCache with coalesced incremental refresh WIP: Optimize iTermProcessCache with coalesced incremental refresh Feb 4, 2026
When pauseMonitoring was called before setProcessInfo (i.e., before a
dispatch source existed), the paused state was not recorded. Later when
setProcessInfo created the source, it would auto-resume, bypassing the
intended suspension for background tabs.

Changes:
- pauseMonitoring/resumeMonitoring now record _isPaused even without a source
- setProcessInfo respects pre-recorded pause state when creating sources
- Added unit tests for pre-source pause behavior
- Added test for child removal invalidation
Document queue affinity for setupCoalescer, processIsAlive, and
backgroundRefreshTick methods.
…onitor

Add a new ITAssertOnQueue macro that wraps dispatch_assert_queue() for
debug builds. Use it in iTermProcessMonitor to enforce that methods
marked "Called on _queue" are actually called on that queue. Fix tests
to call setProcessInfo on the correct queue.
@chall37 chall37 changed the title WIP: Optimize iTermProcessCache with coalesced incremental refresh Optimize iTermProcessCache with coalesced incremental refresh Feb 4, 2026
@chall37 chall37 marked this pull request as ready for review February 4, 2026 22:41
@gnachman
Copy link
Copy Markdown
Owner

gnachman commented Feb 5, 2026

This is definitely an area that needs improvement.

My first reaction is to wonder if this whole thing would benefit from a fresh start instead of trying to bolt on an optimization. It would be nice to call out the tradeoffs we want to make in terms of responsiveness vs CPU churn and to think explicitly about how to handle having a small number of sessions that update rarely (where we can afford to be more aggressive) vs having a large number of sessions with a few that are churning hard (what I think this change was aimed at).

Here's what I've noticed so far (just read the code, haven't tried it yet):

It looks like the fast path won't pick up new child processes until the half-second cadence tick:

  • A process forks in the foreground tab
  • The dispatch source fires, the coalescer runs refreshDirtyHighPriorityRoots
  • It walks the old collection, doesn't find the new child
  • Falls back to refreshRootWithKernelFallback which calls proc_listchildpids but returns nil for the deepest foreground job
  • The existing cache entry is preserved (stale data kept alive by processIsAlive:)
  • The new child's info only appears after the next full reallyUpdate, whenever that happens to run

So the "fast path" for the foreground tab actually delivers stale data for structural changes (fork/exec). It only reliably detects process death. The cache preserves old data rather than showing nothing, but tab titles / job indicators won't update promptly for new processes.

setForegroundRootPids is global but PseudoTerminal assumes it is per-window

PseudoTerminal.m calls setForegroundRootPIDs: in both windowDidBecomeKey: and tabView:didSelectTabViewItem:. The implementation (setForegroundRootPIDsLQ:) iterates all roots and demotes anything not in the
provided set. This means:

  • Switching to Window A sets Window A's current tab as foreground and demotes Window B's visible tab to background
  • Selecting a tab in a non-key window fires tabView:didSelectTabViewItem:, which sets that tab foreground and demotes the key window's tab

If you have two windows side-by-side both showing compiles, only one gets fast updates at any time.

_lastBackgroundRefreshTime has no synchronization

updateIfNeeded is documented as runnable on "Any queue" and reads/writes _lastBackgroundRefreshTime without locks (iTermProcessCache.m new lines ~388-392). This is a data race.

_currentEpoch races between _workQueue and _lockQueue

_currentEpoch is incremented on _workQueue in refreshDirtyHighPriorityRoots but read under _lockQueue in backgroundRefreshTick. These are different serial queues with no ordering guarantee.

evictUnseenNodesForRoot:epoch: is a no-op

Remove it?

cachedDescendants grows unbounded

cachedDescendants grows monotonically -- PIDs of dead children are never removed.

iTermTrackedRootInfo.monitor comment says "nil if suspended" but the code doesn't nil it

suspendMonitorForRootLQ: pauses the monitor but keeps the reference. resumeMonitorForRootLQ: checks if (info.monitor) and creates a new one if nil. The comment and code disagree about the design intent.

Testing infrastructure compiled into production

Can we limit the +Testing.m files to the ModernTests target?

Comment thread sources/iTermLSOF.m
// Returns direct child PIDs for a given parent PID.
// Note: proc_listchildpids returns PID count (not bytes like proc_listpids).
+ (NSArray<NSNumber *> *)childPidsForPid:(pid_t)parentPid {
int count = proc_listchildpids(parentPid, NULL, 0);
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might make sense to give yourself some headroom automatically to avoid having to loop in 99% of cases. Even doubling the count would have basically no performance or memory impact.

}

NSMutableArray<NSNumber *> *picked = [NSMutableArray array];
__block NSUInteger remaining = 2; // Process at most 2 background roots per tick
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have a hundred tabs does this mean it'll take up to 100 seconds to refresh each tab? At some point doing a single global refresh becomes worthwhile since with a large number of tabs you'll probably have to look at the vast majority of processes on the system anyway.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants