Skip to content

perf: Replace exclusive lock with ReaderWriterLockSlim in ModuleScheduler#1469

Merged
thomhurst merged 10 commits into
mainfrom
fix/issue-1458-lock-contention
Dec 30, 2025
Merged

perf: Replace exclusive lock with ReaderWriterLockSlim in ModuleScheduler#1469
thomhurst merged 10 commits into
mainfrom
fix/issue-1458-lock-contention

Conversation

@thomhurst
Copy link
Copy Markdown
Owner

Summary

  • Refactor ModuleScheduler locking strategy to reduce contention in the hot execution path
  • Replace object _stateLock with ReaderWriterLockSlim to allow concurrent reads
  • Use read locks for query operations (GetStatistics, GetPendingModules, GetExecutingModules)
  • Use write locks only for state mutations (MarkModuleStarted, MarkModuleCompleted, etc.)
  • Move LINQ processing and logging operations outside of locks where possible
  • Add proper disposal of the ReaderWriterLockSlim

This allows multiple threads to read scheduler state concurrently while maintaining exclusive access for writes, improving performance in scenarios with many parallel module queries.

Fixes #1458

Test plan

  • Build succeeds
  • Existing unit tests pass
  • Run example pipelines to verify no deadlocks or race conditions
  • Verify concurrent module execution still works correctly

🤖 Generated with Claude Code

thomhurst and others added 9 commits December 29, 2025 18:39
🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…uler

Refactor ModuleScheduler locking strategy to reduce contention in the hot
execution path. The previous implementation held an exclusive lock during
read-only operations like querying module states and statistics.

Changes:
- Replace `object _stateLock` with `ReaderWriterLockSlim`
- Use read locks for query operations (GetStatistics, GetPendingModules,
  GetExecutingModules, ShouldExitScheduler snapshot creation)
- Use write locks for state mutations (MarkModuleStarted, MarkModuleCompleted,
  AddModule, FindReadyModules, CancelPendingModules)
- Move LINQ processing and logging operations outside of locks where possible
- Add proper disposal of the ReaderWriterLockSlim

This allows concurrent reads while maintaining exclusive access for writes,
improving performance in scenarios with many parallel module queries.

Fixes #1458

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

Summary

Replaces exclusive lock with ReaderWriterLockSlim in ModuleScheduler to reduce contention by allowing concurrent reads.

Critical Issues

1. Race condition in ShouldExitScheduler (lines 262-302)

The method takes a read lock to snapshot state, releases it, evaluates conditions outside the lock, then takes a write lock to update _schedulerCompleted. Between releasing the read lock and acquiring the write lock, another thread can modify _moduleStates, making the snapshot stale and potentially causing incorrect exit decisions.

Current flow:

_stateLock.EnterReadLock();
snapshot = ModuleStateSnapshot.Create(_moduleStates.Values);
_stateLock.ExitReadLock();
// <-- Race window: state can change here
var shouldExit = _exitConditions.ShouldExit(snapshot, queuedCount);
if (shouldExit) {
    _stateLock.EnterWriteLock();
    _schedulerCompleted = true;
    _stateLock.ExitWriteLock();
}

Fix: Keep read lock held while evaluating exit conditions, only upgrade to write lock when actually setting _schedulerCompleted:

_stateLock.EnterReadLock();
try {
    snapshot = ModuleStateSnapshot.Create(_moduleStates.Values);
    var shouldExit = _exitConditions.ShouldExit(snapshot, queuedCount);
    
    if (shouldExit) {
        _stateLock.ExitReadLock();
        _stateLock.EnterWriteLock();
        try {
            _schedulerCompleted = true;
        } finally {
            _stateLock.ExitWriteLock();
        }
        // Continue outside all locks
    } else {
        return false;
    }
} finally {
    if (_stateLock.IsReadLockHeld) _stateLock.ExitReadLock();
}

2. Copy-paste bug in CancelPendingModules (line 599)

The logging statement always logs ModuleExecutionState.Completed instead of the actual moduleState.State. This loses valuable debugging information about which state each module was in when cancelled.

Line 599:

_logger.LogDebug(
    "Cancelling pending module {ModuleName} (State={State})",
    MarkupFormatter.FormatModuleName(moduleState.ModuleType.Name),
    ModuleExecutionState.Completed);  // Bug: should be moduleState.State

Fix: Capture the state before modifying it:

var originalState = moduleState.State;
// ... modify state ...
// Then in logging section:
_logger.LogDebug(
    "Cancelling pending module {ModuleName} (State={State})",
    MarkupFormatter.FormatModuleName(moduleState.ModuleType.Name),
    originalState);

3. Potential race in AddModule (lines 179-197)

Dependencies are resolved outside the write lock (line 180), but then processed inside the lock. Between resolution and lock acquisition, the _moduleStates collection could change, potentially causing inconsistencies.

Fix: Move dependency resolution inside the write lock as it was in the original code, or document why external resolution is safe.

Suggestions

  1. Consider upgradeability: The current LockRecursionPolicy.NoRecursion prevents lock upgrades. If you need read-to-write upgrades, use LockRecursionPolicy.SupportsRecursion, but be aware of potential deadlocks.

  2. Performance verification: The issue mentions 10-30% improvement. Consider adding before/after benchmarks to verify the performance gain justifies the added complexity.

  3. Documentation: Add XML comments noting that ReaderWriterLockSlim requires careful handling of lock/unlock pairs in all code paths (including exceptions).

Verdict

⚠️ REQUEST CHANGES - Critical race condition and copy-paste bug must be fixed

1. ShouldExitScheduler: Keep read lock held while evaluating exit
   conditions to prevent state changes between snapshot and decision.
   The isDeadlocked check is now captured under the read lock before
   upgrading to write lock.

2. CancelPendingModules: Fix copy-paste bug where logging statement
   was hardcoded to log ModuleExecutionState.Completed instead of
   the original state before modification. Now captures original state
   before modifying and logs that value.

3. AddModule: Move dependency resolution inside the write lock to
   prevent race conditions where _moduleStates could change between
   resolution and processing. Logging of unregistered dependencies
   is done outside the lock to avoid holding the lock during I/O.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown
Contributor

Summary

This PR refactors ModuleScheduler to use ReaderWriterLockSlim instead of an exclusive lock, enabling concurrent reads while maintaining exclusive writes for better performance.

Critical Issues

None found ✅

All three critical issues from the previous review have been properly addressed in commit a004751:

  1. Race condition in ShouldExitScheduler ✅ FIXED

    • Now keeps read lock held while evaluating exit conditions (lines 276-324)
    • Captures isDeadlocked under read lock before upgrading to write lock
    • Properly handles lock transitions with IsReadLockHeld check in finally block
  2. Copy-paste bug in CancelPendingModules ✅ FIXED

    • Now captures originalState before modifying module state (line 602)
    • Logs the original state instead of hardcoded ModuleExecutionState.Completed (line 623)
  3. Race condition in AddModule ✅ FIXED

    • Dependency resolution moved back inside write lock (line 186)
    • Unregistered dependencies tracked in a list and logged outside lock (lines 198-217)
    • Prevents race where _moduleStates changes between resolution and processing

Suggestions

  1. Consider performance benchmarks: The PR description mentions this fixes issue MEDIUM: Scheduler lock contention in hot path could impact performance #1458 which likely documented performance problems. Consider adding benchmark results or performance metrics to validate the improvement.

  2. Lock recursion policy trade-off: The current LockRecursionPolicy.NoRecursion is correct for this implementation but means lock upgrades aren't supported. The current code pattern (exit read lock, enter write lock) is appropriate. Document this if it becomes a maintenance concern.

  3. Minor: .gitignore formatting (commit b00c9f1): The last line has nul.worktrees/ which should probably be two separate lines:

    nul
    .worktrees/
    

Previous Review Status

All critical issues from the previous review (race conditions and logging bug) have been properly addressed in commit a004751.

Verdict

APPROVE - No critical issues, all previous concerns resolved

@thomhurst thomhurst merged commit 1dd3f30 into main Dec 30, 2025
10 of 12 checks passed
@thomhurst thomhurst deleted the fix/issue-1458-lock-contention branch December 30, 2025 10:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

MEDIUM: Scheduler lock contention in hot path could impact performance

1 participant