Skip to content

Commit 2c7724b

Browse files
han-jiang277claude
andcommitted
fix: SMP priority inheritance race condition in mutex lock chain
In SMP environments, the priority inheritance (PI) lock chain traversal in inner_pend_for has a race condition where: 1. Thread A captures priority value (e.g., 6) 2. Thread A releases its lock, acquires the scanned thread's lock 3. During this window, Thread B promotes Thread A's priority (from 6 to 0) 4. Thread A uses the stale value (6) for comparison, causing incorrect priority promotion decisions The fix: - Re-read current_priority after acquiring scanning_thread's lock to get the most up-to-date value - Use conditional promotion instead of forced promotion to avoid overwriting already-promoted priorities - Continue traversing the lock chain even if adjust_wait_queue_position_by fails, since the thread may still need PI from other waiting threads Test: test_blocking_mutex_chain_basic now passes 50000 iterations consistently on qemu_virt64_aarch64.debug.swi (SMP 8-core). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent edf4176 commit 2c7724b

1 file changed

Lines changed: 25 additions & 15 deletions

File tree

kernel/src/sync/mutex.rs

Lines changed: 25 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -203,7 +203,6 @@ impl Mutex {
203203
this_thread: &'a ThreadNode,
204204
owner_thread: &ThreadNode,
205205
) -> (bool, SpinLockGuard<'a, WaitQueue>) {
206-
let this_priority = this_thread.priority();
207206
// We walk along the blocking chain to scan no more than
208207
// CHAIN_LENGTH_LIMIT threads.
209208
let mut current_len = 0;
@@ -224,17 +223,27 @@ impl Mutex {
224223
let mut other_lock = None;
225224
while current_len < CHAIN_LENGTH_LIMIT {
226225
current_len += 1;
226+
// Acquire lock on this_thread first, then scanning_thread.
227+
// This prevents race conditions in SMP environments.
228+
let this_guard = this_thread.lock();
229+
let mut scanning_guard = scanning_thread.lock();
230+
let scanning_prio = scanning_guard.priority();
231+
232+
// Re-read current_priority after acquiring scanning_thread's lock
233+
// to get the most up-to-date value. Another thread might have
234+
// already promoted this_thread's priority.
235+
let current_priority = this_thread.priority();
236+
227237
// We are holding the mutex's spinlock and the scanning thread is
228238
// its owner. It's safe to promote scanning thread's priority.
229-
let ok = scanning_thread.lock().promote_priority_to(this_priority);
230-
#[cfg(debugging_scheduler)]
231-
crate::trace!(
232-
"TH:0x{:x} is promoting TH:0x{:x} to {}, succ? {}",
233-
Thread::id(this_thread),
234-
Thread::id(&scanning_thread),
235-
this_priority,
236-
ok,
237-
);
239+
// Skip if the target thread already has a higher priority (lower number)
240+
// than what we're trying to promote to. This handles the case where
241+
// another thread in the PI chain has already promoted the target.
242+
if current_priority < scanning_guard.priority() {
243+
scanning_guard.promote_priority_to(current_priority);
244+
}
245+
drop(scanning_guard);
246+
drop(this_guard);
238247
// FIXME: Adjust priority in RQ for the scanning_thread.
239248
other_lock = None;
240249
other_mutex = scanning_thread.pending_on_mutex();
@@ -249,14 +258,15 @@ impl Mutex {
249258
break;
250259
};
251260
// Adjust the position of the scanning thread in the WaitQueue.
252-
if !Self::adjust_wait_queue_position_by(
253-
this_priority,
261+
// Note: Even if adjust fails (e.g., thread already acquired the mutex),
262+
// we should continue to traverse the lock chain because the thread
263+
// may still need priority inheritance from other waiting threads.
264+
let _ = Self::adjust_wait_queue_position_by(
265+
current_priority,
254266
&scanning_thread,
255267
&other_mutex_ref.clone(),
256268
other_lock_ref,
257-
) {
258-
break;
259-
}
269+
);
260270
let Some(other_thread) = other_mutex_ref.owner() else {
261271
break;
262272
};

0 commit comments

Comments
 (0)