Move fence synchronisation into wgpu-hal by Vecvec · Pull Request #9475 · gfx-rs/wgpu

Vecvec · 2026-04-30T08:33:37Z

Connections
Fixes #9470

Description
Previously, queue submit wrote to the fence in wgpu-core. However, device.poll read from the same fence, meaning that queue.submit could not execute while device.poll was running. This could be a problem in a multi-threaded environment with long execution times of command buffer. This PR moves the locks into wgpu-hal (where needed) so that they can be dropped when waiting. Also fixes a deadlock exposed because the fence write

One thing that I don't like is the requirement for Arc<RwLock<Fence>>s inside a RwLock<Vec<...>>s for vulkan and openGL, but I could not find any way to remove this. The inner RwLocks are to stop the backend from resetting fences that are being waited on (the wait here should be minimal because the fences are already determined to be signalled) and the Arcs are to allow them to be kept while outside of the main lock (such as in wait).

The lock on the fence in core was not removed. It seems that present uses it to exclude other queue operations. @inner-daemons, is the only operation this needs to exclude queue.submit? I'm happy to complete that in this PR, or wait it until another PR.

Testing
Existing tests run the same as previously. Solves the issue.

Squash or Rebase?

Squash

Checklist

I self-reviewed and fully understand this PR.
WebGPU implementations built with wgpu may be affected behaviorally. (At least, in terms of speed)
Validation and feature gates are in place to confine behavioral changes.
Tests demonstrate the validation and altered logic works.
CHANGELOG.md entries for the user-facing effects of this change are present.
The PR is minimal, and doesn't make sense to land as multiple PRs. The deadlock is so common that I think the PR would be broken otherwise (I usually deadlocked within 10 frames [one submit per frame, the other thread only submitted 20 times per second])
Commits are logically scoped and individually reviewable.
The PR description has enough context to understand the motivation and solution implemented.

Vecvec · 2026-04-30T08:52:18Z

After rebasing, the code that was causing the deadlock appears to have been refactored, so ignore the comments about a deadlock.

Edit: was #9307

andyleiserson · 2026-05-01T19:22:49Z

        // Lock ordering requires that the fence lock be acquired after the snatch lock and
        // before the command index lock.
-        let fence = self.device.fence.write();
+        let fence = self.device.fence.read();


I don't particularly like that things are set up this way (or at least, I think it should be documented more clearly what is going on), but I believe that the fence RwLock is being used not just to protect &mut self methods on the fence, but also to ensure mutual exclusion between submit and other things that don't want a submit to happen concurrently. Which will no longer be the case if submit only acquires a read lock.

I also don't particularly like that there are separate locks for the fence and command indices (I feel like the Fence could also have responsibility for giving out command indices), nor do I like that the protection against concurrent submits (see https://github.com/gfx-rs/wgpu/pull/9307/changes#diff-150156a37cf3627465ceb22096ed995ee26ae640007c421e726134bafd499dbeR1679-R1683) is more pessimistic than necessary (the validate_command_buffers processing probably could be done concurrently). But looking for solutions that don't bite off too much refactoring, one strategy might be to switch to using the command indices lock rather than the fence lock to provide mutual exclusion with concurrent submits (and document this, since it's non-obvious). If we do that, then I think we could get rid of the fence lock in wgpu_core entirely and rely on the locking in hal.

I believe that the fence RwLock is being used not just to protect &mut self methods on the fence, but also to ensure mutual exclusion between submit and other things that don't want a submit to happen concurrently. Which will no longer be the case if submit only acquires a read lock.

I had assumed (at least for submission) that command indices was held for that purpose. The only thing which appeared to use fences for exclusion was present, which should still exclude due to using a write.

The ones I'm worried about are:

wgpu/wgpu-core/src/device/resource.rs

Lines 5223 to 5225 in fa37706

// Wait for all work to finish before configuring the surface.

let snatch_guard = self.snatchable_lock.read();

let fence = self.fence.read();

wgpu/wgpu-core/src/present.rs

Lines 168 to 176 in fa37706

let fence = device.fence.read();

let suf = self.raw(device.backend()).unwrap();

let (texture, status) = match unsafe {

suf.acquire_texture(

Some(core::time::Duration::from_millis(FRAME_TIMEOUT_MS as u64)),

fence.as_ref(),

)

} {

Do these require no queue work to be submitted? to me it seems that these shouldn't require queue locking -aquire uses it to wait for it's previous usage to be finished and configure talks about guarding against queue submissions:

wgpu/wgpu-core/src/device/resource.rs

Lines 5235 to 5236 in fa37706

// After the wait, the queue should be empty. It can only be non-empty

// if another thread is submitting at the same time.

I think that this probably requires an issue for going through and documenting or switching to use a RwLock<()> for exclusion.

I couldn't find an issue for this, so I opened #9538.

Vecvec · 2026-05-01T19:56:40Z

The other way for Vulkan & OpenGl that I can see would probably be having a Fence wrapper that destroyed (or reset on Vulkan) the fence when dropped. It couldn't be clone as I think it would have to be unsafe, but it would remove the need for the lock. However, for Vulkan at least, the free list would have to have an Arc<RwLock<...>> around it to allow the drop impl to add the item back.

Edit: If unsafe was fine the lock could be forgotten and then force unlocked. That wouldn't require the Arc at least.

inner-daemons · 2026-05-11T23:16:14Z

@Vecvec To my knowledge surface presentation and queue submission are the only exclusive operations on the queue.

andyleiserson · 2026-05-11T23:56:26Z

It seems like the simplest way forward may be to change the places that currently take the fence lock to acquire the command indices lock instead / in addition?

I also wonder if (once that is done) we can get rid of the fence lock entirely (i.e. isn't the fence fully internally synchronized in hal? Are there any &mut methods left?).

I do think this needs to be addressed in some way before we can merge this PR. Changing fence.write() to fence.read() in submit() is relaxing the exclusion properties that currently apply (and I think those properties are important).

Vecvec · 2026-05-12T00:19:40Z

I also wonder if (once that is done) we can get rid of the fence lock entirely (i.e. isn't the fence fully internally synchronized in hal? Are there any &mut methods left?).

Yes, that was the idea, I was just waiting for @inner-daemons to confirm that present is the only one that uses it.

I do think this needs to be addressed in some way before we can merge this PR. Changing fence.write() to fence.read() in submit() is relaxing the exclusion properties that currently apply (and I think those properties are important).

I think that it would probably make sense for this PR to do it, I mainly suggested the other option in case there was some major issue with removing the lock.

Vecvec · 2026-05-13T05:17:46Z

CI failure seems unrelated

Vecvec · 2026-05-13T22:10:08Z

Just realized that obtaining the command indices before queue maintain would mean lock validation would fail. I wonder if checking validity before checking the queue is empty would be better - no submissions would proceed as they would find that the device was no longer valid. The alternative would probably be keeping an optional queue around longer, which could be messy.

Vecvec · 2026-05-14T19:22:49Z

Just realized that obtaining the command indices before queue maintain would mean lock validation would fail.

I moved the validity check further up, so this should now be fine.

andyleiserson · 2026-05-14T19:25:45Z

I think moving the validity check earlier is the right strategy, but I'm still working on convincing myself of all the details.

Separately, I merged this branch with my lock validation branch, and other than the issue you noted with the command indices lock, the lock validation is still happy.

Vecvec · 2026-05-14T19:33:15Z

Separately, I merged this branch with my lock validation branch, and other than the issue you noted with the command indices lock, the lock validation is still happy.

I ran current lock validation (with a tweak) and that's how I noticed the command indices issues. Glad there's nothing else that I missed.

I'm still working on convincing myself of all the details.

I wrote a comment about it in the code to convince myself.

andyleiserson · 2026-05-15T21:03:08Z

        // "lose the device".
        let mut should_release_gpu_resource = false;
-        if !self.is_valid() && queue_empty {
+        if !device_valid && queue_empty {


I think there is a problem with queue_empty now that maintain and submit don't hold the fence lock across the bulk of their operations. Previously, queue_empty was valid until the fence lock was released. Now, queue_empty is only valid until Queue::maintain unlocks the life tracker, and there is a window at the end of submit_pending_submission when it has released the locks but not yet added the work to the life tracker.

What do you think of changing the release_gpu_resources block to look like this:

if !self.is_valid() { let exclusive_snatch_guard = self.snatchable_lock.write(); // Once the device is invalid and we hold the exclusive snatch lock, // `last_successful_submission_index` cannot change. (Before // acquiring the snatch lock, there could be a submit in progress // that started before the device was destroyed.) let last_successful_submission_index = self .last_successful_submission_index .load(Ordering::Acquire); if current_finished_submission >= last_successful_submission_index {

(We could also take the command indices lock instead of the snatch lock. The reason I propose the snatch lock is that it is needed anyways in release_gpu_resources; as part of my lock ordering fixes, I change the code to acquire it once and pass it in to release_gpu_resources. If we did take the command indices lock, then the store of last_successful_submission_index in submit_pending_submission would need to move before drop(command_index_guard).)

And probably also read last_successful_submission_index earlier in this function, when deciding whether to return PollStatus::QueueEmpty.

I'd missed that the tracking happens after command indices was dropped. However, I'm not sure that version will work. destroy gets the queue, which (according to the comment when checking the queue is empty) could deadlock if the snatch lock is held over its drop.

Could someone explain why the queue is required to be empty before releasing these resources?release_gpu_resources calls the resources' respective destroy methods, which as far as I can tell already check for pending submissions and schedule a later resource destruction if so.

Vecvec added 4 commits April 30, 2026 20:39

Switch vulkan fence pools to rwlocks

37d2803

Switch to internal synchronization on GL

7ca8c54

use rwlocks in metal

a646f00

Don't require &mut in hal

865af59

Vecvec force-pushed the unlocked-fences branch from b1d0ed1 to eb9cc5b Compare April 30, 2026 08:50

Change core to only read lock fence

814b919

Vecvec force-pushed the unlocked-fences branch from eb9cc5b to 814b919 Compare April 30, 2026 09:00

andyleiserson reviewed May 1, 2026

View reviewed changes

Vecvec added 3 commits May 6, 2026 07:32

Merge branch 'trunk' into unlocked-fences

f1c0426

Switch to using just an Arc in gl and vk

3396794

Typos

7fa1971

inner-daemons self-requested a review May 6, 2026 15:11

inner-daemons assigned inner-daemons and andyleiserson May 6, 2026

Vecvec added 3 commits May 7, 2026 10:52

Merge remote-tracking branch 'trunk/trunk' into unlocked-fences

76508e0

Fix merge

6bcdd13

Format

34531a5

Vecvec mentioned this pull request May 11, 2026

Document wgpu-hal's synchronisation requirements in wgpu-core #9538

Open

Merge branch 'trunk' into unlocked-fences

56cbfbc

Vecvec added 4 commits May 12, 2026 13:11

Merge branch 'trunk' into unlocked-fences

6c6db49

Remove fence lock from wgpu-core

ea15c67

Format

4fc75ae

Fix docs

b285ec2

andyleiserson approved these changes May 12, 2026

View reviewed changes

Comment thread wgpu-core/src/as_hal.rs Outdated

Comment thread wgpu-core/src/device/resource.rs

Comment thread wgpu-core/src/device/resource.rs

Switch to SimpleResourceGuard

59eca38

Vecvec added 3 commits May 13, 2026 13:20

Merge branch 'trunk' into unlocked-fences

ba18951

Make sure Queue::maintain's queue empty is never stale.

4314cf0

Format

20732b3

Vecvec requested a review from andyleiserson May 13, 2026 05:44

Add previous leaf from fence lock into previous lock

7c9cf44

Vecvec added 2 commits May 15, 2026 07:08

Move device validity check to while command indices is held

0051d12

Format

c03d2dc

Typo

8f037b8

Merge branch 'trunk' into unlocked-fences

a10b7f9

andyleiserson reviewed May 15, 2026

View reviewed changes

inner-daemons removed their request for review May 15, 2026 22:52

	// Wait for all work to finish before configuring the surface.
	let snatch_guard = self.snatchable_lock.read();
	let fence = self.fence.read();

	let fence = device.fence.read();

	let suf = self.raw(device.backend()).unwrap();
	let (texture, status) = match unsafe {
	suf.acquire_texture(
	Some(core::time::Duration::from_millis(FRAME_TIMEOUT_MS as u64)),
	fence.as_ref(),
	)
	} {

	// After the wait, the queue should be empty. It can only be non-empty
	// if another thread is submitting at the same time.

Conversation

Vecvec commented Apr 30, 2026

Uh oh!

Vecvec commented Apr 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

andyleiserson May 1, 2026

Choose a reason for hiding this comment

Uh oh!

Vecvec May 1, 2026

Choose a reason for hiding this comment

Uh oh!

andyleiserson May 5, 2026

Choose a reason for hiding this comment

Uh oh!

Vecvec May 5, 2026

Choose a reason for hiding this comment

Uh oh!

Vecvec May 11, 2026

Choose a reason for hiding this comment

Uh oh!

Vecvec commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

inner-daemons commented May 11, 2026

Uh oh!

andyleiserson commented May 11, 2026

Uh oh!

Vecvec commented May 12, 2026

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Vecvec commented May 13, 2026

Uh oh!

Vecvec commented May 13, 2026

Uh oh!

Vecvec commented May 14, 2026

Uh oh!

andyleiserson commented May 14, 2026

Uh oh!

Vecvec commented May 14, 2026

Uh oh!

andyleiserson May 15, 2026

Choose a reason for hiding this comment

Uh oh!

Vecvec May 15, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Vecvec commented Apr 30, 2026 •

edited

Loading

Vecvec commented May 1, 2026 •

edited

Loading