Skip to content

implements #8119 : Metal backend's wgpu_hal::Device::wait implementat…#9328

Open
39ali wants to merge 3 commits into
gfx-rs:trunkfrom
39ali:metal-device-wait
Open

implements #8119 : Metal backend's wgpu_hal::Device::wait implementat…#9328
39ali wants to merge 3 commits into
gfx-rs:trunkfrom
39ali:metal-device-wait

Conversation

@39ali
Copy link
Copy Markdown
Contributor

@39ali 39ali commented Mar 29, 2026

…ion polls instead of waiting

Connections
#8119

Description
this uses a CondVar and lets the thread sleep instead of polling every 1ms

Testing
ran poll tests

Squash or Rebase?
either

Checklist

  • Run cargo fmt.
  • Run taplo format.
  • Run cargo clippy --tests. If applicable, add:
    • --target wasm32-unknown-unknown
  • Run cargo xtask test to run tests.
  • If this contains user-facing changes, add a CHANGELOG.md entry.

@39ali
Copy link
Copy Markdown
Contributor Author

39ali commented Mar 29, 2026

If timeout is provided, the function will block indefinitely or until , is this a typo ?, should be If timeout is NOT provided

@39ali
Copy link
Copy Markdown
Contributor Author

39ali commented Apr 1, 2026

@jimblandy

@39ali

This comment was marked as resolved.

@inner-daemons inner-daemons self-requested a review April 11, 2026 03:21
@inner-daemons inner-daemons self-assigned this Apr 11, 2026
@inner-daemons inner-daemons requested review from inner-daemons and removed request for inner-daemons April 13, 2026 18:37
Copy link
Copy Markdown
Collaborator

@inner-daemons inner-daemons left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good overall. 2 questions. Also going to CC @cwfitzgerald because this will almost certainly have to be reassigned

Comment thread wgpu-hal/src/metal/mod.rs Outdated
// to create command buffers for internal purposes. In those cases we always
// commit the buffer immediately, so we don't adjust the counter for them.)
command_buffer_created_not_submitted: atomic::AtomicUsize,
sync: Arc<(Mutex<()>, Condvar)>,
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm skeptical about why this is Mutex<()> instead of just keeping track of the last completed submission index directly like Mutex<u64>.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is also pub type AtomicFenceValue = core::sync::atomic::AtomicU64

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andyleiserson I get that but I feel like that could be replaced at this point, combined into the mutex

Comment thread wgpu-hal/src/metal/adapter.rs Outdated
shared: Arc::new(QueueShared {
raw: queue,
command_buffer_created_not_submitted: atomic::AtomicUsize::new(0),
sync: Arc::clone(&self.shared.queue_sync),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the sync really be shared between devices in the same adapter?

@jimblandy jimblandy requested review from cwfitzgerald and removed request for cwfitzgerald April 22, 2026 15:18
@jimblandy jimblandy assigned cwfitzgerald and unassigned jimblandy Apr 22, 2026
@39ali
Copy link
Copy Markdown
Contributor Author

39ali commented May 2, 2026

made couple of changes that should address both comments :

  • Value inside the mutex (correctness / clarity)
    Previously, completed_value was an AtomicU64 that lived outside the mutex, kinda an anti-pattern, now Mutex<FenceValue> directly guards the value, which is the canonical condvar usage.

  • Sync is now per-fence (reduced spurious wakeups)
    Previously sync lived on AdapterShared, shared across every fence submitted to any device from that adapter. When any command buffer completed, it called notify_all() on that shared condvar, waking every thread waiting on any fence from that adapter, even completely unrelated ones. They'd all re-check their predicate and go back to sleep. Now each Fence owns its own condvar, so a completion only wakes threads actually waiting on that fence.

Copy link
Copy Markdown
Contributor

@andyleiserson andyleiserson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It pains me a little to replace an Atomic with a Mutex, but this is the right way of using the Condvar, and I don't think it's worth speculatively adding a shadow copy in an atomic.

if start.elapsed() >= timeout {
return Ok(false);
if let Some(deadline) =
timeout.and_then(|timeout| std::time::Instant::now().checked_add(timeout))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to unwrap here on overflow. Or maybe use wait_while_for so we don't have to do timeout accounting ourselves (unless our build configuration doesn't support wait_for?).

Comment thread wgpu-hal/src/metal/mod.rs
Comment on lines 1042 to 1044
if cmd_buf.status() == MTLCommandBufferStatus::Completed {
max_value = value;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noted in #9532 (comment) that this check isn't correct -- it should be looking for MTLCommandBufferStatus::Error as well. Although, I'm also not sure why it's necessary to scan for additional completed command buffers here, maybe we could remove this entirely.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants