Skip to content

Lock-free and alloc-free implementation#9

Closed
fereidani wants to merge 1 commit into
zesterer:masterfrom
fereidani:master
Closed

Lock-free and alloc-free implementation#9
fereidani wants to merge 1 commit into
zesterer:masterfrom
fereidani:master

Conversation

@fereidani
Copy link
Copy Markdown
Contributor

Hi, Thank you for writing pollster, I had some free time and rewrote the project with a lock-free and alloc-free algorithm.

Before merging please beware, I wrote this implementation quickly and it needs more review, also it's passing the test suite, I'm worried if the current test suite covers enough sections and branches of the codebase.
In my algorithm, I added 1024 cycles of spinning before parking the thread and it is similar to what mutex implementations are doing, it can be removed or reduced according to the project goals. but I recommend spinning for a short time before parking the thread to avoid short park unparks.

@zesterer
Copy link
Copy Markdown
Owner

zesterer commented Dec 5, 2022

Thanks! The implementation looks broadly sensible. Out of interest, did you have any benchmarks or cases where you found a substantial saving from this approach? If so, it would be nice to have one in-tree. Additionally, you might be interested in the rather horrifying stuff I've done in #8, which are mostly tangential to your changes but also try to improve theoretical performance.

@fereidani
Copy link
Copy Markdown
Contributor Author

My pleasure!
No, I didn't benchmark it, But theoretically, it should be faster.
I checked the #8 I agree it seems similar with less unsafe code, but I think it should perform the same or better than #8.
Another advantage is the thread_local signal, which eliminates the need for alloc of Arc::new on each block_on call, I didn't know at first, but it is lazy inited, so it will not call Arc::new until the thread calls block_on.

@zesterer
Copy link
Copy Markdown
Owner

zesterer commented Dec 8, 2022

It would be nice to have benchmarks to justify the change, although I think it does hold up on its own. I'll try to find time to create one over the next week.

@fereidani
Copy link
Copy Markdown
Contributor Author

Any update on this PR?

@zesterer
Copy link
Copy Markdown
Owner

Sorry, I've found myself pressed for time recently. Moved home and a lot of other activities on other software I maintain has annoyingly pushed reviewing this to the back of the queue. I want to get time this week. Resolving the conflicts would definitely help speed up the process though.

@fereidani
Copy link
Copy Markdown
Contributor Author

No problem, I understand.
the conflict is resolved, also I optimized some atomic operations in my review to boost the performance a little more.

Copy link
Copy Markdown
Owner

@zesterer zesterer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few points about the implementation. Broadly this looks fine.

Comment thread src/lib.rs Outdated
Comment on lines +54 to +55
// SAFETY: SignalState is repr by u8 and will not be converted back by anything but itself
unsafe { std::mem::transmute(v) }
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not safe: the unsafe isn't being propagated to the caller, which is where the invariant needs to be upheld. I'd rather this were an unsafe method like unsafe fn SignalState::from_u8_unchecked(u8) -> SignalState.

Comment thread src/lib.rs Outdated
Err(state) => {
match state.into() {
SignalState::Empty => {
// Spin for a short time before puting the thread in park state. Similiar to what mutex does.
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

puting -> putting

Comment thread src/lib.rs Outdated
match state.into() {
SignalState::Empty => {
// Spin for a short time before puting the thread in park state. Similiar to what mutex does.
for i in 0..16 {
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a backoff crate for this, which we should probably use, perhaps behind a feature flag instead of implementing it ourselves.

@fereidani
Copy link
Copy Markdown
Contributor Author

@zesterer I updated the PR with a new implementation that is inspired by #14 .

Comment thread src/lib.rs
Comment on lines +11 to +14
thread_local! {
// A local reusable signal for each thread.
static LOCAL_THREAD_SIGNAL: Arc<Signal> = Arc::new(Signal::new());
}
Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little sceptical about this thread local still. On some platforms, accessing thread local data can be rather slow indeed, meaning that this might not necessarily be a net performance win over simply creating a new signal.

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to get the time to write some benchmarks this weekend.

@A248
Copy link
Copy Markdown
Contributor

A248 commented Sep 29, 2023

For what it's worth, the docs for Wake detail an implementation based on park/unpark. It is more or less the same as this PR, minus the thread-local. Notably, it mentions:

This example trades correctness for simplicity. In order to prevent
deadlocks, production-grade implementations will also need to handle
intermediate calls to thread::unpark as well as nested invocations

I admit I don't have a good enough grasp of Rust futures, but this seems to suggest that a park/unpark implementation does not support nested blocking. The current pollster::block_on implementation, in contrast, does handle nesting, because it is based on stack-local rather than thread-local state.

Documentation for context:

/// A basic `block_on` function that takes a future and runs it to completion on
/// the current thread.
///
/// **Note:** This example trades correctness for simplicity. In order to prevent
/// deadlocks, production-grade implementations will also need to handle
/// intermediate calls to `thread::unpark` as well as nested invocations.
///
/// ```rust
/// use std::future::Future;
/// use std::sync::Arc;
/// use std::task::{Context, Poll, Wake};
/// use std::thread::{self, Thread};
/// use core::pin::pin;
///
/// /// A waker that wakes up the current thread when called.
/// struct ThreadWaker(Thread);
///
/// impl Wake for ThreadWaker {
///     fn wake(self: Arc<Self>) {
///         self.0.unpark();
///     }
/// }
///
/// /// Run a future to completion on the current thread.
/// fn block_on<T>(fut: impl Future<Output = T>) -> T {
///     // Pin the future so it can be polled.
///     let mut fut = pin!(fut);
///
///     // Create a new context to be passed to the future.
///     let t = thread::current();
///     let waker = Arc::new(ThreadWaker(t)).into();
///     let mut cx = Context::from_waker(&waker);
///
///     // Run the future to completion.
///     loop {
///         match fut.as_mut().poll(&mut cx) {
///             Poll::Ready(res) => return res,
///             Poll::Pending => thread::park(),
///         }
///     }
/// }
///
/// block_on(async {
///     println!("Hi from inside a future!");
/// });

@notgull
Copy link
Copy Markdown

notgull commented Sep 29, 2023

Yes, this is the reason why the parking crate exists. Just using thread::park isn't robust.

@zesterer
Copy link
Copy Markdown
Owner

Rest assured that I've not forgotten about this PR, I just have a relatively long backlog of personal and FOSS stuff to attend to before taking a proper look at this. It's sitting there in my notifications though, hopefully I'll get to it before Christmas.

@Anutrix
Copy link
Copy Markdown

Anutrix commented May 3, 2025

Friendly bump-up...

@zesterer
Copy link
Copy Markdown
Owner

I rebased this change. Performance of a rudimentary benchmark went from ~12.5 seconds to ~10.3 seconds. I've not really bothered to look into exactly what overhead is associated with other aspects of the benchmark, but it's a clear win!

@zesterer
Copy link
Copy Markdown
Owner

I've opened a new PR for the rebased changes: #28

@zesterer zesterer closed this May 25, 2025
@zesterer
Copy link
Copy Markdown
Owner

Incidentally, I've just opened a new PR that uses a nightly-only API to improve performance even further! #29

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants