Lock-free and alloc-free implementation#9
Conversation
|
Thanks! The implementation looks broadly sensible. Out of interest, did you have any benchmarks or cases where you found a substantial saving from this approach? If so, it would be nice to have one in-tree. Additionally, you might be interested in the rather horrifying stuff I've done in #8, which are mostly tangential to your changes but also try to improve theoretical performance. |
|
My pleasure! |
|
It would be nice to have benchmarks to justify the change, although I think it does hold up on its own. I'll try to find time to create one over the next week. |
|
Any update on this PR? |
|
Sorry, I've found myself pressed for time recently. Moved home and a lot of other activities on other software I maintain has annoyingly pushed reviewing this to the back of the queue. I want to get time this week. Resolving the conflicts would definitely help speed up the process though. |
|
No problem, I understand. |
zesterer
left a comment
There was a problem hiding this comment.
A few points about the implementation. Broadly this looks fine.
| // SAFETY: SignalState is repr by u8 and will not be converted back by anything but itself | ||
| unsafe { std::mem::transmute(v) } |
There was a problem hiding this comment.
That's not safe: the unsafe isn't being propagated to the caller, which is where the invariant needs to be upheld. I'd rather this were an unsafe method like unsafe fn SignalState::from_u8_unchecked(u8) -> SignalState.
| Err(state) => { | ||
| match state.into() { | ||
| SignalState::Empty => { | ||
| // Spin for a short time before puting the thread in park state. Similiar to what mutex does. |
| match state.into() { | ||
| SignalState::Empty => { | ||
| // Spin for a short time before puting the thread in park state. Similiar to what mutex does. | ||
| for i in 0..16 { |
There was a problem hiding this comment.
There is a backoff crate for this, which we should probably use, perhaps behind a feature flag instead of implementing it ourselves.
| thread_local! { | ||
| // A local reusable signal for each thread. | ||
| static LOCAL_THREAD_SIGNAL: Arc<Signal> = Arc::new(Signal::new()); | ||
| } |
There was a problem hiding this comment.
I'm a little sceptical about this thread local still. On some platforms, accessing thread local data can be rather slow indeed, meaning that this might not necessarily be a net performance win over simply creating a new signal.
There was a problem hiding this comment.
I'll try to get the time to write some benchmarks this weekend.
|
For what it's worth, the docs for
I admit I don't have a good enough grasp of Rust futures, but this seems to suggest that a park/unpark implementation does not support nested blocking. The current Documentation for context: /// A basic `block_on` function that takes a future and runs it to completion on
/// the current thread.
///
/// **Note:** This example trades correctness for simplicity. In order to prevent
/// deadlocks, production-grade implementations will also need to handle
/// intermediate calls to `thread::unpark` as well as nested invocations.
///
/// ```rust
/// use std::future::Future;
/// use std::sync::Arc;
/// use std::task::{Context, Poll, Wake};
/// use std::thread::{self, Thread};
/// use core::pin::pin;
///
/// /// A waker that wakes up the current thread when called.
/// struct ThreadWaker(Thread);
///
/// impl Wake for ThreadWaker {
/// fn wake(self: Arc<Self>) {
/// self.0.unpark();
/// }
/// }
///
/// /// Run a future to completion on the current thread.
/// fn block_on<T>(fut: impl Future<Output = T>) -> T {
/// // Pin the future so it can be polled.
/// let mut fut = pin!(fut);
///
/// // Create a new context to be passed to the future.
/// let t = thread::current();
/// let waker = Arc::new(ThreadWaker(t)).into();
/// let mut cx = Context::from_waker(&waker);
///
/// // Run the future to completion.
/// loop {
/// match fut.as_mut().poll(&mut cx) {
/// Poll::Ready(res) => return res,
/// Poll::Pending => thread::park(),
/// }
/// }
/// }
///
/// block_on(async {
/// println!("Hi from inside a future!");
/// });
|
|
Yes, this is the reason why the |
|
Rest assured that I've not forgotten about this PR, I just have a relatively long backlog of personal and FOSS stuff to attend to before taking a proper look at this. It's sitting there in my notifications though, hopefully I'll get to it before Christmas. |
|
Friendly bump-up... |
|
I rebased this change. Performance of a rudimentary benchmark went from ~12.5 seconds to ~10.3 seconds. I've not really bothered to look into exactly what overhead is associated with other aspects of the benchmark, but it's a clear win! |
|
I've opened a new PR for the rebased changes: #28 |
|
Incidentally, I've just opened a new PR that uses a nightly-only API to improve performance even further! #29 |
Hi, Thank you for writing pollster, I had some free time and rewrote the project with a lock-free and alloc-free algorithm.
Before merging please beware, I wrote this implementation quickly and it needs more review, also it's passing the test suite, I'm worried if the current test suite covers enough sections and branches of the codebase.
In my algorithm, I added 1024 cycles of spinning before parking the thread and it is similar to what mutex implementations are doing, it can be removed or reduced according to the project goals. but I recommend spinning for a short time before parking the thread to avoid short park unparks.