SCP: Acqire mutexes for both cond signals and waits#516
SCP: Acqire mutexes for both cond signals and waits#516bscottm wants to merge 1 commit intoopen-simh:masterfrom
Conversation
Acquire enclosing mutex used with pthread_cond_wait() when calling pthread_cond_signal() or pthread_cond_broadcast(). This eliminates the possibility that pthread_cond_wait() misses the corresponding pthread_- cond_signal(). See https://stackoverflow.com/questions/4544234/ for a good diagram that explains the thread interleaving. Also look for premature mutex unlocks. As the Linux pthread_cond_- signal() man page states, "... if predictable scheduling behavior is required, then that mutex shall be locked by the thread calling pthread_cond_broadcast() or pthread_cond_signal()." ETH: - Add startup condvar and mutex, have _eth_reader and _eth_writer signal successful thread startup (as other SIMH threads do.) - Set thread names for reader and writer threads.
|
Here we are again, with another PR that changes code which absolutely isn't demonstrably broken. If this was indeed broken, real failures would have arisen in the more than 20 years it has existed this way. Each time, you come along and propose changes to things that are not broken during any actual case of simulator operation, I would absolutely support such changes if you could actually demonstrate an operational failure in any running simulator situation, or if the events that actually happen are so rare in normal operation, but the theory still might be useful, you can readily implement a unit test using the existing simh TESTLIB framework that creates the conditions that demonstrate the failure and that with proposed changes avoids such a failure. You've previously added one such TESTLIB test, so the concept of unit testing isn't beyond you. Otherwise, you're making widespread changes merely for the sake of change since if you were writing things from scratch, that is how you would do it. Please show actual failure details that are then fixed and we can all get excited. Meanwhile, your commit (and PR) comments mention changes to sim_ether (ETH), but the actual changes you're providing sneak in changes to 8 additional (and unrelated) source modules. |
|
@markpizz Wouldn't have submitted a patch to correctly use Maybe fix your shit before shitting on others. |
|
Before you "fix" my shit, once again, it would be useful for everyone if you actually demonstrate that an actual bug exists AND the fact that such a bug is fixed by your efforts. |
|
I wonder if async I/O gets any significant use. Could you update this to reflect those points? |
|
@bscottm I'm inclined to commit this on ZIMH. How did you test this? Testing such things is notoriously hard and I'd like to be able to prove that we've got the correct code in place. If you don't have tests, don't worry, I'll try to create some, but it may require creativity. |
|
@pmetzger: This is applying proper condition signaling hygiene. There aren't any tests and you're going to have to be very creative on how to create the Goldilocks window where a condition signals before the corresponding waiting thread enters Mark may bitch, piss (Pizz?) and moan that "There's no demonstrable bug!": it's subtle, hard to reproduce and can lead to the waiting thread blocking forever for a signal that never arrives. |
|
I'm inclined to commit it now without tests. On testing later, I've been doing research on frameworks to simulate pthreads scheduling and calls in order to smoke out problems, and I don't think it's as hopeless as you think. I think in the future, ZIMH be able to exercise this stuff with reasonable assurance we have exercised all paths in a deterministic test. Right now your patch is in the queue behind a giant number of bugs that I found with existing UB runtime checks but I promise to get to it soon. |
|
Scott, the article you cited is incomplete. There is indeed a timing issue if the signal is accompanied by setting some state variable that the waiter is waiting for. But there isn't for "bare" signal calls, so wrapping those is clearly not a functional issue and per the pthreads spec is not needed. |
|
Paul: It's the top answer to the question (with 175 upvotes.) Quoting: "If you do not lock the mutex in the codepath that changes the condition and signals, you can lose wakeups." |
Yes, I read that. But the point is that it talks about a thread waiting for a state variable to change, as in: To make that work, setting state to ready must be done under the mutex; if not you have the missed wakeup if the setting of the state and the signal occur between the checking of the state and the wait. Note also that the examples don't "change the condition" -- rather, they change a state variable separate from the condition/mutex manipulated by the pthreads primitives. That separate variable is what creates the window if locking is not consistently applied. |
|
@pkoning2: IIRC, that pattern exists in the |
Acquire enclosing mutex used with pthread_cond_wait() when calling pthread_cond_signal() or pthread_cond_broadcast(). This eliminates the possibility that pthread_cond_wait() misses the corresponding pthread_- cond_signal(). See https://stackoverflow.com/questions/4544234/ for a good diagram that explains the thread interleaving.
Also look for premature mutex unlocks. As the Linux pthread_cond_- signal() man page states, "... if predictable scheduling behavior is required, then that mutex shall be locked by the thread calling pthread_cond_broadcast() or pthread_cond_signal()."
ETH:
Add startup condvar and mutex, have _eth_reader and _eth_writer signal successful thread startup (as other SIMH threads do.)
Set thread names for reader and writer threads.