spotupnp: fix teardown deadlock when removing a renderer#65
Open
007hacky007 wants to merge 1 commit into
Open
Conversation
UpdateThread holds Device->Mutex across spotDeletePlayer(). ~CSpotPlayer() then waits on runningMutex for CSpotPlayer::runTask() to exit, but the task's teardown path drives shadowRequest() which itself needs Device->Mutex. Classic AB/BA deadlock: the renderer disappears from Spotify, mDNS is unregistered, and the process stays stuck (with CPU pegged by the cspot TrackPlayer data-feed loop against a writePCM that now returns 0) until restart. Drop Device->Mutex before spotDeletePlayer() in every site that currently holds it across the call: * presence-timeout removal in UpdateThread (spotupnp.c) * SSDP BYE_BYE removal (spotupnp.c) * Sonos master-to-slave transition (spotupnp.c) * process-shutdown sweep in FlushMRDevices (mr_util.c) Device->SpotPlayer is cleared to NULL before unlocking so any concurrent consumer sees a consistent sentinel instead of a dying object. getMetaForUrl in spotify.cpp gains a null guard so that sentinel is safe for its one caller (spotGetMetaForUrl via ActionHandler) which, unlike every other path, does not go through the already-null-safe notify(). DelMRDevice()'s contract is unchanged: it still expects Device->Mutex held on entry and returns it unlocked. The reacquire before DelMRDevice() preserves that contract.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Hit this on my setup: one of my UPnP renderers vanished from Spotify Connect and
spotupnpsat at 100% CPU until I restarted it. Traced it to a lock-ordering bug betweenUpdateThreadand theCSpotPlayertask.What happens
UpdateThreadholdsDevice->MutexacrossspotDeletePlayer(). For example the presence-timeout path atspotupnp.c:811:~CSpotPlayer()setsisRunning = false, unregisters mDNS (so the renderer disappears from Spotify Connect), then waits onrunningMutexforrunTask()to finish. ButrunTask()'s teardown drivesshadowRequest(), which also wantsDevice->Mutex. That lock is still held byUpdateThread, which is itself waiting on the destructor. Classic AB/BA deadlock.The 100% CPU falls out of the same state: once
isRunning=false,CSpotPlayer::writePCM()returns 0 for every write, and cspot'sTrackPlayerdata-feed loop retries without backoff. That thread pegs a core for as long as the destructor is parked, i.e. until restart.The logs line up with this: you see
player <X> deletion pendingbut never the subsequentdisconnecting player <X>-runTask()can't exit.Fix
Drop
Device->MutexbeforespotDeletePlayer()at every site that currently holds it across the call:spotupnp.c: presence-timeout removal, BYE_BYE removal, Sonos master-to-slave transition.mr_util.c:FlushMRDevices()on shutdown.Pattern:
Device->SpotPlayeris nulled out before unlocking so any concurrent consumer onDevice->Mutexsees a stable "no player" sentinel instead of a pointer to a dying object. Most of those consumers already null-check vianotify(). The exception isspotGetMetaForUrl()(called fromActionHandleratspotupnp.c:560), so add anif (!self) return false;at the top ofgetMetaForUrl()inspotify.cpp.DelMRDevice()'s contract is unchanged: still expectsDevice->Mutexheld on entry, still returns it unlocked. The reacquire beforeDelMRDevice()preserves that.Notes
After the fix, the renderer comes back on its own via the next UPnP search cycle (about 30 s), no restart required.
This doesn't prevent the presence-timeout path from firing in the first place - if
LastSeenis only refreshed from SSDP and SSDP is late or filtered, a healthy renderer can still be flagged as gone and then re-added a cycle later. A separate patch refreshingLastSeenfrom UPnP events and successful control responses handles that and is complementary to this one.