Reuse inactive publisher transceivers in RTCEngine to reduce memory growth on republish #1827
moufmouf wants to merge 2 commits into livekit:main
Conversation
…ansceiver churn

`RTCEngine.createSender`/`createSimulcastSender` always created new publisher transceivers via `addTransceiver()`. During repeated unpublish/publish cycles, this could accumulate inactive transceivers and increase memory usage. Add a reuse path that:

- looks for an inactive publisher transceiver of matching media kind
- switches it back to `sendonly`
- replaces its sender track instead of creating a new transceiver
- applies to both primary sender creation and simulcast sender creation

This reduces transceiver churn and keeps memory growth bounded in restart-heavy flows (e.g. screen-share stop/start loops).
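The reuse lookup described above can be sketched as a small predicate over the peer connection's transceivers. This is an illustrative sketch, not the actual client-sdk-js implementation: `TransceiverLike`, `findReusableTransceiver`, and the `isPublisher` flag are hypothetical names, with `isPublisher` standing in for whatever bookkeeping distinguishes publisher-side transceivers from subscriber-side ones.

```typescript
// Hypothetical minimal shape of a transceiver, for illustration only.
interface TransceiverLike {
  direction: 'sendonly' | 'recvonly' | 'inactive' | 'sendrecv';
  senderTrackKind: 'audio' | 'video' | null; // kind of the last track sent
  isPublisher: boolean; // never reuse subscriber-side transceivers
}

// Find an inactive publisher transceiver whose previous track matches `kind`.
function findReusableTransceiver(
  transceivers: TransceiverLike[],
  kind: 'audio' | 'video',
): TransceiverLike | undefined {
  return transceivers.find(
    (t) => t.isPublisher && t.direction === 'inactive' && t.senderTrackKind === kind,
  );
}
```

In the real `addTransceiver()` path, a hit from this lookup would be switched back to `sendonly` and have its track replaced via `RTCRtpSender.replaceTrack()` instead of creating a new transceiver.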
🦋 Changeset detected (latest commit: d3b394d). The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
…gine to reduce memory growth on republish
Ping. Kindly looking for feedback, as this fixes a big memory leak in the browser.
Because of the memory leak documented here (livekit/client-sdk-js#1827), we tried to use `track.pauseUpstream` instead of `this.localParticipant.unpublishTrack` to stop the screen shares. While this works memory-wise, it causes issues with the egress CompositeRoom: paused tracks are still displayed as "black boxes". We are reverting the memory-leak fix to restore the correct behaviour of the egress. We will absolutely need to have the LiveKit client-sdk-js PR merged in the future to fix this memory leak.
lukasIO left a comment:
Thanks for the PR, and sorry for the delayed review!
We're also reusing transceivers on the subscriber side, and with single peer connection mode (the default) those would also show up on the publisher peer connection.
In general I'm fully on board with addressing any memory leaks. However, reusing the transceiver as proposed here needs some careful consideration: in addition to ensuring that only publisher-side transceivers are reused (as explained above), there might also still be stale frames from the previous publication in the pipeline if we're not actively flushing.
I also wasn't able to actually get this PR to work with the `demo.ts` sample (`pnpm run dev` in this repo). The second screen share never shows up.
Commented on this hunk:

```typescript
track.codec = opts.videoCodec;
}

const reusedSender = await this.reuseInactivePublisherSender(track.mediaStreamTrack, streams);
```
This doesn't appear to apply the encodings of the new track.
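To illustrate the reviewer's point, here is a hedged sketch of re-applying the new track's encodings after a sender is reused. `SenderLike` and `applyEncodings` are hypothetical names mocking the relevant subset of `RTCRtpSender`; per the WebRTC spec, `setParameters()` must be passed the (mutated) object returned by `getParameters()`.

```typescript
// Hypothetical mock of the subset of RTCRtpSender we need here.
interface SenderLike {
  getParameters(): { encodings?: { rid?: string; maxBitrate?: number }[] };
  setParameters(p: { encodings?: { rid?: string; maxBitrate?: number }[] }): Promise<void>;
}

// Re-apply the new track's encodings on a reused sender.
async function applyEncodings(
  sender: SenderLike,
  encodings: { rid?: string; maxBitrate?: number }[],
): Promise<void> {
  // setParameters must receive the object returned by getParameters,
  // with only the mutable fields (here, encodings) changed.
  const params = sender.getParameters();
  params.encodings = encodings;
  await sender.setParameters(params);
}
```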
Thanks for the feedback @lukasIO 🙏
Why?
I'm developing WorkAdventure, a virtual space platform and we use Livekit for big meetings.
The issue that triggered this PR is that when we start publishing a screen share, the memory goes up by about 400MB. When we stop publishing the screen share with `localParticipant.unpublishTrack`, the memory does not go down. When we start publishing another screen share (maybe another window), the memory goes up by another 400MB, and so on. In short, we have a memory leak where each started screen share never releases its memory.
As a side note, using `track.pauseUpstream` to stop the track and then replacing the track with another one later seems to work better, but this workflow causes other issues (for example, egress CompositeRoom will display a black box for a paused video instead of not showing it at all).

How to reproduce the issue
You can see the issue in https://meet.livekit.io/
After 3 or 4 screen shares, your tab takes more than 1GB of RAM.
Finding the memory leak
I tried to track down where the memory leak was, and after a few trials and errors (and with the help of Codex), I ended up finding that the transceivers in the `RTCEngine` seem to be piling up each time we start a new track. The `RTCRtpTransceiver`s seem to hold a reference to the `MediaStreamTrack` that is never freed and eats a lot of memory. I did not find any way to properly free the `MediaStreamTrack` reference, but Codex (again) proposed the patch below. Basically, it tries to reuse an inactive existing sender that shares the same track "kind".

I tested this with WorkAdventure and the patch works. When I stop/start many screen shares, the memory does not pile up. I also re-read the patch and validated it myself (to avoid any AI slop).
Note: I tried an alternative: freeing the memory as soon as `localParticipant.unpublishTrack` is called (using `replaceTrack(null)`) instead of waiting for the next track to start to reuse the transceiver. Alas, I could not make this work. That would be ideal, as the memory would be freed as soon as we finish publishing; as it stands, we have to wait until we leave the LiveKit room for the memory to be freed.

What do you think of this patch? I can happily work on it if it needs any improvement.