Intermittent join/signal failures after moving from livekit-client-js 2.17.x to 2.18.1 against livekit-server 1.9.12 #1875

@dbastos

Describe the bug

Our baseline was stable on livekit-client-js 2.17.x.

After moving to 2.18.1 (server still on 1.9.12), we started seeing errors like:

  • could not handle new participant
  • request canceled

We then tested server 1.10.1, but the problem did not clearly disappear in our environment. Rolling the client back to 2.17.x appeared to normalize behavior again on our side.

Because of that, I'm not sure whether this is:

  1. a regression/change in client behavior in 2.18.1,
  2. a compatibility problem with server 1.9.12,
  3. or a transport/proxy path issue that 2.18.1 is exposing more easily.

Observed behavior:

  • Intermittent participant join/restart failures
  • Some sessions appear to connect and then stall
  • Media freeze
  • We also observed DTLS/data-channel related warnings during the bad periods

Reproduction

livekit-client-js 2.18.1 against livekit-server 1.9.12.

Logs

Earlier participant join failure:

ERROR livekit routing/localrouter.go:111 could not handle new participant
{"room": "sporting-c_0407193619645", "participant": "552a637b-d178-471b-86a8-b4ce714b25ab", "connID": "CO_ANHZbD7fTVmz", "error": "request canceled"}

During later testing we also saw transport/proxy related errors:

caddy-1 | {"level":"error","logger":"layer4.handlers.proxy","msg":"upstream connection","local_address":"127.0.0.1:57986","remote_address":"127.0.0.1:7880","error":"local error: tls: bad record MAC"}

livekit-1 | WARN livekit.transport rtc/transport.go:924 error reading data channel
{"transport":"PUBLISHER","label":"_reliable","error":"dtls timeout: read/write timeout: context deadline exceeded"}

livekit-1 | ERROR livekit service/signal.go:186 could not handle new participant
{"room":"sporting-c_0407014153277","participant":"7f0e7c25-6097-42e4-86c3-901d6e6f4f72","connID":"CO_gCxdsGVcZZSc","error":"could not restart participant"}
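
To compare how often each failure shows up between client versions, something like the following could tally the server-side errors. This is only a rough sketch: it assumes the log layout shown above (a header line with level, logger, and source location, followed by a JSON line with the fields), and the `tally_errors` helper and its regular expressions are hypothetical, not part of LiveKit.

```python
import json
import re
from collections import Counter

# Optional Docker Compose prefix such as "livekit-1 | " (assumed from the excerpts above).
COMPOSE_PREFIX = re.compile(r"^[\w.-]+\s*\|\s*")
# Header such as "ERROR livekit service/signal.go:186 could not handle new participant".
HEADER = re.compile(r"^(ERROR|WARN)\s+\S+\s+\S+\.go:\d+\s+(.*)$")


def tally_errors(lines):
    """Count (level, message, error) triples from header + JSON line pairs."""
    counts = Counter()
    pending = None  # (level, message) of the most recent header line
    for raw in lines:
        line = COMPOSE_PREFIX.sub("", raw.strip())
        header = HEADER.match(line)
        if header:
            pending = (header.group(1), header.group(2).strip())
            continue
        if pending and line.startswith("{"):
            try:
                fields = json.loads(line)
            except json.JSONDecodeError:
                fields = {}  # keep the header even if the JSON line is truncated
            counts[(pending[0], pending[1], fields.get("error", "<none>"))] += 1
            pending = None
    return counts


if __name__ == "__main__":
    sample = [
        "ERROR livekit routing/localrouter.go:111 could not handle new participant",
        '{"room": "r1", "participant": "p1", "connID": "c1", "error": "request canceled"}',
        "livekit-1 | ERROR livekit service/signal.go:186 could not handle new participant",
        '{"room":"r2","participant":"p2","connID":"c2","error":"could not restart participant"}',
    ]
    for key, count in tally_errors(sample).most_common():
        print(count, key)
```

Running this over the logs from a 2.17.x session and a 2.18.1 session would show whether the "request canceled" and "could not restart participant" counts actually differ between the two clients.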

System Info

Ubuntu 24.04
Chrome (latest version)
Deployment: Linux + Docker Compose, following the official documentation.

Severity

annoyance

Additional Information

We also saw warnings such as "could not get packet from bucket" and "received packet too old" on VP8 video tracks during the same bad periods. The stats reported packet loss as 0 at those times, while jitter and PLI counts were high.

Questions

  • Is there any known behavior change in livekit-client-js 2.18.1 that could explain more fragile joins/restarts against livekit-server 1.9.12?
  • Is 2.18.1 expected to be fully compatible with server 1.9.12, especially behind caddy-l4?
  • Do the logs above suggest a known transport/signal issue rather than a pure client regression?

The errors start appearing a few seconds after the room becomes active. It may be a coincidence, but both affected rooms had more than 400 subscribers.
