harden reconnect behaviour #1148

Merged
lukasIO merged 21 commits into main from lukas/reconnect on Jun 24, 2026

@lukasIO lukasIO commented Jun 9, 2026

Before you submit your PR

Make sure the following is true before submitting your PR:

  • I have read the contributing guidelines and validated that this PR will be accepted.
  • I have read and followed the principles regarding breaking changes, testing, and code quality.

PR description

Describe the changes in this PR. Explain what the PR is meant to solve and how to reproduce the issue in the first place.

Breaking changes

If this PR introduces breaking changes, list them here and document the rationale for introducing such a change.

MSRV

If the PR modifies the crate's MSRV (Minimum Supported Rust Version), document it here.

Testing

Ideally, unit test the code you add, but ensure you're not repeating existing test cases. Use as many already written scaffolding, utilities as possible; write your own, when needed. If external services, APIs, tokens are required (e.g., running an LK server instance), provide the necessary information. Make sure your tests perform useful, context-aware assertions and do not simply emulate "happy paths".

Async

We want the project to be runtime-agnostic, so please reuse what's already in livekit-runtime and feel free to add anything missing. It's ok to use Tokio directly, when writing unit tests, if necessary. When testing, do not use artificial delays for the state to "catch up"; instead, respect the event flow and subscribe properly using channels or other mechanisms.

@lukasIO lukasIO changed the title from "Lukas/reconnect" to "harden reconnect behaviour" Jun 9, 2026
github-actions Bot commented Jun 9, 2026

Changeset

The following package versions will be affected by this PR:

| Package | Bump |
| --- | --- |
| livekit | patch |
| livekit-api | patch |
| livekit-ffi | patch |
| livekit-uniffi | patch |

Comment thread livekit/src/rtc_engine/mod.rs Outdated
Comment thread livekit/src/rtc_engine/rtc_session.rs
// A server-requested reconnect signals retry_now_notify to collapse
// this wait so the next attempt fires immediately.
let backoff = reconnect_backoff_delay(i);
tokio::select! {

issue: If the room is explicitly closed by the user, this task will not be properly cancelled. The addition of this select! brought my attention here, but this is a pre-existing issue.

There is a close_notifier referenced on L805 with a comment saying it needs to be used as a signal to cancel this task; however, it was never wired up. The solution is to wire it up and use it as a branch in your select! so the loop can break out early when the room is closed.
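
To make the suggestion concrete, here is a minimal sketch of a backoff wait that can be cut short either by a retry-now signal or by a close signal. It is not the SDK's code: it uses only the standard library, with `recv_timeout` on a channel standing in for the `tokio::select!` over a sleep and two notifiers, and every name in it (`Signal`, `WaitOutcome`, `wait_for_next_attempt`, `run_reconnect_loop`) is hypothetical.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::Duration;

/// Signals that may interrupt a backoff wait (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Signal {
    /// The server asked for an immediate retry.
    RetryNow,
    /// The user explicitly closed the room.
    Close,
}

/// What the reconnect loop should do once the wait ends.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WaitOutcome {
    BackoffElapsed,
    RetryImmediately,
    Cancelled,
}

/// Blocks for at most `backoff`, returning early if a signal arrives.
/// Plays the role of a three-branch select: timer, retry-now, close.
pub fn wait_for_next_attempt(signals: &Receiver<Signal>, backoff: Duration) -> WaitOutcome {
    match signals.recv_timeout(backoff) {
        Ok(Signal::RetryNow) => WaitOutcome::RetryImmediately,
        Ok(Signal::Close) => WaitOutcome::Cancelled,
        // The timer won the race: proceed with the regular attempt.
        Err(RecvTimeoutError::Timeout) => WaitOutcome::BackoffElapsed,
        // Every sender is gone, so nothing can close or nudge us: stop.
        Err(RecvTimeoutError::Disconnected) => WaitOutcome::Cancelled,
    }
}

/// Waits before each of up to `max_attempts` attempts and stops early on
/// close. Returns how many attempts would have been started.
pub fn run_reconnect_loop(
    signals: &Receiver<Signal>,
    max_attempts: u32,
    backoff: Duration,
) -> u32 {
    let mut attempts = 0;
    for _ in 0..max_attempts {
        if wait_for_next_attempt(signals, backoff) == WaitOutcome::Cancelled {
            break; // room closed: leave the loop instead of retrying
        }
        attempts += 1; // a real loop would call its connect routine here
    }
    attempts
}
```

In an async version the same three outcomes would map onto three `select!` branches, with the close branch being the one that breaks out of the loop.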

@lukasIO lukasIO Jun 19, 2026

great catch!

addressed in e3320d3

@lukasIO lukasIO marked this pull request as ready for review June 19, 2026 11:43
@lukasIO lukasIO requested a review from ladvoc June 19, 2026 11:43
@github-actions github-actions Bot requested a review from 1egoman as a code owner June 19, 2026 11:44

@lukasIO lukasIO Jun 19, 2026

wanted to call this out specifically. It's "just" a spec generated by https://github.com/juxt/allium, but in case someone is opposed to adding this in, let me know

My only concern is whether any parts of the spec could be misleading to an agent reading it in the future. In those cases, relying on the code as the source of truth might help avoid confusion. That said, I don’t feel strongly about it.

that's a fair concern. I think the risk is minimised as much as possible by having a special (allium-specific) extension, and allium itself regularly compares the spec against the code.

@xianshijing-lk xianshijing-lk requested a review from jhugman June 23, 2026 13:04

@stephen-derosa stephen-derosa left a comment

Is there any world where we'd want to apply this reconnect behavior to the initial connection? I'm thinking of an application facing the same sort of network instability that may cause a reconnect on initial connection.

@ladvoc ladvoc left a comment

LGTM ✅

lukasIO commented Jun 24, 2026

> Is there any world where we'd want to apply this reconnect behavior to the initial connection? I'm thinking of an application facing the same sort of network instability that may cause a reconnect on initial connection.

We currently differentiate between initial connection and reconnection, but we still allow retrying on the initial connection (which is effectively a "reconnect" for a connection that was never established in the first place); see:

for i in 0..(max_retries + 1) {
match try_connect().await {
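
The excerpt above is cut off, so as a rough illustration only, here is a sketch of how a loop of this shape can make one initial attempt plus up to `max_retries` retries with a backoff between them. It is synchronous and uses only the standard library so it runs without an async runtime. The helper names (`backoff_delay`, `connect_with_retries`), the injected `sleep` closure, and the 200 ms base / 5 s cap are assumptions and are not taken from the SDK.

```rust
use std::time::Duration;

/// Hypothetical backoff schedule: doubles from a base delay, then is capped.
/// The constants are illustrative, not the SDK's values.
pub fn backoff_delay(attempt: u32) -> Duration {
    const BASE_MS: u64 = 200;
    const MAX_MS: u64 = 5_000;
    // checked_shl avoids overflow for very large attempt numbers.
    let factor = 1u64.checked_shl(attempt).unwrap_or(u64::MAX);
    Duration::from_millis(BASE_MS.saturating_mul(factor).min(MAX_MS))
}

/// Calls `try_connect` up to `max_retries + 1` times: one initial attempt
/// plus `max_retries` retries. `sleep` is injected so callers (and tests)
/// decide how the delay is actually spent.
pub fn connect_with_retries<T, E>(
    max_retries: u32,
    mut try_connect: impl FnMut(u32) -> Result<T, E>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match try_connect(attempt) {
            Ok(conn) => return Ok(conn),
            // Out of retries: surface the last error to the caller.
            Err(err) if attempt >= max_retries => return Err(err),
            Err(_) => {
                sleep(backoff_delay(attempt));
                attempt += 1;
            }
        }
    }
}
```

Passing `std::thread::sleep` as the `sleep` argument gives blocking behaviour; an async variant would await a runtime timer instead.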

@lukasIO lukasIO merged commit 6f108fd into main Jun 24, 2026
23 checks passed
@lukasIO lukasIO deleted the lukas/reconnect branch June 24, 2026 07:51
@knope-bot knope-bot Bot mentioned this pull request Jun 24, 2026
ladvoc pushed a commit that referenced this pull request Jun 24, 2026
> [!IMPORTANT]
> Merging this pull request will create these releases

# livekit-ffi 0.12.67 (2026-06-24)

## Fixes

- Increase room event ready timeout
- harden reconnect behaviour - #1148 (@lukasIO)

# livekit-api 0.5.4 (2026-06-24)

## Fixes

- harden reconnect behaviour - #1148 (@lukasIO)

# livekit-uniffi 0.1.3 (2026-06-24)

## Fixes

- harden reconnect behaviour - #1148 (@lukasIO)

# livekit 0.7.49 (2026-06-24)

## Fixes

- harden reconnect behaviour - #1148 (@lukasIO)

Co-authored-by: knope-bot[bot] <152252888+knope-bot[bot]@users.noreply.github.com>